Optimizing workflows for TRN construction

Customizing the differential analysis of omics data

Key data structures explained

A key component of the package is the data structures it provides. The following are a list of the key S4 classes:

MOList: An MOList object is singly responsible for storing all the omics data and the results of each step of the analysis. While the raw data of each omic type are stored internally in the slots, all analysis results are kept as list elements that allow easy subsetting and extraction. By exploiting this feature, we can customize our analysis by extracting and manipulating the results of the analysis.
DETag: A DETag object is the core S4 class that stores the results of the differential expression analysis. It provides the users with a composite structure for exploring the results of the analysis.
TOPTag: A TOPTag object is a subclass of the DETag object. Unlike the DETag object, the TOPtag object further limits the results to a subset of genes that we are interested in. It provides an user-friendly interface for filtering and ranking top differentially expressed genes/transcripts.
PEAKTag: Similar to TOPTag, a PEAKTag object is also a subclass of the DETag object. However, it is specifically designed for handling peak annotation and motif enrichment analysis.
TRNet: Finally, as the final component of the traditional workflow, a TRNet object is the core S4 class that stores the final TRN. Although itself operates as a data storage object, it can be easily integrated with any third-party tools for further topological analysis or customized network visualization.

Let us begin with the first step of the analysis pipeline, which is to perform differential analysis on each omics data separately. We begin by exploring how these data structures facilitate flexible and detailed analysis of the results.

Exploring the results of the differential analysis

We will first construct a MOList object using the data we have prepared. We consider the age of the samples as the grouping criteria for the differential analysis.

myMOList <- MOList(
  RNAseq = RNAseq_heart,
  RNAGroupBy = RNAseq_heart_samples$Age,
  smallRNAseq = smallRNAseq_heart,
  smallRNAGroupBy = smallRNAseq_heart_samples$Age,
  proteomics = protein_heart,
  proteomicsGroupBy = protein_heart_samples$Age,
  ATACpeak1 = atacPeak1Path,
  ATACpeak2 = atacPeak2Path
)
#> RNAseq:       RNAseq data has been added.
#> Small RNAseq: Small RNAseq data has been added.
#> Proteomics:   Proteomics data has been added.
#> ATACseq:      ATAC peaks data has been added.

With the package’s one-step differential analysis function, we can easily explore the results of the differential analysis.

myMOList <- diffOmics(myMOList, program = "edgeR")
#> Performing differential analysis on RNAseq data
#> 
#> Performing differential analysis on smallRNAseq data
#> 
#> Performing differential analysis on proteomics data
#> 
#> Processing peaks...

In the IntegraTRN package, several visualization methods are provided to explore the results of the differential analysis. These visualization methods are highly efficient in generating an overview of the results, allowing users to quickly identify the key changes that happen during the biological process. However, as biologists who generated the data, we may be interested in some particular aspects of the results. Therefore, we can further customize our experience by performing easy extraction of the results.

To obtain the DETag object for a particular omics data, we can simply subset the MOList object using the name of the omics data. For example, to retrieve differential analysis result of the RNAseq data:

myRNADETag <- myMOList$DERNAseq

would work just fine. The DETag class has implemented several methods for users to easily extract the results. For example, we can directly export the results as a data frame using the exportDE function.

exportDE(myRNADETag) %>%
    head(5)
#>              logFC       pvalue         padj
#> SLC35F1  0.3301368 1.435088e-30 7.175439e-27
#> HMGA2   -0.3862208 1.161249e-29 2.463844e-26
#> CDH11   -0.2866885 1.478307e-29 2.463844e-26
#> FHL2     0.2719520 1.650073e-28 2.062591e-25
#> JAM2     0.1827767 4.298455e-27 3.991485e-24

Furthermore, in certain conditions, such as generating a heatmap, we may want to explore only the differentially expressed genes as well as a normalized count matrix for visualization purposes. Besides subsetting the data frame returned by the exportDE function, we can also further utilize the functionality of its subclass, the TOPTag object. As the name suggests, the TOP part implies that we could easily perform filtering to obtain what we consider to be the “top” differentially expressed genes. For example, we can retrieve the top 20 differentially expressed genes with any fold change, with an adjusted p-value less than 0.05 considered to be significant.

myRNATOPTag <- TOPTag(myRNADETag,
  pCutoff = 0.05,
  logFCCutoff = 0,
  topGenes = 20,
  direction = "both"
)

Similarly, we can also export the results as a data frame. Furthermore, we can see that a ranking based on piValue is also provided.

exportDE(myRNATOPTag) %>%
    head(5)
#>             logFC       pvalue         padj  piValue rank
#> SLC35F1 0.3301368 1.435088e-30 7.175439e-27 26.14415    1
#> FHL2    0.2719520 1.650073e-28 2.062591e-25 24.68559    2
#> JAM2    0.1827767 4.298455e-27 3.991485e-24 23.39887    3
#> LDHD    0.2818013 1.218915e-26 7.475262e-24 23.12637    4
#> RD3L    0.4080196 1.495052e-26 7.475262e-24 23.12637    5

In case where we want to obtain a normalized count matrix for visualization, such as when plotting a heatmap, we can also do so by a simple function call.

topRNANormCounts <- exportNormalizedCounts(myRNATOPTag)

The piValue here ranks the genes based on the adjusted p-value and the fold change, which is a standard ranking for rank-based enrichment analysis. The piValue is calculated as follows:

\[piValue = -log_{10}(padj) * sign(logFC)\]

where \(padj\) is the adjusted p-value and \(logFC\) is the log fold change.

We can then make a heatmap using the ComplexHeatmap package.

heatmapMatrix <- topRNANormCounts %>%
  as.matrix() %>%
  t() %>%
  scale() %>%
  t()
exprRange <- range(heatmapMatrix)
col <- circlize::colorRamp2(
  c(exprRange[1], 0, exprRange[2]),
  c("blue", "white", "red")
)
ComplexHeatmap::Heatmap(heatmapMatrix,
  col = col,
  show_column_names = FALSE
)

Figure 1. Heatmap of top 20 differentially expressed genes

Similarly, we can use the list of selected genes to perform gene set enrichment analysis (GSEA) (Reimand et al. 2019). While a rank based GSEA analysis utilizes all available genes, we can again utilize the TOPTag constructor to perform ranking while not filtering any genes. This can be done by setting a non-selective cutoff. Here, a topGenes = 1 means that we are selecting all genes that satisfy the cutoffs.

rankedResult <- TOPTag(myRNADETag,
  pCutoff = 1,
  logFCCutoff = 0,
  topGenes = 1,
  direction = "both"
) %>% exportDE()

This can then be used by the users as import to the GSEA analysis.

Alternative differential analysis methods

For most highly controlled experiments, the differential analysis between two conditions is fairly straightforward. However, in certain cases, especially for patient-derived samples, a much higher biological variability may be observed. How could we handle such variability? With the DETag class and the list behavior of the MOList object, this has been made possible.

In the following section, we will explore an alternative differential analysis with multiple covariates. However, any alternative differential expression analysis can be performed at the user’s discretion, and the results can be easily integrated into the MOList object.

In the case of a population-based cohort, many attributes could be contributing to the biological variability. In our example, we can see that the following attributes are available:

colnames(RNAseq_heart_samples)
#> [1] "Age"   "Sex"   "Batch"

In this case, in the differential expression analysis, we can adjust for both sex and the batch effect. Again, we use the edgeR package for the differential expression analysis.

# Create the edgeR DGEList object
dgeList <- edgeR::DGEList(
  counts = RNAseq_heart,
  group = RNAseq_heart_samples$Age
)
# Filter lowly expressed genes
keep <- edgeR::filterByExpr(dgeList)
dgeList <- dgeList[keep, ]
# Perform differential expression analysis
design <- model.matrix(~ RNAseq_heart_samples$Age +
  RNAseq_heart_samples$Sex +
  RNAseq_heart_samples$Batch)
dgeList <- edgeR::calcNormFactors(dgeList, method = "TMM") # or a compatible method
dgeList <- edgeR::estimateDisp(dgeList, design)
fit <- edgeR::glmQLFit(dgeList, design) # or other suitable models
qlf <- edgeR::glmQLFTest(fit, coef = ncol(RNAseq_heart_samples))

With complex designs, the performance of user-defined differential expression analysis may be better than the one-step differential expression analysis provided by the IntegraTRN package. As we have obtained the results of the differential expression analysis, integrating the results back into the MOList object is fairly straightforward.

To do so, we need to construct a DETag object. The core component of the DETag object is a result data frame, as the original output of edgeR or DESeq2. Additionally is a normalized count matrix.

# Obtaining the DE results and normalized count
deResult <- edgeR::topTags(qlf, n = nrow(dgeList)) %>%
    as.data.frame()
normCounts <- edgeR::cpm(dgeList) %>%
    as.matrix()

# Construct the DETag object
myNewRNADETag <- DETag(DEResult = deResult, method = "edgeR", normalizedCounts = normCounts)

With the DETag object, we can then integrate the results back into the MOList object.

myMOList$DERNAseq <- myNewRNADETag

This would further allow us to perform downstream analysis using the customized differential expression results.

For the purpose of this specific example, we will not replace the original analysis results.

While we have demonstrated the flexibility of the IntegraTRN package in case of the RNAseq data, other omics data can be handled in the same way. With these flexibility, we can not only explore the differential analysis results in a more detailed manner, but also customize the analysis that are specific to our data and biological question of interest.

TRN construction with different data availability

Once we have optimize the first part of the analysis pipeline, we can then move on to the second part, which is to construct the TRN. In this section, we will discuss how the workflow is impacted by the availability of the omics data, and how users can make use of their best judgement select the most appropriate workflow.

Overview of multi-omics network construction

In a simplistic way, network construction by integrating multiple omics data is a process of intersecting the regulatory interactions to determine the ones that are biologically relevant to the condition of interest. Having more omics data provided, the analytical workflow of the IntegraTRN package could identify the most relevant regulatory interactions with much higher confidence. However, the availability of the omics data is not always guaranteed. Therefore, the package take different approaches to handle different scenarios.

In the following sections, we will discuss the contribution of each data type to the TRN construction, and how the workflow make use of the data to construct the TRN. Here, the package focuses on creating a transcription factor - small RNA - mRNA regulatory network. The following sections describe the role of each omics data in the TRN.

Data preparation for TRN construction

Before we begin, to more properly demonstrate the TRN construction workflow, we will downsample the data to reduce the computational time. Here, we randomly select 500 genes from each of the omics data.

set.seed(91711)
# Count-based data
RNAseq_heart <- RNAseq_heart[sample(nrow(RNAseq_heart), 500), ]
smallRNAseq_heart <- smallRNAseq_heart[sample(nrow(smallRNAseq_heart), 500), ]
protein_heart <- protein_heart[sample(nrow(protein_heart), 500), ]
# Chromatin accessibility data
peak1 <- read.table(atacPeak1Path, sep = "\t", header = FALSE)
peak2 <- read.table(atacPeak2Path, sep = "\t", header = FALSE)
peak1 <- peak1[sample(nrow(peak1), 500), ]
peak2 <- peak2[sample(nrow(peak2), 500), ]
set.seed(NULL)

We will then construct a MOList object using the downsampled data and perform differential analysis on each omics data.

myMOList <- MOList(
  RNAseq = RNAseq_heart,
  RNAGroupBy = RNAseq_heart_samples$Age,
  smallRNAseq = smallRNAseq_heart,
  smallRNAGroupBy = smallRNAseq_heart_samples$Age,
  proteomics = protein_heart,
  proteomicsGroupBy = protein_heart_samples$Age,
  ATACpeak1 = peak1,
  ATACpeak2 = peak2
)
#> RNAseq:       RNAseq data has been added.
#> Small RNAseq: Small RNAseq data has been added.
#> Proteomics:   Proteomics data has been added.
#> ATACseq:      ATAC peaks data has been added.

myMOList <- diffOmics(myMOList)
#> Performing differential analysis on RNAseq data
#> 
#> Performing differential analysis on smallRNAseq data
#> 
#> Performing differential analysis on proteomics data
#> 
#> Processing peaks...

We will then begin discussing TRN construction with each component of the network starting from the targets of the regulatory interactions.

Gene Expression

The RNAseq data is the most commonly available omics data. In the case of a transcriptional regulatory network, the RNAseq data is the most direct representation of the transcriptional activity. Therefore, the network employs differentially expressed genes (DEGs) as the regulatory targets of the TRN.

When setting the cutoffs for constructing the TRN, there are several cutoffs to consider. These include the fold change, the adjusted p-value, the top number of genes, and the direction of the change.

setOmicCutoffs(
  rnaAdjPval = 0.05,
  rnaLogFC = 0.1,
  rnaTopGenes = 200
)
#> RNAseq adjusted p-value cutoff: < 0.05 
#> RNAseq log fold change cutoff: > 0.1 
#> RNAseq selecting top 200 DE genes 
#> smallRNAseq adjusted p-value cutoff: < 0.05 
#> smallRNAseq log fold change cutoff: > 0 
#> smallRNAseq selecting top 100 % of DE genes 
#> Proteomics adjusted p-value cutoff: < 0.05 
#> Proteomics log fold change cutoff: > 0 
#> ATACseq motif adjusted p-value cutoff: < 0.05 
#> ATACseq motif log fold enrichment cutoff: > 0

Here, we specifically set the cutoffs for the RNAseq data. While the setOmicCutoffs provides default cutoffs for other omics data, these cutoffs are only considered if the selected data types are available.

These parameters are directly correlated to the size of the network, and the which genes to be included defines the primary biological scope of the network.

We will demonstrate the impact of these cutoffs on the TRN construction once we start to discuss regulators of the network.

Protein Expression

Similar to the RNAseq data, the proteomics data is also a direct representation of the transcriptional activity. However, proteomics provides a much more biologically relevant representation of the transcriptional activity, since the protein level remains a downstream product of the transcriptional and post-transcriptional regulation.

Therefore, the proteomics data are used to further filter the DEGs obtained from the RNAseq data to extract the phenotypically relevant genes. However, an interesting limitation of proteomics experiments is that protein coverage is heavily affected by the sample quality as well as the technology used to perform the experiment. Therefore, during prior exploratory analysis, we may fine-tune the cutoffs for the proteomics data to obtain the most relevant target gene set while minimizing the loss of information due to technical limitations.

For example, when protein coverage is quite high, we could employ a regular cutoff. Here, we limit the network to include only target genes that shows consistent expression changes in both the transcriptomic and proteomic level.

setOmicCutoffs(
  rnaAdjPval = 0.05,
  rnaLogFC = 0.1,
  rnaTopGenes = 200,
  proteomicsAdjPval = 0.05,
  proteomicsLogFC = 0
)
#> RNAseq adjusted p-value cutoff: < 0.05 
#> RNAseq log fold change cutoff: > 0.1 
#> RNAseq selecting top 200 DE genes 
#> smallRNAseq adjusted p-value cutoff: < 0.05 
#> smallRNAseq log fold change cutoff: > 0 
#> smallRNAseq selecting top 100 % of DE genes 
#> Proteomics adjusted p-value cutoff: < 0.05 
#> Proteomics log fold change cutoff: > 0 
#> ATACseq motif adjusted p-value cutoff: < 0.05 
#> ATACseq motif log fold enrichment cutoff: > 0

However, what if our proteomics experiment did not went very well? Are those data still useful? The answer depends on the biological question of interest. If we have identified our proteins of interest in the proteomics data, we can still construct the network using both the RNAseq and proteomics data. However, the low coverage may negatively impact the differential analysis. We can either customize the differential analysis to be more sensitive to the proteomics data, or we can select genes that show consistent direction of change with the proteins.

setOmicCutoffs(
  rnaAdjPval = 0.05,
  rnaLogFC = 0.1,
  rnaTopGenes = 200,
  proteomicsAdjPval = 1,
  proteomicsLogFC = 0
)
#> RNAseq adjusted p-value cutoff: < 0.05 
#> RNAseq log fold change cutoff: > 0.1 
#> RNAseq selecting top 200 DE genes 
#> smallRNAseq adjusted p-value cutoff: < 0.05 
#> smallRNAseq log fold change cutoff: > 0 
#> smallRNAseq selecting top 100 % of DE genes 
#> Proteomics adjusted p-value cutoff: < 1 
#> Proteomics log fold change cutoff: > 0 
#> ATACseq motif adjusted p-value cutoff: < 0.05 
#> ATACseq motif log fold enrichment cutoff: > 0

For integrative use of the proteomics and RNAseq data, we need to make sure that a correct conversion between the protein IDs and the gene IDs are available. This can be done by providing a data frame that contains the conversion table.

myMOList <- setGene2Protein(myMOList, proteinGeneIDConvert)

Setting the conversion can be done any time before the TRN construction.

Since the TRN is a TF - small RNA - mRNA network, having only the mRNA targets are unable to obtain any valid regulatory relationships. Therefore, the TRN construction requires at least one of the other data types to be available.

Small RNA Expression

One of the key post-transcriptional regulatory mechanisms is the small RNA regulation. In particular, microRNAs (miRNAs) are the most well-studied small RNAs. Their regulatory mechanism is well understood, and they are known to be negatively regulating the expression of their target genes by facilitating degradation of the target mRNA or inhibiting the translation of the target mRNA.

Small RNA data can come in two forms: small RNAseq data or miRNA-target interactions that are curated in third-party databases.

Small RNAseq

The benefit of small RNAseq data is that it provides a direct representation of the small RNA expression, which also cover multiple types of small RNAs. Most of them, except for miRNAs, are not well understood. However, they represent in very high abundance in the cell, and may play a role in the regulation of the transcriptome. Therefore, the small RNAseq data relies on predicted inference of the small RNA - mRNA interactions. Using the normalized count matrix of the RNAseq and the small RNAseq data, the package performs optimal pair matching between the two datasets and utilize a random-forest based classifier to predict and select the top small RNA - mRNA interactions based on their expression patterns.

With small RNAseq data, we can leverage all these tiny details to the package.

# Annotate the small RNAs
myMOList <- annotateSmallRNA(myMOList, anno = "human")

# Perform optimal matching between the RNAseq and small RNAseq data
myMOList <- matchSamplesRNAsmallRNA(myMOList,
  sampleDFRNAseq = RNAseq_heart_samples,
  sampleDFSmallRNAseq = smallRNAseq_heart_samples
)

# Define a cutoff for RNAseq and small RNAseq
setOmicCutoffs(
  rnaAdjPval = 0.05,
  rnaLogFC = 0,
  rnaTopGenes = 0.2,
  smallRNAAdjPval = 0.05,
  smallRNALogFC = 0,
  smallRNATopGenes = 0.2
)
#> RNAseq adjusted p-value cutoff: < 0.05 
#> RNAseq log fold change cutoff: > 0 
#> RNAseq selecting top 20 % of DE genes 
#> smallRNAseq adjusted p-value cutoff: < 0.05 
#> smallRNAseq log fold change cutoff: > 0 
#> smallRNAseq selecting top 20 % of DE genes 
#> Proteomics adjusted p-value cutoff: < 0.05 
#> Proteomics log fold change cutoff: > 0 
#> ATACseq motif adjusted p-value cutoff: < 0.05 
#> ATACseq motif log fold enrichment cutoff: > 0

miRNA-target interactions

On the other hand, miRNA-target interactions are curated from third-party databases. Since miRNAs have been well studied, the curated interactions has a broad coverage of the miRNA - mRNA interactions that have been experimentally validated. However, these interactions are not necessarily relevant to the biological process of interest, or not even related to the tissue/cell type of interest. Therefore, by providing the miRNA-target interactions, the package utilizes the differential expression analysis results and predicted miRNA expression to select the most relevant miRNA - mRNA interactions.

While fetching externally curated miRNA-target interactions is not a direct functionality of the package, it is straightforward as part of the pipeline. With the differential expression results, we can export a list of names of the target genes of interest, which are usually the inputs of most database searches.

geneList <- exportDE(myMOList$DERNAseq) %>%
    rownames()
geneList[1:12]
#>  [1] "CTSH"     "CLMP"     "ZNF491"   "SORL1"    "DOCK9"    "TMEM167B"
#>  [7] "BOD1"     "AFTPH"    "BTBD19"   "RPGR"     "ZNF222"   "NOC2L"

Several databases are available for fetching miRNA-target interactions. For example, miRNet (Chang et al. 2020) provides a comprehensive list of miRNA-target interactions with simply fetching using a list of target gene names. Similarly, additional databases such as miRDB (Chen and Wang 2020) can also be searched, where users can integrate the results from multiple databases to obtain a more comprehensive, experimentally validated list of miRNA-target interactions.

Alternatively, with a similar approach, we can also fetch the miRNA-target using differentially expressed miRNAs. This can be done by exporting the differentially expressed miRNAs and use them as input for the database search.

smallRNAs <- getDESmallRNA(myMOList,
  padj = 0.05,
  log2fc = 0,
  type = "miRNA"
)
head(smallRNAs, 5)
#>                     logFC       pvalue         padj  piValue rank
#> hsa-mir-145     0.2215842 9.707373e-25 1.932544e-22 21.71387    1
#> hsa-miR-499a-5p 0.1936384 1.285506e-24 1.932544e-22 21.71387    2
#> hsa-miR-30d-3p  0.2098948 5.795504e-23 5.227545e-21 20.28170    3
#> hsa-miR-101-3p  0.3008562 1.988497e-22 1.494687e-20 19.82545    4
#> hsa-miR-373-3p  0.2931234 5.555387e-18 2.505480e-16 15.60111    5
#>                 transcriptType
#> hsa-mir-145              miRNA
#> hsa-miR-499a-5p          miRNA
#> hsa-miR-30d-3p           miRNA
#> hsa-miR-101-3p           miRNA
#> hsa-miR-373-3p           miRNA

As an example, we use the multiMiR package (Ru et al. 2014) to fetch the miRNA-target interactions from multiple databases, namely miRTarBase (Huang et al. 2020), miRecords (Xiao et al. 2009), and TarBase (Sethupathy, Corda, and Hatzigeorgiou 2006).

mir2geneMultiMiR <- multiMiR::get_multimir(
  org = "hsa",
  mirna = rownames(smallRNAs),
  table = "validated",
  summary = FALSE
)
mir2geneMultiMiR <- mir2geneMultiMiR@data

To avoid unnecessary database query, here we load an example of the results, where we randomly selected 500 interactions from the multiMiR results.

data("mir2geneMultiMiR")

Since in the example data we used the hsa nomenclature for small RNAs and gene symbols for the target genes, we can extract only the information relevant to us.

mir2geneMultiMiR <- mir2geneMultiMiR %>%
  dplyr::select(mature_mirna_id, target_symbol) %>%
  dplyr::rename(
    regulator = mature_mirna_id,
    target = target_symbol
  )

We can then combine the regulatory interactions identified from the target gene centric approach and the miRNA centric approach.

# Combine the two sets of interactions
miR2Genes <- as.data.frame(miR2Genes)
miR2Genes <- rbind(miR2Genes, mir2geneMultiMiR)

# Remove any potential duplications across databases
miR2Genes <- miR2Genes %>%
  dplyr::distinct() %>%
  as.list()

After obtaining a comprehensive list of miRNA-target interactions, we can then use the list to construct the TRN. To do so, we need to load the miRNA-target interactions into the MOList object.

myMOList <- loadExtInteractions(myMOList,
  miR2Genes = miR2Genes
)

TRN construction with different levels of biological scope

As we have discussed earlier, the cutoffs for the RNAseq data as well as whether to select genes that show consistent expression changes in the transcriptomic and proteomic level are important factors that define the primary biological scope of the network. With the small RNA regulators available, we can visualize the impact of these cutoffs on the TRN construction.

Let us define a less stringent cutoff first:

omiCutoffs <- setOmicCutoffs(
  rnaAdjPval = 0.05,
  rnaLogFC = 0,
  rnaTopGenes = 0.5,
  smallRNAAdjPval = 0.05,
  smallRNALogFC = 0,
  smallRNATopGenes = 0.5
)

Let’s construct the TRN without considering the protein expression.

Without protein, the network we obtained is fairly large and complicated to interpret.

plotNetwork(myTRN1, interactive = TRUE)

Figure 2. TRN constructed with without considering protein expression

Alternatively, we can enforce the network to only include genes that show consistent expression changes in both the transcriptomic and proteomic level.

omiCutoffs <- setOmicCutoffs(
  rnaAdjPval = 0.05,
  rnaLogFC = 0.1,
  rnaTopGenes = 0.5,
  smallRNAAdjPval = 0.05,
  smallRNALogFC = 0,
  smallRNATopGenes = 0.5,
  proteomicsAdjPval = 0.05,
  proteomicsLogFC = 0
)

The network we obtained is much more stringent in explaining what are the most important regulations that directly explains the phenotypic changes between the biological conditions of interests.

Let’s see the network.

plotNetwork(myTRN2, interactive = TRUE)

Figure 3. TRN constructed with coherent protein expression

Besides filtering the target genes, we can also limit our scope to the most important miRNA - target gene regulation. Since the biological function of miRNA are known, we could infer further biological insights from a miRNA-target gene centric network.

Let’s see the network.

plotNetwork(myTRN3, interactive = TRUE)

Figure 4. miRNA-target gene TRN constructed with coherent protein expression

Interestingly, we can easily identify modules at this level of biological scope.

In the end, we can export the network into a third-party tool for further topological analysis or customized network visualization.

writeTRN(myTRN3, "myTRN3.graphml", format = "graphml") # Cytoscape format

Chromatin Accessibility and Transcription Factor Binding

The chromatin accessibility data is a direct representation of the transcription factor (TF) binding. Therefore, the chromatin accessibility data is used to further construct the TRN by selecting the TFs that are found to be differentially enriched across the conditions of interest.

ATACseq data is used in conjunction with the TF-target interactions to construct the TRN. Similar to the curated miRNA-target interactions, curated the TF-target interactions also face limitations in terms of relevance to the biological process of interest. Therefore, the package utilizes the differential accessibility analysis with motif enrichment analysis to select the most relevant TF - target interactions.

To do so, we need to load the TF-target interactions into the MOList object.

myMOList <- loadExtInteractions(myMOList,
  tf2Genes = tf2Genes
)

The logic behind integrating ATACseq with TF-target interactions is similar to that of integrating small RNAseq with miRNA-target interactions. However, instead of using tree-based classifier to predict the regulatory interactions between RNAseq and small RNAseq, ATACseq data alone is sufficient to identify enriched transcription factor motifs. Therefore, with the background of the curated TF-target interactions, we can isolate the differentially enriched motifs from the biological conditions and extract regulatory interactions that are relevant to the biological process of interest.

Summary of the TRN construction workflow

In general, the construction of the TRN is a process of intersecting the regulatory interactions that are determined to be biologically relevant to the condition of interest. The workflow of the IntegraTRN package is designed to be flexible to accommodate different availability of the omics data. However, with the ease of obtaining externally curated interactions, it is highly recommended that users utilize public databases to obtain a comprehensive list of interactions to establish a global context. Then, the package can be used to select the most relevant interactions to the biological process of interest.

References

Aharon-Yariv, Adar, Yaxu Wang, Abdalla Ahmed, and Paul Delgado-Olguı́n. 2023. “Integrated Small RNA, mRNA and Protein Omics Reveal a miRNA Network Orchestrating Metabolic Maturation of the Developing Human Heart.” BMC Genomics 24 (1): 1–18.

Chang, Le, Guangyan Zhou, Othman Soufan, and Jianguo Xia. 2020. “miRNet 2.0: Network-Based Visual Analytics for miRNA Functional Analysis and Systems Biology.” Nucleic Acids Research 48 (W1): W244–51.

Chen, Yuhao, and Xiaowei Wang. 2020. “miRDB: An Online Database for Prediction of Functional microRNA Targets.” Nucleic Acids Research 48 (D1): D127–31.

Huang, Hsi-Yuan, Yang-Chi-Dung Lin, Jing Li, Kai-Yao Huang, Sirjana Shrestha, Hsiao-Chin Hong, Yun Tang, et al. 2020. “miRTarBase 2020: Updates to the Experimentally Validated microRNA–Target Interaction Database.” Nucleic Acids Research 48 (D1): D148–54.

Reimand, Jüri, Ruth Isserlin, Veronique Voisin, Mike Kucera, Christian Tannus-Lopes, Asha Rostamianfar, Lina Wadi, et al. 2019. “Pathway Enrichment Analysis and Visualization of Omics Data Using g: Profiler, GSEA, Cytoscape and EnrichmentMap.” Nature Protocols 14 (2): 482–517.

Ru, Yuanbin, Katerina J Kechris, Boris Tabakoff, Paula Hoffman, Richard A Radcliffe, Russell Bowler, Spencer Mahaffey, et al. 2014. “The multiMiR r Package and Database: Integration of microRNA–Target Interactions Along with Their Disease and Drug Associations.” Nucleic Acids Research 42 (17): e133–33.

Sethupathy, Praveen, Benoit Corda, and Artemis G Hatzigeorgiou. 2006. “TarBase: A Comprehensive Database of Experimentally Supported Animal microRNA Targets.” Rna 12 (2): 192–97.

Villanueva, Randle Aaron M, and Zhuo Job Chen. 2019. “Ggplot2: Elegant Graphics for Data Analysis.” Taylor & Francis.

Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Xiao, Feifei, Zhixiang Zuo, Guoshuai Cai, Shuli Kang, Xiaolian Gao, and Tongbin Li. 2009. “miRecords: An Integrated Resource for microRNA–Target Interactions.” Nucleic Acids Research 37 (suppl1): D105–10.

Session information

sessionInfo("IntegraTRN")
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22621)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=C                    LC_CTYPE=English_Canada.utf8   
#> [3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
#> [5] LC_TIME=English_Canada.utf8    
#> 
#> time zone: America/Toronto
#> tzcode source: internal
#> 
#> attached base packages:
#> character(0)
#> 
#> other attached packages:
#> [1] IntegraTRN_0.1.0
#> 
#> loaded via a namespace (and not attached):
#>   [1] fs_1.6.3                               
#>   [2] matrixStats_1.0.0                      
#>   [3] bitops_1.0-7                           
#>   [4] enrichplot_1.22.0                      
#>   [5] DirichletMultinomial_1.44.0            
#>   [6] devtools_2.4.5                         
#>   [7] TFBSTools_1.40.0                       
#>   [8] HDO.db_0.99.1                          
#>   [9] httr_1.4.7                             
#>  [10] RColorBrewer_1.1-3                     
#>  [11] doParallel_1.0.17                      
#>  [12] profvis_0.3.8                          
#>  [13] tools_4.3.1                            
#>  [14] backports_1.4.1                        
#>  [15] utf8_1.2.4                             
#>  [16] R6_2.5.1                               
#>  [17] DT_0.30                                
#>  [18] sm_2.2-5.7.1                           
#>  [19] lazyeval_0.2.2                         
#>  [20] GetoptLong_1.0.5                       
#>  [21] urlchecker_1.0.1                       
#>  [22] withr_2.5.2                            
#>  [23] prettyunits_1.2.0                      
#>  [24] gridExtra_2.3                          
#>  [25] cli_3.6.1                              
#>  [26] Biobase_2.62.0                         
#>  [27] formatR_1.14                           
#>  [28] datasets_4.3.1                         
#>  [29] scatterpie_0.2.1                       
#>  [30] sass_0.4.7                             
#>  [31] base_4.3.1                             
#>  [32] readr_2.1.4                            
#>  [33] Rsamtools_2.18.0                       
#>  [34] yulab.utils_0.1.0                      
#>  [35] R.utils_2.12.2                         
#>  [36] DOSE_3.28.0                            
#>  [37] vioplot_0.4.0                          
#>  [38] sessioninfo_1.2.2                      
#>  [39] plotrix_3.8-3                          
#>  [40] BSgenome_1.70.1                        
#>  [41] grDevices_4.3.1                        
#>  [42] limma_3.58.1                           
#>  [43] rstudioapi_0.15.0                      
#>  [44] RSQLite_2.3.3                          
#>  [45] generics_0.1.3                         
#>  [46] gridGraphics_0.5-1                     
#>  [47] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
#>  [48] shape_1.4.6                            
#>  [49] BiocIO_1.12.0                          
#>  [50] gtools_3.9.4                           
#>  [51] dplyr_1.1.3                            
#>  [52] GO.db_3.18.0                           
#>  [53] Matrix_1.6-1.1                         
#>  [54] fansi_1.0.5                            
#>  [55] S4Vectors_0.40.1                       
#>  [56] abind_1.4-5                            
#>  [57] R.methodsS3_1.8.2                      
#>  [58] lifecycle_1.0.4                        
#>  [59] yaml_2.3.7                             
#>  [60] edgeR_4.0.1                            
#>  [61] SummarizedExperiment_1.32.0            
#>  [62] gplots_3.1.3                           
#>  [63] qvalue_2.34.0                          
#>  [64] SparseArray_1.2.0                      
#>  [65] BiocFileCache_2.10.1                   
#>  [66] grid_4.3.1                             
#>  [67] blob_1.2.4                             
#>  [68] promises_1.2.1                         
#>  [69] crayon_1.5.2                           
#>  [70] miniUI_0.1.1.1                         
#>  [71] lattice_0.21-8                         
#>  [72] methods_4.3.1                          
#>  [73] cowplot_1.1.1                          
#>  [74] annotate_1.80.0                        
#>  [75] GenomicFeatures_1.54.1                 
#>  [76] KEGGREST_1.42.0                        
#>  [77] pillar_1.9.0                           
#>  [78] knitr_1.45                             
#>  [79] ComplexHeatmap_2.18.0                  
#>  [80] fgsea_1.28.0                           
#>  [81] GenomicRanges_1.54.1                   
#>  [82] rjson_0.2.21                           
#>  [83] boot_1.3-28.1                          
#>  [84] codetools_0.2-19                       
#>  [85] fastmatch_1.1-4                        
#>  [86] glue_1.6.2                             
#>  [87] ggfun_0.1.3                            
#>  [88] data.table_1.14.8                      
#>  [89] remotes_2.4.2.1                        
#>  [90] Rdpack_2.6.9000                        
#>  [91] vctrs_0.6.4                            
#>  [92] png_0.1-8                              
#>  [93] treeio_1.26.0                          
#>  [94] poweRlaw_0.70.6                        
#>  [95] gtable_0.3.4                           
#>  [96] cachem_1.0.8                           
#>  [97] xfun_0.40                              
#>  [98] rbibutils_2.2.16                       
#>  [99] S4Arrays_1.2.0                         
#> [100] mime_0.12                              
#> [101] tidygraph_1.2.3                        
#> [102] pracma_2.4.2                           
#> [103] survival_3.5-5                         
#> [104] iterators_1.0.14                       
#> [105] statmod_1.5.0                          
#> [106] MatchIt_4.5.5                          
#> [107] stabs_0.6-4                            
#> [108] interactiveDisplayBase_1.40.0          
#> [109] ellipsis_0.3.2                         
#> [110] rlemon_0.2.1                           
#> [111] nlme_3.1-162                           
#> [112] utils_4.3.1                            
#> [113] ggtree_3.10.0                          
#> [114] usethis_2.2.2                          
#> [115] bit64_4.0.5                            
#> [116] progress_1.2.2                         
#> [117] filelock_1.0.2                         
#> [118] GENIE3_1.24.0                          
#> [119] GenomeInfoDb_1.38.0                    
#> [120] rprojroot_2.0.4                        
#> [121] bslib_0.5.1                            
#> [122] KernSmooth_2.23-21                     
#> [123] colorspace_2.1-0                       
#> [124] seqLogo_1.68.0                         
#> [125] BiocGenerics_0.48.1                    
#> [126] DBI_1.1.3                              
#> [127] DESeq2_1.42.0                          
#> [128] tidyselect_1.2.0                       
#> [129] processx_3.8.2                         
#> [130] bit_4.0.5                              
#> [131] compiler_4.3.1                         
#> [132] curl_5.1.0                             
#> [133] optmatch_0.10.6                        
#> [134] glmnet_4.1-8                           
#> [135] xml2_1.3.5                             
#> [136] desc_1.4.2                             
#> [137] DelayedArray_0.28.0                    
#> [138] shadowtext_0.1.2                       
#> [139] rtracklayer_1.62.0                     
#> [140] scales_1.2.1                           
#> [141] caTools_1.18.2                         
#> [142] ChIPseeker_1.38.0                      
#> [143] monaLisa_1.8.0                         
#> [144] callr_3.7.3                            
#> [145] rappdirs_0.3.3                         
#> [146] stringr_1.5.0                          
#> [147] digest_0.6.33                          
#> [148] shinyBS_0.61.1                         
#> [149] rmarkdown_2.25                         
#> [150] XVector_0.42.0                         
#> [151] htmltools_0.5.6.1                      
#> [152] pkgconfig_2.0.3                        
#> [153] MatrixGenerics_1.14.0                  
#> [154] highr_0.10                             
#> [155] dbplyr_2.4.0                           
#> [156] fastmap_1.1.1                          
#> [157] rlang_1.1.1                            
#> [158] GlobalOptions_0.1.2                    
#> [159] htmlwidgets_1.6.2                      
#> [160] shiny_1.7.5.1                          
#> [161] jquerylib_0.1.4                        
#> [162] farver_2.1.1                           
#> [163] zoo_1.8-12                             
#> [164] jsonlite_1.8.7                         
#> [165] BiocParallel_1.36.0                    
#> [166] R.oo_1.25.0                            
#> [167] GOSemSim_2.28.0                        
#> [168] RCurl_1.98-1.13                        
#> [169] magrittr_2.0.3                         
#> [170] GenomeInfoDbData_1.2.11                
#> [171] ggplotify_0.1.2                        
#> [172] patchwork_1.1.3                        
#> [173] munsell_0.5.0                          
#> [174] Rcpp_1.0.11                            
#> [175] ape_5.7-1                              
#> [176] viridis_0.6.4                          
#> [177] chk_0.9.1                              
#> [178] stringi_1.7.12                         
#> [179] ggraph_2.1.0                           
#> [180] zlibbioc_1.48.0                        
#> [181] MASS_7.3-60                            
#> [182] AnnotationHub_3.10.0                   
#> [183] plyr_1.8.9                             
#> [184] pkgbuild_1.4.2                         
#> [185] parallel_4.3.1                         
#> [186] HPO.db_0.99.2                          
#> [187] ggrepel_0.9.4                          
#> [188] CNEr_1.38.0                            
#> [189] Biostrings_2.70.1                      
#> [190] graphlayouts_1.0.2                     
#> [191] splines_4.3.1                          
#> [192] hms_1.1.3                              
#> [193] circlize_0.4.15                        
#> [194] locfit_1.5-9.8                         
#> [195] ps_1.7.5                               
#> [196] igraph_1.5.1                           
#> [197] reshape2_1.4.4                         
#> [198] biomaRt_2.58.0                         
#> [199] stats4_4.3.1                           
#> [200] pkgload_1.3.3                          
#> [201] TFMPvalue_0.0.9                        
#> [202] BiocVersion_3.18.0                     
#> [203] XML_3.99-0.15                          
#> [204] evaluate_0.23                          
#> [205] multiMiR_1.24.0                        
#> [206] BiocManager_1.30.22                    
#> [207] tzdb_0.4.0                             
#> [208] foreach_1.5.2                          
#> [209] tweenr_2.0.2                           
#> [210] httpuv_1.6.11                          
#> [211] networkD3_0.4                          
#> [212] graphics_4.3.1                         
#> [213] tidyr_1.3.0                            
#> [214] purrr_1.0.2                            
#> [215] polyclip_1.10-6                        
#> [216] clue_0.3-65                            
#> [217] ggplot2_3.4.4                          
#> [218] ggforce_0.4.1                          
#> [219] xtable_1.8-4                           
#> [220] restfulr_0.0.15                        
#> [221] tidytree_0.4.5                         
#> [222] MPO.db_0.99.7                          
#> [223] roxygen2_7.2.3                         
#> [224] later_1.3.1                            
#> [225] viridisLite_0.4.2                      
#> [226] tibble_3.2.1                           
#> [227] aplot_0.2.2                            
#> [228] memoise_2.0.1                          
#> [229] AnnotationDbi_1.64.1                   
#> [230] GenomicAlignments_1.38.0               
#> [231] IRanges_2.36.0                         
#> [232] stats_4.3.1                            
#> [233] cluster_2.1.4