Understanding the molecular mechanisms underlying complex diseases or
during the course of development has been a major goal in biomedical
research. The regulation of the transcriptome, both at the
transcriptional and post- transcriptional levels, is a key component of
these mechanisms. However, the complexity of the regulatory networks, in
particular as it involves multiple epigenetic players, renders the
dissection of these mechanisms challenging. In particular, the most
common may to construct a transcriptional regulatory network (TRN)
involves using known interactions at a certain condition. However, this
method is insufficient to explain the dynamics of the TRN in terms of
the changes observed between biological conditions. In this vignette, we
demonstrate how to use the IntegraTRN
package to construct a TRN using multi-omics data.
The package IntegraTRN
is designed to
integrate differential analysis based on multiple omics data and
construct a TRN based on both curated and predicted interactions that
underlie the differential expression of genes. In the mean time, the
curation of multi-omics data could be difficult, so the package provides
a flexibility to allow users to decide what data can be integrated based
on their own knowledge. The analysis pipeline can be primarily divided
into two parts: 1) differential analysis of omics data; and 2)
construction of the TRN. In this vignette, we will demonstrate how to
use the package to construct a TRN using
transcriptomic, small RNAomic, and
chromatin accessibility data.
See help(package = "IntegraTRN")
for more information
about the package and citation("IntegraTRN")
for how to
cite the package. To download the package, please use the following
command:
require("devtools")
devtools::install_github("j-y26/IntegraTRN", build_vignettes = TRUE)
library("IntegraTRN")
To list all available functions in the package, use the following command:
The package has provided datasets for demonstration purposes. The datasets are fully simulated or partially simulated based on real data (Aharon-Yariv et al. 2023). Here is an overview of the basic omics data used in this vignette:
data("RNAseq_heart") # RNAseq count matrix
data("RNAseq_heart_samples") # RNAseq sample information
data("smallRNAseq_heart") # small RNAseq count matrix
data("smallRNAseq_heart_samples") # small RNAseq sample information
These are RNAseq and small RNAseq data from human fetal heart tissues, as well as the information about the samples.
Please use ?<dataset>
to see the details of each
dataset.
We will now illustrate how to use the package to construct a TRN using these datasets.
The first step of the analysis pipeline is to perform differential analysis separately on each omic data. During the differential analysis, we can retrieve insights into the “changes” that happen during the biological process separately for each omic data. This exploratory analysis is essential to understand what alterations in the transcriptome can explain difference in the phenotype of interest, and what variations in any of the regulatory players could potentially contribute to these changes.
MOList
object using multi
omics dataTo get started, we first need to generate a
MOList
object. The
MOList
object stands for
multi-omics list
, which is the core of
entire analysis pipeline as it preserves our input data and the results
of the differential analysis in a centralized object. The
MOList
object can be generated using the
MOList
function.
The MOList
constructor has a large
flexibility in terms of what input omics data are available. At the
minimum, when constructing a MOList
object
for the first time, we need to provide the RNAseq data and the sample
grouping information. For any count-based omics data (RNAseq, small
RNAseq, and proteomics), the constructor takes a numeric matrix of
counts, where each row represents a gene or a protein, and each column
represents a sample. The grouping information can be any vector of the
same length as the number of samples, in the same order to the columns
of the count matrix. A continuous or a binary variable can be used.
However, we need to make sure we are supplying the same grouping
criteria for all omics data. Here we create a
myMOList
using only RNAseq and small
RNAseq data, using the age
of the samples
as the grouping criteria for downstream differential analysis.
myMOList <- MOList(
RNAseq = RNAseq_heart,
RNAGroupBy = RNAseq_heart_samples$Age,
smallRNAseq = smallRNAseq_heart,
smallRNAGroupBy = smallRNAseq_heart_samples$Age
)
#> RNAseq: RNAseq data has been added.
#> Small RNAseq: Small RNAseq data has been added.
#> Proteomics: Optional proteomics data has NOT been provided.
#> ATACseq: Optional ATAC peaks data has NOT been provided.
As long as a MOList
object is
constructed with RNAseq data, additional omics data can be added to the
existing object using the same syntax. Any existing data can also be
replaced by new data using the same syntax.
Specifically for ATACseq data, we need to provide the path to the
peak files instead of the count matrix. Since most peak callers output
peak files in BED
format, we can simply provide a path to a BED file, for each
condition. Our package has already provided two simulated peak files in
the extdata
folder.
atacPeak1Path <- system.file("extdata", "peak1.bed", package = "IntegraTRN")
atacPeak2Path <- system.file("extdata", "peak2.bed", package = "IntegraTRN")
A consensus peak file should be provided for each condition. When
biological replicates are used for the ATACseq experiment, the peak
files should be merged into one and filtered for blacklisted regions
before being provided to the constructor (Amemiya, Kundaje, and Boyle 2019). Here, we
simply use the previously created myMOList
object to append the ATACseq data.
Once the MOList
object is created, we
can perform differential analysis on each omics data separately.
However, since differential analysis on count-based data is different
that that of the epigenomic data, the process is slightly different.
For count-based omics data, we can use the
diffOmics
function to perform differential
analysis. To simplify the process, the
diffOmics
function essentially performs
differential expression analysis on all available count based omics data
in the MOList
object. Since we can specify
to use either DESeq2
or
edgeR
for the differential analysis, our
underlying assumption is that the expression data follows a negative
binomial distribution (Love, Huber, and Anders
2014) (Robinson, McCarthy, and Smyth
2010). Furthermore, the differential expression analysis utilizes
a generalized linear model (GLM) to allow adjustment on covariates. In
particular, the function allows adjustment on the batch effect, which is
a common issue in next-generation sequencing (NGS) data.
myMOList <- diffOmics(myMOList,
rnaseqBatch = RNAseq_heart_samples$Batch,
program = "DESeq2"
)
#> Performing differential analysis on RNAseq data
#>
#> Performing differential analysis on smallRNAseq data
#>
#> Processing peaks...
The batch
argument follow the same
criteria as the grouping information provided during the construction of
the MOList
object. Therefore, this
argument is not limited to the batch effect, but can be any covariate
that we want to adjust for.
# NOT run, just illustrate the usage of the batch argument
myMOList <- diffOmics(myMOList,
smallRnaBatch = smallRNAseq_heart_samples$Sex,
program = "edgeR"
)
To explore the results of the differential analysis, several
visualization methods are provided. First, to see the distribution of
differentially expressed mRNAs as well as exploring a optimal cutoff for
defining genes as differentially expressed, we can visualize the RNAseq
data using a volcano plot. We can specify the cutoffs for the adjusted
p-value and the log2 fold change (log2FC) to define differentially
expressed genes. Here, we define log2FC > 0.1
and
adjusted p-value < 0.05
, which is the default, as the
cutoffs.
Figure 1. Volcano plot of RNAseq data
The plot also directly indicates the number of up- and down-regulated genes.
How do we efficiently explore the raw differential expression
results? The IntegraTRN
package has
conveniently defined a set of data structures, primarily S4 classes and
methods, that allows users to easily explore the results. The previously
mentioned MOList
object essentially works
as a list, which contains differential results stored in a
DETag
object for each omics data. This
means that we can easily retrieve the differential expression results
via simple subsetting.
A simple snap shot of the DETag
object
can be directly invoked.
myRNADETag
#> A DETag S4 object
#> Method for differential analysis: DESeq2
#>
#> Number of genes: 5000
#> Number of genes with adjusted p-value < 0.05: 3223
#>
#> A snap shot for differential analysis results:
#> baseMean log2FoldChange lfcSE stat pvalue
#> B4GALT3 958.13887 -0.0078640979 0.009959527 -0.78960552 4.297582e-01
#> SLC26A11 458.38254 -0.0019483551 0.009851067 -0.19778113 8.432163e-01
#> BEAN1 114.23147 -0.0006515378 0.020920562 -0.03114342 9.751552e-01
#> STAT3 2121.37501 -0.0343187743 0.008212107 -4.17904601 2.927345e-05
#> STEAP3 56.69544 0.0311278206 0.016709689 1.86286052 6.248189e-02
#> PIP5K1C 1552.90544 -0.0978374269 0.012828549 -7.62653887 2.411402e-14
#> padj
#> B4GALT3 4.955699e-01
#> SLC26A11 8.747057e-01
#> BEAN1 9.810414e-01
#> STAT3 8.402252e-05
#> STEAP3 9.057972e-02
#> PIP5K1C 3.367881e-13
#>
#> To access the full results, use the exportDE S4 method
To further obtain the full results of the differential expression
analysis, we can use the exportDE
function. This function returns a data frame containing the outputs of
DESeq2
or
edgeR
, with choice of using a consistent
interface.
exportDE(myRNADETag, original = TRUE) %>%
head(5)
#> baseMean log2FoldChange lfcSE stat pvalue
#> B4GALT3 958.13887 -0.0078640979 0.009959527 -0.78960552 4.297582e-01
#> SLC26A11 458.38254 -0.0019483551 0.009851067 -0.19778113 8.432163e-01
#> BEAN1 114.23147 -0.0006515378 0.020920562 -0.03114342 9.751552e-01
#> STAT3 2121.37501 -0.0343187743 0.008212107 -4.17904601 2.927345e-05
#> STEAP3 56.69544 0.0311278206 0.016709689 1.86286052 6.248189e-02
#> padj
#> B4GALT3 4.955699e-01
#> SLC26A11 8.747057e-01
#> BEAN1 9.810414e-01
#> STAT3 8.402252e-05
#> STEAP3 9.057972e-02
By having original = TRUE
, the data
frame remains exactly the same as the
DESeq2
output.
Some times, we may also want to rank the genes to perform gene set
enrichment analysis (GSEA). (Reimand et al.
2019) To this end, we can make use of the subclass of
DETag
, which is the
TOPTag
object. It automatically performs
piValue transformation (Reimand et al.
2019) and ranking based on a user-specified criteria. For
example, let’s retrieve genes that have at least 0.1 log2 fold change
and select the top 200 ranked genes.
myRNATOPTag <- TOPTag(myRNADETag,
pCutoff = 0.05,
logFCCutoff = 0.1,
topGenes = 200,
direction = "both"
)
To see a snap shot:
myRNATOPTag
#> A TOPTag object.
#> The top 200 differential genes are selected.
#>
#> A snapshot of the top up-regulated genes:
#> logFC pvalue padj piValue rank
#> SLC35F1 0.3168227 6.830735e-61 1.707684e-57 56.76759 1
#> JAM2 0.1955125 2.053533e-51 2.566917e-48 47.59059 2
#> FHL2 0.2654241 6.220696e-49 6.220696e-46 45.20616 3
#> FBXW7 0.2408516 6.853848e-44 4.283655e-41 40.36819 4
#> CNOT11 0.1348388 4.780297e-43 2.655720e-40 39.57582 5
#>
#> A snapshot of the top down-regulated genes:
#> logFC pvalue padj piValue rank
#> RAI1 -0.1236128 4.680130e-10 3.175122e-09 -8.498240 -1
#> ATP8B2 -0.1022981 4.871540e-10 3.291581e-09 -8.482595 -2
#> SORCS2 -0.1413809 5.456246e-10 3.652106e-09 -8.437457 -3
#> PRRC2B -0.1897221 5.944744e-10 3.931709e-09 -8.405419 -4
#> SLC47A1 -0.1106573 6.664062e-10 4.384252e-09 -8.358105 -5
#>
#> Method for differential expression analysis: DESeq2
#> Log fold change cutoff: 0.1
#> Adjusted p-value cutoff: 0.05
Due to the inheritance and object-oriented nature of the S4 class
system, we can handle the TOPTag
object in
the same way as the DETag
object.
# Export the desired results with rank information
myTopRNAResults <- exportDE(myRNATOPTag)
# Make numeric values easier to read
myTopRNAResults[, 1:4] <- format(myTopRNAResults[, 1:4], digits = 4, scientific = TRUE)
head(myTopRNAResults)
#> logFC pvalue padj piValue rank
#> SLC35F1 3.168e-01 6.831e-61 1.708e-57 5.677e+01 1
#> JAM2 1.955e-01 2.054e-51 2.567e-48 4.759e+01 2
#> FHL2 2.654e-01 6.221e-49 6.221e-46 4.521e+01 3
#> FBXW7 2.409e-01 6.854e-44 4.284e-41 4.037e+01 4
#> CNOT11 1.348e-01 4.780e-43 2.656e-40 3.958e+01 5
#> LDHD 2.558e-01 2.245e-41 9.356e-39 3.803e+01 6
Here, a rank of 1 and -1 indicates the most up- and down-regulated gene.
For publication purposes, we can also label some genes that we are interested in on the volcano plot. Let’s first retrieve a few differentially expressed genes
genesToLabel <- exportDE(myRNADETag) %>%
dplyr::filter(padj < 0.05, abs(logFC) > 0.1) %>%
dplyr::arrange(desc(logFC)) %>%
rownames()
# Get an up- and a down- regulated gene
# For example use the 10th gene with highest fold change
genesToLabel <- genesToLabel[c(10, length(genesToLabel) - 10)]
Then we again plot the volcano plot, with some genes labeled. We can
make use of the highlight
argument to
specify the genes to label.
Figure 2. Volcano plot of RNAseq data with some genes labelled
Before moving on to small RNAseq data, we first need to annotate the small RNAs for their types. To do so, we need to provide a data frame of small RNA annotation, as in the name of the transcripts for each type of small RNA. The following is an example:
smallRNAAnnotation <- data.frame(
transcript = c("hsa-miR-1-3p", "hsa-miR-1-5p", "hsa-miR-2-5p"),
type = c("miRNA", "miRNA", "miRNA")
)
To annotate the small RNAseq data, we can use the
annotateSmallRNA
function.
# NOT run, just illustrate the usage of the smallRNAAnnotation argument
myMOList <- annotateSmallRNA(myMOList, anno = smallRNAAnnotation)
We need to make sure that the names of the transcripts are consistent with the names in the count matrix, since different annotation databases may provide different names for the same transcript.
Alternatively, the package internally provides an annotation database for human small RNAs. This database is curated from several small RNA annotation databases, including miRBase, circBase, piRBase, piRNABank, GtRNAdb, GENCODE, and piRNACluster. Since we are dealing with human data, we can make use of this internal database.
The annotateSmallRNA
function will
automatically checks for the coverage of each small RNA type and will
alert the user if the annotation does not cover all existing
transcripts.
Similar to the RNAseq data, we can visualize the distribution of differentially expressed small RNAs using a volcano plot.
Figure 3. Volcano plot of small RNAseq data
Here, we use the default P-value cutoff, which is 0.05
.
In the mean time, the separately colors the up and down regulated small
RNAs, with the number of each annotated respectively.
On the other hand, small RNAs are usually a lot more complex and diverse than mRNAs. Each type of small RNA has its own function and mechanism of action. Therefore, we can also specifically color each type of small RNAs on the volcano plot, using a more specialized function.
By default, the plotVolcanoSmallRNA
function generates a volcano plot with each type of small RNA colored.
By default, it utilizes the BuPu
palette
defined in the RColorBrewer
package (Neuwirth 2022).
Figure 4. Volcano plot of small RNAseq data with each type of small RNA coloured
Alternatively, we can also specify a color scheme using the
colScheme
argument.
# NOT run, just illustrate the usage of the colScheme argument
plotVolcanoSmallRNA(myMOList,
adjP = 0.05,
log2FC = 0,
colScheme = RColorBrewer::brewer.pal(6, "Set1"))
Furthermore, although the function of microRNAs (miRNAs) is well studied, the understanding of other types of small RNAs such as piwi-interacting RNAs (piRNAs) and small nucleolar RNAs (snoRNAs) is still limited. Therefore, we can assess the contribution of each type of small RNA to the differences observed between samples. To do so, we can perform a principal component analysis (PCA) on each type of small RNA separately.
The plotSmallRNAPCAs
function returns a
list of PCA plots, one for each type of small RNA. We can check the
length of the list to see how many PCA plots are generated.
In this case, we have 6 PCA plots, one for each small RNA type. We can plot the PCA plot for miRNA using the following command:
Figure 5. PCA plot of miRNA
Or, we can generate a combined PCA plot for all small RNA types.
pcaPlotList$miRNA + pcaPlotList$circRNA + pcaPlotList$piRNA + pcaPlotList$snoRNA +
pcaPlotList$snRNA + pcaPlotList$tRNA
Figure 6. Combined PCA plot of all small RNA types
Analysis of epigenomic data requires several more steps than that of the count-based omics data. In the case of ATACseq data, genomic regions are profiled for their accessibility to the hyperactive Tn5 transposase. This accessibility corresponds to the chromatin state of the region, which can be either open or closed for DNA binding proteins, particularly transcription factors (TFs). Therefore, to analyze differential chromatin accessibility, we need to annotate the peaks with the genes that are nearby. Furthermore, since we are interested in transcriptional regulation, we can parse through the open chromatin regions and identify the TF binding motifs that are enriched in these regions. Differentially enriched motifs can then be used to infer the TFs that are potentially involved in the regulation of the differentially expressed genes.
To do so, we first need additional annotation information. Here, we
make use of the annotation database from the
TxDb.Hsapiens.UCSC.hg38.knownGene
package,
which is a TxDb
object that contains the
genomic coordinates of all known genes in the hg38
genome
assembly (Team and Maintainer 2023). We
also need to provide a more general annotation database, which is the
org.Hs.eg.db
package to extract the gene
information (Carlson 2023). Finally,
scanning the open chromatin regions for TF binding motifs requires
identifying the sequence of the open chromatin regions. Here, we use the
corresponding genome FASTA database from the
BSgenome.Hsapiens.UCSC.hg38
package (Pagès 2023).
# Annotate ATAC Peaks
library("TxDb.Hsapiens.UCSC.hg38.knownGene")
library("org.Hs.eg.db")
library("BSgenome.Hsapiens.UCSC.hg38")
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
annoDb <- "org.Hs.eg.db"
bsgenome <- BSgenome.Hsapiens.UCSC.hg38
Motif enrichment analysis would also require a known set of position
weight matrices (PWMs) for all known curated TFs. The PWM is a matrix
that contains the probability of each nucleotide (A, T, C, G) at every
position of the binding motif. The
IntegraTRN
package has already provided a
list of PWMs from the JASPAR2022
database
for vertebrates. We can load the PWMs using the following command:
This is a PWMatrixList
object defined
in the TFBSTools
package (Tan and Lenhard 2016). This package also
provides additional functionality to retrieve PWMs from other databases
or tools, such as HOMER etc. Here, we will use our package-provided PWMs
from JASPAR2022
(Castro-Mondragon et al. 2022).
The annotateATACPeaksMotif
function
performs both peak annotation and motif enrichment analysis via a single
call. Here, we define the promoter region as the region
3kb upstream and 3kb downstream of the
transcription start site.
myMOList <- annotateATACPeaksMotif(myMOList,
tssRegion = c(-3000, 3000),
TxDb = txdb,
annoDb = annoDb,
bsgenome = bsgenome,
pwmL = jasparVertebratePWM
)
After performing peak annotation and motif enrichment analysis, we can explore the analysis via several methods.
First, let us take a look at the annotation on the peaks for their genomic features. This pie chart shows the percentage of peaks that are annotated as each genomic feature.
Figure 7. Annotation of ATAC peaks
To further access the quality of the ATACseq experiment, we can plot the coverage of the ATACseq peaks on the genome. This plot shows the where the peaks are located in each chromosome.
Figure 8. ATACseq coverage
On the left of the plots shows all differentially enriched motifs.
Beyond globally exploring the ATACseq data with the above plots, we
can also inspect the annotation on each individual peak. Once again, the
IntegraTRN
package utilizes the S4
inheritance system to create a PEAKTag
object once motif enrichment analysis is performed. This object
implements the generic as.data.frame
method, conveniently allowing us to export the results as a data frame
to see where each individual peaks are located.
myMotifs <- as.data.frame(myMOList$DEATAC)
head(myMotifs[, 1:8]) # Many annotations, see the first 8 attributes
#> seqnames start end width strand Condition annotation geneChr
#> 1 chr1 1597754 1597883 130 * - Promoter (2-3kb) 1
#> 2 chr1 7783291 7783789 499 * - Promoter (<=1kb) 1
#> 3 chr1 7783791 7784289 499 * - Promoter (<=1kb) 1
#> 4 chr1 7784291 7784789 499 * - Promoter (<=1kb) 1
#> 5 chr1 9996206 9996704 499 * - Promoter (<=1kb) 1
#> 6 chr1 9996706 9997204 499 * - Promoter (<=1kb) 1
Then we can explore the key part of the analysis, which is the
differentially enriched motifs. Typically, the differential motif
enrichment can be very stringent especially when the ATACseq libraries
are not sequenced very deeply (less than 100 million reads) (Yan et al. 2020). Therefore, we can use a more
relaxed cutoff for the p-value. The
plotATACMotifHeatmap
function by default
utilizes adjusted p-value, but when a unadjusted p-value is provided, it
will use the unadjusted one and ignore the default. Since our ATACseq
data is simulated, here we use a relaxed cutoff.
Figure 9. Motif enrichment on ATACseq peaks
Exploring the differential analysis results can provide insights into the changes that happen during the biological process. In addition, by exploring these results, have a better understanding of the data and the quality of the experiment. These understanding should allow us to make better decisions on deciding the cutoffs used for selecting genes and transcripts for constructing the TRN.
Now we can move on to the second part of the analysis pipeline, which is to construct the TRN.
However, before directly construct the TRN, some data preprocessing is required. First, the package analyzes predicted interactions by using a tree-based algorithm to effectively predict regulatory interactions based on co-expression between genes and small RNAs (Huynh-Thu et al. 2010). However, this algorithm requires that the samples between the RNAseq and small RNAseq data are matched so that co-expression profiles can be extracted. Therefore, we need to match the samples between the RNAseq and small RNAseq data (Stuart et al. 2011).
We can perform an one-to-one optimal matching between the samples
using the matchSamplesRNAsmallRNA
function. When we have
further information about the samples for RNAseq and small RNAseq, we
can supply the information as data frames. The algorithm by default
extracts all common variables between the two data frames and use them
for matching. Alternatively, we can also specify the variables used for
matching using the varMatch
argument.
Here, we use all common variables, which are Age
and
Sex
for matching.
myMOList <- matchSamplesRNAsmallRNA(myMOList,
sampleDFRNAseq = RNAseq_heart_samples,
sampleDFSmallRNAseq = smallRNAseq_heart_samples
)
Let us take a look at the matched samples.
exportMatchResult(myMOList) %>%
head(5) # For the first 5 matches
#> RNAseq SmallRNAseq
#> 1 sample_48 sample_1
#> 2 sample_51 sample_2
#> 3 sample_5 sample_3
#> 4 sample_49 sample_4
#> 5 sample_10 sample_5
There are a total of 37
matches.
Depending on the biological question, we may want to use externally
curated interactions to be part of the TRN. For example, the miRNet
tools provides a comprehensive database of miRNA-target interactions,
but the information are usually not tissue-specific or all relevant to
the biological question (Chang et al.
2020). However, IntegraTRN
provides
a flexibility to allow users to import externally curated interactions,
and during the construction of the TRN, these external interactions are
intersected with the predicted interactions as well as the differential
genes/enriched motifs to provide a more focused TRN.
Let us load some simulated miRNA-target interactions and TF-target interactions. These data has been provided as part of the package. These interactions are in the adjacency list format, where it is a list of two vectors, regulator and target.
As mentioned previously, the cutoffs used for selecting genes and transcripts for constructing the TRN are important. Due to the large number of omics data, a separate function is provide for users to set the cutoffs. Here we use the cutoffs defined earlier in the differential analysis.
omiCutoffs <- setOmicCutoffs(
rnaAdjPval = 0.05,
rnaLogFC = 0.1,
rnaTopGenes = 50,
smallRNAAdjPval = 0.05,
smallRNALogFC = 0.1,
smallRNATopGenes = 50,
atacMotifPval = 0.01
)
This can also be easily checked.
omiCutoffs
#> RNAseq adjusted p-value cutoff: < 0.05
#> RNAseq log fold change cutoff: > 0.1
#> RNAseq selecting top 50 DE genes
#> smallRNAseq adjusted p-value cutoff: < 0.05
#> smallRNAseq log fold change cutoff: > 0.1
#> smallRNAseq selecting top 50 DE genes
#> Proteomics adjusted p-value cutoff: < 0.05
#> Proteomics log fold change cutoff: > 0
#> ATACseq motif p-value cutoff: < 0.01
#> ATACseq motif log fold enrichment cutoff: > 0
Finally, we can construct the TRN using the
constructTRN
function. This function
contains the logic to deal with different availability of the omics data
as well as the availability of external interactions. We can specify
whether we want to focus only on the up or down regulated genes, or
both. We also need to specify whether we want to use the predicted
interactions or not.
Here, we construct a network using both predicted and curated interactions, for all differentially expressed genes that are both up- and down-regulated. When predicted interactions are used, we can make the process faster by using multiple threads, and a random seed can be specified or reproducibility (Huynh-Thu et al. 2010).
myTRNet <- constructTRN(myMOList,
omiCutoffs = omiCutoffs,
smallRNAtypes = "all",
targetDirection = "both",
predicted = TRUE,
nthreads = 1,
seed = 123
)
#> Constructing a combined expression matrix for GENIE3...
#> Criteria for defining the top differentially expressed genes:
#> log2 fold change cutoff: 0.1
#> adjusted p-value cutoff: 0.05
#> top 50 genes selected
#> Criteria for defining the top differentially expressed small RNAs:
#> log2 fold change cutoff: 0.1
#> adjusted p-value cutoff: 0.05
#> top 50 small RNAs selected
#> Running GENIE3 for predicted interactions...
#> Number of trees: 1000
#> Number of threads: 1
#> Tree method: RF
#> This may take a while...
We can easily plot the network using the
plotNetwork
function. We can decide
whether we want an interactive network. For large networks, the
interactive network provides a better way to explore the interactions
(Allaire et al. 2017).
Figure 10. The Transcriptional Regulatory Network
When hovering over the nodes, we can see the name of the element. The each type of the element (target gene, miRNA, piRNA, TF etc.) are color coded. The layout of the network is computed by a force-directed algorithm, but we can simply drag the nodes to better visualize the interactions.
We can also plot a static network using the
plotNetwork
function. But this time, we
only plot the up-regulated genes and miRNAs with a more stringent
cutoff.
Let us construct a new TRN.
myTRNet2 <- constructTRN(myMOList,
omiCutoffs = omiCutoffs2,
smallRNAtypes = "miRNA",
targetDirection = "up",
predicted = TRUE
)
#> Constructing a combined expression matrix for GENIE3...
#> Criteria for defining the top differentially expressed genes:
#> log2 fold change cutoff: 0.2
#> adjusted p-value cutoff: 0.01
#> top 50 genes selected
#> Criteria for defining the top differentially expressed small RNAs:
#> log2 fold change cutoff: 0.1
#> adjusted p-value cutoff: 0.05
#> top 50 small RNAs selected
#> Running GENIE3 for predicted interactions...
#> Number of trees: 1000
#> Number of threads: 1
#> Tree method: RF
#> This may take a while...
Then we can plot the network as a static figure (Csardi, Nepusz, et al. 2006).
Figure 11. The Transcriptional Regulatory Network of Up-regulated Genes
Some times it is hard to use the default parameters to plot a
publication-level network. However, users can customize the network
using the igraph
package. We can extract
the network as an igraph
object and
performs further customization (Csardi, Nepusz,
et al. 2006).
exportIgraph(myTRNet2)
#> IGRAPH 8d8a9d9 DN-B 57 119 --
#> + attr: name (v/c), type (v/c)
#> + edges from 8d8a9d9 (vertex names):
#> [1] hsa-miR-1301-3p ->KLHL40 hsa-mir-183 ->KBTBD12
#> [3] hsa-miR-323a-3p ->TSPO hsa-miR-1301-3p ->LRRC14B
#> [5] hsa-miR-6747-3p ->COX7C hsa-miR-1910-5p ->NRAP
#> [7] hsa-miR-31-5p ->COX7C hsa-miR-323a-3p ->DBP
#> [9] hsa-miR-323a-3p ->C1QL1 hsa-miR-1301-3p ->CEBPD
#> [11] hsa-miR-31-5p ->CEBPD hsa-miR-323a-3p ->FAU
#> [13] hsa-miR-323a-3p ->G0S2 hsa-mir-130b ->LGALS3
#> [15] hsa-mir-183 ->FAM129A hsa-miR-6747-3p ->CEBPD
#> + ... omitted several edges
sessionInfo("IntegraTRN")
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22621)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=C LC_CTYPE=English_Canada.utf8
#> [3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C
#> [5] LC_TIME=English_Canada.utf8
#>
#> time zone: America/Toronto
#> tzcode source: internal
#>
#> attached base packages:
#> character(0)
#>
#> other attached packages:
#> [1] IntegraTRN_0.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] fs_1.6.3
#> [2] matrixStats_1.0.0
#> [3] bitops_1.0-7
#> [4] enrichplot_1.22.0
#> [5] DirichletMultinomial_1.44.0
#> [6] devtools_2.4.5
#> [7] TFBSTools_1.40.0
#> [8] HDO.db_0.99.1
#> [9] httr_1.4.7
#> [10] RColorBrewer_1.1-3
#> [11] doParallel_1.0.17
#> [12] profvis_0.3.8
#> [13] tools_4.3.1
#> [14] backports_1.4.1
#> [15] utf8_1.2.4
#> [16] R6_2.5.1
#> [17] DT_0.30
#> [18] sm_2.2-5.7.1
#> [19] lazyeval_0.2.2
#> [20] GetoptLong_1.0.5
#> [21] urlchecker_1.0.1
#> [22] withr_2.5.2
#> [23] prettyunits_1.2.0
#> [24] gridExtra_2.3
#> [25] cli_3.6.1
#> [26] Biobase_2.62.0
#> [27] formatR_1.14
#> [28] datasets_4.3.1
#> [29] scatterpie_0.2.1
#> [30] labeling_0.4.3
#> [31] sass_0.4.7
#> [32] base_4.3.1
#> [33] readr_2.1.4
#> [34] Rsamtools_2.18.0
#> [35] yulab.utils_0.1.0
#> [36] R.utils_2.12.2
#> [37] DOSE_3.28.0
#> [38] vioplot_0.4.0
#> [39] sessioninfo_1.2.2
#> [40] plotrix_3.8-3
#> [41] BSgenome_1.70.1
#> [42] grDevices_4.3.1
#> [43] limma_3.58.1
#> [44] rstudioapi_0.15.0
#> [45] RSQLite_2.3.3
#> [46] generics_0.1.3
#> [47] gridGraphics_0.5-1
#> [48] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
#> [49] shape_1.4.6
#> [50] BiocIO_1.12.0
#> [51] gtools_3.9.4
#> [52] dplyr_1.1.3
#> [53] GO.db_3.18.0
#> [54] Matrix_1.6-1.1
#> [55] fansi_1.0.5
#> [56] S4Vectors_0.40.1
#> [57] abind_1.4-5
#> [58] R.methodsS3_1.8.2
#> [59] lifecycle_1.0.4
#> [60] yaml_2.3.7
#> [61] edgeR_4.0.1
#> [62] SummarizedExperiment_1.32.0
#> [63] gplots_3.1.3
#> [64] qvalue_2.34.0
#> [65] SparseArray_1.2.0
#> [66] BiocFileCache_2.10.1
#> [67] grid_4.3.1
#> [68] blob_1.2.4
#> [69] promises_1.2.1
#> [70] crayon_1.5.2
#> [71] miniUI_0.1.1.1
#> [72] lattice_0.21-8
#> [73] methods_4.3.1
#> [74] cowplot_1.1.1
#> [75] annotate_1.80.0
#> [76] GenomicFeatures_1.54.1
#> [77] KEGGREST_1.42.0
#> [78] pillar_1.9.0
#> [79] knitr_1.45
#> [80] ComplexHeatmap_2.18.0
#> [81] fgsea_1.28.0
#> [82] GenomicRanges_1.54.1
#> [83] rjson_0.2.21
#> [84] boot_1.3-28.1
#> [85] codetools_0.2-19
#> [86] fastmatch_1.1-4
#> [87] glue_1.6.2
#> [88] ggfun_0.1.3
#> [89] data.table_1.14.8
#> [90] remotes_2.4.2.1
#> [91] Rdpack_2.6.9000
#> [92] vctrs_0.6.4
#> [93] png_0.1-8
#> [94] treeio_1.26.0
#> [95] poweRlaw_0.70.6
#> [96] gtable_0.3.4
#> [97] cachem_1.0.8
#> [98] xfun_0.40
#> [99] rbibutils_2.2.16
#> [100] S4Arrays_1.2.0
#> [101] mime_0.12
#> [102] tidygraph_1.2.3
#> [103] pracma_2.4.2
#> [104] survival_3.5-5
#> [105] iterators_1.0.14
#> [106] statmod_1.5.0
#> [107] MatchIt_4.5.5
#> [108] stabs_0.6-4
#> [109] interactiveDisplayBase_1.40.0
#> [110] ellipsis_0.3.2
#> [111] rlemon_0.2.1
#> [112] nlme_3.1-162
#> [113] utils_4.3.1
#> [114] ggtree_3.10.0
#> [115] usethis_2.2.2
#> [116] bit64_4.0.5
#> [117] progress_1.2.2
#> [118] filelock_1.0.2
#> [119] GENIE3_1.24.0
#> [120] GenomeInfoDb_1.38.0
#> [121] rprojroot_2.0.4
#> [122] bslib_0.5.1
#> [123] KernSmooth_2.23-21
#> [124] colorspace_2.1-0
#> [125] seqLogo_1.68.0
#> [126] BiocGenerics_0.48.1
#> [127] DBI_1.1.3
#> [128] DESeq2_1.42.0
#> [129] tidyselect_1.2.0
#> [130] processx_3.8.2
#> [131] bit_4.0.5
#> [132] compiler_4.3.1
#> [133] curl_5.1.0
#> [134] optmatch_0.10.6
#> [135] glmnet_4.1-8
#> [136] xml2_1.3.5
#> [137] desc_1.4.2
#> [138] DelayedArray_0.28.0
#> [139] shadowtext_0.1.2
#> [140] rtracklayer_1.62.0
#> [141] scales_1.2.1
#> [142] caTools_1.18.2
#> [143] ChIPseeker_1.38.0
#> [144] monaLisa_1.8.0
#> [145] callr_3.7.3
#> [146] rappdirs_0.3.3
#> [147] stringr_1.5.0
#> [148] digest_0.6.33
#> [149] shinyBS_0.61.1
#> [150] rmarkdown_2.25
#> [151] XVector_0.42.0
#> [152] htmltools_0.5.6.1
#> [153] pkgconfig_2.0.3
#> [154] MatrixGenerics_1.14.0
#> [155] highr_0.10
#> [156] dbplyr_2.4.0
#> [157] fastmap_1.1.1
#> [158] rlang_1.1.1
#> [159] GlobalOptions_0.1.2
#> [160] htmlwidgets_1.6.2
#> [161] shiny_1.7.5.1
#> [162] jquerylib_0.1.4
#> [163] farver_2.1.1
#> [164] zoo_1.8-12
#> [165] jsonlite_1.8.7
#> [166] BiocParallel_1.36.0
#> [167] R.oo_1.25.0
#> [168] GOSemSim_2.28.0
#> [169] RCurl_1.98-1.13
#> [170] magrittr_2.0.3
#> [171] GenomeInfoDbData_1.2.11
#> [172] ggplotify_0.1.2
#> [173] patchwork_1.1.3
#> [174] munsell_0.5.0
#> [175] Rcpp_1.0.11
#> [176] ape_5.7-1
#> [177] viridis_0.6.4
#> [178] chk_0.9.1
#> [179] stringi_1.7.12
#> [180] ggraph_2.1.0
#> [181] zlibbioc_1.48.0
#> [182] MASS_7.3-60
#> [183] org.Hs.eg.db_3.18.0
#> [184] AnnotationHub_3.10.0
#> [185] plyr_1.8.9
#> [186] pkgbuild_1.4.2
#> [187] parallel_4.3.1
#> [188] HPO.db_0.99.2
#> [189] ggrepel_0.9.4
#> [190] CNEr_1.38.0
#> [191] Biostrings_2.70.1
#> [192] graphlayouts_1.0.2
#> [193] splines_4.3.1
#> [194] hms_1.1.3
#> [195] circlize_0.4.15
#> [196] BSgenome.Hsapiens.UCSC.hg38_1.4.5
#> [197] locfit_1.5-9.8
#> [198] ps_1.7.5
#> [199] igraph_1.5.1
#> [200] reshape2_1.4.4
#> [201] biomaRt_2.58.0
#> [202] stats4_4.3.1
#> [203] pkgload_1.3.3
#> [204] TFMPvalue_0.0.9
#> [205] BiocVersion_3.18.0
#> [206] XML_3.99-0.15
#> [207] evaluate_0.23
#> [208] BiocManager_1.30.22
#> [209] tzdb_0.4.0
#> [210] foreach_1.5.2
#> [211] tweenr_2.0.2
#> [212] httpuv_1.6.11
#> [213] networkD3_0.4
#> [214] graphics_4.3.1
#> [215] tidyr_1.3.0
#> [216] purrr_1.0.2
#> [217] polyclip_1.10-6
#> [218] clue_0.3-65
#> [219] ggplot2_3.4.4
#> [220] ggforce_0.4.1
#> [221] xtable_1.8-4
#> [222] restfulr_0.0.15
#> [223] tidytree_0.4.5
#> [224] MPO.db_0.99.7
#> [225] roxygen2_7.2.3
#> [226] later_1.3.1
#> [227] viridisLite_0.4.2
#> [228] TxDb.Hsapiens.UCSC.hg38.knownGene_3.18.0
#> [229] tibble_3.2.1
#> [230] aplot_0.2.2
#> [231] memoise_2.0.1
#> [232] AnnotationDbi_1.64.1
#> [233] GenomicAlignments_1.38.0
#> [234] IRanges_2.36.0
#> [235] stats_4.3.1
#> [236] cluster_2.1.4