Resources
Publicly Available Data
Some useful data are provided as resources. These resources are kept in its original format and are not modified. Please refer to the original publishing source for more information about the data.
These files name the chromosomes based on the UCSC genome browser convention. For users requiring the chromosomes to be named based on the Ensembl convention, we have also parsed the files for mm10. Chromosome names are available in the UCSC format and the Ensembl format.
All these files have been placed under utils/resources for convenience.
Compiled Resources
For convenience, some resources are compiled and provided here.
For interactively querying the biomaRt database in R, how different values are encoded in the database is presented here:
Normalizing reads when generating bigWig files is a common practice. An effective genome size is sometimes required. Depending on whether multi-mapping reads are included, the effective genome size can be calculated differently. DeepTools has provided some values to commonly used genomes. This information can be found here:
When multi-mapping reads are included, we use the non-N bases in the genome:
| Genome | Effective Genome Size |
|---|---|
| mm10/GRCm38 | 2652783500 |
| hg38/GRCh38 | 2913022398 |
When multi-mapping reads are excluded, we use the uniquely mappable genome size:
| Read Length | mm10/GRCm38 | hg38/GRCh38 |
|---|---|---|
| 50 | 2308125299 | 2701495711 |
| 75 | 2407883243 | 2747877702 |
| 100 | 2467481008 | 2805636231 |
| 150 | 2494787038 | 2862010428 |
| 200 | 2520868989 | 2887553103 |
| 250 | 2538590322 | 2898802627 |