RNA sequencing data analysis
RNA sequencing data analysis brings to light the intricate mechanisms of gene regulation.
Transcriptome-wide analyses of gene expression are extremely popular among researchers studying gene regulation in biological systems ranging from single cells to tissues and complex microbiomes. RNA-seq data allows for a wide range of analyses to address countless research questions across the fields of biology and biomedicine.
Below we present some of the most common analyses we perform on RNA-seq data. The exploratory, differential expression and pathway analyses largely apply to other high-throughput expression data as well, such as expression microarray or proteomic data.
We hope that the examples below inspire you to appreciate just how rich the world of RNA-sequencing is. If you are planning an RNA-seq experiment and wish to learn how we can help you to get the most out of your data, leave us a message and we will book you a short call with our expert.
Exploratory gene expression analysis
Every RNA-seq expression study incorporates an exploratory analysis. After the raw sequencing reads of an RNA-seq experiment have been quality controlled and gene counts derived, the data set is visualized using Principal Component Analysis (PCA) and expression heatmaps to unveil its general patterns. These visualizations help us answer questions such as:
- Do the biological replicates resemble each other with regard to their expression profiles?
- Do distinct sample groups (e.g., different tissues, treatments or time points) form separate clusters?
- Are there outlier samples?
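The questions above can be illustrated with a minimal Python sketch. It is not a full PCA or heatmap workflow: it simply converts raw counts to log2(CPM + 1) and reports, for each sample, the most similar other sample by Euclidean distance. The sample names and counts are hypothetical toy values, and the normalization choice is an assumption made for illustration.

```python
import math

def log_cpm(counts):
    """Convert one sample's raw gene counts to log2(counts per million + 1)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + 1) for c in counts]

def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbours(samples):
    """For each sample, return the name of the most similar other sample."""
    profiles = {name: log_cpm(counts) for name, counts in samples.items()}
    nearest = {}
    for name, profile in profiles.items():
        others = [
            (euclidean(profile, other_profile), other)
            for other, other_profile in profiles.items()
            if other != name
        ]
        nearest[name] = min(others)[1]
    return nearest

# Hypothetical toy data: raw counts for 4 genes in 4 samples.
samples = {
    "control_1": [100, 200, 50, 650],
    "control_2": [110, 190, 55, 645],
    "treated_1": [400, 100, 300, 200],
    "treated_2": [380, 120, 310, 190],
}

nearest = nearest_neighbours(samples)
for sample, neighbour in nearest.items():
    print(f"{sample}: closest to {neighbour}")
```

If a replicate's nearest neighbour belongs to a different sample group, that sample is worth a closer look as a potential outlier or mislabelled sample.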
Differential expression analysis
Differential expression analysis is a statistical comparison of two sample groups. It results in differential expression statistics for each detected transcript, such as the fold change and statistical significance. These statistics are typically visualized using a volcano plot. The genes which are found to be up- or down-regulated can be further visualized as heatmaps or boxplots, for instance.
As a statistical analysis, this phase of an expression study benefits from the statistical power brought by biological replicates. Three biological replicates per condition is a common “rule-of-thumb” minimum, but it only allows for reliable detection of genes with relatively large expression differences. With a careful experimental design and sufficient sample size, subtler differences can be detected and confounding factors controlled for.
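To make the statistics behind a volcano plot more concrete, here is a minimal Python sketch. It assumes that per-gene p-values have already been produced by a dedicated differential expression method, and only shows how a log2 fold change and Benjamini-Hochberg adjusted p-values could be turned into volcano plot coordinates. The gene names, expression values, p-values and the pseudocount are hypothetical.

```python
import math

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression in group_b over group_a (pseudocount is an assumption)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical normalized expression: (control replicates, treated replicates).
expression = {
    "gene_1": ([10, 12, 11], [45, 47, 49]),
    "gene_2": ([30, 31, 29], [30, 32, 28]),
    "gene_3": ([80, 82, 78], [20, 19, 21]),
    "gene_4": ([5, 6, 7], [6, 7, 8]),
}
# Hypothetical p-values, assumed to come from an upstream statistical test.
p_values = [0.01, 0.04, 0.03, 0.20]

adjusted = benjamini_hochberg(p_values)
volcano = {}
for (gene, (control, treated)), padj in zip(expression.items(), adjusted):
    # x axis: log2 fold change, y axis: -log10 adjusted p-value.
    volcano[gene] = (log2_fold_change(control, treated), -math.log10(padj))

for gene, (x, y) in volcano.items():
    print(f"{gene}: log2FC={x:.2f}, -log10(padj)={y:.2f}")
```

Genes far to the left or right and high up in such a plot are those with both large and statistically supported expression changes.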
Pathway analysis
Pathway analysis puts genes from a differential expression analysis into broader biological context. Simple pathway analyses compare the up- and down-regulated genes statistically to predetermined gene lists. These lists are annotated to biologically meaningful terms, such as a biological process, signaling pathway or a specific disease.
Such analyses may rely either on over-representation analysis or gene set enrichment analysis, which both result in a list of enriched gene sets with relevant statistics and annotations.
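As an illustration of the over-representation approach, the following Python sketch computes an upper-tail hypergeometric p-value for the overlap between a list of differentially expressed genes and one gene set. It is a simplified example under stated assumptions: the gene identifiers, the background size and the gene set are hypothetical, and a real analysis would test many gene sets and correct for multiple testing.

```python
from math import comb

def ora_p_value(n_background, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric probability of seeing at least n_overlap hits."""
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

def over_representation(de_genes, gene_set, background):
    """Return (overlap size, p-value) for one gene set, restricted to the background."""
    de = set(de_genes) & background
    members = set(gene_set) & background
    overlap = de & members
    p = ora_p_value(len(background), len(members), len(de), len(overlap))
    return len(overlap), p

# Hypothetical toy data: 20 detected genes, a 5-gene set, 5 differentially expressed genes.
background = {f"gene_{i}" for i in range(1, 21)}
gene_set = {"gene_1", "gene_2", "gene_3", "gene_4", "gene_5"}
de_genes = {"gene_1", "gene_2", "gene_3", "gene_10", "gene_11"}

n_overlap, p_value = over_representation(de_genes, gene_set, background)
print(f"overlap={n_overlap}, p={p_value:.4f}")
```

Restricting both lists to the detected background matters: using the whole genome as background tends to overstate enrichment.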
More mechanistic pathway analyses rely on experimentally validated interactions between genes. They not only identify which pathways are represented by the differentially expressed genes, but also shed light on whether those pathways are activated or inhibited, and by which genes.
For the more advanced pathway analyses, we use Ingenuity Pathway Analysis (IPA, QIAGEN). IPA enables a wide range of in-depth analyses into known and novel gene regulatory networks.
De novo transcriptome assembly
For non-model organisms and those with very dynamic genomes, such as microbes, we typically start RNA sequencing data analysis by assembling a transcriptome de novo and annotating it using homologues from related species and computational gene predictions.
A new reference transcriptome is an invaluable resource for your further research, and that of the entire research community. Once a high-quality reference transcriptome has been established, the door opens to most downstream analyses which are routinely used with model organisms.
Single-cell RNA-seq analysis
Single-cell RNA-sequencing (scRNA-seq) experiments allow for cataloguing cell types and uncovering differentiation trajectories at a scale and resolution unmatched by bulk RNA sequencing.
Used particularly to study the composition and development of complex tissues, scRNA-seq data sets typically comprise thousands of individual cells. Most approaches used to analyze bulk RNA-seq data can be tailored for single-cell RNA-seq data as well.
MicroRNA data analysis
Small RNA-sequencing enables studying various species of short RNAs, and microRNAs in particular. MicroRNA-seq analysis is largely similar to that of mRNAs, but pathway and regulatory analyses make use of predicted and/or previously validated microRNA target genes.
Sequencing both mRNA and small RNA from the matched samples enables estimating the regulatory relationship between microRNAs and their targets. To identify genes subject to microRNA-mediated regulation in a given condition, argonaute CLIP-sequencing (and related protocols) can be employed.
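One simple way to explore the relationship between a microRNA and its candidate targets in matched samples is to look for negative correlation between their expression levels. The Python sketch below illustrates the idea with a Pearson correlation. The expression values, target names and the -0.8 cut-off are hypothetical, and correlation alone does not prove direct regulation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long value lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical log-scale expression of one microRNA across 5 matched samples.
mirna = [1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical log-scale expression of predicted target mRNAs in the same samples.
targets = {
    "target_A": [10.0, 8.0, 6.0, 4.0, 2.0],
    "target_B": [5.0, 5.5, 4.8, 5.2, 5.1],
}

correlations = {gene: pearson(mirna, expr) for gene, expr in targets.items()}

# Keep targets whose expression falls as the microRNA rises (cut-off is an assumption).
candidates = [gene for gene, r in correlations.items() if r < -0.8]
print(correlations, candidates)
```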
Alternative splicing analysis
In addition to studying expression on the level of genes, RNA-sequencing allows for a more detailed view: splice-variant level expression. Reliable identification of alternative splicing events benefits from deeper sequencing than the typical gene-level expression analysis.
Depending on the quantity and quality of the data, alternative splicing analyses may focus on quantifying expression levels of known, previously annotated splice isoforms, or on detecting novel splicing events as well.
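A common way to quantify a known exon-skipping event is the "percent spliced in" (PSI) value. The Python sketch below shows one simple formulation based on junction-spanning read counts, where the two inclusion junctions are averaged before being compared with the skipping junction. The function name, the read counts and this particular normalization are assumptions made for illustration.

```python
def percent_spliced_in(upstream_reads, downstream_reads, skipping_reads):
    """Estimate PSI for a cassette exon from junction-spanning read counts.

    upstream_reads / downstream_reads: reads spanning the two junctions that
    include the exon; skipping_reads: reads spanning the junction that skips it.
    Returns None when no junction reads support either isoform.
    """
    # Exon inclusion is supported by two junctions, so average them.
    inclusion = (upstream_reads + downstream_reads) / 2
    total = inclusion + skipping_reads
    if total == 0:
        return None
    return inclusion / total

# Hypothetical junction counts for one exon in two conditions.
psi_control = percent_spliced_in(30, 50, 10)
psi_treated = percent_spliced_in(10, 10, 30)
print(f"control PSI={psi_control:.2f}, treated PSI={psi_treated:.2f}")
```

Events supported by only a handful of junction reads give unstable PSI estimates, which is one reason splicing analyses benefit from deeper sequencing.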
Fusion gene detection
In cancer, certain structural variants are known to cause fusion genes. Two separate genes fused together in the DNA may lead to a fusion transcript. The fusion transcript, in turn, may lead to a fusion protein with a novel, potentially cancer-driving combination of regulation and function.
Fusion genes can be detected from RNA-sequencing data with tools that identify and analyze discordantly mapping RNA-seq reads or read pairs.
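The core idea of using discordant read pairs can be sketched in a few lines of Python. The example assumes that each read pair has already been aligned and assigned to a gene per mate, and it simply counts pairs whose mates fall in two different genes. The gene names, the input format and the minimum support threshold are hypothetical, and real fusion callers apply many additional filters.

```python
from collections import Counter

def fusion_candidates(read_pairs, min_support=3):
    """Count read pairs whose two mates map to different genes.

    read_pairs: iterable of (gene_of_mate_1, gene_of_mate_2); None marks an
    unassigned mate. Returns gene pairs supported by at least min_support pairs.
    """
    support = Counter()
    for gene_1, gene_2 in read_pairs:
        if gene_1 is None or gene_2 is None or gene_1 == gene_2:
            continue  # concordant or unassigned pair, not informative here
        # Sort so that (A, B) and (B, A) count towards the same candidate.
        support[tuple(sorted((gene_1, gene_2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}

# Hypothetical per-pair gene assignments.
read_pairs = [
    ("GENE_A", "GENE_A"),
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_A"),
    ("GENE_A", "GENE_B"),
    ("GENE_C", "GENE_D"),
    ("GENE_C", None),
    ("GENE_B", "GENE_B"),
]

candidates = fusion_candidates(read_pairs)
print(candidates)
```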
Integrative analysis with epigenomic data
Performing RNA-seq and epigenomic sequencing (such as ChIP or ATAC-seq) on the same samples enables integrative analyses to study gene regulatory programs genome-wide.
Regulatory connections can be identified between enhancers and their target genes, as well as transcription factors and their targets, building on evidence from both gene expression and the epigenomic status of regulatory elements.
Meet some of our transcriptomics experts
As a scientist, I specialize in cellular differentiation and RNA biology. I have been studying the interplay between transcriptional regulators and non-coding RNAs in a multitude of biomedical contexts, including mesenchymal stem cell differentiation, endothelial cell differentiation, atherosclerosis and leukemia.
From the methodological perspective, I have been trained as a comprehensive systems biologist and generated NGS datasets myself in my research projects. I am experienced in analyzing sequencing data from RNA (mRNA-seq, short RNA-seq, scRNA-seq, GRO-seq, TT-seq) and DNA libraries (ChIP-seq, ATAC-seq, CITE-seq) and in integrating different data modalities to gain ever deeper insight into complex systems.
I specialize in gene and genome regulation, particularly in immunology, cancer research, DNA repair and cellular senescence.
For over 10 years, I have developed and applied computational pipelines to analyze data from transcriptomic and epigenomic sequencing assays, including scRNA-seq, scATAC-seq, spatial transcriptomics, ChIP-seq, RNA-seq, GRO-seq, ATAC-seq, CAGE-seq, XR-seq, DRIP-seq, BLISS-seq, Damage-seq, INI-seq, and HiC.
I have enjoyed working in multidisciplinary teams — as a bioinformatician, postdoc researcher, head of a single cell NGS bioinformatics facility and, most recently, as a project manager at Genevia.
I am a senior bioinformatician with extensive experience in analysing data types including mRNA-seq, smallRNA-seq, DNA-seq, ChIP-seq, proteomics and methylation data.
During the years working on customer projects I have gained experience in e.g. gene and protein expression analysis, germline and somatic variant analysis, genome-wide association studies and polygenic risk score analysis, population genomics, epigenetic analysis, using public ‘omics datasets such as TCGA and applying machine learning methods for patient prognosis.
I am also very experienced in leveraging the various pathway and gene-regulatory analyses of the Ingenuity Pathway Analysis (IPA) software. In my data analysis role, I also benefit from my 10+ years of experience as a wet-lab molecular biologist as well as my experience in biocuration and database content creation.
The examples above showcase some of the more typical computational analyses of RNA-seq data. The opportunities with RNA-seq, however, are nearly endless, as is our team's experience!
Learn more about RNA-seq data analysis
References and customer cases
Selected publications from our customers
- Singh, A. et al. (2022). Urolithin A improves muscle strength, exercise performance, and biomarkers of mitochondrial health in a randomized trial in middle-aged adults. Cell reports. Medicine, 3(5), 100633. https://doi.org/10.1016/j.xcrm.2022.100633
- Pihlström, S. et al. (2022). A multi-omics study to characterize the transdifferentiation of human dermal fibroblasts to osteoblast-like cells. Frontiers in molecular biosciences, 9, 1032026. https://doi.org/10.3389/fmolb.2022.1032026
- Tusup, M. et al. (2022). Epitranscriptomics modifier ...* indirectly triggers Toll-like receptor 3 and can enhance immune infiltration in tumors. Molecular therapy : the journal of the American Society of Gene Therapy, 30(3), 1163–1170. https://doi.org/10.1016/j.ymthe.2021.09.022
- Cramer, M. et al. (2022). Transcriptomic Regulation of Macrophages by Matrix-Bound Nanovesicle-Associated Interleukin-33. Tissue engineering. Part A, 28(19-20), 867–878. https://doi.org/10.1089/ten.TEA.2022.0006
- Pommergaard, H. C. et al. (2022). Aldehyde dehydrogenase expression may be a prognostic biomarker and associated with liver cirrhosis in patients resected for hepatocellular carcinoma. Surgical oncology, 40, 101677. https://doi.org/10.1016/j.suronc.2021.101677
- Song, J. et al. (2022). The ubiquitin-ligase TRAF6 and TGFβ type I receptor form a complex with Aurora kinase B contributing to mitotic progression and cytokinesis in cancer cells. EBioMedicine, 82, 104155. https://doi.org/10.1016/j.ebiom.2022.104155
- Martins, R. R. et al. (2022). Transcriptomic signatures of telomerase-dependent and -independent ageing, in the zebrafish gut and brain. bioRxiv 2022.05.24.493215. https://doi.org/10.1101/2022.05.24.493215
- Kundu, S. et al. (2021). Common and mutation specific phenotypes of KRAS and BRAF mutations in colorectal cancer cells revealed by integrative -omics analysis. Journal of experimental & clinical cancer research : CR, 40(1), 225. https://doi.org/10.1186/s13046-021-02025-2
- Pommergaard, H. C. et al. (2021). Peroxisome proliferator-activated receptor activity correlates with poor survival in patients resected for hepatocellular carcinoma. Journal of hepato-biliary-pancreatic sciences, 28(4), 327–335. https://doi.org/10.1002/jhbp.745
- Lehto, T. K. et al. (2021). Transcript analysis of commercial prostate cancer risk stratification panels in hard-to-predict grade group 2-4 prostate cancers. The Prostate, 81(7), 368–376. https://doi.org/10.1002/pros.24108
- Hussey, G. S. et al. (2020). Lipidomics and RNA sequencing reveal a novel subpopulation of nanovesicle within extracellular matrix biomaterials. Science advances, 6(12), eaay4361. https://doi.org/10.1126/sciadv.aay4361
- Oksanen, M. et al. (2020). NF-E2-related factor 2 activation boosts antioxidant defenses and ameliorates inflammatory and amyloid properties in human Presenilin-1 mutated Alzheimer's disease astrocytes. Glia, 68(3), 589–599. https://doi.org/10.1002/glia.23741
- Lemke, P. et al. (2020). Transcriptome Analysis of Solanum Tuberosum Genotype RH89-039-16 in Response to Chitosan. Frontiers in plant science, 11, 1193. https://doi.org/10.3389/fpls.2020.01193
- Tiihonen, J. et al. (2020). Neurobiological roots of psychopathy. Molecular psychiatry, 25(12), 3432–3441. https://doi.org/10.1038/s41380-019-0488-z
- Gurvich, O. L. et al. (2020). Transcriptomics uncovers substantial variability associated with alterations in manufacturing processes of macrophage cell therapy products. Scientific reports, 10(1), 14049. https://doi.org/10.1038/s41598-020-70967-2
- Gabriel, M. et al. (2020). A relational database to identify differentially expressed genes in the endometrium and endometriosis lesions. Scientific data, 7(1), 284. https://doi.org/10.1038/s41597-020-00623-x
- Tiihonen, J. et al. (2019). Sex-specific transcriptional and proteomic signatures in schizophrenia. Nature communications, 10(1), 3933. https://doi.org/10.1038/s41467-019-11797-3
- Tarkkonen, K. et al. (2017). Comparative analysis of osteoblast gene expression profiles and Runx2 genomic occupancy of mouse and human osteoblasts in vitro. Gene, 626, 119–131. https://doi.org/10.1016/j.gene.2017.05.028
- Sugano, Y. et al. (2017). Comparative transcriptomic analysis identifies evolutionarily conserved gene products in the vertebrate renal distal convoluted tubule. Pflugers Archiv : European journal of physiology, 469(7-8), 859–867. https://doi.org/10.1007/s00424-017-2009-8
* Names of pharmaceuticals removed to comply with regulation in certain countries
Selected publications from our team
- Korvenlaita, N. et al. (2023). Dynamic release of neuronal extracellular vesicles containing miR-21a-5p is induced by hypoxia. Journal of extracellular vesicles, 12(1), e12297. https://doi.org/10.1002/jev2.12297
- Saralahti, A. K. et al. (2023). Characterization of the innate immune response to Streptococcus pneumoniae infection in zebrafish. PLoS genetics, 19(1), e1010586. Advance online publication. https://doi.org/10.1371/journal.pgen.1010586
- Armaka, M. et al. (2022). Single-cell multimodal analysis identifies common regulatory programs in synovial fibroblasts of rheumatoid arthritis patients and modeled TNF-driven arthritis. Genome medicine, 14(1), 78. https://doi.org/10.1186/s13073-022-01081-3
- Pham, T. et al. (2022). Modeling human extraembryonic mesoderm cells using naive pluripotent stem cells. Cell stem cell, 29(9), 1346–1365.e10. https://doi.org/10.1016/j.stem.2022.08.001
- Zijlmans, D. W. et al. (2022). Integrated multi-omics reveal polycomb repressive complex 2 restricts human trophoblast induction. Nature cell biology, 24(6), 858–871. https://doi.org/10.1038/s41556-022-00932-w
- Fanourgakis, S. et al. (2022). Histone H2Bub1 dynamics in the 5' region of active genes are tightly linked to the UV-induced transcriptional response. Comput. Struct. Biotechnol. J. https://doi.org/10.1016/j.csbj.2022.12.013
- Smith, C. et al. (2022). A comparative transcriptomic analysis of ...* n-like peptide-1 receptor- and glucose-dependent insulinotropic polypeptide-expressing cells in the hypothalamus. Appetite, 174, 106022. https://doi.org/10.1016/j.appet.2022.106022
- Cao, S. et al. (2022). Estimation of tumor cell total mRNA expression in 15 cancer types predicts disease progression. Nature biotechnology, 10.1038/s41587-022-01342-x. Advance online publication. https://doi.org/10.1038/s41587-022-01342-x
- Rodriguez-Martinez, A. et al. (2022). Novel ZNF414 activity characterized by integrative analysis of ChIP-exo, ATAC-seq and RNA-seq data. Biochimica et biophysica acta. Gene regulatory mechanisms, 1865(3), 194811. Advance online publication. https://doi.org/10.1016/j.bbagrm.2022.194811
- Aakula, A. et al. (2022). RAS and PP2A activities converge on epigenetic gene regulation. bioRxiv 2022.05.11.491459; doi: https://doi.org/10.1101/2022.05.11.491459
- Detsika, M. G. et al. (2022). Upregulation of CD55 complement regulator in distinct PBMC subpopulations of COVID-19 patients is associated with suppression of interferon responses. bioRxiv 2022.10.07.510750; doi: https://doi.org/10.1101/2022.10.07.510750
- Roos, K. et al. (2022). Single-cell RNA-seq analysis and cell-cluster deconvolution of the human preovulatory follicular fluid cells provide insights into the pathophysiology of ovarian hyporesponse. Frontiers in endocrinology, 13, 945347. https://doi.org/10.3389/fendo.2022.945347
- Pellegrinelli, V. et al. (2022). Dysregulation of macrophage PEPD in obesity determines adipose tissue fibro-inflammation and insulin resistance. Nature metabolism, 4(4), 476–494. https://doi.org/10.1038/s42255-022-00561-5
- Kukkonen, K. et al. (2022). Nonmalignant AR-positive prostate epithelial cells and cancer cells respond differently to androgen. Endocrine-related cancer, 29(12), 717–733. https://doi.org/10.1530/ERC-22-0108
- Georgolopoulos, G. et al. (2021). Discrete regulatory modules instruct hematopoietic lineage commitment and differentiation. Nature communications, 12(1), 6790. https://doi.org/10.1038/s41467-021-27159-x
- Taavitsainen, S. et al. (2021). Single-cell ATAC and RNA sequencing reveal pre-existing and persistent cells associated with prostate cancer relapse. Nature communications, 12(1), 5307. https://doi.org/10.1038/s41467-021-25624-1
- Linna-Kuosmanen, S. et al. (2021). NRF2 is a key regulator of endothelial microRNA expression under proatherogenic stimuli. Cardiovascular research, 117(5), 1339–1357. https://doi.org/10.1093/cvr/cvaa219
- Moreau, P. R. et al. (2021). Profiling of Primary and Mature miRNA Expression in Atherosclerosis-Associated Cell Types. Arteriosclerosis, thrombosis, and vascular biology, 41(7), 2149–2167. https://doi.org/10.1161/ATVBAHA.121.315579
- Zannikou, M. et al. (2021). MAP3K8 Regulates Cox-2-Mediated Prostaglandin E2 Production in the Lung and Suppresses Pulmonary Inflammation and Fibrosis. Journal of immunology (Baltimore, Md. : 1950), 206(3), 607–620. https://doi.org/10.4049/jimmunol.2000862
- Carobbio, S. et al. (2021). Unraveling the Developmental Roadmap toward Human Brown Adipose Tissue. Stem cell reports, 16(3), 641–655. https://doi.org/10.1016/j.stemcr.2021.01.013
- Filppu, P. et al. (2021). CD109-GP130 interaction drives glioblastoma stem cell plasticity and chemoresistance through STAT3 activity. JCI insight, 6(9), e141486. https://doi.org/10.1172/jci.insight.141486
- Verta, J. P. et al. (2021). Genetic Drift Dominates Genome-Wide Regulatory Evolution Following an Ancient Whole-Genome Duplication in Atlantic Salmon. Genome biology and evolution, 13(5), evab059. https://doi.org/10.1093/gbe/evab059
- Alvarez-Guaita, A. et al. (2021). Phenotypic characterization of Adig null mice suggests roles for adipogenin in the regulation of fat mass accrual and leptin secretion. Cell reports, 34(10), 108810. https://doi.org/10.1016/j.celrep.2021.108810
- Hall, Z. et al. (2021). Lipid Remodeling in Hepatocyte Proliferation and Hepatocellular Carcinoma. Hepatology (Baltimore, Md.), 73(3), 1028–1044. https://doi.org/10.1002/hep.31391
- Viana, J. et al. (2020). ...* -induced transcriptional changes in the zebrafish brain. NPJ schizophrenia, 6(1), 3. https://doi.org/10.1038/s41537-019-0092-x
- Liakos, A. et al. (2020). Continuous transcription initiation guarantees robust repair of all transcribed genes and regulatory regions. Nature communications, 11(1), 916. https://doi.org/10.1038/s41467-020-14566-9
- Harjula, S. E. et al. (2020). Characterization of immune response against Mycobacterium marinum infection in the main hematopoietic organ of adult zebrafish (Danio rerio). Developmental and comparative immunology, 103, 103523. https://doi.org/10.1016/j.dci.2019.103523
- Morianos, I. et al. (2020). Activin-A limits Th17 pathogenicity and autoimmune neuroinflammation via CD39 and CD73 ectonucleotidases and Hif1-α-dependent pathways. Proceedings of the National Academy of Sciences of the United States of America, 117(22), 12269–12280. https://doi.org/10.1073/pnas.1918196117
- Mehtonen, J. et al. (2020). Single cell characterization of B-lymphoid differentiation and leukemic cell states during chemotherapy in ETV6-RUNX1-positive pediatric leukemia identifies drug-targetable transcription factor activities. Genome medicine, 12(1), 99. https://doi.org/10.1186/s13073-020-00799-2
- Lu, Y. et al. (2020). Interleukin-33 Signaling Controls the Development of Iron-Recycling Macrophages. Immunity, 52(5), 782–793.e5. https://doi.org/10.1016/j.immuni.2020.03.006
- Viiri, L. E. et al. (2019). Extensive reprogramming of the nascent transcriptome during iPSC to hepatocyte differentiation. Scientific reports, 9(1), 3562. https://doi.org/10.1038/s41598-019-39215-0
- Pölönen, P. et al. (2019). Hemap: An Interactive Online Resource for Characterizing Molecular Phenotypes across Hematologic Malignancies. Cancer research, 79(10), 2466–2479. https://doi.org/10.1158/0008-5472.CAN-18-2970
- Adriaenssens, A. E. et al. (2019). Glucose-Dependent Insulinotropic Polypeptide Receptor-Expressing Cells in the Hypothalamus Regulate Food Intake. Cell metabolism, 30(5), 987–996.e6. https://doi.org/10.1016/j.cmet.2019.07.013
- Roberts, G. P. et al. (2019). Comparison of Human and Murine Enteroendocrine Cells by Transcriptomic and Peptidomic Profiling. Diabetes, 68(5), 1062–1072. https://doi.org/10.2337/db18-0883
- Moreau, P. R. et al. (2018). Transcriptional Profiling of Hypoxia-Regulated Non-coding RNAs in Human Primary Endothelial Cells. Frontiers in cardiovascular medicine, 5, 159. https://doi.org/10.3389/fcvm.2018.00159
- Bouvy-Liivrand, M. et al. (2017). Analysis of primary microRNA loci from nascent transcriptomes reveals regulatory domains governed by chromatin architecture. Nucleic acids research, 45(17), 9837–9849. https://doi.org/10.1093/nar/gkx680
- Lavigne, M. D. et al. (2017). Global unleashing of transcription elongation waves in response to genotoxic stress restricts somatic mutation rate. Nature communications, 8(1), 2076. https://doi.org/10.1038/s41467-017-02145-4
- Sin, C. et al. (2016). Quantitative assessment of ribosome drop-off in E. coli. Nucleic acids research, 44(6), 2528–2537. https://doi.org/10.1093/nar/gkw137
* Names of pharmaceuticals removed to comply with regulation in certain countries
Leave your email address here with a brief description of your needs, and we will contact you to get things moving forward!