RNA sequencing data analysis

RNA sequencing data analysis brings to light the intricate mechanisms of gene regulation.

Transcriptome-wide analyses of gene expression are extremely popular among researchers studying gene regulation in biological systems ranging from single cells to tissues and complex microbiomes. RNA-seq data allows for a wide range of analyses to address countless research questions across the fields of biology and biomedicine.

Below we present some of the most common analyses we perform on RNA-seq data. The exploratory, differential expression and pathway analyses largely apply to other high-throughput expression data as well, such as expression microarray or proteomic data.

We hope that the examples below inspire you to appreciate just how rich the world of RNA-sequencing is. If you are planning an RNA-seq experiment and wish to learn how we can help you to get the most out of your data, leave us a message and we will book you a short call with our expert.


Exploratory gene expression analysis

Every RNA-seq expression study incorporates an exploratory analysis. After the raw sequencing reads of an RNA-seq experiment have been quality controlled and gene counts derived, the data set is visualized using Principal Component Analysis (PCA) and expression heatmaps to unveil its general patterns. These visualizations help us answer questions such as:

  • Do the biological replicates resemble each other with regard to their expression profiles?
  • Do distinct sample groups (e.g., different tissues, treatments or time points) form separate clusters?
  • Are there outlier samples?
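The replicate and outlier questions above can be illustrated with a small sample-to-sample correlation check, which is the kind of matrix a sample heatmap is typically drawn from. This is a minimal sketch rather than a full PCA workflow: the counts, sample names and the log2 counts-per-million transformation with a pseudocount of 1 are all assumptions chosen for illustration.

```python
import math

def log_cpm(counts):
    """Convert one sample's raw gene counts to log2 counts-per-million (+1 pseudocount)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + 1) for c in counts]

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical raw counts for 6 genes in 4 samples (two controls, two treated).
samples = {
    "ctrl_1":    [120, 30, 500, 10, 80, 200],
    "ctrl_2":    [110, 35, 520, 12, 75, 210],
    "treated_1": [40, 150, 480, 90, 20, 190],
    "treated_2": [45, 140, 510, 85, 25, 205],
}

profiles = {name: log_cpm(counts) for name, counts in samples.items()}
names = list(profiles)

# Pairwise correlation matrix; replicates are expected to correlate most strongly.
corr = {(a, b): pearson(profiles[a], profiles[b]) for a in names for b in names}

for a in names:
    print(a.ljust(10), " ".join(f"{corr[(a, b)]:.2f}" for b in names))
```

In this toy data the two control samples correlate more strongly with each other than with either treated sample. A sample that correlates poorly with every other sample would be a candidate outlier.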

Differential expression analysis

Differential expression analysis is a statistical comparison of two sample groups. It results in differential expression statistics for each detected transcript, such as the fold change and statistical significance. These statistics are typically visualized using a volcano plot. Genes found to be up- or down-regulated can be further visualized, for instance, as heatmaps or boxplots.
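To make these statistics concrete, the sketch below takes per-gene group means and p-values, computes log2 fold changes, corrects the p-values for multiple testing with the Benjamini-Hochberg procedure, and labels each gene the way the points of a volcano plot are usually colored. The gene names, expression values, p-values and cutoffs (|log2FC| ≥ 1, adjusted p < 0.05) are hypothetical, and the p-values are assumed to come from a dedicated differential expression tool rather than from this code.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def classify(log2fc, padj, fc_cutoff=1.0, alpha=0.05):
    """Label a gene as up-regulated, down-regulated or not significant (ns)."""
    if padj < alpha and log2fc >= fc_cutoff:
        return "up"
    if padj < alpha and log2fc <= -fc_cutoff:
        return "down"
    return "ns"

# Hypothetical input: (gene, mean in control, mean in treated, raw p-value).
results = [
    ("GENE_A", 100.0, 420.0, 0.0004),
    ("GENE_B", 250.0, 60.0, 0.0010),
    ("GENE_C", 80.0, 95.0, 0.40),
    ("GENE_D", 30.0, 33.0, 0.75),
    ("GENE_E", 500.0, 1100.0, 0.02),
]

padj = benjamini_hochberg([r[3] for r in results])
labels = {}
for (gene, ctrl, treated, _p), q in zip(results, padj):
    # A pseudocount of 1 avoids division by zero for unexpressed genes.
    log2fc = math.log2((treated + 1) / (ctrl + 1))
    labels[gene] = classify(log2fc, q)
    print(f"{gene}\tlog2FC={log2fc:+.2f}\tpadj={q:.4f}\t{labels[gene]}")
```

A volcano plot places the log2 fold change on the x-axis and the negative log10 of the p-value on the y-axis, so the "up" and "down" genes end up in the upper right and upper left corners.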

As a statistical analysis, this phase of an expression study benefits from the statistical power brought by biological replicates. Three biological replicates per condition is a common “rule-of-thumb” minimum, but it only allows for reliable detection of genes with relatively large expression differences. With a careful experimental design and sufficient sample size, subtler differences can be detected and confounding factors controlled for.

Pathway analysis

Pathway analysis puts genes from a differential expression analysis into broader biological context. Simple pathway analyses compare the up- and down-regulated genes statistically to predetermined gene lists. These lists are annotated to biologically meaningful terms, such as a biological process, signaling pathway or a specific disease.

Such analyses may rely either on over-representation analysis or gene set enrichment analysis, which both result in a list of enriched gene sets with relevant statistics and annotations.
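Over-representation analysis, the simpler of the two approaches, can be sketched with a one-sided hypergeometric test: given a background of N detected genes, of which K belong to a gene set, how likely is it to see at least k set members among n differentially expressed genes by chance? The gene identifiers and gene sets below are hypothetical, and a real analysis would also correct the resulting p-values for multiple testing. Gene set enrichment analysis works on a ranked gene list instead and is not covered by this sketch.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the gene set."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Hypothetical background of 20 detected genes and 5 differentially expressed genes.
background = {f"g{i}" for i in range(1, 21)}
de_genes = {"g1", "g2", "g3", "g4", "g5"}

# Hypothetical annotated gene sets.
gene_sets = {
    "hypothetical_pathway_X": {"g1", "g2", "g3", "g10"},
    "hypothetical_pathway_Y": {"g5", "g11", "g12", "g13", "g14", "g15"},
}

results = {}
for name, members in gene_sets.items():
    members = members & background          # only genes that could have been detected
    overlap = len(members & de_genes)
    pval = hypergeom_sf(overlap, len(background), len(members), len(de_genes))
    results[name] = (overlap, pval)
    print(f"{name}\toverlap={overlap}\tp={pval:.4f}")
```

Restricting each gene set to the background matters: using the whole genome as the background instead of the genes actually detected in the experiment tends to inflate significance.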

More mechanistic pathway analyses rely on experimentally validated interactions between genes. They not only identify which pathways are represented by the differentially expressed genes, but also shed light on whether the pathways are activated or inhibited, and by which genes.

For the more advanced pathway analyses, we use Ingenuity Pathway Analysis (IPA, QIAGEN). IPA enables a wide range of in-depth analyses into known and novel gene regulatory networks.

Transcriptome assembly

For non-model organisms, and those with very dynamic genomes, such as microbes, we typically start RNA sequencing data analysis by assembling a transcriptome de novo and annotating it using homologues from related species and computational gene predictions.

A new reference transcriptome is an invaluable resource for your further research, and that of the entire research community. Once a high-quality reference transcriptome has been established, the door opens to most downstream analyses which are routinely used with model organisms.

Single-cell expression analysis

Single-cell RNA-sequencing (scRNA-seq) experiments allow for cataloguing cell types and uncovering differentiation trajectories at a scale and resolution unmatched by bulk RNA sequencing.

Used particularly to study the composition and development of complex tissues, scRNA-seq data sets typically comprise thousands of individual cells. Most approaches used to analyze bulk RNA-seq data can be tailored for single-cell RNA-seq data as well.

MicroRNA data analysis

Small RNA-sequencing enables studying various species of short RNAs, and microRNAs in particular. MicroRNA-seq analysis is largely similar to that of mRNAs, but pathway and regulatory analyses make use of predicted and/or previously validated microRNA target genes.

Sequencing both mRNA and small RNA from matched samples enables estimating the regulatory relationship between microRNAs and their targets. To identify genes subject to microRNA-mediated regulation in a given condition, Argonaute CLIP-sequencing (and related protocols) can be employed.

Alternative splicing analysis

In addition to studying expression at the gene level, RNA-sequencing allows for a more detailed view: expression at the level of splice variants. Reliable identification of alternative splicing events benefits from deeper sequencing than a typical gene-level expression analysis.

Depending on the quantity and quality of the data, alternative splicing analyses may focus on quantifying expression levels of known, previously annotated splice isoforms, or on detecting novel splicing events as well.

Fusion gene detection

In cancer, certain structural variants are known to cause fusion genes. Two separate genes fused together in the DNA may lead to a fusion transcript. The fusion transcript, in turn, may lead to a fusion protein with a novel, potentially cancer-driving combination of regulation and function.

Fusion genes can be detected from RNA-sequencing data with tools that identify and analyze discordantly mapping RNA-seq reads or read pairs.
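The core idea of discordant read pairs can be shown with a toy example: if the two mates of a read pair map to different genes, the pair supports a possible fusion between them. The sketch below assumes the read pairs have already been aligned and annotated with gene names. The read IDs, gene names and the minimum support of two read pairs are hypothetical. Dedicated fusion callers do considerably more, such as examining reads that span the fusion junction and filtering likely artifacts.

```python
from collections import Counter

# Hypothetical, pre-annotated read pairs: (read ID, gene of mate 1, gene of mate 2).
read_pairs = [
    ("r1", "GENE_A", "GENE_A"),
    ("r2", "GENE_A", "GENE_B"),
    ("r3", "GENE_B", "GENE_A"),
    ("r4", "GENE_A", "GENE_B"),
    ("r5", "GENE_C", "GENE_C"),
    ("r6", "GENE_C", "GENE_D"),
]

def fusion_candidates(pairs, min_support=2):
    """Count read pairs whose mates map to different genes; keep well-supported gene pairs."""
    support = Counter()
    for _read_id, gene1, gene2 in pairs:
        if gene1 != gene2:
            # Sort the gene names so A-B and B-A count as the same candidate.
            support[tuple(sorted((gene1, gene2)))] += 1
    return {genes: n for genes, n in support.items() if n >= min_support}

candidates = fusion_candidates(read_pairs)
print(candidates)
```

Here three read pairs support a GENE_A and GENE_B candidate, while the single discordant pair between GENE_C and GENE_D falls below the support threshold and is discarded.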

Learn more

The examples above showcase some of the more typical computational analyses of RNA-seq data. The opportunities with RNA-seq, however, are nearly endless, as is our team's experience!

Learn more about RNA-seq data analysis

References and customer cases

Selected publications from our customers

  • Singh, A. et al. (2022). Urolithin A improves muscle strength, exercise performance, and biomarkers of mitochondrial health in a randomized trial in middle-aged adults. Cell reports. Medicine, 3(5), 100633. https://doi.org/10.1016/j.xcrm.2022.100633
  • Tusup, M. et al. (2022). Epitranscriptomics modifier *** indirectly triggers Toll-like receptor 3 and can enhance immune infiltration in tumors. Molecular therapy : the journal of the American Society of Gene Therapy, 30(3), 1163–1170. https://doi.org/10.1016/j.ymthe.2021.09.022
  • Pommergaard, H. C. et al. (2022). Aldehyde dehydrogenase expression may be a prognostic biomarker and associated with liver cirrhosis in patients resected for hepatocellular carcinoma. Surgical oncology, 40, 101677. https://doi.org/10.1016/j.suronc.2021.101677
  • Pommergaard, H. C. et al. (2021). Peroxisome proliferator-activated receptor activity correlates with poor survival in patients resected for hepatocellular carcinoma. Journal of hepato-biliary-pancreatic sciences, 28(4), 327–335. https://doi.org/10.1002/jhbp.745
  • Lehto, T. K. et al. (2021). Transcript analysis of commercial prostate cancer risk stratification panels in hard-to-predict grade group 2-4 prostate cancers. The Prostate, 81(7), 368–376. https://doi.org/10.1002/pros.24108
  • Hussey, G. S. et al. (2020). Lipidomics and RNA sequencing reveal a novel subpopulation of nanovesicle within extracellular matrix biomaterials. Science advances, 6(12), eaay4361. https://doi.org/10.1126/sciadv.aay4361
  • Oksanen, M. et al. (2020). NF-E2-related factor 2 activation boosts antioxidant defenses and ameliorates inflammatory and amyloid properties in human Presenilin-1 mutated Alzheimer's disease astrocytes. Glia, 68(3), 589–599. https://doi.org/10.1002/glia.23741
  • Lemke, P. et al. (2020). Transcriptome Analysis of Solanum Tuberosum Genotype RH89-039-16 in Response to Chitosan. Frontiers in plant science, 11, 1193. https://doi.org/10.3389/fpls.2020.01193
  • Tiihonen, J. et al. (2020). Neurobiological roots of psychopathy. Molecular psychiatry, 25(12), 3432–3441. https://doi.org/10.1038/s41380-019-0488-z
  • Gurvich, O. L. et al. (2020). Transcriptomics uncovers substantial variability associated with alterations in manufacturing processes of macrophage cell therapy products. Scientific reports, 10(1), 14049. https://doi.org/10.1038/s41598-020-70967-2
  • Gabriel, M. et al. (2020). A relational database to identify differentially expressed genes in the endometrium and endometriosis lesions. Scientific data, 7(1), 284. https://doi.org/10.1038/s41597-020-00623-x
  • Tiihonen, J. et al. (2019). Sex-specific transcriptional and proteomic signatures in schizophrenia. Nature communications, 10(1), 3933. https://doi.org/10.1038/s41467-019-11797-3
  • Tarkkonen, K. et al. (2017). Comparative analysis of osteoblast gene expression profiles and Runx2 genomic occupancy of mouse and human osteoblasts in vitro. Gene, 626, 119–131. https://doi.org/10.1016/j.gene.2017.05.028
  • Sugano, Y. et al. (2017). Comparative transcriptomic analysis identifies evolutionarily conserved gene products in the vertebrate renal distal convoluted tubule. Pflugers Archiv : European journal of physiology, 469(7-8), 859–867. https://doi.org/10.1007/s00424-017-2009-8

Selected publications from our team

  • Rodriguez-Martinez, A. et al. (2022). Novel ZNF414 activity characterized by integrative analysis of ChIP-exo, ATAC-seq and RNA-seq data. Biochimica et biophysica acta. Gene regulatory mechanisms, 1865(3), 194811. Advance online publication. https://doi.org/10.1016/j.bbagrm.2022.194811
  • Taavitsainen, S. et al. (2021). Single-cell ATAC and RNA sequencing reveal pre-existing and persistent cells associated with prostate cancer relapse. Nature communications, 12(1), 5307. https://doi.org/10.1038/s41467-021-25624-1
  • Armaka, M. et al. (2021). Single-cell chromatin and transcriptome dynamics of Synovial Fibroblasts transitioning from homeostasis to pathology in modelled TNF-driven arthritis. bioRxiv, 2021.08.27.457747. https://doi.org/10.1101/2021.08.27.457747
  • Linna-Kuosmanen, S. et al. (2021). NRF2 is a key regulator of endothelial microRNA expression under proatherogenic stimuli. Cardiovascular research, 117(5), 1339–1357. https://doi.org/10.1093/cvr/cvaa219
  • Moreau, P. R. et al. (2021). Profiling of Primary and Mature miRNA Expression in Atherosclerosis-Associated Cell Types. Arteriosclerosis, thrombosis, and vascular biology, 41(7), 2149–2167. https://doi.org/10.1161/ATVBAHA.121.315579
  • Zannikou, M. et al. (2021). MAP3K8 Regulates Cox-2-Mediated Prostaglandin E2 Production in the Lung and Suppresses Pulmonary Inflammation and Fibrosis. Journal of immunology (Baltimore, Md. : 1950), 206(3), 607–620. https://doi.org/10.4049/jimmunol.2000862
  • Filppu, P. et al. (2021). CD109-GP130 interaction drives glioblastoma stem cell plasticity and chemoresistance through STAT3 activity. JCI insight, 6(9), e141486. https://doi.org/10.1172/jci.insight.141486
  • Verta, J. P. et al. (2021). Genetic Drift Dominates Genome-Wide Regulatory Evolution Following an Ancient Whole-Genome Duplication in Atlantic Salmon. Genome biology and evolution, 13(5), evab059. https://doi.org/10.1093/gbe/evab059
  • Liakos, A. et al. (2020). Continuous transcription initiation guarantees robust repair of all transcribed genes and regulatory regions. Nature communications, 11(1), 916. https://doi.org/10.1038/s41467-020-14566-9
  • Harjula, S. E. et al. (2020). Characterization of immune response against Mycobacterium marinum infection in the main hematopoietic organ of adult zebrafish (Danio rerio). Developmental and comparative immunology, 103, 103523. https://doi.org/10.1016/j.dci.2019.103523
  • Morianos, I. et al. (2020). Activin-A limits Th17 pathogenicity and autoimmune neuroinflammation via CD39 and CD73 ectonucleotidases and Hif1-α-dependent pathways. Proceedings of the National Academy of Sciences of the United States of America, 117(22), 12269–12280. https://doi.org/10.1073/pnas.1918196117
  • Mehtonen, J. et al. (2020). Single cell characterization of B-lymphoid differentiation and leukemic cell states during chemotherapy in ETV6-RUNX1-positive pediatric leukemia identifies drug-targetable transcription factor activities. Genome medicine, 12(1), 99. https://doi.org/10.1186/s13073-020-00799-2
  • Viiri, L. E. et al. (2019). Extensive reprogramming of the nascent transcriptome during iPSC to hepatocyte differentiation. Scientific reports, 9(1), 3562. https://doi.org/10.1038/s41598-019-39215-0
  • Pölönen, P. et al. (2019). Hemap: An Interactive Online Resource for Characterizing Molecular Phenotypes across Hematologic Malignancies. Cancer research, 79(10), 2466–2479. https://doi.org/10.1158/0008-5472.CAN-18-2970
  • Moreau, P. R. et al. (2018). Transcriptional Profiling of Hypoxia-Regulated Non-coding RNAs in Human Primary Endothelial Cells. Frontiers in cardiovascular medicine, 5, 159. https://doi.org/10.3389/fcvm.2018.00159
  • Bouvy-Liivrand, M. et al. (2017). Analysis of primary microRNA loci from nascent transcriptomes reveals regulatory domains governed by chromatin architecture. Nucleic acids research, 45(17), 9837–9849. https://doi.org/10.1093/nar/gkx680
  • Lavigne, M. D. et al. (2017). Global unleashing of transcription elongation waves in response to genotoxic stress restricts somatic mutation rate. Nature communications, 8(1), 2076. https://doi.org/10.1038/s41467-017-02145-4


Contact us

Leave your email address here with a brief description of your needs, and we will contact you to get things moving forward!

Antti Ylipää, CEO and co-founder, Genevia Technologies Oy, +358 40 747 7672

