Single-cell RNA sequencing data analysis

Single-cell RNA sequencing enables cataloging and studying cellular identities at a scale and resolution unmatched by bulk sequencing.

Single-cell RNA sequencing (scRNA-seq) is one of the most rapidly advancing and diversifying technologies in molecular biology. The ability to study gene expression at the resolution of single cells has been as transformative as the advent of bulk RNA sequencing was before it.

In addition to scRNA-seq, a number of other next-generation sequencing (NGS)-based assays have been adapted to single-cell protocols. These include genomic, proteomic and epigenetic assays, notably single-cell ATAC-seq, which is commonly performed in conjunction with scRNA-seq.

Platforms and protocols for scRNA-seq vary in their throughput (number of cells) and transcript coverage (3'/5' tag-based vs. whole-transcript). Our team has experience working with several technologies, such as 10x Genomics, Drop-Seq, the BD Rhapsody system, and protocols of the CEL-Seq and Smart-Seq families.

Here we present typical single-cell analyses, focusing on scRNA-seq but also covering its integration with other common single-cell assays. We also list single-cell papers that we have published.

Leave us a short description of your bioinformatics needs and we will be in touch very soon!

Quality control and preprocessing

As with any NGS data, the analysis of single-cell sequencing data starts with quality control and preprocessing.

Raw sequencing reads are quality-tested and metrics such as cell quality, accuracy, and diversity are generated. Reads are then aligned to an applicable reference genome or transcriptome, and additional metrics such as the number of cells, reads per cell, genes per cell, sequencing saturation and fraction of mitochondrial transcripts are plotted and inspected.

These QC metrics inform us about the overall quality of the libraries and the usability of the samples, and they enable identifying and removing low-quality cells.
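
As a minimal sketch of how such per-cell metrics can drive filtering, the plain-Python example below computes total counts, detected genes and the mitochondrial fraction for a toy count matrix. The count values, the thresholds and the "MT-" gene-name prefix (the human naming convention) are illustrative assumptions; in a real project the cutoffs are chosen per dataset after inspecting the QC plots.

```python
# Toy raw count matrix (hypothetical values): one row per cell, one column per gene.
genes = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH", "CD3E"]
counts = {
    "cell_A": [5, 3, 120, 80, 40],
    "cell_B": [90, 60, 20, 10, 0],  # mostly mitochondrial reads
    "cell_C": [0, 1, 2, 0, 0],      # nearly empty
}

# Illustrative thresholds only; real cutoffs are chosen per dataset.
MIN_COUNTS = 50
MIN_GENES = 3
MAX_FRAC_MITO = 0.2


def qc_metrics(row, gene_names, mito_prefix="MT-"):
    """Return total counts, detected genes and mitochondrial fraction for one cell."""
    total = sum(row)
    n_genes = sum(1 for c in row if c > 0)
    mito = sum(c for c, g in zip(row, gene_names) if g.startswith(mito_prefix))
    return {
        "total_counts": total,
        "n_genes": n_genes,
        "frac_mito": mito / total if total else 0.0,
    }


def passes_qc(m):
    """A cell is kept only if it clears all three thresholds."""
    return (
        m["total_counts"] >= MIN_COUNTS
        and m["n_genes"] >= MIN_GENES
        and m["frac_mito"] <= MAX_FRAC_MITO
    )


metrics = {cell: qc_metrics(row, genes) for cell, row in counts.items()}
kept_cells = [cell for cell, m in metrics.items() if passes_qc(m)]
print(kept_cells)
```

Here only the first cell survives: the second is dominated by mitochondrial transcripts and the third has too few counts.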

Further preprocessing is often carried out to remove unwanted signal, or noise, ahead of certain downstream analyses. Such steps include

imputation to estimate read counts for dropouts, that is, genes with zero observed transcripts for technical rather than biological reasons,
normalization to remove biases due to, e.g., differences in cell size, and
reducing the data to representative variables such as highly variable genes or principal components.
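
The normalization and feature-selection steps can be sketched as follows. This is a simplified illustration in plain Python: counts are scaled to a common per-cell total and log-transformed, and genes are then ranked by their variance across cells. The scaling target of 10,000 and the ranking by raw variance are assumptions made for brevity; dedicated tools typically model the mean-variance relationship when selecting highly variable genes.

```python
import math
from statistics import pvariance

# Toy count matrix (hypothetical values): rows are cells, columns are genes.
genes = ["G1", "G2", "G3"]
counts = [
    [10, 0, 90],
    [20, 0, 180],  # same composition as the first cell, sequenced twice as deep
    [5, 45, 50],
]


def normalize_log1p(row, target_sum=10_000):
    """Scale one cell to a fixed total count, then apply log(1 + x)."""
    total = sum(row)
    return [math.log1p(c / total * target_sum) if total else 0.0 for c in row]


def top_variable_genes(matrix, gene_names, n_top):
    """Rank genes by variance across cells (a simplification of HVG selection)."""
    variances = [pvariance(col) for col in zip(*matrix)]
    ranked = sorted(zip(gene_names, variances), key=lambda gv: gv[1], reverse=True)
    return [gene for gene, _ in ranked[:n_top]]


normalized = [normalize_log1p(row) for row in counts]
hvgs = top_variable_genes(normalized, genes, n_top=2)
print(hvgs)
```

After normalization the first two cells have identical profiles, which shows that the difference in sequencing depth between them has been removed.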

Exploratory analysis

Preprocessed scRNA-seq data is clustered to identify groups of similar cells. The data is then visualized using non-linear dimensionality reduction algorithms such as t-SNE and UMAP, as well as correlation heatmaps, to unveil general patterns of cell heterogeneity.

These visualizations help us answer technical questions such as:

Do the biological replicates resemble each other?
Are there outlier samples or cells?
Are the cell clusters distinct?

...and biological questions such as:

How heterogeneous are the underlying cell types/states?
Do distinct samples (e.g., different tissues, treatments or time points) form separate clusters?
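
To illustrate the replicate and outlier questions above, the following sketch computes the pairwise Pearson correlations that would populate a correlation heatmap. The sample names and the averaged expression profiles are hypothetical, and the helper assumes that no profile is constant (which would make the correlation undefined).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical per-sample mean expression over the same four genes.
profiles = {
    "rep1": [1.0, 2.0, 3.0, 4.0],
    "rep2": [1.1, 2.1, 2.9, 4.2],
    "outlier": [4.0, 3.0, 2.0, 1.0],
}

# Pairwise correlation matrix, stored as a dict keyed by sample pair.
corr = {
    (a, b): pearson(profiles[a], profiles[b])
    for a in profiles
    for b in profiles
}

for (a, b), r in corr.items():
    print(f"{a} vs {b}: {r:.3f}")
```

In this toy case the two replicates correlate strongly with each other, while the third sample is anti-correlated with both and would stand out in a heatmap.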

Cell type identification

Identifying and characterizing cell types (and more refined cell states) is central to most single-cell projects.

It all starts with identifying features (e.g., genes, proteins, accessible regions) that are specific to each cell cluster. These markers are defined by differential expression (DE) comparison of each cell cluster and the remaining ones, yielding DE statistics such as fold change and statistical significance.
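
As a minimal sketch of the "one cluster against the rest" comparison, the example below computes a log2 fold change per gene from toy normalized expression values. The expression values, the cluster labels and the pseudocount of 1 are illustrative assumptions, and the statistical test that would accompany the fold change in a real analysis is omitted for brevity.

```python
import math
from statistics import mean

# Hypothetical normalized expression: rows are cells, columns are genes.
genes = ["CD3E", "MS4A1", "ACTB"]
expr = [
    [5.0, 0.0, 3.0],
    [4.0, 0.2, 3.2],
    [0.1, 6.0, 3.1],
    [0.0, 5.5, 2.9],
]
clusters = ["T", "T", "B", "B"]  # one cluster label per cell


def marker_log2fc(expr, clusters, gene_names, target, pseudocount=1.0):
    """log2 fold change of mean expression: target cluster vs all other cells."""
    inside = [row for row, c in zip(expr, clusters) if c == target]
    outside = [row for row, c in zip(expr, clusters) if c != target]
    result = {}
    for j, gene in enumerate(gene_names):
        mean_in = mean(row[j] for row in inside)
        mean_out = mean(row[j] for row in outside)
        result[gene] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    return result


fold_changes = {c: marker_log2fc(expr, clusters, genes, c) for c in sorted(set(clusters))}
top_marker = {c: max(fc, key=fc.get) for c, fc in fold_changes.items()}
print(top_marker)
```

The pseudocount keeps the ratio finite for genes that are not detected in one of the two groups.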

The cluster markers can be visualized using scatter plots, violin plots, and heatmaps.

Markers are further annotated with biologically meaningful terms, such as biological processes, signaling pathways or specific diseases. Such analyses may rely on either over-representation analysis or gene set enrichment analysis, both of which result in a list of enriched gene sets with relevant statistics and annotations.
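
Over-representation analysis is commonly based on the hypergeometric distribution: given a gene universe of size N, a gene set of size K and n markers, it asks how likely an overlap of at least k genes would be by chance. The sketch below implements that upper-tail probability with the standard library; the gene names and set sizes are hypothetical, and multiple-testing correction across gene sets is left out.

```python
from math import comb


def enrichment_p_value(markers, gene_set, universe):
    """Return (overlap, P(X >= overlap)) under the hypergeometric null."""
    universe = set(universe)
    markers = set(markers) & universe
    gene_set = set(gene_set) & universe
    N, K, n = len(universe), len(gene_set), len(markers)
    k = len(markers & gene_set)
    # Sum the probability mass from the observed overlap up to the maximum possible.
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return k, tail / comb(N, n)


# Hypothetical example: 20 genes measured, a 5-gene pathway, 4 cluster markers.
universe = [f"gene{i}" for i in range(20)]
pathway = universe[:5]
markers = ["gene0", "gene1", "gene2", "gene10"]

overlap, p_value = enrichment_p_value(markers, pathway, universe)
print(overlap, round(p_value, 4))
```

With three of the four markers falling in the pathway, the probability works out to (10 * 15 + 5 * 1) / 4845, or roughly 0.032.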

Single-cell datasets are typically also integrated with publicly available data in order to exploit the cell-type information from already annotated datasets or cell atlases. This enables transferring cell labels into the analyzed dataset.
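
A simple way to picture label transfer is nearest-centroid classification: each query cell receives the label of the reference cell type whose average profile it most resembles. The sketch below is a deliberately reduced stand-in for the anchor-based or model-based methods used in practice; the two-gene profiles and the labels are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroids(ref_cells, ref_labels):
    """Average expression profile per reference cell type."""
    sums, sizes = {}, {}
    for vec, label in zip(ref_cells, ref_labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        sizes[label] = sizes.get(label, 0) + 1
    return {label: [s / sizes[label] for s in acc] for label, acc in sums.items()}


def transfer_labels(query_cells, cents):
    """Assign each query cell the label of its nearest reference centroid."""
    return [min(cents, key=lambda lab: euclidean(q, cents[lab])) for q in query_cells]


# Hypothetical annotated reference and unlabeled query (two genes per cell).
reference = [[5.0, 0.0], [4.0, 1.0], [0.0, 6.0], [1.0, 5.0]]
reference_labels = ["T cell", "T cell", "B cell", "B cell"]
query = [[4.0, 0.0], [0.5, 5.0]]

cents = centroids(reference, reference_labels)
predicted = transfer_labels(query, cents)
print(predicted)
```

This sketch assumes that the reference and query are already expressed over the same genes and on a comparable scale, which in real data usually requires an integration step first.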

The transferred cell labels and identified markers and their annotations are used, together with prior information on cell-type/state markers, to identify the captured cell types.

Trajectory analysis

In addition to characterizing distinct cellular identities, single-cell data lends itself to identifying continuums of gradual change in cell state, or trajectories. Uncovering such continuums is also called pseudotime analysis: although all cells are sampled at the same time point, individual cells may represent different stages in a temporal process such as differentiation.

De novo reconstruction of lineage differentiation and cell maturation trajectories allows exploring cellular dynamics, delineating cell developmental lineages, and characterizing transitions between cell states along a latent pseudotime dimension.

An ensemble of trajectory inference algorithms may be used for robust identification of root and terminal cellular states, branching points, and lineages. Single cells are ranked across deterministic or probabilistic lineages, and their ranking indicates their progression in a dynamic process of interest.
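
One common idea behind such rankings is to connect similar cells into a neighbor graph and measure each cell's graph distance from a chosen root state. The sketch below illustrates this with a k-nearest-neighbor graph and Dijkstra's shortest-path algorithm. It is a minimal single-lineage illustration and not a specific published method; the 2D cell coordinates, the choice of root and k=2 are assumptions, and cells that are not connected to the root would simply receive no pseudotime.

```python
import heapq
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_graph(points, k=2):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    graph = {name: {} for name in points}
    for a, pa in points.items():
        nearest = sorted((euclidean(pa, pb), b) for b, pb in points.items() if b != a)[:k]
        for dist, b in nearest:
            graph[a][b] = dist
            graph[b][a] = dist
    return graph


def pseudotime(graph, root):
    """Shortest-path distance from the root cell to every reachable cell."""
    best = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return best


# Hypothetical cells in a 2D embedding, roughly along one differentiation path.
cells = {
    "c0": (0.0, 0.0), "c1": (1.0, 0.1), "c2": (2.0, 0.0),
    "c3": (3.0, 0.2), "c4": (4.0, 0.0),
}

pt = pseudotime(knn_graph(cells, k=2), root="c0")
ordering = sorted(pt, key=pt.get)
print(ordering)
```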

This type of analysis may also utilize the ratio of processed (spliced) to unprocessed (unspliced) transcripts to infer whether a gene's expression is increasing or decreasing in a given cell. Combining this information from all quantified genes at a given state enables inferring the direction and pace of change in cell state. This is called RNA velocity analysis.
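
The idea can be sketched for a single gene with the steady-state formulation often used to introduce RNA velocity: a degradation ratio gamma is fitted from unspliced versus spliced abundances, and the velocity of each cell is its unspliced abundance minus gamma times its spliced abundance. The abundances below are hypothetical, and fitting gamma by least squares through the origin over all cells is a simplification of what dedicated tools do.

```python
def fit_gamma(unspliced, spliced):
    """Least-squares slope through the origin of unspliced vs spliced abundance."""
    numerator = sum(u * s for u, s in zip(unspliced, spliced))
    denominator = sum(s * s for s in spliced)
    return numerator / denominator if denominator else 0.0


def velocities(unspliced, spliced, gamma):
    """Positive values suggest induction, negative values repression."""
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Hypothetical abundances of one gene across four cells.
spliced = [1.0, 2.0, 3.0, 4.0]
unspliced = [0.5, 1.0, 1.5, 3.0]

gamma = fit_gamma(unspliced, spliced)
velocity = velocities(unspliced, spliced, gamma)

for cell_index, v in enumerate(velocity):
    trend = "increasing" if v > 0 else "decreasing"
    print(f"cell {cell_index}: velocity {v:+.3f} ({trend})")
```

The last cell carries more unspliced transcript than the fitted ratio predicts, so the gene is inferred to be on its way up in that cell.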

Integrative single-cell analyses

Integrative single-cell analyses bring together different datasets, including different data types and species. This enables more accurate and detailed cell labeling and mechanistic insight into gene regulation in the studied system. Such analyses rely on common properties, or anchors, between the datasets, such as matched features (e.g., genes or homologues) or matched cells.

Integrating multiple single-cell RNA-seq datasets

Perhaps the most common integration of single-cell datasets takes place between scRNA-seq datasets from different sources or technology platforms. Using genes as anchors, a successful integration removes technical bias while retaining the biological variance of the datasets.
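
To make the role of genes as anchors concrete, the sketch below restricts two datasets to their shared genes, brings the columns into the same order, and then removes a constant per-gene offset within each batch. The per-batch mean-centering is a deliberately naive correction chosen for illustration: it also removes genuine differences in cell-type composition between batches, which is precisely what dedicated integration methods are designed to avoid. All gene names and values are hypothetical.

```python
def shared_genes(genes_a, genes_b):
    """Genes present in both datasets, in the order of the first dataset."""
    in_b = set(genes_b)
    return [g for g in genes_a if g in in_b]


def subset(matrix, gene_names, keep):
    """Reorder and restrict the columns of a cells-by-genes matrix."""
    index = [gene_names.index(g) for g in keep]
    return [[row[i] for i in index] for row in matrix]


def center_per_gene(matrix):
    """Subtract each gene's batch-wide mean (naive batch-offset removal)."""
    means = [sum(col) / len(col) for col in zip(*matrix)]
    return [[value - m for value, m in zip(row, means)] for row in matrix]


# Hypothetical batches; batch B measures the shared genes with an offset of +10.
genes_a = ["G1", "G2", "G3"]
batch_a = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
genes_b = ["G3", "G1", "G4"]
batch_b = [[13.0, 11.0, 0.0], [15.0, 13.0, 0.0]]

anchors = shared_genes(genes_a, genes_b)
centered_a = center_per_gene(subset(batch_a, genes_a, anchors))
centered_b = center_per_gene(subset(batch_b, genes_b, anchors))
integrated = centered_a + centered_b  # cells from both batches, same gene order
print(anchors, integrated)
```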

Combining different scRNA-seq datasets is particularly helpful when there is a well-characterized public expression atlas available for a relevant tissue or organism.

Integrating single-cell RNA-seq and epigenomics

Integrating scRNA-seq data with single-cell ATAC-seq or single-cell methylation data often relies on matched cells as anchors (when the measurements derive from the same cells, as in, e.g., the 10x Genomics Multiome technology).

Combining expression data with chromatin accessibility or methylation profiles enables more robust identification of cell types and allows for quantifying the effect of chromatin state on expression in individual cell types.

Integrating single-cell RNA-seq and proteomics

Since proteins, rather than transcripts, are key drivers of cellular functions, single-cell proteomics complements scRNA-seq experiments with more accurate estimates of cells' functional states.

Single-cell proteomic profiling (CITE-seq, flow cytometry, mass cytometry, and mass spectrometry) comes in different degrees of throughput (number of proteins quantified) and can be targeted specifically to surface proteins, as in CITE-seq, which quantifies a panel of surface proteins from cells with matched scRNA-seq reads.

Surface proteins are particularly useful in cell type identification, while the inclusion of cytosolic proteins enables better characterization of pathway and gene-regulatory activities.

Cross-species integration

Cross-species integrative analysis enables the identification of cell-type phylogenies that define the relationships of evolutionary and developmental mechanisms between different organisms. Shared homologues are used as anchors in cross-species integration.
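
The use of shared homologues as anchors can be sketched as a simple renaming step: genes of one species are mapped to their counterparts in the other through a one-to-one orthologue table, and genes without a counterpart are dropped. The small mapping table, the expression values and the unmapped gene name "Gm12345" are hypothetical placeholders; real analyses draw the table from an orthology database and must also handle one-to-many relationships.

```python
# Hypothetical one-to-one orthologue table (mouse symbol -> human symbol).
orthologs = {"Cd3e": "CD3E", "Ms4a1": "MS4A1", "Actb": "ACTB"}

# Hypothetical mouse cell profile and the gene order of a human dataset.
mouse_expr = {"Cd3e": 4.2, "Ms4a1": 0.0, "Gm12345": 1.0}
human_genes = ["CD3E", "MS4A1", "ACTB", "HLA-A"]


def to_human_space(mouse_profile, ortholog_table, target_genes):
    """Map a mouse profile onto human gene symbols, keeping shared anchors only."""
    mapped = {
        ortholog_table[gene]: value
        for gene, value in mouse_profile.items()
        if gene in ortholog_table
    }
    anchors = [g for g in target_genes if g in mapped]
    return anchors, [mapped[g] for g in anchors]


anchor_genes, mouse_in_human_space = to_human_space(mouse_expr, orthologs, human_genes)
print(anchor_genes, mouse_in_human_space)
```

Once both species are expressed over the same anchor genes, the datasets can be passed to the same integration steps used for datasets from a single species.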

This is particularly helpful when a disease/organ is better characterized at single-cell resolution in an animal model than in humans.

Ligand-receptor analysis

Ligand-receptor (LR) analysis uncovers cell-cell interactions that coordinate homeostasis, development, and other system-level functions. Changes and dysfunction in such interactions may go unnoticed in an analysis limited to the internal state of individual cells or cell types.

Ligand-receptor analysis identifies and quantifies intercellular interactions based on the expression of known receptors and their ligands. The interactions may take place within or between tissues, and their strength can be compared between biological conditions of interest, such as patient groups, disease states, and treatments.
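
A simple interaction score, used here purely as an illustration, is the product of the mean ligand expression in the sending cell type and the mean receptor expression in the receiving cell type. The sketch below computes that score for one ligand-receptor pair in two conditions. The expression values and cell-type labels are hypothetical, and dedicated tools add curated interaction databases and significance testing on top of scores like this one.

```python
from statistics import mean


def lr_score(cells, sender, receiver, ligand, receptor):
    """Mean ligand expression in senders times mean receptor expression in receivers."""
    ligand_mean = mean(c["expr"][ligand] for c in cells if c["type"] == sender)
    receptor_mean = mean(c["expr"][receptor] for c in cells if c["type"] == receiver)
    return ligand_mean * receptor_mean


# Hypothetical normalized expression per cell in two conditions.
control = [
    {"type": "fibroblast", "expr": {"CXCL12": 2.0, "CXCR4": 0.0}},
    {"type": "fibroblast", "expr": {"CXCL12": 4.0, "CXCR4": 0.0}},
    {"type": "T cell", "expr": {"CXCL12": 0.0, "CXCR4": 1.0}},
    {"type": "T cell", "expr": {"CXCL12": 0.0, "CXCR4": 3.0}},
]
treated = [
    {"type": "fibroblast", "expr": {"CXCL12": 1.0, "CXCR4": 0.0}},
    {"type": "fibroblast", "expr": {"CXCL12": 1.0, "CXCR4": 0.0}},
    {"type": "T cell", "expr": {"CXCL12": 0.0, "CXCR4": 2.0}},
    {"type": "T cell", "expr": {"CXCL12": 0.0, "CXCR4": 2.0}},
]

score_control = lr_score(control, "fibroblast", "T cell", "CXCL12", "CXCR4")
score_treated = lr_score(treated, "fibroblast", "T cell", "CXCL12", "CXCR4")
print(score_control, score_treated)
```

In this toy comparison the score drops in the treated condition because the sending cells express less ligand, even though receptor expression is unchanged on average.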

Spatial transcriptomic analysis

Spatial transcriptomic assays quantify gene expression by spatial location within a tissue. The analysis of spatial transcriptomic data can reveal the spatial organization of a tissue from larger spatial domains down to the cell-type and molecular level.
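
One widely used statistic for asking whether a gene's expression is spatially organized is Moran's I, which is positive when neighboring spots have similar values and negative when they alternate. The sketch below computes it with binary neighbor weights; the spot coordinates, the expression values and the neighborhood radius of one grid unit are assumptions for illustration, and the function presumes that expression is not constant and that at least one pair of spots are neighbors.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def morans_i(values, coords, max_dist=1.0):
    """Moran's I with weight 1 for spot pairs within max_dist, else 0."""
    n = len(values)
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    weighted_sum = 0.0
    weight_total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and euclidean(coords[i], coords[j]) <= max_dist:
                weighted_sum += dev[i] * dev[j]
                weight_total += 1.0
    squared_dev = sum(d * d for d in dev)
    return (n / weight_total) * (weighted_sum / squared_dev)


# Hypothetical strip of four adjacent spots on a tissue section.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
clustered = [1.0, 1.0, 0.0, 0.0]    # expression confined to one end
alternating = [1.0, 0.0, 1.0, 0.0]  # expression alternates between spots

i_clustered = morans_i(clustered, coords)
i_alternating = morans_i(alternating, coords)
print(round(i_clustered, 3), round(i_alternating, 3))
```

The clustered pattern yields a positive value and the alternating pattern a negative one, matching the intuition of spatial domains versus a checkerboard.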

The research questions addressed by spatial transcriptomics typically concern changes in tissue composition during developmental or pathological processes, or interactions between cell types in complex tissues such as tumor microenvironments.

Meet some of our single-cell experts

I am a bioinformatician with a multidisciplinary background in developing and applying novel spatial and single-cell transcriptomics methods to study the biology and pathology of diverse tissue types.

I co-developed the spatial transcriptomics method that was later commercialized by 10x Genomics as Visium—the most widely used technology for spatially resolved expression studies today. I also developed VASA-Seq, a new method for single-cell total transcriptome sequencing.

Beyond method development, I have extensive experience analyzing spatial and single-cell data and supporting researchers across various fields in designing and applying such experiments.

Dr. Fredrik Salmén, Senior Bioinformatician, Genevia Technologies Oy

I am a senior bioinformatician and cellular physiologist with over seven years of experience in analyzing diverse biological datasets, particularly next-generation sequencing. My expertise spans a wide range of bioinformatic tools and includes in-depth knowledge of various data types such as single-cell RNA-seq, RNA-seq, spatial transcriptomics, and advanced image analysis (confocal and super-resolution).

I have worked with data from human, mouse, rat, and various cell lines. Throughout my academic career, I have applied pathway and network analysis to reveal metabolic changes across multiple disease states. My strong background in data analysis, coupled with extensive experience in experimental cellular physiology, positions me uniquely to derive meaningful biological insights from large datasets.

Dr. Patricia Thomas, Senior Bioinformatician, Genevia Technologies Oy

I am a bioinformatics scientist specializing in oncology and human health data analysis, with over five years of experience in handling various omics data types. This includes bulk data (whole-exome, RNA-seq) and single-cell and spatial data (scRNA-seq, spatial transcriptomics, single-cell proteomics), as well as expertise in the experimental techniques to generate these types of data.

Throughout my career, I have successfully applied omics analysis to cancer and aging-related diseases, securing public grants, publishing in journals, and presenting at international conferences. I am highly motivated to solve complex challenges and advance healthcare research.

Dr. Alba Machado-Lopez, Senior Bioinformatician, Genevia Technologies Oy

I am a senior bioinformatics scientist with over 10 years of experience in analyzing a wide range of next-generation sequencing (NGS) data types, including spatial transcriptomics, single-cell RNA-seq, bulk RNA-seq, ChIP-seq, CUT&Tag, ATAC-seq, single-cell ATAC-seq, MeDIP-seq, and BS-seq.

With a background in both mathematics and biology, I am well-equipped to analyze and interpret complex biological datasets. My work spans various fields, with significant contributions in immunology and oncology research.

Dr. Giulia Barbiera, Senior Bioinformatician, Genevia Technologies Oy

I am a senior bioinformatics scientist with over 8 years of experience specializing in tumor biology and transcriptomics. My expertise includes the analysis of RNA-seq and single-cell RNA-seq data, as well as deep learning-based algorithms in image analysis.

I have applied my expertise to various biomedical applications, particularly in gastrointestinal diseases, tumor biology, and chronic inflammation.

Dr. Eva Domènech-Moreno, Senior Bioinformatician, Genevia Technologies Oy

I am a senior bioinformatics scientist with over 10 years of experience in immuno-oncology and tumor evolution, specializing in single-cell data analysis and scientific software engineering. My work centers on integrating and analyzing multimodal data, developing analysis workflows, creating scientific software, and modeling tumor evolution.

I have deep expertise in bulk and single-cell sequencing (WES/WGS, RNA-seq, ctDNA) and 3+ years of experience with spatial omics (transcriptomic and imaging-based). During my Ph.D. and postdoctoral research, I led multiple single-cell studies that resulted in high-impact publications and contributed to two cancer atlases (NSCLC and CRC).

Dr. Georgios Fotakis, Senior Bioinformatician, Genevia Technologies Oy

Learn more

References and customer cases

Selected publications from our customers

Kaiser, J. et al. (2026). Developmental molecular signatures define de novo cortico-brainstem circuit for skilled forelimb movement. Nature communications, 10.1038/s41467-026-73476-4. Advance online publication. https://doi.org/10.1038/s41467-026-73476-4
Hunold, P. et al. (2025). DynaTag for efficient mapping of transcription factors in low-input samples and at single-cell resolution. Nature communications, 16(1), 6585. https://doi.org/10.1038/s41467-025-61797-9
Velecela, V. et al. (2025). WNT inhibition primes the transcriptional landscape of mesoderm to initiate a phased ventricular cardiomyocyte specification programme. bioRxiv 2025.11.11.687613; doi: https://doi.org/10.1101/2025.11.11.687613
Roland, V. et al. (2025). Functional architecture of cardiac TF regulatory landscapes in control of mammalian heart development. bioRxiv 2025.12.19.695499; doi: https://doi.org/10.64898/2025.12.19.695499
Micoli, E. et al. (2025). A single-cell transcriptomic atlas of developing inhibitory neurons reveals expanding and contracting modes of diversification. bioRxiv 2025.02.19.636192; doi: https://doi.org/10.1101/2025.02.19.636192
Chang, Y. T. et al. (2024). MHC-I upregulation safeguards neoplastic T cells in the skin against NK cell-mediated eradication in mycosis fungoides. Nature communications, 15(1), 752. https://doi.org/10.1038/s41467-024-45083-8
Fisher, J. et al. (2024). Cortical somatostatin long-range projection neurons and interneurons exhibit divergent developmental trajectories. Neuron, 112(4), 558–573.e8. https://doi.org/10.1016/j.neuron.2023.11.013

Selected publications from our team

Domènech-Moreno, E. et al. (2025). Interleukin-11 expressed in the polyp-enriched fibroblast subset is a potential therapeutic target in Peutz-Jeghers syndrome. The Journal of pathology, 10.1002/path.6408. Advance online publication. https://doi.org/10.1002/path.6408
Salcher, S. et al. (2024). Comparative analysis of 10X Chromium vs. BD Rhapsody whole transcriptome single-cell sequencing technologies in complex human tissues. Heliyon, 10(7), e28358. https://doi.org/10.1016/j.heliyon.2024.e28358
Punzon-Jimenez, P. et al. (2024). Effect of aging on the human myometrium at single-cell resolution. Nature communications, 15(1), 945. https://doi.org/10.1038/s41467-024-45143-z
Kiviaho, A. et al. (2024). Single cell and spatial transcriptomics highlight the interaction of club-like cells with immunosuppressive myeloid cells in prostate cancer. Nature communications, 15(1), 9949. https://doi.org/10.1038/s41467-024-54364-1
Andersson-Rolf, A. et al. (2024). Long-term in vitro expansion of a human fetal pancreas stem cell that generates all three pancreatic cell lineages. Cell, S0092-8674(24)01254-6. Advance online publication. https://doi.org/10.1016/j.cell.2024.10.044
Eigentler, A. et al. (2024). Glucocorticoid treatment influences prostate cancer cell growth and the tumor microenvironment via altered glucocorticoid receptor signaling in prostate fibroblasts. Oncogene, 43(4), 235–247. https://doi.org/10.1038/s41388-023-02901-5
Marteau, V. et al. (2024). Single-cell integration and multi-modal profiling reveals phenotypes and spatial organization of neutrophils in colorectal cancer. bioRxiv 2024.08.26.609563; doi: https://doi.org/10.1101/2024.08.26.609563
Caronni, N. et al. (2023). IL-1β+ macrophages fuel pathogenic inflammation in pancreatic cancer. Nature, 623(7986), 415–422. https://doi.org/10.1038/s41586-023-06685-2
de Sande, A. H. et al. (2023). Cell-type-specific characterization of miRNA gene dynamics in immune cell subpopulations during aging and atherosclerosis disease development at single-cell resolution. bioRxiv 2023.10.09.561173; doi: https://doi.org/10.1101/2023.10.09.561173
Salcher, S. et al. (2022). High-resolution single-cell atlas reveals diversity and plasticity of tissue-resident neutrophils in non-small cell lung cancer. Cancer cell, 40(12), 1503–1520.e8. https://doi.org/10.1016/j.ccell.2022.10.008
van Leeuwen, W. et al. (2022). Identification of the stress granule transcriptome via RNA-editing in single cells and in vivo. Cell reports methods, 2(6), 100235. https://doi.org/10.1016/j.crmeth.2022.100235
Salmen, F. et al. (2022). High-throughput total RNA sequencing in single cells using VASA-seq. Nature biotechnology, 40(12), 1780–1793. https://doi.org/10.1038/s41587-022-01361-8
Pham, T. et al. (2022). Modeling human extraembryonic mesoderm cells using naive pluripotent stem cells. Cell stem cell, 29(9), 1346–1365.e10. https://doi.org/10.1016/j.stem.2022.08.001
Montaldo, E. et al. (2022). Cellular and transcriptional dynamics of human neutrophils at steady state and upon stress. Nature immunology, 23(10), 1470–1483. https://doi.org/10.1038/s41590-022-01311-1
Roos, K. et al. (2022). Single-cell RNA-seq analysis and cell-cluster deconvolution of the human preovulatory follicular fluid cells provide insights into the pathophysiology of ovarian hyporesponse. Frontiers in endocrinology, 13, 945347. https://doi.org/10.3389/fendo.2022.945347
Smith, C. et al. (2022). A comparative transcriptomic analysis of glucagon-like peptide-1 receptor- and glucose-dependent insulinotropic polypeptide-expressing cells in the hypothalamus. Appetite, 174, 106022. https://doi.org/10.1016/j.appet.2022.106022
Heidegger, I. et al. (2022). Comprehensive characterization of the prostate tumor microenvironment identifies CXCR4/CXCL12 crosstalk as a novel antiangiogenic therapeutic target in prostate cancer. Molecular cancer, 21(1), 132. https://doi.org/10.1186/s12943-022-01597-7
Andersson, A. et al. (2021). Spatial deconvolution of HER2-positive breast cancer delineates tumor-associated cell type interactions. Nature communications, 12(1), 6012. https://doi.org/10.1038/s41467-021-26271-2
Namboori, S. C. et al. (2021). Single-cell transcriptomics identifies master regulators of neurodegeneration in SOD1 ALS iPSC-derived motor neurons. Stem cell reports, 16(12), 3020–3035. https://doi.org/10.1016/j.stemcr.2021.10.010
Taavitsainen, S. et al. (2021). Single-cell ATAC and RNA sequencing reveal pre-existing and persistent cells associated with prostate cancer relapse. Nature communications, 12(1), 5307. https://doi.org/10.1038/s41467-021-25624-1
Georgolopoulos, G. et al. (2021). Discrete regulatory modules instruct hematopoietic lineage commitment and differentiation. Nature communications, 12(1), 6790. https://doi.org/10.1038/s41467-021-27159-x
Cilenti, F. et al. (2021). A PGE2-MEF2A axis enables context-dependent control of inflammatory gene expression. Immunity, 54(8), 1665–1682.e14. https://doi.org/10.1016/j.immuni.2021.05.016
Sturm, G. et al. (2020). Scirpy: a Scanpy extension for analyzing single-cell T-cell receptor-sequencing data. Bioinformatics (Oxford, England), 36(18), 4817–4818. https://doi.org/10.1093/bioinformatics/btaa611
Mehtonen, J. et al. (2020). Single cell characterization of B-lymphoid differentiation and leukemic cell states during chemotherapy in ETV6-RUNX1-positive pediatric leukemia identifies drug-targetable transcription factor activities. Genome medicine, 12(1), 99. https://doi.org/10.1186/s13073-020-00799-2
Ballesteros, I. et al. (2020). Co-option of Neutrophil Fates by Tissue Environments. Cell, 183(5), 1282–1297.e18. https://doi.org/10.1016/j.cell.2020.10.003
Asp, M. et al. (2019). A Spatiotemporal Organ-Wide Gene Expression and Cell Atlas of the Developing Human Heart. Cell, 179(7), 1647–1660.e19. https://doi.org/10.1016/j.cell.2019.11.025
Adriaenssens, A. E. et al. (2019). Glucose-Dependent Insulinotropic Polypeptide Receptor-Expressing Cells in the Hypothalamus Regulate Food Intake. Cell metabolism, 30(5), 987–996.e6. https://doi.org/10.1016/j.cmet.2019.07.013
Escobar, G. et al. (2018). Interferon gene therapy reprograms the leukemia microenvironment inducing protective immunity to multiple tumor antigens. Nature communications, 9(1), 2896. https://doi.org/10.1038/s41467-018-05315-0
Norelli, M. et al. (2018). Monocyte-derived IL-1 and IL-6 are differentially required for cytokine-release syndrome and neurotoxicity due to CAR T cells. Nature medicine, 24(6), 739–748. https://doi.org/10.1038/s41591-018-0036-4
Ståhl, P. L. et al. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science (New York, N.Y.), 353(6294), 78–82. https://doi.org/10.1126/science.aaf2403

Contact us

Leave your email address here with a brief description of your needs, and we will contact you to get things moving forward!

Antti Ylipää, CEO, co-founder, Genevia Technologies Oy, +358 40 747 7672