Single-cell RNA sequencing data analysis


Single-cell RNA sequencing enables cataloging and studying cellular identities at a scale and resolution unmatched by bulk sequencing.

Single-cell RNA sequencing (scRNA-seq) is one of the most rapidly advancing and diversifying technologies in molecular biology. The ability to study gene expression at the resolution of single cells has been as transformative as the earlier advent of bulk RNA sequencing.

In addition to scRNA-seq, a number of other next-generation sequencing (NGS)-based assays have been adapted to single-cell protocols. These include genomic, proteomic and epigenomic assays, notably single-cell ATAC sequencing (scATAC-seq), which is commonly performed in conjunction with scRNA-seq.

Platforms and protocols for scRNA-seq vary in their throughput (number of cells) and transcript coverage (3'/5' tag-based vs. whole-transcript). Our team has experience with several technologies, such as 10x Genomics, Drop-seq, the BD Rhapsody system, and protocols of the CEL-Seq and Smart-Seq families.

Here we present typical single-cell analyses, focusing on scRNA-seq but also covering its integration with other common single-cell assays. We also list single-cell papers that we have published.

Leave us a short description of your bioinformatics needs and we will be in touch very soon!

Quality control and preprocessing

As with any NGS data, the analysis of single-cell sequencing data starts with quality control and preprocessing.

Raw sequencing reads are quality-checked, and metrics such as read quality, accuracy and diversity are generated. Reads are then aligned to an appropriate reference genome or transcriptome. Additional metrics, such as the number of cells, reads per cell, genes per cell, sequencing saturation and the fraction of mitochondrial transcripts, are then plotted and inspected.

These QC metrics inform us about the overall quality of the libraries and the usability of the samples, and they enable identifying and removing low-quality cells.
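The cell-level filtering step can be illustrated with the Python sketch below. It computes three of the metrics mentioned above (total counts, detected genes and percent mitochondrial counts) and applies fixed thresholds. This is a minimal sketch under stated assumptions. Cells are stored as plain dictionaries. Mitochondrial genes are recognized by the human "MT-" prefix; mouse annotations typically use "mt-". The function names, the toy data and the default thresholds are all hypothetical. In practice, thresholds are chosen per dataset after inspecting the metric distributions.

```python
from typing import Dict, List

# Each cell is a dict mapping gene symbol -> UMI count (toy representation).
Cell = Dict[str, int]


def qc_metrics(cell: Cell, mito_prefix: str = "MT-") -> Dict[str, float]:
    """Total counts, detected genes and percent mitochondrial counts for one cell."""
    total = sum(cell.values())
    n_genes = sum(1 for count in cell.values() if count > 0)
    # Assumption: mitochondrial genes share a name prefix ("MT-" in human).
    mito = sum(count for gene, count in cell.items() if gene.startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total > 0 else 0.0
    return {"total_counts": total, "n_genes": n_genes, "pct_mito": pct_mito}


def filter_cells(cells: Dict[str, Cell], min_genes: int = 200,
                 max_pct_mito: float = 20.0) -> List[str]:
    """Return the IDs of cells passing simple threshold-based QC (hypothetical defaults)."""
    kept = []
    for cell_id, cell in cells.items():
        metrics = qc_metrics(cell)
        if metrics["n_genes"] >= min_genes and metrics["pct_mito"] <= max_pct_mito:
            kept.append(cell_id)
    return kept


# Hypothetical toy data; real thresholds depend on tissue and protocol.
cells = {
    "cellA": {"ACTB": 10, "GAPDH": 5, "MT-CO1": 1},  # 3 genes, 6.25% mitochondrial
    "cellB": {"ACTB": 2, "MT-CO1": 8},               # 80% mitochondrial
    "cellC": {"ACTB": 3},                            # too few detected genes
}
print(filter_cells(cells, min_genes=2))  # ['cellA']
```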

Further preprocessing is often carried out to remove unwanted signal, or noise, before certain downstream analyses. Such steps include

  • imputation to estimate counts for dropouts, i.e., genes with zero observed transcripts for technical rather than biological reasons,
  • normalization to remove biases due to, e.g., differences in sequencing depth or cell size, and
  • reducing the data to representative variables such as highly variable genes or principal components.
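The sketch below shows the normalization and feature-reduction steps. It assumes a dense cells × genes count matrix stored as nested lists. Each cell is scaled to 10,000 total counts and then log-transformed, similar in spirit to Scanpy's sc.pp.normalize_total followed by sc.pp.log1p. Genes are then ranked by plain variance. This is a simplification: dedicated highly-variable-gene methods also account for the mean-variance relationship. Imputation is not covered here, and all names and values are illustrative.

```python
import math
from typing import List


def normalize_log1p(counts: List[List[float]], target_sum: float = 1e4) -> List[List[float]]:
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(x * scale) for x in row])
    return normalized


def top_variable_genes(matrix: List[List[float]], genes: List[str], n_top: int) -> List[str]:
    """Rank genes (columns) by their variance across cells and keep the n_top highest."""
    n_cells = len(matrix)
    variance = {}
    for j, gene in enumerate(genes):
        values = [row[j] for row in matrix]
        mean = sum(values) / n_cells
        variance[gene] = sum((v - mean) ** 2 for v in values) / n_cells
    return sorted(genes, key=lambda g: variance[g], reverse=True)[:n_top]


# Hypothetical toy matrix: 2 cells x 3 genes with different sequencing depths.
genes = ["GENE_A", "GENE_B", "GENE_C"]
counts = [[5, 5, 0],
          [5, 5, 10]]
norm = normalize_log1p(counts)
print(top_variable_genes(norm, genes, n_top=1))  # ['GENE_C']
```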


Exploratory analysis

Preprocessed scRNA-seq data is clustered to identify groups of similar cells. It is then visualized with non-linear dimensionality reduction algorithms such as t-SNE and UMAP, as well as with correlation heatmaps, to reveal general patterns of cellular heterogeneity.

These visualizations help us answer technical questions such as:

  • Do the biological replicates resemble each other?
  • Are there outlier samples or cells?
  • Are the cell clusters distinct?

...and biological questions such as:

  • How heterogeneous are the underlying cell types/states?
  • Do distinct samples (e.g., different tissues, treatments or time points) form separate clusters?


Cell type identification

Identifying and characterizing cell types (and more refined cell states) is the central part of most single-cell projects.

It all starts with identifying features (e.g., genes, proteins or accessible regions) that are specific to each cell cluster. These markers are defined by a differential expression (DE) comparison of each cluster against all remaining cells. The comparison yields DE statistics such as fold change and statistical significance.
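The sketch below shows a cluster-versus-rest comparison for a single gene. It combines a log2 fold change of mean expression with a Wilcoxon rank-sum (Mann-Whitney U) test, a common default choice for scRNA-seq marker detection. The sketch uses the normal approximation without tie correction and applies no multiple-testing correction. It is therefore an illustration, not a replacement for established implementations. The pseudocount and the toy values are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_marker(in_cluster: Sequence[float], rest: Sequence[float],
                    pseudocount: float = 1.0) -> Tuple[float, float]:
    """Log2 fold change and two-sided rank-sum p-value for one gene, cluster vs. rest.
    Normal approximation, no tie correction (simplifying assumptions)."""
    n1, n2 = len(in_cluster), len(rest)
    ranks = average_ranks(list(in_cluster) + list(rest))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2          # Mann-Whitney U of the cluster
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))        # equals 2 * (1 - Phi(|z|))
    mean_in = sum(in_cluster) / n1
    mean_out = sum(rest) / n2
    log2fc = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    return log2fc, p_value


# Hypothetical normalized expression of one gene.
cluster_cells = [5, 6, 7, 8, 9]
other_cells = [0, 1, 0, 1, 0, 1, 0, 1]
print(wilcoxon_marker(cluster_cells, other_cells))
```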

The cluster markers can be visualized using scatter plots, violin plots, and heatmaps.

Markers are further annotated with biologically meaningful terms, such as biological processes, signaling pathways or specific diseases. Such analyses typically rely on either over-representation analysis (ORA) or gene set enrichment analysis (GSEA). Both produce a list of enriched gene sets with relevant statistics and annotations.
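Over-representation analysis is commonly based on the hypergeometric test. Suppose the background universe contains N genes, K of them belong to a gene set, and there are n marker genes. The test asks how surprising it is to observe k or more markers in that set. The sketch below assumes gene sets are supplied as a plain dictionary; real analyses would use curated collections such as GO or pathway databases. It also omits the multiple-testing correction that should follow, e.g., Benjamini-Hochberg. Gene and set names are hypothetical.

```python
from math import comb
from typing import Dict, Iterable, List, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def over_representation(markers: Iterable[str], gene_sets: Dict[str, Iterable[str]],
                        universe: Iterable[str]) -> List[Tuple[str, int, float]]:
    """(gene set, overlap size, one-sided p-value), sorted by p-value (uncorrected)."""
    universe_set = set(universe)
    marker_set = set(markers) & universe_set
    N, n = len(universe_set), len(marker_set)
    results = []
    for name, members in gene_sets.items():
        member_set = set(members) & universe_set
        k = len(marker_set & member_set)
        results.append((name, k, hypergeom_sf(k, N, len(member_set), n)))
    return sorted(results, key=lambda r: r[2])


# Hypothetical universe of 20 genes, two gene sets and four cluster markers.
universe = [f"G{i}" for i in range(20)]
gene_sets = {"setA": ["G0", "G1", "G2", "G3", "G4"],
             "setB": ["G10", "G11", "G12", "G13", "G14"]}
markers = ["G0", "G1", "G2", "G3"]
print(over_representation(markers, gene_sets, universe))
```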

Single-cell datasets are also typically integrated with publicly available data to exploit the cell-type information in already annotated datasets or cell atlases. This enables transferring cell labels to the analyzed dataset.

The transferred cell labels, the identified markers and their annotations are then used, together with prior knowledge of cell-type and cell-state markers, to identify the captured cell types.
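One simple form of label transfer assigns each query cell the label of the reference cell type whose average profile it correlates with best. This loosely resembles correlation-based annotation tools such as SingleR. The sketch below simplifies considerably. It uses plain Pearson correlation and omits the refinement steps of dedicated tools. It assumes the genes are already matched and in the same order in both datasets. The cell-type labels and the three-gene profiles are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def transfer_labels(query_cells: Sequence[Sequence[float]],
                    reference: Dict[str, Sequence[Sequence[float]]]) -> List[str]:
    """Label each query cell with the reference cell type whose mean profile
    (centroid) correlates best with it."""
    centroids = {label: [sum(gene) / len(cells) for gene in zip(*cells)]
                 for label, cells in reference.items()}
    return [max(centroids, key=lambda label: pearson(cell, centroids[label]))
            for cell in query_cells]


# Hypothetical annotated reference (cells x 3 genes) and unlabeled query cells.
reference = {"T cell": [[9.0, 1.0, 0.0], [8.0, 2.0, 1.0]],
             "B cell": [[1.0, 9.0, 0.0], [0.0, 8.0, 1.0]]}
query = [[7.0, 1.0, 1.0], [1.0, 6.0, 0.0]]
print(transfer_labels(query, reference))  # ['T cell', 'B cell']
```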


Trajectory analysis

In addition to characterizing distinct cellular identities, single-cell data lends itself to identifying continuums of gradual change in cell state, or trajectories. Uncovering such continuums is also called pseudotime analysis: even when all cells are sampled at the same time point, individual cells may represent different stages of a temporal process such as differentiation.

De novo reconstruction of lineage differentiation and cell maturation trajectories allows exploring cellular dynamics, delineating developmental lineages, and characterizing transitions between cell states along a latent pseudotime dimension.

An ensemble of trajectory inference algorithms may be used for robust identification of root and terminal cell states, branching points and lineages. Single cells are then ordered along deterministic or probabilistic lineages, and their position indicates their progression in the dynamic process of interest.
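The sketch below makes the idea of ordering cells along a trajectory concrete with a simple graph-based pseudotime. It builds a k-nearest-neighbor graph over cells in a reduced space (e.g., principal components). Each cell's pseudotime is then its shortest-path distance from a chosen root cell, scaled to [0, 1]. This is one illustrative approach, not the ensemble of algorithms described above. The value of k, the root cell and the coordinates are assumptions.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Graph = Dict[int, List[Tuple[int, float]]]


def knn_graph(points: Sequence[Sequence[float]], k: int) -> Graph:
    """Undirected k-nearest-neighbor graph with Euclidean edge weights."""
    graph: Graph = {i: [] for i in range(len(points))}
    for i, p in enumerate(points):
        neighbors = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in neighbors[:k]:
            graph[i].append((j, d))
            graph[j].append((i, d))  # symmetrize the graph
    return graph


def geodesic_pseudotime(points: Sequence[Sequence[float]], root: int, k: int = 3) -> List[float]:
    """Dijkstra distance from the root cell, scaled to [0, 1]; unreachable cells get inf.
    Assumes at least one other cell is reachable from the root."""
    graph = knn_graph(points, k)
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j, w in graph[i]:
            candidate = d + w
            if candidate < dist.get(j, math.inf):
                dist[j] = candidate
                heapq.heappush(heap, (candidate, j))
    max_d = max(dist.values())
    return [dist.get(i, math.inf) / max_d for i in range(len(points))]


# Hypothetical 2-D embedding of five cells along a roughly linear trajectory.
cells = [(2.0, 0.1), (0.0, 0.0), (4.0, 0.0), (1.0, 0.0), (3.0, 0.1)]
pt = geodesic_pseudotime(cells, root=1, k=2)
print(sorted(range(len(cells)), key=lambda i: pt[i]))  # [1, 3, 0, 4, 2]
```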

This type of analysis may also use the ratio of unspliced (unprocessed) to spliced (processed) transcripts to infer whether a gene's expression is increasing or decreasing in a given cell. Combining this information across all quantified genes enables inferring the direction and pace of change in cell state. This is called RNA velocity analysis.
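The core of the steady-state RNA velocity model fits in a few lines. For each gene, a degradation ratio gamma is fitted as the slope of unspliced against spliced abundance. Each cell's velocity is then the residual u − gamma·s. Positive values suggest the gene is being induced, and negative values suggest it is being repressed. This sketch fits the slope by least squares through the origin on all cells. Published implementations typically restrict the fit to cells at the extremes of the expression range. The inputs are hypothetical, already-normalized abundances.

```python
from typing import List, Sequence


def fit_gamma(spliced: Sequence[float], unspliced: Sequence[float]) -> float:
    """Least-squares slope through the origin for u ~ gamma * s (one gene, all cells)."""
    numerator = sum(u * s for u, s in zip(unspliced, spliced))
    denominator = sum(s * s for s in spliced)
    return numerator / denominator


def gene_velocity(spliced: Sequence[float], unspliced: Sequence[float]) -> List[float]:
    """Per-cell velocity u - gamma * s under the steady-state assumption."""
    gamma = fit_gamma(spliced, unspliced)
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Hypothetical spliced/unspliced abundances of one gene in three cells.
spliced = [2.0, 2.0, 4.0]
unspliced = [1.0, 3.0, 2.0]
velocities = gene_velocity(spliced, unspliced)
# The second cell has excess unspliced RNA, i.e., the gene appears to be induced there.
print([round(v, 3) for v in velocities])  # [-0.333, 1.667, -0.667]
```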


Integrative single-cell analyses

Integrative single-cell analyses bring together different datasets, including different data types and species. This enables more accurate and detailed cell labeling and mechanistic insight into gene regulation in the studied system. Such analyses rely on common properties, or anchors, shared between the datasets, such as matched features (e.g., genes or homologues) or matched cells.

Integrating multiple single-cell RNA-seq datasets

Perhaps the most common integration of single-cell datasets takes place between scRNA-seq datasets from different sources or technology platforms. Using genes as anchors, a successful integration removes technical batch effects while retaining the biological variation in the datasets.

Combining different scRNA-seq datasets is particularly helpful when there is a well-characterized public expression atlas available for a relevant tissue or organism.
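Many integration methods start by finding anchors as mutual nearest neighbors (MNNs). These are pairs of cells, one from each dataset, that are among each other's nearest neighbors in a shared feature space. The sketch below covers only this anchor-finding step. It assumes both datasets are already projected into a common low-dimensional space, with hypothetical coordinates. The subsequent correction step, which uses the anchors to remove batch effects, is omitted.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Sequence[float]


def nearest(query: Sequence[Point], reference: Sequence[Point], k: int) -> List[Set[int]]:
    """For each query point, the indices of its k nearest reference points."""
    return [set(sorted(range(len(reference)), key=lambda j: math.dist(q, reference[j]))[:k])
            for q in query]


def mutual_nearest_neighbors(a: Sequence[Point], b: Sequence[Point],
                             k: int = 1) -> List[Tuple[int, int]]:
    """Anchor pairs (i, j): b[j] is among a[i]'s k nearest in b, and vice versa."""
    a_to_b = nearest(a, b, k)
    b_to_a = nearest(b, a, k)
    return sorted((i, j) for i, neighbors in enumerate(a_to_b)
                  for j in neighbors if i in b_to_a[j])


# Hypothetical cells from two batches in a shared 2-D space.
batch1 = [(0.0, 0.0), (10.0, 10.0)]
batch2 = [(0.5, 0.0), (10.0, 9.0), (50.0, 50.0)]  # the last cell has no counterpart
print(mutual_nearest_neighbors(batch1, batch2))  # [(0, 0), (1, 1)]
```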

Integrating single-cell RNA-seq and epigenomics

Integrating scRNA-seq data with single-cell ATAC-seq or single-cell methylation data often relies on matched cells as anchors. This applies when both measurements derive from the same cells, as in, e.g., the 10x Genomics Multiome technology.

Combining expression data with chromatin accessibility or methylation profiles enables more robust identification of cell types and allows quantifying the effect of chromatin state on expression in individual cell types.
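With matched cells, one common way to approach the effect of chromatin state on expression is to correlate the accessibility of each nearby peak with the expression of a gene across the same cells. The sketch below assumes the candidate peaks have already been selected. Peak coordinates and values are hypothetical. It uses a plain Pearson correlation with an arbitrary cutoff, whereas dedicated tools add further statistical controls to such correlations.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def link_peaks(accessibility: Dict[str, Sequence[float]], expression: Sequence[float],
               min_r: float = 0.3) -> Dict[str, float]:
    """Peaks whose accessibility correlates (r >= min_r, an arbitrary cutoff)
    with the gene's expression across the same cells."""
    links = {}
    for peak, values in accessibility.items():
        r = pearson(values, expression)
        if r >= min_r:
            links[peak] = r
    return links


# Hypothetical expression of one gene and accessibility of three peaks in five matched cells.
expression = [0.0, 1.0, 2.0, 3.0, 4.0]
accessibility = {
    "chr1:100-600": [0.0, 1.0, 1.0, 2.0, 3.0],    # opens as expression rises
    "chr1:5000-5500": [3.0, 2.0, 2.0, 1.0, 0.0],  # closes as expression rises
    "chr1:9000-9500": [1.0, 1.0, 1.0, 1.0, 1.0],  # constant
}
print(link_peaks(accessibility, expression))
```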


Integrating single-cell RNA-seq and proteomics

Since proteins, rather than transcripts, are the key drivers of cellular functions, single-cell proteomics complements scRNA-seq experiments with more accurate estimates of cells' functional states.

Single-cell proteomic profiling (CITE-seq, flow cytometry, mass cytometry and mass spectrometry) varies in throughput (number of proteins quantified). It can be targeted specifically to surface proteins, as in CITE-seq, which quantifies a panel of surface proteins in the same cells that yield the scRNA-seq reads.

Surface proteins are particularly useful for cell type identification, while the inclusion of cytosolic proteins enables better characterization of pathway and gene-regulatory activities.

Cross-species integration

Cross-species integrative analysis enables the identification of cell-type phylogenies, which describe the evolutionary and developmental relationships between cell types across organisms. Shared homologues are used as anchors in cross-species integration.

This is particularly helpful when a disease or organ is better characterized at single-cell resolution in an animal model than in humans.
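Using homologues as anchors starts with translating gene identifiers into a shared namespace. The sketch below keeps only one-to-one ortholog pairs and returns aligned expression vectors for the genes measured in both species. The mouse-human pairs Cd4/CD4, Cd8a/CD8A and Ptprc/PTPRC are real orthologs. MouseX1, MouseX2 and HUMANX are hypothetical placeholders that illustrate an ambiguous many-to-one mapping. In a real analysis, the ortholog pairs would come from a database such as Ensembl, and the expression values here are made up.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def one_to_one(pairs: Sequence[Tuple[str, str]]) -> Dict[str, str]:
    """Keep (gene_a, gene_b) pairs in which each gene occurs exactly once."""
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    return {a: b for a, b in pairs if count_a[a] == 1 and count_b[b] == 1}


def align_species(mouse_expr: Dict[str, float], human_expr: Dict[str, float],
                  pairs: Sequence[Tuple[str, str]]) -> Tuple[List[float], List[float], List[str]]:
    """Expression vectors of both species over shared one-to-one orthologs,
    indexed by the human gene symbol."""
    mouse_to_human = one_to_one(pairs)
    shared = sorted((h, m) for m, h in mouse_to_human.items()
                    if m in mouse_expr and h in human_expr)
    return ([mouse_expr[m] for _, m in shared],
            [human_expr[h] for h, _ in shared],
            [h for h, _ in shared])


pairs = [("Cd4", "CD4"), ("Cd8a", "CD8A"), ("Ptprc", "PTPRC"),
         ("MouseX1", "HUMANX"), ("MouseX2", "HUMANX")]  # last two: hypothetical many-to-one
mouse_expr = {"Cd4": 3.0, "Cd8a": 1.0, "MouseX1": 2.0}
human_expr = {"CD4": 2.5, "CD8A": 0.5, "PTPRC": 4.0, "HUMANX": 1.0}
print(align_species(mouse_expr, human_expr, pairs))
```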


Ligand-receptor analysis

Ligand-receptor (LR) analysis uncovers cell-cell interactions that coordinate homeostasis, development, and other system-level functions. Changes and dysfunction in such interactions may go unnoticed in an analysis limited to the internal state of individual cells or cell types.

Ligand-receptor analysis identifies and quantifies intercellular interactions based on the expression of known receptors and their ligands. The interactions may take place within or between tissues, and their strength is compared between biological conditions of interest, such as patient groups, disease states and treatments.
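The sketch below is a bare-bones version of this scoring. It assumes expression values grouped by cell type and a short list of known ligand-receptor pairs. Each sender-receiver combination is scored as the product of the mean ligand expression in the sender and the mean receptor expression in the receiver. Conditions are then compared by taking differences. Published tools differ in their scoring and add significance testing. CellPhoneDB, for example, averages the two means and assesses significance by permuting cell labels. The cell types and values below are hypothetical, while IL1B-IL1R1 is a known ligand-receptor pair.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# expr[cell_type][gene] -> per-cell expression values (hypothetical layout)
Expression = Dict[str, Dict[str, List[float]]]
Key = Tuple[str, str, str, str]  # (sender, receiver, ligand, receptor)


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def lr_scores(expr: Expression, pairs: Sequence[Tuple[str, str]]) -> Dict[Key, float]:
    """Score = mean ligand expression in sender x mean receptor expression in receiver."""
    scores = {}
    for sender, receiver in product(expr, repeat=2):
        for ligand, receptor in pairs:
            if ligand in expr[sender] and receptor in expr[receiver]:
                scores[(sender, receiver, ligand, receptor)] = (
                    mean(expr[sender][ligand]) * mean(expr[receiver][receptor]))
    return scores


def compare_conditions(a: Dict[Key, float], b: Dict[Key, float]) -> Dict[Key, float]:
    """Change in score from condition a to condition b for interactions scored in both."""
    return {key: b[key] - a[key] for key in a.keys() & b.keys()}


pairs = [("IL1B", "IL1R1")]
control = {"macrophage": {"IL1B": [1.0, 1.0], "IL1R1": [0.0, 0.0]},
           "fibroblast": {"IL1B": [0.0, 0.0], "IL1R1": [2.0, 2.0]}}
disease = {"macrophage": {"IL1B": [4.0, 2.0], "IL1R1": [0.0, 0.0]},
           "fibroblast": {"IL1B": [0.0, 0.0], "IL1R1": [2.0, 2.0]}}
key = ("macrophage", "fibroblast", "IL1B", "IL1R1")
delta = compare_conditions(lr_scores(control, pairs), lr_scores(disease, pairs))
print(delta[key])  # 4.0
```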


Spatial transcriptomic analysis

Spatial transcriptomic assays quantify gene expression by spatial location within a tissue. The analysis of spatial transcriptomic data can reveal the spatial organization of a tissue from larger spatial domains down to the cell-type and molecular level.

The research questions addressed by spatial transcriptomics typically involve changes in tissue composition during developmental or pathological processes, or interactions between cell types in complex tissues such as the tumor microenvironment.



Meet some of our single-cell experts

I am a bioinformatician with a multidisciplinary background in developing and applying novel spatial and single-cell transcriptomics methods to study the biology and pathology of diverse tissue types.

I co-developed the spatial transcriptomics method that was later commercialized by 10x Genomics as Visium, the most widely used technology for spatially resolved expression studies today. I also developed VASA-Seq, a new method for single-cell total transcriptome sequencing.

Beyond method development, I have extensive experience analyzing spatial and single-cell data and supporting researchers across various fields in designing and applying such experiments.

Fredrik Salmén Scientific Project Manager Genevia Technologies Oy

I am a senior bioinformatician and cellular physiologist with over seven years of experience in analyzing diverse biological datasets, particularly next-generation sequencing. My expertise spans a wide range of bioinformatic tools and includes in-depth knowledge of various data types such as single-cell RNA-seq, RNA-seq, spatial transcriptomics, and advanced image analysis (confocal and super-resolution).

I have worked with data from human, mouse, rat, and various cell lines. Throughout my academic career, I have applied pathway and network analysis to reveal metabolic changes across multiple disease states. My strong background in data analysis, coupled with extensive experience in experimental cellular physiology, positions me uniquely to derive meaningful biological insights from large datasets.

Patricia Thomas Scientific Project Manager Genevia Technologies Oy

I am a bioinformatics scientist specializing in oncology and human health data analysis, with over five years of experience in handling various omics data types. This includes bulk data (whole-exome, RNA-seq) and single-cell and spatial data (scRNA-seq, spatial transcriptomics, single-cell proteomics), as well as expertise in the experimental techniques to generate these types of data.

Throughout my career, I have successfully applied omics analysis to cancer and aging-related diseases, securing public grants, publishing in journals, and presenting at international conferences. I am highly motivated to solve complex challenges and advance healthcare research.

Alba Machado Scientific Project Manager Genevia Technologies Oy

I am a senior bioinformatics scientist with over 10 years of experience in analyzing a wide range of next-generation sequencing (NGS) data types, including spatial transcriptomics, single-cell RNA-seq, bulk RNA-seq, ChIP-seq, CUT&Tag, ATAC-seq, single-cell ATAC-seq, MeDIP-seq, and BS-seq.

With a background in both mathematics and biology, I am well-equipped to analyze and interpret complex biological datasets. My work spans various fields, with significant contributions in immunology and oncology research.

Giulia Barbiera Scientific Project Manager Genevia Technologies Oy

I am a senior bioinformatics scientist with over 8 years of experience specializing in tumor biology and transcriptomics. My expertise includes the analysis of RNA-seq and single-cell RNA-seq data, as well as deep learning-based algorithms in image analysis.

I have applied my expertise to various biomedical applications, particularly in gastrointestinal diseases, tumor biology, and chronic inflammation.

Eva Domènech-Moreno Scientific Project Manager Genevia Technologies Oy


References and customer cases

Selected publications from our customers

  • Chang, Y. T. et al. (2024). MHC-I upregulation safeguards neoplastic T cells in the skin against NK cell-mediated eradication in mycosis fungoides. Nature Communications, 15(1), 752. https://doi.org/10.1038/s41467-024-45083-8
  • Fisher, J. et al. (2024). Cortical somatostatin long-range projection neurons and interneurons exhibit divergent developmental trajectories. Neuron, 112(4), 558–573.e8. https://doi.org/10.1016/j.neuron.2023.11.013

Selected publications from our team

  • Punzon-Jimenez, P. et al. (2024). Effect of aging on the human myometrium at single-cell resolution. Nature Communications, 15(1), 945. https://doi.org/10.1038/s41467-024-45143-z
  • Kiviaho, A. et al. (2024). Androgen deprivation therapy-resistant club cells are linked to myeloid cell-driven immunosuppression in the prostate tumor microenvironment. bioRxiv, 2024.03.25.586330. https://doi.org/10.1101/2024.03.25.586330
  • Domènech-Moreno, E. et al. (2024). Identification of a targetable ST2-expressing fibroblast subset driving Peutz-Jeghers syndrome polyposis. bioRxiv, 2023.11.29.568817. https://doi.org/10.1101/2023.11.29.568817
  • Caronni, N. et al. (2023). IL-1β+ macrophages fuel pathogenic inflammation in pancreatic cancer. Nature, 623(7986), 415–422. https://doi.org/10.1038/s41586-023-06685-2
  • de Sande, A. H. et al. (2023). Cell-type-specific characterization of miRNA gene dynamics in immune cell subpopulations during aging and atherosclerosis disease development at single-cell resolution. bioRxiv, 2023.10.09.561173. https://doi.org/10.1101/2023.10.09.561173
  • van Leeuwen, W. et al. (2022). Identification of the stress granule transcriptome via RNA-editing in single cells and in vivo. Cell Reports Methods, 2(6), 100235. https://doi.org/10.1016/j.crmeth.2022.100235
  • Salmen, F. et al. (2022). High-throughput total RNA sequencing in single cells using VASA-seq. Nature Biotechnology, 40(12), 1780–1793. https://doi.org/10.1038/s41587-022-01361-8
  • Pham, T. et al. (2022). Modeling human extraembryonic mesoderm cells using naive pluripotent stem cells. Cell Stem Cell, 29(9), 1346–1365.e10. https://doi.org/10.1016/j.stem.2022.08.001
  • Montaldo, E. et al. (2022). Cellular and transcriptional dynamics of human neutrophils at steady state and upon stress. Nature Immunology, 23(10), 1470–1483. https://doi.org/10.1038/s41590-022-01311-1
  • Roos, K. et al. (2022). Single-cell RNA-seq analysis and cell-cluster deconvolution of the human preovulatory follicular fluid cells provide insights into the pathophysiology of ovarian hyporesponse. Frontiers in Endocrinology, 13, 945347. https://doi.org/10.3389/fendo.2022.945347
  • Smith, C. et al. (2022). A comparative transcriptomic analysis of glucagon-like peptide-1 receptor- and glucose-dependent insulinotropic polypeptide-expressing cells in the hypothalamus. Appetite, 174, 106022. https://doi.org/10.1016/j.appet.2022.106022
  • Andersson, A. et al. (2021). Spatial deconvolution of HER2-positive breast cancer delineates tumor-associated cell type interactions. Nature Communications, 12(1), 6012. https://doi.org/10.1038/s41467-021-26271-2
  • Namboori, S. C. et al. (2021). Single-cell transcriptomics identifies master regulators of neurodegeneration in SOD1 ALS iPSC-derived motor neurons. Stem Cell Reports, 16(12), 3020–3035. https://doi.org/10.1016/j.stemcr.2021.10.010
  • Taavitsainen, S. et al. (2021). Single-cell ATAC and RNA sequencing reveal pre-existing and persistent cells associated with prostate cancer relapse. Nature Communications, 12(1), 5307. https://doi.org/10.1038/s41467-021-25624-1
  • Georgolopoulos, G. et al. (2021). Discrete regulatory modules instruct hematopoietic lineage commitment and differentiation. Nature Communications, 12(1), 6790. https://doi.org/10.1038/s41467-021-27159-x
  • Cilenti, F. et al. (2021). A PGE2-MEF2A axis enables context-dependent control of inflammatory gene expression. Immunity, 54(8), 1665–1682.e14. https://doi.org/10.1016/j.immuni.2021.05.016
  • Mehtonen, J. et al. (2020). Single cell characterization of B-lymphoid differentiation and leukemic cell states during chemotherapy in ETV6-RUNX1-positive pediatric leukemia identifies drug-targetable transcription factor activities. Genome Medicine, 12(1), 99. https://doi.org/10.1186/s13073-020-00799-2
  • Ballesteros, I. et al. (2020). Co-option of Neutrophil Fates by Tissue Environments. Cell, 183(5), 1282–1297.e18. https://doi.org/10.1016/j.cell.2020.10.003
  • Asp, M. et al. (2019). A Spatiotemporal Organ-Wide Gene Expression and Cell Atlas of the Developing Human Heart. Cell, 179(7), 1647–1660.e19. https://doi.org/10.1016/j.cell.2019.11.025
  • Adriaenssens, A. E. et al. (2019). Glucose-Dependent Insulinotropic Polypeptide Receptor-Expressing Cells in the Hypothalamus Regulate Food Intake. Cell Metabolism, 30(5), 987–996.e6. https://doi.org/10.1016/j.cmet.2019.07.013
  • Escobar, G. et al. (2018). Interferon gene therapy reprograms the leukemia microenvironment inducing protective immunity to multiple tumor antigens. Nature Communications, 9(1), 2896. https://doi.org/10.1038/s41467-018-05315-0
  • Norelli, M. et al. (2018). Monocyte-derived IL-1 and IL-6 are differentially required for cytokine-release syndrome and neurotoxicity due to CAR T cells. Nature Medicine, 24(6), 739–748. https://doi.org/10.1038/s41591-018-0036-4
  • Ståhl, P. L. et al. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science, 353(6294), 78–82. https://doi.org/10.1126/science.aaf2403


Contact us

Leave your email address here with a brief description of your needs, and we will contact you to get things moving forward!

Antti Ylipää CEO, co-founder Genevia Technologies Oy +358 40 747 7672