Single-Cell RNA Sequencing Data Analysis
Molecular biology has never been more high-throughput: advances in technology allow for larger sample sizes, more data modalities and better coverage of measured molecules. Perhaps the strongest trend, however, is that of increasing resolution in detecting cellular identities. Rather than studying bulk tissues, we can now catalog individual cell types therein, and uncover the variation within and between minute populations, down to the idiosyncrasies of individual cells. The new workhorse is single-cell RNA sequencing.
The transformation of RNA-Seq
Biological research has been transformed by RNA sequencing (RNA-Seq), which in the last decade has become a standard experimental practice for highly accurate and high-throughput transcriptome analysis. The first implementation of this technology involved sequencing together the mRNA from all cells in a sample, commonly called bulk RNA-Seq, and deriving an average expression estimate for each gene.
Bulk RNA-Seq is useful for making broad comparisons of gene expression between tissues and disease states, but more subtle signals, found in complex tissues composed of multiple cell types, are lost in the averaging. In contrast, single-cell RNA sequencing (scRNA-Seq) enables interrogation of the transcriptome of individual cells, adding the ability to identify new cell types and functions, outline transcriptional programs, and reveal cell development pathways.
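To make the averaging effect concrete, here is a minimal sketch in Python. The cell types, the gene and all expression values are made up for illustration and do not come from any real dataset. It compares the single average a bulk measurement would report with the per-cell-type averages that single-cell resolution makes available.

```python
from statistics import mean

# Hypothetical toy data: expression of one gene (arbitrary units) in six cells.
# The cell-type labels "A" and "B" are invented for illustration only.
cells = [
    {"cell_type": "A", "expression": 0.0},
    {"cell_type": "A", "expression": 1.0},
    {"cell_type": "A", "expression": 0.0},
    {"cell_type": "A", "expression": 1.0},
    {"cell_type": "B", "expression": 40.0},
    {"cell_type": "B", "expression": 44.0},
]


def bulk_estimate(cells):
    """Average over all cells, mimicking what a bulk measurement reports."""
    return mean(c["expression"] for c in cells)


def per_type_estimates(cells):
    """Average within each cell type, which requires single-cell resolution."""
    by_type = {}
    for c in cells:
        by_type.setdefault(c["cell_type"], []).append(c["expression"])
    return {cell_type: mean(values) for cell_type, values in by_type.items()}


bulk = bulk_estimate(cells)
per_type = per_type_estimates(cells)

print(f"bulk average: {bulk:.2f}")
print(f"per-type averages: {per_type}")
```

In this toy case the bulk average sits between the two populations and describes neither of them: the gene is almost silent in the majority population and highly expressed in the minority one.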
The first high-throughput scRNA-Seq technology, Drop-Seq, was introduced in 2015. Today, various single-cell protocols are available, relying on cell isolation techniques such as microfluidics, fluorescence-activated cell sorting (FACS) and laser-capture microdissection. Depending on the platform, isolated cells are encapsulated in separate wells or liquid droplets in which the library prep reactions then take place in parallel.
Some technologies, like the widely used Chromium platform of 10x Genomics, are designed to handle high numbers of cells in a cost-efficient manner, while others, such as the family of Smart-Seq protocols, are geared towards producing richer sequencing data from fewer cells. The sequencing of prepared libraries is typically carried out on Illumina devices.
Multitudes of expression
Because every cell is measured separately, a single-cell experiment can produce thousands of times more expression values than a bulk RNA-Seq experiment. Correspondingly, the subsequent analyses require more computational resources, as well as expertise and tools specific to single-cell data. Importantly, the complexity of single-cell library preparation means that quality control and pre-processing require more attention, to ensure that none of the steps producing the data introduces too much unwanted bias or noise.
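As an illustration of what per-cell quality control can look like, the following Python sketch computes three commonly used metrics for each cell (total counts, number of detected genes, and the fraction of counts from mitochondrial genes) and filters on them. The count values, the gene names and all three thresholds are assumptions chosen for this toy example. Suitable cutoffs depend on the tissue, the protocol and the sequencing depth.

```python
# Hypothetical toy count matrix: one dict of gene -> count per cell.
# Gene names starting with "MT-" follow the human mitochondrial gene naming.
counts = {
    "cell_1": {"GENE_A": 120, "GENE_B": 80, "GENE_C": 45, "MT-CO1": 5},
    "cell_2": {"GENE_A": 3, "GENE_B": 0, "GENE_C": 1, "MT-CO1": 16},
    "cell_3": {"GENE_A": 0, "GENE_B": 2, "GENE_C": 0, "MT-CO1": 0},
    "cell_4": {"GENE_A": 60, "GENE_B": 150, "GENE_C": 30, "MT-CO1": 10},
}


def qc_metrics(gene_counts):
    """Compute simple per-cell quality metrics from a gene -> count mapping."""
    total = sum(gene_counts.values())
    detected = sum(1 for n in gene_counts.values() if n > 0)
    mito = sum(n for gene, n in gene_counts.items() if gene.startswith("MT-"))
    return {
        "total_counts": total,
        "genes_detected": detected,
        "mito_fraction": mito / total if total else 0.0,
    }


def passes_qc(metrics, min_counts=50, min_genes=3, max_mito_fraction=0.2):
    """Apply illustrative thresholds (assumed values, not recommendations)."""
    return (
        metrics["total_counts"] >= min_counts
        and metrics["genes_detected"] >= min_genes
        and metrics["mito_fraction"] <= max_mito_fraction
    )


metrics = {cell: qc_metrics(gene_counts) for cell, gene_counts in counts.items()}
kept = [cell for cell, m in metrics.items() if passes_qc(m)]

print(kept)
```

Here cell_2 is dropped because most of its few counts are mitochondrial, a pattern often associated with damaged cells, and cell_3 is dropped because it contains almost no counts at all.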
Single-cell gene expression data allows for analyses that are simply not possible with bulk RNA-Seq. The most significant of these are the identification of cell types and the inference of paths between distinct cellular identities. The former relies on cluster analysis and the latter on trajectory analysis, both of which are at the core of a typical analysis workflow of scRNA-Seq data, as shown in the figure below.
In clustering, the cells are grouped based on their expression profiles, and these groups are then annotated by known cell-type marker genes, or characterized as novel subtypes of a known cell type. Trajectory inference uncovers the gradual changes in gene expression between cell types, such as in cellular differentiation, and pinpoints the regulatory genes activated at different stages along the way.
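The clustering and annotation idea can be sketched on a toy scale as follows. This is a simplified stand-in, not a production workflow: it uses plain k-means on log-transformed counts of two invented marker genes, whereas real scRNA-Seq pipelines typically cluster thousands of cells on a reduced-dimension representation, often with graph-based methods. The gene names, the counts, the marker table and the choice of starting centers are all assumptions made for illustration.

```python
import math

# Hypothetical toy data: counts for two made-up marker genes in six cells.
genes = ["MARKER_T", "MARKER_B"]
cells = {
    "c1": [30, 1], "c2": [25, 0], "c3": [28, 2],
    "c4": [1, 40], "c5": [0, 35], "c6": [2, 38],
}
# Hypothetical marker table: gene -> cell type it is assumed to indicate.
markers = {"MARKER_T": "T cell", "MARKER_B": "B cell"}


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(profiles):
    """Gene-wise mean of a list of expression profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def kmeans(profiles, init_centers, n_iter=10):
    """Plain k-means: assign each cell to its nearest center, then update."""
    centers = [list(c) for c in init_centers]
    labels = {}
    for _ in range(n_iter):
        labels = {
            name: min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for name, p in profiles.items()
        }
        for k in range(len(centers)):
            members = [profiles[n] for n, lab in labels.items() if lab == k]
            if members:
                centers[k] = centroid(members)
    return labels, centers


# Log-transform the counts so that highly expressed genes do not dominate.
profiles = {name: [math.log1p(x) for x in row] for name, row in cells.items()}

# Fixed starting centers keep this toy example deterministic.
labels, centers = kmeans(profiles, [profiles["c1"], profiles["c6"]])

# Annotate each cluster by the marker gene with the highest mean expression.
annotation = {
    k: markers[genes[max(range(len(genes)), key=lambda g: center[g])]]
    for k, center in enumerate(centers)
}

print(labels)
print(annotation)
```

A cluster whose profile matches no known marker would be a candidate for closer inspection as a possible novel subtype. This sketch does not handle that case and always assigns the best-matching known label.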
Most analyses available for bulk RNA-Seq, such as differential expression and pathway analyses, are applicable to single-cell data as well. However, so-called 3' mRNA sequencing, such as that of the aforementioned Chromium platform, captures only the 3' end of each transcript. It therefore does not allow for analyses that require full transcript coverage, which rules out reliable detection of isoform- or allele-specific expression. For such analyses, Smart-Seq3 or other full-transcript protocols are required.
Single-cell RNA-Seq possibilities for the taking
Due to time, budget, or expertise considerations, designing and performing novel scRNA-Seq analyses addressing a targeted research question might not always be an option. However, large numbers of open-access experimental datasets already exist, at various stages of data refinement. One important example is the Single Cell Expression Atlas, which currently catalogs more than 150 scRNA-Seq experiments across various tissues and species, and allows direct downloading of experimental data representing more than 2 million cells. Other, more targeted, resources can also be mined for high-quality expression data, for example the Allen Brain Atlas.
With more than 850 single-cell RNA-Seq experiments having been performed to date, the chances of finding data relevant to your research are good. Apart from stand-alone analyses of public datasets, openly available data can also be analyzed in conjunction with your own data, supporting or augmenting the findings.
When planning an scRNA-Seq project, it is important to be aware of the available tools and pertinent open-access data, as well as the possibilities and limitations of the single-cell platform utilized. Helping you plan an experiment and tailoring a bioinformatics workflow to your needs is an essential part of Genevia Technologies’ Bioinformatics as a Service.
Read more about single-cell bioinformatics, or leave us a message below if you wish to discuss your single-cell plans with us!