Mutation analysis in cancer research

Cancer research is the giant within life sciences. The amount of funding, published research and generated data dwarfs all other areas in biology and medicine. For this reason, many computational analyses that are widely used across biology were first developed to study cancer, and mutations in particular. In this overview, we introduce typical mutation analyses in cancer research.


As in any biomedical research, bioinformatics is routinely employed in cancer research to distill novel insight from large sets of molecular and clinical data. Given the rapid pace at which high-throughput measurements have been adopted in both basic biology and clinical research, bioinformatics has become an integral part of cancer research. More and more research groups in the field are hybrid labs consisting of both “normal” and computational biologists.

Due to the vanishing demarcation between wet and dry biology, bioinformatics is used not only to distill insight from data but also to drive research. Research grants are increasingly awarded to cancer research projects that involve the development, rather than the mere application, of computational methods.

We focus here on established computational methods to study the biology and treatment of cancer from the perspective of mutations. These analyses aim to answer questions such as:

  • Which mutagenic processes or mutations lead to cancer?
  • Which mutations equip cancer with capacities such as treatment resistance?
  • Which mutations predict a patient’s course of disease?


Mutation analyses in cancer research identify relevant alterations in the DNA sequence of tumor cells. In addition to identifying individual mutations, patterns of mutations can be studied to infer the cause of the mutations themselves or the clonal structure of a cancer.

The underlying data in mutation analyses most often comes from next-generation DNA-sequencing (NGS) experiments. DNA-sequencing can be applied in a range of experimental settings, of which we cover the most common here.

Tumor vs normal

The classic setting for mutation analyses involves sequencing the DNA of surgically harvested cancer cells in addition to normal, non-cancer cells from the same patient. Normal cells, typically lymphocytes from a blood sample, are a valuable patient-specific control against which the tumor DNA sequence is computationally compared. Sequencing a number of patients in this manner enables associating mutations with a clinical variable such as disease subtype or treatment response.

Treated vs untreated

Another typical experiment contrasts treated cancer cells with untreated ones. In such a setting, mutation analysis may identify mutations that are caused by a treatment (such as radiotherapy) or mutations that enable the cancer cells to resist treatment. Experiments of this type commonly rely on animal models or cell lines.

Population-scale and family studies

A third group of experiments aims at identifying germline variants that predispose to cancer. Unlike the somatic mutations that notoriously mark cancer, germline variants are present in all of an individual’s cells, including the germ cells, and can thus be passed on to offspring. They are present in a population as alternative alleles, or polymorphisms. Identifying such genetic determinants of tumorigenesis typically involves either population-scale genotyping or sequencing families with hereditary cancer.

The types of DNA-sequencing

The type of DNA sequencing that is applied affects the ensuing bioinformatics primarily through limiting the analyses to particular genomic regions. Whole-genome sequencing (WGS) enables analysis of mutations anywhere in the genome, including the vast intergenic non-coding regions. Whole-exome sequencing (WES) limits the analysis to mutations in the protein-coding part of the genome. Targeted sequencing further limits the analysis to predetermined loci — a panel of known cancer genes, for instance.
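As a rough illustration of how the sequencing type constrains downstream analysis, the sketch below keeps only the variants that fall inside a set of targeted regions. The Region type, the in_targets helper and the coordinates are hypothetical and chosen for illustration only; real panels are usually described in BED files and handled with dedicated interval tools.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A target region in BED-style coordinates (hypothetical helper type)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def in_targets(chrom: str, pos: int, targets: list[Region]) -> bool:
    """Return True if a 1-based position (VCF-style) lies in any target region."""
    zero_based = pos - 1
    return any(
        r.chrom == chrom and r.start <= zero_based < r.end for r in targets
    )


# Toy panel with made-up coordinates, not a real gene panel.
panel = [Region("chr1", 100, 200), Region("chr2", 500, 600)]

variants = [("chr1", 150), ("chr1", 900), ("chr2", 501)]
on_target = [v for v in variants if in_targets(v[0], v[1], panel)]
print(on_target)
```

With whole-genome data the target list would effectively cover every position, whereas exome or panel data restrict it to the captured regions.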

Identifying and annotating mutations

The raw data from a DNA-sequencing experiment is quality-controlled and aligned against a reference genome. Variants may then be identified (or “called”) using a mutation caller pipeline. When calling somatic mutations specifically, the mutation caller takes both the tumor and normal DNA-seq data as input to distinguish between somatic and germline mutations.
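The idea of using the matched normal as a control can be sketched with plain set operations. This is a deliberately simplified toy: actual somatic callers work on read-level evidence with statistical models rather than on finished variant lists, and the function name and variant tuples below are assumptions made for illustration.

```python
# A variant is identified here by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def split_somatic_germline(
    tumor: set[Variant], normal: set[Variant]
) -> tuple[set[Variant], set[Variant]]:
    """Split tumor variants into those absent from (somatic) and shared with (germline) the normal."""
    somatic = tumor - normal
    germline = tumor & normal
    return somatic, germline


# Made-up example calls for one patient.
tumor_calls = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}
normal_calls = {("chr2", 200, "G", "C"), ("chr3", 5, "T", "G")}

somatic, germline = split_somatic_germline(tumor_calls, normal_calls)
print("somatic:", somatic)
print("germline:", germline)
```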

A mutation caller is designed to look for a specific type of mutation, such as small variants, copy-number variants or structural variants. Small variants comprise substitutions, insertions and deletions of one or a few nucleotides. Copy-number variants are amplification or deletion events affecting larger chunks of DNA. Structural variants include even more complex DNA alterations such as inter-chromosomal translocations and inversions of DNA segments.
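For small variants, the distinction between substitutions, insertions and deletions can be read off the reference and alternate alleles. The following minimal sketch assumes VCF-style allele strings; the function name and category labels are our own and not taken from any particular caller.

```python
def classify_small_variant(ref: str, alt: str) -> str:
    """Classify a small variant from its reference and alternate allele strings."""
    if len(ref) == 1 and len(alt) == 1:
        return "substitution"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    # Same length, more than one base: several adjacent bases replaced at once.
    return "multi-nucleotide substitution"


for ref, alt in [("A", "T"), ("A", "AT"), ("AT", "A"), ("AC", "GT")]:
    print(ref, alt, classify_small_variant(ref, alt))
```

Copy-number and structural variants span much larger segments and cannot be classified from allele strings this way; they are typically called by separate tools.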

Identified mutations can be annotated with their variant allele frequency, population allele frequency (in the case of germline variants), effect on the amino acid sequence and predicted pathogenicity. Such annotations are crucial for selecting relevant mutations for various downstream analyses.
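Of these annotations, the variant allele frequency (VAF) is the simplest to compute: it is the fraction of reads covering a position that support the variant. The sketch below assumes that reference- and variant-supporting read counts are already available; the function name and the example counts are hypothetical.

```python
def variant_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a locus that support the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / depth


# Made-up read counts: 70 reference reads, 30 variant reads.
print(variant_allele_frequency(ref_reads=70, alt_reads=30))
```

A threshold on VAF, on population allele frequency or on predicted pathogenicity can then be used to select variants for downstream analyses; suitable cut-offs depend on the study design.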

Associating mutations with clinical and phenotypic variables

The key part in a mutation analysis workflow is visualizing identified mutations and associating them with other variables. Typical visualizations include oncoplots (or waterfall plots), which show the mutational statuses of multiple genes across the analyzed patients, and lollipop plots, which highlight the positions of mutations along the amino acid sequence of a mutated (and protein-coding) gene.

Statistical tests can be used to compare mutated genes in different sample groups (different cancer types, primary vs metastatic tumors, before vs after treatment etc.). Mutation frequencies, odds ratios and p-values are typical statistics reported in such analyses. Similarly, mutations can be associated with continuous variables such as the patient’s age, tumor size or level of a blood biomarker.
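One common way to obtain an odds ratio and a p-value for a gene's mutation status in two sample groups is Fisher's exact test on a 2x2 table. The sketch below implements it with the standard library only, as one possible choice of test rather than a prescribed method; the counts are invented, and in practice a library routine such as scipy.stats.fisher_exact would normally be used.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the table [[a, b], [c, d]] (undefined if b or c is 0)."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Invented counts: rows are groups, columns are mutated / not mutated.
# Group 1: 8 mutated, 2 wild type. Group 2: 1 mutated, 5 wild type.
print(odds_ratio(8, 2, 1, 5))
print(fisher_exact_two_sided(8, 2, 1, 5))
```

When many genes are tested at once, the resulting p-values would also need a multiple-testing correction.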

Survival analysis can be used to associate mutations with clinical endpoints such as death from cancer or relapse. Survival analyses rely on Kaplan-Meier estimators, Cox regression or machine learning approaches. (Learn more about survival analyses.)
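To make the Kaplan-Meier estimator concrete, here is a minimal from-scratch sketch. It assumes one follow-up time per patient and an event flag (1 for an observed event, 0 for censoring); the function name and the example data are hypothetical, and real analyses would typically use an established survival library and compare curves between mutated and wild-type groups.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each distinct time with an event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Patients with an event and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


# Made-up follow-up times (e.g. months); the second patient is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```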

Mutational signature analysis

The frequencies of different types of nucleotide substitutions observed in a tumor’s DNA carry information on their cause. Put simply, one type of mutagen may cause predominantly T>A substitutions whereas another may cause G>C substitutions. Comparing the patterns of observed substitution frequencies enables quantifying previously characterized mutational signatures in a tumor. This yields insight into the etiology of the cancer, and mutational signatures are potential prognostic markers in their own right.
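A first step in such an analysis is to tabulate the substitution spectrum of a tumor. Because a substitution on one strand implies the complementary substitution on the other, the twelve possible changes are conventionally collapsed into six classes written from the pyrimidine base (so G>C is counted as C>G). The sketch below does this and compares two spectra with cosine similarity, one common similarity measure; the function names are our own, and the reference profile is an invented toy vector, not a published signature.

```python
from collections import Counter
from math import sqrt

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a substitution to its pyrimidine-centred class, e.g. G>C to C>G."""
    if ref in "AG":  # purine reference: report the complementary strand instead
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(substitutions: list[tuple[str, str]]) -> list[float]:
    """Relative frequency of each of the six classes, in the order of CLASSES."""
    counts = Counter(substitution_class(r, a) for r, a in substitutions)
    total = sum(counts.values())
    return [counts[c] / total for c in CLASSES]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


# Made-up substitutions observed in one tumor.
observed = spectrum([("C", "T"), ("G", "A"), ("T", "A"), ("A", "T")])
# Invented reference profile for illustration only.
toy_signature = [0.1, 0.1, 0.4, 0.3, 0.05, 0.05]
print(observed)
print(cosine_similarity(observed, toy_signature))
```

Published signature catalogues additionally take the neighboring bases of each substitution into account, which this six-class sketch leaves out.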

Clonality analysis

Cancer is a dynamic population of tumor cells that multiplies and spreads within the body when given the opportunity. How a cancer acquires the capacity to increase proliferation, evade treatment and metastasize can be studied by building a family tree of the cancer cells. When tumor samples from multiple time points or metastases of the same patient are sequenced, it is possible to study the clonal structure of the cancer and, further, to associate the emergence of daughter clones (equipped with specific mutations) with disease progression events.

Clonality analysis based on bulk tissue DNA sequencing is sensitive to data quality and sequencing depth. A good approach is to first identify mutations using whole-genome or whole-exome sequencing and then perform ultra-deep targeted sequencing of the most prevalent mutations to obtain accurate variant allele frequency estimates for them.
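Why depth matters can be illustrated with a back-of-the-envelope calculation. Assuming that the reads covering a locus are sampled independently (a simplification that ignores sequencing errors and mapping bias), the number of variant-supporting reads is binomial, and the standard error of the variant allele frequency estimate shrinks with the square root of the depth. The function name and the depths below are illustrative assumptions.

```python
from math import sqrt


def vaf_standard_error(vaf: float, depth: int) -> float:
    """Binomial standard error of a variant allele frequency estimate."""
    return sqrt(vaf * (1 - vaf) / depth)


# Compare a typical-looking WGS depth, an exome-like depth and an
# ultra-deep targeted depth (all values chosen for illustration).
for depth in (30, 100, 10_000):
    se = vaf_standard_error(0.2, depth)
    print(f"depth {depth:>6}: VAF 0.20 +/- {1.96 * se:.3f} (approx. 95% interval)")
```

Under these assumptions, two clones whose mutations differ in frequency by only a few percentage points can be hard to separate at low depth and become distinguishable only once the intervals are this narrow.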

What else?

Mutation analyses range wider and dive deeper than we managed to cover in this brief overview. Topics such as integrating mutational data with other high-throughput data modalities, predicting neo-epitopes, detecting mutations from cell-free DNA and interpreting non-coding mutations will warrant introductions of their own!

Meanwhile, you might wish to learn more about our background in cancer bioinformatics through our publications and other references — the roots of Genevia Technologies lie quite deep in computational cancer research.

If you are planning sequencing experiments to study mutations, drop us a message and we will be happy to discuss possible analyses in detail!

