DNA sequencing data analysis

Understand the effects of genomic variation and mutations with DNA sequencing data analysis

We routinely analyze whole genome, whole exome and targeted re-sequencing data from well-characterized organisms, such as humans, with the aim of mapping reads and identifying small genetic variants (single nucleotide polymorphisms and short insertions and deletions, or indels) using established best-practice methods. To help our clients interpret their variant data, we continuously develop our DNA sequencing data analysis and variant annotation pipeline to include more information on each identified variant. For example, pathogenicity predictions and minor allele frequencies from databases such as 1000 Genomes and gnomAD provide an excellent way of filtering irrelevant variation out of your results.
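
As a minimal sketch of frequency-based filtering, the Python snippet below keeps only variants whose population allele frequency falls below a cutoff. The field name `gnomad_af`, the 1% threshold and the example records are hypothetical and chosen purely for illustration; they do not describe the actual pipeline.

```python
def filter_rare(variants, max_af=0.01):
    """Keep variants that are rarer than max_af or absent from the database.

    Each variant is a dict; "gnomad_af" is a hypothetical field holding the
    population allele frequency, or None when the variant is not in the database.
    """
    return [
        v for v in variants
        if v["gnomad_af"] is None or v["gnomad_af"] < max_af
    ]


# Made-up example records, not real variants
variants = [
    {"chrom": "1", "pos": 1000, "ref": "A", "alt": "G", "gnomad_af": 0.25},
    {"chrom": "2", "pos": 2000, "ref": "C", "alt": "T", "gnomad_af": 0.0004},
    {"chrom": "7", "pos": 3000, "ref": "G", "alt": "GA", "gnomad_af": None},
]

rare = filter_rare(variants)
print([(v["chrom"], v["pos"]) for v in rare])
```

Here the common variant on chromosome 1 is dropped, while the rare variant and the variant with no database entry are kept. Treating unknown variants as rare is a design choice that suits disease studies but may not fit every project.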

Our clients working in oncology are also interested in characterizing somatic mutations that are not limited to small-scale genomic events. We have characterized gene copy number variation in tumor and cancer cell line samples from both microarray and NGS data, and integrated the results with expression data in order to quantify oncogenic gene dosage effects in different tumor types. We have also developed DNA sequencing data analysis pipelines for discovering copy number-neutral genomic rearrangements that lead to novel oncogenic fusion genes.
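
One simple way to look for a gene dosage effect is to correlate a gene's copy number with its expression across samples. The sketch below uses a plain Pearson correlation as an illustrative stand-in, not necessarily the method used in the pipeline; the sample values are made up, and a real analysis would also need significance testing and multiple-testing correction.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # no variation, correlation is undefined
    return cov / (sd_x * sd_y)


# Hypothetical values for one gene across six tumor samples
copy_number = [1, 2, 2, 3, 4, 6]
log2_expression = [4.1, 5.0, 5.2, 5.9, 6.8, 7.5]

r = pearson(copy_number, log2_expression)
print(f"copy number vs expression: r = {r:.2f}")
```

A strongly positive `r` for a gene would flag it as a candidate for a dosage effect worth following up.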

Our DNA sequencing data analysis workflow also extends to assembling the genomes of non-model organisms. We produce genome assemblies based on WGS data that are then computationally post-processed to achieve the best possible quality. The assembled genomes are then annotated using gene prediction, automated homology searches against genome databases, and gene annotation transfer from closely related organisms. Thorough annotation of novel genomes ensures the best possible starting point for transcriptome studies on these organisms.

Read more about typical DNA sequencing data analyses:

  • Variant calling: A properly filtered list of variants helps you focus on relevant findings
    • Our statistical approaches to variant calling follow current best practices and yield a reliable set of variants. Variants such as single nucleotide polymorphisms can be called against any reference genome in any organism, or even against a genomic ensemble compiled from individual genomes from sequencing projects for better representation. In addition to high-confidence variants, we report regions of low coverage where the variant caller was not able to determine the sequence of the samples. Whole genome, whole exome and targeted DNA sequencing data are all suitable for variant calling. The lists of variants can be further combined, compared and filtered, for example to find disease-causing de novo germline variants in trio studies.


      • Full variant lists for all samples with supporting evidence from the data
      • Filtered variant lists based on any criteria (e.g. a germline control for somatic mutations)
      • Low-coverage regions where variants could not be called
  • Variant annotation: Turn a list of variants into genomic information with relevant annotations
    • Genetic variants are annotated with their location in the genome, zygosity (homozygous/heterozygous), evidence from the data (supporting reads), functional classification for exonic variants, amino acid changes in all isoforms, database identifiers for known variants, and observed minor allele frequencies in several genome databases or even in your own data. We also provide pathogenicity predictions for each exonic variant using several types of prediction software. Flexible ranking and filtering of the variants based on these annotations makes complex genomic data easy to interpret for a geneticist or a physician.


      • Functional and location annotation for every variant
      • Minor allele frequencies in relevant databases
      • Database identifiers for known variants
      • Pathogenicity predictions
  • Copy number analysis: Explain regulatory and phenotypic differences with aberrant gene copy numbers
    • Gene copy numbers can be deduced from sequencing data using our statistical approaches, which analyze both coverage and allele frequency information. The analysis yields copy numbers for each chromosome-scale segment, gene, and exon independently. Gene copy numbers can be further integrated with expression data, for example, in order to find significant gene dosage effects.


      • Copy number for each chromosome-scale segment
      • Gene copy number for each gene
      • Copy number for each exon
  • Genomic rearrangements: Whole genome sequencing enables you to see every aberration in your genomes
    • Whole genome sequencing data coupled with read pair information from paired-end sequencing can be used to study copy number-neutral genomic rearrangements such as inversions and translocations. These can result in fusion genes that are critically linked to the formation of cancer, for example. We deliver a report of the altered genome structure with ranked fusion genes that can be validated with RNA-sequencing data.


      • List of potential fusion genes
      • List of all rearrangements
  • Genome assembly and refinement: Accurate genome builds act as the perfect starting point for any study
    • For simpler organisms, we offer de novo genome assembly based on DNA-sequencing data. Our approach builds a consensus assembly from the outputs of several assembly tools and then runs computational post-assembly improvement software. If a draft genome exists, we can refine it computationally by joining contigs and resolving errors using improvement tools or additional DNA-seq or RNA-seq data.


      • Assembled contigs in FASTA format
      • Computationally-refined genome assembly
      • Quality estimation scores
  • Genome annotation: Computational annotations let you pinpoint new genes in your genomes
    • Assembled genomes can be annotated using gene and oriC prediction software and/or RNA-seq data. We predict gene identities for all putative genes by comparing their sequences to several genome databases. For genes with lower sequence similarity, functions can be predicted from identified functional domains. If annotated genomes for close relatives exist, we can improve the annotation by transferring gene information to the unannotated genome using sequence alignment-based approaches. The result is a comprehensive list of genes with their specific coordinates in the genome.


      • Loci of predicted genes
      • Fully annotated genes based on homology searches
      • Validated genes based on RNA-seq data
  • Cell-free DNA biomarker discovery: Sequencing analysis of liquid biopsies for development of diagnostics
    • Circulating cell-free DNA has potential as a source of non-invasive genomic biomarkers, in particular for prenatal diagnosis and oncology. The mere presence of certain DNA sequences in plasma can reveal a tumor undetected by other means. Furthermore, mutations detected in circulating DNA can be used as markers in personalizing treatment and prognosis. Our pipeline for cell-free DNA-based biomarker discovery starts with a full quality control of the data, followed by a statistical comparison of pathological and control groups in order to reveal biomarkers with the optimal combination of sensitivity and specificity. Considering biological factors along with clinical feasibility, we summarize the analysis by highlighting the most promising biomarker candidates.


      • List of biomarker candidates from cell-free DNA
      • Sensitivity and specificity estimates for each candidate
      • Database identifiers for known mutations and pathogenicity predictions
  • Metagenomic analysis: Study microbial species composition and its changes in your samples
    • Metagenomics offers an unbiased view into the microbial diversity of ecological niches, including samples from host organisms and soil. Using whole-genome or 16S rRNA gene sequencing data, we assemble the sequence reads into contigs and assign them to species or operational taxonomic units (OTUs). We then quantify the abundance of each taxon. In the case of multiple samples, we compare the relative abundances and associate them with host phenotype or environmental factors. For whole-genome studies, we identify and annotate genes using both sequence homology and computational gene prediction.


      • Quantitative characterization of microbial diversity
      • Association of species/OTUs with host phenotype or environmental factors
      • Identified and predicted genes with custom annotations
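
As a rough illustration of the abundance comparison described under metagenomic analysis, the Python sketch below converts per-taxon read counts into relative abundances and reports the per-taxon difference between two samples. The sample names and counts are made up, and the function names are hypothetical; a real analysis would also account for sequencing depth, compositional effects and statistical testing.

```python
def relative_abundance(counts):
    """Convert per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: n / total for taxon, n in counts.items()}


def abundance_difference(sample_a, sample_b):
    """Per-taxon difference in relative abundance (sample_b minus sample_a).

    Taxa missing from one sample are treated as having zero abundance.
    """
    rel_a = relative_abundance(sample_a)
    rel_b = relative_abundance(sample_b)
    taxa = sorted(set(rel_a) | set(rel_b))
    return {t: rel_b.get(t, 0.0) - rel_a.get(t, 0.0) for t in taxa}


# Made-up read counts per taxon for two samples
control = {"Bacteroides": 600, "Escherichia": 300, "Lactobacillus": 100}
case = {"Bacteroides": 200, "Escherichia": 700, "Clostridioides": 100}

for taxon, delta in abundance_difference(control, case).items():
    print(f"{taxon}\t{delta:+.2f}")
```

Differences like these would then be associated with host phenotype or environmental factors across many samples rather than read off a single pair.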

Want to hear more?

Leave us a message and we will be in contact within a day!
