DNA sequencing data analysis

Understand the effects of genetic variation and mutations with DNA sequencing data analysis.

DNA sequencing comes in many forms. Whole-genome sequencing (WGS), whole-exome sequencing (WES) and targeted sequencing make it possible to study heritable and somatic DNA variants. In addition to next-generation sequencing (NGS) data, SNP arrays and CGH arrays can be used to identify genetic polymorphisms and copy-number variants, respectively. Metagenomic whole-genome sequencing of microbial communities makes it possible to analyze their composition and function.

We routinely analyze DNA sequencing data to address research questions in both basic biology and biomedicine. Below we present some typical DNA-sequencing data analyses. If you would like to learn how we can help you get the most out of your DNA-seq data, leave us a message and we will book a short call with one of our experts.


Variant analysis

In most cases, DNA sequencing is used to identify and analyze genetic variants. These variants can be single-nucleotide substitutions, small insertions and deletions, copy-number alterations or structural variants. Furthermore, they may be heritable polymorphisms or somatic mutations.

Variant analysis typically starts with quality control of the raw sequencing data, followed by alignment of the sequencing reads against a reference genome. Variants, i.e. positions where the sample differs from the reference genome or where samples differ from one another, can then be identified computationally.
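As a rough illustration of the quality-control step, the sketch below decodes the per-base quality string of a single FASTQ record into Phred scores and keeps the read only if its mean quality reaches a threshold. It assumes the Phred+33 encoding; the function names and the threshold of 20 are illustrative choices, not a standard pipeline.

```python
# Minimal QC sketch for a single FASTQ record.
# Assumption: qualities use the Phred+33 ASCII encoding; the mean-quality
# threshold of 20 is an illustrative choice, not a universal standard.


def phred_scores(quality_line: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q: int) -> float:
    """A Phred score Q corresponds to a base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_qc(quality_line: str, min_mean_q: float = 20.0) -> bool:
    """Keep a read only if its mean Phred score reaches the threshold."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores) >= min_mean_q


# A FASTQ record spans four lines: header, sequence, separator, qualities.
record = ["@read1", "GATTACAGATTACA", "+", "IIIIIIIIII####"]
quals = phred_scores(record[3])
print(quals[0], quals[-1])   # 'I' decodes to 40, '#' decodes to 2
print(passes_qc(record[3]))  # mean is (10*40 + 4*2) / 14, about 29.1
```

Real QC tools also trim adapters and low-quality read ends rather than only accepting or rejecting whole reads; this sketch shows just the quality decoding.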

A crucial part of variant analysis is annotating the detected variants. Annotations such as allele frequencies (both within the sample and in public databases such as gnomAD), predicted effects on protein structure or gene regulation, and predicted pathogenicity allow flexible selection or ranking of variants for downstream analysis and interpretation.
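To show how annotations support selection and ranking, here is a minimal sketch that keeps rare, protein-altering variants and ranks them by in-sample variant allele fraction (VAF). The record fields, the thresholds (population allele frequency at most 0.1%, in-sample VAF at least 5%) and the chosen set of consequence terms are hypothetical; the consequence strings follow Sequence Ontology naming, and the variants are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    population_af: Optional[float]  # e.g. gnomAD allele frequency; None if not observed
    consequence: str                # predicted effect as a Sequence Ontology term
    sample_vaf: float               # fraction of reads supporting the alternative allele


# Illustrative set of consequences treated as protein-altering.
PROTEIN_ALTERING = {
    "missense_variant",
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def select_candidates(variants, max_population_af=0.001, min_sample_vaf=0.05):
    """Keep rare, protein-altering variants and rank them by in-sample allele fraction."""
    kept = [
        v
        for v in variants
        if (v.population_af is None or v.population_af <= max_population_af)
        and v.consequence in PROTEIN_ALTERING
        and v.sample_vaf >= min_sample_vaf
    ]
    return sorted(kept, key=lambda v: v.sample_vaf, reverse=True)


# Made-up variants for illustration only.
variants = [
    Variant("chr1", 1000, "A", "G", 0.25, "missense_variant", 0.50),       # common, dropped
    Variant("chr2", 2000, "G", "A", None, "missense_variant", 0.31),       # rare, kept
    Variant("chr3", 3000, "T", "TA", 0.0002, "frameshift_variant", 0.45),  # rare, kept
    Variant("chr4", 4000, "C", "T", None, "synonymous_variant", 0.40),     # silent, dropped
]
for v in select_candidates(variants):
    print(v.chrom, v.pos, v.consequence, v.sample_vaf)
```

Appropriate thresholds depend on the study design; for example, a germline rare-disease analysis and a somatic tumor analysis would use very different VAF cut-offs.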

Variant analysis in cancer research often focuses on identifying somatic mutations that promote tumorigenesis (driver mutations) or that can be used to diagnose a patient or predict the course of their disease. Learn more about mutation analysis in cancer research.

Tumor evolutionary analysis

Tumor evolution underpins cancer's ability to adapt to the selective pressures imposed by therapies. Subclonal somatic mutations, i.e. those present in only a fraction of the tumor cells, can be used to reveal the clonal structure of a tumor and to track it through processes such as relapse and metastasis.

Learn more about tumor clonality analysis.
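One common first step in clonality analysis is converting each mutation's variant allele fraction (VAF) into a cancer cell fraction (CCF), i.e. the estimated share of cancer cells carrying it. The sketch below inverts the expected-VAF relation VAF = purity * CCF * multiplicity / (purity * tumor_cn + 2 * (1 - purity)), assuming a diploid normal cell population and that tumor purity and local copy number are already known. It is a point estimate only; dedicated clonality tools also model sampling uncertainty and cluster mutations into subclones.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn=2, multiplicity=1, normal_cn=2):
    """Estimate the fraction of cancer cells carrying a somatic mutation.

    Inverts: vaf = purity * ccf * multiplicity / (purity * tumor_cn + normal_cn * (1 - purity))
    tumor_cn: total copy number of the locus in tumor cells
    multiplicity: number of mutated copies per mutated tumor cell
    """
    total_copies = purity * tumor_cn + normal_cn * (1 - purity)
    ccf = vaf * total_copies / (purity * multiplicity)
    return min(ccf, 1.0)  # cap values pushed above 1 by sampling noise


# Hypothetical tumor with 60% purity at a diploid locus.
print(cancer_cell_fraction(0.30, purity=0.6))  # about 1.0: consistent with a clonal mutation
print(cancer_cell_fraction(0.12, purity=0.6))  # about 0.4: consistent with a subclonal mutation
```

Mutations whose CCF estimates cluster together across samples are typically interpreted as belonging to the same subclone.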

Genome assembly

For organisms that lack a reference genome or have highly dynamic genomes, DNA-sequencing data analysis starts with de novo genome assembly. Genome assembly benefits from deep whole-genome sequencing.

An assembled genome is annotated based on sequence homology, gene prediction and, if available, RNA-sequencing data from the same organism. If annotated genomes of closely related species exist, the annotation can be improved by transferring gene information to the newly assembled genome.

The quality of an assembled genome is assessed using metrics such as N50, L50 and completeness with respect to highly conserved orthologs. A new high-quality genome enables analyses of pan-genomes, population genetics and much more!
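N50 and L50 summarize assembly contiguity: sort the contigs from longest to shortest and add up their lengths until at least half of the total assembly length is covered. N50 is the length of the contig at which this happens, and L50 is the number of contigs needed to get there. A minimal sketch (the contig lengths are made up):

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the shortest contig among the longest contigs that together
    cover at least half of the total assembly length. L50: number of such contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("contig_lengths must contain at least one contig")


# Total length 300; the two longest contigs (100 + 80) already cover half of it.
print(n50_l50([100, 80, 60, 40, 20]))  # (80, 2)
```

Higher N50 and lower L50 indicate a more contiguous assembly, but neither says anything about correctness, which is why completeness measures are used alongside them.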

Metagenomics

Metagenomics offers an unbiased view of the microbial diversity of ecological niches, from host-associated samples to soil. Using shotgun whole-genome sequencing data, reads are assembled into contigs and assigned to species or operational taxonomic units (OTUs).

Identified species or OTUs are placed in a phylogeny and quantified. The functions encoded by individual genes or multi-gene pathways present in the sequenced community can be identified using public databases.
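As a simple illustration of quantification, the sketch below turns per-taxon read counts into relative abundances and computes the Shannon diversity index, one common summary of community diversity. It assumes reads have already been assigned to taxa; the taxon names and counts are hypothetical, and real analyses usually also correct for differences in genome size and sequencing depth.

```python
import math


def relative_abundance(read_counts):
    """Convert per-taxon read counts into proportions that sum to 1."""
    total = sum(read_counts.values())
    return {taxon: count / total for taxon, count in read_counts.items()}


def shannon_index(read_counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    proportions = relative_abundance(read_counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)


# Hypothetical read counts assigned to three genera.
counts = {"Bacteroides": 500, "Faecalibacterium": 300, "Escherichia": 200}
print(relative_abundance(counts))       # proportions 0.5, 0.3 and 0.2
print(round(shannon_index(counts), 3))  # about 1.03
```

A community dominated by a single taxon gives a Shannon index close to 0, while evenly distributed taxa push it towards ln(number of taxa).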

Note that 16S rRNA amplicon sequencing, a cost-effective alternative to metagenomic sequencing, can be used to identify species and build phylogenies, but it does not support high-quality functional analyses.

Population genetics

Genome-wide measurements of individuals sampled from related populations contain rich information on the populations’ structure, genealogy and history. Population genetic analyses of non-model organisms often begin with genome assembly and annotation, and proceed to identifying genetic polymorphisms in the sampled populations. Downstream analyses of these polymorphisms and their allele frequencies help in studying evolutionary phenomena such as speciation and adaptation.

Typical analyses include principal component analysis (PCA), analysis of genetic variation within and between populations to identify loci affected by natural selection, and analyses of population admixture, phylogeny and demographic history.
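Differentiation between populations at a single locus is often summarized with Fst. The sketch below uses the heterozygosity-based form Fst = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within populations and H_T the expected heterozygosity of the pooled allele frequency. It assumes a biallelic locus and equally sized populations and ignores sample-size corrections; dedicated estimators such as Weir and Cockerham's handle those. The allele frequencies are made up.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def fst(allele_freqs):
    """Fst = (H_T - H_S) / H_T for one biallelic locus.

    allele_freqs: frequency of the same allele in each population.
    Assumes equally sized populations; no sample-size correction.
    """
    mean_p = sum(allele_freqs) / len(allele_freqs)
    h_t = expected_heterozygosity(mean_p)
    h_s = sum(expected_heterozygosity(p) for p in allele_freqs) / len(allele_freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


print(fst([0.5, 0.5]))  # 0.0: identical frequencies, no differentiation
print(fst([0.2, 0.8]))  # about 0.36
print(fst([0.0, 1.0]))  # 1.0: alternative alleles fixed in each population
```

In a genome scan, loci whose Fst stands out from the genome-wide distribution are candidates for having been affected by selection, although demography can produce similar outliers.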

Genome-wide association analysis

Biomedically motivated, population-scale genetic analyses aim to identify genes and variants associated with relevant phenotypes or diseases. Apart from the few diseases that are monogenic and strongly heritable, studying most diseases requires large, population-level sample sizes to achieve sufficient statistical power for detecting associations. Such genome-wide association studies (GWAS) are based on SNP-array or DNA-sequencing data from biobanks or other large repositories.

A GWAS produces summary statistics on the association between each individual variant and the studied disease. In polygenic diseases, individual variants may have very small effect sizes even when the disease is strongly heritable. In such cases, polygenic risk scores (PRS) can be used to sum the effects of a large number of variants into a combined risk score with potential clinical utility.
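In its simplest form, a PRS is a weighted sum: an individual's number of effect alleles (0, 1 or 2) at each variant is multiplied by that variant's effect size from the GWAS summary statistics, and the products are added up. The sketch below assumes the weights have already been chosen; in practice they are usually adjusted for linkage disequilibrium between variants, which is not shown. The variant IDs and effect sizes are placeholders, and skipping variants missing from an individual's genotypes is a simplifying assumption.

```python
def polygenic_risk_score(genotypes, weights):
    """Weighted sum of effect-allele dosages.

    genotypes: variant ID -> number of effect alleles carried (0, 1 or 2)
    weights:   variant ID -> per-allele effect size from GWAS summary statistics
               (e.g. a log odds ratio for a binary disease)
    Variants absent from `genotypes` are skipped in this sketch.
    """
    return sum(beta * genotypes[v] for v, beta in weights.items() if v in genotypes)


# Placeholder effect sizes and one individual's genotypes.
weights = {"var1": 0.12, "var2": -0.05, "var3": 0.08}
person = {"var1": 2, "var2": 1, "var3": 0}
print(round(polygenic_risk_score(person, weights), 2))  # 2*0.12 - 0.05 + 0 = 0.19
```

Raw scores like this are usually interpreted relative to a reference population, for example by comparing an individual's score with the score distribution in a biobank cohort.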


Meet some of our genomics experts

I’m a senior computational biologist with 10+ years of experience in cancer bioinformatics and precision oncology, specializing in multi-omics data analysis (WES, RNA-seq, microarrays, proteomics) for biomarker discovery and translational research.

My work focuses on functional knowledge discovery—such as molecular footprinting to infer regulatory activity—and reconstructing tumor histories using public tools and clinical databases to support personalized medicine. I’m also skilled in statistics, R programming, and developing open-source, reproducible reports.

Dr. Efstathios-Iason Vlachavas, Senior Bioinformatician, Genevia Technologies Oy

I am a bioinformatician specialised in cancer genomics and genetics, with 10 years of experience analysing omics data in numerous genetic, genomic, transcriptomic and epigenetic studies.

While my research has focused on cancer, I have also gained experience in a number of other fields, such as immunology, aging and developmental biology. In recent years, I have also applied machine learning methods to harness biomedical data in various clinical applications.

Dr. Tommi Rantapero, Senior Bioinformatician, Genevia Technologies Oy

Learn more

Above we introduced some of the computational analyses applied to various types of DNA-seq data. However, our team's experience extends much further: take a look at our references and publications.


References and customer cases

Selected publications from our customers

Poggiali, B. et al. (2026). ECHO: a nanopore sequencing-based workflow for (epi)genetic profiling of the human repeatome. bioRxiv 2026.03.18.712618; doi: https://doi.org/10.64898/2026.03.18.712618
Ambite, I. et al. (2024). Molecular analysis of acute pyelonephritis—excessive innate and attenuated adaptive immunity. Life Science Alliance, 8(3), e202402926. https://doi.org/10.26508/lsa.202402926
Zhong, M. et al. (2024). A meta-analysis and polygenic score study identifies novel genetic markers for waist-hip ratio in African populations. Obesity (Silver Spring, Md.), 10.1002/oby.24123. Advance online publication. https://doi.org/10.1002/oby.24123
Karihtala, P. et al. (2024). Mutational signatures and their association with cancer survival and gene expression in multiple cancer types. International journal of cancer, 10.1002/ijc.35148. Advance online publication. https://doi.org/10.1002/ijc.35148
Adebamowo, S. N. et al. (2024). Genome, HLA and polygenic risk score analyses for prevalent and persistent cervical human papillomavirus (HPV) infections. European journal of human genetics : EJHG, 10.1038/s41431-023-01521-7. Advance online publication. https://doi.org/10.1038/s41431-023-01521-7
Boyd, S. et al. (2024). NGS of brush cytology samples improves the detection of high-grade dysplasia and cholangiocarcinoma in patients with primary sclerosing cholangitis: A retrospective and prospective study. Hepatology communications, 8(4), e0415. https://doi.org/10.1097/HC9.0000000000000415
Ribeiro E Ribeiro, R. et al. (2024). Synchronous Epidermodysplasia Verruciformis and Intraepithelial Lesion of the Vulva Is Caused by Coinfection With Alpha-Human Papillomavirus and Beta-Human Papillomavirus Genotypes and Facilitated by Mutations in Cell-Mediated Immunity Genes. Archives of pathology & laboratory medicine, 148(9), 1014–1021. https://doi.org/10.5858/arpa.2023-0193-OA
Karihtala, P. et al. (2023). Mutational signatures and their association with survival and gene expression in urological carcinomas. Neoplasia (New York, N.Y.), 44, 100933. Advance online publication. https://doi.org/10.1016/j.neo.2023.100933
Karihtala, P. et al. (2022). Comparison of the mutational profiles of neuroendocrine breast tumours, invasive ductal carcinomas and pancreatic neuroendocrine carcinomas. Oncogenesis, 11(1), 53. https://doi.org/10.1038/s41389-022-00427-1
Yuan, O. et al. (2022). A somatic mutation in moesin drives progression into acute myeloid leukemia. Science advances, 8(16), eabm9987. https://doi.org/10.1126/sciadv.abm9987
Wahlström, G. et al. (2022). The variant rs77559646 associated with aggressive prostate cancer disrupts ANO7 mRNA splicing and protein expression. Human molecular genetics, ddac012. Advance online publication. https://doi.org/10.1093/hmg/ddac012
Karihtala, P. et al. (2022). Mutational Signatures Associate With Survival in Gastrointestinal Carcinomas. Cancer genomics & proteomics, 19(5), 556–569. https://doi.org/10.21873/cgp.20340
Ribeiro, R. et al. (2022). Synchronous Epidermodysplasia Verruciformis and Intraepithelial Lesion of the Vulva is Caused by Coinfection with α-HPV and β-HPV Genotypes and Facilitated by Mutations in Cell-Mediated Immunity Genes. Preprint at https://doi.org/10.21203/rs.3.rs-1991512/v1
Pernaute-Lau, L. et al. (2021). Pharmacogene Sequencing of a Gabonese Population with Severe Plasmodium falciparum Malaria Reveals Multiple Novel Variants with Putative Relevance for Antimalarial Treatment. Antimicrobial agents and chemotherapy, 65(7), e0027521. https://doi.org/10.1128/AAC.00275-21
Åvall-Jääskeläinen, S. et al. (2021). Genomic Analysis of Staphylococcus aureus Isolates Associated With Peracute Non-gangrenous or Gangrenous Mastitis and Comparison With Other Mastitis-Associated Staphylococcus aureus Isolates. Frontiers in microbiology, 12, 688819. https://doi.org/10.3389/fmicb.2021.688819
Wullt, B. et al. (2021). Immunomodulation-A Molecular Solution to Treating Patients with Severe Bladder Pain Syndrome? European urology open science, 31, 49–58. https://doi.org/10.1016/j.euros.2021.07.003
Gallegos, J. E. et al. (2020). Challenges and opportunities for strain verification by whole-genome sequencing. Scientific reports, 10(1), 5873. https://doi.org/10.1038/s41598-020-62364-6
Tikkanen, T. et al. (2018). Seshat: A Web service for accurate annotation, validation, and analysis of TP53 variants generated by conventional and next-generation sequencing. Human mutation, 39(7), 925–933. https://doi.org/10.1002/humu.23543

Selected publications from our team

Fonseca, N. M. et al. (2024). Prediction of plasma ctDNA fraction and prognostic implications of liquid biopsy in advanced prostate cancer. Nature communications, 15(1), 1828. https://doi.org/10.1038/s41467-024-45475-w
Fotakis, G. et al. (2024). Conventional therapy induces tumor immunoediting and modulates the immune contexture in colorectal cancer. bioRxiv 2024.08.21.608938; doi: https://doi.org/10.1101/2024.08.21.608938
Kallio, H. M. et al. (2024). Sensitive circulating tumor DNA-based residual disease detection in epithelial ovarian cancer. Life science alliance, 7(6), e202402658. https://doi.org/10.26508/lsa.202402658
Flack, N. et al. (2024). The genome of Przewalski's horse (Equus ferus przewalskii). G3 (Bethesda, Md.), 14(8), jkae113. https://doi.org/10.1093/g3journal/jkae113
Nurminen, A. et al. (2023). Cancer origin tracing and timing in two high-risk prostate cancers using multisample whole genome analysis: prospects for personalized medicine. Genome medicine, 15(1), 82. https://doi.org/10.1186/s13073-023-01242-y
Ricordel, C. et al. (2023). Genomic characteristics and clinical significance of CD56+ circulating tumor cells in small cell lung cancer. Scientific reports, 13(1), 3626. https://doi.org/10.1038/s41598-023-30536-9
Kontogianni, G. et al. (2023). A Comprehensive Analysis of Cutaneous Melanoma Patients in Greece Based on Multi-Omic Data. Cancers, 15(3), 815. https://doi.org/10.3390/cancers15030815
Tielbeek, J. J. et al. (2022). Uncovering the genetic architecture of broad antisocial behavior through a genome-wide association study meta-analysis. Molecular psychiatry, 10.1038/s41380-022-01793-3. Advance online publication. https://doi.org/10.1038/s41380-022-01793-3
Rieder, D. et al. (2022). nextNEOpi: a comprehensive pipeline for computational neoantigen prediction. Bioinformatics (Oxford, England), 38(4), 1131–1132. https://doi.org/10.1093/bioinformatics/btab759
Rautajoki, K. J. et al. (2022). PTPRD and CNTNAP2 as markers of tumor aggressiveness in oligodendrogliomas. Scientific reports, 12(1), 14083. https://doi.org/10.1038/s41598-022-14977-2
van Heukelum, S. et al. (2021). A central role for anterior cingulate cortex in the control of pathological aggression. Current biology : CB, 31(11), 2321–2333.e5. https://doi.org/10.1016/j.cub.2021.03.062
Rajamäki, K. et al. (2021). Genetic and Epigenetic Characteristics of Inflammatory Bowel Disease-Associated Colorectal Cancer. Gastroenterology, 161(2), 592–607. https://doi.org/10.1053/j.gastro.2021.04.042
Vandekerkhove, G. et al. (2021). Plasma ctDNA is a tumor tissue surrogate and enables clinical-genomic stratification of metastatic bladder cancer. Nature communications, 12(1), 184. https://doi.org/10.1038/s41467-020-20493-6
Cerqueira, J. et al. (2021). Independent and cumulative coeliac disease-susceptibility loci are associated with distinct disease phenotypes. Journal of human genetics, 66(6), 613–623. https://doi.org/10.1038/s10038-020-00888-5
Yusuf, L. et al. (2020). Noncoding regions underpin avian bill shape diversification at macroevolutionary scales. Genome research, 30(4), 553–565. https://doi.org/10.1101/gr.255752.119
Lindfors, K. et al. (2020). Metagenomics of the faecal virome indicate a cumulative effect of enterovirus and gluten amount on the risk of coeliac disease autoimmunity in genetically at risk children: the TEDDY study. Gut, 69(8), 1416–1422. https://doi.org/10.1136/gutjnl-2019-319809
Hayes, K. et al. (2020). A Study of Faster-Z Evolution in the Great Tit (Parus major). Genome biology and evolution, 12(3), 210–222. https://doi.org/10.1093/gbe/evaa044
Taavitsainen, S. et al. (2019). Evaluation of Commercial Circulating Tumor DNA Test in Metastatic Prostate Cancer. JCO precision oncology, 3, PO.19.00014. https://doi.org/10.1200/PO.19.00014
Zeng, K. et al. (2019). Methods for Estimating Demography and Detecting Between-Locus Differences in the Effective Population Size and Mutation Rate. Molecular biology and evolution, 36(2), 423–433. https://doi.org/10.1093/molbev/msy212
Barton, H. J. et al. (2019). The Impact of Natural Selection on Short Insertion and Deletion Variation in the Great Tit Genome. Genome biology and evolution, 11(6), 1514–1524. https://doi.org/10.1093/gbe/evz068
Olofsson, J. K. et al. (2019). Population-Specific Selection on Standing Variation Generated by Lateral Gene Transfers in a Grass. Current biology : CB, 29(22), 3921–3927.e5. https://doi.org/10.1016/j.cub.2019.09.023
Gao, Q. et al. (2018). Driver Fusions and Their Implications in the Development and Treatment of Human Cancers. Cell reports, 23(1), 227–238.e3. https://doi.org/10.1016/j.celrep.2018.03.050
Lin, J. et al. (2018). Bioinformatics Assembling and Assessment of Novel Coxsackievirus B1 Genome. Methods in molecular biology (Clifton, N.J.), 1838, 261–272. https://doi.org/10.1007/978-1-4939-8682-8_18
Kaikkonen, E. et al. (2018). ANO7 is associated with aggressive prostate cancer. International journal of cancer, 143(10), 2479–2487. https://doi.org/10.1002/ijc.31746
Barton, H. J. et al. (2018). New Methods for Inferring the Distribution of Fitness Effects for INDELs and SNPs. Molecular biology and evolution, 35(6), 1536–1546. https://doi.org/10.1093/molbev/msy054
Kim, J. M. et al. (2018). A high-density SNP chip for genotyping great tit (Parus major) populations and its application to studying the genetic architecture of exploration behaviour. Molecular ecology resources, 18(4), 877–891. https://doi.org/10.1111/1755-0998.12778
Corcoran, P. et al. (2017). Determinants of the Efficacy of Natural Selection on Coding and Noncoding Variability in Two Passerine Species. Genome biology and evolution, 9(11), 2987–3007. https://doi.org/10.1093/gbe/evx213
Määttä, K. et al. (2016). Whole-exome sequencing of Finnish hereditary breast cancer families. European journal of human genetics : EJHG, 25(1), 85–93. https://doi.org/10.1038/ejhg.2016.141
Pritchard, C. C. et al. (2016). Inherited DNA-Repair Gene Mutations in Men with Metastatic Prostate Cancer. The New England journal of medicine, 375(5), 443–453. https://doi.org/10.1056/NEJMoa1603144
Laitinen, V. H. et al. Germline copy number variation analysis in Finnish families with hereditary prostate cancer. The Prostate, 76(3), 316–324. https://doi.org/10.1002/pros.23123
Hannon, E. et al. (2016). An integrated genetic-epigenetic analysis of schizophrenia: evidence for co-localization of genetic associations and differential DNA methylation. Genome biology, 17(1), 176. https://doi.org/10.1186/s13059-016-1041-x


Contact us

Leave us your email address and a brief description of your needs, and we will contact you to get things moving!

Antti Ylipää, CEO and co-founder, Genevia Technologies Oy, +358 40 747 7672