DNA sequencing data analysis
Understand the effects of genetic variation and mutations with DNA sequencing data analysis.
DNA sequencing comes in many forms. Whole-genome sequencing (WGS), whole-exome sequencing (WES) and targeted sequencing make it possible to study heritable and somatic DNA variants. In addition to NGS data, SNP and CGH arrays can be used to identify genetic polymorphisms and copy-number variants, respectively. Metagenomic whole-genome sequencing of microbial communities allows their composition and functions to be analyzed.
We routinely analyze DNA sequence data to address research questions in both basic biology and biomedical settings. Below we present some typical DNA-sequencing data analyses. If you are interested in learning how we can help you get the most out of your DNA-seq data, leave us a message and we will book you a short call with one of our experts.
Variant analysis
In most cases, DNA sequencing is used to identify and analyze genetic variants. These variants can be small nucleotide substitutions, insertions, deletions, copy-number alterations or structural variants. Furthermore, they may be heritable polymorphisms or somatic mutations.
Variant analysis typically starts with quality control of the raw DNA-sequencing data, followed by alignment of the sequencing reads against a reference genome. Positions where a sample differs from the public reference, or where samples differ from each other, can then be identified computationally.
A crucial part of variant analysis is annotating the detected variants. Annotations such as allele frequencies (both in-sample and in public databases such as gnomAD), predicted effects on protein structure or gene regulation, and predicted pathogenicity allow for flexible selection or ranking of variants for downstream analyses and interpretation.
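The selection and ranking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AnnotatedVariant` fields, the consequence labels, the 0 to 1 pathogenicity score and the 1% frequency cut-off are hypothetical choices, not the output format of any particular annotation tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class AnnotatedVariant:
    variant_id: str                 # hypothetical identifier for illustration
    sample_af: float                # allele frequency observed in the sample
    population_af: Optional[float]  # frequency in a public database; None if unreported
    consequence: str                # predicted effect label (illustrative labels)
    pathogenicity: float            # hypothetical score in [0, 1]; higher = more damaging


def rank_rare_damaging(
    variants: Iterable[AnnotatedVariant],
    max_population_af: float = 0.01,  # assumed rarity cut-off
    consequences: Tuple[str, ...] = ("missense", "stop_gained", "frameshift"),
) -> List[AnnotatedVariant]:
    """Keep rare variants with a selected consequence, most pathogenic first."""
    kept = [
        v for v in variants
        # Variants absent from the public database are treated as rare.
        if (v.population_af is None or v.population_af <= max_population_af)
        and v.consequence in consequences
    ]
    return sorted(kept, key=lambda v: v.pathogenicity, reverse=True)
```

In practice the thresholds and the annotations used for ranking depend on the study design, for example whether the focus is on rare germline variants or on somatic mutations.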
Variant analysis in cancer research often focuses on identifying somatic mutations that accelerate tumorigenesis (driver mutations) or that can be used to diagnose a patient or predict the course of their disease. Learn more about mutation analysis in cancer research.
Tumor evolutionary analysis
Tumor evolution underpins cancer's ability to adapt under selective pressures imposed by therapies. Somatic mutations at a subclonal level can be used to reveal the clonal structure of a tumor and track it through processes such as relapse and metastasis.
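As a rough illustration of how subclonal mutations can be told apart from clonal ones, the sketch below converts a variant allele frequency (VAF) into an estimated cancer cell fraction (CCF), the fraction of tumor cells carrying the mutation. It relies on a commonly used simplified relationship and assumes that tumor purity, local copy number and mutation multiplicity are already known; the function name and the default values are hypothetical, and dedicated clonality tools use probabilistic models that also account for sampling noise.

```python
def cancer_cell_fraction(
    vaf: float,
    purity: float,
    tumor_copy_number: int = 2,
    normal_copy_number: int = 2,
    multiplicity: int = 1,
) -> float:
    """Estimate the fraction of cancer cells carrying a mutation.

    Assumes: expected VAF = purity * multiplicity * CCF
             / (purity * tumor_copy_number + (1 - purity) * normal_copy_number)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    mean_copies = purity * tumor_copy_number + (1 - purity) * normal_copy_number
    ccf = vaf * mean_copies / (purity * multiplicity)
    # Noise can push the point estimate above 1; cap it at the biological maximum.
    return min(ccf, 1.0)
```

For example, in a diploid region of a sample with 80% purity, a VAF of 0.4 corresponds to a CCF of about 1 (a clonal mutation), whereas a VAF of 0.1 corresponds to a CCF of about 0.25 (a subclonal mutation).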
Learn more about tumor clonality analysis.
Genome assembly
For organisms with no reference genomes or highly dynamic genomes, DNA-sequencing data analysis starts with assembling a genome de novo. Genome assembly benefits from deep whole-genome sequencing.
An assembled genome is annotated based on sequence homology, predicted gene sequences and, if available, RNA-sequencing data from the same organism. If annotated genomes exist for closely related species, the annotation can be improved by transferring gene information to the newly assembled genome.
The quality of an assembled genome is assessed using metrics such as N50, L50 and completeness with regard to highly conserved orthologs. A new high-quality genome enables analyses into pan-genomes, population genetics and much more!
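To make the contiguity metrics concrete, here is a minimal Python sketch that computes N50 and L50 from a list of contig lengths. N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly, and L50 is the number of contigs in that set. The function name is hypothetical, and the input is assumed to be plain contig lengths in base pairs.

```python
from typing import Iterable, Tuple


def n50_l50(contig_lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_assembly = sum(lengths) / 2
    covered = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for count, length in enumerate(lengths, start=1):
        covered += length
        if covered >= half_assembly:
            return length, count
    raise AssertionError("unreachable: cumulative sum always reaches half")
```

For contigs of 80, 70, 50, 40, 30 and 20 kb (290 kb in total), the two longest contigs cover 150 kb, so N50 is 70 kb and L50 is 2.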
Metagenomics
Metagenomics offers an unbiased view into the microbial diversity of ecological niches, including samples from host organisms and soil. Using shotgun whole-genome sequencing data, reads are assembled into contigs and assigned to species or operational taxonomic units (OTUs).
Identified species or OTUs are organized into a phylogeny and quantified. The functions brought about by individual genes or multi-gene pathways present in the sequenced community can be identified using public databases.
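One simple way to picture the quantification step is to turn per-taxon read counts into relative abundances and summarize them with a diversity index. The sketch below uses the Shannon index as an example summary; the function names and the input format (a mapping from taxon name to assigned read count) are assumptions for illustration. Real pipelines typically also correct for factors such as genome length and sequencing depth.

```python
import math
from typing import Dict


def relative_abundance(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts into fractions that sum to 1."""
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("total read count must be positive")
    return {taxon: count / total for taxon, count in read_counts.items()}


def shannon_diversity(abundances: Dict[str, float]) -> float:
    """Shannon index H = -sum(p * ln p) over taxa with non-zero abundance."""
    return -sum(p * math.log(p) for p in abundances.values() if p > 0)
```

A community dominated by a single taxon has a Shannon index close to 0, while two equally abundant taxa give ln 2, roughly 0.69.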
Note that 16S amplicon sequencing, a cost-effective alternative to metagenomic sequencing, can be used to identify species and build phylogenies, but it does not allow for high-quality functional analyses.
Population genetics
Genome-wide measurements of individuals sampled from related populations contain rich information on the populations’ structure, genealogy and history. Population genetic analyses of non-model organisms often begin with genome assembly and annotation, and proceed to identifying genetic polymorphisms in the sampled populations. The downstream analyses based on these polymorphisms and their allele frequencies help to study evolutionary phenomena such as speciation and adaptation.
Typical analyses involve principal component analysis, analysis of genetic variation within and between populations to identify loci affected by evolutionary selection, and analyses of population admixtures, phylogeny and demographic histories.
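As an example of comparing genetic variation within and between populations, the sketch below computes a basic per-locus fixation index (FST) for two populations from their allele frequencies, and ranks loci by it. It uses the textbook definition FST = (H_T - H_S) / H_T with equally weighted populations and no sample-size correction, which is a simplifying assumption; the function names and locus labels are hypothetical. Loci with unusually high FST relative to the genome-wide background are candidates for further investigation, not proof of selection.

```python
from typing import Dict, List, Tuple


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2 * p * (1 - p)


def fst_two_populations(p1: float, p2: float) -> float:
    """Simple FST for one biallelic locus from two population allele frequencies."""
    for p in (p1, p2):
        if not 0 <= p <= 1:
            raise ValueError("allele frequencies must be in [0, 1]")
    h_within = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2
    h_total = expected_heterozygosity((p1 + p2) / 2)
    if h_total == 0:
        return 0.0  # locus is monomorphic across both populations
    return (h_total - h_within) / h_total


def rank_loci_by_fst(
    frequencies: Dict[str, Tuple[float, float]]
) -> List[Tuple[str, float]]:
    """Return (locus, FST) pairs sorted from most to least differentiated."""
    scored = {locus: fst_two_populations(p1, p2)
              for locus, (p1, p2) in frequencies.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)
```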
Genome-wide association analysis
Biomedically motivated population-scale genetic analyses aim to identify genes and variants associated with relevant phenotypes or diseases. Apart from the few diseases that are monogenic and strongly heritable, most diseases require large, population-level sample sizes to achieve sufficient statistical power to find associations. Such genome-wide association studies (GWAS) are based on SNP-array or DNA-sequencing data from biobanks or other large repositories.
A GWAS produces summary statistics on the association between each individual variant and the studied disease. In the case of polygenic diseases, individual variants may have very small effect sizes even when the disease is strongly heritable. In such cases, polygenic risk scores (PRS) can be used to sum the effects of a large number of variants, resulting in a combined risk score with potential clinical utility.
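In its simplest form, a polygenic risk score is a weighted sum: each variant's effect size from the GWAS summary statistics is multiplied by the number of effect alleles the individual carries. The sketch below illustrates this with hypothetical variant identifiers and effect sizes; it assumes the effect sizes are per-allele weights (for example log odds ratios) and that missing genotypes are simply skipped, which is only one of several possible conventions. Variant selection, handling of linkage disequilibrium and score normalization are outside its scope.

```python
from typing import Dict


def polygenic_risk_score(
    effect_sizes: Dict[str, float],  # variant ID -> per-allele effect size
    dosages: Dict[str, float],       # variant ID -> effect-allele count (0 to 2)
) -> float:
    """Sum effect size times allele dosage over variants genotyped in the individual."""
    score = 0.0
    for variant_id, beta in effect_sizes.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            continue  # assumption: variants without a genotype are skipped
        score += beta * dosage
    return score
```

For instance, with hypothetical weights of 0.1, -0.2 and 0.05 for three variants and dosages of 2, 1 and 2, the score is 0.2 - 0.2 + 0.1 = 0.1.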
Meet some of our genomics experts
I am a senior computational biologist specializing in cancer bioinformatics and precision oncology. With over 10 years of experience in multi-omics data analysis (WES, RNA-Seq, microarrays, proteomics), I focus on biomarker discovery and biological applications in areas such as cancer, immuno-oncology, cell signaling, and clinical pharmacology.
My scientific interests are strongly oriented toward functional knowledge discovery, such as molecular footprint approaches for inferring regulatory activities, to drive translational cancer research. I am also highly proficient in leveraging publicly available bioinformatics tools and clinical databases to reconstruct within-patient tumor histories to support personalized medicine.
Additionally, I possess a solid foundation in statistics, R programming, and the development of open-source, reproducible reports.
I am specialised in population genetics and evolutionary genetics, with over 8 years’ experience constructing Python and R based workflows for a wide range of genetic datasets.
I have worked on data types including whole-genome re-sequencing, multispecies whole-genome alignments, RNA sequencing and SNP genotyping arrays (high and low density). With these data I have analysed types of variation including SNPs/SNVs, INDELs and CNVs, both in present-day populations using polymorphism data and over evolutionary time using divergence data.
I have worked on a broad range of scientific topics in my career including genome evolution, detecting targets of selection, cancer genetics and viral analysis. Organism-wise, I have worked on data from across the tree of life, including humans, mice, fish, birds, insects, grasses and viruses.
I am a bioinformatician specialised in cancer genomics and genetics, with 10 years of experience analysing omics data in numerous genetic, genomic, transcriptomic and epigenetic studies.
While my research focus has been in cancer, I have also gained experience in a number of other fields, such as immunology, aging and developmental biology. In recent years, I have also applied machine learning methods to harness biomedical data in various clinical applications.
Learn more
Above we introduced some of the computational analyses applied to various types of DNA-seq data. Our team's experience, however, runs much deeper: take a look at our references and publications.
Learn more about DNA-seq data analysis
References and customer cases
- Whole-genome and RNA sequencing analyses of hereditary cancer (University of Turku)
- Pharmacogenomic sequencing analysis (Karolinska Institute)
- Reindeer population genomics (LUKE)
- Bacterial genomics (University of Helsinki)
Selected publications from our customers
- Zhong, M. et al. (2024). A meta-analysis and polygenic score study identifies novel genetic markers for waist-hip ratio in African populations. Obesity (Silver Spring, Md.), 10.1002/oby.24123. Advance online publication. https://doi.org/10.1002/oby.24123
- Karihtala, P. et al. (2024). Mutational signatures and their association with cancer survival and gene expression in multiple cancer types. International journal of cancer, 10.1002/ijc.35148. Advance online publication. https://doi.org/10.1002/ijc.35148
- Adebamowo, S. N. et al. (2024). Genome, HLA and polygenic risk score analyses for prevalent and persistent cervical human papillomavirus (HPV) infections. European journal of human genetics : EJHG, 10.1038/s41431-023-01521-7. Advance online publication. https://doi.org/10.1038/s41431-023-01521-7
- Boyd, S. et al. (2024). NGS of brush cytology samples improves the detection of high-grade dysplasia and cholangiocarcinoma in patients with primary sclerosing cholangitis: A retrospective and prospective study. Hepatology communications, 8(4), e0415. https://doi.org/10.1097/HC9.0000000000000415
- Karihtala, P. et al. (2023). Mutational signatures and their association with survival and gene expression in urological carcinomas. Neoplasia (New York, N.Y.), 44, 100933. Advance online publication. https://doi.org/10.1016/j.neo.2023.100933
- Karihtala, P. et al. (2022). Comparison of the mutational profiles of neuroendocrine breast tumours, invasive ductal carcinomas and pancreatic neuroendocrine carcinomas. Oncogenesis, 11(1), 53. https://doi.org/10.1038/s41389-022-00427-1
- Yuan, O. et al. (2022). A somatic mutation in moesin drives progression into acute myeloid leukemia. Science advances, 8(16), eabm9987. https://doi.org/10.1126/sciadv.abm9987
- Wahlström, G. et al. (2022). The variant rs77559646 associated with aggressive prostate cancer disrupts ANO7 mRNA splicing and protein expression. Human molecular genetics, ddac012. Advance online publication. https://doi.org/10.1093/hmg/ddac012
- Karihtala, P. et al. (2022). Mutational Signatures Associate With Survival in Gastrointestinal Carcinomas. Cancer genomics & proteomics, 19(5), 556–569. https://doi.org/10.21873/cgp.20340
- Ribeiro, R. et al. (2022). Synchronous Epidermodysplasia Verruciformis and Intraepithelial Lesion of the Vulva is Caused by Coinfection with α-HPV and β-HPV Genotypes and Facilitated by Mutations in Cell-Mediated Immunity Genes. Preprint at https://doi.org/10.21203/rs.3.rs-1991512/v1
- Pernaute-Lau, L. et al. (2021). Pharmacogene Sequencing of a Gabonese Population with Severe Plasmodium falciparum Malaria Reveals Multiple Novel Variants with Putative Relevance for Antimalarial Treatment. Antimicrobial agents and chemotherapy, 65(7), e0027521. https://doi.org/10.1128/AAC.00275-21
- Åvall-Jääskeläinen, S. et al. (2021). Genomic Analysis of Staphylococcus aureus Isolates Associated With Peracute Non-gangrenous or Gangrenous Mastitis and Comparison With Other Mastitis-Associated Staphylococcus aureus Isolates. Frontiers in microbiology, 12, 688819. https://doi.org/10.3389/fmicb.2021.688819
- Wullt, B. et al. (2021). Immunomodulation-A Molecular Solution to Treating Patients with Severe Bladder Pain Syndrome?. European urology open science, 31, 49–58. https://doi.org/10.1016/j.euros.2021.07.003
- Gallegos, J. E. et al. (2020). Challenges and opportunities for strain verification by whole-genome sequencing. Scientific reports, 10(1), 5873. https://doi.org/10.1038/s41598-020-62364-6
- Tikkanen, T. et al. (2018). Seshat: A Web service for accurate annotation, validation, and analysis of TP53 variants generated by conventional and next-generation sequencing. Human mutation, 39(7), 925–933. https://doi.org/10.1002/humu.23543
Selected publications from our team
- Fonseca, N. M. et al. (2024). Prediction of plasma ctDNA fraction and prognostic implications of liquid biopsy in advanced prostate cancer. Nature communications, 15(1), 1828. https://doi.org/10.1038/s41467-024-45475-w
- Kallio, H. M. et al. (2024). Sensitive circulating tumor DNA-based residual disease detection in epithelial ovarian cancer. Life science alliance, 7(6), e202402658. https://doi.org/10.26508/lsa.202402658
- Nurminen, A. et al. (2023). Cancer origin tracing and timing in two high-risk prostate cancers using multisample whole genome analysis: prospects for personalized medicine. Genome medicine, 15(1), 82. https://doi.org/10.1186/s13073-023-01242-y
- Ricordel, C. et al. (2023). Genomic characteristics and clinical significance of CD56+ circulating tumor cells in small cell lung cancer. Scientific reports, 13(1), 3626. https://doi.org/10.1038/s41598-023-30536-9
- Kontogianni, G. et al. (2023). A Comprehensive Analysis of Cutaneous Melanoma Patients in Greece Based on Multi-Omic Data. Cancers, 15(3), 815. https://doi.org/10.3390/cancers15030815
- Tielbeek, J. J. et al. (2022). Uncovering the genetic architecture of broad antisocial behavior through a genome-wide association study meta-analysis. Molecular psychiatry, 10.1038/s41380-022-01793-3. Advance online publication. https://doi.org/10.1038/s41380-022-01793-3
- Rautajoki, K. J. et al. (2022). PTPRD and CNTNAP2 as markers of tumor aggressiveness in oligodendrogliomas. Scientific reports, 12(1), 14083. https://doi.org/10.1038/s41598-022-14977-2
- van Heukelum, S. et al. (2021). A central role for anterior cingulate cortex in the control of pathological aggression. Current biology : CB, 31(11), 2321–2333.e5. https://doi.org/10.1016/j.cub.2021.03.062
- Rajamäki, K. et al. (2021). Genetic and Epigenetic Characteristics of Inflammatory Bowel Disease-Associated Colorectal Cancer. Gastroenterology, 161(2), 592–607. https://doi.org/10.1053/j.gastro.2021.04.042
- Vandekerkhove, G. et al. (2021). Plasma ctDNA is a tumor tissue surrogate and enables clinical-genomic stratification of metastatic bladder cancer. Nature communications, 12(1), 184. https://doi.org/10.1038/s41467-020-20493-6
- Cerqueira, J. et al. (2021). Independent and cumulative coeliac disease-susceptibility loci are associated with distinct disease phenotypes. Journal of human genetics, 66(6), 613–623. https://doi.org/10.1038/s10038-020-00888-5
- Yusuf, L. et al. (2020). Noncoding regions underpin avian bill shape diversification at macroevolutionary scales. Genome research, 30(4), 553–565. https://doi.org/10.1101/gr.255752.119
- Lindfors, K. et al. (2020). Metagenomics of the faecal virome indicate a cumulative effect of enterovirus and gluten amount on the risk of coeliac disease autoimmunity in genetically at risk children: the TEDDY study. Gut, 69(8), 1416–1422. https://doi.org/10.1136/gutjnl-2019-319809
- Hayes, K. et al. (2020). A Study of Faster-Z Evolution in the Great Tit (Parus major). Genome biology and evolution, 12(3), 210–222. https://doi.org/10.1093/gbe/evaa044
- Taavitsainen, S. et al. (2019). Evaluation of Commercial Circulating Tumor DNA Test in Metastatic Prostate Cancer. JCO precision oncology, 3, PO.19.00014. https://doi.org/10.1200/PO.19.00014
- Zeng, K. et al. (2019). Methods for Estimating Demography and Detecting Between-Locus Differences in the Effective Population Size and Mutation Rate. Molecular biology and evolution, 36(2), 423–433. https://doi.org/10.1093/molbev/msy212
- Barton, H. J. et al. (2019). The Impact of Natural Selection on Short Insertion and Deletion Variation in the Great Tit Genome. Genome biology and evolution, 11(6), 1514–1524. https://doi.org/10.1093/gbe/evz068
- Olofsson, J. K. et al. (2019). Population-Specific Selection on Standing Variation Generated by Lateral Gene Transfers in a Grass. Current biology : CB, 29(22), 3921–3927.e5. https://doi.org/10.1016/j.cub.2019.09.023
- Gao, Q. et al. (2018). Driver Fusions and Their Implications in the Development and Treatment of Human Cancers. Cell reports, 23(1), 227–238.e3. https://doi.org/10.1016/j.celrep.2018.03.050
- Lin, J. et al. (2018). Bioinformatics Assembling and Assessment of Novel Coxsackievirus B1 Genome. Methods in molecular biology (Clifton, N.J.), 1838, 261–272. https://doi.org/10.1007/978-1-4939-8682-8_18
- Kaikkonen, E. et al. (2018). ANO7 is associated with aggressive prostate cancer. International journal of cancer, 143(10), 2479–2487. https://doi.org/10.1002/ijc.31746
- Barton, H. J. et al. (2018). New Methods for Inferring the Distribution of Fitness Effects for INDELs and SNPs. Molecular biology and evolution, 35(6), 1536–1546. https://doi.org/10.1093/molbev/msy054
- Kim, J. M. et al. (2018). A high-density SNP chip for genotyping great tit (Parus major) populations and its application to studying the genetic architecture of exploration behaviour. Molecular ecology resources, 18(4), 877–891. https://doi.org/10.1111/1755-0998.12778
- Corcoran, P. et al. (2017). Determinants of the Efficacy of Natural Selection on Coding and Noncoding Variability in Two Passerine Species. Genome biology and evolution, 9(11), 2987–3007. https://doi.org/10.1093/gbe/evx213
- Määttä, K. et al. (2016). Whole-exome sequencing of Finnish hereditary breast cancer families. European journal of human genetics : EJHG, 25(1), 85–93. https://doi.org/10.1038/ejhg.2016.141
- Pritchard, C. C. et al. (2016). Inherited DNA-Repair Gene Mutations in Men with Metastatic Prostate Cancer. The New England journal of medicine, 375(5), 443–453. https://doi.org/10.1056/NEJMoa1603144
- Laitinen, V. H. et al. (2016). Germline copy number variation analysis in Finnish families with hereditary prostate cancer. The Prostate, 76(3), 316–324. https://doi.org/10.1002/pros.23123
- Hannon, E. et al. (2016). An integrated genetic-epigenetic analysis of schizophrenia: evidence for co-localization of genetic associations and differential DNA methylation. Genome biology, 17(1), 176. https://doi.org/10.1186/s13059-016-1041-x
Contact us
Leave your email address here with a brief description of your needs, and we will contact you to get things moving forward!