Machine learning models for predictive genomics: From variant interpretation to early risk stratification

Authors

Sambasiva Rao Suura
Sr Integration Developer, Natera Inc, Austin

Synopsis

There is a growing focus on translating genomic data into new clinical applications. The increased availability of genomic data allows carriers of genetic mutations to be identified early in life. However, the heterogeneity of genetic backgrounds makes the relationship between genetic variants and phenotypes hard to understand. Machine learning models have proven to be a promising approach for broader, clinically valuable prediction from genomic data. Some studies have focused on variant interpretation for Mendelian diseases: they first subset the input variants and then train a model to classify mutation impact. Because such models help interpret variants of uncertain significance, they could narrow down a significant percentage of the single-nucleotide variants (SNVs) found in most individuals.
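The subset-then-classify workflow for variant interpretation can be sketched as follows, assuming a toy setting. Each variant carries two hypothetical annotations (population allele frequency and a conservation score). The subset step keeps rare coding variants below an illustrative 1% allele-frequency cutoff. A plain logistic regression stands in for whatever classifier a given study uses. The class `Variant` and the functions `subset`, `features`, `train_logistic` and `predict_proba` are hypothetical names, not the chapter's implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    # Hypothetical per-variant annotations; real pipelines use many more.
    allele_freq: float            # population allele frequency
    conservation: float           # conservation score scaled to [0, 1]
    is_coding: bool
    label: Optional[int] = None   # 1 = pathogenic, 0 = benign, None = uncertain (VUS)


def subset(variants: List[Variant], max_af: float = 0.01) -> List[Variant]:
    """Step 1: keep only rare coding variants (an illustrative filter)."""
    return [v for v in variants if v.is_coding and v.allele_freq <= max_af]


def features(v: Variant) -> List[float]:
    """Bias term, conservation, and -log10 allele frequency (rarer -> larger)."""
    return [1.0, v.conservation, -math.log10(v.allele_freq + 1e-6)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Step 2: fit a logistic-regression impact classifier by batch gradient descent."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def predict_proba(w: List[float], v: Variant) -> float:
    """Estimated probability that a variant is pathogenic."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(v))))


if __name__ == "__main__":
    data = [
        Variant(0.0001, 0.95, True, 1),
        Variant(0.0002, 0.90, True, 1),
        Variant(0.0050, 0.20, True, 0),
        Variant(0.0080, 0.10, True, 0),
        Variant(0.2000, 0.50, True, 0),   # common variant: dropped by subset()
        Variant(0.0003, 0.85, True),      # unlabeled variant (VUS) to be scored
    ]
    kept = subset(data)
    train = [v for v in kept if v.label is not None]
    w = train_logistic([features(v) for v in train], [v.label for v in train])
    for v in kept:
        if v.label is None:
            print(f"VUS pathogenicity score: {predict_proba(w, v):.2f}")
```

Variants with known labels train the classifier. The unlabeled variant that survives the subset step plays the role of a variant of uncertain significance and receives a pathogenicity score. Logistic regression is used here only because it fits in a few lines; the studies summarized in this chapter may use other models and far richer features.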

Early risk stratification of chronic diseases has been shown to make intervention more effective, allowing treatment while the condition is still reversible. One approach analyzes gene sets and constructs a proxy SNP set, focusing on genes that are meaningful for a specific disease. Another uses SNP features directly. Studies that derive predictive features directly from SNP data have focused on early-stage diseases, where it is hard to determine the affected genes beforehand. Such approaches can also be useful for various polygenic diseases, complex traits, and common disease susceptibility.

Early risk stratification studies have been conducted primarily on retrospectively collected case-control data using prevalent cases, and most report only the area under the receiver operating characteristic curve (AUC). To apply these models in real-world clinical practice, prospective cohort studies are desirable: ideally using cohort consortium data from multiple centers or countries, examining incident cases, and reporting more comprehensive evaluations. With the trend toward preventive precision medicine, such prediction models could become prerequisites for participation in chronic disease prevention programs or for initiating preventive medication. In this direction, some studies have started to apply drug repurposing to at-risk individuals.
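Using SNP features directly and reporting the AUC can be illustrated with a minimal sketch. It assumes an additive risk score (a weighted sum of effect-allele dosages, in the style of a polygenic risk score) and the rank-based, Mann-Whitney form of the AUC. The per-SNP weights, the genotypes, and the function names `risk_score` and `auc` are hypothetical; this is not the chapter's model.

```python
from itertools import product
from typing import Sequence


def risk_score(dosages: Sequence[int], weights: Sequence[float]) -> float:
    """Additive SNP risk score: effect-allele dosage (0, 1 or 2) times a per-SNP weight, summed."""
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen case scores higher than a
    randomly chosen control (Mann-Whitney formulation; ties count as one half)."""
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical per-SNP effect weights (e.g. log odds ratios) for three SNPs.
    weights = [0.4, 0.2, 0.1]
    cases = [[2, 1, 0], [1, 1, 1], [2, 2, 1]]
    controls = [[0, 1, 0], [1, 0, 0], [0, 0, 2], [1, 1, 1]]

    case_scores = [risk_score(g, weights) for g in cases]
    control_scores = [risk_score(g, weights) for g in controls]
    print(f"AUC = {auc(case_scores, control_scores):.3f}")  # AUC = 0.958
```

With this toy case-control data the AUC is 11.5/12: one case and one control share the same genotype, tie on score, and count as half a win. A single AUC computed this way on prevalent cases reflects exactly the retrospective evaluation described above; it says nothing about calibration or about performance on incident cases in a prospective cohort.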

Published

13 April 2025

How to Cite

Suura, S. R. (2025). Machine learning models for predictive genomics: From variant interpretation to early risk stratification. In Integrating Artificial Intelligence, Machine Learning, and Big Data with Genetic Testing and Genomic Medicine to Enable Earlier, Personalized Health Interventions (pp. 37-52). Deep Science Publishing. https://doi.org/10.70593/978-93-49307-76-6_3