Predictive analytics for cancer

Learn how a research group at Lund University built a state-of-the-art predictive model using gene expression and clinical data from over 4000 patient samples.

Problem

A university research group was developing a clinical nomogram to predict the presence of nodal metastases in breast cancer, with the aim of avoiding over-treatment. They wanted to incorporate molecular data from the primary tumor into the model, but building a model of that complexity called for solid machine-learning expertise.

Solution

We used genome-wide gene expression signatures and 11 clinical parameters from over 4000 primary breast cancers to assess five predictive models: k-nearest neighbour, naive Bayes, random forest, support vector machine and logistic regression.
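A comparison of this kind can be sketched as below. This is a minimal illustration, not the pipeline used in the study: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the patient samples, and picks 5-fold cross-validated ROC AUC as the metric. The feature counts, hyperparameters and metric are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 300 samples, 40 features,
# binary label playing the role of "nodal metastasis present / absent".
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=8, random_state=0
)

# The five model families named in the text, with default-ish settings.
# Distance- and margin-based models get feature scaling in a pipeline.
models = {
    "k-nearest neighbour": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Mean cross-validated ROC AUC per model.
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean AUC = {auc:.3f}")
```

The ranking on synthetic data says nothing about the ranking on real tumor data. The sketch only shows how the five models can be put through the same evaluation so that their scores are comparable.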

Outcome

We found the random forest classifier applied to both clinical and gene expression data to be the best-performing model. The informative genes used in the model provided new directions for the team’s breast cancer research.
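One common way to surface informative features from a random forest is to rank them by the model's importance scores. The sketch below assumes scikit-learn and synthetic data. The `gene_*` names are hypothetical placeholders, and the source does not say how the informative genes were actually identified.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption); the column names are placeholders.
X, y = make_classification(
    n_samples=300, n_features=30, n_informative=5, n_redundant=0, random_state=0
)
feature_names = [f"gene_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based importances: one non-negative score per feature, summing to 1.
importances = forest.feature_importances_
order = np.argsort(importances)[::-1]  # indices from most to least important

top = [(feature_names[i], float(importances[i])) for i in order[:5]]
for name, score in top:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can be biased toward features with many distinct values, so permutation importance on held-out data (`sklearn.inspection.permutation_importance`) is a reasonable cross-check.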

Do you want to learn more about this case? Get in touch with our bioinformatician using the contact form below.