Predictive analytics for cancer


A university research group was developing a clinical nomogram to predict the presence of nodal metastases in breast cancer, with the aim of avoiding over-treatment. They wanted to incorporate molecular data from the primary tumor into the model, but building a model of that complexity required solid machine-learning expertise.


We used genome-wide gene expression signatures and 11 clinical parameters from more than 4,000 primary breast cancers to assess five predictive models: k-nearest neighbour, naive Bayes, random forest, support vector machine, and logistic regression.
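
To illustrate what such a comparison can look like, here is a minimal sketch using scikit-learn. It is not the project's actual pipeline: the data are synthetic (generated with `make_classification`), and the choice of 5-fold stratified cross-validation, ROC AUC as the metric, default hyperparameters, and feature scaling for three of the models are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real cohort (assumption): 300 samples, 40 features,
# binary label playing the role of "nodal metastasis present / absent".
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=8, random_state=0
)

# The five model families named in the text, with default hyperparameters.
# Distance- and margin-based models are scaled first (an assumption of this sketch).
models = {
    "k-nearest neighbour": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Same stratified folds for every model so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_by_model = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}

# Print models from highest to lowest mean cross-validated ROC AUC.
for name, auc in sorted(auc_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

The ranking on synthetic data says nothing about the real cohort; the point is only that all five models are scored on identical folds with one metric.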


The random forest classifier applied to both clinical and gene expression data was the best-performing model. The informative genes used in the model provided new directions for the team’s breast cancer research.
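
One common way to surface informative features from a random forest is to rank them by importance. The sketch below assumes scikit-learn's impurity-based `feature_importances_`; the text does not say how the informative genes were actually identified. The data are synthetic, the feature names (`clinical_0`, `gene_0`, ...) are hypothetical, and 50 genes is a toy number standing in for a genome-wide signature.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 11 clinical parameters as in the text; 50 genes is a toy size (assumption).
n_clinical, n_genes = 11, 50
X, y = make_classification(
    n_samples=300,
    n_features=n_clinical + n_genes,
    n_informative=10,
    random_state=0,
)

# Hypothetical column names for the combined clinical + expression matrix.
feature_names = [f"clinical_{i}" for i in range(n_clinical)] + [
    f"gene_{i}" for i in range(n_genes)
]

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Sort features from most to least important and keep the top ten.
order = np.argsort(forest.feature_importances_)[::-1]
top_features = [
    (feature_names[i], float(forest.feature_importances_[i])) for i in order[:10]
]

for name, importance in top_features:
    print(f"{name}: {importance:.4f}")
```

Impurity-based importances can be biased toward features with many distinct values, so permutation importance on held-out data (`sklearn.inspection.permutation_importance`) is a reasonable cross-check.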
