Precision Oncology: An Ensembled Machine Learning Approach to Identify a Candidate mRNA Panel for Stratification of Patients with Breast Cancer


OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY, 2022 (Peer-Reviewed Journal)

  • Publication Type: Article
  • Publication Date: 2022
  • DOI Number: 10.1089/omi.2022.0089
  • Journal Indexes: Science Citation Index Expanded, Scopus, Academic Search Premier, BIOSIS, CAB Abstracts, Chemical Abstracts Core, EMBASE, MEDLINE, Veterinary Science Database
  • Keywords: precision oncology, machine learning, ensembled learning, RNA-seq, breast cancer, algorithms


The rise of machine learning (ML) has recently buttressed the efforts for big data-driven precision oncology. This study used ensemble ML for precision oncology in breast cancer, which is one of the most common malignancies worldwide with marked heterogeneity of the underlying molecular mechanisms. We analyzed clinical and RNA-seq data from The Cancer Genome Atlas (TCGA) (844 patients with breast cancer and 113 healthy individuals) and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) (1784 patients with breast cancer and 202 healthy individuals). We evaluated six algorithms in the context of ensemble modeling and identified a candidate mRNA diagnostic panel that can differentiate patients from healthy controls and stratify breast cancer into molecular subtypes. The ensemble model included 50 mRNAs and displayed 82.55% accuracy, 79.22% specificity, and 84.55% sensitivity in stratifying patients into molecular subtypes in the TCGA cohort. Its performance was markedly higher, however, in distinguishing the basal, LumB, and Her2+ breast cancer subtypes from healthy individuals. In overall survival analysis, the mRNA panel showed a hazard ratio of 2.25 (p = 5 × 10^-7) for breast cancer and was significantly associated with molecular pathways related to carcinogenesis. In conclusion, an ensemble ML approach, including 50 mRNAs, was able to stratify patients with different breast cancer subtypes and differentiate them from healthy individuals. Future prospective studies in large samples with deep phenotyping can help advance the ensemble ML approaches in breast cancer. Advanced ML methods such as ensemble learning are timely additions to the precision oncology research toolbox.
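As an illustration of the general idea only, the sketch below combines label predictions from several classifiers by simple majority (hard) voting and then computes accuracy, sensitivity, and specificity for a tumor-versus-healthy task. The abstract does not state which six algorithms were used or how their outputs were combined, so the voting rule, the function names `majority_vote` and `binary_metrics`, and the toy labels are all assumptions made for this example, not the study's method or data.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample (hard voting).

    Assumption: every model predicts a label for the same samples in the
    same order. With an odd number of models and two classes, no ties occur.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined


def binary_metrics(y_true, y_pred, positive="tumor"):
    """Accuracy, sensitivity, and specificity from true and predicted labels.

    Assumes both classes are present in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Toy labels, invented for illustration (not drawn from TCGA or METABRIC).
y_true = ["tumor", "tumor", "tumor", "tumor", "healthy", "healthy"]
model_a = ["tumor", "tumor", "healthy", "tumor", "healthy", "tumor"]
model_b = ["tumor", "healthy", "tumor", "tumor", "healthy", "healthy"]
model_c = ["tumor", "tumor", "healthy", "tumor", "tumor", "healthy"]

ensemble = majority_vote([model_a, model_b, model_c])
metrics = binary_metrics(y_true, ensemble)
print(ensemble)
print(metrics)
```

In this toy case each individual model makes two errors while the voted prediction makes one, which is the usual motivation for ensembling. Subtype stratification is a multiclass problem, where sensitivity and specificity would typically be computed per subtype and then averaged.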