Existing methods for diagnosing cancer, such as histology and imaging for specific markers, tend to vary with the type of disease; the PAM50 test, for example, is used clinically mainly to subtype breast cancers. Our research describes a one-stop cancer classification tool capable of accurately differentiating and identifying 27 types of cancer and their subtypes. Built by applying machine learning to thousands of tissue samples drawn from The Cancer Genome Atlas and verified against both independent databases and patient samples, the classifiers reveal that each cancer type and subtype has a unique and highly precise glycosyltransferase (GT) signature. These signatures can be quickly identified from unlabelled tissue, revealing more than nine times out of ten whether the sample is cancerous, which type of cancer it is and which subtype it is. Our cancer classifiers open a new frontier in using the precision of glycan expression patterns to diagnose cancer because, as we demonstrate, a subset of GT genes is almost universally present as the disease spreads and therefore has a functional role in cancer’s destructiveness. The broad scope of the tool does not compromise accuracy, either: in external testing, our breast cancer classifier far surpassed the accuracy of the most widely used industry test, the PAM50, correctly identifying more than double the number of Luminal A tumor samples. By streamlining diagnosis, the tool could yield significant savings in both time and cost for pan-cancer diagnostics and prognosis. One classifier can even predict the probability of survival for patients diagnosed with glioma by focusing on four GT genes that are strongly related to prognosis. The cancer classifiers described here have great potential to be adapted into a web-based platform for performing various cancer diagnostic and prognostic functions.
Associate Professor, Bioscience