With accuracy above 98%, AI opens a new route to early cancer screening! Is the era of early cancer treatment near?

Cancer has always been one of the most challenging diseases faced by humans, with more than 19 million new cases and 10 million deaths each year. Early detection combined with existing treatments can significantly improve survival rates and treatment outcomes across many cancer types.

Now, artificial intelligence (AI) promises to speed up this process. Doctors may soon be able to use AI to detect and diagnose cancer in their patients, allowing for earlier treatment.

Recently, a research team from Imperial College London and the University of Cambridge trained an artificial intelligence model, EMethylNET, that distinguishes 13 different types of cancer (including breast, liver, lung and prostate cancer) from non-cancerous tissue by examining DNA methylation patterns, with an accuracy of up to 98.2%.

The related paper, titled "Early detection and diagnosis of cancer with interpretable machine learning to uncover cancer-specific DNA methylation patterns", has been published in Biology Methods and Protocols.

According to the paper, the model relies on tissue samples (rather than DNA fragments in the blood) and is still in the experimental stage. Additional training and testing on more diverse biopsy samples are required before it can be further used in clinical practice.

The researchers believe an important contribution of this study is its use of an explainable artificial intelligence model, which can show the logic behind its predictions. The study also explored the inner workings of the model and found that it improves the understanding of the underlying processes of carcinogenesis.

The multi-class model performed well, with an accuracy of over 98%

The evolution of cancer is extremely complex, and the later it is discovered, the harder it is to treat. Early screening is therefore crucial, and it is one of the key problems the medical community has been working hard to solve.

Genetic information is encoded in the pattern of four bases (A, T, G and C) in DNA. Environmental changes outside the cell can cause certain DNA bases to be modified by the addition of methyl groups, a process called "DNA methylation". Each cell has millions of these DNA methylation marks. Researchers have observed changes in these marks during the early development of cancer and believe they may help in early diagnosis. However, identifying DNA methylation signatures specific to different cancer types is like finding a needle in a haystack.

In this work, the research team used machine learning methods to separate cancer-specific changes from normal tissue-specific methylation, using DNA methylation microarray data from 13 cancer types and corresponding normal tissues. The methylome data came from Illumina Infinium arrays and was extracted, cleaned, and processed as described in the paper's methods section. At each CpG position, a pair of methylated and unmethylated probes is used to compute the ratio of the methylated probe intensity to the overall intensity, called the beta value.
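As a rough illustration of the beta value described above, the sketch below computes the methylated fraction from a pair of probe intensities. The function name, the probe IDs, and the intensity numbers are hypothetical and are not taken from the paper. The optional `offset` parameter is an assumption: some array-processing pipelines add a small constant to the denominator to stabilize low-intensity probes, and it is left at zero here to match the plain ratio described in the text.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 0.0) -> float:
    """Share of the total probe intensity that comes from the methylated probe.

    `offset` is an optional stabilizing constant (an assumption, default 0).
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("probe intensities must be non-negative")
    total = methylated + unmethylated + offset
    if total == 0:
        # No signal from either probe: the ratio is undefined.
        return float("nan")
    return methylated / total


# Hypothetical probe intensities: (methylated, unmethylated) per CpG site.
intensities = {
    "cpg_example_1": (900.0, 100.0),  # mostly methylated
    "cpg_example_2": (50.0, 950.0),   # mostly unmethylated
}

betas = {cpg: beta_value(m, u) for cpg, (m, u) in intensities.items()}
for cpg, beta in betas.items():
    print(f"{cpg}: beta = {beta:.2f}")
```

A beta value near 1 indicates a mostly methylated site and a value near 0 a mostly unmethylated one, which is what makes these values usable as model input features.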

They trained and evaluated four different model types: logistic regression, support vector machine (SVM), gradient boosted decision trees (XGBoost), and deep neural network (DNN). For the first three model types, both binary and multi-class models were created.

Because the binary logistic regression models did not perform significantly better than the binary XGBoost models, and the MCC score of multi-class logistic regression was lower than that of multi-class XGBoost and the DNN, the study focused its analysis on XGBoost and the DNN.

When tested on independent datasets, most binary XGBoost models (trained on data from The Cancer Genome Atlas, TCGA) performed well. To create more robust models and improve on these results, the researchers designed EMethylNET, a DNN trained on the features learned by the multi-class XGBoost model, which further improved performance.

Figure | Method Overview

In binary classification of DNA methylation from individual tumor and normal tissues to detect cancer status, five of the 13 models (COAD, KIRC, LUAD, LUSC, and UCEC) achieved perfect test set performance. Across all models, the average accuracy was 98.7% and the average MCC (Matthews correlation coefficient, a performance metric that is robust to severe class imbalance) was 91.9%.
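To see why MCC is reported alongside accuracy, the sketch below computes both from a binary confusion matrix using the standard MCC formula. The sample counts are hypothetical and chosen only to illustrate class imbalance; they are not results from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical imbalanced test set: 95 tumor samples, 5 normal samples.
# A trivial classifier that labels every sample "tumor" gives:
tp, tn, fp, fn = 95, 0, 5, 0

print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

In this made-up case the accuracy looks high even though the classifier has learned nothing about normal tissue, while the MCC falls to zero. That gap is the reason a high MCC is a stronger signal than high accuracy alone when tumor samples far outnumber normal ones.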

They also trained a multi-class XGBoost model on the entire training data. It distinguishes the 13 cancer types from normal samples with high accuracy, reaching an overall accuracy of 98.2% and an overall MCC of 98.0%. The model also performs well on independent, heterogeneous datasets.

Figure | Performance of binary XGBoost models on independent datasets

The literature on cancer detection and classification using methylation-based methods is large and growing. A comparative analysis of EMethylNET with other related studies demonstrates that EMethylNET achieves competitive test set performance among similar works.

Table | Summary of related research

Multiple types of genes are closely associated with cancer-related processes

A key advantage of using interpretable methods such as XGBoost is that the features used for classification can be identified. The research team explored the PCCs from the multi-class XGBoost model (i.e., the input features of EMethylNET). PCCs can be mapped to proximal genes, meaning genes whose gene bodies or promoter regions (defined as a 1500 base pair window upstream of the transcription start site) overlap with the PCCs. Genes obtained by mapping multi-class PCCs to proximal genes are called "multi-class genes."
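The mapping from CpG positions to proximal genes can be sketched as a simple interval-overlap check. This is a minimal illustration, not the authors' pipeline: the gene names and coordinates are hypothetical, coordinates are assumed to be 1-based and inclusive, and the transcription start site is assumed to be the gene start on the "+" strand and the gene end on the "-" strand.

```python
from typing import NamedTuple

PROMOTER_WINDOW = 1500  # base pairs upstream of the transcription start site


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive (assumption)
    end: int     # 1-based, inclusive (assumption)
    strand: str  # "+" or "-"


def promoter_interval(gene: Gene) -> tuple[int, int]:
    """Window of PROMOTER_WINDOW bases upstream of the assumed TSS."""
    if gene.strand == "+":
        return gene.start - PROMOTER_WINDOW, gene.start - 1
    # On the "-" strand, upstream means higher coordinates.
    return gene.end + 1, gene.end + PROMOTER_WINDOW


def is_proximal(gene: Gene, chrom: str, pos: int) -> bool:
    """True if the position lies in the gene body or its promoter window."""
    if gene.chrom != chrom:
        return False
    in_body = gene.start <= pos <= gene.end
    prom_start, prom_end = promoter_interval(gene)
    return in_body or prom_start <= pos <= prom_end


def proximal_genes(cpgs: dict, genes: list) -> dict:
    """Map each CpG ID to the sorted names of its proximal genes."""
    return {
        cpg_id: sorted(g.name for g in genes if is_proximal(g, chrom, pos))
        for cpg_id, (chrom, pos) in cpgs.items()
    }


# Hypothetical annotation and CpG positions, for illustration only.
genes = [
    Gene("GENE_A", "chr1", 10_000, 20_000, "+"),
    Gene("GENE_B", "chr1", 30_000, 40_000, "-"),
]
cpgs = {
    "cpg_1": ("chr1", 9_000),   # in GENE_A promoter window
    "cpg_2": ("chr1", 41_000),  # in GENE_B promoter window
    "cpg_3": ("chr1", 25_000),  # intergenic
    "cpg_4": ("chr1", 15_000),  # in GENE_A gene body
}

mapping = proximal_genes(cpgs, genes)
print(mapping)
```

The linear scan over all genes is fine for a small example; with a full genome annotation, an interval tree or sorted-interval lookup would be the more practical choice.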

They performed functional enrichment analysis on the multi-class gene set and found that it was enriched in genes that contribute to carcinogenesis and transcriptional regulation, as well as in cancer-related pathways and networks. The multi-class gene set included 229 known tumor suppressors and oncogenes and 546 transcriptional regulators, and was involved in a wide range of cancer-related pathways and processes.

In addition, they found that the gene list contained many noncoding RNA genes, mainly lncRNAs. This is consistent with a growing body of studies showing that lncRNAs and other noncoding RNAs play a key role in carcinogenesis.

Compared with related studies, this study is the first to provide an in-depth signature analysis in which CpGs were freely selected by the model without prior feature selection that would add potential bias to the signature analysis results.

Will AI be able to predict cancer soon?

"Through better training on more diverse data and rigorous testing in the clinic, computational approaches like this will ultimately provide AI models that can help doctors with early detection and screening of cancer," said Shamith A Samarajiwa, corresponding author of the paper. "This will lead to better treatment outcomes."

Depending on the availability of training data, this approach can be extended to detect hundreds of cancer types. Future applications include extending this approach to DNA methylation data from cell-free DNA, with the ultimate goal of early detection of multiple types of cancer via liquid biopsy approaches.

In addition, a clear clinical application of this approach is screening for specific cancer types or cancers of unknown origin, and although the current model is not optimized for this purpose, there is room for expansion in this area.

Reference Links:
https://academic.oup.com/biomethods/article/9/1/bpae028/7696058
