The recognition rate is close to 90%! AI robot: Tumor, where can you hide?

Written by: Tian Xiaoting

Cancer is now one of the leading causes of death worldwide, killing millions of people every year. According to the World Health Organization, one-third of cancers can be cured through early detection and early treatment.

However, cancer detection has always been a major challenge in medicine, especially in pathological analysis. Accurately identifying and diagnosing tumors is crucial to patient treatment, but traditional pathological examination relies heavily on the experience and expertise of individual pathologists.

With the development of large models such as GPT-4, research on using artificial intelligence (AI) to assist pathological diagnosis has gradually emerged, but many AI systems still suffer from insufficient performance and poor interactivity in practical applications.

Recently, a research team from Harvard Medical School and its collaborators developed PathChat, a general-purpose vision-language AI assistant for human pathology. The system correctly identifies diseases from biopsy slides in nearly 90% of cases, outperforming both general-purpose AI models such as GPT-4V and the specialized medical models currently on the market.

The related research paper, titled “A Multimodal Generative AI Copilot for Human Pathology”, has been published in the scientific journal Nature.

Notably, the technology can not only identify tumors but also interact with users, providing new tools and perspectives for pathology diagnosis and research.

PathChat: Multimodal pathology detection AI assistant

Over the years, computational pathology has made great progress in areas such as the analysis of pathological morphology data and molecular testing data. This subfield, formed at the intersection of pathology with AI and computer vision, is gradually becoming a research hotspot in medical image analysis.

Computational pathology applies image processing and AI to histopathological images, building models that make a preliminary assessment of tissue morphology so that automatic image analysis can support auxiliary diagnosis, quantitative evaluation, and decision-making.

At present, with the explosive growth of generative AI technologies represented by ChatGPT, multimodal large language models (MLLMs) are increasingly used in computational pathology research and pathology clinical practice. However, in the highly specialized subfield of anatomical pathology, research on building a general, multimodal AI assistant for pathology is still in its infancy.

In this work, the research team designed PathChat, a multimodal generative AI assistant built specifically for human pathology. They started from UNI, a state-of-the-art vision-only encoder pre-trained through self-supervised learning on more than 100 million tissue image patches from more than 1 million slides, and combined it with a large language model to produce an MLLM that can reason over both visual and natural language inputs. PathChat was then obtained by fine-tuning this model on a dataset of more than 450,000 instructions.

Figure | Instruction fine-tuning dataset and PathChat construction. (Source: the paper)

The study found that PathChat can not only handle multimodal input, but also provide accurate responses to complex pathology-related queries, correctly identifying diseases from biopsy slides in nearly 90% of cases.

Surpassing GPT-4V, with an accuracy rate of nearly 90%

To test the detection performance of PathChat, the research team compared PathChat with the open source model LLaVA, LLaVA-Med customized for the biomedical field, and GPT-4V.

They designed a benchmark, PathQABench, and used it to compare PathChat with LLaVA, LLaVA-Med, and GPT-4V on pathology cases drawn from different organ sites and practices.

Figure | Multiple-choice evaluation of PathChat. (Source: the paper)

The results showed that, without clinical context, PathChat's diagnostic accuracy was significantly better than that of LLaVA 1.5 and LLaVA-Med. When given images only, PathChat's accuracy across all combined benchmarks was 78.1%, which is 52.4 percentage points higher than LLaVA 1.5 and 63.8 percentage points higher than LLaVA-Med.

With clinical context provided, PathChat's accuracy improved further to 89.5%, which is 39.0 percentage points higher than LLaVA 1.5 and 60.9 percentage points higher than LLaVA-Med.

These comparisons indicate that PathChat derives much of its predictive power from the visual features of the image rather than relying solely on clinical context. When non-visual information is supplied in ordinary natural language, it can flexibly combine the two kinds of input to diagnose histological images more accurately.

To objectively evaluate how accurately each model answers open-ended questions, the research team recruited seven pathologists to form an evaluation panel. The panel compared the responses of the four models to 260 open-ended questions.

Figure | Open-ended response evaluation of PathChat and reader study by a panel of seven pathologists. (Source: the paper)

Finally, on the open-ended questions where the seven experts were able to reach a consensus, PathChat's overall accuracy was 78.7%, which is 26.4, 48.9, and 48.1 percentage points higher than GPT-4V, LLaVA 1.5, and LLaVA-Med, respectively. Overall, PathChat outperformed the other three models.
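The figures quoted above give PathChat's accuracy and its lead over each baseline, but not the baselines' own scores. As a rough sketch, the short Python snippet below recovers the implied baseline accuracies by subtraction. It assumes that every reported gap is an absolute difference in percentage points, which the article does not state explicitly; the names `REPORTED` and `implied_baselines` are illustrative only.

```python
# PathChat accuracy (%) and its reported lead over each baseline,
# copied from the figures quoted in this article.
# ASSUMPTION: each lead is an absolute gap in percentage points.
REPORTED = {
    "multiple-choice, image only": (
        78.1, {"LLaVA 1.5": 52.4, "LLaVA-Med": 63.8}),
    "multiple-choice, with clinical context": (
        89.5, {"LLaVA 1.5": 39.0, "LLaVA-Med": 60.9}),
    "open-ended, consensus questions": (
        78.7, {"GPT-4V": 26.4, "LLaVA 1.5": 48.9, "LLaVA-Med": 48.1}),
}


def implied_baselines(pathchat_acc, gaps):
    """Return the baseline accuracy implied by each reported gap."""
    return {model: round(pathchat_acc - gap, 1) for model, gap in gaps.items()}


if __name__ == "__main__":
    for setting, (acc, gaps) in REPORTED.items():
        print(f"{setting}: PathChat {acc}%")
        for model, baseline in implied_baselines(acc, gaps).items():
            print(f"  {model}: about {baseline}% (derived, not reported)")
```

Under that assumption, LLaVA 1.5 would score about 25.7% on images alone and about 50.5% with clinical context, so the added context would help the baseline as well while still leaving it far behind PathChat. These derived values are only a reading aid and should be checked against the paper.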

The researchers said that PathChat can analyze and describe subtle morphological details in pathology images. Beyond image input, it can also answer questions that require background knowledge in pathology and general biomedicine, and it is expected to become an important auxiliary tool for pathologists and researchers.

Although PathChat performed well in experiments, it still faces challenges in practical use. Open questions include how to ensure that the model recognizes invalid queries and avoids erroneous outputs, and how to keep it current with the latest medical knowledge. In addition, PathChat's training data comes mainly from historical records, so the model may reflect "past scientific consensus" rather than the latest information.

The researchers said that future work may further extend PathChat's capabilities, including support for entire gigapixel whole-slide images (WSIs) or multiple WSIs as input, and support for more specific tasks such as accurately counting or locating objects. Integrating PathChat with tools such as digital slide viewers or electronic medical records could also make it more practical in clinical settings.

Recently, the multimodal generative artificial intelligence model PathChat 2 was released. It can reason about pathology images and texts, accept alternating inputs of multiple high-resolution images and texts in an interactive slide viewer, and thus provide a more comprehensive assessment of each consultation case.

Compared with PathChat 1, it performs significantly better in differential diagnosis and morphological description, and is also better at instruction following and at tasks such as open-ended question answering and report summarization.

References:

https://www.nature.com/articles/s41586-024-07618-3

https://www.modella.ai/intro.html
