ChatGPT Vs Multidisciplinary Tumor Board: Treatment Recommendations for Patients With Gynecologic Cancers

Grace Taylor

The use of artificial intelligence (AI) tools, such as Chat Generative Pre-Trained Transformer (ChatGPT), in health care has the potential to assist providers with decision-making. As such, Eric Rios-Doria, MD, University of Washington, and colleagues aimed to compare treatment recommendations for patients with gynecologic cancers made by ChatGPT vs a multidisciplinary tumor board (MTB). Their findings were presented at the 2024 Society of Gynecologic Oncology (SGO) Annual Meeting on Women’s Cancer.

An academic institution's multidisciplinary tumor board met weekly and reviewed 114 consecutive clinical cases from June 1, 2023, through September 30, 2023. The study's researchers gave ChatGPT (version 3.5) prompts for the same patients containing their clinical information, surgical outcomes, pathologic findings, and available molecular results. They then directed the AI to provide "the single best treatment option for the clinical scenario" from the following options: chemotherapy, radiation, combined chemoradiation, targeted/immunotherapy, surgery, surveillance, clinical trial, other, and undecided. ChatGPT's output was then compared with the MTB's treatment selection.

Of the 114 cases reviewed, the gynecologic cancer sites consisted of 42.1% endometrial/uterine (n = 48), 37.7% ovarian (n = 43), 7.9% other/not defined (n = 9), 7.0% vulvar (n = 8), and 5.3% cervical (n = 6). The MTB’s most common recommendation for all disease sites was chemotherapy alone at 35.1% (n = 40), followed by surveillance at 18.4% (n = 21) and other at 10.5% (n = 12). For 36 of the clinical cases, ChatGPT provided more than the prompted single recommendation, and within this group of treatment selections, 72.2% (n = 26) matched the MTB recommendation.

ChatGPT's decisions agreed with the MTB's treatment decisions in 45.6% (95% CI 0.25-0.45, P < .001) of all cases. Surveillance was the recommendation ChatGPT matched most often, concurring in 66.7% (n = 14/21) of the cases in which the MTB recommended it.

When the data were stratified by tumor location, interobserver agreement was highest for cervical sites (n = 6) at 83.3%, with a kappa of 0.76 (95% CI 0.40-1.1, P < .001). Agreement was lowest for the other/not defined group at 22.2%, with a kappa of 0.13 (95% CI −0.12 to 0.38, P = .2). When the MTB was undecided, ChatGPT chose a single decision in 86% (n = 6/7) of cases, providing an option the MTB had discussed but not selected.
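The agreement measures reported above (percent agreement and Cohen's kappa) are straightforward to compute. The sketch below is purely illustrative: it uses made-up recommendation labels, not the study's data, and implements the two statistics directly from their definitions.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of cases in which the two raters chose the same option."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance concordance."""
    n = len(a)
    po = percent_agreement(a, b)                      # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Expected agreement if the two raters chose independently
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical toy data, not the study's cases
mtb = ["chemo", "surveillance", "chemo", "radiation"]
gpt = ["chemo", "surveillance", "radiation", "radiation"]
print(percent_agreement(mtb, gpt))        # 0.75
print(round(cohens_kappa(mtb, gpt), 2))   # 0.64
```

Kappa is lower than raw agreement because some concordance is expected by chance alone; a kappa near 0, as in the study's other/not defined group, indicates agreement little better than chance.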

The study found that the majority of ChatGPT's treatment recommendations did not agree with the final academic MTB decisions. The authors suggest that, given this low level of concordance and the lack of treatment detail in ChatGPT's recommendations, human evaluation remains necessary for thorough assessment of patient presentations.


Source: Rios-Doria E, Wang JY, Rodriguez I, et al. Artificial intelligence powered insights: Assessment of ChatGPT's treatment recommendations in gynecologic oncology. Presented at the 2024 Society of Gynecologic Oncology (SGO) Annual Meeting on Women’s Cancer; March 16-March 18, 2024. San Diego, California.