Letter 2 regarding “Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma”

Article information

Clin Mol Hepatol. 2024;30(1):113-117
Publication date (electronic) : 2023 November 10
doi : https://doi.org/10.3350/cmh.2023.0440
1Department of Endocrinology and Metabolic Hepatology, The Affiliated Hospital of Qingdao University, Qingdao, China
2Department of Gastroenterology and Hepatology, Shanghai East Hospital, Tongji University, Shanghai, China
3Department of Gastroenterology and Hepatology, The Affiliated Hospital of Qingdao University, Qingdao, China
4Shandong Provincial Key Laboratory of Metabolic Diseases and Qingdao Key Laboratory of Gout, the Affiliated Hospital of Qingdao University, Qingdao, China
5Department of Infectious Disease and Hepatology, The Affiliated Hospital of Qingdao University, Qingdao, China
6Beijing Key Laboratory of Ophthalmology and Visual Sciences, Beijing Tongren Eye Center, Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital Medical University, Beijing, China
7Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
8Yong Loo Lin School of Medicine, National University of Singapore, Singapore
9Singapore Eye Research Institute, Singapore National Eye Center, Singapore
10Department of Computer Science and Engineering, Shanghai JiaoTong University, Shanghai, China
11Tsinghua Medicine, Tsinghua University, Beijing, China
12Department of Internal Medicine, Beijing Tsinghua Changgung Hospital, Beijing, China
Corresponding authors: Tien Yin Wong, Tsinghua Medicine, Tsinghua University, Haidian District, Beijing, 100084, China. Tel: +86-10-62782835, Fax: +86-10-62782835, E-mail: wongtienyin@tsinghua.edu.cn
Hongwei Ji, Tsinghua Medicine, Tsinghua University, Haidian District, Beijing, 100084, China. Tel: +86-10-62782835, Fax: +86-10-62782835, E-mail: hongweijicn@gmail.com
*Contributed equally.
#LLM-Liver Investigators.
Editor: Ji Won Han, Catholic University of Korea, Korea
Received 2023 October 29; Revised 2023 November 8; Accepted 2023 November 10.

Dear Editor,

We read with great interest the recently published study analyzing the performance of ChatGPT in the management of cirrhosis and hepatocellular carcinoma [1]. In addition to these advanced liver diseases, steatotic liver disease (SLD) also represents a considerable burden on global health, as it affects one-third of the worldwide population [2]. SLD requires long-term self-management and continuous support because of its slow progression, the emphasis on lifestyle changes, and the need for regular patient-physician interactions. Therefore, for patients diagnosed with SLD, education plays a pivotal role in understanding, managing, and possibly reversing their condition. In our evolving digital era, large language models (LLMs), which are generative AI systems trained on vast volumes of data and capable of producing human-like textual responses, have emerged as promising aids for patient education [3], particularly in facilitating interactions through natural language dialogues [4]. However, given that the efficacy of LLMs in advancing SLD patient education might vary, it is important to compare their performance. Therefore, we conducted a comparative evaluation study to assess the performance of five leading LLMs in responding to SLD-related queries.

Our study was performed between September 8 and 28, 2023. We curated 30 common SLD-related queries spanning risk factors, testing and diagnosis, treatment, follow-up, and prognosis, drawing on guideline-based topics and our clinical experience (Table 1) [5,6]. Each query was posed as a separate, independent prompt to five LLMs: ChatGPT-3.5, ChatGPT-4, Google Bard, Meta Llama2 and Anthropic Claude2, which yielded a total of 30 responses per LLM chatbot. The generated responses were then randomly ordered within each set of questions and stripped of revealing information (e.g., statements such as “I’m not a doctor” from ChatGPT) to blind reviewers to which LLM had produced each response. Three attending-level physicians independently graded the responses as either “appropriate” or “inappropriate” over five rounds, each on a separate day, with an overnight washout interval in between to mitigate memory bias (Supplementary Fig. 1). Specifically, responses were graded as “appropriate” when they were free from errors and “inappropriate” when they contained potential factual errors that could harm or mislead the average patient. The final grade for each chatbot response was determined by majority consensus, that is, the grade assigned by at least two of the three expert graders.
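
The blinding and majority-consensus steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the list of revealing phrases, and the fixed random seed are all assumptions made for illustration, and simple phrase removal is a stand-in for whatever manual editing the investigators performed.

```python
import random
from collections import Counter

# Hypothetical list of phrases that could reveal which chatbot wrote a response.
REVEALING_PHRASES = ["I'm not a doctor"]


def blind(responses, seed=0):
    """Shuffle one question's responses and strip revealing phrases.

    `responses` maps a model name to its answer text. Returns the blinded
    texts and a key (withheld from graders) naming the source of each text.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    items = list(responses.items())
    rng.shuffle(items)
    texts, key = [], []
    for model, text in items:
        for phrase in REVEALING_PHRASES:
            text = text.replace(phrase, "")
        texts.append(text.strip())
        key.append(model)
    return texts, key


def majority_grade(grades):
    """Return the label given by most graders (assumes an odd-sized panel
    and two possible labels, so ties cannot occur)."""
    label, _count = Counter(grades).most_common(1)[0]
    return label
```

With three graders, `majority_grade(["appropriate", "inappropriate", "appropriate"])` yields "appropriate", which corresponds to a count of at least 2 in Table 1.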

We assessed the performance of the five LLMs in responding to SLD-related queries. As shown in Table 1, ChatGPT-4 provided 29 of 30 (96.7%) appropriate responses, followed by Bard and Llama2 with 27 of 30 (90.0%) each, and ChatGPT-3.5 and Claude2 with 24 of 30 (80.0%) each (chi-square test: χ2=6.17, P=0.18). A notable area of concern was that the models frequently treated fatty liver disease as synonymous with nonalcoholic fatty liver disease (NAFLD). This oversimplification can lead to inaccuracies. For example, ChatGPT-3.5 replied to the question “Are there different stages of fatty liver disease, and how do they differ?” with the following response: “Yes, there are different stages of fatty liver disease, which is also known as nonalcoholic fatty liver disease (NAFLD). … The stages of NAFLD are typically categorized as follows: 1. Simple Steatosis (Fatty Liver): … 2. Nonalcoholic Steatohepatitis (NASH): … 3. Fibrosis: … 4. Cirrhosis: …”
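
The comparison across chatbots can be re-derived from the appropriate-response counts in Table 1. The letter does not state which chi-square variant was used, so the sketch below, which is an illustration and not the authors' analysis code, computes both the Pearson and the likelihood-ratio statistics with 4 degrees of freedom. On these counts the likelihood-ratio statistic comes to about 6.17, in line with the reported value, whereas the Pearson statistic is about 5.66. Function and variable names are assumptions.

```python
import math

# Appropriate-response counts out of 30 queries per chatbot (Table 1).
APPROPRIATE = {
    "ChatGPT-3.5": 24,
    "ChatGPT-4": 29,
    "Bard": 27,
    "Llama2": 27,
    "Claude2": 24,
}
N_QUERIES = 30


def chi_square_stats(appropriate, n):
    """Return (Pearson, likelihood-ratio) chi-square for a k x 2 table
    in which every group has the same size n."""
    k = len(appropriate)
    exp_ok = sum(appropriate.values()) / k  # expected appropriate per group
    exp_bad = n - exp_ok                    # expected inappropriate per group
    pearson = 0.0
    g = 0.0
    for ok in appropriate.values():
        for obs, exp in ((ok, exp_ok), (n - ok, exp_bad)):
            pearson += (obs - exp) ** 2 / exp
            if obs > 0:  # a zero cell contributes nothing to G
                g += 2 * obs * math.log(obs / exp)
    return pearson, g


def p_value_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of
    freedom, using the closed form exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)


pearson, g = chi_square_stats(APPROPRIATE, N_QUERIES)
print(f"Pearson chi2 = {pearson:.2f}, P = {p_value_df4(pearson):.3f}")
print(f"Likelihood-ratio chi2 = {g:.2f}, P = {p_value_df4(g):.3f}")
```

Either variant gives a P value well above 0.05, so the conclusion that the between-model differences did not reach statistical significance does not depend on the choice.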

This evaluation study revealed that, among five state-of-the-art LLMs, ChatGPT-4 generated largely appropriate responses to patient queries regarding SLD, with an appropriateness rate of 96.7%. The other LLMs provided 80% to 90% appropriate responses. Health literacy, commonly defined as the degree to which individuals have the skills and abilities to obtain, process, and utilize health-related information, has emerged as a critical priority in reducing inequities among patients, including those with SLD [7,8]. Our findings underscore the varied potential of LLM chatbots to provide professional yet patient-friendly health literacy guidance to SLD patients [3]. Whereas prior investigations predominantly focused on ChatGPT-3.5 [1], our study assessed several popular LLMs, namely ChatGPT-3.5, ChatGPT-4, Bard, Llama2 and Claude2, and specifically evaluated their proficiency in addressing typical SLD-related patient queries. Notably, one in five responses from ChatGPT-3.5 and Claude2 was inappropriate, highlighting the need for further iterations and probably domain-specific fine-tuning. Although the exact parameters of ChatGPT-4 remain undisclosed, its performance may result from its large parameter set, extensive user feedback, advanced reasoning abilities, and the integration of insights from previous models into the system [9]. Strengths of this study include a robust design with proper randomization, washout periods and a majority consensus grading process. However, there are also limitations. The sample queries may represent only a small proportion of real-world scenarios. In addition, as the field of LLMs evolves at an unprecedented speed, future research is needed to confirm whether LLMs are adapting to new nomenclature, such as metabolic dysfunction-associated steatotic liver disease (MASLD).
Generative AI with LLMs, especially ChatGPT-4, may offer further valuable insights into opportunities for patient education about SLD.

Notes

Authors’ contribution

Acquisition of data: Yiwen Zhang, Hongwei Ji, Liwei Wu, Zepeng Mu. Analysis and interpretation of data: Hongwei Ji. Drafting of the manuscript: All authors. Critical revision of the manuscript for important intellectual content: All authors. Statistical analysis: Yiwen Zhang, Hongwei Ji. Obtained funding: Hongwei Ji. Study supervision: Hongwei Ji.

Conflicts of Interest

The authors have no conflicts to disclose.

Acknowledgements

This study was funded in part by the National Key R & D Program of China (2022YFC2502800), National Natural Science Foundation of China (82103908), the Shandong Provincial Natural Science Foundation (ZR2021QH014), the Shuimu Scholar Program of Tsinghua University, and National Postdoctoral Innovative Talent Support Program (BX20230189). The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Abbreviations

SLD, steatotic liver disease; LLMs, large language models; NAFLD, nonalcoholic fatty liver disease; MASLD, metabolic dysfunction-associated steatotic liver disease

SUPPLEMENTAL MATERIAL

Supplementary material is available at the Clinical and Molecular Hepatology website (http://www.e-cmh.org).

Supplementary Figure 1. Evaluation Process for State-of-the-Art Online Large Language Models in Responding to Steatotic Liver Disease-Related Queries. This figure illustrates the assessment workflow for evaluating the responses generated by LLMs to 30 steatotic liver disease (SLD)-related queries. Each LLM's responses are represented by a unique color. These responses were randomly ordered and stripped of revealing information to ensure a blind assessment. Subsequently, three physicians independently graded the responses as either "appropriate" or "inappropriate" over five days. The final grade for each response was determined using a majority consensus approach.

cmh-2023-0440-Supplementary-Fig-1.pdf

References

1. Yeo YH, Samaan JS, Ng WH, Ting PS, Trivedi H, Vipani A, et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol 2023;29:721–732.
2. Devarbhavi H, Asrani SK, Arab JP, Nartey YA, Pose E, Kamath PS. Global burden of liver disease: 2023 update. J Hepatol 2023;79:516–537.
3. Varghese J, Chapiro J. ChatGPT: The transformative influence of generative AI on science and healthcare. J Hepatol 2023 Aug 5. doi: 10.1016/j.jhep.2023.07.028.
4. Sarraju A, Bruemmer D, Van Iterson E, Cho L, Rodriguez F, Laffin L. Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based artificial intelligence model. JAMA 2023;329:842–844.
5. Younossi ZM, Corey KE, Lim JK. AGA clinical practice update on lifestyle modification using diet and exercise to achieve weight loss in the management of nonalcoholic fatty liver disease: Expert review. Gastroenterology 2021;160:912–918.
6. European Association for the Study of the Liver (EASL); European Association for the Study of Diabetes (EASD); European Association for the Study of Obesity (EASO). EASL-EASD-EASO Clinical Practice Guidelines for the management of non-alcoholic fatty liver disease. J Hepatol 2016;64:1388–1402.
7. Carroll AM, Rotman Y. Nutrition literacy is not sufficient to induce needed dietary changes in nonalcoholic fatty liver disease. Am J Gastroenterol 2023;118:1381–1387.
8. Coleman C, Birk S, DeVoe J. Health literacy and systemic racism: using clear communication to reduce health care inequities. JAMA Intern Med 2023;183:753–754.
9. OpenAI. GPT-4 Technical Report. arXiv 2303.08774 [Preprint]. 2023;[cited 2023 Oct 27]. Available from: https://doi.org/10.48550/arXiv.2303.08774.

Table 1. Performance of large language models in addressing patient queries regarding steatotic liver disease

| Query | GPT-3.5 | GPT-4 | Bard | Llama2 | Claude2 |
|---|---|---|---|---|---|
| Appropriateness, n (%) | 24 (80.0) | 29 (96.7) | 27 (90.0) | 27 (90.0) | 24 (80.0) |
| 1. Risk factors | | | | | |
| Who is more likely to get fatty liver disease? | 1 | 3 | 3 | 3 | 3 |
| What type of diet can help better manage fatty liver disease? | 1 | 0* | 3 | 3 | 3 |
| How does alcohol consumption affect my fatty liver disease, and should I abstain from alcohol completely? | 3 | 3 | 2 | 3 | 2 |
| What type and amount of physical activity is recommended for someone with fatty liver disease? | 3 | 2 | 3 | 3 | 3 |
| I have a lean build; how did I develop fatty liver disease? | 3 | 3 | 2 | 3 | 3 |
| How does my family’s health history impact the monitoring of my fatty liver disease? | 3 | 3 | 3 | 3 | 1 |
| 2. Test and diagnosis | | | | | |
| What are the early signs and symptoms of fatty liver disease that I should be aware of? | 3 | 2 | 3 | 2 | 3 |
| How is fatty liver disease diagnosed? | 3 | 3 | 3 | 1 | 3 |
| Are there different types of fatty liver disease, and how do they differ? | 3 | 3 | 2 | 2 | 2 |
| Are there different stages of fatty liver disease, and how do they differ? | 1 | 3 | 1 | 1 | 1 |
| At what point is a liver biopsy recommended for individuals with fatty liver disease? | 0 | 2 | 2 | 1 | 2 |
| What is the role of imaging tests such as ultrasound, MRI, or CT scan in diagnosing fatty liver disease? | 3 | 3 | 2 | 3 | 2 |
| I have fatty liver disease and my ALT is 100 U/L; how should I interpret this? | 2 | 2 | 3 | 2 | 1 |
| I have fatty liver disease and my FIB-4 score is 1.1; how should I interpret this? | 3 | 3 | 2 | 3 | 3 |
| 3. Treatment | | | | | |
| How is fatty liver disease treated? | 3 | 2 | 3 | 3 | 3 |
| Are there any specific medications that are commonly prescribed for fatty liver disease? | 2 | 3 | 3 | 2 | 1 |
| How should medication be used to avoid liver damage in fatty liver disease? | 1 | 3 | 1 | 2 | 3 |
| What lifestyle interventions can aid in the treatment of fatty liver disease? | 2 | 3 | 2 | 3 | 3 |
| In severe cases, are there surgical options available for treating fatty liver disease? | 2 | 3 | 3 | 3 | 3 |
| 4. Follow up and monitoring | | | | | |
| How often should I be monitored if I have fatty liver disease? | 2 | 2 | 3 | 2 | 3 |
| I have fatty liver disease. What tests or procedures will be performed during follow-up appointments? | 3 | 2 | 3 | 3 | 3 |
| I have fatty liver disease. What signs or symptoms should prompt me to seek immediate medical attention? | 3 | 3 | 3 | 3 | 3 |
| 5. Comorbidities and prognosis | | | | | |
| What other health conditions are commonly associated with fatty liver disease? | 3 | 2 | 3 | 3 | 3 |
| What is the typical prognosis for someone with fatty liver disease? | 3 | 3 | 3 | 3 | 3 |
| Is there an increased risk of heart disease when living with fatty liver disease? | 3 | 3 | 3 | 2 | 1 |
| How does fatty liver disease affect diabetes management, and vice versa? | 3 | 3 | 3 | 3 | 3 |
| Does having fatty liver disease increases my risk for liver cancer? | 2 | 3 | 2 | 2 | 1 |
| How can I understand the stage of my fatty liver disease and the potential progression over time? | 1 | 2 | 3 | 2 | 3 |
| Can fatty liver disease be reversed? | 3 | 3 | 3 | 3 | 3 |
| Can children develop fatty liver disease, and if so, how does it affect their health as they grow up? | 2 | 3 | 1 | 3 | 2 |

The numbers represent the count of ‘appropriate’ ratings given by the three reviewers; a count of ≥2 is considered ‘appropriate’.

*Response from ChatGPT-4 for this particular query: Fatty liver disease, which includes conditions such as non-alcoholic fatty liver disease (NAFLD) and non-alcoholic steatohepatitis (NASH)… 5. Hydrate: - Drink plenty of water throughout the day to stay hydrated… 7. Regular Physical Activity: - Along with a healthy diet, engaging in regular physical activity can help in managing the condition…

Two graders found the advice to ‘drink plenty of water throughout the day’ to be inappropriate, noting that while hydration is important, excessive water intake can disrupt electrolyte balance, particularly in individuals with conditions like advanced liver disease that affect water regulation. One grader deemed the physical activity suggestion irrelevant to the query at hand. Additionally, the use of simplified examples such as NAFLD and NASH was seen as inappropriate by two graders, as the query specifically pertains to dietary modifications and such simplifications could downplay the role of alcohol and other dietary factors.