Potential and Limitations of ChatGPT 3.5 and 4.0 as a Source of COVID-19 Information: Comprehensive Comparative Analysis of Generative and Authoritative Information

Published: 2023-12-14
Volume: 25
Page: e49771
ISSN: 1438-8871
Container-title: Journal of Medical Internet Research
Language: en
Short-container-title: J Med Internet Res

Authors:

Wang Guoyong, Gao Kai, Liu Qianyang, Wu Yuxin, Zhang Kaijun, Zhou Wei, Guo Chunbao

Abstract

Background: The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has necessitated reliable and authoritative information for public guidance. The World Health Organization (WHO) has been a primary source of such information, disseminating it in a question-and-answer format on its official website. Concurrently, ChatGPT 3.5 and 4.0, deep learning-based natural language generation systems, have shown potential in generating diverse text types based on user input.

Objective: This study evaluates the accuracy of COVID-19 information generated by ChatGPT 3.5 and 4.0, assessing their potential as a supplementary public information source during the pandemic.

Methods: We extracted 487 COVID-19–related questions from the WHO's official website and used ChatGPT 3.5 and 4.0 to generate corresponding answers. These generated answers were then compared against the official WHO responses, which served as the reference for the assessment. Two clinical experts scored the generated answers on a scale of 0-5 across 4 dimensions (accuracy, comprehensiveness, relevance, and clarity), with higher scores indicating better performance in each dimension. Additionally, we used the BERT (Bidirectional Encoder Representations from Transformers) model to generate similarity scores (0-1) between the generated and official answers, providing a dual validation mechanism.

Results: The mean (SD) scores for ChatGPT 3.5–generated answers were 3.47 (0.725) for accuracy, 3.89 (0.719) for comprehensiveness, 4.09 (0.787) for relevance, and 3.49 (0.809) for clarity. For ChatGPT 4.0, the mean (SD) scores were 4.15 (0.780), 4.47 (0.641), 4.56 (0.600), and 4.09 (0.698), respectively. All differences were statistically significant (P<.001), with ChatGPT 4.0 outperforming ChatGPT 3.5. The BERT model verification showed mean (SD) similarity scores of 0.83 (0.07) for ChatGPT 3.5 and 0.85 (0.07) for ChatGPT 4.0 compared with the official WHO answers.

Conclusions: ChatGPT 3.5 and 4.0 can generate accurate and relevant COVID-19 information to a certain extent. However, compared with official WHO responses, gaps and deficiencies exist. Thus, users of ChatGPT 3.5 and 4.0 should also reference other reliable information sources to mitigate potential misinformation risks. Notably, ChatGPT 4.0 outperformed ChatGPT 3.5 across all evaluated dimensions, a finding corroborated by BERT model validation.
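
Illustrative note: the abstract reports two kinds of numbers, a similarity score between each generated answer and the WHO answer, and a mean (SD) summary of expert scores. The abstract does not state how the similarity was computed, so the sketch below assumes cosine similarity between embedding vectors, which is one common choice and not necessarily the authors' method. The vectors and scores are hypothetical placeholders, not data from the study.

```python
import math
import statistics


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_u * norm_v)


def mean_sd(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical embeddings standing in for BERT sentence vectors (assumption).
generated_answer = [0.2, 0.7, 0.1]
who_answer = [0.25, 0.65, 0.05]
similarity = cosine_similarity(generated_answer, who_answer)

# Hypothetical expert scores (0-5) for a single dimension (assumption).
scores = [3, 4, 5]
mean, sd = mean_sd(scores)
print(f"similarity={similarity:.2f}, score={mean:.2f} ({sd:.2f})")
```

With real BERT embeddings, the same function would be applied per question and the resulting scores summarized with `mean_sd`, matching the "mean (SD)" format used in the Results.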

Publisher

JMIR Publications Inc.

Subject

Health Informatics

Cited by 1 article.

1. Man Versus Machine: Harnessing Artificial Intelligence for Qualitative Analysis (Preprint); 2024-01-17