Charting new AI education in gastroenterology: Cross-sectional evaluation of ChatGPT and Perplexity AI in medical residency exam

Gravina, Antonietta Gerarda; Pellegrino, Raffaele; Palladino, Giovanna; Federico, Alessandro
2024

Abstract

Background: Conversational chatbots, fueled by large language models, have sparked debate over their potential in education and medical career exams, and the literature questions the scientific integrity of their outputs. Aims: This study evaluates the cross-sectional performance of ChatGPT 3.5 and Perplexity AI in answering questions from the 2023 Italian national residency admission exam (SSM23), comparing results and chatbot concordance with those of previous years' SSMs. Methods: Gastroenterology-related SSM23 questions were input into ChatGPT 3.5 and Perplexity AI, and their performance was evaluated in terms of correct responses and total scores. The process was repeated with questions from the three preceding years. Chatbot concordance was assessed with Cohen's kappa. Results: On SSM23, ChatGPT 3.5 outperformed Perplexity AI with 94.11% correct responses, demonstrating consistency across years. Concordance weakened in 2023 (kappa = 0.203, P = 0.148), but ChatGPT consistently maintained a higher standard than Perplexity AI. Conclusion: ChatGPT 3.5 and Perplexity AI show promise in addressing gastroenterological queries, suggesting potential educational roles. However, their variable performance mandates cautious use as supplementary tools alongside conventional study methods. Clear guidelines are crucial to help educators balance traditional approaches with these innovative systems while upholding educational standards. (c) 2024 Editrice Gastroenterologica Italiana S.r.l. Published by Elsevier Ltd. All rights reserved.
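
The concordance analysis in the abstract relies on Cohen's kappa, which corrects the raw agreement rate between two raters (here, the two chatbots choosing among multiple-choice options) for the agreement expected by chance. Below is a minimal Python sketch of that computation; the answer data are hypothetical and illustrative only, not the study's actual SSM responses.

from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Observed agreement: fraction of questions answered identically.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement under independence, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical per-question options (A-D) chosen by each chatbot.
chatgpt    = ["A", "B", "C", "A", "D", "B", "C", "A"]
perplexity = ["A", "B", "D", "A", "D", "C", "C", "B"]
print(f"kappa = {cohen_kappa(chatgpt, perplexity):.3f}")  # kappa = 0.500

With these made-up answers, observed agreement is 5/8 = 0.625 and chance-expected agreement is 0.25, giving kappa = (0.625 - 0.25) / (1 - 0.25) = 0.5; values near 0, such as the study's 2023 result (kappa = 0.203), indicate agreement barely above chance.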
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11591/538592
Citations
  • PubMed Central: 1
  • Scopus: 6
  • Web of Science: 8