Jailbreaking large language models: navigating the crossroads of innovation, ethics, and health risks
Simone Colosimo; Cristiana Indolfi; Michele Miraglia del Giudice; Francesca Rossi
2025
Abstract
This article examines the challenges and security concerns associated with the use of large language models (LLMs) like ChatGPT in the medical field, focusing particularly on the phenomenon known as “LLM jailbreaking”. As LLMs increasingly perform complex tasks involving sensitive information, the risk of their misuse becomes significant. Jailbreaking, a concept originating in software systems, refers to bypassing restrictions set by developers in order to unlock new functionalities. The practice has spread to LLMs, where users manipulate model inputs to elicit responses otherwise restricted by ethical and safety guidelines. Our research specifically targets the implications of jailbreaking, using ChatGPT versions 3.5 and 4 as case studies in two medical scenarios: pneumonia treatment and a recipe for a drug-based drink. We demonstrate how modified prompts, such as those used in “Role Playing”, can alter the model’s output, potentially leading to the provision of harmful medical advice or the disclosure of sensitive information. Findings indicate that while newer versions of ChatGPT show improved resistance to such manipulations, significant risks remain. The paper discusses the dual necessity of refining these defensive mechanisms and maintaining ethical oversight to prevent misuse. As LLMs permeate more deeply into critical areas like healthcare, the balance between leveraging their capabilities and safeguarding against risks becomes paramount. This analysis underscores the urgent need for ongoing research into more robust security measures and ethical guidelines to ensure the safe use of transformative artificial intelligence technologies in sensitive fields. © AME Publishing Company.


