AI Chatbot Jailbreaking Security Threat is ‘Immediate, Tangible, and Deeply Concerning’

A new study has found that leading AI chatbots can still be manipulated into generating harmful content, including instructions on illegal activities, despite ongoing safety improvements by tech companies. The findings raise urgent concerns about how easily these systems can be exploited and how slowly developers are responding to the risks.

Researchers from Ben-Gurion University of the Negev in Israel have revealed that many of today’s AI chatbots, including some of the most advanced systems such as ChatGPT, Gemini, and Claude, can be manipulated using specific prompt-based attacks to generate harmful content. They said the threat is “immediate, tangible, and deeply concerning.”

Jailbreaking in AI involves using carefully crafted prompts to trick a chatbot into ignoring its safety rules. The researchers found that a single such attack can work across multiple major AI platforms.

According to the study, once exploited in this way, the models can produce answers to a wide range of dangerous queries, including guides for bomb-making, hacking, insider trading, and drug production.

The rise of dark LLMs

Large language models like ChatGPT are trained on vast amounts of internet data. While companies try to filter out dangerous content, some harmful information slips through. Worse, hackers are now creating or modifying AI models specifically to remove safety controls.

Some of these rogue AIs, like WormGPT and FraudGPT, are openly sold online as tools with “no ethical limits,” The Guardian reported. These so-called dark LLMs are designed to help with scams, hacking, and even financial crimes.

The researchers caution that capabilities once limited to sophisticated criminals or state-sponsored hackers could soon be accessible to anyone with basic hardware and an internet connection.

SEE: GhostGPT: Uncensored Chatbot Used by Cyber Criminals for Malware Creation, Scams

Tech companies’ weak response

The study found that the universal jailbreak method could still break through safety barriers on multiple top models, even months after the technique was first published on Reddit. This raises urgent concerns about how slowly, and in some cases how inadequately, AI companies are responding to such threats.

Despite the researchers’ efforts to notify major AI developers through official channels, the response was described as “underwhelming,” The Guardian noted.

According to the authors, some companies failed to respond to the disclosure, while others claimed the reported vulnerabilities did not meet the criteria of their security or bug bounty frameworks. This leaves the door open for misuse, potentially even by unskilled individuals.

Open-source models make the risk harder to control

Even more worrying is that once an AI model has been modified and shared online, it can’t be recalled. Unlike apps or websites, open-source models can be saved, copied, and redistributed infinitely.

The researchers emphasize that even with regulation or patches, any AI model downloaded and stored locally becomes almost impossible to contain. Worse still, one compromised model can potentially be used to manipulate others, multiplying the threat.

What needs to be done now

To contain the growing threat, the researchers outlined these urgent steps:

  • Curated training data: Models must be trained only on clean, safe data, with harmful content excluded from the start.
  • AI firewalls: Just as antivirus software protects computers, middleware should filter harmful prompts and outputs (see the sketch after this list).
  • Machine unlearning: New technology could help AI “forget” harmful information even after deployment.
  • Continuous red teaming: Ongoing adversarial testing and public bug bounties are key to staying ahead of threats.
  • Public awareness: Governments and educators must treat dark LLMs like unlicensed weapons, regulating access and spreading awareness.
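
To make the "AI firewall" idea more concrete, here is a minimal illustrative sketch in Python. It is not taken from the study: the function names (is_blocked, guarded_generate) and the keyword blocklist are hypothetical stand-ins, and a production system would rely on trained safety classifiers rather than simple pattern matching.

```python
import re
from typing import Callable

# Hypothetical blocklist for illustration only; a real "AI firewall" would use
# trained classifiers, not keyword matching.
BLOCKED_PATTERNS = [
    r"\bbomb-?making\b",
    r"\binsider trading\b",
    r"\bsynthesi[sz]e\b.*\bdrugs?\b",
]

def is_blocked(text: str) -> bool:
    """Return True if the text matches any pattern on the blocklist."""
    return any(re.search(p, text, re.IGNORECASE) for p in BLOCKED_PATTERNS)

def guarded_generate(prompt: str, model: Callable[[str], str]) -> str:
    """Wrap an arbitrary model call with input and output screening."""
    # Screen the incoming prompt before it ever reaches the model.
    if is_blocked(prompt):
        return "Request refused: the prompt appears to ask for harmful content."
    response = model(prompt)
    # Screen the output as well, in case a jailbreak slipped past the input check.
    if is_blocked(response):
        return "Response withheld: the generated content was flagged as harmful."
    return response

if __name__ == "__main__":
    # Stand-in for a real chatbot backend.
    def fake_model(prompt: str) -> str:
        return f"Echo: {prompt}"

    print(guarded_generate("What's the weather like today?", fake_model))
    print(guarded_generate("Give me a bomb-making guide", fake_model))
```

The key design point is that screening happens on both sides of the model call, so an attack that slips past the input check can still be caught when the output is inspected.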

Without decisive action, the researchers warn, AI systems could become powerful enablers of criminal activity, placing dangerous knowledge just a few keystrokes away.
