Chatbots Lose Safety Controls During Long Conversations

AI systems drop their safety measures as users talk longer, increasing the risk of harmful or inappropriate replies, a new report reveals. A few simple prompts let users bypass most artificial intelligence guardrails, according to the same study.

Cisco Tests Popular AI Models for Vulnerabilities

Cisco examined large language models behind chatbots from OpenAI, Mistral, Meta, Google, Alibaba, Deepseek, and Microsoft to determine how quickly they exposed unsafe or illegal information. The team conducted 499 conversations using “multi-turn attacks,” where users ask multiple questions to override protections. Each chat involved five to ten exchanges.

Researchers compared responses from several questions to measure how often chatbots shared harmful or inappropriate data, including private company details or misinformation. On average, they extracted malicious information from 64 percent of multi-question conversations, versus only 13 percent from single-question interactions. Success rates ranged from 26 percent with Google’s Gemma to 93 percent with Mistral’s Large Instruct model.

Cisco warned that multi-turn attacks could spread dangerous content or grant hackers unauthorized access to sensitive corporate data. The report showed that AI tools often fail to recall and enforce safety policies during extended dialogue, allowing attackers to refine questions and bypass protections.

Open-Source Models Shift Responsibility to Users

Mistral, along with Meta, Google, OpenAI, and Microsoft, uses open-weight LLMs that reveal the safety parameters used during training. Cisco stated that these models include lighter built-in safeguards, enabling users to download and modify them freely. This structure transfers safety responsibility to those customizing the open-source versions.

Cisco also noted that Google, OpenAI, Meta, and Microsoft have taken steps to prevent malicious fine-tuning. However, critics continue to accuse AI firms of weak security measures that allow systems to be misused for criminal acts.

In August, U.S. company Anthropic reported that criminals exploited its Claude model to steal and extort personal data, demanding ransoms exceeding $500,000 (€433,000).

What's Hot

Beecle Officially Launches Its Online Shop and Enters the Cosmetics Market

Beecle Gets Ready to Enter the Competitive Beauty Market

Over 21,000 Food Establishments Inspected in Nationwide Raid

BMW Rimac Boost EV Battery Range

OpenAI Weighed Police Alert Months Before Deadly Canada School Shooting

US Startup Funding Soars in Tech Boom

Categories

Important Links

Latest News