AI systems become more likely to abandon their safety measures the longer a conversation runs, increasing the risk of harmful or inappropriate replies, a new report from Cisco finds. A few simple prompts are enough to bypass most artificial intelligence guardrails, according to the study.

Cisco Tests Popular AI Models for Vulnerabilities

Cisco examined the large language models behind chatbots from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft to determine how quickly they would give up unsafe or illegal information. The team conducted 499 conversations using “multi-turn attacks,” in which a user asks a series of questions to wear down a model's protections. Each chat involved five to ten exchanges.

Researchers compared the responses across questions to measure how often chatbots shared harmful or inappropriate information, including private company details or misinformation. On average, they extracted malicious information in 64 percent of multi-question conversations, compared with only 13 percent of single-question interactions. Success rates ranged from 26 percent with Google’s Gemma to 93 percent with Mistral’s Large Instruct model.

Cisco warned that multi-turn attacks could spread dangerous content or grant hackers unauthorized access to sensitive corporate data. The report showed that AI tools often fail to recall and enforce safety policies during extended dialogue, allowing attackers to refine questions and bypass protections.

Open-Source Models Shift Responsibility to Users

Mistral, like Meta, Google, OpenAI, and Microsoft, releases open-weight LLMs, which make the safety parameters used during training publicly accessible. Cisco said these models come with lighter built-in safeguards so that users can download and adapt them freely, which shifts responsibility for safety onto whoever customizes the model.

Cisco also noted that Google, OpenAI, Meta, and Microsoft have taken steps to prevent malicious fine-tuning. However, critics continue to accuse AI firms of weak security measures that allow systems to be misused for criminal acts.

In August, U.S. company Anthropic reported that criminals had exploited its Claude model to steal personal data and extort victims, with ransom demands exceeding $500,000 (€433,000).

Rachel Maddow is a freelance journalist based in the USA, with over 20 years of experience covering Politics, World Affairs, Business, Health, Technology, Finance, Lifestyle, and Culture. She earned her degree in Political Science and Journalism from Stanford University. Throughout her career, she has contributed to outlets such as MSNBC, The New York Times, and The Washington Post. Known for her thorough reporting and compelling storytelling, Rachel delivers accurate and timely news that keeps readers informed on both national and global developments.
