The False Sense Of Security In AI Red Teaming

The False Sense Of Security In AI Red Teaming

Itamar Golan, CEO and Co-founder of Prompt Security, Core Member of OWASP Top 10 for LLM-based Apps.

AI red teaming has emerged as a critical security measure for AI-powered applications. It involves adopting adversarial methods to proactively identify flaws and vulnerabilities such as harmful or biased outputs, unintended behaviors, system limitations or potential misuse risks. Organizations use it to simulate attacks and uncover vulnerabilities in their AI models, particularly large language models (LLMs).

However, a dangerous misconception is spreading: the belief that red teaming alone can secure AI systems. AI red teaming mostly relies on identifying and patching fixed vulnerabilities, which is a great starting point but not nearly enough.

I’ll explore two distinct types of red teaming efforts, each with its own scope and responsibility depending on the stage of the AI system life cycle—model red teaming and application red teaming.

Model Red Teaming

This form of red teaming is typically the responsibility of the model provider (e.g., OpenAI, Anthropic, etc.). It focuses on identifying risks inherent to the LLM itself. This includes testing for unsafe outputs, jailbreakability, discriminatory responses and overall alignment with safety goals.

Model providers often conduct this red teaming at scale, using both internal and external experts. A good example is OpenAI’s publication on its red teaming approach, which describes how it engages with a diverse group of external red teamers alongside internal teams to rigorously evaluate its models.

Application Red Teaming

Once a model is integrated into a real-world application, the responsibility shifts to the application builder, typically the organization deploying the AI-powered solution. This layer of red teaming focuses not only on how the model behaves but how the application as a whole handles user interactions, data flows and prompts.

For example, imagine a travel agency deploying an AI-powered chatbot to help users book trips. This system may be connected to one or more LLMs, vector databases, proprietary data sources and APIs. Red teaming this application would include evaluating the system prompt (the persistent instructions steering the model’s behavior), how user inputs are processed and how the model interfaces with external tools or services. It goes beyond the model’s raw capabilities to uncover vulnerabilities in the application logic, prompt engineering, data exposure or potential abuse vectors.

The Fundamental Difference Between AI Security And Traditional Software Security

Traditional software security followed a structured approach: Identify vulnerabilities through penetration testing, patch them and move on. This worked because software flaws were deterministic; once fixed, they stayed fixed.

In contrast, LLMs operate probabilistically, generating responses based on context and training data, which makes them inherently unpredictable. Even if a red team finds a vulnerability today, the model may still behave unpredictably tomorrow. While valuable, red teaming has critical limitations.

AI risks, such as prompt injections, can’t be patched like traditional bugs. These attacks manipulate prompts to trigger harmful outputs, exploiting the probabilistic nature of LLMs. Even after mitigation, slight input changes can bypass defenses. Attackers adapt quickly, chaining prompts or exploiting previously unseen behaviors. Red teaming offers only a point-in-time snapshot.

LLMs and GenAI applications behave like living systems; they are constantly evolving. The OpenAI model you used yesterday might be a different version tomorrow with slightly altered behavior. This means a simulated attack that was blocked before could now succeed simply due to a model update.

The Need For Runtime AI Security

To truly secure AI applications, organizations must implement runtime protection—security solutions that operate in real time to detect and mitigate threats as they occur. Let’s examine how runtime security works in different AI-powered applications.

Chatbots And AI Assistants

AI chatbots are used across industries, from customer support to healthcare. They face threats like prompt injection, context hijacking and response manipulation. A chatbot may pass red teaming but still be vulnerable in production.

Runtime protection can detect suspicious inputs, filter harmful outputs in real time and block data leaks and unauthorized actions.

AI-Powered Code Generation Tools

AI code assistants like GitHub Copilot and Cursor boost developer productivity, but they can also introduce security risks by generating insecure code. Red teaming helps spot some issues, but it can’t catch everything.

Runtime protection can analyze AI-generated code in real time to detect flaws, enforce policies to block insecure suggestions and learn from new threats to stay ahead of evolving attack techniques.

AI Agents In Agentic Workflows

AI agents automate workflows based on user instructions, but they can be manipulated into executing unintended actions. Red teaming may reveal such risks, but static fixes often fall short.

Runtime protection can help by monitoring agent behavior for risky actions, dynamically filtering and blocking harmful commands and adapting policies in real time using threat intelligence to prevent exploitation.

Challenges In Mainstreaming AI Runtime Protection

Adopting runtime protection for AI introduces several challenges. Cost is a primary concern—not just for the technology but for the resources required to implement and maintain it. Performance trade-offs, such as latency and false positives, raise concerns about user experience and operational reliability. Runtime tools also add complexity to crowded tech stacks, increasing the burden on engineering and security teams.

A deeper challenge lies in the shifting responsibility model. Unlike traditional software, AI systems blur the lines between developer, model provider, infrastructure team and security owner. It’s often unclear who is accountable for securing AI behavior at runtime. This ambiguity can complicate implementation and slow decision-making.

As AI systems become more embedded in critical workflows, resolving these questions becomes essential, yet many organizations still struggle with placing where AI security fits within their structure and processes.

Changing The AI Security Mindset

AI security demands real-time threat detection, adaptive defenses and full-stack runtime protection to counter evolving threats. AI models behave probabilistically, making vulnerabilities difficult to fix permanently.

Relying solely on red teaming can create a false sense of security. While useful for identifying some risks, it doesn’t account for AI’s unpredictability or the persistence of attack variants. As a starting point, teams should explore frameworks like the OWASP Top 10 for LLMs and the NIST AI Risk Management Framework to guide effective, ongoing protection strategies.


Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives. Do I qualify?


Leave a Reply

Your email address will not be published. Required fields are marked *