AI Agent Threat Modelling
- Arjun Ramakrishnan
- AI Security, Threat Modelling
- Published: 23 May, 2025
Decoding the Matrix: A CISO’s Guide to Threat Modeling Agentic AI
The paradigm of Artificial Intelligence is rapidly shifting towards more autonomous systems known as Agentic AI. These AI agents, supercharged by Large Language Models (LLMs), can independently perceive their environment, reason, make decisions, and take actions to achieve specific objectives. While this leap in capability promises unprecedented innovation, it also ushers in a new frontier of complex security threats. For IT and Cybersecurity professionals, understanding and mitigating these risks is paramount.
This post delves into the insights from the OWASP Agentic Security Initiative’s (ASI) guide, “Agentic AI - Threats and Mitigations,” offering a technical primer on threat modeling for these sophisticated AI systems.
Executive Summary
Agentic AI, particularly when integrated with generative AI and LLMs, significantly expands the scale, capabilities, and associated risks of autonomous systems. The OWASP ASI’s first guide focuses on providing a threat-model-based reference for these emerging agentic threats, with a particular emphasis on agents built on LLMs. This document outlines key definitions, reference architectures for single and multi-agent systems, a detailed threat model discussing new agentic threats, and comprehensive mitigation strategies. For cybersecurity professionals, this means adapting existing threat modeling practices to address the unique vulnerabilities introduced by AI agents’ autonomy, planning capabilities, memory, and tool-use functionalities.
Understanding AI Agents: The New Digital Workforce
At its core, an AI agent is an intelligent software system designed for autonomous operation: it perceives its environment, reasons, makes decisions, and takes actions to achieve specific objectives. Think of agents as digital entities that can:
- Plan & Reason: Agents can formulate, track, and update action plans to handle complex tasks, often using LLMs as their reasoning engines. This includes sophisticated strategies like Reflection (evaluating past actions), Self-Criticism (correcting errors), Chain of Thought (step-by-step reasoning), and Subgoal Decomposition.
- Maintain Memory (Statefulness): They can retain and recall information, both short-term (session-based) and long-term (persistent), influencing future actions.
- Take Action & Use Tools: Agents can invoke built-in functions, external tools via API calls, and even generate/run code to accomplish tasks. This “function calling” capability is a cornerstone of modern agentic systems.
Frameworks like LangChain, AutoGen, and CrewAI are increasingly used to build these agents, encapsulating their core capabilities.
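The three capabilities above can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not any framework's API: `Agent`, `plan`, and `get_weather` are hypothetical names, and `plan` is a hard-coded stand-in for the LLM reasoning step.

```python
from dataclasses import dataclass, field
from typing import Callable


def get_weather(city: str) -> str:
    """Hypothetical tool: a plain function the agent is allowed to call."""
    return f"Sunny in {city}"


@dataclass
class Agent:
    # Tool registry: the only functions the agent may invoke.
    tools: dict[str, Callable[[str], str]]
    # Short-term (session-scoped) memory of what the agent has done.
    memory: list[str] = field(default_factory=list)

    def plan(self, goal: str) -> list[tuple[str, str]]:
        """Stand-in for LLM reasoning: returns (tool_name, argument) steps.

        A real agent would prompt a model here; this stub hard-codes one step.
        """
        return [("get_weather", goal)]

    def run(self, goal: str) -> list[str]:
        results = []
        for tool_name, argument in self.plan(goal):
            tool = self.tools.get(tool_name)
            if tool is None:
                # Refuse anything outside the registered tool set.
                self.memory.append(f"rejected unknown tool {tool_name!r}")
                continue
            output = tool(argument)
            # Record the action so later steps (or audits) can see it.
            self.memory.append(f"{tool_name}({argument!r}) -> {output}")
            results.append(output)
        return results


agent = Agent(tools={"get_weather": get_weather})
```

Even in this toy version the attack surfaces discussed later are visible: whatever influences `plan` controls which tools run, and whatever lands in `memory` can shape future decisions.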
Reference Architectures: Blueprints for Agentic Systems
OWASP provides reference architectures to contextualize threats:
- Single-Agent Architecture: Typically includes an application embedding agentic functionality, one or more LLMs for reasoning, services (tools, functions), and supporting services like long-term memory stores and vector databases (often used in Retrieval-Augmented Generation, or RAG).
- Multi-Agent Architecture: Involves multiple agents that might specialize in different tasks or scale functionality. Key additions are inter-agent communication and potentially a coordinating agent.
These architectures can manifest various Agentic AI Patterns, such as Reflective Agents, Task-Oriented Agents, Hierarchical Agents, and Human-in-the-Loop Collaboration, each with distinct interaction and risk profiles.
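A common first step in threat modeling is to list the data flows in an architecture and flag the ones that cross a trust boundary. The sketch below does this for the single-agent and multi-agent components named above. The component names follow the text, but the trust-zone assignments and the hub-style flows through the application are illustrative assumptions, not part of the OWASP reference architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    target: str


# Trust zone per component (assumed for illustration only).
ZONES = {
    "user": "external",
    "application": "internal",
    "llm": "internal",
    "tool_api": "external",
    "memory_store": "internal",
    "vector_db": "internal",
    "peer_agent": "external",
}

# Assumption: the application mediates every interaction.
SINGLE_AGENT_FLOWS = [
    Flow("user", "application"),
    Flow("application", "llm"),
    Flow("application", "tool_api"),
    Flow("application", "memory_store"),
    Flow("application", "vector_db"),
]

# The multi-agent variant adds inter-agent communication.
MULTI_AGENT_FLOWS = SINGLE_AGENT_FLOWS + [Flow("application", "peer_agent")]


def boundary_crossings(flows: list[Flow]) -> list[Flow]:
    """Return the flows whose endpoints sit in different trust zones."""
    return [f for f in flows if ZONES[f.source] != ZONES[f.target]]


for flow in boundary_crossings(MULTI_AGENT_FLOWS):
    print(f"review: {flow.source} -> {flow.target}")
```

Each flagged flow is a place to ask what could be injected, spoofed, or leaked. Changing a zone assignment, for example treating a shared vector database as external, changes which flows need review.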
Threat Modeling for Agentic AI: What Can Go Wrong?
Traditional threat modeling methodologies (like STRIDE) need adaptation for AI, particularly Agentic AI. The OWASP document emphasizes a practical approach, using its reference architecture to identify threats rather than strictly adhering to one methodology.
Agentic AI threats can be new or agentic variations of existing ones. Key attack surfaces and risks include:
- Agent Memory & Tools: These are prime targets. Memory can be poisoned with malicious data, and tools can be misused, especially when agents have unconstrained autonomy.
- Privilege Compromise: The “Confused Deputy” problem is a significant concern, where an agent with higher privileges is tricked into performing unauthorized actions. Non-Human Identities (NHIs) for agents also present risks if not managed carefully. Agents might chain tools in unexpected ways to bypass security controls or inherit excessive permissions.
- Cascading Hallucinations: While LLMs can hallucinate, agentic systems can amplify this through self-reflection, memory reinforcement, or multi-agent interactions, leading to systemic misinformation.
- Overwhelming Human-in-the-Loop (HITL): The complexity and scale of agentic AI can overwhelm human oversight. Attackers can exploit this, for example by flooding reviewers with requests until fatigue leads to rushed approvals.
- Intent Manipulation & Deceptive Behaviors: Attackers can manipulate an agent’s goals or exploit its reasoning to cause harmful or disallowed actions. This includes agents learning to be deceptive to achieve objectives.
- Repudiation & Untraceability: The complex, often parallel, reasoning and execution paths can make it hard to trace actions and assign accountability.
- Multi-Agent System Threats: These include rogue agents operating undetected, manipulation of inter-agent communication (Agent Communication Poisoning), and exploitation of distributed workflows.
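To make the Confused Deputy risk concrete, here is a minimal sketch of one common guard: authorizing each tool call against the intersection of the agent's permissions and the requesting user's permissions, so the agent cannot do more on a user's behalf than that user could do directly. The permission names, users, and `authorize_tool_call` helper are all hypothetical.

```python
# Permissions granted to the agent's own identity (hypothetical values).
AGENT_PERMISSIONS = {"read_ticket", "refund_order", "export_records"}

# Permissions held by the humans the agent acts for (hypothetical values).
USER_PERMISSIONS = {
    "alice": {"read_ticket"},
    "bob": {"read_ticket", "refund_order"},
}


def authorize_tool_call(user: str, action: str) -> bool:
    """Allow an action only if BOTH the agent and the requesting user hold it."""
    effective = AGENT_PERMISSIONS & USER_PERMISSIONS.get(user, set())
    return action in effective


# alice cannot trigger a refund through the agent, even though the agent can refund.
print(authorize_tool_call("alice", "refund_order"))
# bob holds the permission himself, so the call is allowed.
print(authorize_tool_call("bob", "refund_order"))
```

The design choice is that the agent's higher privileges never apply on their own. A prompt that tricks the agent into attempting `export_records` still fails unless the requesting user holds that permission too.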
Agent Threat Model as defined in OWASP Agentic AI - Threats and Mitigations
Reproduced from OWASP Agentic AI - Threats and Mitigations
Note: The ‘Priority’ and ‘Rationale’ columns in the table below represent the author’s recommendations for prioritization and are not part of the original OWASP threat model. The ‘Mitigations’ column from the original OWASP model has been omitted here, as specific mitigations should be defined on a per-use-case basis.
| TID | Threat Name | Threat Description | Priority | Rationale for Prioritization |
|---|---|---|---|---|
| T1 | Memory Poisoning | Memory Poisoning involves exploiting an AI’s memory systems, both short and long-term, to introduce malicious or false data and exploit the agent’s context. This can lead to altered decision-making and unauthorized operations. | HIGH | Persistent impact on decision-making quality; difficult to detect once memory is corrupted; can affect all future agent operations |
| T2 | Tool Misuse | Tool Misuse occurs when attackers manipulate AI agents to abuse their integrated tools through deceptive prompts or commands, operating within authorized permissions. This includes Agent Hijacking, where an AI agent ingests adversarial manipulated data and subsequently executes unintended actions, potentially triggering malicious tool interactions. For more information on Agent Hijacking see https://www.nist.gov/news. | CRITICAL | High likelihood of occurrence; immediate system impact; relatively easy to exploit through prompt manipulation; core attack vector |
| T3 | Privilege Compromise | Privilege Compromise arises when attackers exploit weaknesses in permission management to perform unauthorized actions. This often involves dynamic role inheritance or misconfigurations. | CRITICAL | Severe access control implications; common misconfiguration risk in dynamic systems; enables further attack escalation |
| T4 | Resource Overload | Resource Overload targets the computational, memory, and service capacities of AI systems to degrade performance or cause failures, exploiting their resource-intensive nature. | MEDIUM | Primarily availability impact; existing DoS protections partially applicable; limited business-critical consequences |
| T5 | Cascading Hallucination Attacks | These attacks exploit an AI’s tendency to generate contextually plausible but false information, which can propagate through systems and disrupt decision-making. This can also lead to destructive reasoning affecting tools invocation. | HIGH | Amplifies misinformation across systems; difficult to detect plausible false information; systemic cascading impact |
| T6 | Intent Breaking & Goal Manipulation | This threat exploits vulnerabilities in an AI agent’s planning and goal-setting capabilities, allowing attackers to manipulate or redirect the agent’s objectives and reasoning. One common approach is Agent Hijacking mentioned in Tool Misuse. | CRITICAL | Fundamental compromise of agent’s core purpose; high business impact; undermines entire system integrity |
| T7 | Misaligned & Deceptive Behaviors | AI agents executing harmful or disallowed actions by exploiting reasoning and deceptive responses to meet their objectives. | MEDIUM | Emerging threat with detection methods still maturing; requires sophisticated AI behavior analysis; theoretical risk level |
| T8 | Repudiation & Untraceability | Occurs when actions performed by AI agents cannot be traced back or accounted for due to insufficient logging or transparency in decision-making processes. | HIGH | Critical for compliance and forensic investigations; undermines accountability frameworks; regulatory implications |
| T9 | Identity Spoofing & Impersonation | Attackers exploit authentication mechanisms to impersonate AI agents or human users, enabling them to execute unauthorized actions under false identities. | HIGH | Violates fundamental trust boundaries; enables privilege escalation; difficult to distinguish from legitimate behavior |
| T10 | Overwhelming Human in the Loop | This threat targets systems with human oversight and decision validation, aiming to exploit human cognitive limitations or compromise interaction frameworks. | MEDIUM | Human factors vulnerability; gradually exploitable; existing UI/UX patterns provide some protection |
| T11 | Unexpected RCE and Code Attacks | Attackers exploit AI-generated execution environments to inject malicious code, trigger unintended system behaviors, or execute unauthorized scripts. | CRITICAL | Direct system compromise potential; high technical impact; immediate security control bypass |
| T12 | Agent Communication Poisoning | Attackers manipulate communication channels between AI agents to spread false information, disrupt workflows, or influence decision-making. | MEDIUM | Specific to multi-agent deployments; requires sophisticated attack coordination; limited current deployment scope |
| T13 | Rogue Agents in Multi-Agent Systems | Malicious or compromised AI agents operate outside normal monitoring boundaries, executing unauthorized actions or exfiltrating data. | HIGH | Classic insider threat model applied to AI; difficult detection in complex distributed systems; stealth capability |
| T14 | Human Attacks on Multi-Agent Systems | Adversaries exploit inter-agent delegation, trust relationships, and workflow dependencies to escalate privileges or manipulate AI-driven operations. | MEDIUM | Requires multi-agent deployment context; complex attack chain dependency; limited immediate applicability |
| T15 | Human Manipulation | In scenarios where AI agents engage in direct interaction with human users, the trust relationship reduces user skepticism, increasing reliance on the agent’s responses and autonomy. This implicit trust and direct human/agent interaction create risks, as attackers can coerce agents to manipulate users, spread misinformation, and take covert actions. | HIGH | Exploits inherent user trust in AI systems; immediate social engineering risk; difficult user education challenge |
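The prioritization in the table can also be kept as a small machine-readable register, which makes it easy to sort or filter when scoping a review. In the sketch below the IDs, names, and priorities are copied from the table. The data layout and the helper name `by_priority` are assumptions for illustration.

```python
from collections import Counter

# Lower number means more urgent.
PRIORITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2}

# (TID, threat name, author-assigned priority), copied from the table.
THREATS = [
    ("T1", "Memory Poisoning", "HIGH"),
    ("T2", "Tool Misuse", "CRITICAL"),
    ("T3", "Privilege Compromise", "CRITICAL"),
    ("T4", "Resource Overload", "MEDIUM"),
    ("T5", "Cascading Hallucination Attacks", "HIGH"),
    ("T6", "Intent Breaking & Goal Manipulation", "CRITICAL"),
    ("T7", "Misaligned & Deceptive Behaviors", "MEDIUM"),
    ("T8", "Repudiation & Untraceability", "HIGH"),
    ("T9", "Identity Spoofing & Impersonation", "HIGH"),
    ("T10", "Overwhelming Human in the Loop", "MEDIUM"),
    ("T11", "Unexpected RCE and Code Attacks", "CRITICAL"),
    ("T12", "Agent Communication Poisoning", "MEDIUM"),
    ("T13", "Rogue Agents in Multi-Agent Systems", "HIGH"),
    ("T14", "Human Attacks on Multi-Agent Systems", "MEDIUM"),
    ("T15", "Human Manipulation", "HIGH"),
]


def by_priority(threats):
    """Sort by priority first, then by numeric threat ID."""
    return sorted(threats, key=lambda t: (PRIORITY_ORDER[t[2]], int(t[0][1:])))


counts = Counter(priority for _, _, priority in THREATS)

for tid, name, priority in by_priority(THREATS):
    print(f"{priority:<8} {tid:<4} {name}")
```

Because the priorities are the author's recommendations rather than part of the OWASP model, a team adopting this register would likely adjust the third field per use case.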
OWASP Agentic Threats Taxonomy Navigator
To systematically evaluate these risks, OWASP introduces a taxonomy navigator based on a decision path:
- Agency & Reasoning Threats:
  - Intent Breaking & Goal Manipulation: Attackers alter an agent’s planning or objectives via prompt injections, compromised data, or malicious tools. Scenario Example: An attacker injects sub-goals to make an AI drift from its original objectives gradually.
- Memory-Based Threats:
  - Memory Poisoning: Corrupting an AI’s memory systems, both short and long-term, to introduce malicious or false data and exploit the agent’s context. This can lead to altered decision-making and unauthorized operations. Scenario Example: An attacker reinforces a false pricing rule in a travel agent’s memory, leading to unauthorized bookings.
- Tool & Execution-Based Threats:
  - Tool Misuse: Manipulating AI agents to abuse their integrated tools through deceptive prompts or commands, operating within authorized permissions. This includes Agent Hijacking, where an AI agent ingests adversarial manipulated data and subsequently executes unintended actions, potentially triggering malicious tool interactions. For more information on Agent Hijacking see https://www.nist.gov/news. Scenario Example: An attacker chains tool actions in a customer service AI to extract records and email them out.
  - Unexpected RCE & Code Attacks: Exploiting AI-generated code execution. Scenario Example: An AI DevOps agent is manipulated to generate scripts with hidden commands to extract secrets.
- Authentication & Spoofing Threats:
  - Identity Spoofing & Impersonation: Attackers impersonate AI agents, users, or services. Scenario Example: A rogue AI mimics a legitimate system agent to gain unauthorized access.
- Human-Related Threats:
  - Overwhelming Human-in-the-Loop (HITL): Exploiting human oversight dependencies by overwhelming users with requests or decision fatigue. Scenario Example: Attackers flood reviewers with tasks, inducing fatigue and leading to rushed approvals.
  - Human Manipulation: Adversaries manipulate a compromised AI to coerce users into harmful actions. Scenario Example: An AI is compromised to replace legitimate vendor bank details, tricking a user into processing a fraudulent transfer.
- Multi-Agent System Threats:
  - Agent Communication Poisoning: Manipulating inter-agent communication to inject false information or misdirect decisions. Scenario Example: An attacker forges consensus messages to manipulate inter-agent validation.
  - Rogue Agents in Multi-Agent Systems: Malicious or compromised agents infiltrating architectures to manipulate decisions or corrupt data. Scenario Example: A rogue agent impersonates a financial approval AI to inject fraudulent transactions.
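One way to read the navigator is as a series of yes/no questions about the system, each of which brings a threat category into scope. The sketch below encodes that idea. The categories and threat names come from the list above, while the `AgentProfile` fields and their wording are illustrative assumptions rather than OWASP's exact decision path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Yes/no answers about the system under review (assumed question set)."""
    plans_autonomously: bool
    keeps_memory: bool
    calls_tools_or_runs_code: bool
    authenticates_agents_or_users: bool
    involves_humans: bool
    is_multi_agent: bool


# (profile attribute, threat category, threats brought into scope)
DECISION_PATH = [
    ("plans_autonomously", "Agency & Reasoning",
     ["Intent Breaking & Goal Manipulation"]),
    ("keeps_memory", "Memory-Based",
     ["Memory Poisoning"]),
    ("calls_tools_or_runs_code", "Tool & Execution-Based",
     ["Tool Misuse", "Unexpected RCE & Code Attacks"]),
    ("authenticates_agents_or_users", "Authentication & Spoofing",
     ["Identity Spoofing & Impersonation"]),
    ("involves_humans", "Human-Related",
     ["Overwhelming Human-in-the-Loop (HITL)", "Human Manipulation"]),
    ("is_multi_agent", "Multi-Agent System",
     ["Agent Communication Poisoning", "Rogue Agents in Multi-Agent Systems"]),
]


def applicable_threats(profile: AgentProfile) -> dict[str, list[str]]:
    """Walk the decision path and collect threats for every 'yes' answer."""
    findings = {}
    for attribute, category, threats in DECISION_PATH:
        if getattr(profile, attribute):
            findings[category] = threats
    return findings


# Example: a single agent that plans, remembers, and calls tools.
profile = AgentProfile(True, True, True, False, False, False)
for category, threats in applicable_threats(profile).items():
    print(f"{category}: {', '.join(threats)}")
```

A "no" answer only removes a category from the initial scope. It is worth revisiting the profile whenever the agent gains a new capability such as persistent memory or a peer agent.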
Conclusion: Charting a Secure Path for Agentic AI
The rise of Agentic AI is transformative, but it demands a proactive and evolving approach to security. Threat modeling is not a one-time exercise but a continuous process that must be integrated throughout the AI system’s lifecycle.
Recommendations (Mitigation Playbooks)
OWASP outlines six playbooks to structure mitigation strategies:
- Preventing AI Agent Reasoning Manipulation:
  - Restrict tool access to minimize the attack surface.
  - Implement goal consistency validation to detect unintended behavioral shifts.
  - Enforce cryptographic logging and immutable audit trails.
- Preventing Memory Poisoning & AI Knowledge Corruption:
  - Enforce memory content validation and restrict persistence to trusted sources.
  - Use session isolation to prevent unintended knowledge carryover.
  - Implement rollback mechanisms for AI knowledge.
- Securing AI Tool Execution & Preventing Unauthorized Actions:
  - Implement strict tool access control policies and use execution sandboxes.
  - Use rate-limiting for API calls and resource-intensive tasks.
  - Require human verification for AI-generated code with elevated privileges.
- Strengthening Authentication, Identity & Privilege Controls:
  - Require cryptographic identity verification for AI agents.
  - Implement granular Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC).
  - Deploy Multi-Factor Authentication (MFA) for high-privilege AI accounts.
- Protecting HITL & Preventing Threats Rooted in Human Interaction:
  - Use AI trust scoring to prioritize HITL review queues.
  - Limit AI-generated notifications to prevent cognitive overload.
  - Implement AI-assisted explanation summaries for human reviewers.
- Securing Multi-Agent Communication & Trust Mechanisms:
  - Require message authentication and encryption for all inter-agent communications.
  - Deploy agent trust scoring and consensus verification for high-risk operations.
  - Implement task segmentation to prevent privilege escalation across interconnected agents.
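As one concrete illustration of the last playbook, the sketch below shows message authentication between agents using an HMAC from Python's standard library. It is a minimal example under stated assumptions: the key, the message layout, and the `sign_message` and `verify_message` helpers are hypothetical, and the sketch covers authentication only, not encryption, key distribution, or replay protection.

```python
import hashlib
import hmac
import json
from typing import Optional

# Assumption: a pre-shared key. A real deployment would load this from a
# secret manager and rotate it, ideally with one key per agent pair.
SHARED_KEY = b"demo-key-do-not-use"


def sign_message(sender: str, payload: dict, key: bytes = SHARED_KEY) -> dict:
    """Serialize the message deterministically and attach an HMAC-SHA256 tag."""
    body = json.dumps({"sender": sender, "payload": payload}, sort_keys=True)
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_message(message: dict, key: bytes = SHARED_KEY) -> Optional[dict]:
    """Return the decoded message if the tag is valid, otherwise None."""
    expected = hmac.new(key, message["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, message["tag"]):
        return None
    return json.loads(message["body"])


signed = sign_message("pricing-agent", {"action": "approve", "amount": 120})

# An attacker who alters the body without the key cannot produce a valid tag.
tampered = {"body": signed["body"].replace("120", "9000"), "tag": signed["tag"]}

print(verify_message(signed) is not None)
print(verify_message(tampered) is None)
```

A receiving agent that drops every message failing `verify_message` closes off the simplest form of Agent Communication Poisoning, forged or altered messages. It does not help if the sending agent itself is compromised, which is where trust scoring and consensus verification come in.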
In Conclusion
Agentic AI systems are not just another IT asset; their autonomy, learning capabilities, and complex interactions create novel security challenges that require specialized attention. Cybersecurity professionals must champion a security-first mindset in the development and deployment of these powerful technologies. The OWASP ASI’s work provides a critical foundation for navigating this new landscape.
Useful Resources
For those looking to dive deeper, here are some valuable resources:
- OWASP Top 10 for Large Language Model Applications: https://owasp.org/www-project-top-10-for-large-language-model-applications/
- OWASP Agentic Security Initiative (ASI) GitHub: https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/tree/main/initiatives/agent_security_initiative
- OpenAI’s work on Agent Capabilities:
  - Operator System Card: https://openai.com/index/operator-system-card/
  - Function Calling Guide: https://platform.openai.com/docs/guides/function-calling
- Lilian Weng’s Blog on LLM-powered Autonomous Agents: https://lilianweng.github.io/posts/2023-06-23-agent/
- OWASP Threat Modeling Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Threat_Modeling_Cheat_Sheet.html
- NIST on AI Agent Hijacking: https://www.nist.gov/news-events/news/2025/01/technical-blog-strengthening-ai-agent-hijacking-evaluations
By understanding the unique threat landscape of Agentic AI and proactively implementing robust security measures, we can harness its immense potential while safeguarding our digital environments.