AI Agent Attacks • Prompt Injection • Tool Abuse • Memory Poisoning

Autonomous agents create a new security perimeter.

AI agents do not just answer questions — they retrieve content, call tools, write memory, and take actions. That makes prompt injection, malicious tool outputs, poisoned memory, and identity abuse production security risks.

Explore attack vectors See defenses

Four attack surfaces every agent team must defend

Agents combine language inputs, external tools, long-lived memory, and trust relationships. Each layer introduces distinct failure modes.

Input attacks

Attackers manipulate prompts, retrieved content, or tool results to hijack agent goals.

Execution attacks

Agents are tricked into unsafe tool calls, malicious code execution, or unauthorized APIs.

Data & memory attacks

Persistent memory, vector stores, and tool outputs are poisoned to corrupt future behavior.

Identity & trust attacks

Impersonation exploits trust hierarchies to gain elevated permissions.

Input and prompt attacks against AI agents

Input & Prompt Attacks

Attackers can hijack behavior through crafted context

Direct prompt injection tells the agent to ignore instructions. Indirect injection hides malicious commands in documents, web pages, or tool outputs. Context stuffing floods the window to dilute the original goal.

Hardest to detect: indirect injection, because the agent encounters the malicious instruction as retrieved content instead of direct user input.

Execution & Supply Chain Attacks

Tools turn bad instructions into real-world actions

Agents with tool access can be manipulated into privilege escalation, unsafe API calls, malicious plugin/MCP interactions, or code execution exploits.

Tool abuse

Prompts convince the agent to call high-privilege tools outside the task scope.

Malicious dependencies

Compromised plugins or MCP servers return payloads the agent treats as legitimate data.

Execution and supply chain attacks against AI agents

Data memory and identity attacks against AI agents

Data, Memory & Identity

Long-lived state can make attacks persistent

A poisoned vector store, malicious memory entry, rogue peer agent, or secret-harvesting instruction can corrupt behavior across future sessions and users.

Persistent risk: memory poisoning is silent and can spread across thousands of future sessions if write controls and integrity checks are missing.

Map every attack class to a defense layer

Agent security is strongest when each layer has explicit controls: input, execution, data, and identity.

Input layer

Use prompt hardening, content tagging, system prompt pinning, and context length limits.

Execution layer

Apply least-privilege tool scopes, tool allowlists, signature verification, and isolated sandboxes.

Data & identity layer

Control memory writes, hash critical memory, sign inter-agent messages, and keep secrets out of context.

Production Controls

Six precautions every production agent needs

No single control is enough. Agents need prompt hardening, least-privilege tools, input/output filtering, full observability, human gates, and secrets management working together.

Scan retrieved content before passing it to the modelRequire approval for irreversible or high-impact actionsLog prompts, tool calls, results, and agent decisions

Security precautions and checks for AI agents

Defense-in-depth implementation checklist

Secure the entire request lifecycle: before execution, during execution, and after execution.

Before execution

Validate inputs, verify tool allowlists, and confirm identity and task scope.

During execution

Tag retrieved content, monitor tool calls, block violations, and rate-limit steps.

After execution

Scan outputs, log full traces, and require human review for high-risk outcomes.

Agent security requires all three layers working together: prevention before execution, monitoring during execution, and auditability after execution.

Treat every agent input, tool result, and memory write as untrusted.

Production AI agents need the same rigor as distributed systems and third-party code: least privilege, signed identities, sandboxing, observability, human gates, and defense-in-depth controls.