AI agents do more than answer questions: they retrieve content, call tools, write to memory, and take actions. As a result, prompt injection, malicious tool outputs, poisoned memory, and identity abuse become production security risks.
Agents combine language inputs, external tools, long-lived memory, and trust relationships. Each layer introduces distinct failure modes.
At the input layer, attackers manipulate prompts, retrieved content, or tool results to hijack agent goals.
At the execution layer, agents are tricked into unsafe tool calls, malicious code execution, or unauthorized API calls.
At the data layer, persistent memory, vector stores, and tool outputs are poisoned to corrupt future behavior.
At the identity layer, impersonation exploits trust hierarchies to gain elevated permissions.
Direct prompt injection tells the agent to ignore instructions. Indirect injection hides malicious commands in documents, web pages, or tool outputs. Context stuffing floods the window to dilute the original goal.
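To make the indirect-injection and context-stuffing cases concrete, here is a minimal Python sketch of content tagging with a length cap. The delimiter strings, the `MAX_CHARS` budget, and the function names `wrap_untrusted` and `build_prompt` are illustrative assumptions, not part of any specific framework.

```python
import re

# Hypothetical delimiters; any markers the system prompt explains would work.
OPEN_TAG = "<untrusted_content>"
CLOSE_TAG = "</untrusted_content>"
MAX_CHARS = 4000  # assumed per-document budget, tune for the model in use

_TAG_PATTERN = re.compile(r"</?untrusted_content>", re.IGNORECASE)


def wrap_untrusted(text: str, max_chars: int = MAX_CHARS) -> str:
    """Truncate external content and wrap it in delimiters that mark it as data."""
    cleaned = text
    # Strip delimiter look-alikes repeatedly so nested fragments cannot
    # reassemble into a closing tag and escape the wrapper.
    while True:
        stripped = _TAG_PATTERN.sub("", cleaned)
        if stripped == cleaned:
            break
        cleaned = stripped
    return f"{OPEN_TAG}\n{cleaned[:max_chars]}\n{CLOSE_TAG}"


def build_prompt(system_rules: str, task: str, documents: list[str]) -> str:
    """Place trusted instructions first, then the task, then tagged documents."""
    tagged = "\n".join(wrap_untrusted(doc) for doc in documents)
    return f"{system_rules}\n\nTask: {task}\n\nRetrieved content (data only):\n{tagged}"
```

Tagging does not guarantee that a model will ignore instructions inside the wrapper. It only gives the system prompt and downstream filters a clear boundary to reason about, so it works best alongside the other controls described here.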
Agents with tool access can be manipulated into privilege escalation, unsafe API calls, malicious plugin/MCP interactions, or code execution exploits.
Prompts convince the agent to call high-privilege tools outside the task scope.
Compromised plugins or MCP servers return payloads the agent treats as legitimate data.
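The out-of-scope tool call described above can be blocked with a per-task allowlist checked before every call. The following is a rough sketch only: `TaskScope`, `call_tool`, `ToolCallDenied`, and the two example tools are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


class ToolCallDenied(Exception):
    """Raised when a requested tool is outside the current task scope."""


@dataclass(frozen=True)
class TaskScope:
    name: str
    allowed_tools: frozenset[str]


# Hypothetical tools standing in for a real registry.
def search_docs(query: str) -> str:
    return f"results for {query!r}"


def delete_user(user_id: str) -> str:
    return f"deleted {user_id}"


TOOLS = {"search_docs": search_docs, "delete_user": delete_user}


def call_tool(scope: TaskScope, tool_name: str, **kwargs) -> str:
    """Run a tool only if the task scope explicitly allows it."""
    if tool_name not in scope.allowed_tools:
        raise ToolCallDenied(f"{tool_name!r} is not allowed for task {scope.name!r}")
    if tool_name not in TOOLS:
        raise ToolCallDenied(f"{tool_name!r} is not a registered tool")
    return TOOLS[tool_name](**kwargs)
```

Because the check lives outside the model, a prompt that persuades the agent to request `delete_user` during a lookup task still fails at the dispatcher.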
A poisoned vector store, malicious memory entry, rogue peer agent, or secret-harvesting instruction can corrupt behavior across future sessions and users.
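One way to limit the memory poisoning described above is to gate writes by source and to verify a stored digest on every read. This is a simplified sketch under stated assumptions: `MemoryStore` and its trusted-source names are hypothetical, and in a real deployment the digests would live in a separate store that the agent cannot write to.

```python
import hashlib
import hmac


def digest(text: str) -> str:
    """SHA-256 hex digest of a memory entry."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class MemoryStore:
    """Toy memory with source-gated writes and integrity-checked reads."""

    def __init__(self, trusted_sources: set[str]):
        self._trusted = set(trusted_sources)
        self._entries: dict[str, str] = {}
        # Assumption: kept in-process here for brevity; a real system would
        # store digests somewhere the agent cannot modify.
        self._digests: dict[str, str] = {}

    def write(self, key: str, text: str, source: str) -> None:
        if source not in self._trusted:
            raise PermissionError(f"source {source!r} may not write to memory")
        self._entries[key] = text
        self._digests[key] = digest(text)

    def read(self, key: str) -> str:
        text = self._entries[key]
        if not hmac.compare_digest(digest(text), self._digests[key]):
            raise ValueError(f"memory entry {key!r} failed its integrity check")
        return text
```

A digest only detects changes made outside the write path. It does not judge whether the content written by a trusted source was safe in the first place.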
Agent security is strongest when each layer has explicit controls: input, execution, data, and identity.
At the input layer, use prompt hardening, content tagging, system prompt pinning, and context length limits.
At the execution layer, apply least-privilege tool scopes, tool allowlists, signature verification, and isolated sandboxes.
At the data and identity layers, control memory writes, hash critical memory, sign inter-agent messages, and keep secrets out of context.
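Signing inter-agent messages can be sketched with an HMAC over a canonical JSON encoding. This example assumes a shared secret key between two agents and uses hypothetical helper names `sign_message` and `verify_message`. It omits replay protection, and real systems may prefer asymmetric signatures.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    """Stable JSON encoding so sender and receiver hash identical bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_message(payload: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature to a message payload."""
    sig = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}


def verify_message(message: dict, key: bytes) -> dict:
    """Return the payload if the signature matches, otherwise raise ValueError."""
    expected = hmac.new(key, _canonical(message["payload"]), hashlib.sha256).hexdigest()
    received = str(message.get("sig", ""))
    # Constant-time comparison; encoding to bytes avoids errors on odd input.
    if not hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8")):
        raise ValueError("invalid message signature")
    return message["payload"]
```

A receiving agent would call `verify_message` before acting on any peer request, so an impersonator without the key cannot issue instructions that pass the check.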
No single control is enough. Agents need prompt hardening, least-privilege tools, input/output filtering, full observability, human gates, and secrets management working together.
Secure the entire request lifecycle: before execution, during execution, and after execution.
Before execution, validate inputs, verify tool allowlists, and confirm identity and task scope.
During execution, tag retrieved content, monitor tool calls, block violations, and rate-limit steps.
After execution, scan outputs, log full traces, and require human review for high-risk outcomes.
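The three phases above can be sketched as a single run loop. This is an illustrative outline, not a production design: `run_agent`, `RunResult`, the `SECRET_PATTERN` regex, and the idea of passing pre-planned tool calls are all simplifying assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Illustrative pattern only; real secret scanning needs far broader coverage.
SECRET_PATTERN = re.compile(r"(api[_-]?key|password)\s*[:=]", re.IGNORECASE)


@dataclass
class RunResult:
    output: str
    needs_human_review: bool
    trace: list[tuple[str, str, str]] = field(default_factory=list)


def run_agent(
    task: str,
    planned_calls: list[tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    allowed: set[str],
    high_risk: set[str],
    max_steps: int = 5,
) -> RunResult:
    # Before execution: validate the input and the allowlist itself.
    if not task.strip():
        raise ValueError("task must not be empty")
    unknown = allowed - set(tools)
    if unknown:
        raise ValueError(f"allowlist names unregistered tools: {sorted(unknown)}")

    # During execution: enforce the step budget and block disallowed calls.
    trace: list[tuple[str, str, str]] = []
    outputs: list[str] = []
    for step, (name, arg) in enumerate(planned_calls):
        if step >= max_steps:
            trace.append(("blocked", name, "step limit reached"))
            break
        if name not in allowed:
            trace.append(("blocked", name, "tool not in allowlist"))
            continue
        outputs.append(tools[name](arg))
        trace.append(("called", name, arg))

    # After execution: scan the output and flag high-risk runs for review.
    output = "\n".join(outputs)
    used_high_risk = any(s == "called" and n in high_risk for s, n, _ in trace)
    leaked = SECRET_PATTERN.search(output) is not None
    return RunResult(output, used_high_risk or leaked, trace)
```

The returned trace records both executed and blocked calls, which is the raw material for the full logging and human review steps.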
Production AI agents need the same rigor as distributed systems and third-party code: least privilege, signed identities, sandboxing, observability, human gates, and defense-in-depth controls.