Securing the Autonomous Frontier: Layered Defenses for AI Agent Deployments

Security Careers

03 May 2025 — 6 min read

AI agents are no longer theoretical concepts; they are rapidly becoming integral components of real-world applications across diverse sectors, from customer service to finance and healthcare. Defined as software programs designed to autonomously collect data, process information, and take actions toward specific objectives without direct human intervention, these agents typically leverage large language models (LLMs) as their core reasoning engines. A key capability enabling their autonomy is their ability to connect LLMs to external functions or tools, such as APIs, databases, or services.

While AI agents inherit many security risks from the LLMs they are built upon, including prompt injection, sensitive data leakage, and supply chain vulnerabilities outlined in the OWASP Top 10 for LLMs, their integration with external tools introduces a significant expansion of the attack surface. This integration exposes them to classic software threats like SQL injection, remote code execution (RCE), and broken access control. The combination of inherent LLM risks and these new, integrated tool risks makes securing AI agents particularly critical, as their ability to interact with external systems or even the physical world amplifies the potential impact of compromises. Attacks can escalate from simple information leakage to credential theft, tool exploitation, and even full infrastructure takeover.

Our sources highlight that the vulnerabilities and attack vectors targeting AI agents are largely framework-agnostic. Whether built using open-source frameworks like CrewAI or AutoGen, the risks primarily stem from insecure design patterns, misconfigurations, and unsafe tool integrations, rather than inherent flaws in the frameworks themselves.

Given the complexity and expanded attack surface of agentic applications, no single mitigation is sufficient to effectively reduce risk. A layered, defense-in-depth strategy is necessary, combining multiple safeguards across various components including the agents themselves, the tools they utilize, the prompts they receive, and their runtime environments.

paloaltoalagents

paloaltoalagents.pdf

2 MB

Based on the attack scenarios demonstrated in the sources, the following five key mitigation strategies form essential layers in securing AI agent deployments:

Prompt Hardening:
- A prompt serves as the definition of an agent's behavior. Poorly scoped or overly permissive prompts make agents vulnerable to manipulation and exploitation, even without explicit injections.
- Prompt hardening involves designing prompts with strict constraints and guardrails. Best practices include explicitly prohibiting agents from disclosing their instructions, coworker agents, and tool schemas. It also involves narrowly defining each agent’s responsibilities and rejecting requests that are outside of their defined scope. Furthermore, it requires constraining tool invocations to expected input types, formats, and values.
- While prompt hardening raises the bar for attackers, it is not a standalone solution; advanced injection techniques can still bypass these defenses, necessitating pairing it with runtime content filtering. Prompt hardening is relevant against threats like identifying participant agents, extracting agent instructions, extracting agent tool schemas, and various intent breaking/goal manipulation attacks.
Content Filtering:
- Content filters act as inline defenses, inspecting and potentially blocking agent inputs and outputs in real time before threats can propagate.
- While traditionally used in GenAI applications to defend against jailbreaks and prompt injection, they remain a critical layer for agentic applications which inherit these risks and introduce new ones.
- Advanced solutions offer deeper inspection tailored to AI agents, capable of detecting not just traditional prompt injection but also tool schema extraction attempts, tool misuse (including unintended invocations and vulnerability exploitation), memory manipulation, malicious code execution attempts (like SQL injection and exploit payloads), sensitive data leakage (such as credentials and secrets), and malicious URLs or domain references.
- Content filtering is a broad defense applicable against prompt injection, intent breaking, goal manipulation, tool misuse, and communication poisoning across various scenarios, including data exfiltration and unauthorized access attempts.
Tool Input Sanitization:
- A fundamental principle is that tools integrated into AI agents must never implicitly trust their inputs, even if those inputs appear to come from a benign agent. Attackers can manipulate agents to supply crafted inputs designed to exploit vulnerabilities within the tools.
- To prevent such abuse, every tool should sanitize and validate its inputs before execution.
- Key checks include validating input type and format (e.g., expected strings, numbers, or structured objects), performing boundary and range checking, and filtering/encoding special characters to prevent injection attacks (like SQL injection).
- Tool input sanitization is crucial for defending against attacks that exploit tools, such as gaining unauthorized access to internal networks via web readers or exploiting SQL injection vulnerabilities.
Tool Vulnerability Scanning:
- All external functions or tools integrated into agentic systems require regular security assessments.
- This includes performing Static Application Security Testing (SAST) for source code analysis, Dynamic Application Security Testing (DAST) for runtime behavior analysis, and Software Composition Analysis (SCA) to detect vulnerable dependencies and third-party libraries.
- These assessments help identify misconfigurations, insecure logic, and outdated components that attackers could exploit through tool misuse. For instance, scanning can help uncover vulnerabilities like Broken Object-Level Authorization (BOLA) that allow attackers to access unauthorized user data by simply manipulating object references in tool inputs.
Code Executor Sandboxing:
- Code executors allow agents to dynamically generate and run code, a powerful capability that introduces significant risks like arbitrary code execution and lateral movement within the execution environment.
- While most agent frameworks utilize container-based sandboxes for isolation, default configurations are often insufficient. Stricter runtime controls are needed to prevent sandbox escape or misuse.
- Essential sandboxing practices include restricting container networking to allow only necessary outbound domains and blocking access to internal services (like metadata endpoints or private addresses). It also requires limiting mounted volumes to avoid broad or persistent paths, dropping unnecessary Linux capabilities, blocking risky system calls, and enforcing resource quotas to prevent denial-of-service (DoS) or runaway code.
- Robust code executor sandboxing is vital for preventing unexpected RCE and code attacks, including sensitive data exfiltration via mounted volumes or service account access token exfiltration via metadata services.

Securing AI agents requires a comprehensive approach that goes beyond these core technical mitigations. Protecting sensitive information like credentials also necessitates the use of Data Loss Prevention (DLP) solutions, robust audit logs, and secret management services.

Conclusion

AI agents, by combining the complexities of LLMs with the integration of external tools, present a significantly expanded and complex attack surface. The simulated attacks discussed in the sources demonstrate that vulnerabilities are often rooted in design and configuration, not specific frameworks, and can be triggered by flexible and evasive prompt payloads. Effectively defending these systems demands more than ad hoc fixes; it requires a deliberate, layered defense-in-depth strategy. By implementing safeguards across prompts, tool inputs, tool security, and runtime environments – including prompt hardening, content filtering, tool input sanitization, tool vulnerability scanning, and robust code executor sandboxing – organizations can build resilient defenses necessary to navigate the security challenges of the autonomous frontier. Purpose-built solutions tailored for AI systems can further enhance these defenses by providing real-time threat detection and control over AI usage.

Securing the Autonomous Frontier: Layered Defenses for AI Agent Deployments

Security Careers

Read more

Beyond the Known: Navigating Cybersecurity Risks in Your Multi-Tiered Supply Chain

The Silent Compromise: How "Overemployed" Remote Workers Are Creating a New Class of Insider Threats in the Software Development Lifecycle

Navigating the Digital Maze: How AI-Enhanced DLP Tames Multi-Cloud Chaos and Shadow IT

Bridging the Gap: Balancing Security, User Experience, and Operational Efficiency in Identity Management