AI agents that execute code on behalf of users represent a powerful but dangerous capability. Without proper input validation, these systems can be manipulated to run arbitrary commands, access sensitive resources, or compromise the underlying infrastructure. This guide outlines practical techniques to prevent code execution vulnerabilities in AI assistant implementations.
Understanding the Injection Risk
The core vulnerability stems from AI agents interpreting user instructions as executable code. When an agent passes raw user input directly to shell commands, eval() statements, or code interpreters, attackers can inject malicious payloads that execute with the agent's privileges. This pattern—command injection—has existed for decades but takes on new complexity when mediated by natural language understanding.
Attackers exploit this through prompt injection techniques that manipulate the agent's reasoning. A seemingly innocent request like "Please summarize this file: ; rm -rf /" can trigger catastrophic consequences if the agent passes the unvalidated string to a file-reading utility. The semicolon terminates the intended command and starts a new one, with the agent executing both. The natural language wrapper doesn't provide protection; the underlying code execution context remains vulnerable.
Implementing Input Sanitization
Effective defenses require treating all user-supplied content as untrusted until proven otherwise. Never pass raw user input to execution contexts. Instead, extract structured parameters through strict parsing and validate them against explicit allowlists.
from typing import Optional
import re
class SafeCodeExecutor:
ALLOWED_COMMANDS = {"python", "node", "ruby"}
DISALLOWED_PATTERNS = [
r";", r"&", r"\|", r"`", r"\$\(", r"\$\{",
r">>", r"<", r">", r"\*\*"
]
def validate_command(self, user_input: str) -> Optional[str]:
# Extract only the intended command, ignore everything else
parts = user_input.strip().split(maxsplit=1)
if not parts:
return None
command = parts[0].lower()
if command not in self.ALLOWED_COMMANDS:
return None
# Check for injection patterns in entire input
for pattern in self.DISALLOWED_PATTERNS:
if re.search(pattern, user_input):
return None
return command
def execute(self, user_input: str) -> dict:
validated = self.validate_command(user_input)
if not validated:
return {"error": "Invalid or unsafe command", "executed": False}
# Only proceed with validated, safe command
return self.run_in_sandbox(validated)
This example demonstrates allowlisting permitted commands and pattern-matching for dangerous characters. The key principle: reject anything that doesn't match expected patterns rather than trying to remove bad content.
Architectural Controls
Beyond input validation, architectural decisions significantly impact security posture. Isolate execution environments using containers or virtual machines with minimal privileges. Never run agent code as root or with access to production databases.
Consider these architectural patterns: - Parameter extraction via LLM: Use a separate, constrained model instance to extract structured parameters from natural language, then validate those parameters independently of the original input - Capability-based restrictions: Implement capability tokens that agents must present to access sensitive operations, with tokens tied to specific, pre-approved actions - Read-only operations: Design agents to prefer read-only operations; any write or execute capability requires explicit human confirmation - Audit logging: Log all code execution events with full input context for forensic analysis
Validation Checklist for Agent Developers
Before deploying AI agents with code execution capabilities, verify these controls are in place:
- Input validation: All user input parsed through strict schemas with allowlist-based validation
- Command filtering: Dangerous patterns blocked through pattern matching, not blacklist removal
- Sandbox isolation: Execution occurs in isolated environments without network access to sensitive systems
- Least privilege: Agent processes run with minimal permissions, never as root or admin
- Audit trails: Complete logging of execution events with input preservation
- Human-in-the-loop: Destructive operations require explicit confirmation
- Rate limiting: Execution frequency capped to prevent automated exploitation
- Output sanitization: Results filtered before returning to users to prevent data leakage
Conclusion
Code execution capabilities in AI agents demand defensive programming patterns that treat every user interaction as potentially hostile. The techniques outlined here—allowlist validation, pattern-based filtering, sandbox isolation, and architectural controls—provide defense in depth against injection attacks. Implement these controls before exposing agents to untrusted users, and review them regularly as attack techniques evolve.