Prevent Code Execution in AI Assistants: Input Validation Essentials

AI agents that execute code on behalf of users represent a powerful but dangerous capability. Without proper input validation, these systems can be manipulated to run arbitrary commands, access sensitive resources, or compromise the underlying infrastructure. This guide outlines practical techniques to prevent code execution vulnerabilities in AI assistant implementations.

Understanding the Injection Risk

The core vulnerability stems from AI agents interpreting user instructions as executable code. When an agent passes raw user input directly to shell commands, eval() statements, or code interpreters, attackers can inject malicious payloads that execute with the agent's privileges. This pattern—command injection—has existed for decades but takes on new complexity when mediated by natural language understanding.

Attackers exploit this through prompt injection techniques that manipulate the agent's reasoning. A seemingly innocent request like "Please summarize this file: ; rm -rf /" can trigger catastrophic consequences if the agent passes the unvalidated string to a file-reading utility. The semicolon terminates the intended command and starts a new one, with the agent executing both. The natural language wrapper doesn't provide protection; the underlying code execution context remains vulnerable.

Implementing Input Sanitization

Effective defenses require treating all user-supplied content as untrusted until proven otherwise. Never pass raw user input to execution contexts. Instead, extract structured parameters through strict parsing and validate them against explicit allowlists.

from typing import Optional
import re

class SafeCodeExecutor:
    ALLOWED_COMMANDS = {"python", "node", "ruby"}
    DISALLOWED_PATTERNS = [
        r";", r"&", r"\|", r"`", r"\$\(", r"\$\{",
        r">>", r"<", r">", r"\*\*"
    ]

    def validate_command(self, user_input: str) -> Optional[str]:
        # Extract only the intended command, ignore everything else
        parts = user_input.strip().split(maxsplit=1)
        if not parts:
            return None

        command = parts[0].lower()
        if command not in self.ALLOWED_COMMANDS:
            return None

        # Check for injection patterns in entire input
        for pattern in self.DISALLOWED_PATTERNS:
            if re.search(pattern, user_input):
                return None

        return command

    def execute(self, user_input: str) -> dict:
        validated = self.validate_command(user_input)
        if not validated:
            return {"error": "Invalid or unsafe command", "executed": False}

        # Only proceed with validated, safe command
        return self.run_in_sandbox(validated)

This example demonstrates allowlisting permitted commands and pattern-matching for dangerous characters. The key principle: reject anything that doesn't match expected patterns rather than trying to remove bad content.

Architectural Controls

Beyond input validation, architectural decisions significantly impact security posture. Isolate execution environments using containers or virtual machines with minimal privileges. Never run agent code as root or with access to production databases.

Consider these architectural patterns: - Parameter extraction via LLM: Use a separate, constrained model instance to extract structured parameters from natural language, then validate those parameters independently of the original input - Capability-based restrictions: Implement capability tokens that agents must present to access sensitive operations, with tokens tied to specific, pre-approved actions - Read-only operations: Design agents to prefer read-only operations; any write or execute capability requires explicit human confirmation - Audit logging: Log all code execution events with full input context for forensic analysis

Validation Checklist for Agent Developers

Before deploying AI agents with code execution capabilities, verify these controls are in place:

Input validation: All user input parsed through strict schemas with allowlist-based validation
Command filtering: Dangerous patterns blocked through pattern matching, not blacklist removal
Sandbox isolation: Execution occurs in isolated environments without network access to sensitive systems
Least privilege: Agent processes run with minimal permissions, never as root or admin
Audit trails: Complete logging of execution events with input preservation
Human-in-the-loop: Destructive operations require explicit confirmation
Rate limiting: Execution frequency capped to prevent automated exploitation
Output sanitization: Results filtered before returning to users to prevent data leakage

Conclusion

Code execution capabilities in AI agents demand defensive programming patterns that treat every user interaction as potentially hostile. The techniques outlined here—allowlist validation, pattern-based filtering, sandbox isolation, and architectural controls—provide defense in depth against injection attacks. Implement these controls before exposing agents to untrusted users, and review them regularly as attack techniques evolve.