Build a Zero-Trust URL Pipeline for AI Agents

AI agents increasingly fetch external resources—documentation, APIs, data feeds—to complete tasks. Yet this capability becomes an attack surface when agents fetch from untrusted URLs. In 2023, researchers demonstrated zero-click data exfiltration against several LLM systems: an agent receives a seemingly benign instruction, fetches a URL, and leaks sensitive context to an external server. The root cause wasn't sophisticated LLM exploitation but implicit trust in URL destinations. This article outlines how to build a URL pipeline that assumes every destination is potentially hostile.

The Attack Model: How Implicit Trust Fails

When an AI agent receives a URL from any source, it typically checks that the string looks like a URL, then fetches it. That check fails against multiple attack vectors. Attackers craft URLs using IDN homographs (e.g., раypal.com with a Cyrillic 'а'), IPv4-mapped IPv6 addresses, or URL-encoded path traversals. DNS rebinding attacks exploit the gap between validation-time and fetch-time resolution: a hostname initially points to a safe address, then switches to an internal service after the DNS TTL expires.
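A minimal illustration of the homograph problem, using hypothetical hostnames: the lookalike survives a casual string comparison but is exposed once encoded to its IDNA (punycode) form.

spoofed = "p\u0430ypal.com"            # 'а' here is Cyrillic U+0430, not Latin 'a'
print(spoofed == "paypal.com")          # False, though the strings look identical
print(spoofed.encode("idna"))           # the xn-- punycode form makes the substitution visible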

Consequences extend beyond SSRF. Agents might leak conversation context through Referer headers, User-Agent strings with session identifiers, or URL query parameters. When agents operate with elevated privileges, this leakage becomes catastrophic. Zero-click exfiltration requires no user interaction—merely the presence of a crafted URL in training data or tool outputs.

Layer One: Explicit Allowlisting with Canonicalization

Zero-trust URL handling requires explicit allowlisting—not of URL patterns, but of canonicalized destinations. Pattern matching (*.trusted.com) fails against subdomain takeover. Instead, resolve hostnames to IPs at validation time, then validate those IPs. Resolving once at validation collapses the DNS rebinding window, while matching canonicalized hostnames against the allowlist defeats homograph lookalikes.

Canonicalization must be aggressive: decode percent-encoding, normalize Unicode to NFKC, lowercase schemes and hostnames, and resolve relative paths. The validated result should be an internal representation—never the original string passed to fetch operations.

import ipaddress
import socket
import unicodedata
from urllib.parse import urlparse, unquote

def canonicalize_and_validate(url: str, allowed_hosts: set) -> dict:
    parsed = urlparse(url)

    if parsed.scheme not in ('http', 'https'):
        raise ValueError(f"Scheme not allowed: {parsed.scheme}")

    # Aggressive canonicalization: NFKC-normalize, lowercase, strip trailing dots
    hostname = unicodedata.normalize('NFKC', parsed.hostname or '')
    hostname = hostname.lower().strip('.')

    # Check the allowlist before touching the network
    if hostname not in allowed_hosts:
        raise ValueError(f"Host not allowed: {hostname}")

    # Resolve at validation time so the fetch layer can pin this address,
    # collapsing the DNS rebinding window
    try:
        resolved_ip = socket.getaddrinfo(hostname, None)[0][4][0]
    except socket.gaierror:
        raise ValueError(f"Cannot resolve: {hostname}")

    # Validate the resolved address itself: reject internal, loopback,
    # link-local, and reserved ranges
    addr = ipaddress.ip_address(resolved_ip)
    if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
        raise ValueError(f"Resolved address not allowed: {resolved_ip}")

    return {
        'scheme': parsed.scheme,
        'hostname': hostname,
        'resolved_ip': resolved_ip,
        'port': parsed.port or (443 if parsed.scheme == 'https' else 80),
        'path': unquote(parsed.path),
        'query': unquote(parsed.query)
    }

Validation must occur in a separate security context from fetch operations—ideally a distinct process with no access to agent credentials.

Layer Two: Controlled Resolution and Fetch Isolation

Validation without isolation remains vulnerable. The fetch component must operate with minimal privileges: no internal API access, no environment secrets, restricted network egress.

Implement fetch isolation through sandboxed workers with the following constraints (a sketch appears after the list):

  • Fixed timeouts (5-10 seconds maximum)
  • Response size limits at the transport layer
  • No automatic redirect following—revalidate each Location header
  • Explicit User-Agent strings
  • No cookie jar or credential access
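A minimal sketch of such a worker using the requests library. The isolated_fetch name, the agent-fetcher/1.0 User-Agent string, and the 1 MB cap are illustrative assumptions; a stricter build would also connect to the pre-resolved IP from validation rather than re-resolving the hostname.

import requests

def isolated_fetch(validated: dict, max_bytes: int = 1_000_000):
    # Runs inside the sandboxed worker: no cookies, no stored credentials,
    # no automatic redirects. `validated` is the dict returned by
    # canonicalize_and_validate().
    query = f"?{validated['query']}" if validated['query'] else ''
    url = (f"{validated['scheme']}://{validated['hostname']}:"
           f"{validated['port']}{validated['path']}{query}")

    resp = requests.get(
        url,
        timeout=(5, 10),                              # connect / read timeouts
        allow_redirects=False,                        # 3xx is handled by the caller
        headers={"User-Agent": "agent-fetcher/1.0"},  # explicit, no session identifiers
        stream=True,                                  # enforce the size limit while reading
    )
    body = b""
    for chunk in resp.iter_content(chunk_size=8192):
        body += chunk
        if len(body) > max_bytes:
            resp.close()
            raise ValueError("Response exceeds size limit")
    return resp.status_code, resp.headers, body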

Redirects require special handling. Capture 3xx responses, extract Location headers, and feed URLs back through validation before following. This prevents open redirects from bypassing controls.
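Assuming the isolated_fetch sketch above, redirect handling becomes a loop that re-enters validation on every hop; fetch_following_redirects and the hop limit are hypothetical names and values.

def fetch_following_redirects(url: str, allowed_hosts: set, max_hops: int = 3):
    for _ in range(max_hops):
        validated = canonicalize_and_validate(url, allowed_hosts)
        status, headers, body = isolated_fetch(validated)
        if status not in (301, 302, 303, 307, 308):
            return body
        # Feed the redirect target back through validation; an off-allowlist
        # (or relative) Location value fails closed with a ValueError
        url = headers.get("Location", "")
    raise ValueError("Too many redirects")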

For authenticated API access, inject credentials at the fetch layer—not in URL construction. Never embed tokens or keys in URLs passed between components.
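One way to express that design choice, as a hypothetical sketch: the fetch worker owns a per-host token map loaded from its own secret store, and URLs exchanged between components never carry credentials.

# Hypothetical per-host token map, loaded from the worker's own secret store
API_TOKENS = {"api.trusted.example": "token-placeholder"}

def auth_headers(validated: dict) -> dict:
    # Credentials are attached here, at the fetch layer, never in the URL
    headers = {"User-Agent": "agent-fetcher/1.0"}
    token = API_TOKENS.get(validated["hostname"])
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers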

Layer Three: Content Verification and Behavioral Monitoring

Zero-trust extends beyond fetch completion. Response content requires verification before reaching agent context. Attackers serve benign content to validation pipelines, then switch payloads for actual requests—a TOCTOU variant.

Enforce content-type strictly. If your agent expects JSON, reject text/html responses. Parse responses in the isolated fetcher before passing structured data—never raw bytes. For HTML processing, use hardened parsers with external entities disabled, extracting only required fields.
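As one illustration of strict content-type enforcement for a JSON-expecting agent, parse_verified_json is a hypothetical helper operating on the (status, headers, body) output of the fetcher sketched earlier.

import json

def parse_verified_json(headers, body: bytes):
    # Runs inside the isolated fetcher; only structured data, never raw
    # bytes, is handed back to agent context
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if content_type != "application/json":
        raise ValueError(f"Unexpected content type: {content_type}")
    return json.loads(body.decode("utf-8"))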

Log every fetch attempt: requested URL, canonicalized destination, resolved IP, response size, content type, and timing. Anomalies trigger review without blocking operation. This telemetry enables incident response forensics.
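A sketch of that telemetry as one structured record per attempt; the agent.fetch logger name and the field names are illustrative assumptions.

import json
import logging
import time

fetch_log = logging.getLogger("agent.fetch")

def log_fetch(requested_url: str, validated: dict, status: int,
              size: int, content_type: str, elapsed_s: float) -> None:
    # One structured line per fetch attempt; anomaly review and incident
    # forensics consume these records downstream
    fetch_log.info(json.dumps({
        "requested_url": requested_url,
        "canonical_host": validated["hostname"],
        "resolved_ip": validated["resolved_ip"],
        "status": status,
        "response_bytes": size,
        "content_type": content_type,
        "elapsed_s": round(elapsed_s, 3),
        "ts": time.time(),
    }))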

Implementation Recommendations

Building this pipeline requires architectural decisions:

  • Separate concerns: Validation, resolution, fetch, and content processing as distinct components
  • Default-deny: Adding to allowlists requires human review
  • Test attack corpora: Use SSRF payloads, DNS rebinding scenarios, and homograph variants
  • Monitor bypass attempts: Log validation failures and analyze patterns
  • Assume failure: Design for detection and response, not just prevention

Zero-trust URL handling treats every external request as a potential compromise. Production-ready agents require this assumption from inception.
