Agentic ProbLLMs: Exploiting AI Computer-Use And Coding Agents

A talk at the 39C3 Chaos Communication Congress, presented by Embrace The Red, revealed a critical class of vulnerabilities in AI agent systems dubbed "Agentic ProbLLMs": a novel attack vector that exploits probabilistic LLM behavior to subvert computer-use and coding agents. The research demonstrates how attackers can manipulate AI agents into executing malicious code through carefully crafted prompts that appear benign to traditional security filters.

How the Attack Works

Agentic ProbLLMs exploit the probabilistic nature of large language models when they function as autonomous agents. Unlike traditional prompt injection attacks that rely on obvious malicious instructions, this technique leverages the inherent uncertainty in LLM decision-making processes. Attackers craft prompts that create high-probability pathways toward malicious outcomes while remaining semantically innocuous.

The attack specifically targets the tool-calling mechanisms within agent frameworks. When an AI agent evaluates whether to execute a tool call, it assigns probabilities to different potential actions. By carefully structuring prompts that increase the probability weight of malicious tool executions, attackers can influence the agent to perform unauthorized operations. This is particularly effective against coding agents that have access to file system operations, shell commands, or code execution environments.
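
To make that decision point concrete, here is a minimal sketch of a LangChain-style tool-calling setup (assuming the langchain_openai package is available; the model name, tool names, and prompt are purely illustrative, not taken from the original research). The model itself samples which bound tool to call, and that sampling step is what a crafted prompt can bias toward a high-risk tool.

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def read_file(path: str) -> str:
    """Read a file from the workspace."""
    return open(path).read()

@tool
def run_shell(command: str) -> str:
    """Execute a shell command."""
    return f"(would execute) {command}"

# The model, not application code, decides which tool call to emit.
# A prompt engineered to raise the likelihood of the high-risk tool can
# steer this choice without containing any overtly malicious instruction.
llm = ChatOpenAI(model="gpt-4o").bind_tools([read_file, run_shell])
response = llm.invoke("Summarize the project structure in this repository.")
print(response.tool_calls)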

Real-World Implications

This attack vector poses significant risks to production AI agent deployments across multiple sectors. Development environments using AI coding assistants face immediate threats, as malicious code suggestions could compromise entire codebases. Customer service agents with database access could be manipulated to exfiltrate sensitive information. Financial trading agents might execute unauthorized transactions based on probabilistically manipulated decision paths.

The research demonstrates successful exploitation against popular agent frameworks including LangGraph and AutoGPT implementations. In one demonstration, researchers showed how a coding agent could be gradually manipulated into creating a backdoor in a web application over multiple benign-seeming code review requests. The agent, maintaining context across sessions, eventually suggested security-weakening changes that appeared to address performance optimization.

Defensive Measures and Implementation

Protecting against Agentic ProbLLMs requires implementing multi-layered security controls that address both the probabilistic exploitation and the tool execution pathways. The most effective defense involves implementing strict tool permission boundaries and probability threshold monitoring.

import numpy as np


class SecurityError(Exception):
    """Raised when a suspicious tool-call pattern is detected."""


class SecureToolWrapper:
    """Wrapper that adds probability monitoring to tool calls"""

    def __init__(self, tool_function, risk_threshold=0.8):
        self.tool = tool_function
        self.risk_threshold = risk_threshold
        self.call_history = []

    def __call__(self, *args, **kwargs):
        # Monitor consecutive tool calls for pattern anomalies
        if len(self.call_history) > 3:
            recent_patterns = self.call_history[-3:]
            pattern_entropy = self._calculate_pattern_entropy(recent_patterns)
            if pattern_entropy < 0.3:  # Suspiciously low entropy indicates manipulation
                raise SecurityError("Potential probabilistic manipulation detected")

        result = self.tool(*args, **kwargs)
        # Works for LangChain tool objects (.name) as well as plain functions (__name__)
        tool_name = getattr(self.tool, "name", None) or getattr(self.tool, "__name__", repr(self.tool))
        self.call_history.append((tool_name, args, kwargs))
        return result

    def _calculate_pattern_entropy(self, patterns):
        # Shannon entropy over the tool names in the recent call window
        tool_counts = {}
        for pattern in patterns:
            tool_name = pattern[0]
            tool_counts[tool_name] = tool_counts.get(tool_name, 0) + 1

        total = len(patterns)
        entropy = -sum((count/total) * np.log2(count/total) 
                       for count in tool_counts.values())
        return entropy

# Implement a tool whitelist with strict boundaries
# (file_read_tool and code_analyze_tool are placeholders for the tools your agent exposes)
SAFE_TOOLS = {
    'file_read': SecureToolWrapper(file_read_tool),
    'code_analyze': SecureToolWrapper(code_analyze_tool),
}

# Block high-risk tools entirely
BLOCKED_TOOLS = [
    'shell_execute', 'file_write_system', 'network_request'
]

Additional defensive measures include implementing conversation-level anomaly detection that tracks the cumulative effect of user interactions. Monitor for gradual shifts in agent behavior patterns, particularly changes in tool selection probabilities over time.
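
One way to approximate this is sketched below under simplifying assumptions: the SessionDriftMonitor class, the baseline counts, and the thresholds are illustrative and not part of the original research. The idea is to compare each session's tool-usage distribution against a historical baseline and flag sessions whose usage drifts sharply toward rarely used, high-risk tools.

import numpy as np
from collections import Counter

class SessionDriftMonitor:
    """Track whether a session's tool-usage mix drifts away from a baseline."""

    def __init__(self, baseline_counts, drift_threshold=0.5, min_calls=5):
        total = sum(baseline_counts.values())
        self.baseline = {name: count / total for name, count in baseline_counts.items()}
        self.drift_threshold = drift_threshold
        self.min_calls = min_calls
        self.session_calls = []

    def record(self, tool_name):
        self.session_calls.append(tool_name)

    def drift_score(self):
        # KL divergence of the session's tool distribution from the baseline
        counts = Counter(self.session_calls)
        total = len(self.session_calls)
        score = 0.0
        for name, count in counts.items():
            p = count / total
            q = self.baseline.get(name, 1e-6)  # smooth tools unseen in the baseline
            score += p * np.log2(p / q)
        return score

    def is_anomalous(self):
        return len(self.session_calls) >= self.min_calls and self.drift_score() > self.drift_threshold

# Usage: a session that keeps reaching for a rarely used high-risk tool stands out
monitor = SessionDriftMonitor({'file_read': 80, 'code_analyze': 60, 'shell_execute': 5})
for call in ['file_read', 'shell_execute', 'shell_execute', 'shell_execute', 'shell_execute']:
    monitor.record(call)
print(monitor.drift_score(), monitor.is_anomalous())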

Immediate Action Items

Organizations deploying AI agents should immediately audit their current implementations for vulnerability to probabilistic manipulation attacks. Review and restrict tool permissions to the minimum necessary for agent functionality.

Key immediate steps include:

  1. Audit Tool Permissions: Review all tools available to AI agents and remove any unnecessary high-risk capabilities
  2. Implement Probability Monitoring: Add logging for tool selection probabilities and alert on anomalous patterns
  3. Deploy Conversation Analysis: Monitor multi-turn conversations for gradual shifts toward malicious behavior
  4. Restrict Context Windows: Limit the context available to agents to prevent long-term probabilistic manipulation
  5. Enable Human Oversight: Require human approval for sensitive operations, especially file system modifications (a minimal sketch follows this list)
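
For item 5, a minimal human-in-the-loop sketch is shown below. The SENSITIVE_TOOLS set and the console confirmation flow are illustrative assumptions rather than a feature of any particular agent framework; in production this gate would typically surface in a review UI instead of stdin.

SENSITIVE_TOOLS = {'shell_execute', 'file_write_system', 'network_request'}

def require_approval(tool_name, tool_fn):
    """Wrap a tool so that sensitive calls pause for explicit human confirmation."""
    def guarded(*args, **kwargs):
        if tool_name in SENSITIVE_TOOLS:
            print(f"[approval required] {tool_name} args={args} kwargs={kwargs}")
            if input("Approve this call? [y/N] ").strip().lower() != 'y':
                return f"Call to {tool_name} was denied by a human reviewer."
        return tool_fn(*args, **kwargs)
    return guarded

# Usage: wrap sensitive tools before registering them with the agent
safe_shell = require_approval('shell_execute', lambda cmd: f"(would run) {cmd}")
print(safe_shell('rm -rf build/'))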

The research presented at 39C3 highlights a fundamental security challenge in current AI agent architectures. As these systems become more autonomous and gain access to sensitive operations, addressing probabilistic exploitation vectors becomes critical for maintaining security postures.

For detailed technical information and proof-of-concept demonstrations, refer to the original research at https://embracethered.com/blog/posts/2025/39c3-agentic-probllms-exploiting-computer-use-and-coding-agents/.
