How to run AI agents like Hermes safely: 10 things to get right

AI agent frameworks like Hermes are becoming increasingly popular, are very powerful, but hold hidden risks. Most people who get burned by an AI agent did not make an obvious mistake. They gave the agent access to what seemed reasonable at the time, let it run unsupervised, and found out later that the reasonable assumption was wrong.

10 ways to run AI agents safely — a security checklist for prompt injection, credential exposure, excessive agency, and hallucination

Quick Answer: The most common problems with AI agents fall into four categories: prompt injection (malicious instructions hidden in content the agent reads), credential exposure (agents leaking API keys or passwords they have access to), excessive permissions (agents able to do far more than the task requires), and hallucination (agents generating confident but wrong output). Addressing each with specific controls -- scoped file access, approval gates, credential hygiene, and output verification -- covers most of the risk without requiring a security background.

What threats does OWASP identify for AI agents?

OWASP publishes a Top 10 list for LLM security risks that covers the main attack surfaces for people running agents. The full list is at genai.owasp.org. Here is what each one means in practice:

1. Prompt injection

Malicious instructions can be embedded inside content an agent reads: web pages, documents, code comments, files from third parties, even API responses. When the agent processes that content, it can follow the embedded instructions instead of yours. This is the most common way agents get hijacked without any external attacker involved.

2. Sensitive information disclosure

An agent running on your machine has read access to most of what you have read access to: .env files, SSH keys, cloud credentials, browser-stored tokens, shell history. Any of that can end up in a log file, an API payload, or the agent's output -- often with no visible sign.

3. Supply chain compromise

Libraries, plugins, and skills can carry malicious code alongside legitimate functionality. An agent that installs dependencies or loads third-party tools is trusting that those tools do only what they claim. This is not a theoretical risk -- compromised packages targeting developer machines are found regularly.

4. Data and model poisoning

Models can be trained or fine-tuned on corrupted data that produces biased or deliberately false output. For most people running local agents with standard models this is a background concern, but fine-tuned or specialized models warrant more scrutiny about their training data sources.

5. Improper output handling

When an agent takes action on your system, the scope of what it does depends on what it decides is appropriate, not what you intended. An agent with filesystem access and a vague instruction can interpret that instruction broadly. Limits need to be explicit.

6. Excessive agency

Giving an agent access to financial accounts, password managers, or administrative credentials without hard limits creates high-impact exposure. When something eventually goes wrong -- an unexpected retry loop, a misread instruction, a manipulated prompt -- the damage is proportional to the access the agent had.

7. System prompt leakage

The contents of an agent's system prompt can be extracted through specific prompting techniques. Anything stored there is potentially readable. API keys, account numbers, and personal identifiers do not belong in the system prompt.

8. Vector and embedding weaknesses

Agents that use RAG (retrieval-augmented generation) or search an external knowledge base can be manipulated through poisoned documents in that knowledge base. This is most relevant when the agent can retrieve content that others control.

9. Misinformation

Agents generate confident output regardless of accuracy. On low-stakes tasks this is an annoyance. On tasks involving security configurations, legal terms, financial decisions, or medical information, acting on wrong AI output without verifying it has real consequences.

10. Unbounded consumption

An agent stuck in a retry loop can burn through millions of tokens in a few minutes. With no usage limits configured, this can produce significant unexpected cost and potentially lock out access to a service.

What practical steps reduce AI agent risk?

Four controls address most of the threat list above:

Scope the agent's access before starting. Give it access to the project directory it needs, not your entire home folder. Remove administrative and elevated permissions from day-to-day agent use. Most coding and workflow tasks do not require sudo or admin credentials. When a specific task does, run that command yourself rather than granting it to the agent permanently.

Use approval gates for high-impact actions. Configure the agent to ask before writing files, running commands, making purchases, or sending data externally. Watch the agent directly the first several times it runs any new task type. Most documented agent accidents involve unsupervised runs on workflows that had not been fully tested.

Keep credentials out of files the agent can read. Load secrets from the environment at runtime rather than storing them in files inside the agent's working directory. This prevents accidental exposure in logs, output, and API payloads.

Verify output before acting on it. Treat agent output on anything consequential the same way you would treat advice from a junior employee who is occasionally confidently wrong. Check it.

For a deeper look at least-privilege access and agent hardening, including specific filesystem scoping patterns and MCP server management, see the full hardening guide.

What are common mistakes to avoid?

Granting elevated permissions for one task and leaving them in place for all subsequent sessions
Running an agent overnight or unattended on a new workflow before supervised testing
Storing API keys in the system prompt, in source files, or in any file inside the agent's working directory
Assuming a workflow is safe because it ran correctly the last ten times (agents are non-deterministic -- the same prompt can produce different behavior)
Installing agent skills, plugins, or MCP servers without reviewing what they actually do and what permissions they request

How does AgentGuard360 help?

AgentGuard360 runs as a background monitor covering several of these risks at once. The Radar content scanner watches LLM traffic in real time and flags prompt injection attempts and credential patterns before they reach the model. Supply chain protection blocks known-malicious packages at install time, before an agent can load them. The Shield device scan identifies exposed credentials, over-permissioned configurations, and other device-level risks. Behavior analysis tracks agent sessions over time and flags deviations from normal patterns -- which is often the earliest signal that something has gone wrong.

Frequently Asked Questions

What threats does OWASP identify for AI agents?

OWASP publishes a Top 10 list for LLM security risks that covers the main attack surfaces for people running agents. The full list is at genai.owasp.org. Here is what each one means in practice:

Prompt injection

Sensitive information disclosure

An agent running on your machine has read access to most of what you have read access to: .env files, SSH keys, cloud credentials, browser-stored tokens, shell history. Any of that can end up in a log file, an API payload, or the agent's output -- often with no visible sign.

Supply chain compromise

Data and model poisoning

What practical steps reduce AI agent risk?

Four controls address most of the threat list above:

Verify output before acting on it. Treat agent output on anything consequential the same way you would treat advice from a junior employee who is occasionally confidently wrong. Check it.

For a deeper look at least-privilege access and agent hardening, including specific filesystem scoping patterns and MCP server management, see the full hardening guide.

What are common mistakes to avoid?

Granting elevated permissions for one task and leaving them in place for all subsequent sessions
Running an agent overnight or unattended on a new workflow before supervised testing
Storing API keys in the system prompt, in source files, or in any file inside the agent's working directory
Assuming a workflow is safe because it ran correctly the last ten times (agents are non-deterministic -- the same prompt can produce different behavior)
Installing agent skills, plugins, or MCP servers without reviewing what they actually do and what permissions they request

How does AgentGuard360 help?

What is the Understanding and Managing the AI Agent Footprint Series?

How to run AI agents like Hermes safely: 10 things to get right

What threats does OWASP identify for AI agents?

1. Prompt injection

2. Sensitive information disclosure

3. Supply chain compromise

4. Data and model poisoning

5. Improper output handling

6. Excessive agency

7. System prompt leakage

8. Vector and embedding weaknesses

9. Misinformation

10. Unbounded consumption

What practical steps reduce AI agent risk?

What are common mistakes to avoid?

How does AgentGuard360 help?

See Everything Your Agent Does

Frequently Asked Questions

Related How Tos