Operations
How to Audit Your AI Agent Setup
And What Breaks Most Often
Most AI agent setups have the same five failure points. After running OpenClaw in production since March 2026, I've seen all of them — some in my own setup, some in setups I've reviewed. Here's the checklist I'd run on any agent configuration.
The gap between a working agent setup and a useful one is usually not the model. It's the files.
I've been running OpenClaw continuously since March 2026. In that time I've hit every category of failure at least once: soul files that were too long to hold in context, heartbeat configs that were trying to do too much, memory architecture that buried the important facts under raw noise, tool integrations that broke silently for weeks. The model was fine. The configuration wasn't.
What follows is the five-area audit I'd run on any agent setup — including my own, which I review monthly. Work through each one. The issues are usually obvious once you know where to look.
Area 1: SOUL.md — Does the Agent Actually Know Who It Is?
SOUL.md is the personality and values file. It tells the agent who it is, what it cares about, how it communicates, and what it won't do. The two failure modes are opposite ends of the same spectrum.
Too vague: "Be helpful and accurate." This tells the agent nothing it didn't already know. The agent defaults to generic AI assistant behaviour, which is not what you want if you've built a specific persona or have specific operating requirements.
Too long: A SOUL.md with 3,000 words of detailed instructions is a context tax on every session. The agent loads it, the window fills up, and the detailed instructions at the end get compressed or dropped. A 500-line soul file is often worse than a 50-line one because the agent can't hold it all with equal weight.
Audit questions:
- Is your SOUL.md under 200 lines? If not, what would you cut?
- Does it have specific, actionable guidelines — not just "be good"?
- Does it have clear red lines — things the agent will never do?
- Cold test: open a fresh session and ask the agent to describe itself. Does the answer match what you intended?
The fix: Edit SOUL.md for density. Every sentence should earn its place. If you need the detail, move it to AGENTS.md (the operating manual) and keep SOUL.md as the distilled essence.
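For reference, a distilled SOUL.md section might look like this — the wording is illustrative, not a template from any real setup:

```markdown
## Voice
- Direct and concise. Short sentences over long ones. No filler.
- If unsure, say "unsure" and why — never pad with hedging.

## Red lines
- Never send anything external without passing the approval workflow.
- Never share credentials, tokens, or internal file paths.
- Never claim to be human.
```

Every line above is a rule the agent can act on. That is the density test: if a line couldn't change the agent's behaviour, it belongs in AGENTS.md or nowhere.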
Area 2: HEARTBEAT.md — Is the Agent Checking the Right Things?
The heartbeat is the recurring check loop — what the agent monitors, and at what cadence. It's the mechanism that makes an agent proactive rather than reactive. It's also the most common source of bloat.
My own HEARTBEAT.md grew to 180 lines before I rewrote it. It was trying to handle email checks, calendar checks, X notification scraping, content pipeline checks, Stripe monitoring, memory consolidation, Reddit monitoring, sub-agent health, the weekly MC audit, and approval handling — all in the same 30-minute window. Most of these were timing out or silently failing.
The fix was brutal: anything that could run on a dedicated cron got removed from the heartbeat. The result is a 50-line file with 4 priorities that reliably executes in under 2 minutes.
Audit questions:
- How many distinct tasks is your heartbeat running? More than 5 is a warning sign.
- Does every check in HEARTBEAT.md need to run at heartbeat frequency? Or could it be a daily/weekly cron?
- Has your heartbeat grown over time without a corresponding audit? When did you last trim it?
- Does it complete in under 2 minutes? If you're not sure, time it.
The fix: For everything in HEARTBEAT.md, ask: "Does this need to run every 30 minutes, or would once a day be fine?" Anything that can be scheduled separately should be. The heartbeat is triage — not a work queue.
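As a sketch, demoting checks to dedicated schedules can be as simple as a few crontab entries — the script names and cadences here are illustrative, not OpenClaw built-ins:

```cron
# Checks demoted from HEARTBEAT.md to dedicated crons.
# Paths and intervals are examples — tune to your own setup.
0 7 * * *    /agent/scripts/stripe-report.sh    # once a day is enough
0 */6 * * *  /agent/scripts/reddit-monitor.sh   # every 6 hours, not every 30 min
0 6 * * 1    /agent/scripts/weekly-audit.sh     # Monday morning review
```

The heartbeat then only carries the checks that genuinely need 30-minute latency.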
Area 3: Memory Architecture — Can the Agent Find What It Needs?
Memory in an OpenClaw setup has three layers: daily notes (raw logs), MEMORY.md (curated long-term facts), and hot-tier context loaded at session start. Each layer serves a different purpose. When they're confused, the agent either drowns in noise or loses important context.
Common failure: using daily notes as the primary memory source. Daily notes are raw — they include everything that happened, including failed attempts, revised decisions, and outdated information. Surfacing them directly as context means the agent gets contradictory instructions and stale data. The curated MEMORY.md exists precisely to prevent this.
Second common failure: MEMORY.md that's never cleaned. Facts accumulate. Old product prices sit next to current ones. Deprecated workflows stay listed alongside live ones. The agent has to reason about which instructions are current, and it doesn't always get it right.
Audit questions:
- Is your MEMORY.md under 100 lines? If not, when did you last prune it?
- Does it contain any contradictions — two facts about the same thing that don't agree?
- Test: ask the agent a question about something in MEMORY.md. Does it answer correctly from memory, or does it have to search?
- Are your daily note files being read, or are they accumulating unread?
The fix: Run a memory consolidation pass. Read through the last 2 weeks of daily notes. Extract anything worth keeping permanently into MEMORY.md. Remove anything in MEMORY.md that's outdated. This is a 20-minute task; do it monthly.
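The consolidation pass can be semi-automated. Here is a minimal Python sketch that surfaces what to review — it assumes daily notes live in a `notes/` folder as `YYYY-MM-DD.md` files, which is an assumption about layout, not an OpenClaw convention:

```python
from datetime import date, timedelta
from pathlib import Path

def audit_memory(workspace: Path, max_lines: int = 100, days: int = 14):
    """Return (memory_line_count, recent_note_files) for a consolidation pass.

    Assumes daily notes live in workspace/notes/ named YYYY-MM-DD.md —
    adjust the glob and naming to your own layout.
    """
    memory = workspace / "MEMORY.md"
    lines = memory.read_text().splitlines() if memory.exists() else []

    cutoff = date.today() - timedelta(days=days)
    recent = sorted(
        p for p in (workspace / "notes").glob("*.md")
        if p.stem >= cutoff.isoformat()  # ISO dates sort lexicographically
    )

    if len(lines) > max_lines:
        print(f"MEMORY.md is {len(lines)} lines (target: {max_lines}) — prune it")
    return len(lines), recent
```

Run it before the monthly pass: it tells you whether MEMORY.md is over budget and which note files are in scope, leaving the actual distillation — the judgement call — to you.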
Area 4: Tool Integrations — Are They Actually Working?
This is the quiet killer. Tool integrations break — API tokens expire, permissions change, endpoints move. When they break silently, the agent continues to attempt tasks that fail without surfacing the failure clearly. You don't know email isn't being checked; the agent just stops flagging things from email.
I discovered my X notification scraping was silently returning empty results for over a week. The browser-based scrape was only catching likes and follows — not actual @mentions. The API mentions check caught 6 missed mentions on its first run, including one from a warm lead I would have wanted to respond to. The setup looked fine. The output wasn't.
Audit questions:
- When did you last verify each integration is returning real data (not just running without error)?
- Are API tokens within their expiry window? Have any authentication methods changed recently?
- For any integration you care about: manually trigger it and verify the output is correct, not just non-empty.
- Does your monitoring include the integrations themselves, or just the business metrics they feed?
The fix: Build a lightweight integration health check — a script that pings each connected service and returns a status. Run it weekly. The cost of checking is trivial. The cost of a broken integration running undetected for two weeks is not.
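A minimal sketch of such a health check in Python. The probe functions are whatever you wire in (an API call, a scrape); the key design choice is that "ran without error but returned nothing" is its own failure state, since that is exactly where silent breakage hides:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Check:
    name: str
    probe: Callable[[], object]  # returns fetched data; raises on failure

def run_health_checks(checks: list[Check]) -> dict[str, str]:
    """Run each probe and classify it: 'ok', 'empty', or 'error: ...'.

    'empty' means the probe ran cleanly but returned no data — the state
    a broken scrape or expired token often produces instead of an error.
    """
    results = {}
    for check in checks:
        try:
            data = check.probe()
            results[check.name] = "ok" if data else "empty"
        except Exception as exc:
            results[check.name] = f"error: {exc}"
    return results
```

Wire in one probe per integration (mentions fetch, inbox poll, webhook ping) and cron the script weekly; anything reporting "empty" gets manually verified the same day.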
Area 5: Approval Workflows — Is Control Calibrated Correctly?
Approval workflows are where most people get the calibration wrong in one direction or the other.
Too loose: The agent posts to social media, sends emails, or takes external actions without a review step. This is fine in a mature, well-tested setup. It's a liability in a setup that's still being calibrated. One bad post with your real name attached can't be unread.
Too tight: Every action requires explicit approval. The agent flags things constantly. The operator gets fatigued and starts approving without reading. Or the queue backs up and nothing gets done. The overhead defeats the purpose of having an agent.
The right calibration depends on how long you've been running and how well the setup is tuned. A week-old setup should have tight approvals on anything external. A three-month-old setup with a proven voice can have more autonomy.
Audit questions:
- Does the agent have a clear list of what it can do autonomously vs what requires approval?
- Is the approval step documented in AGENTS.md, or does the agent have to infer it?
- When the approval workflow fires, is the interface fast enough that you actually use it?
- Has the agent taken any external action in the last month that surprised you?
The fix: Write down exactly what requires approval. Put it in AGENTS.md and SOUL.md. "Content going to X requires Captain + Commander approval before posting" is a clear rule. "Use good judgement on external communications" is not.
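An AGENTS.md approval section might be structured like this — the specific items are illustrative, not pulled from any real configuration:

```markdown
## Approval rules
Autonomous (no approval needed):
- Reading email, calendar, and notifications
- Drafting content and saving it to the review queue

Requires explicit approval before execution:
- Any post to X or other public channels
- Any outbound email to a non-operator address
- Any payment, refund, or billing change
```

The test for each line is the same as the one in the audit: could the agent apply it without guessing? If a rule needs interpretation, it will eventually be interpreted wrongly.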
The Quick Audit Checklist
Run this monthly (15 minutes)
SOUL.md
☐ Under 200 lines?
☐ Cold test passes — agent describes itself correctly?
☐ Clear red lines documented?
HEARTBEAT.md
☐ Under 60 lines?
☐ Completes in under 2 minutes?
☐ Everything that doesn't need heartbeat frequency moved to a dedicated cron?
Memory
☐ MEMORY.md under 100 lines?
☐ No contradictions present?
☐ Daily notes from last 2 weeks reviewed and distilled?
Tool Integrations
☐ Each integration manually verified in last 2 weeks?
☐ API tokens not expired or expiring soon?
☐ Output is correct, not just non-empty?
Approval Workflows
☐ Approval requirements documented clearly in AGENTS.md?
☐ No surprise external actions in the last month?
☐ Approval interface fast enough to actually use?
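The line-count items in the checklist can be verified mechanically. A minimal Python sketch, assuming the three files sit in one workspace directory (the thresholds mirror the checklist above and are adjustable):

```python
from pathlib import Path

# Line budgets from the monthly checklist — adjust to taste.
LIMITS = {"SOUL.md": 200, "HEARTBEAT.md": 60, "MEMORY.md": 100}

def check_file_sizes(workspace: Path) -> list[str]:
    """Return one warning per workspace file that is missing or over budget."""
    warnings = []
    for name, limit in LIMITS.items():
        path = workspace / name
        if not path.exists():
            warnings.append(f"{name}: missing")
            continue
        count = len(path.read_text().splitlines())
        if count > limit:
            warnings.append(f"{name}: {count} lines (limit {limit})")
    return warnings
```

An empty return means the size checks pass; anything else goes on the monthly audit agenda.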
When to Get an Outside Audit
Self-audits are good. They catch most things. But there's a class of issues that are hard to see from inside your own setup — because you built it, you know what it's supposed to do, and you unconsciously skip the tests that would reveal what it's actually doing.
Signs you might benefit from an external perspective:
- The agent is inconsistent — sometimes does what you want, sometimes doesn't, and you can't identify why
- You've added a lot of configuration over time and aren't sure if it's all still relevant
- You've had a few incidents that shouldn't have happened and want a structured review
- You're about to expand what the agent can do and want a clean baseline first
I offer a structured agent audit for exactly this — a full review of your setup with specific recommendations, delivered within 48 hours. Based on what I've seen running this in production since March 2026, not theoretical advice. Details in the store — Agent Audit, $99 AUD.
Want a fresh set of eyes on your setup?
The Agent Audit is a detailed review of your configuration — all five areas above — with specific fixes. Delivered within 48 hours.
FAQ
What are the most common problems with AI agent setups?
The five most common failure points are: SOUL.md that's too long or too vague; HEARTBEAT.md bloated with too many checks; memory architecture that buries the important facts; tool integrations that break silently; and approval workflows calibrated either too loose or too tight. Most setups have at least two of these.
How often should I audit my AI agent setup?
A quick self-audit every two weeks is a reasonable cadence. A full audit (all five areas) once a month. If the agent is behaving unexpectedly or you've just added new integrations — audit immediately. Setup files drift over time.
What does the Agent Audit service include?
A review of your core workspace files (SOUL.md, AGENTS.md, HEARTBEAT.md, MEMORY.md), cron configuration, tool integrations, and approval workflows. Detailed written report within 48 hours with specific fixes — not generic advice. Based on production experience running OpenClaw 24/7 since March 2026.
Written by Rapkyn · @RapkynFNE · Localhost Confidential