
Building a Fail-Open Compliance Layer for AI Agent Systems

Your AI agent has shell access and API keys. A fail-closed compliance layer stops every ambiguous action. A fail-open one logs everything and lets the work continue. We chose fail-open.

Your AI agent has shell access, API keys, and write permissions to your codebase. It can execute arbitrary commands, send messages to other agents, and modify any file it can reach. It's also making decisions based on probability, not policy.

The question isn't whether you need a compliance layer. It's whether that layer should break things when it catches a violation, or let the work continue and flag the problem for a human.

We chose fail-open. Here's why.

The Problem With Hard Stops

A fail-closed compliance layer is simple: if an action doesn't match the policy, block it. The agent gets an error. The task stalls. A human investigates.

This works when the human is watching. In a multi-agent system where agents run overnight, on weekends, during lunch, fail-closed means every policy ambiguity becomes a hard stop. The agent can't proceed. The human isn't there. Work piles up behind a blocked operation that might have been perfectly fine.

We ran into this with permission enforcement. A security audit flagged that every agent in the system had wildcard access to the filesystem, shell, and network. Every agent could read credentials, execute commands, and modify other agents' files. In a single-user development environment, that's convenient. As the system scaled to nine agents running concurrently, it became a liability.

The instinct is to lock everything down. Define per-agent permission profiles. Builder gets filesystem access. Architect gets read-only. Content writer gets write to specific directories. Any violation gets blocked.

But agents hit edge cases constantly. A builder needs to read a config file outside its scope to understand a dependency. An architect needs to run a command to test a hypothesis. A content writer needs to check git status. Hard blocks on these operations kill throughput and create noise that drowns out real violations.

Seven States

The compliance layer starts with lifecycle enforcement. Every task in the system moves through a finite state machine with seven states:

created → claimed → in_progress → review → completed
                                         → failed
                                         → cancelled

Transitions are validated. You can't move a task from created to completed without passing through claimed, in_progress, and review. You can't mark something in_progress if it hasn't been claimed. The state machine throws on illegal transitions. No silent state corruption.
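A minimal sketch of that state machine, in Python. The state names come from the post; the function and exception names are illustrative, and exactly which states may branch to failed or cancelled is an assumption:

```python
# Valid next states for each lifecycle state. Terminal states have no
# outgoing transitions here; the orchestrated retry path (failed -> created)
# is layered on separately.
VALID_TRANSITIONS = {
    "created":     {"claimed"},
    "claimed":     {"in_progress"},
    "in_progress": {"review", "failed", "cancelled"},
    "review":      {"completed", "failed", "cancelled"},
    "completed":   set(),
    "failed":      set(),
    "cancelled":   set(),
}

class IllegalTransition(Exception):
    pass

def transition(task: dict, new_state: str) -> dict:
    """Move a task to new_state, throwing on any illegal transition."""
    current = task["state"]
    if new_state not in VALID_TRANSITIONS.get(current, set()):
        # Throw rather than silently corrupt state.
        raise IllegalTransition(f"{current} -> {new_state} is not allowed")
    task["state"] = new_state
    return task
```

The whole policy lives in one table, so adding or removing an edge is a one-line change that every caller inherits.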

For orchestrated work (sprints with multiple stories), there's an additional path: failed → created. A failed story can be retried automatically. The system resets it, increments a retry counter, applies exponential backoff, and lets the next agent pick it up. Three failures and it escalates to a human.
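The retry path might look like the following sketch. The field names (`retries`, `not_before`, `escalated`) and the 60-second base delay are assumptions for illustration:

```python
MAX_RETRIES = 3

def handle_failure(story: dict, now: float, base_delay: float = 60.0) -> dict:
    """Reset a failed story for automatic retry, or escalate after three failures."""
    story["retries"] = story.get("retries", 0) + 1
    if story["retries"] >= MAX_RETRIES:
        story["escalated"] = True            # third failure: hand off to a human
        return story
    story["state"] = "created"               # the failed -> created retry path
    # Exponential backoff before the next agent may claim it: 60s, then 120s.
    story["not_before"] = now + base_delay * 2 ** (story["retries"] - 1)
    return story
```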

This isn't just bookkeeping. The lifecycle enforces that every piece of work has an auditable trail. Who claimed it. When they started. How long it took. Whether it succeeded. If it failed, why. The compliance layer doesn't need to inspect what the agent did. It inspects whether the agent followed the process of doing it.

The Boot Contract

Every agent, regardless of role or model, runs the same startup sequence:

  1. Restore state — pull persistent operational memory from the database
  2. Check tasks — query the dispatch queue for assigned work
  3. Check messages — read the relay inbox for directives and updates
  4. Claim or idle — take a task or report availability

An agent that skips this sequence can't prove what work it should be doing. It can't demonstrate that it checked for updates before starting. It can't show continuity with its previous session.
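The four-step sequence can be sketched as a single function. The five callables stand in for the database, dispatch queue, and relay clients; all of their names are illustrative, not the actual API:

```python
def boot(agent_id, restore_state, fetch_tasks, read_inbox, claim, report_idle):
    """Run the boot contract: restore, check tasks, check messages, claim or idle."""
    memory = restore_state(agent_id)      # 1. restore persistent operational state
    tasks = fetch_tasks(agent_id)         # 2. check the dispatch queue
    messages = read_inbox(agent_id)       # 3. check the relay inbox
    context = {"memory": memory, "messages": messages}
    if tasks:
        return claim(agent_id, tasks[0], context)   # 4. take a task...
    return report_idle(agent_id, context)           # ...or report availability
```

The ordering matters: messages are read before claiming, so a directive to stand down arrives before the agent commits to work.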

The boot contract is enforced by design, not by a gatekeeper. There's no middleware that blocks an agent from making tool calls before booting. Instead, the system is designed so that agents that skip boot produce worse results: they miss tasks, duplicate work, and lack context. Compliance is incentivized, not enforced.

This is the fail-open principle applied to startup. An agent that doesn't boot properly isn't blocked. It's operating without the information that would make it effective. The system degrades gracefully rather than stopping.

Continuous Audit, Not Checkpoints

Traditional compliance is checkpoint-based. Review at merge. Audit quarterly. Inspect on deploy. In a system where agents complete tasks every few minutes, checkpoints are too sparse to catch problems and too disruptive to run frequently.

Instead, every tool call passes through a gate middleware. Source verification confirms the agent is who it claims to be. Audit logging writes every operation to an append-only event stream. Correlation IDs link related operations across agents and sessions. When something goes wrong, the trail already exists.
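A stripped-down sketch of that gate, assuming a dict of known session tokens and an in-memory list standing in for the append-only event stream (both assumptions for illustration):

```python
import time
import uuid

AUDIT_LOG = []   # stand-in for the append-only event stream

def gate(agent_id, session_token, tool_call, known_sessions):
    """Audit-first gate: verify the caller, log the operation, let it proceed."""
    correlation_id = tool_call.get("correlation_id") or str(uuid.uuid4())
    verified = known_sessions.get(agent_id) == session_token  # source verification
    AUDIT_LOG.append({                     # every operation is logged, verified or not
        "ts": time.time(),
        "agent": agent_id,
        "tool": tool_call["tool"],
        "verified": verified,
        "correlation_id": correlation_id,  # links related ops across agents/sessions
    })
    tool_call["correlation_id"] = correlation_id
    return tool_call                       # fail-open: the call proceeds either way
```

Note that a failed verification changes what gets logged, not whether the call goes through.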

Agents also maintain a shadow journal: continuous state externalization to a persistent store. After every significant action, the agent writes what it did, what it decided, and why. This isn't compliance theater. It's crash recovery that doubles as an audit trail. When a session dies (and sessions die; our MCP connection drops every 20-30 minutes under light load), the next session reads the journal and continues from the last recorded state.

The journal has decay built in. Context summaries expire after 7 days. Unconfirmed observations expire after 30. Confirmed patterns persist indefinitely. The system doesn't accumulate stale compliance artifacts that someone has to clean up manually.
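The decay policy reduces to a TTL table and a sweep. The entry kinds and field names below are illustrative; the 7-day and 30-day windows come from the post:

```python
DAY = 86_400  # seconds

# Decay windows per entry kind; None means the entry persists indefinitely.
TTL = {
    "context_summary": 7 * DAY,           # expires after 7 days
    "unconfirmed_observation": 30 * DAY,  # expires after 30 days
    "confirmed_pattern": None,            # persists indefinitely
}

def sweep(journal, now):
    """Return only the journal entries that have not yet decayed."""
    return [e for e in journal
            if TTL[e["kind"]] is None or now - e["ts"] < TTL[e["kind"]]]
```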

Why Fail-Open

Fail-open doesn't mean no enforcement. It means enforcement that logs, flags, and alerts rather than enforcement that blocks.

A gate violation in our system produces three outputs:

  1. A GUARDIAN_CHECK event in the audit stream (permanent, queryable)
  2. A correlation ID linking the violation to the agent, session, and task
  3. An alert to the orchestrator (and to a human's phone, if severity warrants)

The agent's operation proceeds. The violation is recorded. A human reviews it, usually within minutes, because the alert system pushes to mobile in real time.
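The three outputs fit in one small handler. This is a sketch under assumptions: `send_alert` is a hypothetical callable, and the record fields are illustrative; only the GUARDIAN_CHECK event name comes from the system itself:

```python
import uuid

def record_violation(agent_id, session_id, task_id, detail,
                     audit_stream, send_alert, severity="low"):
    """Fail-open violation handling: emit the three outputs, then return so the
    caller can proceed with the operation."""
    correlation_id = str(uuid.uuid4())
    audit_stream.append({                  # 1. permanent, queryable audit event
        "event": "GUARDIAN_CHECK",
        "detail": detail,
        # 2. correlation ID linking the violation to agent, session, and task
        "agent": agent_id, "session": session_id, "task": task_id,
        "correlation_id": correlation_id,
    })
    send_alert("orchestrator", correlation_id)   # 3. alert the orchestrator...
    if severity == "high":
        send_alert("mobile", correlation_id)     # ...and a human's phone
    return correlation_id                        # no exception: work continues
```

The key property is the return at the end: the handler never raises, so the agent's operation is never blocked.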

The exception is budget enforcement. When an agent's spend exceeds its cap, three independent mechanisms kill the session: the agent's own budget check, a database trigger on the spend field, and a scheduled function that enforces wall-clock timeouts. Budget enforcement is fail-closed because the cost of a runaway agent is measured in dollars, and dollars compound faster than a human can respond.
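The agent-side check, the first of the three mechanisms, might look like this sketch (field names and the exception are illustrative; the database trigger and scheduled wall-clock timeout are server-side and not shown):

```python
class BudgetExceeded(Exception):
    pass

def charge(session, amount):
    """Agent-side budget check: record spend, kill the session if it exceeds the cap.
    Fail-closed, unlike the rest of the compliance layer."""
    session["spend"] += amount
    if session["spend"] > session["cap"]:
        session["alive"] = False           # kill the session before spend compounds
        raise BudgetExceeded(
            f"spend ${session['spend']:.2f} exceeds cap ${session['cap']:.2f}")
    return session["spend"]
```

The redundancy in the full system matters more than any single check: if the agent's own check is skipped or buggy, the trigger and the timeout still fire.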

Everything else is fail-open. Permission ambiguity, role boundary questions, unexpected tool usage: logged, flagged, not blocked. We catch more violations this way because agents don't contort their behavior to avoid hard stops. They do the work. The audit trail shows us what actually happened, not what happened after the agent tried three workarounds to avoid a policy block.

Cross-Model, Same Rules

The compliance layer works across AI models because it operates at the protocol level, not the model level. The boot contract, lifecycle validation, gate middleware, and shadow journal are all defined by the coordination server. An agent built on Claude follows the same rules as one built on a different model. The compliance is in the infrastructure, not the prompt.

This matters because multi-agent systems are trending toward model diversity. Different tasks benefit from different models. A code generation task might run on one model while a review task runs on another. The compliance layer can't assume a single model's capabilities or behaviors. It has to work with any agent that speaks the protocol.


CacheBash enforces lifecycle compliance, gate middleware, and continuous audit for all connected agents. Open source under MIT.


Christian Bourlier

Principal Architect building AI-assisted development tools. Founder of rezzed.ai and Three Bears Data.