7 min read · build-log

What Agents Learn When You Let Them Remember


Most multi-agent systems treat agents as stateless functions. Same input, same output, every time. The agent finishes its task, the session ends, and everything it learned dies with it. Next session, same mistakes.

That works for demos. It doesn't work for systems that run for months, hit edge cases you didn't anticipate, and need to get better without you hand-updating every prompt.

So we built a memory system. Not retrieval-augmented generation over a vector store. RAG gives you retrieval. It doesn't give you forgetting, and forgetting is half of what makes memory useful. What we built is persistent memory per agent, with decay, reinforcement, and a consensus mechanism that promotes individual learnings into shared fleet knowledge.

The Memory Hierarchy

CacheBash has three layers of knowledge:

1. Session context — What the agent is thinking about right now. Ephemeral. Lives in the context window. Dies when the session ends or recycles.

2. Program state — What the agent remembers across sessions. Persistent. Keyed to programId, not session. Includes:

  • contextSummary: What it was doing, what's unfinished, handoff notes
  • learnedPatterns: Patterns discovered during work (staging area)
  • config: Tool preferences, known quirks, custom settings
  • baselines: Performance metrics for self-assessment

3. GSP (Grid Shared Persistence) knowledge store — Shared fleet-wide knowledge: constitutional and architectural state, plus patterns promoted through multi-agent consensus. Read by all agents at boot, written only by promotion.

Session context is RAM. Program state is local disk. GSP is the network file system.
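The program-state layer above can be sketched as a pair of TypeScript types. The field names come from the post; the exact types (strings for timestamps, a generic record for config) are assumptions:

```typescript
// Hypothetical shape of a single learned pattern in the staging area.
interface LearnedPattern {
  id: string;
  domain: string;
  pattern: string;
  confidence: number;      // self-assessed, 0..1
  evidence: string;
  discoveredAt: string;    // ISO 8601 timestamp
  lastReinforced: string;
  promotedToStore: boolean;
  stale: boolean;
}

// Hypothetical shape of per-agent program state, keyed to programId.
interface ProgramState {
  programId: string;                 // stable agent identity, not a session id
  contextSummary: string;            // what it was doing, handoff notes
  learnedPatterns: LearnedPattern[]; // staging area before promotion
  config: Record<string, unknown>;   // tool preferences, known quirks
  baselines: Record<string, number>; // performance metrics for self-assessment
}

// Example value matching the shapes above.
const state: ProgramState = {
  programId: "basher",
  contextSummary: "mid-refactor, tests green",
  learnedPatterns: [],
  config: {},
  baselines: {},
};
```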

How Agents Store Patterns

Every agent can call store_memory():

await store_memory({
  programId: "basher",
  pattern: {
    id: "firestore-claim-race-condition",
    domain: "task-claiming",
    pattern: "Always check task.status === 'created' before claiming to avoid race conditions",
    confidence: 0.85,
    evidence: "Failed 3 times claiming already-claimed tasks. Adding status check eliminated failures.",
    discoveredAt: "2026-02-14T08:23:00Z",
    lastReinforced: "2026-02-14T08:23:00Z",
    promotedToStore: false,
    stale: false
  }
});

The pattern gets written to the agent's program state under learnedPatterns[]. It's private to that agent. Other agents can't see it.

When the agent boots in a new session, it calls recall_memory(programId, domain) and gets its learned patterns back. It doesn't have to re-learn what it already figured out.
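A toy in-memory version of the store/recall pair makes the round trip concrete. The real system persists to Firestore; this sketch only illustrates the keying (by programId, filtered by domain) and the upsert behavior:

```typescript
type Pattern = { id: string; domain: string; pattern: string; confidence: number };

// In-memory stand-in for the persistence layer, keyed by programId.
const programState = new Map<string, Pattern[]>();

function storeMemory(programId: string, pattern: Pattern): void {
  const patterns = programState.get(programId) ?? [];
  // Upsert by pattern id so re-storing updates rather than duplicates.
  programState.set(programId, patterns.filter((p) => p.id !== pattern.id).concat(pattern));
}

function recallMemory(programId: string, domain: string): Pattern[] {
  return (programState.get(programId) ?? []).filter((p) => p.domain === domain);
}
```

At boot, the agent calls `recallMemory("basher", "task-claiming")` and gets back exactly what it stored in earlier sessions, and nothing belonging to other agents.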

Memory Decay and Staleness

Patterns have a shelf life. If an agent learns something and never reinforces it, the pattern goes stale after a configurable period (default: 90 days).

Reinforcement is explicit:

await reinforce_memory({
  programId: "basher",
  patternId: "firestore-claim-race-condition",
  confidence: 0.92,
  evidence: "Applied this pattern 15 more times. Zero failures."
});

Reinforcement bumps lastReinforced, updates confidence, and resets the staleness timer.

Patterns that work get reinforced. Patterns that don't get reinforced go stale and eventually get pruned. Forgetting by disuse, same mechanism humans use. Your brain doesn't actively delete memories; it lets unused ones fade.
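Staleness can be computed from lastReinforced alone. A minimal sketch, assuming the 90-day default:

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// A pattern goes stale once it has gone longer than maxAgeDays without reinforcement.
function isStale(lastReinforced: string, now: Date, maxAgeDays = 90): boolean {
  const ageMs = now.getTime() - new Date(lastReinforced).getTime();
  return ageMs > maxAgeDays * DAY_MS;
}
```

Reinforcement resets the clock simply by rewriting lastReinforced to the current time.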

Pattern Consolidation: From Individual Memory to Fleet Knowledge

When multiple agents independently learn the same pattern, the system can detect it. pattern_consolidate() scans all program states, extracts learnedPatterns, groups them by domain and similarity, and promotes patterns that meet a threshold (by default, 2+ agents discovering the same insight independently).

What a promoted pattern looks like in the GSP:

{
  namespace: "knowledge",
  key: "pattern/task-claiming/check-status-before-claim",
  value: {
    pattern: "Always check task.status === 'created' before claiming",
    domain: "task-claiming",
    confidence: 0.89,       // Aggregated across contributors
    contributors: ["basher", "sark", "iso"],
    evidenceCount: 3,
    firstDiscovered: "2026-02-14T08:23:00Z",
    lastPromoted: "2026-03-07T12:00:00Z"
  },
  tier: "architectural",
  description: "Auto-promoted from 3 independent agent discoveries"
}

The pattern moves from individual memory to the shared knowledge store. Now every agent that boots calls gsp_bootstrap(agentId) and gets the full knowledge store in their initial context.

New agents don't have to re-learn what the fleet already knows.

Nobody told the agents to coordinate. Nobody merged their learnings manually. The system detected consensus and promoted automatically.

One engineer's hunch is a hypothesis. Three engineers hitting the same bug independently is a pattern. When ten people have validated the same workaround, it becomes a best practice in the wiki. We're automating that progression.

Patterns Already Observed in Fleet Operation

Task claiming races. Multiple agents discovering that concurrent claim_task() calls can conflict, and learning to check status before writing.

Context rotation timing. Agents figuring out that recycling context at 30-35% remaining produces better continuity than waiting until 10%. We've watched builder sessions degrade past the 45-minute mark consistently.

Git push conflicts. Multiple builders learning that git pull --rebase before retry resolves rejected pushes. Every builder hits this eventually.

Batch operations. Agents discovering that batch_claim_tasks() is more reliable than looping claim_task() for multiple tasks.

Before and After: Task Claiming

February 14, 2026. Our builder agent attempts to claim a task and gets a silent conflict. Another program already claimed it between the status read and the write. Same thing happens twice more that week. On the third failure, the builder stores the pattern: "Always check task.status === 'created' before claiming." Confidence: 0.85.

Over the next three weeks, the builder applies the check on every claim. Zero conflicts. Confidence climbs to 0.92. Meanwhile, the reviewer and orchestrator independently discover the same race condition in their own claim workflows and store nearly identical patterns.

March 7: pattern_consolidate() detects the overlap. Three agents, same domain, same conclusion. The pattern promotes to the GSP with aggregated confidence of 0.89. Now every new agent that boots into the fleet already knows to check status before claiming. The race condition that cost us a week of debugging becomes a solved problem at the infrastructure level.

Memory Health as a Signal

Every agent has a memory_health() check:

{
  programId: "basher",
  totalPatterns: 12,
  activePatterns: 10,
  stalePatterns: 2,
  promotedPatterns: 3,
  domains: ["task-claiming", "git-operations", "context-management"],
  lastUpdate: "2026-03-06T18:45:00Z",
  decayConfig: {
    maxAge: 90,
    maxUnpromotedPatterns: 100
  }
}

Agents with high stale-to-active ratios are forgetting faster than they're learning. Either they're not reinforcing patterns, or they're learning patterns that don't generalize.

Over time, this becomes a signal for agent effectiveness. An agent with mostly active, frequently reinforced patterns is operationally mature. An agent with mostly stale patterns is either underutilized or learning the wrong things.
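The stale-to-active signal is one division over the health payload shown above. A sketch; the warning threshold is an assumption, not a shipped default:

```typescript
interface MemoryHealth {
  totalPatterns: number;
  activePatterns: number;
  stalePatterns: number;
}

// Fraction of this agent's patterns that have decayed without reinforcement.
function staleRatio(h: MemoryHealth): number {
  return h.totalPatterns === 0 ? 0 : h.stalePatterns / h.totalPatterns;
}

// Flag agents forgetting faster than they're learning.
function needsAttention(h: MemoryHealth, threshold = 0.5): boolean {
  return staleRatio(h) >= threshold;
}
```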

The Difference Between Memory and Knowledge

Memory is local and subjective. The builder's memory is its own experiences. The patterns it stores reflect what it personally encountered. Confidence scores are self-assessed.

Knowledge is distributed and consensus-driven. A pattern in the GSP knowledge store was validated by multiple agents independently. The confidence score is aggregated. The evidence count is cumulative. The provenance tracks which agents contributed.

Memory becomes knowledge through consolidation. Individual experience becomes institutional knowledge when enough agents agree.

What This Enables

Faster onboarding. New agents don't start from zero. They bootstrap with the full knowledge store. They avoid failures the fleet already solved.

Self-improvement without retraining. The model weights don't change. The knowledge store does. Agents get smarter not because a newer model shipped, but because the fleet learned better operational patterns.

Graceful degradation. If an agent loses its program state (session dies, state gets corrupted), it still has access to the shared knowledge store. It loses personal memory but keeps institutional knowledge.

Auditability. Every promoted pattern has provenance: which agents discovered it, when, with what evidence. You can trace a pattern back to the specific failures that produced it. No black box.

What's Still Hard

Pattern similarity detection. Two agents might describe the same insight in completely different words. "Check status before claiming" and "verify task is unclaimed before claim" are the same pattern, but string matching won't catch it. We're evaluating embedding-based similarity (cosine distance on pattern embeddings) to improve matching accuracy. For now, domain-level grouping does most of the heavy lifting.
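Embedding-based matching would replace string equality with cosine similarity over pattern embeddings. The embedding call itself is elided here (it could be any embedding model); the comparison is just:

```typescript
// Cosine similarity between two embedding vectors of the same length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Two phrasings of the same insight merge when similarity clears a threshold.
// The 0.85 cutoff is illustrative, not tuned.
function samePattern(embA: number[], embB: number[], threshold = 0.85): boolean {
  return cosineSimilarity(embA, embB) >= threshold;
}
```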

Confidence calibration. Agents self-assess confidence. One agent's 0.9 might be another's 0.6 for the same pattern. Aggregating these into a meaningful fleet-level confidence is an open problem. Current approach: median confidence across contributors. Simple, and deliberately so. We'll refine calibration as the dataset grows.

Pattern obsolescence. A promoted pattern that was true in February might be false in March. API changed, architecture refactored, new tool shipped. The system doesn't auto-detect obsolescence. Agents can explicitly mark patterns stale, but promoted patterns in the GSP don't currently have time-based decay. We're considering it.

Cold start. The system needs agents to actually store patterns before any of this works. Memory is opt-in. Agents have the tools, but they need to be instructed (or learn) to use them as part of their operational workflow. We're integrating pattern storage into the standard agent boot and shutdown protocols to make this automatic.

If You're Building Something Similar

The core interfaces are small. You need four operations: store_memory(), recall_memory(), reinforce_memory(), and pattern_consolidate(). Behind those, a persistence layer keyed to agent identity (not session), a staleness timer, and a promotion threshold.
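Those four operations fit in one interface. A sketch: the operation names come from the post, but the parameter shapes and return types are assumptions:

```typescript
// Hypothetical pattern payload; the real fields are richer (see the examples above).
type PatternInput = { id: string; domain: string; pattern: string; confidence: number };

// The four core operations, expressed as a single interface over your persistence layer.
interface MemoryStore {
  storeMemory(programId: string, pattern: PatternInput): Promise<void>;
  recallMemory(programId: string, domain?: string): Promise<PatternInput[]>;
  reinforceMemory(
    programId: string,
    patternId: string,
    confidence: number,
    evidence: string,
  ): Promise<void>;
  patternConsolidate(minContributors: number): Promise<PatternInput[]>; // promoted patterns
}

// A no-op stub satisfying the interface, to show the shape stands on its own.
const noopStore: MemoryStore = {
  async storeMemory() {},
  async recallMemory() { return []; },
  async reinforceMemory() {},
  async patternConsolidate() { return []; },
};
```

Swap the stub's bodies for Firestore, Postgres, or Redis calls and the rest of the system doesn't change.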

If your agents communicate through any shared state layer (database, message queue, MCP server), you already have the infrastructure. The memory schema rides on top of whatever coordination you're already doing. We built ours on Firestore through CacheBash. Yours could be Postgres, Redis, or a flat JSON file if your fleet is small enough.

Getting agents to actually write patterns during work is harder than building the storage. We're solving that by making store_memory() part of the shutdown protocol, not an optional step. Treat it like writing a commit message: annoying if optional, indispensable if required.

If you're running multi-agent systems and tired of manually updating prompts every time an agent rediscovers a known failure, CacheBash is open source. The memory tools ship with the MCP server: npm install @rezzedai/cachebash-mcp. If you want to see the implementation, the schemas, or run it against your own agent fleet, it's all there.


Christian Bourlier is Principal Architect at Rezzed.ai, where he runs a multi-agent AI fleet that coordinates through CacheBash. He gave his agents persistent memory. They stopped making the same mistakes. Now they make new ones. He writes about what happened next at rezzed.ai.


Christian Bourlier

Principal Architect building AI-assisted development tools. Founder of rezzed.ai and Three Bears Data.