
Every Token Is Rent: Building an Agent That Doesn't Forget


The Blank Slate Problem

Here's the dirty secret of modern AI agents: they're savants with amnesia.

A top-tier LLM can reason through complex problems, generate elegant code, and hold context across 200K tokens. But close the tab, or let the heartbeat timer expire, and it resets to factory default. The entire episode vanishes.

For chatbots answering support tickets? Fine.

For an agent running projects, managing infrastructure, negotiating with other agents, evolving its own behavior? Catastrophic.

We spent weeks building capabilities — diagram generators, email pipelines, research workflows, security scanners. Every new session, the agent had to re-learn everything: who we are, what projects we're running, which files matter, what decisions we made yesterday.

The work got done. But it never accumulated.

Every week you delay building memory architecture, your agent is learning the same lessons for the first time — and forgetting them before the week is out. So we stopped blaming the technology and built our own memory system.


The Three-Tier Architecture That Fixed It

After reading widely on agent memory systems and failing at two naive approaches (flat files, then JSON blobs), we settled on a three-tier architecture inspired by how human memory works.

L1 — Working Memory (The Context Window)

This is what the LLM sees right now. The current conversation. The messages. The files loaded into the prompt.

Rule: Every token is rent on the agent's attention. Only the most relevant context survives.

In practice: Each session loads a curated context — current objective, relevant knowledge nodes, recent decisions. Everything else waits in the wings. We stopped trying to cram everything in and started being ruthless about what deserved space.
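A minimal sketch of that kind of curation, assuming each candidate already carries a relevance score. The `ContextItem` type, the `relevance` field, and the 4-characters-per-token estimate are all hypothetical stand-ins, not part of our actual loader:

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    relevance: float  # hypothetical 0-1 score; how it is computed is out of scope


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


def curate_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the most relevant items that still fit in the token budget."""
    chosen = []
    remaining = budget
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        cost = estimate_tokens(item.text)
        if cost <= remaining:
            chosen.append(item)
            remaining -= cost
    return chosen
```

Greedy packing is not optimal, but it is predictable: the highest-relevance items always get first claim on the budget, and anything that does not fit simply stays out of the prompt.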

L2 — RAM/Working State (The Whiteboard)

This is the scratchpad — where the agent writes down what it's doing right now. A JSON file tracking:

  • Current objective and step
  • Session metrics (tokens used, cost)
  • Files currently being worked on
  • Staging files waiting for review

Think of it as a whiteboard: messy, current, constantly updated.

Rule: If it matters, write it down immediately. Mental notes don't survive session restarts.

In practice: After every major action, the agent calls an L2 state manager that updates this JSON. If the session crashes — which happens — the next session picks up exactly where we left off. No duplicate work. No lost context.
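A state manager along these lines can be very small. The sketch below is an assumption about its shape, not our production script: `update_state` and the field names in the usage note are illustrative. The one design choice worth copying is the atomic write, so a crash mid-update cannot leave a half-written file for the next session to trip over.

```python
import json
import os
import tempfile
from pathlib import Path


def update_state(path: Path, **changes) -> dict:
    """Merge changes into the JSON state file and write it back atomically."""
    state = json.loads(path.read_text()) if path.exists() else {}
    state.update(changes)
    # Write to a temp file in the same directory, then swap it into place,
    # so a crash mid-write cannot leave a truncated state file behind.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)
    return state
```

A call such as `update_state(Path("l2_state.json"), objective="summarize emails", step=2)` would be made after each major action; the next session reads the same file to find out where it left off.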

L3 — Knowledge Graph (The Library)

This is the curated archive. What survives from L2 after we've decided it matters. Published books, not whiteboard scribbles.

Structure: Individual markdown files organized by domain:

  • Project registries (what we're building)
  • Design decisions (why we chose this way)
  • Lessons learned (what broke and how we fixed it)
  • Methodologies (reusable protocols, captured as process)
  • Research notes
  • Cost ledgers

Rule: Idempotent writes. The knowledge builder script can run any number of times — it creates or overwrites nodes, never duplicates.
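One way to get that property is to derive the filename deterministically from the node's title, so the same node always lands on the same path. This is a sketch under that assumption; `slugify`, `write_node`, and the directory layout are hypothetical, not the actual knowledge builder:

```python
from pathlib import Path


def slugify(title: str) -> str:
    # Deterministic filename: the same title always maps to the same node.
    cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
    return "-".join(cleaned.split())


def write_node(root: Path, domain: str, title: str, body: str) -> Path:
    """Create or overwrite one knowledge node; never creates a second copy."""
    node = root / domain / f"{slugify(title)}.md"
    node.parent.mkdir(parents=True, exist_ok=True)
    node.write_text(f"# {title}\n\n{body.strip()}\n")
    return node
```

Running `write_node` twice with the same arguments leaves exactly one file with identical content, which is what makes it safe to rerun the builder on a schedule.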

A Concrete Example

Last week, I asked my agent to analyze 30 emails from a negotiation experiment and summarize the counterparty's behavioral patterns.

Without the memory system, it would have read all 30 emails in one session, produced a summary, and forgotten what it learned the moment the session ended.

With the memory system: it stored the raw patterns in L2, extracted generalizable insights into L3 (under lessons-learned.md with a [[negotiation-tactics]] wikilink), and referenced those insights in the next session when I asked about a new email. The agent got smarter about negotiation over time — not because the model improved, but because it remembered.


What Most People Get Wrong

The obvious approach is to save everything. Log every message, store every state, dump it all into a vector database and let similarity search figure it out.

That's a storage problem masquerading as a memory solution.

Real memory is selective forgetting. The human brain doesn't remember everything — it retains what's important, compresses patterns into schemas, and lets the rest decay.

We needed the same mechanism.


The Retention Formula

I built a scoring system that answers one question: does this file deserve to exist tomorrow?

Retention Score = 0.30 × Recency + 0.30 × Importance + 0.30 × Task_Activity − 0.10 × Size_Ratio

Where:

  • Recency: how many hours since the file was last accessed (more recent scores higher)
  • Importance: how many cross-references point to it
  • Task Activity: whether the project it belongs to is currently active
  • Size Ratio: a penalty for bloated files
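The Importance input depends on counting cross-references. As a rough sketch, assuming notes are held as a name-to-text mapping and links use the `[[target]]` or `[[target|alias]]` wikilink form, a backlink count could look like this (`count_backlinks` is an illustrative name, and how the raw count gets normalized is left open):

```python
import re
from collections import Counter

# Capture the target of [[target]], [[target|alias]] or [[target#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def count_backlinks(notes: dict[str, str]) -> Counter:
    """Map each link target to the number of distinct notes that point to it."""
    counts = Counter()
    for name, text in notes.items():
        targets = {m.strip() for m in WIKILINK.findall(text)}
        targets.discard(name)  # a note linking to itself is not a cross-reference
        counts.update(targets)
    return counts
```

Counting distinct source notes rather than raw link occurrences keeps one chatty file from inflating a node's importance.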

In practice: a weekly project status report that nobody has touched in 14 days and that has no cross-references scores roughly 0.10. It gets staged for 48 hours, then archived. A design decision document referenced every session, with three wikilinks pointing to it? Score of 0.80+. It stays.

Net of the size penalty, the weights sum to 0.80 (0.90 across the three positive terms, minus 0.10). The remaining 0.20 is a baseline retention floor, which ensures nothing gets evicted prematurely just because it's new or lightly referenced.
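Written as code, the formula is a few lines. This sketch assumes every component has already been normalized to the 0-1 range, which the text does not spell out, and it implements only the four weighted terms shown in the formula line; the baseline floor is not modeled because the formula has no constant term.

```python
def retention_score(recency: float, importance: float,
                    task_activity: float, size_ratio: float) -> float:
    """Weighted retention score; all inputs are assumed to lie in [0, 1]."""
    for value in (recency, importance, task_activity, size_ratio):
        if not 0.0 <= value <= 1.0:
            raise ValueError("components are assumed to be normalized to [0, 1]")
    return (0.30 * recency + 0.30 * importance
            + 0.30 * task_activity - 0.10 * size_ratio)
```

Under that normalization the score ranges from -0.10 (stale, unreferenced, inactive, and bloated) to 0.90 (fresh, heavily referenced, active, and small).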

Files below threshold move to a 48-hour staging area. If nobody misses them, they're archived. Untouched for 90 days? They enter a review queue for deletion.

This runs every 6 hours. Cron job. Automated.
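The lifecycle each run applies can be sketched as a single decision function. The 48-hour and 90-day windows come from the text; the `THRESHOLD` value is hypothetical, since the cutoff is not stated here, and the ordering of the checks is one reasonable reading rather than the collector's actual logic.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    KEEP = "keep"
    STAGE = "stage"
    ARCHIVE = "archive"
    REVIEW_FOR_DELETION = "review_for_deletion"


STAGING_HOURS = 48   # from the text
REVIEW_DAYS = 90     # from the text
THRESHOLD = 0.30     # hypothetical: the text does not give the cutoff


def next_action(score: float, hours_staged: Optional[float],
                days_untouched: float, threshold: float = THRESHOLD) -> Action:
    """Decide what one garbage-collector pass does with a single file."""
    if score >= threshold:
        return Action.KEEP
    if days_untouched >= REVIEW_DAYS:
        return Action.REVIEW_FOR_DELETION
    if hours_staged is None:
        return Action.STAGE          # first time below threshold
    if hours_staged >= STAGING_HOURS:
        return Action.ARCHIVE        # nobody rescued it during staging
    return Action.STAGE              # still inside the staging window
```

Because the function is pure, the scheduled job can log every decision before touching the filesystem, which makes a bad threshold easy to spot and undo.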


Making It Work — Automation and Governance

Memory architecture is useless without automation. You can't tell an amnesiac agent to "remember better" and expect results.

Three scripts form the backbone:

  1. L2 State Manager: updates working state after every task. Called with the current objective, step number, token usage, and cost. Writes to a JSON file that survives session restarts.
  2. L3 Knowledge Builder: reads daily notes and extracts entities, decisions, and lessons learned. Writes them as structured knowledge nodes with [[wikilinks]] for cross-referencing. Idempotent, so it is safe to run repeatedly and never duplicates.
  3. Garbage Collector: the eviction engine. Runs the retention formula, stages borderline files for 48 hours, archives the dead. Logs everything it does.

The result? An agent that wakes up, checks its state file, and says: "I was working on X. Here's what we learned about Y. Let's continue."

But automation alone isn't enough. Here's the part that doesn't make it into the blog posts:

Memory maintenance is emotional. You will have opinions about what should stay — and nostalgia is not a retention strategy.

That improvement log that proves you fixed the same bug three times? Keep it. That experimental architecture you abandoned two weeks ago? Archive it.

I added a recurring maintenance checklist (HEARTBEAT.md) that the agent reads every few cycles:

  • Review recent daily notes
  • Extract new lessons into L3 knowledge nodes
  • Run the garbage collector
  • Check: is L2 state still accurate?

Memory isn't a file. It's a practice.


Why This Matters

Every team building AI agents right now is building capability. Smarter reasoning. Better tool use. More complex chain-of-thought.

Almost none are building continuity.

You can have the most capable agent in the world. If it starts every conversation from zero, it's a party trick, not a collaborator.

The teams that solve memory, the ones that figure out how to make their agents accumulate wisdom, will leave everyone else in the dust. With memory, capability compounds: every session builds on the last. The agent gets smarter not because the model improved, but because it remembers what worked.

Savants with amnesia don't lead revolutions. But agents that remember? They're the ones to watch.


Let's Talk

We're three weeks into this architecture and it's already transformed what our agent can do. The issues that used to take three sessions now resolve in one. The designs that used to get lost are evolving across weeks.

I do this for a living. I'm running a small experiment on agent memory architectures and looking for teams to share findings with. If your vector database is 400GB and your agent still can't remember your name — drop a comment, send a DM. I'd love to compare notes.


Building infrastructure for autonomous agents. Specializing in agent memory and multi-agent protocols.

Working on a similar problem? Let's talk about how I can help your team.
