Jun 11 2026 9 min

ALMS: Shared Operational Memory for Autonomous Agents via MCP

Open Source Agents MCP Go PostgreSQL Multi-Agent Learning Infrastructure

[Image: multiple autonomous agent nodes connected to a central shared memory hub with glowing data pathways]

github.com/ghassan-ai-projects/alms — MIT license, shipping today.

Last month I lost an afternoon to a bug I'd already fixed. An ingestion agent in my fleet hit a malformed payload from a vendor API. It adapted, worked around it, and moved on. Two days later, a different agent hit the same API, the same payload shape, and failed the same way. I had to debug it again. Same root cause, same fix, zero knowledge transfer between the two agents.

That's when I stopped adding features and built ALMS.

ALMS — the Agent Learning Management System — is a self-hosted Model Context Protocol (MCP) server that gives autonomous agents a shared operational memory. Register agents, store what they learn, sync that knowledge across the fleet, and distribute operator protocols by tag. It's a single Go binary backed by PostgreSQL, and it's open source today under MIT.

Verification snapshot, 28 July 2026: the complete short test suite passes with the race detector, uncached and shuffled, across config, models, server, service, and store packages. PostgreSQL integration and load tests require a configured database and were not part of this local snapshot.

What ALMS Is (And Isn't)

ALMS is three things:

An agent registry — agents register and send heartbeats so you know what's running
A learning store — agents publish discoveries, and other agents sync them
A protocol distributor — operators push tagged instructions that agents pull on their own schedule

That's it. ALMS is not an orchestration framework, not a message bus, and not a process supervisor. It doesn't schedule work, manage processes, or route messages. It stores operational knowledge and gets out of the way.

Why not just a database?

You could store agent learnings in a shared Postgres table. What you'd be missing: MCP-native transport (your agents already speak it), gap-safe sync with acknowledgement (no missed records), tag-based protocol distribution (operators can target subsets of the fleet), and learning lifecycle management (soft-delete, enrichment, garbage collection). ALMS packages these as a purpose-built MCP tool surface so agents integrate in minutes, not days.

It stays out of the hot path. If ALMS is unreachable, your agents keep working and queue learnings locally. When it comes back, they resync. Offline-first isn't a feature — it's a design constraint.
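A minimal sketch of that offline-first behaviour on the agent side, assuming a hypothetical Publisher interface and Learning struct (neither is taken from the ALMS codebase). It buffers in memory for brevity; a real agent would probably persist the queue to disk so it survives a restart.

```go
package main

import (
	"errors"
	"fmt"
)

// Learning is a hypothetical, simplified record; the real ALMS schema may differ.
type Learning struct {
	Type string
	Tags []string
	Body string
}

// Publisher stands in for whatever sends a learning to ALMS (assumed interface).
type Publisher interface {
	Publish(l Learning) error
}

// OfflineQueue buffers learnings locally while the publisher is unreachable.
type OfflineQueue struct {
	pub     Publisher
	pending []Learning
}

// Store publishes immediately when nothing is buffered and the call succeeds.
// Otherwise it appends to the local buffer, so the agent never blocks on ALMS
// and buffered learnings keep their original order.
func (q *OfflineQueue) Store(l Learning) {
	if len(q.pending) == 0 {
		if err := q.pub.Publish(l); err == nil {
			return
		}
	}
	q.pending = append(q.pending, l)
}

// Flush retries buffered learnings in order, stops at the first failure,
// and returns how many were delivered.
func (q *OfflineQueue) Flush() int {
	sent := 0
	for len(q.pending) > 0 {
		if err := q.pub.Publish(q.pending[0]); err != nil {
			break
		}
		q.pending = q.pending[1:]
		sent++
	}
	return sent
}

// flakyPublisher simulates an ALMS outage for the demo.
type flakyPublisher struct {
	down     bool
	received []Learning
}

func (p *flakyPublisher) Publish(l Learning) error {
	if p.down {
		return errors.New("alms unreachable")
	}
	p.received = append(p.received, l)
	return nil
}

func main() {
	pub := &flakyPublisher{down: true}
	q := &OfflineQueue{pub: pub}
	q.Store(Learning{Type: "edge_case", Tags: []string{"vendor-x"}, Body: "HTTP 200 with error body when rate-limited"})
	fmt.Println("queued while down:", len(q.pending))

	pub.down = false
	fmt.Println("flushed after recovery:", q.Flush())
}
```

The point of the design is that Store never returns an error to the caller: a failed publish is a bookkeeping problem for later, not a reason to interrupt the agent's task.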

Option	Good at	What ALMS adds
Shared database	Durable rows and queries	Agent contracts, sync acknowledgement, lifecycle, protocols
Vector store	Similarity retrieval	Typed records, status, provenance, exact lifecycle
Framework memory	One runtime's context	Runtime-neutral MCP access across heterogeneous agents
Message bus	Transient event delivery	Searchable durable learning and later resynchronisation

How It Works

An ingestion agent discovers a recurring failure pattern in a vendor API. It stores that as a learning with tags. Later, a completely different agent syncs ALMS and pulls that learning before touching the same API. It avoids the failure path without ever encountering it.

A concrete example from my fleet: ingest-agent-1 discovered that Vendor X's REST API returns HTTP 200 with an error body when rate-limited. It stored that as a learning (type: edge_case; tags: api-integration, vendor-x, rate-limiting), and analysis-agent-3 synced it the next morning. That agent was about to query the same endpoint and skipped the failing code path entirely.

The flow is six steps:

agent.register        — "I exist, here's my type"
agent.heartbeat       — "Still alive"
learning.store        — "Here's something I learned"
learning.sync         — "What does the fleet know?"
learning.sync_ack     — "Got it, processed"
protocol.push/pull    — operator → agents by tag

The sync model is gap-safe: agents pull learnings ordered by creation time, process the batch, then acknowledge the full list of IDs. If the ack fails, the server retains the records for the next sync, so a failure between pull and ack causes a repeat delivery rather than a gap. The cursor is tracked on the client, so there is no server-side cursor to corrupt.

What Ships in 0.1.0

This is the initial public release. I run it on a fleet of eight agents across two machines; treat that as an operator report, not a published availability benchmark.

Single-binary Go server (Go 1.22+)
PostgreSQL persistence with full migrations
Streamable HTTP MCP transport
Agent registration, heartbeat, listing
Learning store: create, sync, search, soft-delete, enrichment update
Protocol publishing and tag-based retrieval
Background garbage collection for lifecycle management
systemd deployment assets under deploy/
Prompt library and an agent skill definition for integration
Verified helper scripts for sync and publish workflows (scripts/)

The codebase is layered: server → service → store → models. Business logic is testable through store interfaces. Persistence uses native pgx connection pools, with no ORM. Test coverage targets are enforced per layer (90% on models, 80% on service).

Known scoring defect in 0.1.0

The current TTL decay implementation derives elapsed intervals from created_at, then subtracts that cumulative amount from the record's already-decayed score. Repeated sweeps can therefore over-decay old records. A production fix should retain a decay checkpoint or compute from an immutable base; until then, do not schedule repeated decay sweeps without compensating controls. Pinned learnings skip decay.

Quick Start

git clone https://github.com/ghassan-ai-projects/alms
cd alms
docker compose up -d db

export ALMS_PG_DSN="postgres://alms:alms@localhost:5432/alms_db?sslmode=disable"
export ALMS_AUTH_TOKEN="***"

migrate -path internal/store/migrations -database "$ALMS_PG_DSN" up
make build
./bin/alms

Verify:

curl -s -X POST http://127.0.0.1:8001/mcp \
  -H "Content-Type: application/json" \
  -H "X-ALMS-TOKEN: $ALMS_AUTH_TOKEN" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'

curl -s -X POST http://127.0.0.1:8001/mcp \
  -H "Content-Type: application/json" \
  -H "X-ALMS-TOKEN: $ALMS_AUTH_TOKEN" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"agent.register","arguments":{"agent_id":"test-agent-1","agent_type":"systemd"}}}'

Integration docs and the default agent skill are in documentation/agent-learning-integration.md and skill/alms-learning-SKILL-v2.1.md. The skill covers startup sync, during-work capture, and post-task publish — the full learning lifecycle.

Why MIT

ALMS is infrastructure. Infrastructure should be open. MIT is simple, understood everywhere, and permissive for both hobby projects and commercial agent stacks. No contributor agreements, no dual-licensing complexity. Fork it, ship it, build on it.

What's Next

0.1.0 is a starting point. The things I want to harden next:

Sync contract semantics — the client-side cursor and server-side ack state overlap more than they should
Enrichment/scoring API alignment — docs and server disagree on naming; needs a cleanup pass
Auth — shared-token is intentionally minimal for v0.1, but fleet-scale needs scoped keys and agent-level identity

If this sounds useful: clone it, run the quick start, register an agent. If you find a bug or have an idea for the sync model, open an issue. If you want to contribute a learning type or a protocol template, PRs are open.

The agents are getting smarter. Let's make sure they're not learning in isolation.

Repository: github.com/ghassan-ai-projects/alms
License: MIT
Release: 0.1.0 — June 9, 2026
Docs: documentation/ in the repo
