
Per-agent memory across channels — handover (2026-05-06)

Amended 2026-05-08 — partial supersession. Phases 2 and 3 below (seeding shared_kb_db via a Kubernetes job and a workflow trigger) are deferred. The shared knowledge base now uses a wiki-indexed pattern (docs/specs/INDEX.md + .context/decisions/INDEX.md, auto-generated from frontmatter) instead of vector-RAG over shared_kb_db. Phase 1 still applies (PR-1 shipped 2026-05-07 — shared_kb_db schema is live). Phase 4 (per-agent memory in agent_memory_db) still applies unchanged. See shared-kb-bifurcation-decision-2026-05-08.md for the rationale, the deferred-seeding option preserved for re-evaluation, and the triggers that would reactivate it.

Handover for the agent (or human + agent pair) picking up the per-agent memory work after this conversation. The entire design conversation is captured here — the Phase 0 decisions are locked in. The work is to (a) write a formal SPEC, (b) ensure prerequisites are met, (c) build the tables/tools/prompts.

TL;DR

Today atlas has no persistent memory. Every new conversation starts cold. Even the shared knowledge base (shared_kb_db — canonical terms, approved specs, ADRs, system topology) is fully designed but not deployed — its database has zero tables. This blocks scaling conversations across channels: a thread in Drive doesn't carry forward into a thread in Slack, and atlas re-derives the same conclusions every time.

The mission is to ship persistent memory in two layers:

  1. Shared memory (the shared_kb_db from SPEC-005) — canonical terms, indexed approved specs/ADRs/topology. Read by all agents. Already designed; just needs to ship.
  2. Per-agent memory (new, this handover) — what atlas remembers across conversations. Per-agent for v1; per-agent-per-user when personal assistants come online.

The two layers integrate via canonical terms: every memory write and every recall goes through canonical_terms_lookup, so memories stay coherent across the shared/private boundary.

Phase 0 decisions — locked in (do not re-litigate)

These were debated and decided in the 2026-05-06 design conversation. The new SPEC must reflect them as accepted.

| Question | Decision | Rationale |
| --- | --- | --- |
| Memory scope | Per-agent for v1; per-agent-per-user later (when building personal assistants) | Simpler v1; schema admits user_id so adding it later is a NOT NULL flip |
| Cross-channel join key | agent_id from Astra (NOT email) | Email is outbound platform identity (atlas-agent@aucert.dev); rotation breaks memory. agent_id is opaque, internal, stable. Astra already owns it. |
| Write triggers | Aggressive: recall before reasoning on any new topic, evaluate write at end of substantive turn, forced write on "remember that…" | Agent should bias toward remembering rather than selectively curating |
| Dedup-on-write | Required. Before INSERT, recall on the same topic; if strong-similarity (>0.92 cosine) hit exists, UPDATE that memory's content/updated_at/access_count instead of INSERT | Without this, "aggressive remembering" turns into memory bloat within a week; top-k retrieval slots fill with near-duplicates |
| Forgetting | Never (for v1). Memories soft-deleted only on explicit user request | Revisit when storage becomes a real concern; not now |
| Recency ranking | Vector similarity multiplied by recency decay exp(-Δdays / 90), then sort | Memories don't disappear, they just compete worse for retrieval slots over time |
| Retrieval blending | Blend at retrieval (single ranked list combining shared kb + private memory), tag at presentation (each retrieved fact carries source: private vs canonical) | Agent reasons over a unified list; user/auditor sees attribution |
| Vocabulary normalization | Mandatory canonical_terms_lookup at write AND recall time. Store both topic_raw (verbatim) and topic (canonical-resolved). Auto-propose new terms via canonical_terms_propose | Prevents drift between memory stores; creates a virtuous loop where memory drives canonical vocabulary growth |
| Privacy boundary | Per-agent memory in a separate database (agent_memory_db) from shared_kb_db | Different access patterns, different sensitivity (agent memory will eventually carry user PII). Future-proofs for per-user role scoping. |
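
To make the recency-ranking decision concrete, the decay factor can be worked through for a few memory ages. The symbols s (final score), c (cosine similarity) and Δd (days since updated_at) are shorthand introduced here for illustration; they do not come from the design conversation.

```latex
s = c \cdot e^{-\Delta d / 90}

\Delta d = 0:   \quad e^{0} = 1
\Delta d = 90:  \quad e^{-1} \approx 0.368
\Delta d = 180: \quad e^{-2} \approx 0.135
\Delta d = 365: \quad e^{-365/90} \approx 0.017

\text{half-life: } e^{-\Delta d / 90} = \tfrac{1}{2} \;\Rightarrow\; \Delta d = 90 \ln 2 \approx 62.4 \text{ days}
```

Under this formula, a memory with cosine similarity 0.90 that was last updated 90 days ago scores about 0.33 and therefore ranks below a fresh memory with cosine similarity 0.40. That is the "compete worse over time" behaviour the table describes, and it is one reason the constant 90 is worth revisiting during tuning.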

Schema sketch

New database agent_memory_db. One table to start:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_memories (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id      UUID NOT NULL,                 -- references astra_db.agents(id) by value, not FK (cross-DB)
    user_id       UUID NULL,                     -- NOT NULL once personal-assistant mode lights up
    topic         VARCHAR(256) NOT NULL,         -- canonical-term-resolved, used for retrieval matching
    topic_raw     VARCHAR(256) NOT NULL,         -- original phrasing as encountered, preserved for citation fidelity
    is_canonical  BOOLEAN NOT NULL,              -- true if topic was found in canonical_terms; false if fallback
    content       TEXT NOT NULL,                 -- the memory body; never normalize this, store verbatim
    source        JSONB NOT NULL,                -- { platform, channel_id, thread_id, message_id, observed_at }
    embedding     vector(1536) NOT NULL,         -- embed `topic || content` so canonical form drives retrieval
    access_count  INTEGER NOT NULL DEFAULT 0,    -- popularity signal, incremented on each recall hit
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    deleted_at    TIMESTAMPTZ NULL               -- soft delete; never hard-delete
);

CREATE INDEX ON agent_memories USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON agent_memories (agent_id, updated_at DESC);
CREATE INDEX ON agent_memories (agent_id, deleted_at) WHERE deleted_at IS NULL;

Tool contracts

Three new tools, registered in the shared tool registry (so all agents inherit them — atlas first, then any future agent):

1. agent_memory_recall(query: string, k: int = 10) → MemoryHit[]

  • Internally: canonical_terms_lookup(query) → query_canonical
  • Vector search on agent_memories.embedding for query_canonical || query (or just query_canonical if the lookup found a match)
  • Filter: agent_id = current_agent_id AND deleted_at IS NULL
  • Re-rank: cosine_similarity * exp(-Δdays / 90) where Δdays is now() - updated_at in days
  • Return top k as [{memory_id, topic, topic_raw, content, source, similarity, recency_score, source_tag: "private"}]
  • Also: increment access_count on each returned hit (best-effort UPDATE in the same transaction)
  • ALSO blend with shared_kb_search results on the same query, tagged source_tag: "canonical". Agent's prompt teaches it to reason over the unified list and attribute by tag.
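
The re-rank and blend steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the planned Kotlin implementation: the hit dictionaries, the `blend` and `recency_score` helpers, and the `score` field are hypothetical names. It also assumes canonical hits are ranked by raw similarity with no decay, which the handover does not specify.

```python
import math
from datetime import datetime, timedelta, timezone

DECAY_DAYS = 90.0  # decay constant from the recency-ranking decision


def recency_score(updated_at, now):
    """Return exp(-delta_days / 90), where delta_days is now - updated_at in days."""
    delta_days = max((now - updated_at).total_seconds() / 86400.0, 0.0)
    return math.exp(-delta_days / DECAY_DAYS)


def blend(private_hits, canonical_hits, now, k=10):
    """Merge private and shared-kb hits into one ranked, source-tagged list."""
    ranked = []
    for hit in private_hits:
        decay = recency_score(hit["updated_at"], now)
        ranked.append({**hit,
                       "recency_score": decay,
                       "score": hit["similarity"] * decay,
                       "source_tag": "private"})
    for hit in canonical_hits:
        # Assumption: canonical hits are not decayed; the handover leaves this open.
        ranked.append({**hit,
                       "score": hit["similarity"],
                       "source_tag": "canonical"})
    ranked.sort(key=lambda h: h["score"], reverse=True)
    return ranked[:k]


# Illustrative data only: two private memories and one shared-kb hit.
now = datetime(2026, 5, 6, tzinfo=timezone.utc)
private = [
    {"memory_id": "m1", "similarity": 0.90, "updated_at": now - timedelta(days=90)},
    {"memory_id": "m2", "similarity": 0.40, "updated_at": now},
]
canonical = [{"doc_id": "ADR-005", "similarity": 0.35}]
top = blend(private, canonical, now, k=2)
print([(h["source_tag"], round(h["score"], 3)) for h in top])
```

In this example the fresh 0.40 memory outranks the 90-day-old 0.90 memory, and the canonical hit takes the second slot. Whether shared-kb results should compete on the same scale as decayed private scores is a choice SPEC-023 would need to pin down.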

2. agent_memory_remember(topic_raw: string, content: string, source: object) → {memory_id, was_update: bool}

  • canonical_terms_lookup(topic_raw) → topic_canonical, found: bool
    • If found: topic = topic_canonical, is_canonical = true
    • If not found: topic = topic_raw, is_canonical = false, AND call canonical_terms_propose(topic_raw, ...) to queue the new term for human review
  • Compute embedding via EmbeddingClient.embed(topic || content)
  • Dedup: query top-1 nearest neighbor on (agent_id, topic). If cosine_similarity > 0.92, UPDATE that row's content, updated_at, access_count++; return was_update: true
  • Otherwise: INSERT new row; return was_update: false
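
The write path above (canonical lookup, propose on miss, dedup, then UPDATE or INSERT) can be sketched against an in-memory list. Everything here is illustrative: `remember`, `toy_embed`, `TERMS` and `VOCAB` are hypothetical stand-ins for the real tool, EmbeddingClient.embed and canonical_terms_lookup. The sketch assumes a space separator when joining topic and content, assumes the embedding is refreshed on update, restricts dedup to non-deleted rows, and omits the source field for brevity.

```python
import math
import uuid
from datetime import datetime, timezone

SIMILARITY_THRESHOLD = 0.92  # dedup cutoff from the Phase 0 table


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def remember(store, agent_id, topic_raw, content, lookup, propose, embed):
    canonical = lookup(topic_raw)  # canonical term, or None on a miss
    is_canonical = canonical is not None
    topic = canonical if is_canonical else topic_raw
    if not is_canonical:
        propose(topic_raw)  # queue the unknown term for human review
    embedding = embed(topic + " " + content)
    now = datetime.now(timezone.utc)

    # Top-1 nearest neighbour among this agent's live memories on the same topic.
    candidates = [m for m in store
                  if m["agent_id"] == agent_id and m["topic"] == topic
                  and m["deleted_at"] is None]
    best = max(candidates, key=lambda m: cosine(m["embedding"], embedding),
               default=None)
    if best is not None and cosine(best["embedding"], embedding) > SIMILARITY_THRESHOLD:
        best.update(content=content, embedding=embedding, updated_at=now)
        best["access_count"] += 1
        return {"memory_id": best["id"], "was_update": True}

    row = {"id": str(uuid.uuid4()), "agent_id": agent_id, "topic": topic,
           "topic_raw": topic_raw, "is_canonical": is_canonical,
           "content": content, "embedding": embedding, "access_count": 0,
           "created_at": now, "updated_at": now, "deleted_at": None}
    store.append(row)
    return {"memory_id": row["id"], "was_update": False}


# Toy stand-ins so the sketch runs without a database or embedding service.
TERMS = {"infra preference": "infra_preference"}
VOCAB = ["terraform", "infra", "slack", "email"]
proposed = []


def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


store = []
r1 = remember(store, "a1", "infra preference", "prefers terraform managed infra",
              TERMS.get, proposed.append, toy_embed)
r2 = remember(store, "a1", "infra preference", "prefers terraform for all infra",
              TERMS.get, proposed.append, toy_embed)
r3 = remember(store, "a1", "drive watch", "watches three drive files",
              TERMS.get, proposed.append, toy_embed)
```

Here the second call lands on the first row as an update, while the third call misses the dictionary, is proposed as a new term, and is inserted with is_canonical set to false.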

3. agent_memory_forget(memory_id: UUID) → void

  • Soft delete: UPDATE agent_memories SET deleted_at = now() WHERE id = memory_id AND agent_id = current_agent_id
  • Operator/user-driven; agent calls this only on explicit user request ("forget that I said X")
  • Audit row in audit_log (internal_shared_db) for the deletion event
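
The soft-delete contract above can be exercised with a small sketch using Python's built-in sqlite3 module. This is only an illustration under simplifying assumptions: the real audit_log lives in internal_shared_db, so the single transaction shown here would not span both tables in production. The audit_log columns, the `forget` helper and its boolean return value are hypothetical (the tool contract returns void), and the extra `deleted_at IS NULL` guard is an added assumption to keep repeat calls from re-stamping the row.

```python
import sqlite3
from datetime import datetime, timezone


def forget(conn, memory_id, current_agent_id):
    """Soft-delete one memory owned by current_agent_id and write an audit row."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back if either statement raises
        cur = conn.execute(
            "UPDATE agent_memories SET deleted_at = ? "
            "WHERE id = ? AND agent_id = ? AND deleted_at IS NULL",
            (now, memory_id, current_agent_id),
        )
        deleted = cur.rowcount == 1
        if deleted:
            conn.execute(
                "INSERT INTO audit_log (event, memory_id, agent_id, at) "
                "VALUES ('memory_forgotten', ?, ?, ?)",
                (memory_id, current_agent_id, now),
            )
    return deleted


# Minimal stand-in tables; the real schemas are defined elsewhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_memories "
             "(id TEXT PRIMARY KEY, agent_id TEXT, content TEXT, deleted_at TEXT)")
conn.execute("CREATE TABLE audit_log "
             "(event TEXT, memory_id TEXT, agent_id TEXT, at TEXT)")
conn.execute("INSERT INTO agent_memories VALUES ('m1', 'a1', 'some memory', NULL)")
conn.commit()

wrong_agent = forget(conn, "m1", "a2")  # another agent cannot delete it
first_call = forget(conn, "m1", "a1")   # the owner soft-deletes it
second_call = forget(conn, "m1", "a1")  # already deleted, nothing changes
```

The agent_id predicate in the UPDATE is what keeps one agent from forgetting another agent's memory, which matters because v1 has no enforcement at the database layer beyond that filter.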

Personality prompt updates

The new tools are useless without prompt changes. Atlas's personality prompt (composable fragment in astra_db.personalities) needs new sections:

  1. Memory recall discipline (added to context-first behavior):

    "Before reasoning about any topic in the user's message, call agent_memory_recall(topic). If the recall returns hits with high recency_score, surface them in your reasoning as 'I remember (from {date}): {content}'. If the recall returns canonical-tagged hits (from shared kb), cite them as 'Per {spec_id|adr_id}: {content}'. Always distinguish private memory ('you told me') from canonical knowledge ('decided in spec/ADR')."

  2. Memory write discipline (added to tool discipline footer):

    "At the end of substantive turns, evaluate whether to call agent_memory_remember. Bias toward remembering. Write a memory when: the user shared a preference (agent_memory_remember(topic_raw='vivek_prefers_terraform_managed_infra', ...)), an entity was named (agent_memory_remember(topic_raw='atlas_agent_email', ...)), a decision was made or referenced, or a context shift happened. Skip writes only for ephemeral chat (greetings, clarifications). Never refuse a forced 'remember that…' request."

  3. Vocabulary discipline (already exists for canonical_terms_lookup; extend):

    "When proposing a new canonical term via canonical_terms_propose (because agent_memory_remember couldn't find it in the dictionary), include the source memory_id in the proposal so reviewers see the provenance."

Phasing (hard prerequisites in order)

The dependency graph is sequential, not parallel. Memory v1 cannot ship without shared_kb being live first.

Phase 1 — Activate shared_kb deployment (HARD prerequisite, ~1-2 days)

  • Add SHARED_KB_DB_URL/USER/PASSWORD to the astra-db-credentials secret via Terraform
  • Add a third Flyway block to .github/workflows/deploy-astra.yml that mounts infra/migrations/shared-kb/ and runs Flyway against shared_kb_db. Pattern: copy the existing astra_db block, swap names + secret keys
  • Verify pgvector is enabled on the Flexible Server (Terraform azurerm_postgresql_flexible_server_configuration)
  • Trigger workflow → 9 rows materialise in shared_kb_db.flyway_schema_history (V001-V009 from infra/migrations/shared-kb/)
  • Smoke test: shared_kb_search from the worker pod should return [] cleanly (tables exist, just empty)

Phase 2 — Seed shared_kb (HARD prerequisite, ~3-5 days)

  • Build a one-shot Kubernetes Job that ingests:
    • docs/specs/approved/*.md → approved_specs + approved_spec_embeddings
    • .context/decisions/*.md → adrs + adr_embeddings
    • .context/ARCHITECTURE.md → system_topology (parse component names + dependencies)
    • Seed canonical_terms with Aucert's existing vocabulary (mine from .context/GLOSSARY.md, ADRs, frontend domain models). Without this, atlas's agent_memory_remember will hit canonical_terms_lookup misses on every common Aucert term and propose them all on day one.
  • Schedule on PR merges (extend deploy-astra.yml or new workflow)
  • Verify: a [kimi] test where atlas is asked about a known ADR retrieves via shared_kb_search instead of reading the file

After Phase 2, atlas has organizational memory — a complete shared kb. The Phase 3 spec can now be written against a real working substrate.

Phase 3 — Write SPEC-023 (this handover's deliverable, ~1-2 days)

Use docs/specs/TEMPLATE.md. Required sections:

  • Decisions: copy the locked-in Phase 0 table verbatim
  • Schema: the agent_memories table; explicitly call out the agent_memory_db separate-database choice
  • Tool contracts: the three signatures above with full input/output schemas
  • Retrieval blending + tagging rules: how the unified ranked list is computed and presented
  • Recency-decay formula: cosine * exp(-Δdays / 90) — flag for tuning later
  • Personality prompt fragments: the three discipline sections
  • Dependencies: SPEC-005 (canonical_terms), Phase 1 + 2 above
  • Deferred to v2: per-user mode, real forgetting policy, cross-agent privacy boundaries (today every agent sees only its own memories — but no enforcement at the DB layer beyond agent_id filter), batch re-canonicalization job
  • Open questions to nail before approval:
    • Do we expose memory entries via Astra UI for operator audit? (Probably yes — read-only listing, similar to "files atlas is watching" page from drive-watch handover)
    • What's the indexed embedding model? (Should match what EmbeddingClient uses today — verify and pin in spec)
    • Do we expose agent_memory_recall as a debug tool atlas can call for itself? (Yes — useful for "what do you remember about X")

Submit as draft, get one human review (likely Vivek), approve, archive.

Phase 4 — Build per-agent memory (~1-2 weeks)

Standard sequence after the spec lands:

  1. Migration files in infra/migrations/agent-memory/V001__create_agent_memories.sql etc.
  2. Add Flyway block in deploy-astra.yml for agent_memory_db (fourth block)
  3. Schema in internal/backend/src/main/kotlin/.../shared/clients/AgentMemoryRepository.kt
  4. Three tools in internal/backend/src/main/kotlin/.../common-tools/memory/
  5. Tool registry registration in ToolContext / shared executor
  6. Personality prompt fragments — update astra_db.personalities rows for atlas
  7. Wire agent_memory_recall to also call shared_kb_search and merge results
  8. Integration test: post a memory, recall it across a different conversation in a different channel
  9. Astra UI page for memory listing/audit (read-only; deletion is a separate UI gate)

Phase 5 — Loud-silence detection (~half day)

Add metric agent_memory_recall_zero_hits_total{agent_id, query_topic}. When the agent calls recall with a query that the LLM rated as "I should know this" and gets zero hits, log the event and increment the metric. Without this, "memory subsystem is silently empty" failures (like the shared_kb_db one we hit on 2026-05-06) recur.
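
A minimal sketch of the zero-hit check might look like the following. It uses a plain `collections.Counter` as a stand-in for the real metrics client, which this handover does not name. The `record_recall` helper and the `expected_known` flag (representing the LLM's "I should know this" rating) are hypothetical.

```python
import logging
from collections import Counter

logger = logging.getLogger("agent_memory")

# Stand-in for agent_memory_recall_zero_hits_total{agent_id, query_topic}.
zero_hits_total = Counter()


def record_recall(agent_id, query_topic, hits, expected_known):
    """Count and log recalls that returned nothing when a hit was expected."""
    if expected_known and not hits:
        zero_hits_total[(agent_id, query_topic)] += 1
        logger.warning("recall returned zero hits: agent_id=%s query_topic=%s",
                       agent_id, query_topic)
        return True
    return False


flagged = record_recall("a1", "infra_preference", [], expected_known=True)
ignored = record_recall("a1", "weather", [], expected_known=False)
found = record_recall("a1", "infra_preference", [{"memory_id": "m1"}], expected_known=True)
```

Only the first call increments the counter; a zero-hit recall on a topic the agent never expected to know is left alone, so the alert tracks the silent-empty failure mode and not ordinary misses.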

Context (read these before starting)

In this order:

| File | Why |
| --- | --- |
| docs/internal/docs/agents/per-agent-memory-handover-2026-05-06.md (this file) | Mission + decisions + sequence |
| docs/specs/drafts/SPEC-005-spec-agent-v0.1.md §8 (shared_kb_db schema) | The shared-memory schema this work integrates with: canonical terms, approved specs, ADRs, system topology |
| infra/migrations/shared-kb/V001..V009*.sql | The 9 migrations that need to land in Phase 1 |
| .github/workflows/deploy-astra.yml (lines 157-185 area) | The pattern for adding Flyway blocks: copy + swap names for shared-kb |
| infra/migrations/internal-shared/V006__seed_model_registry.sql | Reference for the seed-data pattern Phase 2 will use |
| internal/backend/src/main/kotlin/dev/aucert/internal/agents/common-tools/shared_kb/SharedKbSearchTool.kt | The existing (currently no-op) shared-kb tool to extend |
| internal/backend/src/main/kotlin/dev/aucert/internal/agents/common-tools/shared_kb/CanonicalTermsLookupTool.kt | The canonical-terms tool that memory writes/reads will depend on |
| internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/clients/EmbeddingClient.kt | The existing embedding client (verify which model; pin in SPEC-023) |
| internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/clients/PostgresClientFactory.kt | The factory pattern for adding a new logical DB (agent_memory_db) |
| docs/specs/TEMPLATE.md | The frontmatter format SPEC-023 must match (machine-validated by tools/scripts/validate-spec-frontmatter.sh) |

What this handover is NOT

To keep scope honest:

  • Not a forgetting policy design — explicitly deferred to v2
  • Not a cross-agent memory sharing design — agents share via the canonical kb, not by reading each other's private memory
  • Not a multi-tenant scope design — single tenant assumed for v1; multi-tenant needs separate SPEC
  • Not a UI design — Astra console page is mentioned as a phase 4 deliverable but the page design is left to the implementing agent
  • Not a token-budget calibration — recall returns top k=10 by default; whether that fits in context windows is a tuning concern for after Phase 4

Suggested PR breakdown for the implementing agent

  1. PR 1: Phase 1 — Add SHARED_KB credentials secret + Flyway block in deploy-astra.yml. Triggers schema creation in shared_kb_db. Small, infra-only.
  2. PR 2: Phase 2 (part 1) — Build the seed-job Kubernetes Job. Run manually once. Verify canonical_terms, approved_specs, adrs, system_topology populated.
  3. PR 3: Phase 2 (part 2) — Add seed-job to deploy-astra.yml so it runs after spec/ADR merges.
  4. PR 4: Phase 3 — Draft SPEC-023 in docs/specs/drafts/. Get review. Approve, move to approved/.
  5. PR 5: Phase 4 (part 1) — agent_memory_db migrations + repository + tool stubs. Tools return placeholder responses.
  6. PR 6: Phase 4 (part 2) — Wire embedding + dedup-on-write + canonical_terms integration. Tools functional but no personality prompt yet.
  7. PR 7: Phase 4 (part 3) — Personality prompt updates + integration test that proves cross-channel continuity.
  8. PR 8: Phase 4 (part 4) — Astra UI memory listing page (read-only).
  9. PR 9: Phase 5 — agent_memory_recall_zero_hits_total metric + alert.

Starting prompt for the parallel chat

Paste this into the new Claude session to bootstrap the work:

Read docs/internal/docs/agents/per-agent-memory-handover-2026-05-06.md end to end, then read the files listed in the "Context" section of that handover. Confirm understanding by summarising in your first response: (a) the locked-in Phase 0 decisions, (b) why Phase 1 + 2 are hard prerequisites for Phase 3+, (c) the integration mechanism between per-agent memory and canonical terms. Do not make any code changes until I approve the plan. Start with Phase 1 (the Flyway block addition for shared-kb) — it's the smallest, most independent piece and unblocks everything else.

References

  • 2026-05-06 design conversation (this conversation; transcript in claude.ai)
  • SPEC-005 — Spec agent v0.1 design (defines shared_kb_db schema)
  • ADR-005 — PostgreSQL JSONB for KG (the database choice this work inherits from)
  • ADR-008 amendment 2026-05-03 — Foundry deployment naming (relevant only if memory ever uses Foundry-hosted embedding models)
  • 2026-05-06 incident: shared_kb_db discovered empty — the loud-silence failure mode this design must avoid