Per-agent memory across channels — handover (2026-05-06)
Amended 2026-05-08: partial supersession. Phase 2 below (seeding shared_kb_db via a Kubernetes job and a workflow trigger, i.e. PRs 2 and 3 in the suggested breakdown) is deferred. The shared knowledge base now uses a wiki-indexed pattern (docs/specs/INDEX.md + .context/decisions/INDEX.md, auto-generated from frontmatter) instead of vector RAG over shared_kb_db. Phase 1 still applies (PR 1 shipped 2026-05-07; the shared_kb_db schema is live). Phase 4 (per-agent memory in agent_memory_db) still applies unchanged. See shared-kb-bifurcation-decision-2026-05-08.md for the rationale, the deferred-seeding option preserved for re-evaluation, and the triggers that would reactivate it.
Handover for the agent (or human + agent pair) picking up the per-agent memory work after this conversation. The entire design conversation is captured here — the Phase 0 decisions are locked in. The work is to (a) write a formal SPEC, (b) ensure prerequisites are met, (c) build the tables/tools/prompts.
TL;DR
Today atlas has no persistent memory. Every new conversation starts cold. Even the shared knowledge base (shared_kb_db — canonical terms, approved specs, ADRs, system topology) is fully designed but not deployed — its database has zero tables. This blocks scaling conversations across channels: a thread in Drive doesn't carry forward into a thread in Slack, and atlas re-derives the same conclusions every time.
The mission is to ship persistent memory in two layers:
- Shared memory (the shared_kb_db from SPEC-005): canonical terms, indexed approved specs/ADRs/topology. Read by all agents. Already designed; just needs to ship.
- Per-agent memory (new, this handover): what atlas remembers across conversations. Per-agent for v1; per-agent-per-user when personal assistants come online.
The two layers integrate via canonical terms: every memory write and recall goes through canonical_terms_lookup, so memories stay coherent across the shared/private boundary.
Phase 0 decisions — locked in (do not re-litigate)
These were debated and decided in the 2026-05-06 design conversation. The new SPEC must reflect them as accepted.
| Question | Decision | Rationale |
|---|---|---|
| Memory scope | Per-agent for v1; per-agent-per-user later (when building personal assistants) | Simpler v1; schema admits user_id so adding it later is a NOT NULL flip |
| Cross-channel join key | agent_id from Astra (NOT email) | Email is outbound platform identity (atlas-agent@aucert.dev); rotation breaks memory. agent_id is opaque, internal, stable. Astra already owns it. |
| Write triggers | Aggressive — recall before reasoning on any new topic, evaluate write at end of substantive turn, forced write on "remember that…" | Agent should bias toward remembering rather than selectively curating |
| Dedup-on-write | Required. Before INSERT, recall on the same topic; if a strong-similarity hit (cosine > 0.92) exists, UPDATE that memory's content, updated_at, and access_count instead of inserting a new row | Without this, "aggressive remembering" turns into memory bloat within a week: top-k retrieval slots fill with near-duplicates |
| Forgetting | Never (for v1). Memories soft-deleted only on explicit user request | Revisit when storage becomes a real concern; not now |
| Recency ranking | Vector similarity multiplied by recency decay exp(-Δdays / 90), then sort | Memories don't disappear, they just compete worse for retrieval slots over time |
| Retrieval blending | Blend at retrieval (single ranked list combining shared kb + private memory), tag at presentation (each retrieved fact carries source: private vs canonical) | Agent reasons over a unified list; user/auditor sees attribution |
| Vocabulary normalization | Mandatory canonical_terms_lookup at write AND recall time. Store both topic_raw (verbatim) and topic (canonical-resolved). Auto-propose new terms via canonical_terms_propose | Prevents drift between memory stores; creates a virtuous loop where memory drives canonical vocabulary growth |
| Privacy boundary | Per-agent memory in a separate database (agent_memory_db) from shared_kb_db | Different access patterns, different sensitivity (agent memory will eventually carry user PII). Future-proofs for per-user role scoping. |
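The Recency ranking and Retrieval blending rows combine into a single scoring pass. Below is a minimal sketch under a few assumptions. It assumes both stores return cosine similarities computed with the same embedding model, and the names Candidate, RankedHit, SourceTag, and blendAndRank are hypothetical. The Phase 0 decisions don't say whether canonical hits decay with age, so this sketch assumes they don't and applies exp(-Δdays / 90) to private memories only.

```kotlin
import java.time.Duration
import java.time.Instant
import kotlin.math.exp

// Where a hit came from; carried through ranking so it can be attributed at presentation.
enum class SourceTag(val wire: String) { PRIVATE("private"), CANONICAL("canonical") }

// Hypothetical pre-ranking candidate from agent_memories or shared_kb_search.
data class Candidate(
    val id: String,
    val content: String,
    val similarity: Double, // cosine similarity to the query embedding
    val updatedAt: Instant,
    val tag: SourceTag,
)

data class RankedHit(val candidate: Candidate, val recencyScore: Double, val score: Double)

val decayDays = 90.0 // time constant from the Phase 0 decision; flagged for tuning

// exp(-Δdays / 90), with Δdays = now - updated_at in fractional days (clamped at 0).
fun recencyScore(updatedAt: Instant, now: Instant): Double {
    val deltaDays = Duration.between(updatedAt, now).toMillis() / 86_400_000.0
    return exp(-deltaDays.coerceAtLeast(0.0) / decayDays)
}

// Blend at retrieval: one list, one sort, tags preserved on every hit.
// Assumption: only private memories decay; canonical hits keep their raw similarity.
fun blendAndRank(
    privateHits: List<Candidate>,
    canonicalHits: List<Candidate>,
    now: Instant,
    k: Int = 10,
): List<RankedHit> =
    (privateHits + canonicalHits)
        .map { c ->
            val r = if (c.tag == SourceTag.PRIVATE) recencyScore(c.updatedAt, now) else 1.0
            RankedHit(c, r, c.similarity * r)
        }
        .sortedByDescending { it.score }
        .take(k)
```

The result is one ranked list, and each RankedHit still carries its SourceTag, so the presentation layer can attribute "private" vs "canonical" without a second lookup. Because the two stores' similarities are compared directly, this only makes sense if both were embedded with the same model.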
Schema sketch
New database agent_memory_db. One table to start:
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_memories (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id     UUID NOT NULL,           -- references astra_db.agents(id) by value, not FK (cross-DB)
    user_id      UUID NULL,               -- NOT NULL once personal-assistant mode lights up
    topic        VARCHAR(256) NOT NULL,   -- canonical-term-resolved, used for retrieval matching
    topic_raw    VARCHAR(256) NOT NULL,   -- original phrasing as encountered, preserved for citation fidelity
    is_canonical BOOLEAN NOT NULL,        -- true if topic was found in canonical_terms; false if fallback
    content      TEXT NOT NULL,           -- the memory body; never normalize this, store verbatim
    source       JSONB NOT NULL,          -- { platform, channel_id, thread_id, message_id, observed_at }
    embedding    vector(1536) NOT NULL,   -- embed `topic || content` so canonical form drives retrieval
    access_count INTEGER NOT NULL DEFAULT 0, -- popularity signal; incremented on each recall hit
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    deleted_at   TIMESTAMPTZ NULL         -- soft delete; never hard-delete
);

CREATE INDEX ON agent_memories USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON agent_memories (agent_id, updated_at DESC);
CREATE INDEX ON agent_memories (agent_id, deleted_at) WHERE deleted_at IS NULL;
```
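On the Kotlin side, a row type mirroring this table might look like the sketch below. AgentMemory, MemorySource, embeddingInput, and validate are hypothetical names, not the eventual AgentMemoryRepository API. Which source fields are nullable, and the separator used when joining topic and content for embedding, are assumptions.

```kotlin
import java.time.Instant
import java.util.UUID

val EMBEDDING_DIM = 1536 // vector(1536) in the schema
val TOPIC_MAX_LEN = 256  // VARCHAR(256) for topic and topic_raw

// Shape of the `source` JSONB column: { platform, channel_id, thread_id, message_id, observed_at }.
// Nullability of thread/message ids is an assumption.
data class MemorySource(
    val platform: String,
    val channelId: String,
    val threadId: String?,
    val messageId: String?,
    val observedAt: Instant,
)

// Hypothetical Kotlin mirror of one agent_memories row.
data class AgentMemory(
    val id: UUID,
    val agentId: UUID,        // Astra agent_id, stored by value (no cross-DB FK)
    val userId: UUID?,        // null until per-user mode
    val topic: String,        // canonical-resolved
    val topicRaw: String,     // verbatim phrasing
    val isCanonical: Boolean,
    val content: String,      // stored verbatim, never normalized
    val source: MemorySource,
    val embedding: FloatArray,
    val accessCount: Int = 0,
    val createdAt: Instant,
    val updatedAt: Instant,
    val deletedAt: Instant? = null, // soft-delete marker
) {
    val isLive: Boolean get() = deletedAt == null
}

// Text handed to the embedder: topic first, so the canonical form drives retrieval.
// The schema comment says `topic || content`; the newline separator is an assumption.
fun embeddingInput(topic: String, content: String): String = topic + "\n" + content

// Client-side checks mirroring column constraints before an INSERT.
// Counts code points, which is what Postgres VARCHAR(n) limits under UTF8.
fun validate(m: AgentMemory) {
    require(m.topic.codePointCount(0, m.topic.length) <= TOPIC_MAX_LEN) { "topic exceeds VARCHAR($TOPIC_MAX_LEN)" }
    require(m.topicRaw.codePointCount(0, m.topicRaw.length) <= TOPIC_MAX_LEN) { "topic_raw exceeds VARCHAR($TOPIC_MAX_LEN)" }
    require(m.embedding.size == EMBEDDING_DIM) { "embedding must have $EMBEDDING_DIM dimensions" }
}
```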
Tool contracts
Three new tools, registered in the shared tool registry (so all agents inherit them — atlas first, then any future agent):
1. agent_memory_recall(query: string, k: int = 10) → MemoryHit[]
   - Internally: canonical_terms_lookup(query) → query_canonical
   - Vector search on agent_memories.embedding for query_canonical || query (or just query_canonical if the lookup found a match)
   - Filter: agent_id = current_agent_id AND deleted_at IS NULL
   - Re-rank: cosine_similarity * exp(-Δdays / 90), where Δdays is now() - updated_at in days
   - Return the top k as [{memory_id, topic, topic_raw, content, source, similarity, recency_score, source_tag: "private"}]
   - Also: increment access_count on each returned hit (best-effort UPDATE in the same transaction)
   - ALSO blend with shared_kb_search results on the same query, tagged source_tag: "canonical". The agent's prompt teaches it to reason over the unified list and attribute by tag.
2. agent_memory_remember(topic_raw: string, content: string, source: object) → {memory_id, was_update: bool}
   - canonical_terms_lookup(topic_raw) → topic_canonical, found: bool
     - If found: topic = topic_canonical, is_canonical = true
     - If not found: topic = topic_raw, is_canonical = false, AND call canonical_terms_propose(topic_raw, ...) to queue the new term for human review
   - Compute the embedding via EmbeddingClient.embed(topic || content)
   - Dedup: query the top-1 nearest neighbor on (agent_id, topic). If cosine_similarity > 0.92, UPDATE that row's content, updated_at, access_count++; return was_update: true
   - Otherwise: INSERT a new row; return was_update: false
3. agent_memory_forget(memory_id: UUID) → void
   - Soft delete: UPDATE agent_memories SET deleted_at = now() WHERE id = memory_id AND agent_id = current_agent_id
   - Operator/user-driven; the agent calls this only on explicit user request ("forget that I said X")
   - Audit row in audit_log (internal_shared_db) for the deletion event
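The agent_memory_remember write path (canonical resolution, embedding, then dedup-or-insert) can be sketched in memory as follows. The lookup, propose, and embed lambdas are hypothetical stand-ins for canonical_terms_lookup, canonical_terms_propose, and EmbeddingClient.embed, and their real signatures may differ. A linear scan stands in for the HNSW nearest-neighbor query. Refreshing the stored embedding on a dedup UPDATE is an assumption: the contract lists only content, updated_at, and access_count.

```kotlin
import java.time.Instant
import java.util.UUID
import kotlin.math.sqrt

data class RememberResult(val memoryId: UUID, val wasUpdate: Boolean)

// Mutable stand-in for one agent_memories row (subset of columns).
class StoredMemory(
    val id: UUID,
    val agentId: UUID,
    val topic: String,
    val topicRaw: String,
    val isCanonical: Boolean,
    var content: String,
    var embedding: DoubleArray,
    var accessCount: Int,
    var updatedAt: Instant,
)

fun cosine(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "embedding dimension mismatch" }
    var dot = 0.0
    var na = 0.0
    var nb = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return if (na == 0.0 || nb == 0.0) 0.0 else dot / (sqrt(na) * sqrt(nb))
}

class MemoryWriter(
    private val lookup: (String) -> String?,    // stand-in for canonical_terms_lookup: canonical term or null
    private val propose: (String) -> Unit,      // stand-in for canonical_terms_propose
    private val embed: (String) -> DoubleArray, // stand-in for EmbeddingClient.embed
    private val clock: () -> Instant = { Instant.now() },
    private val dedupThreshold: Double = 0.92,
) {
    val rows = mutableListOf<StoredMemory>()

    fun remember(agentId: UUID, topicRaw: String, content: String): RememberResult {
        // 1. Resolve the topic; unknown terms are queued for human review.
        val canonical = lookup(topicRaw)
        if (canonical == null) propose(topicRaw)
        val topic = canonical ?: topicRaw

        // 2. Embed topic + content (newline separator is an assumption).
        val vec = embed(topic + "\n" + content)
        val now = clock()

        // 3. Dedup: top-1 nearest neighbour among this agent's rows on the same topic.
        val nearest = rows
            .filter { it.agentId == agentId && it.topic == topic }
            .map { it to cosine(it.embedding, vec) }
            .maxByOrNull { it.second }
        if (nearest != null && nearest.second > dedupThreshold) {
            val row = nearest.first
            row.content = content
            row.embedding = vec // assumption: keep the vector in sync with the new content
            row.updatedAt = now
            row.accessCount += 1
            return RememberResult(row.id, wasUpdate = true)
        }

        // 4. No near-duplicate: insert a new row.
        val row = StoredMemory(UUID.randomUUID(), agentId, topic, topicRaw, canonical != null, content, vec, 0, now)
        rows += row
        return RememberResult(row.id, wasUpdate = false)
    }
}
```

Restating a preference on the same topic updates the existing row and bumps access_count instead of adding a near-duplicate. A topic missing from the dictionary is both stored under its raw phrasing and proposed as a new canonical term.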
Personality prompt updates
The new tools are useless without prompt changes. Atlas's personality prompt (composable fragment in astra_db.personalities) needs new sections:
- Memory recall discipline (added to context-first behavior):
  "Before reasoning about any topic in the user's message, call agent_memory_recall(topic). If the recall returns hits with high recency_score, surface them in your reasoning as 'I remember (from {date}): {content}'. If the recall returns canonical-tagged hits (from shared kb), cite them as 'Per {spec_id|adr_id}: {content}'. Always distinguish private memory ('you told me') from canonical knowledge ('decided in spec/ADR')."
- Memory write discipline (added to tool discipline footer):
  "At the end of substantive turns, evaluate whether to call agent_memory_remember. Bias toward remembering. Write a memory when: the user shared a preference (agent_memory_remember(topic_raw='vivek_prefers_terraform_managed_infra', ...)), an entity was named (agent_memory_remember(topic_raw='atlas_agent_email', ...)), a decision was made or referenced, or a context shift happened. Skip writes only for ephemeral chat (greetings, clarifications). Never refuse a forced 'remember that…' request."
- Vocabulary discipline (already exists for canonical_terms_lookup; extend):
  "When proposing a new canonical term via canonical_terms_propose (because agent_memory_remember couldn't find it in the dictionary), include the source memory_id in the proposal so reviewers see the provenance."
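The attribution rule in the recall-discipline fragment ("tag at presentation") can be made concrete with a small formatter. TaggedHit, attribute, and renderForReasoning are hypothetical names. Using the memory's observation date for {date} is an assumption, since the fragment doesn't say which timestamp to show.

```kotlin
import java.time.LocalDate

// Hypothetical presentation-side view of a recall hit, split by source tag.
sealed interface TaggedHit {
    // A private memory; `observedOn` is assumed to come from source.observed_at.
    data class Private(val content: String, val observedOn: LocalDate) : TaggedHit

    // A canonical hit from the shared kb; `docId` is a spec_id or adr_id.
    data class Canonical(val docId: String, val content: String) : TaggedHit
}

// Renders a hit with the attribution phrasing from the recall-discipline fragment.
fun attribute(hit: TaggedHit): String = when (hit) {
    is TaggedHit.Private -> "I remember (from ${hit.observedOn}): ${hit.content}"
    is TaggedHit.Canonical -> "Per ${hit.docId}: ${hit.content}"
}

// Renders an already-ranked, blended list one line per hit, preserving order.
fun renderForReasoning(hits: List<TaggedHit>): String = hits.joinToString("\n") { attribute(it) }
```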
Phasing (hard prerequisites in order)
The dependency graph is sequential, not parallel. Memory v1 cannot ship without shared_kb being live first.
Phase 1 — Activate shared_kb deployment (HARD prerequisite, ~1-2 days)
- Add SHARED_KB_DB_URL/USER/PASSWORD to the astra-db-credentials secret via Terraform
- Add a third Flyway block to .github/workflows/deploy-astra.yml that mounts infra/migrations/shared-kb/ and runs Flyway against shared_kb_db. Pattern: copy the existing astra_db block, swap names + secret keys
- Verify pgvector is enabled on the Flexible Server (Terraform azurerm_postgresql_flexible_server_configuration)
- Trigger the workflow → 9 rows materialise in shared_kb_db.flyway_schema_history (V001-V009 from infra/migrations/shared-kb/)
- Smoke test: shared_kb_search from the worker pod should return [] cleanly (tables exist, just empty)
Phase 2 — Seed shared_kb (HARD prerequisite, ~3-5 days)
- Build a one-shot Kubernetes Job that ingests:
  - docs/specs/approved/*.md → approved_specs + approved_spec_embeddings
  - .context/decisions/*.md → adrs + adr_embeddings
  - .context/ARCHITECTURE.md → system_topology (parse component names + dependencies)
- Seed canonical_terms with Aucert's existing vocabulary (mine from .context/GLOSSARY.md, ADRs, frontend domain models). Without this, atlas's agent_memory_remember will hit canonical_terms_lookup misses on every common Aucert term and propose them all on day one.
- Schedule on PR merges (extend deploy-astra.yml or add a new workflow)
- Verify: a [kimi] test where atlas is asked about a known ADR retrieves via shared_kb_search instead of reading the file
After Phase 2, atlas has organizational memory — a complete shared kb. The Phase 3 spec can now be written against a real working substrate.
Phase 3 — Write SPEC-023 (this handover's deliverable, ~1-2 days)
Use docs/specs/TEMPLATE.md. Required sections:
- Decisions: copy the locked-in Phase 0 table verbatim
- Schema: the agent_memories table; explicitly call out the agent_memory_db separate-database choice
- Tool contracts: the three signatures above with full input/output schemas
- Retrieval blending + tagging rules: how the unified ranked list is computed and presented
- Recency-decay formula: cosine * exp(-Δdays / 90); flag for tuning later
- Personality prompt fragments: the three discipline sections
- Dependencies: SPEC-005 (canonical_terms), Phase 1 + 2 above
- Deferred to v2: per-user mode, real forgetting policy, cross-agent privacy boundaries (today every agent sees only its own memories, but there is no enforcement at the DB layer beyond the agent_id filter), batch re-canonicalization job
- Open questions to nail before approval:
  - Do we expose memory entries via Astra UI for operator audit? (Probably yes: a read-only listing, similar to the "files atlas is watching" page from the drive-watch handover)
  - What's the indexed embedding model? (Should match what EmbeddingClient uses today; verify and pin in the spec)
  - Do we expose agent_memory_recall as a debug tool atlas can call for itself? (Yes; useful for "what do you remember about X")
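For the recency-decay tuning, it helps to read 90 as a time constant (the memory's weight falls to 1/e after 90 days), not as a half-life. Here is a short worked reading of the Phase 0 formula, with Δ in days since updated_at. The similarity values are illustrative, not measured.

```latex
\begin{gather*}
\text{score}(m) = \cos(\mathbf{q}, \mathbf{e}_m)\,\exp\!\left(-\frac{\Delta_m}{90}\right),
\qquad \Delta_m = \text{days since } m\text{'s \texttt{updated\_at}} \\
\exp\!\left(-\frac{\Delta}{90}\right) = \frac{1}{2} \iff \Delta = 90 \ln 2 \approx 62.4 \text{ days} \\
e^{-30/90} \approx 0.717, \qquad e^{-90/90} \approx 0.368, \qquad e^{-180/90} \approx 0.135 \\
0.95 \cdot e^{-180/90} \approx 0.129 \;<\; 0.60 \cdot e^{-7/90} \approx 0.555
\end{gather*}
```

So a memory's weight halves roughly every two months, and a half-year-old near-exact match loses to a week-old moderate match. Because dedup-on-write refreshes updated_at, a fact that keeps being restated resets its decay clock instead of fading.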
Submit as draft, get one human review (likely Vivek), approve, archive.
Phase 4 — Build per-agent memory (~1-2 weeks)
Standard sequence after the spec lands:
- Migration files in infra/migrations/agent-memory/ (V001__create_agent_memories.sql etc.)
- Add a Flyway block in deploy-astra.yml for agent_memory_db (the fourth block)
- Repository for the agent_memories schema in internal/backend/src/main/kotlin/.../shared/clients/AgentMemoryRepository.kt
- Three tools in internal/backend/src/main/kotlin/.../common-tools/memory/
- Tool registry registration in ToolContext / shared executor
- Personality prompt fragments: update astra_db.personalities rows for atlas
- Wire agent_memory_recall to also call shared_kb_search and merge results
- Integration test: post a memory, recall it across a different conversation in a different channel
- Astra UI page for memory listing/audit (read-only; deletion is a separate UI gate)
Phase 5 — Loud-silence detection (~half day)
Add the metric agent_memory_recall_zero_hits_total{agent_id, query_topic}. When the agent calls recall with a query the LLM rated as "I should know this" and gets zero hits, log the event and increment the metric. Without this, "memory subsystem is silently empty" failures (like the shared_kb_db one we hit on 2026-05-06) will recur.
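Here is a minimal sketch of the loud-silence check. It uses a plain in-process counter instead of a real metrics library so that it stays self-contained. ZeroHitsCounter and recordRecallOutcome are hypothetical names; in the backend this would be a labelled counter in whatever metrics client the service already uses. How the "I should know this" rating reaches the code (here the expectedKnown flag) is not specified in the handover.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// In-process stand-in for a counter labelled {agent_id, query_topic}.
class ZeroHitsCounter {
    private val counts = ConcurrentHashMap<Pair<String, String>, AtomicLong>()

    fun increment(agentId: String, queryTopic: String) {
        counts.computeIfAbsent(agentId to queryTopic) { AtomicLong() }.incrementAndGet()
    }

    fun value(agentId: String, queryTopic: String): Long = counts[agentId to queryTopic]?.get() ?: 0L
}

// Called after each recall; only an expected-but-empty result counts as a loud silence.
fun recordRecallOutcome(
    counter: ZeroHitsCounter,
    agentId: String,
    queryTopic: String,
    expectedKnown: Boolean,
    hitCount: Int,
    log: (String) -> Unit = { println(it) },
) {
    if (expectedKnown && hitCount == 0) {
        log("agent_memory_recall_zero_hits agent_id=$agentId query_topic=$queryTopic")
        counter.increment(agentId, queryTopic)
    }
}
```

Passing the canonical-resolved topic as query_topic, rather than the raw query text, is an assumption. Its purpose is to keep the label's cardinality bounded.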
Context (read these before starting)
In this order:
| File | Why |
|---|---|
| docs/internal/docs/agents/per-agent-memory-handover-2026-05-06.md (this file) | Mission + decisions + sequence |
| docs/specs/drafts/SPEC-005-spec-agent-v0.1.md §8 (shared_kb_db schema) | The shared-memory schema this work integrates with: canonical terms, approved specs, ADRs, system topology |
| infra/migrations/shared-kb/V001..V009*.sql | The 9 migrations that need to land in Phase 1 |
| .github/workflows/deploy-astra.yml (lines 157-185 area) | The pattern for adding Flyway blocks: copy + swap names for shared-kb |
| infra/migrations/internal-shared/V006__seed_model_registry.sql | Reference for the seed-data pattern Phase 2 will use |
| internal/backend/src/main/kotlin/dev/aucert/internal/agents/common-tools/shared_kb/SharedKbSearchTool.kt | The existing (currently no-op) shared-kb tool to extend |
| internal/backend/src/main/kotlin/dev/aucert/internal/agents/common-tools/shared_kb/CanonicalTermsLookupTool.kt | The canonical-terms tool that memory writes/reads will depend on |
| internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/clients/EmbeddingClient.kt | The existing embedding client (verify which model; pin in SPEC-023) |
| internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/clients/PostgresClientFactory.kt | The factory pattern for adding a new logical DB (agent_memory_db) |
| docs/specs/TEMPLATE.md | The frontmatter format SPEC-023 must match (machine-validated by tools/scripts/validate-spec-frontmatter.sh) |
What this handover is NOT
To keep scope honest:
- Not a forgetting policy design: explicitly deferred to v2
- Not a cross-agent memory sharing design: agents share via the canonical kb, not by reading each other's private memory
- Not a multi-tenant scope design: single tenant assumed for v1; multi-tenant needs a separate SPEC
- Not a UI design: the Astra console page is a Phase 4 deliverable, but the page design is left to the implementing agent
- Not a token-budget calibration: recall returns the top k=10 by default; whether that fits in context windows is a tuning concern for after Phase 4
Suggested PR breakdown for the implementing agent
- PR 1 (Phase 1): Add the SHARED_KB credentials secret + Flyway block in deploy-astra.yml. Triggers schema creation in shared_kb_db. Small, infra-only.
- PR 2 (Phase 2, part 1): Build the seed-job Kubernetes Job. Run manually once. Verify canonical_terms, approved_specs, adrs, system_topology are populated.
- PR 3 (Phase 2, part 2): Add the seed job to deploy-astra.yml so it runs after spec/ADR merges.
- PR 4 (Phase 3): Draft SPEC-023 in docs/specs/drafts/. Get review. Approve, move to approved/.
- PR 5 (Phase 4, part 1): agent_memory_db migrations + repository + tool stubs. Tools return placeholder responses.
- PR 6 (Phase 4, part 2): Wire embedding + dedup-on-write + canonical_terms integration. Tools functional but no personality prompt yet.
- PR 7 (Phase 4, part 3): Personality prompt updates + integration test that proves cross-channel continuity.
- PR 8 (Phase 4, part 4): Astra UI memory listing page (read-only).
- PR 9 (Phase 5): agent_memory_recall_zero_hits_total metric + alert.
Starting prompt for the parallel chat
Paste this into the new Claude session to bootstrap the work:
Read docs/internal/docs/agents/per-agent-memory-handover-2026-05-06.md end to end, then read the files listed in the "Context" section of that handover. Confirm understanding by summarising in your first response: (a) the locked-in Phase 0 decisions, (b) why Phase 1 + 2 are hard prerequisites for Phase 3+, (c) the integration mechanism between per-agent memory and canonical terms. Do not make any code changes until I approve the plan. Start with Phase 1 (the Flyway block addition for shared-kb); it's the smallest, most independent piece and unblocks everything else.
References
- 2026-05-06 design conversation (this conversation; transcript in claude.ai)
- SPEC-005: Spec agent v0.1 design (defines the shared_kb_db schema)
- ADR-005: PostgreSQL JSONB for KG (the database choice this work inherits from)
- ADR-008 amendment 2026-05-03: Foundry deployment naming (relevant only if memory ever uses Foundry-hosted embedding models)
- 2026-05-06 incident: shared_kb_db discovered empty, the loud-silence failure mode this design must avoid