Per-agent memory across channels — handover (2026-05-06)

Amended 2026-05-08 — partial supersession. Phases 2 and 3 below (seeding shared_kb_db via a Kubernetes job and a workflow trigger) are deferred. The shared knowledge base now uses a wiki-indexed pattern (docs/specs/INDEX.md + .context/decisions/INDEX.md, auto-generated from frontmatter) instead of vector-RAG over shared_kb_db. Phase 1 still applies (PR-1 shipped 2026-05-07 — shared_kb_db schema is live). Phase 4 (per-agent memory in agent_memory_db) still applies unchanged. See shared-kb-bifurcation-decision-2026-05-08.md for the rationale, the deferred-seeding option preserved for re-evaluation, and the triggers that would reactivate it.

Handover for the agent (or human + agent pair) picking up the per-agent memory work after this conversation. The entire design conversation is captured here — the Phase 0 decisions are locked in. The work is to (a) write a formal SPEC, (b) ensure prerequisites are met, (c) build the tables/tools/prompts.

TL;DR

Today atlas has no persistent memory. Every new conversation starts cold. Even the shared knowledge base (shared_kb_db — canonical terms, approved specs, ADRs, system topology) is fully designed but not deployed — its database has zero tables. This blocks scaling conversations across channels: a thread in Drive doesn't carry forward into a thread in Slack, and atlas re-derives the same conclusions every time.

The mission is to ship persistent memory in two layers:

Shared memory (the shared_kb_db from SPEC-005) — canonical terms, indexed approved specs/ADRs/topology. Read by all agents. Already designed; just needs to ship.
Per-agent memory (new, this handover) — what atlas remembers across conversations. Per-agent for v1; per-agent-per-user when personal assistants come online.

The two layers integrate via canonical terms: every memory write goes through canonical_terms_lookup so memories are coherent across the shared/private boundary.

Phase 0 decisions — locked in (do not re-litigate)

These were debated and decided in the 2026-05-06 design conversation. The new SPEC must reflect them as accepted.

Question	Decision	Rationale
Memory scope	Per-agent for v1; per-agent-per-user later (when building personal assistants)	Simpler v1; schema admits `user_id` so adding it later is a NOT NULL flip
Cross-channel join key	`agent_id` from Astra (NOT email)	Email is outbound platform identity (`atlas-agent@aucert.dev`); rotation breaks memory. agent_id is opaque, internal, stable. Astra already owns it.
Write triggers	Aggressive — recall before reasoning on any new topic, evaluate write at end of substantive turn, forced write on "remember that…"	Agent should bias toward remembering rather than selectively curating
Dedup-on-write	Required. Before INSERT, recall on the same topic; if strong-similarity (>0.92 cosine) hit exists, UPDATE that memory's `content`/`updated_at`/`access_count` instead of INSERT	Without this, "aggressive remembering" turns into memory bloat within a week — top-k retrieval slots fill with near-duplicates
Forgetting	Never (for v1). Memories soft-deleted only on explicit user request	Revisit when storage becomes a real concern; not now
Recency ranking	Vector similarity multiplied by recency decay `exp(-Δdays / 90)`, then sort	Memories don't disappear, they just compete worse for retrieval slots over time
Retrieval blending	Blend at retrieval (single ranked list combining shared kb + private memory), tag at presentation (each retrieved fact carries source: private vs canonical)	Agent reasons over a unified list; user/auditor sees attribution
Vocabulary normalization	Mandatory canonical_terms_lookup at write AND recall time. Store both `topic_raw` (verbatim) and `topic` (canonical-resolved). Auto-propose new terms via `canonical_terms_propose`	Prevents drift between memory stores; creates a virtuous loop where memory drives canonical vocabulary growth
Privacy boundary	Per-agent memory in a separate database (`agent_memory_db`) from shared_kb_db	Different access patterns, different sensitivity (agent memory will eventually carry user PII). Future-proofs for per-user role scoping.
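
To make the Recency ranking row concrete, here is a small worked sketch of the decay multiplier. The function and constant names are illustrative, not taken from any existing code; only the formula `exp(-Δdays / 90)` comes from the decision table.

```python
import math

DECAY_DAYS = 90.0  # time constant from the Recency ranking decision


def recency_decay(age_days: float) -> float:
    """Multiplier applied to cosine similarity for a memory of this age."""
    return math.exp(-age_days / DECAY_DAYS)


def ranked_score(cosine_similarity: float, age_days: float) -> float:
    """Final retrieval score: similarity scaled down by age."""
    return cosine_similarity * recency_decay(age_days)


# Age at which the multiplier drops to one half: 90 * ln 2, roughly 62.4 days.
HALF_LIFE_DAYS = DECAY_DAYS * math.log(2)

if __name__ == "__main__":
    for age in (0, 30, 90, 180):
        print(f"{age:>3} days -> multiplier {recency_decay(age):.3f}")
```

With this constant a 90-day-old memory keeps about 37% of its similarity score, so a 0.95-similarity hit from 90 days ago (about 0.35) ranks below a 0.60-similarity hit written today. That is the "compete worse, never disappear" behaviour the table describes.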

Schema sketch

New database agent_memory_db. One table to start:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_memories (
  id              UUID         PRIMARY KEY DEFAULT gen_random_uuid(),
  agent_id        UUID         NOT NULL,           -- references astra_db.agents(id) by value, not FK (cross-DB)
  user_id         UUID         NULL,               -- NOT NULL once personal-assistant mode lights up
  topic           VARCHAR(256) NOT NULL,           -- canonical-term-resolved, used for retrieval matching
  topic_raw       VARCHAR(256) NOT NULL,           -- original phrasing as encountered, preserved for citation fidelity
  is_canonical    BOOLEAN      NOT NULL,           -- true if topic was found in canonical_terms; false if fallback
  content         TEXT         NOT NULL,           -- the memory body — never normalize this, store verbatim
  source          JSONB        NOT NULL,           -- { platform, channel_id, thread_id, message_id, observed_at }
  embedding       vector(1536) NOT NULL,           -- embed `topic || content` so canonical form drives retrieval
  access_count    INTEGER      NOT NULL DEFAULT 0, -- popularity signal — incremented on each recall hit
  created_at      TIMESTAMPTZ  NOT NULL DEFAULT now(),
  updated_at      TIMESTAMPTZ  NOT NULL DEFAULT now(),
  deleted_at      TIMESTAMPTZ  NULL                -- soft delete; never hard-delete
);

CREATE INDEX ON agent_memories USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON agent_memories (agent_id, updated_at DESC);
CREATE INDEX ON agent_memories (agent_id) WHERE deleted_at IS NULL;  -- partial index over live rows; deleted_at is always NULL inside it, so it is not an index column

Tool contracts

Three new tools, registered in the shared tool registry (so all agents inherit them — atlas first, then any future agent):

1. `agent_memory_recall(query: string, k: int = 10) → MemoryHit[]`

Internally: canonical_terms_lookup(query) → query_canonical
Vector search on agent_memories.embedding for query_canonical || query (or just query_canonical if the lookup found a match)
Filter: agent_id = current_agent_id AND deleted_at IS NULL
Re-rank: cosine_similarity * exp(-Δdays / 90) where Δdays is now() - updated_at in days
Return top k as [{memory_id, topic, topic_raw, content, source, similarity, recency_score, source_tag: "private"}]
Also: increment access_count on each returned hit (best-effort UPDATE in the same transaction)
ALSO blend with shared_kb_search results on the same query, tagged source_tag: "canonical". Agent's prompt teaches it to reason over the unified list and attribute by tag.
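
The re-rank and blend steps above can be sketched as follows. This is a minimal in-memory illustration, not the tool implementation: the `Hit` type and `blend` function are hypothetical, and whether canonical hits should also be recency-decayed is not decided anywhere in this handover, so it is exposed as a flag (off by default, which is an assumption).

```python
import math
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone

DECAY_DAYS = 90.0


@dataclass(frozen=True)
class Hit:
    content: str
    similarity: float      # cosine similarity to the query embedding
    updated_at: datetime
    source_tag: str        # "private" or "canonical"
    score: float = 0.0     # filled in by blend()


def blend(private_hits, canonical_hits, now, k=10, decay_canonical=False):
    """Merge both sources into one list ranked by similarity * recency decay."""
    scored = []
    for hit in list(private_hits) + list(canonical_hits):
        decay = 1.0
        # Assumption: canonical hits keep their raw similarity unless asked otherwise.
        if hit.source_tag == "private" or decay_canonical:
            age_days = max((now - hit.updated_at).total_seconds() / 86400.0, 0.0)
            decay = math.exp(-age_days / DECAY_DAYS)
        scored.append(replace(hit, score=hit.similarity * decay))
    scored.sort(key=lambda h: h.score, reverse=True)
    return scored[:k]


if __name__ == "__main__":
    now = datetime(2026, 5, 6, tzinfo=timezone.utc)
    private = [Hit("prefers Terraform", 0.9, now - timedelta(days=90), "private")]
    canonical = [Hit("ADR text", 0.5, now - timedelta(days=365), "canonical")]
    for h in blend(private, canonical, now):
        print(h.source_tag, round(h.score, 3), h.content)
```

Each returned hit keeps its `source_tag`, so the presentation layer can attribute facts without re-deriving where they came from.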

2. `agent_memory_remember(topic_raw: string, content: string, source: object) → {memory_id, was_update: bool}`

canonical_terms_lookup(topic_raw) → topic_canonical, found: bool
- If found: topic = topic_canonical, is_canonical = true
- If not found: topic = topic_raw, is_canonical = false, AND call canonical_terms_propose(topic_raw, ...) to queue the new term for human review
Compute embedding via EmbeddingClient.embed(topic || content)
Dedup: query the top-1 nearest neighbor among live rows (deleted_at IS NULL) with the same (agent_id, topic). If cosine_similarity > 0.92, UPDATE that row's content, updated_at, access_count++; return was_update: true
Otherwise: INSERT new row; return was_update: false
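
The write path above can be sketched end to end against an in-memory list. Everything here is a stand-in under stated assumptions: `lookup`, `propose`, and `embed` are hypothetical callables playing the roles of canonical_terms_lookup, canonical_terms_propose, and EmbeddingClient.embed; the newline separator between topic and content is an assumption; and refreshing the embedding on update is an assumption the contract does not state.

```python
import math
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

DEDUP_THRESHOLD = 0.92


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Memory:
    agent_id: str
    topic: str
    topic_raw: str
    is_canonical: bool
    content: str
    embedding: list
    access_count: int = 0
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


def remember(store, agent_id, topic_raw, content, lookup, propose, embed):
    canonical = lookup(topic_raw)          # None means "not in canonical_terms"
    topic = canonical if canonical is not None else topic_raw
    if canonical is None:
        propose(topic_raw)                 # queue the new term for human review
    embedding = embed(topic + "\n" + content)

    # Nearest neighbour within the same (agent_id, topic) partition.
    candidates = [m for m in store if m.agent_id == agent_id and m.topic == topic]
    best = max(candidates, key=lambda m: cosine(m.embedding, embedding), default=None)
    if best is not None and cosine(best.embedding, embedding) > DEDUP_THRESHOLD:
        best.content = content
        best.embedding = embedding         # assumption: keep the vector in step with content
        best.updated_at = datetime.now(timezone.utc)
        best.access_count += 1
        return {"memory_id": best.id, "was_update": True}

    memory = Memory(agent_id, topic, topic_raw, canonical is not None, content, embedding)
    store.append(memory)
    return {"memory_id": memory.id, "was_update": False}
```

One detail to keep in mind when this moves to SQL: pgvector's `<=>` operator returns cosine distance, so the 0.92 similarity threshold corresponds to a distance below 0.08.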

3. `agent_memory_forget(memory_id: UUID) → void`

Soft delete: UPDATE agent_memories SET deleted_at = now() WHERE id = memory_id AND agent_id = current_agent_id
Operator/user-driven; agent calls this only on explicit user request ("forget that I said X")
Audit row in audit_log (internal_shared_db) for the deletion event
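
A minimal sketch of the soft-delete semantics above, using a dict and a list as stand-ins for agent_memories and audit_log. The boolean return value and the idempotent handling of repeated calls are assumptions added for illustration; the contract itself returns void.

```python
from datetime import datetime, timezone


def forget(memories, audit_log, memory_id, current_agent_id, now=None):
    """Soft-delete one memory owned by the calling agent and record an audit event.

    `memories` maps memory_id -> row dict; `audit_log` is a plain list.
    Both are hypothetical stand-ins for the real tables.
    """
    now = now or datetime.now(timezone.utc)
    row = memories.get(memory_id)
    # Same predicate as the UPDATE's WHERE clause: id and agent_id must both match.
    if row is None or row["agent_id"] != current_agent_id:
        return False
    if row["deleted_at"] is None:  # repeated calls keep the first deletion time
        row["deleted_at"] = now
        audit_log.append({
            "event": "agent_memory_forget",
            "memory_id": memory_id,
            "agent_id": current_agent_id,
            "at": now,
        })
    return True
```

The row is never removed, which matches the "soft delete; never hard-delete" comment in the schema, and a call from a different agent leaves the row untouched.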

Personality prompt updates

The new tools are useless without prompt changes. Atlas's personality prompt (composable fragment in astra_db.personalities) needs new sections:

Memory recall discipline (added to context-first behavior):

"Before reasoning about any topic in the user's message, call agent_memory_recall(topic). If the recall returns hits with high recency_score, surface them in your reasoning as 'I remember (from {date}): {content}'. If the recall returns canonical-tagged hits (from shared kb), cite them as 'Per {spec_id|adr_id}: {content}'. Always distinguish private memory ('you told me') from canonical knowledge ('decided in spec/ADR')."

Memory write discipline (added to tool discipline footer):

"At the end of substantive turns, evaluate whether to call agent_memory_remember. Bias toward remembering. Write a memory when: the user shared a preference (agent_memory_remember(topic_raw='vivek_prefers_terraform_managed_infra', ...)), an entity was named (agent_memory_remember(topic_raw='atlas_agent_email', ...)), a decision was made or referenced, or a context shift happened. Skip writes only for ephemeral chat (greetings, clarifications). Never refuse a forced 'remember that…' request."

Vocabulary discipline (already exists for canonical_terms_lookup; extend):

"When proposing a new canonical term via canonical_terms_propose (because agent_memory_remember couldn't find it in the dictionary), include the source memory_id in the proposal so reviewers see the provenance."

Phasing (hard prerequisites in order)

The dependency graph is sequential, not parallel. Memory v1 cannot ship without shared_kb being live first.

Phase 1 — Activate shared_kb deployment (HARD prerequisite, ~1-2 days)

Add SHARED_KB_DB_URL/USER/PASSWORD to the astra-db-credentials secret via Terraform
Add a third Flyway block to .github/workflows/deploy-astra.yml that mounts infra/migrations/shared-kb/ and runs Flyway against shared_kb_db. Pattern: copy the existing astra_db block, swap names + secret keys
Verify pgvector is enabled on the Flexible Server (Terraform azurerm_postgresql_flexible_server_configuration)
Trigger workflow → 9 rows materialise in shared_kb_db.flyway_schema_history (V001-V009 from infra/migrations/shared-kb/)
Smoke test: shared_kb_search from the worker pod should return [] cleanly (tables exist, just empty)
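
The "9 rows materialise" check can be automated with a small helper like the one below. This is a sketch under assumptions: it takes rows already fetched from flyway_schema_history as `(version, success)` pairs, the helper name is hypothetical, and versions are compared as integers so that `"001"` and `"1"` are treated alike.

```python
def missing_migrations(history_rows, expected=range(1, 10)):
    """Return the expected migration numbers that have no successful history row.

    `history_rows` is an iterable of (version, success) pairs. Rows with a
    NULL version (for example repeatable migrations) are ignored.
    """
    applied = {
        int(version)
        for version, success in history_rows
        if version is not None and success
    }
    return sorted(set(expected) - applied)


if __name__ == "__main__":
    rows = [(f"{n:03d}", True) for n in range(1, 10)]
    print(missing_migrations(rows))
```

An empty result means V001 through V009 all applied successfully; anything else names the migrations to investigate before moving on to the smoke test.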

Phase 2 — Seed shared_kb (HARD prerequisite, ~3-5 days)

Build a one-shot Kubernetes Job that ingests:
- docs/specs/approved/*.md → approved_specs + approved_spec_embeddings
- .context/decisions/*.md → adrs + adr_embeddings
- .context/ARCHITECTURE.md → system_topology (parse component names + dependencies)
- Seed canonical_terms with Aucert's existing vocabulary (mine from .context/GLOSSARY.md, ADRs, frontend domain models). Without this, atlas's agent_memory_remember will hit canonical_terms_lookup misses on every common Aucert term and propose them all on day one.
Schedule on PR merges (extend deploy-astra.yml or new workflow)
Verify: a [kimi] test where atlas is asked about a known ADR retrieves via shared_kb_search instead of reading the file

After Phase 2, atlas has organizational memory — a complete shared kb. The Phase 3 spec can now be written against a real working substrate.

Phase 3 — Write SPEC-023 (this handover's deliverable, ~1-2 days)

Use docs/specs/TEMPLATE.md. Required sections:

Decisions: copy the locked-in Phase 0 table verbatim
Schema: the agent_memories table; explicitly call out the agent_memory_db separate-database choice
Tool contracts: the three signatures above with full input/output schemas
Retrieval blending + tagging rules: how the unified ranked list is computed and presented
Recency-decay formula: cosine * exp(-Δdays / 90) — flag for tuning later
Personality prompt fragments: the three discipline sections
Dependencies: SPEC-005 (canonical_terms), Phase 1 + 2 above
Deferred to v2: per-user mode, real forgetting policy, cross-agent privacy boundaries (today every agent sees only its own memories — but no enforcement at the DB layer beyond agent_id filter), batch re-canonicalization job
Open questions to nail before approval:
- Do we expose memory entries via Astra UI for operator audit? (Probably yes: a read-only listing, similar to the "files atlas is watching" page from the drive-watch handover)
- What's the indexed embedding model? (Should match what EmbeddingClient uses today — verify and pin in spec)
- Do we expose agent_memory_recall as a debug tool atlas can call for itself? (Yes — useful for "what do you remember about X")

Submit as draft, get one human review (likely Vivek), approve, archive.

Phase 4 — Build per-agent memory (~1-2 weeks)

Standard sequence after the spec lands:

Migration files in infra/migrations/agent-memory/V001__create_agent_memories.sql etc.
Add Flyway block in deploy-astra.yml for agent_memory_db (fourth block)
Schema in internal/backend/src/main/kotlin/.../shared/clients/AgentMemoryRepository.kt
Three tools in internal/backend/src/main/kotlin/.../common-tools/memory/
Tool registry registration in ToolContext / shared executor
Personality prompt fragments — update astra_db.personalities rows for atlas
Wire agent_memory_recall to also call shared_kb_search and merge results
Integration test: post a memory, recall it across a different conversation in a different channel
Astra UI page for memory listing/audit (read-only; deletion is a separate UI gate)

Phase 5 — Loud-silence detection (~half day)

Add metric agent_memory_recall_zero_hits_total{agent_id, query_topic}. When the agent calls recall with a query that the LLM rated as "I should know this" and gets zero hits, log and emit metric. Without this, "memory subsystem is silently empty" failures (like the shared_kb_db one we hit on 2026-05-06) recur.
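
A minimal sketch of the detection rule, using a plain in-process counter in place of a real metrics client. The `record_recall` helper and the `expected_known` flag (the LLM's "I should know this" rating) are hypothetical names; how that rating is produced is not specified here.

```python
import logging
from collections import Counter

logger = logging.getLogger("agent_memory")

# Stand-in for agent_memory_recall_zero_hits_total{agent_id, query_topic}.
zero_hits_total = Counter()


def record_recall(agent_id, query_topic, hits, expected_known):
    """Count and log recalls that return nothing when the agent expected to know the topic."""
    if expected_known and not hits:
        zero_hits_total[(agent_id, query_topic)] += 1
        logger.warning(
            "zero-hit recall: agent_id=%s query_topic=%s", agent_id, query_topic
        )
        return True
    return False
```

If `query_topic` is used as a label in a real metrics backend, it may be worth passing the canonical-resolved topic rather than the raw query so the label set stays bounded.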

Context (read these before starting)

In this order:

File	Why
`docs/internal/docs/agents/per-agent-memory-handover-2026-05-06.md` (this file)	Mission + decisions + sequence
`docs/specs/drafts/SPEC-005-spec-agent-v0.1.md` §8 (`shared_kb_db` schema)	The shared-memory schema this work integrates with — canonical terms, approved specs, ADRs, system topology
`infra/migrations/shared-kb/V001..V009*.sql`	The 9 migrations that need to land in Phase 1
`.github/workflows/deploy-astra.yml` (lines 157-185 area)	The pattern for adding Flyway blocks — copy + swap names for shared-kb
`infra/migrations/internal-shared/V006__seed_model_registry.sql`	Reference for the seed-data pattern Phase 2 will use
`internal/backend/src/main/kotlin/dev/aucert/internal/agents/common-tools/shared_kb/SharedKbSearchTool.kt`	The existing (currently no-op) shared-kb tool to extend
`internal/backend/src/main/kotlin/dev/aucert/internal/agents/common-tools/shared_kb/CanonicalTermsLookupTool.kt`	The canonical-terms tool that memory writes/reads will depend on
`internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/clients/EmbeddingClient.kt`	The existing embedding client (verify which model — pin in SPEC-023)
`internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/clients/PostgresClientFactory.kt`	The factory pattern for adding a new logical DB (agent_memory_db)
`docs/specs/TEMPLATE.md`	The frontmatter format SPEC-023 must match (machine-validated by `tools/scripts/validate-spec-frontmatter.sh`)

What this handover is NOT

To keep scope honest:

Not a forgetting policy design — explicitly deferred to v2
Not a cross-agent memory sharing design — agents share via the canonical kb, not by reading each other's private memory
Not a multi-tenant scope design — single tenant assumed for v1; multi-tenant needs separate SPEC
Not a UI design — Astra console page is mentioned as a phase 4 deliverable but the page design is left to the implementing agent
Not a token-budget calibration — recall returns top k=10 by default; whether that fits in context windows is a tuning concern for after Phase 4

Suggested PR breakdown for the implementing agent

PR 1: Phase 1 — Add SHARED_KB credentials secret + Flyway block in deploy-astra.yml. Triggers schema creation in shared_kb_db. Small, infra-only.
PR 2: Phase 2 (part 1) — Build the seed-job Kubernetes Job. Run manually once. Verify canonical_terms, approved_specs, adrs, system_topology populated.
PR 3: Phase 2 (part 2) — Add seed-job to deploy-astra.yml so it runs after spec/ADR merges.
PR 4: Phase 3 — Draft SPEC-023 in docs/specs/drafts/. Get review. Approve, move to approved/.
PR 5: Phase 4 (part 1) — agent_memory_db migrations + repository + tool stubs. Tools return placeholder responses.
PR 6: Phase 4 (part 2) — Wire embedding + dedup-on-write + canonical_terms integration. Tools functional but no personality prompt yet.
PR 7: Phase 4 (part 3) — Personality prompt updates + integration test that proves cross-channel continuity.
PR 8: Phase 4 (part 4) — Astra UI memory listing page (read-only).
PR 9: Phase 5 — agent_memory_recall_zero_hits_total metric + alert.

Starting prompt for the parallel chat

Paste this into the new Claude session to bootstrap the work:

Read docs/internal/docs/agents/per-agent-memory-handover-2026-05-06.md end to end, then read the files listed in the "Context" section of that handover. Confirm understanding by summarising in your first response: (a) the locked-in Phase 0 decisions, (b) why Phase 1 + 2 are hard prerequisites for Phase 3+, (c) the integration mechanism between per-agent memory and canonical terms. Do not make any code changes until I approve the plan. Start with Phase 1 (the Flyway block addition for shared-kb) — it's the smallest, most independent piece and unblocks everything else.

References

2026-05-06 design conversation (this conversation; transcript in claude.ai)
SPEC-005 — Spec agent v0.1 design (defines shared_kb_db schema)
ADR-005 — PostgreSQL JSONB for KG (the database choice this work inherits from)
ADR-008 amendment 2026-05-03 — Foundry deployment naming (relevant only if memory ever uses Foundry-hosted embedding models)
2026-05-06 incident: shared_kb_db discovered empty — the loud-silence failure mode this design must avoid
