ADR-005: PostgreSQL+JSONB for Knowledge Graph MVP

Context

The Knowledge Graph (KG) stores relationships between mobile app elements — screens, components, actions, test coverage, bug history, and device capabilities. It needs to support graph-like queries (traversal, neighbor lookup, path finding) while remaining operationally simple during MVP.

The KG data model is defined in proto/knowledge-graph.proto: nodes (KGNode) have labels, properties (JSONB), and timestamps. Edges (KGEdge) connect nodes with typed relationships, weights, and properties.
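As a rough illustration of the shape described above, the two record types might look like the following in application code. This is a minimal sketch, not the contents of proto/knowledge-graph.proto: every field name here (`id`, `source_id`, `target_id`, `type`, `created_at`, `updated_at`) and the sample identifiers are assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


def _now() -> datetime:
    """Timezone-aware timestamp used as the default for node timestamps."""
    return datetime.now(timezone.utc)


@dataclass
class KGNode:
    # Field names are hypothetical; the ADR only states that nodes carry
    # labels, JSONB properties, and timestamps.
    id: str
    labels: list[str]
    properties: dict[str, Any] = field(default_factory=dict)
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)


@dataclass
class KGEdge:
    # Edges connect two nodes with a typed relationship, a weight,
    # and free-form properties.
    id: str
    source_id: str
    target_id: str
    type: str
    weight: float = 1.0
    properties: dict[str, Any] = field(default_factory=dict)


login = KGNode(id="screen:login", labels=["Screen"], properties={"title": "Login"})
button = KGNode(id="component:submit", labels=["Component"])
contains = KGEdge(id="e1", source_id=login.id, target_id=button.id, type="CONTAINS")
```

The `properties` dictionaries correspond to what would be serialized into the JSONB columns.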

Decision

Use PostgreSQL 18 with JSONB columns for the Knowledge Graph MVP. Node and edge properties are stored as JSONB, enabling flexible schema evolution. Graph traversal uses recursive CTEs.
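To make the traversal approach concrete, here is a small depth-bounded reachability query written as a recursive CTE. It is a sketch under stated assumptions: it runs against an in-memory SQLite database so that it is self-contained, and the table names (`kg_nodes`, `kg_edges`), column names, and sample data are all hypothetical. PostgreSQL accepts the same `WITH RECURSIVE` shape, though placeholders and column types would need adjusting.

```python
import sqlite3

# SQLite stands in for PostgreSQL here purely so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE kg_nodes (id TEXT PRIMARY KEY, label TEXT, properties TEXT);
    CREATE TABLE kg_edges (source_id TEXT, target_id TEXT, type TEXT, weight REAL);
    INSERT INTO kg_nodes VALUES
        ('login', 'Screen', '{}'), ('home', 'Screen', '{}'),
        ('settings', 'Screen', '{}'), ('profile', 'Screen', '{}');
    INSERT INTO kg_edges VALUES
        ('login', 'home', 'NAVIGATES_TO', 1.0),
        ('home', 'settings', 'NAVIGATES_TO', 1.0),
        ('settings', 'profile', 'NAVIGATES_TO', 1.0),
        ('profile', 'home', 'NAVIGATES_TO', 1.0);  -- cycle back to home
    """
)

# The depth bound guarantees termination even though the graph has a cycle.
TRAVERSE = """
WITH RECURSIVE reachable(id, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.target_id, r.depth + 1
    FROM reachable AS r
    JOIN kg_edges AS e ON e.source_id = r.id
    WHERE r.depth < ?
)
SELECT id, MIN(depth) AS depth
FROM reachable
GROUP BY id
ORDER BY depth, id;
"""


def reachable_from(start: str, max_depth: int) -> list[tuple[str, int]]:
    """Return (node id, minimum hop count) for nodes within max_depth hops."""
    return conn.execute(TRAVERSE, (start, max_depth)).fetchall()


print(reachable_from("login", 2))
# [('login', 0), ('home', 1), ('settings', 2)]
```

Because each extra level of depth adds another round of joins against the edge table, the cost of this pattern grows with both depth and fan-out.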

Migrate to a dedicated graph database (Neo4j or similar) only after reaching 20+ customers, when query complexity and data volume justify the operational overhead.

Alternatives considered

Option	Pros	Cons
PostgreSQL+JSONB (chosen)	Already in stack, flexible schema, ACID transactions, familiar to team	Recursive CTEs less efficient than native graph traversal at scale
Neo4j	Purpose-built for graphs, Cypher query language, excellent traversal	Separate database to operate, back up, and monitor; operational overhead is hard to justify for fewer than 20 customers
Amazon Neptune / Azure Cosmos DB (Gremlin)	Managed, scalable	Cloud lock-in (violates cloud-agnostic principle), expensive at low volume
PostgreSQL + Apache AGE	Graph extension for PG	Immature, limited tooling, unclear maintenance trajectory

Consequences

What becomes easier

One database technology to operate (PG already used for product and internal data)
JSONB allows schema-free node/edge properties during rapid iteration
Standard SQL + JSONB operators for simple queries
Transaction safety across KG and non-KG data in the same database
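As an example of the "standard SQL + JSONB operators" point, a simple property filter can use PostgreSQL's JSONB containment operator `@>`. The helper below only builds the query text and parameters; it is a sketch that assumes a hypothetical `kg_nodes` table with `label` and `properties` columns, and `%s` placeholders as used by psycopg-style drivers.

```python
import json
from typing import Any


def build_node_filter(label: str, props: dict[str, Any]) -> tuple[str, tuple[str, str]]:
    """Build a parameterized query matching nodes whose JSONB properties
    contain every key/value pair in `props`.

    Table and column names are assumptions for illustration only.
    """
    sql = (
        "SELECT id, properties FROM kg_nodes "
        "WHERE label = %s AND properties @> %s::jsonb"
    )
    # The JSON document is passed as a bound parameter, never interpolated.
    return sql, (label, json.dumps(props, sort_keys=True))


sql, params = build_node_filter("Screen", {"platform": "ios"})
```

The resulting pair could then be passed to a driver call such as `cursor.execute(sql, params)`.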

What becomes harder

Complex multi-hop traversals require recursive CTEs (performance degrades at depth > 5)
No native graph visualization tools (need custom tooling or export to visualization libraries)
Migration to Neo4j later requires data export/transform/load effort

Migration trigger

When graph traversal queries consistently exceed 200ms at production load, or when the KG exceeds 1M nodes, evaluate Neo4j migration.
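The trigger above could be encoded as a small check run against collected metrics. This is a sketch only: the ADR does not define "consistently", so the `min_fraction` parameter (the share of latency samples that must exceed the threshold) is an assumption, as are the `KGMetrics` type and its field names.

```python
from dataclasses import dataclass

LATENCY_THRESHOLD_MS = 200.0  # from the ADR
NODE_THRESHOLD = 1_000_000    # from the ADR


@dataclass
class KGMetrics:
    # Hypothetical container: recent traversal latency samples taken at
    # production load, plus the current node count.
    traversal_latency_ms: list[float]
    node_count: int


def migration_reasons(metrics: KGMetrics, min_fraction: float = 0.8) -> list[str]:
    """Return the triggers that currently fire; an empty list means none.

    "Consistently" is interpreted here as at least `min_fraction` of the
    samples exceeding the latency threshold, which is an assumption.
    """
    reasons = []
    samples = metrics.traversal_latency_ms
    if samples:
        over = sum(1 for value in samples if value > LATENCY_THRESHOLD_MS)
        if over / len(samples) >= min_fraction:
            reasons.append("traversal latency consistently above 200ms")
    if metrics.node_count > NODE_THRESHOLD:
        reasons.append("node count above 1M")
    return reasons


healthy = KGMetrics(traversal_latency_ms=[40.0, 55.0, 230.0, 60.0], node_count=120_000)
slow = KGMetrics(traversal_latency_ms=[250.0, 310.0, 275.0], node_count=120_000)
```

A non-empty result signals that a Neo4j evaluation is due; it does not by itself mandate a migration.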

Amendment 2026-05-15 — superseded in scope by SPEC-035

This ADR is superseded in scope by SPEC-035 — Validation Graph, which provides a much richer rationale and design for what was scaffolded here as "Knowledge Graph MVP."

Key differences captured by SPEC-035:

Renamed and reframed — "Knowledge Graph" becomes the validation graph / knowledge layer, comprising two graphs (per-tenant Tenant Graph + singular Aucert-owned Ecosystem Graph). The original ADR scoped only what we now call the Tenant Graph.
Storage choice unchanged but context-richer — Postgres + pgvector + JSONB + recursive CTEs remains the day-1 engine for both cloud and on-prem. SPEC-035 D26 captures this with a fuller rationale.
Tenancy is tiered AND horizontally scaled (D15 + D15.1) — small tenants pooled across multiple Postgres pods (Salesforce-pod pattern); mid-tier schema-per-tenant; enterprise database-per-tenant or dedicated. This is new vs the original ADR.
Plug-in GraphStorageEngine abstraction (P12 + P13) — application-layer code interacts with the graph through a graph-native abstraction; no SQL leaks above the adapter. This makes future migration tractable without forcing it on day 1.
Migration triggers refined — beyond the original "200ms / 1M nodes" thresholds, SPEC-035 D26 adds: aggregate ≥100M entities, p95 traversal latency >500ms at depth ≥3, shortest-path latency >500ms at depth ≥5 (the likely first trigger to fire), vector index ≥50M embeddings, single-shard write throughput saturated. Triggers are measurements, not deadlines.
Replay-based per-tenant migration playbook — leveraging the append-only claim log (D3 + C1), per-tenant migration to a new engine becomes incremental and reversible, not big-bang.
On-prem stays on Postgres permanently — single-tenant scale comfortably fits.
Multi-SQL-backend support (MariaDB / MySQL / SQL Server) is custom-engagement only — not a generic promise.
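To illustrate the plug-in GraphStorageEngine idea from the list above, the sketch below shows application code written against a graph-native interface, with an in-memory adapter standing in for a Postgres-backed one. Only the name `GraphStorageEngine` comes from this document; the method names, the `InMemoryEngine` adapter, and the `shortest_path` helper are hypothetical and are not taken from SPEC-035.

```python
from collections import deque
from typing import Optional, Protocol


class GraphStorageEngine(Protocol):
    """Graph-native interface; callers never see SQL. Methods are assumed."""

    def add_edge(self, source: str, target: str, edge_type: str) -> None: ...

    def neighbors(self, node_id: str) -> list[str]: ...


class InMemoryEngine:
    """Toy adapter. A Postgres adapter would implement the same methods."""

    def __init__(self) -> None:
        self._adjacency: dict[str, list[str]] = {}

    def add_edge(self, source: str, target: str, edge_type: str) -> None:
        self._adjacency.setdefault(source, []).append(target)
        self._adjacency.setdefault(target, [])

    def neighbors(self, node_id: str) -> list[str]:
        return list(self._adjacency.get(node_id, []))


def shortest_path(
    engine: GraphStorageEngine, start: str, goal: str, max_depth: int = 5
) -> Optional[list[str]]:
    """Breadth-first search that depends only on the engine interface."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_depth:
            continue  # do not expand beyond the hop limit
        for nxt in engine.neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


engine = InMemoryEngine()
engine.add_edge("login", "home", "NAVIGATES_TO")
engine.add_edge("home", "settings", "NAVIGATES_TO")
engine.add_edge("home", "profile", "NAVIGATES_TO")
```

Because `shortest_path` touches only `neighbors`, swapping the adapter for a different storage engine would not require changes to the calling code.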

For the full rationale, design, alternatives considered, and consequences, see SPEC-035. This ADR is preserved for historical context.
