ADR-005: PostgreSQL+JSONB for Knowledge Graph MVP
Context
The Knowledge Graph (KG) stores relationships between mobile app elements: screens, components, actions, test coverage, bug history, and device capabilities. It must support graph-style queries (traversal, neighbor lookup, path finding) while staying operationally simple during the MVP phase.
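The path-finding case could look like the following: a recursive CTE that finds a fewest-hops navigation route between two screens. This is a minimal sketch under stated assumptions. SQLite (bundled with Python) stands in for PostgreSQL so it runs without a server. The kg_edges table and its src_id/dst_id columns are hypothetical names. The visited-node list is a comma-delimited string, where PostgreSQL code would more likely use an array.

```python
import sqlite3

# Hypothetical minimal edge table; column names are illustrative only.
EDGE_SCHEMA = """
CREATE TABLE kg_edges (
    src_id   INTEGER NOT NULL,
    dst_id   INTEGER NOT NULL,
    rel_type TEXT    NOT NULL
);
"""

# Search depth is capped by :max_hops. The instr() check skips any node that
# is already on the current path, so cycles cannot extend a path forever.
# Paths stop growing once they reach :dst.
SHORTEST_PATH = """
WITH RECURSIVE walk(node_id, path, hops) AS (
    SELECT :src, ',' || :src || ',', 0
    UNION ALL
    SELECT e.dst_id, w.path || e.dst_id || ',', w.hops + 1
    FROM walk AS w
    JOIN kg_edges AS e ON e.src_id = w.node_id
    WHERE w.hops < :max_hops
      AND w.node_id <> :dst
      AND instr(w.path, ',' || e.dst_id || ',') = 0
)
SELECT path, hops
FROM walk
WHERE node_id = :dst
ORDER BY hops
LIMIT 1;
"""


def shortest_path(conn, src, dst, max_hops=5):
    """Return node ids on a fewest-hops path from src to dst, or None."""
    row = conn.execute(
        SHORTEST_PATH, {"src": src, "dst": dst, "max_hops": max_hops}
    ).fetchone()
    if row is None:
        return None
    # path looks like ",1,3,4," -> [1, 3, 4]
    return [int(part) for part in row[0].strip(",").split(",")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(EDGE_SCHEMA)
    conn.executemany(
        "INSERT INTO kg_edges VALUES (?, ?, 'NAVIGATES_TO')",
        [(1, 2), (2, 3), (3, 4), (1, 3), (3, 1)],
    )
    print(shortest_path(conn, 1, 4))  # [1, 3, 4]
```

The query enumerates every simple path up to max_hops before choosing the shortest one. Its cost therefore grows quickly with depth, which matches the deep-traversal concern recorded under Consequences.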
The KG data model is defined in proto/knowledge-graph.proto. Nodes (KGNode) carry labels, properties (stored as JSONB), and timestamps. Edges (KGEdge) connect pairs of nodes and carry a relationship type, a weight, and properties.
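The Python sketch below mirrors the two messages to show how they could map onto table rows. The field names (id, src_id, dst_id, rel_type, created_at, updated_at) and the example relationship type NAVIGATES_TO are assumptions, because the proto file is not reproduced here. The sketch shows three things: labels map to an array column, properties are serialized to JSON text for a JSONB column, and weights stay numeric.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class KGNode:
    # Hypothetical field names; proto/knowledge-graph.proto is authoritative.
    id: str
    labels: list[str]
    properties: dict = field(default_factory=dict)  # goes into a JSONB column
    created_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)

    def to_row(self) -> tuple:
        """Column values for an INSERT; properties serialized as JSON text."""
        return (
            self.id,
            list(self.labels),
            json.dumps(self.properties, sort_keys=True),
            self.created_at,
            self.updated_at,
        )


@dataclass
class KGEdge:
    # Hypothetical field names, as above.
    id: str
    src_id: str
    dst_id: str
    rel_type: str  # typed relationship, e.g. "NAVIGATES_TO" (illustrative)
    weight: float = 1.0
    properties: dict = field(default_factory=dict)  # goes into a JSONB column

    def to_row(self) -> tuple:
        """Column values for an INSERT; properties serialized as JSON text."""
        return (
            self.id,
            self.src_id,
            self.dst_id,
            self.rel_type,
            self.weight,
            json.dumps(self.properties, sort_keys=True),
        )
```

With PostgreSQL, the JSON text can be passed as an ordinary query parameter and cast with ::jsonb inside the INSERT statement.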
Decision
Use PostgreSQL 16 with JSONB columns for the Knowledge Graph MVP. Node and edge properties are stored as JSONB, so new properties can be added without schema migrations. Graph traversal uses recursive common table expressions (CTEs, written with WITH RECURSIVE).
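The following is a minimal sketch of a recursive-CTE traversal. It answers the question "which nodes are reachable from this screen within N hops, and at what distance?" SQLite stands in for PostgreSQL 16 so the example runs with the Python standard library alone. The WITH RECURSIVE query uses syntax both engines accept, apart from the ? placeholders, and PostgreSQL may need an explicit type cast on the anchor parameter. The table layout (kg_nodes, kg_edges) is hypothetical, and the JSON columns are TEXT rather than JSONB.

```python
import sqlite3

# SQLite stands in for PostgreSQL 16. Hypothetical tables; in PostgreSQL the
# properties columns would be JSONB rather than TEXT.
SCHEMA = """
CREATE TABLE kg_nodes (
    id         INTEGER PRIMARY KEY,
    labels     TEXT NOT NULL,               -- JSON array of labels
    properties TEXT NOT NULL DEFAULT '{}'   -- JSON object (JSONB in PostgreSQL)
);
CREATE TABLE kg_edges (
    src_id     INTEGER NOT NULL REFERENCES kg_nodes(id),
    dst_id     INTEGER NOT NULL REFERENCES kg_nodes(id),
    rel_type   TEXT    NOT NULL,
    weight     REAL    NOT NULL DEFAULT 1.0,
    properties TEXT    NOT NULL DEFAULT '{}'
);
"""

# Outbound traversal up to a fixed depth. UNION (rather than UNION ALL) drops
# repeated (id, depth) rows, and the depth cap guarantees termination even
# when the graph contains cycles. MIN() keeps the fewest-hops distance.
REACHABLE = """
WITH RECURSIVE reach(id, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.dst_id, r.depth + 1
    FROM reach AS r
    JOIN kg_edges AS e ON e.src_id = r.id
    WHERE r.depth < ?
)
SELECT id, MIN(depth) AS hops
FROM reach
GROUP BY id
ORDER BY hops, id;
"""


def reachable(conn, start_id, max_depth):
    """Return (node_id, fewest_hops) pairs reachable from start_id."""
    return conn.execute(REACHABLE, (start_id, max_depth)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO kg_nodes (id, labels) VALUES (?, ?)",
        [(i, '["Screen"]') for i in range(1, 5)],
    )
    conn.executemany(
        "INSERT INTO kg_edges (src_id, dst_id, rel_type) VALUES (?, ?, 'NAVIGATES_TO')",
        [(1, 2), (2, 3), (3, 1), (1, 4)],
    )
    print(reachable(conn, 1, 5))  # [(1, 0), (2, 1), (4, 1), (3, 2)]
```

PostgreSQL 14 and later also provide SEARCH and CYCLE clauses for recursive queries. These may express ordering and cycle detection more cleanly than the depth cap used here.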
Migrate to a dedicated graph database (Neo4j or similar) only after reaching 20+ customers, when query complexity and data volume justify the operational overhead.
Alternatives considered
| Option | Pros | Cons |
|---|---|---|
| PostgreSQL+JSONB (chosen) | Already in stack, flexible schema, ACID transactions, familiar to team | Recursive CTEs are less efficient than native graph traversal at scale |
| Neo4j | Purpose-built for graphs, Cypher query language, excellent traversal performance | Separate database to operate, back up, and monitor; overhead not justified for fewer than 20 customers |
| Amazon Neptune / Azure Cosmos DB (Gremlin API) | Managed, scalable | Cloud lock-in (violates the cloud-agnostic principle), expensive at low volume |
| PostgreSQL + Apache AGE | Graph extension for PostgreSQL with openCypher queries | Immature, limited tooling, unclear maintenance trajectory |
Consequences
What becomes easier
- One database technology to operate (PG already used for product and internal data)
- JSONB allows schema-free node/edge properties during rapid iteration
- Simple queries use standard SQL plus JSONB operators (e.g. @> for containment, ->> for field extraction)
- Transaction safety across KG and non-KG data in the same database
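Transaction safety can be illustrated with a minimal sketch. Here a KG node update and a row in an ordinary product table (a hypothetical test_runs table) commit or roll back together. SQLite again stands in for PostgreSQL so the example is self-contained. The INSERT ... ON CONFLICT ... DO UPDATE upsert is accepted by both engines. Table and function names are illustrative only.

```python
import json
import sqlite3

# SQLite stands in for PostgreSQL. KG rows and ordinary product rows share one
# database, so a single transaction covers both. Hypothetical table names.
SCHEMA = """
CREATE TABLE kg_nodes (
    id         TEXT PRIMARY KEY,
    labels     TEXT NOT NULL,
    properties TEXT NOT NULL
);
CREATE TABLE test_runs (
    id             TEXT PRIMARY KEY,
    screen_node_id TEXT NOT NULL,
    status         TEXT NOT NULL
);
"""


def record_test_run(conn, run_id, screen_id, status, screen_props):
    """Upsert a screen node and insert a test run atomically."""
    with conn:  # commits if the block succeeds, rolls back if anything raises
        conn.execute(
            "INSERT INTO kg_nodes (id, labels, properties) VALUES (?, ?, ?) "
            "ON CONFLICT (id) DO UPDATE SET properties = excluded.properties",
            (screen_id, json.dumps(["Screen"]), json.dumps(screen_props)),
        )
        conn.execute(
            "INSERT INTO test_runs (id, screen_node_id, status) VALUES (?, ?, ?)",
            (run_id, screen_id, status),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    record_test_run(conn, "run-1", "screen:login", "passed", {"title": "Login"})
    try:
        # Duplicate run id: the second INSERT fails, so the node update is undone too.
        record_test_run(conn, "run-1", "screen:login", "failed", {"title": "Login v2"})
    except sqlite3.IntegrityError:
        pass
    print(conn.execute("SELECT properties FROM kg_nodes").fetchone()[0])  # {"title": "Login"}
```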
What becomes harder
- Complex multi-hop traversals require recursive CTEs (performance degrades at depth > 5)
- No native graph visualization tools (requires custom tooling or export to a visualization library)
- Migrating to Neo4j later requires an extract/transform/load (ETL) effort
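The export step could work as follows. This minimal sketch writes nodes and edges as CSV text in the header format used by Neo4j's neo4j-admin bulk importer (id:ID, :LABEL, :START_ID, :END_ID, :TYPE, typed columns such as weight:float). It assumes rows have already been read out of PostgreSQL as plain tuples. The properties_json column name is hypothetical, and the header conventions should be checked against the targeted Neo4j version. Whole property objects are kept as JSON strings because Neo4j property values cannot be nested maps.

```python
import csv
import io
import json


def _writer(buf):
    # Minimal quoting; embedded double quotes are doubled ("") as in RFC 4180.
    return csv.writer(buf, lineterminator="\n")


def export_nodes_csv(nodes):
    """nodes: iterable of (node_id, labels, properties) tuples."""
    buf = io.StringIO()
    w = _writer(buf)
    w.writerow(["id:ID", ":LABEL", "properties_json"])
    for node_id, labels, props in nodes:
        # Multiple labels are ';'-separated; nested JSON stays one string column.
        w.writerow([node_id, ";".join(labels), json.dumps(props, sort_keys=True)])
    return buf.getvalue()


def export_edges_csv(edges):
    """edges: iterable of (src_id, dst_id, rel_type, weight, properties) tuples."""
    buf = io.StringIO()
    w = _writer(buf)
    w.writerow([":START_ID", ":END_ID", ":TYPE", "weight:float", "properties_json"])
    for src, dst, rel_type, weight, props in edges:
        w.writerow([src, dst, rel_type, weight, json.dumps(props, sort_keys=True)])
    return buf.getvalue()
```

A real migration would likely flatten frequently queried JSONB keys into typed columns instead of carrying a single JSON string.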
Migration trigger
When graph traversal queries consistently exceed 200 ms at production load, or when the KG exceeds 1M nodes, evaluate a migration to Neo4j.
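The trigger could also be checked automatically, as in this minimal sketch. The 200 ms and 1M-node thresholds come from this ADR. Three choices are assumptions: measuring latency as a nearest-rank p95 per monitoring window, reading "consistently" as three consecutive breaching windows, and the function names themselves.

```python
# Thresholds from this ADR. Reading "consistently" as N consecutive windows
# whose p95 breached the limit is an assumption.
LATENCY_LIMIT_MS = 200.0
NODE_LIMIT = 1_000_000


def p95(latencies_ms):
    """Nearest-rank 95th percentile of one window's traversal latencies."""
    if not latencies_ms:
        raise ValueError("empty window")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def migration_trigger_met(window_p95s_ms, node_count, consecutive_windows=3):
    """True when the last `consecutive_windows` p95 values all exceed the
    latency limit, or when the graph holds more than NODE_LIMIT nodes."""
    recent = window_p95s_ms[-consecutive_windows:]
    consistently_slow = len(recent) == consecutive_windows and all(
        p > LATENCY_LIMIT_MS for p in recent
    )
    return consistently_slow or node_count > NODE_LIMIT
```

Feeding it the p95 of each monitoring window and the current node count would turn the trigger into an alert condition rather than a manual review.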