ADR-005: PostgreSQL+JSONB for Knowledge Graph MVP

Context

The Knowledge Graph (KG) stores relationships between mobile app elements — screens, components, actions, test coverage, bug history, and device capabilities. It needs to support graph-like queries (traversal, neighbor lookup, path finding) while remaining operationally simple during MVP.

The KG data model is defined in proto/knowledge-graph.proto: nodes (KGNode) have labels, properties (JSONB), and timestamps. Edges (KGEdge) connect nodes with typed relationships, weights, and properties.
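As a rough illustration of that shape, the two message types could be mirrored in application code roughly as follows. This is a minimal sketch: the field names (id, source_id, target_id, created_at, updated_at) are assumptions for illustration and are not taken from proto/knowledge-graph.proto.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


def _now() -> datetime:
    """Current UTC time, used as the default for timestamps."""
    return datetime.now(timezone.utc)


@dataclass
class KGNode:
    # Field names are hypothetical; the proto file is the source of truth.
    id: str
    labels: list[str]
    properties: dict[str, Any] = field(default_factory=dict)  # stored as JSONB
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)


@dataclass
class KGEdge:
    # A typed, weighted, directed relationship between two nodes.
    source_id: str
    target_id: str
    type: str
    weight: float = 1.0
    properties: dict[str, Any] = field(default_factory=dict)  # stored as JSONB


login = KGNode(id="screen:login", labels=["Screen"], properties={"title": "Login"})
submit = KGNode(id="action:submit", labels=["Action"])
edge = KGEdge(source_id=login.id, target_id=submit.id, type="HAS_ACTION", weight=0.9)
```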

Decision

Use PostgreSQL 16 with JSONB columns for the Knowledge Graph MVP. Node and edge properties are stored as JSONB, enabling flexible schema evolution. Graph traversal uses recursive CTEs.
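The recursive-CTE traversal can be sketched as below. To keep the example runnable without a database server, it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the WITH RECURSIVE query has the same shape in both, though in PostgreSQL the properties columns would be JSONB and the placeholder syntax depends on the driver. The table names (kg_nodes, kg_edges), column names, and sample data are assumptions for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kg_nodes (
    id         TEXT PRIMARY KEY,
    label      TEXT NOT NULL,
    properties TEXT NOT NULL DEFAULT '{}'  -- would be JSONB in PostgreSQL
);
CREATE TABLE kg_edges (
    source_id  TEXT NOT NULL REFERENCES kg_nodes(id),
    target_id  TEXT NOT NULL REFERENCES kg_nodes(id),
    type       TEXT NOT NULL,
    weight     REAL NOT NULL DEFAULT 1.0,
    properties TEXT NOT NULL DEFAULT '{}'  -- would be JSONB in PostgreSQL
);
""")

nodes = [
    ("screen:login", "Screen", {"platform": "ios"}),
    ("component:login_form", "Component", {}),
    ("action:submit", "Action", {}),
    ("test:login_smoke", "Test", {}),
]
conn.executemany(
    "INSERT INTO kg_nodes (id, label, properties) VALUES (?, ?, ?)",
    [(node_id, label, json.dumps(props)) for node_id, label, props in nodes],
)
edges = [
    ("screen:login", "component:login_form", "CONTAINS"),
    ("component:login_form", "action:submit", "TRIGGERS"),
    ("action:submit", "screen:login", "NAVIGATES_TO"),  # deliberate cycle
    ("test:login_smoke", "action:submit", "COVERS"),
]
conn.executemany(
    "INSERT INTO kg_edges (source_id, target_id, type) VALUES (?, ?, ?)", edges
)

# The depth bound guarantees termination even when the graph has cycles;
# MIN(depth) keeps the shortest hop count found for each reachable node.
TRAVERSAL_SQL = """
WITH RECURSIVE reachable(id, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.target_id, r.depth + 1
    FROM kg_edges AS e
    JOIN reachable AS r ON e.source_id = r.id
    WHERE r.depth < ?
)
SELECT id, MIN(depth) AS min_depth
FROM reachable
GROUP BY id
ORDER BY min_depth, id
"""


def neighbors_within(conn, start_id, max_depth):
    """Return (node_id, hops) for every node reachable within max_depth hops."""
    return conn.execute(TRAVERSAL_SQL, (start_id, max_depth)).fetchall()


result = neighbors_within(conn, "screen:login", 5)
```

Here test:login_smoke does not appear in the result because edges are followed in their stored direction only; a bidirectional neighbor lookup would need a second branch joining on target_id.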

Migrate to a dedicated graph database (Neo4j or similar) only after reaching 20+ customers, when query complexity and data volume justify the operational overhead.

Alternatives considered

| Option | Pros | Cons |
|---|---|---|
| PostgreSQL+JSONB (chosen) | Already in stack, flexible schema, ACID transactions, familiar to team | Recursive CTEs less efficient than native graph traversal at scale |
| Neo4j | Purpose-built for graphs, Cypher query language, excellent traversal | Separate database to operate, backup, monitor; operational overhead for fewer than 20 customers |
| Amazon Neptune / Azure Cosmos (Gremlin) | Managed, scalable | Cloud lock-in (violates cloud-agnostic principle), expensive at low volume |
| PostgreSQL + Apache AGE | Graph extension for PG | Immature, limited tooling, unclear maintenance trajectory |

Consequences

What becomes easier

  • One database technology to operate (PG already used for product and internal data)
  • JSONB allows schema-free node/edge properties during rapid iteration
  • Standard SQL + JSONB operators for simple queries
  • Transaction safety across KG and non-KG data in the same database
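
As an example of the "standard SQL + JSONB operators" point, a simple property lookup can use the JSONB containment operator (@>). The sketch below only builds the parameterised statement and does not execute it; the kg_nodes table, its label and properties columns, and the %s placeholder style (as used by drivers such as psycopg) are assumptions for illustration.

```python
import json
from typing import Any


def nodes_matching(label: str, props: dict[str, Any]) -> tuple[str, tuple[str, str]]:
    """Build a query for nodes with the given label whose JSONB properties
    contain every key/value pair in props."""
    sql = (
        "SELECT id, properties "
        "FROM kg_nodes "
        "WHERE label = %s AND properties @> %s::jsonb"
    )
    # The filter is passed as a JSON string parameter, never interpolated.
    return sql, (label, json.dumps(props, sort_keys=True))


sql, params = nodes_matching("Screen", {"platform": "ios"})
```

Containment queries of this kind can typically be served by a GIN index on the properties column, which is one reason they stay cheap for simple lookups.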

What becomes harder

  • Complex multi-hop traversals require recursive CTEs (performance degrades at depth > 5)
  • No native graph visualization tools (need custom tooling or export to visualization libraries)
  • Migration to Neo4j later requires data export/transform/load effort

Migration trigger

When graph traversal queries consistently exceed 200ms at production load, or when the KG exceeds 1M nodes, evaluate Neo4j migration.
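
The trigger could be checked automatically along these lines. This is a sketch under stated assumptions: "consistently exceed" is interpreted here as the 95th-percentile latency over a sampling window being above 200 ms, which is one possible reading rather than a definition from this ADR, and the function names are hypothetical.

```python
LATENCY_THRESHOLD_MS = 200.0     # from the migration trigger
NODE_COUNT_THRESHOLD = 1_000_000  # from the migration trigger


def p95(samples):
    """95th percentile using the nearest-rank method (integer arithmetic)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def should_evaluate_migration(latencies_ms, node_count):
    """Return True when either trigger condition is met."""
    if node_count > NODE_COUNT_THRESHOLD:
        return True
    # Assumption: "consistently" means p95 over the sampled window.
    return bool(latencies_ms) and p95(latencies_ms) > LATENCY_THRESHOLD_MS


slow_window = [250.0] * 100
fast_window = [50.0] * 99 + [900.0]  # one outlier does not trip the trigger
decision = should_evaluate_migration(slow_window, node_count=10_000)
```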