Knowledge Graph internals

The Knowledge Graph (KG) is the foundation of Aucert's test generation intelligence. It ingests multiple source types, stores them as a node-edge graph in PostgreSQL, and serves context to the Generation layer (L1) via protobuf queries.

Data model (protobuf)

Defined in proto/knowledge-graph.proto — this is the contract between the KG engine and all consumers:

message KGQuery {
  string customer_id = 1;    // Tenant isolation
  string app_id = 2;         // Which app to query
  string query_type = 3;     // "all_screens", "critical_paths", "high_risk_nodes", "screen_context"
  map<string, string> filters = 4;  // e.g. {"min_risk_score": "0.7"}
}

message KGResponse {
  repeated KGNode nodes = 1;
  repeated KGEdge edges = 2;
}

message KGNode {
  string id = 1;             // UUID
  string type = 2;           // "screen", "component", "endpoint", "data_model"
  string name = 3;           // Human-readable: "LoginScreen", "POST /auth/login"
  map<string, string> properties = 4;  // Extensible: {"has_form": "true", "risk_score": "0.8"}
}

message KGEdge {
  string source_id = 1;
  string target_id = 2;
  string relationship = 3;   // "navigates_to", "calls", "depends_on", "contains"
}
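
Because both `filters` and `properties` are `map<string, string>`, numeric values such as `risk_score` travel as strings and have to be parsed by the consumer. The sketch below illustrates that with plain Python dataclasses that mirror two of the messages; it is not the generated protobuf code, and the `risk_score` helper and the sample IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KGQuery:
    """Mirrors the KGQuery message (hand-written stand-in, not generated code)."""
    customer_id: str
    app_id: str
    query_type: str
    filters: dict[str, str] = field(default_factory=dict)


@dataclass
class KGNode:
    """Mirrors the KGNode message."""
    id: str
    type: str
    name: str
    properties: dict[str, str] = field(default_factory=dict)


def risk_score(node: KGNode, default: float = 0.0) -> float:
    """Parse the string-valued risk_score property; fall back if absent or malformed."""
    try:
        return float(node.properties.get("risk_score", default))
    except ValueError:
        return default


query = KGQuery(
    customer_id="c-1",  # placeholder IDs for illustration
    app_id="a-1",
    query_type="high_risk_nodes",
    filters={"min_risk_score": "0.7"},
)
node = KGNode(
    id="n-1",
    type="screen",
    name="LoginScreen",
    properties={"has_form": "true", "risk_score": "0.8"},
)

threshold = float(query.filters["min_risk_score"])
matches = risk_score(node) > threshold
```

Keeping every value a string keeps the contract simple, at the cost of pushing type handling to each consumer.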

Storage: PostgreSQL + JSONB

Per ADR-005, the KG uses PostgreSQL with JSONB columns rather than a dedicated graph database. This decision balances flexibility against operational complexity.

Info: For server provisioning, credentials, ORM wiring (Exposed), and the rotation runbook, see PostgreSQL configuration.

Consideration	PostgreSQL + JSONB	Neo4j
Query flexibility	JSONB operators + GIN indexes	Cypher query language
Ops overhead	Standard PG — backup, monitoring, scaling well-understood	Separate cluster, different tooling, specialized knowledge
Customer scale	Handles fewer than 20 customers easily	Designed for massive graphs
Schema evolution	JSONB — add properties without migrations	Schema-free, but index changes needed
Migration path	Export JSONB → import to Neo4j if needed	N/A

Table structure

-- Nodes table
CREATE TABLE kg_nodes (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id UUID NOT NULL,
  app_id UUID NOT NULL,
  type VARCHAR(50) NOT NULL,     -- 'screen', 'component', 'endpoint', 'data_model'
  name VARCHAR(255) NOT NULL,
  properties JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Edges table
CREATE TABLE kg_edges (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id UUID NOT NULL,
  source_id UUID NOT NULL REFERENCES kg_nodes(id),
  target_id UUID NOT NULL REFERENCES kg_nodes(id),
  relationship VARCHAR(50) NOT NULL,  -- 'navigates_to', 'calls', 'depends_on', 'contains'
  properties JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Performance indexes
CREATE INDEX idx_kg_nodes_customer_app ON kg_nodes(customer_id, app_id);
CREATE INDEX idx_kg_nodes_type ON kg_nodes(type);
CREATE INDEX idx_kg_nodes_properties ON kg_nodes USING GIN(properties);
CREATE INDEX idx_kg_edges_source ON kg_edges(source_id);
CREATE INDEX idx_kg_edges_target ON kg_edges(target_id);
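
To show how these indexes might be exercised, here is a minimal sketch of two query builders against the tables above. The function names are hypothetical, and the `%s` placeholders assume a psycopg-style driver. The JSONB containment operator `@>` can be served by the GIN index on `properties`, whereas a numeric comparison on a cast `->>` expression cannot, so the second query relies on the `(customer_id, app_id)` index to narrow the scan.

```python
import json


def nodes_by_property_sql(customer_id, app_id, props):
    """Exact-match lookup on properties; @> (containment) can use the GIN index."""
    sql = (
        "SELECT id, type, name, properties FROM kg_nodes "
        "WHERE customer_id = %s AND app_id = %s "
        "AND properties @> %s::jsonb"
    )
    return sql, (customer_id, app_id, json.dumps(props, sort_keys=True))


def high_risk_nodes_sql(customer_id, app_id, min_risk_score):
    """Range filter on risk_score; the cast expression is not covered by the GIN index."""
    sql = (
        "SELECT id, type, name, properties FROM kg_nodes "
        "WHERE customer_id = %s AND app_id = %s "
        "AND (properties->>'risk_score')::float > %s"
    )
    return sql, (customer_id, app_id, min_risk_score)


sql, params = nodes_by_property_sql("c-1", "a-1", {"has_form": "true"})
```

Both builders always filter on `customer_id`, in keeping with the tenant-isolation role that field has in `KGQuery`.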

Ingestion pipeline

The ingestion engine processes five source types, each with a specialized parser:

1. AST ingestion

Input: Application source code (Kotlin, Java, Swift)

Parses code into abstract syntax trees to extract:

Screen definitions — Activities, Fragments, Composables (Android); ViewControllers, SwiftUI Views (iOS)
Navigation paths — Intent launches, NavGraph routes, deeplinks
State management — ViewModel fields, StateFlow/LiveData streams, Redux stores

Each extracted entity becomes a KG node. Navigation between screens becomes navigates_to edges.
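
The entity-to-graph mapping described above can be sketched as follows. This assumes the parser has already produced a list of screen names and a list of (source, target) navigation pairs; `build_screen_graph` and its input shapes are hypothetical, and the real AST parsing step is out of scope here.

```python
import uuid


def build_screen_graph(screens, navigations):
    """Turn extracted screen names and navigation pairs into KG-style nodes and edges."""
    ids = {name: str(uuid.uuid4()) for name in screens}
    nodes = [
        {"id": ids[name], "type": "screen", "name": name, "properties": {}}
        for name in screens
    ]
    # Skip navigation pairs that point at a screen the parser did not extract.
    edges = [
        {"source_id": ids[src], "target_id": ids[dst], "relationship": "navigates_to"}
        for src, dst in navigations
        if src in ids and dst in ids
    ]
    return nodes, edges


nodes, edges = build_screen_graph(
    ["LoginScreen", "HomeScreen"],
    [("LoginScreen", "HomeScreen"), ("HomeScreen", "MissingScreen")],
)
```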

2. API schema ingestion

Input: OpenAPI specs (.yaml/.json) or Protobuf definitions (.proto)

Extracts:

Endpoints as nodes — method, path, authentication requirements
Request/response shapes — Field names, types, validation rules
Endpoint-to-screen relationships — Which screens call which endpoints (from AST cross-reference)
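
As a rough illustration of the endpoint extraction, the sketch below walks the `paths` object of an already-parsed OpenAPI 3 document and emits one endpoint node per operation. The function name and the `requires_auth` property are assumptions made for this example. Treating any non-empty `security` list as "requires auth" is a simplification of the OpenAPI rules.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def endpoints_from_openapi(spec):
    """Build endpoint nodes named like "POST /auth/login" from an OpenAPI dict."""
    global_security = spec.get("security", [])
    nodes = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            # An operation-level "security" overrides the top-level one;
            # an empty list means the operation is unauthenticated.
            security = operation.get("security", global_security)
            nodes.append({
                "type": "endpoint",
                "name": f"{method.upper()} {path}",
                "properties": {"requires_auth": "true" if security else "false"},
            })
    return nodes


spec = {
    "security": [{"bearerAuth": []}],
    "paths": {
        "/auth/login": {"post": {"security": []}, "parameters": []},
        "/profile": {"get": {}},
    },
}
endpoint_nodes = endpoints_from_openapi(spec)
```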

3. UI map ingestion

Input: Android XML layouts, Compose hierarchies, iOS Storyboards

Extracts:

Screen hierarchy — View tree structure, nested layouts
Interactive elements — Buttons, text fields, toggles, with accessibility labels
Scroll containers — Lists, grids, and their item types
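
For the Android XML case only, interactive-element extraction might look like the sketch below. The widget list is illustrative rather than exhaustive, and the function name and output shape are assumptions. Compose hierarchies and iOS Storyboards would need their own parsers.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
INTERACTIVE_TAGS = {"Button", "EditText", "Switch", "CheckBox"}  # illustrative subset


def interactive_elements(layout_xml):
    """Collect interactive widgets with their IDs and accessibility labels."""
    root = ET.fromstring(layout_xml)
    found = []
    for elem in root.iter():
        # Custom views use fully qualified tags; compare on the simple class name.
        tag = elem.tag.rsplit(".", 1)[-1]
        if tag in INTERACTIVE_TAGS:
            found.append({
                "widget": tag,
                "id": elem.get(ANDROID_NS + "id", ""),
                "label": elem.get(ANDROID_NS + "contentDescription", ""),
            })
    return found


LAYOUT = (
    '<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android">'
    '<EditText android:id="@+id/email" android:contentDescription="Email address"/>'
    '<Button android:id="@+id/submit" android:contentDescription="Sign in"/>'
    '<TextView android:text="Welcome"/>'
    "</LinearLayout>"
)
elements = interactive_elements(LAYOUT)
```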

4. Historical data ingestion

Input: Past TestRunResult records from the database

Overlays the graph with risk signals:

Bug frequency — Nodes associated with past failures get a higher risk_score property
Regression patterns — Edges where transitions have historically failed
Flaky areas — Nodes with inconsistent pass/fail history
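
The source does not specify how these signals are computed, so the following is only one plausible sketch. It assumes each node has a chronological list of pass/fail outcomes, uses the failure rate as `risk_score`, and flags a node as flaky when its result flips between consecutive runs at least half the time. Both formulas and the 0.5 cutoff are hypothetical.

```python
def risk_signals(outcomes):
    """outcomes: chronological list of booleans, True meaning the run passed."""
    if not outcomes:
        return {"risk_score": 0.0, "flaky": False}
    failure_rate = outcomes.count(False) / len(outcomes)
    # Count pass->fail and fail->pass transitions between consecutive runs.
    flips = sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    flip_rate = flips / (len(outcomes) - 1) if len(outcomes) > 1 else 0.0
    return {"risk_score": round(failure_rate, 2), "flaky": flip_rate >= 0.5}


alternating = risk_signals([True, False, True, False])
regressed = risk_signals([True, True, False, False])
```

Separating the two signals matters: a node that regressed and stayed broken has the same failure rate as an alternating one here, but only the latter is marked flaky.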

5. PRD ingestion

Input: Markdown or structured product requirement documents

Extracts:

User stories — Mapped to screen flows (e.g., "As a user, I can reset my password" → PasswordResetScreen)
Acceptance criteria — Become expected outcomes in generated test scenarios
Feature flags — Conditional behavior that generates test variations
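
A minimal sketch of the user-story step, assuming stories follow the "As a <role>, I can <goal>" pattern from the example above. The regular expression and the keyword-to-screen lookup are hypothetical stand-ins. A real mapping would presumably draw on the screen nodes already in the graph.

```python
import re

STORY_RE = re.compile(r"As an? (?P<role>[^,]+), I can (?P<goal>[^.\n]+)", re.IGNORECASE)

# Hypothetical lookup from goal phrases to screen names, for illustration only.
SCREEN_KEYWORDS = {"reset my password": "PasswordResetScreen"}


def extract_user_stories(markdown):
    """Find user stories in PRD text and attach a screen when a keyword matches."""
    stories = []
    for match in STORY_RE.finditer(markdown):
        goal = match.group("goal").strip()
        screen = next(
            (name for phrase, name in SCREEN_KEYWORDS.items() if phrase in goal.lower()),
            None,
        )
        stories.append({"role": match.group("role").strip(), "goal": goal, "screen": screen})
    return stories


PRD = "- As a user, I can reset my password.\n- As an admin, I can export reports.\n"
stories = extract_user_stories(PRD)
```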

Query patterns

The Generation layer (L1) queries the KG via KGQuery. Common query types:

`query_type`	Returns	Used for
`all_screens`	All screen nodes + navigation edges	Building the full app model for generation
`critical_paths`	Highest-connectivity screen traversals	Prioritizing which flows to test first
`high_risk_nodes`	Nodes with `risk_score` above a threshold (for example, one supplied through a filter such as `min_risk_score`)	Focusing regression testing on known problem areas
`screen_context`	Single screen + all connected nodes/edges	Generating tests for a specific screen
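
To make two of these query types concrete, here is an in-memory sketch over lists of node and edge dictionaries shaped like `KGNode` and `KGEdge`. The function names are hypothetical, and ranking screens by edge count is only one possible reading of "highest-connectivity"; the actual engine may define it differently.

```python
from collections import Counter


def screen_context(nodes, edges, screen_id):
    """Return the given screen, every directly connected node, and the connecting edges."""
    touching = [e for e in edges if screen_id in (e["source_id"], e["target_id"])]
    wanted = {screen_id}
    for e in touching:
        wanted.update((e["source_id"], e["target_id"]))
    return [n for n in nodes if n["id"] in wanted], touching


def rank_screens_by_connectivity(nodes, edges):
    """Order screen nodes by how many edges touch them, most connected first."""
    degree = Counter()
    for e in edges:
        degree[e["source_id"]] += 1
        degree[e["target_id"]] += 1
    screens = [n for n in nodes if n["type"] == "screen"]
    return sorted(screens, key=lambda n: degree[n["id"]], reverse=True)


nodes = [
    {"id": "login", "type": "screen", "name": "LoginScreen"},
    {"id": "home", "type": "screen", "name": "HomeScreen"},
    {"id": "settings", "type": "screen", "name": "SettingsScreen"},
    {"id": "api", "type": "endpoint", "name": "POST /auth/login"},
]
edges = [
    {"source_id": "login", "target_id": "home", "relationship": "navigates_to"},
    {"source_id": "home", "target_id": "settings", "relationship": "navigates_to"},
    {"source_id": "login", "target_id": "api", "relationship": "calls"},
]
context_nodes, context_edges = screen_context(nodes, edges, "login")
ranked = rank_screens_by_connectivity(nodes, edges)
```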

Self-healing (Phase 2)

Self-healing is planned for Phase 2: the KG will repair itself when changes to the app's structure invalidate existing graph nodes. When ingestion detects a renamed screen or a removed endpoint, it will:

Match by structure — If a node disappears but a new one with 80%+ similar properties appears, treat it as a rename
Update edges — Redirect existing edges to the new node
Archive old node — Mark as archived rather than deleting (preserves history)
Flag changes — Notify the Generation layer that the graph has changed so it can regenerate affected scenarios
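
The first two steps above can be sketched as follows. The source does not define how "80%+ similar" is measured, so this example assumes Jaccard similarity over (key, value) property pairs; the function names, the greedy one-to-one matching, and the same-type restriction are all hypothetical choices.

```python
def property_similarity(a, b):
    """Jaccard similarity over (key, value) pairs; 0.0 when both maps are empty."""
    items_a, items_b = set(a.items()), set(b.items())
    if not items_a and not items_b:
        return 0.0
    return len(items_a & items_b) / len(items_a | items_b)


def detect_renames(removed, added, threshold=0.8):
    """Map each removed node ID to the most similar unclaimed added node of the same type."""
    renames = {}
    for old in removed:
        candidates = [
            n for n in added
            if n["type"] == old["type"] and n["id"] not in renames.values()
        ]
        if not candidates:
            continue
        best = max(candidates, key=lambda n: property_similarity(old["properties"], n["properties"]))
        if property_similarity(old["properties"], best["properties"]) >= threshold:
            renames[old["id"]] = best["id"]
    return renames


def redirect_edges(edges, renames):
    """Point edges that referenced a renamed node at its replacement."""
    return [
        {
            **e,
            "source_id": renames.get(e["source_id"], e["source_id"]),
            "target_id": renames.get(e["target_id"], e["target_id"]),
        }
        for e in edges
    ]


shared = {"has_form": "true", "fields": "2", "auth": "none", "layout": "column", "scroll": "false"}
removed = [{"id": "old-1", "type": "screen", "properties": shared}]
added = [
    {"id": "new-1", "type": "screen", "properties": {**shared, "theme": "dark"}},
    {"id": "new-2", "type": "screen", "properties": {"has_form": "false"}},
]
renames = detect_renames(removed, added)
healed = redirect_edges(
    [{"source_id": "old-1", "target_id": "home", "relationship": "navigates_to"}],
    renames,
)
```

In this sample, `new-1` shares five of six distinct property pairs with `old-1` (about 0.83), so it clears the 0.8 threshold, while `new-2` does not.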

What's next

5-layer deep dive — Full pipeline architecture
Verification Cascade — Decision-making process
Model orchestration — How models are assigned to layers
