Knowledge Graph internals

The Knowledge Graph (KG) is the foundation of Aucert's test generation intelligence. It ingests multiple source types, stores them as a node-edge graph in PostgreSQL, and serves context to the Generation layer (L1) via protobuf queries.

Data model (protobuf)

The data model is defined in proto/knowledge-graph.proto, which serves as the contract between the KG engine and all consumers:

message KGQuery {
  string customer_id = 1;           // Tenant isolation
  string app_id = 2;                // Which app to query
  string query_type = 3;            // "all_screens", "critical_paths", "high_risk_nodes"
  map<string, string> filters = 4;  // e.g. {"min_risk_score": "0.7"}
}

message KGResponse {
  repeated KGNode nodes = 1;
  repeated KGEdge edges = 2;
}

message KGNode {
  string id = 1;                       // UUID
  string type = 2;                     // "screen", "component", "endpoint", "data_model"
  string name = 3;                     // Human-readable: "LoginScreen", "POST /auth/login"
  map<string, string> properties = 4;  // Extensible: {"has_form": "true", "risk_score": "0.8"}
}

message KGEdge {
  string source_id = 1;
  string target_id = 2;
  string relationship = 3;  // "navigates_to", "calls", "depends_on", "contains"
}
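Because filters and properties are both map<string, string>, numeric values such as risk_score travel as strings and have to be parsed by the consumer. The sketch below mirrors the four messages as plain Python dataclasses to show that handling; it is not the generated protobuf code, and the helper min_risk_score and the sample IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KGNode:
    id: str
    type: str
    name: str
    properties: dict[str, str] = field(default_factory=dict)


@dataclass
class KGEdge:
    source_id: str
    target_id: str
    relationship: str


@dataclass
class KGQuery:
    customer_id: str
    app_id: str
    query_type: str
    filters: dict[str, str] = field(default_factory=dict)


@dataclass
class KGResponse:
    nodes: list[KGNode] = field(default_factory=list)
    edges: list[KGEdge] = field(default_factory=list)


def min_risk_score(query: KGQuery, default: float = 0.0) -> float:
    """Parse the string-valued filter into a float (hypothetical helper)."""
    return float(query.filters.get("min_risk_score", default))


# Sample values are illustrative only.
query = KGQuery(
    customer_id="customer-1",
    app_id="app-1",
    query_type="high_risk_nodes",
    filters={"min_risk_score": "0.7"},
)
response = KGResponse(
    nodes=[
        KGNode(id="n1", type="screen", name="LoginScreen",
               properties={"has_form": "true", "risk_score": "0.8"}),
        KGNode(id="n2", type="screen", name="HomeScreen",
               properties={"risk_score": "0.2"}),
    ],
)

# Compare as floats, never as strings.
risky = [
    node for node in response.nodes
    if float(node.properties.get("risk_score", "0")) >= min_risk_score(query)
]
```

Comparing the raw strings would happen to work for values like "0.8" and "0.7" but fails once formats differ (for example "1" versus "0.95"), which is why the sketch converts before comparing.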

Storage: PostgreSQL + JSONB

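Per ADR-005, the KG uses PostgreSQL with JSONB columns rather than a dedicated graph database such as Neo4j. The decision keeps node and edge properties flexible without taking on the operational cost of a second database system; the comparison below summarizes the trade-offs.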

Note: For server provisioning, credentials, ORM wiring (Exposed), and the rotation runbook, see PostgreSQL configuration.

| Consideration | PostgreSQL + JSONB | Neo4j |
| --- | --- | --- |
| Query flexibility | JSONB operators + GIN indexes | Cypher query language |
| Ops overhead | Standard PG: backup, monitoring, and scaling are well understood | Separate cluster, different tooling, specialized knowledge |
| Customer scale | Handles fewer than 20 customers easily | Designed for massive graphs |
| Schema evolution | JSONB: add properties without migrations | Schema-free, but index changes needed |
| Migration path | Export JSONB → import to Neo4j if needed | N/A |

Table structure

-- Nodes table
CREATE TABLE kg_nodes (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID NOT NULL,
    app_id      UUID NOT NULL,
    type        VARCHAR(50) NOT NULL,  -- 'screen', 'component', 'endpoint', 'data_model'
    name        VARCHAR(255) NOT NULL,
    properties  JSONB DEFAULT '{}',
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    updated_at  TIMESTAMPTZ DEFAULT NOW()
);

-- Edges table
CREATE TABLE kg_edges (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id  UUID NOT NULL,
    source_id    UUID NOT NULL REFERENCES kg_nodes(id),
    target_id    UUID NOT NULL REFERENCES kg_nodes(id),
    relationship VARCHAR(50) NOT NULL,  -- 'navigates_to', 'calls', 'depends_on', 'contains'
    properties   JSONB DEFAULT '{}',
    created_at   TIMESTAMPTZ DEFAULT NOW()
);

-- Performance indexes
CREATE INDEX idx_kg_nodes_customer_app ON kg_nodes(customer_id, app_id);
CREATE INDEX idx_kg_nodes_type ON kg_nodes(type);
CREATE INDEX idx_kg_nodes_properties ON kg_nodes USING GIN(properties);
CREATE INDEX idx_kg_edges_source ON kg_edges(source_id);
CREATE INDEX idx_kg_edges_target ON kg_edges(target_id);
Ingestion pipeline

The ingestion engine processes five source types, each with a specialized parser:

1. AST ingestion

Input: Application source code (Kotlin, Java, Swift)

Parses code into abstract syntax trees to extract:

  • Screen definitions — Activities, Fragments, Composables (Android); ViewControllers, SwiftUI Views (iOS)
  • Navigation paths — Intent launches, NavGraph routes, deeplinks
  • State management — ViewModel fields, StateFlow/LiveData streams, Redux stores

Each extracted entity becomes a KG node. Navigation between screens becomes navigates_to edges.
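As a rough illustration of that last step, the sketch below turns parser output into node and edge records. It assumes the AST parser has already produced a list of screen names and a list of (from, to) navigation pairs; the function name build_graph and the record shapes are hypothetical. Since kg_edges references kg_nodes(id), the sketch sets aside navigations whose endpoints were not extracted rather than emitting dangling edges.

```python
import uuid


def build_graph(screens, navigations):
    """Map extracted screens and navigation pairs to node and edge records."""
    nodes = {
        name: {"id": str(uuid.uuid4()), "type": "screen",
               "name": name, "properties": {}}
        for name in screens
    }
    edges, skipped = [], []
    for source, target in navigations:
        if source in nodes and target in nodes:
            edges.append({
                "source_id": nodes[source]["id"],
                "target_id": nodes[target]["id"],
                "relationship": "navigates_to",
            })
        else:
            # An edge to a missing node would violate the foreign key.
            skipped.append((source, target))
    return list(nodes.values()), edges, skipped


# Illustrative input: SettingsScreen was referenced but never extracted.
nodes, edges, skipped = build_graph(
    screens=["LoginScreen", "HomeScreen"],
    navigations=[("LoginScreen", "HomeScreen"),
                 ("HomeScreen", "SettingsScreen")],
)
```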

2. API schema ingestion

Input: OpenAPI specs (.yaml/.json) or Protobuf definitions (.proto)

Extracts:

  • Endpoints as nodes — method, path, authentication requirements
  • Request/response shapes — Field names, types, validation rules
  • Endpoint-to-screen relationships — Which screens call which endpoints (from AST cross-reference)
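The endpoint extraction can be sketched as follows for an OpenAPI 3 document that has already been loaded into a dict. This is a simplified illustration under stated assumptions: the function name endpoints_from_openapi and the property names (method, path, requires_auth) are hypothetical, and request/response shapes are left out. It relies on one rule from the OpenAPI specification: an operation-level security field overrides the top-level one, and an empty list removes the requirement.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def endpoints_from_openapi(spec):
    """Build one endpoint node per (method, path) operation in the spec."""
    nodes = []
    global_security = spec.get("security", [])
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            # Operation-level security wins over the top-level default.
            security = operation.get("security", global_security)
            nodes.append({
                "type": "endpoint",
                "name": f"{method.upper()} {path}",
                "properties": {
                    "method": method.upper(),
                    "path": path,
                    "requires_auth": "true" if security else "false",
                },
            })
    return nodes


# Illustrative spec: login is public, everything else needs a bearer token.
spec = {
    "openapi": "3.0.3",
    "security": [{"bearerAuth": []}],
    "paths": {
        "/auth/login": {"post": {"security": []}},
        "/profile": {"parameters": [], "get": {}},
    },
}
endpoint_nodes = endpoints_from_openapi(spec)
```

The node names follow the "POST /auth/login" convention used in the KGNode comment, and property values are kept as strings to match map<string, string>.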

3. UI map ingestion

Input: Android XML layouts, Compose hierarchies, iOS Storyboards

Extracts:

  • Screen hierarchy — View tree structure, nested layouts
  • Interactive elements — Buttons, text fields, toggles, with accessibility labels
  • Scroll containers — Lists, grids, and their item types
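For Android XML layouts, the interactive-element extraction might look like the sketch below, which walks the view tree with the standard library XML parser. The layout, the set of tags treated as interactive, and the output record shape are all assumptions made for illustration; Compose hierarchies and Storyboards would need different parsers.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
# Assumed subset of widgets to treat as interactive.
INTERACTIVE_TAGS = {"Button", "ImageButton", "EditText", "Switch", "CheckBox"}

# Hypothetical layout used only for this example.
LAYOUT = """<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android">
    <EditText android:id="@+id/email" android:contentDescription="Email address" />
    <ScrollView>
        <Button android:id="@+id/submit" android:contentDescription="Sign in" />
    </ScrollView>
    <TextView android:id="@+id/title" />
</LinearLayout>"""


def android_attr(element, name):
    """Read an android:-namespaced attribute, or None if it is absent."""
    return element.get(f"{{{ANDROID_NS}}}{name}")


def interactive_elements(layout_xml):
    """Collect interactive widgets with their accessibility label and tree depth."""
    found = []

    def walk(element, depth):
        # Fully qualified custom views keep only their class name.
        tag = element.tag.rsplit(".", 1)[-1]
        if tag in INTERACTIVE_TAGS:
            found.append({
                "tag": tag,
                "id": android_attr(element, "id"),
                "label": android_attr(element, "contentDescription"),
                "depth": depth,
            })
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(layout_xml), 0)
    return found


elements = interactive_elements(LAYOUT)
```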

4. Historical data ingestion

Input: Past TestRunResult records from the database

Overlays the graph with risk signals:

  • Bug frequency — Nodes associated with past failures get a higher risk_score property
  • Regression patterns — Edges where transitions have historically failed
  • Flaky areas — Nodes with inconsistent pass/fail history
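The document does not specify how these signals are computed, so the following is only one plausible sketch. It assumes each node has an ordered pass/fail history, uses the plain failure rate as risk_score, and calls a node flaky when its result flips between consecutive runs at least half the time. The function name, the flip threshold, and the "flaky" property are hypothetical.

```python
def risk_signals(outcomes, flip_threshold=0.5):
    """Derive string-valued risk properties from a pass/fail history.

    outcomes: list of booleans, True meaning pass, ordered oldest to newest.
    """
    if not outcomes:
        return {"risk_score": "0.00", "flaky": "false"}

    failure_rate = outcomes.count(False) / len(outcomes)

    # Count how often consecutive runs disagree.
    flips = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    flip_rate = flips / (len(outcomes) - 1) if len(outcomes) > 1 else 0.0

    return {
        "risk_score": f"{failure_rate:.2f}",
        "flaky": "true" if flip_rate >= flip_threshold else "false",
    }


alternating = risk_signals([True, False, True, False])
always_failing = risk_signals([False, False, False, False])
```

Separating the two signals matters: a node that always fails is high risk but not flaky, while an alternating node has a moderate failure rate and is clearly inconsistent.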

5. PRD ingestion

Input: Markdown or structured product requirement documents

Extracts:

  • User stories — Mapped to screen flows (e.g., "As a user, I can reset my password" → PasswordResetScreen)
  • Acceptance criteria — Become expected outcomes in generated test scenarios
  • Feature flags — Conditional behavior that generates test variations
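As a rough sketch of the first two extraction steps, the code below pulls user stories and their acceptance criteria out of a Markdown PRD. It assumes a specific convention that the source does not confirm: stories are written as "As a <role>, I can <goal>" on one line, and the bullets that follow are that story's acceptance criteria. Mapping stories to screens and handling feature flags are out of scope here, and parse_prd is a hypothetical name.

```python
import re

STORY_RE = re.compile(r"As an? (?P<role>[^,]+), I can (?P<goal>.+?)\.?$")
CRITERION_RE = re.compile(r"^\s*[-*]\s+(?:\[[ xX]\]\s+)?(?P<text>.+)$")

# Hypothetical PRD excerpt used only for this example.
PRD = """## Password reset
As a user, I can reset my password.
- [ ] A reset email is sent to the registered address
- [ ] The old password no longer works after reset

As an admin, I can lock an account
- Locked users see an explanatory message
"""


def parse_prd(markdown):
    """Return a list of stories, each with role, goal, and criteria."""
    stories = []
    for line in markdown.splitlines():
        story = STORY_RE.search(line.strip())
        if story:
            stories.append({
                "role": story.group("role"),
                "goal": story.group("goal"),
                "criteria": [],
            })
            continue
        criterion = CRITERION_RE.match(line)
        if criterion and stories:
            # Attach the bullet to the most recent story.
            stories[-1]["criteria"].append(criterion.group("text").strip())
    return stories


stories = parse_prd(PRD)
```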

Query patterns

The Generation layer (L1) queries the KG via KGQuery. Common query types:

| query_type | Returns | Used for |
| --- | --- | --- |
| all_screens | All screen nodes + navigation edges | Building the full app model for generation |
| critical_paths | Highest-connectivity screen traversals | Prioritizing which flows to test first |
| high_risk_nodes | Nodes with risk_score > threshold | Focusing regression testing on known problem areas |
| screen_context | Single screen + all connected nodes/edges | Generating tests for a specific screen |
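Two of these query types can be illustrated on an in-memory graph. The sketch below is an assumption about their semantics, not the engine's implementation: screen_context is read as the one-hop neighborhood of a screen, and connectivity for critical_paths is approximated by counting navigates_to edges per screen. Function names and sample data are hypothetical.

```python
from collections import Counter


def screen_context(nodes, edges, screen_id):
    """Return the screen, its directly connected nodes, and the edges between them."""
    touching = [e for e in edges
                if screen_id in (e["source_id"], e["target_id"])]
    ids = {screen_id}
    for edge in touching:
        ids.update((edge["source_id"], edge["target_id"]))
    return [n for n in nodes if n["id"] in ids], touching


def rank_by_connectivity(nodes, edges):
    """Order screens by navigation degree, highest first, ties broken by name."""
    degree = Counter()
    for edge in edges:
        if edge["relationship"] == "navigates_to":
            degree[edge["source_id"]] += 1
            degree[edge["target_id"]] += 1
    screens = [n for n in nodes if n["type"] == "screen"]
    return sorted(screens, key=lambda n: (-degree[n["id"]], n["name"]))


# Illustrative graph.
nodes = [
    {"id": "login", "type": "screen", "name": "LoginScreen"},
    {"id": "home", "type": "screen", "name": "HomeScreen"},
    {"id": "settings", "type": "screen", "name": "SettingsScreen"},
    {"id": "api", "type": "endpoint", "name": "POST /auth/login"},
]
edges = [
    {"source_id": "login", "target_id": "home", "relationship": "navigates_to"},
    {"source_id": "home", "target_id": "settings", "relationship": "navigates_to"},
    {"source_id": "login", "target_id": "api", "relationship": "calls"},
]

context_nodes, context_edges = screen_context(nodes, edges, "login")
ranked = rank_by_connectivity(nodes, edges)
```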

Self-healing (Phase 2)

In Phase 2, the KG is planned to self-heal when changes to the app's structure invalidate existing graph nodes. When ingestion detects a renamed screen or a removed endpoint, it follows these steps:

  1. Match by structure — If a node disappears but a new one with 80%+ similar properties appears, treat it as a rename
  2. Update edges — Redirect existing edges to the new node
  3. Archive old node — Mark as archived rather than deleting (preserves history)
  4. Flag changes — Notify the Generation layer that the graph has changed so it can regenerate affected scenarios
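The four steps above can be sketched as follows. Since the feature is still planned, everything here is an assumption: similarity is taken to be Jaccard overlap of (key, value) property pairs, nodes are considered the same across runs when their type and name match, archiving is recorded as an "archived" property, and the returned rename map stands in for the notification to the Generation layer.

```python
def similarity(a, b):
    """Jaccard similarity over (key, value) property pairs."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    if not pairs_a and not pairs_b:
        return 0.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)


def heal(old_nodes, new_nodes, edges, threshold=0.8):
    """Detect renames, redirect edges, archive old nodes, report the changes."""
    old_keys = {(n["type"], n["name"]) for n in old_nodes}
    new_keys = {(n["type"], n["name"]) for n in new_nodes}
    removed = [n for n in old_nodes if (n["type"], n["name"]) not in new_keys]
    added = [n for n in new_nodes if (n["type"], n["name"]) not in old_keys]

    renames = {}  # old node id -> new node id
    for old in removed:
        # Step 1: best structural match among unclaimed new nodes of the same type.
        candidates = [n for n in added
                      if n["type"] == old["type"] and n["id"] not in renames.values()]
        best = max(candidates,
                   key=lambda n: similarity(old["properties"], n["properties"]),
                   default=None)
        if best and similarity(old["properties"], best["properties"]) >= threshold:
            renames[old["id"]] = best["id"]
            old["properties"]["archived"] = "true"  # Step 3: archive, do not delete

    # Step 2: redirect edges that pointed at a renamed node.
    healed = [dict(e,
                   source_id=renames.get(e["source_id"], e["source_id"]),
                   target_id=renames.get(e["target_id"], e["target_id"]))
              for e in edges]
    return renames, healed  # Step 4: renames is the change set to announce


shared = {"has_form": "true", "field_count": "2", "has_submit": "true",
          "layout": "form", "auth": "password"}
old_nodes = [
    {"id": "old-login", "type": "screen", "name": "LoginScreen", "properties": dict(shared)},
    {"id": "home", "type": "screen", "name": "HomeScreen", "properties": {"has_list": "true"}},
]
new_nodes = [
    {"id": "new-signin", "type": "screen", "name": "SignInScreen",
     "properties": dict(shared, has_biometrics="true")},
    {"id": "home", "type": "screen", "name": "HomeScreen", "properties": {"has_list": "true"}},
]
edges = [{"source_id": "old-login", "target_id": "home", "relationship": "navigates_to"}]

renames, healed_edges = heal(old_nodes, new_nodes, edges)
```

In this example the old and new nodes share five of six distinct property pairs, so the similarity is roughly 0.83 and the pair clears the 80% bar. A stricter or looser metric would change which nodes are treated as renames, which is why the threshold is a parameter.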
