Knowledge Graph internals
The Knowledge Graph (KG) is the foundation of Aucert's test generation intelligence. It ingests multiple source types, stores them as a node-edge graph in PostgreSQL, and serves context to the Generation layer (L1) via protobuf queries.
Data model (protobuf)
Defined in proto/knowledge-graph.proto — this is the contract between the KG engine and all consumers:
message KGQuery {
string customer_id = 1; // Tenant isolation
string app_id = 2; // Which app to query
string query_type = 3; // "all_screens", "critical_paths", "high_risk_nodes", "screen_context"
map<string, string> filters = 4; // e.g. {"min_risk_score": "0.7"}
}
message KGResponse {
repeated KGNode nodes = 1;
repeated KGEdge edges = 2;
}
message KGNode {
string id = 1; // UUID
string type = 2; // "screen", "component", "endpoint", "data_model"
string name = 3; // Human-readable: "LoginScreen", "POST /auth/login"
map<string, string> properties = 4; // Extensible: {"has_form": "true", "risk_score": "0.8"}
}
message KGEdge {
string source_id = 1;
string target_id = 2;
string relationship = 3; // "navigates_to", "calls", "depends_on", "contains"
}
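To make the contract concrete, here is a minimal Python sketch that mirrors two of the messages with plain dataclasses rather than generated protobuf classes. The helper names `high_risk_query` and `risk_score`, and the choice to treat a missing `risk_score` as 0.0, are assumptions for illustration; they are not part of the proto file.

```python
from dataclasses import dataclass, field


@dataclass
class KGQuery:
    customer_id: str
    app_id: str
    query_type: str
    filters: dict[str, str] = field(default_factory=dict)


@dataclass
class KGNode:
    id: str
    type: str
    name: str
    properties: dict[str, str] = field(default_factory=dict)


def high_risk_query(customer_id: str, app_id: str, min_risk: float) -> KGQuery:
    """Hypothetical helper: build a high_risk_nodes query."""
    # filters is map<string, string>, so numeric thresholds travel as strings.
    return KGQuery(customer_id, app_id, "high_risk_nodes",
                   {"min_risk_score": str(min_risk)})


def risk_score(node: KGNode) -> float:
    """Hypothetical helper: read risk_score from the string-valued properties."""
    # Assumption: a node without a risk_score property counts as 0.0.
    return float(node.properties.get("risk_score", "0"))
```

Because both `filters` and `properties` are string maps, consumers have to parse numeric values such as `risk_score` before comparing them.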
Storage: PostgreSQL + JSONB
Per ADR-005, the KG uses PostgreSQL with JSONB columns rather than a dedicated graph database. JSONB provides enough query and schema flexibility for the graph without the operational cost of running a separate graph cluster.
For server provisioning, credentials, ORM wiring (Exposed), and the rotation runbook, see PostgreSQL configuration.
| Consideration | PostgreSQL + JSONB | Neo4j |
|---|---|---|
| Query flexibility | JSONB operators + GIN indexes | Cypher query language |
| Ops overhead | Standard PG — backup, monitoring, scaling well-understood | Separate cluster, different tooling, specialized knowledge |
| Customer scale | Handles fewer than 20 customers easily | Designed for massive graphs |
| Schema evolution | JSONB — add properties without migrations | Schema-free, but index changes needed |
| Migration path | Export JSONB → import to Neo4j if needed | N/A |
Table structure
-- Nodes table
CREATE TABLE kg_nodes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
customer_id UUID NOT NULL,
app_id UUID NOT NULL,
type VARCHAR(50) NOT NULL, -- 'screen', 'component', 'endpoint', 'data_model'
name VARCHAR(255) NOT NULL,
properties JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Edges table
CREATE TABLE kg_edges (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
customer_id UUID NOT NULL,
source_id UUID NOT NULL REFERENCES kg_nodes(id),
target_id UUID NOT NULL REFERENCES kg_nodes(id),
relationship VARCHAR(50) NOT NULL, -- 'navigates_to', 'calls', 'depends_on', 'contains'
properties JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Performance indexes
CREATE INDEX idx_kg_nodes_customer_app ON kg_nodes(customer_id, app_id);
CREATE INDEX idx_kg_nodes_type ON kg_nodes(type);
CREATE INDEX idx_kg_nodes_properties ON kg_nodes USING GIN(properties);
CREATE INDEX idx_kg_edges_source ON kg_edges(source_id);
CREATE INDEX idx_kg_edges_target ON kg_edges(target_id);
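As an illustration of how these indexes might be exercised, the sketch below builds two parameterized queries against `kg_nodes`. The function names are hypothetical, and the `%s` placeholders assume a psycopg-style driver; the project's actual data access goes through its ORM, so treat this only as a picture of the SQL shape.

```python
import json


def nodes_by_properties(customer_id: str, app_id: str,
                        props: dict[str, str]) -> tuple[str, tuple]:
    """Match nodes whose properties contain every key/value pair in props."""
    # @> (JSONB containment) is an operator the default GIN operator class
    # supports, so idx_kg_nodes_properties can serve this predicate.
    sql = (
        "SELECT id, type, name, properties FROM kg_nodes "
        "WHERE customer_id = %s AND app_id = %s "
        "AND properties @> %s::jsonb"
    )
    return sql, (customer_id, app_id, json.dumps(props, sort_keys=True))


def nodes_above_risk(customer_id: str, app_id: str,
                     min_risk: float) -> tuple[str, tuple]:
    """Match nodes whose risk_score exceeds min_risk."""
    # risk_score is stored as a string, so it is cast before comparing.
    # A range comparison on an extracted value is not a GIN-indexable
    # operation; this query leans on idx_kg_nodes_customer_app instead.
    sql = (
        "SELECT id, type, name, properties FROM kg_nodes "
        "WHERE customer_id = %s AND app_id = %s "
        "AND (properties->>'risk_score')::float > %s"
    )
    return sql, (customer_id, app_id, min_risk)
```

Both queries filter on `customer_id` and `app_id` first, which keeps tenant isolation in the SQL itself and lets the composite index narrow the scan before any JSONB predicate is evaluated.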
Ingestion pipeline
The ingestion engine processes five source types, each with a specialized parser:
1. AST ingestion
Input: Application source code (Kotlin, Java, Swift)
Parses code into abstract syntax trees to extract:
- Screen definitions — Activities, Fragments, Composables (Android); ViewControllers, SwiftUI Views (iOS)
- Navigation paths — Intent launches, NavGraph routes, deeplinks
- State management — ViewModel fields, StateFlow/LiveData streams, Redux stores
Each extracted entity becomes a KG node. Navigation between screens becomes navigates_to edges.
2. API schema ingestion
Input: OpenAPI specs (.yaml/.json) or Protobuf definitions (.proto)
Extracts:
- Endpoints as nodes — method, path, authentication requirements
- Request/response shapes — Field names, types, validation rules
- Endpoint-to-screen relationships — Which screens call which endpoints (from AST cross-reference)
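The endpoint extraction step can be sketched roughly as follows, assuming the OpenAPI spec has already been parsed into a Python dict. The function name `endpoints_from_openapi` and the property keys `method`, `path`, and `requires_auth` are hypothetical; the real parser's output shape is not specified here.

```python
import uuid

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def endpoints_from_openapi(spec: dict) -> list[dict]:
    """Turn each OpenAPI operation into a KGNode-shaped dict."""
    nodes = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            # An operation-level "security" overrides the spec-level default;
            # an empty list means the operation needs no authentication.
            # Simplification: any non-empty list is treated as "auth required".
            security = operation.get("security", spec.get("security", []))
            nodes.append({
                "id": str(uuid.uuid4()),
                "type": "endpoint",
                "name": f"{method.upper()} {path}",
                "properties": {
                    "method": method.upper(),
                    "path": path,
                    "requires_auth": "true" if security else "false",
                },
            })
    return nodes
```

Property values are kept as strings so the resulting dicts line up with the `map<string, string>` field on `KGNode`.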
3. UI map ingestion
Input: Android XML layouts, Compose hierarchies, iOS Storyboards
Extracts:
- Screen hierarchy — View tree structure, nested layouts
- Interactive elements — Buttons, text fields, toggles, with accessibility labels
- Scroll containers — Lists, grids, and their item types
4. Historical data ingestion
Input: Past TestRunResult records from the database
Overlays the graph with risk signals:
- Bug frequency — Nodes associated with past failures get a higher risk_score property
- Regression patterns — Edges where transitions have historically failed
- Flaky areas — Nodes with inconsistent pass/fail history
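One way the risk overlay could be computed is sketched below. Everything specific here is an assumption: the input is reduced to `(node_id, passed)` pairs in chronological order (the real TestRunResult fields are not shown in this section), `risk_score` is taken to be the plain failure rate, and a node is called flaky when at least half of its consecutive runs flip between pass and fail.

```python
from collections import defaultdict


def risk_overlay(results: list[tuple[str, bool]]) -> dict[str, dict[str, str]]:
    """Derive per-node risk properties from chronological (node_id, passed) pairs."""
    history = defaultdict(list)
    for node_id, passed in results:
        history[node_id].append(passed)

    overlay = {}
    for node_id, outcomes in history.items():
        # Assumed scoring: risk_score is the fraction of runs that failed.
        failure_rate = outcomes.count(False) / len(outcomes)
        # Count pass/fail flips between consecutive runs to spot inconsistency.
        flips = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
        flip_rate = flips / (len(outcomes) - 1) if len(outcomes) > 1 else 0.0
        overlay[node_id] = {
            "risk_score": f"{failure_rate:.2f}",
            "flaky": "true" if flip_rate >= 0.5 else "false",
        }
    return overlay
```

Separating the failure rate from the flip rate keeps a consistently failing node (high risk, not flaky) distinct from one that alternates between passing and failing.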
5. PRD ingestion
Input: Markdown or structured product requirement documents
Extracts:
- User stories — Mapped to screen flows (e.g., "As a user, I can reset my password" → PasswordResetScreen)
- Acceptance criteria — Become expected outcomes in generated test scenarios
- Feature flags — Conditional behavior that generates test variations
Query patterns
The Generation layer (L1) queries the KG via KGQuery. Common query types:
| query_type | Returns | Used for |
|---|---|---|
| all_screens | All screen nodes + navigation edges | Building the full app model for generation |
| critical_paths | Highest-connectivity screen traversals | Prioritizing which flows to test first |
| high_risk_nodes | Nodes with risk_score > threshold | Focusing regression testing on known problem areas |
| screen_context | Single screen + all connected nodes/edges | Generating tests for a specific screen |
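To show what three of these query types return, here is a small in-memory sketch over KGNode- and KGEdge-shaped dicts. It is an illustration of the semantics only, not the engine's implementation; the function signatures are hypothetical, and `critical_paths` is left out because its ranking rule is not defined in this section.

```python
def all_screens(nodes: list[dict], edges: list[dict]) -> tuple[list[dict], list[dict]]:
    """All screen nodes plus the navigates_to edges between them."""
    screens = [n for n in nodes if n["type"] == "screen"]
    ids = {n["id"] for n in screens}
    nav = [e for e in edges
           if e["relationship"] == "navigates_to"
           and e["source_id"] in ids and e["target_id"] in ids]
    return screens, nav


def high_risk_nodes(nodes: list[dict], threshold: float) -> list[dict]:
    """Nodes whose risk_score property exceeds the threshold."""
    # Assumption: nodes without a risk_score are treated as 0.
    return [n for n in nodes
            if float(n["properties"].get("risk_score", "0")) > threshold]


def screen_context(nodes: list[dict], edges: list[dict],
                   screen_id: str) -> tuple[list[dict], list[dict]]:
    """One screen plus every edge touching it and the nodes on those edges."""
    touching = [e for e in edges if screen_id in (e["source_id"], e["target_id"])]
    ids = {screen_id}
    for e in touching:
        ids.update((e["source_id"], e["target_id"]))
    return [n for n in nodes if n["id"] in ids], touching
```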
Self-healing (Phase 2)
The KG will eventually self-heal when app structure changes break existing graph nodes. When ingestion detects a renamed screen or removed endpoint:
- Match by structure — If a node disappears but a new one with 80%+ similar properties appears, treat it as a rename
- Update edges — Redirect existing edges to the new node
- Archive old node — Mark as archived rather than deleting (preserves history)
- Flag changes — Notify the Generation layer that the graph has changed so it can regenerate affected scenarios
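Since this feature is still planned, the following is only a sketch of how the steps above might fit together. It assumes "80%+ similar properties" means a Jaccard similarity of at least 0.8 over (key, value) property pairs, that only nodes of the same type are candidates, and that archiving is recorded as an `archived` property; `similarity` and `heal` are hypothetical names.

```python
def similarity(a: dict[str, str], b: dict[str, str]) -> float:
    """Jaccard similarity over (key, value) property pairs."""
    pa, pb = set(a.items()), set(b.items())
    if not pa and not pb:
        return 0.0  # nothing to compare, so do not treat as a match
    return len(pa & pb) / len(pa | pb)


def heal(nodes: dict[str, dict], edges: list[dict], removed_id: str,
         added_ids: list[str], threshold: float = 0.8):
    """Return the id of the node that replaces removed_id, or None."""
    old = nodes[removed_id]
    candidates = [
        (similarity(old["properties"], nodes[i]["properties"]), i)
        for i in added_ids
        if nodes[i]["type"] == old["type"]
    ]
    score, match = max(candidates, default=(0.0, None))
    if match is None or score < threshold:
        return None  # no plausible rename; leave the graph untouched

    # Redirect existing edges from the old node to its replacement.
    for e in edges:
        if e["source_id"] == removed_id:
            e["source_id"] = match
        if e["target_id"] == removed_id:
            e["target_id"] = match

    # Archive instead of deleting so history is preserved.
    old["properties"]["archived"] = "true"
    return match
```

A non-None return value is the signal a caller could use to notify the Generation layer that scenarios touching the old node need regenerating.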
What's next
- 5-layer deep dive — Full pipeline architecture
- Verification Cascade — Decision-making process
- Model orchestration — How models are assigned to layers