Model orchestration

Aucert uses a tiered model strategy to balance quality and cost across the 5-layer pipeline. Each layer has different requirements: generation needs a large context window and vision, analysis needs cheap bulk processing, and decision needs strong reasoning. No single model is the best fit for all of them.

Model tiers

Aucert separates AI workloads into three cost tiers, each serving a different purpose:

Tier	Description	Cost/1M tokens	Use case	Provider
Tier S	Self-hosted fine-tuned models	$0.01–0.05	High-volume, domain-specific tasks	Future (Month 12+)
Tier Foundry	Azure AI Foundry (serverless)	$0.14–8.00	Product inference — per-layer optimization	Azure AI Foundry
Tier API	Direct API (Claude, GPT)	~$3.00–15.00	Coding agents, not product inference	Anthropic / Bedrock


Why separate tiers? Product inference (testing customer apps) and developer tooling (coding agents) have different cost profiles, latency requirements, and billing streams. Keeping them separate means Founders Hub credits cover product inference while coding agents bill to a separate Anthropic/AWS account.
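The tier split amounts to routing by workload type so that each billing stream stays separate. A minimal sketch, assuming hypothetical names (`Workload`, `tierFor`) and reusing the `S`/`FOUNDRY`/`API` tier names from the Phase 2 `ModelTier` type; the cost bands are the ranges from the tier table, not live prices:

```kotlin
// Hypothetical sketch: Workload and tierFor are illustrative names.
// Cost bands are the per-1M-token ranges from the tier table (USD).
enum class ModelTier(val minUsdPer1M: Double, val maxUsdPer1M: Double) {
    S(0.01, 0.05),        // self-hosted fine-tuned models (Month 12+)
    FOUNDRY(0.14, 8.00),  // Azure AI Foundry serverless: product inference
    API(3.00, 15.00)      // direct Claude/GPT access: coding agents only
}

enum class Workload { PRODUCT_INFERENCE, CODING_AGENT }

// Keeps the two billing streams apart: product inference never lands on
// Tier API, and coding agents never consume Foundry (Founders Hub) credits.
fun tierFor(workload: Workload): ModelTier = when (workload) {
    Workload.PRODUCT_INFERENCE -> ModelTier.FOUNDRY  // Tier S is not available in Phase 1
    Workload.CODING_AGENT -> ModelTier.API
}

fun main() {
    println(tierFor(Workload.PRODUCT_INFERENCE))  // FOUNDRY
    println(tierFor(Workload.CODING_AGENT))       // API
}
```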

Current model assignments

Phase 1 uses a static per-layer model assignment, configured through a Kubernetes ConfigMap (ADR-009):

Pipeline layer	Model	Deployment	Cost/1M input	Why this model
L1 Generation	Kimi K2.6	`Kimi-K2.6`	~$0.28	Multimodal thinking model with a 128K context window. Handles the full KG context snapshot plus UI screenshots for scenario design, and is optimized for agent workflows.
L2 Execution	N/A	N/A	N/A	Engine-driven (ADB commands), not LLM-powered
L3 Analysis	DeepSeek V3.2	`DeepSeek-V3.2`	~$0.14	Cheapest capable model for bulk visual analysis. L3 processes every screenshot in every test run — per-token cost dominates. Sufficient visual reasoning quality.
L4 Decision	GPT-5.4	`gpt-5.4`	~$4.00	Best reasoning quality for the Verification Cascade. Only invoked for ambiguous results (Stages 3–4), so high per-token cost is acceptable due to low volume.
L5 Reporting	Kimi K2.6 (shared)	`Kimi-K2.6`	~$0.28	Placeholder — shares L1 deployment. Reporting is pure NLG (bug summaries, severity classification); no need for a separate model until L1+L5 contention emerges.
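The assignment table can be expressed as plain data, which also makes the shared L1/L5 deployment easy to detect. This is a hypothetical sketch (`PipelineLayer` members, `Assignment`, and `sharedDeployments` are illustrative names) using the deployment names and approximate input prices from the table:

```kotlin
// Hypothetical encoding of the Phase 1 assignment table.
enum class PipelineLayer { GENERATION, EXECUTION, ANALYSIS, DECISION, REPORTING }

data class Assignment(val deployment: String, val inputUsdPer1M: Double)

// L2 Execution is engine-driven (ADB commands), so it has no entry.
val phase1Assignments: Map<PipelineLayer, Assignment> = mapOf(
    PipelineLayer.GENERATION to Assignment("Kimi-K2.6", 0.28),
    PipelineLayer.ANALYSIS to Assignment("DeepSeek-V3.2", 0.14),
    PipelineLayer.DECISION to Assignment("gpt-5.4", 4.00),
    PipelineLayer.REPORTING to Assignment("Kimi-K2.6", 0.28),
)

// Deployments used by more than one layer, i.e. where contention could appear.
fun sharedDeployments(assignments: Map<PipelineLayer, Assignment>): Map<String, List<PipelineLayer>> =
    assignments.entries
        .groupBy({ it.value.deployment }, { it.key })
        .filterValues { it.size > 1 }

fun main() {
    println(sharedDeployments(phase1Assignments))  // {Kimi-K2.6=[GENERATION, REPORTING]}
}
```

If L1 and L5 later move to separate deployments, `sharedDeployments` returns an empty map.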

Model capability comparison

Capability	Kimi K2.6	DeepSeek V3.2	GPT-5.4
Context window	128K	128K	256K
Vision (multimodal)	Yes	Yes	Yes
Reasoning depth	Medium (thinking model)	Low-medium	High
Structured output	Good	Good	Excellent
Speed (tokens/sec)	Medium	Fast	Medium
Relative cost	Low	Low	Medium-high
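Capability filtering is the first step any per-request choice among these models would need. A hypothetical sketch with values copied from the table above; the 1 to 3 reasoning scale is an ordinal stand-in for the table's qualitative labels, and 128K/256K are treated as 128,000/256,000 tokens:

```kotlin
// Hypothetical sketch: field names and the reasoning scale are illustrative.
data class ModelProfile(
    val deployment: String,
    val contextWindowTokens: Int,
    val vision: Boolean,
    val reasoning: Int,  // 1 = low-medium, 2 = medium, 3 = high (ordinal only)
)

val profiles = listOf(
    ModelProfile("Kimi-K2.6", 128_000, vision = true, reasoning = 2),
    ModelProfile("DeepSeek-V3.2", 128_000, vision = true, reasoning = 1),
    ModelProfile("gpt-5.4", 256_000, vision = true, reasoning = 3),
)

// Keeps models whose context window fits the request and that meet the
// vision and minimum-reasoning requirements, preserving table order.
fun eligible(
    requiredTokens: Int,
    needsVision: Boolean,
    minReasoning: Int,
    candidates: List<ModelProfile> = profiles,
): List<String> = candidates
    .filter { it.contextWindowTokens >= requiredTokens }
    .filter { !needsVision || it.vision }
    .filter { it.reasoning >= minReasoning }
    .map { it.deployment }
```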

Configuration

Model routing is controlled entirely by the llm-config ConfigMap — no code changes needed to switch models:

```yaml
# k8s/aucert-dev/llm-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-config
  namespace: aucert-dev
data:
  LLM_PROVIDER: "azure-foundry"
  FOUNDRY_ENDPOINT: "https://aucert-ai.cognitiveservices.azure.com/"
  FOUNDRY_MODEL_GENERATION: "Kimi-K2.6"      # L1
  FOUNDRY_MODEL_ANALYSIS: "DeepSeek-V3.2"    # L3
  FOUNDRY_MODEL_REASONING: "gpt-5.4"         # L4
  FOUNDRY_MODEL_REPORTING: "Kimi-K2.6"       # L5 (shares the L1 deployment)
```

Switching a model requires two steps:

1. Deploy the new model in Foundry (update `foundry.tf`, then run `terraform apply`).
2. Update the deployment name in the ConfigMap, then restart the backend pod so it picks up the new value.

The backend reads these values at startup through the LLMConfig class, which maps each pipeline layer to its Foundry deployment name.
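This page does not show `LLMConfig` itself. A minimal sketch of how such a class could map the ConfigMap keys to layers, assuming the ConfigMap is injected into the pod as environment variables (for example via `envFrom`); the constructor shape, `fromEnv`, and `deploymentFor` are hypothetical:

```kotlin
// Hypothetical sketch of LLMConfig; assumes ConfigMap keys arrive as env vars.
enum class PipelineLayer { GENERATION, EXECUTION, ANALYSIS, DECISION, REPORTING }

class LLMConfig private constructor(
    val provider: String,
    val endpoint: String,
    private val deployments: Map<PipelineLayer, String>,
) {
    // Returns null for L2 Execution, which does not call an LLM.
    fun deploymentFor(layer: PipelineLayer): String? = deployments[layer]

    companion object {
        // ConfigMap key for each LLM-backed layer.
        private val KEYS = mapOf(
            PipelineLayer.GENERATION to "FOUNDRY_MODEL_GENERATION",
            PipelineLayer.ANALYSIS to "FOUNDRY_MODEL_ANALYSIS",
            PipelineLayer.DECISION to "FOUNDRY_MODEL_REASONING",
            PipelineLayer.REPORTING to "FOUNDRY_MODEL_REPORTING",
        )

        // Reads every key once at startup and fails fast if any is missing.
        fun fromEnv(env: Map<String, String> = System.getenv()): LLMConfig {
            fun required(key: String) = env[key] ?: error("Missing config key: $key")
            return LLMConfig(
                provider = required("LLM_PROVIDER"),
                endpoint = required("FOUNDRY_ENDPOINT"),
                deployments = KEYS.mapValues { (_, key) -> required(key) },
            )
        }
    }
}
```

Passing the environment as a map keeps the class testable without touching real process state, and failing at startup surfaces a ConfigMap typo before any test run begins.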

Cost analysis

Per-test-run cost estimate

A typical test run generates ~20 test scenarios, each with ~5 steps. The estimates below bill every token, input and output, at the model's input rate from the assignment table; output tokens are usually priced higher, so treat these figures as lower bounds.

Layer	Invocations/run	Avg tokens/call	Cost/run
L1 Generation	1 (full batch)	~8,000 in + ~4,000 out	$0.003
L3 Analysis	~100 (20 scenarios × 5 screenshots)	~2,000 in + ~500 out	$0.035
L4 Decision	~20 (one per scenario, Stage 1 only)	~1,000 in + ~200 out	$0.096
L5 Reporting	1 (summary)	~3,000 in + ~1,500 out	$0.001
Total			~$0.136/run

Monthly cost projections

Scale	Runs/day	Monthly cost (30 days)	Notes
Development	5–10	$20–41	Current stage
Early customers	30–50	$122–203	Target for Month 3–6
Growth	100–200	$407–814	Revisit self-hosting at this point
Scale (Tier S needed)	500+	$2,030+	Self-hosted models become cheaper

All costs are covered by Founders Hub credits ($1,000) for the first 10–22 months at development scale.
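The per-run estimate is plain arithmetic: calls × (input + output tokens) × rate ÷ 1,000,000. A Kotlin sketch that recomputes it from the call counts and token sizes in the per-run table, assuming every token is billed at the model's listed input rate and a 30-day month for projections; `LayerUsage`, `totalPerRun`, and `monthlyCost` are illustrative names:

```kotlin
// Sketch: every token is billed at the model's input rate (a simplification).
data class LayerUsage(
    val name: String,
    val callsPerRun: Int,
    val inputTokens: Int,
    val outputTokens: Int,
    val usdPer1M: Double,
) {
    // calls × (input + output tokens) × rate ÷ 1,000,000
    val costPerRun: Double
        get() = callsPerRun * (inputTokens + outputTokens) * usdPer1M / 1_000_000
}

val usage = listOf(
    LayerUsage("L1 Generation", 1, 8_000, 4_000, 0.28),
    LayerUsage("L3 Analysis", 100, 2_000, 500, 0.14),
    LayerUsage("L4 Decision", 20, 1_000, 200, 4.00),
    LayerUsage("L5 Reporting", 1, 3_000, 1_500, 0.28),
)

fun totalPerRun(layers: List<LayerUsage>): Double = layers.sumOf { it.costPerRun }

// Projection over a 30-day month.
fun monthlyCost(runsPerDay: Int, perRun: Double): Double = runsPerDay * 30 * perRun

fun main() {
    val perRun = totalPerRun(usage)
    usage.forEach { println("%-14s \$%.4f".format(it.name, it.costPerRun)) }
    println("Total per run: \$%.3f".format(perRun))
    println("Development (5-10 runs/day): \$%.0f-%.0f/month".format(monthlyCost(5, perRun), monthlyCost(10, perRun)))
}
```

Replacing `usdPer1M` with separate input and output rates per model is a small change and gives a tighter estimate.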

LLM API access

Use case	Provider	Status	Notes
Product inference	Azure AI Foundry (`aucert-ai`)	Active — 4 models deployed	Founders Hub credits
Coding agents	Anthropic Direct API	Active (fallback)	Separate billing
Coding agents	AWS Bedrock	Pending (awaiting AWS Activate credits)	Will become primary for coding agents

Planned: dynamic routing (Phase 2+)


Dynamic routing is designed but not built. Phase 1 uses static ConfigMap routing. This section describes the Phase 2 architecture.

Phase 2 introduces a RouterPolicyEngine that dynamically selects the optimal model per request based on task characteristics:

Routing dimensions

Dimension	Signal	Effect
Task complexity	Prompt length, required reasoning steps, prior-stage confidence	Simple tasks → cheapest model; complex tasks → reasoning model
Required capabilities	Image attachments, structured output schema, function calling	Routes to models with the needed capability
Cost budget	Per-customer monthly budget, current spend	Downgrades to cheaper model when approaching budget limit
Latency SLA	Customer tier, interactive vs batch	Fast models for interactive; slow-but-better for batch
Quality feedback	Historical accuracy for this task type + model	Learns which models perform best for specific task patterns
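A rule-based version of three of these dimensions (required capabilities, cost budget, task complexity) might look like the following. This is a hypothetical sketch, not the designed `RouterPolicyEngine`: it treats input price as a proxy for model strength, the 90% downgrade threshold is an assumed value, and latency SLA and quality feedback are omitted.

```kotlin
// Hypothetical rule-based router; names and thresholds are illustrative.
enum class Complexity { SIMPLE, MODERATE, COMPLEX }

data class Candidate(val deployment: String, val vision: Boolean, val usdPer1MInput: Double)

data class Budget(val monthlyLimitUsd: Double, val spentUsd: Double) {
    // Fraction of this month's budget already spent (1.0 if no budget is set).
    val utilization: Double
        get() = if (monthlyLimitUsd <= 0.0) 1.0 else spentUsd / monthlyLimitUsd
}

fun route(
    complexity: Complexity,
    needsVision: Boolean,
    budget: Budget,
    candidates: List<Candidate>,
    downgradeAt: Double = 0.9,  // assumed threshold
): Candidate? {
    // Required capabilities: drop models that cannot handle the request,
    // then order the rest from cheapest to most expensive.
    val capable = candidates
        .filter { !needsVision || it.vision }
        .sortedBy { it.usdPer1MInput }
    if (capable.isEmpty()) return null
    // Cost budget: near the limit, always take the cheapest capable model.
    if (budget.utilization >= downgradeAt) return capable.first()
    // Task complexity: price is used as a stand-in for model strength.
    return when (complexity) {
        Complexity.SIMPLE -> capable.first()
        Complexity.MODERATE -> capable[capable.size / 2]
        Complexity.COMPLEX -> capable.last()
    }
}
```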

RouterPolicyEngine interface

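```kotlin
import java.math.BigDecimal

// Supporting types assumed for this sketch; enum members follow the original
// field comments and the five pipeline layers.
enum class PipelineLayer { GENERATION, EXECUTION, ANALYSIS, DECISION, REPORTING }
enum class Complexity { SIMPLE, MODERATE, COMPLEX }
enum class ModelTier { S, FOUNDRY, API }

// Cost-budget signals from the routing table: monthly budget and current spend.
data class BudgetConstraints(
    val monthlyBudget: BigDecimal,
    val currentSpend: BigDecimal
)

interface RouterPolicyEngine {
    // Chooses a deployment for a single request.
    suspend fun selectModel(
        layer: PipelineLayer,
        taskProfile: TaskProfile,
        customerBudget: BudgetConstraints
    ): ModelSelection
}

data class TaskProfile(
    val complexity: Complexity,
    val requiresVision: Boolean,
    val requiresReasoning: Boolean,
    val estimatedInputTokens: Int,
    val estimatedOutputTokens: Int
)

data class ModelSelection(
    val deployment: String,              // Foundry deployment name
    val tier: ModelTier,
    val estimatedCost: BigDecimal,
    val fallback: String?                // Fallback deployment if primary fails
)
```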
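The `fallback` field implies a retry path when the primary deployment fails. A minimal, non-suspending sketch of that path, with `ModelSelection` trimmed to the two fields it uses and a hypothetical `invokeWithFallback` helper:

```kotlin
// Hypothetical helper; ModelSelection is trimmed to the two fields used here.
data class ModelSelection(val deployment: String, val fallback: String?)

// Calls the primary deployment and retries once on the fallback, if one exists.
// If both fail, the fallback's exception is thrown with the primary's attached
// as a suppressed exception, so neither failure is lost.
fun <T> invokeWithFallback(selection: ModelSelection, call: (String) -> T): T =
    try {
        call(selection.deployment)
    } catch (primaryError: Exception) {
        val fallback = selection.fallback ?: throw primaryError
        try {
            call(fallback)
        } catch (fallbackError: Exception) {
            fallbackError.addSuppressed(primaryError)
            throw fallbackError
        }
    }
```

In a `suspend` implementation, catch and rethrow `CancellationException` before falling back, so coroutine cancellation is not treated as a model failure.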

When Tier S (self-hosting) makes sense

Self-hosting open models (Llama, DeepSeek, Mistral) on GPU VMs becomes cost-effective when:

Condition	Threshold	Current
Monthly API spend	> $10,000/month	~$50/month
Request volume	> 500 runs/day sustained	~5/day
Domain-specific quality	Fine-tuned model > general model	Not yet measured
Latency requirements	P99 < 500ms needed	Not a constraint

We are nowhere near self-hosting thresholds. Revisit at Month 12+ if paying customers generate high inference volume.
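The two quantitative thresholds in the table can be checked mechanically; the quality and latency conditions need measurements this page does not have yet. A hypothetical sketch (`UsageSnapshot` and `tierSThresholdsMet` are illustrative names):

```kotlin
// Hypothetical check of the quantitative Tier S thresholds from the table.
data class UsageSnapshot(val monthlyApiSpendUsd: Double, val sustainedRunsPerDay: Int)

// Returns the names of the thresholds currently exceeded (empty if none).
fun tierSThresholdsMet(u: UsageSnapshot): List<String> {
    val met = mutableListOf<String>()
    if (u.monthlyApiSpendUsd > 10_000.0) met.add("monthly API spend")
    if (u.sustainedRunsPerDay > 500) met.add("request volume")
    return met
}
```

With the current figures from the table (~$50/month, ~5 runs/day), the function returns an empty list.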

What's next

5-layer deep dive — Full pipeline architecture
Verification Cascade — Multi-stage verification that uses model escalation
Foundry architecture — Why no LLMs in AKS
ADR-009 — Model selection decision record
