Device Twin architecture

The AI Device Twin bridges the gap between emulator and real-device behavior by applying predictive models to emulator execution results. It operates as a sidecar to the Execution layer (L2), processing results before they reach Analysis (L3).

Warning

The Device Twin is designed but not built in Phase 1. The current pipeline uses direct emulator execution without a prediction overlay. This document describes the architecture planned for Phase 2 (Months 4-6).

Architecture

The Device Twin runs in two operating phases, which are separate from the Phase 1/Phase 2 rollout: calibration (offline, periodic) and prediction (online, once per test run).

Calibration phase

Calibration runs offline and compares paired test runs: the same test scenario executed on both an emulator and a physical device.

Input: Paired ExecutionTrace records (one emulator, one device) for the same TestScenario

Process:

Screenshot diff — Structural comparison of emulator vs device screenshots per step, identifying layout shifts, rendering differences, and timing gaps
Timing analysis — Compare action-to-render latency between emulator and device, building per-device-family timing models
Interaction diff — Compare touch target responsiveness, scroll behavior, and gesture handling
Model training — Feed divergence patterns into a lightweight regression model that predicts device behavior from emulator results
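
The timing-analysis and model-training steps above can be illustrated with a minimal sketch. It assumes the "lightweight regression model" for timing is a one-parameter least-squares fit through the origin (device latency ≈ k × emulator latency); the source does not specify the model, and LatencyPair and fitTimingMultiplier are hypothetical names introduced here.

```kotlin
// Hypothetical pairing of one step's action-to-render latency on both targets.
data class LatencyPair(val emulatorMs: Double, val deviceMs: Double)

// Assumed model: deviceMs ~ k * emulatorMs, fitted by least squares through
// the origin, which gives k = sum(e * d) / sum(e * e).
// Returns null when there is no usable data (empty input or all-zero latencies).
fun fitTimingMultiplier(pairs: List<LatencyPair>): Double? {
    val sumSquares = pairs.sumOf { it.emulatorMs * it.emulatorMs }
    if (sumSquares == 0.0) return null
    val sumProducts = pairs.sumOf { it.emulatorMs * it.deviceMs }
    return sumProducts / sumSquares
}
```

Fitting through the origin keeps the result directly comparable to a single multiplier per device family; a model with an intercept or per-step features would need a richer profile format.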

Output: A per-device-family calibration profile stored as a JSON document:

{
  "device_family": "pixel-7",
  "os_version": "android-14",
  "timing_multiplier": 1.3,
  "touch_target_adjustment_dp": 4,
  "known_rendering_diffs": [
    {"element_type": "gradient_background", "severity": "low", "description": "LinearGradient renders with banding on Mali GPU"}
  ],
  "calibration_date": "2026-04-07",
  "paired_runs_count": 50
}
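
For illustration, the profile above could be modelled in Kotlin roughly as follows. This is a sketch, not the planned implementation: the class names, property names, and validation rules are assumptions that simply mirror the JSON keys, and JSON parsing is left out because the source does not name a serialization library.

```kotlin
import java.time.LocalDate

// Hypothetical types mirroring the JSON document; names are illustrative only.
data class RenderingDiff(
    val elementType: String,
    val severity: String,
    val description: String
)

data class CalibrationProfile(
    val deviceFamily: String,
    val osVersion: String,
    val timingMultiplier: Double,
    val touchTargetAdjustmentDp: Int,
    val knownRenderingDiffs: List<RenderingDiff>,
    val calibrationDate: LocalDate,
    val pairedRunsCount: Int
) {
    init {
        // Assumed sanity checks; the source does not define validation rules.
        require(timingMultiplier > 0.0) { "timing_multiplier must be positive" }
        require(pairedRunsCount >= 0) { "paired_runs_count must not be negative" }
    }
}

// The example profile from the JSON document, built by hand.
fun examplePixel7Profile() = CalibrationProfile(
    deviceFamily = "pixel-7",
    osVersion = "android-14",
    timingMultiplier = 1.3,
    touchTargetAdjustmentDp = 4,
    knownRenderingDiffs = listOf(
        RenderingDiff(
            elementType = "gradient_background",
            severity = "low",
            description = "LinearGradient renders with banding on Mali GPU"
        )
    ),
    calibrationDate = LocalDate.of(2026, 4, 7),
    pairedRunsCount = 50
)
```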

Prediction phase

Prediction runs online during every test execution, processing emulator results before they reach L3 Analysis.

Input: ExecutionTrace from L2 + calibration profile for target device family

Process:

Timing adjustment — Apply timing_multiplier to animation and transition expectations. If an animation takes 200ms on emulator and the multiplier is 1.3x, expect 260ms on device.
Touch target analysis — Check if interactive elements meet minimum touch target size (48dp) when adjusted by touch_target_adjustment_dp. Flag elements that pass on emulator but would fail on device.
Rendering risk assessment — Check screenshots against known_rendering_diffs for the target device family. Flag potential visual issues.
Confidence scoring — Reduce confidence scores for test steps where predictions indicate high divergence risk.
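
The first two steps above reduce to simple arithmetic, sketched below. The function names are hypothetical, and the touch target check assumes that touch_target_adjustment_dp is subtracted from the size measured on the emulator; the source does not state the direction of the adjustment.

```kotlin
// Minimum touch target size named in the text.
const val MIN_TOUCH_TARGET_DP = 48

// Timing adjustment: scale an emulator duration by the profile's timing_multiplier.
fun expectedDeviceDurationMs(emulatorMs: Double, timingMultiplier: Double): Double =
    emulatorMs * timingMultiplier

// Touch target analysis. Assumption: the effective on-device size is the
// emulator size minus touch_target_adjustment_dp.
// True only for elements that pass on the emulator but would fail on the device.
fun failsOnDeviceOnly(emulatorSizeDp: Int, adjustmentDp: Int): Boolean {
    val passesOnEmulator = emulatorSizeDp >= MIN_TOUCH_TARGET_DP
    val passesOnDevice = emulatorSizeDp - adjustmentDp >= MIN_TOUCH_TARGET_DP
    return passesOnEmulator && !passesOnDevice
}
```

With the example profile (multiplier 1.3, adjustment 4dp), a 200ms emulator animation maps to an expected 260ms, and a 50dp element is flagged because its assumed effective size is 46dp.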

Output: An AdjustedExecutionTrace (the ExecutionTrace with adjusted confidence scores and risk annotations), passed to L3.
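
One possible shape for the confidence-scoring step is sketched below. Everything here is an assumption for illustration: the StepResult type, the Risk enum, and the penalty values are hypothetical, since the source does not define how much each risk level lowers a score.

```kotlin
// Hypothetical penalty per risk level; the actual values are not specified.
enum class Risk(val penalty: Double) { LOW(0.05), MEDIUM(0.15), HIGH(0.30) }

// Hypothetical per-step result carrying a confidence score in [0, 1].
data class StepResult(
    val stepId: String,
    val confidence: Double,
    val annotations: List<String> = emptyList()
)

// Lowers the step's confidence by the risk penalty (clamped to [0, 1])
// and records a risk annotation for the Analysis layer.
fun applyRisk(step: StepResult, risk: Risk, note: String): StepResult =
    step.copy(
        confidence = (step.confidence - risk.penalty).coerceIn(0.0, 1.0),
        annotations = step.annotations + "${risk.name}: $note"
    )
```

Returning a modified copy rather than mutating the step keeps the original emulator result available for comparison.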

Integration with L2

The Device Twin integrates with the Execution layer through the same interface+adapter pattern used between all layers:

interface DeviceTwinService {
    suspend fun adjustResults(
        trace: ExecutionTrace,
        targetDevice: DeviceProfile
    ): AdjustedExecutionTrace
}

// Phase 1: No-op passthrough
class NoOpDeviceTwinService : DeviceTwinService {
    override suspend fun adjustResults(trace: ExecutionTrace, targetDevice: DeviceProfile) =
        AdjustedExecutionTrace(trace, adjustments = emptyList())
}

// Phase 2: Full prediction
class PredictiveDeviceTwinService(
    private val calibrationStore: CalibrationStore
) : DeviceTwinService { ... }

The config flag aucert.device-twin.enabled switches between the two implementations: false selects the no-op passthrough (Phase 1) and true selects the full prediction engine (Phase 2).
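
A minimal sketch of how the flag might select an implementation follows. It assumes configuration is available as a simple key-value map and that a missing or malformed value falls back to the no-op passthrough; neither detail comes from the source. The function is generic so the sketch stays self-contained; in the real pipeline T would be DeviceTwinService.

```kotlin
// Hypothetical selector: picks the predictive implementation only when the
// flag is exactly "true"; anything else (absent, "false", malformed) is
// treated as disabled, which is an assumed default.
fun <T> selectDeviceTwin(
    config: Map<String, String>,
    noOp: () -> T,
    predictive: () -> T
): T {
    val enabled = config["aucert.device-twin.enabled"]?.toBooleanStrictOrNull() ?: false
    return if (enabled) predictive() else noOp()
}
```

Passing factories instead of instances means the prediction engine and its calibration store are never constructed when the flag is off.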

Divergence categories

Category	Risk level	Detection method	Example
Timing	Medium	Latency comparison	Animation stutters on low-end Snapdragon 400 series
Touch targets	High	Size calculation + DPI adjustment	44dp button passes on emulator, fails 48dp minimum on device
GPU rendering	Low	Known diff database	Gradient banding on ARM Mali GPUs
Network	Medium	Latency injection simulation	Timeout handling under 3G conditions
OS skin	High	Manufacturer-specific test runs	Samsung One UI permission dialog differs from stock Android
Sensor	Low	Mock data quality assessment	GPS accuracy affects map pin placement test
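
The table above could be encoded as an enum so that downstream code can filter by risk level. This is an illustrative sketch; the type and function names are hypothetical, while the categories, risk levels, and detection methods are copied from the table.

```kotlin
// Declared in ascending order so that enum comparison reflects severity.
enum class RiskLevel { LOW, MEDIUM, HIGH }

enum class DivergenceCategory(val risk: RiskLevel, val detection: String) {
    TIMING(RiskLevel.MEDIUM, "Latency comparison"),
    TOUCH_TARGETS(RiskLevel.HIGH, "Size calculation + DPI adjustment"),
    GPU_RENDERING(RiskLevel.LOW, "Known diff database"),
    NETWORK(RiskLevel.MEDIUM, "Latency injection simulation"),
    OS_SKIN(RiskLevel.HIGH, "Manufacturer-specific test runs"),
    SENSOR(RiskLevel.LOW, "Mock data quality assessment")
}

// Returns the categories whose risk level is at least `min`, in table order.
fun categoriesAtOrAbove(min: RiskLevel): List<DivergenceCategory> =
    DivergenceCategory.values().filter { it.risk >= min }
```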

Data collection strategy

Phase 2 requires paired test runs to build calibration profiles. The collection strategy has four parts:

Device farm integration — Partner with cloud device farms (AWS Device Farm, Firebase Test Lab) to run a subset of tests on physical devices
Calibration frequency — Run paired tests weekly, or when a new OS version is detected
Device family coverage — Start with the 5 most popular Android device families (by global market share), expand based on customer device analytics
Minimum data — Require 50+ paired runs per device family before enabling prediction for that family
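
The frequency and minimum-data rules above can be expressed as two small checks. The sketch below is an assumption about how they might be encoded: the function names are hypothetical, "weekly" is read as seven or more days since the last calibration, and a new OS version is detected by comparing version strings.

```kotlin
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Minimum paired runs per device family before prediction is enabled.
const val MIN_PAIRED_RUNS = 50

// Prediction stays off for a device family until enough paired runs exist.
fun predictionEnabledFor(pairedRunsCount: Int): Boolean =
    pairedRunsCount >= MIN_PAIRED_RUNS

// Recalibrate when a week has passed or the detected OS version differs
// from the one the profile was calibrated against.
fun recalibrationDue(
    lastCalibration: LocalDate,
    today: LocalDate,
    calibratedOs: String,
    detectedOs: String
): Boolean {
    val daysSince = ChronoUnit.DAYS.between(lastCalibration, today)
    return daysSince >= 7 || calibratedOs != detectedOs
}
```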

MVP status

Component	Phase 1	Phase 2
Emulator execution	Built — Android via ADB	Unchanged
Calibration model	Not started	Paired test runs + regression model
Prediction engine	No-op passthrough	Full prediction with per-device profiles
Confidence adjustment	Not applied	Reduces scores for high-risk divergences
Device farm integration	Not started	AWS Device Farm or Firebase Test Lab
Supported devices	Emulator only	Top 5 Android device families

What's next

5-layer deep dive — Full pipeline architecture
Verification Cascade — Multi-stage verification that uses Device Twin confidence scores
