
Device Twin architecture

The AI Device Twin bridges the gap between emulator and real-device behavior by applying predictive models to emulator execution results. It operates as a sidecar to the Execution layer (L2), processing results before they reach Analysis (L3).

warning

The Device Twin is designed but will not be built in Phase 1. The current pipeline uses direct emulator execution with no prediction overlay. This document describes the architecture planned for Phase 2 (Months 4-6).

Architecture

The Device Twin operates in two phases, which are unrelated to the Phase 1 and Phase 2 delivery milestones: calibration (offline, periodic) and prediction (online, once per test run).

Calibration phase

Calibration runs offline by comparing paired test runs — the same test scenario executed on both an emulator and a physical device.

Input: Paired ExecutionTrace records (one emulator, one device) for the same TestScenario

Process:

  1. Screenshot diff — Structural comparison of emulator vs device screenshots per step, identifying layout shifts, rendering differences, and timing gaps
  2. Timing analysis — Compare action-to-render latency between emulator and device, building per-device-family timing models
  3. Interaction diff — Compare touch target responsiveness, scroll behavior, and gesture handling
  4. Model training — Feed divergence patterns into a lightweight regression model that predicts device behavior from emulator results
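To make steps 2 and 4 more concrete, here is a minimal sketch of how a timing multiplier could be fitted from paired latencies. It assumes the simplest possible regression, a least-squares line through the origin. The document does not specify the actual model, and `PairedLatency` and `fitTimingMultiplier` are hypothetical names introduced only for illustration.

```kotlin
// Hypothetical pairing of one step's action-to-render latency on both targets.
data class PairedLatency(val emulatorMs: Double, val deviceMs: Double)

// Least-squares slope through the origin: the k that minimises
// sum((deviceMs - k * emulatorMs)^2) over all samples.
// Returns null when there is nothing to fit (no samples, or all emulator latencies are zero).
fun fitTimingMultiplier(samples: List<PairedLatency>): Double? {
    val denominator = samples.sumOf { it.emulatorMs * it.emulatorMs }
    if (denominator == 0.0) return null
    val numerator = samples.sumOf { it.emulatorMs * it.deviceMs }
    return numerator / denominator
}
```

A production model would probably need per-action-type features and outlier handling; the single-slope fit is only meant to show where a value such as `timing_multiplier` could come from.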

Output: A per-device-family calibration profile stored as a JSON document:

{
  "device_family": "pixel-7",
  "os_version": "android-14",
  "timing_multiplier": 1.3,
  "touch_target_adjustment_dp": 4,
  "known_rendering_diffs": [
    {"element_type": "gradient_background", "severity": "low", "description": "LinearGradient renders with banding on Mali GPU"}
  ],
  "calibration_date": "2026-04-07",
  "paired_runs_count": 50
}
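On the Kotlin side, the profile could be modeled as a plain data class. The sketch below mirrors the JSON fields one to one. The class names, the camelCase property names, and the two validation rules are assumptions, not part of the documented design, and JSON parsing is left out because the document does not name a serialization library.

```kotlin
import java.time.LocalDate

// Hypothetical typed view of one entry in known_rendering_diffs.
data class RenderingDiff(
    val elementType: String,
    val severity: String,
    val description: String
)

// Hypothetical typed view of a calibration profile document.
data class CalibrationProfile(
    val deviceFamily: String,
    val osVersion: String,
    val timingMultiplier: Double,
    val touchTargetAdjustmentDp: Int,
    val knownRenderingDiffs: List<RenderingDiff>,
    val calibrationDate: LocalDate,
    val pairedRunsCount: Int
) {
    init {
        // Assumed sanity checks; the source does not define validation rules.
        require(timingMultiplier > 0.0) { "timing_multiplier must be positive" }
        require(pairedRunsCount >= 0) { "paired_runs_count must not be negative" }
    }
}

// The example profile from the JSON document, expressed with the types above.
val pixel7Profile = CalibrationProfile(
    deviceFamily = "pixel-7",
    osVersion = "android-14",
    timingMultiplier = 1.3,
    touchTargetAdjustmentDp = 4,
    knownRenderingDiffs = listOf(
        RenderingDiff("gradient_background", "low", "LinearGradient renders with banding on Mali GPU")
    ),
    calibrationDate = LocalDate.parse("2026-04-07"),
    pairedRunsCount = 50
)
```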

Prediction phase

Prediction runs online during every test execution, processing emulator results before they reach L3 Analysis.

Input: ExecutionTrace from L2 + calibration profile for target device family

Process:

  1. Timing adjustment — Apply timing_multiplier to animation and transition expectations. If an animation takes 200ms on emulator and the multiplier is 1.3x, expect 260ms on device.
  2. Touch target analysis — Check if interactive elements meet minimum touch target size (48dp) when adjusted by touch_target_adjustment_dp. Flag elements that pass on emulator but would fail on device.
  3. Rendering risk assessment — Check screenshots against known_rendering_diffs for the target device family. Flag potential visual issues.
  4. Confidence scoring — Reduce confidence scores for test steps where predictions indicate high divergence risk.
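Steps 1 and 2 above reduce to simple arithmetic, sketched below. The function names are hypothetical. The sketch also assumes that `touch_target_adjustment_dp` is subtracted from the size measured on the emulator to estimate the effective size on the device, which the text does not state explicitly.

```kotlin
// Step 1: expected on-device duration for an animation measured on the emulator.
fun expectedDeviceDurationMs(emulatorMs: Double, timingMultiplier: Double): Double =
    emulatorMs * timingMultiplier

// Step 2: true when an element meets the minimum size as measured on the emulator
// but falls below it once the per-device adjustment is applied.
// Assumption: the adjustment shrinks the effective touch target on the device.
fun passesOnEmulatorButFailsOnDevice(
    measuredDp: Int,
    adjustmentDp: Int,
    minimumDp: Int = 48
): Boolean {
    val passesOnEmulator = measuredDp >= minimumDp
    val passesOnDevice = measuredDp - adjustmentDp >= minimumDp
    return passesOnEmulator && !passesOnDevice
}
```

With a multiplier of 1.3, `expectedDeviceDurationMs(200.0, 1.3)` gives roughly 260 ms, matching the example in step 1. With an adjustment of 4 dp, a 50 dp element would be flagged, while a 56 dp element would not.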

Output: Modified ExecutionTrace with adjusted confidence scores and risk annotations passed to L3.
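One way the confidence adjustment could work is sketched below: each step's score is scaled down by the worst risk annotation attached to it. The types, the penalty values, and the multiplicative formula are all illustrative assumptions, since the document does not define how scores are reduced.

```kotlin
enum class Risk { LOW, MEDIUM, HIGH }

// Hypothetical minimal shapes for a scored test step and a risk annotation.
data class StepResult(val stepId: String, val confidence: Double)
data class RiskAnnotation(val stepId: String, val risk: Risk, val reason: String)

// Assumed penalties; real values would have to be derived from calibration data.
fun penaltyFor(risk: Risk): Double = when (risk) {
    Risk.LOW -> 0.0
    Risk.MEDIUM -> 0.1
    Risk.HIGH -> 0.3
}

// Scales each step's confidence by (1 - worst penalty among its annotations).
// Steps without annotations keep their original score.
fun applyAnnotations(
    steps: List<StepResult>,
    annotations: List<RiskAnnotation>
): List<StepResult> {
    val worstPenaltyByStep = annotations
        .groupBy { it.stepId }
        .mapValues { (_, group) -> group.maxOf { penaltyFor(it.risk) } }
    return steps.map { step ->
        val penalty = worstPenaltyByStep[step.stepId] ?: 0.0
        step.copy(confidence = (step.confidence * (1.0 - penalty)).coerceIn(0.0, 1.0))
    }
}
```

Using the worst annotation rather than the sum of all penalties keeps a step with several low-risk flags from being penalised more than a step with one high-risk flag. That is a design choice of this sketch, not a documented requirement.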

Integration with L2

The Device Twin integrates with the Execution layer through the same interface+adapter pattern used between all layers:

interface DeviceTwinService {
    suspend fun adjustResults(
        trace: ExecutionTrace,
        targetDevice: DeviceProfile
    ): AdjustedExecutionTrace
}

// Phase 1: No-op passthrough
class NoOpDeviceTwinService : DeviceTwinService {
    override suspend fun adjustResults(trace: ExecutionTrace, targetDevice: DeviceProfile) =
        AdjustedExecutionTrace(trace, adjustments = emptyList())
}

// Phase 2: Full prediction
class PredictiveDeviceTwinService(
    private val calibrationStore: CalibrationStore
) : DeviceTwinService { ... }

The config flag aucert.device-twin.enabled selects the implementation: false uses the no-op passthrough (Phase 1), and true uses the full prediction engine (Phase 2).
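The selection itself could be as small as the factory sketched here. To compile on its own, the sketch declares empty stand-ins for `ExecutionTrace` and `DeviceProfile` and a simplified `AdjustedExecutionTrace`; the real types belong to the pipeline. Reading the flag from a `Map`, defaulting to the no-op service when the flag is absent, and the name `selectDeviceTwinService` are assumptions.

```kotlin
// Stand-in types so the sketch is self-contained; not the pipeline's real definitions.
class ExecutionTrace
class DeviceProfile
data class AdjustedExecutionTrace(val trace: ExecutionTrace, val adjustments: List<String>)

interface DeviceTwinService {
    suspend fun adjustResults(trace: ExecutionTrace, targetDevice: DeviceProfile): AdjustedExecutionTrace
}

class NoOpDeviceTwinService : DeviceTwinService {
    override suspend fun adjustResults(trace: ExecutionTrace, targetDevice: DeviceProfile) =
        AdjustedExecutionTrace(trace, adjustments = emptyList())
}

// Hypothetical factory: returns the predictive service only when the flag is "true".
// The predictive service is supplied lazily so it is never built while the flag is off.
fun selectDeviceTwinService(
    properties: Map<String, String>,
    predictive: () -> DeviceTwinService
): DeviceTwinService {
    val enabled = properties["aucert.device-twin.enabled"].equals("true", ignoreCase = true)
    return if (enabled) predictive() else NoOpDeviceTwinService()
}
```

Because both implementations share the `DeviceTwinService` interface, callers in L2 would not need to know which one is active.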

Divergence categories

| Category | Risk level | Detection method | Example |
| --- | --- | --- | --- |
| Timing | Medium | Latency comparison | Animation stutters on low-end Snapdragon 400 series |
| Touch targets | High | Size calculation + DPI adjustment | 44dp button passes on emulator, fails 48dp minimum on device |
| GPU rendering | Low | Known diff database | Gradient banding on ARM Mali GPUs |
| Network | Medium | Latency injection simulation | Timeout handling under 3G conditions |
| OS skin | High | Manufacturer-specific test runs | Samsung One UI permission dialog differs from stock Android |
| Sensor | Low | Mock data quality assessment | GPS accuracy affects map pin placement test |

Data collection strategy

Phase 2 requires paired test runs to build calibration profiles. The collection strategy has four parts:

  1. Device farm integration — Partner with cloud device farms (AWS Device Farm, Firebase Test Lab) to run a subset of tests on physical devices
  2. Calibration frequency — Run paired tests weekly, or when a new OS version is detected
  3. Device family coverage — Start with the 5 most popular Android device families (by global market share), expand based on customer device analytics
  4. Minimum data — Require 50+ paired runs per device family before enabling prediction for that family
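The calibration frequency and minimum data rules can be expressed as two small checks, sketched below. `CalibrationStatus` and both function names are hypothetical. The seven-day threshold is one reading of "weekly", and comparing OS version strings is an assumed way to detect a new OS version.

```kotlin
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical summary of the latest calibration for one device family.
data class CalibrationStatus(
    val pairedRunsCount: Int,
    val calibrationDate: LocalDate,
    val calibratedOsVersion: String
)

// Minimum data rule: prediction stays off until enough paired runs exist.
fun predictionEnabled(status: CalibrationStatus, minPairedRuns: Int = 50): Boolean =
    status.pairedRunsCount >= minPairedRuns

// Frequency rule: recalibrate after a week, or as soon as the OS version changes.
fun recalibrationDue(
    status: CalibrationStatus,
    today: LocalDate,
    detectedOsVersion: String
): Boolean {
    val daysSince = ChronoUnit.DAYS.between(status.calibrationDate, today)
    return daysSince >= 7 || detectedOsVersion != status.calibratedOsVersion
}
```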

MVP status

| Component | Phase 1 | Phase 2 |
| --- | --- | --- |
| Emulator execution | Built (Android via ADB) | Unchanged |
| Calibration model | Not started | Paired test runs + regression model |
| Prediction engine | No-op passthrough | Full prediction with per-device profiles |
| Confidence adjustment | Not applied | Reduces scores for high-risk divergences |
| Device farm integration | Not started | AWS Device Farm or Firebase Test Lab |
| Supported devices | Emulator only | Top 5 Android device families |

What's next