Device Twin architecture
The AI Device Twin bridges the gap between emulator and real-device behavior by applying predictive models to emulator execution results. It operates as a sidecar to the Execution layer (L2), processing results before they reach Analysis (L3).
The Device Twin is designed but not yet built: the Phase 1 pipeline uses direct emulator execution with no prediction overlay. This document describes the architecture planned for Phase 2 (Months 4-6).
Architecture
The Device Twin has two operating phases: calibration (offline, periodic) and prediction (online, once per test run). These are distinct from the Phase 1 and Phase 2 rollout milestones.
Calibration phase
Calibration runs offline by comparing paired test runs — the same test scenario executed on both an emulator and a physical device.
Input: Paired ExecutionTrace records (one emulator, one device) for the same TestScenario
Process:
- Screenshot diff — Structural comparison of emulator vs device screenshots per step, identifying layout shifts, rendering differences, and timing gaps
- Timing analysis — Compare action-to-render latency between emulator and device, building per-device-family timing models
- Interaction diff — Compare touch target responsiveness, scroll behavior, and gesture handling
- Model training — Feed divergence patterns into a lightweight regression model that predicts device behavior from emulator results
Output: A per-device-family calibration profile stored as a JSON document:
{
  "device_family": "pixel-7",
  "os_version": "android-14",
  "timing_multiplier": 1.3,
  "touch_target_adjustment_dp": 4,
  "known_rendering_diffs": [
    {
      "element_type": "gradient_background",
      "severity": "low",
      "description": "LinearGradient renders with banding on Mali GPU"
    }
  ],
  "calibration_date": "2026-04-07",
  "paired_runs_count": 50
}
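To illustrate how the timing analysis could yield the timing_multiplier field, the sketch below fits one multiplier to paired latencies using least squares through the origin. The PairedLatency type and the fitTimingMultiplier function are hypothetical names, and the one-parameter fit is an assumption standing in for the planned regression model, which this document does not specify.

```kotlin
// Hypothetical sketch: names and fitting method are assumptions, not the planned implementation.

/** Action-to-render latency for one test step, measured on both targets. */
data class PairedLatency(val emulatorMs: Double, val deviceMs: Double)

/**
 * Fits deviceMs = m * emulatorMs by least squares through the origin,
 * which gives m = sum(e * d) / sum(e * e).
 */
fun fitTimingMultiplier(pairs: List<PairedLatency>): Double {
    require(pairs.isNotEmpty()) { "need at least one paired measurement" }
    val numerator = pairs.sumOf { it.emulatorMs * it.deviceMs }
    val denominator = pairs.sumOf { it.emulatorMs * it.emulatorMs }
    require(denominator > 0.0) { "emulator latencies must not all be zero" }
    return numerator / denominator
}
```

With paired latencies where the device is consistently 30% slower than the emulator, the fitted value comes out at approximately 1.3, matching the example profile above.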
Prediction phase
Prediction runs online during every test execution, processing emulator results before they reach L3 Analysis.
Input: ExecutionTrace from L2 + calibration profile for target device family
Process:
- Timing adjustment — Apply timing_multiplier to animation and transition expectations. If an animation takes 200ms on emulator and the multiplier is 1.3x, expect 260ms on device.
- Touch target analysis — Check if interactive elements meet minimum touch target size (48dp) when adjusted by touch_target_adjustment_dp. Flag elements that pass on emulator but would fail on device.
- Rendering risk assessment — Check screenshots against known_rendering_diffs for the target device family. Flag potential visual issues.
- Confidence scoring — Reduce confidence scores for test steps where predictions indicate high divergence risk.
Output: Modified ExecutionTrace with adjusted confidence scores and risk annotations passed to L3.
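The timing, touch target, and confidence steps above can be sketched for a single test step as follows. Every type here (Profile, StepResult, AdjustedStep) is a hypothetical stand-in. Two behaviors are assumptions that the document does not state: the adjustment is subtracted from the emulator-measured size, and a flagged risk lowers confidence by a fixed penalty of 0.2. Rendering risk assessment is left out because it depends on screenshot analysis.

```kotlin
// Hypothetical sketch: all types, the subtraction rule, and the penalty value are assumptions.

data class Profile(val timingMultiplier: Double, val touchTargetAdjustmentDp: Int)

data class StepResult(
    val animationMs: Double,   // animation duration observed on the emulator
    val touchTargetDp: Int,    // smallest interactive element measured on the emulator
    val confidence: Double     // confidence score in [0.0, 1.0]
)

data class AdjustedStep(
    val expectedDeviceAnimationMs: Double,
    val touchTargetAtRisk: Boolean,
    val confidence: Double
)

const val MIN_TOUCH_TARGET_DP = 48

fun adjustStep(step: StepResult, profile: Profile, riskPenalty: Double = 0.2): AdjustedStep {
    // Timing adjustment: scale the emulator duration by the calibrated multiplier.
    val expectedMs = step.animationMs * profile.timingMultiplier

    // Touch target analysis: flag elements that pass on the emulator
    // but fall below the minimum once the adjustment is subtracted.
    val effectiveDp = step.touchTargetDp - profile.touchTargetAdjustmentDp
    val atRisk = step.touchTargetDp >= MIN_TOUCH_TARGET_DP && effectiveDp < MIN_TOUCH_TARGET_DP

    // Confidence scoring: lower the score when a divergence risk was flagged.
    val confidence =
        if (atRisk) (step.confidence - riskPenalty).coerceAtLeast(0.0) else step.confidence

    return AdjustedStep(expectedMs, atRisk, confidence)
}
```

Under these assumptions, a step with a 200ms animation and a 50dp target, adjusted with a multiplier of 1.3 and a 4dp adjustment, is expected to take about 260ms on device and is flagged because its effective size drops to 46dp.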
Integration with L2
The Device Twin integrates with the Execution layer through the same interface+adapter pattern used between all layers:
interface DeviceTwinService {
    suspend fun adjustResults(
        trace: ExecutionTrace,
        targetDevice: DeviceProfile
    ): AdjustedExecutionTrace
}

// Phase 1: No-op passthrough
class NoOpDeviceTwinService : DeviceTwinService {
    override suspend fun adjustResults(trace: ExecutionTrace, targetDevice: DeviceProfile) =
        AdjustedExecutionTrace(trace, adjustments = emptyList())
}

// Phase 2: Full prediction
class PredictiveDeviceTwinService(
    private val calibrationStore: CalibrationStore
) : DeviceTwinService { ... }
The config flag aucert.device-twin.enabled selects the implementation: false uses the no-op passthrough (Phase 1), and true uses the full prediction engine (Phase 2).
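One way the flag could drive the choice of implementation is a small factory function. This is a sketch only: selectDeviceTwin is a hypothetical name, the config is assumed to arrive as a string map, the domain types are empty stand-ins, and the adjustments list is assumed to hold strings so the example compiles on its own.

```kotlin
// Hypothetical sketch: simplified stand-ins for the types referenced in the interface.
class ExecutionTrace
class DeviceProfile
class CalibrationStore
data class AdjustedExecutionTrace(val trace: ExecutionTrace, val adjustments: List<String>)

interface DeviceTwinService {
    suspend fun adjustResults(trace: ExecutionTrace, targetDevice: DeviceProfile): AdjustedExecutionTrace
}

class NoOpDeviceTwinService : DeviceTwinService {
    override suspend fun adjustResults(trace: ExecutionTrace, targetDevice: DeviceProfile) =
        AdjustedExecutionTrace(trace, adjustments = emptyList())
}

class PredictiveDeviceTwinService(
    private val calibrationStore: CalibrationStore
) : DeviceTwinService {
    override suspend fun adjustResults(
        trace: ExecutionTrace,
        targetDevice: DeviceProfile
    ): AdjustedExecutionTrace = TODO("prediction logic is out of scope for this sketch")
}

/** Returns the no-op service unless the flag is present and set to "true". */
fun selectDeviceTwin(config: Map<String, String>, store: CalibrationStore): DeviceTwinService {
    val enabled = config["aucert.device-twin.enabled"].toBoolean()
    return if (enabled) PredictiveDeviceTwinService(store) else NoOpDeviceTwinService()
}
```

Defaulting to the passthrough when the flag is missing keeps Phase 1 behavior unchanged for deployments that never set it; that default is a design choice of this sketch, not something the document specifies.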
Divergence categories
| Category | Risk level | Detection method | Example |
|---|---|---|---|
| Timing | Medium | Latency comparison | Animation stutters on low-end Snapdragon 400 series |
| Touch targets | High | Size calculation + DPI adjustment | 44dp button passes on emulator, fails 48dp minimum on device |
| GPU rendering | Low | Known diff database | Gradient banding on ARM Mali GPUs |
| Network | Medium | Latency injection simulation | Timeout handling under 3G conditions |
| OS skin | High | Manufacturer-specific test runs | Samsung One UI permission dialog differs from stock Android |
| Sensor | Low | Mock data quality assessment | GPS accuracy affects map pin placement test |
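If the categories in the table need to be referenced from code, they could be encoded as an enum. The sketch below copies the risk levels and detection methods from the table; the type names and the categoriesAtOrAbove helper are hypothetical.

```kotlin
// Hypothetical sketch: type and function names are assumptions; values mirror the table.

enum class RiskLevel { LOW, MEDIUM, HIGH }

enum class DivergenceCategory(val risk: RiskLevel, val detectionMethod: String) {
    TIMING(RiskLevel.MEDIUM, "Latency comparison"),
    TOUCH_TARGETS(RiskLevel.HIGH, "Size calculation + DPI adjustment"),
    GPU_RENDERING(RiskLevel.LOW, "Known diff database"),
    NETWORK(RiskLevel.MEDIUM, "Latency injection simulation"),
    OS_SKIN(RiskLevel.HIGH, "Manufacturer-specific test runs"),
    SENSOR(RiskLevel.LOW, "Mock data quality assessment")
}

/** Lists categories whose risk level is at least the given minimum, in declaration order. */
fun categoriesAtOrAbove(minimum: RiskLevel): List<DivergenceCategory> =
    DivergenceCategory.values().filter { it.risk >= minimum }
```

Because RiskLevel is declared from lowest to highest, the comparison relies on enum ordinal order.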
Data collection strategy
Phase 2 requires paired test runs to build calibration profiles. The collection strategy:
- Device farm integration — Partner with cloud device farms (AWS Device Farm, Firebase Test Lab) to run a subset of tests on physical devices
- Calibration frequency — Run paired tests weekly, or when a new OS version is detected
- Device family coverage — Start with the 5 most popular Android device families (by global market share), expand based on customer device analytics
- Minimum data — Require 50+ paired runs per device family before enabling prediction for that family
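The calibration frequency and minimum data rules above could be checked with two small predicates. This is a sketch under assumptions: CalibrationStatus is a hypothetical type holding three fields from the calibration profile, "weekly" is interpreted as seven or more days since the last calibration, and a new OS version is detected by simple string comparison.

```kotlin
// Hypothetical sketch: type names and the interpretation of "weekly" are assumptions.
import java.time.LocalDate
import java.time.temporal.ChronoUnit

data class CalibrationStatus(
    val osVersion: String,
    val calibrationDate: LocalDate,
    val pairedRunsCount: Int
)

const val MIN_PAIRED_RUNS = 50
const val RECALIBRATION_INTERVAL_DAYS = 7L

/** Prediction is enabled for a device family only once enough paired runs exist. */
fun predictionAllowed(status: CalibrationStatus): Boolean =
    status.pairedRunsCount >= MIN_PAIRED_RUNS

/** Recalibration is due after a week, or as soon as a different OS version is detected. */
fun recalibrationDue(
    status: CalibrationStatus,
    today: LocalDate,
    detectedOsVersion: String
): Boolean {
    val ageDays = ChronoUnit.DAYS.between(status.calibrationDate, today)
    return ageDays >= RECALIBRATION_INTERVAL_DAYS || detectedOsVersion != status.osVersion
}
```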
MVP status
| Component | Phase 1 | Phase 2 |
|---|---|---|
| Emulator execution | Built — Android via ADB | Unchanged |
| Calibration model | Not started | Paired test runs + regression model |
| Prediction engine | No-op passthrough | Full prediction with per-device profiles |
| Confidence adjustment | Not applied | Reduces scores for high-risk divergences |
| Device farm integration | Not started | AWS Device Farm or Firebase Test Lab |
| Supported devices | Emulator only | Top 5 Android device families |
What's next
- 5-layer deep dive — Full pipeline architecture
- Verification Cascade — Multi-stage verification that uses Device Twin confidence scores