# ADR-004: ML as top-level workspace
## Context
The ML component (Python 3.12+) trains and serves models for test generation, analysis, and decision-making. It could live inside backend/ as a subdirectory, as a separate repository, or as a top-level workspace.
AI agents working on the codebase need visibility into ML code — understanding what models exist, their training pipelines, and their API contracts. Co-location in the monorepo is required (ADR-001). The question is where in the monorepo it belongs.
## Decision

Place ML code as a top-level workspace at `ml/` with its own `pyproject.toml`, managed by uv. It is a peer to `backend/`, `frontend/`, and `infra/`, not nested inside any of them.
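A minimal sketch of the workspace wiring, assuming uv's workspace feature is used and the repo root holds a `pyproject.toml` (the file contents below are illustrative, not taken from the repo; `ml/pyproject.toml` would carry its own `[project]` table):

```toml
# Repo-root pyproject.toml (sketch): registers ml/ as a uv workspace member.
[tool.uv.workspace]
members = ["ml"]
```

With this in place, `uv sync` run from the root resolves the `ml/` project without touching the Gradle- or pnpm-managed workspaces.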
## Structure

```
ml/
  pyproject.toml        # uv-managed dependencies
  .context/ML.md        # L2 context file
  models/               # Per-model directories
    {model}/
      spec/             # L3 context
      train.py
      evaluate.py
      config.yaml
  tests/
    unit/
    integration/
```
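As a sketch of how tooling (or an agent) might enumerate models under this layout; the helper name and the use of `config.yaml` as the marker file are assumptions, not part of the decision:

```python
from pathlib import Path


def discover_models(ml_root: Path) -> list[str]:
    """List model names: subdirectories of ml/models/ containing a config.yaml.

    Hypothetical helper; the real repo may discover models differently.
    """
    models_dir = ml_root / "models"
    if not models_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in models_dir.iterdir()
        if entry.is_dir() and (entry / "config.yaml").is_file()
    )
```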
## Alternatives considered

| Option | Pros | Cons |
|---|---|---|
| Top-level `ml/` (chosen) | Clear separation, own dependency management, visible to agents at L1 scan | Slightly more complex workspace config |
| Inside `backend/ml/` | Closer to serving code | Python/Kotlin dependency confusion, clutters backend context |
| Separate repository | Independent CI/CD | Breaks agent context (ADR-001), version sync overhead |
## Consequences

### What becomes easier

- ML engineers work in a familiar Python project structure
- uv manages Python deps without conflicting with Gradle/pnpm
- Bazel can build ML targets alongside backend/frontend
- Agent context: `ml/.context/ML.md` is a standard L2 file
### What becomes harder
- Proto contract changes require rebuilding ML generated types
- ML deployment (serving) needs its own container or sidecar pattern
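For the standalone-container option, one possible shape is a serving image built from the `ml/` workspace. A minimal sketch, assuming uv handles installation and that a `serve` entrypoint module exists (the entrypoint, lockfile name, and paths are all hypothetical):

```dockerfile
# Sketch of a serving container for the ml/ workspace.
# The serve module and copied paths are assumptions, not taken from the repo.
FROM python:3.12-slim
WORKDIR /app
COPY ml/pyproject.toml ml/uv.lock ./
RUN pip install --no-cache-dir uv && uv sync --frozen --no-dev
COPY ml/ .
CMD ["uv", "run", "python", "-m", "serve"]
```

Copying the manifest and lockfile before the source keeps dependency installation in a cached layer, so routine code changes do not re-resolve dependencies.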