ADR-004: ML as top-level workspace

Context

The ML component (Python 3.12+) trains and serves models for test generation, analysis, and decision-making. It could live inside backend/ as a subdirectory, as a separate repository, or as a top-level workspace.

AI agents working on the codebase need visibility into ML code — understanding what models exist, their training pipelines, and their API contracts. Co-location in the monorepo is required (ADR-001). The question is where in the monorepo it belongs.

Decision

Place ML code as a top-level workspace at ml/ with its own pyproject.toml, managed by uv. It is a peer to backend/, frontend/, and infra/ — not nested inside any of them.
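As a rough illustration, the workspace's manifest might look like the following sketch. All package names, versions, and group contents here are assumptions, not the project's actual dependencies:

```toml
# Hypothetical sketch of ml/pyproject.toml (uv-managed); contents are illustrative.
[project]
name = "ml"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "numpy>=2.0",
]

[dependency-groups]
dev = [
    "pytest>=8.0",
]
```

Because the manifest lives at ml/ rather than inside backend/, uv resolves and locks these dependencies independently of the Gradle and pnpm builds.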

Structure

ml/
  pyproject.toml        # uv-managed dependencies
  .context/ML.md        # L2 context file
  models/               # Per-model directories
    {model}/
      spec/             # L3 context
      train.py
      evaluate.py
      config.yaml
  tests/
    unit/
    integration/

Alternatives considered

  • Top-level ml/ (chosen). Pros: clear separation, own dependency management, visible to agents at L1 scan. Cons: slightly more complex workspace config.
  • Inside backend/ml/. Pros: closer to serving code. Cons: Python/Kotlin dependency confusion, clutters backend context.
  • Separate repository. Pros: independent CI/CD. Cons: breaks agent context (ADR-001), version sync overhead.

Consequences

What becomes easier

  • ML engineers work in a familiar Python project structure
  • uv manages Python deps without conflicting with Gradle/pnpm
  • Bazel can build ML targets alongside backend/frontend
  • Agent context: ml/.context/ML.md is a standard L2 file

What becomes harder

  • Proto contract changes require regenerating the ML workspace's generated types
  • ML deployment (serving) needs its own container or sidecar pattern