ADR-004: ML as top-level workspace

Context

The ML component (Python 3.12+) trains and serves models for test generation, analysis, and decision-making. It could live inside backend/ as a subdirectory, as a separate repository, or as a top-level workspace.

AI agents working on the codebase need visibility into ML code — understanding what models exist, their training pipelines, and their API contracts. Co-location in the monorepo is required (ADR-001). The question is where in the monorepo it belongs.

Decision

Place ML code as a top-level workspace at ml/ with its own pyproject.toml, managed by uv. It is a peer to backend/, frontend/, and infra/ — not nested inside any of them.
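As a rough illustration, the workspace's manifest might look like the following sketch. All package names, versions, and group contents here are assumptions, not the project's actual dependencies:

```toml
# Hypothetical sketch of ml/pyproject.toml (uv-managed); contents are illustrative.
[project]
name = "ml"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "numpy>=2.0",
]

[dependency-groups]
dev = [
    "pytest>=8.0",
]
```

Because the manifest lives at ml/ rather than inside backend/, uv resolves and locks these dependencies independently of the Gradle and pnpm builds.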

Structure

ml/
  pyproject.toml        # uv-managed dependencies
  .context/ML.md        # L2 context file
  models/               # Per-model directories
    {model}/
      spec/             # L3 context
      train.py
      evaluate.py
      config.yaml
  tests/
    unit/
    integration/

Alternatives considered

  • Top-level ml/ (chosen). Pros: clear separation, own dependency management, visible to agents at L1 scan. Cons: slightly more complex workspace config.
  • Inside backend/ml/. Pros: closer to serving code. Cons: Python/Kotlin dependency confusion, clutters backend context.
  • Separate repository. Pros: independent CI/CD. Cons: breaks agent context (ADR-001), version sync overhead.

Consequences

What becomes easier

  • ML engineers work in a familiar Python project structure
  • uv manages Python deps without conflicting with Gradle/pnpm
  • Bazel can build ML targets alongside backend/frontend
  • Agent context: ml/.context/ML.md is a standard L2 file

What becomes harder

  • Proto contract changes require regenerating the ML workspace's generated types
  • ML deployment (serving) needs its own container or sidecar pattern