Spec agent v0.1 — handover (2026-05-01)
TL;DR
The spec agent (atlas) runs end-to-end on production infrastructure. A smoke run completes in ~60 seconds, processes ~150K input tokens, calls real tools against the cloned codebase, and produces a publication-quality summary persisted to task_events.metadata.output_payload. The full plumbing chain (Bedrock LLM → 51 tools → Astra HTTP API → Postgres → Temporal workflow) is operational.
The next milestone is replacing the manual smoke trigger with a real event source — webhooks from Google Docs (and Slack / GitHub / Plane) flowing through the dispatcher service, into Temporal, into the spec agent, ending with the agent posting resolutions back to the originating doc.
Status snapshot
| Component | Status | Notes |
|---|---|---|
| `astra-backend` (Ktor, port 8081) | ✅ Live | Image: `aucertacr41e0x5.azurecr.io/astra-backend:latest` |
| `astra-frontend` (Next.js console) | ✅ Live | Image: `aucertacr41e0x5.azurecr.io/astra-frontend:latest` |
| `astra-proxy` (nginx) | ✅ Live | Routes `/api/` → backend, `/` → frontend |
| `dispatcher` (Ktor, webhook → Temporal starter) | ✅ Live | 43h+ uptime; manual + webhook paths exercised end-to-end through public Cloudflare tunnel (verified 2026-05-01); `/health` + `/webhooks/*` reachable, `/api/tasks` gated by CF Access |
| `spec-agent-worker` (Temporal worker) | ✅ Live | Polls `spec-agent-queue`; ran iter 20 to completion |
| Temporal cluster (`temporal.aucert.dev`) | ✅ Live | UI at https://temporal.aucert.dev |
| Plane (project tracker) | ✅ Live | Various plane-* deployments |
| Cloudflare tunnel (`cloudflared`) | ✅ Live | Public ingress for Astra, Temporal, Plane, dispatcher |
| `astra_db` schema (Postgres) | ✅ Migrated | 13 Flyway migrations applied (V001–V013) |
| Bedrock model access | ✅ Granted | Sonnet 4.6, Opus 4.7, Kimi K2.5, GLM 4.7 |
| AWS IAM admin user | ✅ Created | vivek.soneja with AdministratorAccess, MFA, CLI keys |
What's working today
Spec agent end-to-end
Atlas can be invoked via the smoke script and will:
- Bootstrap: clone the repo into `/workspace/aucert` using a freshly minted GitHub App installation token, register all 51 tools, fetch agent metadata + composable personalities from `astra_db`, build the system prompt.
- Pick the right model per operation: Sonnet 4.6 (`us.anthropic.claude-sonnet-4-6`) for conversational tasks, Opus 4.7 (`us.anthropic.claude-opus-4-7`) for `spec_finalize` / `spec_generate` / `spec_synthesize`. Per-operation routing is in `SpecAgentConfig.resolveModel(operation)`.
- Tag every response with a model label: `[S46]`, `[O47]`, `[K26]` (and any future label registered in `ModelLabels.kt`) at the start of every assistant turn, enforced via system-prompt instruction with code-level fallback in `doneOutputPayload`. Operators can request a specific model via tags in Drive comments: `[kimi]`, `[opus]`, `[opus-direct]`, or multi-tag `[kimi][opus]` for parallel runs. Full reference: Model routing and operator labels.
- Dispatch tools through the registry: local repo (read-only filesystem ops), GitHub, Slack, Plane, Google Docs/Drive, Brave web search, knowledge base vector search, embeddings.
- Persist lifecycle to Astra: `fetchTaskContext` → `updateTaskStatus(IN_PROGRESS)` → `logTaskEvent("agent_started")` → run loop → `completeTaskRun(outcome="success")` → `logTaskEvent("agent_completed", { output_payload, model_used, ... })`.
- Be Temporal-retry-safe: failures during the activity body do not push the row to a terminal state; Temporal can retry the activity any number of times without state-machine 422s.
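The per-operation routing and the label fallback described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Kotlin code in `SpecAgentConfig.resolveModel` or `doneOutputPayload`: the helper names `resolve_model` and `ensure_label` and the dictionary layout are assumptions, while the model IDs, operation names, and labels are taken from the list above.

```python
# Illustrative sketch only; the real routing lives in Kotlin (SpecAgentConfig).
SONNET_46 = "us.anthropic.claude-sonnet-4-6"
OPUS_47 = "us.anthropic.claude-opus-4-7"

# Operations the handover says are routed to Opus 4.7.
OPUS_OPERATIONS = {"spec_finalize", "spec_generate", "spec_synthesize"}

# Model ID -> label prefix, as listed in the handover.
MODEL_LABELS = {SONNET_46: "[S46]", OPUS_47: "[O47]"}


def resolve_model(operation: str) -> str:
    """Return the model ID for an operation (hypothetical helper name)."""
    return OPUS_47 if operation in OPUS_OPERATIONS else SONNET_46


def ensure_label(model_id: str, text: str) -> str:
    """Code-level fallback: prepend the label only if the model omitted it."""
    label = MODEL_LABELS[model_id]
    return text if text.startswith(label) else f"{label} {text}"
```

Keeping the fallback idempotent means a response the model already labelled correctly passes through unchanged.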
Iter 20 verification
Smoke iter 20 produced this row in astra_db.task_runs:
id = 028a4f80-648a-4f16-bd6b-2fa171d97bce
outcome = success
duration_ms = 59,587
tokens_input = 147,398
tokens_output = 2,590
output_type = inline
And atlas's actual output (in task_events.metadata.output_payload for the agent_completed event) was a publication-quality 12-spec summary including a real bug it caught — LocalRepoClient.listAdrs() scans docs/adrs/ instead of the actual .context/decisions/. Spawn-task chip queued for the fix.
The 8-PR journey
The smoke loop took 20 iterations and 8 PRs to close. Each PR fixed one structural bug surfaced by the previous iteration:
| PR | Layer | Triggered by |
|---|---|---|
| #52 | Server-side GET /api/internal/astra/v1/task-runs/{id}/context route was missing | iter 13 |
| #53 | updateTaskStatus URL/body/enum mismatched server contract | iter 14 |
| #54 | AstraClient was hitting /api/... not /api/internal/astra/v1/... | iter 15 |
| #55 | task_events table was never created (V012 migration) | iter 16 |
| #56 | Activity body marked terminal state, breaking Temporal retries | iter 17 |
| #57 | Bedrock model IDs wrong format + per-operation routing + [S46]/[O47] labels | iter 17 |
| #58 | completeTaskRun POST→PATCH + body shape | iter 18 |
| #59 | Bedrock SDK socket/attempt timeouts too tight for 88K-token calls | iter 19 |
Next milestone — end-to-end webhook flow
The smoke script is a manual trigger that bypasses the dispatcher. The real flow is:
Google Docs comment
↓ (webhook)
dispatcher.aucert.dev
↓ (Temporal workflow start)
SpecAgentWorkflow on spec-agent-queue
↓ (activity dispatch)
spec-agent-worker → SpecAgentExecutor → AgentLoop → Bedrock
↓ (tool calls)
GoogleDocsClient.postComment / GoogleDocsClient.replyToThread
↓
Comment resolution visible in the originating Google Doc
All four webhook handlers (Slack/Drive/GitHub/Plane) are shipped in
internal/dispatcher/src/main/kotlin/dev/aucert/internal/dispatcher/routing/WebhookRoutes.kt.
SignatureValidator does HMAC-SHA256 for Slack and GitHub. The public webhook → DB → Temporal → agent → outcome
chain was verified end-to-end on 2026-05-01: a probe POST to
https://dispatcher.aucert.dev/webhooks/drive (empty body, no headers) produced a real workflow that ran to
outcome=success in 19s (30K input tokens, 647 output, ~$0.10 Bedrock spend). Manual trigger endpoint is
POST /api/tasks (auth: X-API-Key header); exercised 6+ times with 201 responses in the same window.
SPEC-021 documents the design (Wave 5: W5-A standalone Ktor service, W5-B webhook handlers for Slack/Google Drive/GitHub/Plane, W5-C Dockerfile + manifests + tunnel config).
Three known dispatcher issues discovered during the probe run (hardening pass pending, #62):
1. `WebhookRoutes` drops the classifier's `initialMessage` from `trigger_event`; the agent currently sees only raw headers/body rather than a synthesised description of the event.
2. `DriveWebhookHandler` dispatches on empty/missing resource headers; the empty-body probe accidentally created a real workflow.
3. `DispatcherConfig` treats empty-string K8s secrets as "validation enabled"; `SLACK_SIGNING_SECRET` and `GITHUB_WEBHOOK_SECRET` are currently 0-byte placeholders, so HMAC validation fails for those sources.
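Two of the issues above (the empty-header dispatch and the empty-string secret) come down to treating "present but empty" as "present". A minimal Python sketch of the guards, under the assumption that headers and environment arrive as plain string mappings; the function names are hypothetical and this is not the Kotlin code in `DriveWebhookHandler` or `DispatcherConfig`:

```python
from typing import Mapping, Optional


def signing_secret(env: Mapping[str, str], name: str) -> Optional[str]:
    """Treat a missing, empty, or whitespace-only secret as 'not configured'."""
    value = env.get(name, "").strip()
    return value or None


def should_dispatch_drive(headers: Mapping[str, str]) -> bool:
    """Dispatch only when Drive's resource headers are present and non-empty."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    lowered = {key.lower(): value for key, value in headers.items()}
    required = ("x-goog-resource-state", "x-goog-resource-id")
    return all(lowered.get(name, "").strip() for name in required)
```

Whether an unconfigured secret should skip validation or reject the request outright is a policy choice for the hardening PR; the sketch only separates "unset" from "set".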
What needs to happen, in order
1. Webhook source configuration (per source, ~30 min each)
Each source needs to be told to POST to https://dispatcher.aucert.dev/webhooks/<source> with appropriate
authentication. Handler code is already deployed; this is external configuration only.
Google Docs / Drive:
- Create a Drive `changes.watch` subscription targeting the docs you want monitored (or set up at the folder level for a topic-folder).
- Subscription POSTs to `https://dispatcher.aucert.dev/webhooks/google-drive` with `X-Goog-Resource-State`, `X-Goog-Channel-Token` headers.
- Drive subscriptions expire (max 7 days for change watches). Need a renewal job, either a Temporal cron or a K8s CronJob, that re-creates the subscription before expiry. Out of scope for v0.1; renew manually until v0.2.
- Reference: SPEC-021 §W5-B for handler details.
- Prerequisite: apply the dispatcher hardening PR first so empty-header probe traffic no longer triggers real workflows.
Slack:
- Slack app → Event Subscriptions → Request URL: `https://dispatcher.aucert.dev/webhooks/slack`. Slack POSTs a URL verification challenge first; the handler echoes `challenge` back.
- Subscribe to `app_mention`, `message.channels`, `message.im`, `reaction_added` (for `:atlas:` reactions as a trigger).
- Slack signs every webhook with HMAC-SHA256 using your signing secret. Handler validates `X-Slack-Signature` against `v0:<timestamp>:<body>`.
- Prerequisite: populate `SLACK_SIGNING_SECRET` in `dispatcher-secrets` (currently a 0-byte placeholder).
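For reference, the Slack check described above can be sketched in Python as below. It is an illustration of Slack's published signing scheme, not the dispatcher's `SignatureValidator`; the function name, the `now` parameter (included so the check is testable), and the 5-minute skew window are assumptions of this sketch.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_slack_signature(secret: str, timestamp: str, body: bytes,
                           signature: str, now: Optional[float] = None,
                           max_skew_s: int = 300) -> bool:
    """Check X-Slack-Signature against HMAC-SHA256 of v0:<timestamp>:<body>."""
    if not secret:
        return False  # an empty secret must never validate anything
    current = time.time() if now is None else now
    try:
        if abs(current - int(timestamp)) > max_skew_s:
            return False  # stale or replayed request
    except ValueError:
        return False  # non-numeric timestamp header
    base = b"v0:" + timestamp.encode() + b":" + body
    expected = "v0=" + hmac.new(secret.encode(), base, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The `timestamp` argument is the value of the `X-Slack-Request-Timestamp` header, and `body` must be the raw request bytes, read before any JSON parsing.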
GitHub:
- For each watched repo: Settings → Webhooks → Add webhook → Payload URL `https://dispatcher.aucert.dev/webhooks/github`, secret = HMAC key, events = `pull_request`, `pull_request_review_comment`, `issue_comment`, `issues`.
- The GitHub App you already provisioned can also push events org-wide via App-level webhooks if you want one-time setup vs per-repo.
- Validate `X-Hub-Signature-256` against the secret.
- Prerequisite: populate `GITHUB_WEBHOOK_SECRET` in `dispatcher-secrets` (currently a 0-byte placeholder).
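The GitHub check differs from Slack's in that the HMAC covers the raw body only and the header carries a `sha256=` prefix. A minimal Python sketch of that scheme, with a hypothetical function name and again not the dispatcher's actual Kotlin validator:

```python
import hashlib
import hmac


def verify_github_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Check X-Hub-Signature-256 against HMAC-SHA256 of the raw request body."""
    if not secret or not header_value.startswith("sha256="):
        return False  # unconfigured secret or malformed header
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison, as with the Slack check.
    return hmac.compare_digest(expected.encode(), header_value.encode())
```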
Plane:
- Plane workspace settings → webhooks → URL `https://dispatcher.aucert.dev/webhooks/plane`, events = issue created/updated/commented.
2. Verify dispatcher → Temporal hop
Verified as of 2026-05-01: manual trigger exercised 6+ times with 201 responses, each kicking a real
SpecAgentWorkflow on spec-agent-queue. A Drive probe confirmed the full DB-persistence and workflow-start path.
To re-run manually from inside the cluster:
# From inside the cluster (Cloudflare Access bypass):
kubectl run curl-test --rm -i --restart=Never -n internal-platform --image=curlimages/curl:8.11.1 -- \
curl -sS -X POST http://dispatcher.internal-platform.svc.cluster.local/api/tasks \
-H "X-API-Key: $(kubectl get secret dispatcher-secrets -n internal-platform -o jsonpath='{.data.API_KEY}' | base64 -d)" \
-H 'Content-Type: application/json' \
-d '{"message":"Manual dispatcher → workflow test"}'
Note: hitting https://dispatcher.aucert.dev/api/tasks from outside the cluster requires a Cloudflare Access
service token — not yet configured. The /webhooks/* paths are CF Access bypassed for HMAC-authenticated callers.
3. End-to-end Google Docs round trip
Once the dispatcher is verified and Google Drive webhooks are configured:
Setup (one-time):
- Pick a test doc — e.g. a draft spec in your Drive.
- Create a Drive `changes.watch` subscription targeting that doc (or its parent folder). Save the subscription ID for cleanup.
- Confirm the subscription is active: the Drive API equivalent of `aws sts get-caller-identity` is a call to `https://www.googleapis.com/drive/v3/changes/watch` returning a 200.
Test:
- Open the test doc in Google Docs as a real user.
- Add a comment that mentions atlas — e.g. "@atlas can you summarise the open questions in this doc?"
- Within seconds, the dispatcher should receive the webhook. Verify in dispatcher logs.
- Within ~30 seconds, the workflow should be visible in Temporal UI under `aucert-default` namespace, ID matching the doc + comment ID.
- Within ~60–120 seconds, atlas should:
  - Read the doc via `GoogleDocsClient`.
  - Generate a response keyed to the comment.
  - Post the response as a reply on the comment via `GoogleDocsClient.replyToComment` (or whatever the tool is named in `agents/spec/tools/`).
  - Mark `task_runs.outcome = success`.
What to look for in the doc:
- A new comment reply from the atlas service account (verify the `agent-email` token in vault matches the service account writing comments).
- The reply should start with `[S46]` (model label).
- The reply text should be relevant to the comment, not a generic "I read the doc" response.
4. Resolve flow
The harder problem: atlas should be able to resolve a comment thread when the user signals they're satisfied (e.g. "thanks atlas, resolved" or a 👍 reaction). This requires:
- A separate webhook event when a comment is resolved (Drive emits `comment.resolved`).
- Dispatcher routes to a `resolve_comment` operation.
- Atlas reads the thread, optionally posts a final summary, and calls `GoogleDocsClient.resolveComment(commentId)`.
This is genuinely new agent behavior — not just "respond to comments" but "decide when a comment is fully addressed." Worth scoping carefully:
- v0.1: explicit resolve trigger only (user reacts with ✅ or types `/atlas resolve`).
- v0.2: atlas decides autonomously based on conversation state.
Recommend starting with v0.1 (explicit trigger).
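The v0.1 explicit trigger is simple enough to express as a pure predicate. A minimal Python sketch, assuming the dispatcher can see the comment text and, where applicable, a single reaction emoji; the function name and the rule that `/atlas resolve` must stand alone on a line are assumptions of this sketch, not decisions recorded in the handover:

```python
from typing import Optional

RESOLVE_COMMAND = "/atlas resolve"
RESOLVE_REACTION = "✅"


def is_explicit_resolve(comment_text: str, reaction: Optional[str] = None) -> bool:
    """True only for the explicit v0.1 triggers; never infers intent from prose."""
    if reaction == RESOLVE_REACTION:
        return True
    # Require the command on its own line so quoting it mid-sentence is ignored.
    return any(line.strip().lower() == RESOLVE_COMMAND
               for line in comment_text.splitlines())
```

Under this rule a message such as "thanks atlas, resolved" does not resolve the thread; inferring intent from that kind of reply is exactly the behaviour deferred to v0.2.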
Outstanding cleanup tasks
In rough priority order:
High value, small scope
| Task | Why | Where |
|---|---|---|
| Dispatcher webhook hardening (`initialMessage` in `trigger_event`, Drive null-`resourceId` guard, empty-string secret foot-gun) | Probe traffic exposed three small issues blocking clean Drive E2E. #62 | `internal/dispatcher/src/main/kotlin/dev/aucert/internal/dispatcher/{routing,webhooks,config}/` |
| Fix `LocalRepoClient.listAdrs()` ADR path | Atlas caught a real bug: the method points at `docs/adrs/` but the actual location is `.context/decisions/`. Already spawned as a separate task. | `internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/clients/LocalRepoClient.kt` |
| Promote SPEC-017 to `approved/` | Atlas observed it's fully implemented but still in `drafts/`. Mechanical move. | `docs/specs/drafts/SPEC-017-...md` → `docs/specs/approved/` |
| Update SPEC-010 / SPEC-020 stale Google notes | Both have amendment notes about service-account auth superseded by ADR-014 (OAuth refresh tokens). Should be reconciled. | docs/specs/drafts/SPEC-010...md, SPEC-020-...md |
| Workflow-level final-failure marking | Today, when Temporal exhausts retries, the row stays in running forever. The workflow should mark outcome = failure after final retry. | internal/backend/src/main/kotlin/dev/aucert/internal/agents/spec/temporal/SpecAgentWorkflowImpl.kt |
Quota / capacity
| Task | Why |
|---|---|
| Request Bedrock Opus 4.7 quota increase | Iter 19 + 20 confirmed Sonnet 4.6 works fine on the default quota. Opus 4.7 throttled in earlier tests. Console → Service Quotas → AWS Bedrock → "Cross-region model invocation tokens per minute for Claude Opus 4.7" → request increase. Free, usually approved within hours. |
| Drive `changes.watch` renewal job | Shipped in PR #XX; see the "Drive watch renewal" section below. |
Deferred from PRs (not blocking)
| Task | Deferred from |
|---|---|
| Per-call timeout override via `RuntimeConfig.timeoutMs` | #59: today every Bedrock call gets the generous 5-min budget, regardless of expected duration |
| Streaming responses (`InvokeModelWithResponseStream` / Converse Stream) | #59: would let us start processing tokens as they arrive instead of waiting for the full response |
| Widen `task_runs.output_ref` from `varchar(500)` to `TEXT` or `jsonb` | #58: today the actual output lives in `task_events.metadata.output_payload` to avoid truncation |
| Per-token cost computation (`estimated_cost` column on `task_runs`) | #58: reporting layer can backfill from `tokens_input * input_rate + tokens_output * output_rate` using `model_used` from the `agent_completed` event |
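The backfill formula in the last row can be sketched as below. The per-million-token rates are hypothetical placeholders chosen for illustration, not a confirmed rate card, and the function name and table layout are assumptions; only the formula and the column names come from the table above.

```python
from decimal import Decimal

# HYPOTHETICAL rates in USD per million tokens: (input_rate, output_rate).
# Replace with the real rate card before using this for reporting.
RATES_PER_MILLION = {
    "us.anthropic.claude-sonnet-4-6": (Decimal("3.00"), Decimal("15.00")),
}


def estimated_cost(model_used: str, tokens_input: int, tokens_output: int) -> Decimal:
    """tokens_input * input_rate + tokens_output * output_rate, in USD."""
    input_rate, output_rate = RATES_PER_MILLION[model_used]
    total = tokens_input * input_rate + tokens_output * output_rate
    # Decimal avoids float drift when many small costs are summed in reports.
    return (total / Decimal(1_000_000)).quantize(Decimal("0.0001"))
```

With these placeholder rates, the probe run's figures (about 30K input and 647 output tokens) come to roughly $0.10, which lines up with the spend quoted for that run. That is a consistency check on the formula, not confirmation of the actual pricing.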
Future agents (Kimi / GLM)
Update 2026-05-06. Kimi shipped, but via a different path than the one originally sketched here. We did not add Kimi via Bedrock Converse; instead, we deployed Kimi K2.6 on Azure AI Foundry and wrote a new `FoundryOpenAIAdapter` (OpenAI-compatible chat completions) alongside the existing `BedrockAdapter`. Atlas now invokes Kimi via the `[kimi]` operator tag in Google Docs comments, and the route is validated end-to-end on a real Drive comment thread. See Model routing and operator labels for the full operator + developer reference, and ADR-011's 2026-05-06 amendment for the architectural framing.
GLM is still future. When it lands, the same one-line `ModelRegistry` + `ModelLabels` pattern applies; adapter choice depends on whether GLM is hosted on Bedrock (extend `BedrockAdapter` with a Converse-API variant), Foundry (reuse `FoundryOpenAIAdapter`), or somewhere else (new adapter).
The original analysis below is preserved for context but the conclusion has shifted: the simpler refactor turned out to be a third adapter (FoundryOpenAIAdapter), not extending BedrockAdapter. The provider-agnostic interface paid for itself.
Both models were provisioned and verified callable on Bedrock (`moonshotai.kimi-k2.5`, `zai.glm-4.7`). When a future non-spec agent needs them:
- Add a `BedrockMessageTranslator` variant for non-Anthropic body shapes (Anthropic Messages API doesn't apply to Kimi/GLM).
- Or use the Bedrock Converse API instead of `InvokeModel`; Converse is provider-agnostic and would let the existing `BedrockAdapter` work for any model.

The simpler refactor is the second one. SPEC for whichever future agent comes first should pick the path.
Architecture decisions reaffirmed
The smoke loop took 8 PRs to close, which prompted a re-examination of the harness strategy decided on 2026-04-15. The conclusion is to keep the existing layered architecture unchanged, with two operational addenda below.
Why the existing decision still holds
Categorising the 8 PRs (#52–#59) honestly against the four-layer model:
| PR | Layer the bug lived in | Would a framework have prevented it? |
|---|---|---|
| #52 | L2 — server-side route handler | No (API design, framework-agnostic) |
| #53 | L2 — client/server contract drift | No (HTTP shape mismatch) |
| #54 | L2 — mount-path prefix mistake | No (path constant typo) |
| #55 | L2 — missing migration | No (schema bug) |
| #56 | L3/4 — activity body marked terminal state | No (Temporal idiom we had to learn) |
| #57 | Provider integration — Bedrock model IDs | No (provider-specific config) |
| #58 | L2 — same class as #52/#53 | No |
| #59 | Provider integration — Bedrock SDK timeouts | No (provider-specific tuning) |
Zero of the 8 PRs were in Layer 1 (the loop). The loop is ~50 lines and worked correctly throughout. Pain was concentrated in API contracts, schema, and provider-specific configuration — none of which a different harness choice would have addressed. Adopting an off-the-shelf framework (Anthropic Agent SDK, LangChain4j, etc.) would have introduced a polyglot service boundary or framework coupling without removing any of the bugs we actually hit.
The original decision document (2026-04-15) should be promoted to a canonical ADR, with the two addenda below incorporated.
Addendum 1 — provider adapters need real-credential integration tests
Three Bedrock-specific quirks only surfaced in production, despite passing unit tests:
- Bare model IDs (`anthropic.claude-sonnet-4-6`) reject `InvokeModel`; the cross-region inference profile (`us.anthropic.claude-sonnet-4-6`) must be used instead.
- Default AWS SDK socket timeout (~30s) is tight for large-context LLM calls; needed bumping to 5 min.
- `agent_tokens` vault path versus Astra Token Vault path needed reconciliation for the AWS credentials.
Rule: every new provider adapter (Foundry, Moonshot, Zhipu, etc.) ships with a real-credential integration test that exercises a multi-tool, multi-step interaction end-to-end — not just translation correctness. The BedrockAdapterTest today only covers BedrockMessageTranslator (pure JSON in / JSON out); future adapters need an integration counterpart that runs against the actual provider with @Tag("integration").
Addendum 2 — dispatcher line-count is a real trigger
The harness doc characterises the dispatcher as "thin, ~500 lines." The dispatcher pod has 43h+ uptime; the full public webhook path was exercised end-to-end on 2026-05-01 and the claim held — the handler code is thin and the volume of source-specific logic is still well below the trigger threshold.
Trigger to revisit the build-vs-buy calculation for the dispatcher: if per-source handler complexity (Drive subscription renewal + Slack rate limits + GitHub event filtering + Plane payload variants) pushes the dispatcher past 1500 lines of source-specific code, the cost calculus shifts toward a managed alternative (Trigger.dev, n8n, or similar webhook-to-workflow platform). Until then, keep custom.
This trigger should be added to the "When to reconsider" section of the canonical ADR.
Multi-tenancy considerations
Today everything is single-tenant Aucert-internal. The system was deliberately built that way for v0.1 — getting one tenant working end-to-end is hard enough. But several design choices will reverberate when we onboard external customers, and starting the conversation now (before v0.5+ build) is cheaper than retrofitting.
What's tenant-coupled today
| Surface | Coupling | Hard to change later? |
|---|---|---|
| `astra_db` schemas (`agents`, `task_runs`, `task_events`, `agent_tokens`, `personalities`) | Single shared database; no `tenant_id` column anywhere | Medium: add column + backfill + scope every query |
| `specs_db`, `shared_kb_db`, `internal_shared_db` | Single instance per database | Same as above |
| Workspace clones (`/workspace/aucert`) | Hardcoded repo, single GitHub App installation | High: workspace bootstrap assumes one repo |
| `SystemPromptAssembler` | Reads "Aucert core context", proprietary IP in the prompt | High: context leakage risk if reused unchanged for tenants |
| Bedrock IAM principal | One `bedrock-access-key` shared across all agent runs | Low: easy to scope per-tenant once IAM is parameterised |
| Astra service token | Single `DISPATCHER_SERVICE_TOKEN` from `astra-secrets` | Low: token broker can mint per-tenant tokens |
| Cloudflare tunnel routes | `astra.aucert.dev`, `dispatcher.aucert.dev`, `temporal.aucert.dev` (single tenant) | Medium: needs subdomain or path scheme per tenant |
| K8s namespace | Single `internal-platform` namespace for all workloads | Medium: per-tenant namespace gives isolation but multiplies operational cost |
| Worker pool concurrency | `spec-agent-worker` polls one `spec-agent-queue` | Medium: Temporal task queues per tenant are cheap; worker pool sizing per tenant is the real question |
Open design questions
These should be decided (not necessarily built) before v0.5+ work begins:
1. Database isolation model. Three options, in increasing strength:
   - Shared schema, `tenant_id` column on every row + row-level security policies. Cheapest, weakest. Risk: a missed `WHERE tenant_id = ?` clause leaks data across tenants.
   - Per-tenant Postgres schema (`astra_db.tenant_<id>.task_runs`). Stronger isolation, requires application to set `search_path`. Migrations applied per schema.
   - Per-tenant Postgres database (`astra_db_tenant_<id>`). Strongest isolation, highest operational cost. Backup/restore scoped per tenant.
   - Recommendation to pre-decide: start with the second option (per-tenant schema). Strong-enough isolation; reasonable migration story; doesn't multiply infra cost. Only escalate to per-database if a customer's compliance requires it.
2. Workspace clone strategy. The spec agent clones `aucert/aucert` on activity start. For a customer running atlas on their own private repo:
   - Per-tenant GitHub App installation (each customer installs the App on their own org).
   - Per-tenant workspace volume (different `/workspace/<tenant>/` paths).
   - Per-tenant clone-on-start IAM (no cross-tenant credential leakage).
   - Open question: is the workspace shared across activities for the same tenant (cache-friendly, locking risk) or fresh per activity (slower, isolated)?
3. Prompt isolation. `SystemPromptAssembler` today reads the "Aucert core context", a section that bleeds Aucert-specific framing ("the Aucert spec workflow", "atlas the spec agent", etc.) into the system prompt. For multi-tenant:
   - Move the per-Aucert framing into a personality fragment or a tenant config that's swappable.
   - The fixed-section "core directive" / "tool discipline" can stay genuinely tenant-agnostic.
   - Personalities themselves are already keyed per-agent in `astra_db.agent_personalities`; extending to per-tenant is a small change.
4. Audit and compliance. `task_runs` is the canonical audit log. Per-tenant access controls + retention policies need to exist before any compliance-sensitive customer onboards. Today everything is in one table queryable by any IAM principal with `astra:task_runs.view`.
5. Cost attribution. `tokens_input` / `tokens_output` per task_run gives us the raw data, but allocating to a tenant (for invoicing or capacity planning) needs `tenant_id` on the row. The `agent_completed` task_events row already carries `model_used`, so per-model + per-tenant rate cards become tractable once `tenant_id` lands.
6. Worker pool sizing. A noisy-neighbour tenant could starve atlas of capacity. Options:
   - Per-tenant Temporal task queues with per-queue worker pools (operational cost: more pods).
   - Single shared queue with rate limiting at the activity level (cheaper, weaker isolation).
   - Recommendation to pre-decide: start with shared queue + activity-level rate limiting; escalate to per-tenant queues if a customer's load justifies dedicated capacity.
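For the per-tenant schema option recommended under the database isolation question, the application has to set `search_path` from a tenant identifier, which makes identifier validation the critical step. A minimal Python sketch under stated assumptions: the `tenant_<id>` naming follows the example above, while the allowed character set, the length cap, and both function names are hypothetical.

```python
import re

# Assumed tenant ID format: lowercase letters, digits, underscore, max 48 chars.
# With the "tenant_" prefix this stays under Postgres's 63-byte identifier limit.
_TENANT_ID = re.compile(r"[a-z0-9_]{1,48}")


def tenant_schema(tenant_id: str) -> str:
    """Map a tenant ID to its schema name, rejecting anything unsafe to interpolate."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}"


def search_path_statement(tenant_id: str) -> str:
    """SQL to run at the start of each tenant-scoped transaction."""
    # SET LOCAL reverts at transaction end, so pooled connections do not
    # carry one tenant's search_path into another tenant's request.
    return f'SET LOCAL search_path TO "{tenant_schema(tenant_id)}"'
```

Because `SET` cannot take the schema name as a bind parameter, validating against a strict allow-list pattern before interpolation is what keeps this safe.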
What to do now
Don't build any of this yet. But:
- Add `tenant_id UUID` to the next schema migration that touches a tenant-coupled table, with a default for backfill. The literal `'aucert'` is not a valid UUID, so the default must be either a fixed UUID assigned to the Aucert tenant or the column must be typed as text.
- Wrap the "Aucert core context" section of `SystemPromptAssembler` in a config-driven fragment the next time that file is touched.
- When the next agent role's worker pod is being designed, make the IAM principal scoped per-role (not shared with atlas), so per-tenant scoping is a small extension later rather than a refactor.
- Capture this section's open design questions in a new SPEC (e.g. "SPEC-NNN: Multi-tenancy strategy for the agent platform") before the first paying customer is committed to a launch date. The SPEC's job is to pick option (1) and (2) above with concrete tradeoffs.
The handover author's view: the existing single-tenant build is correct for where Aucert is today. None of the choices above need to be made tonight. But picking them in the right order — schema strategy first, workspace second, prompt isolation third — saves weeks compared to retrofitting under deadline pressure.
Operational runbook
Deploy the spec agent worker
./tools/scripts/deploy-spec-agent-worker.sh
Builds internal/backend/Dockerfile.spec-agent, pushes to ACR, restarts the deployment, waits for rollout. Tag defaults to git rev-parse --short HEAD.
Deploy astra-backend
CI handles this automatically on push to main when internal/backend/**, internal/frontend/**, infra/migrations/**, or infra/k8s/internal-platform/astra/** change. Workflow: .github/workflows/deploy-astra.yml.
Manual deploy:
./tools/scripts/astra-deploy.sh # full deploy
./tools/scripts/astra-deploy.sh --backend-only # backend only
./tools/scripts/astra-deploy.sh --skip-migrations # skip Flyway
Run a smoke test
./tools/scripts/smoke-test-spec-agent.sh
# Or with a custom prompt:
./tools/scripts/smoke-test-spec-agent.sh "Read SPEC-005 and SPEC-013, identify any contradictions in the workspace prep section."
The script:
- Resolves atlas's agent UUID via `psql` against `astra_db`.
- Inserts a fresh `task_runs` row with the prompt as `task_title`.
- Starts a `SpecAgentWorkflow` on `spec-agent-queue` via a `temporalio/admin-tools` pod.
- Tails worker logs (`kubectl logs -f`).
Query a task run's outcome
TASK_RUN_ID=028a4f80-648a-4f16-bd6b-2fa171d97bce # from smoke output
kubectl run pg-probe-$(date +%s) --rm -i --restart=Never -n internal-platform \
--image=postgres:18.3-alpine \
--env="PGPASSWORD=$(kubectl get secret astra-db-credentials -n internal-platform -o jsonpath='{.data.ASTRA_DB_PASSWORD}' | base64 -d)" \
--env="PGSSLMODE=require" \
--command -- psql -h aucert-internal-pg.postgres.database.azure.com \
-U internaladmin -d astra_db -X -A \
-c "SELECT outcome, duration_ms, tokens_input, tokens_output FROM task_runs WHERE id = '$TASK_RUN_ID'::uuid;"
Read the actual agent output:
... -c "SELECT metadata->'output_payload'->>'text' FROM task_events WHERE task_run_id = '$TASK_RUN_ID'::uuid AND event_type = 'agent_completed';"
Check Bedrock model availability
aws bedrock list-foundation-models --region us-west-2 \
--by-provider Anthropic \
--query 'modelSummaries[].[modelId,modelName,modelLifecycle.status]' \
--output table
aws bedrock list-inference-profiles --region us-west-2 \
--query 'inferenceProfileSummaries[].[inferenceProfileId,status]' \
--output table
Verify a model is callable:
aws bedrock-runtime converse \
--region us-west-2 \
--model-id us.anthropic.claude-sonnet-4-6 \
--messages '[{"role":"user","content":[{"text":"hi"}]}]' \
--inference-config '{"maxTokens":20}'
Inspect a workflow in Temporal
UI: https://temporal.aucert.dev → namespace aucert-default → workflow ID matches spec-agent-smoke-<short-uuid> for smoke runs.
CLI from inside the cluster:
kubectl run -it --rm --restart=Never \
--image=temporalio/admin-tools:1.31.0 \
-n temporal \
-- temporal workflow describe \
--address temporal-frontend.temporal.svc.cluster.local:7233 \
--namespace aucert-default \
--workflow-id spec-agent-smoke-<short-uuid>
Drive watch renewal (post-merge operator setup)
Drive watch channels have a 24-hour TTL in Workspace tenants. The renewal job
is a Temporal cron workflow (DriveWatchRenewalWorkflow) registered on the
spec-agent-worker. It runs every 12 h, looks ahead 6 h, and re-registers
any channel that would otherwise expire before the next tick.
Incident that motivated this: a watch registered at 06:55 on 2026-04-30 expired at 06:55 on 2026-05-01. A comment posted 14 minutes after expiry was never seen by atlas. The watch had to be re-registered manually.
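The selection rule described above (renew anything expiring inside the lookahead window) can be sketched as follows. This is a hedged Python illustration, not the logic in `DriveWatchRenewalWorkflow`; the function name and the dictionary input are assumptions, while the 12 h cadence and 6 h lookahead are the figures stated above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

TICK_INTERVAL = timedelta(hours=12)  # cron cadence stated above
LOOKAHEAD = timedelta(hours=6)       # lookahead stated above


def channels_to_renew(expirations: Dict[str, datetime], now: datetime,
                      lookahead: timedelta = LOOKAHEAD) -> List[str]:
    """Return channel IDs whose expiry falls inside the lookahead window."""
    cutoff = now + lookahead
    return sorted(cid for cid, expiry in expirations.items() if expiry <= cutoff)
```

One thing the sketch makes visible: with a 12 h cadence and a 6 h lookahead, a channel with 7 h left at one tick is not selected and would lapse before the next tick. If the intent is to renew everything that would expire before the next tick, the lookahead needs to be at least the tick interval, so it is worth confirming the actual window used by the workflow.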
Start the cron schedule (one-time, after deploying the updated worker)
kubectl run -it --rm --restart=Never \
--image=temporalio/admin-tools:1.31.0 \
-n temporal \
-- temporal workflow start \
--address temporal-frontend.temporal.svc.cluster.local:7233 \
--namespace aucert-default \
--type DriveWatchRenewalWorkflow \
--task-queue spec-agent-queue \
--cron "0 */12 * * *" \
--workflow-id drive-watch-renewal-cron
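The `--workflow-id` is stable, so re-running this command after a future worker
deployment does not create duplicates. Note that with Temporal's default
ID-conflict behaviour, a start request for a workflow ID that is already running
is rejected rather than replacing the existing cron; to change the schedule,
terminate the existing cron workflow first and then start it again.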
Verify the cron is running
In the Temporal UI: https://temporal.aucert.dev → namespace aucert-default
→ search workflow ID drive-watch-renewal-cron. You should see a Running
cron workflow with completed runs on a 12-hour cadence.
CLI:
kubectl run -it --rm --restart=Never \
--image=temporalio/admin-tools:1.31.0 \
-n temporal \
-- temporal workflow describe \
--address temporal-frontend.temporal.svc.cluster.local:7233 \
--namespace aucert-default \
--workflow-id drive-watch-renewal-cron
Check current watch expiry times
kubectl run pg-probe-$(date +%s) --rm -i --restart=Never -n internal-platform \
--image=postgres:18.3-alpine \
--env="PGPASSWORD=$(kubectl get secret astra-db-credentials -n internal-platform -o jsonpath='{.data.ASTRA_DB_PASSWORD}' | base64 -d)" \
--env="PGSSLMODE=require" \
--command -- psql -h aucert-internal-pg.postgres.database.azure.com \
-U internaladmin -d astra_db -X -A \
-c "SELECT channel_id, file_id, agent_id, expiration_time,
expiration_time - now() AS time_left
FROM drive_watches
ORDER BY expiration_time ASC;"
How old-channel cleanup works
The renewal job creates a new channel for each near-expiring watch (Drive
returns a new channel_id). The old channel row is not deleted immediately;
it is left to expire naturally and is purged at the end of each renewal pass
once it is more than 7 days past its expiration_time. This keeps the table
tidy without a separate cleanup job.
References
Specs
- SPEC-005: Spec agent v0.1 definition (`docs/specs/drafts/SPEC-005-spec-agent-v0.1.md`)
- SPEC-010: Parallel execution plan (`docs/specs/drafts/SPEC-010-spec-agent-v0.1-execution-plan.md`)
- SPEC-012, SPEC-013: Codebase access amendments
- SPEC-020: Unified platform and token management (GitHub App auth, integration registry)
- SPEC-021: Wave 5 dispatcher design (the next milestone)
ADRs
- ADR-002: Bazel staged adoption (relevant to build-system context)
- ADR-011: Agent harness layering (Layer 1 = `AgentExecutor`, Layer 2 = `AgentLoop`)
- ADR-014: Google integration via OAuth 2.0 refresh tokens (supersedes earlier service-account approach)
All ADRs live in .context/decisions/ (canonical) and sync to docs/internal/docs/decisions/ via .github/workflows/docs-adr-sync.yml.
Source code entry points
| File | Role |
|---|---|
| `internal/backend/src/main/kotlin/dev/aucert/internal/agents/spec/temporal/SpecAgentActivityImpl.kt` | Temporal activity that spins up the executor |
| `internal/backend/src/main/kotlin/dev/aucert/internal/agents/spec/SpecAgentExecutor.kt` | Atlas-specific executor: model routing, label injection, output wrapping |
| `internal/backend/src/main/kotlin/dev/aucert/internal/agents/spec/SpecAgentConfig.kt` | Atlas config: default runtime config, model constants, per-operation routing |
| `internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/executor/AgentExecutor.kt` | Base executor (Layer 1): lifecycle, retry semantics, event logging |
| `internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/executor/AgentLoop.kt` | Loop (Layer 2): LLM ↔ tool dispatch |
| `internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/clients/AstraClient.kt` | HTTP client for astra-backend (task_runs lifecycle) + direct-JDBC reads (agent metadata, tokens) |
| `internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/model/adapters/BedrockAdapter.kt` | Bedrock LLM provider (with the new 5-min timeouts) |
| `internal/backend/src/main/kotlin/dev/aucert/internal/astra/api/TaskRunApi.kt` | Server-side task_runs HTTP routes |
| `internal/backend/src/main/kotlin/dev/aucert/internal/astra/AstraModule.kt` | Mounts all Astra routes under `/api/internal/astra/v1` |
Infrastructure
| Path | Role |
|---|---|
| `infra/migrations/astra/` | Flyway migrations (V001–V013) for `astra_db` |
| `infra/k8s/internal-platform/spec-agent-worker/` | K8s deployment for the worker |
| `infra/k8s/internal-platform/astra/` | K8s manifests for backend / frontend / proxy |
| `infra/k8s/internal-platform/cloudflared/tunnel.yaml` | Cloudflare tunnel routing |
| `infra/terraform/foundation/` | UAMI, Federated Identity Credentials, Key Vault role assignments |
| `tools/scripts/smoke-test-spec-agent.sh` | The smoke loop trigger |
| `tools/scripts/deploy-spec-agent-worker.sh` | Worker build + deploy |
| `tools/scripts/astra-deploy.sh` | Astra-backend / frontend / proxy deploy |