PostgreSQL configuration
This page is the single reference for how Postgres is wired across the Aucert stack: where it is provisioned, what credentials exist, how pods and developers connect, how migrations and the ORM are organized, and which gaps are known (no role hierarchy, no automated rotation, no pre-merge terraform plan). Read this before making any schema or infra change that touches a Postgres instance.
Credentials flow Terraform random_password → Key Vault → K8s Secret → pod env for every working service. Spec-agent-worker is the one exception — it tries to read Key Vault directly via KeyVaultSecretProvider, but Workload Identity for that pod is not yet wired so every fetch fails. See drift-2026-04-29-spec-agent-keyvault-auth.md and Secrets management → Migration to Workload Identity.
Rotation today is manual at every hop; see Rotating credentials.
Quick reference
| Property | Value |
|---|---|
| Servers (as of 2026-04-23) | aucert-internal-pg (foundation tier), aucertdev-product-pg (dev environment tier) |
| Engine | PostgreSQL 18 Flexible Server, SKU B_Standard_B2s, westus |
| Network | Private VNet only — no public endpoint, sslmode=require |
| Private DNS zones | aucert-internal.postgres.database.azure.com, aucertdev.postgres.database.azure.com |
| Admin user (internal) | internaladmin (hard-coded in foundation Terraform) |
| Admin user (product) | from var.pg_admin_username (env-dev variable) |
| Key Vault | aucertdev-kv-41e0x5 |
| ORM | Exposed 0.58.0 (Kotlin) + HikariCP 6.2.1 pool, driver org.postgresql:postgresql:42.7.5 |
| Migrations | Flyway 11.4.0, four migration sets under infra/migrations/ |
| Terraform files | infra/terraform/foundation/database.tf, infra/terraform/environments/dev/database.tf, infra/terraform/internal-platform/databases.tf |
Servers and tiers
Two Flexible Server instances are provisioned across two Terraform tiers. A third instance for production is planned but not yet created.
| Server | Terraform tier | Admin user | Extensions allow-list | Destroyable | Databases |
|---|---|---|---|---|---|
aucert-internal-pg | foundation/ | internaladmin | BTREE_GIN (for Temporal) | prevent_destroy = true | plane_db, internal_shared_db, astra_db, specs_db, shared_kb_db, temporal, temporal_visibility |
aucertdev-product-pg | environments/dev/ | var.pg_admin_username | Azure default | No (dev can be rebuilt) | aucert |
| Future production PG | environments/prod/ (reserved — subnet 10.0.8.0/24) | TBD | TBD | prevent_destroy planned | TBD |
aucert-internal-pg is tagged destroyable=false and has lifecycle { prevent_destroy = true } in infra/terraform/foundation/database.tf. Destroying a dev environment can never reach it. Any attempt to terraform destroy the foundation tier will be blocked by Terraform.
The azure.extensions server parameter is a dynamic allow-list — Flexible Server rejects CREATE EXTENSION for anything not in the list. Today only BTREE_GIN is listed for the internal PG (needed by Temporal's advanced_visibility migration). pgvector works because it is already in the Azure default allow-list for PG 16; if you need a non-default extension, add it to azurerm_postgresql_flexible_server_configuration.internal_pg_extensions and terraform apply before the migration runs.
Databases
The internal PG multi-tenants seven logical databases on one server — each with a distinct owner and migration set. Separation is by database, not by role.
| Database | Owner path | Migrations | Populated by |
|---|---|---|---|
internal_shared_db | Astra backend (Exposed) | infra/migrations/internal-shared/ (V001–V006) | Startup Flyway in DatabaseFactory |
astra_db | Astra backend (Exposed) | infra/migrations/astra/ (V001–V010) | Startup Flyway in DatabaseFactory |
specs_db | Spec agent (raw JDBC) | infra/migrations/specs/ (V001–V007) | Flyway pod in deploy workflow |
shared_kb_db | All agents (raw JDBC, read-heavy) | infra/migrations/shared-kb/ (V001–V009, includes pgvector) | Flyway pod in deploy workflow |
plane_db | Plane CE (external tool) | Managed by Plane | Plane's own bootstrap |
temporal / temporal_visibility | Temporal server | Managed by Temporal Helm chart | Temporal's own schema tool |
aucert (on product PG) | Product backend | Not yet populated | TBD |
Access control — known gap
There is no role hierarchy today. Every backend connection uses the server admin (internaladmin on the internal PG, the pg_admin_username variable on product PG). Defense lives entirely at the network layer:
- Both servers have
public_network_access_enabled = false - Both are delegated to dedicated subnets (
internal-platform-subnet,postgres-subnet) - Private DNS zones are VNet-linked, so pods resolve
*.postgres.database.azure.comto private IPs - TLS is enforced via
sslmode=requirein every connection string
We have not carved out app_rw / app_ro / migrator roles per database. Every service has full DDL + DML as the admin. Adding a role hierarchy is tracked as future work; until then, treat any app-level bug that leaks SQL as a worst-case blast radius.
Credentials
Key Vault secrets
All passwords live in the foundation-tier Key Vault aucertdev-kv-41e0x5. Generated by Terraform random_password (32 chars, special included) and written as secrets.
| Secret name | Written by | Purpose | Consumers |
|---|---|---|---|
internal-pg-admin-password | infra/terraform/foundation/database.tf | Internal PG admin password | setup-astra-secrets.sh, setup-temporal-secrets.sh |
internal-pg-connection-string | infra/terraform/foundation/database.tf | Full postgresql://… URL (defaults to plane_db) | Convenience only — services build their own per-DB URLs |
pg-admin-password | infra/terraform/environments/dev/database.tf | Product PG admin password | Product backend (future) |
pg-connection-string | infra/terraform/environments/dev/database.tf | Full URL pointing at aucert DB | Product backend (future) |
The internal-pg-connection-string secret encodes /plane_db as the default database. It is convenience metadata — every actual consumer constructs its own JDBC URL with the correct database name at secret-sync time.
Kubernetes secrets
tools/scripts/setup-astra-secrets.sh mirrors Key Vault into K8s. The script builds two JDBC URLs (one per database) and writes six keys into astra-db-credentials:
PG_PASS=$(az keyvault secret show --vault-name "$KV_NAME" \
--name "$KV_SECRET_NAME" --query value -o tsv)
SHARED_DB_URL="jdbc:postgresql://${PG_HOST}:${PG_PORT}/internal_shared_db?sslmode=require"
ASTRA_DB_URL="jdbc:postgresql://${PG_HOST}:${PG_PORT}/astra_db?sslmode=require"
kubectl create secret generic "$DB_SECRET_NAME" -n "$NAMESPACE" \
--from-literal=INTERNAL_SHARED_DB_URL="$SHARED_DB_URL" \
--from-literal=INTERNAL_SHARED_DB_USER="$PG_USER" \
--from-literal=INTERNAL_SHARED_DB_PASSWORD="$PG_PASS" \
--from-literal=ASTRA_DB_URL="$ASTRA_DB_URL" \
--from-literal=ASTRA_DB_USER="$PG_USER" \
--from-literal=ASTRA_DB_PASSWORD="$PG_PASS" \
--dry-run=client -o yaml | kubectl apply -f -
| K8s secret | Namespace | Keys | Source |
|---|---|---|---|
astra-db-credentials | internal-platform | INTERNAL_SHARED_DB_URL/USER/PASSWORD, ASTRA_DB_URL/USER/PASSWORD | tools/scripts/setup-astra-secrets.sh |
astra-secrets | internal-platform | ENCRYPTION_MASTER_KEY, CF_AUDIENCE, DISPATCHER_SERVICE_TOKEN | Same script (generated, not mirrored) |
temporal-postgres-credentials | temporal | password | tools/scripts/setup-temporal-secrets.sh |
See Secrets management for the full Key Vault inventory.
Connecting from a pod
Deployments inject credentials via explicit secretKeyRef entries, not envFrom — this keeps each variable's source auditable in the manifest. Non-sensitive runtime config is split into a separate ConfigMap.
env:
- name: INTERNAL_SHARED_DB_URL
valueFrom:
secretKeyRef:
name: astra-db-credentials
key: INTERNAL_SHARED_DB_URL
- name: INTERNAL_SHARED_DB_USER
valueFrom:
secretKeyRef:
name: astra-db-credentials
key: INTERNAL_SHARED_DB_USER
- name: INTERNAL_SHARED_DB_PASSWORD
valueFrom:
secretKeyRef:
name: astra-db-credentials
key: INTERNAL_SHARED_DB_PASSWORD
Network path from pod → server: AKS egresses from the internal-platform-subnet; the Private DNS zone aucert-internal.postgres.database.azure.com is linked to the VNet, so the FQDN resolves to a private IP. No NAT, no public traffic, no bastion.
Connecting for local development
Local dev runs a real Postgres in Docker — no port-forward or bastion needed.
services:
postgres:
image: postgres:18.3-alpine
environment:
POSTGRES_DB: aucert
POSTGRES_USER: aucert
POSTGRES_PASSWORD: aucert_dev
ports: ["5432:5432"]
volumes: [pgdata:/var/lib/postgresql/data]
healthcheck:
test: ["CMD-SHELL", "pg_isready -U aucert"]
interval: 5s
Tilt (Tiltfile) starts the compose stack alongside the backend. The image is pinned to postgres:18.3-alpine to match production's engine version — keep them aligned so SQL dialect behavior matches.
psql postgresql://aucert:aucert_dev@localhost:5432/aucert
ORM: Exposed 0.58.0
The Kotlin backend uses Exposed as the primary persistence layer. It is wired into two databases (internal_shared_db and astra_db) through a single DatabaseFactory that also owns the HikariCP pools and runs Flyway migrations on startup.
DataSource + Exposed wiring
override fun init() {
val sharedDs = hikariDataSource(
name = "internal-shared",
url = requireEnv("INTERNAL_SHARED_DB_URL"),
user = requireEnv("INTERNAL_SHARED_DB_USER"),
password = requireEnv("INTERNAL_SHARED_DB_PASSWORD"),
)
internalSharedDb = Database.connect(sharedDs)
runMigrations(sharedDs, "internal-shared")
val astraDs = hikariDataSource(name = "astra", /* … */)
astraDb = Database.connect(astraDs)
runMigrations(astraDs, "astra")
}
private fun hikariDataSource(name: String, url: String, user: String, password: String) =
HikariDataSource(HikariConfig().apply {
poolName = "hikari-$name"
jdbcUrl = url; username = user; this.password = password
driverClassName = "org.postgresql.Driver"
maximumPoolSize = 10
isAutoCommit = false
transactionIsolation = "TRANSACTION_REPEATABLE_READ"
validate()
})
Conventions to honor when wiring a new database:
- Read credentials with
requireEnv(...)— a missing env var must fail-fast, not fall back. - Set
transactionIsolation = "TRANSACTION_REPEATABLE_READ"andisAutoCommit = false. These are project defaults; do not override per-service. - Call
runMigrations(dataSource, "<migration-dir>")immediately afterDatabase.connectso Flyway uses the same pool.
Declaring a table
Exposed tables are Kotlin object singletons — they describe schema, not rows. Use UUIDTable("<name>") as the base for new tables; use jsonb(...) with kotlinx-serialization encoders for JSON columns and array<String>(...) for native text[] columns.
object AgentTable : UUIDTable("agents") {
val funName = varchar("fun_name", 32).uniqueIndex()
val email = varchar("email", 120).uniqueIndex()
val role = varchar("role", 32)
val tags = array<String>("tags", TextColumnType()).default(emptyList())
val runtimeConfig = jsonb(
"runtime_config",
{ Json.encodeToString(JsonObject.serializer(), it) },
{ Json.decodeFromString(JsonObject.serializer(), it) },
)
val status = varchar("status", 16).default("provisioning")
val createdAt = timestampWithTimeZone("created_at")
}
Writing a repository
Always use the project's suspended-transaction wrapper dbFactory.dbQuery(db) { ... } — never call Exposed's top-level transaction { } directly. The wrapper binds the IO dispatcher and the specific Database handle so the right connection pool is used.
override suspend fun findAll(role: String?, status: String?, tech: String?, tag: String?): List<Agent> =
dbFactory.dbQuery(dbFactory.astraDb) {
val query = AgentTable.selectAll().orderBy(AgentTable.funName to SortOrder.ASC)
role?.let { r -> query.andWhere { AgentTable.role eq r } }
status?.let { s -> query.andWhere { AgentTable.status eq s } }
tech?.let { t -> query.andWhere { AgentTable.tech eq t } }
tag?.let { t -> query.andWhere { arrayContainsElement(AgentTable.tags, t) } }
query.map { it.toAgent() }
}
Inserts use insertAndGetId { it[col] = value }, returning the generated UUID in one roundtrip. Convert kotlinx.datetime.Instant to OffsetDateTime at the DB boundary (Exposed's timestampWithTimeZone expects the Java type):
override suspend fun store(token: AgentToken): AgentToken =
dbFactory.dbQuery(dbFactory.astraDb) {
val now = OffsetDateTime.now(ZoneOffset.UTC)
val id = AgentTokenTable.insertAndGetId {
it[agentId] = UUID.fromString(token.agentId)
it[encryptedToken] = token.encryptedToken
it[issuedAt] = token.issuedAt?.let { ia ->
OffsetDateTime.ofInstant(ia.toJavaInstant(), ZoneOffset.UTC)
} ?: now
it[status] = token.status
}
token.copy(id = id.value.toString())
}
Raw JDBC: the spec-agent escape hatch
A parallel PostgresClient (raw JDBC + HikariCP) exists in internal/backend/src/main/kotlin/dev/aucert/internal/agents/shared/clients/PostgresClient.kt. It is documented as the spec-agent path because that code needs primitives Exposed does not expose cleanly:
pg_advisory_lock/pg_advisory_unlockfor single-writer semantics on a topic- A
transaction { tx -> ... }block that pins one physical connection so nestedquery/execute/ advisory lock calls share it - Hand-tuned SQL (e.g. window functions) that is awkward to express in the Exposed DSL
PostgresClient in new codeUnless you genuinely need advisory locks or hand-tuned SQL, use Exposed through DatabaseFactory. Mixing JDBC and Exposed in the same bounded context will drift over time.
Migrations
Flyway 11.4.0 runs the four migration sets under infra/migrations/. Each set is an independent Flyway history (its own flyway_schema_history table in its own database), so version numbers never collide across bounded contexts.
| Set | Path | Files | Runs where |
|---|---|---|---|
internal-shared | infra/migrations/internal-shared/ | V001–V006 | On backend startup via DatabaseFactory.runMigrations(...) |
astra | infra/migrations/astra/ | V001–V010 | On backend startup |
specs | infra/migrations/specs/ | V001–V007 | Flyway pod in .github/workflows/deploy-astra.yml |
shared-kb | infra/migrations/shared-kb/ | V001–V009 (includes V001__enable_pgvector.sql) | Flyway pod in deploy workflow |
baselineOnMigrate(true) is set in DatabaseFactory.kt:74 — the first run against an empty database creates the baseline without replaying any migrations. Once baselined, migrations are strictly forward-only.
See How to create a database migration for the end-to-end task workflow (file naming, port-forward testing, CI behaviour).
Rotating credentials
Rotation today is manual end-to-end. All four steps must succeed for pods to resume serving requests; plan a short maintenance window.
Steps:
-
Taint the Terraform password resource, then apply:
cd infra/terraform/foundation
terraform taint random_password.internal_pg_password
terraform applyTerraform writes the new value to
internal-pg-admin-passwordin Key Vault and calls the Azure API to update the server admin password atomically. -
Re-run the K8s secret sync to refresh
astra-db-credentials:./tools/scripts/setup-astra-secrets.shAnswer
ywhen prompted to overwriteastra-db-credentials. Leaveastra-secretsalone — the encryption key must never rotate (see the warning printed by the script). -
Rolling-restart every Deployment that consumes the secret:
kubectl rollout restart deploy/astra-backend -n internal-platform
kubectl rollout status deploy/astra-backend -n internal-platform -
Repeat for the product PG using
random_password.pg_passwordinenvironments/dev/once the product backend is deployed.
ENCRYPTION_MASTER_KEYThe Astra Token Vault encrypts agent platform tokens with this key. Rotating it makes every existing encrypted token permanently undecryptable — setup-astra-secrets.sh will warn you. Rotation requires a data-migration plan, which does not exist yet.
Guardrails
GitHub Actions enforce part of the contract; several gaps remain.
| Enforced | Where | What it does |
|---|---|---|
| Context-drift gate | .github/workflows/context-drift-check.yml | Blocks PRs that change infra/terraform/** without updating infra/.context/*.md |
| Auto-apply Flyway on merge | .github/workflows/deploy-astra.yml (lines 146–176) | Creates a ConfigMap from SQL files and runs a Flyway pod in the cluster |
| Smoke test after deploy | Same workflow | HTTP health check against astra.aucert.dev |
| Approved-spec immutability | context-drift-check.yml | Blocks direct edits to docs/specs/approved/ |
Gaps — explicitly not enforced:
- No
terraform plan/terraform validateruns on PRs. Drift betweenmainand deployed state is invisible until someone applies locally. - No schema drift detection (no
pg_dumpdiff against Flyway history). - No pre-merge SQL lint or migration-naming check.
- No approval gate on migrations — they auto-apply on push to
main.
Closing any of these is a tracked future task; when you ship one, update this table.
Integration testing
Integration tests use Testcontainers with postgres:18.3-alpine — the same engine version as production. The pattern is in internal/backend/src/test/kotlin/dev/aucert/internal/agents/shared/clients/PostgresClientTest.kt:
@Testcontainers
class PostgresClientTest {
companion object {
@Container @JvmStatic
val pg = PostgreSQLContainer("postgres:18.3-alpine")
.withDatabaseName("w1c_test").withUsername("test").withPassword("test")
}
@BeforeTest fun setup() = runBlocking {
ds = HikariDataSource(HikariConfig().apply {
jdbcUrl = pg.jdbcUrl; username = pg.username; password = pg.password
driverClassName = "org.postgresql.Driver"
maximumPoolSize = 5
transactionIsolation = "TRANSACTION_REPEATABLE_READ"
})
client = PostgresClient(ds)
}
}
Conventions:
- One container per test class (
@Containeron acompanion object) for isolation between classes. - Keep
postgres:18.3-alpinepinned — if prod moves again, update here at the same time. TESTCONTAINERS_RYUK_DISABLED=trueis set inbuild.gradle.ktsbecause Ryuk's docker-java client doesn't honor our pinned API version on modern daemons. The JUnit 5@Testcontainersextension handles container teardown without it.- On macOS, the test task auto-detects the Docker Desktop socket at
~/.docker/run/docker.sock— no need to setDOCKER_HOSTby hand.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
Connection refused from pod | Private DNS zone not linked to the VNet, or pod not in a subnet with access | az network private-dns link vnet list -g aucert-foundation-rg -z aucert-internal.postgres.database.azure.com |
FATAL: password authentication failed after a Terraform apply | Secret rotated in Key Vault but K8s secret not re-synced | Re-run ./tools/scripts/setup-astra-secrets.sh, then kubectl rollout restart the affected Deployment |
ERROR: extension "X" is not allow-listed | Extension not in azure.extensions server parameter | Add to azurerm_postgresql_flexible_server_configuration.internal_pg_extensions and terraform apply before the migration |
Flyway Migration checksum mismatch | Someone edited an applied migration | Revert the edit. Never modify applied migrations — create a new V{next}__...sql instead |
Backend readiness probe fails with SELECT 1 timeout | Hikari pool exhausted or network partition | kubectl logs deploy/astra-backend — look for HikariPool-1 - Timeout failure |
What's next
- How to create a database migration — task-oriented runbook for writing a new
V{NNN}__…sql - Secrets management — full Key Vault inventory and K8s secret wiring
- Azure resource topology — where PG sits in the wider Azure diagram
- Terraform three-tier architecture — which tier owns which PG instance
- ADR-005: PostgreSQL + JSONB for the Knowledge Graph — design rationale for using PG as the KG store