Troubleshooting
Common errors and their solutions, organized by platform.
Backend (Kotlin)
"Connection refused" — Database not accessible
Symptom: org.postgresql.util.PSQLException: Connection to localhost:5432 refused
Cause: PostgreSQL is not running or not port-forwarded.
Fix:
# Port-forward the product database
kubectl port-forward -n aucert-dev svc/product-pg 5432:5432
# Or the internal platform database
kubectl port-forward -n internal-platform svc/internal-pg 5433:5432
Ensure your DATABASE_URL environment variable matches the forwarded port.
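One way to confirm the match is to pull the port out of DATABASE_URL and compare it with the local port used in the port-forward commands above. This is a minimal sketch, assuming DATABASE_URL has the form postgresql://user:pass@host:port/db (optionally prefixed with jdbc:). The function name db_url_port and the expected_port variable are illustrative and not part of the project.

```shell
# Print the port from a PostgreSQL URL such as
#   postgresql://user:pass@localhost:5432/product
#   jdbc:postgresql://localhost:5433/internal
# Falls back to 5432 (the PostgreSQL default) when no port is given.
# Assumption: plain host names or IPv4 addresses, no IPv6 literals.
db_url_port() {
  url=$1
  if [ -z "$url" ]; then
    echo "DATABASE_URL is not set" >&2
    return 1
  fi
  hostport=${url#*://}        # drop the scheme
  hostport=${hostport%%/*}    # drop the path (database name)
  hostport=${hostport##*@}    # drop user:password@ if present
  case $hostport in
    *:*) printf '%s\n' "${hostport##*:}" ;;
    *)   printf '%s\n' 5432 ;;
  esac
}

# Local port from the port-forward command (5432 for product-pg, 5433 for internal-pg).
expected_port=5432
if actual_port=$(db_url_port "${DATABASE_URL:-}"); then
  if [ "$actual_port" = "$expected_port" ]; then
    echo "DATABASE_URL port matches the forwarded port ($expected_port)"
  else
    echo "DATABASE_URL uses port $actual_port, expected $expected_port" >&2
  fi
fi
```

Set expected_port to 5433 when working against the internal platform database.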
"Class not found" errors during build
Symptom: ClassNotFoundException or NoClassDefFoundError during the build.
Fix:
cd backend/platform
./gradlew clean build
If using generated Protobuf classes:
bazel build //proto:all
./gradlew clean build
Gradle daemon issues
Symptom: Build hangs, OOM errors, or stale cache.
Fix:
./gradlew --stop
rm -rf ~/.gradle/caches/
./gradlew clean build
Frontend (TypeScript)
pnpm install fails with peer dependency errors
Symptom: ERR_PNPM_PEER_DEP_ISSUES during install.
Fix:
# Clear the pnpm store and reinstall
pnpm store prune
rm -rf node_modules
pnpm install
Check that your Node.js version is 22 or later:
node --version # Should be v22.x or later
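The version check can also be scripted, for example in a setup script. The sketch below parses the output of node --version and compares the major version against a minimum. The function name node_major_ok is hypothetical, and the default minimum of 22 comes from the requirement stated above.

```shell
# Succeed when a `node --version` string (e.g. v22.3.0) has a major
# version greater than or equal to the minimum (second argument, default 22).
node_major_ok() {
  version=${1#v}          # strip the leading "v"
  major=${version%%.*}    # keep everything before the first dot
  # A non-numeric major makes the test fail, which is treated as "not OK".
  [ "$major" -ge "${2:-22}" ] 2>/dev/null
}

# Only run the check when node is actually installed.
if command -v node >/dev/null 2>&1; then
  if node_major_ok "$(node --version)"; then
    echo "Node.js version OK"
  else
    echo "Node.js $(node --version) is too old; install 22 or later" >&2
  fi
fi
```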
API calls fail in development (proxy issues)
Symptom: ERR_CONNECTION_REFUSED or 502 errors when the frontend calls the API.
Cause: The backend is not running on the expected port.
Fix:
- Ensure the backend is running on port 8080
- Check the proxy config in next.config.ts
- Verify with:
curl http://localhost:8080/health
TypeScript type errors after proto changes
Symptom: Type errors referencing generated types in schemas/generated/.
Fix:
# Regenerate from proto
bazel build //proto:all
# Restart the TS server in your IDE
# VS Code: Cmd+Shift+P → "TypeScript: Restart TS Server"
Kubernetes / AKS
Pod stuck in Pending
Symptom: Pod stays in Pending state indefinitely.
Diagnose:
kubectl describe pod -n <namespace> <pod-name>
Common causes:
- Insufficient resources: Node pool doesn't have enough CPU/memory. Check the Events section.
- PVC not bound: Persistent volume claim waiting for storage.
- Node affinity: Pod can't be scheduled to any available node.
Fix: For insufficient resources, scale up the node pool or reduce resource requests in the Helm values file.
Pod in CrashLoopBackOff
Symptom: Pod starts, crashes, restarts repeatedly.
Diagnose:
# Check crash logs
kubectl logs -n <namespace> <pod-name> --previous
# Check environment variables
kubectl exec -n <namespace> <pod-name> -- env | sort
Common causes:
- Missing environment variables (especially database URLs)
- Database connection failure (PG not accessible from pod)
- Application startup error (check logs for stack trace)
Pod in ImagePullBackOff
Symptom: Pod can't pull the Docker image from ACR.
Fix:
# Re-authenticate with ACR
az acr login --name aucertacr41e0x5
# Verify the image exists
az acr repository show-tags --name aucertacr41e0x5 --repository <image-name>
# Check AKS has ACR pull permissions
az aks check-acr --name aucert-aks --resource-group aucert-foundation-rg \
--acr aucertacr41e0x5.azurecr.io
Helm upgrade fails
Symptom: helm upgrade returns an error about conflicting resources.
Fix:
# Check current release status
helm list -n <namespace>
helm history <release-name> -n <namespace>
# If stuck in "pending-upgrade" state
helm rollback <release-name> -n <namespace>
Terraform
State lock — "Error acquiring the state lock"
Symptom: Terraform can't acquire the state lock.
Diagnose: Another terraform apply may be running. Check with your team.
Fix (only if confirmed no other process is running):
terraform force-unlock <lock-id>
Only force-unlock if you are certain no other process is running. Forcing an unlock while another apply is in progress can corrupt state.
"Resource already exists" on apply
Symptom: Terraform tries to create a resource that already exists (created manually or by another process).
Fix:
# Import the existing resource into state
terraform import <resource_type>.<name> <azure-resource-id>
# Then plan to verify no changes
terraform plan
Provider authentication errors
Fix:
# Re-authenticate with Azure
az login
# Set the correct subscription
az account set --subscription "<subscription-id>"
# Verify
az account show
Database / Flyway
Migration failed — "Detected resolved migration not applied"
Symptom: Flyway detects a migration file that should have been applied before an already-applied one.
Cause: Migration version numbers are out of sequence (e.g., V003 was applied but V002 was added later).
Fix: Ensure migration version numbers are strictly sequential. If a migration was skipped, either:
- Apply it manually and update the flyway_schema_history table
- Use -baselineOnMigrate=true (already set in our CI workflow)
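To spot gaps before Flyway does, the version numbers can be checked locally. This is a rough sketch, assuming migrations live in one directory and follow Flyway's default naming with integer versions (V001__init.sql, V002__add_users.sql, and so on). The function name missing_versions is illustrative.

```shell
# Print every version number missing between the lowest and highest
# migration found in the given directory, one per line.
# Assumes file names like V001__description.sql with integer versions.
missing_versions() {
  dir=$1
  # Extract the numeric version from each file name (leading zeros removed),
  # sort numerically, then print the numbers skipped between neighbours.
  ls "$dir" \
    | sed -n 's/^V0*\([0-9][0-9]*\)__.*\.sql$/\1/p' \
    | sort -n \
    | awk 'NR > 1 { for (v = prev + 1; v < $1; v++) print v } { prev = $1 }'
}
```

Call it with the path to the migrations directory, for example missing_versions path/to/migrations (a placeholder path). Empty output means no version is skipped.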
"Connection refused" from Flyway pod
Cause: The Flyway pod can't reach PostgreSQL on the private VNet.
Check:
# Verify the PG service is accessible from within the cluster
kubectl exec -n internal-platform -it <any-pod> -- \
pg_isready -h internal-pg -p 5432
Ensure the Flyway pod is running in the correct namespace with VNet access.
CI / GitHub Actions
Workflow not triggering on push
Check:
- Verify the paths filter matches your changed files
- Check that the workflow file is on the main branch
- Look at the Actions tab for skipped runs
# View recent workflow runs
gh run list --workflow=ci.yml --limit=5
OIDC authentication failure in CI
Symptom: azure/login step fails with OIDC token error.
Check:
- AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID secrets are set
- App registration federated credentials match the repo/branch
- permissions: id-token: write is set in the workflow
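The first item in the checklist can be checked from a terminal. The sketch below reads the output of gh secret list on standard input and prints any of the three Azure secrets that are absent. It assumes the secret name is the first column of that output and that the secrets are defined at the repository level (environment or organization secrets would need the matching gh flags). The function name missing_secrets is hypothetical.

```shell
# Read `gh secret list` output on stdin (secret name in the first column)
# and print each required Azure secret that is not present.
# Returns non-zero when at least one secret is missing.
missing_secrets() {
  names=$(awk '{ print $1 }')
  rc=0
  for required in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_SUBSCRIPTION_ID; do
    if ! printf '%s\n' "$names" | grep -qx "$required"; then
      echo "$required"
      rc=1
    fi
  done
  return $rc
}
```

Usage: gh secret list | missing_secrets. No output and a zero exit status mean all three secrets are present.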
What's next
- How to debug AKS pods — Detailed pod debugging
- How to deploy to dev — Deployment process
- Secrets management — Credential issues