Azure AI Foundry usage guide
This guide covers how to call the LLM models deployed in Azure AI Foundry from different environments.
All Foundry endpoints use OpenAI-compatible API format. Any library that works with OpenAI works with Foundry — just change the base URL and auth. The model parameter in API calls equals the deployment name, not the underlying model name.
Quick reference
| Property | Value |
|---|---|
| Resource name | aucert-ai |
| Resource group | aucert-foundation-rg |
| Region | West US |
| Endpoint | https://aucert-ai.cognitiveservices.azure.com/ |
| API key location | Key Vault aucertdev-kv-41e0x5, secret foundry-api-key |
| Auth methods | API key (local dev, CI) or Managed Identity (AKS pods) |
| API format | OpenAI-compatible (/openai/deployments/{name}/chat/completions) |
| Terraform | infra/terraform/foundation/foundry.tf |
| ADR | ADR-008 |
Deployed models
All models are Global Standard (serverless) — zero idle cost, pay-per-token only.
Deployment names exactly match the Foundry model name (case + dots) — required for opencode and portal triage.
| Deployment name | Model | Pipeline layer | API endpoint | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| gpt-5.1-codex | GPT-5.1-Codex | Decision (L4), retired from routing | /responses only | ~$2.00 | ~$8.00 |
| gpt-5.4 | GPT-5.4 | Decision standard (L4) | /chat/completions | ~$4.00 | ~$16.00 |
| gpt-5.3-codex | GPT-5.3-Codex | Decision code-specific (L4) | /chat/completions | ~$5.00 | ~$20.00 |
| Kimi-K2.6 | Kimi K2.6 | Generation (L1) + Reporting (L5 placeholder) | /chat/completions | ~$0.28 | ~$0.77 |
| DeepSeek-V3.2 | DeepSeek V3.2 | Analysis (L3) | /chat/completions | ~$0.14 | ~$0.42 |
All are "Direct from Azure" — covered by Founders Hub credits. Estimated total: $55-135/month.
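To sanity-check the monthly estimate against expected traffic, a small calculator helps. This is a minimal sketch: the prices are the approximate figures from the table above (not live pricing), and `estimate_cost` is a hypothetical helper name.

```python
# Approximate USD prices per 1M tokens (input, output), copied from the table above.
# These are estimates; confirm against current Azure pricing before budgeting.
PRICES_PER_1M = {
    "gpt-5.1-codex": (2.00, 8.00),
    "gpt-5.4": (4.00, 16.00),
    "gpt-5.3-codex": (5.00, 20.00),
    "Kimi-K2.6": (0.28, 0.77),
    "DeepSeek-V3.2": (0.14, 0.42),
}


def estimate_cost(deployment: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of a token volume on one deployment."""
    input_price, output_price = PRICES_PER_1M[deployment]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example: 50M input + 10M output tokens in a month on DeepSeek-V3.2.
print(f"${estimate_cost('DeepSeek-V3.2', 50_000_000, 10_000_000):.2f}")  # prints $11.20
```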
- GPT-5.1-Codex: does NOT support `/chat/completions` (`chatCompletion: false`). It only serves the Responses API (`/responses`), so the backend LLM adapter must use the Responses API format for this model.
- Kimi K2.6: thinking model. Its reasoning is returned in `reasoning_content`, and `content` is `null` if the token budget runs out before reasoning completes. Set a higher `max_tokens` (at least 100) so the model can finish reasoning and generate output.
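A backend adapter that talks to a thinking model can guard against the empty `content` case explicitly. This is a minimal sketch that assumes the final answer lands in `content` once reasoning completes and that a truncated response reports `finish_reason: "length"`, as in the OpenAI chat completions format; `extract_answer` is a hypothetical helper operating on the parsed JSON body.

```python
from typing import Any


def extract_answer(response: dict[str, Any]) -> str:
    """Return the final answer from a parsed /chat/completions response body.

    Assumes a thinking model (e.g. Kimi K2.6) puts its reasoning in
    `reasoning_content` and the answer in `content`, which is null when the
    token budget runs out before reasoning finishes.
    """
    choice = response["choices"][0]
    content = choice["message"].get("content")
    if content:
        return content
    if choice.get("finish_reason") == "length":
        raise RuntimeError("max_tokens exhausted during reasoning; raise max_tokens and retry")
    raise RuntimeError("empty content; inspect message.reasoning_content for details")


finished = {
    "choices": [{
        "message": {"content": "Hello!", "reasoning_content": "The user greeted me."},
        "finish_reason": "stop",
    }]
}
print(extract_answer(finished))  # prints Hello!
```

Raising instead of silently returning `reasoning_content` keeps chain-of-thought text from leaking into user-facing output.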
Access pattern 1: From your laptop (local dev)
Retrieve the API key from Key Vault and export as environment variables. Never commit API keys — use .env.local (gitignored).
```bash
# Get API key from Key Vault
export FOUNDRY_API_KEY=$(az keyvault secret show \
  --vault-name aucertdev-kv-41e0x5 \
  --name foundry-api-key \
  --query value -o tsv)
export FOUNDRY_ENDPOINT="https://aucert-ai.cognitiveservices.azure.com/"
```
Network path: laptop → public internet → Foundry endpoint.
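The curl and Kotlin examples build URLs as `${FOUNDRY_ENDPOINT}openai/...`, so the endpoint must keep its trailing slash. A minimal sketch of loading and validating both variables in a Python service; `FoundryConfig` and `load_foundry_config` are hypothetical names.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FoundryConfig:
    endpoint: str  # always ends with "/"
    api_key: str = field(repr=False)  # keep the key out of logged reprs


def load_foundry_config(env: Mapping[str, str] = os.environ) -> FoundryConfig:
    """Read FOUNDRY_ENDPOINT and FOUNDRY_API_KEY, failing fast if either is missing."""
    missing = [name for name in ("FOUNDRY_ENDPOINT", "FOUNDRY_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    endpoint = env["FOUNDRY_ENDPOINT"]
    # URLs are built as f"{endpoint}openai/...", so normalize the trailing slash.
    if not endpoint.endswith("/"):
        endpoint += "/"
    return FoundryConfig(endpoint=endpoint, api_key=env["FOUNDRY_API_KEY"])
```

Call `load_foundry_config()` once at startup so a missing variable fails immediately rather than on the first model request.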
curl
```bash
# Use a /chat/completions deployment; gpt-5.1-codex only serves /responses (400 here)
curl -s -X POST \
  "${FOUNDRY_ENDPOINT}openai/deployments/gpt-5.4/chat/completions?api-version=2024-10-21" \
  -H "Content-Type: application/json" \
  -H "api-key: ${FOUNDRY_API_KEY}" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":50}'
```
Python (openai SDK)
```python
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["FOUNDRY_API_KEY"],
    api_version="2024-10-21",
    azure_endpoint=os.environ["FOUNDRY_ENDPOINT"],
)

# gpt-5.1-codex would fail here: it only supports the Responses API.
response = client.chat.completions.create(
    model="gpt-5.4",  # deployment name (currently identical to the model name)
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
Kotlin (Ktor HttpClient)
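```kotlin
import io.ktor.client.request.*
import io.ktor.http.*

// httpClient, foundryEndpoint (with trailing slash) and foundryApiKey are defined elsewhere.
val response = httpClient.post("${foundryEndpoint}openai/deployments/gpt-5.4/chat/completions") {
    parameter("api-version", "2024-10-21")
    header("api-key", foundryApiKey)
    contentType(ContentType.Application.Json)
    setBody("""{"messages":[{"role":"user","content":"Hello"}],"max_tokens":50}""")
}
```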
Access pattern 2: From AKS pods (production path)
Two options. Managed Identity is recommended — no API key rotation needed.
Network path: pod → AKS egress → public internet → Foundry. Latency ~2-5ms network + 1-10s inference.
Option A: Managed Identity (recommended)
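The AKS kubelet identity already has the Cognitive Services User role on the Foundry resource (assigned in foundry.tf). Pods authenticate automatically via DefaultAzureCredential, which falls through to the node's managed identity. If the node pool has more than one user-assigned identity attached, set AZURE_CLIENT_ID in the pod to the kubelet identity's client ID so the credential picks the right one.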
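```kotlin
// build.gradle.kts: implementation("com.azure:azure-identity:1.14.0")
import com.azure.core.credential.TokenRequestContext
import com.azure.identity.DefaultAzureCredentialBuilder
import io.ktor.client.request.*
import io.ktor.http.*

val credential = DefaultAzureCredentialBuilder().build()

// block() waits on the Reactor Mono, so call it off the request path.
// The token expires (see token.expiresAt); refresh it before then.
val token = credential.getToken(
    TokenRequestContext().addScopes("https://cognitiveservices.azure.com/.default")
).block() ?: error("Failed to acquire an Entra ID token for Cognitive Services")

// Use a bearer token instead of the api-key header
httpClient.post("${foundryEndpoint}openai/deployments/${deploymentName}/chat/completions") {
    parameter("api-version", "2024-10-21")
    header(HttpHeaders.Authorization, "Bearer ${token.token}")
    contentType(ContentType.Application.Json)
    setBody(requestBody)
}
```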
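Entra ID access tokens expire, so a long-lived client has to reuse a token and refresh it shortly before expiry. The pattern is language-neutral; this Python sketch uses hypothetical names (`BearerToken`, `CachedTokenProvider`, and a caller-supplied `fetch`). In the Kotlin service, `fetch` would wrap `credential.getToken(...)` and read `expiresAt` from the returned AccessToken. The 5-minute refresh margin is an arbitrary default.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class BearerToken:
    token: str
    expires_at: float  # Unix time (seconds) when the token stops being valid


class CachedTokenProvider:
    """Hand out a cached bearer token and refetch it shortly before expiry.

    Not thread-safe: guard get() with a lock if several threads share it.
    """

    def __init__(
        self,
        fetch: Callable[[], BearerToken],
        refresh_margin_s: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin_s
        self._clock = clock
        self._cached: Optional[BearerToken] = None

    def get(self) -> str:
        now = self._clock()
        if self._cached is None or now >= self._cached.expires_at - self._margin:
            self._cached = self._fetch()  # first call, or token is about to expire
        return self._cached.token
```

Injecting `clock` keeps the refresh logic testable without waiting for real expiry.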
Option B: API key via K8s secret
If Managed Identity is not available, inject the API key from Key Vault via External Secrets Operator:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: foundry-credentials
  namespace: aucert-dev
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: azure-keyvault
    kind: ClusterSecretStore
  target:
    name: foundry-credentials
  data:
    - secretKey: FOUNDRY_API_KEY
      remoteRef:
        key: foundry-api-key
```
Then reference it in your Deployment's container spec:
```yaml
env:
  - name: FOUNDRY_API_KEY
    valueFrom:
      secretKeyRef:
        name: foundry-credentials
        key: FOUNDRY_API_KEY
```
Access pattern 3: From CI/CD (GitHub Actions)
Store the API key as a GitHub Actions secret FOUNDRY_API_KEY. Use it in workflow steps:
```yaml
- name: Smoke test Foundry models
  env:
    FOUNDRY_API_KEY: ${{ secrets.FOUNDRY_API_KEY }}
    FOUNDRY_ENDPOINT: https://aucert-ai.cognitiveservices.azure.com/
  run: |
    # gpt-5.1-codex is excluded: it only serves /responses and returns 400 on /chat/completions.
    for MODEL in gpt-5.4 gpt-5.3-codex Kimi-K2.6 DeepSeek-V3.2; do
      STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
        "${FOUNDRY_ENDPOINT}openai/deployments/${MODEL}/chat/completions?api-version=2024-10-21" \
        -H "Content-Type: application/json" \
        -H "api-key: ${FOUNDRY_API_KEY}" \
        -d '{"messages":[{"role":"user","content":"ping"}],"max_tokens":5}')
      echo "$MODEL: HTTP $STATUS"
      [ "$STATUS" -eq 200 ] || exit 1
    done
```
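The same check can live in a Python script, which makes the skip rule explicit instead of hand-editing the model list. A minimal sketch: `DEPLOYMENT_APIS` mirrors the API endpoint column of the Deployed models table, and `post_status` is a hypothetical callable you supply (for example, one that sends the ping request and returns the HTTP status code).

```python
from typing import Callable

# API surface per deployment, from the "Deployed models" table.
DEPLOYMENT_APIS = {
    "gpt-5.1-codex": "/responses",
    "gpt-5.4": "/chat/completions",
    "gpt-5.3-codex": "/chat/completions",
    "Kimi-K2.6": "/chat/completions",
    "DeepSeek-V3.2": "/chat/completions",
}


def chat_smoke_test(post_status: Callable[[str], int]) -> dict[str, int]:
    """Ping every /chat/completions deployment and return {deployment: status} for failures.

    Deployments that only serve /responses are skipped rather than reported as 400s.
    """
    failures: dict[str, int] = {}
    for deployment, api in DEPLOYMENT_APIS.items():
        if api != "/chat/completions":
            continue
        status = post_status(deployment)
        if status != 200:
            failures[deployment] = status
    return failures
```

A CI wrapper would exit non-zero whenever the returned dict is non-empty.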
Access pattern 4: Private endpoint (future)
Private endpoints are not set up. Implement when an enterprise customer requires traffic to stay within the Azure backbone.
What private endpoints do: Foundry traffic routes through a private IP in your VNet instead of the public internet. This satisfies network-isolation requirements. It does not by itself satisfy data residency: Global Standard deployments can process requests in any Azure region.
Impact when enabled: local laptop access and CI/CD access break unless VPN or self-hosted GitHub runners are configured. Only enable when a customer's security policy requires it.
Conceptual Terraform (reference only — do NOT apply):
```hcl
# resource "azurerm_private_endpoint" "foundry_pe" {
#   name                = "aucert-foundry-pe"
#   location            = azurerm_resource_group.main.location
#   resource_group_name = azurerm_resource_group.main.name
#   subnet_id           = azurerm_subnet.keyvault.id
#
#   private_service_connection {
#     name                           = "foundry-connection"
#     private_connection_resource_id = azurerm_cognitive_account.foundry.id
#     is_manual_connection           = false
#     subresource_names              = ["account"]
#   }
# }
```
Access pattern 5: From Astra agents / Daytona sandboxes
Same as AKS pods (pattern 2). Agents running in the cluster use Managed Identity automatically.
Cost consideration: route bulk analysis to DeepSeek-V3.2 ($0.14/1M input tokens). Use gpt-5.4 ($4.00/1M input tokens) only for reasoning tasks that require high quality.
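This routing policy can be captured in a lookup table so no caller hard-codes an expensive model. A minimal sketch following the Pipeline layer column of the Deployed models table; the layer keys and the `route` helper are hypothetical, not identifiers from the backend.

```python
# Pipeline layer -> deployment, taken from the "Deployed models" table.
# The layer keys are illustrative labels, not identifiers from the backend.
ROUTES = {
    "generation": "Kimi-K2.6",         # L1
    "analysis": "DeepSeek-V3.2",       # L3: cheapest option for bulk work
    "decision": "gpt-5.4",             # L4 standard reasoning
    "decision-code": "gpt-5.3-codex",  # L4 code-specific
    "reporting": "Kimi-K2.6",          # L5 placeholder
}


def route(layer: str) -> str:
    """Return the deployment for a pipeline layer; gpt-5.1-codex is retired from routing."""
    try:
        return ROUTES[layer]
    except KeyError:
        raise ValueError(f"unknown pipeline layer: {layer!r}") from None


print(route("analysis"))  # prints DeepSeek-V3.2
```

Failing loudly on an unknown layer avoids silently falling back to a costly default.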
API compatibility note
All Foundry endpoints use OpenAI-compatible format:
POST {endpoint}openai/deployments/{deployment-name}/chat/completions?api-version=2024-10-21
The model parameter in API requests is the deployment name. As of 2026-05-03 the deployment names match the underlying Foundry model names exactly (case + dots), so what was previously a dash-vs-dot footgun no longer applies — gpt-5.1-codex is both the model name and the deployment name.
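Any library that works with OpenAI (openai, langchain, litellm, etc.) works with Foundry by changing:
- Base URL to the Foundry endpoint
- API key to the Foundry API key (sent in the api-key header)
- Model name to the deployment name
- API version: the deployments path requires the api-version query parameter (2024-10-21 in this guide)

Most of these libraries ship an Azure-specific client or provider (for example AzureOpenAI in the openai SDK) that handles the header and query parameter for you.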
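When no SDK is available, the POST format above can be assembled with the Python standard library. A minimal sketch: `build_chat_request` is a hypothetical helper that only builds the `urllib.request.Request`; the body carries no `model` field because the deployment is already in the URL path, matching the curl example.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "2024-10-21"


def build_chat_request(
    endpoint: str,
    deployment: str,
    api_key: str,
    messages: list[dict],
    max_tokens: int = 50,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the deployments chat completions path."""
    base = endpoint if endpoint.endswith("/") else endpoint + "/"
    path = f"openai/deployments/{urllib.parse.quote(deployment)}/chat/completions"
    query = urllib.parse.urlencode({"api-version": API_VERSION})
    body = json.dumps({"messages": messages, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"{base}{path}?{query}",
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
```

To send it, use `urllib.request.urlopen(req, timeout=60)` and parse the body with `json.load`; a timeout well above the 1-10s inference range avoids cutting off slow reasoning responses.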
Troubleshooting
| HTTP code | Cause | Fix |
|---|---|---|
| 400 "operation unsupported" | Model does not support /chat/completions | Check deployment capabilities: `az cognitiveservices account deployment show --name aucert-ai --resource-group aucert-foundation-rg --deployment-name <name> --query properties.capabilities`. GPT-5.1-Codex requires the /responses API. |
| 401 | Invalid or missing API key | Re-retrieve the key from Key Vault: `az keyvault secret show --vault-name aucertdev-kv-41e0x5 --name foundry-api-key --query value -o tsv` |
| 403 | Managed Identity lacks the Cognitive Services User role | Check RBAC (run from infra/terraform/foundation): `az role assignment list --scope $(terraform output -raw foundry_id) -o table` |
| 404 | Wrong deployment name or API version | Verify the deployment exists: `az cognitiveservices account deployment list --name aucert-ai --resource-group aucert-foundation-rg -o table` |
| 429 | Rate limit / TPM quota exceeded | Increase capacity in foundry.tf and terraform apply. Default is 10K TPM. |
| 500 | Foundry service error | Retry with exponential backoff. Check Azure status. |
| `content: null` | Thinking model (Kimi K2.6) ran out of tokens while reasoning | The reasoning is in `reasoning_content`, not `content`. Increase `max_tokens` to at least 100 so the model finishes reasoning and generates output. |
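The 429 and 500 rows both call for retries with exponential backoff. A minimal sketch, assuming a caller-supplied `send` that performs one request and returns `(status, body)`; `call_with_backoff` is a hypothetical helper, and treating 502-504 as retryable is an assumption that extends the table's 500 row to other transient server errors.

```python
import time
from typing import Callable

RETRYABLE = {429, 500, 502, 503, 504}


def call_with_backoff(
    send: Callable[[], tuple[int, str]],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Call `send` until it returns a non-retryable status or attempts run out.

    Delays double after each retryable failure (1s, 2s, 4s, ...) up to max_delay_s.
    The last response is returned as-is so the caller can report the failure.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(min(max_delay_s, base_delay_s * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

In production, add random jitter to each delay so many pods do not retry in lockstep, and prefer a `Retry-After` header on 429 responses when one is present.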