Azure AI Foundry usage guide

This guide covers how to call the LLMs deployed in Azure AI Foundry from different environments.

Info: All Foundry endpoints use an OpenAI-compatible API format. Any library that works with OpenAI works with Foundry once you change the base URL and the auth. The model parameter in API calls is the deployment name, not the underlying model name.

Quick reference

Property	Value
Resource name	`aucert-ai`
Resource group	`aucert-foundation-rg`
Region	West US
Endpoint	`https://aucert-ai.cognitiveservices.azure.com/`
API key location	Key Vault `aucertdev-kv-41e0x5`, secret `foundry-api-key`
Auth methods	API key (local dev, CI) or Managed Identity (AKS pods)
API format	OpenAI-compatible (`/openai/deployments/{name}/chat/completions`)
Terraform	`infra/terraform/foundation/foundry.tf`
ADR	ADR-008

Deployed models

All models are Global Standard (serverless) — zero idle cost, pay-per-token only.

Deployment names exactly match the Foundry model name (case + dots) — required for opencode and portal triage.

Deployment name	Model	Pipeline layer	API endpoint	Input $/1M	Output $/1M
`gpt-5.1-codex`	GPT-5.1-Codex	Decision (L4) — retired from routing	`/responses` only	~$2.00	~$8.00
`gpt-5.4`	GPT-5.4	Decision standard (L4)	`/chat/completions`	~$4.00	~$16.00
`gpt-5.3-codex`	GPT-5.3-Codex	Decision code-specific (L4)	`/chat/completions`	~$5.00	~$20.00
`Kimi-K2.6`	Kimi K2.6	Generation (L1) + Reporting (L5 placeholder)	`/chat/completions`	~$0.28	~$0.77
`DeepSeek-V3.2`	DeepSeek V3.2	Analysis (L3)	`/chat/completions`	~$0.14	~$0.42

All are "Direct from Azure" — covered by Founders Hub credits. Estimated total: $55-135/month.
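As a rough aid for budgeting, the per-token prices in the table can be turned into a small estimator. This is a minimal sketch: the prices are the approximate figures from the table above, and the function name `estimate_cost_usd` is hypothetical, not part of any Foundry SDK.

```python
# Approximate prices in USD per 1M tokens (input, output), copied from the table above.
PRICES_PER_1M = {
    "gpt-5.1-codex": (2.00, 8.00),
    "gpt-5.4": (4.00, 16.00),
    "gpt-5.3-codex": (5.00, 20.00),
    "Kimi-K2.6": (0.28, 0.77),
    "DeepSeek-V3.2": (0.14, 0.42),
}


def estimate_cost_usd(deployment: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of a workload on one deployment."""
    input_price, output_price = PRICES_PER_1M[deployment]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, `estimate_cost_usd("DeepSeek-V3.2", 10_000_000, 2_000_000)` comes to roughly $2.24, while the same token volume on `gpt-5.4` comes to roughly $72.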

Model-specific API behavior

GPT-5.1-Codex: does not support /chat/completions (chatCompletion: false). It uses the Responses API (/responses) only, so the backend LLM adapter must send Responses API requests for this model.
Kimi K2.6: thinking model. Responses go into reasoning_content, not content. Set max_tokens to at least 100 so the model can finish reasoning before it generates output.
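The two quirks above could be handled in an adapter along the following lines. This is a sketch under assumptions: the helper names are hypothetical, and the message is assumed to be a plain dict with `content` and `reasoning_content` keys as described above.

```python
# Deployments that only accept the Responses API, per the deployed models table.
RESPONSES_ONLY = {"gpt-5.1-codex"}


def api_path_for(deployment: str) -> str:
    """Return the API path suffix that a deployment accepts."""
    return "/responses" if deployment in RESPONSES_ONLY else "/chat/completions"


def extract_text(message: dict) -> str:
    """Return the message text, falling back to reasoning_content.

    Thinking models such as Kimi K2.6 may leave content null or empty.
    """
    return message.get("content") or message.get("reasoning_content") or ""
```

Falling back instead of failing keeps callers working with both thinking and non-thinking models, at the cost of occasionally surfacing raw reasoning text.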

Access pattern 1: From your laptop (local dev)

Retrieve the API key from Key Vault, then export the key and the endpoint as environment variables. Never commit API keys; keep them in .env.local (gitignored).

# Get API key from Key Vault
export FOUNDRY_API_KEY=$(az keyvault secret show \
  --vault-name aucertdev-kv-41e0x5 \
  --name foundry-api-key \
  --query value -o tsv)

export FOUNDRY_ENDPOINT="https://aucert-ai.cognitiveservices.azure.com/"

Network path: laptop → public internet → Foundry endpoint.
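The examples in this guide append `openai/...` directly to `FOUNDRY_ENDPOINT`, so a missing trailing slash silently produces a bad URL. A small loader can fail fast on missing variables and normalize the endpoint. This is a sketch only; `FoundryConfig` and `load_foundry_config` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FoundryConfig:
    endpoint: str  # always ends with exactly one "/"
    api_key: str


def load_foundry_config(env: Mapping[str, str]) -> FoundryConfig:
    """Read Foundry settings from an environment-like mapping."""
    required = ("FOUNDRY_ENDPOINT", "FOUNDRY_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Normalize so that endpoint + "openai/..." always forms a valid URL.
    endpoint = env["FOUNDRY_ENDPOINT"].rstrip("/") + "/"
    return FoundryConfig(endpoint=endpoint, api_key=env["FOUNDRY_API_KEY"])
```

In application code this would typically be called as `load_foundry_config(os.environ)`.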

curl

curl -s -X POST \
  "${FOUNDRY_ENDPOINT}openai/deployments/gpt-5.4/chat/completions?api-version=2024-10-21" \
  -H "Content-Type: application/json" \
  -H "api-key: ${FOUNDRY_API_KEY}" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":50}'

Python (openai SDK)

import os

from openai import AzureOpenAI

# Credentials come from the environment variables exported above
client = AzureOpenAI(
    api_key=os.environ["FOUNDRY_API_KEY"],
    api_version="2024-10-21",
    azure_endpoint=os.environ["FOUNDRY_ENDPOINT"],
)

response = client.chat.completions.create(
    model="gpt-5.4",  # deployment name, not model name (gpt-5.1-codex only supports /responses)
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

Kotlin (Ktor HttpClient)

val response = httpClient.post("${foundryEndpoint}openai/deployments/gpt-5.4/chat/completions") {
    parameter("api-version", "2024-10-21")
    header("api-key", foundryApiKey)
    contentType(ContentType.Application.Json)
    setBody("""{"messages":[{"role":"user","content":"Hello"}],"max_tokens":50}""")
}

Access pattern 2: From AKS pods (production path)

Two options. Managed Identity is recommended — no API key rotation needed.

Network path: pod → AKS egress → public internet → Foundry. Latency ~2-5ms network + 1-10s inference.

Option A: Managed Identity (recommended)

The AKS kubelet identity already has the Cognitive Services User role on the Foundry resource (assigned in foundry.tf). Pods authenticate automatically via DefaultAzureCredential.

// build.gradle.kts: implementation("com.azure:azure-identity:1.14.0")
import com.azure.core.credential.TokenRequestContext
import com.azure.identity.DefaultAzureCredentialBuilder

val credential = DefaultAzureCredentialBuilder().build()
val token = credential.getToken(
    TokenRequestContext().addScopes("https://cognitiveservices.azure.com/.default")
).block()

// Use bearer token instead of api-key header
httpClient.post("${foundryEndpoint}openai/deployments/${deploymentName}/chat/completions") {
    parameter("api-version", "2024-10-21")
    header("Authorization", "Bearer ${token.token}")
    contentType(ContentType.Application.Json)
    setBody(requestBody)
}

Option B: API key via K8s secret

If Managed Identity is not available, inject the API key from Key Vault via External Secrets Operator:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: foundry-credentials
  namespace: aucert-dev
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: azure-keyvault
    kind: ClusterSecretStore
  target:
    name: foundry-credentials
  data:
    - secretKey: FOUNDRY_API_KEY
      remoteRef:
        key: foundry-api-key

Then reference in your deployment:

env:
  - name: FOUNDRY_API_KEY
    valueFrom:
      secretKeyRef:
        name: foundry-credentials
        key: FOUNDRY_API_KEY
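Client code that has to run under both options can pick the header at request time. The sketch below assumes the bearer token has already been obtained elsewhere (for example via Managed Identity as in Option A). `auth_headers` is a hypothetical helper; the header names match the examples above.

```python
from typing import Mapping, Optional


def auth_headers(env: Mapping[str, str], bearer_token: Optional[str] = None) -> dict:
    """Build request headers, preferring a bearer token over an API key."""
    headers = {"Content-Type": "application/json"}
    if bearer_token:
        # Option A: Managed Identity token
        headers["Authorization"] = f"Bearer {bearer_token}"
    elif env.get("FOUNDRY_API_KEY"):
        # Option B: API key injected from the K8s secret
        headers["api-key"] = env["FOUNDRY_API_KEY"]
    else:
        raise RuntimeError("No Foundry credentials: set FOUNDRY_API_KEY or pass a bearer token")
    return headers
```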

Access pattern 3: From CI/CD (GitHub Actions)

Store the API key as a GitHub Actions secret named FOUNDRY_API_KEY and reference it in workflow steps:

- name: Smoke test Foundry models
  env:
    FOUNDRY_API_KEY: ${{ secrets.FOUNDRY_API_KEY }}
    FOUNDRY_ENDPOINT: https://aucert-ai.cognitiveservices.azure.com/
  run: |
    # gpt-5.1-codex is excluded: it only supports /responses, so /chat/completions returns 400
    for MODEL in gpt-5.4 gpt-5.3-codex Kimi-K2.6 DeepSeek-V3.2; do
      STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
        "${FOUNDRY_ENDPOINT}openai/deployments/${MODEL}/chat/completions?api-version=2024-10-21" \
        -H "Content-Type: application/json" \
        -H "api-key: ${FOUNDRY_API_KEY}" \
        -d '{"messages":[{"role":"user","content":"ping"}],"max_tokens":5}')
      echo "$MODEL: HTTP $STATUS"
      [ "$STATUS" -eq 200 ] || exit 1
    done

Access pattern 4: Private endpoint (future)

NOT CONFIGURED

Private endpoints are not set up. Implement when an enterprise customer requires traffic to stay within the Azure backbone.

What private endpoints do: Foundry traffic routes through a private IP in your VNet instead of the public internet. This satisfies data residency and network isolation requirements.

Impact when enabled: local laptop access and CI/CD access break unless VPN or self-hosted GitHub runners are configured. Only enable when a customer's security policy requires it.

Conceptual Terraform (reference only — do NOT apply):

# resource "azurerm_private_endpoint" "foundry_pe" {
#   name                = "aucert-foundry-pe"
#   location            = azurerm_resource_group.main.location
#   resource_group_name = azurerm_resource_group.main.name
#   subnet_id           = azurerm_subnet.keyvault.id
#
#   private_service_connection {
#     name                           = "foundry-connection"
#     private_connection_resource_id = azurerm_cognitive_account.foundry.id
#     is_manual_connection           = false
#     subresource_names              = ["account"]
#   }
# }

Access pattern 5: From Astra agents / Daytona sandboxes

Same as AKS pods (pattern 2). Agents running in the cluster use Managed Identity automatically.

Cost consideration: route bulk analysis to DeepSeek-V3.2 (~$0.14/1M input tokens). Use gpt-5.4 (~$4.00/1M) only for reasoning tasks that require high quality.
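One way to make that routing explicit is a lookup keyed by pipeline layer. The mapping below follows the "Pipeline layer" column of the deployed models table. The task keys (`generation`, `analysis`, `decision`, `decision-code`) are hypothetical labels chosen for this sketch.

```python
# Task label -> deployment name, following the pipeline layers in the table.
ROUTES = {
    "generation": "Kimi-K2.6",         # L1
    "analysis": "DeepSeek-V3.2",       # L3, cheapest option for bulk work
    "decision": "gpt-5.4",             # L4 standard
    "decision-code": "gpt-5.3-codex",  # L4 code-specific
}


def deployment_for(task: str) -> str:
    """Return the deployment name to use for a task label."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(ROUTES)}") from None
```

Raising on unknown labels, instead of defaulting to an expensive model, keeps accidental traffic off gpt-5.4.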

API compatibility note

All Foundry endpoints use OpenAI-compatible format:

POST {endpoint}openai/deployments/{deployment-name}/chat/completions?api-version=2024-10-21

The model parameter in API requests is the deployment name. As of 2026-05-03 the deployment names match the underlying Foundry model names exactly (case + dots), so what was previously a dash-vs-dot footgun no longer applies — gpt-5.1-codex is both the model name and the deployment name.
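The URL pattern above can be assembled with a small helper. This is a minimal sketch: `chat_completions_url` is a hypothetical name, and the default API version is the one used throughout this guide.

```python
from urllib.parse import quote


def chat_completions_url(endpoint: str, deployment: str, api_version: str = "2024-10-21") -> str:
    """Build the chat completions URL for a deployment."""
    base = endpoint.rstrip("/")  # tolerate endpoints with or without a trailing slash
    name = quote(deployment, safe="")  # dots and dashes in deployment names pass through unchanged
    return f"{base}/openai/deployments/{name}/chat/completions?api-version={api_version}"
```

For example, `chat_completions_url("https://aucert-ai.cognitiveservices.azure.com/", "DeepSeek-V3.2")` yields the same URL shape as the curl example. It does not apply to gpt-5.1-codex, which only accepts `/responses`.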

Any library that works with OpenAI (openai, langchain, litellm, etc.) works with Foundry by changing:

Base URL to the Foundry endpoint
API key to the Foundry API key
Model name to the deployment name

Troubleshooting

HTTP code or symptom	Cause	Fix
400 "operation unsupported"	Model does not support `/chat/completions`	Check deployment capabilities: `az cognitiveservices account deployment show --name aucert-ai --resource-group aucert-foundation-rg --deployment-name <name> --query properties.capabilities`. GPT-5.1-Codex requires `/responses` API.
401	Invalid or missing API key	Re-retrieve key from Key Vault: `az keyvault secret show --vault-name aucertdev-kv-41e0x5 --name foundry-api-key`
403	Managed Identity lacks `Cognitive Services User` role	Check RBAC: `az role assignment list --scope $(terraform output -raw foundry_id) -o table`
404	Wrong deployment name or API version	Verify deployment exists: `az cognitiveservices account deployment list --name aucert-ai --resource-group aucert-foundation-rg -o table`
429	Rate limit / TPM quota exceeded	Increase `capacity` in `foundry.tf` and `terraform apply`. Default is 10K TPM.
500	Foundry service error	Retry with exponential backoff. Check Azure status.
`content: null`	Thinking model (Kimi K2.6)	Response is in `reasoning_content`, not `content`. Increase `max_tokens` to 100+ so the model finishes reasoning and generates output.
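The "retry with exponential backoff" advice for 429 and 500 responses can be sketched as follows. The `send` callable is a hypothetical stand-in for whatever performs the HTTP request and returns its status code. The delays and attempt count are illustrative defaults, not values recommended by Azure.

```python
import time
from typing import Callable

RETRYABLE = {429, 500}


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out.

    Waits base_delay, then 2x, 4x, ... between attempts and returns the
    last status code seen.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status
```

The `sleep` parameter exists so the delay can be replaced in tests. Production code would usually also add random jitter so that many clients do not retry in lockstep.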
