Azure AI Foundry usage guide

This guide covers how to call the LLM models deployed in Azure AI Foundry from different environments.

Info: All Foundry endpoints use an OpenAI-compatible API format. Any library that works with OpenAI works with Foundry: just change the base URL and auth. The model parameter in API calls is the deployment name, not the underlying model name.

Quick reference

| Property | Value |
|---|---|
| Resource name | aucert-ai |
| Resource group | aucert-foundation-rg |
| Region | West US |
| Endpoint | https://aucert-ai.cognitiveservices.azure.com/ |
| API key location | Key Vault aucertdev-kv-41e0x5, secret foundry-api-key |
| Auth methods | API key (local dev, CI) or Managed Identity (AKS pods) |
| API format | OpenAI-compatible (/openai/deployments/{name}/chat/completions) |
| Terraform | infra/terraform/foundation/foundry.tf |
| ADR | ADR-008 |

Deployed models

All models are Global Standard (serverless) — zero idle cost, pay-per-token only.

Deployment names exactly match the Foundry model name (case + dots) — required for opencode and portal triage.

| Deployment name | Model | Pipeline layer | API endpoint | Input $/1M | Output $/1M |
|---|---|---|---|---|---|
| gpt-5.1-codex | GPT-5.1-Codex | Decision (L4), retired from routing | /responses only | ~$2.00 | ~$8.00 |
| gpt-5.4 | GPT-5.4 | Decision standard (L4) | /chat/completions | ~$4.00 | ~$16.00 |
| gpt-5.3-codex | GPT-5.3-Codex | Decision code-specific (L4) | /chat/completions | ~$5.00 | ~$20.00 |
| Kimi-K2.6 | Kimi K2.6 | Generation (L1) + Reporting (L5 placeholder) | /chat/completions | ~$0.28 | ~$0.77 |
| DeepSeek-V3.2 | DeepSeek V3.2 | Analysis (L3) | /chat/completions | ~$0.14 | ~$0.42 |

All are "Direct from Azure" — covered by Founders Hub credits. Estimated total: $55-135/month.
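To see how the per-token prices translate into a bill, the following is a minimal sketch of the arithmetic, using the approximate prices from the table above. The function name and the token volumes in the usage loop are hypothetical and chosen only for illustration; they are not measured usage.

```python
# Approximate USD prices per 1M tokens (input, output), copied from the table above.
PRICES_PER_1M = {
    "gpt-5.1-codex": (2.00, 8.00),
    "gpt-5.4": (4.00, 16.00),
    "gpt-5.3-codex": (5.00, 20.00),
    "Kimi-K2.6": (0.28, 0.77),
    "DeepSeek-V3.2": (0.14, 0.42),
}


def estimate_cost(deployment: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of a workload on one deployment."""
    input_price, output_price = PRICES_PER_1M[deployment]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly volume (10M input, 2M output tokens), for illustration only.
    for name in PRICES_PER_1M:
        print(f"{name}: ${estimate_cost(name, 10_000_000, 2_000_000):.2f}")
```

Because the prices are approximate, treat the result as a rough planning figure rather than a billing forecast.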

Model-specific API behavior
  • GPT-5.1-Codex: Does NOT support /chat/completions (chatCompletion: false). Uses the Responses API (/responses). The backend LLM adapter must use the Responses API format for this model.
  • Kimi K2.6: Thinking model. Output goes into reasoning_content (not content). Set max_tokens to at least 100 so the model can finish reasoning before it generates output.
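The two behaviors above can be handled in a thin adapter layer. The sketch below is one possible shape, not the backend adapter itself: the names RESPONSES_ONLY, api_path, and extract_text are hypothetical, and the message is assumed to be the plain dict found at choices[0].message in a chat completions response.

```python
from typing import Optional

# Deployments that only accept the Responses API, per the note above.
RESPONSES_ONLY = {"gpt-5.1-codex"}


def api_path(deployment: str) -> str:
    """Return the API route a deployment expects."""
    return "/responses" if deployment in RESPONSES_ONLY else "/chat/completions"


def extract_text(message: dict) -> Optional[str]:
    """Return the text of a chat completion message.

    Prefers content; falls back to reasoning_content when content is
    null or empty, which is where Kimi K2.6 places its output.
    """
    content = message.get("content")
    if content:
        return content
    return message.get("reasoning_content")
```

For example, extract_text({"content": None, "reasoning_content": "..."}) returns the reasoning text instead of None, so callers do not need a model-specific branch.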

Access pattern 1: From your laptop (local dev)

Retrieve the API key from Key Vault and export as environment variables. Never commit API keys — use .env.local (gitignored).

# Get API key from Key Vault
export FOUNDRY_API_KEY=$(az keyvault secret show \
  --vault-name aucertdev-kv-41e0x5 \
  --name foundry-api-key \
  --query value -o tsv)

export FOUNDRY_ENDPOINT="https://aucert-ai.cognitiveservices.azure.com/"

Network path: laptop → public internet → Foundry endpoint.

curl

# gpt-5.4 is used here because gpt-5.1-codex does not support /chat/completions
curl -s -X POST \
  "${FOUNDRY_ENDPOINT}openai/deployments/gpt-5.4/chat/completions?api-version=2024-10-21" \
  -H "Content-Type: application/json" \
  -H "api-key: ${FOUNDRY_API_KEY}" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":50}'

Python (openai SDK)

import os

from openai import AzureOpenAI

# Reads the variables exported in the shell snippet above
client = AzureOpenAI(
    api_key=os.environ["FOUNDRY_API_KEY"],
    api_version="2024-10-21",
    azure_endpoint=os.environ["FOUNDRY_ENDPOINT"],
)

response = client.chat.completions.create(
    model="gpt-5.4",  # deployment name, not model name (gpt-5.1-codex needs /responses)
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

Kotlin (Ktor HttpClient)

// gpt-5.4 is used here because gpt-5.1-codex does not support /chat/completions
val response = httpClient.post("${foundryEndpoint}openai/deployments/gpt-5.4/chat/completions") {
    parameter("api-version", "2024-10-21")
    header("api-key", foundryApiKey)
    contentType(ContentType.Application.Json)
    setBody("""{"messages":[{"role":"user","content":"Hello"}],"max_tokens":50}""")
}

Access pattern 2: From AKS pods (production path)

Two options. Managed Identity is recommended — no API key rotation needed.

Network path: pod → AKS egress → public internet → Foundry. Latency ~2-5ms network + 1-10s inference.

Option A: Managed Identity (recommended)

The AKS kubelet identity already has the Cognitive Services User role on the Foundry resource (assigned in foundry.tf). Pods authenticate automatically via DefaultAzureCredential.

// build.gradle.kts: implementation("com.azure:azure-identity:1.14.0")
import com.azure.core.credential.TokenRequestContext
import com.azure.identity.DefaultAzureCredentialBuilder

// Request an Entra ID token scoped to Cognitive Services
val credential = DefaultAzureCredentialBuilder().build()
val token = credential.getToken(
    TokenRequestContext().addScopes("https://cognitiveservices.azure.com/.default")
).block()

// Use bearer token instead of api-key header
httpClient.post("${foundryEndpoint}openai/deployments/${deploymentName}/chat/completions") {
    parameter("api-version", "2024-10-21")
    header("Authorization", "Bearer ${token.token}")
    contentType(ContentType.Application.Json)
    setBody(requestBody)
}

Option B: API key via K8s secret

If Managed Identity is not available, inject the API key from Key Vault via External Secrets Operator:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: foundry-credentials
  namespace: aucert-dev
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: azure-keyvault
    kind: ClusterSecretStore
  target:
    name: foundry-credentials
  data:
    - secretKey: FOUNDRY_API_KEY
      remoteRef:
        key: foundry-api-key

Then reference in your deployment:

env:
  - name: FOUNDRY_API_KEY
    valueFrom:
      secretKeyRef:
        name: foundry-credentials
        key: FOUNDRY_API_KEY
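Whether a pod uses Managed Identity or the injected API key, the request differs only in one header. A minimal sketch of that choice follows; the auth_headers helper is hypothetical and assumes the caller has already obtained either a bearer token or the key.

```python
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None,
                 bearer_token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, preferring a Managed Identity bearer token."""
    headers = {"Content-Type": "application/json"}
    if bearer_token:
        # Managed Identity path: token scoped to cognitiveservices.azure.com
        headers["Authorization"] = f"Bearer {bearer_token}"
    elif api_key:
        # API key path: value of the FOUNDRY_API_KEY environment variable
        headers["api-key"] = api_key
    else:
        raise ValueError("either bearer_token or api_key is required")
    return headers
```

Preferring the token when both are present keeps the API key as a fallback only, which matches the recommendation above.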

Access pattern 3: From CI/CD (GitHub Actions)

Store the API key as a GitHub Actions secret FOUNDRY_API_KEY. Use in workflow steps:

- name: Smoke test Foundry models
  env:
    FOUNDRY_API_KEY: ${{ secrets.FOUNDRY_API_KEY }}
    FOUNDRY_ENDPOINT: https://aucert-ai.cognitiveservices.azure.com/
  run: |
    # gpt-5.1-codex is omitted: it only supports /responses, so this check would return 400
    for MODEL in gpt-5.4 gpt-5.3-codex Kimi-K2.6 DeepSeek-V3.2; do
      STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
        "${FOUNDRY_ENDPOINT}openai/deployments/${MODEL}/chat/completions?api-version=2024-10-21" \
        -H "Content-Type: application/json" \
        -H "api-key: ${FOUNDRY_API_KEY}" \
        -d '{"messages":[{"role":"user","content":"ping"}],"max_tokens":5}')
      echo "$MODEL: HTTP $STATUS"
      [ "$STATUS" -eq 200 ] || exit 1
    done

Access pattern 4: Private endpoint (future)

NOT CONFIGURED

Private endpoints are not set up. Implement when an enterprise customer requires traffic to stay within the Azure backbone.

What private endpoints do: Foundry traffic routes through a private IP in your VNet instead of the public internet. This satisfies data residency and network isolation requirements.

Impact when enabled: local laptop access and CI/CD access break unless VPN or self-hosted GitHub runners are configured. Only enable when a customer's security policy requires it.

Conceptual Terraform (reference only — do NOT apply):

# resource "azurerm_private_endpoint" "foundry_pe" {
#   name                = "aucert-foundry-pe"
#   location            = azurerm_resource_group.main.location
#   resource_group_name = azurerm_resource_group.main.name
#   subnet_id           = azurerm_subnet.keyvault.id
#
#   private_service_connection {
#     name                           = "foundry-connection"
#     private_connection_resource_id = azurerm_cognitive_account.foundry.id
#     is_manual_connection           = false
#     subresource_names              = ["account"]
#   }
# }

Access pattern 5: From Astra agents / Daytona sandboxes

Same as AKS pods (pattern 2). Agents running in the cluster use Managed Identity automatically.

Cost consideration: route bulk analysis to DeepSeek-V3.2 ($0.14/1M input tokens). Use gpt-5.4 ($4.00/1M) only for reasoning tasks that require high quality.

API compatibility note

All Foundry endpoints use OpenAI-compatible format:

POST {endpoint}openai/deployments/{deployment-name}/chat/completions?api-version=2024-10-21

The model parameter in API requests is the deployment name. As of 2026-05-03 the deployment names match the underlying Foundry model names exactly (case + dots), so what was previously a dash-vs-dot footgun no longer applies — gpt-5.1-codex is both the model name and the deployment name.

Any library that works with OpenAI (openai, langchain, litellm, etc.) works with Foundry by changing:

  1. Base URL to the Foundry endpoint
  2. API key to the Foundry API key
  3. Model name to the deployment name
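For clients that build the request URL by hand, the format above can be assembled as in the following sketch. The chat_completions_url helper is hypothetical; it assumes the 2024-10-21 API version used throughout this guide and tolerates an endpoint with or without a trailing slash.

```python
from urllib.parse import quote, urlencode


def chat_completions_url(endpoint: str, deployment: str,
                         api_version: str = "2024-10-21") -> str:
    """Build the chat completions URL for a Foundry deployment."""
    base = endpoint.rstrip("/")  # avoid a double slash when joining
    path = f"/openai/deployments/{quote(deployment, safe='')}/chat/completions"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


if __name__ == "__main__":
    print(chat_completions_url(
        "https://aucert-ai.cognitiveservices.azure.com/", "DeepSeek-V3.2"))
```

This applies only to deployments that support /chat/completions; gpt-5.1-codex needs the Responses API instead.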

Troubleshooting

| HTTP code / symptom | Cause | Fix |
|---|---|---|
| 400 "operation unsupported" | Model does not support /chat/completions | Check deployment capabilities: az cognitiveservices account deployment show --name aucert-ai --resource-group aucert-foundation-rg --deployment-name <name> --query properties.capabilities. GPT-5.1-Codex requires the /responses API. |
| 401 | Invalid or missing API key | Re-retrieve the key from Key Vault: az keyvault secret show --vault-name aucertdev-kv-41e0x5 --name foundry-api-key |
| 403 | Managed Identity lacks Cognitive Services User role | Check RBAC: az role assignment list --scope $(terraform output -raw foundry_id) -o table |
| 404 | Wrong deployment name or API version | Verify the deployment exists: az cognitiveservices account deployment list --name aucert-ai --resource-group aucert-foundation-rg -o table |
| 429 | Rate limit / TPM quota exceeded | Increase capacity in foundry.tf and run terraform apply. Default is 10K TPM. |
| 500 | Foundry service error | Retry with exponential backoff. Check Azure status. |
| content: null | Thinking model (Kimi K2.6) | Response is in reasoning_content, not content. Increase max_tokens to at least 100 so the model finishes reasoning and generates output. |
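The "retry with exponential backoff" advice for 500 responses can be sketched as follows. This is an illustrative helper, not part of any SDK: call_with_backoff and its parameters are hypothetical, and send stands for any function that performs one request and returns an (HTTP status, body) pair. Only 500 is retried by default, as in the table; adding 429 to retryable would be a separate assumption.

```python
import random
import time
from typing import Callable, Collection, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    retryable: Collection[int] = (500,),
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(max_attempts - 1):
        if status not in retryable:
            break
        # Delay doubles each attempt (1s, 2s, 4s, ...) plus random jitter
        # so that parallel clients do not retry in lockstep.
        sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
        status, body = send()
    return status, body
```

The sleep parameter is injectable so that tests can record delays instead of actually waiting.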