OpenAI Backend¶
The OpenAI backend runs your agent on top of the OpenAI Agents SDK — OpenAI's native agent runtime. It is automatically selected when model.provider is openai or azure_openai.
This guide is for backend-specific behaviour: the openai: configuration block, reasoning effort, budget caps, model fallback, structured output, and tracing. For shared concepts like tools, observability, and vector stores, see the dedicated guides for each. It is the sibling of the Claude Backend guide.
Quick start¶
A minimal azure_openai agent with three function tools — deterministic enough to verify with holodeck test.
# agent.yaml
name: warehouse-agent
model:
provider: azure_openai
name: gpt-5.4 # MUST match your Azure deployment name
endpoint: ${AZURE_OPENAI_ENDPOINT}
api_key: ${AZURE_OPENAI_API_KEY}
temperature: 0.0
instructions:
inline: "You are a warehouse assistant. Use the tools to answer stock questions."
tools:
- name: get_inventory
type: function
description: Look up catalog stock and unit price for a SKU.
file: tools/warehouse.py
function: get_inventory
test_cases:
- name: "Single-tool lookup"
input: "Is SKU WIDGET-1 in stock, and what does one cost?"
expected_tools: [get_inventory]
ground_truth: "WIDGET-1 is in stock (120 units) at $12.50 each."
# .env
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-azure-key
holodeck test run agent.yaml -n 1
# => PASS Single-tool lookup (tool calls: get_inventory)
For plain OpenAI instead, set model.provider: openai, name: gpt-4o-mini (drop endpoint), and supply OPENAI_API_KEY.
How it works¶
BackendSelector routes provider: openai and provider: azure_openai to the OpenAI Agents backend; Anthropic and Ollama route to the Claude backend. The backend runs the SDK Runner loop in-process (no per-turn subprocess), driving up to openai.max_turns agent iterations per call. The openai / agents SDK is imported lazily — only when an OpenAI-provider agent is actually selected — so non-OpenAI agents never pay its import cost. HoloDeck tools (function, vectorstore, hierarchical_document, MCP) are adapted onto the SDK's tool surface; structured output, budget, fallback, and tracing are layered on via the SDK's output-schema, RunHooks, model-wrapping, and TracingProcessor extension points.
The openai: block¶
All OpenAI-specific settings live under the top-level openai: block in agent.yaml. Every field is optional.
openai:
max_turns: 20
effort: high
| Field | Type | Default | Constraint | Meaning |
|---|---|---|---|---|
max_turns |
int | 20 |
≥ 1 | Maximum agent loop iterations passed to Runner.run. Shipped. |
effort |
enum | – | low|medium|high|max |
Reasoning effort for reasoning models (see Reasoning effort). Shipped. |
max_budget_usd |
float | – | > 0 | Hard cap on session spend in USD (see Budget). Shipped. |
fallback_model |
str | – | – | Model used on a retryable primary failure (see Fallback). Shipped. |
disallowed_tools |
list[str] | – | – | Tools removed from the resolved agent at build time. Shipped. |
session_memory_estimate_mib |
int | 100 |
50–2000 | Estimated peak resident memory per active turn; used by serve to derive the concurrency cap. The backend runs in-process (no per-turn subprocess), so this is lower than the Claude default. Config accepted; serve auto-sizing lands in a later release. |
max_concurrent_sessions |
int | – (derived) | 1–500 | Hard cap on concurrent active turns per serve instance. When unset, derived from the replica's memory limit ÷ session_memory_estimate_mib. Config accepted; serve enforcement lands in a later release. |
permissions.allowed_tools |
list[str] | null (all) |
– | Explicit tool allowlist. Config accepted; full enforcement nuance lands in a later release — prefer disallowed_tools today. |
permissions.disallowed_tools |
list[str] | – | – | Deny list; takes precedence over allowed_tools. Config accepted; prefer the top-level disallowed_tools today. |
i_understand_this_is_unsafe |
bool | false |
– | Acknowledges that hosted tools (e.g. code interpreter) permit server-side code execution. Config accepted; hosted tools are not yet shipped — see Coming soon. |
disable_default_hooks |
bool | false |
– | Disables HoloDeck's default credential-redaction output guardrail. Config accepted; default redaction guardrails are not yet shipped, so this is currently a no-op. OTel attribute redaction runs independently and is unaffected. |
disable_subprocess_env_scrub |
bool | false |
– | Disables env scrubbing for stdio MCP servers / shelling-out function tools. Config accepted; full enforcement lands in a later release. |
Honest status labels
Fields marked config accepted validate and load today but their runtime feature ships later. They are documented so you can author forward-compatible configs, not because the behaviour is live. The four Shipped runtime features (effort, max_budget_usd, fallback_model, disallowed_tools) plus max_turns are fully enforced now.
Reasoning effort¶
effort requests deeper internal reasoning from reasoning models (o-series and gpt-5+). It only affects reasoning models; on non-reasoning models it has no effect.
openai:
effort: high
The level maps onto the SDK's ReasoningEffort literal:
effort |
SDK ReasoningEffort |
|---|---|
low |
low |
medium |
medium |
high |
high |
max |
xhigh |
max → xhigh is an intentional deviation
HoloDeck exposes max as the strongest level; it maps to the SDK's xhigh rather than a literal max. This is deliberate so the HoloDeck vocabulary stays stable as the SDK's effort ceiling evolves.
Setting effort also requests reasoning summaries (summary="auto"), which populate the thinking field on the result. With effort unset, no summary is requested.
Budget¶
max_budget_usd caps total spend across a session. A RunHooks cost accountant prices every LLM response from the SDK's per-response token usage against a bundled, versioned per-model price table and accumulates the running total.
openai:
max_budget_usd: 0.50
- When the accumulated cost reaches the cap, the hook aborts the turn (
BackendBudgetExceededError), and the backend surfaces it on the standard error path (is_error/error_reason) with the partial response preserved — whatever assistant text the model produced before the cap tripped. - The budget is per session: one accountant is shared across every turn of a session, so the cap covers the whole conversation.
- An unknown model (e.g. an opaque Azure deployment name that doesn't resolve to a base model) logs a single warning and contributes no cost — enforcement is silently disabled for that model rather than crashing the run.
Fallback¶
fallback_model wraps the primary model so that a retryable upstream failure is re-issued once against the fallback.
openai:
fallback_model: gpt-4o-mini
- Retryable set (fixed): HTTP 429 (rate limit) and 5xx (server-side). Everything else — 400/401/403/404/422, connection/timeout errors, non-OpenAI exceptions — propagates unchanged; the fallback is never consulted.
- Ordering: if you enable the SDK runner's own retries, the primary retries exhaust first, then exactly one fallback attempt. No double-fallback, no fallback mid-retry — the wrapper itself never retries the fallback.
- Streaming: the wrapper falls back only if the primary stream fails before its first event. Once any event has been emitted, a later failure propagates unchanged (restarting on the fallback would replay already-delivered deltas).
- Tracing: both the primary and the fallback attempt open their own generation span, so both are visible in the trace.
For Azure, build the fallback as another deployment on the same endpoint/credentials.
Structured output¶
response_format constrains every final response to a JSON schema. It accepts a JSON-schema dict inline, a string path to a JSON file, or None. The resolved schema drives the SDK's structured-output path, and the parsed object is populated on the ExecutionResult.structured_output field — downstream graders json.loads the response.
response_format:
type: object
additionalProperties: false
required: [answer]
properties:
answer:
anyOf:
- type: number
- type: string
Portability: use anyOf, never oneOf
OpenAI structured outputs reject oneOf. Use anyOf instead (we hit this live against Azure OpenAI). The same anyOf schema also works on the Claude backend, so authoring with anyOf keeps your schema portable across both backends.
Strict mode is auto-enabled only when the schema already qualifies: a top-level type: object with additionalProperties: false and every declared property listed in required. HoloDeck never rewrites your schema to force strictness — if it doesn't qualify, the schema is still enforced via jsonschema validation, just without provider strict-mode guarantees.
Thinking¶
The thinking field is populated from a reasoning model's summaries, which are only requested when openai.effort is set (it implies summary="auto"). Without effort, or on a non-reasoning model, thinking is empty.
Tools¶
| Tool type | Behaviour on this backend |
|---|---|
function |
Python callables (sync or async) wrapped as SDK function tools. |
vectorstore |
Wraps the tool's .search(); surfaced to the model as {name}_search. Requires an embedding_provider. |
hierarchical_document |
Same wrapping pattern; surfaced as {name}_search. Requires an embedding_provider. |
The {name}_search naming keeps disallowed_tools portable across backends. For RAG configuration depth (chunking, hybrid search, databases), see Tools and Vector Stores rather than duplicating it here.
MCP¶
MCP tools map onto the SDK's MCP server classes by transport:
transport |
SDK server |
|---|---|
stdio |
MCPServerStdio |
sse |
MCPServerSse |
http |
MCPServerStreamableHttp |
websocket |
Skipped with a warning — the SDK has no WebSocket transport. The load does not fail. |
A per-server allowed_tools list becomes a static SDK tool filter (only the listed MCP tools are exposed). See MCP CLI for transport configuration.
Tracing¶
The SDK runs its own tracing pipeline. HoloDeck installs an OTel-mirroring TracingProcessor that reconstructs each finished SDK span as an OTel span on HoloDeck's global tracer (carrying the redacting span processor and your configured exporters). The mirror is installed only when observability.enabled and observability.traces.enabled are both true.
| Configuration | platform.openai.com upload | OTel mirror |
|---|---|---|
provider: openai |
retained (default exporter kept) | ✓ |
provider: azure_openai |
none (mirror only) | ✓ |
observability.disable_provider_tracing: true (either provider) |
none (mirror only) | ✓ |
Sensitive data is not uploaded by default
The SDK's trace_include_sensitive_data is bound to observability.traces.capture_content (default false). With the default, tool input/output is not included in uploaded spans. Set capture_content: true only when the data is safe to capture. See Observability.
Per-backend semantics¶
Shipped surface only, OpenAI vs Claude:
| Capability | OpenAI | Claude |
|---|---|---|
| Function tools | ✓ | ✓ |
| RAG (vectorstore / hierarchical_document) | ✓ | ✓ |
| MCP stdio / sse / http | ✓ | ✓ |
| Structured output | ✓ (use anyOf, not oneOf) |
✓ |
Reasoning / thinking |
✓ | ✓ |
effort / max_budget_usd / fallback_model |
✓ (via openai:) |
— |
holodeck chat / holodeck test |
✓ | ✓ |
holodeck serve / holodeck deploy |
✗ (roadmap) | ✓ |
holodeck serve and holodeck deploy are fully supported on the Claude
backend today. On the OpenAI backend they are on the roadmap —
use holodeck chat and holodeck test for now.
Coming soon¶
The following are not yet available on this backend — they are roadmap, not shipped:
- Subagents / handoffs — a multi-agent
openai.agentsblock. - YAML hooks — user-defined
openai.hooks. - Hosted tools — web search, code interpreter, file search, image generation, hosted MCP (this is what
i_understand_this_is_unsafegates). holodeck serve&holodeck deploy— running this backend as a REST/AG-UI server or deploying it to a container platform is not yet wired. (Both are fully supported on the Claude backend.) Themax_concurrent_sessions/session_memory_estimate_mibknobs are accepted in config ahead of that work but are not yet enforced.- Default credential-redaction guardrails — the output guardrail that
disable_default_hookswould turn off.
Troubleshooting¶
Missing Azure credentials¶
Error: AZURE_OPENAI_API_KEY is required for provider 'azure_openai' or AZURE_OPENAI_ENDPOINT ...
- Set
AZURE_OPENAI_API_KEYandAZURE_OPENAI_ENDPOINT(env ormodel.api_key/model.endpoint). - Confirm
model.namematches your Azure deployment name, not a base model id. - For plain OpenAI, set
OPENAI_API_KEYand useprovider: openai(no endpoint).
oneOf schema rejected¶
Symptom: the provider rejects your response_format schema.
Fix: replace oneOf with anyOf — OpenAI structured outputs do not accept oneOf. The anyOf form also works on the Claude backend.
Reasoning-model sampling params ignored or erroring¶
Reasoning models (o-series, gpt-5+) reject temperature / top_p and use max_output_tokens rather than max_tokens. The backend detects reasoning models by name prefix; if your Azure deployment name is opaque (doesn't embed the base model), set sane sampling params explicitly and don't rely on auto-detection.
MCP WebSocket tool skipped¶
Symptom: a transport: websocket MCP tool logs Skipping MCP tool '...': websocket transport is not supported.
This is expected — the SDK has no WebSocket transport, so the tool is skipped (the load does not fail). Use stdio, sse, or http.
Next steps¶
- Agent Configuration — full agent.yaml structure
- Tools — extending agent capabilities
- Vector Stores — semantic search configuration
- Observability — tracing and metrics
- Claude Backend — the sibling backend