Claude Backend¶
The Claude backend runs your agent on top of the Claude Agent SDK — Anthropic's first-class agent runtime. It is automatically selected when model.provider is anthropic or ollama (local models).
Quick start¶
Prerequisites: Node.js 18+ (the SDK spawns a claude CLI subprocess — verify with node --version) and an Anthropic credential (a Claude Code OAuth token is simplest).
# agent.yaml
name: my-claude-agent
model:
provider: anthropic
name: claude-sonnet-4-20250514
auth_provider: oauth_token
instructions:
inline: "You are a helpful assistant."
# .env
CLAUDE_CODE_OAUTH_TOKEN=${CLAUDE_CODE_OAUTH_TOKEN}
holodeck chat agent.yaml
# → starts an interactive chat session backed by Claude
How it works¶
BackendSelector routes provider: anthropic (and ollama) to the Claude backend, which drives the Claude Agent SDK. Every Claude-specific capability lives under the top-level claude: block and defaults to disabled (least-privilege). The backend fully supports holodeck chat, test, serve, and deploy. Using OpenAI or Azure models instead? See the OpenAI Backend guide. For shared concepts, see the Tools, Observability, and Vector Stores guides.
Advanced configuration¶
All Claude-specific settings live under the top-level claude: block in agent.yaml. Every capability defaults to disabled (least-privilege).
Authentication providers¶
The auth_provider field on model selects how HoloDeck authenticates with Anthropic. Defaults to api_key when omitted.
| Method | Env variables | Use case |
|---|---|---|
api_key |
ANTHROPIC_API_KEY |
Direct API access (default) |
oauth_token |
CLAUDE_CODE_OAUTH_TOKEN |
Claude Code OAuth (recommended) |
bedrock |
AWS_REGION (or AWS_DEFAULT_REGION) plus AWS credentials |
AWS Bedrock |
vertex |
CLOUD_ML_REGION plus ANTHROPIC_VERTEX_PROJECT_ID and Google credentials |
Google Vertex AI |
foundry |
ANTHROPIC_FOUNDRY_RESOURCE or ANTHROPIC_FOUNDRY_BASE_URL |
Azure AI Foundry |
API key (default):
model:
provider: anthropic
name: claude-sonnet-4-20250514
auth_provider: api_key
export ANTHROPIC_API_KEY="sk-ant-..."
OAuth token (recommended for Claude Code users):
model:
provider: anthropic
name: claude-sonnet-4-20250514
auth_provider: oauth_token
export CLAUDE_CODE_OAUTH_TOKEN="your-oauth-token"
AWS Bedrock:
model:
provider: anthropic
name: claude-sonnet-4-20250514
auth_provider: bedrock
export AWS_REGION=us-east-1
# plus AWS credentials/profile (env vars, ~/.aws/credentials, or IAM role)
Google Vertex AI:
model:
provider: anthropic
name: claude-sonnet-4-20250514
auth_provider: vertex
export CLOUD_ML_REGION=us-east5
export ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
# plus Google credentials (GOOGLE_APPLICATION_CREDENTIALS or ADC)
Azure AI Foundry:
model:
provider: anthropic
name: claude-sonnet-4-20250514
auth_provider: foundry
export ANTHROPIC_FOUNDRY_RESOURCE=your-foundry-resource
# or:
export ANTHROPIC_FOUNDRY_BASE_URL=https://your-resource.services.ai.azure.com
Authentication is provided either via ANTHROPIC_API_KEY or the Azure credential chain.
Cloud routing is env-driven
For bedrock, vertex, and foundry, HoloDeck sets the Claude Code subprocess flag (CLAUDE_CODE_USE_BEDROCK=1, CLAUDE_CODE_USE_VERTEX=1, CLAUDE_CODE_USE_FOUNDRY=1) based on auth_provider. model.endpoint is not consulted for these modes. Missing routing variables fail fast at startup with a configuration error.
Permission modes¶
Controls how autonomously the agent can act:
| Value | Behaviour |
|---|---|
manual |
Manual approval for every action (default, safest) |
acceptEdits |
Auto-approve file edits; manual approval for other tool calls |
acceptAll |
Auto-approve all tool calls and actions |
claude:
permission_mode: acceptEdits
Extended thinking¶
Enable deep, internal reasoning before the agent responds:
claude:
extended_thinking:
enabled: true
budget_tokens: 10000 # 1,000 - 100,000
budget_tokens caps how many tokens the agent can spend on hidden reasoning per turn.
Web search¶
Built-in web search capability backed by the SDK:
claude:
web_search: true
Bash & file system scoping¶
Both are off by default. Enable explicitly to grant access:
claude:
bash:
enabled: true
excluded_commands: ["rm -rf", "shutdown"]
allow_unsafe: false # dangerous commands require explicit opt-in
file_system:
read: true
write: true
edit: true
working_directory further scopes file access to a specific path (passed as the subprocess cwd):
claude:
working_directory: ./workspace
Subagents¶
Define named subagents (a multi-agent team) under claude.agents. Each entry becomes an AgentDefinition registered with the SDK and is invocable by the parent via the Task tool.
claude:
agents:
researcher:
description: Gathers raw information from the web and primary sources.
prompt: |
You are a meticulous researcher. Search the web for primary sources.
Cite every claim with a URL. Do not analyze — just gather facts.
tools: [WebSearch, WebFetch]
model: haiku
analyst:
description: Analyses data, identifies patterns, surfaces key insights.
prompt: |
You are a data analyst. Study the researcher's findings, identify
patterns and outliers, and summarise the key insights clearly.
tools: [Read]
model: sonnet
writer:
description: Produces a polished final report from analyst insights.
prompt: |
You are an expert technical writer. Use headers, bullets, and inline
citations. Do not conduct research yourself.
# No tools list → inherits all parent tools.
# No model → inherits parent model.
Per-subagent fields:
| Field | Type | Required | Description |
|---|---|---|---|
description |
string | yes | Human-readable description used by the parent for routing |
prompt |
string | one of | Inline system prompt for the subagent |
prompt_file |
string | one of | Path to a file containing the prompt (mutually exclusive with prompt) |
tools |
list of strings | no | Allowlist of tool names; omit to inherit all parent tools |
model |
sonnet | opus | haiku | inherit |
no | Model alias for this subagent; omit to inherit the parent's model |
Subagent model must be a SDK alias
Only sonnet, opus, haiku, or inherit are valid. Full model IDs like claude-sonnet-4-20250514 are silently rejected by the Claude CLI — invalid values fail at config-load time with a clear error. Use inherit to track the parent's pinned full ID for version stability.
Working directory & max turns¶
claude:
working_directory: ./workspace # subprocess cwd; restricts file access
max_turns: 10 # cap on agent loop iterations
Allowed tools¶
When set, only the listed tools are visible to the agent (in addition to any defined under the agent's top-level tools:):
claude:
allowed_tools:
- knowledge-base
- web-search
Settings isolation (plugins, skills, hooks)¶
The Claude Agent SDK spawns the claude CLI as a subprocess. By default the CLI loads everything it finds under ~/.claude/, .claude/, and .claude/settings.local.json — plugins, skills, hooks, MCP servers, etc. That means an agent deployed from your laptop can silently inherit your personal Claude Code config, and the same agent deployed in CI/cloud gets a different setup.
HoloDeck defaults to full isolation: when claude.setting_sources is unset, the backend passes setting_sources=[] to the SDK so the subprocess loads no settings layers. Your agent's behavior is determined solely by its agent.yaml.
Opt back in only when you need it:
claude:
setting_sources:
- project # loads .claude/settings.json (and CLAUDE.md)
Accepted values:
| Value | Loads |
|---|---|
user |
~/.claude/settings.json (global) |
project |
.claude/settings.json and CLAUDE.md |
local |
.claude/settings.local.json (gitignored) |
all |
sugar for [user, project, local] — cannot be combined with other entries |
Notes:
- Include
projectif your agent depends onCLAUDE.md. - The default is
null→ HoloDeck sends[]. Setting an empty list explicitly (setting_sources: []) has the same effect. allis rejected if mixed with other values; use either[all]or any subset of the three concrete layers.
Embedding provider requirement¶
Anthropic does not provide embedding models. When you combine model.provider: anthropic with vectorstore or hierarchical_document tools, you must define an embedding_provider at the agent level:
embedding_provider:
provider: ollama
name: nomic-embed-text:latest
endpoint: http://localhost:11434
# or:
embedding_provider:
provider: openai
name: text-embedding-3-small
Full annotated example¶
# agent.yaml — full Claude-native agent
name: research-assistant
description: Research assistant powered by Claude
model:
provider: anthropic
name: claude-sonnet-4-20250514
auth_provider: oauth_token
temperature: 0.3
max_tokens: 8000
embedding_provider:
provider: ollama
name: nomic-embed-text:latest
endpoint: http://localhost:11434
instructions:
inline: |
You are a research assistant.
Provide thorough, well-sourced answers.
claude:
permission_mode: acceptEdits
working_directory: ./workspace
max_turns: 15
extended_thinking:
enabled: true
budget_tokens: 20000
web_search: true
bash:
enabled: true
excluded_commands: ["rm -rf", "shutdown"]
allow_unsafe: false
file_system:
read: true
write: true
edit: true
agents:
researcher:
description: Gathers raw information from the web.
prompt: "Search the web for primary sources. Cite every claim."
tools: [WebSearch, WebFetch]
model: haiku
allowed_tools:
- knowledge-base
- web-search
tools:
- name: knowledge-base
type: vectorstore
description: Search the research knowledge base
source: ./data/research/
Production considerations: memory and concurrency¶
The Claude backend runs each turn through a fresh Claude CLI Node.js subprocess spawned by claude-agent-sdk (see Anthropic's hosting guide). This is the dominant memory cost in production and the binding constraint on concurrency. If you're deploying with holodeck serve to a container with a fixed memory limit (Azure Container Apps, Cloud Run, ECS Fargate, Kubernetes), the defaults below have been calibrated against real OOM data — but the right number for your agent depends on the tools it carries.
The model in one diagram¶
┌──────────────────────────────────────────────────────────────────┐
│ Parent Python process (HoloDeck serve) │
│ ┌─────────────────────┐ in-process MCP server │
│ │ AG-UI / REST API │◄──── tool calls (vectorstore, │
│ │ + session store │ hierarchical_document, function) │
│ └─────────────────────┘ │
│ │ spawn (per turn) │
│ ▼ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Node CLI subprocess │ │ Node CLI subprocess │ …N turns │
│ │ (~300 MiB) │ │ (~300 MiB) │ │
│ └─────────────────────┘ └─────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Two things to notice:
- Tools run in the parent.
create_sdk_mcp_serverhosts HoloDeck tools in the Python process and routes calls back to the subprocess via IPC. A heavyhierarchical_documenttool with a qdrant client and a chunked corpus inflates the parent baseline, not the subprocess. - Subprocesses are per-turn, not per-session. Spec 034 P4 ("Hybrid Sessions") tears the subprocess down at turn end and persists transcripts as JSONL on disk. Idle sessions cost ~30 MiB each (Python object + JSONL handle); concurrent active turns cost the full subprocess.
This is why the concurrency cap is keyed on active turns, not open sessions.
Memory budget defaults¶
| Component | Default | Configurable via |
|---|---|---|
| Parent baseline (server + tools + OTEL) | 400 MiB | (library constant) |
| Per concurrent active turn | 500 MiB | claude.session_memory_estimate_mib |
| Open session ceiling (idle slots) | 1000 | (library constant) |
The active-turn cap is derived from the replica's cgroup memory limit:
max_concurrent_turns = (memory_limit - baseline) / per_turn_estimate
The 500 MiB default covers the ~300 MiB Node CLI steady state plus the simultaneous-startup spike (when N turns hit serve at once, V8 initial heap allocation peaks together) plus parent-side transient work during the turn (hybrid search, rerank, context generation). It was calibrated against the spec 034 P4 cloud validation where a 2 GiB Azure Container Apps replica OOM-killed at 4 concurrent turns and ran cleanly at 3 — (2048-400)/500 = 3 matches that.
Sizing your replica¶
| Replica memory | Derived turn cap | Notes |
|---|---|---|
| 1 GiB | 1 | Bare minimum; one turn at a time |
| 2 GiB | 3 | Default sample (financial-assistant) target |
| 4 GiB | 7 | Comfortable production tier |
| 8 GiB | 15 | High-concurrency tier |
If you need higher concurrency, scale horizontally (more replicas) rather than vertically — the SDK subprocess model means each turn is independent, so adding replicas linearly increases capacity.
Tuning session_memory_estimate_mib¶
The default targets the median tool-heavy agent. Adjust it when:
| Situation | Direction | Suggested override |
|---|---|---|
| Thin agent — no vectorstore, no hierarchical_document, no MCP | Lower | 300 (yields cap=5 on 2 GiB) |
| Heavy retrieval — large hybrid index, in-process rerank | Default | 500 |
| Embedded models — local rerank model, ONNX, sentence-transformers | Higher | 700–900 |
| Multiple MCP servers with persistent state | Higher | 600–700 |
claude:
session_memory_estimate_mib: 700 # tune for your agent's tool footprint
Excess-capacity behaviour¶
When concurrent active turns exceed the cap, the serve layer immediately returns HTTP 429 with a Retry-After body. It does not queue requests — Anthropic's hosting guidance recommends shedding load over buffering, because subprocess startup is expensive and a queued request that times out at the upstream is worse than a fast 429 that the client can retry with backoff.
Observability¶
When OpenTelemetry is enabled (observability.enabled: true), the serve layer emits:
holodeck.serve.active_turns(gauge) — current in-flight countholodeck.serve.turn_rejections_total(counter) — 429s due to the capholodeck.serve.session_count(gauge) — open sessions (idle + active)
Alert on turn_rejections_total rate > 0 sustained for more than a minute — it indicates either undersized replicas or a need to scale horizontally.
Azure WorkingSetBytes is 1-minute averaged
Cloud-provider memory metrics typically report 1-minute averages, which mask the instantaneous peak during simultaneous turn startup. If you're sizing from Azure metrics, expect the real peak to be 1.5–2× the average. The 500 MiB default already accounts for this; if you're tuning from your own observed averages, multiply by ~1.7 before dividing into headroom.
Anthropic's own guidance¶
Anthropic's SDK hosting documentation recommends 1 GiB RAM per agent instance as a conservative starting point — that includes the parent process. HoloDeck's 400 MiB parent baseline + 500 MiB per turn lands right on that number for cap=1 and scales linearly from there. There is no "shared subprocess pool" pattern documented in the SDK; each turn gets its own subprocess by design.
Configuration reference¶
model.* fields¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
provider |
string | yes | - | Must be anthropic to select the Claude backend |
name |
string | yes | - | Anthropic model identifier (e.g. claude-sonnet-4-20250514) |
auth_provider |
enum | no | api_key |
One of api_key, oauth_token, bedrock, vertex, foundry |
temperature |
float | no | 0.3 |
Randomness, 0.0–2.0 |
max_tokens |
integer | no | 1000 |
Maximum response tokens |
top_p |
float | no | - | Nucleus sampling, 0.0–1.0 |
api_key |
string | no | - | API key (when auth_provider: api_key); prefer ${ANTHROPIC_API_KEY} |
claude.* fields¶
| Field | Type | Default | Description |
|---|---|---|---|
permission_mode |
enum | manual |
manual | acceptEdits | acceptAll |
working_directory |
string | - | Restrict file access; subprocess cwd |
max_turns |
integer (≥1) | SDK default | Maximum agent loop iterations |
extended_thinking |
object | - | { enabled: bool, budget_tokens: int } |
web_search |
boolean | false |
Enable built-in web search |
bash |
object | - | { enabled: bool, excluded_commands: [str], allow_unsafe: bool } |
file_system |
object | - | { read: bool, write: bool, edit: bool } |
agents |
map |
- | Named subagents (see Subagents above) |
allowed_tools |
list of strings | all tools | Explicit tool allowlist |
setting_sources |
list of user|project|local|all |
null → [] (isolation) |
Which Claude Code settings layers the spawned SDK subprocess loads. See Settings isolation below. |
session_memory_estimate_mib |
int (50–2000) | 500 |
Per-active-turn memory budget; drives the concurrency cap (see Production considerations) |
max_concurrent_sessions |
int (1–500) | derived | Hard cap on concurrent active turns; auto-derived from cgroup memory when omitted |
Available models¶
| Model | Description | Context window |
|---|---|---|
claude-sonnet-4-20250514 |
Best balance of speed and capability | 200K tokens |
claude-opus-4-20250514 |
Most capable, best for complex tasks | 200K tokens |
claude-3-5-sonnet-20241022 |
Previous-generation Sonnet | 200K tokens |
claude-3-5-haiku-20241022 |
Fast and cost-effective | 200K tokens |
Check Anthropic's model documentation for the latest list.
Anthropic environment variables¶
| Variable | Auth provider | Description |
|---|---|---|
ANTHROPIC_API_KEY |
api_key |
API authentication key |
CLAUDE_CODE_OAUTH_TOKEN |
oauth_token |
Claude Code OAuth token (recommended) |
AWS_REGION |
bedrock |
AWS Bedrock region |
AWS_DEFAULT_REGION |
bedrock |
Alternate AWS region variable |
CLOUD_ML_REGION |
vertex |
Vertex region |
ANTHROPIC_VERTEX_PROJECT_ID |
vertex |
Vertex project ID |
GCLOUD_PROJECT |
vertex |
Alternate Vertex project context |
GOOGLE_CLOUD_PROJECT |
vertex |
Alternate Vertex project context |
GOOGLE_APPLICATION_CREDENTIALS |
vertex |
Service-account credential file path |
ANTHROPIC_FOUNDRY_RESOURCE |
foundry |
Foundry resource name |
ANTHROPIC_FOUNDRY_BASE_URL |
foundry |
Alternate Foundry base URL |
Serve and deploy¶
The Claude backend fully supports holodeck serve and holodeck deploy — there are no backend-specific restrictions. See Agent Server and Deployment for the workflow, and Production considerations above for sizing the per-turn subprocess memory under serve.
Troubleshooting¶
Cloud auth context missing¶
Error: AWS_REGION environment variable is not set for auth_provider: bedrock
or: CLOUD_ML_REGION environment variable is not set for auth_provider: vertex
or: Missing Foundry target for auth_provider: foundry
- Confirm
model.provider: anthropicand your selectedauth_provider. - Set the required routing variables (see the Authentication providers table above).
model.endpointis ignored for cloud auth modes — drop it.
Subagent not registering¶
Symptom: The parent falls back to the general-purpose agent and your custom subagents (researcher, analyst, …) never appear.
Cause: The Claude CLI silently drops AgentDefinitions whose model is outside sonnet | opus | haiku | inherit. Full model IDs are not accepted in the subagent slot.
Fix: Use the SDK aliases in claude.agents.<name>.model, or omit model to inherit the parent's pinned model.
Embedding provider missing for Anthropic + vectorstore¶
Error: HoloDeck refuses to start when provider: anthropic is combined with vectorstore/hierarchical_document tools but no embedding_provider is configured.
Fix: Add an embedding_provider block (Ollama or OpenAI; see above).
Invalid API key¶
Error: AuthenticationError or Invalid API key.
- Verify the key:
echo $ANTHROPIC_API_KEY. - Ensure no extra whitespace.
- For OAuth, regenerate via the Claude Code CLI.
Next steps¶
- Agent Configuration — full agent.yaml structure
- Tools — extending agent capabilities
- Observability — tracing and metrics
- Vector Stores — semantic search configuration