Semantic Kernel Backend¶
The Semantic Kernel (SK) backend powers HoloDeck agents that run against OpenAI, Azure OpenAI, or local Ollama models. It is automatically selected for any model.provider other than anthropic.
This guide is for backend-specific behaviour: per-provider setup, deployment-name mechanics, local-model nuances, and the full configuration reference. For shared concepts like tools, observability, and vector stores, see the dedicated guides for each.
Quick start¶
OpenAI is the most common entry point. Set an API key and point an agent at it:
# agent.yaml
name: my-agent
model:
provider: openai
name: gpt-4o
temperature: 0.5
max_tokens: 2000
instructions:
inline: "You are a helpful assistant."
# .env
OPENAI_API_KEY=sk-...
Verify:
holodeck chat agent.yaml
For Azure OpenAI or Ollama, see the per-provider sections below.
Advanced configuration¶
The SK backend doesn't expose a separate top-level config block — all backend behaviour is driven by the model settings of the active provider plus the provider's own infrastructure.
Provider: OpenAI¶
OpenAI provides GPT-4o, GPT-4o-mini, and other models through their hosted API.
Prerequisites:
- Create an account at platform.openai.com.
- Generate an API key in the API Keys section.
- Set up billing.
Configuration:
# config.yaml
providers:
openai:
provider: openai
name: gpt-4o
temperature: 0.3
max_tokens: 2000
api_key: ${OPENAI_API_KEY}
# agent.yaml
name: my-agent
model:
provider: openai
name: gpt-4o
temperature: 0.7
max_tokens: 4000
instructions:
inline: "You are a helpful assistant."
Environment variables:
OPENAI_API_KEY=sk-...
Available models:
| Model | Description | Context window |
|---|---|---|
gpt-4o |
Most capable, multimodal | 128K tokens |
gpt-4o-mini |
Fast and cost-effective | 128K tokens |
gpt-4-turbo |
Previous-generation flagship | 128K tokens |
gpt-3.5-turbo |
Fast, lower cost | 16K tokens |
Provider: Azure OpenAI¶
Azure OpenAI Service provides OpenAI models via Microsoft Azure with enterprise features.
Prerequisites:
- Azure subscription with Azure OpenAI access.
- Create an Azure OpenAI resource in the Azure Portal.
- Deploy a model in Azure OpenAI Studio.
- Note the endpoint URL and API key.
Configuration:
Both endpoint and api_key are required:
# config.yaml
providers:
azure_openai:
provider: azure_openai
name: my-gpt4o-deployment # see "Deployment names" below
endpoint: ${AZURE_OPENAI_ENDPOINT}
api_key: ${AZURE_OPENAI_API_KEY}
temperature: 0.3
max_tokens: 2000
# agent.yaml
name: enterprise-agent
model:
provider: azure_openai
name: my-gpt4o-deployment
endpoint: https://my-resource.openai.azure.com/
instructions:
inline: "You are an enterprise assistant."
Environment variables:
AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key-here
Endpoint format:
https://{resource-name}.openai.azure.com/
Find your endpoint in: Azure Portal → Your OpenAI Resource → Keys and Endpoint, or Azure OpenAI Studio → Deployments → Your Deployment.
Deployment names¶
name refers to the deployment, not the base model
In Azure OpenAI, the name field must match your deployment name, not the base model. This is different from OpenAI's API.
When you deploy a model in Azure OpenAI Studio, you create a deployment with a custom name:
- Base model:
gpt-4o,gpt-4o-mini, etc. - Deployment name: your custom identifier (e.g.
my-gpt4o,prod-gpt4).
# If your deployment is "my-gpt4o-production" backed by gpt-4o:
model:
provider: azure_openai
name: my-gpt4o-production
endpoint: https://my-resource.openai.azure.com/
Common mistake:
# WRONG — using the base model name
model:
provider: azure_openai
name: gpt-4o # only works if the deployment is literally named "gpt-4o"
# CORRECT — using your deployment name
model:
provider: azure_openai
name: my-gpt4o-deployment
Provider: Ollama¶
Ollama runs open-source LLMs locally — ideal for privacy-sensitive workloads, offline use, and avoiding API costs.
Prerequisites:
- Install Ollama from ollama.com:
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
- Pull a model:
ollama pull llama3.2
- Verify it's running:
ollama list
Configuration:
Ollama requires an endpoint pointing to your local server:
# config.yaml
providers:
ollama:
provider: ollama
name: llama3.2
endpoint: http://localhost:11434
temperature: 0.7
max_tokens: 2000
# agent.yaml
name: local-agent
model:
provider: ollama
name: llama3.2
endpoint: http://localhost:11434
instructions:
inline: "You are a helpful local assistant."
Environment variables:
OLLAMA_ENDPOINT=http://localhost:11434
Available models (pull with ollama pull <model>):
| Model | Command | Description | Size |
|---|---|---|---|
| GPT-OSS (20B) | ollama pull gpt-oss:20b |
Recommended completion | 40 GB |
| Nomic Embed Text | ollama pull nomic-embed-text:latest |
Recommended embeddings | 274 MB |
| Llama 3.2 | ollama pull llama3.2 |
Meta's latest, general | 2 GB |
| Llama 3.2 (3B) | ollama pull llama3.2:3b |
Larger Llama variant | 5 GB |
| Mistral | ollama pull mistral |
Fast and capable | 4 GB |
| CodeLlama | ollama pull codellama |
Optimised for code | 4 GB |
| Phi-3 | ollama pull phi3 |
Microsoft compact model | 2 GB |
| Gemma 2 | ollama pull gemma2 |
Google's open model | 5 GB |
Running Ollama as a service¶
ollama serve
Or with Docker:
docker run -d \
--name ollama \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
ollama/ollama
Context size¶
For agent workloads, configure a context size of at least 16k tokens. Default Ollama context windows can be small enough to clip tool outputs.
Create a custom model with extended context:
cat <<EOF > Modelfile
FROM gpt-oss:20b
PARAMETER num_ctx 16384
EOF
ollama create gpt-oss:20b-16k -f Modelfile
For 32k:
cat <<EOF > Modelfile
FROM gpt-oss:20b
PARAMETER num_ctx 32768
EOF
ollama create gpt-oss:20b-32k -f Modelfile
Use it:
model:
provider: ollama
name: gpt-oss:20b-16k
endpoint: http://localhost:11434
Memory pressure scales with context
A 32k context with a 20B-parameter model typically requires 48 GB+ RAM or a GPU with 16 GB+ VRAM.
Backend behaviour vs Claude¶
The SK backend is automatically selected when model.provider is anything other than anthropic. There is no top-level claude: block — capabilities like permission modes, extended thinking, web search, and subagents are Claude-only and are documented in the Claude Backend guide.
Configuration reference¶
model.* fields per provider¶
All providers share these fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
provider |
string | yes | - | openai, azure_openai, or ollama |
name |
string | yes | - | Model name (deployment name for Azure) |
temperature |
float | no | 0.3 |
Randomness, 0.0–2.0 |
max_tokens |
integer | no | 1000 |
Maximum response tokens |
top_p |
float | no | - | Nucleus sampling, 0.0–1.0 |
api_key |
string | varies | - | API key (required for OpenAI / Azure) |
endpoint |
string | varies | - | Endpoint URL (required for Azure / Ollama) |
Per-provider requirements:
| Provider | api_key |
endpoint |
Notes |
|---|---|---|---|
openai |
required | - | - |
azure_openai |
required | required | name must match the deployment name, not the base model |
ollama |
- | required | Default endpoint http://localhost:11434 |
Provider-specific environment variables¶
| Variable | Provider | Description |
|---|---|---|
OPENAI_API_KEY |
OpenAI | API authentication key |
AZURE_OPENAI_ENDPOINT |
Azure OpenAI | Resource endpoint URL |
AZURE_OPENAI_API_KEY |
Azure OpenAI | API authentication key |
OLLAMA_ENDPOINT |
Ollama | Server endpoint (default http://localhost:11434) |
Limitations & roadmap¶
The SK backend currently powers holodeck serve and holodeck deploy build; the Claude backend does not yet support those commands.
- Agent Server: SK-only.
- Deployment: SK-only (containerisation).
Both will gain Claude support in a future release.
Troubleshooting¶
Invalid API key¶
Error: AuthenticationError or Invalid API key.
- Verify the key:
echo $OPENAI_API_KEY(or the Azure / Ollama equivalent). - Ensure no extra whitespace.
- Regenerate the API key if needed.
Azure endpoint missing¶
Error: endpoint is required for azure_openai provider.
model:
provider: azure_openai
name: my-deployment
endpoint: https://my-resource.openai.azure.com/
Model / deployment not found¶
- OpenAI: check the model identifier (e.g.
gpt-4o, notgpt4o). - Azure: ensure
namematches the deployment name exactly. - Ollama: pull first with
ollama pull <model>.
Ollama: connection refused¶
- Verify the daemon:
ollama list. - Start the server:
ollama serve. - Confirm the
endpointURL.
Ollama: model not found¶
ollama pull llama3.2
Rate limits¶
Error: Rate limit exceeded (OpenAI / Azure).
- Implement retry with exponential backoff.
- Reduce
max_tokens. - Use a faster / cheaper model.
- Upgrade the API plan.
Temperature out of range¶
model:
temperature: 0.7 # must be 0.0 – 2.0
Next steps¶
- Agent Configuration — full agent.yaml structure
- Tools — extending agent capabilities
- Observability — tracing and metrics
- Vector Stores — semantic search configuration
- Agent Server and Deployment — running SK agents as services