Agent Server Guide¶
This guide explains how to deploy HoloDeck agents as HTTP servers using the holodeck serve command.
Overview¶
holodeck serve exposes your configured agent as an HTTP server with two protocol options:
- AG-UI Protocol (default) - Standard protocol for AI agent frontends like CopilotKit
- REST API Protocol - Traditional REST endpoints with JSON/SSE responses
Both protocols support streaming responses, session management, multimodal file uploads, and health checks.
Quick Start¶
# Start server with AG-UI protocol (default)
holodeck serve agent.yaml
# Start server with REST API
holodeck serve agent.yaml --protocol rest
# Custom port and host
holodeck serve agent.yaml --port 3000 --host 0.0.0.0
The server displays startup information:
============================================================
HoloDeck Agent Server
============================================================
Agent: research
Protocol: ag-ui
URL: http://127.0.0.1:8000
Endpoints:
POST /awp AG-UI Protocol
GET /health Health Check
GET /ready Readiness Check
Press Ctrl+C to stop
============================================================
CLI Reference¶
holodeck serve <agent_config> [OPTIONS]
Arguments¶
| Argument | Description | Default |
|---|---|---|
agent_config |
Path to agent.yaml configuration file | agent.yaml |
Options¶
| Option | Description | Default |
|---|---|---|
--port, -p |
Port to listen on | 8000 |
--host, -h |
Host to bind to | 127.0.0.1 |
--protocol |
Protocol type: ag-ui or rest |
ag-ui |
--cors-origins |
Comma-separated CORS origins | http://localhost:3000 |
--verbose, -v |
Enable verbose debug logging | false |
--quiet, -q |
Suppress INFO logging output | false |
Examples¶
# Development with verbose logging
holodeck serve agent.yaml --verbose
# Production with all interfaces
holodeck serve agent.yaml --host 0.0.0.0 --port 8080
# REST API with custom CORS
holodeck serve agent.yaml --protocol rest --cors-origins "http://localhost:3000,https://myapp.com"
AG-UI Protocol¶
AG-UI is the default protocol, designed for integration with AI agent frontends like CopilotKit, Vercel AI SDK, and similar frameworks.
Endpoint¶
POST /awp
Accepts RunAgentInput from the AG-UI specification and streams protocol events back to the client.
Architecture¶
┌─────────────────────────┐
│ Web Frontend │
│ (CopilotKit, etc.) │
└────────────┬────────────┘
│ AG-UI Protocol
▼
┌─────────────────────────┐
│ HoloDeck Server │
│ POST /awp │
└────────────┬────────────┘
│ LLM + Tools
▼
┌─────────────────────────┐
│ Agent Execution │
│ (Semantic Kernel) │
└─────────────────────────┘
Thread/Session Mapping¶
AG-UI thread_id maps directly to HoloDeck session_id for conversation continuity.
REST API Protocol¶
When running with --protocol rest, traditional REST endpoints are exposed.
Chat Endpoints¶
Synchronous Chat¶
POST /agent/{agent_name}/chat
Content-Type: application/json
{
"message": "Hello, how can you help me?",
"session_id": "01HQXYZ..." // Optional, auto-generated if omitted
}
Response:
{
"message_id": "01HQABC...",
"content": "I'm a helpful assistant...",
"session_id": "01HQXYZ...",
"tool_calls": [],
"tokens_used": {
"prompt_tokens": 150,
"completion_tokens": 75,
"total_tokens": 225
},
"execution_time_ms": 1250
}
Streaming Chat (SSE)¶
POST /agent/{agent_name}/chat/stream
Content-Type: application/json
{
"message": "Explain quantum computing",
"session_id": "01HQXYZ..."
}
Response: Server-Sent Events stream
event: stream_start
data: {"session_id": "01HQXYZ...", "message_id": "01HQABC..."}
event: message_delta
data: {"delta": "Quantum computing is", "message_id": "01HQABC..."}
event: message_delta
data: {"delta": " a type of computation...", "message_id": "01HQABC..."}
event: stream_end
data: {"message_id": "01HQABC...", "tokens_used": {...}, "execution_time_ms": 2500}
Multipart File Upload¶
POST /agent/{agent_name}/chat/multipart
Content-Type: multipart/form-data
message: "What's in this image?"
files: <binary file data>
session_id: "01HQXYZ..." // Optional
Supports up to 10 files per request with the following limits:
- Max 50MB per file
- Max 100MB total per request
Supported file types:
- Images: PNG, JPEG, GIF, WebP
- Documents: PDF
- Office: DOCX, XLSX, PPTX
- Text: TXT, CSV, Markdown
Streaming with Multipart¶
POST /agent/{agent_name}/chat/stream/multipart
Combines streaming responses with multipart file upload.
Session Management¶
Delete Session¶
DELETE /sessions/{session_id}
Removes session and conversation history. Returns 204 No Content.
Health Endpoints¶
Health Check¶
GET /health
{
"status": "healthy",
"agent_name": "research",
"agent_ready": true,
"active_sessions": 5,
"uptime_seconds": 3600.5
}
Readiness Check¶
GET /ready
{
"ready": true
}
Used by load balancers and container orchestrators.
API Documentation¶
GET /docs
Interactive Swagger UI for testing endpoints (REST protocol only).
CopilotKit Integration¶
HoloDeck integrates seamlessly with CopilotKit using the AG-UI protocol.
Architecture¶
┌─────────────────────────┐
│ CopilotKit Frontend │
│ (Next.js React App) │
└────────────┬────────────┘
│
▼
┌─────────────────────────────────────┐
│ Next.js API Route │
│ /api/copilotkit │
│ HttpAgent → http://127.0.0.1:8000 │
└────────────┬────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ HoloDeck AG-UI Server │
│ POST /awp │
└─────────────────────────────────────┘
Setup¶
1. Start HoloDeck Server¶
holodeck serve agent.yaml
# Server starts at http://127.0.0.1:8000
2. Create Next.js Project¶
npx create-next-app@latest my-copilot --typescript --tailwind
cd my-copilot
3. Install Dependencies¶
npm install @copilotkit/react-core @copilotkit/react-ui @copilotkit/runtime @ag-ui/client
4. Create API Route¶
Create src/app/api/copilotkit/route.ts:
import { HttpAgent } from "@ag-ui/client";
import {
CopilotRuntime,
ExperimentalEmptyAdapter,
copilotRuntimeNextJSAppRouterEndpoint,
} from "@copilotkit/runtime";
import { NextRequest } from "next/server";
const serviceAdapter = new ExperimentalEmptyAdapter();
const runtime = new CopilotRuntime({
agents: {
// Agent name must match your HoloDeck agent name
research: new HttpAgent({ url: "http://127.0.0.1:8000/awp" }),
},
});
export const POST = async (req: NextRequest) => {
const { handleRequest } = copilotRuntimeNextJSAppRouterEndpoint({
runtime,
serviceAdapter,
endpoint: "/api/copilotkit",
});
return handleRequest(req);
};
5. Create Layout with Provider¶
Update src/app/layout.tsx:
import { CopilotKit } from "@copilotkit/react-core";
import "@copilotkit/react-ui/styles.css";
export default function RootLayout({
children,
}: {
children: React.ReactNode;
}) {
return (
<html lang="en">
<body>
<CopilotKit runtimeUrl="/api/copilotkit" agent="research">
{children}
</CopilotKit>
</body>
</html>
);
}
6. Create Chat Page¶
Create src/app/page.tsx:
"use client";
import { CopilotChat } from "@copilotkit/react-ui";
export default function Home() {
return (
<main className="h-screen">
<CopilotChat
labels={{
title: "Research Assistant",
initial: "Hi! I'm your research assistant. How can I help?",
}}
/>
</main>
);
}
7. Run Both Servers¶
# Terminal 1: HoloDeck
holodeck serve agent.yaml
# Terminal 2: Next.js
npm run dev
Open http://localhost:3000 to chat with your agent.
Session Management¶
HoloDeck maintains conversation sessions with automatic cleanup.
Session Behavior¶
- Auto-creation: Sessions are created automatically on first request
- Session ID format: ULID (Universally Unique Lexicographically Sortable Identifier)
- TTL: 30 minutes of inactivity
- Max sessions: 1000 concurrent sessions
- Cleanup interval: Every 5 minutes
Session Data¶
Each session maintains:
- Conversation history
- Agent executor instance
- Created/last activity timestamps
- Message count
Continuing Conversations¶
Include session_id in requests to continue a conversation:
{
"message": "Tell me more about that",
"session_id": "01HQXYZ..."
}
If the session has expired, a new session is automatically created.
Error Handling¶
Errors are returned in RFC 7807 Problem Details format:
{
"type": "https://holodeck.dev/errors/invalid-request",
"title": "Invalid Request",
"status": 400,
"detail": "Message cannot be empty",
"instance": "/agent/research/chat"
}
Error Types¶
| Status | Type | Description |
|---|---|---|
| 400 | invalid-request |
Malformed request body |
| 404 | not-found |
Agent or session not found |
| 503 | service-unavailable |
Agent not ready |
| 500 | internal-error |
Unexpected server error |
SSE Event Types¶
For streaming endpoints, the following event types are emitted:
| Event | Description | Data |
|---|---|---|
stream_start |
Stream begins | session_id, message_id |
message_delta |
Text chunk | delta, message_id |
tool_call_start |
Tool invocation begins | tool_call_id, name |
tool_call_args |
Tool arguments (chunked) | tool_call_id, args_delta |
tool_call_end |
Tool completes | tool_call_id, status |
stream_end |
Stream completes | tokens_used, execution_time_ms |
error |
Error occurred | RFC 7807 problem details |
Keepalive comments (:) are sent every 15 seconds to prevent connection timeout.
CORS Configuration¶
Configure allowed origins for cross-origin requests:
# Single origin
holodeck serve agent.yaml --cors-origins "http://localhost:3000"
# Multiple origins
holodeck serve agent.yaml --cors-origins "http://localhost:3000,https://myapp.com"
# All interfaces for development
holodeck serve agent.yaml --host 0.0.0.0 --cors-origins "*"
Complete Examples¶
Basic AG-UI Server¶
# agent.yaml
name: assistant
model:
provider: openai
name: gpt-4o
instructions:
inline: "You are a helpful assistant."
holodeck serve agent.yaml
REST API with Tools¶
# agent.yaml
name: research
model:
provider: ollama
name: llama3.2:latest
instructions:
file: instructions/research.md
tools:
- type: vectorstore
name: search_docs
store: chroma
collection: research_papers
holodeck serve agent.yaml --protocol rest --port 8080
Production Deployment¶
# With all interfaces exposed
holodeck serve agent.yaml \
--host 0.0.0.0 \
--port 8000 \
--cors-origins "https://myapp.com" \
--quiet
# With Docker
docker run -p 8000:8000 \
-v $(pwd)/agent.yaml:/app/agent.yaml \
holodeck-ai serve /app/agent.yaml --host 0.0.0.0
Next Steps¶
- See Agent Configuration Guide for agent.yaml reference
- See Tools Guide for adding tools to your agent
- See Observability Guide for tracing and monitoring
- See Global Configuration for shared settings