Tools¶
Tools are agent capabilities defined in agent.yaml. They work on both backends — the OpenAI backend (openai/azure_openai) and the Claude backend (anthropic/ollama) dispatch every tool type through the same loader.
Quick start¶
A minimal agent with one function tool — deterministic enough to verify with holodeck test.
# agent.yaml
name: math-agent
model:
provider: openai
name: gpt-4o-mini
api_key: ${OPENAI_API_KEY}
temperature: 0.0
instructions:
inline: "You are a calculator. Use the tools to answer arithmetic questions."
tools:
- name: subtract
type: function
description: "Compute a - b. Pass a and b as numeric strings; returns the difference as a string."
file: tools/math.py
function: subtract
# tools/math.py
def subtract(a: str, b: str) -> str:
"""Compute ``a - b`` for two numeric string operands.
Args:
a: Minuend as a numeric string (e.g. ``"206588"``).
b: Subtrahend as a numeric string.
Returns:
The difference ``a - b`` as a string.
"""
return str(float(a.replace(",", "")) - float(b.replace(",", "")))
holodeck chat agent.yaml # then ask: "What is 206588 minus 1500?"
# Agent calls subtract(a="206588", b="1500") → 205088.0
How it works¶
HoloDeck supports five tool types: function (Python callables), vectorstore (semantic search), hierarchical_document (structure-aware hybrid search), mcp (Model Context Protocol servers), and prompt (LLM-powered semantic functions — schema-defined, execution still planned). Both backends load tools at agent-startup, derive each tool's schema from its config (function type hints, MCP server discovery, etc.), and surface them to the model. RAG tools (vectorstore, hierarchical_document) embed your data — see Vector Stores for provider setup. MCP servers run as subprocesses managed by the agent lifecycle — see the MCP CLI guide for discovery and config. The sections below are condensed reference for each type.
Common fields¶
All tools share name (1–100 chars, alphanumeric + underscores, unique on the agent — used to reference the tool in test cases and logs), description (≤500 chars, helps the model decide when to call it), and type (the discriminator that determines which extra fields apply).
tools:
- name: search_kb
description: Search company knowledge base for answers
type: vectorstore
| Tool Type | Use case | Status |
|---|---|---|
function |
Custom Python logic | ✅ Implemented |
vectorstore |
Semantic search over data | ✅ Implemented |
hierarchical_document |
Structure-aware hybrid document search | ✅ Implemented |
mcp |
Model Context Protocol servers (stdio) | ✅ Implemented |
prompt |
LLM-powered semantic functions | 🚧 Planned |
Function tools ✅¶
Execute custom Python functions defined in your project. Function tools are imported at agent-config load time via importlib.util.spec_from_file_location, validated as callable, and surfaced to the model with the function's signature and docstring as the tool's parameter schema and description. src/holodeck/lib/function_tool_loader.py:load_function_tool is shared by both backends.
Use when: custom business logic the agent calls deterministically; pure functions (math, parsing, formatting); lightweight integrations with libraries already in your venv; anything where an MCP subprocess would be overkill.
Fields¶
| Field | Type | Required | Notes |
|---|---|---|---|
name |
string | Yes | Tool identifier (unique on agent) |
description |
string | Yes | Shown to the model — the function's docstring complements it |
type |
"function" |
Yes | Discriminator |
file |
string (path) | Yes | Python file containing the function. Resolved relative to the directory holding agent.yaml (absolute paths also work) |
function |
string | Yes | Name of the callable inside file |
parameters |
dict | No | Optional schema for introspection. Backends derive JSON schema from type annotations when omitted |
defer_loading |
bool | No (true) |
When true, excluded from the initial tool context and surfaced via tool filtering on demand. Set false for tools the agent should always have available |
See src/holodeck/models/tool.py:FunctionTool for the full Pydantic model. A complete working sample lives at sample/financial-assistant/claude/ (wires subtract and divide into an agent).
How loading works¶
At agent-startup (not first call), the shared loader:
- Resolves
tool.filerelative to the agent project root (the directory holdingagent.yaml); absolute paths pass through unchanged. - Builds an import spec and executes the module in isolation under a synthesized name (
holodeck_function_tool_<tool.name>), so two tools with the same source file don't collide. - Looks up
tool.functionand verifies the attribute is callable. - Returns the callable to the backend, which wraps it as a native tool (an OpenAI Agents SDK function tool or a Claude SDK in-process MCP tool).
Any failure — missing file, import error, missing attribute, attribute that isn't callable — is re-raised as a ConfigError keyed on tools.<tool.name>, so the agent fails to start with a clear message instead of crashing mid-test (FR-025).
Best practices¶
- Keep each function focused on a single task — small, well-named tools are easier for the model to pick.
- Add type hints on every parameter and the return, plus a short docstring; both feed the tool schema and description.
- Make the function self-contained — the loader executes the module fresh, so don't rely on side-effects from other modules.
- Return JSON-serializable values (strings, numbers, dicts, lists). Arbitrary objects fail when the backend serializes the result.
- Validate inputs and raise a meaningful
ValueError— the runtime captures it as the tool error and surfaces it to the agent. - Avoid long-running calls; prefer an MCP server when you need a separate process or async I/O at scale.
- Prefer pure functions — trivially testable in isolation (e.g.,
tests/unit/sample/test_financial_tools.py).
Vectorstore tools ✅¶
Semantic search over unstructured or structured data — knowledge bases, FAQs, RAG context retrieval.
Anthropic users: embedding provider required
Anthropic does not provide embedding models. When using model.provider: anthropic
with vectorstore tools, you must define embedding_provider at the agent level.
See Embedding Provider.
Basic example¶
- name: search-kb
description: Search knowledge base for answers
type: vectorstore
source: knowledge_base/
Fields¶
source (required) is the data file or directory to index. Supported: single files (.txt, .md, .pdf, .json, .csv), directories (recursively indexed), or remote URLs (auto-cached locally).
| Field | Type | Default | Notes |
|---|---|---|---|
source |
path | — | Required. File, directory, or remote URL to index |
embedding_model |
string | provider default | e.g. text-embedding-3-small, nomic-embed-text:latest |
vector_field |
string | list | auto-detect | Field(s) to vectorize (JSON/CSV). XOR with vector_fields |
meta_fields |
list | all fields | Metadata fields to include in results |
chunk_size |
int | 512 |
Characters per chunk (> 0) |
chunk_overlap |
int | 0 |
Characters to overlap between chunks (≥ 0) |
record_path |
string | — | Dot-path to array in nested JSON, e.g. data.records |
record_prefix |
string | — | Prefix added to record fields |
meta_prefix |
string | — | Prefix added to metadata fields |
database |
object | string | in-memory | Provider config or named reference to config.yaml |
- name: search-docs
description: Search technical documentation
type: vectorstore
source: docs/
embedding_model: text-embedding-3-small
vector_field: [title, content]
meta_fields: [source, date, url]
chunk_size: 1024
chunk_overlap: 128
Embedding provider by backend
- OpenAI / Azure OpenAI / Ollama (OpenAI Agents or Claude backend): use the agent's own model provider for embeddings; the tool's
embedding_modelselects which embedding model from that provider. - Anthropic (Claude backend): Anthropic offers no embedding models — the agent-level
embedding_provideris used instead, andembedding_modelselects the model within it.
Mismatched embedding models across tools
If an agent defines both vectorstore and hierarchical_document tools with
different embedding_model values, HoloDeck raises a validation error at startup.
All embedding-based tools in an agent must share one embedding model (they share a
single embedding provider instance).
Database providers¶
HoloDeck supports multiple vector database backends through vector store connector libraries; switch providers via configuration without changing agent code. Provider options: in-memory (built-in, dev/test), postgres (pgvector), qdrant, chromadb, pinecone, azure-ai-search. Install all at once with uv add holodeck-ai[vectorstores].
# Inline
- name: search-kb
type: vectorstore
source: knowledge_base/
database:
provider: postgres
connection_string: postgresql://user:password@localhost:5432/mydb
# Named reference (database: my-postgres-store) defined in config.yaml
vectorstores:
my-postgres-store:
provider: postgres
connection_string: ${DATABASE_URL}
For full provider setup (Docker, connection strings, Azure AI Search, env vars), see Vector Stores.
Data formats¶
- Text (
.txt,.md): the whole file is chunked and vectorized. - JSON (array of objects): each object is a record;
vector_fieldpicks the text field(s). - JSON (nested): use
record_path: data.recordsto reach the array. - CSV: header row defines fields;
vector_fieldselects which to embed.
Hierarchical document tools ✅¶
Structure-aware hybrid document search combining semantic, keyword, and exact-match modalities with contextual embeddings for superior retrieval quality (Anthropic contextual retrieval approach — ~49% better retrieval).
Anthropic users: embedding provider required
Anthropic does not provide embedding models. When using model.provider: anthropic
with hierarchical document tools, you must define embedding_provider at the agent level.
See Embedding Provider.
Use when: searching structured/legal/regulatory documents with section hierarchy; hybrid search; RAG needing contextual embeddings; documents with heading-based structure (markdown, PDF, Word).
Basic example¶
tools:
- name: docs_search
type: hierarchical_document
description: "Search company documentation"
source: "./docs/"
The tool automatically converts documents to markdown (markitdown), parses hierarchy from headings, chunks by structure (max 800 tokens), generates LLM context per chunk, embeds the contextualized text, and builds BM25 + exact-match indices.
Fields¶
source (required) is a document file or directory (.md, .pdf, .docx, .txt; directories recursively indexed).
Chunking
| Field | Default | Range | Notes |
|---|---|---|---|
chunking_strategy |
structure |
structure, token |
structure parses markdown headings; token uses fixed splitting |
max_chunk_tokens |
800 |
100–2000 | Max tokens per chunk |
chunk_overlap |
50 |
0–200 | Token overlap between chunks |
Document domain (document_domain, default none) selects subsection-detection patterns: us_legislative ((a)(1)(A)(i)), au_legislative ((1)(a)(i)(A)), academic (1./1.1/1.1.1), technical (Step 1, Note:, Warning:), legal_contract (Article I, Section 1, (a) clauses). max_subsection_depth (int, default unlimited) caps nesting.
Search
| Field | Default | Range | Notes |
|---|---|---|---|
search_mode |
hybrid |
semantic, keyword, exact, hybrid |
Search modality |
top_k |
10 |
1–100 | Results to return |
min_score |
None | 0.0–1.0 | Minimum similarity threshold |
semantic_weight |
0.5 |
0.0–1.0 | Hybrid weight |
keyword_weight |
0.3 |
0.0–1.0 | Hybrid weight |
exact_weight |
0.2 |
0.0–1.0 | Hybrid weight |
rrf_k |
60 |
≥ 1 | Reciprocal Rank Fusion constant |
In hybrid mode, the three weights must sum to 1.0. Pick modes by query type: semantic for concepts, keyword for technical terms, exact for section references (e.g. "Section 203(a)(1)"), hybrid for general purpose.
Contextual embeddings (contextual_embeddings, default true) prepend an LLM-generated summary to each chunk before embedding for ~49% better retrieval (Anthropic research); uses Claude Haiku by default. Tune with context_max_tokens (default 100, range 50–200) and context_concurrency (default 10, range 1–50). Same embedding-provider-by-backend rules as vectorstore tools apply.
Feature extraction: extract_definitions (default true) indexes term definitions; extract_cross_references (default true) resolves cross-references between sections.
Reranking: set enable_reranking: true and supply reranker_model (an LLMProvider config) for LLM-based reranking of results.
Storage: database (DatabaseConfig or named reference, default in-memory) — see Vector Stores for provider options. keyword_index configures the BM25 backend (default in-memory rank_bm25).
Keyword index: OpenSearch (production)¶
For large corpora or multi-instance deployments, offload BM25 to an external OpenSearch cluster:
keyword_index:
provider: opensearch
endpoint: "https://search.example.com:9200"
index_name: "my-keyword-index"
username: "${OPENSEARCH_USERNAME}"
password: "${OPENSEARCH_PASSWORD}"
verify_certs: true
timeout_seconds: 10
Fields: provider (in-memory | opensearch), endpoint, index_name, username/password (basic auth) or api_key, verify_certs (default true), timeout_seconds (default 10, range 1–120). For local dev, run OpenSearch via Docker with discovery.type=single-node and DISABLE_SECURITY_PLUGIN=true, then use endpoint: "http://localhost:9200" with no auth. On Linux you may need sudo sysctl -w vm.max_map_count=262144.
Example configurations¶
# Legal/regulatory — section references + definitions
- name: legal_search
type: hierarchical_document
description: "Search legal documents"
source: "./legal/"
search_mode: hybrid
document_domain: us_legislative
extract_definitions: true
extract_cross_references: true
semantic_weight: 0.4
keyword_weight: 0.3
exact_weight: 0.3
# Large corpus with persistence
- name: knowledge_base
type: hierarchical_document
description: "Persistent knowledge base"
source: "./knowledge/"
database:
provider: postgres
connection_string: "${POSTGRES_URL}"
top_k: 20
Result format¶
Each result includes a score, source path, location breadcrumb (Chapter 5 > Section 5.3 > Reporting Requirements), section id, the chunk text, and any relevant definitions:
[1] Score: 0.847 | Source: docs/compliance.md
Location: Chapter 5 > Section 5.3 > Reporting Requirements
Section: sec_5_3
Annual reporting must be submitted within 60 days of the fiscal year end.
Relevant definitions:
• Covered entity: Any organization with annual revenue exceeding...
Performance & cost¶
- Default 800-token chunks work well; increase for dense technical content, decrease for granular retrieval.
- Tune weights: more
semantic_weightfor concepts,keyword_weightfor terminology,exact_weightfor regulatory text. - For >1000 documents, use a persistent database to avoid re-indexing on restart.
- Keep
contextual_embeddingson (default) for the ~49% accuracy gain; cost is ~$0.03 per 100-page document via Claude Haiku. Raisecontext_concurrencyto speed ingestion. - Very large documents are auto-truncated for context while preserving chunk content.
MCP tools ✅¶
Model Context Protocol (MCP) server integrations let agents interact with external systems through a standardized protocol (stdio transport). Browse available servers at the official registry — filesystem, GitHub, Slack, PostgreSQL, and more.
For discovery, adding, listing, and removing servers via the CLI (holodeck mcp search/add/list/remove) and global-vs-agent config precedence, see the MCP CLI guide. This section covers the agent.yaml shape.
Basic example¶
- name: filesystem
description: Read and write files in the workspace
type: mcp
command: npx
args: ["-y", "@modelcontextprotocol/server-filesystem", "./data"]
Fields¶
command (required, stdio) is how the server launches — one of npx (npm packages, auto-installs), node (local .js files), uvx (Python packages via uv), docker (containers). args (required) are the command-line arguments, usually the package name plus config.
| Field | Type | Default | Notes |
|---|---|---|---|
command |
enum | — | Required. npx, node, uvx, docker |
args |
list[str] | — | Required. Server package name + arguments |
transport |
enum | stdio |
Only stdio is implemented |
config |
object | — | Server-specific config (passed via MCP_CONFIG env var) |
env |
object | — | Env vars for the server process; supports ${VAR} substitution |
env_file |
path | — | Load env vars from a .env file (env overrides) |
request_timeout |
int | 30 |
Per-request timeout (seconds) |
encoding |
string | utf-8 |
stdio character encoding |
tools:
- type: mcp
name: filesystem
description: Read and write files in the workspace data directory
command: npx
args: ["-y", "@modelcontextprotocol/server-filesystem", "./sample/data"]
config:
allowed_directories: ["./sample/data"]
request_timeout: 30
Sample servers¶
# GitHub
- name: github
type: mcp
description: GitHub repository operations
command: npx
args: ["-y", "@modelcontextprotocol/server-github"]
env:
GITHUB_PERSONAL_ACCESS_TOKEN: "${GITHUB_TOKEN}"
# Python server via uvx (fetch web content)
- name: mcp-server-fetch
type: mcp
description: Fetch web content
command: uvx
args: ["mcp-server-fetch"]
# Local Node.js file
- name: my-custom-server
type: mcp
description: Custom local MCP server
command: node
args: ["./tools/my-mcp-server.js", "--config", "./config.json"]
# Containerized
- name: custom-server
type: mcp
description: Custom containerized server
command: docker
args: ["run", "-i", "--rm", "my-mcp-server:latest"]
Other common servers: @modelcontextprotocol/server-sqlite (database queries), @modelcontextprotocol/server-brave-search (web search, needs BRAVE_API_KEY), @modelcontextprotocol/server-puppeteer (browser automation), basic-memory via uvx (short-term scratchpad across turns).
Prerequisites¶
MCP tools require the runtime for your command: npx/node → Node.js 18+ (nodejs.org), uvx → uv, docker → Docker. Verify with node --version, uvx --version, or docker --version.
Lifecycle¶
MCP plugins are managed automatically: connected on agent startup, tools discovered and registered, and properly closed when the session ends. Always terminate chat sessions cleanly (exit/quit) so servers shut down.
Prompt tools 🚧¶
Status: Planned — configuration schema defined, execution not yet implemented.
LLM-powered semantic functions with Mustache-style template substitution: text generation with templates, specialized task prompts, reusable prompt chains, A/B testing.
Fields¶
Provide either template (inline, ≤5000 chars, {{variable}} syntax) or file (path to an external template, relative to agent.yaml) — not both. parameters (required, ≥1) maps template variables to schemas the agent fills. Optional model overrides the agent's model for this tool.
- name: code-reviewer
description: Review code for best practices
type: prompt
file: prompts/code_review.txt
model:
provider: openai
name: gpt-4
temperature: 0.3
parameters:
code:
type: string
description: Code to review
language:
type: string
description: Programming language
enum: [python, javascript, go, java]
Template syntax supports simple variables ({{name}}), conditionals ({{#if description}}...{{/if}}), and loops ({{#each items}}- {{this}}{{/each}}).
Tool comparison¶
| Feature | Function | Vectorstore | Hierarchical Document | MCP | Prompt |
|---|---|---|---|---|---|
| Status | ✅ Implemented | ✅ Implemented | ✅ Implemented | ✅ Implemented | 🚧 Planned |
| Use case | Custom logic | Search data | Structured document search | External integrations | Template-based |
| Execution | Python function | Vector similarity | Hybrid RRF fusion | MCP protocol (stdio) | LLM generation |
| Setup | Python files | Data files | Document files | Server config + runtime | Template text |
| Parameters | Defined in code | Implicit (search query) | Implicit (search query) | Server-specific tools | Defined in YAML |
| Latency | Low (<10ms) | Medium (~100ms) | Medium (~100-300ms) | Medium (~50-500ms) | High (LLM call) |
| Cost | Internal | Embedding API | Embedding + context LLM | Server resource | LLM tokens |
Troubleshooting¶
Function tools — all resolution failures raise ConfigError at agent startup (keyed on tools.<name>), so a misconfiguration fails fast instead of mid-test:
- File not found:
tool.filedoesn't resolve (relative paths resolve against the agent project root). - Import failure: the module raised during
exec_module— the original traceback is chained. - Function not found / not callable: no matching attribute, or the attribute isn't callable (e.g. a constant).
- Runtime error inside the function: caught by the backend and surfaced to the model as a tool error so the agent can recover.
Vectorstore tools — no data found returns empty results; invalid path or unsupported format errors at startup (config validation).
Hierarchical document tools — no results: check source path and supported format; slow ingestion: split very large PDFs or use persistent storage; missing context: ensure contextual_embeddings: true (default) and that documents have heading structure.
MCP tools — server unavailable, invalid config, or runtime (npx/uvx/docker) not found error at startup; connection timeout is configurable via request_timeout; runtime errors are returned as tool error responses to the LLM.
Prompt tools — invalid template errors at startup; LLM failure is a soft failure (logged, error message returned); template-rendering errors surface during execution.
General best practices¶
Use descriptive names and clear descriptions (the model decides when to call a tool from them). Define parameters clearly, handle errors gracefully, test with realistic data, version tool files in source control, and include test cases that exercise each tool.
Next steps¶
- Agent Configuration — tool usage in context
- Vector Stores — RAG provider setup
- MCP CLI — discovering and managing MCP servers
- File References — path resolution
- Examples — complete tool usage