Tools¶

Tools are agent capabilities defined in agent.yaml. They work on both backends — the OpenAI backend (openai/azure_openai) and the Claude backend (anthropic/ollama) dispatch every tool type through the same loader.

Quick start¶

A minimal agent with one function tool — deterministic enough to verify with holodeck test.

# agent.yaml
name: math-agent

model:
  provider: openai
  name: gpt-4o-mini
  api_key: ${OPENAI_API_KEY}
  temperature: 0.0

instructions:
  inline: "You are a calculator. Use the tools to answer arithmetic questions."

tools:
  - name: subtract
    type: function
    description: "Compute a - b. Pass a and b as numeric strings; returns the difference as a string."
    file: tools/math.py
    function: subtract

# tools/math.py
def subtract(a: str, b: str) -> str:
    """Compute ``a - b`` for two numeric string operands.

    Args:
        a: Minuend as a numeric string (e.g. ``"206588"``).
        b: Subtrahend as a numeric string.

    Returns:
        The difference ``a - b`` as a string.
    """
    return str(float(a.replace(",", "")) - float(b.replace(",", "")))

holodeck chat agent.yaml   # then ask: "What is 206588 minus 1500?"
# Agent calls subtract(a="206588", b="1500") → 205088.0

How it works¶

HoloDeck supports five tool types: function (Python callables), vectorstore (semantic search), hierarchical_document (structure-aware hybrid search), mcp (Model Context Protocol servers), and prompt (LLM-powered semantic functions — schema-defined, execution still planned). Both backends load tools at agent-startup, derive each tool's schema from its config (function type hints, MCP server discovery, etc.), and surface them to the model. RAG tools (vectorstore, hierarchical_document) embed your data — see Vector Stores for provider setup. MCP servers run as subprocesses managed by the agent lifecycle — see the MCP CLI guide for discovery and config. The sections below are condensed reference for each type.

Common fields¶

All tools share name (1–100 chars, alphanumeric + underscores, unique on the agent — used to reference the tool in test cases and logs), description (≤500 chars, helps the model decide when to call it), and type (the discriminator that determines which extra fields apply).

tools:
  - name: search_kb
    description: Search company knowledge base for answers
    type: vectorstore

Tool Type	Use case	Status
`function`	Custom Python logic	✅ Implemented
`vectorstore`	Semantic search over data	✅ Implemented
`hierarchical_document`	Structure-aware hybrid document search	✅ Implemented
`mcp`	Model Context Protocol servers (stdio)	✅ Implemented
`prompt`	LLM-powered semantic functions	🚧 Planned

Function tools ✅¶

Execute custom Python functions defined in your project. Function tools are imported at agent-config load time via importlib.util.spec_from_file_location, validated as callable, and surfaced to the model with the function's signature and docstring as the tool's parameter schema and description. src/holodeck/lib/function_tool_loader.py:load_function_tool is shared by both backends.

Use when: custom business logic the agent calls deterministically; pure functions (math, parsing, formatting); lightweight integrations with libraries already in your venv; anything where an MCP subprocess would be overkill.

Fields¶

Field	Type	Required	Notes
`name`	string	Yes	Tool identifier (unique on agent)
`description`	string	Yes	Shown to the model — the function's docstring complements it
`type`	`"function"`	Yes	Discriminator
`file`	string (path)	Yes	Python file containing the function. Resolved relative to the directory holding `agent.yaml` (absolute paths also work)
`function`	string	Yes	Name of the callable inside `file`
`parameters`	dict	No	Optional schema for introspection. Backends derive JSON schema from type annotations when omitted
`defer_loading`	bool	No (`true`)	When `true`, excluded from the initial tool context and surfaced via tool filtering on demand. Set `false` for tools the agent should always have available

See src/holodeck/models/tool.py:FunctionTool for the full Pydantic model. A complete working sample lives at sample/financial-assistant/claude/ (wires subtract and divide into an agent).

How loading works¶

At agent-startup (not first call), the shared loader:

Resolves tool.file relative to the agent project root (the directory holding agent.yaml); absolute paths pass through unchanged.
Builds an import spec and executes the module in isolation under a synthesized name (holodeck_function_tool_<tool.name>), so two tools with the same source file don't collide.
Looks up tool.function and verifies the attribute is callable.
Returns the callable to the backend, which wraps it as a native tool (an OpenAI Agents SDK function tool or a Claude SDK in-process MCP tool).

Any failure — missing file, import error, missing attribute, attribute that isn't callable — is re-raised as a ConfigError keyed on tools.<tool.name>, so the agent fails to start with a clear message instead of crashing mid-test (FR-025).

Best practices¶

Keep each function focused on a single task — small, well-named tools are easier for the model to pick.
Add type hints on every parameter and the return, plus a short docstring; both feed the tool schema and description.
Make the function self-contained — the loader executes the module fresh, so don't rely on side-effects from other modules.
Return JSON-serializable values (strings, numbers, dicts, lists). Arbitrary objects fail when the backend serializes the result.
Validate inputs and raise a meaningful ValueError — the runtime captures it as the tool error and surfaces it to the agent.
Avoid long-running calls; prefer an MCP server when you need a separate process or async I/O at scale.
Prefer pure functions — trivially testable in isolation (e.g., tests/unit/sample/test_financial_tools.py).

Vectorstore tools ✅¶

Semantic search over unstructured or structured data — knowledge bases, FAQs, RAG context retrieval.

Anthropic users: embedding provider required

Anthropic does not provide embedding models. When using model.provider: anthropic with vectorstore tools, you must define embedding_provider at the agent level. See Embedding Provider.

Basic example¶

- name: search-kb
  description: Search knowledge base for answers
  type: vectorstore
  source: knowledge_base/

Fields¶

source (required) is the data file or directory to index. Supported: single files (.txt, .md, .pdf, .json, .csv), directories (recursively indexed), or remote URLs (auto-cached locally).

Field	Type	Default	Notes
`source`	path	—	Required. File, directory, or remote URL to index
`embedding_model`	string	provider default	e.g. `text-embedding-3-small`, `nomic-embed-text:latest`
`vector_field`	string \| list	auto-detect	Field(s) to vectorize (JSON/CSV). XOR with `vector_fields`
`meta_fields`	list	all fields	Metadata fields to include in results
`chunk_size`	int	`512`	Characters per chunk (> 0)
`chunk_overlap`	int	`0`	Characters to overlap between chunks (≥ 0)
`record_path`	string	—	Dot-path to array in nested JSON, e.g. `data.records`
`record_prefix`	string	—	Prefix added to record fields
`meta_prefix`	string	—	Prefix added to metadata fields
`database`	object \| string	in-memory	Provider config or named reference to `config.yaml`

- name: search-docs
  description: Search technical documentation
  type: vectorstore
  source: docs/
  embedding_model: text-embedding-3-small
  vector_field: [title, content]
  meta_fields: [source, date, url]
  chunk_size: 1024
  chunk_overlap: 128

Embedding provider by backend

OpenAI / Azure OpenAI / Ollama (OpenAI Agents or Claude backend): use the agent's own model provider for embeddings; the tool's embedding_model selects which embedding model from that provider.
Anthropic (Claude backend): Anthropic offers no embedding models — the agent-level embedding_provider is used instead, and embedding_model selects the model within it.

Mismatched embedding models across tools

If an agent defines both vectorstore and hierarchical_document tools with different embedding_model values, HoloDeck raises a validation error at startup. All embedding-based tools in an agent must share one embedding model (they share a single embedding provider instance).

Database providers¶

HoloDeck supports multiple vector database backends through vector store connector libraries; switch providers via configuration without changing agent code. Provider options: in-memory (built-in, dev/test), postgres (pgvector), qdrant, chromadb, pinecone, azure-ai-search. Install all at once with uv add holodeck-ai[vectorstores].

# Inline
- name: search-kb
  type: vectorstore
  source: knowledge_base/
  database:
    provider: postgres
    connection_string: postgresql://user:password@localhost:5432/mydb

# Named reference (database: my-postgres-store) defined in config.yaml
vectorstores:
  my-postgres-store:
    provider: postgres
    connection_string: ${DATABASE_URL}

For full provider setup (Docker, connection strings, Azure AI Search, env vars), see Vector Stores.

Data formats¶

Text (.txt, .md): the whole file is chunked and vectorized.
JSON (array of objects): each object is a record; vector_field picks the text field(s).
JSON (nested): use record_path: data.records to reach the array.
CSV: header row defines fields; vector_field selects which to embed.

Hierarchical document tools ✅¶

Structure-aware hybrid document search combining semantic, keyword, and exact-match modalities with contextual embeddings for superior retrieval quality (Anthropic contextual retrieval approach — ~49% better retrieval).

Anthropic users: embedding provider required

Anthropic does not provide embedding models. When using model.provider: anthropic with hierarchical document tools, you must define embedding_provider at the agent level. See Embedding Provider.

Use when: searching structured/legal/regulatory documents with section hierarchy; hybrid search; RAG needing contextual embeddings; documents with heading-based structure (markdown, PDF, Word).

Basic example¶

tools:
  - name: docs_search
    type: hierarchical_document
    description: "Search company documentation"
    source: "./docs/"

The tool automatically converts documents to markdown (markitdown), parses hierarchy from headings, chunks by structure (max 800 tokens), generates LLM context per chunk, embeds the contextualized text, and builds BM25 + exact-match indices.

Fields¶

source (required) is a document file or directory (.md, .pdf, .docx, .txt; directories recursively indexed).

Chunking

Field	Default	Range	Notes
`chunking_strategy`	`structure`	`structure`, `token`	`structure` parses markdown headings; `token` uses fixed splitting
`max_chunk_tokens`	`800`	100–2000	Max tokens per chunk
`chunk_overlap`	`50`	0–200	Token overlap between chunks

Document domain (document_domain, default none) selects subsection-detection patterns: us_legislative ((a)(1)(A)(i)), au_legislative ((1)(a)(i)(A)), academic (1./1.1/1.1.1), technical (Step 1, Note:, Warning:), legal_contract (Article I, Section 1, (a) clauses). max_subsection_depth (int, default unlimited) caps nesting.

Search

Field	Default	Range	Notes
`search_mode`	`hybrid`	`semantic`, `keyword`, `exact`, `hybrid`	Search modality
`top_k`	`10`	1–100	Results to return
`min_score`	None	0.0–1.0	Minimum similarity threshold
`semantic_weight`	`0.5`	0.0–1.0	Hybrid weight
`keyword_weight`	`0.3`	0.0–1.0	Hybrid weight
`exact_weight`	`0.2`	0.0–1.0	Hybrid weight
`rrf_k`	`60`	≥ 1	Reciprocal Rank Fusion constant

In hybrid mode, the three weights must sum to 1.0. Pick modes by query type: semantic for concepts, keyword for technical terms, exact for section references (e.g. "Section 203(a)(1)"), hybrid for general purpose.

Contextual embeddings (contextual_embeddings, default true) prepend an LLM-generated summary to each chunk before embedding for ~49% better retrieval (Anthropic research); uses Claude Haiku by default. Tune with context_max_tokens (default 100, range 50–200) and context_concurrency (default 10, range 1–50). Same embedding-provider-by-backend rules as vectorstore tools apply.

Feature extraction: extract_definitions (default true) indexes term definitions; extract_cross_references (default true) resolves cross-references between sections.

Reranking: set enable_reranking: true and supply reranker_model (an LLMProvider config) for LLM-based reranking of results.

Storage: database (DatabaseConfig or named reference, default in-memory) — see Vector Stores for provider options. keyword_index configures the BM25 backend (default in-memory rank_bm25).

Keyword index: OpenSearch (production)¶

For large corpora or multi-instance deployments, offload BM25 to an external OpenSearch cluster:

keyword_index:
  provider: opensearch
  endpoint: "https://search.example.com:9200"
  index_name: "my-keyword-index"
  username: "${OPENSEARCH_USERNAME}"
  password: "${OPENSEARCH_PASSWORD}"
  verify_certs: true
  timeout_seconds: 10

Fields: provider (in-memory | opensearch), endpoint, index_name, username/password (basic auth) or api_key, verify_certs (default true), timeout_seconds (default 10, range 1–120). For local dev, run OpenSearch via Docker with discovery.type=single-node and DISABLE_SECURITY_PLUGIN=true, then use endpoint: "http://localhost:9200" with no auth. On Linux you may need sudo sysctl -w vm.max_map_count=262144.

Example configurations¶

# Legal/regulatory — section references + definitions
- name: legal_search
  type: hierarchical_document
  description: "Search legal documents"
  source: "./legal/"
  search_mode: hybrid
  document_domain: us_legislative
  extract_definitions: true
  extract_cross_references: true
  semantic_weight: 0.4
  keyword_weight: 0.3
  exact_weight: 0.3

# Large corpus with persistence
- name: knowledge_base
  type: hierarchical_document
  description: "Persistent knowledge base"
  source: "./knowledge/"
  database:
    provider: postgres
    connection_string: "${POSTGRES_URL}"
  top_k: 20

Result format¶

Each result includes a score, source path, location breadcrumb (Chapter 5 > Section 5.3 > Reporting Requirements), section id, the chunk text, and any relevant definitions:

[1] Score: 0.847 | Source: docs/compliance.md
Location: Chapter 5 > Section 5.3 > Reporting Requirements
Section: sec_5_3

Annual reporting must be submitted within 60 days of the fiscal year end.

Relevant definitions:
  • Covered entity: Any organization with annual revenue exceeding...

Performance & cost¶

Default 800-token chunks work well; increase for dense technical content, decrease for granular retrieval.
Tune weights: more semantic_weight for concepts, keyword_weight for terminology, exact_weight for regulatory text.
For >1000 documents, use a persistent database to avoid re-indexing on restart.
Keep contextual_embeddings on (default) for the ~49% accuracy gain; cost is ~$0.03 per 100-page document via Claude Haiku. Raise context_concurrency to speed ingestion.
Very large documents are auto-truncated for context while preserving chunk content.

MCP tools ✅¶

Model Context Protocol (MCP) server integrations let agents interact with external systems through a standardized protocol (stdio transport). Browse available servers at the official registry — filesystem, GitHub, Slack, PostgreSQL, and more.

For discovery, adding, listing, and removing servers via the CLI (holodeck mcp search/add/list/remove) and global-vs-agent config precedence, see the MCP CLI guide. This section covers the agent.yaml shape.

Basic example¶

- name: filesystem
  description: Read and write files in the workspace
  type: mcp
  command: npx
  args: ["-y", "@modelcontextprotocol/server-filesystem", "./data"]

Fields¶

command (required, stdio) is how the server launches — one of npx (npm packages, auto-installs), node (local .js files), uvx (Python packages via uv), docker (containers). args (required) are the command-line arguments, usually the package name plus config.

Field	Type	Default	Notes
`command`	enum	—	Required. `npx`, `node`, `uvx`, `docker`
`args`	list[str]	—	Required. Server package name + arguments
`transport`	enum	`stdio`	Only `stdio` is implemented
`config`	object	—	Server-specific config (passed via `MCP_CONFIG` env var)
`env`	object	—	Env vars for the server process; supports `${VAR}` substitution
`env_file`	path	—	Load env vars from a `.env` file (`env` overrides)
`request_timeout`	int	`30`	Per-request timeout (seconds)
`encoding`	string	`utf-8`	stdio character encoding

tools:
  - type: mcp
    name: filesystem
    description: Read and write files in the workspace data directory
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "./sample/data"]
    config:
      allowed_directories: ["./sample/data"]
    request_timeout: 30

Sample servers¶

# GitHub
- name: github
  type: mcp
  description: GitHub repository operations
  command: npx
  args: ["-y", "@modelcontextprotocol/server-github"]
  env:
    GITHUB_PERSONAL_ACCESS_TOKEN: "${GITHUB_TOKEN}"

# Python server via uvx (fetch web content)
- name: mcp-server-fetch
  type: mcp
  description: Fetch web content
  command: uvx
  args: ["mcp-server-fetch"]

# Local Node.js file
- name: my-custom-server
  type: mcp
  description: Custom local MCP server
  command: node
  args: ["./tools/my-mcp-server.js", "--config", "./config.json"]

# Containerized
- name: custom-server
  type: mcp
  description: Custom containerized server
  command: docker
  args: ["run", "-i", "--rm", "my-mcp-server:latest"]

Other common servers: @modelcontextprotocol/server-sqlite (database queries), @modelcontextprotocol/server-brave-search (web search, needs BRAVE_API_KEY), @modelcontextprotocol/server-puppeteer (browser automation), basic-memory via uvx (short-term scratchpad across turns).

Prerequisites¶

MCP tools require the runtime for your command: npx/node → Node.js 18+ (nodejs.org), uvx → uv, docker → Docker. Verify with node --version, uvx --version, or docker --version.

Lifecycle¶

MCP plugins are managed automatically: connected on agent startup, tools discovered and registered, and properly closed when the session ends. Always terminate chat sessions cleanly (exit/quit) so servers shut down.

Prompt tools 🚧¶

Status: Planned — configuration schema defined, execution not yet implemented.

LLM-powered semantic functions with Mustache-style template substitution: text generation with templates, specialized task prompts, reusable prompt chains, A/B testing.

Fields¶

Provide either template (inline, ≤5000 chars, {{variable}} syntax) or file (path to an external template, relative to agent.yaml) — not both. parameters (required, ≥1) maps template variables to schemas the agent fills. Optional model overrides the agent's model for this tool.

- name: code-reviewer
  description: Review code for best practices
  type: prompt
  file: prompts/code_review.txt
  model:
    provider: openai
    name: gpt-4
    temperature: 0.3
  parameters:
    code:
      type: string
      description: Code to review
    language:
      type: string
      description: Programming language
      enum: [python, javascript, go, java]

Template syntax supports simple variables ({{name}}), conditionals ({{#if description}}...{{/if}}), and loops ({{#each items}}- {{this}}{{/each}}).

Tool comparison¶

Feature	Function	Vectorstore	Hierarchical Document	MCP	Prompt
Status	✅ Implemented	✅ Implemented	✅ Implemented	✅ Implemented	🚧 Planned
Use case	Custom logic	Search data	Structured document search	External integrations	Template-based
Execution	Python function	Vector similarity	Hybrid RRF fusion	MCP protocol (stdio)	LLM generation
Setup	Python files	Data files	Document files	Server config + runtime	Template text
Parameters	Defined in code	Implicit (search query)	Implicit (search query)	Server-specific tools	Defined in YAML
Latency	Low (<10ms)	Medium (~100ms)	Medium (~100-300ms)	Medium (~50-500ms)	High (LLM call)
Cost	Internal	Embedding API	Embedding + context LLM	Server resource	LLM tokens

Troubleshooting¶

Function tools — all resolution failures raise ConfigError at agent startup (keyed on tools.<name>), so a misconfiguration fails fast instead of mid-test:

File not found: tool.file doesn't resolve (relative paths resolve against the agent project root).
Import failure: the module raised during exec_module — the original traceback is chained.
Function not found / not callable: no matching attribute, or the attribute isn't callable (e.g. a constant).
Runtime error inside the function: caught by the backend and surfaced to the model as a tool error so the agent can recover.

Vectorstore tools — no data found returns empty results; invalid path or unsupported format errors at startup (config validation).

Hierarchical document tools — no results: check source path and supported format; slow ingestion: split very large PDFs or use persistent storage; missing context: ensure contextual_embeddings: true (default) and that documents have heading structure.

MCP tools — server unavailable, invalid config, or runtime (npx/uvx/docker) not found error at startup; connection timeout is configurable via request_timeout; runtime errors are returned as tool error responses to the LLM.

Prompt tools — invalid template errors at startup; LLM failure is a soft failure (logged, error message returned); template-rendering errors surface during execution.

General best practices¶

Use descriptive names and clear descriptions (the model decides when to call a tool from them). Define parameters clearly, handle errors gracefully, test with realistic data, version tool files in source control, and include test cases that exercise each tool.

Next steps¶

Agent Configuration — tool usage in context
Vector Stores — RAG provider setup
MCP CLI — discovering and managing MCP servers
File References — path resolution
Examples — complete tool usage