Changelog¶

All notable changes to HoloDeck will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased ¶

Planned Features¶

Deployment Engine: Registry push (holodeck deploy push) and cloud deployment (holodeck deploy run)
Plugin System: Pre-built plugin packages for common integrations

0.6.2 - 2026-04-19¶

Added¶

Evaluation-Runs Dashboard (Feature #031) --Interactive Dash UI for holodeck test run history
holodeck test view CLI launches the Dash dashboard as a subprocess with SIGINT forwarding; --seed renders the built-in golden fixture (24 runs, 6 prompt versions) for demos
Summary view: pass-rate time series, metric breakdown, sortable/filterable run table, CSV export, prompt-version boundary markers
Explorer view: three-column drilldown (runs -> cases -> detail) with full conversation thread, tool-call panels, expected-tools coverage, agent-config snapshot, and metric rows
Compare view: side-by-side diff for up to three runs with a persistent floating compare tray (quick-add buttons for 2 or 3 most-recent runs)
Dynamic reload: mtime-aware run memo + dcc.Interval polls the results directory every 5 s so new holodeck test output appears without restarting the dashboard
Conversation rendering: assistant bubble renders Markdown for prose replies and prettified JSON for structured-output replies (strict {/[ prefix + successful json.loads detection)
Optional [dashboard] extra bundling Dash >=3.0, Plotly >=5.20, pandas >=2.0; holodeck test view prints an install hint and exits 2 if the extra is missing
Prompt Version Frontmatter (Feature #031 US2) --Optional YAML frontmatter on instructions.file to version and label system prompts
Recognised keys: version, author, description, tags; any other keys are preserved under extra on the persisted run record
Auto-derives version: auto-<sha256[:8]> from the body hash when no manual version: is supplied, so every run still has a stable identifier
Inline instructions (instructions.inline) resolve to source: inline and skip frontmatter parsing entirely
Prompt body reaching the LLM is byte-equivalent (frontmatter is stripped); malformed frontmatter fails with a clear ConfigError instead of silently shipping a broken header
EvalRun Persistence (Feature #031 US1, US3) --Every holodeck test invocation writes results/<slugify(agent.name)>/<ISO-timestamp>.json
EvalRunMetadata snapshot captures prompt version, agent config (redacted via deep-copy), git commit, HoloDeck version, environment, and timing
Secrets-bearing fields on the snapshotted Agent are automatically redacted

Changed¶

Retrieval Tool Classification --_get_retrieval_tool_names in the test runner now recognises HierarchicalDocumentToolConfig as a retrieval source (alongside VectorstoreTool and is_retrieval=True MCP tools), so RAG metrics receive retrieval_context from hierarchical-document search hits
Executor Pass-Rate Unit --Aligned ExecutionResult.pass_rate with the canonical 0..100 scale used by ReportSummary; the dashboard still computes the fraction from passed / total_tests for producer-robustness

Fixed¶

Dashboard Filter Chip Bubbling --Filter chips, the runs-search input, and row-click handlers no longer collide with the compare-queue add button (#T442)
KPI Unit Rendering --Collapsed the pass-rate % and duration ms suffixes into a single span so the KPI tiles no longer wrap mid-number

Security¶

pip-audit: 20 CVEs resolved in direct + transitive dependencies
aiohttp >=3.13.3 -> >=3.13.4 (10 CVEs: CVE-2026-34513 through CVE-2026-34520, CVE-2026-34525, CVE-2026-22815)
authlib >=1.6.9 -> >=1.6.11 (GHSA-jj8c-mmj3-mmgv)
cryptography >=46.0.6 -> >=46.0.7 (CVE-2026-39892)
pypdf >=6.9.1 -> >=6.10.2 (CVE-2026-40260, GHSA-jj6c-8h6c-hppx, GHSA-4pxv-j86v-mhcw, GHSA-7gw9-cf7v-778f, GHSA-x284-j5p8-9c5p)
pytest >=7.4.0 -> >=9.0.3 (CVE-2025-71176; exclude 9.0.1, 9.0.2)
pillow constraint >=12.1.1 -> >=12.2.0 (CVE-2026-40192)
python-multipart constraint >=0.0.22 -> >=0.0.26 (CVE-2026-40347)
pygments CVE-2026-4539 (ReDoS in AdlLexer) continues to be ignored via --ignore-vuln --no upstream fix yet; HoloDeck never invokes the ADL lexer

Dependencies¶

python-frontmatter >=1.1,<2.0 (new core dep for prompt-version YAML parsing)
dash >=3.0,<4.0 (new optional [dashboard] extra)
plotly >=5.20,<7.0 (new optional [dashboard] extra)
pandas >=2.0 (new optional [dashboard] extra)

Documentation¶

New Dashboard Guide covering holodeck test view, run-history discovery, prompt versioning, conversation rendering, filters, and security notes
Agent Configuration guide: Prompt Versioning section documenting supported frontmatter keys and auto-derived version behaviour
README: Evaluation Dashboard and prompt-versioning sections in the Quick Start

Testing¶

14 new dashboard unit tests (_render_assistant_body across prose/JSON/mixed inputs; _results_dir_fingerprint + get_runs memo invalidation)
Full dashboard suite: 37 tests passing

0.6.1 - 2026-04-12¶

Added¶

Auto-Detect Pip Extras for Deploy Build (#298) --holodeck deploy build now inspects the agent config and installs the matching optional extras (chromadb, azure-blob, s3, claude-otel) in the generated Dockerfile, eliminating the need to modify the base image or apply runtime workarounds
Explicit Azure OpenAI api_version Field --New api_version field on LLMProvider for per-agent control of the Azure OpenAI API version

Changed¶

Azure OpenAI Default API Version --Pinned to 2024-10-21 (latest GA) to prevent Semantic Kernel from defaulting to unsupported preview versions (e.g., 2025-08-28 in SK 1.41.1)

Fixed¶

.gitignore --Added .env.* patterns so Docker env files with secrets are not accidentally committed

0.6.0 - 2026-03-28¶

Added¶

Custom Anthropic Endpoint Support --Route the Claude Agent SDK to third-party Anthropic-compatible endpoints (Ollama, LiteLLM, etc.)
New AuthProvider.custom validates ANTHROPIC_AUTH_TOKEN environment variable
model.endpoint is forwarded as ANTHROPIC_BASE_URL to the SDK subprocess when set for Anthropic providers
Warning validator in LLMProvider when auth_provider: custom is used without an endpoint
Ollama legal-assistant sample under sample/legal-assistant/ollama/
Async Tool Initialization Endpoints (Feature #025) --Background tool data ingestion via REST API
POST /tools/{tool_name}/init --Trigger background vectorstore/hierarchical_document ingestion (201 Created with Location header)
GET /tools/{tool_name}/init --Poll initialization status with progress tracking (documents_processed/total_documents)
GET /tools --List all agent tools with initialization state (pending/in_progress/completed/failed/not_started)
RFC 7807 ProblemDetail error responses (400/404/409/429/503)
ToolInitManager orchestrator with configurable concurrency limiting (default 3), conflict detection, and graceful shutdown
SourceResolver for unified local/S3/Azure Blob/HTTP source resolution with temp directory cleanup
initialize_single_tool() async function with source override and progress callback support
OTel spans for full init job lifecycle (start, progress, complete, failed)
OpenAPI 3.1 contract specification (specs/025-tool-init-endpoints/contracts/openapi.yaml)

Changed¶

GitHub Actions Claude Code review workflow: added concurrency control, pull-requests: write permission, file-based body submission

Fixed¶

CI/CD: Claude code review posting now works correctly (#291)

Dependencies¶

httpx >=0.27 (new core dependency --async HTTP for remote source downloads)
boto3 >=1.42.0 (new optional extra [s3])
azure-storage-blob >=12.19 (new optional extra [azure-blob])
cryptography: 46.0.5 -> 46.0.6
nltk: 3.9.3 -> 3.9.4
pypdf: 6.9.1 -> 6.9.2
requests: 2.32.5 -> 2.33.0

Documentation¶

Complete spec-kit artifacts for Feature #025 (spec, plan, data-model, research, quickstart, OpenAPI contract, task breakdowns)

Testing¶

71 new tool init tests (routes, manager, OTel, integration, re-initialization)
Full suite: 4,321 passed, 47 skipped, 0 failed

0.5.2 - 2026-03-23¶

Documentation¶

API Reference Rewrite --Rewrote all 6 existing docs/api/*.md files to fix broken mkdocstrings references (#286)
Removed nonexistent symbols (config.merge.ConfigMerger, validator.normalize_errors, compute_f1_score, etc.)
Corrected module paths across all reference pages
Added 9 new API reference pages for previously undocumented subsystems:
backends, chat, deploy, serve, services, observability, pdf-processor, tool-filter, tools
Updated mkdocs.yml navigation to include all 15 API reference pages
Build validation: mkdocs build --strict passes with 0 warnings/errors

0.5.1 - 2026-03-23¶

Added¶

Claude Backend Support for holodeck serve (Feature #024 US1) --Full Claude Agent SDK integration with the server command (#282)
Task-bound session actor wrapping Claude SDK sessions for anyio task-group binding
BackendSessionError exception for propagating backend failures to protocols
max_concurrent_sessions config field (1-100, default 10) for capacity management
backend_ready and backend_diagnostics health check fields in server models
Pre-flight validation (Node.js >=18, API credentials) during server startup
Graceful lifecycle with 5s shutdown timeouts and parallel session teardown
Real-Time Tool Streaming via Claude Agent SDK Hooks (#283)
PreToolUse, PostToolUse, PostToolUseFailure hook integration
ToolEvent dataclass for provider-agnostic tool event abstraction
Queue-based event passthrough: ClaudeSession -> _TaskBoundSession -> AgentExecutor -> AG-UI protocol
Concurrent tool event queue draining with fallback to post-hoc emission for non-hook backends
Node.js Conditional Installation for Claude Agent Containers (Feature #024 US2) (#284)
Auto-detection of Anthropic provider in Dockerfile generation
needs_nodejs Jinja2 conditional for Node.js 22.x via nodesource
OpenTelemetry GenAI Semantic Convention Instrumentation (#271)
otel-instrumentation-claude-agent-sdk activation for invoke_agent and execute_tool spans
get_observability_context() accessor function
Configurable schedule_delay_millis in TracingConfig (default 2000ms)
New claude-otel optional dependency group
Specification Artifacts
Feature #024 (Claude Serve/Deploy Parity): full spec, plan, data-model, quickstart, 139 tasks across 5 user stories
Feature #023 (Choose Your Backend): full spec, plan, data-model, quickstart, research, per-story task files

Changed¶

Configuration System Overhaul (#272)
Fixed provider resolution to match by dict key with .provider field fallback
Enabled multiple providers of the same type (e.g., openai_prod vs openai_dev)
Fixed config merging: deep-merge project + user configs instead of replacement
Fixed lossy YAML roundtrip: substitute env vars before first parse
Added load_agent_with_config() helper eliminating 15-line boilerplate in 3 CLI commands
Added ConfigLoader caching to eliminate double file I/O
Consolidated 3 deep-merge implementations into single _deep_merge utility
Deleted ~1,500 lines of dead code: merge.py, ConfigValidator, unused functions
Fixed TOCTOU vulnerability in parse_yaml; narrowed exception catching
Anthropic Cloud Auth Hardening (#269) --Stricter validation for Bedrock, Vertex, and Foundry credential flows

Refactored¶

Test Suite Simplification (#275) --2,120 fewer lines (-3.1%) with zero coverage loss
Consolidated 4 identical DeepEval RAG evaluator test files into single test_rag_evaluators.py with @pytest.mark.parametrize
Extracted shared fixture factories into conftest.py files
Parameterized repetitive test patterns across 30 files (option parsing, exit codes, TTY/non-TTY pairs)
Removed trivial framework-behavior tests (Pydantic field presence, object identity, enum existence)

Fixed¶

Docker SDK ImportError (#285) --Lazy-import docker SDK in deploy/builder.py and deploy/__init__.py to prevent ModuleNotFoundError when [deploy] extra is not installed
Missing execute_tool OTel spans --Fixed ContextVar timing mismatch in holodeck chat; workaround re-injects ContextVar from instance attribute in hooks
Claude subprocess OTel conflict --Set OTEL_TRACES_EXPORTER=none for Claude subprocess when Python-side GenAI instrumentation is active
Web search propagation --web_search flag from ClaudeConfig now correctly wired to allowed_tools in build_options

Security¶

Replaced deprecated safety check (mandatory auth) with pip-audit (free, OSV-based) (#270)
6 CVEs resolved in direct dependencies:
CVE-2026-27962, CVE-2026-28490: authlib >=1.6.9
CVE-2026-30922: pyasn1 >=0.6.3
CVE-2026-27448, CVE-2026-27459: pyopenssl >=26.0.0
CVE-2026-33123: pypdf >=6.9.1
3 additional transitive CVEs patched via constraint dependencies

Dependencies¶

claude-agent-sdk: 0.1.37 -> 0.1.44
cryptography: >=46.0.2 -> >=46.0.5
semantic-kernel: >=1.37.1 -> >=1.39.4
pypdf: >=6.6.0 -> >=6.9.1
werkzeug: >=3.1.5 -> >=3.1.6
authlib: >=1.6.6 -> >=1.6.9
pyasn1: >=0.6.2 -> >=0.6.3 (dev)
New constraint dependencies: jaraco-context >=6.1.0, nltk >=3.9.3, pyopenssl >=26.0.0, pillow >=12.1.1, protobuf >=5.29.6, python-multipart >=0.0.22
New optional group: claude-otel (OpenTelemetry instrumentation for Claude Agent SDK)
Replaced safety with pip-audit >=2.7.0

Documentation¶

CLAUDE.md: added LSP vs Grep guidance, Python 3.10+ note
Hardened Anthropic cloud auth documentation (#269)

Testing¶

44 new Claude backend instrumentation tests
259 serve module tests (Claude backend support)
203 backend/validator/config tests
20 Node.js conditional Dockerfile tests
Integration tests with OTLP exporter for Aspire dashboard
All 3,813 unit tests passing; full regression: 4,124 passed

0.5.0 - 2026-02-24¶

Added¶

Claude Agent SDK Backend --Native Anthropic provider support as a first-class backend (#021)
ClaudeBackend and ClaudeSession implementing provider-agnostic AgentBackend/AgentSession protocols
BackendSelector auto-routes agents by model.provider (anthropic -> Claude SDK, openai/azure/ollama -> Semantic Kernel)
ClaudeConfig Pydantic model with extended thinking, web search, bash, file system, subagents, and permission mode settings
AuthProvider enum supporting api_key, oauth_token, bedrock, vertex, foundry credential flows
Subprocess retry with exponential backoff (3 attempts) and max_turns exceeded detection
Structured output validation via JSON schema (file or inline)
Multi-Backend Abstraction Layer --Protocol-driven architecture decoupling all consumers from provider-specific types
AgentBackend, AgentSession, ContextGenerator protocols in lib/backends/base.py
ExecutionResult dataclass with response, tool calls/results, token usage, structured output, and error tracking
SKBackend/SKSession wrapping existing Semantic Kernel infrastructure behind the same protocols
Claude Tool Adapters --Bridge HoloDeck tools to Claude Agent SDK via in-process MCP server
VectorStoreToolAdapter and HierarchicalDocToolAdapter wrapping initialized tool instances as @tool-decorated handlers
build_holodeck_sdk_server() factory bundling adapters into McpSdkServerConfig
MCP Bridge (mcp_bridge.py) --Translates HoloDeck MCPTool configs to Claude SDK McpStdioServerConfig with env var resolution
OTel Bridge (otel_bridge.py) --Maps ObservabilityConfig to Claude subprocess environment variables (OTLP exporter, protocol, privacy controls)
Startup Validators --6 pre-flight checks called during ClaudeBackend.initialize(): Node.js presence, credential validation, embedding provider validation, tool filtering warnings, working directory collision detection, response format schema validation
ClaudeSDKContextGenerator --ContextGenerator protocol implementation using Claude Agent SDK query() for contextual embeddings with batch prompts, JSON parsing, single-chunk fallback, retry logic, and concurrency control
Shared Tool Initializer (tool_initializer.py) --Provider-agnostic tool initialization used by both SK and Claude backends with 5-tier context generator resolution chain
Shared Instruction Resolver (instruction_resolver.py) --Extracted instruction loading (file or inline) from AgentFactory for cross-backend reuse
embedding_provider field on Agent model --Required when using provider: anthropic with vectorstore/hierarchical_document tools (Anthropic has no native embedding API)
context_model field on HierarchicalDocumentToolConfig --Dedicated LLM for contextual embedding generation, separate from the main agent model
embedding_model field on VectorstoreTool and HierarchicalDocumentToolConfig --Explicit embedding model override with cross-tool conflict detection
PDF Processor Package (lib/pdf_processor/) --Extracted PDF operations into dedicated package (#265)
heading_extractor.py: Dual-strategy heading detection --bookmark-based (preferred) with font-size fallback and fuzzy matching
page_extractor.py: Page-range extraction using pypdf with bounds validation
Multi-Field Keyword Search --Per-field boosting for keyword-based retrieval (#265)
Indexed fields: content (1x), parent_chain (2x), section_id (2x), defined_term (3x), source_file (1x)
Enables structure-aware queries (e.g., "Section 203(a)", "Force Majeure")

Changed¶

Chat Layer Decoupled from Semantic Kernel --AgentExecutor and ChatSessionManager now use BackendSelector + AgentSession abstractions; zero AgentFactory/ChatHistory imports remain in the chat layer
Chat Streaming --New execute_turn_streaming async generator and process_message_streaming for token-by-token output; CLI REPL shows spinner until first chunk arrives
Test Runner Decoupled --TestExecutor uses BackendSelector + ExecutionResult instead of direct AgentFactory; allow_side_effects flag plumbed through to backend selection
AgentFactory Refactored --Now a backward-compatible facade; HierarchicalDocumentTool initialization delegated to shared tool_initializer
Chat History Model --ChatSessionManager.history changed from SK ChatHistory to plain list[dict] for provider neutrality
ContextGenerator Protocol --HierarchicalDocumentTool accepts any ContextGenerator implementation via set_context_generator(), decoupled from LLMContextGenerator and SK chat service

Fixed¶

Context Model YAML Overrides --Fixed silent ignoring of context_model YAML overrides in the SK path by unifying initialization through tool_initializer
Claude Multi-Turn Sessions --Use session_id parameter (SDK 0.1.37 compat) for proper conversation continuity
MCP Tool Communication --Wrap prompts as AsyncIterable to keep stdin open for bidirectional MCP tool communication (fixes ProcessTransport error)
Credential Validation --validate_credentials now returns actual credential values in env dict instead of empty dict
Tool Result Enrichment --_enrich_tool_results correlates tool names from tool_calls to tool_results via call_id for evaluation retrieval_context

Security¶

orjson 3.11.4 -> 3.11.7 --fixes CVE-2025-67221 (DoS via missing recursion depth limits)
wheel 0.45.1 -> 0.46.3 --fixes CVE-2026-24049 (path traversal via extracted file permissions)

Dependencies¶

claude-agent-sdk 0.1.37 --Native Claude Agent SDK integration (new)
authlib 1.6.5 -> 1.6.6
pypdf 6.3.0 -> 6.6.0+ (removed upper bound)
werkzeug 3.1.4 -> 3.1.5+
azure-core >=1.38.0 (new)
orjson 3.11.4 -> 3.11.7
wheel 0.45.1 -> 0.46.3

Documentation¶

Claude Agent SDK configuration guide (docs/guides/agent-configuration.md, docs/guides/llm-providers.md)
Example Claude agent YAML (docs/examples/claude_agent.yaml)
Updated API models reference, README, and AGENTS.md for multi-backend architecture
Agent JSON schema updated (schemas/agent.schema.json)

Known Limitations¶

holodeck serve does not support Claude agents --The server command (holodeck serve) only supports Semantic Kernel backends (OpenAI, Azure OpenAI, Ollama). Agents with provider: anthropic work via holodeck test and holodeck chat but cannot be deployed as HTTP servers yet. Claude agent server/deployment support is planned for a future release.

0.4.0 - 2026-02-07¶

Added¶

HierarchicalDocumentTool --Structure-aware document search with hierarchy preservation (#255)
Markdown heading chain tracking (H1-H6 parent chains)
Domain-aware subsection recognition (US legislative, AU legislative, academic, technical, legal contracts, financial, medical, patent, general)
LLM-based contextual embeddings (Anthropic approach, ~49% improved retrieval accuracy)
Incremental ingestion with mtime-based tracking and --force-ingest override
Hybrid search combining semantic + keyword with configurable weights
Full YAML configuration --no code required
Tiered Keyword Search with RRF Fusion --Automatic strategy selection based on provider capabilities
NATIVE_HYBRID for providers with built-in hybrid search (azure-ai-search, weaviate, qdrant, mongodb, azure-cosmos-nosql)
FALLBACK_BM25 using rank_bm25 + Reciprocal Rank Fusion (k=60) for other providers (postgres, pinecone, chromadb, faiss, in-memory, sql-server)
KeywordSearchProvider Protocol --Pluggable keyword search backend interface with two implementations:
InMemoryBM25KeywordProvider --rank_bm25 in-process for development and local workloads
OpenSearchKeywordProvider --external OpenSearch cluster for production, with configurable auth (basic/API key), TLS, and timeouts
KeywordIndexConfig Model --YAML-configurable keyword search backend selection (in-memory or opensearch) with Pydantic validation
Keyword Search Provider Router --Automatic backend routing with OpenTelemetry span instrumentation for search observability
Shared Tool Utilities (#257) --Extracted reusable infrastructure into lib/tools/:
common.py: file discovery, source path resolution, placeholder embedding generation
base_tool.py: EmbeddingServiceMixin and DatabaseConfigMixin for tool code reuse
Shared Terminal UI Utilities (#256) --Consolidated duplicate code into lib/ui/:
terminal.py: TTY detection
spinner.py: SpinnerMixin for progress animation
colors.py: ANSIColors and colorize() function
Chat history extraction utilities shared between chat and test_runner
HierarchicalDocumentTool Specification (#242) --Full spec-kit artifacts:
spec.md with 8 user stories (P1-P3 priorities)
Implementation plan, data model documentation, quickstart guide
110+ implementation tasks organized by priority

Changed¶

BM25 Score Normalization --Replaced hardcoded /10.0 divisor with max-score normalization; the top result always scores 1.0, others are proportional to the maximum
Async OpenSearch I/O --HybridSearchExecutor.build_keyword_index() and keyword_search() are now async; OpenSearch calls offloaded via asyncio.to_thread(), in-memory BM25 remains direct with zero overhead
KeywordIndexConfig Self-Validation --OpenSearch field validation (endpoint, index_name) moved from parent HierarchicalDocumentToolConfig into KeywordIndexConfig itself via @model_validator, enabling validation regardless of construction context
Chunk Ownership Architecture --HybridSearchExecutor now owns chunk data via internal _chunk_map, eliminating chunk duplication and improving lookup performance
Search Mode Routing --Tool supports KEYWORD, SEMANTIC, and HYBRID search modes with graceful degradation to semantic-only on keyword failure
CLI Error Handling --Extracted error handling into reusable context manager

Removed¶

ExactMatchIndex --Removed unused class, SearchMode.EXACT enum value, _exact_search() method, and exact match routing logic (~485 lines) in favor of unified keyword search

Fixed¶

Hybrid Weight Validation --Enforce semantic_weight + keyword_weight > 0 for hybrid search mode, rejecting invalid weight combinations

Security¶

aiohttp 3.13.2 -> 3.13.3 --fixes 8 CVEs:
CVE-2025-47364 (CRLF injection in redirects)
CVE-2025-49109 (DoS via keepalive infinite loop)
CVE-2025-49110 (DoS via Transfer-Encoding header)
CVE-2025-49111 (DoS via invalid chunk extensions)
CVE-2025-49112 (Proxy header injection)
CVE-2025-49113 (DoS via Content-Length/Transfer-Encoding conflict)
CVE-2025-69229 (DoS via excessive chunked messages)
CVE-2025-69230 (DoS via Cookie header logging)
werkzeug 3.1.4 -> 3.1.5
python-multipart 0.0.20 -> 0.0.22
authlib 1.6.5 -> 1.6.6
pypdf 6.4.0 -> 6.6.2
protobuf 5.29.5 -> 5.29.6
semantic-kernel 1.39.0 -> 1.39.3
wheel 0.45.1 -> 0.46.2

Documentation¶

Hierarchical Document Tools section in tools reference guide
HierarchicalDocumentTool spec, plan, data model, and quickstart artifacts (#242)
Standardized parallel test execution (-n auto) across CLAUDE.md and AGENTS.md

Testing¶

HierarchicalDocumentTool coverage increased from 79% to 97% (26 new test cases)
Comprehensive keyword search test suite: KeywordSearchProvider protocol, InMemoryBM25, OpenSearchKeywordProvider, HybridSearchExecutor, provider routing, OTel spans, graceful degradation
KeywordIndexConfig and HierarchicalDocumentToolConfig model validation tests
Consolidated and removed trivial unit tests for cleaner test suite

0.3.5 - 2026-01-28¶

Added¶

Azure Container Apps Deployment (#234)
holodeck deploy run/status/destroy commands with Azure deployer
Typed BaseDeployer interface with deployment state tracking via Pydantic models
Strongly typed result models and configurable health checks
CLI error handling extracted into reusable context manager
Cross-Architecture Container Builds (#241)
Configurable platform field on deployment config (default: linux/amd64)
Support for building containers on ARM machines (e.g., Apple Silicon) targeting amd64 deployment
Always-fetch base image variant via pull=True

Fixed¶

Dockerfile user permissions for proper file operations in containers
Default base image updated to published ghcr.io/justinbarias/holodeck-base:latest
Removed unused helper functions from deployment module

Testing¶

Deploy build command unit tests
Azure deployer behavior and platform configuration validation tests

Documentation¶

Deployment guide updates for Azure Container Apps

0.3.4 - 2026-01-24¶

Added¶

Deploy Build Command (holodeck deploy build): Build container images from agent configuration
Pydantic deployment configuration models with validation
Dockerfile generation with Jinja2 templates
Container image building via Docker SDK (docker-py)
Tag strategies: git_sha, git_tag, latest, custom
OCI-compliant image labels
--dry-run mode to preview builds without executing
--no-cache flag for fresh builds
HoloDeck Base Image: Pre-built Docker base image for agent containers
Multi-architecture support (linux/amd64, linux/arm64)
GitHub Actions workflow for automated builds
Published to ghcr.io/justinbarias/holodeck-base:latest
Non-root user for security
Health check configuration
OpenCode Speckit Support: Spec-kit slash commands for OpenCode editor
/speckit.specify, /speckit.clarify, /speckit.plan, /speckit.tasks
/speckit.analyze, /speckit.checklist, /speckit.implement
/speckit.constitution, /speckit.taskstoissues

Documentation¶

Comprehensive deployment guide at docs/guides/deployment.md
DIY deployment instructions using the base image
Cloud provider configuration reference (AWS App Runner, GCP Cloud Run, Azure Container Apps)

0.3.3 - 2026-01-17¶

Added¶

Holodeck Init - Support for Vector Store Provider choice: PostgreSQL (pgvector) and Pinecone support

Changed¶

Tool Filtering: Anthropic tool search to reduce token usage
Claude Workflow: use Opus model in Claude workflow

Documentation¶

Tool filtering configuration guide

0.3.2 - 2026-01-10¶

Added¶

DeepEval Evaluation Tracing: Observability support for DeepEval metrics

Fixed¶

Security vulnerabilities identified in dependencies

0.3.1 - 2026-01-09¶

Changed¶

Test Runner Expected Tools: loosened expected_tools validation to allow substring matching

0.3.0 - 2026-01-08¶

Added¶

OpenTelemetry Observability: Full observability instrumentation with GenAI semantic conventions
OpenTelemetry configuration models (traces, metrics, logs)
OTLP export support for traces and metrics
Agent Local Server (holodeck serve): REST API server for agents
FastAPI-based REST endpoints for agent invocation
AG-UI compliant endpoint for agent interaction

0.1.7 - 2025-12-27¶

Added¶

MCP CLI Commands: Complete CLI for managing MCP servers
holodeck mcp search: Search MCP registry for servers
holodeck mcp add: Add MCP servers to configuration
holodeck mcp list: List configured servers (agent and global)
holodeck mcp remove: Remove MCP servers from configuration
Global MCP server merge into agent configurations
Structured Data Ingestion: Loader and vectorstore integration for structured data sources
Vectorstore Reranking: Reranking support for vectorstore search results
Interactive Config Wizard Enhancements:
Template selection step
LLM provider selection step
DeepEval Metrics: DeepEval integration as alternative/complement to Azure AI Evals
CLI Defaults: agent.yaml as default config for chat and test commands
New Package Entrypoint: Added holodeck-ai script entrypoint

Changed¶

Vector Store Providers: Removed Redis support, added PostgreSQL (pgvector), Pinecone, and Qdrant
Documentation: Updated for uv tool install, Ollama as preferred provider
Test Progress/Reporting: Improved display and refactored agent_factory
Schema Validation: Relaxed validation for better flexibility

Fixed¶

Telemetry warning in CLI
CNAME configuration bug

0.1.6 - 2025-11-28¶

Added¶

MCP Tool Integration: Full Model Context Protocol (MCP) tool support with stdio transport
MCP server configuration and connection management
Tool discovery and invocation via MCP protocol

Fixed¶

Instruction loading issues in agent configuration

0.1.5 - 2025-11-27¶

Added¶

Project and User Config Support: Execution config resolution now supports project-level and user-level configuration files

Fixed¶

ChromaDB connection issues

0.1.4 - 2025-11-27¶

Fixed¶

PyPI release by removing local version identifiers

0.1.3 - 2025-11-27¶

Added¶

ChromaDB Support: Explicit ChromaDB vector store integration

Changed¶

Package Manager: Switched from Poetry to uv for faster dependency management

Fixed¶

Test logging improvements
RedisVL compatibility issues
CLI quiet mode behavior

0.1.2 - 2025-11-26¶

Added¶

Ollama Endpoint Support: Local LLM execution via Ollama
Vector Stores Setup Guide: Comprehensive Redis vector store documentation
Claude Code integration for development assistance

0.1.1 - 2025-11-25¶

Added¶

Semantic Kernel Vector Store Abstractions: Support for all vector store providers (Redis, ChromaDB, etc.)
Agent config execution settings applied to Semantic Kernel

0.1.0 - 2025-11-23¶

Added¶

Chat Models and Validation Pipeline: Scaffold for interactive chat functionality
Markdown Report Generation: Comprehensive test result reporting (T123-T127)
Progress Display Enhancements: Spinner, ANSI colors, elapsed time display
Per-Test Metric Resolution: EvaluationMetric objects for fine-grained metric configuration (T095-T096)
File Processing Improvements: Enhanced file input handling

Changed¶

Consolidated and refactored tests to parameterized tests for better maintainability
Config init command improvements

0.0.14 - 2025-11-15¶

Fixed¶

Poetry development dependencies
MkDocs build step
Poetry version configuration
Various Poetry configuration issues

0.0.7 - 2025-11-08¶

Added¶

Agent Execution Implementation: Core agent execution engine
Evaluators: User Story 1 evaluator implementation
Response Format Definition: Phase 4 implementation (T014-T019)
Global Settings Configuration: Phase 2 & 3 with TDD approach

0.0.6 - 2025-10-25¶

Added¶

holodeck init Command: Complete project initialization with templates
Phase 8: Polish & QA for init command
Phase 7: Project metadata specification (US5)
Phase 5: Sample files and examples generation (US3)
User Story 2: Project template selection (Phase 4)
Core init engine implementation
Basic agent creation from templates
ConfigLoader returns GlobalConfig rather than dict

0.0.5 - 2025-10-20¶

Fixed¶

Version tag configuration

0.0.4 - 2025-10-20¶

Added¶

GitHub release workflow
Automated PyPI publishing

0.0.1 - 2025-10-19¶

Added - User Story 1: Define Agent Configuration¶

Core Features¶

Agent Configuration Schema: Complete YAML-based agent configuration with Pydantic validation
Agent metadata (name, description)
LLM provider configuration (OpenAI, Azure OpenAI, Anthropic)
Model parameters (temperature, max_tokens)
Instructions (inline or file-based)
Tools array with type discrimination
Test cases with expected behavior validation
Evaluation metrics with flexible model configuration
Configuration Loading & Validation (ConfigLoader):
Load and parse agent.yaml files
Validate against Pydantic schema with user-friendly error messages
File path resolution (relative to agent.yaml directory)
Environment variable substitution (${VAR_NAME} pattern)
Precedence hierarchy: agent.yaml > environment variables > global config
Global Configuration Support:
Load ~/.holodeck/config.yaml for system-wide settings
Provider configurations at global level
Tool configurations at global level
Configuration merging with proper precedence

Data Models¶

LLMProvider Model:
Multi-provider support (openai, azure_openai, anthropic)
Model selection and parameter configuration
Temperature range validation (0-2)
Max tokens validation (>0)
Azure-specific endpoint configuration
Tool Models (Discriminated Union):
VectorstoreTool: Vector search with source, embedding model, chunk size/overlap
FunctionTool: Python function tools with parameters schema
MCPTool: Model Context Protocol server integration
PromptTool: AI-powered semantic functions with template support
Tool type validation and discrimination
Evaluation Models:
Metric configuration with name, threshold, enabled flag
Per-metric model override for flexible configuration
AI-powered and NLP metrics support
TestCase Model:
Test inputs with expected behaviors
Ground truth for validation
Expected tool usage tracking
Evaluation metrics per test
Agent Model:
Complete agent definition
All field validations and constraints
Tool and evaluation composition
GlobalConfig Model:
Provider registry
Vectorstore configurations
Deployment settings

Error Handling¶

Custom Exception Hierarchy:
HoloDeckError: Base exception
ConfigError: Configuration-specific errors
ValidationError: Schema validation errors with field details
FileNotFoundError: File resolution errors with path suggestions
Human-Readable Error Messages:
Field names and types in validation errors
Actual vs. expected values
File paths with suggestions
Nested error flattening for complex schemas

Infrastructure & Tooling¶

Development Setup:
Makefile with 30+ development commands
Poetry dependency management
Pre-commit hooks (black, ruff, mypy, detect-secrets)
Python 3.10+ support
Testing:
Unit test suite with 11 test files covering all models
Integration test suite for end-to-end workflows
80%+ code coverage requirement
Test execution: make test, make test-coverage, make test-parallel
Code Quality:
Black code formatting (88 char line length)
Ruff linting (pycodestyle, pyflakes, isort, flake8-bugbear, pyupgrade, pep8-naming, flake8-simplify, bandit)
MyPy type checking with strict settings
Security scanning (safety, bandit, detect-secrets)
Automated pre-commit validation
Documentation:
MkDocs site configuration with Material theme
Getting Started guide (installation, quickstart)
Configuration guides (agent config, tools, evaluations, global config, file references)
Example agent configurations (basic, with tools, with evaluations, with global config)
API reference documentation (ConfigLoader, Pydantic models)
Architecture documentation (configuration loading flow)

Features Summary by Component¶

ConfigLoader API¶

loader = ConfigLoader()
agent = loader.load_agent_yaml("agent.yaml")  # Returns Agent instance

Parse YAML to Agent instances
Automatic environment variable substitution
File reference resolution with validation
Configuration precedence handling
Comprehensive error reporting

Schema Support¶

File References: Instructions and tool definitions can be loaded from files
Environment Variables: ${ENV_VAR} patterns supported throughout configs
Type Discrimination: Tool types automatically validated and parsed
Nested Validation: Complex nested structures validated properly

Testing Coverage¶

Unit Tests (11 files):

test_errors.py - Exception handling and messaging
test_env_loader.py - Environment variable substitution
test_defaults.py - Default configuration handling
test_validator.py - Validation utilities
test_tool_models.py - Tool type validation and discrimination
test_llm_models.py - LLM provider configuration
test_evaluation_models.py - Evaluation metric configuration
test_testcase_models.py - Test case validation
test_agent_models.py - Agent schema validation
test_globalconfig_models.py - Global configuration handling
test_config_loader.py - ConfigLoader functionality

Integration Tests (1 file):

test_config_end_to_end.py - Full workflow testing

Known Limitations¶

Version 0.0.1 Scope¶

CLI Not Implemented: No command-line interface (planned for User Story 2)
No Agent Execution: Agent models are validated but not executed (Phase 2 feature)
No Tool Execution: Tools are defined but not executed (Phase 2 feature)
No Evaluation Engine: Metrics are configured but not executed (Phase 2 feature)
No Deployment: No FastAPI endpoint generation or Docker deployment (Phase 2-3 features)
No Observability: OpenTelemetry integration planned for Phase 2
No Plugin System: Plugin packages not yet available (Phase 3 feature)

Validation Limitations¶

File Validation: Only checks file existence, not content validity
LLM Provider APIs: No actual API testing (would require credentials)
Tool Validation: Type validation only, no runtime validation

Known Issues¶

None reported in 0.0.1.

How to Use This Changelog¶

Unreleased: Features coming in future releases
Semantic Versioning: MAJOR.MINOR.PATCH
MAJOR: Breaking changes or new architecture
MINOR: New features and functionality
PATCH: Bug fixes and improvements
Categories: Added (new features), Changed (modifications), Fixed (bug fixes), Deprecated (to be removed), Removed (deprecated features deleted), Security (security fixes)

Roadmap¶

v0.1 - Core agent engine + CLI
v0.2 - Evaluation framework
v0.3 - API deployment (serve + deploy build)
v0.4 - Hierarchical document search & tiered keyword search
v0.5 - Multi-backend architecture (Claude Agent SDK, OTel, Claude serve support)
v0.6 - Enterprise features (SSO, audit logs, RBAC)
v1.0 - Production-ready release

Previous Versions¶

Development Versions¶

Pre-0.0.1: Architecture planning and vision definition
Project vision (VISION.md)
Architecture documentation
Specification and planning

Contributing¶

See CONTRIBUTING.md for guidelines on:

Development setup
Running tests
Code style requirements
Submitting pull requests

License¶

HoloDeck is released under the MIT License. See LICENSE file for details.

Changelog Format¶

We follow Keep a Changelog format:

Added: New features
Changed: Changes to existing functionality
Deprecated: Features to be removed in future versions
Removed: Features that have been removed
Fixed: Bug fixes
Security: Security-related changes

Changelog¶

Unreleased¶

Planned Features¶

0.6.2 - 2026-04-19¶

Added¶

Changed¶

Fixed¶

Security¶

Dependencies¶

Documentation¶

Testing¶

0.6.1 - 2026-04-12¶

Added¶

Changed¶

Fixed¶

0.6.0 - 2026-03-28¶

Added¶

Changed¶

Fixed¶

Dependencies¶

Documentation¶

Testing¶

0.5.2 - 2026-03-23¶

Documentation¶

0.5.1 - 2026-03-23¶

Added¶

Changed¶

Refactored¶

Fixed¶

Security¶

Dependencies¶

Documentation¶

Testing¶

0.5.0 - 2026-02-24¶

Added¶

Changed¶

Fixed¶

Security¶

Dependencies¶

Documentation¶

Known Limitations¶

0.4.0 - 2026-02-07¶

Added¶

Changed¶

Removed¶

Fixed¶

Security¶

Documentation¶

Testing¶

0.3.5 - 2026-01-28¶

Added¶

Fixed¶

Testing¶

Documentation¶

0.3.4 - 2026-01-24¶

Added¶

Documentation¶

0.3.3 - 2026-01-17¶

Added¶

Changed¶

Documentation¶

0.3.2 - 2026-01-10¶

Added¶

Fixed¶

0.3.1 - 2026-01-09¶

Changed¶

0.3.0 - 2026-01-08¶

Added¶

0.1.7 - 2025-12-27¶

Added¶

Changed¶

Fixed¶

0.1.6 - 2025-11-28¶

Added¶

Fixed¶

0.1.5 - 2025-11-27¶

Added¶

Fixed¶

0.1.4 - 2025-11-27¶

Fixed¶

Unreleased ¶