Tools
The holodeck.tools package provides tool implementations that extend agent capabilities with semantic search, hierarchical document retrieval, and MCP server integration.
Module Overview
| Module | Description |
|---|---|
| holodeck.tools | Package exports for tools, mixins, and utilities |
| holodeck.tools.base_tool | Mixin classes for embedding and database configuration |
| holodeck.tools.common | Shared constants and pure utility functions |
| holodeck.tools.vectorstore_tool | Semantic search over unstructured and structured data |
| holodeck.tools.hierarchical_document_tool | Structure-aware document retrieval with hybrid search |
holodeck.tools.base_tool
Mixin classes that encapsulate functionality shared by VectorStoreTool and HierarchicalDocumentTool. The shared behavior lives in mixins rather than in a common base class because the two tools have fundamentally different record types and core behaviors.
EmbeddingServiceMixin
Mixin for embedding service injection.
Provides the set_embedding_service method, which AgentFactory uses to inject a Semantic Kernel TextEmbedding service for generating real embeddings.
Required instance attributes (set by the subclass __init__):
- _embedding_service: Any - stores the injected service
- config: Any - tool configuration with a .name attribute
set_embedding_service(service)
Set the embedding service for generating embeddings.
This method allows AgentFactory to inject a Semantic Kernel TextEmbedding service so the tool generates real embeddings instead of placeholder zeros.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| service | Any | Semantic Kernel TextEmbedding service instance (OpenAITextEmbedding or AzureTextEmbedding). | required |
Source code in src/holodeck/tools/base_tool.py
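The mixin contract above can be illustrated with a small, self-contained sketch. It is not HoloDeck's source: ToolConfig, FakeEmbeddingService, and DemoTool are hypothetical stand-ins, and the sketch assumes only what the docstring states (the host class defines _embedding_service and a config with a .name attribute).

```python
from dataclasses import dataclass
from typing import Any


class EmbeddingServiceMixin:
    """Illustrative re-implementation, not the library source.

    The host class must define `_embedding_service` and `config` (with `.name`).
    """

    _embedding_service: Any
    config: Any

    def set_embedding_service(self, service: Any) -> None:
        # Store the injected service so later ingestion/search can use it
        # instead of falling back to placeholder zero vectors.
        self._embedding_service = service


@dataclass
class ToolConfig:  # hypothetical minimal tool config
    name: str


class FakeEmbeddingService:  # hypothetical stand-in for a Semantic Kernel TextEmbedding service
    pass


class DemoTool(EmbeddingServiceMixin):  # hypothetical host class
    def __init__(self, config: ToolConfig) -> None:
        self.config = config
        self._embedding_service = None  # required attribute, initialized by the host

    def uses_real_embeddings(self) -> bool:
        return self._embedding_service is not None


tool = DemoTool(ToolConfig(name="knowledge_base"))
tool.set_embedding_service(FakeEmbeddingService())
print(tool.uses_real_embeddings())  # True
```

Because the mixin only touches attributes the host class promises to define, two tools with unrelated record types can share it without a common base class.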
DatabaseConfigMixin
Mixin for database configuration resolution and collection creation.
Provides methods for resolving database configuration from various formats (None, a string reference, or a DatabaseConfig object) and for creating vector store collections with automatic fallback to in-memory storage.
Required instance attributes (set by the subclass __init__):
- config: Any - tool configuration with .name and .database attributes
- _provider: str - stores the resolved provider name
- _collection: Any - stores the created collection instance
- _embedding_dimensions: int | None - embedding dimensions
_resolve_database_config(database)
Resolve database configuration to a provider name and connection kwargs.
Handles three types of database configuration:
1. None - use in-memory storage
2. String reference - an unresolved reference; warn and use in-memory storage
3. DatabaseConfig object - extract the provider and connection parameters
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| database | DatabaseConfig \| str \| None | Database configuration (from tool config) | required |
Returns:
| Type | Description |
|---|---|
| tuple[str, dict[str, Any]] | Tuple of (provider_name, connection_kwargs) |
Example:

    >>> provider, kwargs = self._resolve_database_config(None)
    >>> provider
    'in-memory'
    >>> kwargs
    {}

Source code in src/holodeck/tools/base_tool.py
_create_collection_with_fallback(provider, dimensions, connection_kwargs, record_class=None, definition=None)
Create a vector store collection with fallback to in-memory storage.
Attempts to create a collection with the specified provider. If creation fails (e.g., the database is unreachable), falls back to in-memory storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | Vector store provider name | required |
| dimensions | int | Embedding dimensions for the collection | required |
| connection_kwargs | dict[str, Any] | Provider-specific connection parameters | required |
| record_class | type[Any] \| None | Optional custom record class for the collection | None |
| definition | Any \| None | Optional VectorStoreCollectionDefinition for structured data | None |
Returns:
| Type | Description |
|---|---|
| Any | Created collection instance |
Raises:
| Type | Description |
|---|---|
| Exception | If the in-memory provider also fails (not expected in practice) |
Source code in src/holodeck/tools/base_tool.py
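The fallback behavior can be sketched with a plain try/except. The factories registry, the provider names, and returning the provider alongside the collection are assumptions made for a self-contained example; the real method presumably records the provider on the instance (_provider) and builds Semantic Kernel collections rather than dictionaries.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Hypothetical registry mapping provider names to collection factories.
CollectionFactory = Callable[..., Any]


def create_collection_with_fallback(
    provider: str,
    dimensions: int,
    connection_kwargs: dict[str, Any],
    factories: dict[str, CollectionFactory],
) -> tuple[str, Any]:
    try:
        collection = factories[provider](dimensions=dimensions, **connection_kwargs)
        return provider, collection
    except Exception as exc:  # e.g. database unreachable or unknown provider
        if provider == "in-memory":
            raise  # nothing left to fall back to
        logger.warning("Provider %r failed (%s); using in-memory", provider, exc)
        return "in-memory", factories["in-memory"](dimensions=dimensions)


def unreachable_postgres(**kwargs: Any) -> Any:  # simulates a dead database
    raise ConnectionError("connection refused")


def in_memory(dimensions: int) -> dict[str, int]:  # toy in-memory "collection"
    return {"dimensions": dimensions}


factories = {"postgres": unreachable_postgres, "in-memory": in_memory}
provider, collection = create_collection_with_fallback(
    "postgres", 8, {"connection_string": "postgresql://localhost/db"}, factories
)
print(provider, collection)  # in-memory {'dimensions': 8}
```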
holodeck.tools.common
Shared constants and pure utility functions used by VectorStoreTool, HierarchicalDocumentTool, and other tool implementations. Following the DRY principle, file handling, path resolution, and embedding generation are implemented once here rather than in each tool.
Constants
SUPPORTED_EXTENSIONS
SUPPORTED_EXTENSIONS = frozenset({'.txt', '.md', '.pdf', '.csv', '.json'})
FILE_TYPE_MAPPING
FILE_TYPE_MAPPING = {'.txt': 'text', '.md': 'text', '.pdf': 'pdf', '.csv': 'csv', '.json': 'text'}
Functions
get_file_type
get_file_type(path)
Get the FileInput type from a file extension.
Maps file extensions to the appropriate type value for FileProcessor. Defaults to "text" for unknown extensions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | File path (string or Path object) | required |
Returns:
| Type | Description |
|---|---|
| str | FileInput type string ("text", "pdf", "csv", etc.) |
Example:

    >>> get_file_type("document.pdf")
    'pdf'
    >>> get_file_type(Path("data/file.csv"))
    'csv'
    >>> get_file_type("unknown.xyz")
    'text'

Source code in src/holodeck/tools/common.py
resolve_source_path
resolve_source_path(source, base_dir=None)
Resolve a source path relative to a base directory.
Paths are resolved in this priority order:
1. If source is absolute, return it as-is.
2. If base_dir is provided, resolve relative to base_dir.
3. Otherwise, try the agent_base_dir context variable.
4. Fall back to the current working directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | Source path to resolve (from tool config) | required |
| base_dir | str \| None | Optional base directory for relative path resolution | None |
Returns:
| Type | Description |
|---|---|
| Path | Resolved absolute Path to the source |
Example:

    >>> resolve_source_path("/absolute/path/file.txt")
    PosixPath('/absolute/path/file.txt')
    >>> resolve_source_path("relative/file.txt", "/base")
    PosixPath('/base/relative/file.txt')

Source code in src/holodeck/tools/common.py
discover_files
discover_files(source_path, extensions=None)
Discover files to ingest from a source path.
Recursively traverses directories and filters files by supported extension. For a single file, validates that its extension is supported.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source_path | Path | Resolved path (file or directory) to discover from | required |
| extensions | frozenset[str] \| None | Set of supported extensions (default: SUPPORTED_EXTENSIONS) | None |
Returns:
| Type | Description |
|---|---|
| list[Path] | List of Path objects for files to process, sorted for deterministic order |
Note: This function does not check whether source_path exists; callers should verify that before calling it.
Example:

    >>> discover_files(Path("/docs"))
    [PosixPath('/docs/file1.md'), PosixPath('/docs/subdir/file2.txt')]

Source code in src/holodeck/tools/common.py
generate_placeholder_embeddings
generate_placeholder_embeddings(count, dimensions=1536)
Generate placeholder embedding vectors for testing.
Creates zero-valued embedding vectors when no embedding service is available. Useful for testing and development without LLM API access.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| count | int | Number of embeddings to generate | required |
| dimensions | int | Embedding vector dimensions | 1536 |
Returns:
| Type | Description |
|---|---|
| list[list[float]] | List of zero-valued embedding vectors |
Example:

    >>> embeddings = generate_placeholder_embeddings(3, 768)
    >>> len(embeddings)
    3
    >>> len(embeddings[0])
    768

Source code in src/holodeck/tools/common.py
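A one-line sketch matching the documented behavior (not the library source). The comprehension matters: [[0.0] * dimensions] * count would repeat the same inner list object count times, so mutating one vector would change all of them.

```python
def generate_placeholder_embeddings(count: int, dimensions: int = 1536) -> list[list[float]]:
    # Build a fresh inner list per vector to avoid aliasing.
    return [[0.0] * dimensions for _ in range(count)]


embeddings = generate_placeholder_embeddings(3, 768)
print(len(embeddings), len(embeddings[0]))  # 3 768
```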
holodeck.tools.vectorstore_tool
Provides semantic search over files and directories containing text data or structured data (CSV, JSON, and JSONL files with field mapping). Supports automatic file discovery, text chunking, embedding generation, vector storage, and modification-time tracking for incremental ingestion.
VectorStoreTool
VectorStoreTool(config, base_dir=None, execution_config=None)
Bases: EmbeddingServiceMixin, DatabaseConfigMixin
Vector store tool for semantic search over unstructured data (and, in structured data mode, over CSV/JSON/JSONL records).
This tool enables agents to perform semantic search over documents by:
1. Discovering files from the configured source (file or directory)
2. Converting files to markdown using FileProcessor
3. Chunking text for optimal embedding generation
4. Generating embeddings via Semantic Kernel services
5. Storing document chunks in a vector database
6. Performing similarity search on queries
Inherits from:
- EmbeddingServiceMixin: provides the set_embedding_service() method
- DatabaseConfigMixin: provides database config resolution and collection creation
Attributes:
| Name | Type | Description |
|---|---|---|
| config | VectorstoreTool | Tool configuration from agent.yaml |
| is_initialized | bool | Whether the tool has been initialized |
| document_count | int | Number of document chunks stored |
| last_ingest_time | datetime \| None | Timestamp of the last ingestion |
Example:

    >>> config = VectorstoreTool(
    ...     name="knowledge_base",
    ...     description="Search product docs",
    ...     source="data/docs/"
    ... )
    >>> tool = VectorStoreTool(config)
    >>> await tool.initialize()
    >>> results = await tool.search("How do I authenticate?")

Here VectorstoreTool is the configuration model and VectorStoreTool is the tool class; the await calls must run inside an async function or an async-aware REPL.
Initialize VectorStoreTool with configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | VectorstoreTool | VectorstoreTool configuration from agent.yaml (fields listed below) | required |
| base_dir | str \| None | Base directory for resolving relative source paths. If None, source paths are resolved relative to the current working directory. | None |
| execution_config | ExecutionConfig \| None | Execution configuration for file processing timeouts and caching. If None, default FileProcessor settings are used. | None |
The config object contains:
- name: Tool identifier
- description: Tool description
- source: File or directory path to ingest
- embedding_model: Optional custom embedding model
- database: Optional database configuration
- top_k: Number of results to return (default: 5)
- min_similarity_score: Minimum score threshold (optional)
- chunk_size: Text chunk size in tokens (optional)
- chunk_overlap: Chunk overlap in tokens (optional)
Source code in src/holodeck/tools/vectorstore_tool.py
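The chunk_size and chunk_overlap settings describe a sliding window over the text. The sketch below approximates tokens with whitespace-separated words, which is an assumption: the tokenizer HoloDeck actually uses is not documented on this page. Each window advances by chunk_size - chunk_overlap words, so consecutive chunks share chunk_overlap words.

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Sliding-window chunking; 'tokens' are approximated by whitespace-split words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", chunk_size=4, chunk_overlap=1))
# ['a b c d', 'd e f g']
```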
async initialize(force_ingest=False, provider_type=None, progress_callback=None)
Initialize the tool and ingest source files.
Discovers files from the configured source, processes them into chunks, generates embeddings, and stores them in the vector database. The source path is resolved relative to base_dir if set.
In structured data mode (when vector_field is configured), loads structured data from CSV/JSON/JSONL files using the configured field mapping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| force_ingest | bool | If True, re-ingest all files regardless of modification time. | False |
| provider_type | str \| None | LLM provider for dimension auto-detection (defaults to "openai" if not specified) | None |
| progress_callback | Callable[[int, int \| None], None] \| None | Optional callback invoked after each file is processed (or skipped). Called as … | None |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the source path doesn't exist. |
| RuntimeError | If no supported files are found in the source. |
| ConfigError | If configured fields don't exist in the source (structured mode). |
Source code in src/holodeck/tools/vectorstore_tool.py
async search(query)
Execute a semantic search and return formatted results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Natural language search query. | required |
Returns:
| Type | Description |
|---|---|
| str | Formatted string with search results, including scores and sources. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the tool is not initialized. |
| ValueError | If the query is empty. |
Source code in src/holodeck/tools/vectorstore_tool.py
set_embedding_service(service)
Inherited from EmbeddingServiceMixin; see set_embedding_service(service) under holodeck.tools.base_tool for its description and parameters.
holodeck.tools.hierarchical_document_tool
Provides document search that understands document structure, extracts definitions, and generates optimized context for LLM consumption. Supports semantic, keyword (BM25), and hybrid search modes with configurable weights.
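The page does not say how hybrid mode combines semantic and BM25 scores. One common approach, assumed here, is to min-max normalize each score list and take a weighted sum. The sketch uses Okapi BM25 with whitespace tokenization and the non-negative IDF variant ln(1 + (N - df + 0.5) / (df + 0.5)); the semantic scores and the semantic_weight/keyword_weight names are made up for illustration.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 with whitespace tokenization (assumes a non-empty corpus)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            freq = term_freq[term]
            if freq == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * freq * (k1 + 1) / (freq + k1 * length_norm)
        scores.append(score)
    return scores


def min_max(xs: list[float]) -> list[float]:
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def hybrid_scores(semantic: list[float], keyword: list[float],
                  semantic_weight: float = 0.5, keyword_weight: float = 0.5) -> list[float]:
    # Normalize both lists to [0, 1] so the weights are comparable, then blend.
    s, k = min_max(semantic), min_max(keyword)
    return [semantic_weight * a + keyword_weight * c for a, c in zip(s, k)]


docs = [
    "reporting requirements for security incidents",
    "annual reporting schedule",
    "office parking policy",
]
semantic = [0.70, 0.75, 0.10]  # made-up cosine similarities for illustration
keyword = bm25_scores("reporting requirements", docs)
hybrid = hybrid_scores(semantic, keyword)
ranking = sorted(range(len(docs)), key=lambda i: hybrid[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

In this toy corpus the semantic scores alone rank "annual reporting schedule" first, while the exact match on "requirements" lifts the first document to the top of the hybrid ranking.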
HierarchicalDocumentTool
HierarchicalDocumentTool(config, base_dir=None, execution_config=None)
Bases: EmbeddingServiceMixin, DatabaseConfigMixin
Semantic Kernel tool for hierarchical document retrieval.
This tool provides intelligent document search that understands document structure, extracts definitions, and generates optimized context for LLM consumption.
Inherits from:
- EmbeddingServiceMixin: provides the set_embedding_service() method
- DatabaseConfigMixin: provides database config resolution and collection creation
Attributes:
| Name | Type | Description |
|---|---|---|
| config | HierarchicalDocumentToolConfig | Tool configuration. |
| chunks | | Indexed document chunks. |
Example:

    >>> from holodeck.models.tool import HierarchicalDocumentToolConfig
    >>> config = HierarchicalDocumentToolConfig(
    ...     name="doc_search",
    ...     description="Search policy documents",
    ...     source="./docs/policy.md"
    ... )
    >>> tool = HierarchicalDocumentTool(config)
    >>> await tool.initialize()
    >>> results = await tool.search("What are the reporting requirements?")

Initialize the hierarchical document tool.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | HierarchicalDocumentToolConfig | Tool configuration. | required |
| base_dir | str \| None | Optional base directory for resolving relative source paths. If None, source paths are resolved relative to the current working directory or the agent_base_dir context variable. | None |
| execution_config | ExecutionConfig \| None | Execution configuration for file processing timeouts and caching. When provided, the lazy FileProcessor used by … | None |
Source code in src/holodeck/tools/hierarchical_document_tool.py
async initialize(force_ingest=True, provider_type=None, progress_callback=None)
Initialize the tool by processing all configured documents.
Call this method before any search operation. It loads documents, chunks them, extracts definitions, and indexes content for search.
Supports mtime-based incremental ingestion that skips unchanged files: a file is re-ingested only if its modification time is newer than the mtime stored on its existing records. Because force_ingest defaults to True, incremental ingestion applies only when force_ingest=False is passed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| force_ingest | bool | If True, re-ingest all files regardless of modification time. Existing records are deleted before re-ingestion. | True |
| provider_type | str \| None | LLM provider for dimension auto-detection (defaults to "openai" if not specified). | None |
| progress_callback | Callable[[int, int \| None], None] \| None | Optional callback invoked after each file is processed (or skipped). Called as … | None |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If a document file is not found. |
Source code in src/holodeck/tools/hierarchical_document_tool.py
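The mtime rule can be sketched as a filter over candidate files. Keying stored mtimes by str(path) and treating files with no stored record as new are assumptions; how HoloDeck stores per-record mtimes is not shown here.

```python
import os
import tempfile
from pathlib import Path


def files_to_ingest(files: list[Path], stored_mtimes: dict[str, float],
                    force_ingest: bool) -> list[Path]:
    if force_ingest:
        return list(files)  # re-ingest everything
    selected = []
    for path in files:
        current = path.stat().st_mtime
        previous = stored_mtimes.get(str(path))  # hypothetical storage key
        # New files and files modified since the last ingest qualify.
        if previous is None or current > previous:
            selected.append(path)
    return selected


with tempfile.TemporaryDirectory() as tmp:
    old, new, changed = (Path(tmp) / n for n in ("old.md", "new.md", "changed.md"))
    for p in (old, new, changed):
        p.write_text("x")
    os.utime(old, (1_000, 1_000))      # unchanged since last ingest
    os.utime(changed, (2_000, 2_000))  # modified after last ingest
    stored = {str(old): 1_000.0, str(changed): 1_500.0}
    incremental = [p.name for p in files_to_ingest([old, new, changed], stored, False)]
    forced = [p.name for p in files_to_ingest([old, new, changed], stored, True)]

print(incremental)  # ['new.md', 'changed.md']
print(forced)       # ['old.md', 'new.md', 'changed.md']
```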
async search(query, top_k=None)
Search documents for content relevant to the query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| top_k | int \| None | Overrides the configured top_k. | None |
Returns:
| Type | Description |
|---|---|
| list[SearchResult] | List of SearchResult objects. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the tool is not initialized. |
| ValueError | If the query is empty. |
Source code in src/holodeck/tools/hierarchical_document_tool.py
async get_context(query, max_tokens=None)
Get LLM-ready context for a query.
Convenience method that runs a search and formats the results into a single context string suitable for LLM prompts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Query to get context for. | required |
| max_tokens | int \| None | Maximum tokens for context (currently unused). | None |
Returns:
| Type | Description |
|---|---|
| str | Formatted context string. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the tool is not initialized. |
Source code in src/holodeck/tools/hierarchical_document_tool.py
get_definition(term)
Look up a term's definition.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| term | str | Term to look up. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, str] \| None | Dictionary with 'term' and 'definition' keys, or None if no definition is found. |
Source code in src/holodeck/tools/hierarchical_document_tool.py
set_embedding_service(service)
Inherited from EmbeddingServiceMixin; see set_embedding_service(service) under holodeck.tools.base_tool for its description and parameters.
set_context_generator(generator)
Set the context generator for contextual embeddings.
Accepts any ContextGenerator protocol implementation (LLMContextGenerator, ClaudeSDKContextGenerator, etc.).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| generator | Any | A ContextGenerator protocol implementation. | required |
Source code in src/holodeck/tools/hierarchical_document_tool.py
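A sketch of what "any ContextGenerator protocol implementation" means in Python terms: structural typing with typing.Protocol, where any class with matching methods qualifies without inheriting from the protocol. The generate_context method name and signature are hypothetical (the protocol's members are not listed on this page), as is the assumption that contextual embeddings prepend generated context to each chunk before embedding.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class ContextGenerator(Protocol):
    # Hypothetical member; the real protocol's method name and signature may differ.
    async def generate_context(self, chunk_text: str, document_text: str) -> str: ...


class FirstLineContextGenerator:
    """Toy implementation: uses the document's first line as context (no LLM call)."""

    async def generate_context(self, chunk_text: str, document_text: str) -> str:
        stripped = document_text.strip()
        first_line = stripped.splitlines()[0] if stripped else ""
        return f"From: {first_line}"


async def contextualize(generator: ContextGenerator, chunk: str, document: str) -> str:
    # Assumed contextual-embedding step: prepend generated context to the chunk.
    context = await generator.generate_context(chunk, document)
    return f"{context}\n\n{chunk}"


gen = FirstLineContextGenerator()
print(isinstance(gen, ContextGenerator))  # True: structural check, no inheritance needed
print(asyncio.run(contextualize(gen, "Reports are due in 30 days.", "Incident Policy\nSection 1")))
```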