Test Execution Framework API¶
The test runner orchestrates the complete test execution pipeline for HoloDeck agents, from configuration resolution through evaluation and result reporting.
Test Case Configuration¶
Configuration models for defining test cases with multimodal file support.
TestCaseModel¶
Bases: BaseModel
Test case for agent evaluation.
Represents a single test scenario with input, optional expected output, expected tool usage, multimodal file inputs, and RAG context.
validate_files(v) classmethod¶
Validate files list is not empty if provided.
Source code in src/holodeck/models/test_case.py, lines 117–123
validate_ground_truth(v) classmethod¶
Validate ground_truth is not empty if provided.
Source code in src/holodeck/models/test_case.py, lines 109–115
validate_input(v) classmethod¶
Validate input is not empty.
Source code in src/holodeck/models/test_case.py, lines 93–99
validate_name(v) classmethod¶
Validate name is not empty if provided.
Source code in src/holodeck/models/test_case.py, lines 101–107
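The validators above follow the standard Pydantic classmethod-validator pattern. As a minimal sketch of constructing a test case, assuming the field names match the validator names (input, name, ground_truth, files); the authoritative schema is in src/holodeck/models/test_case.py:

```python
from pydantic import ValidationError

from holodeck.models.test_case import TestCaseModel

# Field names here are inferred from the validator names above and may
# not match the actual model exactly; values are illustrative.
case = TestCaseModel(
    name="summarize-report",
    input="Summarize the attached quarterly report.",
    ground_truth="Revenue grew 12% year over year.",
)

# An empty input is rejected by validate_input.
try:
    TestCaseModel(input="")
except ValidationError as exc:
    print(exc)
```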
FileInput¶
Bases: BaseModel
File input for multimodal test cases.
Represents a single file reference for test case inputs, supporting both local files and remote URLs with optional extraction parameters.
check_path_or_url(v, info) classmethod¶
Validate that exactly one of path or url is provided.
Source code in src/holodeck/models/test_case.py, lines 38–43
model_post_init(__context)¶
Validate path and url mutual exclusivity after initialization.
Source code in src/holodeck/models/test_case.py, lines 62–67
validate_pages(v) classmethod¶
Validate pages are positive integers.
Source code in src/holodeck/models/test_case.py, lines 54–60
validate_type(v) classmethod¶
Validate file type is supported.
Source code in src/holodeck/models/test_case.py, lines 45–52
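A hedged sketch of how the FileInput validators interact, assuming path, url, type, and pages are the field names (inferred from the validators above) and that type takes short format strings such as "pdf" and "image", which this page does not confirm:

```python
from holodeck.models.test_case import FileInput

# Exactly one of path or url must be set; check_path_or_url and
# model_post_init enforce their mutual exclusivity.
local_pdf = FileInput(
    path="reports/q3.pdf",
    type="pdf",        # validate_type: must be a supported file type
    pages=[1, 2, 3],   # validate_pages: positive integers only
)

remote_image = FileInput(
    url="https://example.com/chart.png",
    type="image",
)
```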
Test Results¶
Data models for test execution results and metrics.
TestResult¶
Bases: BaseModel
Result of executing a single test case.
Contains the test input, agent response, tool calls, metric results, and overall pass/fail status along with any errors encountered.
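A hedged sketch of the shape this description implies; apart from test_name, status, and metrics (which appear in the example below), the field names and types here are assumptions, not the actual model definition:

```python
from pydantic import BaseModel

class TestResult(BaseModel):
    test_name: str | None       # optional test case name
    input: str                  # the test input sent to the agent
    response: str               # the agent's response
    tool_calls: list[dict]      # tools the agent invoked
    metrics: dict[str, float]   # per-metric evaluation scores
    status: str                 # overall pass/fail
    error: str | None           # any error encountered during execution
```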
Example Usage¶
```python
from holodeck.lib.test_runner.executor import TestExecutor
from holodeck.config.loader import ConfigLoader

# Load agent configuration
loader = ConfigLoader()
config = loader.load("agent.yaml")

# Create executor and run tests
executor = TestExecutor()
results = executor.run_tests(config)

# Access results
for test_result in results.test_results:
    print(f"Test {test_result.test_name}: {test_result.status}")
    print(f"Metrics: {test_result.metrics}")
```
Multimodal File Support¶
The test runner integrates with the file processor to handle:
- Images: JPG, PNG with OCR support
- Documents: PDF (full or page ranges), Word, PowerPoint
- Data: Excel (sheet/range selection), CSV, text files
- Remote Files: URL-based inputs with caching
Files are automatically processed before agent invocation and included in the test context, as sketched below.
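Combining the file types listed above into one test case, under the same assumptions as the earlier sketches (field names inferred from the validators; the "excel" and "image" type strings are guesses):

```python
case = TestCaseModel(
    input="Compare the chart against the spreadsheet totals.",
    files=[
        FileInput(path="data/chart.png", type="image"),
        FileInput(path="data/totals.xlsx", type="excel"),
        FileInput(url="https://example.com/spec.pdf", type="pdf", pages=[2]),
    ],
)
```

The runner resolves each reference (downloading and caching remote URLs), extracts the content, and attaches it to the test context before the agent is invoked.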
Integration with Evaluation Framework¶
Test results automatically pass through the evaluation framework:
- NLP Metrics: Computed on all test outputs (F1, BLEU, ROUGE, METEOR)
- AI-powered Metrics: Optional evaluation by Azure AI models
- Custom Metrics: User-defined evaluation functions
Evaluation configuration comes from the agent's evaluations section.
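Continuing the Example Usage snippet above, a minimal sketch of reading metric scores off the results; treating metrics as a name-to-score mapping is an assumption, and only the metric names themselves come from the list above:

```python
# Assumes `results` from executor.run_tests(config) as in Example Usage.
for test_result in results.test_results:
    for metric_name, score in test_result.metrics.items():
        print(f"{test_result.test_name} {metric_name}: {score:.3f}")
```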
Related Documentation¶
- Data Models: Test case and result models
- Evaluation Framework: Metrics and evaluation system
- Configuration Loading: Loading agent configurations