
Test Execution Framework API

The test runner orchestrates the complete test execution pipeline for HoloDeck agents, from configuration resolution through evaluation and result reporting.

Test Case Configuration

Configuration models for defining test cases with multimodal file support.

TestCaseModel

Bases: BaseModel

Test case for agent evaluation.

Represents a single test scenario with input, optional expected output, expected tool usage, multimodal file inputs, and RAG context.
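A minimal construction sketch, assuming TestCaseModel is importable from holodeck.models.test_case (the module shown in the source listings below) and using only the fields covered by the validators on this page; the field names for expected tool usage and RAG context are not documented here and are omitted.

from holodeck.models.test_case import TestCaseModel

# A text-only test case: `input` is required and must be non-empty,
# while `name` and `ground_truth` are optional but rejected if blank.
case = TestCaseModel(
    input="Summarize the attached quarterly report in two sentences.",
    name="summarize_quarterly_report",
    ground_truth="A concise two-sentence summary of the report.",
)
print(case.name)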

validate_files(v) classmethod

Validate files list contains at most 10 entries if provided.

Source code in src/holodeck/models/test_case.py
@field_validator("files")
@classmethod
def validate_files(cls, v: list[FileInput] | None) -> list[FileInput] | None:
    """Validate files list is not empty if provided."""
    if v is not None and len(v) > 10:
        raise ValueError("Maximum 10 files per test case")
    return v
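A sketch of how this limit surfaces, assuming a TestCaseModel can be built from just input and files and that FileInput accepts path and type (see the FileInput section below); pydantic reports the validator's ValueError through a ValidationError.

from pydantic import ValidationError

from holodeck.models.test_case import FileInput, TestCaseModel

# Eleven files exceed the 10-file limit enforced by validate_files.
too_many = [FileInput(path=f"docs/report_{i}.pdf", type="pdf") for i in range(11)]
try:
    TestCaseModel(input="Summarize these reports.", files=too_many)
except ValidationError as exc:
    print(exc)  # includes "Maximum 10 files per test case"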

validate_ground_truth(v) classmethod

Validate ground_truth is not empty if provided.

Source code in src/holodeck/models/test_case.py
@field_validator("ground_truth")
@classmethod
def validate_ground_truth(cls, v: str | None) -> str | None:
    """Validate ground_truth is not empty if provided."""
    if v is not None and (not v or not v.strip()):
        raise ValueError("ground_truth must be non-empty if provided")
    return v

validate_input(v) classmethod

Validate input is not empty.

Source code in src/holodeck/models/test_case.py
@field_validator("input")
@classmethod
def validate_input(cls, v: str) -> str:
    """Validate input is not empty."""
    if not v or not v.strip():
        raise ValueError("input must be a non-empty string")
    return v

validate_name(v) classmethod

Validate name is not empty if provided.

Source code in src/holodeck/models/test_case.py
@field_validator("name")
@classmethod
def validate_name(cls, v: str | None) -> str | None:
    """Validate name is not empty if provided."""
    if v is not None and (not v or not v.strip()):
        raise ValueError("name must be non-empty if provided")
    return v

FileInput

Bases: BaseModel

File input for multimodal test cases.

Represents a single file reference for test case inputs, supporting both local files and remote URLs with optional extraction parameters.
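Two construction sketches, one per reference style, assuming path, url, type, and pages are the relevant fields (as suggested by the validators below); any other extraction parameters are omitted.

from holodeck.models.test_case import FileInput

# Local document referenced by path, restricted to a page range.
local_pdf = FileInput(path="data/contract.pdf", type="pdf", pages=[1, 2, 3])

# Remote image referenced by URL instead of a local path.
remote_image = FileInput(url="https://example.com/diagram.png", type="image")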

check_path_or_url(v, info) classmethod

Pass path and url through unchanged; the exactly-one-of check is deferred to model_post_init.

Source code in src/holodeck/models/test_case.py
@field_validator("path", "url", mode="before")
@classmethod
def check_path_or_url(cls, v: Any, info: Any) -> Any:
    """Validate that exactly one of path or url is provided."""
    # This runs before validation, so we check in root_validator
    return v

model_post_init(__context)

Validate path and url mutual exclusivity after initialization.

Source code in src/holodeck/models/test_case.py
def model_post_init(self, __context: Any) -> None:
    """Validate path and url mutual exclusivity after initialization."""
    if self.path and self.url:
        raise ValueError("Cannot provide both 'path' and 'url'")
    if not self.path and not self.url:
        raise ValueError("Must provide either 'path' or 'url'")

validate_pages(v) classmethod

Validate pages are positive integers.

Source code in src/holodeck/models/test_case.py
@field_validator("pages")
@classmethod
def validate_pages(cls, v: list[int] | None) -> list[int] | None:
    """Validate pages are positive integers."""
    if v is not None and not all(isinstance(p, int) and p > 0 for p in v):
        raise ValueError("pages must be positive integers")
    return v

validate_type(v) classmethod

Validate file type is supported.

Source code in src/holodeck/models/test_case.py
@field_validator("type")
@classmethod
def validate_type(cls, v: str) -> str:
    """Validate file type is supported."""
    valid_types = {"image", "pdf", "text", "excel", "word", "powerpoint", "csv"}
    if v not in valid_types:
        raise ValueError(f"type must be one of {valid_types}, got {v}")
    return v
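A sketch of the rejection path for an unsupported type, assuming path and type suffice to construct a FileInput; the validator's ValueError is reported through pydantic's ValidationError, which subclasses ValueError.

from holodeck.models.test_case import FileInput

try:
    FileInput(path="notes.md", type="markdown")  # "markdown" is not a supported type
except ValueError as exc:
    print(exc)  # names the accepted types: image, pdf, text, excel, word, powerpoint, csv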

Test Results

Data models for test execution results and metrics.

TestResult

Bases: BaseModel

Result of executing a single test case.

Contains the test input, agent response, tool calls, metric results, and overall pass/fail status, along with any errors encountered.

Example Usage

from holodeck.lib.test_runner.executor import TestExecutor
from holodeck.config.loader import ConfigLoader

# Load agent configuration
loader = ConfigLoader()
config = loader.load("agent.yaml")

# Create executor and run tests
executor = TestExecutor()
results = executor.run_tests(config)

# Access results
for test_result in results.test_results:
    print(f"Test {test_result.test_name}: {test_result.status}")
    print(f"Metrics: {test_result.metrics}")

Multimodal File Support

The test runner integrates with the file processor to handle:

  • Images: JPG, PNG with OCR support
  • Documents: PDF (full or page ranges), Word, PowerPoint
  • Data: Excel (sheet/range selection), CSV, text files
  • Remote Files: URL-based inputs with caching

Files are automatically processed before agent invocation and included in the test context, as sketched below.
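A sketch of a multimodal test case built from the models above; the exact fields for Excel sheet/range selection are not documented on this page and are omitted.

from holodeck.models.test_case import FileInput, TestCaseModel

# Mixes a page-limited local PDF with a remote image; both are handled
# by the file processor before the agent is invoked.
case = TestCaseModel(
    input="Compare the revenue table on page 2 with the chart in the image.",
    files=[
        FileInput(path="reports/q3.pdf", type="pdf", pages=[2]),
        FileInput(url="https://example.com/revenue-chart.png", type="image"),
    ],
)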

Integration with Evaluation Framework

Test results automatically pass through the evaluation framework:

  1. NLP Metrics: Computed on all test outputs (F1, BLEU, ROUGE, METEOR)
  2. AI-powered Metrics: Optional evaluation by Azure AI models
  3. Custom Metrics: User-defined evaluation functions

Evaluation configuration comes from the agent's evaluations section.
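A hedged sketch of aggregating metric scores across a run, building on the Example Usage above; it assumes each test result's metrics attribute maps metric names to numeric scores, which this page does not guarantee.

from holodeck.config.loader import ConfigLoader
from holodeck.lib.test_runner.executor import TestExecutor

config = ConfigLoader().load("agent.yaml")
results = TestExecutor().run_tests(config)

# Average each metric over all test cases (assumes a name -> float mapping).
totals: dict[str, float] = {}
counts: dict[str, int] = {}
for test_result in results.test_results:
    for metric_name, score in test_result.metrics.items():
        totals[metric_name] = totals.get(metric_name, 0.0) + score
        counts[metric_name] = counts.get(metric_name, 0) + 1

for metric_name, total in totals.items():
    print(f"{metric_name}: {total / counts[metric_name]:.3f}")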