# Observability Guide
This guide explains HoloDeck's OpenTelemetry-based observability for tracing, metrics, and logging.
## Overview
HoloDeck provides built-in observability using OpenTelemetry, following the GenAI semantic conventions for AI/LLM instrumentation. All configuration is done through YAML—no code required.
Key features:
- Distributed Tracing - Track requests across LLM calls and tool executions
- GenAI Attributes - Token usage, model info, and completion details
- Multiple Exporters - Console, OTLP (gRPC/HTTP), Prometheus, Azure Monitor
- Sensitive Data Control - Capture or redact prompts/completions
- Zero-Code Setup - Configure entirely in agent.yaml
## Quick Start
Add observability to your agent configuration:
```yaml
# agent.yaml
name: my-agent

model:
  provider: openai
  name: gpt-4o

observability:
  enabled: true
  # Console output by default - great for development
```
Run with observability enabled:
```bash
holodeck chat agent.yaml
# or
holodeck serve agent.yaml
```
You'll see trace and span information in the console output.
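The exact shape of the output depends on the SDK and exporter version, but with a default OpenTelemetry console exporter each span is printed as JSON along these lines (fields trimmed, values illustrative):

```json
{
  "name": "chat gpt-4o",
  "context": {
    "trace_id": "0x5b8aa5a2d2c872e8321cf37308d69df2",
    "span_id": "0x051581bf3cb55c13"
  },
  "attributes": {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 150,
    "gen_ai.usage.output_tokens": 75
  }
}
```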
### Send to OTLP Endpoint
```yaml
observability:
  enabled: true
  exporters:
    otlp:
      enabled: true
      endpoint: http://localhost:4317
      protocol: grpc
      insecure: true
```
## Configuration Reference

### Full Schema
```yaml
observability:
  enabled: true                        # Enable/disable observability
  service_name: "custom-service"       # Optional override (default: "holodeck-{agent.name}")

  traces:
    enabled: true                      # Trace collection
    sample_rate: 1.0                   # 0.0 to 1.0 (default: 100% sampling)
    capture_content: false             # Capture prompts/completions (default: false)
    capture_evaluation_content: false  # Capture DeepEval inputs/outputs (default: false)
    redaction_patterns:                # Regex patterns for sensitive data
      - '\b\d{3}-\d{2}-\d{4}\b'        # Example: SSN pattern
    max_queue_size: 2048               # Max spans in buffer
    max_export_batch_size: 512         # Spans per batch

  metrics:
    enabled: true                      # Metrics collection
    export_interval_ms: 5000           # Export every 5 seconds
    include_semantic_kernel_metrics: true

  logs:
    enabled: true                      # Structured logging
    level: INFO                        # DEBUG, INFO, WARNING, ERROR, CRITICAL
    include_trace_context: true        # Include trace/span IDs
    filter_namespaces:                 # Which loggers to capture
      - semantic_kernel

  resource_attributes:                 # Custom OTel resource attributes
    environment: production
    version: "1.0"

  exporters:
    console:
      enabled: true                    # Console output (default)
      pretty_print: true
      include_timestamps: true

    otlp:
      enabled: true
      endpoint: http://localhost:4317
      protocol: grpc                   # grpc or http
      headers:
        authorization: "Bearer ${OTEL_API_KEY}"
      timeout_ms: 30000
      compression: gzip
      insecure: true

    prometheus:
      enabled: false
      port: 8889
      host: 0.0.0.0
      path: /metrics

    azure_monitor:
      enabled: false
      connection_string: "${APPLICATIONINSIGHTS_CONNECTION_STRING}"
```
### Service Name

By default, the service name is `holodeck-{agent.name}`:

- Agent named `research` → service name `holodeck-research`

Override it with `service_name`:

```yaml
observability:
  enabled: true
  service_name: "my-custom-service"
```
### Traces Configuration

#### Sample Rate
Control what percentage of requests are traced:
```yaml
traces:
  sample_rate: 1.0     # 100% sampling (default, good for development)
  # sample_rate: 0.1   # 10% sampling (production with high traffic)
  # sample_rate: 0.0   # Disable tracing
```
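This is head sampling: the keep-or-drop decision is made once per trace. Assuming the OpenTelemetry Python SDK underneath (an assumption; HoloDeck's internal wiring may differ), `sample_rate` corresponds to a ratio-based sampler:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Keep ~10% of traces. ParentBased makes child spans follow their
# parent's decision, so each trace is kept or dropped as a whole.
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.1)))
```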
#### Capturing Prompts and Completions
By default, prompts and completions are NOT captured (privacy). Enable for debugging:
```yaml
traces:
  capture_content: true  # Captures full prompt/completion in span events
```
#### Redacting Sensitive Data

Matches for these patterns are redacted before spans are exported:

```yaml
traces:
  capture_content: true
  redaction_patterns:
    - '\b\d{3}-\d{2}-\d{4}\b'                              # SSN
    - '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' # Email
    - '\b\d{16}\b'                                         # Credit card (16 digits)
```
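The internal mechanism isn't spelled out here, but the intended semantics are plain regex substitution applied to captured content before export. A minimal sketch (the `[REDACTED]` placeholder is an assumption, not HoloDeck's actual token):

```python
import re

# Same patterns as in the YAML above
REDACTION_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",                               # SSN
    r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",  # Email
]

def redact(text: str) -> str:
    """Replace every match of every pattern before the span is exported."""
    for pattern in REDACTION_PATTERNS:
        text = re.sub(pattern, "[REDACTED]", text)
    return text

print(redact("Contact jane@example.com, SSN 123-45-6789"))
# Contact [REDACTED], SSN [REDACTED]
```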
#### Buffer Settings
Configure span buffering for batch export:
```yaml
traces:
  max_queue_size: 2048         # Max spans in memory
  max_export_batch_size: 512   # Spans per export batch
```
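These are the standard OpenTelemetry batch-processor knobs. Assuming the Python SDK underneath (an assumption), they map onto `BatchSpanProcessor` roughly like this:

```python
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Finished spans go into an in-memory queue capped at max_queue_size;
# a background thread flushes them to the exporter in batches of up to
# max_export_batch_size spans.
processor = BatchSpanProcessor(
    ConsoleSpanExporter(),
    max_queue_size=2048,
    max_export_batch_size=512,
)
```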
## GenAI Semantic Conventions
HoloDeck uses Semantic Kernel's native OpenTelemetry instrumentation, which follows OpenTelemetry GenAI semantic conventions.
### Captured Attributes
Every LLM invocation captures:
| Attribute | Description | Example |
|---|---|---|
| `gen_ai.system` | Provider name | `openai`, `anthropic` |
| `gen_ai.request.model` | Model identifier | `gpt-4o`, `claude-3-sonnet` |
| `gen_ai.request.temperature` | Temperature setting | `0.7` |
| `gen_ai.usage.input_tokens` | Prompt token count | `150` |
| `gen_ai.usage.output_tokens` | Completion token count | `75` |
| `gen_ai.usage.total_tokens` | Total tokens | `225` |
| `gen_ai.response.finish_reason` | Why generation stopped | `stop`, `length` |
| `gen_ai.response.id` | Provider's completion ID | `chatcmpl-...` |
### Span Events (when `capture_content` is enabled)

- `gen_ai.content.prompt` - Full prompt text
- `gen_ai.content.completion` - Full completion text
## Evaluation Tracing (DeepEval)

HoloDeck creates OpenTelemetry spans for DeepEval metric evaluations during test runs. Enable `capture_evaluation_content` to capture evaluation inputs/outputs:

```yaml
observability:
  enabled: true
  traces:
    capture_evaluation_content: true  # Capture inputs, outputs, reasoning
```
### Captured Span Attributes

Every DeepEval evaluation creates a span named `holodeck.evaluation.{metric_name}` with these attributes:
| Attribute | Description | Example |
|---|---|---|
| `evaluation.metric.name` | Metric identifier | `geval`, `faithfulness` |
| `evaluation.threshold` | Pass/fail threshold | `0.7` |
| `evaluation.model.provider` | Evaluation LLM provider | `openai`, `ollama` |
| `evaluation.model.name` | Evaluation model name | `gpt-4o`, `llama3.2` |
| `evaluation.score` | Evaluation score (0.0-1.0) | `0.85` |
| `evaluation.passed` | Whether the score met the threshold | `true` |
| `evaluation.duration_ms` | Evaluation duration (ms) | `1523` |
### Content Attributes (when `capture_evaluation_content` is enabled)

| Attribute | Description | Max Length |
|---|---|---|
| `evaluation.input` | User query being evaluated | 1000 chars |
| `evaluation.actual_output` | Agent response being evaluated | 1000 chars |
| `evaluation.expected_output` | Ground truth (if provided) | 1000 chars |
| `evaluation.retrieval_context` | RAG context chunks (JSON) | 2000 chars |
| `evaluation.reasoning` | LLM-generated evaluation reasoning | 2000 chars |
## Exporters

### Console Exporter (Default)
The console exporter outputs to stdout—ideal for development:
```yaml
exporters:
  console:
    enabled: true
    pretty_print: true        # Human-readable format
    include_timestamps: true
```
When no exporters are explicitly enabled, console is used automatically.
### OTLP Exporter
Export to any OpenTelemetry-compatible backend (Jaeger, Zipkin, Grafana Tempo, etc.):
#### gRPC (Default)

```yaml
exporters:
  otlp:
    enabled: true
    endpoint: http://localhost:4317
    protocol: grpc
    insecure: true   # No TLS (development)
```
#### HTTP/Protobuf

```yaml
exporters:
  otlp:
    enabled: true
    endpoint: http://localhost:4318   # OTLP/HTTP conventionally uses port 4318
    protocol: http
```
#### With Authentication

```yaml
exporters:
  otlp:
    enabled: true
    endpoint: https://otel-collector.example.com:4317
    protocol: grpc
    headers:
      authorization: "Bearer ${OTEL_API_KEY}"
    compression: gzip
    timeout_ms: 30000
```
### Prometheus Exporter (Planned)

```yaml
exporters:
  prometheus:
    enabled: true
    port: 8889
    host: 0.0.0.0
    path: /metrics
```
Once available, metrics will be served at `http://localhost:8889/metrics`.
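To pull these metrics into Prometheus once the exporter ships, a standard scrape job would look like this (the job name is illustrative):

```yaml
# prometheus.yml
scrape_configs:
  - job_name: holodeck
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8889"]
```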
### Azure Monitor Exporter (Planned)

```yaml
exporters:
  azure_monitor:
    enabled: true
    connection_string: "${APPLICATIONINSIGHTS_CONNECTION_STRING}"
```
### Multiple Exporters
Enable multiple exporters simultaneously:
```yaml
exporters:
  console:
    enabled: true                     # Local debugging
  otlp:
    enabled: true                     # Send to backend
    endpoint: http://localhost:4317
```
## Setting Up an OTLP Sink

### .NET Aspire Dashboard
The Aspire Dashboard provides a free, local OpenTelemetry UI.
#### Quick Start with Docker

```bash
# Run the Aspire Dashboard.
# Host port 18888 serves the UI; host port 4317 maps to the
# dashboard's internal OTLP gRPC port (18889).
docker run --rm -d \
  --name aspire-dashboard \
  -p 18888:18888 \
  -p 4317:18889 \
  mcr.microsoft.com/dotnet/aspire-dashboard:9.0
```
- Dashboard UI: http://localhost:18888
- OTLP gRPC Endpoint: http://localhost:4317
#### Configure HoloDeck

```yaml
observability:
  enabled: true
  exporters:
    otlp:
      enabled: true
      endpoint: http://localhost:4317
      protocol: grpc
      insecure: true
```
#### Run Your Agent

```bash
holodeck chat agent.yaml
# or
holodeck serve agent.yaml
```
Open http://localhost:18888 to view traces, metrics, and logs.
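### Jaeger (Alternative)

Jaeger, mentioned under the OTLP exporter above, works the same way via its all-in-one image; the HoloDeck configuration is identical, only the UI port differs:

```bash
# Run Jaeger all-in-one with OTLP ingestion enabled
docker run --rm -d \
  --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
```

Traces appear in the Jaeger UI at http://localhost:16686.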
## Environment Variables

### User Configuration
Set sensitive values in the environment:

```bash
# OTLP authentication
export OTEL_API_KEY="your-api-key"

# Azure Monitor
export APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=..."
```
Use them in YAML with `${VAR_NAME}`:

```yaml
exporters:
  otlp:
    headers:
      authorization: "Bearer ${OTEL_API_KEY}"
```
### Auto-Enabled by HoloDeck
HoloDeck automatically sets these environment variables when observability is enabled:
```bash
# Enables Semantic Kernel GenAI telemetry
SEMANTICKERNEL_EXPERIMENTAL_GENAI_ENABLE_OTEL_DIAGNOSTICS=true

# Enables prompt/completion capture (only if capture_content: true)
SEMANTICKERNEL_EXPERIMENTAL_GENAI_ENABLE_OTEL_DIAGNOSTICS_SENSITIVE=true
```
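If you run Semantic Kernel outside HoloDeck and want the same GenAI telemetry, you can set the flag yourself; setting it before Semantic Kernel is initialized is the safe order (a sketch, not HoloDeck's code):

```python
import os

# Enable Semantic Kernel's experimental GenAI OTel diagnostics
# before the kernel is imported/initialized.
os.environ["SEMANTICKERNEL_EXPERIMENTAL_GENAI_ENABLE_OTEL_DIAGNOSTICS"] = "true"
```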
## Examples

### Development Setup
```yaml
# agent.yaml
name: dev-agent

model:
  provider: ollama
  name: llama3.2:latest

observability:
  enabled: true
  traces:
    capture_content: true   # See prompts for debugging
  exporters:
    console:
      enabled: true
      pretty_print: true
```
### Production with OTLP
```yaml
# agent.yaml
name: prod-agent

model:
  provider: openai
  name: gpt-4o

observability:
  enabled: true
  service_name: "prod-research-agent"
  traces:
    sample_rate: 0.1        # 10% sampling
    capture_content: false  # Privacy
  metrics:
    enabled: true
    export_interval_ms: 10000
  logs:
    enabled: true
    level: WARNING
  resource_attributes:
    environment: production
    version: "1.2.0"
    team: research
  exporters:
    otlp:
      enabled: true
      endpoint: https://otel.example.com:4317
      protocol: grpc
      headers:
        authorization: "Bearer ${OTEL_API_KEY}"
      compression: gzip
```
### Multi-Exporter Setup
```yaml
observability:
  enabled: true
  exporters:
    # Local debugging
    console:
      enabled: true
    # Central observability platform
    otlp:
      enabled: true
      endpoint: http://tempo:4317
      protocol: grpc
    # Metrics for alerting
    prometheus:
      enabled: true
      port: 8889
```
### Minimal Tracing Only
```yaml
observability:
  enabled: true
  metrics:
    enabled: false
  logs:
    enabled: false
  exporters:
    otlp:
      enabled: true
      endpoint: http://localhost:4317
```
## Integration with Commands
Observability is available in all HoloDeck commands:
### Chat

```bash
holodeck chat agent.yaml --verbose
# Traces each message exchange
```
### Test

```bash
holodeck test agent.yaml
# Traces each test case execution
```
### Serve

```bash
holodeck serve agent.yaml
# Traces each API request
```
## Performance
Observability is designed for minimal overhead:
| Metric | Value |
|---|---|
| Overhead | < 5% of response time |
| Scale | ~100 requests/min, ~10K spans/hour |
| Batch size | 512 spans (default) |
| Buffer | 2048 spans max |
| Drop policy | Oldest-first when buffer full |
## Troubleshooting

### No traces appearing
- Check `observability.enabled: true`
- Verify the exporter endpoint is reachable
- Check for firewall/network issues
- Enable verbose mode: `holodeck chat agent.yaml --verbose`
### Missing token counts
Some providers don't return token usage in streaming mode. Use non-streaming for complete metrics.
### High memory usage
Reduce buffer size for high-volume scenarios:
```yaml
traces:
  max_queue_size: 512
  max_export_batch_size: 128
```
### OTLP connection refused
Ensure your OTLP endpoint is running and accessible:
```bash
# Test the gRPC endpoint
grpcurl -plaintext localhost:4317 list

# Test the HTTP endpoint (OTLP/HTTP only accepts POST, so a
# 405 response still proves the listener is up)
curl http://localhost:4318/v1/traces
```
## Best Practices

- Development: Use the console exporter with `capture_content: true`
- Production: Use OTLP with sampling and no content capture
- Security: Never capture content in production without redaction
- Sampling: Use 10-25% sampling for high-traffic services
- Retention: Configure your backend's retention policy appropriately
## Next Steps
- See Agent Server Guide for deploying agents
- See Global Configuration for shared settings
- See OpenTelemetry Documentation for backend setup
- See GenAI Semantic Conventions for attribute details