# OpenTelemetry Integration Example

This guide demonstrates how to run the NBU Exporter with OpenTelemetry distributed tracing using Docker Compose.
## Overview

The example stack includes:

- NBU Exporter: Collects NetBackup metrics with OpenTelemetry tracing enabled
- OpenTelemetry Collector: Receives and processes traces from the exporter
- Jaeger: Stores and visualizes distributed traces
- Prometheus (optional): Scrapes metrics from the exporter
## Prerequisites

- Docker and Docker Compose installed
- A NetBackup server accessible from your Docker host
- A valid NetBackup API key
## Quick Start

### 1. Configure the Exporter

Edit `config.yaml` to enable OpenTelemetry:
```yaml
server:
  host: "localhost"
  port: "2112"
  uri: "/metrics"
  scrapingInterval: "1h"
  logName: "log/nbu-exporter.log"

nbuserver:
  scheme: "https"
  host: "master.my.domain"      # Your NetBackup server
  port: "1556"
  apiKey: "your-api-key-here"   # Your API key
  # ... other settings

opentelemetry:
  enabled: true
  endpoint: "otel-collector:4317"
  insecure: true
  samplingRate: 1.0             # Trace all scrapes for testing
```
### 2. Start the Stack

```shell
# Start all services
docker-compose -f docker-compose-otel.yaml up -d

# Check service status
docker-compose -f docker-compose-otel.yaml ps

# View logs
docker-compose -f docker-compose-otel.yaml logs -f nbu_exporter
```
### 3. Access the Services

- NBU Exporter Metrics: http://localhost:2112/metrics
- NBU Exporter Health: http://localhost:2112/health
- Jaeger UI: http://localhost:16686
- Prometheus: http://localhost:9090 (if enabled)
- Collector Metrics: http://localhost:8888/metrics
- Collector Health: http://localhost:13133
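Once the stack is up, a quick way to sanity-check these endpoints from the Docker host (assuming the default ports listed above):

```shell
# Exporter health and metrics
curl -s http://localhost:2112/health
curl -s http://localhost:2112/metrics | head

# Collector health and internal metrics
curl -s http://localhost:13133
curl -s http://localhost:8888/metrics | head
```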
### 4. View Traces in Jaeger

- Open the Jaeger UI: http://localhost:16686
- Select the `nbu-exporter` service
- Click "Find Traces"
- Click on a trace to view details
## Understanding the Trace Hierarchy

Each Prometheus scrape creates a trace with this structure:

```
prometheus.scrape (root span)
├── netbackup.fetch_storage
│   └── http.request (GET /storage/storage-units)
└── netbackup.fetch_jobs
    ├── netbackup.fetch_job_page (offset=0)
    │   └── http.request (GET /admin/jobs?offset=0)
    ├── netbackup.fetch_job_page (offset=1)
    │   └── http.request (GET /admin/jobs?offset=1)
    └── netbackup.fetch_job_page (offset=N)
        └── http.request (GET /admin/jobs?offset=N)
```
## Analyzing Traces

### Find Slow Scrapes

- In the Jaeger UI, go to the "Search" tab
- Select the `nbu-exporter` service
- Set "Min Duration" to filter slow traces (e.g., 30s)
- Click "Find Traces"
### Identify Bottlenecks

- Click on a slow trace
- Examine the span timeline
- Look for spans with long durations
- Check span attributes for details:
  - `http.status_code`: HTTP response status
  - `http.duration_ms`: Request duration
  - `netbackup.total_pages`: Number of pages fetched
  - `netbackup.total_jobs`: Number of jobs retrieved
### Common Patterns

Slow storage fetch:

Diagnosis: The NetBackup storage API is slow. Check server performance.

High pagination:

```
netbackup.fetch_jobs: 45.3s
├── netbackup.fetch_job_page: 15.2s
├── netbackup.fetch_job_page: 15.1s
└── netbackup.fetch_job_page: 15.0s
```

Diagnosis: Many job pages are fetched per scrape. Consider increasing `scrapingInterval`.

API errors:

Diagnosis: NetBackup API error. Check span events for error details.

## Configuration Options
### Sampling Rates

Adjust `samplingRate` based on your needs:

- Development: trace everything
- Production: sample a small fraction (e.g., 10%)
- High-frequency scraping: minimal overhead
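As a sketch, the corresponding `opentelemetry` settings might look like the following (the `0.01` value for high-frequency setups is just an illustrative low rate):

```yaml
opentelemetry:
  enabled: true
  endpoint: "otel-collector:4317"

  # Development: trace every scrape
  samplingRate: 1.0

  # Production: sample 10% of scrapes
  # samplingRate: 0.1

  # High-frequency scraping: minimal overhead
  # samplingRate: 0.01
```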
### Collector Configuration

The OpenTelemetry Collector can be customized in `otel-collector-config.yaml`, for example to add additional exporters, adjust batch processing, or change the log level.
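A sketch of such customizations, using standard Collector configuration keys (the Tempo exporter name and endpoint are illustrative assumptions):

```yaml
processors:
  batch:
    timeout: 10s            # adjust batch processing
    send_batch_size: 512

exporters:
  otlp/tempo:               # example of an additional exporter
    endpoint: tempo:4317
    tls:
      insecure: true

service:
  telemetry:
    logs:
      level: debug          # change collector log level
```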
## Troubleshooting

### Traces Not Appearing

Check the exporter logs. Look for:

- `INFO[0001] Detected NetBackup API version: 13.0`
- `INFO[0001] OpenTelemetry initialized successfully`

If you see:

- `WARN[0001] Failed to initialize OpenTelemetry: connection refused`

Solution: Ensure the collector is running.
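Assuming the Compose service names used in this example (`nbu_exporter`, `otel-collector`), these checks might look like:

```shell
# Check exporter logs for the OpenTelemetry init messages
docker-compose -f docker-compose-otel.yaml logs nbu_exporter | grep -i opentelemetry

# Make sure the collector is up, and (re)start it if needed
docker-compose -f docker-compose-otel.yaml ps otel-collector
docker-compose -f docker-compose-otel.yaml up -d otel-collector
```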
### Collector Connection Issues

- Check collector health
- Check collector logs
- Verify network connectivity
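A sketch of these checks — the `otel-collector` service name and the health/OTLP ports come from this example stack, and the connectivity test assumes `nc` is available in the exporter image:

```shell
# Collector health endpoint
curl -s http://localhost:13133

# Collector logs
docker-compose -f docker-compose-otel.yaml logs otel-collector

# Verify the exporter container can reach the collector's OTLP port
docker-compose -f docker-compose-otel.yaml exec nbu_exporter \
  nc -zv otel-collector 4317
```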
### Jaeger Not Receiving Traces

- Check Jaeger logs
- Verify the collector is exporting to Jaeger
- Check collector metrics
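For example (the `otelcol_exporter_sent_spans` metric is exposed on the collector's internal metrics port, 8888 in this stack):

```shell
# Jaeger logs
docker-compose -f docker-compose-otel.yaml logs jaeger

# Confirm the collector is actually sending spans downstream
curl -s http://localhost:8888/metrics | grep otelcol_exporter_sent_spans
```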
### High Memory Usage

- Reduce the batch size in the collector
- Lower the collector's memory limit
- Reduce the sampling rate
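These three knobs might be tuned like so (the values are illustrative; `batch` and `memory_limiter` are standard Collector processors):

```yaml
# otel-collector-config.yaml
processors:
  batch:
    send_batch_size: 256      # reduce batch size
  memory_limiter:
    check_interval: 1s
    limit_mib: 256            # lower memory limit

# config.yaml (exporter side)
# opentelemetry:
#   samplingRate: 0.1         # reduce sampling rate
```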
## Production Deployment

### Security Considerations

Enable TLS for OTLP by configuring the collector's receiver:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        tls:
          cert_file: /etc/certs/server.crt
          key_file: /etc/certs/server.key
```
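On the exporter side, TLS would then be enabled by disabling `insecure` (same config keys as in the Quick Start; the sampling rate shown is just an example):

```yaml
opentelemetry:
  enabled: true
  endpoint: "otel-collector:4317"
  insecure: false
  samplingRate: 0.1
```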
Use secrets management:

```shell
# Don't commit API keys to version control; pass them via the environment
# (assumes the Compose file references ${NBU_API_KEY})
NBU_API_KEY=$(cat /path/to/secret) \
  docker-compose -f docker-compose-otel.yaml up -d
```
### Resource Limits

Add resource limits to prevent resource exhaustion:

```yaml
services:
  nbu_exporter:
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 256M
        reservations:
          cpus: '0.25'
          memory: 128M
```
### Monitoring

Monitor the collector's health and metrics endpoints. Key metrics to watch:

- `otelcol_receiver_accepted_spans`: Spans received
- `otelcol_exporter_sent_spans`: Spans exported
- `otelcol_processor_batch_batch_send_size`: Batch sizes
- `otelcol_exporter_send_failed_spans`: Export failures
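The health and metrics checks might look like the following (ports as mapped in this example stack):

```shell
# Collector health
curl -s http://localhost:13133

# Key collector metrics
curl -s http://localhost:8888/metrics | \
  grep -E 'otelcol_(receiver_accepted|exporter_sent|exporter_send_failed)_spans'
```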
## Stopping the Stack

```shell
# Stop all services
docker-compose -f docker-compose-otel.yaml down

# Stop and remove volumes
docker-compose -f docker-compose-otel.yaml down -v
```
## Alternative Backends

### Grafana Tempo

Replace Jaeger with Tempo for long-term trace storage:

```yaml
services:
  tempo:
    image: grafana/tempo:latest
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
    ports:
      - "3200:3200"   # Tempo
      - "4317:4317"   # OTLP gRPC
```

Then update the collector config to export to Tempo.
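The collector update might look like this — a standard `otlp` exporter pointed at Tempo's OTLP port, where the `tempo` hostname matches the Compose service name above:

```yaml
exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
```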
### AWS X-Ray

Export traces to AWS X-Ray:
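One way to do this is the `awsxray` exporter, which ships with the Collector *contrib* distribution (the region is an example; AWS credentials are resolved from the usual sources such as environment variables or an instance role):

```yaml
exporters:
  awsxray:
    region: us-east-1   # example region

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [awsxray]
```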
### Honeycomb

Export traces to Honeycomb:
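Honeycomb ingests OTLP directly, so the standard `otlp` exporter can be pointed at their endpoint. The `x-honeycomb-team` header carries the API key; the environment variable name here is an assumption:

```yaml
exporters:
  otlp/honeycomb:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: ${HONEYCOMB_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/honeycomb]
```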