Changelog
All notable changes to the NBU Exporter project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Added
(No unreleased changes yet)
[2.2.1] - 2026-02-18
Fixed
- Double URL-encoding of job filter query parameter (#13): The endTime filter for job metrics was pre-encoded with %20 for spaces, but BuildURL() applies URL encoding automatically via url.Values.Encode(). This caused double-encoding (%20 → %2520), making the NetBackup API return HTTP 400 errors. Job metrics (nbu_jobs_bytes, nbu_jobs_count, nbu_status_count) were not collected as a result. Storage metrics were unaffected.
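The double-encoding described above can be reproduced with the standard library alone. The sketch below is illustrative only: encodeFilter is a hypothetical helper, and the filter expression is an assumed example rather than the exact string the exporter sends.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeFilter is a hypothetical helper that encodes a single "filter"
// query parameter the same way url.Values.Encode() does.
func encodeFilter(filter string) string {
	q := url.Values{}
	q.Set("filter", filter)
	return q.Encode()
}

func main() {
	// Pre-encoded input: the "%" in "%20" is escaped again, yielding "%2520".
	fmt.Println(encodeFilter("endTime%20gt%202026-02-01T00:00:00.000Z"))

	// Raw input: each space is encoded exactly once.
	fmt.Println(encodeFilter("endTime gt 2026-02-01T00:00:00.000Z"))
}
```

Passing the raw, unencoded filter and letting the URL builder encode it once avoids the problem.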
[2.2.0] - 2026-01-24
Added
- OpenTelemetry Distributed Tracing: Optional distributed tracing for NetBackup API calls and Prometheus scrape cycles
- OTLP gRPC exporter for sending traces to OpenTelemetry Collector, Jaeger, or Tempo
- Configurable sampling rates (0.0 to 1.0) for controlling trace collection overhead
- Automatic trace context propagation using W3C Trace Context standard
- Span hierarchy: prometheus.scrape → netbackup.fetch_storage/netbackup.fetch_jobs → http.request
- Rich span attributes following OpenTelemetry semantic conventions
- Resource attributes: service name, version, hostname, NetBackup server
- Graceful degradation: continues operating if tracing initialization fails
- Zero overhead when disabled
- Minimal overhead when enabled: < 2% at 0.1 sampling, < 5% at 1.0 sampling
- opentelemetry configuration section with fields: enabled, endpoint, insecure, samplingRate
- Telemetry manager (internal/telemetry/manager.go) for OpenTelemetry lifecycle management
- Instrumented HTTP client with automatic span creation for all NetBackup API requests
- Instrumented Prometheus collector with root span for scrape cycles
- Span attributes for storage operations: endpoint, storage unit count, API version
- Span attributes for job operations: endpoint, time window, total jobs, pagination details
- Span attributes for HTTP requests: method, URL, status code, duration, request/response sizes
- Error recording in spans with detailed error messages and span status
- Trace context injection into outgoing NetBackup API requests
- Docker Compose example with OpenTelemetry Collector and Jaeger (docker-compose-otel.yaml)
- OpenTelemetry Collector configuration example (otel-collector-config.yaml)
- Comprehensive OpenTelemetry documentation in README and docs/opentelemetry-example.md
- Trace analysis guide at docs/trace-analysis-guide.md
- OpenTelemetry integration tests with mock collector
- OpenTelemetry benchmark tests for performance validation
- Multi-Version API Support: NetBackup 10.0 (API 3.0), 10.5 (API 12.0), and 11.0 (API 13.0)
- Automatic Version Detection: Intelligent fallback logic (13.0 → 12.0 → 3.0) with retry mechanism
- Content-Type Validation: Validates server responses are JSON before unmarshaling, preventing cryptic error messages
- apiVersion configuration field in nbuserver section (now optional, defaults to auto-detection)
- API version included in Accept header for all NetBackup API requests
- Version detector module (internal/exporter/version_detector.go) with exponential backoff retry logic
- nbu_api_version Prometheus metric to expose currently active API version
- Comprehensive test suites:
- Version detection integration tests with mock servers
- Backward compatibility tests for existing configurations
- End-to-end workflow tests for all API versions
- Performance validation tests
- Metrics consistency tests across versions
- API compatibility tests with real response fixtures
- Test fixtures for all API versions in testdata/api-versions/:
- jobs-response-v3.json, jobs-response-v12.json, jobs-response-v13.json
- storage-response-v3.json, storage-response-v12.json, storage-response-v13.json
- error-406-response.json for version incompatibility testing
- Comprehensive migration guide at docs/netbackup-11-migration.md with multiple deployment scenarios
- Configuration examples for all supported versions in docs/config-examples/
- Enhanced error handling for 406 (Not Acceptable) responses with troubleshooting guidance
- Optional fields in storage data model: storageCategory, replication capabilities, snapshot flags, WORM support
- Optional field in jobs data model: kilobytesDataTransferred
- Configuration validation with Config.Validate() method
- Helper methods for configuration: GetNBUBaseURL(), GetServerAddress(), GetScrapingDuration(), MaskAPIKey(), BuildURL()
- NbuClient structure for reusable HTTP client with connection pooling
- Context support throughout for proper cancellation and timeout handling
- Structured metric key types (StorageMetricKey, JobMetricKey, JobStatusKey)
- Health check endpoint at /health
- Server structure for better application lifecycle management
- Graceful shutdown with configurable timeout (10 seconds)
- insecureSkipVerify configuration option for TLS verification
- API key masking in debug logs for security
- ReadHeaderTimeout to HTTP server for security
- Comprehensive error context with fmt.Errorf wrapping
- Collection timeout (2 minutes) to prevent hanging scrapes
- Linting Configuration: Added .markdownlint.json and .prettierrc for consistent code formatting
- Markdown line length relaxed to 120 characters
- Tables excluded from line length checks
- Prettier configured with 120 character width and LF line endings
- Storage Metrics Caching: TTL-based caching for storage metrics reduces NetBackup API load
- Configurable cache TTL via cacheTTL field in config (default: 5 minutes)
- Uses patrickmn/go-cache for thread-safe caching with automatic expiration
- Cache hit returns instantly without API call, cache miss triggers fetch
- Metric HELP string documents caching behavior per Prometheus best practices
- Health Check with Connectivity Verification: Enhanced /health endpoint verifies NetBackup API connectivity
- Returns HTTP 200 when NBU API is reachable
- Returns HTTP 503 with "UNHEALTHY" message when NBU API is unreachable
- Uses lightweight version endpoint for connectivity test (5-second timeout)
- Returns "OK (starting)" during startup phase before collector initialization
- Staleness Tracking Metrics: New metrics for monitoring data freshness
- nbu_up: Gauge metric (1 = healthy, 0 = unhealthy) reflecting NBU API connectivity
- nbu_last_scrape_timestamp_seconds: Unix timestamp of last successful collection with source label
- Dynamic Configuration Reload: Configuration changes can be applied without restarting the exporter
- SafeConfig wrapper provides thread-safe config access via RWMutex
- SIGHUP signal handler triggers manual config reload
- fsnotify file watcher detects config file changes (watches directory for vim/emacs atomic saves)
- Storage cache automatically flushed when NBU server address changes
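The dynamic reload entries above describe an RWMutex-guarded config wrapper and a cache flush when the server address changes. A minimal sketch of that pattern follows. The Config fields, method names, and host names are assumptions for illustration, not the exporter's actual types.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a hypothetical, trimmed-down configuration for illustration.
type Config struct {
	NBUHost  string
	CacheTTL string
}

// SafeConfig guards a *Config with an RWMutex so that many readers
// (scrapes) can proceed concurrently while a reload takes the write lock.
type SafeConfig struct {
	mu  sync.RWMutex
	cfg *Config
}

// NewSafeConfig wraps an initial configuration.
func NewSafeConfig(c *Config) *SafeConfig { return &SafeConfig{cfg: c} }

// Get returns a copy of the current configuration under the read lock.
func (s *SafeConfig) Get() Config {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return *s.cfg
}

// Reload swaps in a new configuration and reports whether the NBU host
// changed, which a caller could use as the signal to flush cached data.
func (s *SafeConfig) Reload(next *Config) (hostChanged bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	hostChanged = s.cfg.NBUHost != next.NBUHost
	s.cfg = next
	return hostChanged
}

func main() {
	sc := NewSafeConfig(&Config{NBUHost: "nbu-a.example.com", CacheTTL: "5m"})
	if sc.Reload(&Config{NBUHost: "nbu-b.example.com", CacheTTL: "5m"}) {
		fmt.Println("server address changed: flush storage cache")
	}
	fmt.Println("active host:", sc.Get().NBUHost)
}
```

In a real deployment, Reload would be called from the SIGHUP handler or the file watcher callback.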
Changed
- BREAKING: Fixed typo scrappingInterval → scrapingInterval in configuration
- Accept header format now includes API version: application/vnd.netbackup+json;version=X.Y
- Data models updated to support optional fields across all API versions
- Default API version changed from "12.0" to "13.0" (with automatic fallback)
- apiVersion configuration field is now optional (previously required)
- Minimum NetBackup version requirement: 10.0+ (supports API versions 3.0, 12.0, and 13.0)
- Refactored main.go with Server structure for better separation of concerns
- Improved error handling - functions now return errors instead of calling os.Exit()
- HTTP client now reused across requests for better performance
- TLS verification now configurable (defaults to secure mode)
- Metric collection continues even if one source fails (storage or jobs)
- Improved godoc comments throughout codebase
- Better variable naming following Go conventions
- Centralized URL construction in Config.BuildURL()
- Enhanced logging with structured fields and debug mode support
- Code Quality Improvements: Comprehensive refactoring for maintainability and SonarCloud compliance
- Consolidated duplicate span creation logic into single reusable createSpan helper function
- Centralized all OpenTelemetry span attribute keys as constants in internal/telemetry/attributes.go
- Extracted complex error message templates to internal/telemetry/errors.go for easier maintenance
- Batched span attribute recording for improved performance (single SetAttributes call)
- Enhanced OpenTelemetry endpoint validation with format and port range checks
- Extracted complex conditional logic to named helper functions for clarity
- Renamed all test functions to follow Go conventions (removed underscores, e.g., TestNbuClient_GetHeaders → TestNbuClientGetHeaders)
- Eliminated duplicate string literals by extracting to named constants (API versions, content types, test configuration values)
- Reduced cognitive complexity in 8 test functions by extracting helper functions:
- TestNewNbuCollectorAutomaticDetection (complexity 24 → <15)
- TestAPIVersionDetectorIntegration (complexity 37 → <15)
- TestConfigValidateNbuServer (complexity 23 → <15)
- TestConfigValidateOpenTelemetry (complexity 16 → <15)
- TestConfigValidateServer (complexity 17 → <15)
- TestConfigGetNBUBaseURL (complexity 23 → <15)
- TestConfigValidate (complexity 23 → <15)
- createMockServerWithFile helper (complexity 22 → <15)
- Centralized test helper functions in internal/testutil package with fluent builder interfaces
- Added TestConfigBuilder for consistent test configuration creation
- Added MockServerBuilder for simplified mock HTTP server setup
- Enhanced error messages with additional context (URL, status code, content-type, response preview)
- Added comprehensive package-level documentation with usage examples
- All code now passes SonarCloud "Sonar Way" quality profile checks
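For reference, the versioned Accept header format listed above can be built as follows. This is a small sketch: acceptHeader is a hypothetical helper name, and only the header format itself comes from this changelog.

```go
package main

import "fmt"

// acceptHeader is a hypothetical helper that builds the versioned Accept
// header value in the format application/vnd.netbackup+json;version=X.Y.
func acceptHeader(apiVersion string) string {
	return fmt.Sprintf("application/vnd.netbackup+json;version=%s", apiVersion)
}

func main() {
	// One header value per supported API version.
	for _, v := range []string{"13.0", "12.0", "3.0"} {
		fmt.Println(acceptHeader(v))
	}
}
```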
Removed
- cmd.go - Unused ConfigCommand structure
- debug.go - File containing only commented-out code
- Unused variables programName and nbuRoot from package level
- Obvious and redundant code comments
- Direct os.Exit() calls from utility functions
Fixed
- HTML Response Handling: Server returning HTML instead of JSON now produces clear error messages instead of cryptic "invalid character '<'" errors
- Error handling in ReadFile() - now returns errors properly
- Missing error checks in metric collection
- Resource leaks from not reusing HTTP clients
- Potential Slowloris attack vector (mitigated with ReadHeaderTimeout)
- Configuration validation - ports, schemes, and durations now validated
- Graceful shutdown - now uses proper context with timeout
- JSON unmarshaling errors now include response preview for easier debugging
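The HTML-response and response-preview fixes above amount to checking the Content-Type before decoding and attaching a short body preview to any error. A minimal sketch of that idea follows. decodeJSON and preview are hypothetical helpers, and the 80-byte preview length is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"mime"
	"strings"
)

// preview returns at most n bytes of the body for use in error messages.
func preview(b []byte, n int) string {
	if len(b) > n {
		return string(b[:n]) + "..."
	}
	return string(b)
}

// decodeJSON is a hypothetical helper: it rejects non-JSON content types
// before unmarshaling and adds a body preview to every error it returns.
func decodeJSON(contentType string, body []byte, out any) error {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return fmt.Errorf("invalid Content-Type %q: %w", contentType, err)
	}
	// Accept "application/json" as well as "+json" structured suffixes.
	if !strings.HasSuffix(mediaType, "json") {
		return fmt.Errorf("expected JSON but got %q (preview: %q)", mediaType, preview(body, 80))
	}
	if err := json.Unmarshal(body, out); err != nil {
		return fmt.Errorf("decoding JSON (preview: %q): %w", preview(body, 80), err)
	}
	return nil
}

func main() {
	var v map[string]any

	// An HTML error page is rejected with a readable message.
	err := decodeJSON("text/html; charset=utf-8", []byte("<html><body>Login</body></html>"), &v)
	fmt.Println("html:", err)

	// A versioned NetBackup-style JSON response decodes normally.
	err = decodeJSON("application/vnd.netbackup+json;version=13.0", []byte(`{"data":[]}`), &v)
	fmt.Println("json:", err)
}
```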
Security
- TLS certificate verification now configurable and secure by default
- API keys masked in logs to prevent accidental exposure
- Added ReadHeaderTimeout to prevent slow header attacks
- Proper error context without exposing sensitive information
Performance
- HTTP client connection pooling reduces overhead
- Context-aware operations allow early cancellation
- Reduced allocations from client reuse
- Better resource cleanup with proper context handling
- Batched Job Pagination: Jobs fetched in batches of 100 per API call (~100x fewer requests for large job sets)
- Parallel Metric Collection: Storage and job metrics collected concurrently with errgroup, reducing scrape time to max(storage_time, jobs_time) instead of sum
- Pre-allocation Capacity Hints: Maps and slices pre-allocated with expected sizes (100 job metrics, 50 status metrics) to reduce GC pressure
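The parallel collection entry above can be illustrated with a short sketch. To stay within the standard library, it uses sync.WaitGroup rather than errgroup, and it keeps each source's error separate so that one failing source does not hide the other. All names here are hypothetical, and the sleeps stand in for API calls.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errJobsUnavailable is a stand-in error used only by this example.
var errJobsUnavailable = errors.New("jobs API unavailable")

// collectAll runs both collectors concurrently and waits for both to
// finish, so wall-clock time is roughly the slower of the two calls.
func collectAll(storage, jobs func() error) (storageErr, jobsErr error) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		storageErr = storage()
	}()
	go func() {
		defer wg.Done()
		jobsErr = jobs()
	}()
	wg.Wait() // both goroutines have finished writing their results
	return storageErr, jobsErr
}

func main() {
	start := time.Now()
	sErr, jErr := collectAll(
		func() error { time.Sleep(80 * time.Millisecond); return nil },
		func() error { time.Sleep(50 * time.Millisecond); return errJobsUnavailable },
	)
	fmt.Println("storage error:", sErr)
	fmt.Println("jobs error:", jErr)
	fmt.Println("elapsed:", time.Since(start).Round(10*time.Millisecond))
}
```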
Migration Notes
OpenTelemetry Distributed Tracing (Optional)
New Feature: This version adds optional OpenTelemetry distributed tracing for diagnosing slow scrapes and identifying performance bottlenecks.
Quick Start:
- Add OpenTelemetry configuration to your config.yaml:
opentelemetry:
  enabled: true
  endpoint: "localhost:4317" # Your OTLP collector endpoint
  insecure: true # Use false for production with TLS
  samplingRate: 0.1 # Sample 10% of scrapes
- Deploy OpenTelemetry Collector (see docker-compose-otel.yaml for example)
- Restart the exporter
- View traces in Jaeger, Tempo, or your observability backend
Backward Compatibility:
- OpenTelemetry is completely optional - existing deployments work without any changes
- When disabled or not configured, zero tracing overhead
- All existing Prometheus metrics remain unchanged
- No impact on scrape performance when disabled
Use Cases:
- Diagnose slow NetBackup API calls
- Identify performance bottlenecks in scrape cycles
- Track request flows through the exporter
- Correlate logs with traces for troubleshooting
- Monitor API version detection performance
See docs/opentelemetry-example.md for complete setup guide and docs/trace-analysis-guide.md for trace analysis examples.
Multi-Version API Support
Important: This version adds support for NetBackup 10.0, 10.5, and 11.0 with automatic version detection. See docs/netbackup-11-migration.md for complete migration guide.
Quick Start:
- Ensure NetBackup server is version 10.0 or later
- Optional: Remove or update apiVersion field in config.yaml to enable automatic detection
- Restart the exporter - it will automatically detect the highest supported API version
- Verify detected version in logs: INFO: Detected NetBackup API version: X.Y
- Verify metrics collection
Backward Compatibility:
- Existing configurations with explicit apiVersion continue to work without changes
- Existing configurations without apiVersion will now auto-detect (previously defaulted to "12.0")
- No breaking changes to Prometheus metrics - all metric names and labels remain consistent
- NetBackup 11.0 maintains backward compatibility with API versions 12.0 and 3.0
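The fallback order used by automatic detection (13.0 → 12.0 → 3.0) can be sketched as a simple loop over candidate versions. This is an illustration under assumptions: probe is a hypothetical callback standing in for a real API request, and the retry with exponential backoff mentioned in this release is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotAcceptable is a stand-in for an HTTP 406 response in this example.
var errNotAcceptable = errors.New("406 Not Acceptable")

// candidateVersions lists API versions from newest to oldest.
var candidateVersions = []string{"13.0", "12.0", "3.0"}

// detectVersion returns the first (highest) version that probe accepts.
// probe is hypothetical: a real one would send a request carrying the
// versioned Accept header and return an error if the server rejects it.
func detectVersion(probe func(version string) error) (string, error) {
	var lastErr error
	for _, v := range candidateVersions {
		if err := probe(v); err != nil {
			lastErr = fmt.Errorf("version %s: %w", v, err)
			continue // fall back to the next older version
		}
		return v, nil
	}
	return "", fmt.Errorf("no supported API version found: %w", lastErr)
}

func main() {
	// Simulate a server that rejects 13.0 but accepts older versions.
	version, err := detectVersion(func(v string) error {
		if v == "13.0" {
			return errNotAcceptable
		}
		return nil
	})
	fmt.Println(version, err)
}
```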
Configuration File Changes
Update your config.yaml:
# Change this:
server:
  scrappingInterval: "1h"
# To this:
server:
  scrapingInterval: "1h"
# API version configuration (all options are valid):
# Option 1: Automatic detection (recommended)
nbuserver:
  # apiVersion not specified - will auto-detect
# Option 2: Explicit version for NetBackup 11.0
nbuserver:
  apiVersion: "13.0"
# Option 3: Explicit version for NetBackup 10.5
nbuserver:
  apiVersion: "12.0"
# Option 4: Explicit version for NetBackup 10.0-10.4
nbuserver:
  apiVersion: "3.0"
# Optionally add (defaults to false if omitted):
nbuserver:
  insecureSkipVerify: false
Testing Your Migration
# Validate configuration
./bin/nbu_exporter --config config.yaml --debug
# Check health endpoint
curl http://localhost:2112/health
# Check metrics endpoint
curl http://localhost:2112/metrics
# Verify detected API version
curl http://localhost:2112/metrics | grep nbu_api_version
# Check logs for version detection
tail -f log/nbu-exporter.log | grep "Detected NetBackup API version"
Test Coverage
This release includes comprehensive test coverage:
- 80%+ code coverage for API client and version detection modules
- Unit tests for all three API versions (3.0, 12.0, 13.0)
- Integration tests with mock NetBackup servers
- Backward compatibility tests for existing configurations
- End-to-end workflow tests including fallback scenarios
- Performance validation tests
- Metrics consistency tests across all versions
[2.1.0] - 2025-11-09
Code quality improvements and refactoring for maintainability and SonarCloud compliance.
[2.0.0] - 2025-11-09
OpenTelemetry integration and distributed tracing support.
[1.2.2] - 2025-11-08
Patch release.
[1.2.1] - 2025-11-08
GitHub Actions workflow for GitHub Pages deployment.
[1.2.0] - 2025-11-08
API 11.0 integration support.
[1.1.0] - 2025-11-08
Support for API 10.5 of NetBackup.
[1.0.0] - 2025-11-08
Initial build with test server support.