CHORUS Metrics Package
Overview
The pkg/metrics package provides comprehensive Prometheus-based metrics collection for the CHORUS distributed system. It exposes detailed operational metrics across all system components including P2P networking, DHT operations, PubSub messaging, elections, task management, and resource utilization.
Architecture
Core Components
- CHORUSMetrics: Central metrics collector managing all Prometheus metrics
- Prometheus Registry: Custom registry for metric collection
- HTTP Server: Exposes metrics endpoint for scraping
- Background Collectors: Periodic system and resource metric collection
Metric Types
The package uses three Prometheus metric types:
- Counter: Monotonically increasing values (e.g., total messages sent)
- Gauge: Values that can go up or down (e.g., connected peers)
- Histogram: Distribution of values with configurable buckets (e.g., latency measurements)
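As a minimal sketch of how these three types map onto the underlying Prometheus client library (the variable names below are illustrative, not the package's internals; the metric names and labels match the catalog later in this document):

import "github.com/prometheus/client_golang/prometheus"

registry := prometheus.NewRegistry()

// Counter: total messages sent, only ever increases
messagesSent := prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "chorus_p2p_messages_sent_total",
        Help: "Total number of P2P messages sent",
    },
    []string{"message_type", "peer_id"},
)

// Gauge: connected peers, moves up and down
connectedPeers := prometheus.NewGauge(prometheus.GaugeOpts{
    Name: "chorus_p2p_connected_peers",
    Help: "Number of connected P2P peers",
})

// Histogram: latency distribution with explicit buckets
messageLatency := prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "chorus_p2p_message_latency_seconds",
        Help:    "P2P message round-trip latency",
        Buckets: []float64{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0},
    },
    []string{"message_type"},
)

registry.MustRegister(messagesSent, connectedPeers, messageLatency)

messagesSent.WithLabelValues("task_assignment", "peer-abc123").Inc()
connectedPeers.Set(5)
messageLatency.WithLabelValues("task_assignment").Observe(0.042)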
Configuration
MetricsConfig
type MetricsConfig struct {
// HTTP server configuration
ListenAddr string // Default: ":9090"
MetricsPath string // Default: "/metrics"
// Histogram buckets
LatencyBuckets []float64 // Default: 0.001s to 10s
SizeBuckets []float64 // Default: 64B to 16MB
// Node identification labels
NodeID string // Unique node identifier
Version string // CHORUS version
Environment string // deployment environment (dev/staging/prod)
Cluster string // cluster identifier
// Collection intervals
SystemMetricsInterval time.Duration // Default: 30s
ResourceMetricsInterval time.Duration // Default: 15s
}
Default Configuration
config := metrics.DefaultMetricsConfig()
// Returns:
// - ListenAddr: ":9090"
// - MetricsPath: "/metrics"
// - LatencyBuckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
// - SizeBuckets: [64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216]
// - SystemMetricsInterval: 30s
// - ResourceMetricsInterval: 15s
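To override specific defaults, set fields on the returned config before passing it to the collector (a sketch that uses only the fields documented above):

config := metrics.DefaultMetricsConfig()
config.ListenAddr = ":9091"                              // expose metrics on a different port
config.LatencyBuckets = []float64{0.01, 0.1, 1.0, 10.0}  // coarser latency buckets
config.SystemMetricsInterval = 60 * time.Second          // collect system metrics less often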
Metrics Catalog
System Metrics
chorus_system_info
Type: Gauge
Description: System information with version labels
Labels: node_id, version, go_version, cluster, environment
Value: Always 1 when present
chorus_uptime_seconds
Type: Gauge
Description: System uptime in seconds since start
Value: Current uptime in seconds
P2P Network Metrics
chorus_p2p_connected_peers
Type: Gauge
Description: Number of currently connected P2P peers
Value: Current peer count
chorus_p2p_messages_sent_total
Type: Counter
Description: Total number of P2P messages sent
Labels: message_type, peer_id
Usage: Track outbound message volume per type and destination
chorus_p2p_messages_received_total
Type: Counter
Description: Total number of P2P messages received
Labels: message_type, peer_id
Usage: Track inbound message volume per type and source
chorus_p2p_message_latency_seconds
Type: Histogram
Description: P2P message round-trip latency distribution
Labels: message_type
Buckets: Configurable latency buckets (default: 1ms to 10s)
chorus_p2p_connection_duration_seconds
Type: Histogram
Description: Duration of P2P connections
Labels: peer_id
Usage: Track connection stability
chorus_p2p_peer_score
Type: Gauge
Description: Peer quality score
Labels: peer_id
Value: Score between 0.0 (poor) and 1.0 (excellent)
DHT (Distributed Hash Table) Metrics
chorus_dht_put_operations_total
Type: Counter
Description: Total number of DHT put operations
Labels: status (success/failure)
Usage: Track DHT write operations
chorus_dht_get_operations_total
Type: Counter
Description: Total number of DHT get operations
Labels: status (success/failure)
Usage: Track DHT read operations
chorus_dht_operation_latency_seconds
Type: Histogram
Description: DHT operation latency distribution
Labels: operation (put/get), status (success/failure)
Usage: Monitor DHT performance
chorus_dht_provider_records
Type: Gauge
Description: Number of provider records stored in DHT
Value: Current provider record count
chorus_dht_content_keys
Type: Gauge
Description: Number of content keys stored in DHT
Value: Current content key count
chorus_dht_replication_factor
Type: Gauge
Description: Replication factor for DHT keys
Labels: key_hash
Value: Number of replicas for specific keys
chorus_dht_cache_hits_total
Type: Counter
Description: DHT cache hit count
Labels: cache_type
Usage: Monitor DHT caching effectiveness
chorus_dht_cache_misses_total
Type: Counter
Description: DHT cache miss count
Labels: cache_type
Usage: Monitor DHT caching effectiveness
PubSub Messaging Metrics
chorus_pubsub_topics
Type: Gauge
Description: Number of active PubSub topics
Value: Current topic count
chorus_pubsub_subscribers
Type: Gauge
Description: Number of subscribers per topic
Labels: topic
Value: Subscriber count for each topic
chorus_pubsub_messages_total
Type: Counter
Description: Total PubSub messages
Labels: topic, direction (sent/received), message_type
Usage: Track message volume per topic
chorus_pubsub_message_latency_seconds
Type: Histogram
Description: PubSub message delivery latency
Labels: topic
Usage: Monitor message propagation performance
chorus_pubsub_message_size_bytes
Type: Histogram
Description: PubSub message size distribution
Labels: topic
Buckets: Configurable size buckets (default: 64B to 16MB)
Election System Metrics
chorus_election_term
Type: Gauge
Description: Current election term number
Value: Monotonically increasing term number
chorus_election_state
Type: Gauge
Description: Current election state (1 for active state, 0 for others)
Labels: state (idle/discovering/electing/reconstructing/complete)
Usage: Only one state should have value 1 at any time
chorus_heartbeats_sent_total
Type: Counter
Description: Total number of heartbeats sent by this node
Usage: Monitor leader heartbeat activity
chorus_heartbeats_received_total
Type: Counter
Description: Total number of heartbeats received from leader
Usage: Monitor follower connectivity to leader
chorus_leadership_changes_total
Type: Counter
Description: Total number of leadership changes
Usage: Monitor election stability (lower is better)
chorus_leader_uptime_seconds
Type: Gauge
Description: Current leader's tenure duration
Value: Seconds since current leader was elected
chorus_election_latency_seconds
Type: Histogram
Description: Time taken to complete election process
Usage: Monitor election efficiency
Health Monitoring Metrics
chorus_health_checks_passed_total
Type: Counter
Description: Total number of health checks passed
Labels: check_name
Usage: Track health check success rate
chorus_health_checks_failed_total
Type: Counter
Description: Total number of health checks failed
Labels: check_name, reason
Usage: Track health check failures and reasons
chorus_health_check_duration_seconds
Type: Histogram
Description: Health check execution duration
Labels: check_name
Usage: Monitor health check performance
chorus_system_health_score
Type: Gauge
Description: Overall system health score
Value: 0.0 (unhealthy) to 1.0 (healthy)
Usage: Monitor overall system health
chorus_component_health_score
Type: Gauge
Description: Component-specific health score
Labels: component
Value: 0.0 (unhealthy) to 1.0 (healthy)
Usage: Track individual component health
Task Management Metrics
chorus_tasks_active
Type: Gauge
Description: Number of currently active tasks
Value: Current active task count
chorus_tasks_queued
Type: Gauge
Description: Number of queued tasks awaiting execution
Value: Current queue depth
chorus_tasks_completed_total
Type: Counter
Description: Total number of completed tasks
Labels: status (success/failure), task_type
Usage: Track task completion and success rate
chorus_task_duration_seconds
Type: Histogram
Description: Task execution duration distribution
Labels: task_type, status
Usage: Monitor task performance
chorus_task_queue_wait_time_seconds
Type: Histogram
Description: Time tasks spend in queue before execution
Usage: Monitor task scheduling efficiency
SLURP (Context Generation) Metrics
chorus_slurp_contexts_generated_total
Type: Counter
Description: Total number of SLURP contexts generated
Labels: role, status (success/failure)
Usage: Track context generation volume
chorus_slurp_generation_time_seconds
Type: Histogram
Description: Time taken to generate SLURP contexts
Buckets: [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0, 60.0, 120.0]
Usage: Monitor context generation performance
chorus_slurp_queue_length
Type: Gauge
Description: Length of SLURP generation queue
Value: Current queue depth
chorus_slurp_active_jobs
Type: Gauge
Description: Number of active SLURP generation jobs
Value: Currently running generation jobs
chorus_slurp_leadership_events_total
Type: Counter
Description: SLURP-related leadership events
Usage: Track leader-initiated context generation
SHHH (Secret Sentinel) Metrics
chorus_shhh_findings_total
Type: Counter
Description: Total number of SHHH redaction findings
Labels: rule, severity (low/medium/high/critical)
Usage: Monitor secret detection effectiveness
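For example, the rate of findings broken down by severity can be watched with a query such as:
# SHHH findings per second, by severity
sum by (severity) (rate(chorus_shhh_findings_total[5m]))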
UCXI (Protocol Resolution) Metrics
chorus_ucxi_requests_total
Type: Counter
Description: Total number of UCXI protocol requests
Labels: method, status (success/failure)
Usage: Track UCXI usage and success rate
chorus_ucxi_resolution_latency_seconds
Type: Histogram
Description: UCXI address resolution latency
Usage: Monitor resolution performance
chorus_ucxi_cache_hits_total
Type: Counter
Description: UCXI cache hit count
Usage: Monitor caching effectiveness
chorus_ucxi_cache_misses_total
Type: Counter
Description: UCXI cache miss count
Usage: Monitor caching effectiveness
chorus_ucxi_content_size_bytes
Type: Histogram
Description: Size of resolved UCXI content
Usage: Monitor content distribution
Resource Utilization Metrics
chorus_cpu_usage_ratio
Type: Gauge
Description: CPU usage ratio
Value: 0.0 (idle) to 1.0 (fully utilized)
chorus_memory_usage_bytes
Type: Gauge
Description: Memory usage in bytes
Value: Current memory consumption
chorus_disk_usage_ratio
Type: Gauge
Description: Disk usage ratio
Labels: mount_point
Value: 0.0 (empty) to 1.0 (full)
chorus_network_bytes_in_total
Type: Counter
Description: Total bytes received from network
Usage: Track inbound network traffic
chorus_network_bytes_out_total
Type: Counter
Description: Total bytes sent to network
Usage: Track outbound network traffic
chorus_goroutines
Type: Gauge
Description: Number of active goroutines
Value: Current goroutine count
Error Metrics
chorus_errors_total
Type: Counter
Description: Total number of errors
Labels: component, error_type
Usage: Track error frequency by component and type
chorus_panics_total
Type: Counter
Description: Total number of panics recovered
Usage: Monitor system stability
Usage Examples
Basic Initialization
import "chorus/pkg/metrics"
// Create metrics collector with default config
config := metrics.DefaultMetricsConfig()
config.NodeID = "chorus-node-01"
config.Version = "v1.0.0"
config.Environment = "production"
config.Cluster = "cluster-01"
metricsCollector := metrics.NewCHORUSMetrics(config)
// Start metrics HTTP server
if err := metricsCollector.StartServer(config); err != nil {
log.Fatalf("Failed to start metrics server: %v", err)
}
// Start background metric collection
metricsCollector.CollectMetrics(config)
Recording P2P Metrics
// Update peer count
metricsCollector.SetConnectedPeers(5)
// Record message sent
metricsCollector.IncrementMessagesSent("task_assignment", "peer-abc123")
// Record message received
metricsCollector.IncrementMessagesReceived("task_result", "peer-def456")
// Record message latency
startTime := time.Now()
// ... send message and wait for response ...
latency := time.Since(startTime)
metricsCollector.ObserveMessageLatency("task_assignment", latency)
Recording DHT Metrics
// Record DHT put operation
startTime := time.Now()
err := dht.Put(key, value)
latency := time.Since(startTime)
if err != nil {
metricsCollector.IncrementDHTPutOperations("failure")
metricsCollector.ObserveDHTOperationLatency("put", "failure", latency)
} else {
metricsCollector.IncrementDHTPutOperations("success")
metricsCollector.ObserveDHTOperationLatency("put", "success", latency)
}
// Update DHT statistics
metricsCollector.SetDHTProviderRecords(150)
metricsCollector.SetDHTContentKeys(450)
metricsCollector.SetDHTReplicationFactor("key-hash-123", 3.0)
Recording PubSub Metrics
// Update topic count
metricsCollector.SetPubSubTopics(10)
// Record message published
metricsCollector.IncrementPubSubMessages("CHORUS/tasks/v1", "sent", "task_created")
// Record message received
metricsCollector.IncrementPubSubMessages("CHORUS/tasks/v1", "received", "task_completed")
// Record message latency
startTime := time.Now()
// ... publish message and wait for delivery confirmation ...
latency := time.Since(startTime)
metricsCollector.ObservePubSubMessageLatency("CHORUS/tasks/v1", latency)
Recording Election Metrics
// Update election state
metricsCollector.SetElectionTerm(42)
metricsCollector.SetElectionState("idle")
// Record heartbeat sent (leader)
metricsCollector.IncrementHeartbeatsSent()
// Record heartbeat received (follower)
metricsCollector.IncrementHeartbeatsReceived()
// Record leadership change
metricsCollector.IncrementLeadershipChanges()
Recording Health Metrics
// Record health check success
metricsCollector.IncrementHealthCheckPassed("database-connectivity")
// Record health check failure
metricsCollector.IncrementHealthCheckFailed("p2p-connectivity", "no_peers")
// Update health scores
metricsCollector.SetSystemHealthScore(0.95)
metricsCollector.SetComponentHealthScore("dht", 0.98)
metricsCollector.SetComponentHealthScore("pubsub", 0.92)
Recording Task Metrics
// Update task counts
metricsCollector.SetActiveTasks(5)
metricsCollector.SetQueuedTasks(12)
// Record task completion
startTime := time.Now()
// ... execute task ...
duration := time.Since(startTime)
metricsCollector.IncrementTasksCompleted("success", "data_processing")
metricsCollector.ObserveTaskDuration("data_processing", "success", duration)
Recording SLURP Metrics
// Record context generation
startTime := time.Now()
// ... generate SLURP context ...
duration := time.Since(startTime)
metricsCollector.IncrementSLURPGenerated("admin", "success")
metricsCollector.ObserveSLURPGenerationTime(duration)
// Update queue length
metricsCollector.SetSLURPQueueLength(3)
Recording SHHH Metrics
// Record secret findings
findings := scanForSecrets(content)
for _, finding := range findings {
metricsCollector.IncrementSHHHFindings(finding.Rule, finding.Severity, 1)
}
Recording Resource Metrics
import "runtime"
// Get runtime stats
var memStats runtime.MemStats
runtime.ReadMemStats(&memStats)
metricsCollector.SetMemoryUsage(float64(memStats.Alloc))
metricsCollector.SetGoroutines(runtime.NumGoroutine())
// Record system resource usage
metricsCollector.SetCPUUsage(0.45) // 45% CPU usage
metricsCollector.SetDiskUsage("/var/lib/CHORUS", 0.73) // 73% disk usage
Recording Errors
// Record error occurrence
if err != nil {
metricsCollector.IncrementErrors("dht", "timeout")
}
// Record recovered panic
defer func() {
if r := recover(); r != nil {
metricsCollector.IncrementPanics()
// Handle panic...
}
}()
Prometheus Integration
Scrape Configuration
Add the following to your prometheus.yml:
scrape_configs:
- job_name: 'chorus-nodes'
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: '/metrics'
static_configs:
- targets:
- 'chorus-node-01:9090'
- 'chorus-node-02:9090'
- 'chorus-node-03:9090'
relabel_configs:
- source_labels: [__address__]
target_label: instance
- source_labels: [__address__]
regex: '([^:]+):.*'
target_label: node
replacement: '${1}'
Example Queries
P2P Network Health
# Average connected peers across cluster
avg(chorus_p2p_connected_peers)
# Message rate per second
rate(chorus_p2p_messages_sent_total[5m])
# 95th percentile message latency
histogram_quantile(0.95, rate(chorus_p2p_message_latency_seconds_bucket[5m]))
DHT Performance
# DHT operation success rate
rate(chorus_dht_get_operations_total{status="success"}[5m]) /
rate(chorus_dht_get_operations_total[5m])
# Average DHT operation latency
rate(chorus_dht_operation_latency_seconds_sum[5m]) /
rate(chorus_dht_operation_latency_seconds_count[5m])
# DHT cache hit rate
rate(chorus_dht_cache_hits_total[5m]) /
(rate(chorus_dht_cache_hits_total[5m]) + rate(chorus_dht_cache_misses_total[5m]))
Election Stability
# Leadership changes per hour
rate(chorus_leadership_changes_total[1h]) * 3600
# Nodes by election state
sum by (state) (chorus_election_state)
# Heartbeat rate
rate(chorus_heartbeats_sent_total[5m])
Task Management
# Task success rate
rate(chorus_tasks_completed_total{status="success"}[5m]) /
rate(chorus_tasks_completed_total[5m])
# Average task duration
histogram_quantile(0.50, rate(chorus_task_duration_seconds_bucket[5m]))
# Task queue depth
chorus_tasks_queued
Resource Utilization
# CPU usage by node
chorus_cpu_usage_ratio
# Memory usage by node
chorus_memory_usage_bytes / (1024 * 1024 * 1024) # Convert to GB
# Disk usage alert (>90%)
chorus_disk_usage_ratio > 0.9
System Health
# Overall system health score
chorus_system_health_score
# Component health scores
chorus_component_health_score
# Health check failure rate
rate(chorus_health_checks_failed_total[5m])
Alerting Rules
Example Prometheus alerting rules for CHORUS:
groups:
- name: chorus_alerts
interval: 30s
rules:
# P2P connectivity alerts
- alert: LowPeerCount
expr: chorus_p2p_connected_peers < 2
for: 5m
labels:
severity: warning
annotations:
summary: "Low P2P peer count on {{ $labels.instance }}"
description: "Node has {{ $value }} peers (minimum: 2)"
# DHT performance alerts
- alert: HighDHTFailureRate
expr: |
rate(chorus_dht_get_operations_total{status="failure"}[5m]) /
rate(chorus_dht_get_operations_total[5m]) > 0.1
for: 10m
labels:
severity: warning
annotations:
summary: "High DHT failure rate on {{ $labels.instance }}"
description: "DHT failure rate: {{ $value | humanizePercentage }}"
# Election stability alerts
- alert: FrequentLeadershipChanges
expr: rate(chorus_leadership_changes_total[1h]) * 3600 > 5
for: 15m
labels:
severity: warning
annotations:
summary: "Frequent leadership changes"
description: "{{ $value }} leadership changes per hour"
# Task management alerts
- alert: HighTaskQueueDepth
expr: chorus_tasks_queued > 100
for: 10m
labels:
severity: warning
annotations:
summary: "High task queue depth on {{ $labels.instance }}"
description: "{{ $value }} tasks queued"
# Resource alerts
- alert: HighMemoryUsage
expr: chorus_memory_usage_bytes > 8 * 1024 * 1024 * 1024 # 8GB
for: 5m
labels:
severity: warning
annotations:
summary: "High memory usage on {{ $labels.instance }}"
description: "Memory usage: {{ $value | humanize1024 }}B"
- alert: HighDiskUsage
expr: chorus_disk_usage_ratio > 0.9
for: 10m
labels:
severity: critical
annotations:
summary: "High disk usage on {{ $labels.instance }}"
description: "Disk usage: {{ $value | humanizePercentage }}"
# Health monitoring alerts
- alert: LowSystemHealth
expr: chorus_system_health_score < 0.75
for: 5m
labels:
severity: warning
annotations:
summary: "Low system health score on {{ $labels.instance }}"
description: "Health score: {{ $value }}"
- alert: ComponentUnhealthy
expr: chorus_component_health_score < 0.5
for: 10m
labels:
severity: warning
annotations:
summary: "Component {{ $labels.component }} unhealthy"
description: "Health score: {{ $value }}"
HTTP Endpoints
Metrics Endpoint
URL: /metrics
Method: GET
Description: Prometheus metrics in text exposition format
Response Format:
# HELP chorus_p2p_connected_peers Number of connected P2P peers
# TYPE chorus_p2p_connected_peers gauge
chorus_p2p_connected_peers 5
# HELP chorus_dht_put_operations_total Total number of DHT put operations
# TYPE chorus_dht_put_operations_total counter
chorus_dht_put_operations_total{status="success"} 1523
chorus_dht_put_operations_total{status="failure"} 12
# HELP chorus_task_duration_seconds Task execution duration
# TYPE chorus_task_duration_seconds histogram
chorus_task_duration_seconds_bucket{task_type="data_processing",status="success",le="0.001"} 0
chorus_task_duration_seconds_bucket{task_type="data_processing",status="success",le="0.005"} 12
chorus_task_duration_seconds_bucket{task_type="data_processing",status="success",le="0.01"} 45
...
Health Endpoint
URL: /health
Method: GET
Description: Basic health check for metrics server
Response: 200 OK with body OK
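A minimal way to probe this endpoint from Go (a sketch using only the standard library; adjust the address to match your ListenAddr):

resp, err := http.Get("http://localhost:9090/health")
if err != nil {
    log.Printf("metrics server unreachable: %v", err)
} else {
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        log.Printf("metrics server unhealthy: HTTP %d", resp.StatusCode)
    }
}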
Best Practices
Metric Naming
- Use descriptive metric names with the chorus_ prefix
- Follow Prometheus naming conventions: component_metric_unit
- Use the _total suffix for counters
- Use the _seconds suffix for time measurements
- Use the _bytes suffix for size measurements
Label Usage
- Keep label cardinality low (avoid high-cardinality labels like request IDs)
- Use consistent label names across metrics
- Document label meanings and expected values
- Avoid labels that change frequently
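These guidelines can be illustrated with the helper methods documented above (a sketch; peerID is assumed to be a peer identifier string, and the commented-out call shows the anti-pattern):

// Good: message types and task types form small, bounded sets of label values
metricsCollector.IncrementMessagesSent("task_assignment", peerID)
metricsCollector.IncrementTasksCompleted("success", "data_processing")

// Avoid: unbounded values such as task or request IDs create one time series
// per value and inflate memory in both CHORUS and Prometheus.
// metricsCollector.IncrementTasksCompleted("success", taskID)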
Performance Considerations
- Metrics collection is lock-free for read operations
- Histogram observations are optimized for high throughput
- Background collectors run on separate goroutines
- Custom registry prevents pollution of default registry
Error Handling
- Metrics collection should never panic
- Failed metric updates should be logged but not block operations
- Use nil checks before accessing metrics collectors
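A defensive wrapper consistent with these guidelines might look like this (a sketch; the function name is illustrative):

func recordDHTTimeout(m *metrics.CHORUSMetrics) {
    // Never let metrics reporting take down the caller.
    defer func() {
        if r := recover(); r != nil {
            log.Printf("metrics update panicked: %v", r)
        }
    }()

    // Metrics are optional; skip silently when no collector is configured.
    if m == nil {
        return
    }
    m.IncrementErrors("dht", "timeout")
}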
Testing
func TestMetrics(t *testing.T) {
config := metrics.DefaultMetricsConfig()
config.NodeID = "test-node"
m := metrics.NewCHORUSMetrics(config)
// Test metric updates
m.SetConnectedPeers(5)
m.IncrementMessagesSent("test", "peer1")
// Verify metrics are collected
// (Use prometheus testutil for verification)
}
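If the package does not expose its underlying collectors for direct inspection, one option is to scrape the HTTP endpoint from the test (a sketch; it assumes StartServer has the listener ready shortly after returning, hence the short sleep):

import (
    "io"
    "net/http"
    "strings"
    "testing"
    "time"

    "chorus/pkg/metrics"
)

func TestMetricsEndpoint(t *testing.T) {
    config := metrics.DefaultMetricsConfig()
    config.NodeID = "test-node"
    config.ListenAddr = "127.0.0.1:19090" // avoid clashing with other services

    m := metrics.NewCHORUSMetrics(config)
    if err := m.StartServer(config); err != nil {
        t.Fatalf("failed to start metrics server: %v", err)
    }

    m.SetConnectedPeers(3)
    time.Sleep(100 * time.Millisecond) // give the server a moment to come up

    resp, err := http.Get("http://127.0.0.1:19090/metrics")
    if err != nil {
        t.Fatalf("failed to scrape metrics endpoint: %v", err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    if !strings.Contains(string(body), "chorus_p2p_connected_peers") {
        t.Errorf("expected chorus_p2p_connected_peers in metrics output")
    }
}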
Troubleshooting
Metrics Not Appearing
- Verify the metrics server is running: curl http://localhost:9090/metrics
- Check configuration: ensure correct ListenAddr and MetricsPath
- Verify the Prometheus scrape configuration
- Check for errors in application logs
High Memory Usage
- Review label cardinality (check for unbounded label values)
- Adjust histogram buckets if too granular
- Reduce metric collection frequency
- Consider metric retention policies in Prometheus
Missing Metrics
- Ensure metric is being updated by application code
- Verify metric registration in initializeMetrics()
- Check for race conditions in metric access
- Review metric type compatibility (Counter vs Gauge vs Histogram)
Migration Guide
From Default Prometheus Registry
// Old approach
prometheus.MustRegister(myCounter)
// New approach
config := metrics.DefaultMetricsConfig()
m := metrics.NewCHORUSMetrics(config)
// Use m.IncrementErrors(...) instead of direct counter access
Adding New Metrics
- Add a metric field to the CHORUSMetrics struct
- Initialize the metric in the initializeMetrics() method
- Add helper methods for updating the metric
- Document the metric in this file
- Add Prometheus queries and alerts as needed
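A hypothetical end-to-end sketch of the first three steps (the metric, field, and registry names below are illustrative and do not reflect the package's actual internals):

// 1. Add a metric field to the CHORUSMetrics struct
type CHORUSMetrics struct {
    // ... existing fields ...
    queueRebalancesTotal prometheus.Counter // hypothetical new metric
}

// 2. Initialize and register it in initializeMetrics()
func (m *CHORUSMetrics) initializeMetrics() {
    // ... existing initialization ...
    m.queueRebalancesTotal = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "chorus_queue_rebalances_total",
        Help: "Total number of task queue rebalance operations",
    })
    m.registry.MustRegister(m.queueRebalancesTotal) // assumes the custom registry lives in a field named "registry"
}

// 3. Add a helper method for updating the metric
func (m *CHORUSMetrics) IncrementQueueRebalances() {
    if m == nil || m.queueRebalancesTotal == nil {
        return
    }
    m.queueRebalancesTotal.Inc()
}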