Comprehensive documentation for coordination, messaging, discovery, and internal systems. Core Coordination Packages: - pkg/election - Democratic leader election (uptime-based, heartbeat mechanism, SLURP integration) - pkg/coordination - Meta-coordination with dependency detection (4 built-in rules) - coordinator/ - Task orchestration and assignment (AI-powered scoring) - discovery/ - mDNS peer discovery (automatic LAN detection) Messaging & P2P Infrastructure: - pubsub/ - GossipSub messaging (31 message types, role-based topics, HMMM integration) - p2p/ - libp2p networking (DHT modes, connection management, security) Monitoring & Health: - pkg/metrics - Prometheus metrics (80+ metrics across 12 categories) - pkg/health - Health monitoring (4 HTTP endpoints, enhanced checks, graceful degradation) Internal Systems: - internal/licensing - License validation (KACHING integration, cluster leases, fail-closed) - internal/hapui - Human Agent Portal UI (9 commands, HMMM wizard, UCXL browser, decision voting) - internal/backbeat - P2P operation telemetry (6 phases, beat synchronization, health reporting) Documentation Statistics (Phase 3): - 10 packages documented (~18,000 lines) - 31 PubSub message types cataloged - 80+ Prometheus metrics documented - Complete API references with examples - Integration patterns and best practices Key Features Documented: - Election: 5 triggers, candidate scoring (5 weighted components), stability windows - Coordination: AI-powered dependency detection, cross-repo sessions, escalation handling - PubSub: Topic patterns, message envelopes, SHHH redaction, Hypercore logging - Metrics: All metric types with labels, Prometheus scrape config, alert rules - Health: Liveness vs readiness, critical checks, Kubernetes integration - Licensing: Grace periods, circuit breaker, cluster lease management - HAP UI: Interactive terminal commands, HMMM composition wizard, web interface (beta) - BACKBEAT: 6-phase operation tracking, beat budget estimation, drift detection Implementation Status Marked: - ✅ Production: Election, metrics, health, licensing, pubsub, p2p, discovery, coordinator - 🔶 Beta: HAP web interface, BACKBEAT telemetry, advanced coordination - 🔷 Alpha: SLURP election scoring - ⚠️ Experimental: Meta-coordination, AI-powered dependency detection Progress: 22/62 files complete (35%) Next Phase: AI providers, SLURP system, API layer, reasoning engine 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
24 KiB
CHORUS Internal Package: backbeat
Package: chorus/internal/backbeat
Purpose: BACKBEAT Timing System Integration for CHORUS P2P Operations
Lines of Code: 400 lines (integration.go)
Overview
The backbeat package provides integration between CHORUS and the BACKBEAT distributed timing system. BACKBEAT synchronizes agent operations across the cluster using a shared "heartbeat" that enables coordinated, time-aware distributed computing.
This integration allows CHORUS agents to:
- Track P2P operations against beat budgets
- Report operation progress via status claims
- Synchronize multi-agent coordination
- Monitor timing drift and degradation
- Emit health metrics on a beat schedule
Core Concepts
BACKBEAT Timing System
BACKBEAT provides a distributed metronome that all agents synchronize to:
- Beat Index: Sequential beat number across the cluster
- Tempo: Beats per minute (default: 2 BPM = 30 seconds per beat)
- Phase: Current position within beat cycle
- Window ID: Time window identifier for grouping operations
- Downbeat: Bar start marker (analogous to musical downbeat)
P2P Operation Tracking
CHORUS uses BACKBEAT to track P2P operations:
- Beat Budget: Estimated beats for operation completion
- Progress Tracking: Real-time percentage completion
- Phase Transitions: Operation lifecycle stages
- Peer Coordination: Multi-agent operation synchronization
Architecture
Integration Type
type Integration struct {
client sdk.Client
config *BackbeatConfig
logger Logger
ctx context.Context
cancel context.CancelFunc
started bool
nodeID string
// P2P operation tracking
activeOperations map[string]*P2POperation
}
Responsibilities:
- BACKBEAT SDK client lifecycle management
- Beat and downbeat callback registration
- P2P operation tracking and reporting
- Status claim emission
- Health monitoring
BackbeatConfig
Configuration for BACKBEAT integration.
type BackbeatConfig struct {
Enabled bool
ClusterID string
AgentID string
NATSUrl string
}
Configuration Sources:
- Environment variables (prefixed with
CHORUS_BACKBEAT_) - CHORUS config.Config integration
- Defaults for local development
Environment Variables:
CHORUS_BACKBEAT_ENABLED- Enable/disable integration (default: true)CHORUS_BACKBEAT_CLUSTER_ID- Cluster identifier (default: "chorus-production")CHORUS_BACKBEAT_AGENT_ID- Agent identifier (default: "chorus-{agent_id}")CHORUS_BACKBEAT_NATS_URL- NATS server URL (default: "nats://backbeat-nats:4222")
P2POperation
Tracks a P2P coordination operation through BACKBEAT.
type P2POperation struct {
ID string
Type string // "election", "dht_store", "pubsub_sync", "peer_discovery"
StartBeat int64
EstimatedBeats int
Phase OperationPhase
PeerCount int
StartTime time.Time
Data interface{}
}
Operation Types:
election- Leader election or consensus operationdht_store- DHT storage or retrieval operationpubsub_sync- PubSub message propagationpeer_discovery- P2P peer discovery and connection
Lifecycle:
- Register operation with
StartP2POperation() - Update phase as operation progresses
- Complete with
CompleteP2POperation()or fail withFailP2POperation() - Automatic cleanup on completion
OperationPhase
Represents the current phase of a P2P operation.
type OperationPhase int
const (
PhaseStarted OperationPhase = iota
PhaseConnecting
PhaseNegotiating
PhaseExecuting
PhaseCompleted
PhaseFailed
)
Phase Transitions:
PhaseStarted → PhaseConnecting → PhaseNegotiating → PhaseExecuting → PhaseCompleted
↓
PhaseFailed
Typical Flow:
- PhaseStarted - Operation registered, initialization
- PhaseConnecting - Establishing connections to peers
- PhaseNegotiating - Consensus or coordination negotiation
- PhaseExecuting - Main operation execution
- PhaseCompleted - Operation successful
- PhaseFailed - Operation failed (any stage)
Logger Interface
Abstraction for CHORUS logging integration.
type Logger interface {
Info(msg string, args ...interface{})
Warn(msg string, args ...interface{})
Error(msg string, args ...interface{})
}
Allows integration with CHORUS's existing logging system without direct dependency.
Public API
Constructor
NewIntegration
Creates a new BACKBEAT integration for CHORUS.
func NewIntegration(cfg *config.Config, nodeID string, logger Logger) (*Integration, error)
Parameters:
cfg- CHORUS configuration objectnodeID- P2P node identifierlogger- CHORUS logger implementation
Returns:
- Configured Integration instance
- Error if BACKBEAT is disabled or configuration is invalid
Example:
integration, err := backbeat.NewIntegration(
config,
node.ID().String(),
runtime.Logger,
)
if err != nil {
log.Fatal("BACKBEAT integration failed:", err)
}
Lifecycle Management
Start
Initializes the BACKBEAT integration and starts the SDK client.
func (i *Integration) Start(ctx context.Context) error
Actions:
- Create cancellation context
- Start BACKBEAT SDK client
- Register beat callbacks (
onBeat,onDownbeat) - Log startup confirmation
Returns: Error if already started or SDK initialization fails
Example:
ctx := context.Background()
if err := integration.Start(ctx); err != nil {
log.Fatal("Failed to start BACKBEAT:", err)
}
Logged Output:
🎵 CHORUS BACKBEAT integration started - cluster=chorus-production agent=chorus-agent-42
Stop
Gracefully shuts down the BACKBEAT integration.
func (i *Integration) Stop() error
Actions:
- Cancel context
- Stop SDK client
- Cleanup resources
- Log shutdown confirmation
Returns: Error if SDK shutdown fails (logged as warning)
Example:
if err := integration.Stop(); err != nil {
log.Warn("BACKBEAT shutdown warning:", err)
}
Logged Output:
🎵 CHORUS BACKBEAT integration stopped
P2P Operation Management
StartP2POperation
Registers a new P2P operation with BACKBEAT.
func (i *Integration) StartP2POperation(
operationID string,
operationType string,
estimatedBeats int,
data interface{},
) error
Parameters:
operationID- Unique operation identifieroperationType- Operation category (election, dht_store, pubsub_sync, peer_discovery)estimatedBeats- Expected beats to completiondata- Optional operation-specific data
Actions:
- Create P2POperation record
- Record start beat from current beat index
- Add to activeOperations map
- Emit initial status claim
Returns: Error if integration not started
Example:
err := integration.StartP2POperation(
"election-leader-2025",
"election",
5, // Expect completion in 5 beats (~2.5 minutes at 2 BPM)
map[string]interface{}{
"candidates": 3,
"quorum": 2,
},
)
Status Claim Emitted:
{
"task_id": "election-leader-2025",
"state": "executing",
"beats_left": 5,
"progress": 0.0,
"notes": "P2P election: started (peers: 0, node: 12D3KooW...)"
}
UpdateP2POperationPhase
Updates the phase of an active P2P operation.
func (i *Integration) UpdateP2POperationPhase(
operationID string,
phase OperationPhase,
peerCount int,
) error
Parameters:
operationID- Operation identifierphase- New phase (PhaseConnecting, PhaseNegotiating, etc.)peerCount- Current peer count involved in operation
Actions:
- Lookup operation in activeOperations
- Update phase and peer count
- Emit updated status claim
Returns: Error if operation not found
Example:
// Connected to peers
err := integration.UpdateP2POperationPhase(
"election-leader-2025",
backbeat.PhaseConnecting,
3,
)
// Negotiating consensus
err = integration.UpdateP2POperationPhase(
"election-leader-2025",
backbeat.PhaseNegotiating,
3,
)
// Executing election
err = integration.UpdateP2POperationPhase(
"election-leader-2025",
backbeat.PhaseExecuting,
3,
)
CompleteP2POperation
Marks a P2P operation as completed successfully.
func (i *Integration) CompleteP2POperation(operationID string, peerCount int) error
Parameters:
operationID- Operation identifierpeerCount- Final peer count
Actions:
- Lookup operation
- Set phase to PhaseCompleted
- Emit completion status claim (state: "done", progress: 1.0)
- Remove from activeOperations map
Returns: Error if operation not found or status emission fails
Example:
err := integration.CompleteP2POperation("election-leader-2025", 3)
Status Claim Emitted:
{
"task_id": "election-leader-2025",
"state": "done",
"beats_left": 0,
"progress": 1.0,
"notes": "P2P election: completed (peers: 3, node: 12D3KooW...)"
}
FailP2POperation
Marks a P2P operation as failed.
func (i *Integration) FailP2POperation(operationID string, reason string) error
Parameters:
operationID- Operation identifierreason- Failure reason (for logging and status)
Actions:
- Lookup operation
- Set phase to PhaseFailed
- Emit failure status claim (state: "failed", progress: 0.0)
- Remove from activeOperations map
Returns: Error if operation not found or status emission fails
Example:
err := integration.FailP2POperation(
"election-leader-2025",
"quorum not reached within timeout",
)
Status Claim Emitted:
{
"task_id": "election-leader-2025",
"state": "failed",
"beats_left": 0,
"progress": 0.0,
"notes": "P2P operation failed: quorum not reached within timeout (type: election)"
}
Health and Monitoring
GetHealth
Returns the current BACKBEAT integration health status.
func (i *Integration) GetHealth() map[string]interface{}
Returns: Map with health metrics:
enabled- Integration enabled flagstarted- Integration started flagconnected- NATS connection statuscurrent_beat- Current beat indexcurrent_tempo- Current tempo (BPM)measured_bpm- Measured beats per minutetempo_drift- Tempo drift statusreconnect_count- NATS reconnection countactive_operations- Count of active operationslocal_degradation- Local performance degradation flagerrors- Recent error messagesnode_id- CHORUS node ID
Example:
health := integration.GetHealth()
fmt.Printf("BACKBEAT connected: %v\n", health["connected"])
fmt.Printf("Active operations: %d\n", health["active_operations"])
Example Response:
{
"enabled": true,
"started": true,
"connected": true,
"current_beat": 12345,
"current_tempo": 2,
"measured_bpm": 2.01,
"tempo_drift": "acceptable",
"reconnect_count": 0,
"active_operations": 2,
"local_degradation": false,
"errors": [],
"node_id": "12D3KooWAbc..."
}
ExecuteWithBeatBudget
Executes a function with a BACKBEAT beat budget.
func (i *Integration) ExecuteWithBeatBudget(beats int, fn func() error) error
Parameters:
beats- Beat budget for operationfn- Function to execute
Actions:
- Check if integration is started
- Delegate to SDK
WithBeatBudget()for timing enforcement - Fall back to regular execution if not started
Returns: Error from function execution or timeout
Example:
err := integration.ExecuteWithBeatBudget(10, func() error {
// This operation should complete within 10 beats
return performExpensiveOperation()
})
if err != nil {
log.Error("Operation exceeded beat budget:", err)
}
Beat Callbacks
onBeat
Handles regular beat events from BACKBEAT.
func (i *Integration) onBeat(beat sdk.BeatFrame)
Called: Every beat (every 30 seconds at 2 BPM)
BeatFrame Structure:
BeatIndex- Sequential beat numberPhase- Current phase within beatTempoBPM- Current tempoWindowID- Time window identifier
Actions:
- Log beat reception with details
- Emit status claims for all active operations
- Periodic health status emission (every 8 beats = ~4 minutes)
Example Log:
🥁 BACKBEAT beat received - beat=12345 phase=upbeat tempo=2 window=w-1234
onDownbeat
Handles downbeat (bar start) events.
func (i *Integration) onDownbeat(beat sdk.BeatFrame)
Called: At the start of each bar (every N beats, configurable)
Actions:
- Log downbeat reception
- Cleanup completed operations
- Log active operation count
Example Log:
🎼 BACKBEAT downbeat - new bar started - beat=12344 window=w-1234
🧹 BACKBEAT operations cleanup check - active: 2
Status Claim Emission
Operation Status Claims
Emitted for each active operation on every beat.
func (i *Integration) emitOperationStatus(operation *P2POperation) error
Calculated Fields:
- Beats Passed: Current beat - start beat
- Beats Left: Estimated beats - beats passed (minimum 0)
- Progress: Beats passed / estimated beats (maximum 1.0)
- State: "executing", "done", or "failed"
Status Claim Structure:
{
"task_id": "operation-id",
"state": "executing",
"beats_left": 3,
"progress": 0.4,
"notes": "P2P dht_store: executing (peers: 5, node: 12D3KooW...)"
}
Health Status Claims
Emitted periodically (every 8 beats = ~4 minutes at 2 BPM).
func (i *Integration) emitHealthStatus() error
Health Claim Structure:
{
"task_id": "chorus-p2p-health",
"state": "executing",
"beats_left": 0,
"progress": 1.0,
"notes": "CHORUS P2P healthy: connected=true, operations=2, tempo=2 BPM, node=12D3KooW..."
}
State Determination:
waiting- No active operationsexecuting- One or more active operationsfailed- SDK reports errors
Integration with CHORUS
SharedRuntime Integration
The Integration is created and managed by runtime.SharedRuntime:
type SharedRuntime struct {
// ... other fields
BackbeatIntegration *backbeat.Integration
}
func (sr *SharedRuntime) Initialize(cfg *config.Config) error {
// ... other initialization
// Create BACKBEAT integration
if cfg.Backbeat.Enabled {
integration, err := backbeat.NewIntegration(
cfg,
sr.Node.ID().String(),
sr.Logger,
)
if err == nil {
sr.BackbeatIntegration = integration
integration.Start(context.Background())
}
}
}
P2P Operation Tracking
CHORUS components use BACKBEAT to track distributed operations:
DHT Operations:
// Start tracking
integration.StartP2POperation(
"dht-store-"+key,
"dht_store",
3, // Expect 3 beats
map[string]interface{}{"key": key},
)
// Update phase
integration.UpdateP2POperationPhase("dht-store-"+key, backbeat.PhaseExecuting, peerCount)
// Complete
integration.CompleteP2POperation("dht-store-"+key, peerCount)
PubSub Sync:
integration.StartP2POperation(
"pubsub-sync-"+messageID,
"pubsub_sync",
2,
map[string]interface{}{"topic": topic},
)
Peer Discovery:
integration.StartP2POperation(
"peer-discovery-"+sessionID,
"peer_discovery",
5,
map[string]interface{}{"target_peers": 10},
)
HAP Status Display
Human Agent Portal displays BACKBEAT status:
func (t *TerminalInterface) printStatus() {
// ... other status
if t.runtime.BackbeatIntegration != nil {
health := t.runtime.BackbeatIntegration.GetHealth()
if connected, ok := health["connected"].(bool); ok && connected {
fmt.Printf("BACKBEAT: ✅ Connected\n")
} else {
fmt.Printf("BACKBEAT: ⚠️ Disconnected\n")
}
} else {
fmt.Printf("BACKBEAT: ❌ Disabled\n")
}
}
Configuration Examples
Production Configuration
export CHORUS_BACKBEAT_ENABLED=true
export CHORUS_BACKBEAT_CLUSTER_ID=chorus-production
export CHORUS_BACKBEAT_AGENT_ID=chorus-agent-42
export CHORUS_BACKBEAT_NATS_URL=nats://backbeat-nats.chorus.services:4222
Development Configuration
export CHORUS_BACKBEAT_ENABLED=true
export CHORUS_BACKBEAT_CLUSTER_ID=chorus-dev
export CHORUS_BACKBEAT_AGENT_ID=chorus-dev-alice
export CHORUS_BACKBEAT_NATS_URL=nats://localhost:4222
Disabled Configuration
export CHORUS_BACKBEAT_ENABLED=false
Beat Budget Guidelines
Recommended beat budgets for common operations:
| Operation Type | Estimated Beats | Time at 2 BPM | Rationale |
|---|---|---|---|
| Peer Discovery | 2-5 beats | 1-2.5 min | Network discovery and handshake |
| DHT Store | 2-4 beats | 1-2 min | Distributed storage with replication |
| DHT Retrieve | 1-3 beats | 30-90 sec | Distributed lookup and retrieval |
| PubSub Sync | 1-2 beats | 30-60 sec | Message propagation |
| Leader Election | 3-10 beats | 1.5-5 min | Consensus negotiation |
| Task Coordination | 5-20 beats | 2.5-10 min | Multi-agent task assignment |
Factors Affecting Beat Budget:
- Network latency
- Peer count
- Data size
- Consensus requirements
- Retry logic
Error Handling
Integration Errors
Not Started:
if !i.started {
return fmt.Errorf("BACKBEAT integration not started")
}
Operation Not Found:
operation, exists := i.activeOperations[operationID]
if !exists {
return fmt.Errorf("operation %s not found", operationID)
}
SDK Errors:
if err := i.client.Start(i.ctx); err != nil {
return fmt.Errorf("failed to start BACKBEAT client: %w", err)
}
Degradation Handling
BACKBEAT SDK tracks timing degradation:
- Tempo Drift: Difference between expected and measured BPM
- Local Degradation: Local system performance issues
- Reconnect Count: NATS connection stability
Health status includes these metrics for monitoring:
{
"tempo_drift": "acceptable",
"local_degradation": false,
"reconnect_count": 0
}
Performance Characteristics
Resource Usage
- Memory: O(n) where n = active operations count
- CPU: Minimal, callback-driven architecture
- Network: Status claims on each beat (low bandwidth)
- Latency: Beat-aligned, not real-time (30-second granularity at 2 BPM)
Scalability
- Active Operations: Designed for 100s of concurrent operations
- Beat Frequency: Configurable tempo (1-60 BPM typical)
- Status Claims: Batched per beat, not per operation event
- Cleanup: Automatic on completion/failure
Timing Characteristics
At default 2 BPM (30 seconds per beat):
- Minimum tracking granularity: 30 seconds
- Health check frequency: 4 minutes (8 beats)
- Operation overhead: ~0.1s per beat callback
- Status claim latency: <1s to NATS
Debugging and Monitoring
Enable Debug Logging
// In BACKBEAT SDK configuration
sdkConfig.Logger = slog.New(slog.NewTextHandler(os.Stdout, &slog.HandlerOptions{
Level: slog.LevelDebug,
}))
Monitor Active Operations
health := integration.GetHealth()
activeOps := health["active_operations"].(int)
fmt.Printf("Active P2P operations: %d\n", activeOps)
Check NATS Connectivity
health := integration.GetHealth()
if connected, ok := health["connected"].(bool); !ok || !connected {
log.Warn("BACKBEAT disconnected from NATS")
reconnectCount := health["reconnect_count"].(int)
log.Warn("Reconnection attempts:", reconnectCount)
}
Tempo Drift Monitoring
health := integration.GetHealth()
drift := health["tempo_drift"].(string)
measuredBPM := health["measured_bpm"].(float64)
expectedBPM := health["current_tempo"].(int)
if drift != "acceptable" {
log.Warn("Tempo drift detected:", drift)
log.Warn("Expected:", expectedBPM, "Measured:", measuredBPM)
}
Testing
Unit Testing
Mock the SDK client for unit tests:
type MockSDKClient struct {
// ... mock fields
}
func (m *MockSDKClient) Start(ctx context.Context) error {
return nil
}
func (m *MockSDKClient) GetCurrentBeat() int64 {
return 1000
}
// ... implement other SDK methods
Integration Testing
Test with real BACKBEAT cluster:
# Start BACKBEAT services
docker-compose -f backbeat-compose.yml up -d
# Run CHORUS with BACKBEAT enabled
export CHORUS_BACKBEAT_ENABLED=true
export CHORUS_BACKBEAT_NATS_URL=nats://localhost:4222
./chorus-agent
# Monitor status claims
nats sub "backbeat.status.>"
Load Testing
Test with many concurrent operations:
func TestManyOperations(t *testing.T) {
integration := setupIntegration(t)
for i := 0; i < 1000; i++ {
opID := fmt.Sprintf("test-op-%d", i)
err := integration.StartP2POperation(opID, "dht_store", 5, nil)
require.NoError(t, err)
}
// Wait for beats
time.Sleep(3 * time.Minute)
// Complete operations
for i := 0; i < 1000; i++ {
opID := fmt.Sprintf("test-op-%d", i)
err := integration.CompleteP2POperation(opID, 5)
require.NoError(t, err)
}
// Verify cleanup
health := integration.GetHealth()
assert.Equal(t, 0, health["active_operations"])
}
Troubleshooting
Common Issues
"BACKBEAT integration is disabled"
- Check
CHORUS_BACKBEAT_ENABLEDenvironment variable - Verify configuration in CHORUS config file
"Failed to start BACKBEAT client"
- Check NATS connectivity
- Verify NATS URL is correct
- Ensure NATS server is running
- Check firewall rules
"Operation not found"
- Operation may have already completed
- Operation ID mismatch
- Integration not started before operation registration
High reconnect count
- Network instability
- NATS server restarts
- Connection timeout configuration
Tempo drift
- System clock synchronization issues (NTP)
- High CPU load affecting timing
- Network latency spikes
Debug Commands
Check NATS connectivity:
nats server check
Monitor BACKBEAT messages:
nats sub "backbeat.>"
View status claims:
nats sub "backbeat.status.>"
Check CHORUS health:
# Via HAP
hap> status
Future Enhancements
Planned Features
- Operation Dependencies: Track operation dependencies for complex workflows
- Beat Budget Warnings: Alert when operations approach budget limits
- Historical Metrics: Track operation completion times for better estimates
- Dynamic Beat Budgets: Adjust budgets based on historical performance
- Operation Priorities: Prioritize critical operations during contention
Potential Improvements
- Adaptive Beat Budgets: Learn optimal budgets from execution history
- Operation Correlation: Link related operations for workflow tracking
- Beat Budget Profiles: Pre-defined budgets for common operation patterns
- Performance Analytics: Detailed metrics on operation performance vs. budget
Related Documentation
BACKBEAT SDK Documentation- BACKBEAT Go SDK reference/docs/comprehensive/internal/runtime.md- SharedRuntime integration/docs/comprehensive/pkg/p2p.md- P2P operations tracked by BACKBEAT/docs/comprehensive/pkg/storage.md- DHT operations with beat budgets
Summary
The backbeat package provides essential timing and coordination infrastructure for CHORUS P2P operations:
- 400 lines of integration code
- P2P operation tracking with beat budgets
- 6 operation phases for lifecycle management
- 4 operation types (election, dht_store, pubsub_sync, peer_discovery)
- Status claim emission on every beat
- Health monitoring with tempo drift detection
- Graceful degradation when BACKBEAT unavailable
The integration enables CHORUS to participate in cluster-wide coordinated operations with timing guarantees, progress tracking, and health monitoring, making distributed P2P operations observable and manageable across the agent network.
Current Status: Production-ready, actively used for P2P operation telemetry and coordination across CHORUS cluster.