# BACKBEAT Prototype

A production-grade distributed task orchestration system with time-synchronized beat generation and agent status aggregation.

## Overview

BACKBEAT implements a novel approach to distributed system coordination using musical concepts:

- **Pulse Service**: Leader-elected nodes generate synchronized "beats" as timing references
- **Reverb Service**: Aggregates agent status claims and produces summary reports per "window"
- **Agent Simulation**: Simulates distributed agents reporting task status

## Architecture

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Pulse     │────▶│    NATS     │◀────│   Reverb    │
│  (Leader)   │     │   Broker    │     │ (Aggregator)│
└─────────────┘     └─────────────┘     └─────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │   Agents    │
                    │ (Simulated) │
                    └─────────────┘
```

### Key Components

1. **Pulse Service** (`cmd/pulse/`)
   - Raft-based leader election
   - Hybrid Logical Clock (HLC) synchronization
   - Tempo control with ±10% change limits
   - Beat frame generation at configurable BPM
   - Degradation mode for fault tolerance

2. **Reverb Service** (`cmd/reverb/`)
   - StatusClaim ingestion and validation
   - Window-based aggregation
   - BarReport generation with KPIs
   - Performance monitoring and SLO tracking
   - Admin API for operational visibility

3. **Agent Simulator** (`cmd/agent-sim/`)
   - Multi-agent simulation
   - Realistic task state transitions
   - Configurable reporting rates
   - Load testing capabilities

## Requirements Implementation

The system implements the following requirements:

### Core Requirements

- **BACKBEAT-REQ-020**: StatusClaim ingestion and window grouping
- **BACKBEAT-REQ-021**: BarReport emission at downbeats with KPIs
- **BACKBEAT-REQ-022**: DHT persistence placeholder (future implementation)

### Performance Requirements

- **BACKBEAT-PER-001**: End-to-end delivery p95 ≤ 100ms at 2Hz
- **BACKBEAT-PER-002**: Reverb rollup ≤ 1 beat after downbeat
- **BACKBEAT-PER-003**: SDK timer drift ≤ 1% over 1 hour

### Observability Requirements

- **BACKBEAT-OBS-002**: Comprehensive reverb metrics
- Prometheus metrics export
- Structured logging with zerolog
- Health and readiness endpoints

## Quick Start

### Development Environment

1. **Start the complete stack:**
   ```bash
   make run-dev
   ```

2. **Monitor the services:**
   - Pulse Node 1: http://localhost:8080
   - Pulse Node 2: http://localhost:8081
   - Reverb Service: http://localhost:8082
   - Prometheus: http://localhost:9090
   - Grafana: http://localhost:3000 (admin/admin)

3. **View logs:**
   ```bash
   make logs
   ```

4. **Check service status:**
   ```bash
   make status
   ```

### Manual Build

```bash
# Build all services
make build

# Run individual services
./bin/pulse -cluster=test-cluster -nats=nats://localhost:4222
./bin/reverb -cluster=test-cluster -nats=nats://localhost:4222
./bin/agent-sim -cluster=test-cluster -nats=nats://localhost:4222
```

## Interface Specifications

### INT-A: BeatFrame (Pulse → All)

```json
{
  "type": "backbeat.beatframe.v1",
  "cluster_id": "chorus-production",
  "beat_index": 1234,
  "downbeat": true,
  "phase": "execution",
  "hlc": "7ffd:0001:beef",
  "deadline_at": "2024-01-15T10:30:00Z",
  "tempo_bpm": 120,
  "window_id": "7e9b0e6c4c9a4e59b7f2d9a3c1b2e4d5"
}
```

### INT-B: StatusClaim (Agents → Reverb)

```json
{
  "type": "backbeat.statusclaim.v1",
  "agent_id": "agent:xyz",
  "task_id": "task:123",
  "beat_index": 1234,
  "state": "executing",
  "beats_left": 3,
  "progress": 0.5,
  "notes": "fetching inputs",
  "hlc": "7ffd:0001:beef"
}
```

### INT-C: BarReport (Reverb → Consumers)

```json
{
  "type": "backbeat.barreport.v1",
  "window_id": "7e9b0e6c4c9a4e59b7f2d9a3c1b2e4d5",
  "from_beat": 240,
  "to_beat": 359,
  "agents_reporting": 978,
  "on_time_reviews": 842,
  "help_promises_fulfilled": 91,
  "secret_rotations_ok": true,
  "tempo_drift_ms": 7,
  "issues": []
}
```

## API Endpoints

### Pulse Service

- `GET /health` - Health check
- `GET /ready` - Readiness check
- `GET /metrics` - Prometheus metrics
- `POST /api/v1/tempo` - Change tempo
- `GET /api/v1/status` - Service status

### Reverb Service

- `GET /health` - Health check
- `GET /ready` - Readiness check
- `GET /metrics` - Prometheus metrics
- `GET /api/v1/windows` - List active windows
- `GET /api/v1/windows/{id}` - Get window details
- `GET /api/v1/status` - Service status

## Configuration

### Environment Variables

- `BACKBEAT_ENV` - Environment (development/production)
- `NATS_URL` - NATS server URL
- `LOG_LEVEL` - Logging level (debug/info/warn/error)

### Command Line Flags

#### Pulse Service

- `-cluster` - Cluster identifier
- `-node` - Node identifier
- `-admin-port` - HTTP admin port
- `-raft-bind` - Raft cluster bind address
- `-data-dir` - Data directory
- `-nats` - NATS server URL

#### Reverb Service

- `-cluster` - Cluster identifier
- `-node` - Node identifier
- `-nats` - NATS server URL
- `-bar-length` - Bar length in beats
- `-log-level` - Log level

## Monitoring

### Key Metrics

**Pulse Service:**

- `backbeat_beats_total` - Total beats published
- `backbeat_pulse_jitter_seconds` - Beat timing jitter
- `backbeat_is_leader` - Leadership status
- `backbeat_current_tempo_bpm` - Current tempo

**Reverb Service:**

- `backbeat_reverb_agents_reporting` - Agents in current window
- `backbeat_reverb_on_time_reviews` - On-time task completions
- `backbeat_reverb_windows_completed_total` - Total windows processed
- `backbeat_reverb_window_processing_seconds` - Window processing time

### Performance SLOs

The system tracks compliance with performance requirements:

- Beat delivery latency p95 ≤ 100ms
- Pulse jitter p95 ≤ 20ms
- Reverb processing ≤ 1 beat duration
- Timer drift ≤ 1% over 1 hour

## Development

### Build Requirements

- Go 1.22+
- Docker & Docker Compose
- Make

### Development Workflow

```bash
# Format, vet, test, and build
make dev

# Run full CI pipeline
make ci

# Build for production
make production
```

### Testing

```bash
# Run tests
make test

# Run with race detection
go test -race ./...

# Run specific test suites
go test ./internal/backbeat -v
```

## Production Deployment

### Docker Images

The multi-stage Dockerfile produces separate images for each service:

- `backbeat-pulse:v1.0.0` - Pulse service
- `backbeat-reverb:v1.0.0` - Reverb service
- `backbeat-agent-sim:v1.0.0` - Agent simulator

### Kubernetes Deployment

```bash
# Build and push images
make docker-push VERSION=v1.0.0

# Deploy to Kubernetes (example)
kubectl apply -f k8s/
```

### Docker Swarm Deployment

```bash
# Build images
make docker

# Deploy stack
docker stack deploy -c docker-compose.swarm.yml backbeat
```

## Troubleshooting

### Common Issues

1. **NATS Connection Failed**
   - Verify NATS server is running
   - Check network connectivity
   - Verify NATS URL configuration

2. **Leader Election Issues**
   - Check Raft logs for cluster formation
   - Verify peer connectivity on Raft ports
   - Ensure persistent storage is available

3. **Missing StatusClaims**
   - Verify agents are publishing to correct NATS subjects
   - Check StatusClaim validation errors in reverb logs
   - Monitor `backbeat_reverb_claims_processed_total` metric

### Log Analysis

```bash
# Follow reverb service logs
docker-compose logs -f reverb

# Search for specific window processing
docker-compose logs reverb | grep "window_id=abc123"

# Monitor performance metrics
curl http://localhost:8082/metrics | grep backbeat_reverb
```

## License

This is prototype software for the CHORUS platform. See licensing documentation for details.

## Support

For issues and questions, please refer to the CHORUS platform documentation or contact the development team.
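
## Example: Decoding a StatusClaim and Finding Its Window

To illustrate the INT-B message and the window grouping that Reverb performs, here is a minimal sketch that decodes a StatusClaim and computes the bar it would fall into. The Go struct and the helpers `ParseClaim` and `WindowBounds` are hypothetical, not the project's actual types. The sketch assumes windows are aligned to multiples of the bar length, which is consistent with the BarReport example (`from_beat` 240, `to_beat` 359 for a bar of 120 beats) but is not stated explicitly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StatusClaim mirrors the fields of the INT-B JSON example.
// The Go field names and types are assumptions for illustration.
type StatusClaim struct {
	Type      string  `json:"type"`
	AgentID   string  `json:"agent_id"`
	TaskID    string  `json:"task_id"`
	BeatIndex int64   `json:"beat_index"`
	State     string  `json:"state"`
	BeatsLeft int     `json:"beats_left"`
	Progress  float64 `json:"progress"`
	Notes     string  `json:"notes"`
	HLC       string  `json:"hlc"`
}

// ParseClaim decodes a claim and rejects messages of an unexpected type.
func ParseClaim(data []byte) (StatusClaim, error) {
	var c StatusClaim
	if err := json.Unmarshal(data, &c); err != nil {
		return c, err
	}
	if c.Type != "backbeat.statusclaim.v1" {
		return c, fmt.Errorf("unexpected message type %q", c.Type)
	}
	return c, nil
}

// WindowBounds returns the first and last beat of the bar containing
// beatIndex, assuming bars start at multiples of barLength and that
// beatIndex is non-negative and barLength is positive.
func WindowBounds(beatIndex, barLength int64) (from, to int64) {
	from = (beatIndex / barLength) * barLength
	return from, from + barLength - 1
}

func main() {
	raw := []byte(`{"type":"backbeat.statusclaim.v1","agent_id":"agent:xyz",
		"task_id":"task:123","beat_index":300,"state":"executing",
		"beats_left":3,"progress":0.5,"notes":"fetching inputs",
		"hlc":"7ffd:0001:beef"}`)

	claim, err := ParseClaim(raw)
	if err != nil {
		panic(err)
	}
	from, to := WindowBounds(claim.BeatIndex, 120)
	fmt.Printf("%s at beat %d belongs to window %d-%d\n",
		claim.AgentID, claim.BeatIndex, from, to)
}
```

In the running service the bar length would come from the `-bar-length` flag rather than a literal.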