backbeat: add module sources
# BACKBEAT Prototype

A production-grade distributed task orchestration system with time-synchronized beat generation and agent status aggregation.

## Overview

BACKBEAT implements a novel approach to distributed system coordination using musical concepts:

- **Pulse Service**: Leader-elected nodes generate synchronized "beats" as timing references
- **Reverb Service**: Aggregates agent status claims and produces summary reports per "window"
- **Agent Simulation**: Simulates distributed agents reporting task status

## Module Availability

BACKBEAT is published as a Go module. Consumers can pin the current release directly:

```bash
go get github.com/chorus-services/backbeat@v0.1.0
```

After downloading, the SDK helpers are available via `github.com/chorus-services/backbeat/pkg/sdk`.

## Architecture

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Pulse    │────▶│    NATS     │◀────│   Reverb    │
│  (Leader)   │     │   Broker    │     │ (Aggregator)│
└─────────────┘     └─────────────┘     └─────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │   Agents    │
                    │ (Simulated) │
                    └─────────────┘
```

### Key Components

1. **Pulse Service** (`cmd/pulse/`)
   - Raft-based leader election
   - Hybrid Logical Clock (HLC) synchronization
   - Tempo control with ±10% change limits
   - Beat frame generation at configurable BPM
   - Degradation mode for fault tolerance

2. **Reverb Service** (`cmd/reverb/`)
   - StatusClaim ingestion and validation
   - Window-based aggregation
   - BarReport generation with KPIs
   - Performance monitoring and SLO tracking
   - Admin API for operational visibility

3. **Agent Simulator** (`cmd/agent-sim/`)
   - Multi-agent simulation
   - Realistic task state transitions
   - Configurable reporting rates
   - Load testing capabilities
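
The ±10% tempo change limit listed for the Pulse Service can be pictured as a simple clamp. The following is a minimal sketch, assuming the limit applies to each change request relative to the current tempo; `clampTempo` is a hypothetical helper and not taken from the BACKBEAT sources.

```go
package main

import "fmt"

// maxTempoChange is the ±10% limit from the component list. Applying it
// relative to the current tempo is an assumption of this sketch.
const maxTempoChange = 0.10

// clampTempo limits a requested tempo to within ±10% of the current tempo.
// Hypothetical helper, not part of the BACKBEAT codebase.
func clampTempo(currentBPM, requestedBPM float64) float64 {
	lo := currentBPM * (1 - maxTempoChange)
	hi := currentBPM * (1 + maxTempoChange)
	switch {
	case requestedBPM < lo:
		return lo
	case requestedBPM > hi:
		return hi
	default:
		return requestedBPM
	}
}

func main() {
	fmt.Println(clampTempo(120, 200)) // request above the limit is capped
	fmt.Println(clampTempo(120, 125)) // request within the limit passes through
}
```

Under this reading, a large tempo change would have to be reached through several successive requests rather than one jump.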

## Requirements Implementation

The system implements the following requirements:

### Core Requirements
- **BACKBEAT-REQ-020**: StatusClaim ingestion and window grouping
- **BACKBEAT-REQ-021**: BarReport emission at downbeats with KPIs
- **BACKBEAT-REQ-022**: DHT persistence placeholder (future implementation)

### Performance Requirements
- **BACKBEAT-PER-001**: End-to-end delivery p95 ≤ 100ms at 2Hz
- **BACKBEAT-PER-002**: Reverb rollup ≤ 1 beat after downbeat
- **BACKBEAT-PER-003**: SDK timer drift ≤ 1% over 1 hour

### Observability Requirements
- **BACKBEAT-OBS-002**: Comprehensive reverb metrics
- Prometheus metrics export
- Structured logging with zerolog
- Health and readiness endpoints

## Quick Start

### Development Environment

1. **Start the complete stack:**
   ```bash
   make run-dev
   ```

2. **Monitor the services:**
   - Pulse Node 1: http://localhost:8080
   - Pulse Node 2: http://localhost:8081
   - Reverb Service: http://localhost:8082
   - Prometheus: http://localhost:9090
   - Grafana: http://localhost:3000 (admin/admin)

3. **View logs:**
   ```bash
   make logs
   ```

4. **Check service status:**
   ```bash
   make status
   ```

### Manual Build

```bash
# Build all services
make build

# Run individual services
./bin/pulse -cluster=test-cluster -nats=nats://localhost:4222
./bin/reverb -cluster=test-cluster -nats=nats://localhost:4222
./bin/agent-sim -cluster=test-cluster -nats=nats://localhost:4222
```

## Interface Specifications

### INT-A: BeatFrame (Pulse → All)
```json
{
  "type": "backbeat.beatframe.v1",
  "cluster_id": "chorus-production",
  "beat_index": 1234,
  "downbeat": true,
  "phase": "execution",
  "hlc": "7ffd:0001:beef",
  "deadline_at": "2024-01-15T10:30:00Z",
  "tempo_bpm": 120,
  "window_id": "7e9b0e6c4c9a4e59b7f2d9a3c1b2e4d5"
}
```
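
A consumer might decode this frame as follows. This is a sketch only: the `BeatFrame` struct and `decodeBeatFrame` function are assumptions derived from the JSON example above, and the SDK under `pkg/sdk` may expose its own types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// BeatFrame mirrors the INT-A JSON example. Field names and Go types are
// assumptions of this sketch, not the SDK's definitions.
type BeatFrame struct {
	Type       string    `json:"type"`
	ClusterID  string    `json:"cluster_id"`
	BeatIndex  int64     `json:"beat_index"`
	Downbeat   bool      `json:"downbeat"`
	Phase      string    `json:"phase"`
	HLC        string    `json:"hlc"`
	DeadlineAt time.Time `json:"deadline_at"`
	TempoBPM   float64   `json:"tempo_bpm"`
	WindowID   string    `json:"window_id"`
}

// decodeBeatFrame parses a frame and rejects unexpected message types.
func decodeBeatFrame(data []byte) (BeatFrame, error) {
	var f BeatFrame
	if err := json.Unmarshal(data, &f); err != nil {
		return BeatFrame{}, fmt.Errorf("decode beat frame: %w", err)
	}
	if f.Type != "backbeat.beatframe.v1" {
		return BeatFrame{}, fmt.Errorf("unexpected message type %q", f.Type)
	}
	return f, nil
}

func main() {
	raw := []byte(`{
	  "type": "backbeat.beatframe.v1",
	  "cluster_id": "chorus-production",
	  "beat_index": 1234,
	  "downbeat": true,
	  "phase": "execution",
	  "hlc": "7ffd:0001:beef",
	  "deadline_at": "2024-01-15T10:30:00Z",
	  "tempo_bpm": 120,
	  "window_id": "7e9b0e6c4c9a4e59b7f2d9a3c1b2e4d5"
	}`)

	f, err := decodeBeatFrame(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("beat", f.BeatIndex, "downbeat", f.Downbeat, "deadline", f.DeadlineAt.Format(time.RFC3339))
}
```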

### INT-B: StatusClaim (Agents → Reverb)
```json
{
  "type": "backbeat.statusclaim.v1",
  "agent_id": "agent:xyz",
  "task_id": "task:123",
  "beat_index": 1234,
  "state": "executing",
  "beats_left": 3,
  "progress": 0.5,
  "notes": "fetching inputs",
  "hlc": "7ffd:0001:beef"
}
```
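
Reverb validates claims on ingestion, but this README does not spell out the rules. The sketch below shows what such a check could look like; the `StatusClaim` struct and every rule in `Validate` (required IDs, non-negative beat counts, progress between 0 and 1) are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// StatusClaim mirrors the INT-B JSON example. Field names and types are
// assumptions of this sketch.
type StatusClaim struct {
	Type      string  `json:"type"`
	AgentID   string  `json:"agent_id"`
	TaskID    string  `json:"task_id"`
	BeatIndex int64   `json:"beat_index"`
	State     string  `json:"state"`
	BeatsLeft int     `json:"beats_left"`
	Progress  float64 `json:"progress"`
	Notes     string  `json:"notes"`
	HLC       string  `json:"hlc"`
}

// Validate applies illustrative rules only; the real Reverb validation may differ.
func (c StatusClaim) Validate() error {
	switch {
	case c.Type != "backbeat.statusclaim.v1":
		return fmt.Errorf("unexpected message type %q", c.Type)
	case c.AgentID == "" || c.TaskID == "":
		return errors.New("agent_id and task_id are required")
	case c.BeatIndex < 0 || c.BeatsLeft < 0:
		return errors.New("beat_index and beats_left must be non-negative")
	case c.Progress < 0 || c.Progress > 1:
		return errors.New("progress must be between 0 and 1")
	}
	return nil
}

func main() {
	raw := []byte(`{"type":"backbeat.statusclaim.v1","agent_id":"agent:xyz",
	  "task_id":"task:123","beat_index":1234,"state":"executing",
	  "beats_left":3,"progress":0.5,"notes":"fetching inputs","hlc":"7ffd:0001:beef"}`)

	var c StatusClaim
	if err := json.Unmarshal(raw, &c); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("validation result:", c.Validate())
}
```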

### INT-C: BarReport (Reverb → Consumers)
```json
{
  "type": "backbeat.barreport.v1",
  "window_id": "7e9b0e6c4c9a4e59b7f2d9a3c1b2e4d5",
  "from_beat": 240,
  "to_beat": 359,
  "agents_reporting": 978,
  "on_time_reviews": 842,
  "help_promises_fulfilled": 91,
  "secret_rotations_ok": true,
  "tempo_drift_ms": 7,
  "issues": []
}
```
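
The `from_beat` and `to_beat` values in this example span 120 beats (240 through 359 inclusive). One way to picture window grouping is shown below, assuming windows are aligned to multiples of the bar length starting at beat 0 and that `agents_reporting` counts distinct agents; both assumptions, along with the `claim` type, are illustrative rather than taken from the Reverb sources.

```go
package main

import "fmt"

// claim carries only the fields needed for grouping in this sketch.
type claim struct {
	AgentID   string
	BeatIndex int64
}

// windowBounds returns the first and last beat of the window containing
// beatIndex, assuming windows align to multiples of barLength from beat 0.
func windowBounds(beatIndex, barLength int64) (from, to int64) {
	from = (beatIndex / barLength) * barLength
	return from, from + barLength - 1
}

// agentsReporting counts distinct agents with at least one claim in [from, to].
func agentsReporting(claims []claim, from, to int64) int {
	seen := make(map[string]struct{})
	for _, c := range claims {
		if c.BeatIndex >= from && c.BeatIndex <= to {
			seen[c.AgentID] = struct{}{}
		}
	}
	return len(seen)
}

func main() {
	from, to := windowBounds(300, 120)
	claims := []claim{
		{"agent:a", 241},
		{"agent:a", 250}, // same agent again, counted once
		{"agent:b", 359},
		{"agent:c", 360}, // belongs to the next window
	}
	fmt.Println("window:", from, to)
	fmt.Println("agents reporting:", agentsReporting(claims, from, to))
}
```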

## API Endpoints

### Pulse Service
- `GET /health` - Health check
- `GET /ready` - Readiness check
- `GET /metrics` - Prometheus metrics
- `POST /api/v1/tempo` - Change tempo
- `GET /api/v1/status` - Service status

### Reverb Service
- `GET /health` - Health check
- `GET /ready` - Readiness check
- `GET /metrics` - Prometheus metrics
- `GET /api/v1/windows` - List active windows
- `GET /api/v1/windows/{id}` - Get window details
- `GET /api/v1/status` - Service status

## Configuration

### Environment Variables
- `BACKBEAT_ENV` - Environment (development/production)
- `NATS_URL` - NATS server URL
- `LOG_LEVEL` - Logging level (debug/info/warn/error)

### Command Line Flags

#### Pulse Service
- `-cluster` - Cluster identifier
- `-node` - Node identifier
- `-admin-port` - HTTP admin port
- `-raft-bind` - Raft cluster bind address
- `-data-dir` - Data directory
- `-nats` - NATS server URL

#### Reverb Service
- `-cluster` - Cluster identifier
- `-node` - Node identifier
- `-nats` - NATS server URL
- `-bar-length` - Bar length in beats
- `-log-level` - Log level
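
For reference, the Reverb flags above map naturally onto Go's standard `flag` package. The sketch below is illustrative only: the flag names come from the list, while the `reverbOptions` type, the default values, and the rules that `-cluster` is required and `-bar-length` must be positive are assumptions.

```go
package main

import (
	"flag"
	"fmt"
)

// reverbOptions holds the Reverb flags listed above.
type reverbOptions struct {
	Cluster   string
	Node      string
	NATSURL   string
	BarLength int
	LogLevel  string
}

// parseReverbFlags parses args with the documented flag names. Defaults and
// validation rules are assumptions of this sketch.
func parseReverbFlags(args []string) (reverbOptions, error) {
	var o reverbOptions
	fs := flag.NewFlagSet("reverb", flag.ContinueOnError)
	fs.StringVar(&o.Cluster, "cluster", "", "cluster identifier")
	fs.StringVar(&o.Node, "node", "", "node identifier")
	fs.StringVar(&o.NATSURL, "nats", "nats://localhost:4222", "NATS server URL")
	fs.IntVar(&o.BarLength, "bar-length", 120, "bar length in beats")
	fs.StringVar(&o.LogLevel, "log-level", "info", "log level")
	if err := fs.Parse(args); err != nil {
		return reverbOptions{}, err
	}
	if o.Cluster == "" {
		return reverbOptions{}, fmt.Errorf("-cluster is required")
	}
	if o.BarLength <= 0 {
		return reverbOptions{}, fmt.Errorf("-bar-length must be positive, got %d", o.BarLength)
	}
	return o, nil
}

func main() {
	opts, err := parseReverbFlags([]string{
		"-cluster=test-cluster",
		"-nats=nats://localhost:4222",
		"-bar-length=8",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts)
}
```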

## Monitoring

### Key Metrics

**Pulse Service:**
- `backbeat_beats_total` - Total beats published
- `backbeat_pulse_jitter_seconds` - Beat timing jitter
- `backbeat_is_leader` - Leadership status
- `backbeat_current_tempo_bpm` - Current tempo

**Reverb Service:**
- `backbeat_reverb_agents_reporting` - Agents in current window
- `backbeat_reverb_on_time_reviews` - On-time task completions
- `backbeat_reverb_windows_completed_total` - Total windows processed
- `backbeat_reverb_window_processing_seconds` - Window processing time

### Performance SLOs

The system tracks compliance with performance requirements:
- Beat delivery latency p95 ≤ 100ms
- Pulse jitter p95 ≤ 20ms
- Reverb processing ≤ 1 beat duration
- Timer drift ≤ 1% over 1 hour

## Development

### Build Requirements
- Go 1.22+
- Docker & Docker Compose
- Make

### Development Workflow
```bash
# Format, vet, test, and build
make dev

# Run full CI pipeline
make ci

# Build for production
make production
```

### Testing
```bash
# Run tests
make test

# Run with race detection
go test -race ./...

# Run specific test suites
go test ./internal/backbeat -v
```

## Production Deployment

### Docker Images
The multi-stage Dockerfile produces separate images for each service:
- `backbeat-pulse:v1.0.0` - Pulse service
- `backbeat-reverb:v1.0.0` - Reverb service
- `backbeat-agent-sim:v1.0.0` - Agent simulator

### Kubernetes Deployment
```bash
# Build and push images
make docker-push VERSION=v1.0.0

# Deploy to Kubernetes (example)
kubectl apply -f k8s/
```

### Docker Swarm Deployment
```bash
# Build images
make docker

# Deploy stack
docker stack deploy -c docker-compose.swarm.yml backbeat
```

## Troubleshooting

### Common Issues

1. **NATS Connection Failed**
   - Verify NATS server is running
   - Check network connectivity
   - Verify NATS URL configuration

2. **Leader Election Issues**
   - Check Raft logs for cluster formation
   - Verify peer connectivity on Raft ports
   - Ensure persistent storage is available

3. **Missing StatusClaims**
   - Verify agents are publishing to correct NATS subjects
   - Check StatusClaim validation errors in reverb logs
   - Monitor `backbeat_reverb_claims_processed_total` metric

### Log Analysis
```bash
# Follow reverb service logs
docker-compose logs -f reverb

# Search for specific window processing
docker-compose logs reverb | grep "window_id=abc123"

# Monitor performance metrics
curl http://localhost:8082/metrics | grep backbeat_reverb
```

## License

This is prototype software for the CHORUS platform. See licensing documentation for details.

## Support

For issues and questions, please refer to the CHORUS platform documentation or contact the development team.