backbeat: add module sources

2025-10-17 08:56:25 +11:00
parent 627d15b3f7
commit 4b4eb16efb
48 changed files with 11636 additions and 0 deletions
--- a/contracts/docs/tempo-guide.md
+++ b/contracts/docs/tempo-guide.md
@@ -0,0 +1,610 @@
+# BACKBEAT Tempo Guide: Beat Timing and Performance Recommendations
+
+This guide provides comprehensive recommendations for choosing optimal tempo settings, implementing beat processing, and achieving optimal performance in BACKBEAT-enabled CHORUS 2.0.0 services.
+
+## Understanding BACKBEAT Tempo
+
+### Tempo Basics
+
+BACKBEAT tempo is measured in **Beats Per Minute (BPM)**, similar to musical tempo:
+
+- **1 BPM** = 60-second beats (**default**, good for batch processing and recovery windows)
+- **2 BPM** = 30-second beats (good for most services)
+- **4 BPM** = 15-second beats (good for responsive services)
+- **60 BPM** = 1-second beats (good for high-frequency operations)
+
+### Beat Structure
+
+Each beat consists of three equal phases:
+
+```
+Beat Duration = 60 / TempoBPM seconds
+Phase Duration = Beat Duration / 3
+
+┌─────────────┬─────────────┬─────────────┐
+│    PLAN     │   EXECUTE   │   REVIEW    │
+│   Phase 1   │   Phase 2   │   Phase 3   │
+└─────────────┴─────────────┴─────────────┘
+│←────────── Beat Duration ──────────────→│
+```
+
+### Tempo Ranges and Use Cases
+
+| Tempo Range | Beat Duration | Use Cases | Examples |
+|-------------|---------------|-----------|----------|
+| 0.1 - 0.5 BPM | 2-10 minutes | Large batch jobs, ETL | Data warehouse loads, ML training |
+| 0.5 - 2 BPM | 30s - 2 minutes | Standard operations | API services, web apps |
+| 2 - 10 BPM | 6-30 seconds | Responsive services | Real-time dashboards, monitoring |
+| 10 - 60 BPM | 1-6 seconds | High-frequency | Trading systems, IoT data processing |
+| 60+ BPM | <1 second | Ultra-high-frequency | Hardware control, real-time gaming |
+
+## Choosing the Right Tempo
+
+### Workload Analysis
+
+Before selecting tempo, analyze your workload characteristics:
+
+1. **Task Duration**: How long do typical operations take?
+2. **Coordination Needs**: How often do services need to synchronize?
+3. **Resource Requirements**: How much CPU/memory/I/O does work consume?
+4. **Latency Tolerance**: How quickly must the system respond to changes?
+5. **Error Recovery**: How quickly should the system detect and recover from failures?
+
+### Tempo Selection Guidelines
+
+#### Rule 1: Task Duration Constraint
+```
+Recommended Tempo ≤ 60 / (Average Task Duration × 3)
+```
+
+**Example**: If tasks take 5 seconds on average:
+- Maximum recommended tempo = 60 / (5 × 3) = 4 BPM
+- Use 2-4 BPM for safe operation
+
+#### Rule 2: Coordination Frequency
+```
+Coordination Tempo = 60 / Desired Sync Interval
+```
+
+**Example**: If services should sync every 2 minutes:
+- Recommended tempo = 60 / 120 = 0.5 BPM
+
+#### Rule 3: Resource Utilization
+```
+Sustainable Tempo = 60 / (Task Duration + Recovery Time)
+```
+
+**Example**: 10s tasks with 5s recovery time:
+- Maximum sustainable tempo = 60 / (10 + 5) = 4 BPM
+
+### Common Tempo Patterns
+
+#### Development/Testing: 0.1-0.5 BPM
+```json
+{
+  "tempo_bpm": 0.2,
+  "beat_duration": "5 minutes",
+  "use_case": "Development and debugging",
+  "advantages": ["Easy to observe", "Time to investigate issues"],
+  "disadvantages": ["Slow feedback", "Not production realistic"]
+}
+```
+
+#### Standard Services: 1-4 BPM
+```json
+{
+  "tempo_bpm": 2.0,
+  "beat_duration": "30 seconds", 
+  "use_case": "Most production services",
+  "advantages": ["Good balance", "Reasonable coordination", "Error recovery"],
+  "disadvantages": ["May be slow for real-time needs"]
+}
+```
+
+#### Responsive Applications: 4-20 BPM
+```json
+{
+  "tempo_bpm": 10.0,
+  "beat_duration": "6 seconds",
+  "use_case": "Interactive applications",
+  "advantages": ["Quick response", "Fast error detection"],
+  "disadvantages": ["Higher overhead", "More network traffic"]
+}
+```
+
+#### High-Frequency Systems: 20+ BPM
+```json
+{
+  "tempo_bpm": 60.0,
+  "beat_duration": "1 second",
+  "use_case": "Real-time trading, IoT",
+  "advantages": ["Ultra-responsive", "Immediate coordination"],
+  "disadvantages": ["High resource usage", "Network intensive"]
+}
+```
+
+## Implementation Guidelines
+
+### Beat Processing Architecture
+
+#### Single-Threaded Processing
+Best for low-medium tempo (≤10 BPM):
+
+```go
+type BeatProcessor struct {
+    currentBeat int64
+    phase       string
+    workQueue   chan Task
+}
+
+func (p *BeatProcessor) ProcessBeat(frame BeatFrame) {
+    // Update state
+    p.currentBeat = frame.BeatIndex
+    p.phase = frame.Phase
+    
+    // Process phase synchronously
+    switch frame.Phase {
+    case "plan":
+        p.planPhase(frame)
+    case "execute":
+        p.executePhase(frame)
+    case "review":
+        p.reviewPhase(frame)
+    }
+    
+    // Report status before deadline
+    p.reportStatus(frame.BeatIndex, "completed")
+}
+```
+
+#### Pipelined Processing
+Best for high tempo (>10 BPM):
+
+```go
+type PipelinedProcessor struct {
+    planQueue    chan BeatFrame
+    executeQueue chan BeatFrame  
+    reviewQueue  chan BeatFrame
+}
+
+func (p *PipelinedProcessor) Start() {
+    // Separate goroutines for each phase
+    go p.planWorker()
+    go p.executeWorker()
+    go p.reviewWorker()
+}
+
+func (p *PipelinedProcessor) ProcessBeat(frame BeatFrame) {
+    switch frame.Phase {
+    case "plan":
+        p.planQueue <- frame
+    case "execute":
+        p.executeQueue <- frame
+    case "review":
+        p.reviewQueue <- frame
+    }
+}
+```
+
+### Timing Implementation
+
+#### Deadline Management
+
+```go
+func (p *BeatProcessor) executeWithDeadline(frame BeatFrame, work func() error) error {
+    // Calculate remaining time
+    remainingTime := time.Until(frame.DeadlineAt)
+    
+    // Create timeout context
+    ctx, cancel := context.WithTimeout(context.Background(), remainingTime)
+    defer cancel()
+    
+    // Execute with timeout
+    done := make(chan error, 1)
+    go func() {
+        done <- work()
+    }()
+    
+    select {
+    case err := <-done:
+        return err
+    case <-ctx.Done():
+        return fmt.Errorf("work timed out after %v", remainingTime)
+    }
+}
+```
+
+#### Adaptive Processing
+
+```go
+type AdaptiveProcessor struct {
+    processingTimes []time.Duration
+    targetUtilization float64 // 0.8 = use 80% of available time
+}
+
+func (p *AdaptiveProcessor) shouldProcessWork(frame BeatFrame) bool {
+    // Calculate phase time available
+    phaseTime := time.Duration(60/frame.TempoBPM*1000/3) * time.Millisecond
+    
+    // Estimate processing time based on history
+    avgProcessingTime := p.calculateAverage()
+    
+    // Only process if we have enough time
+    requiredTime := time.Duration(float64(avgProcessingTime) / p.targetUtilization)
+    return phaseTime >= requiredTime
+}
+```
+
+### Performance Optimization
+
+#### Batch Processing within Beats
+
+```go
+func (p *BeatProcessor) executePhase(frame BeatFrame) error {
+    // Calculate optimal batch size based on tempo
+    phaseDuration := time.Duration(60/frame.TempoBPM*1000/3) * time.Millisecond
+    targetTime := time.Duration(float64(phaseDuration) * 0.8) // Use 80% of time
+    
+    // Process work in batches
+    batchSize := p.calculateOptimalBatchSize(targetTime)
+    
+    for p.hasWork() && time.Until(frame.DeadlineAt) > time.Second {
+        batch := p.getWorkBatch(batchSize)
+        if err := p.processBatch(batch); err != nil {
+            return err
+        }
+    }
+    
+    return nil
+}
+```
+
+#### Caching and Pre-computation
+
+```go
+type SmartProcessor struct {
+    cache       map[string]interface{}
+    precomputed map[int64]interface{} // Keyed by beat index
+}
+
+func (p *SmartProcessor) planPhase(frame BeatFrame) {
+    // Pre-compute work for future beats during plan phase
+    nextBeat := frame.BeatIndex + 1
+    if _, exists := p.precomputed[nextBeat]; !exists {
+        p.precomputed[nextBeat] = p.precomputeWork(nextBeat)
+    }
+    
+    // Cache frequently accessed data
+    p.cacheRelevantData(frame)
+}
+
+func (p *SmartProcessor) executePhase(frame BeatFrame) {
+    // Use pre-computed results if available
+    if precomputed, exists := p.precomputed[frame.BeatIndex]; exists {
+        return p.usePrecomputedWork(precomputed)
+    }
+    
+    // Fall back to real-time computation
+    return p.computeWork(frame)
+}
+```
+
+## Performance Monitoring
+
+### Key Metrics
+
+Track these metrics for tempo optimization:
+
+```go
+type TempoMetrics struct {
+    // Timing metrics
+    BeatProcessingLatency time.Duration // How long beats take to process
+    PhaseCompletionRate   float64      // % of phases completed on time
+    DeadlineMissRate      float64      // % of deadlines missed
+    
+    // Resource metrics  
+    CPUUtilization        float64      // CPU usage during beats
+    MemoryUtilization     float64      // Memory usage
+    NetworkBandwidth      int64        // Bytes/sec for BACKBEAT messages
+    
+    // Throughput metrics
+    TasksPerBeat         int          // Work completed per beat
+    BeatsPerSecond       float64      // Effective beat processing rate
+    TempoDriftMS         float64      // How far behind/ahead we're running
+}
+```
+
+### Performance Alerts
+
+```go
+func (m *TempoMetrics) checkAlerts() []Alert {
+    var alerts []Alert
+    
+    // Beat processing taking too long
+    if m.BeatProcessingLatency > m.phaseDuration() * 0.9 {
+        alerts = append(alerts, Alert{
+            Level: "warning",
+            Message: "Beat processing approaching deadline",
+            Recommendation: "Consider reducing tempo or optimizing processing",
+        })
+    }
+    
+    // Missing too many deadlines
+    if m.DeadlineMissRate > 0.05 { // 5%
+        alerts = append(alerts, Alert{
+            Level: "critical", 
+            Message: "High deadline miss rate",
+            Recommendation: "Reduce tempo immediately or scale resources",
+        })
+    }
+    
+    // Resource exhaustion
+    if m.CPUUtilization > 0.9 {
+        alerts = append(alerts, Alert{
+            Level: "warning",
+            Message: "High CPU utilization",
+            Recommendation: "Scale up or reduce workload per beat",
+        })
+    }
+    
+    return alerts
+}
+```
+
+### Adaptive Tempo Adjustment
+
+```go
+type TempoController struct {
+    currentTempo   float64
+    targetLatency  time.Duration
+    adjustmentRate float64 // How aggressively to adjust
+}
+
+func (tc *TempoController) adjustTempo(metrics TempoMetrics) float64 {
+    // Calculate desired tempo based on performance
+    if metrics.DeadlineMissRate > 0.02 { // 2% miss rate
+        // Slow down
+        tc.currentTempo *= (1.0 - tc.adjustmentRate)
+    } else if metrics.PhaseCompletionRate > 0.95 && metrics.CPUUtilization < 0.7 {
+        // Speed up
+        tc.currentTempo *= (1.0 + tc.adjustmentRate)
+    }
+    
+    // Apply constraints
+    tc.currentTempo = math.Max(0.1, tc.currentTempo) // Minimum 0.1 BPM
+    tc.currentTempo = math.Min(1000, tc.currentTempo) // Maximum 1000 BPM
+    
+    return tc.currentTempo
+}
+```
+
+## Load Testing and Capacity Planning
+
+### Beat Load Testing
+
+```go
+func TestBeatProcessingUnderLoad(t *testing.T) {
+    processor := NewBeatProcessor()
+    tempo := 10.0 // 10 BPM = 6-second beats
+    beatInterval := time.Duration(60/tempo) * time.Second
+    
+    // Simulate sustained load
+    for i := 0; i < 1000; i++ {
+        frame := generateBeatFrame(i, tempo)
+        
+        start := time.Now()
+        err := processor.ProcessBeat(frame)
+        duration := time.Since(start)
+        
+        // Verify processing completed within phase duration
+        phaseDuration := beatInterval / 3
+        assert.Less(t, duration, phaseDuration)
+        assert.NoError(t, err)
+        
+        // Wait for next beat
+        time.Sleep(beatInterval)
+    }
+}
+```
+
+### Capacity Planning
+
+```go
+type CapacityPlanner struct {
+    maxTempo           float64
+    resourceLimits     ResourceLimits
+    taskCharacteristics TaskProfile
+}
+
+func (cp *CapacityPlanner) calculateMaxTempo() float64 {
+    // Based on CPU capacity
+    cpuConstrainedTempo := 60.0 / (cp.taskCharacteristics.CPUTime * 3)
+    
+    // Based on memory capacity  
+    memConstrainedTempo := cp.resourceLimits.Memory / cp.taskCharacteristics.MemoryPerBeat
+    
+    // Based on I/O capacity
+    ioConstrainedTempo := cp.resourceLimits.IOPS / cp.taskCharacteristics.IOPerBeat
+    
+    // Take the minimum (most restrictive constraint)
+    return math.Min(cpuConstrainedTempo, math.Min(memConstrainedTempo, ioConstrainedTempo))
+}
+```
+
+## Common Patterns and Anti-Patterns
+
+### ✅ Good Patterns
+
+#### Progressive Backoff
+```go
+func (p *Processor) handleOverload() {
+    if p.metrics.DeadlineMissRate > 0.1 {
+        // Temporarily reduce work per beat
+        p.workPerBeat *= 0.8
+        log.Warn("Reducing work per beat due to overload")
+    }
+}
+```
+
+#### Graceful Degradation
+```go
+func (p *Processor) executePhase(frame BeatFrame) error {
+    timeRemaining := time.Until(frame.DeadlineAt)
+    
+    if timeRemaining < p.minimumTime {
+        // Skip non-essential work
+        return p.executeEssentialOnly(frame)
+    }
+    
+    return p.executeFullWorkload(frame)
+}
+```
+
+#### Work Prioritization
+```go
+func (p *Processor) planPhase(frame BeatFrame) {
+    // Sort work by priority and deadline
+    work := p.getAvailableWork()
+    sort.Sort(ByPriorityAndDeadline(work))
+    
+    // Plan only what can be completed in time
+    plannedWork := p.selectWorkForTempo(work, frame.TempoBPM)
+    p.scheduleWork(plannedWork)
+}
+```
+
+### ❌ Anti-Patterns
+
+#### Blocking I/O in Beat Processing
+```go
+// DON'T: Synchronous I/O can cause deadline misses
+func badExecutePhase(frame BeatFrame) error {
+    data := fetchFromDatabase() // Blocking call!
+    return processData(data)
+}
+
+// DO: Use async I/O with timeouts
+func goodExecutePhase(frame BeatFrame) error {
+    ctx, cancel := context.WithDeadline(context.Background(), frame.DeadlineAt)
+    defer cancel()
+    
+    data, err := fetchFromDatabaseAsync(ctx)
+    if err != nil {
+        return err
+    }
+    return processData(data)
+}
+```
+
+#### Ignoring Tempo Changes
+```go
+// DON'T: Assume tempo is constant
+func badBeatHandler(frame BeatFrame) {
+    // Hard-coded timing assumptions
+    time.Sleep(10 * time.Second) // Fails if tempo > 6 BPM!
+}
+
+// DO: Adapt to current tempo
+func goodBeatHandler(frame BeatFrame) {
+    phaseDuration := time.Duration(60/frame.TempoBPM*1000/3) * time.Millisecond
+    maxWorkTime := time.Duration(float64(phaseDuration) * 0.8)
+    
+    // Adapt work to available time
+    ctx, cancel := context.WithTimeout(context.Background(), maxWorkTime)
+    defer cancel()
+    
+    doWork(ctx)
+}
+```
+
+#### Unbounded Work Queues
+```go
+// DON'T: Let work queues grow infinitely
+type BadProcessor struct {
+    workQueue chan Task // Unbounded queue
+}
+
+// DO: Use bounded queues with backpressure
+type GoodProcessor struct {
+    workQueue chan Task // Bounded queue
+    metrics   *TempoMetrics
+}
+
+func (p *GoodProcessor) addWork(task Task) error {
+    select {
+    case p.workQueue <- task:
+        return nil
+    default:
+        p.metrics.WorkRejectedCount++
+        return ErrQueueFull
+    }
+}
+```
+
+## Troubleshooting Performance Issues
+
+### Diagnostic Checklist
+
+1. **Beat Processing Time**: Are beats completing within phase deadlines?
+2. **Resource Utilization**: Is CPU/memory/I/O being over-utilized?
+3. **Network Latency**: Are BACKBEAT messages arriving late?
+4. **Work Distribution**: Is work evenly distributed across beats?
+5. **Error Rates**: Are errors causing processing delays?
+
+### Performance Tuning Steps
+
+1. **Measure Current Performance**
+   ```bash
+   # Monitor beat processing metrics
+   kubectl logs deployment/my-service | grep "beat_processing_time"
+   
+   # Check resource utilization
+   kubectl top pods
+   ```
+
+2. **Identify Bottlenecks**
+   ```go
+   func profileBeatProcessing(frame BeatFrame) {
+       defer func(start time.Time) {
+           log.Infof("Beat %d phase %s took %v", 
+               frame.BeatIndex, frame.Phase, time.Since(start))
+       }(time.Now())
+       
+       // Your beat processing code here
+   }
+   ```
+
+3. **Optimize Critical Paths**
+   - Cache frequently accessed data
+   - Use connection pooling
+   - Implement circuit breakers
+   - Add request timeouts
+
+4. **Scale Resources**
+   - Increase CPU/memory limits
+   - Add more replicas
+   - Use faster storage
+   - Optimize network configuration
+
+5. **Adjust Tempo**
+   - Reduce tempo if overloaded
+   - Increase tempo if under-utilized
+   - Consider tempo auto-scaling
+
+## Future Enhancements
+
+### Planned Features
+
+1. **Dynamic Tempo Scaling**: Automatic tempo adjustment based on load
+2. **Beat Prediction**: ML-based prediction of optimal tempo
+3. **Resource-Aware Scheduling**: Beat scheduling based on resource availability
+4. **Cross-Service Tempo Negotiation**: Services negotiate optimal cluster tempo
+
+### Experimental Features
+
+1. **Hierarchical Beats**: Different tempo for different service types
+2. **Beat Priorities**: Critical beats get processing preference
+3. **Temporal Load Balancing**: Distribute work across beat phases
+4. **Beat Replay**: Replay missed beats during low-load periods
+
+Understanding and implementing these tempo guidelines will ensure your BACKBEAT-enabled services operate efficiently and reliably across the full range of CHORUS 2.0.0 workloads.