BACKBEAT Tempo Guide: Beat Timing and Performance Recommendations

This guide provides recommendations for choosing tempo settings, implementing beat processing, and tuning performance in BACKBEAT-enabled CHORUS 2.0.0 services.

Understanding BACKBEAT Tempo

Tempo Basics

BACKBEAT tempo is measured in Beats Per Minute (BPM), similar to musical tempo:

  • 1 BPM = 60-second beats (default, good for batch processing and recovery windows)
  • 2 BPM = 30-second beats (good for most services)
  • 4 BPM = 15-second beats (good for responsive services)
  • 60 BPM = 1-second beats (good for high-frequency operations)

Beat Structure

Each beat consists of three equal phases:

Beat Duration = 60 / TempoBPM seconds
Phase Duration = Beat Duration / 3

┌─────────────┬─────────────┬─────────────┐
│    PLAN     │   EXECUTE   │   REVIEW    │
│   Phase 1   │   Phase 2   │   Phase 3   │
└─────────────┴─────────────┴─────────────┘
│←────────── Beat Duration ──────────────→│
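
As a quick reference, here is a minimal Go sketch that converts a tempo into beat and phase durations using the formulas above (the timingForTempo helper and BeatTiming type are illustrative, not part of the BACKBEAT contracts):

package main

import (
    "fmt"
    "time"
)

// BeatTiming holds the durations derived from a tempo.
type BeatTiming struct {
    BeatDuration  time.Duration
    PhaseDuration time.Duration
}

// timingForTempo applies Beat Duration = 60 / TempoBPM and Phase Duration = Beat Duration / 3.
func timingForTempo(tempoBPM float64) BeatTiming {
    beat := time.Duration(60.0 / tempoBPM * float64(time.Second))
    return BeatTiming{BeatDuration: beat, PhaseDuration: beat / 3}
}

func main() {
    for _, bpm := range []float64{1, 2, 4, 60} {
        t := timingForTempo(bpm)
        fmt.Printf("%g BPM: beat=%v, phase=%v\n", bpm, t.BeatDuration, t.PhaseDuration)
    }
}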

Tempo Ranges and Use Cases

Tempo Range   | Beat Duration    | Use Cases             | Examples
------------- | ---------------- | --------------------- | -------------------------------------
0.1 - 0.5 BPM | 2-10 minutes     | Large batch jobs, ETL | Data warehouse loads, ML training
0.5 - 2 BPM   | 30s - 2 minutes  | Standard operations   | API services, web apps
2 - 10 BPM    | 6-30 seconds     | Responsive services   | Real-time dashboards, monitoring
10 - 60 BPM   | 1-6 seconds      | High-frequency        | Trading systems, IoT data processing
60+ BPM       | <1 second        | Ultra-high-frequency  | Hardware control, real-time gaming

Choosing the Right Tempo

Workload Analysis

Before selecting tempo, analyze your workload characteristics:

  1. Task Duration: How long do typical operations take?
  2. Coordination Needs: How often do services need to synchronize?
  3. Resource Requirements: How much CPU/memory/I/O does work consume?
  4. Latency Tolerance: How quickly must the system respond to changes?
  5. Error Recovery: How quickly should the system detect and recover from failures?
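
One way to make this analysis concrete is to record the answers in a small profile struct that later sizing decisions can reference. This is a sketch under our own naming (WorkloadProfile and its fields are not part of any BACKBEAT API) and assumes the standard library time package is imported:

// WorkloadProfile captures the workload characteristics listed above.
// Type and field names are illustrative placeholders.
type WorkloadProfile struct {
    AvgTaskDuration  time.Duration // 1. typical duration of one unit of work
    SyncInterval     time.Duration // 2. how often services need to synchronize
    CPUPerTask       float64       // 3a. CPU consumed per task (core-seconds)
    MemoryPerTask    int64         // 3b. memory consumed per task (bytes)
    LatencyTolerance time.Duration // 4. how quickly the system must react to change
    RecoveryWindow   time.Duration // 5. how quickly failures must be detected and recovered
}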

Tempo Selection Guidelines

Rule 1: Task Duration Constraint

Recommended Tempo ≤ 60 / (Average Task Duration × 3)

Example: If tasks take 5 seconds on average:

  • Maximum recommended tempo = 60 / (5 × 3) = 4 BPM
  • Use 2-4 BPM for safe operation

Rule 2: Coordination Frequency

Coordination Tempo = 60 / Desired Sync Interval

Example: If services should sync every 2 minutes:

  • Recommended tempo = 60 / 120 = 0.5 BPM

Rule 3: Resource Utilization

Sustainable Tempo = 60 / (Task Duration + Recovery Time)

Example: 10s tasks with 5s recovery time:

  • Maximum sustainable tempo = 60 / (10 + 5) = 4 BPM
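
Taken together, the rules can be folded into a single calculation: Rule 2 supplies the coordination-driven target, while Rules 1 and 3 cap what the workload can sustain, and the lowest value wins. The following is a minimal sketch with function and parameter names of our own choosing; the usage comment reproduces the example numbers above:

package main

import (
    "fmt"
    "math"
)

// recommendedTempoBPM combines the three selection rules. All inputs are in seconds.
func recommendedTempoBPM(avgTaskSec, syncIntervalSec, recoverySec float64) float64 {
    rule1 := 60.0 / (avgTaskSec * 3)           // Rule 1: task must fit within one phase (1/3 of a beat)
    rule2 := 60.0 / syncIntervalSec            // Rule 2: one beat per desired sync interval
    rule3 := 60.0 / (avgTaskSec + recoverySec) // Rule 3: leave room for recovery after each task
    return math.Min(rule1, math.Min(rule2, rule3))
}

func main() {
    // 5-second tasks, sync every 2 minutes, 5 seconds of recovery time:
    // Rule 1 allows 4 BPM, Rule 2 targets 0.5 BPM, Rule 3 allows 6 BPM -> 0.5 BPM wins.
    fmt.Printf("recommended tempo: %.2f BPM\n", recommendedTempoBPM(5, 120, 5))
}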

Common Tempo Patterns

Development/Testing: 0.1-0.5 BPM

{
  "tempo_bpm": 0.2,
  "beat_duration": "5 minutes",
  "use_case": "Development and debugging",
  "advantages": ["Easy to observe", "Time to investigate issues"],
  "disadvantages": ["Slow feedback", "Not production realistic"]
}

Standard Services: 1-4 BPM

{
  "tempo_bpm": 2.0,
  "beat_duration": "30 seconds", 
  "use_case": "Most production services",
  "advantages": ["Good balance", "Reasonable coordination", "Error recovery"],
  "disadvantages": ["May be slow for real-time needs"]
}

Responsive Applications: 4-20 BPM

{
  "tempo_bpm": 10.0,
  "beat_duration": "6 seconds",
  "use_case": "Interactive applications",
  "advantages": ["Quick response", "Fast error detection"],
  "disadvantages": ["Higher overhead", "More network traffic"]
}

High-Frequency Systems: 20+ BPM

{
  "tempo_bpm": 60.0,
  "beat_duration": "1 second",
  "use_case": "Real-time trading, IoT",
  "advantages": ["Ultra-responsive", "Immediate coordination"],
  "disadvantages": ["High resource usage", "Network intensive"]
}

Implementation Guidelines

Beat Processing Architecture

Single-Threaded Processing

Best for low-medium tempo (≤10 BPM):

type BeatProcessor struct {
    currentBeat int64
    phase       string
    workQueue   chan Task
}

func (p *BeatProcessor) ProcessBeat(frame BeatFrame) error {
    // Update state
    p.currentBeat = frame.BeatIndex
    p.phase = frame.Phase
    
    // Process phase synchronously
    var err error
    switch frame.Phase {
    case "plan":
        err = p.planPhase(frame)
    case "execute":
        err = p.executePhase(frame)
    case "review":
        err = p.reviewPhase(frame)
    }
    if err != nil {
        p.reportStatus(frame.BeatIndex, "failed")
        return err
    }
    
    // Report status before deadline
    p.reportStatus(frame.BeatIndex, "completed")
    return nil
}

Pipelined Processing

Best for high tempo (>10 BPM):

type PipelinedProcessor struct {
    planQueue    chan BeatFrame
    executeQueue chan BeatFrame  
    reviewQueue  chan BeatFrame
}

func (p *PipelinedProcessor) Start() {
    // Separate goroutines for each phase
    go p.planWorker()
    go p.executeWorker()
    go p.reviewWorker()
}

func (p *PipelinedProcessor) ProcessBeat(frame BeatFrame) {
    switch frame.Phase {
    case "plan":
        p.planQueue <- frame
    case "execute":
        p.executeQueue <- frame
    case "review":
        p.reviewQueue <- frame
    }
}

Timing Implementation

Deadline Management

func (p *BeatProcessor) executeWithDeadline(frame BeatFrame, work func() error) error {
    // Calculate remaining time
    remainingTime := time.Until(frame.DeadlineAt)
    
    // Create timeout context
    ctx, cancel := context.WithTimeout(context.Background(), remainingTime)
    defer cancel()
    
    // Execute with timeout
    done := make(chan error, 1)
    go func() {
        done <- work()
    }()
    
    select {
    case err := <-done:
        return err
    case <-ctx.Done():
        return fmt.Errorf("work timed out after %v", remainingTime)
    }
}

Adaptive Processing

type AdaptiveProcessor struct {
    processingTimes []time.Duration
    targetUtilization float64 // 0.8 = use 80% of available time
}

func (p *AdaptiveProcessor) shouldProcessWork(frame BeatFrame) bool {
    // Calculate phase time available
    phaseTime := time.Duration(60/frame.TempoBPM*1000/3) * time.Millisecond
    
    // Estimate processing time based on history
    avgProcessingTime := p.calculateAverage()
    
    // Only process if we have enough time
    requiredTime := time.Duration(float64(avgProcessingTime) / p.targetUtilization)
    return phaseTime >= requiredTime
}

Performance Optimization

Batch Processing within Beats

func (p *BeatProcessor) executePhase(frame BeatFrame) error {
    // Calculate optimal batch size based on tempo
    phaseDuration := time.Duration(60/frame.TempoBPM*1000/3) * time.Millisecond
    targetTime := time.Duration(float64(phaseDuration) * 0.8) // Use 80% of time
    
    // Process work in batches
    batchSize := p.calculateOptimalBatchSize(targetTime)
    
    for p.hasWork() && time.Until(frame.DeadlineAt) > time.Second {
        batch := p.getWorkBatch(batchSize)
        if err := p.processBatch(batch); err != nil {
            return err
        }
    }
    
    return nil
}

Caching and Pre-computation

type SmartProcessor struct {
    cache       map[string]interface{}
    precomputed map[int64]interface{} // Keyed by beat index
}

func (p *SmartProcessor) planPhase(frame BeatFrame) {
    // Pre-compute work for future beats during plan phase
    nextBeat := frame.BeatIndex + 1
    if _, exists := p.precomputed[nextBeat]; !exists {
        p.precomputed[nextBeat] = p.precomputeWork(nextBeat)
    }
    
    // Cache frequently accessed data
    p.cacheRelevantData(frame)
}

func (p *SmartProcessor) executePhase(frame BeatFrame) error {
    // Use pre-computed results if available
    if precomputed, exists := p.precomputed[frame.BeatIndex]; exists {
        return p.usePrecomputedWork(precomputed)
    }
    
    // Fall back to real-time computation
    return p.computeWork(frame)
}

Performance Monitoring

Key Metrics

Track these metrics for tempo optimization:

type TempoMetrics struct {
    // Timing metrics
    BeatProcessingLatency time.Duration // How long beats take to process
    PhaseCompletionRate   float64      // % of phases completed on time
    DeadlineMissRate      float64      // % of deadlines missed
    
    // Resource metrics  
    CPUUtilization        float64      // CPU usage during beats
    MemoryUtilization     float64      // Memory usage
    NetworkBandwidth      int64        // Bytes/sec for BACKBEAT messages
    
    // Throughput metrics
    TasksPerBeat         int          // Work completed per beat
    BeatsPerSecond       float64      // Effective beat processing rate
    TempoDriftMS         float64      // How far behind/ahead we're running
}

Performance Alerts

func (m *TempoMetrics) checkAlerts() []Alert {
    var alerts []Alert
    
    // Beat processing taking too long
    if float64(m.BeatProcessingLatency) > float64(m.phaseDuration())*0.9 {
        alerts = append(alerts, Alert{
            Level: "warning",
            Message: "Beat processing approaching deadline",
            Recommendation: "Consider reducing tempo or optimizing processing",
        })
    }
    
    // Missing too many deadlines
    if m.DeadlineMissRate > 0.05 { // 5%
        alerts = append(alerts, Alert{
            Level: "critical", 
            Message: "High deadline miss rate",
            Recommendation: "Reduce tempo immediately or scale resources",
        })
    }
    
    // Resource exhaustion
    if m.CPUUtilization > 0.9 {
        alerts = append(alerts, Alert{
            Level: "warning",
            Message: "High CPU utilization",
            Recommendation: "Scale up or reduce workload per beat",
        })
    }
    
    return alerts
}

Adaptive Tempo Adjustment

type TempoController struct {
    currentTempo   float64
    targetLatency  time.Duration
    adjustmentRate float64 // How aggressively to adjust
}

func (tc *TempoController) adjustTempo(metrics TempoMetrics) float64 {
    // Calculate desired tempo based on performance
    if metrics.DeadlineMissRate > 0.02 { // 2% miss rate
        // Slow down
        tc.currentTempo *= (1.0 - tc.adjustmentRate)
    } else if metrics.PhaseCompletionRate > 0.95 && metrics.CPUUtilization < 0.7 {
        // Speed up
        tc.currentTempo *= (1.0 + tc.adjustmentRate)
    }
    
    // Apply constraints
    tc.currentTempo = math.Max(0.1, tc.currentTempo) // Minimum 0.1 BPM
    tc.currentTempo = math.Min(1000, tc.currentTempo) // Maximum 1000 BPM
    
    return tc.currentTempo
}

Load Testing and Capacity Planning

Beat Load Testing

func TestBeatProcessingUnderLoad(t *testing.T) {
    processor := NewBeatProcessor()
    tempo := 10.0 // 10 BPM = 6-second beats
    beatInterval := time.Duration(60/tempo) * time.Second
    
    // Simulate sustained load
    for i := 0; i < 1000; i++ {
        frame := generateBeatFrame(i, tempo)
        
        start := time.Now()
        err := processor.ProcessBeat(frame)
        duration := time.Since(start)
        
        // Verify processing completed within phase duration
        phaseDuration := beatInterval / 3
        assert.Less(t, duration, phaseDuration)
        assert.NoError(t, err)
        
        // Wait for next beat
        time.Sleep(beatInterval)
    }
}

Capacity Planning

type CapacityPlanner struct {
    maxTempo           float64
    resourceLimits     ResourceLimits
    taskCharacteristics TaskProfile
}

func (cp *CapacityPlanner) calculateMaxTempo() float64 {
    // Based on CPU capacity: each task's CPU time must fit within one phase (1/3 of a beat)
    cpuConstrainedTempo := 60.0 / (cp.taskCharacteristics.CPUTime * 3)
    
    // Based on memory capacity
    memConstrainedTempo := cp.resourceLimits.Memory / cp.taskCharacteristics.MemoryPerBeat
    
    // Based on I/O capacity: IOPS / (I/O per beat) gives beats per second; convert to BPM
    ioConstrainedTempo := (cp.resourceLimits.IOPS / cp.taskCharacteristics.IOPerBeat) * 60
    
    // Take the minimum (most restrictive constraint)
    return math.Min(cpuConstrainedTempo, math.Min(memConstrainedTempo, ioConstrainedTempo))
}

Common Patterns and Anti-Patterns

Good Patterns

Progressive Backoff

func (p *Processor) handleOverload() {
    if p.metrics.DeadlineMissRate > 0.1 {
        // Temporarily reduce work per beat
        p.workPerBeat *= 0.8
        log.Warn("Reducing work per beat due to overload")
    }
}

Graceful Degradation

func (p *Processor) executePhase(frame BeatFrame) error {
    timeRemaining := time.Until(frame.DeadlineAt)
    
    if timeRemaining < p.minimumTime {
        // Skip non-essential work
        return p.executeEssentialOnly(frame)
    }
    
    return p.executeFullWorkload(frame)
}

Work Prioritization

func (p *Processor) planPhase(frame BeatFrame) {
    // Sort work by priority and deadline
    work := p.getAvailableWork()
    sort.Sort(ByPriorityAndDeadline(work))
    
    // Plan only what can be completed in time
    plannedWork := p.selectWorkForTempo(work, frame.TempoBPM)
    p.scheduleWork(plannedWork)
}

Anti-Patterns

Blocking I/O in Beat Processing

// DON'T: Synchronous I/O can cause deadline misses
func badExecutePhase(frame BeatFrame) error {
    data := fetchFromDatabase() // Blocking call!
    return processData(data)
}

// DO: Use async I/O with timeouts
func goodExecutePhase(frame BeatFrame) error {
    ctx, cancel := context.WithDeadline(context.Background(), frame.DeadlineAt)
    defer cancel()
    
    data, err := fetchFromDatabaseAsync(ctx)
    if err != nil {
        return err
    }
    return processData(data)
}

Ignoring Tempo Changes

// DON'T: Assume tempo is constant
func badBeatHandler(frame BeatFrame) {
    // Hard-coded timing assumptions
    time.Sleep(10 * time.Second) // Exceeds the entire beat once tempo > 6 BPM, and the phase deadline long before that!
}

// DO: Adapt to current tempo
func goodBeatHandler(frame BeatFrame) {
    phaseDuration := time.Duration(60/frame.TempoBPM*1000/3) * time.Millisecond
    maxWorkTime := time.Duration(float64(phaseDuration) * 0.8)
    
    // Adapt work to available time
    ctx, cancel := context.WithTimeout(context.Background(), maxWorkTime)
    defer cancel()
    
    doWork(ctx)
}

Unbounded Work Queues

// DON'T: Let work queues grow infinitely
type BadProcessor struct {
    workQueue chan Task // Unbounded queue
}

// DO: Use bounded queues with backpressure
type GoodProcessor struct {
    workQueue chan Task // Bounded queue
    metrics   *TempoMetrics
}

func (p *GoodProcessor) addWork(task Task) error {
    select {
    case p.workQueue <- task:
        return nil
    default:
        p.metrics.WorkRejectedCount++
        return ErrQueueFull
    }
}

Troubleshooting Performance Issues

Diagnostic Checklist

  1. Beat Processing Time: Are beats completing within phase deadlines?
  2. Resource Utilization: Is CPU/memory/I/O being over-utilized?
  3. Network Latency: Are BACKBEAT messages arriving late?
  4. Work Distribution: Is work evenly distributed across beats?
  5. Error Rates: Are errors causing processing delays?

Performance Tuning Steps

  1. Measure Current Performance

    # Monitor beat processing metrics
    kubectl logs deployment/my-service | grep "beat_processing_time"
    
    # Check resource utilization
    kubectl top pods
    
  2. Identify Bottlenecks

    func profileBeatProcessing(frame BeatFrame) {
        defer func(start time.Time) {
            log.Infof("Beat %d phase %s took %v", 
                frame.BeatIndex, frame.Phase, time.Since(start))
        }(time.Now())
    
        // Your beat processing code here
    }
    
  3. Optimize Critical Paths

    • Cache frequently accessed data
    • Use connection pooling
    • Implement circuit breakers
    • Add request timeouts
  4. Scale Resources

    • Increase CPU/memory limits
    • Add more replicas
    • Use faster storage
    • Optimize network configuration
  5. Adjust Tempo

    • Reduce tempo if overloaded
    • Increase tempo if under-utilized
    • Consider tempo auto-scaling

Future Enhancements

Planned Features

  1. Dynamic Tempo Scaling: Automatic tempo adjustment based on load
  2. Beat Prediction: ML-based prediction of optimal tempo
  3. Resource-Aware Scheduling: Beat scheduling based on resource availability
  4. Cross-Service Tempo Negotiation: Services negotiate optimal cluster tempo

Experimental Features

  1. Hierarchical Beats: Different tempo for different service types
  2. Beat Priorities: Critical beats get processing preference
  3. Temporal Load Balancing: Distribute work across beat phases
  4. Beat Replay: Replay missed beats during low-load periods

Understanding and applying these tempo guidelines will help your BACKBEAT-enabled services operate efficiently and reliably across the full range of CHORUS 2.0.0 workloads.