DistOS/BACKBEAT.md
2026-02-26 22:16:41 +00:00


Backbeat Protocol — Pulse/Reverb (v0.1)

Purpose: Give CHORUS a shared, lightweight rhythm so multi-agent, P2P work doesn't deadlock or drift. Standardise expectations (plan/work/review), exit conditions, promises, and time-bounded collaboration across CHORUS, HMMM, SLURP, SHHH, UCXL, WHOOSH, and COOEE.


1) Rationale

  • Problem: In pub/sub meshes, agents can wait indefinitely for help/context; there's no universal cadence for planning, execution, or re-evaluation.

  • Principle: Use a coarse, explicit tempo (beats/bars) for policy alignment, not for hard real-time sync. It must be partition-tolerant, observable, and cheap.

  • Design: Human-readable beats/bars/phrases for policy, Hybrid Logical Clocks (HLC) for mergeable ordering.


2) Core Concepts

  • Tempo (BPM): Beats per minute (e.g., 6–30 BPM). Cluster-level default; task classes may suggest hints.

  • Beat: Base epoch (e.g., 4 s @ 15 BPM).

  • Bar: Group of beats (e.g., 8). Downbeat (beat 1) is a soft barrier (checkpoints, secret rotation).

  • Phrase: A sequence of bars that maps to a work cycle: plan → work → review.

  • Score (per task): Declarative allocation of beats across phases + wait budgets + retries.
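
The relationship between tempo, beat, and bar is simple arithmetic: one beat lasts 60 s divided by the BPM, and a bar lasts that times the bar length. A minimal Go sketch follows; the function names BeatDuration and BarDuration are illustrative and not taken from the BACKBEAT SDK.

```go
package main

import (
	"fmt"
	"time"
)

// BeatDuration returns the length of one beat at the given tempo.
// bpm must be positive; a zero tempo would divide by zero.
func BeatDuration(bpm int) time.Duration {
	return time.Minute / time.Duration(bpm)
}

// BarDuration returns the length of a bar made of barLen beats.
func BarDuration(bpm, barLen int) time.Duration {
	return BeatDuration(bpm) * time.Duration(barLen)
}

func main() {
	fmt.Println(BeatDuration(15))   // 4s, matching the Beat example above
	fmt.Println(BarDuration(15, 8)) // 32s
}
```

At 15 BPM with 8-beat bars, a downbeat therefore arrives every 32 s.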


3) Roles & Components

  • Pulse: Cluster tempo broadcaster. Publishes BeatFrame each beat; single elected leader (Raft/etcd), followers can degrade to local.

  • Reverb: Aggregator/rollup. Ingests StatusClaims and emits per-bar BarReports, plus hints for adaptive tempo.

  • Agents (CHORUS workers, HMMM collaborators, SLURP, etc.): Consume beats, enforce Score, publish StatusClaims.

  • SHHH: Rotates short-lived secrets on downbeats (per-bar keys).

  • COOEE/DHT: Transport for topics backbeat://cluster/{id} and per-project status lanes.

Implementation Snapshot (2025-10)

  • Pulse service (cmd/pulse): Encapsulates Raft leader election (internal/backbeat/leader.go), Hybrid Logical Clock maintenance (internal/backbeat/hlc.go), degradation control (internal/backbeat/degradation.go), and beat publishing over NATS. It also exposes an admin HTTP surface and collects tempo/drift metrics via internal/backbeat/metrics.go.
  • Reverb service (cmd/reverb): Subscribes to pulse beats and agent status subjects, aggregates StatusClaims into rolling windows, and emits BarReports on downbeats. Readiness, health, and Prometheus endpoints report claim throughput, aggregation latency, and NATS failures.
  • Go SDK (pkg/sdk): Provides clients for beat callbacks, status emission, and health reporting with retry/circuit breaker hooks. CHORUS (project-queues/active/CHORUS/internal/backbeat/integration.go) and WHOOSH (project-queues/active/WHOOSH/internal/backbeat/integration.go) embed the SDK to align runtime operations with cluster tempo.
  • Inter-module telemetry: CHORUS maps P2P lifecycle operations (elections, DHT bootstrap, council delivery) into BACKBEAT status claims, while WHOOSH emits search/composer activity. This keeps Reverb windows authoritative for council health and informs SLURP/BUBBLE provenance.
  • Observability bundle: Monitoring assets (monitoring/, prometheus.yml) plus service metrics export drift, tempo adjustments, Raft state, and window KPIs, meeting BACKBEAT-PER-001/002/003 targets and enabling WHOOSH scaling gates to react to rhythm degradation.

4) Wire Model

4.1 BeatFrame (Pulse → all)

{
  "cluster_id": "chorus-aus-01",
  "tempo_bpm": 15,
  "beat_ms": 4000,
  "bar_len_beats": 8,
  "bar": 1287,
  "beat": 3,
  "phase": "work",
  "hlc": "2025-09-03T02:12:27.183Z+1287:3+17",
  "policy_hash": "sha256:...",
  "deadline_at": "2025-09-03T02:12:31.183Z"
}
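
For consumers written in Go, the frame above could be decoded along these lines. This is a sketch only: the JSON tags follow the wire fields shown, but the BeatFrame type, the DecodeBeatFrame helper, and the range check are assumptions, not the SDK's actual definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// BeatFrame mirrors the JSON in 4.1. The Go type is illustrative.
type BeatFrame struct {
	ClusterID   string    `json:"cluster_id"`
	TempoBPM    int       `json:"tempo_bpm"`
	BeatMS      int       `json:"beat_ms"`
	BarLenBeats int       `json:"bar_len_beats"`
	Bar         int64     `json:"bar"`
	Beat        int       `json:"beat"`
	Phase       string    `json:"phase"`
	HLC         string    `json:"hlc"`
	PolicyHash  string    `json:"policy_hash"`
	DeadlineAt  time.Time `json:"deadline_at"` // RFC 3339 timestamp
}

// DecodeBeatFrame parses a frame and rejects a beat outside 1..bar_len_beats.
func DecodeBeatFrame(data []byte) (BeatFrame, error) {
	var bf BeatFrame
	if err := json.Unmarshal(data, &bf); err != nil {
		return bf, err
	}
	if bf.Beat < 1 || bf.Beat > bf.BarLenBeats {
		return bf, fmt.Errorf("beat %d outside 1..%d", bf.Beat, bf.BarLenBeats)
	}
	return bf, nil
}

func main() {
	raw := []byte(`{"cluster_id":"chorus-aus-01","tempo_bpm":15,"beat_ms":4000,
		"bar_len_beats":8,"bar":1287,"beat":3,"phase":"work",
		"hlc":"2025-09-03T02:12:27.183Z+1287:3+17","policy_hash":"sha256:...",
		"deadline_at":"2025-09-03T02:12:31.183Z"}`)
	bf, err := DecodeBeatFrame(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(bf.Phase, bf.Bar, bf.Beat, bf.DeadlineAt.UTC())
}
```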

4.2 StatusClaim (agents → Reverb)

{
  "agent_id": "chorus-192-168-1-27",
  "task_id": "ucxl://...",
  "bar": 1287,
  "beat": 3,
  "state": "planning|executing|waiting|review|done|failed",
  "wait_for": ["hmmm://thread/abc"],
  "beats_left": 2,
  "progress": 0.42,
  "notes": "awaiting summarised artifact from peer",
  "hlc": "..."
}

4.3 HelpPromise (HMMM → requester)

{
  "thread_id": "hmmm://thread/abc",
  "promise_beats": 2,
  "confidence": 0.7,
  "fail_after_beats": 3,
  "on_fail": "fallback-plan-A"
}

4.4 BarReport (Reverb → observability)

  • Per-bar rollup: task counts by state, overruns, broken promises, queue depth, utilisation hints, suggested tempo/phase tweak.

5) Score Spec (YAML)

score:
  tempo: 15              # bpm hint; cluster policy can override
  bar_len: 8             # beats per bar
  phases:
    plan: 2              # beats
    work: 4
    review: 2
  wait_budget:
    help: 2              # max beats to wait for HMMM replies across the phrase
    io: 1                # max beats to wait for I/O
  retry:
    max_phrases: 2
    backoff: geometric   # plan/work/review shrink each retry
  escalation:
    on_wait_exhausted: ["emit:needs-attention", "fallback:coarse-answer"]
    on_overrun: ["checkpoint", "defer:next-phrase"]

Rule: Agents must not exceed phase beat allocations. If help budget is exhausted, exit cleanly with degraded but auditable output.
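
One way to enforce the phase allocations is to map each beat to its phase and reject scores whose phases do not fill the bar. The Go sketch below assumes, as in the example score, that plan + work + review equals bar_len so a phrase fits in a single bar; the Score type and method names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase names follow the Score spec.
type Phase string

const (
	Plan   Phase = "plan"
	Work   Phase = "work"
	Review Phase = "review"
)

// Score is a hypothetical in-memory form of the YAML phase allocation.
type Score struct {
	BarLen      int
	PlanBeats   int
	WorkBeats   int
	ReviewBeats int
}

// Validate checks that the phase allocation fills the bar exactly.
func (s Score) Validate() error {
	if s.PlanBeats < 0 || s.WorkBeats < 0 || s.ReviewBeats < 0 {
		return errors.New("negative phase allocation")
	}
	if sum := s.PlanBeats + s.WorkBeats + s.ReviewBeats; sum != s.BarLen {
		return fmt.Errorf("phases use %d beats, bar has %d", sum, s.BarLen)
	}
	return nil
}

// PhaseFor maps a 1-based beat within the bar to its phase.
func (s Score) PhaseFor(beat int) (Phase, error) {
	switch {
	case beat < 1 || beat > s.BarLen:
		return "", fmt.Errorf("beat %d outside 1..%d", beat, s.BarLen)
	case beat <= s.PlanBeats:
		return Plan, nil
	case beat <= s.PlanBeats+s.WorkBeats:
		return Work, nil
	default:
		return Review, nil
	}
}

func main() {
	s := Score{BarLen: 8, PlanBeats: 2, WorkBeats: 4, ReviewBeats: 2}
	if err := s.Validate(); err != nil {
		fmt.Println("invalid score:", err)
		return
	}
	for beat := 1; beat <= s.BarLen; beat++ {
		p, _ := s.PhaseFor(beat)
		fmt.Println(beat, p)
	}
}
```

With the example score, beats 1 and 2 are plan, 3 to 6 are work, and 7 and 8 are review.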


6) Agent Loop (sketch)

on BeatFrame(bf):
  if new bar and beat==1: rotate_ephemeral_keys(); checkpoint();

  phase = score.phase_for(bf.beat)

  switch phase:
    PLAN:
      if not planned: do_planning_until(phase_end)
    WORK:
      if need_help and !help_promised: request_help_with_promise()
      if waiting_for_help:
        if wait_beats > score.wait_budget.help: exit_with_fallback()
        else continue_work_on_alternative_path()
      else do_work_step()
    REVIEW:
      run_tests_and_summarise(); publish StatusClaim(state=done|failed)

  enforce_cutoffs_at_phase_boundaries()

7) Adaptive Tempo Controller (ATC)

  • Inputs: Queue depth per role, GPU/CPU util (WHOOSH), overrun frequency, broken promises.

  • Policy: Adjust tempo_bpm and/or redistribute phase beats between bars only (PI-style control, hysteresis ±10%).

  • Guardrails: ≤1 beat change per minute; freeze during incidents.
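
The guardrails can be illustrated with a deliberately simplified controller. The Go sketch below is a deadband step controller, not a full PI loop, and it rests on several assumptions the spec does not state: the inputs are collapsed into one hypothetical "pressure" ratio (1.0 means on target), overload lowers the tempo so beats get longer, and the caller invokes it at most once per minute and applies the result on a downbeat.

```go
package main

import "fmt"

// ATC holds tempo limits and the hysteresis band (in percent).
type ATC struct {
	MinBPM        int
	MaxBPM        int
	HysteresisPct float64
}

// Next returns the tempo to use from the next bar onwards.
// pressure is a hypothetical load ratio: above 1 means overloaded.
func (c ATC) Next(current int, pressure float64, incident bool) int {
	if incident {
		return current // freeze during incidents
	}
	band := c.HysteresisPct / 100
	next := current
	switch {
	case pressure > 1+band:
		next = current - 1 // assumed direction: slow down under load
	case pressure < 1-band:
		next = current + 1
	}
	// Clamp to the configured limits.
	if next < c.MinBPM {
		next = c.MinBPM
	}
	if next > c.MaxBPM {
		next = c.MaxBPM
	}
	return next
}

func main() {
	c := ATC{MinBPM: 6, MaxBPM: 24, HysteresisPct: 10}
	fmt.Println(c.Next(12, 1.05, false)) // inside the band: unchanged
	fmt.Println(c.Next(12, 1.30, false)) // overloaded: one step slower
	fmt.Println(c.Next(12, 1.30, true))  // incident: frozen
}
```

Limiting each call to a single step keeps tempo changes gradual even when the load signal jumps.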


8) Exit Conditions & Deadlock Prevention

  • Wait budgets are hard ceilings. Missing HelpPromise by end-of-bar triggers on_wait_exhausted.

  • Locks & leases expire at bar boundaries unless renewed with beats_left.

  • Promises include promise_beats and fail_after_beats so callers can plan.

  • Idempotent checkpoints at downbeats enable safe retries/resumptions.
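
A waiting agent needs one decision per beat: keep waiting, give up on the wait budget, or treat the promise as failed. The Go sketch below shows one possible reading of these rules. The ordering (promise failure is checked before budget exhaustion) and the Action names are assumptions, and HelpPromise carries only a subset of the wire fields.

```go
package main

import "fmt"

// HelpPromise carries the beat-related fields of the HelpPromise message.
type HelpPromise struct {
	ThreadID       string
	PromiseBeats   int
	FailAfterBeats int
	OnFail         string
}

// Action is what the agent should do on this beat.
type Action string

const (
	KeepWaiting   Action = "keep-waiting"
	WaitExhausted Action = "on_wait_exhausted"
	PromiseFailed Action = "on_fail"
)

// Decide returns the action after the agent has waited `waited` beats.
// Assumed ordering: a promise that has reached fail_after_beats is reported
// as failed before the wait budget is considered.
func Decide(p HelpPromise, waited, helpBudget int) Action {
	switch {
	case waited >= p.FailAfterBeats:
		return PromiseFailed
	case waited > helpBudget:
		return WaitExhausted
	default:
		return KeepWaiting
	}
}

func main() {
	p := HelpPromise{
		ThreadID:       "hmmm://thread/abc",
		PromiseBeats:   2,
		FailAfterBeats: 3,
		OnFail:         "fallback-plan-A",
	}
	for waited := 0; waited <= 3; waited++ {
		fmt.Println(waited, Decide(p, waited, 2))
	}
}
```

Because every branch returns within a bounded number of beats, a caller that acts on the result cannot wait forever on a silent peer.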


9) Integration Points

  • CHORUS (workers): Consume BeatFrame; enforce Score; publish StatusClaim each beat/change.

  • HMMM (collab): Replies carry HelpPromise; threads auto-close if fail_after_beats elapses.

  • SLURP (curation): Batch ingest windows tied to review beats; produce bar-stamped artefacts.

  • SHHH (secrets): Rotate per bar; credentials scoped to <cluster,bar>.

  • UCXL: Attach tempo metadata to deliverables: {bar, beat, hlc}; optional address suffix ;bar=1287#beat=8.

  • WHOOSH: Expose utilisation to ATC; enforce resource leases in beat units.

  • COOEE/DHT: Topics: backbeat://cluster/{id}, status://{project}, promise://hmmm.


10) Failure Modes & Degraded Operation

  • No Pulse leader: Agents derive a median-of-pulses from available Pulses; if none, use local monotonic clock (jitter ok) and freeze tempo changes.

  • Partitions: Keep counting beats locally (HLC ensures mergeable order). Reverb reconciles by HLC and bar on heal.

  • Drift: Tempo changes only on downbeats; publish policy_hash so agents detect misconfig.
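
The leaderless fallback could look like the Go sketch below. It takes one interpretation of "median of pulses", namely the median of the beat_ms values reported by reachable Pulse instances, and falls back to the local value when none are reachable. The function name and the choice to average the two middle values for even counts are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// FallbackBeatMS picks a beat length when no Pulse leader is available.
// observed holds beat_ms values from reachable Pulse instances; local is
// the agent's own last known beat length.
func FallbackBeatMS(observed []int, local int) int {
	if len(observed) == 0 {
		return local // no Pulse reachable: keep the local tempo frozen
	}
	sorted := append([]int(nil), observed...) // copy so the input is not reordered
	sort.Ints(sorted)
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid]
	}
	return (sorted[mid-1] + sorted[mid]) / 2
}

func main() {
	fmt.Println(FallbackBeatMS([]int{4000, 5000, 4100}, 4000)) // 4100
	fmt.Println(FallbackBeatMS(nil, 4000))                     // 4000
}
```

A median tolerates a single misconfigured Pulse better than a mean would.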


11) Config Examples

11.1 Cluster Tempo Policy

cluster_id: chorus-aus-01
initial_bpm: 12
bar_len_beats: 8
phases: [plan, work, review]
limits:
  max_bpm: 24
  min_bpm: 6
adaptation:
  enable: true
  hysteresis_pct: 10
  change_per_minute: 1_beat
observability:
  emit_bar_reports: true

11.2 Task Score (attached to UCXL deliverable)

ucxl: ucxl://proj:any/*/task/graph_ingest
score:
  tempo: 15
  bar_len: 8
  phases: {plan: 2, work: 4, review: 2}
  wait_budget: {help: 2, io: 1}
  retry: {max_phrases: 2, backoff: geometric}
  escalation:
    on_wait_exhausted: ["emit:needs-attention", "fallback:coarse-answer"]

12) Observability

  • Per-bar dashboards: state counts, overruns, broken promises, tempo changes, queue depth, utilisation.

  • Trace stamps: Every artifact/event carries {bar, beat, hlc} for forensic replay.

  • Alarms: promise_miss_rate, overrun_rate, no_status_claims.


13) Security

  • Rotate ephemeral keys on downbeats; scope to project/role when possible.

  • Bar-stamped tokens reduce blast radius; revoke at bar+N.


14) Economics & Budgeting — Beats as Unit of Cost

14.1 Beat Unit (BU)

  • Definition: 1 BU = one cluster beat interval (beat_ms). It's the atomic scheduling & accounting quantum.

14.2 Resource Primitives (WHOOSH-measured)

  • cpu_sec, gpu_sec[class], accel_sec[class], mem_gbs (GB·s), disk_io_mb, net_egress_mb, storage_gbh.

14.3 Budget & Costing

budget:
  max_bu:  N_total
  phase_caps: { plan: Np, work: Nw, review: Nr }
  wait_caps:  { help: Nh, io: Ni }
  hard_end: bar+K
  charge_to: ucxl://acct/...

Cost per phrase:

Total = Σ(beats_used * role_rate_bu)
      + Σ_class(gpu_sec[class] * rate_gpu_sec[class])
      + cpu_sec*rate_cpu_sec + mem_gbs*rate_mem_gbs
      + disk_io_mb*rate_io_mb + net_egress_mb*rate_egress_mb
      + storage_gbh*rate_storage_gbh
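
The formula can be evaluated term by term as in the Go sketch below. The Usage and Rates types are hypothetical, the first sum is assumed to run over roles, and the rates in main are invented numbers for illustration rather than real prices.

```go
package main

import "fmt"

// Usage holds measured consumption for one phrase. Field names follow the
// resource primitives in 14.2; the struct itself is hypothetical.
type Usage struct {
	BeatsByRole map[string]float64 // role -> beats_used
	GPUSec      map[string]float64 // device class -> gpu_sec
	CPUSec      float64
	MemGBs      float64
	DiskIOMB    float64
	NetEgressMB float64
	StorageGBh  float64
}

// Rates holds the matching unit prices.
type Rates struct {
	RoleBU     map[string]float64 // role -> role_rate_bu
	GPUSec     map[string]float64 // device class -> rate_gpu_sec
	CPUSec     float64
	MemGBs     float64
	IOMB       float64
	EgressMB   float64
	StorageGBh float64
}

// PhraseCost evaluates the Total formula term by term.
// Roles or GPU classes without a configured rate contribute zero.
func PhraseCost(u Usage, r Rates) float64 {
	total := 0.0
	for role, beats := range u.BeatsByRole {
		total += beats * r.RoleBU[role]
	}
	for class, sec := range u.GPUSec {
		total += sec * r.GPUSec[class]
	}
	total += u.CPUSec*r.CPUSec + u.MemGBs*r.MemGBs
	total += u.DiskIOMB*r.IOMB + u.NetEgressMB*r.EgressMB
	total += u.StorageGBh * r.StorageGBh
	return total
}

func main() {
	u := Usage{
		BeatsByRole: map[string]float64{"worker": 8},
		GPUSec:      map[string]float64{"A100": 10},
		CPUSec:      16,
	}
	r := Rates{
		RoleBU: map[string]float64{"worker": 0.5},
		GPUSec: map[string]float64{"A100": 0.25},
		CPUSec: 0.125,
	}
	fmt.Println(PhraseCost(u, r)) // 8*0.5 + 10*0.25 + 16*0.125 = 8.5
}
```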

14.4 KPIs

  • TNT (tempo-normalised throughput), BPD (beats per deliverable), WR (wait ratio), η (efficiency), PMR (promise miss rate), CPD (cost per deliverable), TTFU (time to first useful).

15) Tokenless Accounting (Hybrid CPU/GPU, On-prem + Cloud)

  • No tokens. Price beats + measured resources; ignore model-token counts.

  • Device classes: price per GPU/accelerator class (A100, 4090, MI300X, TPU…).

  • Rates: on-prem from TCO / duty-cycle seconds; cloud from per-second list prices. Bind via config.

  • Beat-scoped caps: per-BU ceilings on resource primitives to contain spend regardless of hardware skew.

  • Calibration (planning-only): per-family normalisers if you want Effective Compute Units for planning; billing remains raw seconds.


16) MVP Bring-up Plan

  1. Pulse: static BPM, broadcast BeatFrame over COOEE.

  2. Agents: publish StatusClaim; enforce wait_budget & HelpPromise.

  3. Reverb: roll up to BarReport; surface early KPIs.

  4. SHHH: rotate credentials on downbeats.

  5. ATC: enable adaptation after telemetry.


17) Open Questions

  • Per-role tempi vs one cluster tempo?

  • Fixed bar_len vs dynamic redistribution of phase beats?

  • Score UI: YAML + visual “score sheet” editor?


Naming (on brand)

  • Backbeat Protocol: Pulse (broadcaster) + Reverb (rollup & reports). Musical, expressive; conveys ripples from each downbeat.

Backbeat — Relative Beats Addendum (UCXL ^^/~~)

Why this addendum? We're removing dependence on ever-increasing bar/beat counters. All coordination is expressed relative to NOW in beats, aligned with UCXL temporal markers ^^ (future) and ~~ (past).

A) Wire Model Adjustments

BeatFrame (Pulse → all)

Replace prior fields {bar, beat} with:

{
  "cluster_id": "...",
  "tempo_bpm": 15,
  "beat_ms": 4000,
  "bar_len_beats": 8,
  "beat_index": 3,                  // 1..bar_len_beats (cyclic within bar)
  "beat_epoch": "2025-09-03T02:12:27.000Z", // start time of this beat
  "downbeat": false,                // true when beat_index==1
  "phase": "work",
  "hlc": "2025-09-03T02:12:27.183Z+17",
  "policy_hash": "sha256:...",
  "deadline_at": "2025-09-03T02:12:31.183Z"
}

StatusClaim (agents → Reverb)

Replace prior fields {bar, beat} with:

{
  "agent_id": "...",
  "task_id": "...",
  "beat_index": 3,
  "state": "planning|executing|waiting|review|done|failed",
  "beats_left": 2,
  "progress": 0.42,
  "notes": "...",
  "hlc": "..."
}

Bar/Window Aggregation

  • Reverb aggregates per window bounded by downbeat=true frames.

  • No global bar counters are transmitted. Observability UIs may keep a local window_id for navigation.
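
Window aggregation without global counters might be sketched in Go as follows. The Aggregator type is hypothetical; it keeps only each agent's latest state within the window, closes the window when a downbeat frame arrives, and numbers windows with a purely local counter.

```go
package main

import "fmt"

// WindowReport is a per-window rollup of agent states.
type WindowReport struct {
	WindowID    int            // local counter, never sent on the wire
	StateCounts map[string]int // agents per latest reported state
}

// Aggregator collects claims between downbeats.
type Aggregator struct {
	windowID int
	latest   map[string]string // agent_id -> most recent state in this window
}

// NewAggregator returns an aggregator with an empty first window.
func NewAggregator() *Aggregator {
	return &Aggregator{latest: make(map[string]string)}
}

// OnClaim records the newest state for an agent, replacing any earlier one.
func (a *Aggregator) OnClaim(agentID, state string) {
	a.latest[agentID] = state
}

// OnBeat closes the current window when downbeat is true and returns its
// report; on any other beat it returns false and leaves the window open.
func (a *Aggregator) OnBeat(downbeat bool) (WindowReport, bool) {
	if !downbeat {
		return WindowReport{}, false
	}
	report := WindowReport{WindowID: a.windowID, StateCounts: make(map[string]int)}
	for _, state := range a.latest {
		report.StateCounts[state]++
	}
	a.windowID++
	a.latest = make(map[string]string)
	return report, true
}

func main() {
	agg := NewAggregator()
	agg.OnClaim("agent-a", "planning")
	agg.OnClaim("agent-a", "executing")
	agg.OnClaim("agent-b", "waiting")
	if _, closed := agg.OnBeat(false); !closed {
		fmt.Println("mid-bar beat: window stays open")
	}
	report, _ := agg.OnBeat(true)
	fmt.Println(report.WindowID, report.StateCounts)
}
```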

B) UCXL Temporal Suffix - (Requires RFC-UCXL 1.1)

Attach relative beat navigation to any UCXL address:

  • ;beats=^^N → target N beats in the future from now

  • ;beats=~~N → target N beats in the past from now

  • Optional: ;phase=plan|work|review

Example:

ucxl://proj:any/*/task/ingest;beats=^^4;phase=work
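
A parser for this suffix could be sketched in Go as below. RFC-UCXL 1.1 is not reproduced here, so the grammar is inferred from the example alone: parameters are separated by ";", unknown parameters are rejected, and the BeatRef type and ParseBeatRef name are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BeatRef is a UCXL address with its relative-beat parameters split out.
type BeatRef struct {
	Base  string // address without temporal parameters
	Delta int    // +N for ^^N (future), -N for ~~N (past)
	Phase string // optional: plan, work, or review
}

// ParseBeatRef splits an address on ";" and interprets beats= and phase=.
func ParseBeatRef(addr string) (BeatRef, error) {
	parts := strings.Split(addr, ";")
	ref := BeatRef{Base: parts[0]}
	for _, part := range parts[1:] {
		key, val, ok := strings.Cut(part, "=")
		if !ok {
			return ref, fmt.Errorf("malformed parameter %q", part)
		}
		switch key {
		case "beats":
			sign := 1
			switch {
			case strings.HasPrefix(val, "^^"):
				// future: keep the positive sign
			case strings.HasPrefix(val, "~~"):
				sign = -1
			default:
				return ref, fmt.Errorf("beats needs a ^^ or ~~ prefix: %q", val)
			}
			// ParseUint rejects signs, so "^^-1" is an error.
			n, err := strconv.ParseUint(val[2:], 10, 31)
			if err != nil {
				return ref, fmt.Errorf("bad beat count in %q: %w", val, err)
			}
			ref.Delta = sign * int(n)
		case "phase":
			if val != "plan" && val != "work" && val != "review" {
				return ref, fmt.Errorf("unknown phase %q", val)
			}
			ref.Phase = val
		default:
			return ref, fmt.Errorf("unknown parameter %q", key)
		}
	}
	return ref, nil
}

func main() {
	ref, err := ParseBeatRef("ucxl://proj:any/*/task/ingest;beats=^^4;phase=work")
	fmt.Println(ref.Base, ref.Delta, ref.Phase, err)
}
```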

C) Policy & Promises

  • All time budgets are Δbeats: wait_budget.help, retry.max_phrases, promise_beats, fail_after_beats.

  • Leases/locks renew per beat and expire on phase change unless renewed.

D) Derivations

  • beat_index = 1 + floor( (unix_ms / beat_ms) mod bar_len_beats ) (derived locally).

  • beat_epoch = floor_to_multiple(now, beat_ms).

  • Δbeats(target_time) = round( (target_time - now) / beat_ms ).
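
The three derivations translate directly into integer arithmetic on Unix milliseconds. The Go sketch below assumes non-negative timestamps and a positive beat_ms; the function names are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// BeatIndex returns the 1-based position within the bar for a timestamp.
// Integer division performs the floor for non-negative unixMS.
func BeatIndex(unixMS, beatMS, barLenBeats int64) int64 {
	return 1 + (unixMS/beatMS)%barLenBeats
}

// BeatEpoch floors a timestamp to the start of its beat.
func BeatEpoch(unixMS, beatMS int64) int64 {
	return unixMS - unixMS%beatMS
}

// DeltaBeats returns the rounded number of beats from now to target;
// negative values lie in the past. Halves round away from zero.
func DeltaBeats(targetMS, nowMS, beatMS int64) int64 {
	return int64(math.Round(float64(targetMS-nowMS) / float64(beatMS)))
}

func main() {
	fmt.Println(BeatIndex(100_000, 4000, 8))        // 2
	fmt.Println(BeatEpoch(101_500, 4000))           // 100000
	fmt.Println(DeltaBeats(116_000, 100_000, 4000)) // 4
}
```

Because every agent computes these locally from its own clock and the shared beat_ms, no counter needs to be transmitted or kept in sync.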

E) Compatibility Notes

  • Old fields {bar, beat} are deprecated; if received, they can be ignored or mapped to local windows.

  • HLC remains the canonical merge key for causality.

F) Action Items

  1. Update the spec wire model sections accordingly.

  2. Regenerate the Go prototype using BeatIndex/BeatEpoch/Downbeat instead of Bar/Beat counters.

  3. Add UCXL parsing for ;beats=^^/~~ in RUSTLE.

  • TODO: RUSTLE update for BACKBEAT compatibility