Backbeat Protocol — Pulse/Reverb (v0.1)
Purpose: Give CHORUS a shared, lightweight rhythm so multi‑agent, p2p work doesn’t deadlock or drift. Standardise expectations (plan/work/review), exit conditions, promises, and time‑bounded collaboration across CHORUS, HMMM, SLURP, SHHH, UCXL, WHOOSH, and COOEE.
1) Rationale
- Problem: In pub/sub meshes, agents can wait indefinitely for help or context, and there is no universal cadence for planning, execution, or re‑evaluation.
- Principle: Use a coarse, explicit tempo (beats/bars) for policy alignment, not for hard real‑time sync. The mechanism must be partition‑tolerant, observable, and cheap.
- Design: Human‑readable beats/bars/phrases express policy; Hybrid Logical Clocks (HLC) provide mergeable ordering.
2) Core Concepts
- Tempo (BPM): Beats per minute (e.g., 6–30 BPM). Cluster‑level default; task classes may suggest hints.
- Beat: Base epoch (e.g., 4 s @ 15 BPM).
- Bar: Group of beats (e.g., 8). The downbeat (beat 1) is a soft barrier (checkpoints, secret rotation).
- Phrase: A sequence of bars that maps to a work cycle: plan → work → review.
- Score (per task): Declarative allocation of beats across phases, plus wait budgets and retries.
3) Roles & Components
- Pulse: Cluster tempo broadcaster. Publishes a `BeatFrame` each beat. A single leader is elected (Raft/etcd), and followers can degrade to a local clock.
- Reverb: Aggregator/rollup. Ingests `StatusClaim`s and emits per‑bar `BarReport`s, plus hints for adaptive tempo.
- Agents (CHORUS workers, HMMM collaborators, SLURP, etc.): Consume beats, enforce the `Score`, and publish `StatusClaim`s.
- SHHH: Rotates short‑lived secrets on downbeats (per‑bar keys).
- COOEE/DHT: Transport for topics `backbeat://cluster/{id}` and per‑project status lanes.
Implementation Snapshot (2025-10)
- Pulse service (`cmd/pulse`) – Encapsulates Raft leader election (`internal/backbeat/leader.go`), Hybrid Logical Clock maintenance (`internal/backbeat/hlc.go`), degradation control (`internal/backbeat/degradation.go`), and beat publishing over NATS. It also exposes an admin HTTP surface and collects tempo/drift metrics via `internal/backbeat/metrics.go`.
- Reverb service (`cmd/reverb`) – Subscribes to Pulse beats and agent status subjects, aggregates `StatusClaim`s into rolling windows, and emits `BarReport`s on downbeats. Readiness, health, and Prometheus endpoints report claim throughput, aggregation latency, and NATS failures.
- Go SDK (`pkg/sdk`) – Provides clients for beat callbacks, status emission, and health reporting, with retry/circuit-breaker hooks. CHORUS (`project-queues/active/CHORUS/internal/backbeat/integration.go`) and WHOOSH (`project-queues/active/WHOOSH/internal/backbeat/integration.go`) embed the SDK to align runtime operations with cluster tempo.
- Inter-module telemetry – CHORUS maps P2P lifecycle operations (elections, DHT bootstrap, council delivery) into BACKBEAT status claims, while WHOOSH emits search/composer activity. This keeps Reverb windows authoritative for council health and informs SLURP/BUBBLE provenance.
- Observability bundle – Monitoring assets (`monitoring/`, `prometheus.yml`) plus service metrics export drift, tempo adjustments, Raft state, and window KPIs. These meet the BACKBEAT-PER-001/002/003 targets and let WHOOSH scaling gates react to rhythm degradation.
4) Wire Model
4.1 BeatFrame (Pulse → all)
```json
{
  "cluster_id": "chorus-aus-01",
  "tempo_bpm": 15,
  "beat_ms": 4000,
  "bar_len_beats": 8,
  "bar": 1287,
  "beat": 3,
  "phase": "work",
  "hlc": "2025-09-03T02:12:27.183Z+1287:3+17",
  "policy_hash": "sha256:...",
  "deadline_at": "2025-09-03T02:12:31.183Z"
}
```
4.2 StatusClaim (agents → Reverb)
```json
{
  "agent_id": "chorus-192-168-1-27",
  "task_id": "ucxl://...",
  "bar": 1287,
  "beat": 3,
  "state": "planning|executing|waiting|review|done|failed",
  "wait_for": ["hmmm://thread/abc"],
  "beats_left": 2,
  "progress": 0.42,
  "notes": "awaiting summarised artifact from peer",
  "hlc": "..."
}
```
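The `state` field is a closed set of values, so senders can validate a claim before publishing it. The sketch below does this in Go. The type names are hypothetical. The extra rules are assumptions rather than spec requirements: a `waiting` claim must list `wait_for`, `progress` must lie in [0, 1], and `beats_left` must be non-negative.

```go
package main

import "fmt"

// ClaimState enumerates the "state" values listed in the schema.
type ClaimState string

const (
	Planning  ClaimState = "planning"
	Executing ClaimState = "executing"
	Waiting   ClaimState = "waiting"
	Review    ClaimState = "review"
	Done      ClaimState = "done"
	Failed    ClaimState = "failed"
)

// StatusClaim is an illustrative Go shape for section 4.2 (not the SDK type).
type StatusClaim struct {
	AgentID   string     `json:"agent_id"`
	TaskID    string     `json:"task_id"`
	Bar       int64      `json:"bar"`
	Beat      int        `json:"beat"`
	State     ClaimState `json:"state"`
	WaitFor   []string   `json:"wait_for,omitempty"`
	BeatsLeft int        `json:"beats_left"`
	Progress  float64    `json:"progress"`
	Notes     string     `json:"notes,omitempty"`
	HLC       string     `json:"hlc"`
}

// Validate applies sanity rules before publishing. The waiting/wait_for, progress,
// and beats_left rules are assumptions, not part of the schema.
func (c StatusClaim) Validate() error {
	switch c.State {
	case Planning, Executing, Review, Done, Failed:
		// accepted as-is
	case Waiting:
		if len(c.WaitFor) == 0 {
			return fmt.Errorf("state %q requires wait_for", c.State)
		}
	default:
		return fmt.Errorf("unknown state %q", c.State)
	}
	if c.Progress < 0 || c.Progress > 1 {
		return fmt.Errorf("progress %v outside [0, 1]", c.Progress)
	}
	if c.BeatsLeft < 0 {
		return fmt.Errorf("beats_left %d is negative", c.BeatsLeft)
	}
	return nil
}

func main() {
	c := StatusClaim{AgentID: "chorus-192-168-1-27", TaskID: "ucxl://...", Bar: 1287, Beat: 3,
		State: Waiting, WaitFor: []string{"hmmm://thread/abc"}, BeatsLeft: 2, Progress: 0.42}
	fmt.Println(c.Validate())
}
```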
4.3 HelpPromise (HMMM → requester)
```json
{
  "thread_id": "hmmm://thread/abc",
  "promise_beats": 2,
  "confidence": 0.7,
  "fail_after_beats": 3,
  "on_fail": "fallback-plan-A"
}
```
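Callers can classify an outstanding promise by counting the beats elapsed since it was made. The sketch assumes one interpretation of the two fields. The reply is on time while fewer than `promise_beats` have elapsed and late until `fail_after_beats`. After that the promise is broken and the caller runs `on_fail`. The type and status names are hypothetical.

```go
package main

import "fmt"

// HelpPromise mirrors the JSON in section 4.3 (illustrative Go type, not an SDK type).
type HelpPromise struct {
	ThreadID       string  `json:"thread_id"`
	PromiseBeats   int     `json:"promise_beats"`
	Confidence     float64 `json:"confidence"`
	FailAfterBeats int     `json:"fail_after_beats"`
	OnFail         string  `json:"on_fail"`
}

// PromiseStatus is a hypothetical classification of an outstanding promise.
type PromiseStatus string

const (
	OnTime PromiseStatus = "on-time" // fewer than promise_beats have elapsed
	Late   PromiseStatus = "late"    // promise_beats reached, fail_after_beats not yet
	Broken PromiseStatus = "broken"  // fail_after_beats elapsed: run on_fail
)

// Evaluate classifies a promise whose reply has not arrived after elapsed beats.
// It returns the on_fail action only when the promise is broken.
func (p HelpPromise) Evaluate(elapsed int) (PromiseStatus, string) {
	switch {
	case elapsed < p.PromiseBeats:
		return OnTime, ""
	case elapsed < p.FailAfterBeats:
		return Late, ""
	default:
		return Broken, p.OnFail
	}
}

func main() {
	p := HelpPromise{ThreadID: "hmmm://thread/abc", PromiseBeats: 2, Confidence: 0.7,
		FailAfterBeats: 3, OnFail: "fallback-plan-A"}
	for elapsed := 0; elapsed <= 3; elapsed++ {
		status, action := p.Evaluate(elapsed)
		fmt.Println(elapsed, status, action)
	}
}
```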
4.4 BarReport (Reverb → observability)
- Per‑bar rollup: task counts by state, overruns, broken promises, queue depth, utilisation hints, suggested tempo/phase tweak.
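A rollup along these lines could keep the latest claim per task within a bar and count tasks by state. This Go sketch makes two assumptions: claims arrive in HLC order, and a still-active task with `beats_left <= 0` counts as an overrun. The `BarReport` fields shown cover only part of the rollup list above.

```go
package main

import (
	"fmt"
	"sort"
)

// claim keeps only the StatusClaim fields this rollup needs.
type claim struct {
	TaskID    string
	State     string
	BeatsLeft int
}

// BarReport holds a subset of the per-bar rollup (hypothetical field names).
type BarReport struct {
	CountsByState map[string]int
	Overruns      int // active tasks with no beats left (assumed definition)
}

// rollup keeps the latest claim per task and counts tasks by state.
// Claims are assumed to be ordered by HLC, so later entries win.
func rollup(claims []claim) BarReport {
	latest := make(map[string]claim)
	for _, c := range claims {
		latest[c.TaskID] = c
	}
	r := BarReport{CountsByState: make(map[string]int)}
	for _, c := range latest {
		r.CountsByState[c.State]++
		if c.BeatsLeft <= 0 && c.State != "done" && c.State != "failed" {
			r.Overruns++
		}
	}
	return r
}

func main() {
	r := rollup([]claim{
		{"t1", "executing", 3}, {"t1", "done", 0},
		{"t2", "waiting", 0}, {"t3", "planning", 2},
	})
	states := make([]string, 0, len(r.CountsByState))
	for s := range r.CountsByState {
		states = append(states, s)
	}
	sort.Strings(states) // map order is random; sort for stable output
	for _, s := range states {
		fmt.Println(s, r.CountsByState[s])
	}
	fmt.Println("overruns:", r.Overruns)
}
```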
5) Score Spec (YAML)
```yaml
score:
  tempo: 15            # bpm hint; cluster policy can override
  bar_len: 8           # beats per bar
  phases:
    plan: 2            # beats
    work: 4
    review: 2
  wait_budget:
    help: 2            # max beats to wait for HMMM replies across the phrase
    io: 1              # max beats to wait for I/O
  retry:
    max_phrases: 2
    backoff: geometric # plan/work/review shrink each retry
  escalation:
    on_wait_exhausted: ["emit:needs-attention", "fallback:coarse-answer"]
    on_overrun: ["checkpoint", "defer:next-phrase"]
```
Rule: Agents must not exceed phase beat allocations. If `wait_budget.help` is exhausted, exit cleanly with degraded but auditable output.
6) Agent Loop (sketch)
```
on BeatFrame(bf):
    if bf.beat == 1:                      # downbeat: a new bar starts
        rotate_ephemeral_keys()
        checkpoint()
    phase = score.phase_for(bf.beat)
    switch phase:
        PLAN:
            if not planned: do_planning_until(phase_end)
        WORK:
            if need_help and not help_promised: request_help_with_promise()
            if waiting_for_help:
                if wait_beats > score.wait_budget.help: exit_with_fallback()
                else: continue_work_on_alternative_path()
            else:
                do_work_step()
        REVIEW:
            run_tests_and_summarise()
            publish StatusClaim(state = done | failed)
    enforce_cutoffs_at_phase_boundaries()
```
7) Adaptive Tempo Controller (ATC)
- Inputs: Queue depth per role, GPU/CPU utilisation (WHOOSH), overrun frequency, broken promises.
- Policy: Adjust `tempo_bpm` and/or redistribute phase beats, applying changes only between bars (PI‑style control, hysteresis ±10%).
- Guardrails: ≤1 beat change per minute; freeze during incidents.
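The guardrails can be applied as a final step after whatever controller proposes a new tempo. This Go sketch is a simplified proportional step with a hysteresis band, a per-adjustment step limit, and min/max clamping. It is not a full PI controller. It assumes three things: the proposed (`desired`) tempo is computed elsewhere from the inputs above, "1 beat change per minute" means 1 BPM, and the function runs at most once per minute and only between bars. The limits in `main` come from the cluster tempo policy example. `TempoPolicy` and `nextTempo` are hypothetical names.

```go
package main

import "fmt"

// TempoPolicy carries guardrails from the cluster tempo policy (section 11.1).
type TempoPolicy struct {
	MinBPM, MaxBPM int
	HysteresisPct  int  // hold if the proposal is within ±HysteresisPct of the current tempo
	MaxStepBPM     int  // largest change per adjustment; call at most once per minute
	Frozen         bool // incident freeze
}

// nextTempo moves current toward desired under the guardrails. desired is assumed to be
// produced elsewhere (e.g. by a PI controller over queue depth, utilisation and overruns).
func nextTempo(current, desired int, p TempoPolicy) int {
	if p.Frozen {
		return current
	}
	diff := desired - current
	abs := diff
	if abs < 0 {
		abs = -abs
	}
	if abs*100 <= p.HysteresisPct*current {
		return current // inside the hysteresis band: no change
	}
	step := diff
	if step > p.MaxStepBPM {
		step = p.MaxStepBPM
	} else if step < -p.MaxStepBPM {
		step = -p.MaxStepBPM
	}
	next := current + step
	if next > p.MaxBPM {
		next = p.MaxBPM
	} else if next < p.MinBPM {
		next = p.MinBPM
	}
	return next
}

func main() {
	p := TempoPolicy{MinBPM: 6, MaxBPM: 24, HysteresisPct: 10, MaxStepBPM: 1}
	fmt.Println(nextTempo(12, 14, p), nextTempo(12, 13, p), nextTempo(24, 30, p))
}
```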
8) Exit Conditions & Deadlock Prevention
- Wait budgets are hard ceilings. If no `HelpPromise` arrives by the end of the bar, `on_wait_exhausted` is triggered.
- Locks and leases expire at bar boundaries unless renewed with `beats_left`.
- Promises include `promise_beats` and `fail_after_beats` so callers can plan.
- Idempotent checkpoints at downbeats enable safe retries and resumptions.
9) Integration Points
- CHORUS (workers): Consume `BeatFrame`; enforce `Score`; publish a `StatusClaim` on each beat or state change.
- HMMM (collab): Replies carry a `HelpPromise`; threads auto‑close once `fail_after_beats` elapses.
- SLURP (curation): Batch ingest windows tied to review beats; produce bar‑stamped artefacts.
- SHHH (secrets): Rotate per bar; credentials scoped to `<cluster,bar>`.
- UCXL: Attach tempo metadata `{bar, beat, hlc}` to deliverables; optional address suffix `;bar=1287#beat=8`.
- WHOOSH: Expose utilisation to the ATC; enforce resource leases in beat units.
- COOEE/DHT: Topics `backbeat://cluster/{id}`, `status://{project}`, `promise://hmmm`.
10) Failure Modes & Degraded Operation
- No Pulse leader: Agents derive a median‑of‑pulses from the available Pulses. If none are reachable, they use a local monotonic clock (jitter is acceptable) and freeze tempo changes.
- Partitions: Keep counting beats locally (HLC ensures mergeable order). Reverb reconciles by HLC and bar on heal.
- Drift: Tempo changes only on downbeats; Pulse publishes `policy_hash` so agents can detect misconfiguration.
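The leaderless fallback amounts to choosing a reference beat epoch. The Go sketch below assumes each reachable Pulse instance reports the start time (Unix ms) of its current beat. The median resists a single skewed Pulse. With no pulses, the agent floors its local clock reading, supplied by the caller, to a beat boundary and reports degraded mode. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// medianEpoch returns the median beat-epoch timestamp (Unix ms) among the frames
// observed from reachable Pulse instances; ok is false when none were observed.
func medianEpoch(observed []int64) (median int64, ok bool) {
	if len(observed) == 0 {
		return 0, false
	}
	s := append([]int64(nil), observed...) // copy so the caller's slice is not reordered
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	n := len(s)
	if n%2 == 1 {
		return s[n/2], true
	}
	return (s[n/2-1] + s[n/2]) / 2, true
}

// referenceEpoch chooses the beat reference: the median of pulses when available,
// otherwise the local clock floored to a beat boundary (degraded mode, in which
// tempo changes should be frozen).
func referenceEpoch(observed []int64, localNowMs, beatMs int64) (epoch int64, degraded bool) {
	if m, ok := medianEpoch(observed); ok {
		return m, false
	}
	return localNowMs - localNowMs%beatMs, true
}

func main() {
	fmt.Println(referenceEpoch([]int64{8000, 8012, 7996}, 9000, 4000))
	fmt.Println(referenceEpoch(nil, 12345, 4000))
}
```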
11) Config Examples
11.1 Cluster Tempo Policy
```yaml
cluster_id: chorus-aus-01
initial_bpm: 12
bar_len_beats: 8
phases: [plan, work, review]
limits:
  max_bpm: 24
  min_bpm: 6
adaptation:
  enable: true
  hysteresis_pct: 10
  change_per_minute: 1_beat
observability:
  emit_bar_reports: true
```
11.2 Task Score (attached to UCXL deliverable)
```yaml
ucxl: ucxl://proj:any/*/task/graph_ingest
score:
  tempo: 15
  bar_len: 8
  phases: {plan: 2, work: 4, review: 2}
  wait_budget: {help: 2, io: 1}
  retry: {max_phrases: 2, backoff: geometric}
  escalation:
    on_wait_exhausted: ["emit:needs-attention", "fallback:coarse-answer"]
```
12) Observability
- Per‑bar dashboards: state counts, overruns, broken promises, tempo changes, queue depth, utilisation.
- Trace stamps: Every artifact/event carries `{bar, beat, hlc}` for forensic replay.
- Alarms: `promise_miss_rate`, `overrun_rate`, `no_status_claims`.
13) Security
- Rotate ephemeral keys on downbeats; scope them to project/role when possible.
- Bar‑stamped tokens reduce blast radius; revoke at bar+N.
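A check for bar-stamped, scoped credentials might look like the following. The sketch assumes that "revoke at bar+N" means a token issued in bar b is usable for bars b through b+N−1. It also assumes scope is an exact match on cluster, project, and role. `BarToken` and `valid` are hypothetical names.

```go
package main

import "fmt"

// BarToken is a hypothetical credential stamped with the bar it was issued in.
type BarToken struct {
	Cluster string
	Project string
	Role    string
	Bar     int64
}

// valid reports whether tok may be used for (cluster, project, role) at currentBar,
// assuming revocation at tok.Bar+n, i.e. usable during bars tok.Bar .. tok.Bar+n-1.
func (tok BarToken) valid(cluster, project, role string, currentBar, n int64) bool {
	if tok.Cluster != cluster || tok.Project != project || tok.Role != role {
		return false // outside the token's scope
	}
	return currentBar >= tok.Bar && currentBar < tok.Bar+n
}

func main() {
	tok := BarToken{Cluster: "chorus-aus-01", Project: "proj", Role: "worker", Bar: 1287}
	for _, bar := range []int64{1287, 1288, 1289} {
		fmt.Println(bar, tok.valid("chorus-aus-01", "proj", "worker", bar, 2))
	}
}
```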
14) Economics & Budgeting — Beats as Unit of Cost
14.1 Beat Unit (BU)
- Definition: 1 BU = one cluster beat interval (`beat_ms`). It is the atomic scheduling and accounting quantum.
14.2 Resource Primitives (WHOOSH‑measured)
`cpu_sec`, `gpu_sec[class]`, `accel_sec[class]`, `mem_gbs` (GB·s), `disk_io_mb`, `net_egress_mb`, `storage_gbh`.
14.3 Budget & Costing
```yaml
budget:
  max_bu: N_total
  phase_caps: { plan: Np, work: Nw, review: Nr }
  wait_caps: { help: Nh, io: Ni }
  hard_end: bar+K
  charge_to: ucxl://acct/...
```
Cost per phrase:
```text
Total = Σ(beats_used * role_rate_bu)
      + Σ_class(gpu_sec[class] * rate_gpu_sec[class])
      + cpu_sec * rate_cpu_sec + mem_gbs * rate_mem_gbs
      + disk_io_mb * rate_io_mb + net_egress_mb * rate_egress_mb
      + storage_gbh * rate_storage_gbh
```
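The formula translates directly into code. In the Go sketch below, the `Usage` and `Rates` types are hypothetical. The quantities and prices in `main` are made up for illustration and do not reflect real rates.

```go
package main

import "fmt"

// Usage holds measured quantities for one phrase (section 14.2 primitives).
type Usage struct {
	BeatsByRole   map[string]float64 // beats_used per role
	GPUSecByClass map[string]float64 // gpu_sec[class]
	CPUSec, MemGBs, DiskIOMB, NetEgressMB, StorageGBh float64
}

// Rates holds per-unit prices bound via config.
type Rates struct {
	RoleRateBU    map[string]float64 // role_rate_bu
	GPUSecByClass map[string]float64 // rate_gpu_sec[class]
	CPUSec, MemGBs, IOMB, EgressMB, StorageGBh float64
}

// phraseCost evaluates the cost-per-phrase formula term by term. A role or GPU class
// with no configured rate contributes 0 here; a real implementation would probably reject it.
func phraseCost(u Usage, r Rates) float64 {
	total := 0.0
	for role, beats := range u.BeatsByRole {
		total += beats * r.RoleRateBU[role]
	}
	for class, sec := range u.GPUSecByClass {
		total += sec * r.GPUSecByClass[class]
	}
	total += u.CPUSec*r.CPUSec + u.MemGBs*r.MemGBs
	total += u.DiskIOMB*r.IOMB + u.NetEgressMB*r.EgressMB
	total += u.StorageGBh * r.StorageGBh
	return total
}

func main() {
	// Made-up quantities and rates, chosen only to keep the arithmetic easy to follow.
	u := Usage{
		BeatsByRole:   map[string]float64{"worker": 10},
		GPUSecByClass: map[string]float64{"A100": 4},
		CPUSec: 8, MemGBs: 2, DiskIOMB: 4, NetEgressMB: 2, StorageGBh: 1,
	}
	r := Rates{
		RoleRateBU:    map[string]float64{"worker": 0.5},
		GPUSecByClass: map[string]float64{"A100": 0.25},
		CPUSec: 0.125, MemGBs: 0.5, IOMB: 0.25, EgressMB: 0.5, StorageGBh: 1,
	}
	fmt.Println(phraseCost(u, r))
}
```

With these inputs the terms are 5 + 1 + (1 + 1) + (1 + 1) + 1, so the program prints 11.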
14.4 KPIs
- TNT (tempo‑normalised throughput), BPD (beats per deliverable), WR (wait ratio), η (efficiency), PMR (promise miss rate), CPD (cost per deliverable), TTFU (time to first useful).
15) Tokenless Accounting (Hybrid CPU/GPU, On‑prem + Cloud)
- No tokens: Price beats plus measured resources; ignore model‑token counts.
- Device classes: Price per GPU/accelerator class (A100, 4090, MI300X, TPU…).
- Rates: Derive on‑prem rates from TCO divided by duty‑cycle seconds, and cloud rates from per‑second list prices. Bind via config.
- Beat‑scoped caps: Per‑BU ceilings on resource primitives contain spend regardless of hardware skew.
- Calibration (planning‑only): Per‑family normalisers if you want Effective Compute Units for planning; billing remains raw seconds.
16) MVP Bring‑up Plan
- Pulse: Static BPM; broadcast `BeatFrame` over COOEE.
- Agents: Publish `StatusClaim`; enforce `wait_budget` and `HelpPromise`.
- Reverb: Roll up to `BarReport`; surface early KPIs.
- SHHH: Rotate credentials on downbeats.
- ATC: Enable adaptation once telemetry is flowing.
17) Open Questions
- Per‑role tempi vs one cluster tempo?
- Fixed `bar_len` vs dynamic redistribution of phase beats?
- Score UI: YAML + visual “score sheet” editor?
Naming (on brand)
- Backbeat Protocol — Pulse (broadcaster) + Reverb (rollup & reports). Musical, expressive; conveys ripples from each downbeat.
Backbeat — Relative Beats Addendum (UCXL ^^/~~)
Why this addendum? We’re removing dependence on ever‑increasing bar/beat counters. All coordination is expressed relative to NOW in beats, aligned with UCXL temporal markers ^^ (future) and ~~ (past).
A) Wire Model Adjustments
BeatFrame (Pulse → all)
Replace prior fields {bar, beat} with:
```jsonc
{
  "cluster_id": "...",
  "tempo_bpm": 15,
  "beat_ms": 4000,
  "bar_len_beats": 8,
  "beat_index": 3,                              // 1..bar_len_beats (cyclic within bar)
  "beat_epoch": "2025-09-03T02:12:27.000Z",     // start time of this beat
  "downbeat": false,                            // true when beat_index == 1
  "phase": "work",
  "hlc": "2025-09-03T02:12:27.183Z+17",
  "policy_hash": "sha256:...",
  "deadline_at": "2025-09-03T02:12:31.183Z"
}
```
StatusClaim (agents → Reverb)
Replace prior fields {bar, beat} with:
```json
{
  "agent_id": "...",
  "task_id": "...",
  "beat_index": 3,
  "state": "planning|executing|waiting|review|done|failed",
  "beats_left": 2,
  "progress": 0.42,
  "notes": "...",
  "hlc": "..."
}
```
Bar/Window Aggregation
- Reverb aggregates per window bounded by `downbeat=true` frames.
- No global bar counters are transmitted. Observability UIs may keep a local `window_id` for navigation.
B) UCXL Temporal Suffix (requires RFC-UCXL 1.1)
Attach relative beat navigation to any UCXL address:
- `;beats=^^N` → target N beats in the future from now
- `;beats=~~N` → target N beats in the past from now
- Optional: `;phase=plan|work|review`

Example: `ucxl://proj:any/*/task/ingest;beats=^^4;phase=work`
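A parser for the suffix might split the address on `;` and read the `beats` and `phase` parameters. This Go sketch assumes RFC-UCXL 1.1 uses `;key=value` parameters exactly as in the example, and it ignores any other parameters. `BeatRef` and `parseBeatSuffix` are hypothetical names; the real parsing is slated for RUSTLE.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BeatRef is the parsed form of the ;beats= and ;phase= address parameters.
type BeatRef struct {
	Base  string // address without parameters
	Delta int    // +N for ^^N (future), -N for ~~N (past), 0 if absent
	Phase string // "", "plan", "work" or "review"
}

// parseBeatSuffix splits a UCXL address on ';' and interprets the temporal parameters.
// Parameters other than beats/phase are ignored here (assumed to be handled elsewhere).
func parseBeatSuffix(addr string) (BeatRef, error) {
	parts := strings.Split(addr, ";")
	ref := BeatRef{Base: parts[0]}
	for _, param := range parts[1:] {
		key, val, ok := strings.Cut(param, "=")
		if !ok {
			return ref, fmt.Errorf("malformed parameter %q", param)
		}
		switch key {
		case "beats":
			sign := 1
			switch {
			case strings.HasPrefix(val, "^^"): // future
			case strings.HasPrefix(val, "~~"): // past
				sign = -1
			default:
				return ref, fmt.Errorf("beats value %q must start with ^^ or ~~", val)
			}
			n, err := strconv.Atoi(val[2:])
			if err != nil || n < 0 {
				return ref, fmt.Errorf("invalid beat count in %q", val)
			}
			ref.Delta = sign * n
		case "phase":
			if val != "plan" && val != "work" && val != "review" {
				return ref, fmt.Errorf("unknown phase %q", val)
			}
			ref.Phase = val
		}
	}
	return ref, nil
}

func main() {
	ref, err := parseBeatSuffix("ucxl://proj:any/*/task/ingest;beats=^^4;phase=work")
	fmt.Println(ref.Base, ref.Delta, ref.Phase, err)
}
```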
C) Policy & Promises
- All time budgets are expressed in Δbeats: `wait_budget.help`, `retry.max_phrases`, `promise_beats`, `fail_after_beats`.
- Leases/locks renew per beat and expire on phase change unless renewed.
D) Derivations
- `beat_index = 1 + floor((unix_ms / beat_ms) mod bar_len_beats)` (derived locally).
- `beat_epoch = floor_to_multiple(now, beat_ms)`.
- `Δbeats(target_time) = round((target_time - now) / beat_ms)`.
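These derivations are local arithmetic on the wall clock. The Go sketch assumes non-negative Unix millisecond timestamps; Go's `%` takes the sign of the dividend, so negative inputs would need adjusting. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// beatIndex = 1 + floor((unix_ms / beat_ms) mod bar_len_beats).
// For unixMs >= 0, integer division followed by % gives the same result.
func beatIndex(unixMs, beatMs, barLen int64) int64 {
	return 1 + (unixMs/beatMs)%barLen
}

// beatEpoch = floor_to_multiple(now, beat_ms): the start of the beat containing nowMs.
func beatEpoch(nowMs, beatMs int64) int64 {
	return nowMs - nowMs%beatMs
}

// deltaBeats = round((target_time - now) / beat_ms). Positive means future (^^),
// negative means past (~~). math.Round rounds halves away from zero.
func deltaBeats(targetMs, nowMs, beatMs int64) int64 {
	return int64(math.Round(float64(targetMs-nowMs) / float64(beatMs)))
}

func main() {
	now := int64(12345) // a small, made-up timestamp in ms
	fmt.Println(beatIndex(now, 4000, 8), beatEpoch(now, 4000), deltaBeats(20000, now, 4000))
}
```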
E) Compatibility Notes
- Old fields `{bar, beat}` are deprecated; if received, they can be ignored or mapped to local windows.
- HLC remains the canonical merge key for causality.
F) Action Items
- Update the spec wire model sections accordingly.
- Regenerate the Go prototype using `BeatIndex`/`BeatEpoch`/`Downbeat` instead of `Bar`/`Beat` counters.
- Add UCXL parsing for `;beats=^^`/`~~` in RUSTLE.
- TODO: RUSTLE update for BACKBEAT compatibility