Upload files to "/"
This commit is contained in:
454
BACKBEAT.md
Normal file
454
BACKBEAT.md
Normal file
@@ -0,0 +1,454 @@
|
||||
# Backbeat Protocol — Pulse/Reverb (v0.1)
|
||||
|
||||
> **Purpose:** Give CHORUS a shared, lightweight rhythm so multi‑agent, p2p work doesn’t deadlock or drift. Standardise expectations (plan/work/review), exit conditions, promises, and time‑bounded collaboration across CHORUS, HMMM, SLURP, SHHH, UCXL, WHOOSH, and COOEE.
|
||||
|
||||
---
|
||||
|
||||
## 1) Rationale
|
||||
|
||||
- **Problem:** In pub/sub meshes, agents can wait indefinitely for help/context; there’s no universal cadence for planning, execution, or re‑evaluation.
|
||||
|
||||
- **Principle:** Use **coarse, explicit tempo** (beats/bars) for policy alignment; not for hard real‑time sync. Must be **partition‑tolerant**, **observable**, and **cheap**.
|
||||
|
||||
- **Design:** Human‑readable **beats/bars/phrases** for policy, **Hybrid Logical Clocks (HLC)** for mergeable ordering.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 2) Core Concepts
|
||||
|
||||
- **Tempo (BPM):** Beats per minute (e.g., 6–30 BPM). Cluster‑level default; task classes may suggest hints.
|
||||
|
||||
- **Beat:** Base epoch (e.g., 4 s @ 15 BPM).
|
||||
|
||||
- **Bar:** Group of beats (e.g., 8). **Downbeat** (beat 1) is a soft barrier (checkpoints, secret rotation).
|
||||
|
||||
- **Phrase:** A sequence of bars that maps to a work cycle: **plan → work → review**.
|
||||
|
||||
- **Score (per task):** Declarative allocation of beats across phases + wait budgets + retries.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 3) Roles & Components
|
||||
|
||||
- **Pulse:** Cluster tempo broadcaster. Publishes `BeatFrame` each beat; single elected leader (Raft/etcd), followers can degrade to local.
|
||||
|
||||
- **Reverb:** Aggregator/rollup. Ingests `StatusClaim`s and emits per‑bar `BarReport`s, plus hints for adaptive tempo.
|
||||
|
||||
- **Agents (CHORUS workers, HMMM collaborators, SLURP, etc.):** Consume beats, enforce **Score**, publish `StatusClaim`s.
|
||||
|
||||
- **SHHH:** Rotates short‑lived secrets **on downbeats** (per‑bar keys).
|
||||
|
||||
- **COOEE/DHT:** Transport for topics `backbeat://cluster/{id}` and per‑project status lanes.
|
||||
|
||||
|
||||
### Implementation Snapshot (2025-10)
|
||||
|
||||
- **Pulse service (`cmd/pulse`)** – Encapsulates Raft leader election (`internal/backbeat/leader.go`), Hybrid Logical Clock maintenance (`internal/backbeat/hlc.go`), degradation control (`internal/backbeat/degradation.go`), and beat publishing over NATS. It also exposes an admin HTTP surface and collects tempo/drift metrics via `internal/backbeat/metrics.go`.
|
||||
- **Reverb service (`cmd/reverb`)** – Subscribes to pulse beats and agent status subjects, aggregates `StatusClaim`s into rolling windows, and emits `BarReport`s on downbeats. Readiness, health, and Prometheus endpoints report claim throughput, aggregation latency, and NATs failures.
|
||||
- **Go SDK (`pkg/sdk`)** – Provides clients for beat callbacks, status emission, and health reporting with retry/circuit breaker hooks. CHORUS (`project-queues/active/CHORUS/internal/backbeat/integration.go`) and WHOOSH (`project-queues/active/WHOOSH/internal/backbeat/integration.go`) embed the SDK to align runtime operations with cluster tempo.
|
||||
- **Inter-module telemetry** – CHORUS maps P2P lifecycle operations (elections, DHT bootstrap, council delivery) into BACKBEAT status claims, while WHOOSH emits search/composer activity. This keeps Reverb windows authoritative for council health and informs SLURP/BUBBLE provenance.
|
||||
- **Observability bundle** – Monitoring assets (`monitoring/`, `prometheus.yml`) plus service metrics export drift, tempo adjustments, Raft state, and window KPIs, meeting BACKBEAT-PER-001/002/003 targets and enabling WHOOSH scaling gates to react to rhythm degradation.
|
||||
|
||||
---
|
||||
|
||||
## 4) Wire Model
|
||||
|
||||
### 4.1 BeatFrame (Pulse → all)
|
||||
|
||||
```json
|
||||
{
|
||||
"cluster_id": "chorus-aus-01",
|
||||
"tempo_bpm": 15,
|
||||
"beat_ms": 4000,
|
||||
"bar_len_beats": 8,
|
||||
"bar": 1287,
|
||||
"beat": 3,
|
||||
"phase": "work",
|
||||
"hlc": "2025-09-03T02:12:27.183Z+1287:3+17",
|
||||
"policy_hash": "sha256:...",
|
||||
"deadline_at": "2025-09-03T02:12:31.183Z"
|
||||
}
|
||||
```
|
||||
|
||||
### 4.2 StatusClaim (agents → Reverb)
|
||||
|
||||
```json
|
||||
{
|
||||
"agent_id": "chorus-192-168-1-27",
|
||||
"task_id": "ucxl://...",
|
||||
"bar": 1287,
|
||||
"beat": 3,
|
||||
"state": "planning|executing|waiting|review|done|failed",
|
||||
"wait_for": ["hmmm://thread/abc"],
|
||||
"beats_left": 2,
|
||||
"progress": 0.42,
|
||||
"notes": "awaiting summarised artifact from peer",
|
||||
"hlc": "..."
|
||||
}
|
||||
```
|
||||
|
||||
### 4.3 HelpPromise (HMMM → requester)
|
||||
|
||||
```json
|
||||
{
|
||||
"thread_id": "hmmm://thread/abc",
|
||||
"promise_beats": 2,
|
||||
"confidence": 0.7,
|
||||
"fail_after_beats": 3,
|
||||
"on_fail": "fallback-plan-A"
|
||||
}
|
||||
```
|
||||
|
||||
### 4.4 BarReport (Reverb → observability)
|
||||
|
||||
- Per‑bar rollup: task counts by state, overruns, broken promises, queue depth, utilisation hints, suggested tempo/phase tweak.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 5) Score Spec (YAML)
|
||||
|
||||
```yaml
|
||||
score:
|
||||
tempo: 15 # bpm hint; cluster policy can override
|
||||
bar_len: 8 # beats per bar
|
||||
phases:
|
||||
plan: 2 # beats
|
||||
work: 4
|
||||
review: 2
|
||||
wait_budget:
|
||||
help: 2 # max beats to wait for HMMM replies across the phrase
|
||||
io: 1 # max beats to wait for I/O
|
||||
retry:
|
||||
max_phrases: 2
|
||||
backoff: geometric # plan/work/review shrink each retry
|
||||
escalation:
|
||||
on_wait_exhausted: ["emit:needs-attention", "fallback:coarse-answer"]
|
||||
on_overrun: ["checkpoint", "defer:next-phrase"]
|
||||
```
|
||||
|
||||
> **Rule:** Agents must not exceed phase beat allocations. If `help` budget is exhausted, **exit cleanly** with degraded but auditable output.
|
||||
|
||||
---
|
||||
|
||||
## 6) Agent Loop (sketch)
|
||||
|
||||
```text
|
||||
on BeatFrame(bf):
|
||||
if new bar and beat==1: rotate_ephemeral_keys(); checkpoint();
|
||||
|
||||
phase = score.phase_for(bf.beat)
|
||||
|
||||
switch phase:
|
||||
PLAN:
|
||||
if not planned: do_planning_until(phase_end)
|
||||
WORK:
|
||||
if need_help and !help_promised: request_help_with_promise()
|
||||
if waiting_for_help:
|
||||
if wait_beats > score.wait_budget.help: exit_with_fallback()
|
||||
else continue_work_on_alternative_path()
|
||||
else do_work_step()
|
||||
REVIEW:
|
||||
run_tests_and_summarise(); publish StatusClaim(state=done|failed)
|
||||
|
||||
enforce_cutoffs_at_phase_boundaries()
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7) Adaptive Tempo Controller (ATC)
|
||||
|
||||
- **Inputs:** Queue depth per role, GPU/CPU util (WHOOSH), overrun frequency, broken promises.
|
||||
|
||||
- **Policy:** Adjust `tempo_bpm` and/or redistribute phase beats **between bars only** (PI‑style control, hysteresis ±10%).
|
||||
|
||||
- **Guardrails:** ≤1 beat change per minute; freeze during incidents.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 8) Exit Conditions & Deadlock Prevention
|
||||
|
||||
- **Wait budgets** are hard ceilings. Missing `HelpPromise` by end‑of‑bar triggers `on_wait_exhausted`.
|
||||
|
||||
- **Locks & leases** expire at bar boundaries unless renewed with `beats_left`.
|
||||
|
||||
- **Promises** include `promise_beats` and `fail_after_beats` so callers can plan.
|
||||
|
||||
- **Idempotent checkpoints** at downbeats enable safe retries/resumptions.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 9) Integration Points
|
||||
|
||||
- **CHORUS (workers):** Consume `BeatFrame`; enforce `Score`; publish `StatusClaim` each beat/change.
|
||||
|
||||
- **HMMM (collab):** Replies carry `HelpPromise`; threads auto‑close if `fail_after_beats` elapses.
|
||||
|
||||
- **SLURP (curation):** Batch ingest windows tied to review beats; produce bar‑stamped artefacts.
|
||||
|
||||
- **SHHH (secrets):** Rotate per bar; credentials scoped to `<cluster,bar>`.
|
||||
|
||||
- **UCXL:** Attach tempo metadata to deliverables: `{bar, beat, hlc}`; optional address suffix `;bar=1287#beat=8`.
|
||||
|
||||
- **WHOOSH:** Expose utilisation to ATC; enforce resource leases in beat units.
|
||||
|
||||
- **COOEE/DHT:** Topics: `backbeat://cluster/{id}`, `status://{project}`, `promise://hmmm`.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 10) Failure Modes & Degraded Operation
|
||||
|
||||
- **No Pulse leader:** Agents derive a **median‑of‑pulses** from available Purses; if none, use local monotonic clock (jitter ok) and **freeze tempo changes**.
|
||||
|
||||
- **Partitions:** Keep counting beats locally (HLC ensures mergeable order). Reverb reconciles by HLC and bar on heal.
|
||||
|
||||
- **Drift:** Tempo changes only on downbeats; publish `policy_hash` so agents detect misconfig.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 11) Config Examples
|
||||
|
||||
### 11.1 Cluster Tempo Policy
|
||||
|
||||
```yaml
|
||||
cluster_id: chorus-aus-01
|
||||
initial_bpm: 12
|
||||
bar_len_beats: 8
|
||||
phases: [plan, work, review]
|
||||
limits:
|
||||
max_bpm: 24
|
||||
min_bpm: 6
|
||||
adaptation:
|
||||
enable: true
|
||||
hysteresis_pct: 10
|
||||
change_per_minute: 1_beat
|
||||
observability:
|
||||
emit_bar_reports: true
|
||||
```
|
||||
|
||||
### 11.2 Task Score (attached to UCXL deliverable)
|
||||
|
||||
```yaml
|
||||
ucxl: ucxl://proj:any/*/task/graph_ingest
|
||||
score:
|
||||
tempo: 15
|
||||
bar_len: 8
|
||||
phases: {plan: 2, work: 4, review: 2}
|
||||
wait_budget: {help: 2, io: 1}
|
||||
retry: {max_phrases: 2, backoff: geometric}
|
||||
escalation:
|
||||
on_wait_exhausted: ["emit:needs-attention", "fallback:coarse-answer"]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 12) Observability
|
||||
|
||||
- **Per‑bar dashboards:** state counts, overruns, broken promises, tempo changes, queue depth, utilisation.
|
||||
|
||||
- **Trace stamps:** Every artifact/event carries `{bar, beat, hlc}` for forensic replay.
|
||||
|
||||
- **Alarms:** `promise_miss_rate`, `overrun_rate`, `no_status_claims`.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 13) Security
|
||||
|
||||
- Rotate **ephemeral keys on downbeats**; scope to project/role when possible.
|
||||
|
||||
- Bar‑stamped tokens reduce blast radius; revoke at bar+N.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 14) Economics & Budgeting — Beats as Unit of Cost
|
||||
|
||||
### 14.1 Beat Unit (BU)
|
||||
|
||||
- **Definition:** 1 BU = one cluster beat interval (`beat_ms`). It’s the atomic scheduling & accounting quantum.
|
||||
|
||||
|
||||
### 14.2 Resource Primitives (WHOOSH‑measured)
|
||||
|
||||
- `cpu_sec`, `gpu_sec[class]`, `accel_sec[class]`, `mem_gbs` (GB·s), `disk_io_mb`, `net_egress_mb`, `storage_gbh`.
|
||||
|
||||
|
||||
### 14.3 Budget & Costing
|
||||
|
||||
```yaml
|
||||
budget:
|
||||
max_bu: N_total
|
||||
phase_caps: { plan: Np, work: Nw, review: Nr }
|
||||
wait_caps: { help: Nh, io: Ni }
|
||||
hard_end: bar+K
|
||||
charge_to: ucxl://acct/...
|
||||
```
|
||||
|
||||
Cost per phrase:
|
||||
|
||||
```
|
||||
Total = Σ(beats_used * role_rate_bu)
|
||||
+ Σ_class(gpu_sec[class] * rate_gpu_sec[class])
|
||||
+ cpu_sec*rate_cpu_sec + mem_gbs*rate_mem_gbs
|
||||
+ disk_io_mb*rate_io_mb + net_egress_mb*rate_egress_mb
|
||||
+ storage_gbh*rate_storage_gbh
|
||||
```
|
||||
|
||||
### 14.4 KPIs
|
||||
|
||||
- **TNT** (tempo‑normalised throughput), **BPD** (beats per deliverable), **WR** (wait ratio), **η** (efficiency), **PMR** (promise miss rate), **CPD** (cost per deliverable), **TTFU** (time to first useful).
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 15) Tokenless Accounting (Hybrid CPU/GPU, On‑prem + Cloud)
|
||||
|
||||
- **No tokens.** Price **beats + measured resources**; ignore model‑token counts.
|
||||
|
||||
- **Device classes:** price per GPU/accelerator class (A100, 4090, MI300X, TPU…).
|
||||
|
||||
- **Rates:** on‑prem from TCO / duty‑cycle seconds; cloud from per‑second list prices. Bind via config.
|
||||
|
||||
- **Beat‑scoped caps:** per‑BU ceilings on resource primitives to contain spend regardless of hardware skew.
|
||||
|
||||
- **Calibration (planning‑only):** per‑family normalisers if you want **Effective Compute Units** for planning; **billing remains raw seconds**.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 16) MVP Bring‑up Plan
|
||||
|
||||
1. **Pulse**: static BPM, broadcast `BeatFrame` over COOEE.
|
||||
|
||||
2. **Agents**: publish `StatusClaim`; enforce `wait_budget` & `HelpPromise`.
|
||||
|
||||
3. **Reverb**: roll up to `BarReport`; surface early KPIs.
|
||||
|
||||
4. **SHHH**: rotate credentials on downbeats.
|
||||
|
||||
5. **ATC**: enable adaptation after telemetry.
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 17) Open Questions
|
||||
|
||||
- Per‑role tempi vs one cluster tempo?
|
||||
|
||||
- Fixed `bar_len` vs dynamic redistribution of phase beats?
|
||||
|
||||
- Score UI: YAML + visual “score sheet” editor?
|
||||
|
||||
|
||||
---
|
||||
|
||||
### Naming (on brand)
|
||||
|
||||
- **Backbeat Protocol** — **Pulse** (broadcaster) + **Reverb** (rollup & reports). Musical, expressive; conveys ripples from each downbeat.
|
||||
|
||||
|
||||
# Backbeat — Relative Beats Addendum (UCXL ^^/~~)
|
||||
|
||||
**Why this addendum?** We’re removing dependence on ever‑increasing `bar`/`beat` counters. All coordination is expressed **relative to NOW** in **beats**, aligned with UCXL temporal markers `^^` (future) and `~~` (past).
|
||||
|
||||
## A) Wire Model Adjustments
|
||||
|
||||
### BeatFrame (Pulse → all)
|
||||
|
||||
**Replace** prior fields `{bar, beat}` with:
|
||||
|
||||
```json
|
||||
{
|
||||
"cluster_id": "...",
|
||||
"tempo_bpm": 15,
|
||||
"beat_ms": 4000,
|
||||
"bar_len_beats": 8,
|
||||
"beat_index": 3, // 1..bar_len_beats (cyclic within bar)
|
||||
"beat_epoch": "2025-09-03T02:12:27.000Z", // start time of this beat
|
||||
"downbeat": false, // true when beat_index==1
|
||||
"phase": "work",
|
||||
"hlc": "2025-09-03T02:12:27.183Z+17",
|
||||
"policy_hash": "sha256:...",
|
||||
"deadline_at": "2025-09-03T02:12:31.183Z"
|
||||
}
|
||||
```
|
||||
|
||||
### StatusClaim (agents → Reverb)
|
||||
|
||||
**Replace** prior fields `{bar, beat}` with:
|
||||
|
||||
```json
|
||||
{
|
||||
"agent_id": "...",
|
||||
"task_id": "...",
|
||||
"beat_index": 3,
|
||||
"state": "planning|executing|waiting|review|done|failed",
|
||||
"beats_left": 2,
|
||||
"progress": 0.42,
|
||||
"notes": "...",
|
||||
"hlc": "..."
|
||||
}
|
||||
```
|
||||
|
||||
### Bar/Window Aggregation
|
||||
|
||||
- Reverb aggregates per **window** bounded by `downbeat=true` frames.
|
||||
|
||||
- **No global bar counters** are transmitted. Observability UIs may keep a local `window_id` for navigation.
|
||||
|
||||
|
||||
## B) UCXL Temporal Suffix - (Requires RFC-UCXL 1.1)
|
||||
|
||||
Attach **relative beat** navigation to any UCXL address:
|
||||
|
||||
- `;beats=^^N` → target **N beats in the future** from now
|
||||
|
||||
- `;beats=~~N` → target **N beats in the past** from now
|
||||
|
||||
- Optional: `;phase=plan|work|review`
|
||||
|
||||
|
||||
**Example:**
|
||||
|
||||
```
|
||||
ucxl://proj:any/*/task/ingest;beats=^^4;phase=work
|
||||
```
|
||||
|
||||
## C) Policy & Promises
|
||||
|
||||
- All time budgets are **Δbeats**: `wait_budget.help`, `retry.max_phrases`, `promise_beats`, `fail_after_beats`.
|
||||
|
||||
- **Leases/locks** renew per beat and expire on phase change unless renewed.
|
||||
|
||||
|
||||
## D) Derivations
|
||||
|
||||
- `beat_index = 1 + floor( (unix_ms / beat_ms) mod bar_len_beats )` (derived locally).
|
||||
|
||||
- `beat_epoch = floor_to_multiple(now, beat_ms)`.
|
||||
|
||||
- `Δbeats(target_time) = round( (target_time - now) / beat_ms )`.
|
||||
|
||||
|
||||
## E) Compatibility Notes
|
||||
|
||||
- Old fields `{bar, beat}` are **deprecated**; if received, they can be ignored or mapped to local windows.
|
||||
|
||||
- HLC remains the canonical merge key for causality.
|
||||
|
||||
|
||||
## F) Action Items
|
||||
|
||||
1. Update the **spec wire model** sections accordingly.
|
||||
|
||||
2. Regenerate the **Go prototype** using `BeatIndex/BeatEpoch/Downbeat` instead of `Bar/Beat` counters.
|
||||
|
||||
3. Add UCXL parsing for `;beats=^^/~~` in RUSTLE.
|
||||
- [ ] TODO: RUSTLE update for BACKBEAT compatibility
|
||||
Reference in New Issue
Block a user