Upload files to "/"

This commit is contained in:
2026-02-26 22:16:41 +00:00
parent 99238f12d7
commit 6788dd741c
5 changed files with 2443 additions and 0 deletions

454
BACKBEAT.md Normal file
View File

@@ -0,0 +1,454 @@
# Backbeat Protocol — Pulse/Reverb (v0.1)
> **Purpose:** Give CHORUS a shared, lightweight rhythm so multiagent, p2p work doesnt deadlock or drift. Standardise expectations (plan/work/review), exit conditions, promises, and timebounded collaboration across CHORUS, HMMM, SLURP, SHHH, UCXL, WHOOSH, and COOEE.
---
## 1) Rationale
- **Problem:** In pub/sub meshes, agents can wait indefinitely for help/context; theres no universal cadence for planning, execution, or reevaluation.
- **Principle:** Use **coarse, explicit tempo** (beats/bars) for policy alignment; not for hard realtime sync. Must be **partitiontolerant**, **observable**, and **cheap**.
- **Design:** Humanreadable **beats/bars/phrases** for policy, **Hybrid Logical Clocks (HLC)** for mergeable ordering.
---
## 2) Core Concepts
- **Tempo (BPM):** Beats per minute (e.g., 630 BPM). Clusterlevel default; task classes may suggest hints.
- **Beat:** Base epoch (e.g., 4 s @ 15 BPM).
- **Bar:** Group of beats (e.g., 8). **Downbeat** (beat 1) is a soft barrier (checkpoints, secret rotation).
- **Phrase:** A sequence of bars that maps to a work cycle: **plan → work → review**.
- **Score (per task):** Declarative allocation of beats across phases + wait budgets + retries.
---
## 3) Roles & Components
- **Pulse:** Cluster tempo broadcaster. Publishes `BeatFrame` each beat; single elected leader (Raft/etcd), followers can degrade to local.
- **Reverb:** Aggregator/rollup. Ingests `StatusClaim`s and emits perbar `BarReport`s, plus hints for adaptive tempo.
- **Agents (CHORUS workers, HMMM collaborators, SLURP, etc.):** Consume beats, enforce **Score**, publish `StatusClaim`s.
- **SHHH:** Rotates shortlived secrets **on downbeats** (perbar keys).
- **COOEE/DHT:** Transport for topics `backbeat://cluster/{id}` and perproject status lanes.
### Implementation Snapshot (2025-10)
- **Pulse service (`cmd/pulse`)** Encapsulates Raft leader election (`internal/backbeat/leader.go`), Hybrid Logical Clock maintenance (`internal/backbeat/hlc.go`), degradation control (`internal/backbeat/degradation.go`), and beat publishing over NATS. It also exposes an admin HTTP surface and collects tempo/drift metrics via `internal/backbeat/metrics.go`.
- **Reverb service (`cmd/reverb`)** Subscribes to pulse beats and agent status subjects, aggregates `StatusClaim`s into rolling windows, and emits `BarReport`s on downbeats. Readiness, health, and Prometheus endpoints report claim throughput, aggregation latency, and NATs failures.
- **Go SDK (`pkg/sdk`)** Provides clients for beat callbacks, status emission, and health reporting with retry/circuit breaker hooks. CHORUS (`project-queues/active/CHORUS/internal/backbeat/integration.go`) and WHOOSH (`project-queues/active/WHOOSH/internal/backbeat/integration.go`) embed the SDK to align runtime operations with cluster tempo.
- **Inter-module telemetry** CHORUS maps P2P lifecycle operations (elections, DHT bootstrap, council delivery) into BACKBEAT status claims, while WHOOSH emits search/composer activity. This keeps Reverb windows authoritative for council health and informs SLURP/BUBBLE provenance.
- **Observability bundle** Monitoring assets (`monitoring/`, `prometheus.yml`) plus service metrics export drift, tempo adjustments, Raft state, and window KPIs, meeting BACKBEAT-PER-001/002/003 targets and enabling WHOOSH scaling gates to react to rhythm degradation.
---
## 4) Wire Model
### 4.1 BeatFrame (Pulse → all)
```json
{
"cluster_id": "chorus-aus-01",
"tempo_bpm": 15,
"beat_ms": 4000,
"bar_len_beats": 8,
"bar": 1287,
"beat": 3,
"phase": "work",
"hlc": "2025-09-03T02:12:27.183Z+1287:3+17",
"policy_hash": "sha256:...",
"deadline_at": "2025-09-03T02:12:31.183Z"
}
```
### 4.2 StatusClaim (agents → Reverb)
```json
{
"agent_id": "chorus-192-168-1-27",
"task_id": "ucxl://...",
"bar": 1287,
"beat": 3,
"state": "planning|executing|waiting|review|done|failed",
"wait_for": ["hmmm://thread/abc"],
"beats_left": 2,
"progress": 0.42,
"notes": "awaiting summarised artifact from peer",
"hlc": "..."
}
```
### 4.3 HelpPromise (HMMM → requester)
```json
{
"thread_id": "hmmm://thread/abc",
"promise_beats": 2,
"confidence": 0.7,
"fail_after_beats": 3,
"on_fail": "fallback-plan-A"
}
```
### 4.4 BarReport (Reverb → observability)
- Perbar rollup: task counts by state, overruns, broken promises, queue depth, utilisation hints, suggested tempo/phase tweak.
---
## 5) Score Spec (YAML)
```yaml
score:
tempo: 15 # bpm hint; cluster policy can override
bar_len: 8 # beats per bar
phases:
plan: 2 # beats
work: 4
review: 2
wait_budget:
help: 2 # max beats to wait for HMMM replies across the phrase
io: 1 # max beats to wait for I/O
retry:
max_phrases: 2
backoff: geometric # plan/work/review shrink each retry
escalation:
on_wait_exhausted: ["emit:needs-attention", "fallback:coarse-answer"]
on_overrun: ["checkpoint", "defer:next-phrase"]
```
> **Rule:** Agents must not exceed phase beat allocations. If `help` budget is exhausted, **exit cleanly** with degraded but auditable output.
---
## 6) Agent Loop (sketch)
```text
on BeatFrame(bf):
if new bar and beat==1: rotate_ephemeral_keys(); checkpoint();
phase = score.phase_for(bf.beat)
switch phase:
PLAN:
if not planned: do_planning_until(phase_end)
WORK:
if need_help and !help_promised: request_help_with_promise()
if waiting_for_help:
if wait_beats > score.wait_budget.help: exit_with_fallback()
else continue_work_on_alternative_path()
else do_work_step()
REVIEW:
run_tests_and_summarise(); publish StatusClaim(state=done|failed)
enforce_cutoffs_at_phase_boundaries()
```
---
## 7) Adaptive Tempo Controller (ATC)
- **Inputs:** Queue depth per role, GPU/CPU util (WHOOSH), overrun frequency, broken promises.
- **Policy:** Adjust `tempo_bpm` and/or redistribute phase beats **between bars only** (PIstyle control, hysteresis ±10%).
- **Guardrails:** ≤1 beat change per minute; freeze during incidents.
---
## 8) Exit Conditions & Deadlock Prevention
- **Wait budgets** are hard ceilings. Missing `HelpPromise` by endofbar triggers `on_wait_exhausted`.
- **Locks & leases** expire at bar boundaries unless renewed with `beats_left`.
- **Promises** include `promise_beats` and `fail_after_beats` so callers can plan.
- **Idempotent checkpoints** at downbeats enable safe retries/resumptions.
---
## 9) Integration Points
- **CHORUS (workers):** Consume `BeatFrame`; enforce `Score`; publish `StatusClaim` each beat/change.
- **HMMM (collab):** Replies carry `HelpPromise`; threads autoclose if `fail_after_beats` elapses.
- **SLURP (curation):** Batch ingest windows tied to review beats; produce barstamped artefacts.
- **SHHH (secrets):** Rotate per bar; credentials scoped to `<cluster,bar>`.
- **UCXL:** Attach tempo metadata to deliverables: `{bar, beat, hlc}`; optional address suffix `;bar=1287#beat=8`.
- **WHOOSH:** Expose utilisation to ATC; enforce resource leases in beat units.
- **COOEE/DHT:** Topics: `backbeat://cluster/{id}`, `status://{project}`, `promise://hmmm`.
---
## 10) Failure Modes & Degraded Operation
- **No Pulse leader:** Agents derive a **medianofpulses** from available Purses; if none, use local monotonic clock (jitter ok) and **freeze tempo changes**.
- **Partitions:** Keep counting beats locally (HLC ensures mergeable order). Reverb reconciles by HLC and bar on heal.
- **Drift:** Tempo changes only on downbeats; publish `policy_hash` so agents detect misconfig.
---
## 11) Config Examples
### 11.1 Cluster Tempo Policy
```yaml
cluster_id: chorus-aus-01
initial_bpm: 12
bar_len_beats: 8
phases: [plan, work, review]
limits:
max_bpm: 24
min_bpm: 6
adaptation:
enable: true
hysteresis_pct: 10
change_per_minute: 1_beat
observability:
emit_bar_reports: true
```
### 11.2 Task Score (attached to UCXL deliverable)
```yaml
ucxl: ucxl://proj:any/*/task/graph_ingest
score:
tempo: 15
bar_len: 8
phases: {plan: 2, work: 4, review: 2}
wait_budget: {help: 2, io: 1}
retry: {max_phrases: 2, backoff: geometric}
escalation:
on_wait_exhausted: ["emit:needs-attention", "fallback:coarse-answer"]
```
---
## 12) Observability
- **Perbar dashboards:** state counts, overruns, broken promises, tempo changes, queue depth, utilisation.
- **Trace stamps:** Every artifact/event carries `{bar, beat, hlc}` for forensic replay.
- **Alarms:** `promise_miss_rate`, `overrun_rate`, `no_status_claims`.
---
## 13) Security
- Rotate **ephemeral keys on downbeats**; scope to project/role when possible.
- Barstamped tokens reduce blast radius; revoke at bar+N.
---
## 14) Economics & Budgeting — Beats as Unit of Cost
### 14.1 Beat Unit (BU)
- **Definition:** 1 BU = one cluster beat interval (`beat_ms`). Its the atomic scheduling & accounting quantum.
### 14.2 Resource Primitives (WHOOSHmeasured)
- `cpu_sec`, `gpu_sec[class]`, `accel_sec[class]`, `mem_gbs` (GB·s), `disk_io_mb`, `net_egress_mb`, `storage_gbh`.
### 14.3 Budget & Costing
```yaml
budget:
max_bu: N_total
phase_caps: { plan: Np, work: Nw, review: Nr }
wait_caps: { help: Nh, io: Ni }
hard_end: bar+K
charge_to: ucxl://acct/...
```
Cost per phrase:
```
Total = Σ(beats_used * role_rate_bu)
+ Σ_class(gpu_sec[class] * rate_gpu_sec[class])
+ cpu_sec*rate_cpu_sec + mem_gbs*rate_mem_gbs
+ disk_io_mb*rate_io_mb + net_egress_mb*rate_egress_mb
+ storage_gbh*rate_storage_gbh
```
### 14.4 KPIs
- **TNT** (temponormalised throughput), **BPD** (beats per deliverable), **WR** (wait ratio), **η** (efficiency), **PMR** (promise miss rate), **CPD** (cost per deliverable), **TTFU** (time to first useful).
---
## 15) Tokenless Accounting (Hybrid CPU/GPU, Onprem + Cloud)
- **No tokens.** Price **beats + measured resources**; ignore modeltoken counts.
- **Device classes:** price per GPU/accelerator class (A100, 4090, MI300X, TPU…).
- **Rates:** onprem from TCO / dutycycle seconds; cloud from persecond list prices. Bind via config.
- **Beatscoped caps:** perBU ceilings on resource primitives to contain spend regardless of hardware skew.
- **Calibration (planningonly):** perfamily normalisers if you want **Effective Compute Units** for planning; **billing remains raw seconds**.
---
## 16) MVP Bringup Plan
1. **Pulse**: static BPM, broadcast `BeatFrame` over COOEE.
2. **Agents**: publish `StatusClaim`; enforce `wait_budget` & `HelpPromise`.
3. **Reverb**: roll up to `BarReport`; surface early KPIs.
4. **SHHH**: rotate credentials on downbeats.
5. **ATC**: enable adaptation after telemetry.
---
## 17) Open Questions
- Perrole tempi vs one cluster tempo?
- Fixed `bar_len` vs dynamic redistribution of phase beats?
- Score UI: YAML + visual “score sheet” editor?
---
### Naming (on brand)
- **Backbeat Protocol** — **Pulse** (broadcaster) + **Reverb** (rollup & reports). Musical, expressive; conveys ripples from each downbeat.
# Backbeat — Relative Beats Addendum (UCXL ^^/~~)
**Why this addendum?** Were removing dependence on everincreasing `bar`/`beat` counters. All coordination is expressed **relative to NOW** in **beats**, aligned with UCXL temporal markers `^^` (future) and `~~` (past).
## A) Wire Model Adjustments
### BeatFrame (Pulse → all)
**Replace** prior fields `{bar, beat}` with:
```json
{
"cluster_id": "...",
"tempo_bpm": 15,
"beat_ms": 4000,
"bar_len_beats": 8,
"beat_index": 3, // 1..bar_len_beats (cyclic within bar)
"beat_epoch": "2025-09-03T02:12:27.000Z", // start time of this beat
"downbeat": false, // true when beat_index==1
"phase": "work",
"hlc": "2025-09-03T02:12:27.183Z+17",
"policy_hash": "sha256:...",
"deadline_at": "2025-09-03T02:12:31.183Z"
}
```
### StatusClaim (agents → Reverb)
**Replace** prior fields `{bar, beat}` with:
```json
{
"agent_id": "...",
"task_id": "...",
"beat_index": 3,
"state": "planning|executing|waiting|review|done|failed",
"beats_left": 2,
"progress": 0.42,
"notes": "...",
"hlc": "..."
}
```
### Bar/Window Aggregation
- Reverb aggregates per **window** bounded by `downbeat=true` frames.
- **No global bar counters** are transmitted. Observability UIs may keep a local `window_id` for navigation.
## B) UCXL Temporal Suffix - (Requires RFC-UCXL 1.1)
Attach **relative beat** navigation to any UCXL address:
- `;beats=^^N` → target **N beats in the future** from now
- `;beats=~~N` → target **N beats in the past** from now
- Optional: `;phase=plan|work|review`
**Example:**
```
ucxl://proj:any/*/task/ingest;beats=^^4;phase=work
```
## C) Policy & Promises
- All time budgets are **Δbeats**: `wait_budget.help`, `retry.max_phrases`, `promise_beats`, `fail_after_beats`.
- **Leases/locks** renew per beat and expire on phase change unless renewed.
## D) Derivations
- `beat_index = 1 + floor( (unix_ms / beat_ms) mod bar_len_beats )` (derived locally).
- `beat_epoch = floor_to_multiple(now, beat_ms)`.
- `Δbeats(target_time) = round( (target_time - now) / beat_ms )`.
## E) Compatibility Notes
- Old fields `{bar, beat}` are **deprecated**; if received, they can be ignored or mapped to local windows.
- HLC remains the canonical merge key for causality.
## F) Action Items
1. Update the **spec wire model** sections accordingly.
2. Regenerate the **Go prototype** using `BeatIndex/BeatEpoch/Downbeat` instead of `Bar/Beat` counters.
3. Add UCXL parsing for `;beats=^^/~~` in RUSTLE.
- [ ] TODO: RUSTLE update for BACKBEAT compatibility