Design Brief #1
SWOOSH — Deterministic Orchestrator (State-Machine Core)
Owner: Platform/Orchestration
Replaces: WHOOSH orchestration module (drop-in)
Integrations: BACKBEAT (or tidal pressure), HMMM (signed comms), UCXL (addressing), BUBBLE (Decision Records), KACHING (licensing), SHHH (policy gate), SLURP (curation/index), MCP registry (tool layer)
1) Purpose & Outcomes
Problem
Chaos-monkey testing exposed brittle behavior in WHOOSH's event-driven core: replay issues, split-brain council membership, racy ingestion/index swaps, and non-deterministic recovery after node churn.
Goal
Ship SWOOSH, a deterministic hierarchical state machine (statecharts) that governs the boot, ingestion, council, environment, and execution lifecycles, with idempotent transitions, WAL + checkpoints, and HLC-ordered commits. This must be a drop-in replacement at the module boundary so the rest of CHORUS remains untouched.
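The deterministic core can be pictured as a pure reducer that enforces idempotence via `idem_key` and ordering via HLC. The sketch below is illustrative only: `State`, `Transition`, and `Reduce` are assumed names, not the real SWOOSH API, and the hash function is a stand-in.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Hypothetical sketch: State, Transition, and Reduce are illustrative
// names, not the real SWOOSH API.
type State struct {
	Phase   string
	HLCLast uint64
	Applied map[string]bool // idem_key -> already applied
	Hash    string
}

type Transition struct {
	Name    string
	IdemKey string
	HLC     uint64
}

// Reduce is pure: the same (state, transition) pair always yields the
// same next state, which is what makes replicas byte-identical.
func Reduce(s State, t Transition) (State, error) {
	if s.Applied[t.IdemKey] {
		return s, nil // idempotent: duplicate delivery is a no-op
	}
	if t.HLC < s.HLCLast {
		return s, fmt.Errorf("stale HLC %d < %d", t.HLC, s.HLCLast)
	}
	next := s
	next.Applied = map[string]bool{}
	for k := range s.Applied {
		next.Applied[k] = true
	}
	next.Applied[t.IdemKey] = true
	next.Phase = t.Name
	next.HLCLast = t.HLC
	h := sha256.Sum256([]byte(next.Phase + fmt.Sprint(next.HLCLast)))
	next.Hash = hex.EncodeToString(h[:])
	return next, nil
}

func main() {
	s := State{Phase: "BOOT", Applied: map[string]bool{}}
	s, _ = Reduce(s, Transition{Name: "LICENSE_CHECK", IdemKey: "op-1", HLC: 1})
	again, _ := Reduce(s, Transition{Name: "LICENSE_CHECK", IdemKey: "op-1", HLC: 2})
	fmt.Println(s.Hash == again.Hash) // duplicate replay leaves state unchanged
}
```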
Success Criteria (verifiable)
2) Scope / Non-Goals
In scope
Out of scope
3) Users & Interfaces
Primary Interface (unchanged endpoint paths where possible)
- `POST /transition` — propose a guarded state transition (signed, idempotent).
- `GET /state` — read current state (projectable).
- `GET /health` — readiness/liveness plus consensus status (if clustered).
4) Architecture Overview
Core Components
- Reducer: `newState = Reduce(oldState, Transition)`
- Recovery: `snapshot + WAL.replay()`
High-Level Statechart (statechart/hfsm)
Top-level: `BOOT → PROJECT_LIFECYCLE (parallel) → EXECUTION (loop)`, plus control/meta states.

Boot: `UNINITIALIZED → LICENSE_CHECK`
Ingestion: `DISCOVER → FETCH → VALIDATE → INDEX → READY`
Council: `PLAN_ROLES → ELECT → TOOLING_SYNC → READY`
Environment: `ALLOCATE → PROVISION → HEALTHCHECK → READY|DEGRADED`
Execution loop: `PLAN → WORK → REVIEW → REVERB → (next window) → PLAN`
Control/meta: `PAUSED`, `RECOVERY`, `DEGRADED`, `QUARANTINED` (sticky; human or policy update required), `TERMINATED`

A transition to `EXECUTION` requires all three lifecycle regions READY with matching content/version hashes.
5) Formal State & Transition Contracts
Transition Proposal (JSON)
Rules
- `idem_key` ensures exactly-once application per op class.
- `current_state_hash` must match HEAD (optimistic concurrency). If stale, the caller re-reads and resubmits with the updated hash.
- `hlc` must be ≥ the last applied HLC; ties break by `(signer, idem_key)`.
Minimal API
- `POST /transition` → `202 Accepted` or `400 Rejected` (reason + failing guard).
- `GET /state?projection=Council,Ingestion` → JSON with `hash` and a `projection` object.
- `GET /health` → `{status, consensus, lastSnapshotHLC, walLag}`.
6) Guard Semantics (non-blocking purity)
- KACHING license: `LICENSE_CHECK`; cache token with jittered refresh; configurable grace substate before hard stop.
- BACKBEAT/tidal window: `window.isOpen(transitionClass) == true`. Pressure amplitude can scale retry backoff.
- HMMM quorum: `ELECT → TOOLING_SYNC` requires a signed vote set whose member hash equals the proposed council set; the signature set is verified via HMMM.
- MCP health: `TOOLING_SYNC → READY` requires active MCP reachability and version pin checks.
- SHHH policy: violations transition `→ QUARANTINED`, carrying rationale/justification; only explicit human release or policy update unblocks.

Guards are pure functions over `(state, inputs, invariants)` and return `{ok, reasons[]}`; reasons become part of the DR.
7) Data Model (at rest)
State Document (canonical, hashed):
- `meta`: machine version, schema hash
- `boot`: license status, identities
- `ingestion`: phase, source set, content hash/version, validation outcomes
- `council`: planned roles, elected members, quorum cert, tooling status
- `environment`: allocations, provisioned assets, health
- `execution`: plan lock, active beat/window, evidence pointers, approvals
- `control`: paused/degraded/recovery flags
- `policy`: quarantine status, rationale
- `epoch`: monotone counter per region
- `hlc_last`: last applied HLC

WAL Record (append-only):
`(state_pre_hash, transition_name, inputs_hash, signer, idem_key, hlc, window_id, guard_outcomes, evidence[], timestamp, state_post_hash)`

Snapshots: full state plus `lastAppliedIndex` and `lastAppliedHLC`.

All artifacts are content-addressed; UCXL addresses are stored, never dereferenced synchronously inside the reducer.
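The pre/post state hashes in the WAL record form a chain: each record's post-hash is derived from its pre-hash plus the applied transition, so a clean replay reproduces the same chain. A minimal sketch, assuming SHA-256 and a simple concatenation scheme (the real hashing rules are not specified here):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// WALRecord mirrors a subset of the §7 append-only record.
type WALRecord struct {
	StatePreHash  string
	Transition    string
	InputsHash    string
	StatePostHash string
}

// postHash derives the post-state hash from the pre-state hash and the
// applied transition. The concatenation scheme is an illustrative
// assumption, not the specified algorithm.
func postHash(pre, transition, inputsHash string) string {
	h := sha256.Sum256([]byte(pre + "|" + transition + "|" + inputsHash))
	return hex.EncodeToString(h[:])
}

func main() {
	pre := postHash("", "GENESIS", "")
	rec := WALRecord{StatePreHash: pre, Transition: "FETCH", InputsHash: "in-1"}
	rec.StatePostHash = postHash(rec.StatePreHash, rec.Transition, rec.InputsHash)
	// Replaying the same record over the same pre-state yields the same chain.
	fmt.Println(rec.StatePostHash == postHash(pre, "FETCH", "in-1"))
}
```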
8) Persistence & Recovery
Recovery restores the latest snapshot and replays WAL records after its `lastAppliedIndex`.
9) Security
No externally visible work proceeds until `LICENSE_CHECK` passes; the grace window is configurable.
10) Observability
Metrics (Prometheus):
- `swoosh_transition_total{transition}`
- `swoosh_guard_fail_total{guard}`
- `swoosh_state_advance_latency_seconds{region}`
- `swoosh_retry_count{transition}`
- `swoosh_quarantine_total`
- `swoosh_recovery_time_seconds`
- `swoosh_wal_lag_records`

Tracing: one span per accepted transition; include guard reasons & evidence refs.
Logs: structured JSON lines. Always include `state_pre_hash`, `state_post_hash`, `transition`, `hlc`, `window_id`.
11) Failure Modes & Compensations
- Index races: pin `READY` by content hash; any new index becomes a new version requiring an explicit plan change.
- License outage: `DEGRADED` grace substate; stop external comms at grace expiry.
- Compensations: `DEPROVISION`, `EJECT/RE-ELECT`, `ROLLBACK_INDEX` are first-class.
12) Test Strategy (must-pass)
Property / Model-based testing: Feed the same transition stream to (a) in-proc reducer and (b) full process with WAL. Compare state hashes every N steps.
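The comparison above can be sketched as: feed one shared transition stream to two independent reducer instances and assert hash equality every N steps. The toy fold-into-a-hash reducer here is an assumption standing in for the real one; the harness shape is the point.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math/rand"
)

// apply is a stand-in reducer: it folds each transition into a running
// state hash, so two replicas agree iff they saw the same ordered stream.
func apply(stateHash, transition string) string {
	h := sha256.Sum256([]byte(stateHash + "|" + transition))
	return hex.EncodeToString(h[:])
}

func main() {
	// Fixed seed so the randomized stream is reproducible across runs.
	rng := rand.New(rand.NewSource(42))
	names := []string{"FETCH", "VALIDATE", "INDEX", "ELECT", "PLAN", "WORK"}
	a, b := "", "" // two replicas starting from the same empty state
	for i := 0; i < 10000; i++ {
		t := names[rng.Intn(len(names))]
		a, b = apply(a, t), apply(b, t)
		if i%1000 == 0 && a != b { // compare hashes every N steps
			panic(fmt.Sprintf("divergence at step %d", i))
		}
	}
	fmt.Println("replicas agree:", a == b) // replicas agree: true
}
```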
Chaos suite (automated):
Invariants (assert after each transition):
- `EXECUTION` entry requires all READY prerequisites with matching content/version hashes.
13) Migration & Rollout
Propagate `idem_key`, `hlc`, and `window_id` to all upstream producers.
14) Deliverables
Statechart spec (machine-readable YAML) + explainer doc.
Reducer library (Go) with:
Transition API (OpenAPI + server endpoints) and client SDK.
Guard providers: BACKBEAT/tidal, SHHH policy, HMMM quorum, MCP health, KACHING license.
Observability pack: metrics, logs, traces; Grafana dashboards.
Chaos harness: scripts + container recipes to reproduce failure modes.
Runbooks: recovery, snapshot restore, consensus health triage.
BUBBLE emitters: DR templates and UCXL backlink policy.
15) Acceptance Criteria
- Identical `state_post_hash` across 3+ replicas after a randomized 10k-transition run.
- `REVIEW` and `REVERB` transitions occur only in allowed BACKBEAT/tidal windows (verified by guard logs).
16) Implementation Notes (for the team)
Language: Go preferred (fits the stack), but keep reducer pure and portable.
Packages:
Codegen: Optional: generate reducer scaffolding from the YAML statechart to ensure switch-exhaustiveness and consistent metrics hooks.
Performance: Batch WAL fsyncs with careful durability trade-offs (configurable); snapshot on REVERB or every N transitions.
17) Risks & Mitigations
18) Open Questions (to resolve before code freeze)
19) Appendix
A) Example Mermaid (conceptual)
B) Minimal OpenAPI (sketch)
- `POST /transition` with fields shown in §5
- `GET /state?projection=…` returns `{hash, projection}`
Call to Action / Next Steps
`statechart.yaml`, reducer skeleton, WAL adapter, OpenAPI server skeleton,