Design Brief #1

opened 2025-10-19 00:53:22 +00:00 by tony

Here’s a crisp, implementation-ready design brief for the new project.

SWOOSH — Deterministic Orchestrator (State-Machine Core)

Owner: Platform/Orchestration
Replaces: WHOOSH orchestration module (drop-in)
Integrations: BACKBEAT (or tidal pressure), HMMM (signed comms), UCXL (addressing), BUBBLE (Decision Records), KACHING (licensing), SHHH (policy gate), SLURP (curation/index), MCP registry (tool layer)


1) Purpose & Outcomes

Problem

Chaos-monkey testing exposed brittle behavior in WHOOSH’s event-driven core: replay issues, split-brain council membership, racy ingestion/index swaps, and non-deterministic recovery after node churn.

Goal

Ship SWOOSH, a deterministic hierarchical state machine (statecharts) that governs:

  • Project ingestion
  • Council formation and readiness
  • Environment allocation/provisioning
  • Execution loop (PLAN → WORK → REVIEW → REVERB)

…with idempotent transitions, WAL + checkpoints, and HLC-ordered commits. This must be a drop-in replacement at the module boundary so the rest of CHORUS remains untouched.

Success Criteria (verifiable)

  • Deterministic replay from WAL + latest snapshot produces a bit-identical state hash across replicas.
  • No invariant violations across a large randomized chaos suite (duplicate/reordered messages, node churn, license throttling).
  • Time-boxed transitions honor BACKBEAT/tidal windows; approval and submission happen only at permitted phases (REVERB).
  • Every accepted transition emits a BUBBLE Decision Record with UCXL backlink and guard justifications.

2) Scope / Non-Goals

In scope

  • New state-machine core (reducers, guards, compensation).
  • Transition API (signed proposals, idempotency).
  • WAL, snapshots, rehydration, monotonic HLC ordering.
  • Backpressure/pressure windows via BACKBEAT/tidal semantics.
  • Council quorum certificates via HMMM-signed votes.
  • Policy/quarantine flows through SHHH.
  • Observability: metrics, logs, traces for transitions and guard outcomes.
  • Migration shim to replace WHOOSH in the stack.

Out of scope

  • Redesign of HMMM transport, SLURP curation logic, or MCP/plugin ecosystems.
  • Rewriting UI or CLI beyond what’s required to speak the new API.

3) Users & Interfaces

  • Callers: WHOOSH clients (unchanged semantic surface), CI operators, council agents, ingestion workers.
  • External Systems: KACHING (license), HMMM (signed proposals & votes), SHHH (policy engine), SLURP/UCXL/BUBBLE, MCP registry.

Primary Interface (unchanged endpoint paths where possible)

  • POST /transition — propose a guarded state transition (signed, idempotent).
  • GET /state — read current state (projectable).
  • GET /health — readiness/liveness plus consensus status (if clustered).

4) Architecture Overview

Core Components

  • Reducer: Pure function newState = Reduce(oldState, Transition).
  • Guard Providers: BACKBEAT window gate, policy checks, quorum verification, MCP health probes, license gating.
  • WAL: Append-only log of accepted transitions.
  • Snapshots: Periodic state snapshots; restart = snapshot + WAL.replay().
  • Clocking: Hybrid Logical Clock (HLC) per transition for total ordering across replicas.
  • Consensus (small): Critical decisions (e.g., elected council set) are committed via a small Raft/Paxos group (3–5) or encoded as quorum certificates validated by guards and then accepted by a single leader.
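
A minimal Go sketch of the contracts these components imply; all type and field names are illustrative rather than the final API:

```go
// Sketch of the core reducer contract (illustrative names, aligned with §5/§7).
package swoosh

// Transition is a signed, idempotent proposal to advance the state machine.
type Transition struct {
	Name             string   // e.g. "Council_QuorumCert"
	CurrentStateHash string   // optimistic-concurrency check against HEAD
	InputsHash       string   // content hash of external inputs
	Signer           string   // HMMM identity of the proposer
	IdemKey          string   // goalId:beatIndex:opClass:uuid
	HLC              string   // hybrid logical clock stamp
	WindowID         string   // BACKBEAT/tidal window identifier
	Evidence         []string // UCXL / BUBBLE references (stored, never dereferenced here)
}

// GuardResult is the pure outcome of one guard evaluation; Reasons feed the Decision Record.
type GuardResult struct {
	OK      bool
	Reasons []string
}

// State is the canonical, hashable state document of §7 (fields elided in this sketch).
type State struct {
	// meta, boot, ingestion, council, environment, execution,
	// control, policy, epoch, hlc_last …
}

// Reducer is a pure function: no I/O, no wall clocks, no randomness.
// The same (oldState, t) must always yield the same newState.
type Reducer func(oldState State, t Transition) (newState State, err error)
```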

High-Level Statechart (statechart/hfsm)

Top-level: BOOT → PROJECT_LIFECYCLE (parallel) → EXECUTION (loop), plus control/meta states.

  • BOOT: UNINITIALIZED → LICENSE_CHECK
  • PROJECT_LIFECYCLE (parallel regions):
    Ingestion: DISCOVER → FETCH → VALIDATE → INDEX → READY
    Council: PLAN_ROLES → ELECT → TOOLING_SYNC → READY
    Environment: ALLOCATE → PROVISION → HEALTHCHECK → READY|DEGRADED
  • EXECUTION: PLAN → WORK → REVIEW → REVERB → (next window) → PLAN
  • CONTROL: PAUSED, RECOVERY, DEGRADED
  • Policy: QUARANTINED (sticky; human or policy update required)
  • Terminal: TERMINATED

A transition to EXECUTION requires all three lifecycle regions READY with matching content/version hashes.


5) Formal State & Transition Contracts

Transition Proposal (JSON)

{
  "current_state_hash": "sha256-…",
  "transition": "Council_QuorumCert", 
  "inputs_hash": "sha256-…",
  "signer": "hmmm:pubkey:…",
  "idem_key": "goalId:beatIndex:opClass:uuid",
  "hlc": "2025-10-19T01:23:45.678+00:00#1234",
  "window_id": "backbeat:W-2025-10-19-02",
  "evidence": [
    "ucxl://chorus:spec@council/…",
    "bubble://dr/…"
  ]
}

Rules

  • idem_key ensures exactly-once application per op class.
  • current_state_hash must match HEAD (optimistic concurrency). If stale, caller re-reads and resubmits with updated hash.
  • hlc must be ≥ last applied HLC; ties break by (signer, idem_key).
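
The ordering rule above could look like the following Go sketch, assuming an HLC parsed into wall time plus a logical counter and a lexicographic tie-break on (signer, idem_key); both representations are assumptions, not the final spec:

```go
package swoosh

// HLC is a parsed hybrid logical clock stamp: physical wall time plus a
// logical counter that disambiguates events within the same millisecond.
type HLC struct {
	WallMillis int64
	Logical    uint32
}

// Compare returns -1, 0, or +1 as a is before, equal to, or after b.
func Compare(a, b HLC) int {
	switch {
	case a.WallMillis < b.WallMillis:
		return -1
	case a.WallMillis > b.WallMillis:
		return 1
	case a.Logical < b.Logical:
		return -1
	case a.Logical > b.Logical:
		return 1
	default:
		return 0
	}
}

// AdmitsAfter reports whether a proposal at hlc may be applied on top of the
// last applied record, breaking exact HLC ties deterministically by the
// lexicographic order of (signer, idem_key).
func AdmitsAfter(lastHLC, hlc HLC, lastSigner, lastIdem, signer, idemKey string) bool {
	switch Compare(hlc, lastHLC) {
	case 1:
		return true
	case 0:
		if signer != lastSigner {
			return signer > lastSigner
		}
		return idemKey > lastIdem
	default:
		return false // behind HEAD: reject; caller re-reads and resubmits
	}
}
```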

Minimal API

  • POST /transition → 202 Accepted or 400 Rejected (reason + failing guard).
  • GET /state?projection=Council,Ingestion → JSON with hash, projection object.
  • GET /health → {status, consensus, lastSnapshotHLC, walLag}.
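
For illustration, a caller might propose a transition roughly like the sketch below; the host/port and error handling are placeholders, and the payload mirrors §5 with its elided values kept elided:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Proposal fields mirror §5.
	proposal := map[string]any{
		"current_state_hash": "sha256-…",
		"transition":         "Council_QuorumCert",
		"inputs_hash":        "sha256-…",
		"signer":             "hmmm:pubkey:…",
		"idem_key":           "goalId:beatIndex:opClass:uuid",
		"hlc":                "2025-10-19T01:23:45.678+00:00#1234",
		"window_id":          "backbeat:W-2025-10-19-02",
	}
	body, err := json.Marshal(proposal)
	if err != nil {
		panic(err)
	}

	// Expect 202 Accepted, or 400 with the failing guard and its reasons.
	resp, err := http.Post("http://localhost:8080/transition", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```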

6) Guard Semantics (non-blocking purity)

  • License (KACHING): Hard gate in LICENSE_CHECK; cache token with jittered refresh; configurable grace substate before hard stop.
  • BACKBEAT/Tidal Windows: Each time-gated transition requires window.isOpen(transitionClass) == true. Pressure amplitude can scale retry backoff.
  • Quorum Certificate (Council): Transition ELECT → TOOLING_SYNC requires a signed vote set whose member hash equals the proposed council set; signature set verified via HMMM.
  • MCP/Tooling Health: TOOLING_SYNC → READY requires active MCP reachability and version pin checks.
  • Policy (SHHH): Any policy breach triggers → QUARANTINED, carrying rationale/justification; only explicit human release or policy update unblocks.

Guards are pure functions over (state, inputs, invariants) and return {ok, reasons[]}; reasons become part of the DR.
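
One possible shape for guard providers, keeping evaluation pure over cached, signed facts that are refreshed out-of-band (interface and field names are assumptions):

```go
package swoosh

import "context"

// GuardInput bundles everything a guard may consult. Facts are cached, signed
// snapshots (license token, window schedule, vote set, MCP probe results)
// materialized out-of-band, so evaluation itself stays pure and fast.
type GuardInput struct {
	State      State
	Transition Transition
	Facts      map[string][]byte // e.g. "kaching.token", "backbeat.window", "hmmm.votes"
}

// Guard evaluates one gate and performs no I/O.
type Guard interface {
	Name() string
	Evaluate(in GuardInput) GuardResult
}

// FactRefresher runs the long-running probes (MCP health, license refresh)
// outside the reducer path and publishes results into the fact cache.
type FactRefresher interface {
	Refresh(ctx context.Context) (key string, value []byte, err error)
}
```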


7) Data Model (at rest)

  • State Document (canonical, hashed):

    • meta: machine version, schema hash
    • boot: license status, identities
    • ingestion: phase, source set, content hash/version, validation outcomes
    • council: planned roles, elected members, quorum cert, tooling status
    • environment: allocations, provisioned assets, health
    • execution: plan lock, active beat/window, evidence pointers, approvals
    • control: paused/degraded/recovery flags
    • policy: quarantine status, rationale
    • epoch: monotone counter per region
    • hlc_last: last applied HLC
  • WAL Record (append-only):

    • (state_pre_hash, transition_name, inputs_hash, signer, idem_key, hlc, window_id, guard_outcomes, evidence[], timestamp, state_post_hash)
  • Snapshots: Full state plus lastAppliedIndex and lastAppliedHLC.

All artifacts are content-addressed; UCXL addresses are stored, never dereferenced synchronously inside the reducer.
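
A sketch of the WAL record and the canonical state hash, assuming canonical JSON serialization fed into SHA-256; the canonicalization scheme is an assumption, with encoding/json standing in for a proper canonical encoder:

```go
package swoosh

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// WALRecord is one accepted transition, appended before the new state is
// exposed; replaying records over a snapshot must reproduce StatePostHash.
type WALRecord struct {
	StatePreHash  string        `json:"state_pre_hash"`
	Transition    string        `json:"transition_name"`
	InputsHash    string        `json:"inputs_hash"`
	Signer        string        `json:"signer"`
	IdemKey       string        `json:"idem_key"`
	HLC           string        `json:"hlc"`
	WindowID      string        `json:"window_id"`
	GuardOutcomes []GuardResult `json:"guard_outcomes"`
	Evidence      []string      `json:"evidence"`
	Timestamp     string        `json:"timestamp"`
	StatePostHash string        `json:"state_post_hash"`
}

// HashState hashes the canonical serialization of the state document.
// encoding/json emits struct fields in declaration order and sorts map keys,
// which stands in here for a real canonical encoder.
func HashState(s State) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return fmt.Sprintf("sha256-%x", sum), nil
}
```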


8) Persistence & Recovery

  • Store: Badger/RocksDB (local) with periodic snapshot to ZFS.
  • Recovery: Load latest snapshot, replay WAL from lastAppliedIndex.
  • Cross-node: Either use a tiny replication group for WAL (Raft) or single-writer with followers tailing WAL via HMMM+AGE channel. Determinism is preserved by HLC+idem rules.
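
A minimal recovery loop matching the description above; the storage interfaces stand in for the Badger/RocksDB-backed adapters:

```go
package swoosh

// SnapshotStore and WALStore are the adapter interfaces this sketch assumes.
type SnapshotStore interface {
	Latest() (state State, lastAppliedIndex uint64, err error)
}

type WALStore interface {
	// ReadFrom returns accepted records with index > afterIndex, in order.
	ReadFrom(afterIndex uint64) ([]WALRecord, error)
}

// Recover rebuilds in-memory state as snapshot + WAL replay. applyRecord
// re-runs the pure reducer for one record and must be deterministic.
func Recover(snaps SnapshotStore, wal WALStore,
	applyRecord func(State, WALRecord) (State, error)) (State, uint64, error) {

	state, idx, err := snaps.Latest()
	if err != nil {
		return State{}, 0, err
	}
	records, err := wal.ReadFrom(idx)
	if err != nil {
		return State{}, 0, err
	}
	for _, rec := range records {
		if state, err = applyRecord(state, rec); err != nil {
			return State{}, 0, err
		}
		idx++
	}
	return state, idx, nil
}
```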

9) Security

  • Proposals: Must be HMMM-signed by an authorized identity.
  • Secrets: AGE-encrypted configs; never log plaintext.
  • License: All network activity is blocked until LICENSE_CHECK passes; grace window configurable.
  • Audit: Each accepted transition emits a BUBBLE DR containing: guard outcomes, signer, hashes, UCXL backlinks, and justification.

10) Observability

  • Metrics (Prometheus):

    • swoosh_transition_total{transition}
    • swoosh_guard_fail_total{guard}
    • swoosh_state_advance_latency_seconds{region}
    • swoosh_retry_count{transition}
    • swoosh_quarantine_total
    • swoosh_recovery_time_seconds
    • swoosh_wal_lag_records
  • Tracing: One span per accepted transition; include guard reasons & evidence refs.

  • Logs: Structured JSON lines. Always include state_pre_hash, state_post_hash, transition, hlc, window_id.
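
A sketch of a subset of these metrics using the Prometheus Go client (github.com/prometheus/client_golang); metric names match the list above, and the remaining ones follow the same pattern:

```go
package swoosh

import "github.com/prometheus/client_golang/prometheus"

var (
	transitionTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "swoosh_transition_total",
			Help: "Accepted transitions by transition name."},
		[]string{"transition"},
	)
	guardFailTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "swoosh_guard_fail_total",
			Help: "Guard rejections by guard."},
		[]string{"guard"},
	)
	stateAdvanceLatency = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{Name: "swoosh_state_advance_latency_seconds",
			Help: "Latency from proposal to applied state, per region."},
		[]string{"region"},
	)
	walLagRecords = prometheus.NewGauge(
		prometheus.GaugeOpts{Name: "swoosh_wal_lag_records",
			Help: "Records appended to the WAL but not yet snapshotted."},
	)
)

func init() {
	prometheus.MustRegister(transitionTotal, guardFailTotal, stateAdvanceLatency, walLagRecords)
}
```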


11) Failure Modes & Compensations

  • Split-brain council: Prevent by requiring quorum cert; reject if member set hash mismatch.
  • Index swap mid-flight: Pin ingestion READY by content hash; any new index becomes a new version requiring explicit plan change.
  • Duplicate deliveries: Idem keys short-circuit repeats; reducer is idempotent.
  • License throttling: Enter DEGRADED grace substate; stop external comms at grace expiry.
  • Provisioning partials: Compensation transitions DEPROVISION, EJECT/RE-ELECT, ROLLBACK_INDEX are first-class.

12) Test Strategy (must-pass)

  • Property / Model-based testing: Feed the same transition stream to (a) in-proc reducer and (b) full process with WAL. Compare state hashes every N steps.

  • Chaos suite (automated):

    • Duplicate, reorder, and drop transitions (1–10%).
    • Random orchestrator restarts and forced snapshot rollback.
    • License server latency/denial.
    • MCP registry flaps, env capacity churn.
  • Invariants (assert after each transition):

    • HLC monotonicity (no time travel).
    • Exactly-once epoch advance per region.
    • EXECUTION entry requires all READY prerequisites with matching content/version hashes.
    • Quorum certificate matches elected set hash.
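
A simplified, in-process sketch of the determinism check: feed the same accepted-transition stream to two independent replicas and compare state hashes every N steps (the full must-pass test also runs variant (b) through the real WAL path):

```go
package swoosh

import "testing"

// replayDeterminism feeds the same accepted-transition stream to two
// independent replicas and asserts identical state hashes every checkEvery steps.
func replayDeterminism(t *testing.T, stream []WALRecord, checkEvery int,
	apply func(State, WALRecord) (State, error)) {

	a, b := State{}, State{}
	var err error
	for i, rec := range stream {
		if a, err = apply(a, rec); err != nil {
			t.Fatalf("replica A failed at step %d: %v", i, err)
		}
		if b, err = apply(b, rec); err != nil {
			t.Fatalf("replica B failed at step %d: %v", i, err)
		}
		if (i+1)%checkEvery == 0 {
			ha, _ := HashState(a)
			hb, _ := HashState(b)
			if ha != hb {
				t.Fatalf("state hash diverged at step %d: %s != %s", i+1, ha, hb)
			}
		}
	}
}
```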

13) Migration & Rollout

  1. Shadow mode: Run SWOOSH alongside WHOOSH; mirror event inputs; produce shadow state and hash; compare to legacy expectations.
  2. Idempotency headers: Add idem_key, hlc, window_id to all upstream producers.
  3. Feature flag: Route only council formation through SWOOSH first; then ingestion; then execution loop.
  4. Cut-over: Disable WHOOSH mutation path; SWOOSH becomes single source of truth. Keep bypass flag for emergency rollback.
  5. Post-cut soak: Extended chaos-monkey run with SLA/SLO monitors enabled.

14) Deliverables

  • Statechart spec (machine-readable YAML) + explainer doc.

  • Reducer library (Go) with:

    • Transition registry, guard adaptor interfaces, compensation hooks.
    • WAL + snapshot adapters (Badger/RocksDB).
    • HLC utility.
  • Transition API (OpenAPI + server endpoints) and client SDK.

  • Guard providers: BACKBEAT/tidal, SHHH policy, HMMM quorum, MCP health, KACHING license.

  • Observability pack: metrics, logs, traces; Grafana dashboards.

  • Chaos harness: scripts + container recipes to reproduce failure modes.

  • Runbooks: recovery, snapshot restore, consensus health triage.

  • BUBBLE emitters: DR templates and UCXL backlink policy.


15) Acceptance Criteria

  • Deterministic replay = identical state_post_hash across 3+ replicas after a randomized 10k-transition run.
  • Zero invariant violations under full chaos suite runs.
  • REVIEW and REVERB transitions only occur in allowed BACKBEAT/tidal windows (verified by guard logs).
  • All accepted transitions produce BUBBLE DRs with complete guard rationales and UCXL links.
  • Seamless drop-in: existing callers function without code changes beyond the new idempotency headers.

16) Implementation Notes (for the team)

  • Language: Go preferred (fits the stack), but keep reducer pure and portable.

  • Packages:

    • Storage: Badger/RocksDB binding
    • Consensus: built-in Raft (if used) scoped to orchestrator metadata only
    • OpenAPI → chi/echo/gin (team choice)
    • HLC: local utility (simple logical clock with walltime)
  • Codegen (optional): Generate reducer scaffolding from the YAML statechart to ensure switch exhaustiveness and consistent metrics hooks.

  • Performance: Batch WAL fsyncs with careful durability trade-offs (configurable); snapshot on REVERB or every N transitions.
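
One way to surface those durability trade-offs as configuration (field names are placeholders, not the final config schema):

```go
package swoosh

import "time"

// DurabilityConfig captures the WAL/snapshot knobs mentioned above.
type DurabilityConfig struct {
	// FsyncEveryN batches fsyncs: flush after N appended records (1 = strict durability).
	FsyncEveryN int
	// FsyncInterval forces a flush even when the batch is not full.
	FsyncInterval time.Duration
	// SnapshotEveryN takes a snapshot after N applied transitions.
	SnapshotEveryN int
	// SnapshotOnReverb also snapshots at each REVERB phase boundary.
	SnapshotOnReverb bool
}
```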


17) Risks & Mitigations

  • Consensus complexity creep: Limit Raft scope to critical decisions or rely on quorum certificates validated by guards; keep a single writer during bootstrap.
  • Guard latency (policy, MCP checks): All guards are pure and should consult cached, signed facts; long-running probes run out-of-band and materialize as inputs/evidence hashes.
  • Operator burden: Provide clear runbooks and dashboards; surface reasons on every rejection.

18) Open Questions (to resolve before code freeze)

  1. Do we require Raft for WAL replication at v1, or accept single-writer + follower tailing?
  2. What is the minimum quorum size and election algorithm for council in heterogeneous clusters?
  3. Exact tidal window schedule schema (amplitude/phase controls) to codify in BACKBEAT provider.
  4. DR redaction policy: which fields are included/excluded in BUBBLE for sensitive ops?

19) Appendix

A) Example Mermaid (conceptual)

stateDiagram-v2
  [*] --> UNINITIALIZED
  UNINITIALIZED --> LICENSE_CHECK : config_ok
  LICENSE_CHECK --> PROJECT_LIFECYCLE : licensed
  state PROJECT_LIFECYCLE {
    state Ingestion {
      [*] --> DISCOVER
      DISCOVER --> FETCH : sources_resolved
      FETCH --> VALIDATE : bytes_ok
      VALIDATE --> INDEX : schema_ok
      VALIDATE --> QUARANTINED : policy_violation
      INDEX --> READY : corpus_built
    }
    --
    state Council {
      [*] --> PLAN_ROLES
      PLAN_ROLES --> ELECT : profiles_loaded
      ELECT --> TOOLING_SYNC : quorum_cert
      TOOLING_SYNC --> READY : mcp_green
    }
    --
    state Environment {
      [*] --> ALLOCATE
      ALLOCATE --> PROVISION : capacity_ok
      PROVISION --> HEALTHCHECK : installed
      HEALTHCHECK --> READY : green
      HEALTHCHECK --> DEGRADED : amber
    }
  }
  PROJECT_LIFECYCLE --> EXECUTION : all_ready
  state EXECUTION {
    [*] --> PLAN
    PLAN --> WORK : plan_locked
    WORK --> REVIEW : beat_review_gate
    REVIEW --> REVERB : approvals_threshold
    REVERB --> PLAN : next_window
  }
  EXECUTION --> RECOVERY : quorum_lost
  QUARANTINED --> TERMINATED : confirmed_block

B) Minimal OpenAPI (sketch)

  • POST /transition with fields shown in §5
  • GET /state?projection=… returns {hash, projection}

Call to Action / Next Steps

  • statechart.yaml, reducer skeleton, WAL adapter, OpenAPI server skeleton,
  • guard provider interfaces (BACKBEAT, SHHH, HMMM, MCP, KACHING),
  • chaos harness scaffold and CI.
  • Then we iterate on the Open Questions and lock the v1 guard contracts.