# CHORUS – Container-First Context Platform (Alpha)
CHORUS is the runtime that ties the CHORUS ecosystem together: libp2p mesh, DHT-backed storage, council/task coordination, and (eventually) SLURP contextual intelligence. The repository you are looking at is the in-progress container-first refactor. Several core systems boot today, but higher-level services (SLURP, SHHH, full HMMM routing) are still landing.
## Current Status
| Area | Status | Notes |
| --- | --- | --- |
| libp2p node + PubSub | ✅ Running | `internal/runtime/shared.go` spins up the mesh, hypercore logging, availability broadcasts. |
| DHT + DecisionPublisher | ✅ Running | Encrypted storage wired through `pkg/dht`; decisions written via `ucxl.DecisionPublisher`. |
| Leader election | ✅ Fully functional | 🎉 Complete admin election with consensus, discovery protocol, heartbeats, and SLURP activation. |
| SLURP (context intelligence) | 🚧 Stubbed | `pkg/slurp/slurp.go` contains TODOs for resolver, temporal graphs, intelligence. Leader integration scaffolding exists but uses placeholder IDs/request forwarding. |
| SHHH (secrets sentinel) | 🚧 Sentinel live | `pkg/shhh` redacts hypercore + PubSub payloads with audit + metrics hooks (policy replay TBD). |
| HMMM routing | 🚧 Partial | PubSub topics join, but capability/role announcements and HMMM router wiring are placeholders (`internal/runtime/agent_support.go`). |
See `docs/progress/CHORUS-WHOOSH-development-plan.md` for the detailed build plan and `docs/progress/CHORUS-WHOOSH-roadmap.md` for sequencing.
## Quick Start (Alpha)
The container-first workflows are still evolving; expect frequent changes.
```bash
git clone https://gitea.chorus.services/tony/CHORUS.git
cd CHORUS
cp docker/chorus.env.example docker/chorus.env
# adjust env vars (KACHING license, bootstrap peers, etc.)
docker compose -f docker/docker-compose.yml up --build
```
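If you need a starting point for `docker/chorus.env`, a minimal fragment might look like the following. The mDNS toggle and election stability windows are knobs this repo exposes; `CHORUS_LICENSE_ID` and `CHORUS_BOOTSTRAP_PEERS` are illustrative placeholder names — check `docker/chorus.env.example` for the authoritative keys.

```bash
# docker/chorus.env — illustrative fragment; see chorus.env.example for the real keys
CHORUS_LICENSE_ID=<your-kaching-license>   # hypothetical name for the KACHING license key
CHORUS_BOOTSTRAP_PEERS=<multiaddr,...>     # hypothetical name for the bootstrap peer list
CHORUS_MDNS_ENABLED=false                  # disable mDNS inside containers
CHORUS_ELECTION_MIN_TERM=30s               # minimum time between elections
CHORUS_LEADER_MIN_TERM=45s                 # minimum term before challenging a healthy leader
```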
You’ll get a single agent container with:

- libp2p networking (mDNS + configured bootstrap peers)
- election heartbeat
- DHT storage (AGE-encrypted)
- HTTP API + health endpoints
**Missing today:** SLURP context resolution, advanced SHHH policy replay, HMMM per-issue routing. Expect log warnings/TODOs for those paths.
## 🎉 Leader Election System (NEW!)

CHORUS now ships a complete leader election system:
### Core Features

- **Consensus-based election** with weighted scoring (uptime, capabilities, resources)
- **Admin discovery protocol** for network-wide leader identification
- **Heartbeat system** with automatic failover (15-second intervals)
- **Concurrent election prevention** with randomized delays
- **SLURP activation** on elected admin nodes
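As a rough sketch of how weighted candidate scoring can be combined and compared — the field names, weights, and signature below are illustrative assumptions, not the actual code in `pkg/election/election.go`:

```go
package main

import "fmt"

// Candidate holds the signals a node might announce during candidacy.
// Field names and weights are illustrative, not CHORUS's real schema.
type Candidate struct {
	ID            string
	UptimeHours   float64 // how long the node has been up
	Capabilities  int     // number of advertised capabilities
	FreeResources float64 // 0.0–1.0 fraction of free CPU/RAM
}

// Score combines the signals into a single comparable value.
func (c Candidate) Score() float64 {
	return 0.5*c.UptimeHours + 2.0*float64(c.Capabilities) + 10.0*c.FreeResources
}

func main() {
	candidates := []Candidate{
		{ID: "node-a", UptimeHours: 12, Capabilities: 3, FreeResources: 0.5},
		{ID: "node-b", UptimeHours: 2, Capabilities: 5, FreeResources: 0.9},
	}
	// The network-wide consensus step reduces to picking the highest score.
	winner := candidates[0]
	for _, c := range candidates[1:] {
		if c.Score() > winner.Score() {
			winner = c
		}
	}
	fmt.Println("winner:", winner.ID) // prints "winner: node-b"
}
```

Randomized candidacy delays (listed above) keep nodes from all announcing at once; the scoring itself stays deterministic so every node agrees on the winner.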
### How It Works

1. **Bootstrap**: Nodes start in the idle state with no admin known
2. **Discovery**: Nodes send discovery requests to find an existing admin
3. **Election trigger**: If no admin is found after a grace period, an election is triggered
4. **Candidacy**: Eligible nodes announce themselves with capability scores
5. **Consensus**: The network selects the winner based on the highest score
6. **Leadership**: The winner starts heartbeats and activates SLURP functionality
7. **Monitoring**: Nodes continuously verify admin health via heartbeats
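The lifecycle above can be sketched as a small state machine. The state and event names here are illustrative, not the identifiers used in `pkg/election`:

```go
package main

import "fmt"

// State models the phases a node moves through; names are illustrative.
type State int

const (
	Idle        State = iota // booted, no admin known
	Discovering              // probing the network for an existing admin
	Electing                 // no admin found, announcing candidacy
	Follower                 // admin known, verifying its heartbeats
	Leader                   // won the election, emitting heartbeats
)

// Event is a simplified input driving transitions.
type Event int

const (
	AdminFound Event = iota
	GraceExpired
	WonElection
	LostElection
	HeartbeatTimeout
)

// Next applies one transition; unmatched pairs keep the current state.
func Next(s State, e Event) State {
	switch {
	case s == Idle:
		return Discovering // bootstrap always proceeds to discovery
	case s == Discovering && e == AdminFound:
		return Follower
	case s == Discovering && e == GraceExpired:
		return Electing
	case s == Electing && e == WonElection:
		return Leader
	case s == Electing && e == LostElection:
		return Follower
	case s == Follower && e == HeartbeatTimeout:
		return Electing // stale admin triggers a fresh election
	}
	return s
}

func main() {
	s := Idle
	for _, e := range []Event{GraceExpired, GraceExpired, WonElection} {
		s = Next(s, e)
	}
	fmt.Println(s == Leader) // prints "true"
}
```

The stability windows (`CHORUS_ELECTION_MIN_TERM`, `CHORUS_LEADER_MIN_TERM`) would gate the `Follower → Electing` transition in practice, preventing churn against a healthy leader.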
### Debugging

Use these log patterns to monitor election health:

```bash
# Monitor WHOAMI messages and leader identification
docker service logs CHORUS_chorus | grep "🤖 WHOAMI\|👑\|📡.*Discovered"

# Track election cycles
docker service logs CHORUS_chorus | grep "🗳️\|📢.*candidacy\|🏆.*winner"

# Watch discovery protocol
docker service logs CHORUS_chorus | grep "📩\|📤\|📥"
```
## Roadmap Highlights

1. **Security substrate** – land SHHH sentinel, finish SLURP leader-only operations, validate COOEE enrolment (see roadmap Phase 1).
2. **Autonomous teams** – coordinate with WHOOSH for deployment telemetry + SLURP context export.
3. **UCXL + KACHING** – hook runtime telemetry into KACHING and enforce the UCXL validator.
Track progress via the shared roadmap and weekly burndown dashboards.
## Related Projects

- [WHOOSH](https://gitea.chorus.services/tony/WHOOSH) – council/team orchestration
- [KACHING](https://gitea.chorus.services/tony/KACHING) – telemetry/licensing
- [SLURP](https://gitea.chorus.services/tony/SLURP) – contextual intelligence prototypes
- [HMMM](https://gitea.chorus.services/tony/hmmm) – meta-discussion layer
## Contributing
This repo is still alpha. Please coordinate via the roadmap tickets before landing changes. Major security/runtime decisions should include a Decision Record with a UCXL address so SLURP/BUBBLE can ingest it later.