CHORUS – Container-First Context Platform (Alpha)
CHORUS is the runtime that ties the CHORUS ecosystem together: libp2p mesh, DHT-backed storage, council/task coordination, and (eventually) SLURP contextual intelligence. The repository you are looking at is the in-progress container-first refactor. Several core systems boot today, but higher-level services (SLURP, SHHH, full HMMM routing) are still landing.
Current Status
| Area | Status | Notes |
|---|---|---|
| libp2p node + PubSub | ✅ Running | internal/runtime/shared.go spins up the mesh, hypercore logging, availability broadcasts. |
| DHT + DecisionPublisher | ✅ Running | Encrypted storage wired through pkg/dht; decisions written via ucxl.DecisionPublisher. |
| Leader Election System | ✅ FULLY FUNCTIONAL | 🎉 MILESTONE: Complete admin election with consensus, discovery protocol, heartbeats, and SLURP activation! |
| SLURP (context intelligence) | 🚧 Stubbed | pkg/slurp/slurp.go contains TODOs for resolver, temporal graphs, intelligence. Leader integration scaffolding exists but uses placeholder IDs/request forwarding. |
| SHHH (secrets sentinel) | 🚧 Sentinel live | pkg/shhh redacts hypercore + PubSub payloads with audit + metrics hooks (policy replay TBD). |
| HMMM routing | 🚧 Partial | PubSub topics join, but capability/role announcements and HMMM router wiring are placeholders (internal/runtime/agent_support.go). |
See docs/progress/CHORUS-WHOOSH-development-plan.md for the detailed build plan and docs/progress/CHORUS-WHOOSH-roadmap.md for sequencing.
Quick Start (Alpha)
The container-first workflows are still evolving; expect frequent changes.
```bash
git clone https://gitea.chorus.services/tony/CHORUS.git
cd CHORUS
cp docker/chorus.env.example docker/chorus.env
# adjust env vars (KACHING license, bootstrap peers, etc.)
docker compose -f docker/docker-compose.yml up --build
```
You’ll get a single agent container with:
- libp2p networking (mDNS + configured bootstrap peers)
- election heartbeat
- DHT storage (AGE-encrypted)
- HTTP API + health endpoints
Missing today: SLURP context resolution, advanced SHHH policy replay, HMMM per-issue routing. Expect log warnings/TODOs for those paths.
🎉 Leader Election System (NEW!)
CHORUS now features a complete, production-ready leader election system:
Core Features
- Consensus-based election with weighted scoring (uptime, capabilities, resources)
- Admin discovery protocol for network-wide leader identification
- Heartbeat system with automatic failover (15-second intervals)
- Concurrent election prevention with randomized delays
- SLURP activation on elected admin nodes (SLURP itself is still stubbed; see Current Status)
How It Works
- Bootstrap: Nodes start in the idle state with no known admin
- Discovery: Nodes send discovery requests to find an existing admin
- Election trigger: If no admin is found within the grace period, the node triggers an election
- Candidacy: Eligible nodes announce themselves with their capability scores
- Consensus: The network selects the candidate with the highest score as the winner
- Leadership: The winner starts sending heartbeats and activates SLURP functionality
- Monitoring: Nodes continuously verify the admin's health via its heartbeats
Debugging
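Use these log patterns to monitor election health. The commands assume a Docker Swarm deployment with the stack named CHORUS (service CHORUS_chorus), since docker service logs only works in swarm mode: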
```bash
# Monitor WHOAMI messages and leader identification
# (2>&1 so lines the service writes to stderr are also searched)
docker service logs CHORUS_chorus 2>&1 | grep "🤖 WHOAMI\|👑\|📡.*Discovered"
# Track election cycles
docker service logs CHORUS_chorus 2>&1 | grep "🗳️\|📢.*candidacy\|🏆.*winner"
# Watch discovery protocol
docker service logs CHORUS_chorus 2>&1 | grep "📩\|📤\|📥"
```
Roadmap Highlights
- Security substrate – land SHHH sentinel, finish SLURP leader-only operations, validate COOEE enrolment (see roadmap Phase 1).
- Autonomous teams – coordinate with WHOOSH for deployment telemetry + SLURP context export.
- UCXL + KACHING – hook runtime telemetry into KACHING and enforce the UCXL validator.
Track progress via the shared roadmap and weekly burndown dashboards.
Related Projects
- WHOOSH – council/team orchestration
- KACHING – telemetry/licensing
- SLURP – contextual intelligence prototypes
- HMMM – meta-discussion layer
Contributing
This repo is still alpha. Please coordinate via the roadmap tickets before landing changes. Major security/runtime decisions should include a Decision Record with a UCXL address so SLURP/BUBBLE can ingest it later.