anthonyrawlins
|
17673c38a6
|
fix: P2P connectivity regression + dynamic versioning system
## P2P Connectivity Fixes
- **Root Cause**: mDNS discovery was conditionally disabled in Task Execution Engine implementation
- **Solution**: Restored always-enabled mDNS discovery from working baseline (eb2e05f)
- **Result**: 9/9 Docker Swarm replicas with working P2P mesh, democratic elections, and leader consensus
## Dynamic Version System
- **Problem**: Hardcoded version "0.1.0-dev" in 1000+ builds made debugging impossible
- **Solution**: Implemented build-time version injection via ldflags
- **Features**: Shows commit hash, build date, and semantic version
- **Example**: `CHORUS-agent 0.5.5 (build: 9dbd361, 2025-09-26_05:55:55)`
## Container Compatibility
- **Issue**: Binary execution failed in Alpine due to glibc/musl incompatibility
- **Solution**: Added Ubuntu-based Dockerfile for proper glibc support
- **Benefit**: Reliable container execution across Docker Swarm nodes
## Key Changes
- `internal/runtime/shared.go`: Always enable mDNS discovery, dynamic version vars
- `cmd/agent/main.go`: Build-time version injection and display
- `p2p/node.go`: Restored working "🐝 Bzzz Node Status" logging format
- `Makefile`: Updated version to 0.5.5, proper ldflags configuration
- `Dockerfile.ubuntu`: New glibc-compatible container base
- `docker-compose.yml`: Updated to latest image tag for Watchtower auto-updates
## Verification
✅ P2P mesh connectivity: Peers exchanging availability broadcasts
✅ Democratic elections: Candidacy announcements and leader selection
✅ BACKBEAT integration: Beat synchronization and degraded mode handling
✅ Dynamic versioning: All containers show v0.5.5 with build metadata
✅ Task Execution Engine: All Phase 4 functionality preserved and working
Fixes P2P connectivity regression while preserving complete Task Execution Engine implementation.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-09-26 16:05:25 +10:00 |
|
anthonyrawlins
|
e523c4b543
|
feat: Implement CHORUS scaling improvements for robust autoscaling
Address WHOOSH issue #7 with comprehensive scaling optimizations to prevent
license server, bootstrap peer, and control plane collapse during fast scale-out.
HIGH-RISK FIXES (Must-Do):
✅ License gate already implemented with cache + circuit breaker + grace window
✅ mDNS disabled in container environments (CHORUS_MDNS_ENABLED=false)
✅ Connection rate limiting (5 dials/sec, 16 concurrent DHT queries)
✅ Connection manager with watermarks (32 low, 128 high)
✅ AutoNAT enabled for container networking
MEDIUM-RISK FIXES (Next Priority):
✅ Assignment merge layer with HTTP/file config + SIGHUP reload
✅ Runtime configuration system with WHOOSH assignment API support
✅ Election stability windows to prevent churn:
- CHORUS_ELECTION_MIN_TERM=30s (minimum time between elections)
- CHORUS_LEADER_MIN_TERM=45s (minimum time before challenging healthy leader)
✅ Bootstrap pool JSON support with priority sorting and join stagger
NEW FEATURES:
- Runtime config system with assignment overrides from WHOOSH
- SIGHUP reload handler for live configuration updates
- JSON bootstrap configuration with peer metadata (region, roles, priority)
- Configurable election stability windows with environment variables
- Multi-format bootstrap support: Assignment → JSON → CSV
FILES MODIFIED:
- pkg/config/assignment.go (NEW): Runtime assignment merge system
- docker/bootstrap.json (NEW): Example JSON bootstrap configuration
- pkg/election/election.go: Added stability windows and churn prevention
- internal/runtime/shared.go: Integrated assignment loading and conditional mDNS
- p2p/node.go: Added connection management and rate limiting
- pkg/config/hybrid_config.go: Added rate limiting configuration fields
- docker/docker-compose.yml: Updated environment variables and configs
- README.md: Updated status table with scaling milestone
This implementation enables wave-based autoscaling without system collapse,
addressing all scaling concerns from WHOOSH issue #7.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-09-23 17:50:40 +10:00 |
|