Commit Graph

4 Commits

Author SHA1 Message Date
anthonyrawlins
17673c38a6 fix: P2P connectivity regression + dynamic versioning system
## P2P Connectivity Fixes
- **Root Cause**: mDNS discovery was conditionally disabled in Task Execution Engine implementation
- **Solution**: Restored always-enabled mDNS discovery from working baseline (eb2e05f)
- **Result**: 9/9 Docker Swarm replicas with working P2P mesh, democratic elections, and leader consensus

## Dynamic Version System
- **Problem**: Hardcoded version "0.1.0-dev" in 1000+ builds made debugging impossible
- **Solution**: Implemented build-time version injection via ldflags
- **Features**: Shows commit hash, build date, and semantic version
- **Example**: `CHORUS-agent 0.5.5 (build: 9dbd361, 2025-09-26_05:55:55)`

## Container Compatibility
- **Issue**: Binary execution failed in Alpine due to glibc/musl incompatibility
- **Solution**: Added Ubuntu-based Dockerfile for proper glibc support
- **Benefit**: Reliable container execution across Docker Swarm nodes

## Key Changes
- `internal/runtime/shared.go`: Always enable mDNS discovery, dynamic version vars
- `cmd/agent/main.go`: Build-time version injection and display
- `p2p/node.go`: Restored working "🐝 Bzzz Node Status" logging format
- `Makefile`: Updated version to 0.5.5, proper ldflags configuration
- `Dockerfile.ubuntu`: New glibc-compatible container base
- `docker-compose.yml`: Updated to latest image tag for Watchtower auto-updates

## Verification
 P2P mesh connectivity: Peers exchanging availability broadcasts
 Democratic elections: Candidacy announcements and leader selection
 BACKBEAT integration: Beat synchronization and degraded mode handling
 Dynamic versioning: All containers show v0.5.5 with build metadata
 Task Execution Engine: All Phase 4 functionality preserved and working

Fixes P2P connectivity regression while preserving complete Task Execution Engine implementation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-26 16:05:25 +10:00
anthonyrawlins
e523c4b543 feat: Implement CHORUS scaling improvements for robust autoscaling
Address WHOOSH issue #7 with comprehensive scaling optimizations to prevent
license server, bootstrap peer, and control plane collapse during fast scale-out.

HIGH-RISK FIXES (Must-Do):
 License gate already implemented with cache + circuit breaker + grace window
 mDNS disabled in container environments (CHORUS_MDNS_ENABLED=false)
 Connection rate limiting (5 dials/sec, 16 concurrent DHT queries)
 Connection manager with watermarks (32 low, 128 high)
 AutoNAT enabled for container networking

MEDIUM-RISK FIXES (Next Priority):
 Assignment merge layer with HTTP/file config + SIGHUP reload
 Runtime configuration system with WHOOSH assignment API support
 Election stability windows to prevent churn:
  - CHORUS_ELECTION_MIN_TERM=30s (minimum time between elections)
  - CHORUS_LEADER_MIN_TERM=45s (minimum time before challenging healthy leader)
 Bootstrap pool JSON support with priority sorting and join stagger

NEW FEATURES:
- Runtime config system with assignment overrides from WHOOSH
- SIGHUP reload handler for live configuration updates
- JSON bootstrap configuration with peer metadata (region, roles, priority)
- Configurable election stability windows with environment variables
- Multi-format bootstrap support: Assignment → JSON → CSV

FILES MODIFIED:
- pkg/config/assignment.go (NEW): Runtime assignment merge system
- docker/bootstrap.json (NEW): Example JSON bootstrap configuration
- pkg/election/election.go: Added stability windows and churn prevention
- internal/runtime/shared.go: Integrated assignment loading and conditional mDNS
- p2p/node.go: Added connection management and rate limiting
- pkg/config/hybrid_config.go: Added rate limiting configuration fields
- docker/docker-compose.yml: Updated environment variables and configs
- README.md: Updated status table with scaling milestone

This implementation enables wave-based autoscaling without system collapse,
addressing all scaling concerns from WHOOSH issue #7.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-23 17:50:40 +10:00
anthonyrawlins
1bb736c09a Harden CHORUS security and messaging stack 2025-09-20 23:21:35 +10:00
anthonyrawlins
1806a4fe09 feat(prompts): load system prompts and defaults from Docker volume; set runtime system prompt; add BACKBEAT standards 2025-09-06 15:42:41 +10:00