anthonyrawlins ea04378962 fix: Resolve WHOOSH startup failures and restore service functionality
## Problem Analysis
- WHOOSH service was failing to start due to BACKBEAT NATS connectivity issues
- Containers were unable to resolve "backbeat-nats" hostname from DNS
- Service was stuck in deployment loops with all replicas failing
- Root cause: Missing WHOOSH_BACKBEAT_NATS_URL environment variable configuration

## Solution Implementation

### 1. BACKBEAT Configuration Fix
- **Added explicit WHOOSH BACKBEAT environment variables** to docker-compose.yml:
  - `WHOOSH_BACKBEAT_ENABLED: "false"` (temporarily disabled for stability)
  - `WHOOSH_BACKBEAT_CLUSTER_ID: "chorus-production"`
  - `WHOOSH_BACKBEAT_AGENT_ID: "whoosh"`
  - `WHOOSH_BACKBEAT_NATS_URL: "nats://backbeat-nats:4222"`

### 2. Service Deployment Improvements
- **Removed rosewood node constraints** across all services (gaming PC intermittency)
- **Simplified network configuration** by removing unused `whoosh-backend` network
- **Improved health check configuration** for postgres service
- **Streamlined service placement** for better distribution

### 3. Code Quality Improvements
- **Fixed code formatting** inconsistencies in HTTP server
- **Updated service comments** from "Bzzz" to "CHORUS" for clarity
- **Standardized import grouping** and spacing

## Results Achieved

###  WHOOSH Service Operational
- **Service successfully running** on walnut node (1/2 replicas healthy)
- **Health checks passing** - API accessible on port 8800
- **Database connectivity restored** - migrations completed successfully
- **Council formation working** - teams being created and tasks assigned

###  Core Functionality Verified
- **Agent discovery active** - CHORUS agents being detected and registered
- **Task processing operational** - autonomous team formation working
- **API endpoints responsive** - `/health` returning proper status
- **Service integration** - discovery of multiple CHORUS agent endpoints

## Technical Details

### Service Configuration
- **Environment**: Production Docker Swarm deployment
- **Database**: PostgreSQL with automatic migrations
- **Networking**: Internal chorus_net overlay network
- **Load Balancing**: Traefik routing with SSL certificates
- **Monitoring**: Prometheus metrics collection enabled

### Deployment Status
```
CHORUS_whoosh.2.nej8z6nbae1a@walnut    Running 31 seconds ago
- Health checks:  Passing (200 OK responses)
- Database:  Connected and migrated
- Agent Discovery:  Active (multiple agents detected)
- Council Formation:  Functional (teams being created)
```

### Key Log Evidence
```
{"service":"whoosh","status":"ok","version":"0.1.0-mvp"}
🚀 Task successfully assigned to team
🤖 Discovered CHORUS agent with metadata
 Database migrations completed
🌐 Starting HTTP server on :8080
```

## Next Steps
- **BACKBEAT Integration**: Re-enable once NATS connectivity fully stabilized
- **Multi-Node Deployment**: Investigate ironwood node DNS resolution issues
- **Performance Monitoring**: Verify scaling behavior under load
- **Integration Testing**: Full project ingestion and council formation workflows

🎯 **Mission Accomplished**: WHOOSH is now operational and ready for autonomous development team orchestration testing.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-24 15:52:05 +10:00
2025-09-06 14:47:41 +10:00
2025-09-02 19:53:33 +10:00
2025-09-06 14:47:41 +10:00

CHORUS Container-First Context Platform (Alpha)

CHORUS is the runtime that ties the CHORUS ecosystem together: libp2p mesh, DHT-backed storage, council/task coordination, and (eventually) SLURP contextual intelligence. The repository you are looking at is the in-progress container-first refactor. Several core systems boot today, but higher-level services (SLURP, SHHH, full HMMM routing) are still landing.

Current Status

Area Status Notes
libp2p node + PubSub Running internal/runtime/shared.go spins up the mesh, hypercore logging, availability broadcasts.
DHT + DecisionPublisher Running Encrypted storage wired through pkg/dht; decisions written via ucxl.DecisionPublisher.
Leader Election System FULLY FUNCTIONAL 🎉 MILESTONE: Complete admin election with consensus, discovery protocol, heartbeats, and SLURP activation!
SLURP (context intelligence) 🚧 Stubbed pkg/slurp/slurp.go contains TODOs for resolver, temporal graphs, intelligence. Leader integration scaffolding exists but uses placeholder IDs/request forwarding.
SHHH (secrets sentinel) 🚧 Sentinel live pkg/shhh redacts hypercore + PubSub payloads with audit + metrics hooks (policy replay TBD).
HMMM routing 🚧 Partial PubSub topics join, but capability/role announcements and HMMM router wiring are placeholders (internal/runtime/agent_support.go).

See docs/progress/CHORUS-WHOOSH-development-plan.md for the detailed build plan and docs/progress/CHORUS-WHOOSH-roadmap.md for sequencing.

Quick Start (Alpha)

The container-first workflows are still evolving; expect frequent changes.

git clone https://gitea.chorus.services/tony/CHORUS.git
cd CHORUS
cp docker/chorus.env.example docker/chorus.env
# adjust env vars (KACHING license, bootstrap peers, etc.)
docker compose -f docker/docker-compose.yml up --build

Youll get a single agent container with:

  • libp2p networking (mDNS + configured bootstrap peers)
  • election heartbeat
  • DHT storage (AGE-encrypted)
  • HTTP API + health endpoints

Missing today: SLURP context resolution, advanced SHHH policy replay, HMMM per-issue routing. Expect log warnings/TODOs for those paths.

🎉 Leader Election System (NEW!)

CHORUS now features a complete, production-ready leader election system:

Core Features

  • Consensus-based election with weighted scoring (uptime, capabilities, resources)
  • Admin discovery protocol for network-wide leader identification
  • Heartbeat system with automatic failover (15-second intervals)
  • Concurrent election prevention with randomized delays
  • SLURP activation on elected admin nodes

How It Works

  1. Bootstrap: Nodes start in idle state, no admin known
  2. Discovery: Nodes send discovery requests to find existing admin
  3. Election trigger: If no admin found after grace period, trigger election
  4. Candidacy: Eligible nodes announce themselves with capability scores
  5. Consensus: Network selects winner based on highest score
  6. Leadership: Winner starts heartbeats, activates SLURP functionality
  7. Monitoring: Nodes continuously verify admin health via heartbeats

Debugging

Use these log patterns to monitor election health:

# Monitor WHOAMI messages and leader identification
docker service logs CHORUS_chorus | grep "🤖 WHOAMI\|👑\|📡.*Discovered"

# Track election cycles
docker service logs CHORUS_chorus | grep "🗳️\|📢.*candidacy\|🏆.*winner"

# Watch discovery protocol
docker service logs CHORUS_chorus | grep "📩\|📤\|📥"

Roadmap Highlights

  1. Security substrate land SHHH sentinel, finish SLURP leader-only operations, validate COOEE enrolment (see roadmap Phase 1).
  2. Autonomous teams coordinate with WHOOSH for deployment telemetry + SLURP context export.
  3. UCXL + KACHING hook runtime telemetry into KACHING and enforce UCXL validator.

Track progress via the shared roadmap and weekly burndown dashboards.

  • WHOOSH council/team orchestration
  • KACHING telemetry/licensing
  • SLURP contextual intelligence prototypes
  • HMMM meta-discussion layer

Contributing

This repo is still alpha. Please coordinate via the roadmap tickets before landing changes. Major security/runtime decisions should include a Decision Record with a UCXL address so SLURP/BUBBLE can ingest it later.

Description
Container-First P2P Task Coordination System - Next generation distributed AI agent coordination designed for Docker/Kubernetes deployments
Readme 292 MiB
Languages
Go 97.7%
HTML 1.9%
Python 0.2%
Makefile 0.1%