anthonyrawlins c99def17d7 Add deployment infrastructure and documentation
This commit adds complete deployment infrastructure for the Sequential Thinking
Age-Encrypted Wrapper, ready for Docker Swarm production deployment.

## Deliverables

### 1. Docker Swarm Service Definition
**File**: `deploy/seqthink/docker-compose.swarm.yml`

**Features**:
- 3 replicas with automatic spreading across worker nodes
- Resource limits: 1 CPU / 512MB RAM per replica
- Resource reservations: 0.5 CPU / 256MB RAM per replica
- Rolling updates with automatic rollback
- Health checks every 30 seconds
- Traefik integration with automatic HTTPS
- Load balancer with health checking
- Docker Secrets integration for age keys
- Comprehensive logging configuration

**Environment Variables**:
- `LOG_LEVEL`: Logging verbosity
- `MCP_LOCAL`: MCP server URL (loopback only)
- `PORT`: HTTP server port (8443)
- `MAX_BODY_MB`: Request size limit
- `AGE_IDENT_PATH`: Age private key path
- `AGE_RECIPS_PATH`: Age public key path
- `KACHING_JWKS_URL`: KACHING JWKS endpoint
- `REQUIRED_SCOPE`: Required JWT scope

**Secrets**:
- `seqthink_age_identity`: Age private key (mounted at /run/secrets/)
- `seqthink_age_recipients`: Age public key (mounted at /run/secrets/)

**Network**:
- `chorus-overlay`: External overlay network for service mesh

**Labels**:
- Traefik routing: `seqthink.chorus.services`
- HTTPS with Let's Encrypt
- Health check path: `/health`
- Load balancer port: 8443

### 2. Deployment Documentation
**File**: `deploy/seqthink/DEPLOYMENT.md` (500+ lines)

**Sections**:
1. **Prerequisites**: Cluster setup, network requirements
2. **Architecture**: Security layers diagram
3. **Step-by-step deployment**:
   - Generate age keys
   - Create Docker secrets
   - Build Docker image
   - Deploy to swarm
   - Verify deployment
   - Test with JWT tokens
4. **Configuration reference**: All environment variables documented
5. **Scaling**: Horizontal scaling commands
6. **Updates**: Rolling update procedures
7. **Rollback**: Automatic and manual rollback
8. **Monitoring**: Prometheus metrics, health checks, logs
9. **Troubleshooting**: Common issues and solutions
10. **Security considerations**: Key rotation, TLS, rate limiting
11. **Development mode**: Testing without security
12. **Production checklist**: Pre-deployment verification

### 3. End-to-End Test Script
**File**: `deploy/seqthink/test-e2e.sh` (executable)

**Tests**:
1. Health endpoint validation
2. Readiness endpoint validation
3. Metrics endpoint verification
4. Unauthorized request rejection (401)
5. Invalid authorization header rejection (401)
6. JWT token validation (if token provided)
7. Encrypted request/response (if age keys provided)
8. Content-Type validation (415 for wrong type)
9. Metrics collection verification
10. SSE endpoint availability

**Usage**:
```bash
# Basic tests (no auth)
./deploy/seqthink/test-e2e.sh

# With JWT token
export JWT_TOKEN="eyJhbGci..."
./deploy/seqthink/test-e2e.sh

# With JWT + encryption
export JWT_TOKEN="eyJhbGci..."
export AGE_RECIPIENT="$(cat seqthink_age.pub)"
export AGE_IDENTITY="seqthink_age.key"
./deploy/seqthink/test-e2e.sh
```

**Output**:
- Color-coded test results (✓ pass, ✗ fail, ⚠ warn)
- Test summary with counts
- Exit code 0 if all pass, 1 if any fail

### 4. Secrets Management Guide
**File**: `deploy/seqthink/SECRETS.md` (400+ lines)

**Topics**:
1. **Secret types**: Age keys, KACHING config
2. **Key generation**:
   - Method 1: Using age-keygen
   - Method 2: Using Go code
3. **Storing secrets**: Docker Swarm secret commands
4. **Using secrets**: Compose file configuration
5. **Key rotation**:
   - Why rotate
   - 5-step rotation process
   - Zero-downtime rotation
6. **Backup and recovery**:
   - Secure backup procedures
   - Age-encrypted backups
   - Recovery process
7. **Security best practices**:
   - Key generation ✓/✗ guidelines
   - Key storage ✓/✗ guidelines
   - Key distribution ✓/✗ guidelines
   - Key lifecycle ✓/✗ guidelines
8. **Troubleshooting**: Common secret issues
9. **Client-side key management**: Distributing public keys
10. **Compliance and auditing**: SOC 2, ISO 27001
11. **Emergency procedures**: Key compromise response

## Deployment Flow

### Initial Deployment
```bash
# 1. Generate keys
age-keygen -o seqthink_age.key
age-keygen -y seqthink_age.key > seqthink_age.pub

# 2. Create secrets
docker secret create seqthink_age_identity seqthink_age.key
docker secret create seqthink_age_recipients seqthink_age.pub

# 3. Build image
docker build -f deploy/seqthink/Dockerfile -t anthonyrawlins/seqthink-wrapper:latest .
docker push anthonyrawlins/seqthink-wrapper:latest

# 4. Deploy
docker stack deploy -c deploy/seqthink/docker-compose.swarm.yml seqthink

# 5. Verify
docker service ps seqthink_seqthink-wrapper
docker service logs seqthink_seqthink-wrapper

# 6. Test
./deploy/seqthink/test-e2e.sh
```

### Update Deployment
```bash
# Build new version
docker build -f deploy/seqthink/Dockerfile -t anthonyrawlins/seqthink-wrapper:0.2.0 .
docker push anthonyrawlins/seqthink-wrapper:0.2.0

# Rolling update
docker service update \
  --image anthonyrawlins/seqthink-wrapper:0.2.0 \
  seqthink_seqthink-wrapper
```

## Service Architecture

```
Internet
    ↓
Traefik (HTTPS + Load Balancing)
    ↓
seqthink.chorus.services
    ↓
Docker Swarm Overlay Network
    ↓
SeqThink Wrapper (3 replicas)
    ├─ JWT Validation (KACHING)
    ├─ Age Decryption
    ├─ MCP Server (loopback)
    ├─ Age Encryption
    └─ Response
```

## Security Layers

1. **Transport Security**: TLS 1.3 via Traefik
2. **Authentication**: JWT signature verification (RS256)
3. **Authorization**: Scope-based access control
4. **Encryption**: Age end-to-end encryption
5. **Network Isolation**: MCP server on loopback only
6. **Secrets Management**: Docker Swarm secrets (tmpfs)
7. **Resource Limits**: Container resource constraints

## Monitoring Integration

**Prometheus Metrics** (`/metrics`):
- `seqthink_requests_total`: Total requests
- `seqthink_errors_total`: Total errors
- `seqthink_decrypt_failures_total`: Decryption failures
- `seqthink_encrypt_failures_total`: Encryption failures
- `seqthink_policy_denials_total`: Authorization denials
- `seqthink_request_duration_seconds`: Request latency

**Health Checks**:
- `/health`: Liveness probe (wrapper running)
- `/ready`: Readiness probe (MCP server ready)

**Logging**:
- JSON format via docker logs
- 10MB max size, 3 file rotation
- Centralized log aggregation ready

## Production Readiness

 **High Availability**: 3 replicas with auto-restart
 **Zero-Downtime Updates**: Rolling updates with health checks
 **Automatic Rollback**: On update failure
 **Resource Management**: CPU/memory limits and reservations
 **Security**: Multi-layer defense (TLS, JWT, Age, Secrets)
 **Monitoring**: Metrics, health checks, structured logs
 **Documentation**: Complete deployment and operations guides
 **Testing**: Automated E2E test suite
 **Secrets Management**: Docker Swarm secrets with rotation procedures

## Next Steps

1. Test deployment on staging environment
2. Generate production age keys
3. Configure KACHING JWT integration
4. Deploy to production cluster
5. Monitor metrics and logs
6. Load testing and performance validation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 08:53:36 +11:00
2025-09-02 19:53:33 +10:00
2025-09-06 14:47:41 +10:00

CHORUS Container-First Context Platform (Alpha)

CHORUS is the runtime that ties the CHORUS ecosystem together: libp2p mesh, DHT-backed storage, council/task coordination, and (eventually) SLURP contextual intelligence. The repository you are looking at is the in-progress container-first refactor. Several core systems boot today, but higher-level services (SLURP, SHHH, full HMMM routing) are still landing.

Current Status

Area Status Notes
libp2p node + PubSub Running internal/runtime/shared.go spins up the mesh, hypercore logging, availability broadcasts.
DHT + DecisionPublisher Running Encrypted storage wired through pkg/dht; decisions written via ucxl.DecisionPublisher.
Leader Election System FULLY FUNCTIONAL 🎉 MILESTONE: Complete admin election with consensus, discovery protocol, heartbeats, and SLURP activation!
SLURP (context intelligence) 🚧 Stubbed pkg/slurp/slurp.go contains TODOs for resolver, temporal graphs, intelligence. Leader integration scaffolding exists but uses placeholder IDs/request forwarding.
SHHH (secrets sentinel) 🚧 Sentinel live pkg/shhh redacts hypercore + PubSub payloads with audit + metrics hooks (policy replay TBD).
HMMM routing 🚧 Partial PubSub topics join, but capability/role announcements and HMMM router wiring are placeholders (internal/runtime/agent_support.go).

See docs/progress/CHORUS-WHOOSH-development-plan.md for the detailed build plan and docs/progress/CHORUS-WHOOSH-roadmap.md for sequencing.

Quick Start (Alpha)

The container-first workflows are still evolving; expect frequent changes.

git clone https://gitea.chorus.services/tony/CHORUS.git
cd CHORUS
cp docker/chorus.env.example docker/chorus.env
# adjust env vars (KACHING license, bootstrap peers, etc.)
docker compose -f docker/docker-compose.yml up --build

Youll get a single agent container with:

  • libp2p networking (mDNS + configured bootstrap peers)
  • election heartbeat
  • DHT storage (AGE-encrypted)
  • HTTP API + health endpoints

Missing today: SLURP context resolution, advanced SHHH policy replay, HMMM per-issue routing. Expect log warnings/TODOs for those paths.

🎉 Leader Election System (NEW!)

CHORUS now features a complete, production-ready leader election system:

Core Features

  • Consensus-based election with weighted scoring (uptime, capabilities, resources)
  • Admin discovery protocol for network-wide leader identification
  • Heartbeat system with automatic failover (15-second intervals)
  • Concurrent election prevention with randomized delays
  • SLURP activation on elected admin nodes

How It Works

  1. Bootstrap: Nodes start in idle state, no admin known
  2. Discovery: Nodes send discovery requests to find existing admin
  3. Election trigger: If no admin found after grace period, trigger election
  4. Candidacy: Eligible nodes announce themselves with capability scores
  5. Consensus: Network selects winner based on highest score
  6. Leadership: Winner starts heartbeats, activates SLURP functionality
  7. Monitoring: Nodes continuously verify admin health via heartbeats

Debugging

Use these log patterns to monitor election health:

# Monitor WHOAMI messages and leader identification
docker service logs CHORUS_chorus | grep "🤖 WHOAMI\|👑\|📡.*Discovered"

# Track election cycles
docker service logs CHORUS_chorus | grep "🗳️\|📢.*candidacy\|🏆.*winner"

# Watch discovery protocol
docker service logs CHORUS_chorus | grep "📩\|📤\|📥"

Roadmap Highlights

  1. Security substrate land SHHH sentinel, finish SLURP leader-only operations, validate COOEE enrolment (see roadmap Phase 1).
  2. Autonomous teams coordinate with WHOOSH for deployment telemetry + SLURP context export.
  3. UCXL + KACHING hook runtime telemetry into KACHING and enforce UCXL validator.

Track progress via the shared roadmap and weekly burndown dashboards.

  • WHOOSH council/team orchestration
  • KACHING telemetry/licensing
  • SLURP contextual intelligence prototypes
  • HMMM meta-discussion layer

Contributing

This repo is still alpha. Please coordinate via the roadmap tickets before landing changes. Major security/runtime decisions should include a Decision Record with a UCXL address so SLURP/BUBBLE can ingest it later.

Description
Container-First P2P Task Coordination System - Next generation distributed AI agent coordination designed for Docker/Kubernetes deployments
Readme 292 MiB
Languages
Go 97.7%
HTML 1.9%
Python 0.2%
Makefile 0.1%