# CHORUS Architecture Overview

**System:** CHORUS - Container-First P2P Task Coordination
**Version:** 0.5.0-dev
**Architecture Type:** Distributed, Peer-to-Peer, Event-Driven

---

## Table of Contents

1. [System Overview](#system-overview)
2. [Core Principles](#core-principles)
3. [Architecture Layers](#architecture-layers)
4. [Key Components](#key-components)
5. [Data Flow](#data-flow)
6. [Deployment Models](#deployment-models)
7. [Related Documents](#related-documents)

---

## System Overview

CHORUS is a **distributed task coordination system** that enables both autonomous AI agents and human operators to collaborate on software development tasks through a peer-to-peer network. The system provides:

### Primary Capabilities

- **Autonomous Agent Execution**: AI agents that can execute code tasks in isolated Docker sandboxes
- **Human-Agent Collaboration**: Human Agent Portal (HAP) for human participation in agent networks
- **Distributed Coordination**: P2P mesh networking with democratic leader election
- **Context Addressing**: UCXL (Universal Context Addressing) for immutable decision tracking
- **Secure Execution**: Multi-layer sandboxing with Docker containers and security policies
- **Collaborative Reasoning**: HMMM protocol for meta-discussion and consensus building
- **Encrypted Storage**: DHT-based encrypted storage for sensitive data

### System Philosophy

CHORUS follows these key principles:

1. **Container-First**: All configuration via environment variables, no file-based config
2. **P2P by Default**: No central server; agents form democratic mesh networks
3. **Zero-Trust Security**: Every operation validated, credentials never stored in containers
4. **Immutable Decisions**: All agent decisions recorded in content-addressed storage
5. **Human-in-the-Loop**: Humans as first-class peers in the agent network

---

## Core Principles

### 1. Container-Native Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      CHORUS Container                       │
│                                                             │
│  Environment Variables → Runtime Configuration              │
│  Volume Mounts         → Prompts & Secrets                  │
│  Network Policies      → Zero-Egress by Default             │
│  Signal Handling       → Dynamic Reconfiguration (SIGHUP)   │
└─────────────────────────────────────────────────────────────┘
```

**Key Features:**
- No config files inside containers
- All settings via environment variables
- Secrets injected via secure volumes
- Dynamic assignment loading from WHOOSH
- SIGHUP-triggered reconfiguration

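
As a rough illustration of the container-first idea above, the following Go sketch loads every setting from environment variables and applies a default where one is missing. The variable names `EXAMPLE_AGENT_ID` and `EXAMPLE_API_PORT`, the `Config` fields, and the default port are all hypothetical; they are not taken from the CHORUS configuration reference.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds runtime settings. The fields and the variable names used
// below are hypothetical; the real CHORUS configuration may differ.
type Config struct {
	AgentID string
	APIPort int
}

// loadConfig reads every setting through getenv, so no config file is needed.
// Passing os.Getenv uses the real environment; tests can pass a stub.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{AgentID: getenv("EXAMPLE_AGENT_ID"), APIPort: 8080}
	if cfg.AgentID == "" {
		return Config{}, fmt.Errorf("EXAMPLE_AGENT_ID is required")
	}
	if raw := getenv("EXAMPLE_API_PORT"); raw != "" {
		port, err := strconv.Atoi(raw)
		if err != nil {
			return Config{}, fmt.Errorf("EXAMPLE_API_PORT: %w", err)
		}
		cfg.APIPort = port
	}
	return cfg, nil
}

func main() {
	// os.Getenv reads the container's real environment.
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("agent=%s port=%d\n", cfg.AgentID, cfg.APIPort)
}
```

Injecting the lookup function keeps the loader free of global state, which also makes it straightforward to re-run on a reconfiguration signal.
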
### 2. Peer-to-Peer Mesh Network

```
           Agent-1 (Alice)
                 /|\
                / | \
               /  |  \
              /   |   \
      Agent-2     |     Agent-4
       (Bob)      |     (Dave)
              \   |   /
               \  |  /
                \ | /
                 \|/
           Agent-3 (Carol)

All agents are equal peers
No central coordinator
Democratic leader election
mDNS local discovery
DHT global discovery
```

### 3. Multi-Layer Security

```
Layer 1: License Validation (KACHING)
    ↓
Layer 2: P2P Encryption (libp2p TLS)
    ↓
Layer 3: DHT Encryption (age encryption)
    ↓
Layer 4: Docker Sandboxing (namespaces, cgroups)
    ↓
Layer 5: Network Isolation (zero-egress)
    ↓
Layer 6: SHHH Secrets Detection (scan & redact)
    ↓
Layer 7: UCXL Validation (immutable audit trail)
    ↓
Layer 8: Credential Mediation (agent uploads, not container)
```

---

## Architecture Layers

CHORUS is organized into distinct architectural layers:

### Layer 1: P2P Infrastructure

**Components:**
- libp2p Host (networking)
- mDNS Discovery (local peers)
- DHT (global peer discovery)
- PubSub (message broadcasting)

**Responsibilities:**
- Peer discovery and connection management
- Encrypted peer-to-peer communication
- Message routing and delivery
- Network resilience and failover

**See:** [P2P Infrastructure](../internal/p2p.md)

### Layer 2: Coordination & Consensus

**Components:**
- Election Manager (leader election)
- Task Coordinator (work distribution)
- HMMM Router (meta-discussion)
- SLURP (distributed orchestration)

**Responsibilities:**
- Democratic leader election
- Task assignment and tracking
- Collaborative reasoning protocols
- Work distribution algorithms

**See:** [Coordination](../packages/coordination.md), [SLURP](../packages/slurp/README.md)

### Layer 3: Execution Engine

**Components:**
- Task Execution Engine
- Docker Sandbox
- Image Selector
- Command Executor

**Responsibilities:**
- Isolated code execution in Docker containers
- Language-specific environment selection
- Resource limits and monitoring
- Result capture and validation

**See:** [Execution Engine](../packages/execution.md), [Task Execution Engine Module](../../Modules/TaskExecutionEngine.md)

### Layer 4: AI Integration

**Components:**
- AI Provider Interface
- Provider Implementations (Ollama, ResetData)
- Model Selection Logic
- Prompt Management

**Responsibilities:**
- Abstract AI provider differences
- Route requests to appropriate models
- Manage system prompts and context
- Handle AI provider failover

**See:** [AI Providers](../packages/ai.md), [Providers](../packages/providers.md)

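
One way to picture the provider abstraction and failover responsibility is the sketch below. The `Provider` interface, the `stubProvider` type, and `generateWithFailover` are hypothetical names invented for this example; they are not the interfaces defined in the CHORUS AI packages, and no real Ollama or ResetData client is called.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is a hypothetical abstraction over an AI backend.
type Provider interface {
	Name() string
	Generate(prompt string) (string, error)
}

// stubProvider is a stand-in used only for this example.
type stubProvider struct {
	name string
	fail bool
}

func (p stubProvider) Name() string { return p.name }

func (p stubProvider) Generate(prompt string) (string, error) {
	if p.fail {
		return "", errors.New("backend unavailable")
	}
	return p.name + ": " + prompt, nil
}

// generateWithFailover tries each provider in order and returns the first
// successful response. If all fail, the last error is wrapped and returned.
func generateWithFailover(providers []Provider, prompt string) (string, error) {
	var lastErr error
	for _, p := range providers {
		out, err := p.Generate(prompt)
		if err == nil {
			return out, nil
		}
		lastErr = fmt.Errorf("%s: %w", p.Name(), err)
	}
	if lastErr == nil {
		return "", errors.New("no providers configured")
	}
	return "", fmt.Errorf("all providers failed, last error: %w", lastErr)
}

func main() {
	providers := []Provider{
		stubProvider{name: "primary", fail: true},
		stubProvider{name: "fallback"},
	}
	out, err := generateWithFailover(providers, "summarize task")
	fmt.Println(out, err)
}
```

Callers depend only on the interface, so adding a backend means adding one implementation rather than touching the routing logic.
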
### Layer 5: Storage & State

**Components:**
- DHT Storage (distributed)
- Encrypted Storage (age encryption)
- UCXL Decision Publisher
- Hypercore Log (append-only)

**Responsibilities:**
- Distributed data storage
- Encryption and key management
- Immutable decision recording
- Event log persistence

**See:** [DHT](../packages/dht.md), [UCXL](../packages/ucxl.md)

### Layer 6: Security & Validation

**Components:**
- License Validator (KACHING)
- SHHH Sentinel (secrets detection)
- Crypto Layer (encryption)
- Security Policies

**Responsibilities:**
- License enforcement
- Secrets scanning and redaction
- Cryptographic operations
- Security policy enforcement

**See:** [Crypto](../packages/crypto.md), [SHHH](../packages/shhh.md), [Licensing](../internal/licensing.md)

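
The scan-and-redact responsibility can be illustrated with a small pattern-based filter. This is only a sketch of the general technique: the two regular expressions and the `[REDACTED]` placeholder are assumptions chosen for the example and say nothing about the rules the SHHH Sentinel actually applies.

```go
package main

import (
	"fmt"
	"regexp"
)

// patterns is an illustrative rule set, not SHHH's real one.
var patterns = []*regexp.Regexp{
	// Strings shaped like an AWS access key ID.
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),
	// key=value or key: value pairs with a sensitive-looking key.
	regexp.MustCompile(`(?i)\b(password|token|secret)\s*[:=]\s*\S+`),
}

// redact replaces every match of every pattern with a placeholder and
// returns the cleaned text together with the number of replacements.
func redact(text string) (string, int) {
	found := 0
	for _, re := range patterns {
		text = re.ReplaceAllStringFunc(text, func(string) string {
			found++
			return "[REDACTED]"
		})
	}
	return text, found
}

func main() {
	clean, n := redact("deploy with password=hunter2 on host build-01")
	fmt.Printf("%d finding(s): %s\n", n, clean)
}
```

Returning the count alongside the cleaned text lets a caller decide whether to log a finding or block the message entirely.
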
### Layer 7: Observability

**Components:**
- Metrics Collector (CHORUS Metrics)
- Health Checks (liveness, readiness)
- BACKBEAT Integration (P2P telemetry)
- Hypercore Log (coordination events)

**Responsibilities:**
- System metrics collection
- Health monitoring
- P2P operation tracking
- Event logging and audit trails

**See:** [Metrics](../packages/metrics.md), [Health](../packages/health.md)

### Layer 8: External Interfaces

**Components:**
- HTTP API Server
- UCXI Server (content resolution)
- HAP Terminal Interface
- HAP Web Interface [STUB]

**Responsibilities:**
- REST API endpoints
- UCXL content resolution
- Human interaction interfaces
- External system integration

**See:** [API](../api/README.md), [UCXI](../packages/ucxi.md), [HAP UI](../internal/hapui.md)

---

## Key Components

### Runtime Architecture

```
┌──────────────────────────────────────────────────────────────┐
│  main.go (cmd/agent or cmd/hap)                              │
│       │                                                      │
│       └─→ internal/runtime.Initialize()                      │
│                │                                             │
│                ├─→ Config Loading (environment)              │
│                ├─→ License Validation (KACHING)              │
│                ├─→ AI Provider Setup (Ollama/ResetData)      │
│                ├─→ P2P Node Creation (libp2p)                │
│                ├─→ PubSub Initialization                     │
│                ├─→ DHT Setup (optional)                      │
│                ├─→ Election Manager                          │
│                ├─→ Task Coordinator                          │
│                ├─→ HTTP API Server                           │
│                ├─→ UCXI Server (optional)                    │
│                └─→ Health & Metrics                          │
│                                                              │
│  SharedRuntime                                               │
│  ├── Context & Cancellation                                  │
│  ├── Logger (SimpleLogger)                                   │
│  ├── Config (*config.Config)                                 │
│  ├── RuntimeConfig (dynamic assignments)                     │
│  ├── P2P Node (*p2p.Node)                                    │
│  ├── PubSub (*pubsub.PubSub)                                 │
│  ├── DHT (*dht.LibP2PDHT)                                    │
│  ├── Encrypted Storage (*dht.EncryptedDHTStorage)            │
│  ├── Election Manager (*election.ElectionManager)            │
│  ├── Task Coordinator (*coordinator.TaskCoordinator)         │
│  ├── HTTP Server (*api.HTTPServer)                           │
│  ├── UCXI Server (*ucxi.Server)                              │
│  ├── Health Manager (*health.Manager)                        │
│  ├── Metrics (*metrics.CHORUSMetrics)                        │
│  ├── SHHH Sentinel (*shhh.Sentinel)                          │
│  ├── BACKBEAT Integration (*backbeat.Integration)            │
│  └── Decision Publisher (*ucxl.DecisionPublisher)            │
└──────────────────────────────────────────────────────────────┘
```

### Binary Separation

CHORUS provides three binaries with shared infrastructure:

| Binary | Purpose | Mode | Status |
|--------|---------|------|--------|
| **chorus-agent** | Autonomous AI agent | Agent Mode | ✅ Production |
| **chorus-hap** | Human Agent Portal | HAP Mode | 🔶 Beta |
| **chorus** | Compatibility wrapper | N/A | 🔴 Deprecated |

All binaries share:
- P2P infrastructure (libp2p, PubSub, DHT)
- Election and coordination systems
- Security and encryption layers
- Configuration and licensing

Differences:
- **Agent**: Automatic task execution, autonomous reasoning
- **HAP**: Terminal/web UI for human interaction, manual task approval

**See:** [Commands](../commands/README.md)

---

## Data Flow

### Task Execution Flow

```
1. Task Request Arrives
   │
   ├─→ Via PubSub (from another agent)
   ├─→ Via HTTP API (from external system)
   └─→ Via HAP (from human operator)
       │
       ↓
2. Task Coordinator Receives Task
   │
   ├─→ Check agent availability
   ├─→ Validate task structure
   └─→ Assign to execution engine
       │
       ↓
3. Execution Engine Processes
   │
   ├─→ Detect language (Go, Rust, Python, etc.)
   ├─→ Select Docker image
   ├─→ Create sandbox configuration
   ├─→ Start container
   │   │
   │   ├─→ Mount /workspace/input (read-only source)
   │   ├─→ Mount /workspace/data (working directory)
   │   └─→ Mount /workspace/output (deliverables)
   │
   ├─→ Execute commands via Docker Exec API
   ├─→ Stream stdout/stderr
   ├─→ Monitor resource usage
   └─→ Capture exit codes
       │
       ↓
4. Result Processing
   │
   ├─→ Collect artifacts from /workspace/output
   ├─→ Generate task summary
   ├─→ Create UCXL decision record
   └─→ Publish to DHT (encrypted)
       │
       ↓
5. Result Distribution
   │
   ├─→ Broadcast completion via PubSub
   ├─→ Update task tracker (availability)
   ├─→ Notify requester (if HTTP API)
   └─→ Log to Hypercore (audit trail)
```

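
Step 3 of the flow above (detect language, then select a Docker image) might look roughly like the following. This is a sketch under stated assumptions: detection by file extension, the extension-to-image table, and the fallback image are all illustrative choices, not the logic or images used by the CHORUS Image Selector.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// imageByExt maps a file extension to a sandbox image. The image names are
// illustrative assumptions, not the images CHORUS actually uses.
var imageByExt = map[string]string{
	".go": "golang:1.22",
	".rs": "rust:1",
	".py": "python:3.12",
}

// fallbackImage is used when no known language is detected (hypothetical).
const fallbackImage = "debian:stable-slim"

// selectImage returns the image for the most common known extension.
func selectImage(files []string) string {
	counts := map[string]int{}
	for _, f := range files {
		ext := strings.ToLower(filepath.Ext(f))
		if _, ok := imageByExt[ext]; ok {
			counts[ext]++
		}
	}
	best, bestN := "", 0
	for ext, n := range counts {
		// Break ties alphabetically so the result does not depend on
		// Go's randomized map iteration order.
		if n > bestN || (n == bestN && ext < best) {
			best, bestN = ext, n
		}
	}
	if best == "" {
		return fallbackImage
	}
	return imageByExt[best]
}

func main() {
	files := []string{"cmd/agent/main.go", "pkg/dht/dht.go", "scripts/lint.py"}
	fmt.Println(selectImage(files))
}
```

The chosen image would then be started with the three `/workspace` mounts listed in the flow before any commands are executed.
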
### Decision Publishing Flow

```
Agent Decision Made
   │
   ↓
Generate UCXL Context Address
   │
   ├─→ Hash decision content (SHA-256)
   ├─→ Create ucxl:// URI
   └─→ Add metadata (agent ID, timestamp)
       │
       ↓
Encrypt Decision Data
   │
   ├─→ Use age encryption
   ├─→ Derive key from shared secret
   └─→ Create encrypted blob
       │
       ↓
Store in DHT
   │
   ├─→ Key: UCXL hash
   ├─→ Value: Encrypted decision
   └─→ TTL: Configured expiration
       │
       ↓
Announce on PubSub
   │
   ├─→ Topic: "chorus/decisions"
   ├─→ Payload: UCXL address only
   └─→ Interested peers can fetch from DHT
```

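
The address-generation step in the flow above can be sketched with the Go standard library. Only the use of SHA-256 and the `ucxl://` scheme come from the flow; the `ucxl://sha256/<hex>` layout and the `contentAddress` function name are assumptions made for this example, and the real UCXL address format may be quite different. The encryption and DHT steps are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentAddress hashes the decision bytes with SHA-256 and embeds the
// hex digest in a URI. The "ucxl://sha256/<hex>" layout is an assumption
// made for this sketch, not the documented UCXL format.
func contentAddress(decision []byte) string {
	sum := sha256.Sum256(decision)
	return "ucxl://sha256/" + hex.EncodeToString(sum[:])
}

func main() {
	decision := []byte(`{"task":"example","result":"approved"}`)
	addr := contentAddress(decision)

	// The same bytes always map to the same address, so the address
	// alone is enough for a peer to look up and verify the content.
	fmt.Println(addr)
	fmt.Println(addr == contentAddress(decision))
}
```

Because the address is derived from the content, announcing only the address on PubSub is sufficient: a peer that fetches the blob can recompute the hash and detect tampering.
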
### Election Flow

```
Agent Startup
   │
   ↓
Join Election Topic
   │
   ├─→ Subscribe to "chorus/election/v1"
   ├─→ Announce presence
   └─→ Share capabilities
       │
       ↓
Send Heartbeats
   │
   ├─→ Every 5 seconds
   ├─→ Include: Node ID, Uptime, Load
   └─→ Track other peers' heartbeats
       │
       ↓
Monitor Admin Status
   │
   ├─→ Track last admin heartbeat
   ├─→ Timeout: 15 seconds
   └─→ If timeout → Trigger election
       │
       ↓
Election Triggered
   │
   ├─→ All agents propose themselves
   ├─→ Vote for highest uptime
   ├─→ Consensus on winner
   └─→ Winner becomes admin
       │
       ↓
Admin Elected
   │
   ├─→ Winner assumes admin role
   ├─→ Applies admin configuration
   ├─→ Enables SLURP coordination
   └─→ Continues heartbeat at higher frequency
```

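
The timeout check and the "vote for highest uptime" rule from the flow above can be sketched as two small functions. The 5-second heartbeat and 15-second timeout come from the flow; the `Candidate` type, the function names, and the tie-break by smaller node ID are assumptions added so that every peer would reach the same result. The real Election Manager may weigh other factors such as load or capabilities.

```go
package main

import (
	"fmt"
	"time"
)

const (
	heartbeatInterval = 5 * time.Second  // from the election flow
	adminTimeout      = 15 * time.Second // from the election flow
)

// Candidate is a hypothetical view of a peer's latest heartbeat.
type Candidate struct {
	NodeID string
	Uptime time.Duration
}

// adminTimedOut reports whether the time since the last admin heartbeat
// exceeds the timeout, which would trigger an election.
func adminTimedOut(sinceLastHeartbeat time.Duration) bool {
	return sinceLastHeartbeat > adminTimeout
}

// electWinner returns the candidate with the highest uptime. Ties go to
// the smaller NodeID (an assumption) so all peers pick the same winner.
func electWinner(cands []Candidate) (Candidate, bool) {
	if len(cands) == 0 {
		return Candidate{}, false
	}
	best := cands[0]
	for _, c := range cands[1:] {
		if c.Uptime > best.Uptime || (c.Uptime == best.Uptime && c.NodeID < best.NodeID) {
			best = c
		}
	}
	return best, true
}

func main() {
	// Three missed heartbeats plus a margin exceeds the timeout.
	if adminTimedOut(3*heartbeatInterval + time.Second) {
		winner, _ := electWinner([]Candidate{
			{NodeID: "bob", Uptime: 2 * time.Hour},
			{NodeID: "alice", Uptime: 5 * time.Hour},
		})
		fmt.Println("new admin:", winner.NodeID)
	}
}
```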
---

## Deployment Models

### Model 1: Local Development

```
┌─────────────────────────────────────────┐
│            Developer Laptop             │
│                                         │
│  ┌──────────────┐  ┌──────────────┐     │
│  │ chorus-agent │  │  chorus-hap  │     │
│  │   (Alice)    │  │   (Human)    │     │
│  └──────┬───────┘  └──────┬───────┘     │
│         │                 │             │
│         └────────┬────────┘             │
│                  │                      │
│           mDNS Discovery                │
│          P2P Mesh (local)               │
│                                         │
│  Ollama: localhost:11434                │
│  Docker: /var/run/docker.sock           │
└─────────────────────────────────────────┘
```

**Characteristics:**
- Single machine deployment
- mDNS for peer discovery
- Local Ollama instance
- Shared Docker socket
- No DHT required

**Use Cases:**
- Local testing
- Development workflows
- Single-user tasks

### Model 2: Docker Swarm Cluster

```
┌────────────────────────────────────────────────────────────┐
│                    Docker Swarm Cluster                    │
│                                                            │
│  Manager Node 1         Manager Node 2         Worker 1    │
│  ┌──────────────┐       ┌──────────────┐       ┌─────────┐ │
│  │ chorus-agent │←─────→│ chorus-agent │←─────→│ chorus  │ │
│  │   (Leader)   │       │  (Follower)  │       │ -agent  │ │
│  └──────────────┘       └──────────────┘       └─────────┘ │
│         ↑                      ↑                    ↑      │
│         │                      │                    │      │
│         └──────────────────────┴────────────────────┘      │
│                Docker Swarm Overlay Network                │
│                       P2P Mesh + DHT                       │
│                                                            │
│  Shared Services:                                          │
│  - Docker Registry (private)                               │
│  - Ollama Distributed (5 nodes)                            │
│  - NFS Storage (/rust)                                     │
│  - WHOOSH (assignment server)                              │
│  - KACHING (license server)                                │
└────────────────────────────────────────────────────────────┘
```

**Characteristics:**
- Multi-node cluster
- DHT for global discovery
- Bootstrap peers for network joining
- Overlay networking
- Shared storage via NFS
- Centralized license validation

**Use Cases:**
- Production deployments
- Team collaboration
- High availability
- Scalable workloads

### Model 3: Hybrid (Agent + HAP)

```
┌──────────────────────────────────────────────────────────┐
│                  Production Environment                  │
│                                                          │
│  Docker Swarm                   Developer Workstation    │
│  ┌──────────────┐               ┌──────────────┐         │
│  │ chorus-agent │               │  chorus-hap  │         │
│  │   (Alice)    │←─────P2P─────→│ (Human-Bob)  │         │
│  └──────┬───────┘               └──────────────┘         │
│         │                                                │
│  ┌──────┴───────┐                                        │
│  │ chorus-agent │                                        │
│  │   (Carol)    │                                        │
│  └──────────────┘                                        │
│                                                          │
│  Autonomous agents run in swarm                          │
│  Human operator joins via HAP (local or remote)          │
│  Same P2P protocol, equal participants                   │
└──────────────────────────────────────────────────────────┘
```

**Characteristics:**
- Autonomous agents in production
- Human operators join as needed
- Collaborative decision-making
- HMMM meta-discussion
- Humans can override or guide

**Use Cases:**
- Supervised automation
- Human-in-the-loop workflows
- Critical decision points
- Training and oversight

---

## Related Documents

### Getting Started
- [Commands Overview](../commands/README.md) - Entry points and CLI tools
- [Deployment Guide](../deployment/README.md) - How to deploy CHORUS
- [Configuration](../deployment/configuration.md) - Environment variables and settings

### Core Systems
- [Task Execution Engine](../../Modules/TaskExecutionEngine.md) - Complete execution engine documentation
- [P2P Infrastructure](../internal/p2p.md) - libp2p networking details
- [SLURP System](../packages/slurp/README.md) - Distributed coordination

### Security
- [Security Architecture](security.md) - Security layers and threat model
- [Crypto Package](../packages/crypto.md) - Encryption and key management
- [SHHH](../packages/shhh.md) - Secrets detection and redaction
- [Licensing](../internal/licensing.md) - License validation

### Integration
- [API Reference](../api/reference.md) - HTTP API endpoints
- [UCXL System](../packages/ucxl.md) - Context addressing
- [AI Providers](../packages/ai.md) - AI integration

---

## Next Steps

For detailed information on specific components:
1. **New to CHORUS?** Start with [System Architecture](system-architecture.md)
2. **Want to deploy?** See [Deployment Guide](../deployment/README.md)
3. **Developing features?** Review [Component Map](component-map.md)
4. **Understanding execution?** Read [Task Execution Engine](../../Modules/TaskExecutionEngine.md)