
anthonyrawlins bd19709b31 docs: Add comprehensive documentation foundation (Phase 1: Architecture & Commands)

Created complete documentation infrastructure with master index and detailed
command-line tool documentation.

Documentation Structure:
- docs/comprehensive/README.md - Master index with navigation
- docs/comprehensive/architecture/README.md - System architecture overview
- docs/comprehensive/commands/chorus-agent.md - Autonomous agent binary (✅ Production)
- docs/comprehensive/commands/chorus-hap.md - Human Agent Portal (🔶 Beta)
- docs/comprehensive/commands/chorus.md - Deprecated wrapper (⚠️ Deprecated)

Coverage Statistics:
- 3 command binaries fully documented (3,056 lines, ~14,500 words)
- Complete source code analysis with line numbers
- Configuration reference for all environment variables
- Runtime behavior and execution flows
- P2P networking details
- Health checks and monitoring
- Example deployments (local, Docker, Swarm)
- Troubleshooting guides
- Cross-references between docs

Key Features Documented:
- Container-first architecture
- P2P mesh networking
- Democratic leader election
- Docker sandbox execution
- HMMM collaborative reasoning
- UCXL decision publishing
- DHT encrypted storage
- Multi-layer security
- Human-agent collaboration

Implementation Status Tracking:
- ✅ Production features marked
- 🔶 Beta features identified
- ⏳ Stubbed components noted
- ⚠️ Deprecated code flagged

Next Phase: Package documentation (30+ packages in pkg/)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-09-30 13:49:46 +10:00


CHORUS Architecture Overview

System: CHORUS - Container-First P2P Task Coordination
Version: 0.5.0-dev
Architecture Type: Distributed, Peer-to-Peer, Event-Driven

System Overview
Core Principles
Architecture Layers
Key Components
Data Flow
Deployment Models
Related Documents

System Overview

CHORUS is a distributed task coordination system that enables both autonomous AI agents and human operators to collaborate on software development tasks through a peer-to-peer network. The system provides:

Primary Capabilities

Autonomous Agent Execution: AI agents that can execute code tasks in isolated Docker sandboxes
Human-Agent Collaboration: Human Agent Portal (HAP) for human participation in agent networks
Distributed Coordination: P2P mesh networking with democratic leader election
Context Addressing: UCXL (Universal Context Addressing) for immutable decision tracking
Secure Execution: Multi-layer sandboxing with Docker containers and security policies
Collaborative Reasoning: HMMM protocol for meta-discussion and consensus building
Encrypted Storage: DHT-based encrypted storage for sensitive data

System Philosophy

CHORUS follows these key principles:

Container-First: All configuration via environment variables, no file-based config
P2P by Default: No central server; agents form democratic mesh networks
Zero-Trust Security: Every operation validated, credentials never stored in containers
Immutable Decisions: All agent decisions recorded in content-addressed storage
Human-in-the-Loop: Humans as first-class peers in the agent network

Core Principles

1. Container-Native Architecture

┌─────────────────────────────────────────────────────────────┐
│ CHORUS Container                                             │
│                                                               │
│  Environment Variables  →  Runtime Configuration             │
│  Volume Mounts          →  Prompts & Secrets                 │
│  Network Policies       →  Zero-Egress by Default            │
│  Signal Handling        →  Dynamic Reconfiguration (SIGHUP)  │
└─────────────────────────────────────────────────────────────┘

Key Features:

No config files inside containers
All settings via environment variables
Secrets injected via secure volumes
Dynamic assignment loading from WHOOSH
SIGHUP-triggered reconfiguration
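
To make the container-native pattern concrete, here is a minimal sketch of environment-only configuration with a SIGHUP-triggered reload. The `Config` fields, the `CHORUS_*` variable names, and the default port are assumptions for illustration, not the project's actual settings. The demo sends itself one SIGHUP, so it only runs on Unix-like systems.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"strconv"
	"syscall"
)

// Config holds a few illustrative settings. The fields and the CHORUS_*
// variable names used below are hypothetical.
type Config struct {
	AgentID    string
	APIPort    int
	DHTEnabled bool
}

// loadConfig builds a Config from a lookup function (os.Getenv in real use),
// which keeps the loader testable without touching the process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{AgentID: getenv("CHORUS_AGENT_ID"), APIPort: 8080}
	if cfg.AgentID == "" {
		return Config{}, fmt.Errorf("CHORUS_AGENT_ID is required")
	}
	if v := getenv("CHORUS_API_PORT"); v != "" {
		port, err := strconv.Atoi(v)
		if err != nil {
			return Config{}, fmt.Errorf("invalid CHORUS_API_PORT %q: %w", v, err)
		}
		cfg.APIPort = port
	}
	cfg.DHTEnabled = getenv("CHORUS_DHT_ENABLED") == "true"
	return cfg, nil
}

func main() {
	// A map stands in for the environment so the demo is deterministic.
	env := map[string]string{"CHORUS_AGENT_ID": "alice"}
	lookup := func(key string) string { return env[key] }

	cfg, err := loadConfig(lookup)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("initial:  %+v\n", cfg)

	// Register for SIGHUP before anything can send it.
	hup := make(chan os.Signal, 1)
	signal.Notify(hup, syscall.SIGHUP)

	// Simulate an operator changing a setting and signalling the process.
	env["CHORUS_API_PORT"] = "9090"
	_ = syscall.Kill(os.Getpid(), syscall.SIGHUP)

	<-hup
	next, err := loadConfig(lookup)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reload failed, keeping old config:", err)
		return
	}
	cfg = next
	fmt.Printf("reloaded: %+v\n", cfg)
}
```

In a long-running process the receive on `hup` would sit in a loop, and `loadConfig` would be called with `os.Getenv`. Keeping the previous configuration when a reload fails avoids taking a healthy agent down over a bad setting.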

2. Peer-to-Peer Mesh Network

        Agent-1 (Alice)
           /|\
          / | \
         /  |  \
        /   |   \
   Agent-2  |  Agent-4
    (Bob)   |   (Dave)
        \   |   /
         \  |  /
          \ | /
           \|/
        Agent-3 (Carol)

All agents are equal peers
No central coordinator
Democratic leader election
mDNS local discovery
DHT global discovery

3. Multi-Layer Security

Layer 1: License Validation (KACHING)
    ↓
Layer 2: P2P Encryption (libp2p TLS)
    ↓
Layer 3: DHT Encryption (age encryption)
    ↓
Layer 4: Docker Sandboxing (namespaces, cgroups)
    ↓
Layer 5: Network Isolation (zero-egress)
    ↓
Layer 6: SHHH Secrets Detection (scan & redact)
    ↓
Layer 7: UCXL Validation (immutable audit trail)
    ↓
Layer 8: Credential Mediation (the agent performs uploads; credentials never enter the container)

Architecture Layers

CHORUS is organized into distinct architectural layers:

Layer 1: P2P Infrastructure

Components:

libp2p Host (networking)
mDNS Discovery (local peers)
DHT (global peer discovery)
PubSub (message broadcasting)

Responsibilities:

Peer discovery and connection management
Encrypted peer-to-peer communication
Message routing and delivery
Network resilience and failover

See: P2P Infrastructure

Layer 2: Coordination & Consensus

Components:

Election Manager (leader election)
Task Coordinator (work distribution)
HMMM Router (meta-discussion)
SLURP (distributed orchestration)

Responsibilities:

Democratic leader election
Task assignment and tracking
Collaborative reasoning protocols
Work distribution algorithms

See: Coordination, SLURP

Layer 3: Execution Engine

Components:

Task Execution Engine
Docker Sandbox
Image Selector
Command Executor

Responsibilities:

Isolated code execution in Docker containers
Language-specific environment selection
Resource limits and monitoring
Result capture and validation

See: Execution Engine, Task Execution Engine Module

Layer 4: AI Integration

Components:

AI Provider Interface
Provider Implementations (Ollama, ResetData)
Model Selection Logic
Prompt Management

Responsibilities:

Abstract AI provider differences
Route requests to appropriate models
Manage system prompts and context
Handle AI provider failover

See: AI Providers, Providers
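
The provider abstraction and failover responsibility listed above can be sketched as follows. The `Provider` interface, its method names, and `completeWithFailover` are hypothetical and are not taken from the CHORUS source. The stub provider stands in for real Ollama or ResetData clients.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is a hypothetical abstraction over AI backends.
type Provider interface {
	Name() string
	Complete(prompt string) (string, error)
}

// stubProvider simulates a backend that either answers or fails.
type stubProvider struct {
	name string
	fail bool
}

func (s stubProvider) Name() string { return s.name }

func (s stubProvider) Complete(prompt string) (string, error) {
	if s.fail {
		return "", errors.New("unavailable")
	}
	return s.name + ": " + prompt, nil
}

// completeWithFailover tries providers in priority order and returns the
// first successful response. If all fail, the errors are joined so the
// caller can see why each provider was skipped.
func completeWithFailover(providers []Provider, prompt string) (string, error) {
	if len(providers) == 0 {
		return "", errors.New("no providers configured")
	}
	var errs []error
	for _, p := range providers {
		out, err := p.Complete(prompt)
		if err == nil {
			return out, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	return "", errors.Join(errs...)
}

func main() {
	providers := []Provider{
		stubProvider{name: "ollama", fail: true},
		stubProvider{name: "resetdata"},
	}
	out, err := completeWithFailover(providers, "summarize task")
	fmt.Println(out, err)
}
```

A production version would likely pass a `context.Context` for timeouts and cancellation. That is omitted here to keep the sketch short.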

Layer 5: Storage & State

Components:

DHT Storage (distributed)
Encrypted Storage (age encryption)
UCXL Decision Publisher
Hypercore Log (append-only)

Responsibilities:

Distributed data storage
Encryption and key management
Immutable decision recording
Event log persistence

See: DHT, UCXL

Layer 6: Security & Validation

Components:

License Validator (KACHING)
SHHH Sentinel (secrets detection)
Crypto Layer (encryption)
Security Policies

Responsibilities:

License enforcement
Secrets scanning and redaction
Cryptographic operations
Security policy enforcement

See: Crypto, SHHH, Licensing
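
As a rough illustration of the scan-and-redact responsibility, the sketch below uses two regular expressions. It is not the SHHH Sentinel's actual rule set. The patterns, the `redact` function, and the `[REDACTED]` placeholder are all assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// key=value style assignments for a few sensitive names (illustrative list).
	assignmentPattern = regexp.MustCompile(`(?i)\b(password|token|api_key)\s*[:=]\s*\S+`)
	// The common shape of an AWS access key ID: "AKIA" plus 16 characters.
	awsKeyPattern = regexp.MustCompile(`AKIA[0-9A-Z]{16}`)
)

// redact replaces matches with a placeholder and reports how many findings
// were redacted. Assignments are handled first so that a key appearing as an
// assignment value is counted once.
func redact(text string) (string, int) {
	findings := len(assignmentPattern.FindAllStringIndex(text, -1))
	text = assignmentPattern.ReplaceAllString(text, "${1}=[REDACTED]")

	findings += len(awsKeyPattern.FindAllStringIndex(text, -1))
	text = awsKeyPattern.ReplaceAllString(text, "[REDACTED]")
	return text, findings
}

func main() {
	logLine := "deploy with password: hunter2 using AKIAABCDEFGHIJKLMNOP"
	clean, n := redact(logLine)
	fmt.Printf("%d finding(s): %s\n", n, clean)
}
```

Pattern-based detection of this kind produces both false positives and false negatives, so it works best as one layer among the others listed above.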

Layer 7: Observability

Components:

Metrics Collector (CHORUS Metrics)
Health Checks (liveness, readiness)
BACKBEAT Integration (P2P telemetry)
Hypercore Log (coordination events)

Responsibilities:

System metrics collection
Health monitoring
P2P operation tracking
Event logging and audit trails

See: Metrics, Health
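
The liveness/readiness split can be sketched with the standard library alone. The `Health` type, the check names, and the `/readyz` path are hypothetical. CHORUS's own `health.Manager` may be structured quite differently.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Check reports nil when a dependency is usable.
type Check func() error

// Health is a hypothetical registry of readiness checks.
// It is not safe for concurrent registration; a real one would add a mutex.
type Health struct {
	checks map[string]Check
}

func NewHealth() *Health { return &Health{checks: map[string]Check{}} }

func (h *Health) Register(name string, c Check) { h.checks[name] = c }

// Ready runs every check and returns the overall result plus per-check status.
func (h *Health) Ready() (bool, map[string]string) {
	status := make(map[string]string, len(h.checks))
	ready := true
	for name, check := range h.checks {
		if err := check(); err != nil {
			status[name] = err.Error()
			ready = false
			continue
		}
		status[name] = "ok"
	}
	return ready, status
}

// Liveness only confirms the process can serve requests.
func (h *Health) Liveness(w http.ResponseWriter, _ *http.Request) {
	fmt.Fprintln(w, "alive")
}

// Readiness returns 503 until every registered check passes.
func (h *Health) Readiness(w http.ResponseWriter, _ *http.Request) {
	ready, status := h.Ready()
	w.Header().Set("Content-Type", "application/json")
	if !ready {
		w.WriteHeader(http.StatusServiceUnavailable)
	}
	_ = json.NewEncoder(w).Encode(status)
}

var errNoPeers = errors.New("no bootstrap peers")

func main() {
	h := NewHealth()
	h.Register("p2p", func() error { return nil })
	h.Register("dht", func() error { return errNoPeers })

	// Exercise the handler in-process instead of opening a port.
	rec := httptest.NewRecorder()
	h.Readiness(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	fmt.Println(rec.Code, rec.Body.String())
}
```

Keeping liveness free of dependency checks means an orchestrator restarts the container only when the process itself is stuck. Readiness controls only whether traffic is routed to it.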

Layer 8: External Interfaces

Components:

HTTP API Server
UCXI Server (content resolution)
HAP Terminal Interface
HAP Web Interface [STUB]

Responsibilities:

REST API endpoints
UCXL content resolution
Human interaction interfaces
External system integration

See: API, UCXI, HAP UI

Key Components

Runtime Architecture

┌──────────────────────────────────────────────────────────────┐
│ main.go (cmd/agent or cmd/hap)                               │
│   │                                                            │
│   └─→ internal/runtime.Initialize()                           │
│          │                                                     │
│          ├─→ Config Loading (environment)                     │
│          ├─→ License Validation (KACHING)                     │
│          ├─→ AI Provider Setup (Ollama/ResetData)            │
│          ├─→ P2P Node Creation (libp2p)                       │
│          ├─→ PubSub Initialization                            │
│          ├─→ DHT Setup (optional)                             │
│          ├─→ Election Manager                                 │
│          ├─→ Task Coordinator                                 │
│          ├─→ HTTP API Server                                  │
│          ├─→ UCXI Server (optional)                           │
│          └─→ Health & Metrics                                 │
│                                                                │
│   SharedRuntime                                               │
│   ├── Context & Cancellation                                  │
│   ├── Logger (SimpleLogger)                                   │
│   ├── Config (*config.Config)                                 │
│   ├── RuntimeConfig (dynamic assignments)                     │
│   ├── P2P Node (*p2p.Node)                                    │
│   ├── PubSub (*pubsub.PubSub)                                │
│   ├── DHT (*dht.LibP2PDHT)                                    │
│   ├── Encrypted Storage (*dht.EncryptedDHTStorage)           │
│   ├── Election Manager (*election.ElectionManager)           │
│   ├── Task Coordinator (*coordinator.TaskCoordinator)        │
│   ├── HTTP Server (*api.HTTPServer)                           │
│   ├── UCXI Server (*ucxi.Server)                              │
│   ├── Health Manager (*health.Manager)                        │
│   ├── Metrics (*metrics.CHORUSMetrics)                        │
│   ├── SHHH Sentinel (*shhh.Sentinel)                          │
│   ├── BACKBEAT Integration (*backbeat.Integration)           │
│   └── Decision Publisher (*ucxl.DecisionPublisher)           │
└──────────────────────────────────────────────────────────────┘

Binary Separation

CHORUS provides three binaries with shared infrastructure:

Binary	Purpose	Mode	Status
chorus-agent	Autonomous AI agent	Agent Mode	✅ Production
chorus-hap	Human Agent Portal	HAP Mode	🔶 Beta
chorus	Compatibility wrapper	N/A	🔴 Deprecated

All binaries share:

P2P infrastructure (libp2p, PubSub, DHT)
Election and coordination systems
Security and encryption layers
Configuration and licensing

Differences:

Agent: Automatic task execution, autonomous reasoning
HAP: Terminal/web UI for human interaction, manual task approval

See: Commands

Data Flow

Task Execution Flow

1. Task Request Arrives
   │
   ├─→ Via PubSub (from another agent)
   ├─→ Via HTTP API (from external system)
   └─→ Via HAP (from human operator)
   │
   ↓
2. Task Coordinator Receives Task
   │
   ├─→ Check agent availability
   ├─→ Validate task structure
   └─→ Assign to execution engine
   │
   ↓
3. Execution Engine Processes
   │
   ├─→ Detect language (Go, Rust, Python, etc.)
   ├─→ Select Docker image
   ├─→ Create sandbox configuration
   ├─→ Start container
   │   │
   │   ├─→ Mount /workspace/input (read-only source)
   │   ├─→ Mount /workspace/data (working directory)
   │   └─→ Mount /workspace/output (deliverables)
   │
   ├─→ Execute commands via Docker Exec API
   ├─→ Stream stdout/stderr
   ├─→ Monitor resource usage
   └─→ Capture exit codes
   │
   ↓
4. Result Processing
   │
   ├─→ Collect artifacts from /workspace/output
   ├─→ Generate task summary
   ├─→ Create UCXL decision record
   └─→ Publish to DHT (encrypted)
   │
   ↓
5. Result Distribution
   │
   ├─→ Broadcast completion via PubSub
   ├─→ Update task tracker (availability)
   ├─→ Notify requester (if HTTP API)
   └─→ Log to Hypercore (audit trail)
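
Step 3 of the flow above (detect language, select image, prepare mounts) can be sketched as below. The extension table, the image tags, the fallback image, and the `Mount` type are illustrative assumptions. Only the three `/workspace` container paths come from the flow itself.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Illustrative lookup tables; the real image selector may use other signals.
var languageByExt = map[string]string{".go": "go", ".rs": "rust", ".py": "python"}

var imageByLanguage = map[string]string{
	"go":     "golang:1.22",
	"rust":   "rust:1.77",
	"python": "python:3.12-slim",
}

const fallbackImage = "ubuntu:24.04" // assumption: used when no language is detected

// detectLanguage picks the language with the most matching source files.
// A fixed evaluation order keeps ties deterministic.
func detectLanguage(files []string) string {
	counts := map[string]int{}
	for _, f := range files {
		if lang, ok := languageByExt[strings.ToLower(filepath.Ext(f))]; ok {
			counts[lang]++
		}
	}
	best, bestCount := "", 0
	for _, lang := range []string{"go", "rust", "python"} {
		if counts[lang] > bestCount {
			best, bestCount = lang, counts[lang]
		}
	}
	return best
}

func selectImage(lang string) string {
	if img, ok := imageByLanguage[lang]; ok {
		return img
	}
	return fallbackImage
}

// Mount describes one bind mount for the sandbox container.
type Mount struct {
	HostPath      string
	ContainerPath string
	ReadOnly      bool
}

// sandboxMounts maps a per-task host directory onto the three workspace paths.
func sandboxMounts(taskDir string) []Mount {
	return []Mount{
		{filepath.Join(taskDir, "input"), "/workspace/input", true},
		{filepath.Join(taskDir, "data"), "/workspace/data", false},
		{filepath.Join(taskDir, "output"), "/workspace/output", false},
	}
}

func main() {
	lang := detectLanguage([]string{"main.go", "util.go", "README.md"})
	fmt.Println("language:", lang, "image:", selectImage(lang))
	for _, m := range sandboxMounts("/tmp/task-123") {
		fmt.Printf("%s -> %s (ro=%v)\n", m.HostPath, m.ContainerPath, m.ReadOnly)
	}
}
```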

Decision Publishing Flow

Agent Decision Made
   │
   ↓
Generate UCXL Context Address
   │
   ├─→ Hash decision content (SHA-256)
   ├─→ Create ucxl:// URI
   └─→ Add metadata (agent ID, timestamp)
   │
   ↓
Encrypt Decision Data
   │
   ├─→ Use age encryption
   ├─→ Derive key from shared secret
   └─→ Create encrypted blob
   │
   ↓
Store in DHT
   │
   ├─→ Key: UCXL hash
   ├─→ Value: Encrypted decision
   └─→ TTL: Configured expiration
   │
   ↓
Announce on PubSub
   │
   ├─→ Topic: "chorus/decisions"
   ├─→ Payload: UCXL address only
   └─→ Interested peers can fetch from DHT
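
The first stage of the publishing flow, deriving a content address from a SHA-256 hash, can be sketched as follows. The `Decision` type and the `ucxl://<agent>/decisions/<hash>` layout are assumptions made for the example, since the real UCXL address grammar is not given here. Encryption with age and the DHT write are left out because they need external libraries.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// Decision is a hypothetical record of one agent decision.
type Decision struct {
	AgentID   string
	Timestamp time.Time
	Content   []byte
}

// contentHash returns the hex-encoded SHA-256 digest of the decision content.
// Identical content always yields the same hash, which is what makes the
// record content-addressed.
func contentHash(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// ucxlAddress builds an illustrative ucxl:// URI. The path layout is an
// assumption, not the UCXL specification.
func ucxlAddress(d Decision) string {
	return fmt.Sprintf("ucxl://%s/decisions/%s", d.AgentID, contentHash(d.Content))
}

func main() {
	d := Decision{
		AgentID:   "agent-alice",
		Timestamp: time.Now().UTC(),
		Content:   []byte(`{"task":"build","result":"ok"}`),
	}
	fmt.Println("DHT key: ", contentHash(d.Content))
	fmt.Println("announce:", ucxlAddress(d))
}
```

Announcing only the address on PubSub, as the flow describes, keeps the broadcast small. A peer that fetches the blob from the DHT can check it against the hash in the address, provided the hash covers the bytes that were actually stored.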

Election Flow

Agent Startup
   │
   ↓
Join Election Topic
   │
   ├─→ Subscribe to "chorus/election/v1"
   ├─→ Announce presence
   └─→ Share capabilities
   │
   ↓
Send Heartbeats
   │
   ├─→ Every 5 seconds
   ├─→ Include: Node ID, Uptime, Load
   └─→ Track other peers' heartbeats
   │
   ↓
Monitor Admin Status
   │
   ├─→ Track last admin heartbeat
   ├─→ Timeout: 15 seconds
   └─→ If timeout → Trigger election
   │
   ↓
Election Triggered
   │
   ├─→ All agents propose themselves
   ├─→ Vote for highest uptime
   ├─→ Consensus on winner
   └─→ Winner becomes admin
   │
   ↓
Admin Elected
   │
   ├─→ Winner assumes admin role
   ├─→ Applies admin configuration
   ├─→ Enables SLURP coordination
   └─→ Continues heartbeat at higher frequency
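
The timeout check and the highest-uptime vote can be sketched as below. The 5-second heartbeat and 15-second timeout come from the flow above. The `Peer` type, the function names, and the tie-break on node ID are assumptions added so that every peer computes the same winner.

```go
package main

import (
	"fmt"
	"time"
)

const (
	heartbeatInterval = 5 * time.Second  // from the flow above
	adminTimeout      = 15 * time.Second // from the flow above
)

// Peer is a hypothetical view of one participant, built from its heartbeats.
type Peer struct {
	NodeID        string
	Uptime        time.Duration
	LastHeartbeat time.Time
}

// adminTimedOut reports whether the admin has been silent for longer than
// the timeout, which is the trigger for a new election.
func adminTimedOut(lastAdminHeartbeat, now time.Time) bool {
	return now.Sub(lastAdminHeartbeat) > adminTimeout
}

// electAdmin picks the candidate with the highest uptime. Ties are broken by
// the lexicographically smallest NodeID (an assumption) so the result does
// not depend on the order in which heartbeats arrived.
func electAdmin(candidates []Peer) (Peer, bool) {
	if len(candidates) == 0 {
		return Peer{}, false
	}
	winner := candidates[0]
	for _, c := range candidates[1:] {
		if c.Uptime > winner.Uptime || (c.Uptime == winner.Uptime && c.NodeID < winner.NodeID) {
			winner = c
		}
	}
	return winner, true
}

func main() {
	now := time.Now()
	lastAdminBeat := now.Add(-4 * heartbeatInterval) // four missed heartbeats

	if adminTimedOut(lastAdminBeat, now) {
		peers := []Peer{
			{NodeID: "bob", Uptime: 2 * time.Hour, LastHeartbeat: now},
			{NodeID: "carol", Uptime: 9 * time.Hour, LastHeartbeat: now},
			{NodeID: "dave", Uptime: 9 * time.Hour, LastHeartbeat: now},
		}
		if winner, ok := electAdmin(peers); ok {
			fmt.Println("new admin:", winner.NodeID)
		}
	}
}
```

With a 15-second timeout and 5-second heartbeats, an admin has to miss three consecutive heartbeats before peers call an election. This tolerates brief network hiccups.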

Deployment Models

Model 1: Local Development

┌─────────────────────────────────────────┐
│ Developer Laptop                         │
│                                          │
│  ┌──────────────┐  ┌──────────────┐    │
│  │ chorus-agent │  │ chorus-hap   │    │
│  │  (Alice)     │  │  (Human)     │    │
│  └──────┬───────┘  └──────┬───────┘    │
│         │                  │             │
│         └────────┬─────────┘             │
│                  │                       │
│           mDNS Discovery                 │
│           P2P Mesh (local)               │
│                                          │
│  Ollama: localhost:11434                │
│  Docker: /var/run/docker.sock           │
└─────────────────────────────────────────┘

Characteristics:

Single machine deployment
mDNS for peer discovery
Local Ollama instance
Shared Docker socket
No DHT required

Use Cases:

Local testing
Development workflows
Single-user tasks

Model 2: Docker Swarm Cluster

┌────────────────────────────────────────────────────────────┐
│ Docker Swarm Cluster                                        │
│                                                              │
│  Manager Node 1          Manager Node 2          Worker 1   │
│  ┌──────────────┐       ┌──────────────┐       ┌─────────┐ │
│  │ chorus-agent │←─────→│ chorus-agent │←─────→│ chorus  │ │
│  │ (Leader)     │       │ (Follower)   │       │ -agent  │ │
│  └──────────────┘       └──────────────┘       └─────────┘ │
│         ↑                       ↑                     ↑      │
│         │                       │                     │      │
│         └───────────────────────┴─────────────────────┘      │
│                     Docker Swarm Overlay Network             │
│                     P2P Mesh + DHT                           │
│                                                              │
│  Shared Services:                                           │
│  - Docker Registry (private)                                │
│  - Ollama Distributed (5 nodes)                             │
│  - NFS Storage (/rust)                                      │
│  - WHOOSH (assignment server)                               │
│  - KACHING (license server)                                 │
└────────────────────────────────────────────────────────────┘

Characteristics:

Multi-node cluster
DHT for global discovery
Bootstrap peers for network joining
Overlay networking
Shared storage via NFS
Centralized license validation

Use Cases:

Production deployments
Team collaboration
High availability
Scalable workloads

Model 3: Hybrid (Agent + HAP)

┌──────────────────────────────────────────────────────────┐
│ Production Environment                                    │
│                                                            │
│  Docker Swarm                    Developer Workstation    │
│  ┌──────────────┐               ┌──────────────┐         │
│  │ chorus-agent │               │ chorus-hap   │         │
│  │ (Alice)      │←─────P2P─────→│ (Human-Bob)  │         │
│  └──────┬───────┘               └──────────────┘         │
│         │                                                  │
│  ┌──────┴───────┐                                         │
│  │ chorus-agent │                                         │
│  │ (Carol)      │                                         │
│  └──────────────┘                                         │
│                                                            │
│  Autonomous agents run in swarm                           │
│  Human operator joins via HAP (local or remote)           │
│  Same P2P protocol, equal participants                    │
└──────────────────────────────────────────────────────────┘

Characteristics:

Autonomous agents in production
Human operators join as needed
Collaborative decision-making
HMMM meta-discussion
Humans can override or guide

Use Cases:

Supervised automation
Human-in-the-loop workflows
Critical decision points
Training and oversight

Related Documents

Getting Started

Commands Overview - Entry points and CLI tools
Deployment Guide - How to deploy CHORUS
Configuration - Environment variables and settings

Core Systems

Task Execution Engine - Complete execution engine documentation
P2P Infrastructure - libp2p networking details
SLURP System - Distributed coordination

Security

Security Architecture - Security layers and threat model
Crypto Package - Encryption and key management
SHHH - Secrets detection and redaction
Licensing - License validation

Integration

API Reference - HTTP API endpoints
UCXL System - Context addressing
AI Providers - AI integration

Next Steps

For detailed information on specific components:

New to CHORUS? Start with System Architecture
Want to deploy? See Deployment Guide
Developing features? Review Component Map
Understanding execution? Read Task Execution Engine
