anthonyrawlins bd19709b31 docs: Add comprehensive documentation foundation (Phase 1: Architecture & Commands)
Created complete documentation infrastructure with master index and detailed
command-line tool documentation.

Documentation Structure:
- docs/comprehensive/README.md - Master index with navigation
- docs/comprehensive/architecture/README.md - System architecture overview
- docs/comprehensive/commands/chorus-agent.md - Autonomous agent binary (Production)
- docs/comprehensive/commands/chorus-hap.md - Human Agent Portal (🔶 Beta)
- docs/comprehensive/commands/chorus.md - Deprecated wrapper (⚠️ Deprecated)

Coverage Statistics:
- 3 command binaries fully documented (3,056 lines, ~14,500 words)
- Complete source code analysis with line numbers
- Configuration reference for all environment variables
- Runtime behavior and execution flows
- P2P networking details
- Health checks and monitoring
- Example deployments (local, Docker, Swarm)
- Troubleshooting guides
- Cross-references between docs

Key Features Documented:
- Container-first architecture
- P2P mesh networking
- Democratic leader election
- Docker sandbox execution
- HMMM collaborative reasoning
- UCXL decision publishing
- DHT encrypted storage
- Multi-layer security
- Human-agent collaboration

Implementation Status Tracking:
- Production features marked
- 🔶 Beta features identified
- Stubbed components noted
- ⚠️ Deprecated code flagged

Next Phase: Package documentation (30+ packages in pkg/)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 13:49:46 +10:00

CHORUS Architecture Overview

System: CHORUS - Container-First P2P Task Coordination
Version: 0.5.0-dev
Architecture Type: Distributed, Peer-to-Peer, Event-Driven


Table of Contents

  1. System Overview
  2. Core Principles
  3. Architecture Layers
  4. Key Components
  5. Data Flow
  6. Deployment Models
  7. Related Documents

System Overview

CHORUS is a distributed task coordination system that enables both autonomous AI agents and human operators to collaborate on software development tasks through a peer-to-peer network. The system provides:

Primary Capabilities

  • Autonomous Agent Execution: AI agents that can execute code tasks in isolated Docker sandboxes
  • Human-Agent Collaboration: Human Agent Portal (HAP) for human participation in agent networks
  • Distributed Coordination: P2P mesh networking with democratic leader election
  • Context Addressing: UCXL (Universal Context Addressing) for immutable decision tracking
  • Secure Execution: Multi-layer sandboxing with Docker containers and security policies
  • Collaborative Reasoning: HMMM protocol for meta-discussion and consensus building
  • Encrypted Storage: DHT-based encrypted storage for sensitive data

System Philosophy

CHORUS follows these key principles:

  1. Container-First: All configuration via environment variables, no file-based config
  2. P2P by Default: No central server; agents form democratic mesh networks
  3. Zero-Trust Security: Every operation validated, credentials never stored in containers
  4. Immutable Decisions: All agent decisions recorded in content-addressed storage
  5. Human-in-the-Loop: Humans as first-class peers in the agent network

Core Principles

1. Container-Native Architecture

┌──────────────────────────────────────────────────────────────┐
│ CHORUS Container                                             │
│                                                              │
│  Environment Variables  →  Runtime Configuration             │
│  Volume Mounts          →  Prompts & Secrets                 │
│  Network Policies       →  Zero-Egress by Default            │
│  Signal Handling        →  Dynamic Reconfiguration (SIGHUP)  │
└──────────────────────────────────────────────────────────────┘

Key Features:

  • No config files inside containers
  • All settings via environment variables
  • Secrets injected via secure volumes
  • Dynamic assignment loading from WHOOSH
  • SIGHUP-triggered reconfiguration
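The following minimal Go sketch shows the env-only pattern. It assumes hypothetical variable names (`CHORUS_AGENT_ID`, `CHORUS_OLLAMA_ENDPOINT`, `CHORUS_HEARTBEAT_SECONDS`) and a caller-supplied `reload` callback, and it is not the actual CHORUS config loader. A running process cannot observe new environment values, so the SIGHUP handler only invokes the callback. In practice that callback would re-fetch the dynamic assignment, for example from WHOOSH.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"strconv"
	"syscall"
	"time"
)

// Config holds runtime settings. The fields and CHORUS_* variable names are
// hypothetical; they only illustrate the env-only configuration pattern.
type Config struct {
	AgentID           string
	OllamaEndpoint    string
	HeartbeatInterval time.Duration
}

// LoadConfig builds a Config purely from environment lookups. Taking getenv
// as a parameter (os.Getenv in production) keeps it testable.
func LoadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		AgentID:           getenv("CHORUS_AGENT_ID"),
		OllamaEndpoint:    getenv("CHORUS_OLLAMA_ENDPOINT"),
		HeartbeatInterval: 5 * time.Second,
	}
	if cfg.AgentID == "" {
		return Config{}, fmt.Errorf("CHORUS_AGENT_ID is required")
	}
	if cfg.OllamaEndpoint == "" {
		cfg.OllamaEndpoint = "http://localhost:11434"
	}
	if v := getenv("CHORUS_HEARTBEAT_SECONDS"); v != "" {
		secs, err := strconv.Atoi(v)
		if err != nil || secs <= 0 {
			return Config{}, fmt.Errorf("invalid CHORUS_HEARTBEAT_SECONDS %q", v)
		}
		cfg.HeartbeatInterval = time.Duration(secs) * time.Second
	}
	return cfg, nil
}

// ReloadOnSIGHUP runs reload every time the process receives SIGHUP, until
// the returned stop function is called.
func ReloadOnSIGHUP(reload func()) (stop func()) {
	hup := make(chan os.Signal, 1)
	signal.Notify(hup, syscall.SIGHUP)
	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-hup:
				reload()
			case <-done:
				return
			}
		}
	}()
	return func() {
		signal.Stop(hup)
		close(done)
	}
}

func main() {
	// A fixed map stands in for the container environment in this demo.
	env := map[string]string{"CHORUS_AGENT_ID": "alice"}
	cfg, err := LoadConfig(func(k string) string { return env[k] })
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("agent=%s ollama=%s heartbeat=%s\n", cfg.AgentID, cfg.OllamaEndpoint, cfg.HeartbeatInterval)

	stop := ReloadOnSIGHUP(func() { fmt.Println("SIGHUP received: reloading assignment") })
	defer stop()
}
```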

2. Peer-to-Peer Mesh Network

        Agent-1 (Alice)
           /|\
          / | \
         /  |  \
        /   |   \
   Agent-2  |  Agent-4
    (Bob)   |   (Dave)
        \   |   /
         \  |  /
          \ | /
           \|/
        Agent-3 (Carol)

  • All agents are equal peers
  • No central coordinator
  • Democratic leader election
  • mDNS local discovery
  • DHT global discovery

3. Multi-Layer Security

Layer 1: License Validation (KACHING)
    ↓
Layer 2: P2P Encryption (libp2p TLS)
    ↓
Layer 3: DHT Encryption (age encryption)
    ↓
Layer 4: Docker Sandboxing (namespaces, cgroups)
    ↓
Layer 5: Network Isolation (zero-egress)
    ↓
Layer 6: SHHH Secrets Detection (scan & redact)
    ↓
Layer 7: UCXL Validation (immutable audit trail)
    ↓
Layer 8: Credential Mediation (uploads performed by the agent, not the container)
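The layers can be read as an ordered chain in which any layer may reject an operation. The Go sketch below models three of them (license, P2P encryption, secrets redaction) with placeholder checks. The `Operation` and `Layer` types are hypothetical. The remaining layers (Docker sandboxing, network isolation, and so on) are enforced by infrastructure rather than by in-process code like this.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrDenied is returned by a layer that rejects an operation.
var ErrDenied = errors.New("denied")

// Operation is a hypothetical request moving through the layers.
type Operation struct {
	LicenseValid  bool   // result of the KACHING check (stubbed here)
	PeerEncrypted bool   // true if it arrived over an encrypted libp2p stream
	Payload       string // content that may contain secrets
}

// Layer is one named check in the stack.
type Layer struct {
	Name  string
	Check func(*Operation) error
}

// Validate applies layers in order and stops at the first rejection, so an
// operation is accepted only if every layer passes.
func Validate(op *Operation, layers []Layer) error {
	for _, l := range layers {
		if err := l.Check(op); err != nil {
			return fmt.Errorf("%s: %w", l.Name, err)
		}
	}
	return nil
}

// exampleLayers stands in for three of the eight layers; the checks are
// placeholders, not CHORUS's real validators.
func exampleLayers() []Layer {
	return []Layer{
		{Name: "license", Check: func(op *Operation) error {
			if !op.LicenseValid {
				return ErrDenied
			}
			return nil
		}},
		{Name: "p2p-encryption", Check: func(op *Operation) error {
			if !op.PeerEncrypted {
				return ErrDenied
			}
			return nil
		}},
		{Name: "secrets-redaction", Check: func(op *Operation) error {
			// Redaction rewrites the payload instead of rejecting it.
			op.Payload = strings.ReplaceAll(op.Payload, "SECRET", "[REDACTED]")
			return nil
		}},
	}
}

func main() {
	op := &Operation{LicenseValid: true, PeerEncrypted: true, Payload: "api_key=SECRET"}
	fmt.Println(Validate(op, exampleLayers()), op.Payload) // <nil> api_key=[REDACTED]

	unlicensed := &Operation{PeerEncrypted: true}
	fmt.Println(Validate(unlicensed, exampleLayers())) // license: denied
}
```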

Architecture Layers

CHORUS is organized into distinct architectural layers:

Layer 1: P2P Infrastructure

Components:

  • libp2p Host (networking)
  • mDNS Discovery (local peers)
  • DHT (global peer discovery)
  • PubSub (message broadcasting)

Responsibilities:

  • Peer discovery and connection management
  • Encrypted peer-to-peer communication
  • Message routing and delivery
  • Network resilience and failover

See: P2P Infrastructure

Layer 2: Coordination & Consensus

Components:

  • Election Manager (leader election)
  • Task Coordinator (work distribution)
  • HMMM Router (meta-discussion)
  • SLURP (distributed orchestration)

Responsibilities:

  • Democratic leader election
  • Task assignment and tracking
  • Collaborative reasoning protocols
  • Work distribution algorithms

See: Coordination, SLURP

Layer 3: Execution Engine

Components:

  • Task Execution Engine
  • Docker Sandbox
  • Image Selector
  • Command Executor

Responsibilities:

  • Isolated code execution in Docker containers
  • Language-specific environment selection
  • Resource limits and monitoring
  • Result capture and validation

See: Execution Engine, Task Execution Engine Module

Layer 4: AI Integration

Components:

  • AI Provider Interface
  • Provider Implementations (Ollama, ResetData)
  • Model Selection Logic
  • Prompt Management

Responsibilities:

  • Abstract AI provider differences
  • Route requests to appropriate models
  • Manage system prompts and context
  • Handle AI provider failover

See: AI Providers, Providers
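One way to express the provider abstraction and failover in Go is an interface plus a wrapper that tries backends in order. The `Provider` interface, the `Failover` type, and the stub providers are hypothetical illustrations, not the CHORUS AI provider API. The model name is a placeholder.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Provider hides backend differences (e.g. Ollama vs ResetData).
type Provider interface {
	Name() string
	Complete(ctx context.Context, model, prompt string) (string, error)
}

// Failover tries providers in order and returns the first successful reply.
type Failover struct {
	Providers []Provider
}

func (f Failover) Name() string { return "failover" }

func (f Failover) Complete(ctx context.Context, model, prompt string) (string, error) {
	if len(f.Providers) == 0 {
		return "", errors.New("no AI providers configured")
	}
	var errs []error
	for _, p := range f.Providers {
		if err := ctx.Err(); err != nil {
			return "", err // caller cancelled or the deadline passed
		}
		out, err := p.Complete(ctx, model, prompt)
		if err == nil {
			return out, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	return "", errors.Join(errs...) // errors.Join requires Go 1.20+
}

// stub is a fake provider used only for this example.
type stub struct {
	name, reply string
	err         error
}

func (s stub) Name() string { return s.name }

func (s stub) Complete(context.Context, string, string) (string, error) {
	return s.reply, s.err
}

// demo routes one request through a chain whose first provider is down.
func demo() (string, error) {
	chain := Failover{Providers: []Provider{
		stub{name: "ollama", err: errors.New("connection refused")},
		stub{name: "resetdata", reply: "ok"},
	}}
	return chain.Complete(context.Background(), "example-model", "hello")
}

func main() {
	out, err := demo()
	fmt.Println(out, err) // ok <nil>
}
```

Because `Failover` itself satisfies `Provider`, callers do not need to know whether they hold a single backend or a chain.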

Layer 5: Storage & State

Components:

  • DHT Storage (distributed)
  • Encrypted Storage (age encryption)
  • UCXL Decision Publisher
  • Hypercore Log (append-only)

Responsibilities:

  • Distributed data storage
  • Encryption and key management
  • Immutable decision recording
  • Event log persistence

See: DHT, UCXL
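The append-only property can be illustrated with a generic hash chain. Each entry commits to the previous entry's hash, so any edit to history becomes detectable. This is a hypothetical sketch of the idea only; Hypercore's real on-disk format and verification scheme differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Entry is one record in a generic hash-chained log.
type Entry struct {
	Seq      int
	Data     string
	PrevHash string
	Hash     string
}

// Log exposes Append and Verify but deliberately no update or delete.
type Log struct {
	entries []Entry
}

// entryHash binds the sequence number, predecessor hash, and data together.
func entryHash(seq int, prev, data string) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%d|%s|%s", seq, prev, data)))
	return hex.EncodeToString(sum[:])
}

func (l *Log) Append(data string) Entry {
	seq, prev := len(l.entries), ""
	if seq > 0 {
		prev = l.entries[seq-1].Hash
	}
	e := Entry{Seq: seq, Data: data, PrevHash: prev, Hash: entryHash(seq, prev, data)}
	l.entries = append(l.entries, e)
	return e
}

// Verify recomputes every link and reports the first broken entry.
func (l *Log) Verify() error {
	prev := ""
	for i, e := range l.entries {
		if e.Seq != i || e.PrevHash != prev || e.Hash != entryHash(e.Seq, e.PrevHash, e.Data) {
			return fmt.Errorf("entry %d fails verification", i)
		}
		prev = e.Hash
	}
	return nil
}

func main() {
	var l Log
	l.Append("decision published: task-1")
	l.Append("task-1 completed")
	fmt.Println("valid:", l.Verify() == nil) // valid: true

	l.entries[0].Data = "tampered"
	fmt.Println("valid after tampering:", l.Verify() == nil) // valid after tampering: false
}
```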

Layer 6: Security & Validation

Components:

  • License Validator (KACHING)
  • SHHH Sentinel (secrets detection)
  • Crypto Layer (encryption)
  • Security Policies

Responsibilities:

  • License enforcement
  • Secrets scanning and redaction
  • Cryptographic operations
  • Security policy enforcement

See: Crypto, SHHH, Licensing
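The following minimal sketch shows scan-and-redact using a small regex rule set. The rule names and patterns are hypothetical stand-ins, not SHHH's actual detectors. The key-value rule keeps the key name so that redacted logs stay readable.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a detector with its replacement text.
type rule struct {
	name string
	re   *regexp.Regexp
	repl string
}

var rules = []rule{
	// Shape of an AWS access key ID.
	{"aws-access-key-id", regexp.MustCompile(`AKIA[0-9A-Z]{16}`), "[REDACTED]"},
	// Shape of a classic GitHub personal access token.
	{"github-token", regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`), "[REDACTED]"},
	// key=value or key: value credentials; the key and separator are kept.
	{"key-value", regexp.MustCompile(`(?i)\b(password|secret|token)(\s*[:=]\s*)\S+`), "${1}${2}[REDACTED]"},
}

// Redact scans s, replaces every match, and returns the names of rules that fired.
func Redact(s string) (string, []string) {
	var hits []string
	for _, r := range rules {
		if r.re.MatchString(s) {
			hits = append(hits, r.name)
			s = r.re.ReplaceAllString(s, r.repl)
		}
	}
	return s, hits
}

func main() {
	out, hits := Redact("deploy with password=hunter2 and key AKIAABCDEFGHIJKLMNOP")
	fmt.Println(out)  // deploy with password=[REDACTED] and key [REDACTED]
	fmt.Println(hits) // [aws-access-key-id key-value]
}
```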

Layer 7: Observability

Components:

  • Metrics Collector (CHORUS Metrics)
  • Health Checks (liveness, readiness)
  • BACKBEAT Integration (P2P telemetry)
  • Hypercore Log (coordination events)

Responsibilities:

  • System metrics collection
  • Health monitoring
  • P2P operation tracking
  • Event logging and audit trails

See: Metrics, Health
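Liveness and readiness answer different questions. Liveness says the process can respond; readiness says it can do useful work, for example after joining the P2P mesh. The sketch below uses only `net/http` and `net/http/httptest`. The `/healthz` and `/readyz` paths and the `Health` type are assumptions, not the documented CHORUS endpoints.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Health separates liveness from readiness (hypothetical type).
type Health struct {
	ready atomic.Bool // requires Go 1.19+
}

// SetReady flips readiness, e.g. once the node has joined the P2P mesh.
func (h *Health) SetReady(v bool) { h.ready.Store(v) }

// Routes registers a liveness probe that always answers 200 and a readiness
// probe that answers 503 until SetReady(true) is called.
func (h *Health) Routes() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "alive")
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if !h.ready.Load() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ready")
	})
	return mux
}

// statusOf sends an in-memory GET request and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	var h Health
	mux := h.Routes()
	fmt.Println(statusOf(mux, "/healthz"), statusOf(mux, "/readyz")) // 200 503
	h.SetReady(true)
	fmt.Println(statusOf(mux, "/readyz")) // 200
}
```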

Layer 8: External Interfaces

Components:

  • HTTP API Server
  • UCXI Server (content resolution)
  • HAP Terminal Interface
  • HAP Web Interface [STUB]

Responsibilities:

  • REST API endpoints
  • UCXL content resolution
  • Human interaction interfaces
  • External system integration

See: API, UCXI, HAP UI


Key Components

Runtime Architecture

┌──────────────────────────────────────────────────────────────┐
│ main.go (cmd/agent or cmd/hap)                               │
│   │                                                          │
│   └─→ internal/runtime.Initialize()                          │
│          │                                                   │
│          ├─→ Config Loading (environment)                    │
│          ├─→ License Validation (KACHING)                    │
│          ├─→ AI Provider Setup (Ollama/ResetData)            │
│          ├─→ P2P Node Creation (libp2p)                      │
│          ├─→ PubSub Initialization                           │
│          ├─→ DHT Setup (optional)                            │
│          ├─→ Election Manager                                │
│          ├─→ Task Coordinator                                │
│          ├─→ HTTP API Server                                 │
│          ├─→ UCXI Server (optional)                          │
│          └─→ Health & Metrics                                │
│                                                              │
│   SharedRuntime                                              │
│   ├── Context & Cancellation                                 │
│   ├── Logger (SimpleLogger)                                  │
│   ├── Config (*config.Config)                                │
│   ├── RuntimeConfig (dynamic assignments)                    │
│   ├── P2P Node (*p2p.Node)                                   │
│   ├── PubSub (*pubsub.PubSub)                                │
│   ├── DHT (*dht.LibP2PDHT)                                   │
│   ├── Encrypted Storage (*dht.EncryptedDHTStorage)           │
│   ├── Election Manager (*election.ElectionManager)           │
│   ├── Task Coordinator (*coordinator.TaskCoordinator)        │
│   ├── HTTP Server (*api.HTTPServer)                          │
│   ├── UCXI Server (*ucxi.Server)                             │
│   ├── Health Manager (*health.Manager)                       │
│   ├── Metrics (*metrics.CHORUSMetrics)                       │
│   ├── SHHH Sentinel (*shhh.Sentinel)                         │
│   ├── BACKBEAT Integration (*backbeat.Integration)           │
│   └── Decision Publisher (*ucxl.DecisionPublisher)           │
└──────────────────────────────────────────────────────────────┘
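The ordering in `Initialize()` matters because later components depend on earlier ones, and optional components (DHT, UCXI) may be disabled. The hypothetical Go sketch below shows staged startup with reverse-order rollback. `Step`, `Initialize`, and `demo` are illustrative names, not the `internal/runtime` API.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is one stage of the startup sequence. Start/Stop are placeholders.
type Step struct {
	Name     string
	Optional bool // e.g. DHT Setup or UCXI Server
	Enabled  bool // consulted only for optional steps
	Start    func() error
	Stop     func()
}

// Initialize starts steps in order, skipping disabled optional ones. If a
// step fails, already-started steps are stopped in reverse order. On success
// it returns a shutdown function that also stops them in reverse order.
func Initialize(steps []Step) (func(), error) {
	var started []Step
	stopAll := func() {
		for i := len(started) - 1; i >= 0; i-- {
			if started[i].Stop != nil {
				started[i].Stop()
			}
		}
	}
	for _, s := range steps {
		if s.Optional && !s.Enabled {
			continue
		}
		if err := s.Start(); err != nil {
			stopAll()
			return nil, fmt.Errorf("init %s: %w", s.Name, err)
		}
		started = append(started, s)
	}
	return stopAll, nil
}

// demo wires placeholder steps and records start/stop order. failAt names a
// step to force a failure; pass "" for a clean start followed by shutdown.
func demo(failAt string) ([]string, error) {
	var trace []string
	step := func(name string, optional, enabled bool) Step {
		return Step{
			Name:     name,
			Optional: optional,
			Enabled:  enabled,
			Start: func() error {
				if name == failAt {
					return errors.New("unavailable")
				}
				trace = append(trace, "start "+name)
				return nil
			},
			Stop: func() { trace = append(trace, "stop "+name) },
		}
	}
	shutdown, err := Initialize([]Step{
		step("config", false, true),
		step("license", false, true),
		step("p2p", false, true),
		step("dht", true, false), // optional and disabled, so skipped
		step("election", false, true),
	})
	if err != nil {
		return trace, err
	}
	shutdown()
	return trace, nil
}

func main() {
	trace, err := demo("")
	fmt.Println(trace, err)
	trace, err = demo("p2p")
	fmt.Println(trace, err)
}
```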

Binary Separation

CHORUS provides three binaries with shared infrastructure:

Binary        Purpose                Mode        Status
chorus-agent  Autonomous AI agent    Agent Mode  Production
chorus-hap    Human Agent Portal     HAP Mode    🔶 Beta
chorus        Compatibility wrapper  N/A         🔴 Deprecated

All binaries share:

  • P2P infrastructure (libp2p, PubSub, DHT)
  • Election and coordination systems
  • Security and encryption layers
  • Configuration and licensing

Differences:

  • Agent: Automatic task execution, autonomous reasoning
  • HAP: Terminal/web UI for human interaction, manual task approval

See: Commands


Data Flow

Task Execution Flow

1. Task Request Arrives
   │
   ├─→ Via PubSub (from another agent)
   ├─→ Via HTTP API (from external system)
   └─→ Via HAP (from human operator)
   │
   ↓
2. Task Coordinator Receives Task
   │
   ├─→ Check agent availability
   ├─→ Validate task structure
   └─→ Assign to execution engine
   │
   ↓
3. Execution Engine Processes
   │
   ├─→ Detect language (Go, Rust, Python, etc.)
   ├─→ Select Docker image
   ├─→ Create sandbox configuration
   ├─→ Start container
   │   │
   │   ├─→ Mount /workspace/input (read-only source)
   │   ├─→ Mount /workspace/data (working directory)
   │   └─→ Mount /workspace/output (deliverables)
   │
   ├─→ Execute commands via Docker Exec API
   ├─→ Stream stdout/stderr
   ├─→ Monitor resource usage
   └─→ Capture exit codes
   │
   ↓
4. Result Processing
   │
   ├─→ Collect artifacts from /workspace/output
   ├─→ Generate task summary
   ├─→ Create UCXL decision record
   └─→ Publish to DHT (encrypted)
   │
   ↓
5. Result Distribution
   │
   ├─→ Broadcast completion via PubSub
   ├─→ Update task tracker (availability)
   ├─→ Notify requester (if HTTP API)
   └─→ Log to Hypercore (audit trail)
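The language detection, image selection, and mount layout in step 3 can be sketched as follows. The marker files, image tags, fallback image, and `taskDir` argument are all hypothetical. Only the three `/workspace` targets and the read-only input mount come from the flow above.

```go
package main

import (
	"fmt"
	"path"
)

// markers maps a file at the repository root to a language. Order decides
// ties in multi-language repositories.
var markers = []struct{ file, lang string }{
	{"go.mod", "go"},
	{"Cargo.toml", "rust"},
	{"pyproject.toml", "python"},
	{"requirements.txt", "python"},
	{"package.json", "javascript"},
}

// images maps a language to a sandbox image (example tags only).
var images = map[string]string{
	"go":         "golang:1.22",
	"rust":       "rust:1.79",
	"python":     "python:3.12-slim",
	"javascript": "node:20",
}

// DetectLanguage returns the language of the first marker present in rootFiles.
func DetectLanguage(rootFiles []string) string {
	present := make(map[string]bool, len(rootFiles))
	for _, f := range rootFiles {
		present[f] = true
	}
	for _, m := range markers {
		if present[m.file] {
			return m.lang
		}
	}
	return "unknown"
}

// SelectImage maps a language to an image, with a generic fallback.
func SelectImage(lang string) string {
	if img, ok := images[lang]; ok {
		return img
	}
	return "debian:stable-slim"
}

// Mount describes one bind mount into the sandbox.
type Mount struct {
	Source, Target string
	ReadOnly       bool
}

// WorkspaceMounts builds the three /workspace mounts for one task.
func WorkspaceMounts(taskDir string) []Mount {
	return []Mount{
		{Source: path.Join(taskDir, "input"), Target: "/workspace/input", ReadOnly: true},
		{Source: path.Join(taskDir, "data"), Target: "/workspace/data"},
		{Source: path.Join(taskDir, "output"), Target: "/workspace/output"},
	}
}

func main() {
	lang := DetectLanguage([]string{"README.md", "go.mod", "main.go"})
	fmt.Println(lang, SelectImage(lang)) // go golang:1.22
	for _, m := range WorkspaceMounts("/tmp/chorus/task-42") {
		fmt.Printf("%s -> %s (read-only: %t)\n", m.Source, m.Target, m.ReadOnly)
	}
}
```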

Decision Publishing Flow

Agent Decision Made
   │
   ↓
Generate UCXL Context Address
   │
   ├─→ Hash decision content (SHA-256)
   ├─→ Create ucxl:// URI
   └─→ Add metadata (agent ID, timestamp)
   │
   ↓
Encrypt Decision Data
   │
   ├─→ Use age encryption
   ├─→ Derive key from shared secret
   └─→ Create encrypted blob
   │
   ↓
Store in DHT
   │
   ├─→ Key: UCXL hash
   ├─→ Value: Encrypted decision
   └─→ TTL: Configured expiration
   │
   ↓
Announce on PubSub
   │
   ├─→ Topic: "chorus/decisions"
   ├─→ Payload: UCXL address only
   └─→ Interested peers can fetch from DHT
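The sketch below covers the addressing and announcement steps. It assumes a hypothetical `ucxl://sha256/<hex>` address layout and a JSON encoding of the decision. Encryption is represented by a caller-supplied function because age is an external library. The identity function used in `main` is a placeholder and provides no confidentiality. `DefaultTTL` stands in for the configured expiration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// DefaultTTL is a placeholder for the configured expiration.
const DefaultTTL = 24 * time.Hour

// Decision is a hypothetical payload, including agent ID and timestamp metadata.
type Decision struct {
	AgentID   string    `json:"agent_id"`
	TaskID    string    `json:"task_id"`
	Summary   string    `json:"summary"`
	Timestamp time.Time `json:"timestamp"`
}

// Record is the DHT entry: the address as key, the encrypted blob as value.
type Record struct {
	Key        string
	Ciphertext []byte
	ExpiresAt  time.Time
}

// Announcement is the PubSub payload; it carries the address only.
type Announcement struct {
	Topic   string `json:"topic"`
	Address string `json:"address"`
}

// ContentAddress hashes the JSON encoding with SHA-256 and wraps the digest
// in a ucxl:// URI. encoding/json emits struct fields in declaration order,
// so equal decisions hash identically.
func ContentAddress(d Decision) (string, []byte, error) {
	body, err := json.Marshal(d)
	if err != nil {
		return "", nil, err
	}
	sum := sha256.Sum256(body)
	return "ucxl://sha256/" + hex.EncodeToString(sum[:]), body, nil
}

// Publish prepares the DHT record and the announcement. encrypt stands in
// for age encryption, which this sketch does not implement.
func Publish(d Decision, encrypt func([]byte) []byte, ttl time.Duration) (Record, Announcement, error) {
	addr, body, err := ContentAddress(d)
	if err != nil {
		return Record{}, Announcement{}, err
	}
	rec := Record{Key: addr, Ciphertext: encrypt(body), ExpiresAt: time.Now().Add(ttl)}
	return rec, Announcement{Topic: "chorus/decisions", Address: addr}, nil
}

func main() {
	d := Decision{AgentID: "alice", TaskID: "task-1", Summary: "use golang:1.22 image",
		Timestamp: time.Date(2025, 9, 30, 3, 49, 46, 0, time.UTC)}
	identity := func(b []byte) []byte { return b } // NOT encryption; placeholder only
	rec, ann, err := Publish(d, identity, DefaultTTL)
	if err != nil {
		panic(err)
	}
	fmt.Println(rec.Key == ann.Address, ann.Topic, ann.Address)
}
```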

Election Flow

Agent Startup
   │
   ↓
Join Election Topic
   │
   ├─→ Subscribe to "chorus/election/v1"
   ├─→ Announce presence
   └─→ Share capabilities
   │
   ↓
Send Heartbeats
   │
   ├─→ Every 5 seconds
   ├─→ Include: Node ID, Uptime, Load
   └─→ Track other peers' heartbeats
   │
   ↓
Monitor Admin Status
   │
   ├─→ Track last admin heartbeat
   ├─→ Timeout: 15 seconds
   └─→ If timeout → Trigger election
   │
   ↓
Election Triggered
   │
   ├─→ All agents propose themselves
   ├─→ Vote for highest uptime
   ├─→ Consensus on winner
   └─→ Winner becomes admin
   │
   ↓
Admin Elected
   │
   ├─→ Winner assumes admin role
   ├─→ Applies admin configuration
   ├─→ Enables SLURP coordination
   └─→ Continues heartbeat at higher frequency
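The timeout check and the uptime vote can be sketched as two pure functions using the 5-second heartbeat and 15-second timeout above. The tie-break on `NodeID` is an assumption added for determinism. Real consensus also requires that peers agree on the candidate set, which this sketch does not address.

```go
package main

import (
	"fmt"
	"time"
)

// Timing values taken from the election flow above.
const (
	HeartbeatInterval = 5 * time.Second
	AdminTimeout      = 15 * time.Second
)

// Candidate is a hypothetical peer view assembled from heartbeats.
type Candidate struct {
	NodeID string
	Uptime time.Duration
	Load   float64
}

// AdminTimedOut reports whether the admin heartbeat is overdue; callers pass
// time.Since(lastAdminHeartbeat).
func AdminTimedOut(sinceLastBeat time.Duration) bool {
	return sinceLastBeat > AdminTimeout
}

// ElectWinner picks the highest uptime, breaking ties by the smallest NodeID
// so that every peer derives the same winner from the same candidates.
func ElectWinner(cands []Candidate) (Candidate, bool) {
	if len(cands) == 0 {
		return Candidate{}, false
	}
	best := cands[0]
	for _, c := range cands[1:] {
		if c.Uptime > best.Uptime || (c.Uptime == best.Uptime && c.NodeID < best.NodeID) {
			best = c
		}
	}
	return best, true
}

func main() {
	if AdminTimedOut(20 * time.Second) {
		w, _ := ElectWinner([]Candidate{
			{NodeID: "alice", Uptime: 2 * time.Hour},
			{NodeID: "carol", Uptime: 3 * time.Hour},
			{NodeID: "bob", Uptime: 3 * time.Hour},
		})
		fmt.Println("new admin:", w.NodeID) // new admin: bob
	}
}
```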

Deployment Models

Model 1: Local Development

┌─────────────────────────────────────────┐
│ Developer Laptop                        │
│                                         │
│  ┌──────────────┐  ┌──────────────┐     │
│  │ chorus-agent │  │ chorus-hap   │     │
│  │  (Alice)     │  │  (Human)     │     │
│  └──────┬───────┘  └──────┬───────┘     │
│         │                 │             │
│         └────────┬────────┘             │
│                  │                      │
│           mDNS Discovery                │
│           P2P Mesh (local)              │
│                                         │
│  Ollama: localhost:11434                │
│  Docker: /var/run/docker.sock           │
└─────────────────────────────────────────┘

Characteristics:

  • Single machine deployment
  • mDNS for peer discovery
  • Local Ollama instance
  • Shared Docker socket
  • No DHT required

Use Cases:

  • Local testing
  • Development workflows
  • Single-user tasks

Model 2: Docker Swarm Cluster

┌────────────────────────────────────────────────────────────┐
│ Docker Swarm Cluster                                       │
│                                                            │
│  Manager Node 1         Manager Node 2         Worker 1    │
│  ┌──────────────┐       ┌──────────────┐       ┌─────────┐ │
│  │ chorus-agent │←─────→│ chorus-agent │←─────→│ chorus  │ │
│  │ (Leader)     │       │ (Follower)   │       │ -agent  │ │
│  └──────────────┘       └──────────────┘       └─────────┘ │
│         ↑                      ↑                    ↑      │
│         │                      │                    │      │
│         └──────────────────────┴────────────────────┘      │
│                     Docker Swarm Overlay Network           │
│                     P2P Mesh + DHT                         │
│                                                            │
│  Shared Services:                                          │
│  - Docker Registry (private)                               │
│  - Ollama Distributed (5 nodes)                            │
│  - NFS Storage (/rust)                                     │
│  - WHOOSH (assignment server)                              │
│  - KACHING (license server)                                │
└────────────────────────────────────────────────────────────┘

Characteristics:

  • Multi-node cluster
  • DHT for global discovery
  • Bootstrap peers for network joining
  • Overlay networking
  • Shared storage via NFS
  • Centralized license validation

Use Cases:

  • Production deployments
  • Team collaboration
  • High availability
  • Scalable workloads
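Whether a node needs the DHT can be derived from its bootstrap configuration. This matches the local model (no DHT) and the Swarm model (DHT plus bootstrap peers). In this hypothetical sketch, the comma-separated input would come from a variable such as `CHORUS_BOOTSTRAP_PEERS` (name assumed). Entries are not validated as multiaddrs, the peer IDs shown are placeholders, and leaving mDNS enabled in both cases is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// DiscoveryPlan summarizes which discovery mechanisms a node enables.
type DiscoveryPlan struct {
	MDNS           bool
	DHT            bool
	BootstrapPeers []string
}

// PlanDiscovery parses a comma-separated bootstrap list. Empty entries are
// dropped; entries are not validated as multiaddrs.
func PlanDiscovery(raw string) DiscoveryPlan {
	var peers []string
	for _, p := range strings.Split(raw, ",") {
		if p = strings.TrimSpace(p); p != "" {
			peers = append(peers, p)
		}
	}
	return DiscoveryPlan{
		MDNS:           true,           // assumption: mDNS stays on in both models
		DHT:            len(peers) > 0, // global discovery needs a bootstrap peer to join
		BootstrapPeers: peers,
	}
}

func main() {
	fmt.Printf("%+v\n", PlanDiscovery(""))
	// PEER_ID_1 and PEER_ID_2 are placeholders for real libp2p peer IDs.
	fmt.Printf("%+v\n", PlanDiscovery("/dns4/chorus-1/tcp/9000/p2p/PEER_ID_1, /dns4/chorus-2/tcp/9000/p2p/PEER_ID_2"))
}
```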

Model 3: Hybrid (Agent + HAP)

┌──────────────────────────────────────────────────────────┐
│ Production Environment                                   │
│                                                          │
│  Docker Swarm                   Developer Workstation    │
│  ┌──────────────┐               ┌──────────────┐         │
│  │ chorus-agent │               │ chorus-hap   │         │
│  │ (Alice)      │←─────P2P─────→│ (Human-Bob)  │         │
│  └──────┬───────┘               └──────────────┘         │
│         │                                                │
│  ┌──────┴───────┐                                        │
│  │ chorus-agent │                                        │
│  │ (Carol)      │                                        │
│  └──────────────┘                                        │
│                                                          │
│  Autonomous agents run in swarm                          │
│  Human operator joins via HAP (local or remote)          │
│  Same P2P protocol, equal participants                   │
└──────────────────────────────────────────────────────────┘

Characteristics:

  • Autonomous agents in production
  • Human operators join as needed
  • Collaborative decision-making
  • HMMM meta-discussion
  • Humans can override or guide

Use Cases:

  • Supervised automation
  • Human-in-the-loop workflows
  • Critical decision points
  • Training and oversight

Related Documents

Getting Started

Core Systems

Security

Integration


Next Steps

For detailed information on specific components:

  1. New to CHORUS? Start with System Architecture
  2. Want to deploy? See Deployment Guide
  3. Developing features? Review Component Map
  4. Understanding execution? Read Task Execution Engine