CHORUS/docs/comprehensive/architecture/README.md

# CHORUS Architecture Overview

**System:** CHORUS - Container-First P2P Task Coordination
**Version:** 0.5.0-dev
**Architecture Type:** Distributed, Peer-to-Peer, Event-Driven

---

## Table of Contents

1. [System Overview](#system-overview)
2. [Core Principles](#core-principles)
3. [Architecture Layers](#architecture-layers)
4. [Key Components](#key-components)
5. [Data Flow](#data-flow)
6. [Deployment Models](#deployment-models)
7. [Related Documents](#related-documents)
8. [Next Steps](#next-steps)

---

## System Overview

CHORUS is a **distributed task coordination system** that lets autonomous AI agents and human operators collaborate on software development tasks over a peer-to-peer network.

### Primary Capabilities

- **Autonomous Agent Execution**: AI agents that can execute code tasks in isolated Docker sandboxes
- **Human-Agent Collaboration**: Human Agent Portal (HAP) for human participation in agent networks
- **Distributed Coordination**: P2P mesh networking with democratic leader election
- **Context Addressing**: UCXL (Universal Context Addressing) for immutable decision tracking
- **Secure Execution**: Multi-layer sandboxing with Docker containers and security policies
- **Collaborative Reasoning**: HMMM protocol for meta-discussion and consensus building
- **Encrypted Storage**: DHT-based encrypted storage for sensitive data

### System Philosophy

CHORUS follows these key principles:

1. **Container-First**: All configuration via environment variables, no file-based config
2. **P2P by Default**: No central server; agents form democratic mesh networks
3. **Zero-Trust Security**: Every operation validated, credentials never stored in containers
4. **Immutable Decisions**: All agent decisions recorded in content-addressed storage
5. **Human-in-the-Loop**: Humans as first-class peers in the agent network

---

## Core Principles

### 1. Container-Native Architecture

```
┌──────────────────────────────────────────────────────────────┐
│ CHORUS Container                                             │
│                                                              │
│  Environment Variables  →  Runtime Configuration             │
│  Volume Mounts          →  Prompts & Secrets                 │
│  Network Policies       →  Zero-Egress by Default            │
│  Signal Handling        →  Dynamic Reconfiguration (SIGHUP)  │
└──────────────────────────────────────────────────────────────┘
```

**Key Features:**
- No config files inside containers
- All settings via environment variables
- Secrets injected via secure volumes
- Dynamic assignment loading from WHOOSH (the assignment server)
- SIGHUP-triggered reconfiguration

### 2. Peer-to-Peer Mesh Network

```
        Agent-1 (Alice)
           /|\
          / | \
         /  |  \
        /   |   \
   Agent-2  |  Agent-4
    (Bob)   |   (Dave)
        \   |   /
         \  |  /
          \ | /
           \|/
        Agent-3 (Carol)

All agents are equal peers
No central coordinator
Democratic leader election
mDNS local discovery
DHT global discovery
```

### 3. Multi-Layer Security

```
Layer 1: License Validation (KACHING)
    ↓
Layer 2: P2P Encryption (libp2p TLS)
    ↓
Layer 3: DHT Encryption (age encryption)
    ↓
Layer 4: Docker Sandboxing (namespaces, cgroups)
    ↓
Layer 5: Network Isolation (zero-egress)
    ↓
Layer 6: SHHH Secrets Detection (scan & redact)
    ↓
Layer 7: UCXL Validation (immutable audit trail)
    ↓
Layer 8: Credential Mediation (the agent, not the sandbox container, performs credentialed uploads)
```

---

## Architecture Layers

CHORUS is organized into eight architectural layers (numbered independently of the security layers above):

### Layer 1: P2P Infrastructure

**Components:**
- libp2p Host (networking)
- mDNS Discovery (local peers)
- DHT (global peer discovery)
- PubSub (message broadcasting)

**Responsibilities:**
- Peer discovery and connection management
- Encrypted peer-to-peer communication
- Message routing and delivery
- Network resilience and failover

**See:** [P2P Infrastructure](../internal/p2p.md)

### Layer 2: Coordination & Consensus

**Components:**
- Election Manager (leader election)
- Task Coordinator (work distribution)
- HMMM Router (meta-discussion)
- SLURP (distributed orchestration)

**Responsibilities:**
- Democratic leader election
- Task assignment and tracking
- Collaborative reasoning protocols
- Work distribution algorithms

**See:** [Coordination](../packages/coordination.md), [SLURP](../packages/slurp/README.md)

### Layer 3: Execution Engine

**Components:**
- Task Execution Engine
- Docker Sandbox
- Image Selector
- Command Executor

**Responsibilities:**
- Isolated code execution in Docker containers
- Language-specific environment selection
- Resource limits and monitoring
- Result capture and validation

**See:** [Execution Engine](../packages/execution.md), [Task Execution Engine Module](../../Modules/TaskExecutionEngine.md)

### Layer 4: AI Integration

**Components:**
- AI Provider Interface
- Provider Implementations (Ollama, ResetData)
- Model Selection Logic
- Prompt Management

**Responsibilities:**
- Abstract AI provider differences
- Route requests to appropriate models
- Manage system prompts and context
- Handle AI provider failover

**See:** [AI Providers](../packages/ai.md), [Providers](../packages/providers.md)

### Layer 5: Storage & State

**Components:**
- DHT Storage (distributed)
- Encrypted Storage (age encryption)
- UCXL Decision Publisher
- Hypercore Log (append-only)

**Responsibilities:**
- Distributed data storage
- Encryption and key management
- Immutable decision recording
- Event log persistence

**See:** [DHT](../packages/dht.md), [UCXL](../packages/ucxl.md)

### Layer 6: Security & Validation

**Components:**
- License Validator (KACHING)
- SHHH Sentinel (secrets detection)
- Crypto Layer (encryption)
- Security Policies

**Responsibilities:**
- License enforcement
- Secrets scanning and redaction
- Cryptographic operations
- Security policy enforcement

**See:** [Crypto](../packages/crypto.md), [SHHH](../packages/shhh.md), [Licensing](../internal/licensing.md)
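To illustrate scan-and-redact, here is a minimal regex-based sketch. The three patterns (an AWS access key ID shape, a GitHub classic token shape, and generic `password`/`secret`/`token` assignments) are examples chosen for this sketch, not SHHH's actual rule set, and `redact` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// secretPatterns is a small illustrative rule set, not SHHH's actual rules.
var secretPatterns = []*regexp.Regexp{
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),                          // AWS access key ID shape
	regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`),                       // GitHub classic personal access token shape
	regexp.MustCompile(`(?i)(password|secret|token)\s*[:=]\s*\S+`), // key: value / key=value pairs
}

// redact replaces each match with a placeholder and returns the number of replacements.
func redact(s string) (string, int) {
	n := 0
	for _, re := range secretPatterns {
		s = re.ReplaceAllStringFunc(s, func(string) string {
			n++
			return "[REDACTED]"
		})
	}
	return s, n
}

func main() {
	logLine := "deploy ok, DB_PASSWORD=hunter2 key=AKIAABCDEFGHIJKLMNOP"
	clean, n := redact(logLine)
	fmt.Printf("%d redaction(s): %s\n", n, clean)
}
```

Patterns run in sequence, so text already replaced by an earlier rule can be matched again by a later, broader one (for example `token: AKIA…`) and counted twice; a production scanner would merge overlapping matches.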

### Layer 7: Observability

**Components:**
- Metrics Collector (CHORUS Metrics)
- Health Checks (liveness, readiness)
- BACKBEAT Integration (P2P telemetry)
- Hypercore Log (coordination events)

**Responsibilities:**
- System metrics collection
- Health monitoring
- P2P operation tracking
- Event logging and audit trails

**See:** [Metrics](../packages/metrics.md), [Health](../packages/health.md)

### Layer 8: External Interfaces

**Components:**
- HTTP API Server
- UCXI Server (content resolution)
- HAP Terminal Interface
- HAP Web Interface [STUB]

**Responsibilities:**
- REST API endpoints
- UCXL content resolution
- Human interaction interfaces
- External system integration

**See:** [API](../api/README.md), [UCXI](../packages/ucxi.md), [HAP UI](../internal/hapui.md)

---

## Key Components

### Runtime Architecture

```
┌──────────────────────────────────────────────────────────────┐
│ main.go (cmd/agent or cmd/hap)                               │
│   │                                                          │
│   └─→ internal/runtime.Initialize()                          │
│          │                                                   │
│          ├─→ Config Loading (environment)                    │
│          ├─→ License Validation (KACHING)                    │
│          ├─→ AI Provider Setup (Ollama/ResetData)            │
│          ├─→ P2P Node Creation (libp2p)                      │
│          ├─→ PubSub Initialization                           │
│          ├─→ DHT Setup (optional)                            │
│          ├─→ Election Manager                                │
│          ├─→ Task Coordinator                                │
│          ├─→ HTTP API Server                                 │
│          ├─→ UCXI Server (optional)                          │
│          └─→ Health & Metrics                                │
│                                                              │
│   SharedRuntime                                              │
│   ├── Context & Cancellation                                 │
│   ├── Logger (SimpleLogger)                                  │
│   ├── Config (*config.Config)                                │
│   ├── RuntimeConfig (dynamic assignments)                    │
│   ├── P2P Node (*p2p.Node)                                   │
│   ├── PubSub (*pubsub.PubSub)                                │
│   ├── DHT (*dht.LibP2PDHT)                                   │
│   ├── Encrypted Storage (*dht.EncryptedDHTStorage)           │
│   ├── Election Manager (*election.ElectionManager)           │
│   ├── Task Coordinator (*coordinator.TaskCoordinator)        │
│   ├── HTTP Server (*api.HTTPServer)                          │
│   ├── UCXI Server (*ucxi.Server)                             │
│   ├── Health Manager (*health.Manager)                       │
│   ├── Metrics (*metrics.CHORUSMetrics)                       │
│   ├── SHHH Sentinel (*shhh.Sentinel)                         │
│   ├── BACKBEAT Integration (*backbeat.Integration)           │
│   └── Decision Publisher (*ucxl.DecisionPublisher)           │
└──────────────────────────────────────────────────────────────┘
```

### Binary Separation

CHORUS provides three binaries with shared infrastructure:

| Binary | Purpose | Mode | Status |
|--------|---------|------|--------|
| **chorus-agent** | Autonomous AI agent | Agent Mode | ✅ Production |
| **chorus-hap** | Human Agent Portal | HAP Mode | 🔶 Beta |
| **chorus** | Compatibility wrapper | N/A | 🔴 Deprecated |

All binaries share:
- P2P infrastructure (libp2p, PubSub, DHT)
- Election and coordination systems
- Security and encryption layers
- Configuration and licensing

Differences:
- **Agent**: Automatic task execution, autonomous reasoning
- **HAP**: Terminal UI (the web UI is currently a stub) for human interaction, manual task approval

**See:** [Commands](../commands/README.md)

---

## Data Flow

### Task Execution Flow

```
1. Task Request Arrives
   │
   ├─→ Via PubSub (from another agent)
   ├─→ Via HTTP API (from external system)
   └─→ Via HAP (from human operator)
   │
   ↓
2. Task Coordinator Receives Task
   │
   ├─→ Check agent availability
   ├─→ Validate task structure
   └─→ Assign to execution engine
   │
   ↓
3. Execution Engine Processes
   │
   ├─→ Detect language (Go, Rust, Python, etc.)
   ├─→ Select Docker image
   ├─→ Create sandbox configuration
   ├─→ Start container
   │   │
   │   ├─→ Mount /workspace/input (read-only source)
   │   ├─→ Mount /workspace/data (working directory)
   │   └─→ Mount /workspace/output (deliverables)
   │
   ├─→ Execute commands via Docker Exec API
   ├─→ Stream stdout/stderr
   ├─→ Monitor resource usage
   └─→ Capture exit codes
   │
   ↓
4. Result Processing
   │
   ├─→ Collect artifacts from /workspace/output
   ├─→ Generate task summary
   ├─→ Create UCXL decision record
   └─→ Publish to DHT (encrypted)
   │
   ↓
5. Result Distribution
   │
   ├─→ Broadcast completion via PubSub
   ├─→ Update task tracker (availability)
   ├─→ Notify requester (if HTTP API)
   └─→ Log to Hypercore (audit trail)
```
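Step 3 (language detection, image selection, and the three workspace mounts) might look roughly like this. The extension-to-image table, the fallback image, and the host directory layout are placeholders, not the Image Selector's real configuration; `Mount.String` renders Docker's `-v host:container[:ro]` syntax.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// imageByExt maps source extensions to sandbox images. The image tags are
// placeholders; the real Image Selector's mapping is not documented here.
var imageByExt = map[string]string{
	".go": "golang:1.22",
	".rs": "rust:1.78",
	".py": "python:3.12",
	".js": "node:20",
	".ts": "node:20",
}

const fallbackImage = "ubuntu:22.04"

// detectImage returns the image matching the most frequent recognised extension.
// On a tie, the image that reached the top count first (in file order) wins.
func detectImage(files []string) string {
	counts := map[string]int{}
	best, bestN := fallbackImage, 0
	for _, f := range files {
		img, ok := imageByExt[strings.ToLower(filepath.Ext(f))]
		if !ok {
			continue
		}
		counts[img]++
		if counts[img] > bestN {
			best, bestN = img, counts[img]
		}
	}
	return best
}

// Mount is a bind mount from the host into the sandbox.
type Mount struct {
	Host, Container string
	ReadOnly        bool
}

// String renders the mount in docker's -v host:container[:ro] syntax.
func (m Mount) String() string {
	s := m.Host + ":" + m.Container
	if m.ReadOnly {
		s += ":ro"
	}
	return s
}

// workspaceMounts returns the three mounts from step 3 of the flow above.
// The host-side directory layout under taskDir is an assumption.
func workspaceMounts(taskDir string) []Mount {
	return []Mount{
		{Host: filepath.Join(taskDir, "input"), Container: "/workspace/input", ReadOnly: true},
		{Host: filepath.Join(taskDir, "data"), Container: "/workspace/data"},
		{Host: filepath.Join(taskDir, "output"), Container: "/workspace/output"},
	}
}

func main() {
	files := []string{"main.go", "handler.go", "scripts/gen.py"}
	fmt.Println("image:", detectImage(files))
	for _, m := range workspaceMounts("/tmp/chorus/task-42") {
		fmt.Println("-v", m)
	}
}
```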

### Decision Publishing Flow

```
Agent Decision Made
   │
   ↓
Generate UCXL Context Address
   │
   ├─→ Hash decision content (SHA-256)
   ├─→ Create ucxl:// URI
   └─→ Add metadata (agent ID, timestamp)
   │
   ↓
Encrypt Decision Data
   │
   ├─→ Use age encryption
   ├─→ Derive key from shared secret
   └─→ Create encrypted blob
   │
   ↓
Store in DHT
   │
   ├─→ Key: UCXL hash
   ├─→ Value: Encrypted decision
   └─→ TTL: Configured expiration
   │
   ↓
Announce on PubSub
   │
   ├─→ Topic: "chorus/decisions"
   ├─→ Payload: UCXL address only
   └─→ Interested peers can fetch from DHT
```

### Election Flow

```
Agent Startup
   │
   ↓
Join Election Topic
   │
   ├─→ Subscribe to "chorus/election/v1"
   ├─→ Announce presence
   └─→ Share capabilities
   │
   ↓
Send Heartbeats
   │
   ├─→ Every 5 seconds
   ├─→ Include: Node ID, Uptime, Load
   └─→ Track other peers' heartbeats
   │
   ↓
Monitor Admin Status
   │
   ├─→ Track last admin heartbeat
   ├─→ Timeout: 15 seconds
   └─→ If timeout → Trigger election
   │
   ↓
Election Triggered
   │
   ├─→ All agents propose themselves
   ├─→ Vote for highest uptime
   ├─→ Consensus on winner
   └─→ Winner becomes admin
   │
   ↓
Admin Elected
   │
   ├─→ Winner assumes admin role
   ├─→ Applies admin configuration
   ├─→ Enables SLURP coordination
   └─→ Continues heartbeat at higher frequency
```

---

## Deployment Models

### Model 1: Local Development

```
┌─────────────────────────────────────────┐
│ Developer Laptop                        │
│                                         │
│  ┌──────────────┐  ┌──────────────┐     │
│  │ chorus-agent │  │ chorus-hap   │     │
│  │  (Alice)     │  │  (Human)     │     │
│  └──────┬───────┘  └──────┬───────┘     │
│         │                 │             │
│         └────────┬────────┘             │
│                  │                      │
│           mDNS Discovery                │
│           P2P Mesh (local)              │
│                                         │
│  Ollama: localhost:11434                │
│  Docker: /var/run/docker.sock           │
└─────────────────────────────────────────┘
```

**Characteristics:**
- Single machine deployment
- mDNS for peer discovery
- Local Ollama instance
- Shared Docker socket
- No DHT required

**Use Cases:**
- Local testing
- Development workflows
- Single-user tasks

### Model 2: Docker Swarm Cluster

```
┌─────────────────────────────────────────────────────────────┐
│ Docker Swarm Cluster                                        │
│                                                             │
│  Manager Node 1          Manager Node 2          Worker 1   │
│  ┌──────────────┐       ┌──────────────┐       ┌─────────┐  │
│  │ chorus-agent │←─────→│ chorus-agent │←─────→│ chorus  │  │
│  │ (Leader)     │       │ (Follower)   │       │ -agent  │  │
│  └──────────────┘       └──────────────┘       └─────────┘  │
│         ↑                       ↑                     ↑     │
│         │                       │                     │     │
│         └───────────────────────┴─────────────────────┘     │
│                     Docker Swarm Overlay Network            │
│                     P2P Mesh + DHT                          │
│                                                             │
│  Shared Services:                                           │
│  - Docker Registry (private)                                │
│  - Ollama Distributed (5 nodes)                             │
│  - NFS Storage (/rust)                                      │
│  - WHOOSH (assignment server)                               │
│  - KACHING (license server)                                 │
└─────────────────────────────────────────────────────────────┘
```

**Characteristics:**
- Multi-node cluster
- DHT for global discovery
- Bootstrap peers for network joining
- Overlay networking
- Shared storage via NFS
- Centralized license validation

**Use Cases:**
- Production deployments
- Team collaboration
- High availability
- Scalable workloads

### Model 3: Hybrid (Agent + HAP)

```
┌──────────────────────────────────────────────────────────┐
│ Production Environment                                   │
│                                                          │
│  Docker Swarm                    Developer Workstation   │
│  ┌──────────────┐               ┌──────────────┐         │
│  │ chorus-agent │               │ chorus-hap   │         │
│  │ (Alice)      │←─────P2P─────→│ (Human-Bob)  │         │
│  └──────┬───────┘               └──────────────┘         │
│         │                                                │
│  ┌──────┴───────┐                                        │
│  │ chorus-agent │                                        │
│  │ (Carol)      │                                        │
│  └──────────────┘                                        │
│                                                          │
│  Autonomous agents run in swarm                          │
│  Human operator joins via HAP (local or remote)          │
│  Same P2P protocol, equal participants                   │
└──────────────────────────────────────────────────────────┘
```

**Characteristics:**
- Autonomous agents in production
- Human operators join as needed
- Collaborative decision-making
- HMMM meta-discussion
- Humans can override or guide

**Use Cases:**
- Supervised automation
- Human-in-the-loop workflows
- Critical decision points
- Training and oversight

---

## Related Documents

### Getting Started
- [Commands Overview](../commands/README.md) - Entry points and CLI tools
- [Deployment Guide](../deployment/README.md) - How to deploy CHORUS
- [Configuration](../deployment/configuration.md) - Environment variables and settings

### Core Systems
- [Task Execution Engine](../../Modules/TaskExecutionEngine.md) - Complete execution engine documentation
- [P2P Infrastructure](../internal/p2p.md) - libp2p networking details
- [SLURP System](../packages/slurp/README.md) - Distributed coordination

### Security
- [Security Architecture](security.md) - Security layers and threat model
- [Crypto Package](../packages/crypto.md) - Encryption and key management
- [SHHH](../packages/shhh.md) - Secrets detection and redaction
- [Licensing](../internal/licensing.md) - License validation

### Integration
- [API Reference](../api/reference.md) - HTTP API endpoints
- [UCXL System](../packages/ucxl.md) - Context addressing
- [AI Providers](../packages/ai.md) - AI integration

---

## Next Steps

For detailed information on specific components:
1. **New to CHORUS?** Start with [System Architecture](system-architecture.md)
2. **Want to deploy?** See [Deployment Guide](../deployment/README.md)
3. **Developing features?** Review [Component Map](component-map.md)
4. **Understanding execution?** Read [Task Execution Engine](../../Modules/TaskExecutionEngine.md)