What is CHORUS?
CHORUS is a comprehensive, distributed platform primarily designed for AI coordination and communication within its ecosystem. It functions as a semantic context publishing system that enables autonomous AI agents to securely share decisions, coordinate activities, and maintain consistent contexts through role-based encryption and consensus mechanisms. The platform supports real-time, selective sharing of both live and historical contextual data, facilitating operations such as task management, artifact publishing, and provenance tracking.
Originally established as a task coordination system using the bzzz:// protocol, CHORUS is evolving into a semantic context publishing platform that enhances AI collaboration by transforming task coordination into a distributed decision graph based on the UCXL protocol. It also addresses operational aspects like data storage, configuration management, and security, including license management and threat protection. Overall, CHORUS aims to enable decentralized AI workflows, autonomous agent collaboration, and robust information sharing across distributed systems.
Context OS responsibilities: CHORUS orchestrates selective, time-aware context exchange between peers (no repo mirroring), carrying machine-readable provenance and confidence. In effect, it acts as the runtime for UCXL-addressed knowledge and policy-aware routing.
References:
- [KG] Entity 1 (CHORUS) describes its role as a distributed AI coordination and semantic context publishing platform.
- [DC] "Phase 1 Integration Test Framework" document emphasizes its purpose in enabling secure, real-time sharing of decisions and contextual data.
- [KG] Its support for autonomous agents and role-based encryption highlights its focus on secure, decentralized AI collaboration.
- [KG] The transformation from a task coordination system to a semantic platform indicates its goal of enhancing operational synchronization.
- [DC] User Manual details the system's functions in managing AI decision-sharing and collaboration.
Current Implementation Snapshot (2025-10)
- WHOOSH-assignment runtime – `internal/runtime/shared.go` now bootstraps a `pkg/config.RuntimeConfig` that merges base env vars with dynamic assignments pulled from WHOOSH (`ASSIGN_URL`). Reloads are wired to `SIGHUP`, and WHOOSH-provided bootstrap peers or join staggers override baked-in defaults. See docs/Modules/WHOOSH.md for the rendezvous API that serves these payloads.
- License-first startup & AI provider wiring – Startup blocks on `internal/licensing` validation against KACHING before any P2P work. Once licensed, `initializeAIProvider` configures ResetData or Ollama providers, composes persona prompts, and enables LightRAG via the MCP client when `LightRAG.Enabled` is set. This keeps reasoning, prompt curation, and RAG feeds consistent across containers.
- Tempo-aware operations via BACKBEAT – The BACKBEAT integration (`internal/backbeat/integration.go`) tracks beat cadence, wraps long-running peer operations (DHT bootstrap, peer discovery, elections), and emits health/status claims. Election scoring (`pkg/election/election.go`) consumes tempo and beat-gap data to throttle discovery storms and to respect stability windows.
- Instrumented transport stack – `pkg/transport/quic_transport.go` layers QUIC + optional TCP fallback with pooled streams, dial metrics, and configurable libp2p options. A transport telemetry reporter (`internal/runtime/transport_telemetry.go`) publishes snapshots to NATS (`CHORUS_TRANSPORT_METRICS_NATS_URL`) and surfaces per-transport counters through `pkg/metrics`.
- Encrypted context pipeline – When `V2.DHT.Enabled` is true, CHORUS spins up a libp2p DHT with AGE-backed storage (`pkg/dht`) and routes decision artifacts through `ucxl.DecisionPublisher`, exposing them via the optional UCXI HTTP server. Council opportunities are bridged over QUIC streams and mirrored onto NATS to keep WHOOSH/SLURP in sync.
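The WHOOSH-assignment merge described above can be sketched in Go. This is a minimal, illustrative model only: the type and function names (`RuntimeConfig`, `Assignment`, `merge`) are assumptions, not the actual `pkg/config` API, but it shows the override semantics where WHOOSH-provided values beat baked-in defaults while base fields survive.

```go
package main

import "fmt"

// RuntimeConfig is a simplified stand-in for the runtime configuration
// assembled at bootstrap; field names here are illustrative only.
type RuntimeConfig struct {
	AgentID        string
	BootstrapPeers []string
	JoinStaggerMS  int
}

// Assignment models a dynamic payload fetched from WHOOSH (ASSIGN_URL).
// Zero-valued fields leave the base configuration untouched.
type Assignment struct {
	BootstrapPeers []string
	JoinStaggerMS  int
}

// merge overlays a WHOOSH assignment on top of the base config, so
// WHOOSH-provided bootstrap peers or join staggers override defaults.
func merge(base RuntimeConfig, a Assignment) RuntimeConfig {
	out := base
	if len(a.BootstrapPeers) > 0 {
		out.BootstrapPeers = a.BootstrapPeers
	}
	if a.JoinStaggerMS > 0 {
		out.JoinStaggerMS = a.JoinStaggerMS
	}
	return out
}

func main() {
	base := RuntimeConfig{AgentID: "agent-1", BootstrapPeers: []string{"peer-default"}}
	cfg := merge(base, Assignment{BootstrapPeers: []string{"peer-whoosh"}, JoinStaggerMS: 250})
	fmt.Println(cfg.BootstrapPeers[0], cfg.JoinStaggerMS)
}
```

On a `SIGHUP` reload, the same merge would simply be re-run against a freshly fetched assignment payload.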
Operational Notes
- Centralised QUIC option injection – Noise security, muxers, and relay support are now exclusively configured inside `transport.NewQUICTransport`, preventing the duplicate-provider panic that surfaced when both the transport and caller registered `libp2p.Security(noise)`. Custom libp2p options should be appended via runtime overrides to avoid reintroducing duplicates.
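The duplicate-provider problem can be illustrated with a small model. This is not the real go-libp2p option machinery; `option` and `buildOptions` are hypothetical names used only to show the pattern of centralizing option construction and rejecting a second registration of the same provider instead of letting it panic at dial time.

```go
package main

import "fmt"

// option models a named libp2p-style constructor option; the real
// go-libp2p types differ, this only illustrates the dedup guard.
type option struct{ name string }

// buildOptions is the single place where security, muxer, and relay
// options are registered. Caller overrides are appended afterwards,
// and any provider already registered here is rejected up front
// rather than duplicated.
func buildOptions(overrides ...option) ([]option, error) {
	opts := []option{{"security:noise"}, {"muxer:yamux"}, {"relay"}}
	seen := map[string]bool{}
	for _, o := range opts {
		seen[o.name] = true
	}
	for _, o := range overrides {
		if seen[o.name] {
			return nil, fmt.Errorf("duplicate option provider: %s", o.name)
		}
		seen[o.name] = true
		opts = append(opts, o)
	}
	return opts, nil
}

func main() {
	// A caller trying to re-register noise gets a clear error instead
	// of the runtime panic described above.
	if _, err := buildOptions(option{"security:noise"}); err != nil {
		fmt.Println("rejected:", err)
	}
}
```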
How does CHORUS establish a peer-to-peer network?
CHORUS nodes connect to one another to form a peer-to-peer (P2P) network through the platform's native libp2p-based infrastructure, which manages peer discovery, connectivity, and message routing. The system employs protocols such as pubsub for messaging and mDNS or DHT for peer discovery, enabling nodes to automatically recognize and establish connections with each other without manual configuration.
Specifically, the architecture supports direct peer connections where nodes connect directly with each other, forming a full mesh or ring topology depending on deployment. The network topology is managed via mechanisms like bootstrap peers, which are static nodes used to initialize and support connections within the network. Nodes exchange capability broadcasts periodically to maintain an updated view of the network state.
Security in the connection process is enforced through cryptographic verification, such as signature validation, to ensure trustworthiness among peers. Peer discovery protocols like mDNS (multicast DNS) facilitate neighborhood detection within local networks, while content and capability announcements are exchanged securely. The infrastructure also includes resilience features such as circuit breakers, rate limiting, and network health monitoring.
Overall, CHORUS forms its P2P network by leveraging libp2p’s suite of protocols for secure, decentralized peer discovery and connection management, which allows nodes within the network to connect directly and participate effectively in distributed AI coordination activities.
Institutional stance: libp2p + scoped sync means context sharing is intentional, logged, and revocable; bootstrap + capability broadcasts make the mesh resilient while preserving least-privilege flow.
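The periodic capability broadcasts mentioned above can be sketched as a simple pubsub message. The struct shape and field names here are illustrative assumptions, not the actual CHORUS wire format; the point is that peers serialize their capabilities and a timestamp so the mesh can maintain an up-to-date view of who can do what.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// CapabilityBroadcast is an illustrative shape for the periodic
// announcements peers exchange; the actual CHORUS payload may differ.
type CapabilityBroadcast struct {
	PeerID       string    `json:"peer_id"`
	Capabilities []string  `json:"capabilities"`
	SentAt       time.Time `json:"sent_at"`
}

// encode marshals a broadcast for pubsub publication.
func encode(b CapabilityBroadcast) ([]byte, error) {
	return json.Marshal(b)
}

// decode parses a received broadcast payload.
func decode(data []byte) (CapabilityBroadcast, error) {
	var b CapabilityBroadcast
	err := json.Unmarshal(data, &b)
	return b, err
}

func main() {
	msg := CapabilityBroadcast{
		PeerID:       "12D3KooW-example",
		Capabilities: []string{"ollama", "dht-storage"},
		SentAt:       time.Now().UTC(),
	}
	raw, _ := encode(msg)
	back, _ := decode(raw)
	fmt.Println(back.PeerID, back.Capabilities)
}
```

In practice each node would publish such a message on a fixed interval and expire entries for peers that stop announcing.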
References:
- [KG] "CHORUS P2P Network": describes the decentralized communication layer involving peer discovery, pubsub messaging, and mutual peer management.
- [DC] "README.md": details on network discovery, peer connections, and mesh network topology.
- [KG] "libp2p": the core protocol facilitating mesh networking, peer discovery via mDNS, and secure connections.
- [KG] "CHORUS P2P Mesh": organization that supports distributed peer connections for chat and AI collaboration.
- [DC] "CHORUS-ARCHITECTURE.md": explains the architecture, including bootstrap peers and full mesh topology.
1.1.2.1 Key Functions
- Decentralized task coordination and management
- Secure, role-based sharing of contextual data
- Autonomous AI agent collaboration
- Content publishing of decisions and task updates
- Distributed data storage and retrieval
- Role-based encryption and security
- System configuration management
- License enforcement and resilience
- System deployment and maintenance
1.1.2.2 Modules of CHORUS
CHORUS employs a range of components and modules that form its comprehensive architecture. The core components include the Main Application (main.go), decision-related modules such as the Decision Publisher, Election Manager, and Configuration Manager, and infrastructure elements like the Crypto Module (supporting Age encryption and Shamir secret sharing), Distributed Hash Table (DHT) Storage, and the Peer-to-Peer (P2P) Network for peer discovery and pub/sub messaging.
Additionally, CHORUS features specialized modules such as the UCXL protocol for semantic address management, SLURP for context management and decision reasoning, and a set of components dedicated to content publishing, security, and operational diagnostics. It also includes components responsible for the layered architecture, such as API handling, web sockets, and management tools.
For hybrid or flexible deployment, there are mock components for extensive testing and real components intended for production use. These include mock and real implementations of the DHT backend, address resolution, peer discovery, network layer, and connectors (such as the CHORUS Connector and RUSTLE Hybrid Components). The architecture is designed to support role-based security, distributed storage, consensus elections, and semantic addressing to facilitate decentralized AI coordination.
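The mock/real split described above is naturally expressed in Go through interfaces. The sketch below is a hypothetical illustration (the names `DHTBackend`, `mockDHT`, and `publishDecision` are not the actual `pkg/dht` API): code that depends only on the interface can take an in-memory mock in tests and a libp2p-backed implementation in production.

```go
package main

import (
	"errors"
	"fmt"
)

// DHTBackend is an illustrative interface showing how mock and real
// storage backends can be swapped; the real pkg/dht surface may differ.
type DHTBackend interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
}

// mockDHT keeps values in memory for testing.
type mockDHT struct{ store map[string][]byte }

func newMockDHT() *mockDHT { return &mockDHT{store: map[string][]byte{}} }

func (m *mockDHT) Put(key string, value []byte) error {
	m.store[key] = value
	return nil
}

func (m *mockDHT) Get(key string) ([]byte, error) {
	v, ok := m.store[key]
	if !ok {
		return nil, errors.New("not found")
	}
	return v, nil
}

// publishDecision works against the interface, so production code can
// pass a libp2p-backed backend while tests pass the mock.
func publishDecision(d DHTBackend, addr string, artifact []byte) error {
	return d.Put(addr, artifact)
}

func main() {
	dht := newMockDHT()
	_ = publishDecision(dht, "ucxl://example/decision/1", []byte(`{"ok":true}`))
	v, _ := dht.Get("ucxl://example/decision/1")
	fmt.Println(string(v))
}
```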
References:
- [KG] Entity "CHORUS" description in the JSON graph.
- [DC] "CHORUS System Architecture" details in "CHORUS-2B-ARCHITECTURE.md".
- [DC] Components listed under "System Components" in "CHORUS_N8N_IMPLEMENTATION_COMPLETE.md".
- [DC] Architectural diagrams in "CHORUS Architecture Documentation".
What models does CHORUS use?
The CHORUS platform supports various models for its AI agents, depending on their roles and tasks. Specifically, the system includes features such as model integration within role configurations and capabilities for autonomous AI agents. One known provider mentioned is Ollama, which supplies models used for meta-thinking and performance metrics within the system, indicating that Ollama models are utilized to support agents in reasoning and performance evaluation.
CHORUS can execute tasks using its built-in SmolLM3 model from HuggingFace, Claude Code using Sonnet or Opus models from Anthropic, Google's gemini-cli tool, or models served by Ollama.
Additionally, CHORUS’s architecture involves the integration of models for different purposes, such as development, testing, and deployment, including models in the context of the RUSTLE component for meta-thinking and model capability announcements. The platform emphasizes a flexible, role-based model deployment framework, enabling agents to self-allocate tasks based on their available tooling and model capabilities.
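A role-to-provider mapping of the kind described above could look like the following. This is a hypothetical sketch: the role names, provider identifiers, and the `selectProvider` function are all assumptions for illustration, since real CHORUS drives this from role configuration rather than a hard-coded switch.

```go
package main

import "fmt"

// selectProvider is an illustrative, hypothetical mapping from an
// agent role to a model provider; real deployments configure this
// per role rather than hard-coding it.
func selectProvider(role string) string {
	switch role {
	case "architect":
		return "anthropic" // e.g. Claude Sonnet/Opus via Claude Code
	case "researcher":
		return "gemini-cli"
	case "worker":
		return "ollama"
	default:
		return "smollm3" // built-in HuggingFace fallback
	}
}

func main() {
	for _, role := range []string{"architect", "worker", "unknown"} {
		fmt.Println(role, "->", selectProvider(role))
	}
}
```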
References:
- [KG] Entity "Ollama" describing model providers for system agents.
- [DC] Content from "USER_MANUAL.md" and "CHORUS System Architecture" describing role-based and capability-driven model usage.
What happens if my machines crash?
If a CHORUS agent machine crashes or goes offline, the system employs fault detection and failover mechanisms to maintain operational continuity. Heartbeat signals, which are periodic messages sent by agents to indicate they are active, play a key role in fault detection. When an agent fails to send heartbeats within a configured timeout period, the system recognizes this as a failure scenario.
In response, the system can trigger an automatic election to replace or promote another agent, utilizing algorithms such as Raft to select a new leader or maintain consensus. The system also supports automatic failover, where services migrate from the failed node to the remaining healthy nodes to ensure continuous operation. This process enhances reliability by enabling the system to recover quickly from node failures. Moreover, monitoring and health checks are in place to track system status and trigger recovery events, including replacing failed agents and re-establishing communication channels.
In some cases, recovery events are tracked by the health monitoring system for performance analysis, ensuring that overall system stability and reliability are preserved even during individual machine failures.
TODO
- Integrate the Sequential Thinking MCP wrapper as a first-class AI provider so architect-level personas use the encrypted reasoning flow powered by `deploy/seqthink/mcp_server.py`.
- Deliver the brief-processing/event loop described in the task execution monitoring plan so councils automatically hand work to the execution engine.
- DHT (production): Ensure the runtime uses libp2p-kad DHT (not mocks), with 3–5x replication, provider records, and SLOs validated (success >99%, p95 GET <300ms).
- UCXL + encryption: Validate leader-only write/read paths with AGE/Shamir; document key rotation and run end-to-end tests for encrypted artifacts.
- SLURP as leader-only: Codify SLURP as privileged leader-only paths gated by elections; add leader assertions and tests.
- SHHH as leader-only: Introduce secrets sentinel hooks in publish/log ingest; implement redact/deny rules and audit trails under leader control.
- COOEE announcements: Surface capability/availability and enrolment APIs with schemas and tests; integrate with agentid flows.
- Elections/consensus: Document algorithms and add multi-node tests for failover, recovery, and consistency.