Files

anthonyrawlins 68a489b64d Initial commit: Fresh implementation of CHORUS architecture (ResetData Mandate)

2026-03-03 13:38:56 +11:00

52 KiB

Raw Permalink Blame History

Legacy module reference (WHOOSH-era architecture).
Current coordinator implementation is SWOOSH in /home/tony/chorus/SWOOSH.
This file is retained for historical context and migration lineage.

WHOOSH: Autonomous AI Development Teams Architecture

Executive Summary

WHOOSH is evolving from project kickoff council formation to self-organizing AI development teams that mirror human collaboration patterns. Currently implemented as a Council Formation Engine, WHOOSH automatically detects new project Design Briefs and assembles specialized councils of CHORUS agents to handle project kickoffs. This foundation enables future expansion to autonomous teams that collaborate through P2P channels, reach consensus on solutions, and submit high-quality deliverables to SLURP.

Current Implementation: WHOOSH monitors Gitea repositories for "Design Brief" issues labeled chorus-entrypoint, then intelligently composes kickoff councils using role definitions from human-roles.yaml. CHORUS agents are deployed via Docker Swarm to collaborate on project initialization, producing kickoff artifacts that define project direction and requirements.

Future Vision: Extend beyond project kickoffs to ongoing team management where CHORUS agents autonomously join teams based on capabilities, collaborate democratically through HMMM protocol, and deliver solutions without central orchestration points of failure.

Current Implementation Snapshot (2025-10)

Bootstrap rendezvous & topology – WHOOSH now exposes /api/v1/bootstrap-peers and /api/v1/topology (see internal/server/bootstrap.go) powered by a combined Swarm + QUIC discovery layer (internal/p2p/discovery.go, internal/p2p/quic_client.go). Returned entries include transport preference, certificate hash, and prioritised multiaddrs so CHORUS containers can form meshes without hard-coded peers.
Assignment broker for CHORUS replicas – internal/orchestrator/assignment_broker.go maintains template-driven runtime assignments (role, model, bootstrap peers, join stagger) and serves them via /api/v1/assignments. CHORUS’s runtime (pkg/config.RuntimeConfig) consumes the broker to merge WHOOSH-defined overrides into live container config.
Backbeat-integrated orchestration – A dedicated BACKBEAT client (internal/backbeat/integration.go) tracks beat cadence, reports search/analysis operations, and emits WHOOSH health claims. Tempo hints feed the scaling controller so wave launches respect cluster rhythm.
Wave-based scaling & health gates – The orchestrator stack (internal/orchestrator/*.go) coordinates Docker Swarm deployment, health gating (KACHING, BACKBEAT, self), bootstrap pool management, and metrics exports. Scaling decisions are captured through ScalingMetricsCollector and surfaced by the ScalingAPI.
Spec-Kit enterprise plugin – The spec-kit HTTP client (internal/composer/spec_kit_client.go) wraps the external Spec-Kit service with retries, circuit-breaker toggles, and structured artifact ingestion. Outputs are normalised into council artefacts so SLURP/BUBBLE can ingest enterprise deliverables alongside community workflows.

Architecture Overview

Current Implementation: Council Formation Engine

┌─────────────────────────────────────────────────────────────────┐
│                 WHOOSH COUNCIL FORMATION                        │
│          (Issue Detection + Council Composition)                │
└─────────────────────┬───────────────────────────────────────────┘
                      │ detects Design Brief issues
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                    GITEA MONITORING                             │
│         (chorus-entrypoint Labels + Webhook Triggers)           │
└─────────────────────┬───────────────────────────────────────────┘
                      │ triggers council deployment
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                 CHORUS AGENT DEPLOYMENT                         │
│            (Docker Swarm + human-roles.yaml)                   │
└─────────────────────┬───────────────────────────────────────────┘
                      │ councils collaborate via P2P
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│               COUNCIL P2P COLLABORATION                         │
│              (HMMM Protocol + UCXL Addressing)                 │
└─────────────────────┬───────────────────────────────────────────┘
                      │ produces kickoff artifacts
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                PROJECT KICKOFF DELIVERABLES                     │
│        (Manifests, DRs, Scaffold Plans, Gate Tests)            │
└─────────────────────────────────────────────────────────────────┘

Future Vision: Autonomous Team Architecture

┌─────────────────────────────────────────────────────────────────┐
│              WHOOSH TEAM COMPOSER (Phase 2)                     │
│              (LLM-Powered Team Formation)                       │
└─────────────────────┬───────────────────────────────────────────┘
                      │ extends council formation to ongoing teams
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                    GITEA TEAM MANAGEMENT                        │
│           (Team Issues + Role Assignments)                      │
└─────────────────────┬───────────────────────────────────────────┘
                      │ agents monitor & self-assign
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                 AUTONOMOUS CHORUS AGENTS                        │
│              (Self-Aware Capability Matching)                   │
└─────────────────────┬───────────────────────────────────────────┘
                      │ join team collaboration channels
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                P2P TEAM COLLABORATION                           │
│              (Dedicated Team Channels)                          │
└─────────────────────┬───────────────────────────────────────────┘
                      │ consensus-driven completion
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                   SLURP INTEGRATION                             │
│            (Artifact Submission + Context)                      │
└─────────────────────────────────────────────────────────────────┘

WHOOSH Team Composer

Purpose

Uses LLM reasoning to analyze incoming tasks and determine optimal team compositions based on:

Task complexity and scope
Required skill domains
Estimated effort and timeline
Quality requirements

Team Composition Logic

Example Task Analysis:

Task: "Implement secure user authentication system with OAuth2 integration"

LLM Analysis:
- Complexity: High
- Domains: Security, Backend API, Frontend UI, Database, Testing
- Estimated Timeline: 2-3 days
- Quality Requirements: High (security-critical)

Recommended Team Composition:
├── Security Architect (1x)
│   ├── Role: Define security requirements and review implementation
│   ├── Skills: OAuth2, JWT, encryption, security best practices
│   └── AI Model: deepseek-coder-v2 (security focus)
├── Backend Developer (1x)
│   ├── Role: Implement API endpoints and authentication logic
│   ├── Skills: REST APIs, database integration, middleware
│   └── AI Model: qwen2.5-coder:32b (backend specialization)
├── Frontend Developer (1x)
│   ├── Role: Build login UI and authentication flows
│   ├── Skills: React/Vue, state management, form validation
│   └── AI Model: starcoder2:15b (frontend focus)
├── Database Engineer (1x)
│   ├── Role: Design user tables and session management
│   ├── Skills: SQL, database design, performance optimization
│   └── AI Model: granite3-dense:8b (data modeling)
└── QA Engineer (1x)
    ├── Role: Security testing and integration validation
    ├── Skills: Testing frameworks, security scanning, automation
    └── AI Model: phi4:14b (testing focus)

Team Templates

Pre-defined team configurations for common scenarios:

Feature Development Team
- Backend Developer + Frontend Developer + QA Engineer
Bug Fix Team
- Debugger + Code Reviewer
Architecture Design Team
- System Architect + Security Architect + Performance Engineer
Documentation Team
- Technical Writer + Code Reviewer + Subject Matter Expert
Refactoring Team
- Code Analyzer + Refactoring Specialist + QA Engineer

Scaling and Resilience

WHOOSH is designed for robust, automated scaling and resilience. The system is governed by a set of Service Level Objectives (SLOs) and includes automated testing, failure drills, and operational guardrails to ensure stability under pressure.

Golden Signals & SLOs

The following SLOs are defined and monitored, with alerts triggered upon breach:

KACHING: p95 lease issuance < 250 ms; error/429 rate < 1%.
Join success: ≥ 95% of new replicas join mesh within 30 s of container Ready.
BackBeat (JetStream): per-subject consumer lag < 200 msgs; publish acks < 100 ms.
Election stability: ≤ 1 leader change / 10 min per cluster during steady state.
Swarm: task start success ≥ 99%; median start→assigned<10 s.

Metrics exposed to monitor these SLOs include:

chorus_license_lease_latency_ms, chorus_license_breaker_open_total
chorus_bootstrap_join_duration_ms, chorus_join_success_total{result=...}
backbeat_stream_lag, backbeat_ack_latency_ms
whoosh_wave_size, whoosh_wave_backoff_ms, whoosh_gate_block_seconds
chorus_election_changes_total

Synthetic Scale Test

A nightly, repeatable synthetic scale test is scripted to validate scaling behavior:

Scale from N=3 → 3 + 12 in waves, following the WHOOSH policy.
Hold for 5 minutes under synthetic load (CHORUS pulls a fixed assignment that triggers normal P2P + small workload).
Scale back to 3.

Pass/Fail Criteria:

No gate held > 2 min.
No breaker > 60 s open.
Join p95 < 25 s.
0% orphan tasks after scale-down.

Canary Config Reload

To prove the runtime assignment merge works without restarts:

A prompt/model change is triggered via WHOOSH for a 10% canary (by assignment).
Success/error rates and JetStream lag are watched for 2 minutes.
The change is then rolled out to 100%.

Failure Drills

Cheap chaos engineering drills are run to test resilience:

KACHING brownout: Inject 500 ms latency + 5% 429s for 2 minutes. Expect grace-window starts, brief breaker openings, and no P2P join until the lease is acquired. The system should auto-heal and join within 30s after recovery.
Bootstrap peer loss: Take 50% of the pool out mid-wave. Expect WHOOSH to pause the next wave, with existing joiners still succeeding via a subset. The pool health should recover before continuing.
BackBeat clog: Cap consumer read to simulate lag > threshold. Expect the WHOOSH gate to block scaling, while replicas continue local work without dropping messages.
Leader eviction: Kill the current leader. Verify the stability window prevents thrashing and a new leader is elected in < 5s.

Ops Guardrails

Admission control in WHOOSH (implied by gates): Hard cap on "max replicas added / 5 min".
Per-node placement: Use Swarm labels so waves don’t pile onto one box (e.g., placement.max_replicas_per_node: 1 for critical roles).
Quarantine mode in CHORUS: When a license fails after grace or bootstrap joins time out, expose /health?quarantine=1 and refuse task intake until cleared.

Rollback & Recovery Runbook

Parameter rollback: WHOOSH re-points ASSIGNMENT_EPOCH to the last-good configuration, and POST /v1/reload is sent to all replicas.
Scale rollback: target = previous_replicas. WHOOSH drains the newest assignments first. Requires join_success ≥ 90% before any further changes.
KACHING outage: Flip the cluster to cached-lease only for up to 10 minutes, block new waves, and page the on-call team.
Bootstrap meltdown: Promote 3 healthy workers to temporary bootstrap (via label + static list), then resume.

WHOOSH Autoscale Policy (Example)

The scaler configuration is kept out of the code in a YAML file:

cluster: prod
service: chorus
wave:
  max_per_wave: 8
  min_per_wave: 3
  period_sec: 25
  placement:
    max_replicas_per_node: 1
gates:
  kaching:
    p95_latency_ms: 250
    max_error_rate: 0.01
  backbeat:
    max_stream_lag: 200
  bootstrap:
    min_healthy_peers: 3
  join:
    min_success_rate: 0.80
backoff:
  initial_ms: 15000
  factor: 2.0
  jitter: 0.2
  max_ms: 120000
quarantine:
  enable: true
  exit_on: "kaching_ok && bootstrap_ok"
canary:
  fraction: 0.1
  promote_after_sec: 120

CHORUS Agent Self-Organization

Agent Self-Awareness

Each CHORUS agent maintains awareness of:

Primary Capabilities: Core skills and specializations
Secondary Capabilities: Additional skills they can contribute
Current Load: Active team memberships and availability
Performance History: Success rates and peer feedback
Preferred AI Models: Best-performing models for their tasks

TODO

Team Composer API: Implement llama3.1-based team analysis as a Dockerized service (task analysis → capability mapping → team proposals) with unit/integration tests and metrics.
SLURP integration: Add endpoints for curated bundle ingest/retrieval and document contracts/auth; validate E2E with BUBBLE/DHT.
CHORUS connectivity: Enable and validate live consensus/task flows (configure chorus_endpoints) with health checks and error handling; remove reliance on mocked data.
Replace mocked UI test routes with real backend calls for agent lifecycle and health checks.
Hardware-driven model selection: Add agent-side hardware discovery to drive model selection; avoid hardcoded cluster IPs or model names in configs.
MCP integration later: Keep MCP optional; maintain clean API boundary for now.
Collaboration Style: Team role preferences

Autonomous Team Joining Process

Monitoring Phase
- Agents continuously monitor GITEA for team formation issues
- Filter by matching capabilities and availability
- Assess team composition gaps they could fill

Self-Assessment Phase

Agent Self-Evaluation:
- "This team needs a frontend developer"
- "I have React/TypeScript skills (confidence: 85%)"
- "My current load: 1 active team (capacity available)"
- "Team timeline: 3 days (fits my schedule)"
- "Decision: JOIN TEAM"

Team Application Phase
- Agent comments on GITEA issue with capability summary
- Provides availability window and estimated contribution
- Existing team members can review and approve/decline
Integration Phase
- Agent joins P2P team channel
- Introduces capabilities and proposes initial approach
- Begins collaborative work with team

Capability Matching Algorithm

def assess_team_fit(agent, team_requirement):
    skill_match = calculate_skill_overlap(agent.capabilities, team_requirement.skills)
    availability_match = check_schedule_compatibility(agent.schedule, team_requirement.timeline)
    team_chemistry = assess_collaboration_history(agent, team_requirement.existing_members)
    
    fit_score = (skill_match * 0.5) + (availability_match * 0.3) + (team_chemistry * 0.2)
    return fit_score

GITEA Team Management

Team Issue Structure

Each team is represented by a GITEA issue with structured metadata:

Title: "Team Formation: Secure Authentication System Implementation"

Labels:
- team:auth-system-v2
- complexity:high  
- timeline:3-days
- domain:security
- domain:backend
- domain:frontend

Team Composition:
- [ ] Security Architect (required)
- [x] Backend Developer (@agent-backend-specialist)
- [ ] Frontend Developer (required)
- [ ] QA Engineer (required)
- [ ] Code Reviewer (optional)

Timeline: 2024-08-15 to 2024-08-18
P2P Channel: team-auth-system-v2-channel
SLURP Address: ucxl://teams/auth-system-v2/artifacts

Role Status Management

Open: Role available for assignment
Applied: Agent has expressed interest
Assigned: Agent confirmed for role
Active: Agent currently working
Completed: Role deliverables finished
Blocked: Role waiting on dependencies

Progress Tracking

Teams update GITEA issue with:

Daily progress summaries
Milestone achievements
Blocker identification
Resource requests
Quality gate completions

P2P Team Collaboration Channels

HMMM in the loop

Reasoning channels, not just chat. Team channels carry structured thought (HMMM) as well as messages: intermediate chains, critiques, and mini-memos are timestamped, attributed, and ingested by SLURP for later DRs. This enables consensus with evidence, not vibes.

Channel Architecture

Each team gets dedicated communication infrastructure:

Team Channel: team-auth-system-v2-channel
├── Topic Streams:
│   ├── #planning (initial design discussions)
│   ├── #implementation (development coordination)  
│   ├── #review (code/design reviews)
│   ├── #testing (QA coordination)
│   └── #integration (final assembly)
├── File Sharing: Distributed artifact storage
├── Screen Sharing: Real-time collaboration sessions
└── Voice Channels: Synchronous discussion capability


### Context Preservation
All team communications are automatically:
- Timestamped and attributed to agents
- Categorized by topic stream
- Indexed for searchability
- Ingested by SLURP into Hypercore distributed log

## Consensus Mechanisms

> For quorum rules, vote semantics (green/yellow/red), tempo (beats), and the front‑of‑house review/delivery API contracts, see the WHOOSH Review & Consensus Policy: [../Policy/WHOOSH-Review-Policy.md](../Policy/WHOOSH-Review-Policy.md).

### Democratic Decision Making

Refer to the Review Policy for project‑configurable defaults and API shapes: [../Policy/WHOOSH-Review-Policy.md](../Policy/WHOOSH-Review-Policy.md).

**1. Voting Systems**
- **Simple Majority**: Basic feature decisions
- **Supermajority (2/3)**: Architecture changes
- **Unanimous**: Security-critical decisions
- **Technical Lead Override**: Deadlock resolution

**2. Quality Gates**
Before task completion, teams must achieve consensus on:
- **Functional Requirements**: All specified features implemented
- **Quality Standards**: Code review, testing, documentation complete
- **Security Review**: Security-sensitive changes approved by security role
- **Performance Benchmarks**: Performance requirements met
- **Integration Testing**: End-to-end functionality verified

**3. Completion Criteria**
```yaml
Completion Checklist:
- [ ] All assigned roles have marked deliverables complete
- [ ] Peer reviews completed by at least 2 team members
- [ ] Automated tests passing (unit + integration)
- [ ] Security review approved (if applicable)
- [ ] Documentation updated
- [ ] Team consensus vote: "Ready for submission" (majority required)

Conflict Resolution

1. Technical Disagreements

Structured debate with evidence presentation
Prototype/spike development for comparison
Expert agent consultation by posting to ...
Escalation to WHOOSH Admin User (human) for tie-breaking

2. Resource Conflicts

Workload re-balancing among team members
Additional agent recruitment if needed
Scope reduction with consensus approval and Issue lodgement

3. Quality Disputes

Independent review by WHOOSH Admin User (human)
Automated quality metric evaluation
Compromise solution development
Innovation agent inclusion to team

CHORUS Integration

UCXL-based Messaging Address Structure**

eg. For the following address:

ucxl://any:role@project:task/#/

@project:task is the Team ID.

This means any inter-agent discussions published to @project:task are seen by those CHORUS team members.

We use the [antennae protocol] (for libp2p) to pub / sub messaging between agents by sending the reasoning component to the other members.

So a communications log might look like this...

publish to chat room @website:architecture-design

PeerID = D0019:senior-software-architect

{
	"channel": "**@website:architecture-design**",
	"from-agentid": "**D0019:senior-software-architect**",
	"reponding-to": "None", 
	"thoughts": "
		<thinking>
		...
		</thinking>"
}

So as noted in our system prompts to every agent, between each step we gather any thoughts of our peers.

GET from API endpoint /api/v1/antennae/@website:architecture-design**

Implementation Phases

Phase 1: Foundation (WHOOSH Team Composer)

LLM-powered task analysis service
Team composition templates and logic
GITEA issue creation with team metadata
Basic team formation workflows

Phase 2: Agent Enhancement (CHORUS Self-Organization)

Agent capability self-assessment systems
GITEA monitoring and team application logic
Autonomous team joining decision algorithms
Agent-to-agent communication protocols

Phase 3: Collaboration Infrastructure (P2P Channels)

Team-specific communication channel creation
Message routing and topic organization
Real-time collaboration tools integration
Communication archival for SLURP submission

Phase 4: Consensus Systems (Democratic Decision Making)

Voting mechanisms and quorum rules
Quality gate automation and verification
Conflict resolution procedures
Completion criteria validation

Phase 5: Integration (SLURP Connectivity)

Artifact packaging and submission workflows
UCXL address management and organization
Context preservation and knowledge extraction
Performance analytics and optimization

Benefits & Considerations

Key Benefits

✅ Fault Tolerance: No single points of failure - teams operate independently ✅ Scalability: Teams form and dissolve dynamically based on demand ✅ Quality: Consensus-driven decisions improve deliverable quality ✅ Knowledge Preservation: Full context captured for future learning ✅ Natural Collaboration: Mirrors effective human team patterns ✅ Autonomous Operation: Minimal human intervention required ✅ Adaptive: Teams adjust composition based on task evolution ✅ Observable: Full transparency through GITEA and P2P channels

Considerations & Challenges

⚠️ Initial Complexity: Sophisticated system requiring careful implementation ⚠️ Coordination Overhead: Team formation and consensus processes take time ⚠️ Agent Training: CHORUS agents need enhanced self-awareness capabilities ⚠️ Network Dependencies: P2P channels require reliable connectivity ⚠️ Quality Variance: Team effectiveness may vary based on composition ⚠️ Resource Competition: Popular agents may become bottlenecks ⚠️ Conflict Resolution: Complex disputes may require escalation mechanisms

Success Metrics

Team Formation Efficiency:

Time from task request to team formation
Percentage of teams that form successfully
Quality of initial team composition decisions

Collaboration Effectiveness:

Team productivity metrics (velocity, quality)
Communication frequency and engagement
Consensus achievement rates

Deliverable Quality:

Automated quality metrics (test coverage, security scores)
Peer review feedback scores
Stakeholder satisfaction ratings

System Resilience:

Team reformation after agent failures
Graceful degradation under load
Recovery from network partitions

Knowledge Accumulation:

Reuse of solutions and patterns
Agent skill development over time
Continuous improvement in team formation

Future Evolution

Advanced Capabilities

Cross-Team Coordination: Teams collaborating on larger initiatives
Agent Specialization: Agents developing deep expertise in specific domains
Dynamic Reconfiguration: Teams adapting composition mid-task
Predictive Formation: AI predicting optimal teams before task assignment
Quality Prediction: Estimating deliverable quality during team formation

Integration Opportunities

External Stakeholders: Human team members or external AI services
Compliance Integration: Automated regulatory and policy compliance
Performance Optimization: ML-driven team composition optimization
Resource Management: Intelligent compute and storage allocation
Governance: Auditable decision trails and accountability mechanisms

This evolution represents a fundamental shift toward truly autonomous AI development capabilities that augment and eventually potentially replace traditional software development team structures, while maintaining the collaborative, consensus-driven decision-making that ensures high-quality outcomes.

🏗️ Technical Architecture

Current Implementation Architecture

WHOOSH is currently implemented as a specialized council formation system integrated into the existing CHORUS stack, with clear separation between detection, composition, deployment, and monitoring concerns.

High-Level System Flow

graph TB
    subgraph "Gitea Repository Monitoring"
        REPO[Repository] --> ISSUE[Design Brief Issue]
        ISSUE --> LABEL[chorus-entrypoint]
    end
    
    subgraph "WHOOSH Council Formation"
        LABEL --> MONITOR[WHOOSH Monitor]
        MONITOR --> DETECT[Issue Detection]
        DETECT --> COMPOSE[Council Composition]
        COMPOSE --> ROLES[human-roles.yaml]
    end
    
    subgraph "CHORUS Deployment"
        COMPOSE --> DEPLOY[Docker Swarm Deploy]
        DEPLOY --> CHORUS1[CHORUS Agent 1]
        DEPLOY --> CHORUS2[CHORUS Agent 2] 
        DEPLOY --> CHORUS3[CHORUS Agent N]
    end
    
    subgraph "P2P Collaboration"
        CHORUS1 --> P2P[P2P Network]
        CHORUS2 --> P2P
        CHORUS3 --> P2P
        P2P --> ARTIFACTS[Council Artifacts]
    end
    
    subgraph "Persistence Layer"
        DETECT --> PGDB[(PostgreSQL)]
        ARTIFACTS --> PGDB
        PGDB --> COUNCILS[Councils Table]
        PGDB --> AGENTS[Council Agents Table] 
        PGDB --> OUTPUTS[Council Artifacts Table]
    end

Ecosystem Integration Points

BZZZ Task Management Integration

Current: Council artifacts provide structured input for BZZZ task creation

Council deliverables (manifests, DRs, scaffold plans) inform task breakdown structure
Project context and constraints flow from council decisions to task specifications
Council role recommendations influence team composition for ongoing development

Future: Direct handoff mechanisms between councils and BZZZ teams

Automatic task generation based on scaffold plans
Agent transition from council roles to development team roles
Progress tracking continuity from kickoff through delivery

SLURP Knowledge Integration

Current: Council communications and artifacts preserved via UCXL addressing

All HMMM protocol messages stored with proper addressing for future reference
Decision rationale and evidence captured in structured format
Artifacts tagged with council ID and role attribution

Future: Enhanced knowledge graph integration

Automated DR generation from council consensus decisions
Cross-project pattern recognition and reuse recommendations
Council effectiveness analytics based on project outcomes

CHORUS Agent Ecosystem

Current: CHORUS agents configured with council-specific roles and context

Role identifiers passed via environment variables from human-roles.yaml
Design Brief content provided as task context
P2P network access for inter-council communication

Future: Enhanced agent capabilities for team transitions

Agent memory persistence across council and team phases
Specialized council expertise development over time
Cross-council knowledge sharing and best practice propagation

Institutional Quality Gates

Provenance present: artifacts reference UCXL addresses and cite prior DRs.
Secrets clean: SHHH pass on channel logs and artifacts.
Temporal pin: decisions pin the addressed time slice (~~/, #/) used.

Council Formation → SLURP Integration

WHOOSH composes councils → HMMM captures structured reasoning → SLURP ingests and packages kickoff artifacts for DR publication and future project reference.

Current Artifact Flow

Design Brief Detection → Council Formation → HMMM Reasoning → Artifact Production → SLURP Ingestion
                                                                              ↓
                                                                    UCXL-addressed storage
                                                                              ↓  
                                                                    Future project reference

Current Services Architecture

WHOOSH Council Formation Stack

graph TB
    subgraph "CHORUS Unified Stack"
        subgraph "Frontend Layer"
            UI[WHOOSH Dashboard]
            WS[WebSocket Council Updates]
            API[Council API]
        end
        
        subgraph "WHOOSH Services"
            MONITOR[Repository Monitor]
            COMPOSER[Council Composer]
            DEPLOYER[Agent Deployer]
            TRACKER[Progress Tracker]
        end
        
        subgraph "Data Layer"
            POSTGRES[(PostgreSQL)]
            COUNCILS[Councils Table]
            AGENTS[Council Agents Table]
            ARTIFACTS[Artifacts Table]
        end
        
        subgraph "CHORUS Agent Network"
            LEAD[Lead Design Director]
            ARCH[Senior Software Architect]
            SEC[Security Expert]
            DB[Database Engineer]
            STRAT[Marketing Strategist]
        end
    end
    
    subgraph "External Integrations"
        GITEA[Gitea Repository]
        DOCKER[Docker Swarm]
        P2P[P2P Network]
        SLURP[SLURP Knowledge Store]
    end
    
    GITEA --> MONITOR
    MONITOR --> COMPOSER
    COMPOSER --> DEPLOYER
    DEPLOYER --> DOCKER
    
    DOCKER --> LEAD
    DOCKER --> ARCH
    DOCKER --> SEC
    DOCKER --> DB
    DOCKER --> STRAT
    
    LEAD --> P2P
    ARCH --> P2P
    SEC --> P2P
    DB --> P2P
    STRAT --> P2P
    
    P2P --> ARTIFACTS
    ARTIFACTS --> SLURP
    
    MONITOR --> POSTGRES
    COMPOSER --> POSTGRES
    TRACKER --> POSTGRES

Future: Full Autonomous Team Architecture

graph TB
    subgraph "Enhanced WHOOSH Platform"
        subgraph "Frontend Layer"
            DASH[Team Dashboard]
            METRICS[Analytics UI]
            CONTROL[Control Panel]
        end
        
        subgraph "Core Services"
            TEAM_COMPOSER[Team Composer]
            AGENT_MANAGER[Agent Manager]
            WORKFLOW_ENGINE[Workflow Engine]
            CONSENSUS[Consensus Engine]
        end
        
        subgraph "Intelligence Layer"
            CAPABILITY_MATCHER[Capability Matcher]
            PERFORMANCE_ANALYZER[Performance Analyzer]
            PREDICTOR[Team Success Predictor]
        end
    end
    
    subgraph "Autonomous Agent Ecosystem"
        SELF_ORG[Self-Organizing Agents]
        SPECIALIST[Domain Specialists]
        GENERALIST[Generalist Agents]
    end

Component Specifications

Current Implementation Components

🗺️ Repository Monitor

Purpose: Continuously monitors Gitea repositories for Design Brief issues that trigger council formation.

Key Responsibilities:

Webhook-based repository event processing
Design Brief issue detection (chorus-entrypoint labels)
Repository sync status management (initial vs incremental)
Issue content extraction and context building

Current API Endpoints:

GET    /api/repositories          # List monitored repositories
POST   /api/repositories          # Add repository to monitoring
GET    /api/repositories/{id}/issues  # Get repository issues
POST   /webhooks/gitea            # Gitea webhook endpoint

Database Schema (Current):

-- Repository monitoring
repositories (
    id UUID PRIMARY KEY,
    full_name VARCHAR(255) NOT NULL,
    gitea_id INTEGER NOT NULL,
    sync_status VARCHAR(50) DEFAULT 'pending',
    last_issue_sync TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Design Brief issues that trigger councils
issues (
    id UUID PRIMARY KEY,
    repository_id UUID REFERENCES repositories(id),
    gitea_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    body TEXT,
    labels JSONB,
    state VARCHAR(20) DEFAULT 'open',
    created_at TIMESTAMP DEFAULT NOW()
);

🏢 Council Composer

Purpose: Analyzes Design Briefs and determines optimal council composition based on project requirements.

Key Responsibilities:

Design Brief content analysis
Role mapping from human-roles.yaml
Council size and composition optimization
Resource availability checking

Current Implementation:

// Council composition logic
func ComposeCouncil(designBrief DesignBrief) (*CouncilComposition, error) {
    projectType := analyzeProjectType(designBrief.Content)
    requiredDomains := extractDomains(designBrief.Content)
    
    baseRoles := getBaseRolesForType(projectType)
    additionalRoles := getAdditionalRoles(requiredDomains)
    
    return &CouncilComposition{
        Roles: append(baseRoles, additionalRoles...),
        Size: len(baseRoles) + len(additionalRoles),
        ProjectContext: designBrief.Content,
    }, nil
}

🚀 Agent Deployer

Purpose: Deploys CHORUS agents via Docker Swarm with council-specific configuration.

Key Responsibilities:

Docker Swarm service creation and management
Agent environment variable configuration
P2P network setup for council communication
Service health monitoring and recovery

Current Deployment Logic:

// Docker service deployment for council agents
// The task context parameter is named taskContext so that it does not
// shadow the standard library "context" package used below.
func DeployCouncilAgent(role string, councilID string, taskContext string) error {
    serviceName := fmt.Sprintf("council-%s-%s", councilID, role)
    
    serviceSpec := swarm.ServiceSpec{
        Annotations: swarm.Annotations{
            Name: serviceName,
        },
        TaskTemplate: swarm.TaskSpec{
            ContainerSpec: &swarm.ContainerSpec{
                Image: "anthonyrawlins/chorus:latest",
                Env: []string{
                    fmt.Sprintf("CHORUS_ROLE=%s", role),
                    fmt.Sprintf("CHORUS_TASK_CONTEXT=%s", taskContext),
                    fmt.Sprintf("P2P_NETWORK=council-%s", councilID),
                },
                Mounts: []mount.Mount{{
                    Type:   mount.TypeBind,
                    Source: "/rust/containers/WHOOSH/prompts",
                    Target: "/app/prompts",
                }},
            },
        },
    }
    
    // ServiceCreate returns a response and an error; only the error is propagated.
    _, err := dockerClient.ServiceCreate(context.Background(), serviceSpec, types.ServiceCreateOptions{})
    return err
}

📊 Progress Tracker

Purpose: Monitors council progress and artifact production throughout the kickoff process.

Key Responsibilities:

Council agent deployment status tracking
Artifact production monitoring
Decision consensus tracking
Council completion detection
Error handling and recovery coordination

Database Tracking (Current):

-- Council progress tracking
CREATE TABLE council_agents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    council_id UUID REFERENCES councils(id),
    role_name VARCHAR(100) NOT NULL,
    service_id VARCHAR(255), -- Docker service ID
    status VARCHAR(50) DEFAULT 'pending',
    deployed_at TIMESTAMP,
    UNIQUE(council_id, role_name)
);

CREATE TABLE council_artifacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    council_id UUID REFERENCES councils(id),
    artifact_type VARCHAR(50) NOT NULL,
    content TEXT,
    produced_by VARCHAR(255),
    status VARCHAR(50) DEFAULT 'draft',
    produced_at TIMESTAMP DEFAULT NOW()
);
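
Council completion detection could be derived from rows of `council_artifacts` roughly as follows. This is a sketch under stated assumptions: the required artifact set and the `approved` status value are hypothetical (the schema only shows `draft` as the default status), and the function names are illustrative.

```python
from typing import Dict, Iterable, List, Set

# Assumption: artifacts a kickoff council must deliver before it is complete.
REQUIRED_ARTIFACTS: Set[str] = {"kickoff_manifest", "seminal_dr"}
# Assumption: status value that marks an artifact as finished.
DONE_STATUS = "approved"


def missing_artifacts(
    artifacts: Iterable[Dict[str, str]],
    required: Set[str] = REQUIRED_ARTIFACTS,
) -> List[str]:
    """Return required artifact types that have no finished row yet."""
    finished = {
        row["artifact_type"] for row in artifacts if row.get("status") == DONE_STATUS
    }
    return sorted(required - finished)


def is_council_complete(artifacts: Iterable[Dict[str, str]]) -> bool:
    """A council is complete once every required artifact is finished."""
    return not missing_artifacts(artifacts)
```

Reporting the missing artifact types, rather than only a boolean, gives the tracker something concrete to surface when a council stalls.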

Future Components

🤖 Enhanced Agent Manager (Planned)

Purpose: Manages autonomous agent capabilities, performance, and self-organization for ongoing teams.

Future Responsibilities:

Agent capability self-assessment
Dynamic team joining algorithms
Performance tracking and optimization
Cross-team agent coordination
Predictive team formation

Future Agent Self-Registration Protocol:

{
    "agent_id": "chorus-agent-001",
    "name": "Senior Software Architect",
    "current_role": "senior-software-architect",
    "specializations": ["microservices", "system-design", "scalability"],
    "council_history": [
        {"council_id": "marketplace-kickoff", "role": "architect", "rating": 4.8},
        {"council_id": "analytics-platform", "role": "architect", "rating": 4.9}
    ],
    "capabilities": {
        "architecture_design": 0.95,
        "technology_selection": 0.90,
        "team_leadership": 0.85,
        "consensus_building": 0.88
    },
    "availability": {
        "current_councils": 1,
        "max_concurrent": 3,
        "preferred_domains": ["fintech", "ecommerce", "enterprise"]
    },
    "learning_metrics": {
        "councils_completed": 47,
        "avg_artifact_quality": 4.7,
        "consensus_success_rate": 0.92,
        "stakeholder_satisfaction": 4.8
    }
}
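
A composer consuming this registration payload might first check capacity and then score the agent against the capabilities a council needs. The sketch below is illustrative only: `has_capacity` and `capability_fit` are hypothetical helpers, and averaging self-reported proficiencies is one simple scoring choice, not the planned algorithm.

```python
from typing import Any, Dict, Iterable


def has_capacity(registration: Dict[str, Any]) -> bool:
    """True when the agent is below its self-declared concurrency limit."""
    availability = registration["availability"]
    return availability["current_councils"] < availability["max_concurrent"]


def capability_fit(registration: Dict[str, Any], required: Iterable[str]) -> float:
    """Mean self-reported proficiency over the required capabilities.

    Capabilities the agent does not list count as 0.0, so gaps lower the score.
    """
    names = list(required)
    if not names:
        return 0.0
    capabilities = registration.get("capabilities", {})
    return sum(capabilities.get(name, 0.0) for name in names) / len(names)
```

For the example agent above, requiring `architecture_design` and `technology_selection` averages the two listed proficiencies, while any unlisted capability pulls the score toward zero.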

Current Council Health Monitoring:

// Council agent health check
type CouncilAgentHealth struct {
    AgentID        string    `json:"agent_id"`
    CouncilID      string    `json:"council_id"`
    Role           string    `json:"role"`
    ServiceID      string    `json:"service_id"`
    Status         string    `json:"status"`
    LastSeen       time.Time `json:"last_seen"`
    ArtifactsCount int       `json:"artifacts_produced"`
    P2PConnected   bool      `json:"p2p_connected"`
    ErrorMessage   *string   `json:"error_message,omitempty"`
}

// Future: Enhanced agent health with self-awareness
type AutonomousAgentHealth struct {
    CouncilAgentHealth
    SelfAssessment struct {
        TaskFit        float64 `json:"task_fit_confidence"`
        Workload       float64 `json:"current_workload_percent"`
        Collaboration  float64 `json:"team_collaboration_score"`
        LearningRate   float64 `json:"recent_learning_velocity"`
    } `json:"self_assessment"`
}

Current Data Architecture

🗄️ Council Database Schema (Implemented)

Current Tables:

-- Council management (from migrations/005_add_council_tables.up.sql)
CREATE TABLE councils (
    id UUID PRIMARY KEY,
    project_name VARCHAR(255) NOT NULL,
    repository VARCHAR(500) NOT NULL,
    project_brief TEXT NOT NULL,
    constraints TEXT,
    tech_limits TEXT,
    compliance_notes TEXT,
    targets TEXT,
    status VARCHAR(50) NOT NULL DEFAULT 'forming',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    task_id UUID REFERENCES tasks(id)
);

-- Council agent tracking
CREATE TABLE council_agents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    council_id UUID NOT NULL REFERENCES councils(id),
    role_name VARCHAR(100) NOT NULL,
    agent_name VARCHAR(255) NOT NULL,
    deployed BOOLEAN NOT NULL DEFAULT false,
    service_id VARCHAR(255), -- Docker service ID
    status VARCHAR(50) NOT NULL DEFAULT 'pending',
    UNIQUE(council_id, role_name)
);

-- Council artifact production
CREATE TABLE council_artifacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    council_id UUID NOT NULL REFERENCES councils(id),
    artifact_type VARCHAR(50) NOT NULL, -- kickoff_manifest, seminal_dr, etc.
    content TEXT,
    produced_by VARCHAR(255),
    status VARCHAR(50) NOT NULL DEFAULT 'draft',
    produced_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Council decision tracking
CREATE TABLE council_decisions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    council_id UUID NOT NULL REFERENCES councils(id),
    decision_type VARCHAR(50) NOT NULL,
    decision_title VARCHAR(255) NOT NULL,
    options JSONB,
    chosen_option JSONB,
    votes JSONB,
    decided_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
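
To illustrate how the `votes` and `chosen_option` columns of `council_decisions` could relate, here is a minimal majority tally. The shape of `votes` (a mapping from role name to option identifier), the quorum parameter, and the strict-majority rule are all assumptions made for this sketch; the source does not specify the voting rule.

```python
from collections import Counter
from typing import Dict, Optional


def tally_votes(votes: Dict[str, str], quorum: int = 1) -> Optional[str]:
    """Return the option backed by a strict majority of cast votes.

    Returns None when the quorum is not met or no option exceeds 50%.
    """
    if not votes or len(votes) < quorum:
        return None
    counts = Counter(votes.values())
    option, top_count = counts.most_common(1)[0]
    if top_count * 2 > len(votes):
        return option
    return None
```

Returning `None` on a tie or missed quorum leaves room for an escalation step instead of silently picking a winner.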

Performance Indexes:

-- Optimized for council operations
CREATE INDEX idx_councils_status ON councils(status);
CREATE INDEX idx_councils_repository ON councils(repository);
CREATE INDEX idx_council_agents_council_id ON council_agents(council_id);
CREATE INDEX idx_council_agents_deployed ON council_agents(deployed);
CREATE INDEX idx_council_artifacts_type ON council_artifacts(artifact_type);

Future: Enhanced Team Database Schema

Planned Extensions for Autonomous Teams:

-- Team capability tracking (future)
CREATE TABLE agent_capabilities (
    agent_id UUID,
    capability_name VARCHAR(100),
    proficiency_score FLOAT,
    confidence_level FLOAT,
    last_updated TIMESTAMP
);

-- Team performance metrics (future)
CREATE TABLE team_performance (
    team_id UUID,
    project_id UUID,
    success_metrics JSONB,
    completion_time INTERVAL,
    quality_score FLOAT,
    stakeholder_satisfaction FLOAT
);

Council Communication Architecture

Current: P2P Council Network

HMMM Protocol Integration:

interface CouncilMessage {
    messageId: string;
    councilId: string;
    fromRole: string;
    toRoles: string[]; // Broadcast or targeted
    messageType: 'reasoning' | 'decision' | 'artifact' | 'consensus';
    content: {
        thinking: string;     // HMMM reasoning chain
        evidence: any[];      // Supporting data
        recommendation?: any; // Proposed action/decision
        confidence: number;   // 0.0 to 1.0
    };
    ucxlAddress: string;     // UCXL addressing
    timestamp: string;
}

// Example council communication
{
    "messageId": "msg_council_001",
    "councilId": "marketplace-kickoff",
    "fromRole": "senior-software-architect",
    "toRoles": ["lead-design-director", "database-engineer"],
    "messageType": "reasoning",
    "content": {
        "thinking": "Given the multi-vendor requirements, microservices architecture provides necessary isolation...",
        "evidence": ["scalability-requirements", "vendor-isolation-needs"],
        "recommendation": "microservices-with-api-gateway",
        "confidence": 0.87
    },
    "ucxlAddress": "ucxl://senior-software-architect@marketplace:kickoff#architecture/",
    "timestamp": "2025-01-12T10:30:00Z"
}
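
Before a message like the one above is relayed, a receiver could run basic structural checks against the `CouncilMessage` interface. The following is a hedged sketch: `validate_council_message` is a hypothetical helper, and the `ucxl://` prefix check is inferred from the example address rather than from a UCXL specification.

```python
from typing import Any, Dict, List

MESSAGE_TYPES = {"reasoning", "decision", "artifact", "consensus"}
REQUIRED_FIELDS = (
    "messageId", "councilId", "fromRole", "toRoles",
    "messageType", "content", "ucxlAddress", "timestamp",
)


def validate_council_message(message: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in message]
    if errors:
        return errors
    if message["messageType"] not in MESSAGE_TYPES:
        errors.append(f"unknown messageType: {message['messageType']}")
    to_roles = message["toRoles"]
    if not isinstance(to_roles, list) or not all(isinstance(r, str) for r in to_roles):
        errors.append("toRoles must be a list of role names")
    content = message["content"]
    confidence = content.get("confidence") if isinstance(content, dict) else None
    # bool is a subclass of int, so exclude it explicitly.
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        errors.append("content.confidence must be a number between 0.0 and 1.0")
    if not str(message["ucxlAddress"]).startswith("ucxl://"):
        errors.append("ucxlAddress must use the ucxl:// scheme")
    return errors
```

Collecting every error rather than raising on the first makes rejected messages easier to debug on the sending side.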

Future: Enhanced Team Communication

Autonomous Team Coordination:

interface TeamCoordinationMessage {
    teamId: string;
    phase: 'planning' | 'execution' | 'review' | 'integration';
    priority: 'low' | 'medium' | 'high' | 'critical';
    requiresConsensus: boolean;
    votingDeadline?: string;
    escalationPath?: string[];
}

📡 Event Streaming

Event Bus Architecture:

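from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, Optional

@dataclass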
class WHOOSHEvent:
    id: str
    type: str
    source: str
    timestamp: datetime
    data: Dict[str, Any]
    correlation_id: Optional[str] = None

class EventBus:
    async def publish(self, event: WHOOSHEvent) -> None:
        """Publish event to all subscribers"""
        
    async def subscribe(self, event_type: str, handler: Callable) -> str:
        """Subscribe to specific event types"""
        
    async def unsubscribe(self, subscription_id: str) -> None:
        """Remove subscription"""

Event Types:

# Agent Events
AGENT_REGISTERED = "agent.registered"
AGENT_STATUS_CHANGED = "agent.status_changed"
AGENT_PERFORMANCE_UPDATE = "agent.performance_update"

# Task Events
TASK_CREATED = "task.created"
TASK_ASSIGNED = "task.assigned"
TASK_STARTED = "task.started"
TASK_COMPLETED = "task.completed"
TASK_FAILED = "task.failed"

# Workflow Events
WORKFLOW_EXECUTION_STARTED = "workflow.execution_started"
WORKFLOW_NODE_COMPLETED = "workflow.node_completed"
WORKFLOW_EXECUTION_COMPLETED = "workflow.execution_completed"

# System Events
SYSTEM_ALERT = "system.alert"
SYSTEM_MAINTENANCE = "system.maintenance"
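
The `EventBus` interface above leaves the delivery mechanism open. A minimal in-process implementation might look like the sketch below, assuming async handlers and exact-match event types; `InMemoryEventBus` is a hypothetical name, and a production bus would more likely sit on Redis or another broker.

```python
import asyncio
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple


@dataclass
class WHOOSHEvent:
    id: str
    type: str
    source: str
    timestamp: datetime
    data: Dict[str, Any]
    correlation_id: Optional[str] = None


Handler = Callable[[WHOOSHEvent], Awaitable[None]]


class InMemoryEventBus:
    """Delivers events to handlers subscribed to the exact event type."""

    def __init__(self) -> None:
        # subscription id -> (event type, handler)
        self._subscriptions: Dict[str, Tuple[str, Handler]] = {}

    async def publish(self, event: WHOOSHEvent) -> None:
        handlers = [
            handler
            for event_type, handler in self._subscriptions.values()
            if event_type == event.type
        ]
        if handlers:
            # Run all matching handlers concurrently.
            await asyncio.gather(*(handler(event) for handler in handlers))

    async def subscribe(self, event_type: str, handler: Handler) -> str:
        subscription_id = uuid.uuid4().hex
        self._subscriptions[subscription_id] = (event_type, handler)
        return subscription_id

    async def unsubscribe(self, subscription_id: str) -> None:
        self._subscriptions.pop(subscription_id, None)
```

Returning an opaque subscription id from `subscribe` lets the same handler be registered for several event types and removed independently.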

Security Architecture

🔒 Authentication & Authorization

JWT Token Structure:

{
    "sub": "user_id",
    "iat": 1625097600,
    "exp": 1625184000,
    "roles": ["admin", "developer"],
    "permissions": [
        "workflows.create",
        "agents.manage",
        "executions.view"
    ],
    "tenant": "organization_id"
}

Permission Matrix:

roles:
  admin:
    permissions: ["*"]
    description: "Full system access"
    
  developer:
    permissions:
      - "workflows.*"
      - "executions.*"
      - "agents.view"
      - "tasks.create"
    description: "Development and execution access"
    
  viewer:
    permissions:
      - "workflows.view"
      - "executions.view"
      - "agents.view"
    description: "Read-only access"
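
The wildcard entries in the matrix (`"*"`, `"workflows.*"`) imply pattern matching at check time. One way to evaluate them, sketched here with the standard library `fnmatch` module, is shown below; `is_allowed` and the in-code copy of the matrix are illustrative, and glob semantics are an assumption about how the wildcards are meant to behave.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Mirror of the permission matrix above.
ROLE_PERMISSIONS: Dict[str, List[str]] = {
    "admin": ["*"],
    "developer": ["workflows.*", "executions.*", "agents.view", "tasks.create"],
    "viewer": ["workflows.view", "executions.view", "agents.view"],
}


def is_allowed(
    roles: Iterable[str],
    permission: str,
    matrix: Dict[str, List[str]] = ROLE_PERMISSIONS,
) -> bool:
    """True if any of the user's roles grants the requested permission."""
    return any(
        fnmatchcase(permission, pattern)
        for role in roles
        for pattern in matrix.get(role, [])
    )
```

Unknown roles grant nothing, so a token carrying a role that was removed from the matrix fails closed.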

🛡️ API Security

Rate Limiting:

# Rate limits by endpoint and user role
RATE_LIMITS = {
    "api.workflows.create": {"admin": 100, "developer": 50, "viewer": 0},
    "api.executions.start": {"admin": 200, "developer": 100, "viewer": 0},
    "api.agents.register": {"admin": 10, "developer": 0, "viewer": 0},
}
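
The table above gives limits but not a time unit or an algorithm. As a sketch, assuming the numbers are requests per minute per user, a fixed-window counter could enforce them as follows. `FixedWindowLimiter` is a hypothetical class, and the injectable `clock` exists only to make the behaviour testable.

```python
import time
from typing import Callable, Dict, Tuple

RATE_LIMITS = {
    "api.workflows.create": {"admin": 100, "developer": 50, "viewer": 0},
    "api.executions.start": {"admin": 200, "developer": 100, "viewer": 0},
    "api.agents.register": {"admin": 10, "developer": 0, "viewer": 0},
}


class FixedWindowLimiter:
    """Counts requests per (endpoint, user) within fixed time windows."""

    def __init__(
        self,
        limits: Dict[str, Dict[str, int]],
        window_seconds: float = 60.0,  # assumption: limits are per minute
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limits = limits
        self._window = window_seconds
        self._clock = clock
        # (endpoint, user id) -> (window index, request count)
        self._state: Dict[Tuple[str, str], Tuple[int, int]] = {}

    def allow(self, endpoint: str, role: str, user_id: str) -> bool:
        limit = self._limits.get(endpoint, {}).get(role, 0)
        if limit <= 0:
            return False  # a limit of 0 means the role may not call this endpoint
        window = int(self._clock() // self._window)
        key = (endpoint, user_id)
        current, count = self._state.get(key, (window, 0))
        if current != window:
            current, count = window, 0  # a new window resets the counter
        if count >= limit:
            return False
        self._state[key] = (current, count + 1)
        return True
```

A fixed window is the simplest option but allows short bursts across a window boundary; a sliding window or token bucket would smooth that at the cost of more state.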

Input Validation:

from typing import Any, Dict, Optional

from pydantic import BaseModel, validator

class WorkflowCreateRequest(BaseModel):
    name: str
    description: Optional[str]
    n8n_data: Dict[str, Any]
    
    @validator('name')
    def validate_name(cls, v):
        if len(v) < 3 or len(v) > 255:
            raise ValueError('Name must be 3-255 characters')
        return v
    
    @validator('n8n_data')
    def validate_n8n_data(cls, v):
        required_fields = ['nodes', 'connections']
        if not all(field in v for field in required_fields):
            raise ValueError('Invalid n8n workflow format')
        return v

Deployment Architecture

🐳 Container Strategy

Docker Compose Structure:

version: '3.8'
services:
  whoosh-coordinator:
    image: whoosh/coordinator:latest
    environment:
      - DATABASE_URL=postgresql://whoosh:${DB_PASSWORD}@postgres:5432/whoosh
      - REDIS_URL=redis://redis:6379
    depends_on: [postgres, redis]
    
  whoosh-frontend:
    image: whoosh/frontend:latest
    environment:
      - API_URL=http://whoosh-coordinator:8000
    depends_on: [whoosh-coordinator]
    
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=whoosh
      - POSTGRES_USER=whoosh
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
      
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
      
  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana_data:/var/lib/grafana

# Named volumes referenced by the services above must be declared at the top level
volumes:
  postgres_data:
  redis_data:
  grafana_data:

🌐 Network Architecture

Production Network Topology:

Internet
    ↓
[Traefik Load Balancer] (SSL Termination)
    ↓
[tengig Overlay Network]
    ↓
┌─────────────────────────────────────┐
│  WHOOSH Application Services         │
│  ├── Frontend (React)              │
│  ├── Backend API (FastAPI)         │
│  ├── WebSocket Gateway             │
│  └── Task Queue Workers            │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│  Data Services                      │
│  ├── PostgreSQL (Primary DB)       │
│  ├── Redis (Cache + Sessions)      │
│  ├── InfluxDB (Metrics)            │
│  └── Prometheus (Monitoring)       │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│  AI Agent Network (2-node cluster) │
│  ├── WALNUT (192.168.1.27:11434)  │
│  │   └── ROCm (RX 9060 XT)        │
│  └── ACACIA (192.168.1.72:11434)  │
│      └── CUDA (RTX 2080 Super)    │
└─────────────────────────────────────┘

Performance Considerations

🚀 Optimization Strategies

Database Optimization:

Connection pooling with asyncpg
Query optimization with proper indexing
Time-series data partitioning for metrics
Read replicas for analytics queries

Caching Strategy:

Redis for session and temporary data
Application-level caching for expensive computations
CDN for static assets
Database query result caching

Concurrency Management:

AsyncIO for I/O-bound operations
Connection pools for database and HTTP clients
Semaphores for limiting concurrent agent requests
Queue-based task processing
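
As an illustration of the semaphore approach to limiting concurrent agent requests, the sketch below bounds how many calls run at once while preserving result order. `run_bounded` and its `worker` parameter are hypothetical names; the real agent call would replace the worker coroutine.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrent: int = 4,
) -> List[R]:
    """Run worker over items with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item: T) -> R:
        # Each task waits here until a slot is free.
        async with semaphore:
            return await worker(item)

    # gather preserves the order of the input items in its results.
    return list(await asyncio.gather(*(guarded(item) for item in items)))
```

With a small cluster of agent nodes, `max_concurrent` would typically be sized to what the nodes can serve in parallel rather than to the number of queued tasks.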

📊 Monitoring & Observability

Key Metrics:

# Application Metrics
- whoosh_active_agents_total
- whoosh_task_queue_length
- whoosh_workflow_executions_total
- whoosh_api_request_duration_seconds
- whoosh_websocket_connections_active

# Infrastructure Metrics  
- whoosh_database_connections_active
- whoosh_redis_memory_usage_bytes
- whoosh_container_cpu_usage_percent
- whoosh_container_memory_usage_bytes

# Business Metrics
- whoosh_workflows_created_daily
- whoosh_execution_success_rate
- whoosh_agent_utilization_percent
- whoosh_average_task_completion_time

Alerting Rules:

groups:
- name: whoosh.rules
  rules:
  - alert: HighErrorRate
    expr: rate(whoosh_api_errors_total[5m]) > 0.1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High error rate detected"
      
  - alert: AgentDown
    expr: whoosh_agent_health_status == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Agent {{ $labels.agent_id }} is down"

This architecture provides a solid foundation for the unified WHOOSH platform, combining the best practices from our existing distributed AI projects while ensuring scalability, maintainability, and observability.
