> Legacy module reference (WHOOSH-era architecture).
> Current coordinator implementation is SWOOSH in `/home/tony/chorus/SWOOSH`.
> This file is retained for historical context and migration lineage.

# WHOOSH: Autonomous AI Development Teams Architecture

## Executive Summary

WHOOSH is evolving from project kickoff council formation to **self-organizing AI development teams** that mirror human collaboration patterns. Currently implemented as a **Council Formation Engine**, WHOOSH automatically detects new project Design Briefs and assembles specialized councils of CHORUS agents to handle project kickoffs. This foundation enables future expansion to autonomous teams that collaborate through P2P channels, reach consensus on solutions, and submit high-quality deliverables to SLURP.

**Current Implementation**: WHOOSH monitors Gitea repositories for "Design Brief" issues labeled `chorus-entrypoint`, then intelligently composes kickoff councils using role definitions from `human-roles.yaml`. CHORUS agents are deployed via Docker Swarm to collaborate on project initialization, producing kickoff artifacts that define project direction and requirements.

**Future Vision**: Extend beyond project kickoffs to ongoing team management where CHORUS agents autonomously join teams based on capabilities, collaborate democratically through HMMM protocol, and deliver solutions without central orchestration points of failure.

### Current Implementation Snapshot (2025-10)

- **Bootstrap rendezvous & topology** – WHOOSH now exposes `/api/v1/bootstrap-peers` and `/api/v1/topology` (see `internal/server/bootstrap.go`) powered by a combined Swarm + QUIC discovery layer (`internal/p2p/discovery.go`, `internal/p2p/quic_client.go`). Returned entries include transport preference, certificate hash, and prioritised multiaddrs so CHORUS containers can form meshes without hard-coded peers.
- **Assignment broker for CHORUS replicas** – `internal/orchestrator/assignment_broker.go` maintains template-driven runtime assignments (role, model, bootstrap peers, join stagger) and serves them via `/api/v1/assignments`. CHORUS’s runtime (`pkg/config.RuntimeConfig`) consumes the broker to merge WHOOSH-defined overrides into live container config.
- **Backbeat-integrated orchestration** – A dedicated BACKBEAT client (`internal/backbeat/integration.go`) tracks beat cadence, reports search/analysis operations, and emits WHOOSH health claims. Tempo hints feed the scaling controller so wave launches respect cluster rhythm.
- **Wave-based scaling & health gates** – The orchestrator stack (`internal/orchestrator/*.go`) coordinates Docker Swarm deployment, health gating (KACHING, BACKBEAT, self), bootstrap pool management, and metrics exports. Scaling decisions are captured through `ScalingMetricsCollector` and surfaced by the `ScalingAPI`.
- **Spec-Kit enterprise plugin** – The spec-kit HTTP client (`internal/composer/spec_kit_client.go`) wraps the external Spec-Kit service with retries, circuit-breaker toggles, and structured artifact ingestion. Outputs are normalised into council artefacts so SLURP/BUBBLE can ingest enterprise deliverables alongside community workflows.
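
As a rough illustration of the assignment-broker bullet above, the sketch below shows one way a replica could merge WHOOSH-supplied overrides into its local configuration. It is not the actual `pkg/config.RuntimeConfig` logic: the `merge_assignment` helper, the field names (`role`, `model`, `bootstrap_peers`, `join_stagger_ms`), the sample multiaddr, and the rule that a missing value means "keep the local default" are all assumptions made for the example.

```python
from copy import deepcopy


def merge_assignment(base_config: dict, overrides: dict) -> dict:
    """Return a new config with non-None override values applied over base_config.

    Hypothetical helper: the real merge rules in CHORUS may differ.
    """
    merged = deepcopy(base_config)
    for key, value in overrides.items():
        if value is None:
            continue  # None is treated as "no override supplied"
        merged[key] = deepcopy(value)
    return merged


# Local defaults baked into the container (illustrative values only)
base = {
    "role": "generalist",
    "model": "default-model",
    "bootstrap_peers": [],
    "join_stagger_ms": 0,
}

# Assignment as it might be returned by the broker (hypothetical shape)
assignment = {
    "role": "senior-software-architect",
    "model": None,
    "bootstrap_peers": ["/ip4/10.0.0.5/udp/9000/quic-v1"],
    "join_stagger_ms": 1500,
}

live = merge_assignment(base, assignment)
```

Returning a fresh dictionary rather than mutating `base` keeps the container defaults available for a later rollback to the last-good configuration.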

## Architecture Overview

### Current Implementation: Council Formation Engine

```
┌─────────────────────────────────────────────────────────────────┐
│                    WHOOSH COUNCIL FORMATION                     │
│             (Issue Detection + Council Composition)             │
└─────────────────────┬───────────────────────────────────────────┘
                      │ detects Design Brief issues
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                        GITEA MONITORING                         │
│          (chorus-entrypoint Labels + Webhook Triggers)          │
└─────────────────────┬───────────────────────────────────────────┘
                      │ triggers council deployment
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                     CHORUS AGENT DEPLOYMENT                     │
│                (Docker Swarm + human-roles.yaml)                │
└─────────────────────┬───────────────────────────────────────────┘
                      │ councils collaborate via P2P
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                    COUNCIL P2P COLLABORATION                    │
│                (HMMM Protocol + UCXL Addressing)                │
└─────────────────────┬───────────────────────────────────────────┘
                      │ produces kickoff artifacts
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                  PROJECT KICKOFF DELIVERABLES                   │
│          (Manifests, DRs, Scaffold Plans, Gate Tests)           │
└─────────────────────────────────────────────────────────────────┘
```

### Future Vision: Autonomous Team Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                 WHOOSH TEAM COMPOSER (Phase 2)                  │
│                  (LLM-Powered Team Formation)                   │
└─────────────────────┬───────────────────────────────────────────┘
                      │ extends council formation to ongoing teams
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                      GITEA TEAM MANAGEMENT                      │
│                (Team Issues + Role Assignments)                 │
└─────────────────────┬───────────────────────────────────────────┘
                      │ agents monitor & self-assign
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                    AUTONOMOUS CHORUS AGENTS                     │
│                (Self-Aware Capability Matching)                 │
└─────────────────────┬───────────────────────────────────────────┘
                      │ join team collaboration channels
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                     P2P TEAM COLLABORATION                      │
│                    (Dedicated Team Channels)                    │
└─────────────────────┬───────────────────────────────────────────┘
                      │ consensus-driven completion
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                        SLURP INTEGRATION                        │
│                 (Artifact Submission + Context)                 │
└─────────────────────────────────────────────────────────────────┘
```

## WHOOSH Team Composer

### Purpose

Uses LLM reasoning to analyze incoming tasks and determine optimal team compositions based on:

- Task complexity and scope
- Required skill domains
- Estimated effort and timeline
- Quality requirements

### Team Composition Logic

**Example Task Analysis:**

```
Task: "Implement secure user authentication system with OAuth2 integration"

LLM Analysis:
- Complexity: High
- Domains: Security, Backend API, Frontend UI, Database, Testing
- Estimated Timeline: 2-3 days
- Quality Requirements: High (security-critical)

Recommended Team Composition:
├── Security Architect (1x)
│   ├── Role: Define security requirements and review implementation
│   ├── Skills: OAuth2, JWT, encryption, security best practices
│   └── AI Model: deepseek-coder-v2 (security focus)
├── Backend Developer (1x)
│   ├── Role: Implement API endpoints and authentication logic
│   ├── Skills: REST APIs, database integration, middleware
│   └── AI Model: qwen2.5-coder:32b (backend specialization)
├── Frontend Developer (1x)
│   ├── Role: Build login UI and authentication flows
│   ├── Skills: React/Vue, state management, form validation
│   └── AI Model: starcoder2:15b (frontend focus)
├── Database Engineer (1x)
│   ├── Role: Design user tables and session management
│   ├── Skills: SQL, database design, performance optimization
│   └── AI Model: granite3-dense:8b (data modeling)
└── QA Engineer (1x)
    ├── Role: Security testing and integration validation
    ├── Skills: Testing frameworks, security scanning, automation
    └── AI Model: phi4:14b (testing focus)
```

### Team Templates

**Pre-defined team configurations for common scenarios:**

1. **Feature Development Team**
   - Backend Developer + Frontend Developer + QA Engineer
2. **Bug Fix Team**
   - Debugger + Code Reviewer
3. **Architecture Design Team**
   - System Architect + Security Architect + Performance Engineer
4. **Documentation Team**
   - Technical Writer + Code Reviewer + Subject Matter Expert
5. **Refactoring Team**
   - Code Analyzer + Refactoring Specialist + QA Engineer

### Scaling and Resilience

WHOOSH is designed for robust, automated scaling and resilience. The system is governed by a set of Service Level Objectives (SLOs) and includes automated testing, failure drills, and operational guardrails to ensure stability under pressure.

#### Golden Signals & SLOs

The following SLOs are defined and monitored, with alerts triggered upon breach:

* **KACHING**: p95 lease issuance < 250 ms; error/429 rate < 1%.
* **Join success**: ≥ 95% of new replicas join mesh within 30 s of container Ready.
* **BackBeat (JetStream)**: per-subject **consumer lag** < 200 msgs; publish acks < 100 ms.
* **Election stability**: ≤ 1 leader change / 10 min per cluster during steady state.
* **Swarm**: task start success ≥ 99%; median start→assigned < 10 s.

Metrics exposed to monitor these SLOs include:

* `chorus_license_lease_latency_ms`, `chorus_license_breaker_open_total`
* `chorus_bootstrap_join_duration_ms`, `chorus_join_success_total{result=...}`
* `backbeat_stream_lag`, `backbeat_ack_latency_ms`
* `whoosh_wave_size`, `whoosh_wave_backoff_ms`, `whoosh_gate_block_seconds`
* `chorus_election_changes_total`

#### Synthetic Scale Test

A nightly, repeatable synthetic scale test is scripted to validate scaling behavior:

1. Scale from N=3 → 3 + 12 in **waves**, following the WHOOSH policy.
2. Hold for 5 minutes under synthetic load (CHORUS pulls a fixed assignment that triggers normal P2P + small workload).
3. Scale back to 3.

**Pass/Fail Criteria**:

* No gate held > 2 min.
* No breaker > 60 s open.
* Join p95 < 25 s.
* 0% orphan tasks after scale-down.

#### Canary Config Reload

To prove the runtime assignment merge works without restarts:

* A prompt/model change is triggered via WHOOSH for a **10% canary** (by assignment).
* Success/error rates and JetStream lag are watched for 2 minutes.
* The change is then rolled out to 100%.

#### Failure Drills

Cheap chaos engineering drills are run to test resilience:

* **KACHING brownout**: Inject 500 ms latency + 5% 429s for 2 minutes. Expect grace-window starts, brief breaker openings, and no P2P join until the lease is acquired. The system should auto-heal and join within 30 s after recovery.
* **Bootstrap peer loss**: Take 50% of the pool out mid-wave. Expect WHOOSH to pause the next wave, with existing joiners still succeeding via a subset. The pool health should recover before continuing.
* **BackBeat clog**: Cap consumer read to simulate lag > threshold. Expect the WHOOSH gate to block scaling, while replicas continue local work without dropping messages.
* **Leader eviction**: Kill the current leader. Verify the stability window prevents thrashing and a new leader is elected in < 5 s.

#### Ops Guardrails

* **Admission control** in WHOOSH (implied by gates): Hard cap on "max replicas added / 5 min".
* **Per-node placement**: Use Swarm labels so waves don’t pile onto one box (e.g., `placement.max_replicas_per_node: 1` for critical roles).
* **Quarantine mode** in CHORUS: When a license fails after grace or bootstrap joins time out, expose `/health?quarantine=1` and refuse task intake until cleared.

#### Rollback & Recovery Runbook

1. **Parameter rollback**: WHOOSH re-points `ASSIGNMENT_EPOCH` to the last-good configuration, and `POST /v1/reload` is sent to all replicas.
2. **Scale rollback**: `target = previous_replicas`. WHOOSH drains the newest assignments first. Requires `join_success ≥ 90%` before any further changes.
3. **KACHING outage**: Flip the cluster to **cached-lease only** for up to 10 minutes, block new waves, and page the on-call team.
4. **Bootstrap meltdown**: Promote 3 healthy workers to **temporary bootstrap** (via label + static list), then resume.

#### WHOOSH Autoscale Policy (Example)

The scaler configuration is kept out of the code in a YAML file:

```yaml
cluster: prod
service: chorus
wave:
  max_per_wave: 8
  min_per_wave: 3
  period_sec: 25
placement:
  max_replicas_per_node: 1
gates:
  kaching:
    p95_latency_ms: 250
    max_error_rate: 0.01
  backbeat:
    max_stream_lag: 200
  bootstrap:
    min_healthy_peers: 3
  join:
    min_success_rate: 0.80
backoff:
  initial_ms: 15000
  factor: 2.0
  jitter: 0.2
  max_ms: 120000
quarantine:
  enable: true
  exit_on: "kaching_ok && bootstrap_ok"
canary:
  fraction: 0.1
  promote_after_sec: 120
```

## CHORUS Agent Self-Organization

### Agent Self-Awareness

Each CHORUS agent maintains awareness of:

- **Primary Capabilities**: Core skills and specializations
- **Secondary Capabilities**: Additional skills they can contribute
- **Current Load**: Active team memberships and availability
- **Performance History**: Success rates and peer feedback
- **Preferred AI Models**: Best-performing models for their tasks
- **Collaboration Style**: Team role preferences

### Autonomous Team Joining Process

1. **Monitoring Phase**
   - Agents continuously monitor GITEA for team formation issues
   - Filter by matching capabilities and availability
   - Assess team composition gaps they could fill

2. **Self-Assessment Phase**
   ```
   Agent Self-Evaluation:
   - "This team needs a frontend developer"
   - "I have React/TypeScript skills (confidence: 85%)"
   - "My current load: 1 active team (capacity available)"
   - "Team timeline: 3 days (fits my schedule)"
   - "Decision: JOIN TEAM"
   ```

3. **Team Application Phase**
   - Agent comments on GITEA issue with capability summary
   - Provides availability window and estimated contribution
   - Existing team members can review and approve/decline

4. **Integration Phase**
   - Agent joins P2P team channel
   - Introduces capabilities and proposes initial approach
   - Begins collaborative work with team

### Capability Matching Algorithm

```python
def assess_team_fit(agent, team_requirement):
    """Weighted fit score; each helper is assumed to return a value in [0, 1]."""
    # Overlap between the agent's capabilities and the skills the team needs
    skill_match = calculate_skill_overlap(agent.capabilities, team_requirement.skills)
    # Whether the agent's schedule fits the team timeline
    availability_match = check_schedule_compatibility(agent.schedule, team_requirement.timeline)
    # Past collaboration with members already on the team
    team_chemistry = assess_collaboration_history(agent, team_requirement.existing_members)
    # Skills weigh most (50%), then availability (30%), then chemistry (20%)
    fit_score = (skill_match * 0.5) + (availability_match * 0.3) + (team_chemistry * 0.2)
    return fit_score
```

## TODO

- Team Composer API: Implement llama3.1-based team analysis as a Dockerized service (task analysis → capability mapping → team proposals) with unit/integration tests and metrics.
- SLURP integration: Add endpoints for curated bundle ingest/retrieval and document contracts/auth; validate E2E with BUBBLE/DHT.
- CHORUS connectivity: Enable and validate live consensus/task flows (configure `chorus_endpoints`) with health checks and error handling; remove reliance on mocked data.
- Replace mocked UI test routes with real backend calls for agent lifecycle and health checks.
- Hardware-driven model selection: Add agent-side hardware discovery to drive model selection; avoid hardcoded cluster IPs or model names in configs.
- MCP integration later: Keep MCP optional; maintain clean API boundary for now.

## GITEA Team Management

### Team Issue Structure

Each team is represented by a GITEA issue with structured metadata:

```yaml
Title: "Team Formation: Secure Authentication System Implementation"
Labels:
  - team:auth-system-v2
  - complexity:high
  - timeline:3-days
  - domain:security
  - domain:backend
  - domain:frontend
Team Composition:
  - [ ] Security Architect (required)
  - [x] Backend Developer (@agent-backend-specialist)
  - [ ] Frontend Developer (required)
  - [ ] QA Engineer (required)
  - [ ] Code Reviewer (optional)
Timeline: 2024-08-15 to 2024-08-18
P2P Channel: team-auth-system-v2-channel
SLURP Address: ucxl://teams/auth-system-v2/artifacts
```

### Role Status Management

- **Open**: Role available for assignment
- **Applied**: Agent has expressed interest
- **Assigned**: Agent confirmed for role
- **Active**: Agent currently working
- **Completed**: Role deliverables finished
- **Blocked**: Role waiting on dependencies

### Progress Tracking

Teams update GITEA issue with:

- Daily progress summaries
- Milestone achievements
- Blocker identification
- Resource requests
- Quality gate completions

## P2P Team Collaboration Channels

#### HMMM in the loop

**Reasoning channels, not just chat.** Team channels carry **structured thought** (HMMM) as well as messages: intermediate chains, critiques, and mini-memos are timestamped, attributed, and ingested by SLURP for later DRs. This enables consensus with evidence, not vibes.

### Channel Architecture

Each team gets dedicated communication infrastructure:

```
Team Channel: team-auth-system-v2-channel
├── Topic Streams:
│   ├── #planning (initial design discussions)
│   ├── #implementation (development coordination)
│   ├── #review (code/design reviews)
│   ├── #testing (QA coordination)
│   └── #integration (final assembly)
├── File Sharing: Distributed artifact storage
├── Screen Sharing: Real-time collaboration sessions
└── Voice Channels: Synchronous discussion capability
```

### Context Preservation

All team communications are automatically:

- Timestamped and attributed to agents
- Categorized by topic stream
- Indexed for searchability
- Ingested by SLURP into Hypercore distributed log

## Consensus Mechanisms

> For quorum rules, vote semantics (green/yellow/red), tempo (beats), and the front‑of‑house review/delivery API contracts, see the WHOOSH Review & Consensus Policy: [../Policy/WHOOSH-Review-Policy.md](../Policy/WHOOSH-Review-Policy.md).
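
The voting thresholds listed under Democratic Decision Making in this section (simple majority, 2/3 supermajority, unanimous) can be sketched as a small threshold check. This is an illustration only: the `THRESHOLDS` mapping, the decision-type names, and the `decision_passes` function are hypothetical, and the authoritative quorum rules and vote semantics remain those of the Review Policy linked above.

```python
from fractions import Fraction

# Hypothetical mapping from decision type to the required share of approvals
THRESHOLDS = {
    "feature": Fraction(1, 2),       # simple majority: strictly more than half
    "architecture": Fraction(2, 3),  # supermajority: at least two thirds
    "security": Fraction(1, 1),      # unanimous: every voter approves
}


def decision_passes(decision_type: str, approvals: int, voters: int) -> bool:
    """Return True when the approvals meet the threshold for this decision type."""
    if voters <= 0:
        return False  # no voters means no decision can be reached
    share = Fraction(approvals, voters)
    required = THRESHOLDS[decision_type]
    if decision_type == "feature":
        return share > required  # a tie is not a majority
    return share >= required


if __name__ == "__main__":
    print(decision_passes("feature", 3, 5))
    print(decision_passes("architecture", 3, 5))
    print(decision_passes("security", 5, 5))
```

Exact fractions are used instead of floats so that a vote of 4 out of 6 compares equal to the 2/3 threshold without rounding surprises. Deadlock handling (the Technical Lead Override) is deliberately left out of the sketch.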

### Democratic Decision Making

Refer to the Review Policy for project‑configurable defaults and API shapes: [../Policy/WHOOSH-Review-Policy.md](../Policy/WHOOSH-Review-Policy.md).

**1. Voting Systems**

- **Simple Majority**: Basic feature decisions
- **Supermajority (2/3)**: Architecture changes
- **Unanimous**: Security-critical decisions
- **Technical Lead Override**: Deadlock resolution

**2. Quality Gates**

Before task completion, teams must achieve consensus on:

- **Functional Requirements**: All specified features implemented
- **Quality Standards**: Code review, testing, documentation complete
- **Security Review**: Security-sensitive changes approved by security role
- **Performance Benchmarks**: Performance requirements met
- **Integration Testing**: End-to-end functionality verified

**3. Completion Criteria**

```yaml
Completion Checklist:
  - [ ] All assigned roles have marked deliverables complete
  - [ ] Peer reviews completed by at least 2 team members
  - [ ] Automated tests passing (unit + integration)
  - [ ] Security review approved (if applicable)
  - [ ] Documentation updated
  - [ ] Team consensus vote: "Ready for submission" (majority required)
```

### Conflict Resolution

**1. Technical Disagreements**

- Structured debate with evidence presentation
- Prototype/spike development for comparison
- Expert agent consultation by posting to ...
- Escalation to WHOOSH Admin User (human) for tie-breaking

**2. Resource Conflicts**

- Workload re-balancing among team members
- Additional agent recruitment if needed
- Scope reduction with consensus approval and Issue lodgement

**3. Quality Disputes**

- Independent review by WHOOSH Admin User (human)
- Automated quality metric evaluation
- Compromise solution development
- Inclusion of an innovation agent in the team

## CHORUS Integration

### UCXL-based Messaging

**Address Structure**

E.g., for the following address:

ucxl://any:role@project:task/#/

**@project:task** *is* the Team ID.
This means any inter-agent discussions published to **@project:task** are seen by those CHORUS team members.

We use the *[antennae protocol]* (for libp2p) for pub/sub messaging between agents, sending each agent's reasoning component to the other team members.

So a communications log might look like this...

publish to chat room **@website:architecture-design**
PeerID = D0019:senior-software-architect

```
{
  "channel": "@website:architecture-design",
  "from-agentid": "D0019:senior-software-architect",
  "responding-to": "None",
  "thoughts": " ... "
}
```

So, as noted in our system prompts to every agent, between each step we gather any thoughts of our peers:

GET from API endpoint `/api/v1/antennae/@website:architecture-design`

## Implementation Phases

### Phase 1: Foundation (WHOOSH Team Composer)

- **LLM-powered task analysis service**
- **Team composition templates and logic**
- **GITEA issue creation with team metadata**
- **Basic team formation workflows**

### Phase 2: Agent Enhancement (CHORUS Self-Organization)

- **Agent capability self-assessment systems**
- **GITEA monitoring and team application logic**
- **Autonomous team joining decision algorithms**
- **Agent-to-agent communication protocols**

### Phase 3: Collaboration Infrastructure (P2P Channels)

- **Team-specific communication channel creation**
- **Message routing and topic organization**
- **Real-time collaboration tools integration**
- **Communication archival for SLURP submission**

### Phase 4: Consensus Systems (Democratic Decision Making)

- **Voting mechanisms and quorum rules**
- **Quality gate automation and verification**
- **Conflict resolution procedures**
- **Completion criteria validation**

### Phase 5: Integration (SLURP Connectivity)

- **Artifact packaging and submission workflows**
- **UCXL address management and organization**
- **Context preservation and knowledge extraction**
- **Performance analytics and optimization**

## Benefits & Considerations

### Key Benefits

- ✅ **Fault Tolerance**: No single points of failure - teams operate independently
- ✅ **Scalability**: Teams form and dissolve dynamically based on demand
- ✅ **Quality**: Consensus-driven decisions improve deliverable quality
- ✅ **Knowledge Preservation**: Full context captured for future learning
- ✅ **Natural Collaboration**: Mirrors effective human team patterns
- ✅ **Autonomous Operation**: Minimal human intervention required
- ✅ **Adaptive**: Teams adjust composition based on task evolution
- ✅ **Observable**: Full transparency through GITEA and P2P channels

### Considerations & Challenges

- ⚠️ **Initial Complexity**: Sophisticated system requiring careful implementation
- ⚠️ **Coordination Overhead**: Team formation and consensus processes take time
- ⚠️ **Agent Training**: CHORUS agents need enhanced self-awareness capabilities
- ⚠️ **Network Dependencies**: P2P channels require reliable connectivity
- ⚠️ **Quality Variance**: Team effectiveness may vary based on composition
- ⚠️ **Resource Competition**: Popular agents may become bottlenecks
- ⚠️ **Conflict Resolution**: Complex disputes may require escalation mechanisms

### Success Metrics

**Team Formation Efficiency:**

- Time from task request to team formation
- Percentage of teams that form successfully
- Quality of initial team composition decisions

**Collaboration Effectiveness:**

- Team productivity metrics (velocity, quality)
- Communication frequency and engagement
- Consensus achievement rates

**Deliverable Quality:**

- Automated quality metrics (test coverage, security scores)
- Peer review feedback scores
- Stakeholder satisfaction ratings

**System Resilience:**

- Team reformation after agent failures
- Graceful degradation under load
- Recovery from network partitions

**Knowledge Accumulation:**

- Reuse of solutions and patterns
- Agent skill development over time
- Continuous improvement in team formation

## Future Evolution

### Advanced Capabilities

- **Cross-Team Coordination**: Teams collaborating on larger initiatives
- **Agent Specialization**: Agents developing deep expertise in specific domains
- **Dynamic Reconfiguration**: Teams adapting composition mid-task
- **Predictive Formation**: AI predicting optimal teams before task assignment
- **Quality Prediction**: Estimating deliverable quality during team formation

### Integration Opportunities

- **External Stakeholders**: Human team members or external AI services
- **Compliance Integration**: Automated regulatory and policy compliance
- **Performance Optimization**: ML-driven team composition optimization
- **Resource Management**: Intelligent compute and storage allocation
- **Governance**: Auditable decision trails and accountability mechanisms

This evolution represents a fundamental shift toward truly autonomous AI development capabilities that augment, and may eventually replace, traditional software development team structures, while maintaining the collaborative, consensus-driven decision-making that ensures high-quality outcomes.

---

# 🏗️ Technical Architecture

## Current Implementation Architecture

WHOOSH is currently implemented as a specialized council formation system integrated into the existing CHORUS stack, with clear separation between detection, composition, deployment, and monitoring concerns.

### High-Level System Flow

```mermaid
graph TB
    subgraph "Gitea Repository Monitoring"
        REPO[Repository] --> ISSUE[Design Brief Issue]
        ISSUE --> LABEL[chorus-entrypoint]
    end

    subgraph "WHOOSH Council Formation"
        LABEL --> MONITOR[WHOOSH Monitor]
        MONITOR --> DETECT[Issue Detection]
        DETECT --> COMPOSE[Council Composition]
        COMPOSE --> ROLES[human-roles.yaml]
    end

    subgraph "CHORUS Deployment"
        COMPOSE --> DEPLOY[Docker Swarm Deploy]
        DEPLOY --> CHORUS1[CHORUS Agent 1]
        DEPLOY --> CHORUS2[CHORUS Agent 2]
        DEPLOY --> CHORUS3[CHORUS Agent N]
    end

    subgraph "P2P Collaboration"
        CHORUS1 --> P2P[P2P Network]
        CHORUS2 --> P2P
        CHORUS3 --> P2P
        P2P --> ARTIFACTS[Council Artifacts]
    end

    subgraph "Persistence Layer"
        DETECT --> PGDB[(PostgreSQL)]
        ARTIFACTS --> PGDB
        PGDB --> COUNCILS[Councils Table]
        PGDB --> AGENTS[Council Agents Table]
        PGDB --> OUTPUTS[Council Artifacts Table]
    end
```

## Ecosystem Integration Points

### BZZZ Task Management Integration

**Current**: Council artifacts provide structured input for BZZZ task creation

- Council deliverables (manifests, DRs, scaffold plans) inform task breakdown structure
- Project context and constraints flow from council decisions to task specifications
- Council role recommendations influence team composition for ongoing development

**Future**: Direct handoff mechanisms between councils and BZZZ teams

- Automatic task generation based on scaffold plans
- Agent transition from council roles to development team roles
- Progress tracking continuity from kickoff through delivery

### SLURP Knowledge Integration

**Current**: Council communications and artifacts preserved via UCXL addressing

- All HMMM protocol messages stored with proper addressing for future reference
- Decision rationale and evidence captured in structured format
- Artifacts tagged with council ID and role attribution

**Future**: Enhanced knowledge graph integration

- Automated DR generation from council consensus decisions
- Cross-project pattern recognition and reuse recommendations
- Council effectiveness analytics based on project outcomes

### CHORUS Agent Ecosystem

**Current**: CHORUS agents configured with council-specific roles and context

- Role identifiers passed via environment variables from `human-roles.yaml`
- Design Brief content provided as task context
- P2P network access for inter-council communication

**Future**: Enhanced agent capabilities for team transitions

- Agent memory persistence across council and team phases
- Specialized council expertise development over time
- Cross-council knowledge sharing and best practice propagation

## Institutional Quality Gates

- **Provenance present:** artifacts reference UCXL addresses and cite prior DRs.
- **Secrets clean:** SHHH pass on channel logs and artifacts.
- **Temporal pin:** decisions pin the **addressed** time slice (`~~/`, `#/`) used.

## Council Formation → SLURP Integration

> WHOOSH composes councils → HMMM captures structured reasoning → **SLURP** ingests and packages kickoff artifacts for DR publication and future project reference.

### Current Artifact Flow

```
Design Brief Detection → Council Formation → HMMM Reasoning → Artifact Production → SLURP Ingestion
                                                                                    ↓
                                                                                    UCXL-addressed storage
                                                                                    ↓
                                                                                    Future project reference
```

## Current Services Architecture

### WHOOSH Council Formation Stack

```mermaid
graph TB
    subgraph "CHORUS Unified Stack"
        subgraph "Frontend Layer"
            UI[WHOOSH Dashboard]
            WS[WebSocket Council Updates]
            API[Council API]
        end

        subgraph "WHOOSH Services"
            MONITOR[Repository Monitor]
            COMPOSER[Council Composer]
            DEPLOYER[Agent Deployer]
            TRACKER[Progress Tracker]
        end

        subgraph "Data Layer"
            POSTGRES[(PostgreSQL)]
            COUNCILS[Councils Table]
            AGENTS[Council Agents Table]
            ARTIFACTS[Artifacts Table]
        end

        subgraph "CHORUS Agent Network"
            LEAD[Lead Design Director]
            ARCH[Senior Software Architect]
            SEC[Security Expert]
            DB[Database Engineer]
            STRAT[Marketing Strategist]
        end
    end

    subgraph "External Integrations"
        GITEA[Gitea Repository]
        DOCKER[Docker Swarm]
        P2P[P2P Network]
        SLURP[SLURP Knowledge Store]
    end

    GITEA --> MONITOR
    MONITOR --> COMPOSER
    COMPOSER --> DEPLOYER
    DEPLOYER --> DOCKER
    DOCKER --> LEAD
    DOCKER --> ARCH
    DOCKER --> SEC
    DOCKER --> DB
    DOCKER --> STRAT
    LEAD --> P2P
    ARCH --> P2P
    SEC --> P2P
    DB --> P2P
    STRAT --> P2P
    P2P --> ARTIFACTS
    ARTIFACTS --> SLURP
    MONITOR --> POSTGRES
    COMPOSER --> POSTGRES
    TRACKER --> POSTGRES
```

### Future: Full Autonomous Team Architecture

```mermaid
graph TB
    subgraph "Enhanced WHOOSH Platform"
        subgraph "Frontend Layer"
            DASH[Team Dashboard]
            METRICS[Analytics UI]
            CONTROL[Control Panel]
        end

        subgraph "Core Services"
            TEAM_COMPOSER[Team Composer]
            AGENT_MANAGER[Agent Manager]
            WORKFLOW_ENGINE[Workflow Engine]
            CONSENSUS[Consensus Engine]
        end

        subgraph "Intelligence Layer"
            CAPABILITY_MATCHER[Capability Matcher]
            PERFORMANCE_ANALYZER[Performance Analyzer]
            PREDICTOR[Team Success Predictor]
        end
    end

    subgraph "Autonomous Agent Ecosystem"
        SELF_ORG[Self-Organizing Agents]
        SPECIALIST[Domain Specialists]
        GENERALIST[Generalist Agents]
    end
```

## Component Specifications

### Current Implementation Components

#### 🗺️ Repository Monitor

**Purpose**: Continuously monitors Gitea repositories for Design Brief issues that trigger council formation.

**Key Responsibilities**:

- Webhook-based repository event processing
- Design Brief issue detection (`chorus-entrypoint` labels)
- Repository sync status management (initial vs incremental)
- Issue content extraction and context building

**Current API Endpoints**:

```bash
GET  /api/repositories              # List monitored repositories
POST /api/repositories              # Add repository to monitoring
GET  /api/repositories/{id}/issues  # Get repository issues
POST /webhooks/gitea                # Gitea webhook endpoint
```

**Database Schema (Current)**:

```sql
-- Repository monitoring
CREATE TABLE repositories (
    id UUID PRIMARY KEY,
    full_name VARCHAR(255) NOT NULL,
    gitea_id INTEGER NOT NULL,
    sync_status VARCHAR(50) DEFAULT 'pending',
    last_issue_sync TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Design Brief issues that trigger councils
CREATE TABLE issues (
    id UUID PRIMARY KEY,
    repository_id UUID REFERENCES repositories(id),
    gitea_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    body TEXT,
    labels JSONB,
    state VARCHAR(20) DEFAULT 'open',
    created_at TIMESTAMP DEFAULT NOW()
);
```

#### 🏢 Council Composer

**Purpose**: Analyzes Design Briefs and determines optimal council composition based on project requirements.

**Key Responsibilities**:

- Design Brief content analysis
- Role mapping from `human-roles.yaml`
- Council size and composition optimization
- Resource availability checking

**Current Implementation**:

```go
// Council composition logic
func ComposeCouncil(designBrief DesignBrief) (*CouncilComposition, error) {
    // Classify the project and pull out the domains the brief touches
    projectType := analyzeProjectType(designBrief.Content)
    requiredDomains := extractDomains(designBrief.Content)

    // Every project type has base roles; domains add specialists on top
    baseRoles := getBaseRolesForType(projectType)
    additionalRoles := getAdditionalRoles(requiredDomains)

    return &CouncilComposition{
        Roles:          append(baseRoles, additionalRoles...),
        Size:           len(baseRoles) + len(additionalRoles),
        ProjectContext: designBrief.Content,
    }, nil
}
```

#### 🚀 Agent Deployer

**Purpose**: Deploys CHORUS agents via Docker Swarm with council-specific configuration.

**Key Responsibilities**:

- Docker Swarm service creation and management
- Agent environment variable configuration
- P2P network setup for council communication
- Service health monitoring and recovery

**Current Deployment Logic**:

```go
// Docker service deployment for council agents.
// The task context parameter is named taskContext so that it does not shadow
// the standard library context package used in the ServiceCreate call.
func DeployCouncilAgent(role string, councilID string, taskContext string) error {
    serviceName := fmt.Sprintf("council-%s-%s", councilID, role)

    serviceSpec := swarm.ServiceSpec{
        Annotations: swarm.Annotations{
            Name: serviceName,
        },
        TaskTemplate: swarm.TaskSpec{
            ContainerSpec: &swarm.ContainerSpec{
                Image: "anthonyrawlins/chorus:latest",
                Env: []string{
                    fmt.Sprintf("CHORUS_ROLE=%s", role),
                    fmt.Sprintf("CHORUS_TASK_CONTEXT=%s", taskContext),
                    fmt.Sprintf("P2P_NETWORK=council-%s", councilID),
                },
                Mounts: []mount.Mount{{
                    Type:   mount.TypeBind,
                    Source: "/rust/containers/WHOOSH/prompts",
                    Target: "/app/prompts",
                }},
            },
        },
    }

    // ServiceCreate returns a response and an error; only the error is propagated.
    _, err := dockerClient.ServiceCreate(context.Background(), serviceSpec, types.ServiceCreateOptions{})
    return err
}
```

#### 📊 Progress Tracker

**Purpose**: Monitors council progress and artifact production throughout the kickoff process.

**Key Responsibilities**:
- Council agent deployment status tracking
- Artifact production monitoring
- Decision consensus tracking
- Council completion detection
- Error handling and recovery coordination

**Database Tracking (Current)**:
```sql
-- Council progress tracking
CREATE TABLE council_agents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    council_id UUID REFERENCES councils(id),
    role_name VARCHAR(100) NOT NULL,
    service_id VARCHAR(255), -- Docker service ID
    status VARCHAR(50) DEFAULT 'pending',
    deployed_at TIMESTAMP,
    UNIQUE(council_id, role_name)
);

CREATE TABLE council_artifacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    council_id UUID REFERENCES councils(id),
    artifact_type VARCHAR(50) NOT NULL,
    content TEXT,
    produced_by VARCHAR(255),
    status VARCHAR(50) DEFAULT 'draft',
    produced_at TIMESTAMP DEFAULT NOW()
);
```

### Future Components

#### 🤖 Enhanced Agent Manager (Planned)
**Purpose**: Manages autonomous agent capabilities, performance, and self-organization for ongoing teams.

**Future Responsibilities**:
- Agent capability self-assessment
- Dynamic team joining algorithms
- Performance tracking and optimization
- Cross-team agent coordination
- Predictive team formation

**Future Agent Self-Registration Protocol**:
```json
{
  "agent_id": "chorus-agent-001",
  "name": "Senior Software Architect",
  "current_role": "senior-software-architect",
  "specializations": ["microservices", "system-design", "scalability"],
  "council_history": [
    {"council_id": "marketplace-kickoff", "role": "architect", "rating": 4.8},
    {"council_id": "analytics-platform", "role": "architect", "rating": 4.9}
  ],
  "capabilities": {
    "architecture_design": 0.95,
    "technology_selection": 0.90,
    "team_leadership": 0.85,
    "consensus_building": 0.88
  },
  "availability": {
    "current_councils": 1,
    "max_concurrent": 3,
    "preferred_domains": ["fintech", "ecommerce", "enterprise"]
  },
  "learning_metrics": {
    "councils_completed": 47,
    "avg_artifact_quality": 4.7,
    "consensus_success_rate": 0.92,
    "stakeholder_satisfaction": 4.8
  }
}
```
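
As a rough illustration of how a coordinator might consume a registration payload of this shape, the Go sketch below decodes the `capabilities` and `availability` fields and checks whether an agent could take on another council. The `AgentRegistration` type and the `ParseRegistration`, `HasCapacity`, and `MeetsCapability` names are hypothetical and not part of WHOOSH; only the JSON field names come from the example payload, and the capacity rule (`current_councils < max_concurrent`) is an assumption about how those fields would be interpreted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AgentRegistration mirrors a subset of the self-registration payload.
// Hypothetical type: only the JSON field names are taken from the document.
type AgentRegistration struct {
	AgentID      string             `json:"agent_id"`
	CurrentRole  string             `json:"current_role"`
	Capabilities map[string]float64 `json:"capabilities"`
	Availability struct {
		CurrentCouncils int `json:"current_councils"`
		MaxConcurrent   int `json:"max_concurrent"`
	} `json:"availability"`
}

// ParseRegistration decodes a JSON registration payload.
func ParseRegistration(payload []byte) (AgentRegistration, error) {
	var reg AgentRegistration
	err := json.Unmarshal(payload, &reg)
	return reg, err
}

// HasCapacity reports whether the agent can join one more council
// (assumed rule: current_councils must be below max_concurrent).
func (a AgentRegistration) HasCapacity() bool {
	return a.Availability.CurrentCouncils < a.Availability.MaxConcurrent
}

// MeetsCapability reports whether the agent self-reports the named
// capability at or above the given threshold. Unknown capabilities never match.
func (a AgentRegistration) MeetsCapability(name string, threshold float64) bool {
	score, ok := a.Capabilities[name]
	return ok && score >= threshold
}

func main() {
	payload := []byte(`{
		"agent_id": "chorus-agent-001",
		"current_role": "senior-software-architect",
		"capabilities": {"architecture_design": 0.95, "team_leadership": 0.85},
		"availability": {"current_councils": 1, "max_concurrent": 3}
	}`)

	reg, err := ParseRegistration(payload)
	if err != nil {
		fmt.Println("invalid registration:", err)
		return
	}
	fmt.Println(reg.AgentID, reg.HasCapacity(), reg.MeetsCapability("architecture_design", 0.9))
}
```

Keeping the capacity and capability checks as small methods on the decoded payload means a future team-joining algorithm could filter candidates without knowing the wire format.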

**Current Council Health Monitoring**:
```go
// Council agent health check
type CouncilAgentHealth struct {
    AgentID        string    `json:"agent_id"`
    CouncilID      string    `json:"council_id"`
    Role           string    `json:"role"`
    ServiceID      string    `json:"service_id"`
    Status         string    `json:"status"`
    LastSeen       time.Time `json:"last_seen"`
    ArtifactsCount int       `json:"artifacts_produced"`
    P2PConnected   bool      `json:"p2p_connected"`
    ErrorMessage   *string   `json:"error_message,omitempty"`
}

// Future: Enhanced agent health with self-awareness
type AutonomousAgentHealth struct {
    CouncilAgentHealth
    SelfAssessment struct {
        TaskFit       float64 `json:"task_fit_confidence"`
        Workload      float64 `json:"current_workload_percent"`
        Collaboration float64 `json:"team_collaboration_score"`
        LearningRate  float64 `json:"recent_learning_velocity"`
    } `json:"self_assessment"`
}
```

## Current Data Architecture

### 🗄️ Council Database Schema (Implemented)

**Current Tables**:
```sql
-- Council management (from migrations/005_add_council_tables.up.sql)
CREATE TABLE councils (
    id UUID PRIMARY KEY,
    project_name VARCHAR(255) NOT NULL,
    repository VARCHAR(500) NOT NULL,
    project_brief TEXT NOT NULL,
    constraints TEXT,
    tech_limits TEXT,
    compliance_notes TEXT,
    targets TEXT,
    status VARCHAR(50) NOT NULL DEFAULT 'forming',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    task_id UUID REFERENCES tasks(id)
);

-- Council agent tracking
CREATE TABLE council_agents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    council_id UUID NOT NULL REFERENCES councils(id),
    role_name VARCHAR(100) NOT NULL,
    agent_name VARCHAR(255) NOT NULL,
    deployed BOOLEAN NOT NULL DEFAULT false,
    service_id VARCHAR(255), -- Docker service ID
    status VARCHAR(50) NOT NULL DEFAULT 'pending',
    UNIQUE(council_id, role_name)
);

-- Council artifact production
CREATE TABLE council_artifacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    council_id UUID NOT NULL REFERENCES councils(id),
    artifact_type VARCHAR(50) NOT NULL, -- kickoff_manifest, seminal_dr, etc.
    content TEXT,
    produced_by VARCHAR(255),
    status VARCHAR(50) NOT NULL DEFAULT 'draft',
    produced_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Council decision tracking
CREATE TABLE council_decisions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    council_id UUID NOT NULL REFERENCES councils(id),
    decision_type VARCHAR(50) NOT NULL,
    decision_title VARCHAR(255) NOT NULL,
    options JSONB,
    chosen_option JSONB,
    votes JSONB,
    decided_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

**Performance Indexes**:
```sql
-- Optimized for council operations
CREATE INDEX idx_councils_status ON councils(status);
CREATE INDEX idx_councils_repository ON councils(repository);
CREATE INDEX idx_council_agents_council_id ON council_agents(council_id);
CREATE INDEX idx_council_agents_deployed ON council_agents(deployed);
CREATE INDEX idx_council_artifacts_type ON council_artifacts(artifact_type);
```

### Future: Enhanced Team Database Schema

**Planned Extensions for Autonomous Teams**:
```sql
-- Team capability tracking (future)
CREATE TABLE agent_capabilities (
    agent_id UUID,
    capability_name VARCHAR(100),
    proficiency_score FLOAT,
    confidence_level FLOAT,
    last_updated TIMESTAMP
);

-- Team performance metrics (future)
CREATE TABLE team_performance (
    team_id UUID,
    project_id UUID,
    success_metrics JSONB,
    completion_time INTERVAL,
    quality_score FLOAT,
    stakeholder_satisfaction FLOAT
);
```

## Council Communication Architecture

### Current: P2P Council Network

**HMMM Protocol Integration**:
```typescript
interface CouncilMessage {
  messageId: string;
  councilId: string;
  fromRole: string;
  toRoles: string[];        // Broadcast or targeted
  messageType: 'reasoning' | 'decision' | 'artifact' | 'consensus';
  content: {
    thinking: string;       // HMMM reasoning chain
    evidence: any[];        // Supporting data
    recommendation?: any;   // Proposed action/decision
    confidence: number;     // 0.0 to 1.0
  };
  ucxlAddress: string;      // UCXL addressing
  timestamp: string;
}

// Example council communication
const exampleMessage: CouncilMessage = {
  "messageId": "msg_council_001",
  "councilId": "marketplace-kickoff",
  "fromRole": "senior-software-architect",
  "toRoles": ["lead-design-director", "database-engineer"],
  "messageType": "reasoning",
  "content": {
    "thinking": "Given the multi-vendor requirements, microservices architecture provides necessary isolation...",
    "evidence": ["scalability-requirements", "vendor-isolation-needs"],
    "recommendation": "microservices-with-api-gateway",
    "confidence": 0.87
  },
  "ucxlAddress": "ucxl://senior-software-architect@marketplace:kickoff#architecture/",
  "timestamp": "2025-01-12T10:30:00Z"
};
```

### Future: Enhanced Team Communication

**Autonomous Team Coordination**:
```typescript
interface TeamCoordinationMessage {
  teamId: string;
  phase: 'planning' | 'execution' | 'review' | 'integration';
  priority: 'low' | 'medium' | 'high' | 'critical';
  requiresConsensus: boolean;
  votingDeadline?: string;
  escalationPath?: string[];
}
```

### 📡 Event Streaming

**Event Bus Architecture**:
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, Optional


@dataclass
class WHOOSHEvent:
    id: str
    type: str
    source: str
    timestamp: datetime
    data: Dict[str, Any]
    correlation_id: Optional[str] = None


class EventBus:
    async def publish(self, event: WHOOSHEvent) -> None:
        """Publish event to all subscribers"""

    async def subscribe(self, event_type: str, handler: Callable) -> str:
        """Subscribe to specific event types"""

    async def unsubscribe(self, subscription_id: str) -> None:
        """Remove subscription"""
```

**Event Types**:
```python
# Agent Events
AGENT_REGISTERED = "agent.registered"
AGENT_STATUS_CHANGED = "agent.status_changed"
AGENT_PERFORMANCE_UPDATE = "agent.performance_update"

# Task Events
TASK_CREATED = "task.created"
TASK_ASSIGNED = "task.assigned"
TASK_STARTED = "task.started"
TASK_COMPLETED = "task.completed"
TASK_FAILED = "task.failed"

# Workflow Events
WORKFLOW_EXECUTION_STARTED = "workflow.execution_started"
WORKFLOW_NODE_COMPLETED = "workflow.node_completed"
WORKFLOW_EXECUTION_COMPLETED = "workflow.execution_completed"

# System Events
SYSTEM_ALERT = "system.alert"
SYSTEM_MAINTENANCE = "system.maintenance"
```

## Security Architecture

### 🔒 Authentication & Authorization

**JWT Token Structure**:
```json
{
  "sub": "user_id",
  "iat": 1625097600,
  "exp": 1625184000,
  "roles": ["admin", "developer"],
  "permissions": [
    "workflows.create",
    "agents.manage",
    "executions.view"
  ],
  "tenant": "organization_id"
}
```

**Permission Matrix**:
```yaml
roles:
  admin:
    permissions: ["*"]
    description: "Full system access"

  developer:
    permissions:
      - "workflows.*"
      - "executions.*"
      - "agents.view"
      - "tasks.create"
    description: "Development and execution access"

  viewer:
    permissions:
      - "workflows.view"
      - "executions.view"
      - "agents.view"
    description: "Read-only access"
```

### 🛡️ API Security

**Rate Limiting**:
```python
# Rate limits by endpoint and user role
RATE_LIMITS = {
    "api.workflows.create": {"admin": 100, "developer": 50, "viewer": 0},
    "api.executions.start": {"admin": 200, "developer": 100, "viewer": 0},
    "api.agents.register": {"admin": 10, "developer": 0, "viewer": 0},
}
```

**Input Validation**:
```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, validator  # Pydantic v1-style validators


class WorkflowCreateRequest(BaseModel):
    name: str
    description: Optional[str]
    n8n_data: Dict[str, Any]

    @validator('name')
    def validate_name(cls, v):
        if len(v) < 3 or len(v) > 255:
            raise ValueError('Name must be 3-255 characters')
        return v

    @validator('n8n_data')
    def validate_n8n_data(cls, v):
        required_fields = ['nodes', 'connections']
        if not all(field in v for field in required_fields):
            raise ValueError('Invalid n8n workflow format')
        return v
```

## Deployment Architecture

### 🐳 Container Strategy

**Docker Compose Structure**:
```yaml
version: '3.8'
services:
  whoosh-coordinator:
    image: whoosh/coordinator:latest
    environment:
      # Credentials match the postgres service below
      - DATABASE_URL=postgresql://whoosh:${DB_PASSWORD}@postgres:5432/whoosh
      - REDIS_URL=redis://redis:6379
    depends_on: [postgres, redis]

  whoosh-frontend:
    image: whoosh/frontend:latest
    environment:
      - API_URL=http://whoosh-coordinator:8000
    depends_on: [whoosh-coordinator]

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=whoosh
      - POSTGRES_USER=whoosh
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana_data:/var/lib/grafana

# Named volumes referenced by the services above
volumes:
  postgres_data:
  redis_data:
  grafana_data:
```

### 🌐 Network Architecture

**Production Network Topology**:
```
                Internet
                    ↓
[Traefik Load Balancer] (SSL Termination)
                    ↓
        [tengig Overlay Network]
                    ↓
┌─────────────────────────────────────┐
│ WHOOSH Application Services         │
│ ├── Frontend (React)                │
│ ├── Backend API (FastAPI)           │
│ ├── WebSocket Gateway               │
│ └── Task Queue Workers              │
└─────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────┐
│ Data Services                       │
│ ├── PostgreSQL (Primary DB)         │
│ ├── Redis (Cache + Sessions)        │
│ ├── InfluxDB (Metrics)              │
│ └── Prometheus (Monitoring)         │
└─────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────┐
│ AI Agent Network (2-node cluster)   │
│ ├── WALNUT (192.168.1.27:11434)     │
│ │   └── ROCm (RX 9060 XT)           │
│ └── ACACIA (192.168.1.72:11434)     │
│     └── CUDA (RTX 2080 Super)       │
└─────────────────────────────────────┘
```

## Performance Considerations

### 🚀 Optimization Strategies

**Database Optimization**:
- Connection pooling with asyncpg
- Query optimization with proper indexing
- Time-series data partitioning for metrics
- Read replicas for analytics queries

**Caching Strategy**:
- Redis for session and temporary data
- Application-level caching for expensive computations
- CDN for static assets
- Database query result caching

**Concurrency Management**:
- AsyncIO for I/O-bound operations
- Connection pools for database and HTTP clients
- Semaphores for limiting concurrent agent requests
- Queue-based task processing

### 📊 Monitoring & Observability

**Key Metrics**:
```yaml
# Application Metrics
- whoosh_active_agents_total
- whoosh_task_queue_length
- whoosh_workflow_executions_total
- whoosh_api_request_duration_seconds
- whoosh_websocket_connections_active

# Infrastructure Metrics
- whoosh_database_connections_active
- whoosh_redis_memory_usage_bytes
- whoosh_container_cpu_usage_percent
- whoosh_container_memory_usage_bytes

# Business Metrics
- whoosh_workflows_created_daily
- whoosh_execution_success_rate
- whoosh_agent_utilization_percent
- whoosh_average_task_completion_time
```

**Alerting Rules**:
```yaml
groups:
  - name: whoosh.rules
    rules:
      - alert: HighErrorRate
        expr: rate(whoosh_api_errors_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"

      - alert: AgentDown
        expr: whoosh_agent_health_status == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Agent {{ $labels.agent_id }} is down"
```

This architecture provides a solid foundation for the unified WHOOSH platform, combining the best practices from our existing distributed AI projects while ensuring scalability, maintainability, and observability.