WHOOSH Backend Architecture Documentation

Version: 0.1.1-debug | Last Updated: October 2025 | Status: Beta (MVP + Council Formation)


Table of Contents

  1. System Overview
  2. Architecture Patterns
  3. Core Components
  4. Database Architecture
  5. API Layer
  6. External Service Integrations
  7. Orchestration & Deployment
  8. Configuration Management
  9. Security & Authentication
  10. Observability
  11. Development Workflow

System Overview

WHOOSH is an autonomous AI development team orchestration system built in Go. It monitors Gitea repositories for Design Brief issues, forms project kickoff councils, composes teams, and deploys CHORUS AI agents to Docker Swarm for autonomous development work.

Current Status

✅ Working:

  • Gitea Design Brief detection + council composition
  • Docker Swarm agent deployment with role-specific environment variables
  • JWT authentication, rate limiting, OpenTelemetry hooks
  • Repository monitoring and issue synchronization
  • Team composition with heuristic-based analysis

🚧 Under Construction:

  • API persistence (REST handlers return placeholder data until the Postgres wiring is complete)
  • Analysis ingestion (the composer relies on heuristic classification; LLM analysis is logged but not yet implemented)
  • Deployment telemetry (results aren't persisted yet)
  • Autonomous team joining and role balancing

Technology Stack

  • Language: Go 1.22+ (toolchain go1.24.5)
  • Web Framework: go-chi/chi/v5 (HTTP router)
  • Database: PostgreSQL (pgx/v5 driver)
  • Container Orchestration: Docker Swarm API
  • Migrations: golang-migrate/migrate/v4
  • Logging: zerolog (structured logging)
  • Tracing: OpenTelemetry + Jaeger
  • Authentication: JWT (golang-jwt/jwt/v5)
  • External Services: Gitea API, BACKBEAT timing system, N8N workflows

Architecture Patterns

1. Layered Architecture

┌─────────────────────────────────────────┐
│         API Layer (server/)             │  HTTP Handlers, Routing, Middleware
├─────────────────────────────────────────┤
│      Business Logic Layer               │
│  ┌─────────────┬──────────────────────┐ │
│  │  Composer   │   Orchestrator       │ │  Team Formation, Agent Deployment
│  ├─────────────┼──────────────────────┤ │
│  │  Monitor    │   Council            │ │  Repository Sync, Council Formation
│  └─────────────┴──────────────────────┘ │
├─────────────────────────────────────────┤
│    Integration Layer                    │  Gitea, Docker, BACKBEAT, N8N
├─────────────────────────────────────────┤
│    Data Layer (database/)               │  PostgreSQL Connection Pool
└─────────────────────────────────────────┘

2. Service-Oriented Design

Each internal package represents a distinct service with clear responsibilities:

  • Composer: Task analysis and team composition
  • Orchestrator: Container deployment and scaling
  • Monitor: Repository monitoring and issue ingestion
  • Council: Project kickoff council formation
  • Gitea Client: Gitea API integration
  • Agent Registry: Agent lifecycle management

3. Context-Driven Execution

All operations use Go context for:

  • Request tracing (OpenTelemetry spans)
  • Timeout management
  • Graceful cancellation
  • Propagation of request-scoped values

Core Components

1. Server (internal/server/)

Responsibilities:

  • HTTP server lifecycle management
  • Router configuration (chi)
  • Middleware setup (CORS, auth, rate limiting, security headers)
  • Health check endpoints
  • API route registration

Key Files:

  • server.go: Main server struct, initialization, routing setup

Initialization Flow:

1. Load configuration from environment variables
2. Initialize database connection pool
3. Initialize external service clients (Gitea, Docker)
4. Create business logic services (composer, orchestrator, monitor)
5. Setup router with middleware
6. Register API routes
7. Start background services (monitor, P2P discovery, agent registry)
8. Start HTTP server

API Routes (v1):

  • /api/v1/teams - Team management
  • /api/v1/tasks - Task ingestion and management
  • /api/v1/projects - Project management (Gitea repositories)
  • /api/v1/agents - Agent registration and status
  • /api/v1/repositories - Repository monitoring configuration
  • /api/v1/councils - Council management and artifacts
  • /api/v1/assignments - Agent assignment broker (if Docker enabled)
  • /api/v1/scaling - Wave-based scaling API (if Docker enabled)
  • /api/v1/slurp - SLURP proxy for UCXL content submission
  • /api/v1/backbeat - BACKBEAT status monitoring

2. Monitor (internal/monitor/)

Responsibilities:

  • Periodic repository synchronization (default: 5 minutes)
  • Issue detection and ingestion from Gitea
  • Design Brief detection for council formation
  • Task creation and updates in database
  • Triggering team composition or council formation

Key Features:

  • Incremental sync using since parameter (after initial scan)
  • Label-based filtering (e.g., bzzz-task, chorus-entrypoint)
  • Support for multiple sync states: pending, initial_scan, active, error, disabled
  • Automatic transition from initial scan to active once content is found

Council Detection Logic:

func isProjectKickoffBrief(issue) bool {
    // Must have "chorus-entrypoint" label
    // Must have "Design Brief" in title
    return hasChorusEntrypoint && containsDesignBrief
}

Sync Flow:

1. Get all monitored repositories (WHERE monitor_issues = true)
2. For each repository:
   a. Fetch issues from Gitea API
   b. Filter by CHORUS labels if enabled
   c. Create or update task records
   d. Check for Design Brief issues → trigger council formation
   e. Check for bzzz-task issues → trigger team composition
   f. Update repository sync timestamps
3. Log sync results and statistics

3. Composer (internal/composer/)

Responsibilities:

  • Task classification (feature, bug fix, security, etc.)
  • Complexity analysis and risk assessment
  • Skill requirement extraction
  • Team composition and agent matching
  • Team persistence to database

Configuration:

type ComposerConfig struct {
    ClassificationModel  string  // LLM model for classification
    SkillAnalysisModel   string  // LLM model for skill analysis
    MatchingModel        string  // LLM model for team matching
    DefaultStrategy      string  // "minimal_viable"
    MinTeamSize          int     // 1
    MaxTeamSize          int     // 3
    SkillMatchThreshold  float64 // 0.6
    AnalysisTimeoutSecs  int     // 30-60
    FeatureFlags         FeatureFlags
}

Feature Flags:

  • EnableLLMClassification: Use LLM vs heuristics (default: false)
  • EnableLLMSkillAnalysis: Use LLM vs heuristics (default: false)
  • EnableLLMTeamMatching: Use LLM vs heuristics (default: false)
  • EnableFailsafeFallback: Fallback to heuristics on LLM failure (default: true)

Analysis Pipeline:

TaskAnalysisInput
    ↓
1. classifyTask() → TaskClassification
    - determineTaskType() [heuristic or LLM]
    - estimateComplexity()
    - identifyDomains()
    ↓
2. analyzeSkillRequirements() → SkillRequirements
    - Map domains to skills
    - Determine critical vs desirable
    ↓
3. getAvailableAgents() → []*Agent
    ↓
4. composeTeam() → TeamComposition
    - selectRequiredRoles()
    - matchAgentsToRoles()
    - calculateConfidence()
    ↓
5. CreateTeam() → Team (persisted to DB)

Task Types:

  • feature_development
  • bug_fix
  • refactoring
  • security
  • integration
  • migration
  • research
  • optimization
  • maintenance
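The heuristic (non-LLM) classification path amounts to keyword scanning over the issue text. A simplified sketch, not the real scoring logic in internal/composer:

```go
package main

import (
	"fmt"
	"strings"
)

// TaskType mirrors the composer's classification categories.
type TaskType string

const (
	TaskTypeFeature  TaskType = "feature_development"
	TaskTypeBugFix   TaskType = "bug_fix"
	TaskTypeSecurity TaskType = "security"
	TaskTypeMaint    TaskType = "maintenance"
)

// classifyHeuristic scans for indicative keywords, checking the more
// specific categories first and falling back to maintenance.
func classifyHeuristic(title, body string) TaskType {
	text := strings.ToLower(title + " " + body)
	switch {
	case strings.Contains(text, "vulnerability"), strings.Contains(text, "cve"):
		return TaskTypeSecurity
	case strings.Contains(text, "bug"), strings.Contains(text, "fix"):
		return TaskTypeBugFix
	case strings.Contains(text, "add"), strings.Contains(text, "implement"):
		return TaskTypeFeature
	default:
		return TaskTypeMaint
	}
}

func main() {
	fmt.Println(classifyHeuristic("Fix login crash", "")) // bug_fix
}
```

The EnableFailsafeFallback flag means this path also serves as the safety net when an LLM call fails.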

4. Council (internal/council/)

Responsibilities:

  • Project kickoff council formation
  • Core agent selection (Product Manager, Engineering Lead, Quality Lead)
  • Optional agent selection (Security, DevOps, UX)
  • Council composition persistence

Council Composition:

type CouncilComposition struct {
    CouncilID      uuid.UUID
    ProjectName    string
    CoreAgents     []CouncilAgent    // PM, Eng Lead, QA Lead
    OptionalAgents []CouncilAgent    // Security, DevOps, UX
    Strategy       string
    Status         string
}

Council Roles:

  • Core Agents (always deployed):

    • Product Manager (PM)
    • Engineering Lead (eng-lead)
    • Quality Lead (qa-lead)
  • Optional Agents (deployed based on project needs):

    • Security Lead (sec-lead)
    • DevOps Lead (devops-lead)
    • UX Lead (ux-lead)

5. Orchestrator (internal/orchestrator/)

Responsibilities:

  • Docker Swarm service deployment
  • Agent container configuration
  • Resource allocation (CPU/memory limits)
  • Volume mounting and network configuration
  • Service scaling and health monitoring

Components:

SwarmManager (swarm_manager.go)

  • Docker Swarm API client wrapper
  • Service creation, scaling, removal
  • Task monitoring and status tracking

Key Methods:

DeployAgent(config *AgentDeploymentConfig) (*swarm.Service, error)
ScaleService(serviceName string, replicas int) error
GetServiceStatus(serviceName string) (*ServiceStatus, error)
RemoveAgent(serviceID string) error

AgentDeployer (agent_deployer.go)

  • Team agent deployment orchestration
  • Council agent deployment orchestration
  • Agent assignment to CHORUS containers

Deployment Flow:

DeploymentRequest
    ↓
1. For each agent in team/council:
   a. selectAgentImage() → CHORUS image
   b. buildAgentEnvironment() → env vars
   c. buildAgentVolumes() → Docker socket + workspace
   d. calculateResources() → CPU/memory limits
   e. deploySingleAgent() → Swarm service
    ↓
2. recordDeployment() → Update database
3. updateTeamDeploymentStatus() → Track overall status

Agent Environment Variables:

CHORUS_AGENT_NAME=<role_name>        # Maps to human-roles.yaml
CHORUS_TEAM_ID=<uuid>
CHORUS_TASK_ID=<uuid>
CHORUS_PROJECT=<repository>
CHORUS_TASK_TITLE=<title>
CHORUS_TASK_DESC=<description>
CHORUS_PRIORITY=<priority>
CHORUS_EXTERNAL_URL=<issue_url>
WHOOSH_COORDINATOR=true
WHOOSH_ENDPOINT=http://whoosh:8080
DOCKER_HOST=unix:///var/run/docker.sock

Resource Allocation:

ResourceLimits{
    CPULimit:      1000000000,  // 1 CPU core
    MemoryLimit:   1073741824,  // 1 GB RAM
    CPURequest:    500000000,   // 0.5 CPU core
    MemoryRequest: 536870912,   // 512 MB RAM
}
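The magic numbers above are Swarm's native units: nano-CPUs (1e9 per core) and bytes. A quick sketch of the conversion:

```go
package main

import "fmt"

// Swarm expresses CPU in nano-CPUs (1e9 = one core) and memory in bytes.
func nanoCPUs(cores float64) int64 { return int64(cores * 1e9) }
func mebibytes(mib int64) int64    { return mib << 20 }

func main() {
	fmt.Println(nanoCPUs(1.0), mebibytes(1024)) // 1000000000 1073741824
	fmt.Println(nanoCPUs(0.5), mebibytes(512))  // 500000000 536870912
}
```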

Scaling System (scaling_*.go)

  • Wave-based scaling controller
  • Bootstrap pool manager
  • Assignment broker
  • Health gates (KACHING, BACKBEAT, CHORUS)
  • Metrics collector

Scaling Components:

  • ScalingController: Coordinates scaling operations
  • BootstrapPoolManager: Manages pre-warmed agent pool
  • AssignmentBroker: Assigns tasks to available agents
  • HealthGates: Checks system health before scaling
  • ScalingMetricsCollector: Tracks scaling operation metrics

6. Gitea Client (internal/gitea/)

Responsibilities:

  • Gitea API client with retry logic
  • Issue listing and retrieval
  • Repository information fetching
  • Label management and creation
  • Webhook payload parsing

Configuration Options:

type GITEAConfig struct {
    BaseURL          string        // Gitea instance URL
    Token            string        // API token
    TokenFile        string        // Token from file
    WebhookPath      string        // Webhook endpoint path
    WebhookToken     string        // Webhook secret
    EagerFilter      bool          // Pre-filter by labels at API level
    FullRescan       bool          // Ignore since parameter for full rescan
    DebugURLs        bool          // Log exact URLs
    MaxRetries       int           // Retry attempts (default: 3)
    RetryDelay       time.Duration // Delay between retries (default: 2s)
}

Retry Logic:

  • Automatic retry on 5xx errors and 429 (rate limiting)
  • Configurable max retries and delay
  • No retry on 4xx client errors
  • Exponential backoff, using the configured delay as the base interval

Issue Fetching:

func GetIssues(owner, repo string, opts IssueListOptions) ([]Issue, error)
    - Supports state filtering (open/closed/all)
    - Label filtering (eager at API or in-code)
    - Since parameter for incremental sync
    - Pagination support

Label Management:

func EnsureRequiredLabels(owner, repo string) error
    - Creates standardized labels:
      - bug, enhancement, duplicate, invalid, etc.
      - bzzz-task (CHORUS task marker)
      - chorus-entrypoint (Design Brief marker)

7. BACKBEAT Integration (internal/backbeat/)

Responsibilities:

  • Integration with BACKBEAT timing system (NATS-based)
  • Beat-synchronized status emission
  • Search operation tracking
  • Health monitoring

Key Concepts:

  • Beat: Regular timing event (every 30 seconds at 2 BPM default)
  • Downbeat: Bar start event (every 4 beats = 2 minutes)
  • StatusClaim: Progress update emitted to NATS

Search Operation Phases:

PhaseStarted → PhaseIndexing → PhaseQuerying → PhaseRanking → PhaseCompleted / PhaseFailed

Integration Flow:

1. Start(ctx) → Connect to NATS cluster
2. OnBeat() → Emit status claims every beat
3. OnDownbeat() → Cleanup completed operations
4. StartSearch() → Register new search operation
5. UpdateSearchPhase() → Update operation progress
6. CompleteSearch() → Mark operation complete

8. Authentication & Security (internal/auth/)

Components:

Middleware (middleware.go)

  • JWT token validation
  • Service token authentication
  • Admin role checking
  • Request authentication

Methods:

Authenticate(next http.Handler) http.Handler        // Generic auth
ServiceTokenRequired(next http.Handler) http.Handler  // Service tokens only
AdminRequired(next http.Handler) http.Handler        // Admin role required

Rate Limiter (ratelimit.go)

  • IP-based rate limiting
  • Configurable requests per time window
  • In-memory storage with automatic cleanup

Default Configuration:

RateLimiter{
    RequestsPerMinute: 100,
    CleanupInterval:   time.Minute,
}

9. Validation (internal/validation/)

Security Headers:

func SecurityHeaders(next http.Handler) http.Handler
    - X-Content-Type-Options: nosniff
    - X-Frame-Options: DENY
    - X-XSS-Protection: 1; mode=block
    - Content-Security-Policy: default-src 'self'

Input Validation:

  • UUID validation
  • Request body size limits
  • Content-Type validation

10. Tracing (internal/tracing/)

OpenTelemetry Integration:

  • Jaeger exporter for distributed tracing
  • Span creation for key operations
  • Context propagation across services
  • Performance monitoring

Span Types:

StartSpan(ctx, "operation_name")                      // Generic span
StartMonitorSpan(ctx, "operation", "repository")      // Repository monitoring
StartCouncilSpan(ctx, "operation", "council_id")      // Council operations
StartDeploymentSpan(ctx, "operation", "resource_id")  // Deployment operations

Configuration:

type OpenTelemetryConfig struct {
    Enabled        bool
    ServiceName    string   // "whoosh"
    ServiceVersion string   // "1.0.0"
    Environment    string   // "production"
    JaegerEndpoint string   // "http://localhost:14268/api/traces"
    SampleRate     float64  // 1.0 (100%)
}

Database Architecture

Schema Overview

Core Tables:

  1. teams - Team records
  2. team_roles - Role definitions (executor, coordinator, reviewer)
  3. team_assignments - Agent-to-role assignments
  4. agents - AI agent registry
  5. tasks - Task records from Gitea/external sources
  6. repositories - Monitored repository configurations
  7. repository_sync_logs - Sync operation history
  8. councils - Project kickoff council records
  9. council_agents - Council agent assignments
  10. council_artifacts - Council-generated artifacts

Key Relationships

repositories (1) ──→ (N) tasks
tasks (1) ──→ (1) teams (assigned_team_id)
tasks (1) ──→ (1) councils (via task_id)
teams (1) ──→ (N) team_assignments
team_assignments (N) ──→ (1) agents
team_assignments (N) ──→ (1) team_roles
councils (1) ──→ (N) council_agents

Migration System

Location: /migrations/*.sql

Migration Files:

  1. 001_init_schema.up.sql - Initial teams, agents, roles
  2. 002_add_tasks_table.up.sql - Task management
  3. 003_add_repositories_table.up.sql - Repository monitoring
  4. 004_enhance_task_team_integration.up.sql - Enhanced relationships
  5. 005_add_council_tables.up.sql - Council management
  6. 006_add_performance_indexes.up.sql - Query optimization
  7. 007_add_team_deployment_status.up.sql - Deployment tracking

Running Migrations:

# Automatic on startup (if AutoMigrate=true)
WHOOSH_DATABASE_AUTO_MIGRATE=true go run ./cmd/whoosh

# Manual via migrate CLI
migrate -database "postgres://..." -path ./migrations up

Connection Pooling

type DatabaseConfig struct {
    MaxOpenConns  int  // 25 (default)
    MaxIdleConns  int  // 5 (default)
    MaxConnLifetime time.Duration  // 1 hour
    MaxConnIdleTime time.Duration  // 30 minutes
}

Key Indexes

Performance Indexes:

-- Agent availability
idx_agents_status_last_seen ON agents(status, last_seen)

-- Repository lookups
idx_repositories_full_name_lookup ON repositories(full_name)
idx_repositories_last_issue_sync ON repositories(last_issue_sync)

-- Task lookups
idx_tasks_external_source_lookup ON tasks(external_id, source_type)
idx_tasks_repository_id ON tasks(repository_id)
idx_tasks_assigned_team_id ON tasks(assigned_team_id)

-- Team deployment
idx_teams_deployment_status ON teams(deployment_status)

API Layer

Request/Response Format

Standard Response:

{
  "status": "success",
  "data": { ... },
  "message": "Operation completed successfully"
}

Error Response:

{
  "status": "error",
  "error": "Error message",
  "details": { ... }
}

Authentication

JWT Token Format:

Authorization: Bearer <jwt_token>

Service Token Format:

Authorization: Bearer <service_token>

Key API Endpoints

Teams API

GET    /api/v1/teams              - List all teams (with pagination)
POST   /api/v1/teams              - Create new team (admin only)
GET    /api/v1/teams/{teamID}     - Get team details
PUT    /api/v1/teams/{teamID}/status - Update team status (admin only)
POST   /api/v1/teams/analyze      - Analyze task for team composition

Tasks API

GET    /api/v1/tasks              - List all tasks
POST   /api/v1/tasks/ingest       - Ingest task from external source (service token)
GET    /api/v1/tasks/{taskID}     - Get task details

Projects API (Gitea Repositories)

GET    /api/v1/projects           - List all projects
POST   /api/v1/projects           - Create new project (admin only)
GET    /api/v1/projects/{projectID} - Get project details
GET    /api/v1/projects/{projectID}/tasks - List project tasks
POST   /api/v1/projects/{projectID}/tasks/{taskNumber}/claim - Claim task

Repositories API

GET    /api/v1/repositories       - List monitored repositories
POST   /api/v1/repositories       - Add repository for monitoring (admin only)
GET    /api/v1/repositories/{repoID} - Get repository details
PUT    /api/v1/repositories/{repoID} - Update repository config (admin only)
POST   /api/v1/repositories/{repoID}/sync - Trigger manual sync (admin only)
POST   /api/v1/repositories/{repoID}/ensure-labels - Create standard labels (admin only)
GET    /api/v1/repositories/{repoID}/logs - Get sync logs

Councils API

GET    /api/v1/councils/{councilID} - Get council details
GET    /api/v1/councils/{councilID}/artifacts - List council artifacts
POST   /api/v1/councils/{councilID}/artifacts - Create artifact (admin only)

Agents API

GET    /api/v1/agents             - List all agents
POST   /api/v1/agents/register    - Register new agent
PUT    /api/v1/agents/{agentID}/status - Update agent status

Scaling API (if Docker enabled)

GET    /api/v1/scaling/status     - Get scaling system status
POST   /api/v1/scaling/scale-up   - Manually trigger scale-up
POST   /api/v1/scaling/scale-down - Manually trigger scale-down
GET    /api/v1/scaling/metrics    - Get scaling metrics

Health & Monitoring

GET    /health                    - Basic health check
GET    /health/ready              - Readiness probe
GET    /admin/health/details      - Detailed health information
GET    /api/v1/backbeat/status    - BACKBEAT integration status

Webhook Endpoints

Gitea Webhook

POST   /webhooks/gitea            - Receive Gitea webhook events

Supported Events:

  • issues - Issue opened/closed/edited
  • issue_comment - Comment added
  • push - Code pushed
  • pull_request - PR opened/merged

Webhook Security:

  • HMAC signature verification using webhook token
  • X-Gitea-Signature header validation

External Service Integrations

1. Gitea Integration

Base URL: Configured via WHOOSH_GITEA_BASE_URL

Authentication: API token (from file or environment)

API Operations:

  • List repositories
  • Get repository details
  • List issues (with filtering)
  • Get issue details
  • Create/manage labels
  • Test connection

Webhook Integration:

  • Receives issue events (create, update, close)
  • Triggers team composition or council formation
  • Updates task status in database

2. Docker Swarm Integration

Socket: Unix socket (/var/run/docker.sock) or TCP

Operations:

  • Service creation (ServiceCreate)
  • Service scaling (ServiceUpdate)
  • Service inspection (ServiceInspectWithRaw)
  • Task listing (TaskList)
  • Service removal (ServiceRemove)
  • Service logs (ServiceLogs)

Network: Agents are deployed to the chorus_net network by default (Docker Swarm prefixes the stack name, so it appears as CHORUS_chorus_net)

Image Registry: registry.home.deepblack.cloud (private registry)

Standard Image: docker.io/anthonyrawlins/chorus:backbeat-v2.0.1

3. BACKBEAT Integration

Protocol: NATS messaging

NATS URL: Configured via WHOOSH_BACKBEAT_NATS_URL

Operations:

  • Beat synchronization (30-second intervals at 2 BPM)
  • Status claim emission
  • Health monitoring
  • Task progress tracking

Health Indicators:

  • Connected to NATS cluster
  • Current beat index
  • Measured BPM vs target tempo
  • Tempo drift
  • Reconnection count
  • Active searches/operations

4. N8N Workflows

Base URL: https://n8n.home.deepblack.cloud

Integration Points:

  • Gitea webhook → N8N → BZZZ task coordination
  • WHOOSH events → N8N → External notifications
  • Council formation → N8N → Project initialization workflows

5. SLURP (UCXL Content System)

Purpose: UCXL address-based artifact storage

API Endpoints:

  • POST /api/v1/slurp/submit - Submit artifact to SLURP
  • GET /api/v1/slurp/artifacts/{ucxlAddr} - Retrieve artifact

Use Cases:

  • Decision records (BUBBLE integration)
  • Council artifacts (project documentation)
  • Compliance documentation

Configuration Management

Environment Variables

Database Configuration:

WHOOSH_DATABASE_HOST=localhost
WHOOSH_DATABASE_PORT=5432
WHOOSH_DATABASE_DB_NAME=whoosh
WHOOSH_DATABASE_USERNAME=whoosh
WHOOSH_DATABASE_PASSWORD=<password>
WHOOSH_DATABASE_PASSWORD_FILE=/secrets/db_password  # Alternative
WHOOSH_DATABASE_SSL_MODE=disable
WHOOSH_DATABASE_AUTO_MIGRATE=true
WHOOSH_DATABASE_MAX_OPEN_CONNS=25
WHOOSH_DATABASE_MAX_IDLE_CONNS=5

Server Configuration:

WHOOSH_SERVER_LISTEN_ADDR=:8080
WHOOSH_SERVER_READ_TIMEOUT=30s
WHOOSH_SERVER_WRITE_TIMEOUT=30s
WHOOSH_SERVER_SHUTDOWN_TIMEOUT=30s
WHOOSH_SERVER_ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8080
WHOOSH_SERVER_ALLOWED_ORIGINS_FILE=/secrets/allowed_origins  # Alternative

Gitea Configuration:

WHOOSH_GITEA_BASE_URL=http://ironwood:3000
WHOOSH_GITEA_TOKEN=<token>
WHOOSH_GITEA_TOKEN_FILE=/secrets/gitea_token  # Alternative
WHOOSH_GITEA_WEBHOOK_PATH=/webhooks/gitea
WHOOSH_GITEA_WEBHOOK_TOKEN=<secret>
WHOOSH_GITEA_WEBHOOK_TOKEN_FILE=/secrets/webhook_token  # Alternative
WHOOSH_GITEA_EAGER_FILTER=true
WHOOSH_GITEA_FULL_RESCAN=false
WHOOSH_GITEA_DEBUG_URLS=false
WHOOSH_GITEA_MAX_RETRIES=3
WHOOSH_GITEA_RETRY_DELAY=2s

Authentication Configuration:

WHOOSH_AUTH_JWT_SECRET=<secret_min_32_chars>
WHOOSH_AUTH_JWT_SECRET_FILE=/secrets/jwt_secret  # Alternative
WHOOSH_AUTH_SERVICE_TOKENS=token1,token2,token3
WHOOSH_AUTH_SERVICE_TOKENS_FILE=/secrets/service_tokens  # Alternative
WHOOSH_AUTH_JWT_EXPIRY=24h

Logging Configuration:

WHOOSH_LOGGING_LEVEL=debug  # debug, info, warn, error
WHOOSH_LOGGING_ENVIRONMENT=development  # development, production
LOG_LEVEL=info  # Alternative for zerolog
ENVIRONMENT=development  # Enables pretty logging

Team Composer Configuration:

# LLM-based analysis (experimental, default: false)
WHOOSH_COMPOSER_ENABLE_LLM_CLASSIFICATION=false
WHOOSH_COMPOSER_ENABLE_LLM_SKILL_ANALYSIS=false
WHOOSH_COMPOSER_ENABLE_LLM_TEAM_MATCHING=false

# Analysis features
WHOOSH_COMPOSER_ENABLE_COMPLEXITY_ANALYSIS=true
WHOOSH_COMPOSER_ENABLE_RISK_ASSESSMENT=true
WHOOSH_COMPOSER_ENABLE_ALTERNATIVE_OPTIONS=false

# Debug and monitoring
WHOOSH_COMPOSER_ENABLE_ANALYSIS_LOGGING=true
WHOOSH_COMPOSER_ENABLE_PERFORMANCE_METRICS=true
WHOOSH_COMPOSER_ENABLE_FAILSAFE_FALLBACK=true

# LLM model configuration
WHOOSH_COMPOSER_CLASSIFICATION_MODEL=llama3.1:8b
WHOOSH_COMPOSER_SKILL_ANALYSIS_MODEL=llama3.1:8b
WHOOSH_COMPOSER_MATCHING_MODEL=llama3.1:8b

# Performance settings
WHOOSH_COMPOSER_ANALYSIS_TIMEOUT_SECS=60
WHOOSH_COMPOSER_SKILL_MATCH_THRESHOLD=0.6

BACKBEAT Configuration:

WHOOSH_BACKBEAT_ENABLED=true
WHOOSH_BACKBEAT_CLUSTER_ID=chorus-production
WHOOSH_BACKBEAT_AGENT_ID=whoosh
WHOOSH_BACKBEAT_NATS_URL=nats://backbeat-nats:4222

Docker Configuration:

WHOOSH_DOCKER_ENABLED=true
WHOOSH_DOCKER_HOST=unix:///var/run/docker.sock

OpenTelemetry Configuration:

WHOOSH_OPENTELEMETRY_ENABLED=true
WHOOSH_OPENTELEMETRY_SERVICE_NAME=whoosh
WHOOSH_OPENTELEMETRY_SERVICE_VERSION=1.0.0
WHOOSH_OPENTELEMETRY_ENVIRONMENT=production
WHOOSH_OPENTELEMETRY_JAEGER_ENDPOINT=http://localhost:14268/api/traces
WHOOSH_OPENTELEMETRY_SAMPLE_RATE=1.0

N8N Configuration:

WHOOSH_N8N_BASE_URL=https://n8n.home.deepblack.cloud

Configuration Loading

Priority Order:

  1. Environment variables
  2. Secret files (if *_FILE variant specified)
  3. Default values in code

Secret File Loading:

// Example: JWT secret loading
if cfg.Auth.JWTSecretFile != "" {
    secret, err := readSecretFile(cfg.Auth.JWTSecretFile)
    if err != nil {
        return fmt.Errorf("load JWT secret: %w", err)
    }
    cfg.Auth.JWTSecret = secret
}

Validation:

func (c *Config) Validate() error {
    - Check required fields (database password, Gitea token, etc.)
    - Build database URL if not provided
    - Validate CORS origins
    - Ensure JWT secret meets minimum length
    - Validate service tokens present
}

Security & Authentication

Authentication Mechanisms

1. JWT Authentication

  • Used for user/admin API access
  • Token expiry: 24 hours (configurable)
  • Claims include: user_id, role, issued_at, expires_at
  • Validated on protected endpoints via middleware

2. Service Token Authentication

  • Used for service-to-service communication
  • Static tokens configured via environment
  • Required for task ingestion endpoints
  • Validated via ServiceTokenRequired middleware

3. Admin Role Enforcement

  • Admin-only endpoints protected via AdminRequired middleware
  • Role claim must be "admin" in JWT
  • Used for repository management, team creation, etc.

Security Headers

Applied to all responses:

X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Content-Security-Policy: default-src 'self'

CORS Configuration

  • Allowed origins: Configured via environment
  • Allowed methods: GET, POST, PUT, DELETE, OPTIONS
  • Credentials: Enabled
  • Max age: 300 seconds

Rate Limiting

  • Default: 100 requests per minute per IP
  • In-memory storage with automatic cleanup
  • Applied globally via middleware

Webhook Security

  • Gitea webhooks: HMAC signature verification
  • Token stored securely (from file or environment)
  • Signature header: X-Gitea-Signature

Secret Management

Best Practices:

  • Use *_FILE environment variables for secrets
  • Mount secrets as files in Docker Swarm
  • Never commit secrets to Git
  • Rotate tokens regularly

Example Docker Secret:

secrets:
  gitea_token:
    file: /path/to/gitea_token.txt

services:
  whoosh:
    secrets:
      - gitea_token
    environment:
      WHOOSH_GITEA_TOKEN_FILE: /run/secrets/gitea_token

Observability

Logging

Structured Logging (zerolog):

log.Info().
    Str("team_id", teamID).
    Int("agent_count", count).
    Dur("duration", duration).
    Msg("Team deployed successfully")

Log Levels:

  • debug: Detailed debugging information
  • info: General information messages
  • warn: Warning messages (recoverable errors)
  • error: Error messages (operation failures)

Pretty Logging:

  • Enabled in development mode
  • Human-readable console output
  • Colored output for log levels

Distributed Tracing

OpenTelemetry + Jaeger:

ctx, span := tracing.StartSpan(ctx, "operation_name")
defer span.End()

span.SetAttributes(
    attribute.String("resource.id", id),
    attribute.Int("resource.count", count),
)

// On error
tracing.SetSpanError(span, err)

Trace Propagation:

  • Context passed through entire request lifecycle
  • Spans created at key operations:
    • HTTP request handling
    • Database queries
    • External API calls
    • Docker operations
    • Council/team operations

Jaeger UI:

  • Access at: http://localhost:16686
  • View traces by service, operation, duration
  • Analyze performance bottlenecks
  • Debug distributed operations

Health Checks

Basic Health Check (/health):

{
  "status": "ok",
  "service": "whoosh",
  "version": "0.1.0-mvp",
  "backbeat": {
    "enabled": true,
    "connected": true,
    "current_beat": 12345
  }
}

Readiness Check (/health/ready):

{
  "status": "ready",
  "database": "connected"
}

Detailed Health (/admin/health/details):

{
  "service": "whoosh",
  "version": "0.1.1-debug",
  "timestamp": 1696118400,
  "status": "healthy",
  "components": {
    "database": {
      "status": "healthy",
      "type": "postgresql",
      "statistics": {
        "max_conns": 25,
        "acquired_conns": 3,
        "idle_conns": 5
      }
    },
    "gitea": {
      "status": "healthy",
      "endpoint": "http://ironwood:3000"
    },
    "backbeat": {
      "status": "healthy",
      "connected": true,
      "current_tempo": 2
    },
    "docker_swarm": {
      "status": "unknown",
      "note": "Health check not implemented"
    }
  }
}

Metrics

Database Metrics:

  • Connection pool statistics
  • Active connections
  • Idle connections
  • Query duration

Deployment Metrics (via ScalingMetricsCollector):

  • Wave execution count
  • Agent deployment success/failure rate
  • Average deployment duration
  • Error rate

BACKBEAT Metrics:

  • Current beat index
  • Tempo (BPM)
  • Tempo drift
  • Reconnection count
  • Active operations

Development Workflow

Running Locally

Prerequisites:

# Install Go 1.22+
go version

# Install PostgreSQL 14+
psql --version

# Install Docker (for Swarm testing)
docker version

Setup:

# 1. Clone repository
git clone https://gitea.chorus.services/tony/WHOOSH.git
cd WHOOSH

# 2. Copy environment configuration
cp .env.example .env
# Edit .env with local values

# 3. Start PostgreSQL (Docker example)
docker run -d \
  --name whoosh-postgres \
  -e POSTGRES_DB=whoosh \
  -e POSTGRES_USER=whoosh \
  -e POSTGRES_PASSWORD=whoosh \
  -p 5432:5432 \
  postgres:15

# 4. Run migrations
make migrate
# Or manual:
# migrate -database "postgres://whoosh:whoosh@localhost:5432/whoosh?sslmode=disable" -path ./migrations up

# 5. Run the server
go run ./cmd/whoosh
# Or with hot reload:
# air (requires cosmtrek/air)

Development Commands:

# Run with live reload
air

# Run tests
go test ./...

# Run specific package tests
go test ./internal/composer/...

# Format code
go fmt ./...

# Vet code
go vet ./...

# Build binary
go build -o bin/whoosh ./cmd/whoosh

# Check version
./bin/whoosh --version

Testing

Unit Tests:

// internal/composer/service_test.go
package composer

import (
    "testing"

    "github.com/stretchr/testify/assert"
)

func TestDetermineTaskType(t *testing.T) {
    service := NewService(nil, nil)

    taskType := service.DetermineTaskType("Fix bug in login", "...")
    assert.Equal(t, TaskTypeBugFix, taskType)
}

Integration Tests:

# Requires running database
go test -tags=integration ./internal/database/...

Database Setup for Tests:

# Create test database
createdb whoosh_test

# Run migrations
migrate -database "postgres://whoosh:whoosh@localhost:5432/whoosh_test?sslmode=disable" -path ./migrations up

Building for Production

Docker Build:

# Build binary
go build -o whoosh ./cmd/whoosh

# Build Docker image
docker build -t registry.home.deepblack.cloud/whoosh:v0.1.1 .

# Push to registry
docker push registry.home.deepblack.cloud/whoosh:v0.1.1

Docker Compose:

version: '3.8'

services:
  whoosh:
    image: registry.home.deepblack.cloud/whoosh:v0.1.1
    environment:
      WHOOSH_DATABASE_HOST: postgres
      WHOOSH_DATABASE_PORT: 5432
      WHOOSH_DATABASE_DB_NAME: whoosh
      WHOOSH_DATABASE_USERNAME: whoosh
      WHOOSH_DATABASE_PASSWORD: ${DATABASE_PASSWORD}
      WHOOSH_GITEA_BASE_URL: http://ironwood:3000
      WHOOSH_GITEA_TOKEN: ${GITEA_TOKEN}
      WHOOSH_AUTH_JWT_SECRET: ${JWT_SECRET}
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - postgres
    networks:
      - chorus_default

  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: whoosh
      POSTGRES_USER: whoosh
      POSTGRES_PASSWORD: ${DATABASE_PASSWORD}
    volumes:
      - whoosh_data:/var/lib/postgresql/data
    networks:
      - chorus_default

volumes:
  whoosh_data:

networks:
  chorus_default:
    external: true

Docker Swarm Deploy:

# Create secrets (printf avoids embedding a trailing newline in the secret)
printf '%s' "my_jwt_secret" | docker secret create whoosh_jwt_secret -
printf '%s' "my_gitea_token" | docker secret create whoosh_gitea_token -

# Deploy stack
docker stack deploy -c docker-compose.swarm.yml whoosh

# Check services
docker service ls
docker service ps whoosh_whoosh

# View logs
docker service logs whoosh_whoosh -f

Debugging

Enable Debug Logging:

export WHOOSH_LOGGING_LEVEL=debug
export LOG_LEVEL=debug
go run ./cmd/whoosh

Database Query Logging:

# Set pgx log level
export WHOOSH_DATABASE_LOG_LEVEL=trace

Gitea URL Debugging:

export WHOOSH_GITEA_DEBUG_URLS=true

Trace a Request:

# Issue a request with a known ID
curl -H "X-Request-ID: test-request-123" http://localhost:8080/api/v1/teams

# Find the trace in the Jaeger UI
open http://localhost:16686
# Search for: service=whoosh, tags=request.id=test-request-123

Interactive Debugging (Delve):

# Install delve
go install github.com/go-delve/delve/cmd/dlv@latest

# Debug main
dlv debug ./cmd/whoosh

# Set breakpoint
(dlv) break internal/server/server.go:200
(dlv) continue

Appendix

Directory Structure

WHOOSH/
├── cmd/
│   ├── whoosh/              # Main application entry point
│   └── test-llm/            # LLM testing utility
├── internal/
│   ├── agents/              # Agent registry service
│   ├── auth/                # Authentication & authorization
│   ├── backbeat/            # BACKBEAT timing integration
│   ├── composer/            # Team composition service
│   ├── config/              # Configuration management
│   ├── council/             # Council formation service
│   ├── database/            # Database connection & migrations
│   ├── gitea/               # Gitea API client
│   ├── licensing/           # Enterprise licensing (stub)
│   ├── monitor/             # Repository monitoring service
│   ├── orchestrator/        # Docker Swarm orchestration
│   ├── p2p/                 # P2P discovery service
│   ├── server/              # HTTP server & routing
│   ├── tasks/               # Task management service
│   ├── tracing/             # OpenTelemetry tracing
│   └── validation/          # Input validation & security
├── migrations/              # Database migration files
├── ui/                      # Frontend assets (if any)
├── docs/                    # Documentation
├── scripts/                 # Utility scripts
├── requirements/            # Requirements documents
├── BACKBEAT-prototype/      # BACKBEAT SDK integration
├── go.mod                   # Go module definition
├── go.sum                   # Go module checksums
├── .env.example             # Environment variable template
├── Dockerfile               # Container build definition
└── README.md                # Project README

Common Issues & Solutions

Issue: Database connection failed

Error: failed to ping database: dial tcp 127.0.0.1:5432: connect: connection refused

Solution:
1. Ensure PostgreSQL is running: systemctl status postgresql
2. Check connection parameters in .env
3. Verify firewall rules allow port 5432
4. Check PostgreSQL logs: journalctl -u postgresql

Issue: Gitea API connection failed

Error: connection test failed: API request failed with status 401

Solution:
1. Verify Gitea token is correct
2. Check token has required permissions (read:repository, write:issue)
3. Verify Gitea base URL is accessible
4. Test manually: curl -H "Authorization: token YOUR_TOKEN" http://ironwood:3000/api/v1/user

Issue: Docker Swarm deployment failed

Error: failed to deploy agent service: Error response from daemon: This node is not a swarm manager

Solution:
1. Initialize Docker Swarm: docker swarm init
2. Or join existing swarm: docker swarm join --token TOKEN MANAGER_IP:2377
3. Verify swarm status: docker info | grep Swarm

Issue: Migrations not running

Error: Database migration failed: Dirty database version 5

Solution:
1. Check migration status: migrate -database "..." -path ./migrations version
2. Force version: migrate -database "..." -path ./migrations force 5
3. Re-run migrations: migrate -database "..." -path ./migrations up

Performance Tuning

Database Connection Pool:

# Increase for high concurrency
WHOOSH_DATABASE_MAX_OPEN_CONNS=50
WHOOSH_DATABASE_MAX_IDLE_CONNS=10

HTTP Server Timeouts:

# Increase for long-running operations
WHOOSH_SERVER_READ_TIMEOUT=60s
WHOOSH_SERVER_WRITE_TIMEOUT=60s

Rate Limiting:

// Adjust in server initialization
rateLimiter := auth.NewRateLimiter(200, time.Minute)  // 200 req/min

Composer Analysis Timeout:

# Reduce for a faster fallback to heuristics
WHOOSH_COMPOSER_ANALYSIS_TIMEOUT_SECS=30

Contributing

Code Style:

  • Follow standard Go conventions
  • Run go fmt before committing
  • Use go vet to check for issues
  • Add comments for exported functions
  • Write tests for new features

Git Workflow:

# 1. Create feature branch
git checkout -b feature/my-feature

# 2. Make changes and commit
git add .
git commit -m "Add feature: description"

# 3. Push to Gitea
git push origin feature/my-feature

# 4. Create pull request via Gitea UI
# 5. Address review comments
# 6. Merge when approved

Database Migrations:

# Create new migration
migrate create -ext sql -dir migrations -seq add_new_table

# Edit up and down files
# migrations/008_add_new_table.up.sql
# migrations/008_add_new_table.down.sql

# Test migration
migrate -database "postgres://..." -path ./migrations up
migrate -database "postgres://..." -path ./migrations down

References

  • CHORUS Project: Autonomous AI agent system (depends on WHOOSH for orchestration)
  • BACKBEAT: Cluster-wide timing and coordination system
  • BZZZ: Distributed task system integration
  • SLURP: UCXL content address system
  • BUBBLE: Decision tracking and policy management

Related Documentation:

  • /home/tony/chorus/CLAUDE.md - Project instructions
  • /home/tony/chorus/GEMINI.md - Cluster context
  • /home/tony/chorus/project-queues/active/WHOOSH/README.md - Quick start
  • /home/tony/chorus/project-queues/active/WHOOSH/docs/progress/WHOOSH-roadmap.md - Development roadmap
  • /home/tony/chorus/project-queues/active/WHOOSH/DEVELOPMENT_PLAN.md - Implementation plan

Document Version: 1.0
Generated: October 2025
Maintained by: WHOOSH Development Team