# WHOOSH Backend Architecture Documentation

**Version**: 0.1.1-debug
**Last Updated**: October 2025
**Status**: Beta (MVP + Council Formation)

---
## Table of Contents
1. [System Overview](#system-overview)
2. [Architecture Patterns](#architecture-patterns)
3. [Core Components](#core-components)
4. [Database Architecture](#database-architecture)
5. [API Layer](#api-layer)
6. [External Service Integrations](#external-service-integrations)
7. [Orchestration & Deployment](#orchestration--deployment)
8. [Configuration Management](#configuration-management)
9. [Security & Authentication](#security--authentication)
10. [Observability](#observability)
11. [Development Workflow](#development-workflow)
---
## System Overview
WHOOSH is an autonomous AI development team orchestration system built in Go. It monitors Gitea repositories for Design Brief issues, forms project kickoff councils, composes teams, and deploys CHORUS AI agents to Docker Swarm for autonomous development work.
### Current Status
**✅ Working:**
- Gitea Design Brief detection + council composition
- Docker Swarm agent deployment with role-specific environment variables
- JWT authentication, rate limiting, OpenTelemetry hooks
- Repository monitoring and issue synchronization
- Team composition with heuristic-based analysis
**🚧 Under Construction:**
- API persistence (REST handlers return placeholder data while Postgres wiring is finished)
- Analysis ingestion (composer relies on heuristic classification; LLM analysis is logged but unimplemented)
- Deployment telemetry (results aren't persisted yet)
- Autonomous team joining and role balancing
### Technology Stack
- **Language**: Go 1.22+ (toolchain go1.24.5)
- **Web Framework**: go-chi/chi/v5 (HTTP router)
- **Database**: PostgreSQL (pgx/v5 driver)
- **Container Orchestration**: Docker Swarm API
- **Migrations**: golang-migrate/migrate/v4
- **Logging**: zerolog (structured logging)
- **Tracing**: OpenTelemetry + Jaeger
- **Authentication**: JWT (golang-jwt/jwt/v5)
- **External Services**: Gitea API, BACKBEAT timing system, N8N workflows
---
## Architecture Patterns
### 1. Layered Architecture
```
┌─────────────────────────────────────────┐
│           API Layer (server/)           │  HTTP Handlers, Routing, Middleware
├─────────────────────────────────────────┤
│          Business Logic Layer           │
│  ┌─────────────┬──────────────────────┐ │
│  │  Composer   │     Orchestrator     │ │  Team Formation, Agent Deployment
│  ├─────────────┼──────────────────────┤ │
│  │  Monitor    │       Council        │ │  Repository Sync, Council Formation
│  └─────────────┴──────────────────────┘ │
├─────────────────────────────────────────┤
│            Integration Layer            │  Gitea, Docker, BACKBEAT, N8N
├─────────────────────────────────────────┤
│         Data Layer (database/)          │  PostgreSQL Connection Pool
└─────────────────────────────────────────┘
```
### 2. Service-Oriented Design
Each internal package represents a distinct service with clear responsibilities:
- **Composer**: Task analysis and team composition
- **Orchestrator**: Container deployment and scaling
- **Monitor**: Repository monitoring and issue ingestion
- **Council**: Project kickoff council formation
- **Gitea Client**: Gitea API integration
- **Agent Registry**: Agent lifecycle management
### 3. Context-Driven Execution
All operations use Go context for:
- Request tracing (OpenTelemetry spans)
- Timeout management
- Graceful cancellation
- Propagation of request-scoped values
---
## Core Components
### 1. Server (`internal/server/`)
**Responsibilities:**
- HTTP server lifecycle management
- Router configuration (chi)
- Middleware setup (CORS, auth, rate limiting, security headers)
- Health check endpoints
- API route registration
**Key Files:**
- `server.go`: Main server struct, initialization, routing setup
**Initialization Flow:**
```
1. Load configuration from environment variables
2. Initialize database connection pool
3. Initialize external service clients (Gitea, Docker)
4. Create business logic services (composer, orchestrator, monitor)
5. Setup router with middleware
6. Register API routes
7. Start background services (monitor, P2P discovery, agent registry)
8. Start HTTP server
```
**API Routes (v1):**
- `/api/v1/teams` - Team management
- `/api/v1/tasks` - Task ingestion and management
- `/api/v1/projects` - Project management (Gitea repositories)
- `/api/v1/agents` - Agent registration and status
- `/api/v1/repositories` - Repository monitoring configuration
- `/api/v1/councils` - Council management and artifacts
- `/api/v1/assignments` - Agent assignment broker (if Docker enabled)
- `/api/v1/scaling` - Wave-based scaling API (if Docker enabled)
- `/api/v1/slurp` - SLURP proxy for UCXL content submission
- `/api/v1/backbeat` - BACKBEAT status monitoring
### 2. Monitor (`internal/monitor/`)
**Responsibilities:**
- Periodic repository synchronization (default: 5 minutes)
- Issue detection and ingestion from Gitea
- Design Brief detection for council formation
- Task creation and updates in database
- Triggering team composition or council formation
**Key Features:**
- Incremental sync using `since` parameter (after initial scan)
- Label-based filtering (e.g., `bzzz-task`, `chorus-entrypoint`)
- Support for multiple sync states: `pending`, `initial_scan`, `active`, `error`, `disabled`
- Automatic transition from initial scan to active when content found
**Council Detection Logic:**
```go
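// hasLabel is an assumed helper and the Issue field names are illustrative.
func isProjectKickoffBrief(issue Issue) bool {
	// Must have "chorus-entrypoint" label
	hasChorusEntrypoint := hasLabel(issue, "chorus-entrypoint")
	// Must have "Design Brief" in title
	containsDesignBrief := strings.Contains(issue.Title, "Design Brief")
	return hasChorusEntrypoint && containsDesignBrief
}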
```
**Sync Flow:**
```
1. Get all monitored repositories (WHERE monitor_issues = true)
2. For each repository:
   a. Fetch issues from Gitea API
   b. Filter by CHORUS labels if enabled
   c. Create or update task records
   d. Check for Design Brief issues → trigger council formation
   e. Check for bzzz-task issues → trigger team composition
   f. Update repository sync timestamps
3. Log sync results and statistics
```
### 3. Composer (`internal/composer/`)
**Responsibilities:**
- Task classification (feature, bug fix, security, etc.)
- Complexity analysis and risk assessment
- Skill requirement extraction
- Team composition and agent matching
- Team persistence to database
**Configuration:**
```go
type ComposerConfig struct {
	ClassificationModel string  // LLM model for classification
	SkillAnalysisModel  string  // LLM model for skill analysis
	MatchingModel       string  // LLM model for team matching
	DefaultStrategy     string  // "minimal_viable"
	MinTeamSize         int     // 1
	MaxTeamSize         int     // 3
	SkillMatchThreshold float64 // 0.6
	AnalysisTimeoutSecs int     // 30-60
	FeatureFlags        FeatureFlags
}
```
**Feature Flags:**
- `EnableLLMClassification`: Use LLM vs heuristics (default: false)
- `EnableLLMSkillAnalysis`: Use LLM vs heuristics (default: false)
- `EnableLLMTeamMatching`: Use LLM vs heuristics (default: false)
- `EnableFailsafeFallback`: Fallback to heuristics on LLM failure (default: true)
**Analysis Pipeline:**
```
TaskAnalysisInput
  1. classifyTask() → TaskClassification
     - determineTaskType() [heuristic or LLM]
     - estimateComplexity()
     - identifyDomains()
  2. analyzeSkillRequirements() → SkillRequirements
     - Map domains to skills
     - Determine critical vs desirable
  3. getAvailableAgents() → []*Agent
  4. composeTeam() → TeamComposition
     - selectRequiredRoles()
     - matchAgentsToRoles()
     - calculateConfidence()
  5. CreateTeam() → Team (persisted to DB)
```
**Task Types:**
- `feature_development`
- `bug_fix`
- `refactoring`
- `security`
- `integration`
- `migration`
- `research`
- `optimization`
- `maintenance`
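
A heuristic classifier that maps text to the task types above might look like the following sketch. The keyword table is purely illustrative and is not taken from `internal/composer`; the real heuristics may use different keywords, ordering, or scoring. Note that plain substring matching is crude (for example, "prefix" contains "fix").

```go
package main

import (
	"fmt"
	"strings"
)

// keywordRules is a hypothetical, ordered rule table: the first rule with a
// matching keyword wins.
var keywordRules = []struct {
	taskType string
	keywords []string
}{
	{"security", []string{"vulnerability", "cve", "security"}},
	{"bug_fix", []string{"bug", "fix", "crash"}},
	{"refactoring", []string{"refactor", "cleanup"}},
	{"migration", []string{"migrate", "migration"}},
	{"optimization", []string{"optimize", "performance", "slow"}},
	{"research", []string{"investigate", "research", "spike"}},
}

// determineTaskType returns the first matching task type, defaulting to
// feature_development when no keyword matches.
func determineTaskType(title, description string) string {
	text := strings.ToLower(title + " " + description)
	for _, rule := range keywordRules {
		for _, kw := range rule.keywords {
			if strings.Contains(text, kw) {
				return rule.taskType
			}
		}
	}
	return "feature_development"
}

func main() {
	fmt.Println(determineTaskType("Fix bug in login", "")) // bug_fix
	fmt.Println(determineTaskType("Add dark mode", ""))    // feature_development
}
```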
### 4. Council (`internal/council/`)
**Responsibilities:**
- Project kickoff council formation
- Core agent selection (Product Manager, Engineering Lead, Quality Lead)
- Optional agent selection (Security, DevOps, UX)
- Council composition persistence
**Council Composition:**
```go
type CouncilComposition struct {
	CouncilID      uuid.UUID
	ProjectName    string
	CoreAgents     []CouncilAgent // PM, Eng Lead, QA Lead
	OptionalAgents []CouncilAgent // Security, DevOps, UX
	Strategy       string
	Status         string
}
```
**Council Roles:**
- **Core Agents** (always deployed):
- Product Manager (PM)
- Engineering Lead (eng-lead)
- Quality Lead (qa-lead)
- **Optional Agents** (deployed based on project needs):
- Security Lead (sec-lead)
- DevOps Lead (devops-lead)
- UX Lead (ux-lead)
### 5. Orchestrator (`internal/orchestrator/`)
**Responsibilities:**
- Docker Swarm service deployment
- Agent container configuration
- Resource allocation (CPU/memory limits)
- Volume mounting and network configuration
- Service scaling and health monitoring
**Components:**
#### SwarmManager (`swarm_manager.go`)
- Docker Swarm API client wrapper
- Service creation, scaling, removal
- Task monitoring and status tracking
**Key Methods:**
```go
DeployAgent(config *AgentDeploymentConfig) (*swarm.Service, error)
ScaleService(serviceName string, replicas int) error
GetServiceStatus(serviceName string) (*ServiceStatus, error)
RemoveAgent(serviceID string) error
```
#### AgentDeployer (`agent_deployer.go`)
- Team agent deployment orchestration
- Council agent deployment orchestration
- Agent assignment to CHORUS containers
**Deployment Flow:**
```
DeploymentRequest
  1. For each agent in team/council:
     a. selectAgentImage() → CHORUS image
     b. buildAgentEnvironment() → env vars
     c. buildAgentVolumes() → Docker socket + workspace
     d. calculateResources() → CPU/memory limits
     e. deploySingleAgent() → Swarm service
  2. recordDeployment() → Update database
  3. updateTeamDeploymentStatus() → Track overall status
```
**Agent Environment Variables:**
```bash
CHORUS_AGENT_NAME=<role_name> # Maps to human-roles.yaml
CHORUS_TEAM_ID=<uuid>
CHORUS_TASK_ID=<uuid>
CHORUS_PROJECT=<repository>
CHORUS_TASK_TITLE=<title>
CHORUS_TASK_DESC=<description>
CHORUS_PRIORITY=<priority>
CHORUS_EXTERNAL_URL=<issue_url>
WHOOSH_COORDINATOR=true
WHOOSH_ENDPOINT=http://whoosh:8080
DOCKER_HOST=unix:///var/run/docker.sock
```
**Resource Allocation:**
```go
ResourceLimits{
	CPULimit:      1000000000, // 1 CPU core
	MemoryLimit:   1073741824, // 1 GB RAM
	CPURequest:    500000000,  // 0.5 CPU core
	MemoryRequest: 536870912,  // 512 MB RAM
}
```
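The raw numbers above follow Docker's units: CPU is expressed in billionths of a core (nano-CPUs) and memory in bytes. The helpers below are hypothetical and only show how the documented values can be derived.

```go
package main

import "fmt"

const (
	nanoCPUsPerCore = 1_000_000_000 // Docker expresses CPU as billionths of a core
	bytesPerMiB     = 1 << 20
)

// coresToNanoCPUs converts a core count such as 0.5 into nano-CPUs.
func coresToNanoCPUs(cores float64) int64 { return int64(cores * nanoCPUsPerCore) }

// mibToBytes converts mebibytes into bytes.
func mibToBytes(mib int64) int64 { return mib * bytesPerMiB }

func main() {
	fmt.Println(coresToNanoCPUs(1), mibToBytes(1024))  // 1000000000 1073741824
	fmt.Println(coresToNanoCPUs(0.5), mibToBytes(512)) // 500000000 536870912
}
```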
#### Scaling System (`scaling_*.go`)
- Wave-based scaling controller
- Bootstrap pool manager
- Assignment broker
- Health gates (KACHING, BACKBEAT, CHORUS)
- Metrics collector
**Scaling Components:**
- `ScalingController`: Coordinates scaling operations
- `BootstrapPoolManager`: Manages pre-warmed agent pool
- `AssignmentBroker`: Assigns tasks to available agents
- `HealthGates`: Checks system health before scaling
- `ScalingMetricsCollector`: Tracks scaling operation metrics
### 6. Gitea Client (`internal/gitea/`)
**Responsibilities:**
- Gitea API client with retry logic
- Issue listing and retrieval
- Repository information fetching
- Label management and creation
- Webhook payload parsing
**Configuration Options:**
```go
type GITEAConfig struct {
	BaseURL      string        // Gitea instance URL
	Token        string        // API token
	TokenFile    string        // Token from file
	WebhookPath  string        // Webhook endpoint path
	WebhookToken string        // Webhook secret
	EagerFilter  bool          // Pre-filter by labels at API level
	FullRescan   bool          // Ignore since parameter for full rescan
	DebugURLs    bool          // Log exact URLs
	MaxRetries   int           // Retry attempts (default: 3)
	RetryDelay   time.Duration // Delay between retries (default: 2s)
}
```
**Retry Logic:**
- Automatic retry on 5xx errors and 429 (rate limiting)
- Configurable max retries and delay
- No retry on 4xx client errors
- Exponential backoff via configured delay
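
The retry rules above can be sketched as follows. This is not the Gitea client's actual code: `shouldRetry` and `getWithRetry` are hypothetical names, and doubling the delay on each attempt is an assumption about what "exponential backoff via configured delay" means.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// shouldRetry reports whether a status code is retryable: 429 and 5xx are,
// other 4xx client errors and successes are not.
func shouldRetry(status int) bool {
	return status == http.StatusTooManyRequests || status >= 500
}

// getWithRetry performs a GET and retries up to maxRetries additional times,
// sleeping delay, 2*delay, 4*delay, ... between attempts (assumed schedule).
func getWithRetry(client *http.Client, url string, maxRetries int, delay time.Duration) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if attempt > 0 {
			time.Sleep(delay << (attempt - 1))
		}
		resp, err := client.Get(url)
		if err != nil {
			lastErr = err
			continue
		}
		if !shouldRetry(resp.StatusCode) {
			return resp, nil
		}
		resp.Body.Close()
		lastErr = fmt.Errorf("retryable status %d", resp.StatusCode)
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", maxRetries+1, lastErr)
}

func main() {
	// Test server that fails twice with 503 and then succeeds.
	var calls atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if calls.Add(1) < 3 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	resp, err := getWithRetry(srv.Client(), srv.URL, 3, 10*time.Millisecond)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.StatusCode, calls.Load()) // 200 3
}
```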
**Issue Fetching:**
```go
// GetIssues lists issues for a repository. It supports:
//   - state filtering (open/closed/all)
//   - label filtering (eager at the API or in code)
//   - the since parameter for incremental sync
//   - pagination
func GetIssues(owner, repo string, opts IssueListOptions) ([]Issue, error)
```
**Label Management:**
```go
// EnsureRequiredLabels creates the standardized labels:
//   - bug, enhancement, duplicate, invalid, etc.
//   - bzzz-task (CHORUS task marker)
//   - chorus-entrypoint (Design Brief marker)
func EnsureRequiredLabels(owner, repo string) error
```
### 7. BACKBEAT Integration (`internal/backbeat/`)
**Responsibilities:**
- Integration with BACKBEAT timing system (NATS-based)
- Beat-synchronized status emission
- Search operation tracking
- Health monitoring
**Key Concepts:**
- **Beat**: Regular timing event (every 30 seconds at 2 BPM default)
- **Downbeat**: Bar start event (every 4 beats = 2 minutes)
- **StatusClaim**: Progress update emitted to NATS
**Search Operation Phases:**
```
PhaseStarted → PhaseIndexing → PhaseQuerying → PhaseRanking → PhaseCompleted / PhaseFailed
```
**Integration Flow:**
```
1. Start(ctx) → Connect to NATS cluster
2. OnBeat() → Emit status claims every beat
3. OnDownbeat() → Cleanup completed operations
4. StartSearch() → Register new search operation
5. UpdateSearchPhase() → Update operation progress
6. CompleteSearch() → Mark operation complete
```
### 8. Authentication & Security (`internal/auth/`)
**Components:**
#### Middleware (`middleware.go`)
- JWT token validation
- Service token authentication
- Admin role checking
- Request authentication
**Methods:**
```go
Authenticate(next http.Handler) http.Handler // Generic auth
ServiceTokenRequired(next http.Handler) http.Handler // Service tokens only
AdminRequired(next http.Handler) http.Handler // Admin role required
```
#### Rate Limiter (`ratelimit.go`)
- IP-based rate limiting
- Configurable requests per time window
- In-memory storage with automatic cleanup
**Default Configuration:**
```go
RateLimiter{
	RequestsPerMinute: 100,
	CleanupInterval:   time.Minute,
}
```
### 9. Validation (`internal/validation/`)
**Security Headers:**
```go
// SecurityHeaders sets the following headers on every response:
//   - X-Content-Type-Options: nosniff
//   - X-Frame-Options: DENY
//   - X-XSS-Protection: 1; mode=block
//   - Content-Security-Policy: default-src 'self'
func SecurityHeaders(next http.Handler) http.Handler
```
**Input Validation:**
- UUID validation
- Request body size limits
- Content-Type validation
### 10. Tracing (`internal/tracing/`)
**OpenTelemetry Integration:**
- Jaeger exporter for distributed tracing
- Span creation for key operations
- Context propagation across services
- Performance monitoring
**Span Types:**
```go
StartSpan(ctx, "operation_name")                     // Generic span
StartMonitorSpan(ctx, "operation", "repository")     // Repository monitoring
StartCouncilSpan(ctx, "operation", "council_id")     // Council operations
StartDeploymentSpan(ctx, "operation", "resource_id") // Deployment operations
```
**Configuration:**
```go
type OpenTelemetryConfig struct {
	Enabled        bool
	ServiceName    string  // "whoosh"
	ServiceVersion string  // "1.0.0"
	Environment    string  // "production"
	JaegerEndpoint string  // "http://localhost:14268/api/traces"
	SampleRate     float64 // 1.0 (100%)
}
```
---
## Database Architecture
### Schema Overview
**Core Tables:**
1. `teams` - Team records
2. `team_roles` - Role definitions (executor, coordinator, reviewer)
3. `team_assignments` - Agent-to-role assignments
4. `agents` - AI agent registry
5. `tasks` - Task records from Gitea/external sources
6. `repositories` - Monitored repository configurations
7. `repository_sync_logs` - Sync operation history
8. `councils` - Project kickoff council records
9. `council_agents` - Council agent assignments
10. `council_artifacts` - Council-generated artifacts
### Key Relationships
```
repositories (1) ──→ (N) tasks
tasks (1) ──→ (1) teams (assigned_team_id)
tasks (1) ──→ (1) councils (via task_id)
teams (1) ──→ (N) team_assignments
team_assignments (N) ──→ (1) agents
team_assignments (N) ──→ (1) team_roles
councils (1) ──→ (N) council_agents
```
### Migration System
**Location**: `/migrations/*.sql`
**Migration Files:**
1. `001_init_schema.up.sql` - Initial teams, agents, roles
2. `002_add_tasks_table.up.sql` - Task management
3. `003_add_repositories_table.up.sql` - Repository monitoring
4. `004_enhance_task_team_integration.up.sql` - Enhanced relationships
5. `005_add_council_tables.up.sql` - Council management
6. `006_add_performance_indexes.up.sql` - Query optimization
7. `007_add_team_deployment_status.up.sql` - Deployment tracking
**Running Migrations:**
```bash
# Automatic on startup (if AutoMigrate=true)
WHOOSH_DATABASE_AUTO_MIGRATE=true go run ./cmd/whoosh
# Manual via migrate CLI
migrate -database "postgres://..." -path ./migrations up
```
### Connection Pooling
```go
type DatabaseConfig struct {
	MaxOpenConns    int           // 25 (default)
	MaxIdleConns    int           // 5 (default)
	MaxConnLifetime time.Duration // 1 hour
	MaxConnIdleTime time.Duration // 30 minutes
}
```
### Key Indexes
**Performance Indexes:**
```sql
-- Agent availability
CREATE INDEX idx_agents_status_last_seen ON agents(status, last_seen);
-- Repository lookups
CREATE INDEX idx_repositories_full_name_lookup ON repositories(full_name);
CREATE INDEX idx_repositories_last_issue_sync ON repositories(last_issue_sync);
-- Task lookups
CREATE INDEX idx_tasks_external_source_lookup ON tasks(external_id, source_type);
CREATE INDEX idx_tasks_repository_id ON tasks(repository_id);
CREATE INDEX idx_tasks_assigned_team_id ON tasks(assigned_team_id);
-- Team deployment
CREATE INDEX idx_teams_deployment_status ON teams(deployment_status);
```
---
## API Layer
### Request/Response Format
**Standard Response:**
```json
{
  "status": "success",
  "data": { ... },
  "message": "Operation completed successfully"
}
```
**Error Response:**
```json
{
  "status": "error",
  "error": "Error message",
  "details": { ... }
}
```
### Authentication
**JWT Token Format:**
```
Authorization: Bearer <jwt_token>
```
**Service Token Format:**
```
Authorization: Bearer <service_token>
```
### Key API Endpoints
#### Teams API
```
GET /api/v1/teams - List all teams (with pagination)
POST /api/v1/teams - Create new team (admin only)
GET /api/v1/teams/{teamID} - Get team details
PUT /api/v1/teams/{teamID}/status - Update team status (admin only)
POST /api/v1/teams/analyze - Analyze task for team composition
```
#### Tasks API
```
GET /api/v1/tasks - List all tasks
POST /api/v1/tasks/ingest - Ingest task from external source (service token)
GET /api/v1/tasks/{taskID} - Get task details
```
#### Projects API (Gitea Repositories)
```
GET /api/v1/projects - List all projects
POST /api/v1/projects - Create new project (admin only)
GET /api/v1/projects/{projectID} - Get project details
GET /api/v1/projects/{projectID}/tasks - List project tasks
POST /api/v1/projects/{projectID}/tasks/{taskNumber}/claim - Claim task
```
#### Repositories API
```
GET /api/v1/repositories - List monitored repositories
POST /api/v1/repositories - Add repository for monitoring (admin only)
GET /api/v1/repositories/{repoID} - Get repository details
PUT /api/v1/repositories/{repoID} - Update repository config (admin only)
POST /api/v1/repositories/{repoID}/sync - Trigger manual sync (admin only)
POST /api/v1/repositories/{repoID}/ensure-labels - Create standard labels (admin only)
GET /api/v1/repositories/{repoID}/logs - Get sync logs
```
#### Councils API
```
GET /api/v1/councils/{councilID} - Get council details
GET /api/v1/councils/{councilID}/artifacts - List council artifacts
POST /api/v1/councils/{councilID}/artifacts - Create artifact (admin only)
```
#### Agents API
```
GET /api/v1/agents - List all agents
POST /api/v1/agents/register - Register new agent
PUT /api/v1/agents/{agentID}/status - Update agent status
```
#### Scaling API (if Docker enabled)
```
GET /api/v1/scaling/status - Get scaling system status
POST /api/v1/scaling/scale-up - Manually trigger scale-up
POST /api/v1/scaling/scale-down - Manually trigger scale-down
GET /api/v1/scaling/metrics - Get scaling metrics
```
#### Health & Monitoring
```
GET /health - Basic health check
GET /health/ready - Readiness probe
GET /admin/health/details - Detailed health information
GET /api/v1/backbeat/status - BACKBEAT integration status
```
### Webhook Endpoints
#### Gitea Webhook
```
POST /webhooks/gitea - Receive Gitea webhook events
```
**Supported Events:**
- `issues` - Issue opened/closed/edited
- `issue_comment` - Comment added
- `push` - Code pushed
- `pull_request` - PR opened/merged
**Webhook Security:**
- HMAC signature verification using webhook token
- X-Gitea-Signature header validation
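
Signature verification can be sketched with the standard library, assuming the `X-Gitea-Signature` header carries a hex-encoded HMAC-SHA256 of the raw request body keyed with the webhook token. The function names are hypothetical; `hmac.Equal` is used so the comparison runs in constant time.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign computes the hex-encoded HMAC-SHA256 of payload using secret.
func sign(secret, payload []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// verifySignature checks a hex signature (as received in the header)
// against the payload. Malformed hex is rejected.
func verifySignature(secret, payload []byte, signatureHex string) bool {
	got, err := hex.DecodeString(signatureHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("webhook-token")
	body := []byte(`{"action":"opened"}`)
	sig := sign(secret, body)

	fmt.Println(verifySignature(secret, body, sig))                    // true
	fmt.Println(verifySignature(secret, []byte(`{"tampered":1}`), sig)) // false
}
```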
---
## External Service Integrations
### 1. Gitea Integration
**Base URL**: Configured via `WHOOSH_GITEA_BASE_URL`
**Authentication**: API token (from file or environment)
**API Operations:**
- List repositories
- Get repository details
- List issues (with filtering)
- Get issue details
- Create/manage labels
- Test connection
**Webhook Integration:**
- Receives issue events (create, update, close)
- Triggers team composition or council formation
- Updates task status in database
### 2. Docker Swarm Integration
**Socket**: Unix socket (`/var/run/docker.sock`) or TCP
**Operations:**
- Service creation (`ServiceCreate`)
- Service scaling (`ServiceUpdate`)
- Service inspection (`ServiceInspectWithRaw`)
- Task listing (`TaskList`)
- Service removal (`ServiceRemove`)
- Service logs (`ServiceLogs`)
**Network**: Agents deployed to `chorus_default` network by default
**Image Registry**: `registry.home.deepblack.cloud` (private registry)
**Standard Image**: `docker.io/anthonyrawlins/chorus:backbeat-v2.0.1`
### 3. BACKBEAT Integration
**Protocol**: NATS messaging
**NATS URL**: Configured via `WHOOSH_BACKBEAT_NATS_URL`
**Operations:**
- Beat synchronization (30-second intervals at 2 BPM)
- Status claim emission
- Health monitoring
- Task progress tracking
**Health Indicators:**
- Connected to NATS cluster
- Current beat index
- Measured BPM vs target tempo
- Tempo drift
- Reconnection count
- Active searches/operations
### 4. N8N Workflows
**Base URL**: `https://n8n.home.deepblack.cloud`
**Integration Points:**
- Gitea webhook → N8N → BZZZ task coordination
- WHOOSH events → N8N → External notifications
- Council formation → N8N → Project initialization workflows
### 5. SLURP (UCXL Content System)
**Purpose**: UCXL address-based artifact storage
**API Endpoints:**
- `POST /api/v1/slurp/submit` - Submit artifact to SLURP
- `GET /api/v1/slurp/artifacts/{ucxlAddr}` - Retrieve artifact
**Use Cases:**
- Decision records (BUBBLE integration)
- Council artifacts (project documentation)
- Compliance documentation
---
## Configuration Management
### Environment Variables
**Database Configuration:**
```bash
WHOOSH_DATABASE_HOST=localhost
WHOOSH_DATABASE_PORT=5432
WHOOSH_DATABASE_DB_NAME=whoosh
WHOOSH_DATABASE_USERNAME=whoosh
WHOOSH_DATABASE_PASSWORD=<password>
WHOOSH_DATABASE_PASSWORD_FILE=/secrets/db_password # Alternative
WHOOSH_DATABASE_SSL_MODE=disable
WHOOSH_DATABASE_AUTO_MIGRATE=true
WHOOSH_DATABASE_MAX_OPEN_CONNS=25
WHOOSH_DATABASE_MAX_IDLE_CONNS=5
```
**Server Configuration:**
```bash
WHOOSH_SERVER_LISTEN_ADDR=:8080
WHOOSH_SERVER_READ_TIMEOUT=30s
WHOOSH_SERVER_WRITE_TIMEOUT=30s
WHOOSH_SERVER_SHUTDOWN_TIMEOUT=30s
WHOOSH_SERVER_ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8080
WHOOSH_SERVER_ALLOWED_ORIGINS_FILE=/secrets/allowed_origins # Alternative
```
**Gitea Configuration:**
```bash
WHOOSH_GITEA_BASE_URL=http://ironwood:3000
WHOOSH_GITEA_TOKEN=<token>
WHOOSH_GITEA_TOKEN_FILE=/secrets/gitea_token # Alternative
WHOOSH_GITEA_WEBHOOK_PATH=/webhooks/gitea
WHOOSH_GITEA_WEBHOOK_TOKEN=<secret>
WHOOSH_GITEA_WEBHOOK_TOKEN_FILE=/secrets/webhook_token # Alternative
WHOOSH_GITEA_EAGER_FILTER=true
WHOOSH_GITEA_FULL_RESCAN=false
WHOOSH_GITEA_DEBUG_URLS=false
WHOOSH_GITEA_MAX_RETRIES=3
WHOOSH_GITEA_RETRY_DELAY=2s
```
**Authentication Configuration:**
```bash
WHOOSH_AUTH_JWT_SECRET=<secret_min_32_chars>
WHOOSH_AUTH_JWT_SECRET_FILE=/secrets/jwt_secret # Alternative
WHOOSH_AUTH_SERVICE_TOKENS=token1,token2,token3
WHOOSH_AUTH_SERVICE_TOKENS_FILE=/secrets/service_tokens # Alternative
WHOOSH_AUTH_JWT_EXPIRY=24h
```
**Logging Configuration:**
```bash
WHOOSH_LOGGING_LEVEL=debug # debug, info, warn, error
WHOOSH_LOGGING_ENVIRONMENT=development # development, production
LOG_LEVEL=info # Alternative for zerolog
ENVIRONMENT=development # Enables pretty logging
```
**Team Composer Configuration:**
```bash
# LLM-based analysis (experimental, default: false)
WHOOSH_COMPOSER_ENABLE_LLM_CLASSIFICATION=false
WHOOSH_COMPOSER_ENABLE_LLM_SKILL_ANALYSIS=false
WHOOSH_COMPOSER_ENABLE_LLM_TEAM_MATCHING=false
# Analysis features
WHOOSH_COMPOSER_ENABLE_COMPLEXITY_ANALYSIS=true
WHOOSH_COMPOSER_ENABLE_RISK_ASSESSMENT=true
WHOOSH_COMPOSER_ENABLE_ALTERNATIVE_OPTIONS=false
# Debug and monitoring
WHOOSH_COMPOSER_ENABLE_ANALYSIS_LOGGING=true
WHOOSH_COMPOSER_ENABLE_PERFORMANCE_METRICS=true
WHOOSH_COMPOSER_ENABLE_FAILSAFE_FALLBACK=true
# LLM model configuration
WHOOSH_COMPOSER_CLASSIFICATION_MODEL=llama3.1:8b
WHOOSH_COMPOSER_SKILL_ANALYSIS_MODEL=llama3.1:8b
WHOOSH_COMPOSER_MATCHING_MODEL=llama3.1:8b
# Performance settings
WHOOSH_COMPOSER_ANALYSIS_TIMEOUT_SECS=60
WHOOSH_COMPOSER_SKILL_MATCH_THRESHOLD=0.6
```
**BACKBEAT Configuration:**
```bash
WHOOSH_BACKBEAT_ENABLED=true
WHOOSH_BACKBEAT_CLUSTER_ID=chorus-production
WHOOSH_BACKBEAT_AGENT_ID=whoosh
WHOOSH_BACKBEAT_NATS_URL=nats://backbeat-nats:4222
```
**Docker Configuration:**
```bash
WHOOSH_DOCKER_ENABLED=true
WHOOSH_DOCKER_HOST=unix:///var/run/docker.sock
```
**OpenTelemetry Configuration:**
```bash
WHOOSH_OPENTELEMETRY_ENABLED=true
WHOOSH_OPENTELEMETRY_SERVICE_NAME=whoosh
WHOOSH_OPENTELEMETRY_SERVICE_VERSION=1.0.0
WHOOSH_OPENTELEMETRY_ENVIRONMENT=production
WHOOSH_OPENTELEMETRY_JAEGER_ENDPOINT=http://localhost:14268/api/traces
WHOOSH_OPENTELEMETRY_SAMPLE_RATE=1.0
```
**N8N Configuration:**
```bash
WHOOSH_N8N_BASE_URL=https://n8n.home.deepblack.cloud
```
### Configuration Loading
**Priority Order:**
1. Environment variables
2. Secret files (if `*_FILE` variant specified)
3. Default values in code
**Secret File Loading:**
```go
// Example: JWT secret loading
if cfg.Auth.JWTSecretFile != "" {
	secret, err := readSecretFile(cfg.Auth.JWTSecretFile)
	if err != nil {
		return fmt.Errorf("read JWT secret file: %w", err)
	}
	cfg.Auth.JWTSecret = secret
}
```
**Validation:**
```go
func (c *Config) Validate() error {
	// Check required fields (database password, Gitea token, etc.)
	// Build database URL if not provided
	// Validate CORS origins
	// Ensure JWT secret meets minimum length
	// Validate service tokens present
	return nil
}
```
---
## Security & Authentication
### Authentication Mechanisms
#### 1. JWT Authentication
- Used for user/admin API access
- Token expiry: 24 hours (configurable)
- Claims include: user_id, role, issued_at, expires_at
- Validated on protected endpoints via middleware
#### 2. Service Token Authentication
- Used for service-to-service communication
- Static tokens configured via environment
- Required for task ingestion endpoints
- Validated via `ServiceTokenRequired` middleware
#### 3. Admin Role Enforcement
- Admin-only endpoints protected via `AdminRequired` middleware
- Role claim must be "admin" in JWT
- Used for repository management, team creation, etc.
### Security Headers
Applied to all responses:
```
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Content-Security-Policy: default-src 'self'
```
### CORS Configuration
- Allowed origins: Configured via environment
- Allowed methods: GET, POST, PUT, DELETE, OPTIONS
- Credentials: Enabled
- Max age: 300 seconds
### Rate Limiting
- Default: 100 requests per minute per IP
- In-memory storage with automatic cleanup
- Applied globally via middleware
### Webhook Security
- Gitea webhooks: HMAC signature verification
- Token stored securely (from file or environment)
- Signature header: `X-Gitea-Signature`
### Secret Management
**Best Practices:**
- Use `*_FILE` environment variables for secrets
- Mount secrets as files in Docker Swarm
- Never commit secrets to Git
- Rotate tokens regularly
**Example Docker Secret:**
```yaml
secrets:
  gitea_token:
    file: /path/to/gitea_token.txt
services:
  whoosh:
    secrets:
      - gitea_token
    environment:
      WHOOSH_GITEA_TOKEN_FILE: /run/secrets/gitea_token
```
---
## Observability
### Logging
**Structured Logging (zerolog):**
```go
log.Info().
	Str("team_id", teamID).
	Int("agent_count", count).
	Dur("duration", duration).
	Msg("Team deployed successfully")
```
**Log Levels:**
- `debug`: Detailed debugging information
- `info`: General information messages
- `warn`: Warning messages (recoverable errors)
- `error`: Error messages (operation failures)
**Pretty Logging:**
- Enabled in development mode
- Human-readable console output
- Colored output for log levels
### Distributed Tracing
**OpenTelemetry + Jaeger:**
```go
ctx, span := tracing.StartSpan(ctx, "operation_name")
defer span.End()
span.SetAttributes(
	attribute.String("resource.id", id),
	attribute.Int("resource.count", count),
)
// On error
tracing.SetSpanError(span, err)
```
**Trace Propagation:**
- Context passed through entire request lifecycle
- Spans created at key operations:
- HTTP request handling
- Database queries
- External API calls
- Docker operations
- Council/team operations
**Jaeger UI:**
- Access at: `http://localhost:16686`
- View traces by service, operation, duration
- Analyze performance bottlenecks
- Debug distributed operations
### Health Checks
**Basic Health Check (`/health`):**
```json
{
  "status": "ok",
  "service": "whoosh",
  "version": "0.1.0-mvp",
  "backbeat": {
    "enabled": true,
    "connected": true,
    "current_beat": 12345
  }
}
```
**Readiness Check (`/health/ready`):**
```json
{
  "status": "ready",
  "database": "connected"
}
```
**Detailed Health (`/admin/health/details`):**
```json
{
  "service": "whoosh",
  "version": "0.1.1-debug",
  "timestamp": 1696118400,
  "status": "healthy",
  "components": {
    "database": {
      "status": "healthy",
      "type": "postgresql",
      "statistics": {
        "max_conns": 25,
        "acquired_conns": 3,
        "idle_conns": 5
      }
    },
    "gitea": {
      "status": "healthy",
      "endpoint": "http://ironwood:3000"
    },
    "backbeat": {
      "status": "healthy",
      "connected": true,
      "current_tempo": 2
    },
    "docker_swarm": {
      "status": "unknown",
      "note": "Health check not implemented"
    }
  }
}
```
### Metrics
**Database Metrics:**
- Connection pool statistics
- Active connections
- Idle connections
- Query duration
**Deployment Metrics (via ScalingMetricsCollector):**
- Wave execution count
- Agent deployment success/failure rate
- Average deployment duration
- Error rate
**BACKBEAT Metrics:**
- Current beat index
- Tempo (BPM)
- Tempo drift
- Reconnection count
- Active operations
---
## Development Workflow
### Running Locally
**Prerequisites:**
```bash
# Install Go 1.22+
go version
# Install PostgreSQL 14+
psql --version
# Install Docker (for Swarm testing)
docker version
```
**Setup:**
```bash
# 1. Clone repository
git clone https://gitea.chorus.services/tony/WHOOSH.git
cd WHOOSH
# 2. Copy environment configuration
cp .env.example .env
# Edit .env with local values
# 3. Start PostgreSQL (Docker example)
docker run -d \
--name whoosh-postgres \
-e POSTGRES_DB=whoosh \
-e POSTGRES_USER=whoosh \
-e POSTGRES_PASSWORD=whoosh \
-p 5432:5432 \
postgres:15
# 4. Run migrations
make migrate
# Or manual:
# migrate -database "postgres://whoosh:whoosh@localhost:5432/whoosh?sslmode=disable" -path ./migrations up
# 5. Run the server
go run ./cmd/whoosh
# Or with hot reload:
# air (requires cosmtrek/air)
```
**Development Commands:**
```bash
# Run with live reload
air
# Run tests
go test ./...
# Run specific package tests
go test ./internal/composer/...
# Format code
go fmt ./...
# Vet code
go vet ./...
# Build binary
go build -o bin/whoosh ./cmd/whoosh
# Check version
./bin/whoosh --version
```
### Testing
**Unit Tests:**
```go
// internal/composer/service_test.go
func TestDetermineTaskType(t *testing.T) {
	service := NewService(nil, nil)
	taskType := service.DetermineTaskType("Fix bug in login", "...")
	assert.Equal(t, TaskTypeBugFix, taskType)
}
```
**Integration Tests:**
```bash
# Requires running database
go test -tags=integration ./internal/database/...
```
**Database Setup for Tests:**
```bash
# Create test database
createdb whoosh_test
# Run migrations
migrate -database "postgres://whoosh:whoosh@localhost:5432/whoosh_test?sslmode=disable" -path ./migrations up
```
### Building for Production
**Docker Build:**
```bash
# Build binary
go build -o whoosh ./cmd/whoosh
# Build Docker image
docker build -t registry.home.deepblack.cloud/whoosh:v0.1.1 .
# Push to registry
docker push registry.home.deepblack.cloud/whoosh:v0.1.1
```
**Docker Compose:**
```yaml
version: '3.8'
services:
whoosh:
image: registry.home.deepblack.cloud/whoosh:v0.1.1
environment:
WHOOSH_DATABASE_HOST: postgres
WHOOSH_DATABASE_PORT: 5432
WHOOSH_DATABASE_DB_NAME: whoosh
WHOOSH_DATABASE_USERNAME: whoosh
WHOOSH_DATABASE_PASSWORD: ${DATABASE_PASSWORD}
WHOOSH_GITEA_BASE_URL: http://ironwood:3000
WHOOSH_GITEA_TOKEN: ${GITEA_TOKEN}
WHOOSH_AUTH_JWT_SECRET: ${JWT_SECRET}
ports:
- "8080:8080"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
depends_on:
- postgres
networks:
- chorus_default
postgres:
image: postgres:15
environment:
POSTGRES_DB: whoosh
POSTGRES_USER: whoosh
POSTGRES_PASSWORD: ${DATABASE_PASSWORD}
volumes:
- whoosh_data:/var/lib/postgresql/data
networks:
- chorus_default
volumes:
whoosh_data:
networks:
chorus_default:
external: true
```
**Docker Swarm Deploy:**
```bash
# Create secrets (printf avoids the trailing newline that echo would store in the secret)
printf '%s' "my_jwt_secret" | docker secret create whoosh_jwt_secret -
printf '%s' "my_gitea_token" | docker secret create whoosh_gitea_token -
# Deploy stack
docker stack deploy -c docker-compose.swarm.yml whoosh
# Check services
docker service ls
docker service ps whoosh_whoosh
# View logs
docker service logs whoosh_whoosh -f
```
### Debugging
**Enable Debug Logging:**
```bash
export WHOOSH_LOGGING_LEVEL=debug
export LOG_LEVEL=debug
go run ./cmd/whoosh
```
**Database Query Logging:**
```bash
# Set pgx log level
export WHOOSH_DATABASE_LOG_LEVEL=trace
```
**Gitea URL Debugging:**
```bash
export WHOOSH_GITEA_DEBUG_URLS=true
```
**Trace a Request:**
```bash
# Send a request tagged with a known request ID
curl -H "X-Request-ID: test-request-123" http://localhost:8080/api/v1/teams
# Open the Jaeger UI (use xdg-open instead of open on Linux)
open http://localhost:16686
# Search for: service=whoosh, tags=request.id=test-request-123
```
**Interactive Debugging (Delve):**
```bash
# Install delve
go install github.com/go-delve/delve/cmd/dlv@latest
# Debug main
dlv debug ./cmd/whoosh
# Set breakpoint
(dlv) break internal/server/server.go:200
(dlv) continue
```
---
## Appendix
### Directory Structure
```
WHOOSH/
├── cmd/
│ ├── whoosh/ # Main application entry point
│ └── test-llm/ # LLM testing utility
├── internal/
│ ├── agents/ # Agent registry service
│ ├── auth/ # Authentication & authorization
│ ├── backbeat/ # BACKBEAT timing integration
│ ├── composer/ # Team composition service
│ ├── config/ # Configuration management
│ ├── council/ # Council formation service
│ ├── database/ # Database connection & migrations
│ ├── gitea/ # Gitea API client
│ ├── licensing/ # Enterprise licensing (stub)
│ ├── monitor/ # Repository monitoring service
│ ├── orchestrator/ # Docker Swarm orchestration
│ ├── p2p/ # P2P discovery service
│ ├── server/ # HTTP server & routing
│ ├── tasks/ # Task management service
│ ├── tracing/ # OpenTelemetry tracing
│ └── validation/ # Input validation & security
├── migrations/ # Database migration files
├── ui/ # Frontend assets (if any)
├── docs/ # Documentation
├── scripts/ # Utility scripts
├── requirements/ # Requirements documents
├── BACKBEAT-prototype/ # BACKBEAT SDK integration
├── go.mod # Go module definition
├── go.sum # Go module checksums
├── .env.example # Environment variable template
├── Dockerfile # Container build definition
└── README.md # Project README
```
### Common Issues & Solutions
**Issue: Database connection failed**
```
Error: failed to ping database: dial tcp 127.0.0.1:5432: connect: connection refused
Solution:
1. Ensure PostgreSQL is running: systemctl status postgresql (or docker ps --filter name=whoosh-postgres for the Docker setup)
2. Check connection parameters in .env
3. Verify firewall rules allow port 5432
4. Check PostgreSQL logs: journalctl -u postgresql (or docker logs whoosh-postgres)
```
**Issue: Gitea API connection failed**
```
Error: connection test failed: API request failed with status 401
Solution:
1. Verify Gitea token is correct
2. Check token has required permissions (read:repository, write:issue)
3. Verify Gitea base URL is accessible
4. Test manually: curl -H "Authorization: token YOUR_TOKEN" http://ironwood:3000/api/v1/user
```
**Issue: Docker Swarm deployment failed**
```
Error: failed to deploy agent service: Error response from daemon: This node is not a swarm manager
Solution:
1. Initialize Docker Swarm: docker swarm init
2. Or join existing swarm: docker swarm join --token TOKEN MANAGER_IP:2377
3. Verify swarm status: docker info | grep Swarm
```
**Issue: Migrations not running**
```
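Error: Database migration failed: Dirty database version 5
Solution:
1. Check migration status: migrate -database "..." -path ./migrations version
2. Inspect the schema to see whether migration 5 applied fully, partially, or not at all, and repair any partial changes by hand
3. Force the version that matches the actual schema state (4 if migration 5 did not apply, 5 if it did): migrate -database "..." -path ./migrations force 4
4. Re-run migrations: migrate -database "..." -path ./migrations up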
```
### Performance Tuning
**Database Connection Pool:**
```bash
# Increase for high concurrency
WHOOSH_DATABASE_MAX_OPEN_CONNS=50
WHOOSH_DATABASE_MAX_IDLE_CONNS=10
```
**HTTP Server Timeouts:**
```bash
# Increase for long-running operations
WHOOSH_SERVER_READ_TIMEOUT=60s
WHOOSH_SERVER_WRITE_TIMEOUT=60s
```
**Rate Limiting:**
```go
// Adjust in server initialization
rateLimiter := auth.NewRateLimiter(200, time.Minute) // 200 req/min
```
**Composer Analysis Timeout:**
```bash
# Reduce for faster failover to heuristics
WHOOSH_COMPOSER_ANALYSIS_TIMEOUT_SECS=30
```
### Contributing
**Code Style:**
- Follow standard Go conventions
- Run `go fmt` before committing
- Use `go vet` to check for issues
- Add comments for exported functions
- Write tests for new features

**Git Workflow:**
```bash
# 1. Create feature branch
git checkout -b feature/my-feature
# 2. Make changes and commit
git add .
git commit -m "Add feature: description"
# 3. Push to Gitea
git push origin feature/my-feature
# 4. Create pull request via Gitea UI
# 5. Address review comments
# 6. Merge when approved
```
**Database Migrations:**
```bash
# Create new migration (-digits 3 matches the 3-digit sequence numbers below; golang-migrate defaults to 6)
migrate create -ext sql -dir migrations -seq -digits 3 add_new_table
# Edit up and down files
# migrations/008_add_new_table.up.sql
# migrations/008_add_new_table.down.sql
# Test migration: apply it, then roll back only the last step
migrate -database "postgres://..." -path ./migrations up
migrate -database "postgres://..." -path ./migrations down 1
```
---
## References
- **CHORUS Project**: Autonomous AI agent system (depends on WHOOSH for orchestration)
- **BACKBEAT**: Cluster-wide timing and coordination system
- **BZZZ**: Distributed task system integration
- **SLURP**: UCXL content address system
- **BUBBLE**: Decision tracking and policy management

**Related Documentation:**
- `/home/tony/chorus/CLAUDE.md` - Project instructions
- `/home/tony/chorus/GEMINI.md` - Cluster context
- `/home/tony/chorus/project-queues/active/WHOOSH/README.md` - Quick start
- `/home/tony/chorus/project-queues/active/WHOOSH/docs/progress/WHOOSH-roadmap.md` - Development roadmap
- `/home/tony/chorus/project-queues/active/WHOOSH/DEVELOPMENT_PLAN.md` - Implementation plan

**External Resources:**
- Docker Swarm Documentation: https://docs.docker.com/engine/swarm/
- PostgreSQL Documentation: https://www.postgresql.org/docs/
- Go Documentation: https://go.dev/doc/
- OpenTelemetry Go: https://opentelemetry.io/docs/instrumentation/go/
---
**Document Version**: 1.0
**Generated**: October 2025
**Maintained by**: WHOOSH Development Team