	Package: coordinator
Location: /home/tony/chorus/project-queues/active/CHORUS/coordinator/
Overview
The coordinator package provides TaskCoordinator, the main orchestrator for distributed task management in CHORUS. It handles task discovery, capability-based assignment, execution coordination, and real-time progress tracking across multiple repositories and agents. The coordinator integrates with the PubSub system for role-based collaboration and uses AI-powered execution engines to complete tasks autonomously.
Core Components
TaskCoordinator
The central orchestrator managing task lifecycle across the distributed CHORUS network.
type TaskCoordinator struct {
    pubsub     *pubsub.PubSub
    hlog       *logging.HypercoreLog
    ctx        context.Context
    config     *config.Config
    hmmmRouter *hmmm.Router
    // Repository management
    providers    map[int]repository.TaskProvider // projectID -> provider
    providerLock sync.RWMutex
    factory      repository.ProviderFactory
    // Task management
    activeTasks map[string]*ActiveTask // taskKey -> active task
    taskLock    sync.RWMutex
    taskMatcher repository.TaskMatcher
    taskTracker TaskProgressTracker
    // Task execution
    executionEngine execution.TaskExecutionEngine
    // Agent tracking
    nodeID    string
    agentInfo *repository.AgentInfo
    // Sync settings
    syncInterval time.Duration
    lastSync     map[int]time.Time
    syncLock     sync.RWMutex
}
Key Responsibilities:
- Discover available tasks across multiple repositories
- Score and assign tasks based on agent capabilities and expertise
- Coordinate task execution with AI-powered execution engines
- Track active tasks and broadcast progress updates
- Request and coordinate multi-agent collaboration
- Integrate with HMMM for meta-discussion and coordination
ActiveTask
Represents a task currently being worked on by an agent.
type ActiveTask struct {
    Task      *repository.Task
    Provider  repository.TaskProvider
    ProjectID int
    ClaimedAt time.Time
    Status    string // claimed, working, completed, failed
    AgentID   string
    Results   map[string]interface{}
}
Task Lifecycle States:
- claimed - Task has been claimed by an agent
- working - Agent is actively executing the task
- completed - Task finished successfully
- failed - Task execution failed
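The four states suggest a one-way progression. The document does not list the permitted transitions, so the table in this sketch (claimed to working or failed, working to completed or failed) is an assumption. `taskTransitions` and `canTransition` are hypothetical names, not part of the coordinator package.

```go
package main

import "fmt"

// Hypothetical transition table for ActiveTask.Status values. The docs list
// the states but not the allowed moves, so these edges are assumed.
var taskTransitions = map[string][]string{
	"claimed": {"working", "failed"},
	"working": {"completed", "failed"},
	// "completed" and "failed" are terminal: no outgoing edges.
}

// canTransition reports whether a task may move from one status to another.
func canTransition(from, to string) bool {
	for _, next := range taskTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("claimed", "working"))   // true
	fmt.Println(canTransition("completed", "working")) // false
}
```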
TaskProgressTracker Interface
Callback interface for tracking task progress and updating availability broadcasts.
type TaskProgressTracker interface {
    AddTask(taskID string)
    RemoveTask(taskID string)
}
This interface ensures availability broadcasts accurately reflect current workload.
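A minimal sketch of a TaskProgressTracker implementation, assuming the tracker only needs to know which task keys are active. `inMemoryTracker`, `newInMemoryTracker`, and `Count` are hypothetical. The tracker CHORUS actually wires in may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// TaskProgressTracker mirrors the interface documented above.
type TaskProgressTracker interface {
	AddTask(taskID string)
	RemoveTask(taskID string)
}

// inMemoryTracker keeps the set of active task IDs so an availability
// broadcaster can report an accurate current_tasks count.
type inMemoryTracker struct {
	mu    sync.Mutex
	tasks map[string]struct{}
}

func newInMemoryTracker() *inMemoryTracker {
	return &inMemoryTracker{tasks: make(map[string]struct{})}
}

func (t *inMemoryTracker) AddTask(taskID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.tasks[taskID] = struct{}{} // adding the same ID twice does not double-count
}

func (t *inMemoryTracker) RemoveTask(taskID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.tasks, taskID) // deleting a missing ID is a no-op
}

// Count returns the number of tasks currently tracked.
func (t *inMemoryTracker) Count() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.tasks)
}

// Compile-time check that inMemoryTracker satisfies the interface.
var _ TaskProgressTracker = (*inMemoryTracker)(nil)

func main() {
	tr := newInMemoryTracker()
	tr.AddTask("myrepo:42")
	tr.AddTask("myrepo:43")
	tr.RemoveTask("myrepo:42")
	fmt.Println(tr.Count()) // 1
}
```

Storing keys in a set rather than incrementing a counter makes AddTask idempotent. A duplicate claim notification therefore cannot inflate current_tasks.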
Task Coordination Flow
1. Initialization
coordinator := NewTaskCoordinator(
    ctx,
    ps,           // PubSub instance
    hlog,         // Hypercore log
    cfg,          // Agent configuration
    nodeID,       // P2P node ID
    hmmmRouter,   // HMMM router for meta-discussion
    tracker,      // Task progress tracker
)
coordinator.Start()
Initialization Process:
- Creates agent info from configuration
- Sets up task execution engine with AI providers
- Announces agent role and capabilities via PubSub
- Starts task discovery loop
- Begins listening for role-based messages
2. Task Discovery and Assignment
Discovery Loop: taskDiscoveryLoop() runs every 30 seconds, but task discovery itself is now handled by the WHOOSH integration.
Task Evaluation (shouldProcessTask):
func (tc *TaskCoordinator) shouldProcessTask(task *repository.Task) bool {
    // 1. Check capacity: currentTasks < maxTasks
    // 2. Check if already assigned to this agent
    // 3. Score task fit for agent capabilities
    // 4. Return true if score > 0.5 threshold
}
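The outline above can be made concrete as follows. This is a sketch under stated assumptions. `agentState` is a hypothetical stand-in for the agent's bookkeeping. "Already assigned to this agent" is read as "already in the agent's active task set". The fit score is passed in rather than computed by the real TaskMatcher.

```go
package main

import "fmt"

// scoreThreshold matches the documented 0.5 cut-off.
const scoreThreshold = 0.5

// agentState is a hypothetical view of the agent's capacity and held tasks.
type agentState struct {
	CurrentTasks int
	MaxTasks     int
	Active       map[string]bool // task keys this agent already holds
}

// shouldProcessTask follows the four documented steps.
func shouldProcessTask(a agentState, taskKey string, fitScore float64) bool {
	// 1. Capacity: the agent must have a free slot.
	if a.CurrentTasks >= a.MaxTasks {
		return false
	}
	// 2. Skip tasks this agent is already working on (nil map reads are safe).
	if a.Active[taskKey] {
		return false
	}
	// 3-4. Accept only scores strictly above the threshold.
	return fitScore > scoreThreshold
}

func main() {
	a := agentState{CurrentTasks: 1, MaxTasks: 3, Active: map[string]bool{"myrepo:7": true}}
	fmt.Println(shouldProcessTask(a, "myrepo:42", 0.76)) // true
	fmt.Println(shouldProcessTask(a, "myrepo:7", 0.9))   // false: already held
}
```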
Task Scoring:
- Agent role matches required role
- Agent expertise matches required expertise
- Current workload vs capacity
- Task priority level
- Historical performance scores
3. Task Claiming and Processing
processTask() flow:
  1. Evaluate whether collaboration is needed (shouldRequestCollaboration)
  2. Request collaboration via PubSub if needed
  3. Claim task through repository provider
  4. Create ActiveTask and store in activeTasks map
  5. Log claim to Hypercore
  6. Announce claim via PubSub (TaskProgress message)
  7. Seed a HMMM meta-discussion room for the task
  8. Start execution in background goroutine
Collaboration Request Criteria:
- Task priority >= 8 (high priority)
- Task requires expertise the agent doesn't have
- Complex multi-component tasks
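The first two collaboration criteria translate directly into code. This hypothetical sketch makes three choices of its own:

- Expertise is matched by exact string comparison.
- The "complex multi-component" criterion is omitted, because the document does not define complexity.
- The `high_priority` reason string is invented for illustration. Only `expertise_gap` appears in the documented payload.

```go
package main

import "fmt"

// missingExpertise returns the required skills the agent lacks
// (exact, case-sensitive matching is an assumption of this sketch).
func missingExpertise(required, have []string) []string {
	haveSet := make(map[string]bool, len(have))
	for _, h := range have {
		haveSet[h] = true
	}
	var missing []string
	for _, r := range required {
		if !haveSet[r] {
			missing = append(missing, r)
		}
	}
	return missing
}

// shouldRequestCollaboration applies two documented criteria: an expertise
// gap, or priority >= 8. It returns the reason for the request.
func shouldRequestCollaboration(priority int, required, have []string) (bool, string) {
	if gap := missingExpertise(required, have); len(gap) > 0 {
		return true, "expertise_gap"
	}
	if priority >= 8 {
		return true, "high_priority" // hypothetical reason string
	}
	return false, ""
}

func main() {
	ok, reason := shouldRequestCollaboration(8,
		[]string{"PostgreSQL", "Query Optimization"},
		[]string{"Go", "PostgreSQL"})
	fmt.Println(ok, reason) // true expertise_gap
}
```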
4. Task Execution
AI-Powered Execution (executeTaskWithAI):
executionRequest := &execution.TaskExecutionRequest{
    ID:          fmt.Sprintf("%s:%d", repoName, taskNumber), // e.g. "myrepo:42"
    Type:        determineTaskType(task), // bug_fix, feature_development, etc.
    Description: buildTaskDescription(task),
    Context:     buildTaskContext(task),
    Requirements: &execution.TaskRequirements{
        AIModel:        "", // Auto-selected based on role
        SandboxType:    "docker",
        RequiredTools:  []string{"git", "curl"},
        EnvironmentVars: map[string]string{
            "TASK_ID":    taskID,
            "REPOSITORY": repoName,
            "AGENT_ID":   agentID,
            "AGENT_ROLE": agentRole,
        },
    },
    Timeout: 10 * time.Minute,
}
result, err := tc.executionEngine.ExecuteTask(ctx, executionRequest)
Task Type Detection:
- bug_fix - Keywords: "bug", "fix"
- feature_development - Keywords: "feature", "implement"
- testing - Keywords: "test"
- documentation - Keywords: "doc", "documentation"
- refactoring - Keywords: "refactor"
- code_review - Keywords: "review"
- development - Default for general tasks
Fallback Mock Execution: If the AI execution engine is unavailable or fails, the coordinator falls back to mock execution with simulated work time.
5. Task Completion
executeTask() completion flow:
  1. Update ActiveTask status to "completed"
  2. Complete task through repository provider
  3. Remove from activeTasks map
  4. Update TaskProgressTracker
  5. Log completion to Hypercore
  6. Announce completion via PubSub
Task Result Structure:
type TaskResult struct {
    Success  bool
    Message  string
    Metadata map[string]interface{} // Includes:
                                     // - execution_type (ai_powered/mock)
                                     // - duration
                                     // - commands_executed
                                     // - files_generated
                                     // - resource_usage
                                     // - artifacts
}
PubSub Integration
Published Message Types
1. RoleAnnouncement
Topic: hmmm/meta-discussion/v1
Frequency: Once on startup and whenever capabilities change
{
  "type": "role_announcement",
  "from": "peer_id",
  "from_role": "Senior Backend Developer",
  "data": {
    "agent_id": "agent-001",
    "node_id": "Qm...",
    "role": "Senior Backend Developer",
    "expertise": ["Go", "PostgreSQL", "Kubernetes"],
    "capabilities": ["code", "test", "deploy"],
    "max_tasks": 3,
    "current_tasks": 0,
    "status": "ready",
    "specialization": "microservices"
  }
}
2. TaskProgress
Topic: CHORUS/coordination/v1
Frequency: On claim, start, completion
Task Claim:
{
  "type": "task_progress",
  "from": "peer_id",
  "from_role": "Senior Backend Developer",
  "thread_id": "task-myrepo-42",
  "data": {
    "task_number": 42,
    "repository": "myrepo",
    "title": "Add authentication endpoint",
    "agent_id": "agent-001",
    "agent_role": "Senior Backend Developer",
    "claim_time": "2025-09-30T10:00:00Z",
    "estimated_completion": "2025-09-30T11:00:00Z"
  }
}
Task Status Update (status is either "started" or "completed"):
{
  "type": "task_progress",
  "from": "peer_id",
  "from_role": "Senior Backend Developer",
  "thread_id": "task-myrepo-42",
  "data": {
    "task_number": 42,
    "repository": "myrepo",
    "agent_id": "agent-001",
    "agent_role": "Senior Backend Developer",
    "status": "started",
    "timestamp": "2025-09-30T10:05:00Z"
  }
}
3. TaskHelpRequest
Topic: hmmm/meta-discussion/v1
Frequency: When collaboration is needed
{
  "type": "task_help_request",
  "from": "peer_id",
  "from_role": "Senior Backend Developer",
  "to_roles": ["Database Specialist"],
  "required_expertise": ["PostgreSQL", "Query Optimization"],
  "priority": "high",
  "thread_id": "task-myrepo-42",
  "data": {
    "task_number": 42,
    "repository": "myrepo",
    "title": "Optimize database queries",
    "required_role": "Database Specialist",
    "required_expertise": ["PostgreSQL", "Query Optimization"],
    "priority": 8,
    "requester_role": "Senior Backend Developer",
    "reason": "expertise_gap"
  }
}
Received Message Types
1. TaskHelpRequest
Handler: handleTaskHelpRequest
Response Logic:
- Check whether the agent has the required expertise
- Verify the agent has available capacity (currentTasks < maxTasks)
- If the agent can help, send a TaskHelpResponse
- Reflect the offer into the task's HMMM per-issue room
Response Message:
{
  "type": "task_help_response",
  "from": "peer_id",
  "from_role": "Database Specialist",
  "thread_id": "task-myrepo-42",
  "data": {
    "agent_id": "agent-002",
    "agent_role": "Database Specialist",
    "expertise": ["PostgreSQL", "Query Optimization", "Indexing"],
    "availability": 2,
    "offer_type": "collaboration",
    "response_to": { /* original help request data */ }
  }
}
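The response logic can be sketched as a single predicate that also yields the availability figure. The document does not say whether the agent needs any or all of the requested skills. This hypothetical `canHelp` accepts any overlap. `agent` and `helpRequest` are minimal stand-ins for the real types.

```go
package main

import "fmt"

// Hypothetical stand-ins holding only the fields the response logic reads.
type helpRequest struct {
	RequiredExpertise []string
}

type agent struct {
	Expertise    []string
	CurrentTasks int
	MaxTasks     int
}

// canHelp reports whether the agent should answer, plus its free slots
// (reported as "availability" in the TaskHelpResponse).
func canHelp(a agent, req helpRequest) (bool, int) {
	free := a.MaxTasks - a.CurrentTasks
	if free <= 0 {
		return false, 0 // no spare capacity
	}
	have := make(map[string]bool, len(a.Expertise))
	for _, e := range a.Expertise {
		have[e] = true
	}
	for _, r := range req.RequiredExpertise {
		if have[r] {
			return true, free // at least one requested skill matches
		}
	}
	return false, 0
}

func main() {
	specialist := agent{Expertise: []string{"PostgreSQL", "Query Optimization", "Indexing"}, CurrentTasks: 1, MaxTasks: 3}
	req := helpRequest{RequiredExpertise: []string{"PostgreSQL", "Query Optimization"}}
	ok, slots := canHelp(specialist, req)
	fmt.Println(ok, slots) // true 2
}
```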
2. ExpertiseRequest
Handler: handleExpertiseRequest
Processes requests for specific expertise areas.
3. CoordinationRequest
Handler: handleCoordinationRequest
Handles coordination requests for multi-agent tasks.
4. RoleAnnouncement
Handler: handleRoleAnnouncement
Logs when other agents announce their roles and capabilities.
HMMM Integration
Per-Issue Room Seeding
When a task is claimed, the coordinator seeds a HMMM meta-discussion room:
seedMsg := hmmm.Message{
    Version:   1,
    Type:      "meta_msg",
    IssueID:   int64(taskNumber),
    ThreadID:  fmt.Sprintf("issue-%d", taskNumber),
    MsgID:     uuid.New().String(),
    NodeID:    nodeID,
    HopCount:  0,
    Timestamp: time.Now().UTC(),
    Message:   "Seed: Task 'title' claimed. Description: ...",
}
hmmmRouter.Publish(ctx, seedMsg)
Purpose:
- Creates dedicated discussion space for task
- Enables agents to coordinate on specific tasks
- Integrates with broader meta-coordination system
- Provides context for SLURP event generation
Help Offer Reflection
When agents offer help, the offer is reflected into the HMMM room:
hmsg := hmmm.Message{
    Version:   1,
    Type:      "meta_msg",
    IssueID:   issueID,
    ThreadID:  fmt.Sprintf("issue-%d", issueID),
    MsgID:     uuid.New().String(),
    NodeID:    nodeID,
    HopCount:  0,
    Timestamp: time.Now().UTC(),
    Message:   fmt.Sprintf("Help offer from %s (availability %d)",
                          agentRole, availableSlots),
}
Availability Tracking
The coordinator tracks task progress to keep availability broadcasts accurate:
// When task is claimed:
if tc.taskTracker != nil {
    tc.taskTracker.AddTask(taskKey)
}
// When task completes:
if tc.taskTracker != nil {
    tc.taskTracker.RemoveTask(taskKey)
}
This ensures the availability broadcaster (in internal/runtime) has accurate real-time data:
{
  "type": "availability_broadcast",
  "data": {
    "node_id": "Qm...",
    "available_for_work": true,
    "current_tasks": 1,
    "max_tasks": 3,
    "last_activity": 1727692800,
    "status": "working",
    "timestamp": 1727692800
  }
}
Task Assignment Algorithm
Scoring System
The TaskMatcher scores tasks for agents based on multiple factors:
Score = (roleMatch * 0.4) +
        (expertiseMatch * 0.3) +
        (availabilityScore * 0.2) +
        (performanceScore * 0.1)
Where:
- roleMatch: 1.0 if agent role matches required role, 0.5 for partial match
- expertiseMatch: fraction (0.0-1.0) of the required expertise the agent possesses
- availabilityScore: (maxTasks - currentTasks) / maxTasks
- performanceScore: agent's historical performance metric (0.0-1.0)
Threshold: Tasks with score > 0.5 are considered for assignment.
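The weighted formula can be checked with a small worked example. The helper names are hypothetical. Two edge cases are assumptions: an empty expertise list counts as a full match, and availability is clamped at zero when an agent is at or over capacity.

```go
package main

import "fmt"

// Weights from the documented formula; they sum to 1.0, so scores lie in [0, 1].
const (
	wRole         = 0.4
	wExpertise    = 0.3
	wAvailability = 0.2
	wPerformance  = 0.1
	threshold     = 0.5
)

// expertiseMatch is the fraction of required skills the agent has.
func expertiseMatch(required, have []string) float64 {
	if len(required) == 0 {
		return 1.0 // assumption: nothing required counts as a full match
	}
	set := make(map[string]bool, len(have))
	for _, h := range have {
		set[h] = true
	}
	n := 0
	for _, r := range required {
		if set[r] {
			n++
		}
	}
	return float64(n) / float64(len(required))
}

// availabilityScore is (maxTasks - currentTasks) / maxTasks, clamped at 0.
func availabilityScore(current, maxTasks int) float64 {
	if maxTasks <= 0 || current >= maxTasks {
		return 0
	}
	return float64(maxTasks-current) / float64(maxTasks)
}

// taskScore combines the four components with the documented weights.
func taskScore(roleMatch, expertise, availability, performance float64) float64 {
	return roleMatch*wRole + expertise*wExpertise + availability*wAvailability + performance*wPerformance
}

func main() {
	exp := expertiseMatch([]string{"PostgreSQL", "Query Optimization"}, []string{"Go", "PostgreSQL"})
	s := taskScore(1.0, exp, availabilityScore(1, 3), 0.8)
	fmt.Printf("score=%.3f eligible=%v\n", s, s > threshold) // score=0.763 eligible=true
}
```

The example uses these inputs:

- a full role match;
- one of two required skills;
- one of three slots in use;
- a performance score of 0.8.

The total is 0.4 + 0.15 + 0.133 + 0.08 ≈ 0.763, comfortably above the 0.5 threshold. Because the weights sum to 1.0, a partial role match (0.5) with no other signal scores only 0.2.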
Assignment Priority
Tasks are prioritized by:
- Priority Level (task.Priority field, 0-10)
- Task Score (calculated by matcher)
- Age (older tasks first)
- Dependencies (tasks blocking others)
Claim Race Condition Handling
Multiple agents may attempt to claim the same task:
1. Agent A evaluates task: score = 0.8, attempts claim
2. Agent B evaluates task: score = 0.7, attempts claim
3. Repository provider uses atomic claim operation
4. First successful claim wins
5. Other agents receive a claim failure
6. Agents whose claim failed move on to the next task
Error Handling
Task Execution Failures
// On AI execution failure:
taskResult, err := tc.executeTaskWithAI(activeTask)
if err != nil {
    // Fall back to mock execution
    taskResult = tc.executeMockTask(activeTask)
}
// On completion failure:
if err := provider.CompleteTask(task, result); err != nil {
    // Update status to failed
    activeTask.Status = "failed"
    activeTask.Results = map[string]interface{}{
        "error": err.Error(),
    }
}
Collaboration Request Failures
err := tc.pubsub.PublishRoleBasedMessage(
    pubsub.TaskHelpRequest, data, opts)
if err != nil {
    // Log error but continue with task
    fmt.Printf("⚠️ Failed to request collaboration: %v\n", err)
    // Task execution proceeds without collaboration
}
HMMM Seeding Failures
if err := tc.hmmmRouter.Publish(ctx, seedMsg); err != nil {
    // Log error to Hypercore
    tc.hlog.AppendString("system_error", map[string]interface{}{
        "error":       "hmmm_seed_failed",
        "task_number": taskNumber,
        "repository":  repository,
        "message":     err.Error(),
    })
    // Task execution continues without HMMM room
}
Agent Configuration
Required Configuration
agent:
  id: "agent-001"
  role: "Senior Backend Developer"
  expertise:
    - "Go"
    - "PostgreSQL"
    - "Docker"
    - "Kubernetes"
  capabilities:
    - "code"
    - "test"
    - "deploy"
  max_tasks: 3
  specialization: "microservices"
  models:
    - name: "llama3.1:70b"
      provider: "ollama"
      endpoint: "http://192.168.1.72:11434"
AgentInfo Structure
type AgentInfo struct {
    ID           string
    Role         string
    Expertise    []string
    CurrentTasks int
    MaxTasks     int
    Status       string // ready, working, busy, offline
    LastSeen     time.Time
    Performance  map[string]interface{} // score: 0.8
    Availability string // available, busy, offline
}
Hypercore Logging
All coordination events are logged to Hypercore:
Task Claimed
hlog.Append(logging.TaskClaimed, map[string]interface{}{
    "task_number":   taskNumber,
    "repository":    repository,
    "title":         title,
    "required_role": requiredRole,
    "priority":      priority,
})
Task Completed
hlog.Append(logging.TaskCompleted, map[string]interface{}{
    "task_number": taskNumber,
    "repository":  repository,
    "duration":    durationSeconds,
    "results":     resultsMap,
})
Status Reporting
Coordinator Status
status := coordinator.GetStatus()
// Returns:
{
    "agent_id":         "agent-001",
    "role":             "Senior Backend Developer",
    "expertise":        ["Go", "PostgreSQL", "Docker"],
    "current_tasks":    1,
    "max_tasks":        3,
    "active_providers": 2,
    "status":           "working",
    "active_tasks": [
        {
            "repository": "myrepo",
            "number":     42,
            "title":      "Add authentication",
            "status":     "working",
            "claimed_at": "2025-09-30T10:00:00Z"
        }
    ]
}
Best Practices
Task Coordinator Usage
- Initialize Early: Create coordinator during agent startup
- Set Task Tracker: Always provide TaskProgressTracker for accurate availability
- Configure HMMM: Wire up hmmmRouter for meta-discussion integration
- Monitor Status: Periodically check GetStatus() for health monitoring
- Handle Failures: Implement proper error handling for degraded operation
Configuration Tuning
- Max Tasks: Set based on agent resources (CPU, memory, AI model capacity)
- Sync Interval: Balance between responsiveness and network overhead (default: 30s)
- Task Scoring: Adjust threshold (default: 0.5) based on task availability
- Collaboration: Enable for high-priority or expertise-gap tasks
Performance Optimization
- Task Discovery: Delegate to WHOOSH for efficient search and indexing
- Concurrent Execution: Use goroutines for parallel task execution
- Lock Granularity: Minimize lock contention with separate locks for providers/tasks
- Caching: Cache agent info and provider connections
Integration Points
With PubSub
- Publishes: RoleAnnouncement, TaskProgress, TaskHelpRequest, TaskHelpResponse
- Subscribes: TaskHelpRequest, ExpertiseRequest, CoordinationRequest, RoleAnnouncement
- Topics: CHORUS/coordination/v1, hmmm/meta-discussion/v1
With HMMM
- Seeds per-issue discussion rooms
- Reflects help offers into rooms
- Enables agent coordination on specific tasks
With Repository Providers
- Claims tasks atomically
- Fetches task details
- Updates task status
- Completes tasks with results
With Execution Engine
- Converts repository tasks to execution requests
- Executes tasks with AI providers
- Handles sandbox environments
- Collects execution metrics and artifacts
With Hypercore
- Logs task claims
- Logs task completions
- Logs coordination errors
- Provides audit trail
Task Message Format
PubSub Task Messages
All task-related messages follow the standard PubSub Message format:
type Message struct {
    Type              MessageType            // e.g., "task_progress"
    From              string                 // Peer ID
    Timestamp         time.Time
    Data              map[string]interface{} // Message payload
    HopCount          int
    FromRole          string                 // Agent role
    ToRoles           []string               // Target roles
    RequiredExpertise []string               // Required expertise
    ProjectID         string
    Priority          string                 // low, medium, high, urgent
    ThreadID          string                 // Conversation thread
}
Task Assignment Message Flow
1. TaskAnnouncement (WHOOSH → PubSub)
   ├─ Available task discovered
   └─ Broadcast to coordination topic
2. Task Evaluation (Local)
   ├─ Score task for agent
   └─ Decide whether to claim
3. TaskClaim (Agent → Repository)
   ├─ Atomic claim operation
   └─ Only one agent succeeds
4. TaskProgress (Agent → PubSub)
   ├─ Announce claim to network
   └─ Status: "claimed"
5. TaskHelpRequest (Optional, Agent → PubSub)
   ├─ Request collaboration if needed
   └─ Target specific roles/expertise
6. TaskHelpResponse (Other Agents → PubSub)
   ├─ Offer assistance
   └─ Include availability info
7. TaskProgress (Agent → PubSub)
   ├─ Announce work started
   └─ Status: "started"
8. Task Execution (Local with AI Engine)
   ├─ Execute task in sandbox
   └─ Generate artifacts
9. TaskProgress (Agent → PubSub)
   ├─ Announce completion
   └─ Status: "completed"
See Also
- discovery/ - mDNS peer discovery for local network
- pkg/coordination/ - Coordination primitives and dependency detection
- pubsub/ - PubSub messaging system
- pkg/execution/ - Task execution engine
- pkg/hmmm/ - Meta-discussion and coordination
- internal/runtime - Agent runtime and availability broadcasting