CHORUS Task Execution Engine Development Plan

Overview

This plan outlines the development of a comprehensive task execution engine for CHORUS agents, replacing the current mock implementation with a fully functional system that can execute real work according to agent roles and specializations.

Current State Analysis

What's Implemented

  • Task Coordinator Framework (coordinator/task_coordinator.go): Full task management lifecycle with role-based assignment, collaboration requests, and HMMM integration
  • Agent Role System: Role announcements, capability broadcasting, and expertise matching
  • P2P Infrastructure: Nodes can discover each other and communicate via pubsub
  • Health Monitoring: Comprehensive health checks and graceful shutdown

Critical Gaps Identified

  • Task Execution Engine: executeTask() only has a 10-second sleep simulation - no actual work performed
  • Repository Integration: Mock providers only - no real GitHub/GitLab task pulling
  • Agent-to-Task Binding: Task discovery relies on WHOOSH but agents don't connect to real work
  • Role-Based Execution: Agents announce roles but don't execute tasks according to their specialization
  • AI Integration: No LLM/reasoning integration for task completion

Architecture Requirements

Model and Provider Abstraction

The execution engine must support multiple AI model providers and execution environments:

Model Provider Types:

  • Local Ollama: Default for most roles (llama3.1:8b, codellama, etc.)
  • OpenAI API: For specialized models (gpt-5, gpt-4o, etc.)
  • ResetData API: For testing and fallback (llama3.1:8b via LaaS)
  • Custom Endpoints: Support for other provider APIs

Role-Model Mapping:

  • Each role has a default model configuration
  • Specialized roles may require specific models or providers
  • Model selection is transparent to the execution logic
  • MCP calls and tool usage are supported regardless of provider

Execution Environment Abstraction

Tasks must execute in secure, isolated environments while maintaining transparency:

Sandbox Types:

  • Docker Containers: Isolated execution environment per task
  • Specialized VMs: For tasks requiring full OS isolation
  • Process Sandboxing: Lightweight isolation for simple tasks

Transparency Requirements:

  • The model perceives it is working on a local repository
  • Development tools are available within the sandbox
  • File system operations work normally from the model's perspective
  • Network access is controlled but transparent
  • Resource limits are enforced but invisible to the model

Development Plan

Phase 1: Model Provider Abstraction Layer

1.1 Create Provider Interface

// pkg/ai/provider.go
type ModelProvider interface {
    ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
    SupportsMCP() bool
    SupportsTools() bool
    GetCapabilities() []string
}
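
The interface above references request and response types that are not yet defined. A minimal sketch, assuming fields such as role, prompt, and repository context; all names below are illustrative, not final:

// pkg/ai/provider.go (sketch; field names are illustrative assumptions)
type TaskRequest struct {
    Role       string            // agent role requesting the work, e.g. "developer"
    Prompt     string            // rendered task description passed to the model
    Repository string            // repository the task belongs to
    Context    map[string]string // extra context such as file excerpts or labels
    Tools      []string          // tool/MCP capabilities the provider may expose
}

type TaskResponse struct {
    Output    string   // model output (patch, review, analysis, ...)
    Artifacts []string // paths to files produced inside the sandbox
    Tokens    int      // token usage for cost and metrics tracking
}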

1.2 Implement Provider Types

  • OllamaProvider: Local model execution
  • OpenAIProvider: OpenAI API integration
  • ResetDataProvider: ResetData LaaS integration
  • ProviderFactory: Creates appropriate provider based on model config
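
A minimal factory sketch, assuming provider identifiers match the configuration values ("ollama", "openai", "resetdata"); the ModelConfig struct and constructor signatures are illustrative assumptions:

// pkg/ai/factory.go (sketch; names and signatures are illustrative assumptions)
import "fmt"

type ModelConfig struct {
    Provider string // "ollama", "openai", or "resetdata"
    Model    string // e.g. "codellama:13b"
    Endpoint string // base URL for self-hosted or LaaS providers
    APIKey   string // credential for hosted APIs
}

func NewProvider(cfg ModelConfig) (ModelProvider, error) {
    switch cfg.Provider {
    case "ollama":
        return NewOllamaProvider(cfg.Endpoint, cfg.Model), nil
    case "openai":
        return NewOpenAIProvider(cfg.APIKey, cfg.Model), nil
    case "resetdata":
        return NewResetDataProvider(cfg.Endpoint, cfg.APIKey, cfg.Model), nil
    default:
        return nil, fmt.Errorf("unknown provider %q", cfg.Provider)
    }
}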

1.3 Role-Model Configuration

# Config structure for role-model mapping
roles:
  developer:
    default_model: "codellama:13b"
    provider: "ollama"
    fallback_model: "llama3.1:8b"
    fallback_provider: "resetdata"

  architect:
    default_model: "gpt-4o"
    provider: "openai"
    fallback_model: "llama3.1:8b"
    fallback_provider: "ollama"
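
One possible Go-side representation of this mapping, assuming the file is parsed with gopkg.in/yaml.v3; struct and function names are illustrative:

// pkg/ai/config.go (sketch)
import (
    "os"

    "gopkg.in/yaml.v3"
)

type RoleModelMapping struct {
    Roles map[string]RoleConfig `yaml:"roles"`
}

type RoleConfig struct {
    DefaultModel     string `yaml:"default_model"`
    Provider         string `yaml:"provider"`
    FallbackModel    string `yaml:"fallback_model"`
    FallbackProvider string `yaml:"fallback_provider"`
}

// LoadRoleModelMapping reads the role-model configuration from disk.
func LoadRoleModelMapping(path string) (*RoleModelMapping, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    var m RoleModelMapping
    if err := yaml.Unmarshal(data, &m); err != nil {
        return nil, err
    }
    return &m, nil
}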

Phase 2: Execution Environment Abstraction

2.1 Create Sandbox Interface

// pkg/execution/sandbox.go
type ExecutionSandbox interface {
    Initialize(ctx context.Context, config *SandboxConfig) error
    ExecuteCommand(ctx context.Context, cmd *Command) (*CommandResult, error)
    CopyFiles(ctx context.Context, source, dest string) error
    Cleanup() error
}
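
The supporting types referenced by this interface are not yet specified. A minimal sketch, with field names as assumptions:

// pkg/execution/sandbox.go (sketch; field names are illustrative assumptions)
type SandboxConfig struct {
    Image     string            // container image or VM template to use
    RepoURL   string            // repository to clone into the sandbox
    WorkDir   string            // path the repository appears at inside the sandbox
    Env       map[string]string // environment variables visible to the task
    CPULimit  float64           // CPU cores
    MemoryMB  int64             // memory limit in MiB
    NetPolicy string            // e.g. "restricted" or "offline"
}

type Command struct {
    Name    string   // executable to run, e.g. "git" or "go"
    Args    []string // command arguments
    WorkDir string   // working directory inside the sandbox
}

type CommandResult struct {
    ExitCode int
    Stdout   string
    Stderr   string
}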

2.2 Implement Sandbox Types

  • DockerSandbox: Container-based isolation
  • VMSandbox: Full VM isolation for sensitive tasks
  • ProcessSandbox: Lightweight process-based isolation

2.3 Repository Mounting

  • Clone repository into sandbox environment
  • Mount as local filesystem from model's perspective
  • Implement secure file I/O operations
  • Handle git operations within sandbox
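
A sketch of repository preparation built on the sandbox interface above; the helper name and the shallow-clone choice are assumptions, and credential handling is omitted:

// pkg/execution/repo.go (sketch)
import (
    "context"
    "fmt"
)

// PrepareRepository clones the task's repository inside the sandbox so the
// model sees a normal local working copy at cfg.WorkDir.
func PrepareRepository(ctx context.Context, sb ExecutionSandbox, cfg *SandboxConfig) error {
    clone := &Command{
        Name: "git",
        Args: []string{"clone", "--depth", "1", cfg.RepoURL, cfg.WorkDir},
    }
    res, err := sb.ExecuteCommand(ctx, clone)
    if err != nil {
        return err
    }
    if res.ExitCode != 0 {
        return fmt.Errorf("git clone failed: %s", res.Stderr)
    }
    return nil
}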

Phase 3: Core Task Execution Engine

3.1 Replace Mock Implementation

Replace the current simulation in coordinator/task_coordinator.go:314:

// Current mock implementation
time.Sleep(10 * time.Second) // Simulate work

// New implementation
result, err := tc.executionEngine.ExecuteTask(ctx, &TaskExecutionRequest{
    Task: activeTask.Task,
    Agent: tc.agentInfo,
    Sandbox: sandboxConfig,
    ModelProvider: providerConfig,
})
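
TaskExecutionRequest does not exist yet; a minimal sketch matching the fields used above, with types drawn loosely from the existing coordinator and the Phase 1-2 abstractions (all assumptions):

// pkg/engine/executor.go (sketch; field types are illustrative assumptions)
type TaskExecutionRequest struct {
    Task          *repository.Task         // the claimed task to work on
    Agent         *AgentInfo               // role, capabilities, and identity of the executing agent
    Sandbox       *execution.SandboxConfig // isolation settings for this task
    ModelProvider ai.ModelConfig           // which model/provider to use for the role
}

type TaskExecutionResult struct {
    Success   bool
    Summary   string   // human-readable description of what was done
    Artifacts []string // patches, logs, or docs produced during execution
}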

3.2 Task Execution Strategies

Create role-specific execution patterns:

  • DeveloperStrategy: Code implementation, bug fixes, feature development
  • ReviewerStrategy: Code review, quality analysis, test coverage assessment
  • ArchitectStrategy: System design, technical decision making
  • TesterStrategy: Test creation, validation, quality assurance
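
One possible shape for these strategies is a common interface that the execution engine selects by role; the interface name and method signatures below are assumptions:

// pkg/strategies/strategy.go (sketch; names and signatures are illustrative assumptions)
import "context"

type ExecutionStrategy interface {
    // Role returns the agent role this strategy serves, e.g. "developer".
    Role() string
    // Execute runs the role-specific workflow inside the prepared sandbox,
    // using the provided model provider for reasoning and generation.
    Execute(ctx context.Context, req *TaskExecutionRequest, sb execution.ExecutionSandbox, model ai.ModelProvider) (*TaskExecutionResult, error)
}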

3.3 Execution Workflow

  1. Task Analysis: Parse task requirements and complexity
  2. Environment Setup: Initialize appropriate sandbox
  3. Repository Preparation: Clone and mount repository
  4. Model Selection: Choose appropriate model/provider
  5. Task Execution: Run role-specific execution strategy
  6. Result Validation: Verify output quality and completeness
  7. Cleanup: Teardown sandbox and collect artifacts
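
As a sketch, the orchestrator could implement these seven steps roughly as follows; all helper names are assumptions:

// pkg/engine/workflow.go (sketch; helper names are illustrative assumptions)
func (e *Engine) ExecuteTask(ctx context.Context, req *TaskExecutionRequest) (*TaskExecutionResult, error) {
    plan, err := e.analyzeTask(ctx, req) // 1. task analysis
    if err != nil {
        return nil, err
    }
    sb, err := e.newSandbox(ctx, req.Sandbox) // 2. environment setup
    if err != nil {
        return nil, err
    }
    defer sb.Cleanup() // 7. cleanup always runs

    if err := execution.PrepareRepository(ctx, sb, req.Sandbox); err != nil { // 3. repository preparation
        return nil, err
    }
    model, err := ai.NewProvider(req.ModelProvider) // 4. model selection
    if err != nil {
        return nil, err
    }
    strategy := e.strategyForRole(req.Agent.Role) // 5. role-specific execution
    result, err := strategy.Execute(ctx, req, sb, model)
    if err != nil {
        return nil, err
    }
    if err := e.validateResult(ctx, plan, result); err != nil { // 6. result validation
        return nil, err
    }
    return result, nil
}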

Phase 4: Repository Provider Implementation

4.1 Real Repository Integration

Replace MockTaskProvider with actual implementations:

  • GiteaProvider: Integration with the Gitea API
  • GitHubProvider: GitHub API integration
  • GitLabProvider: GitLab API integration

4.2 Task Lifecycle Management

  • Task claiming and status updates
  • Progress reporting back to repositories
  • Artifact attachment (patches, documentation, etc.)
  • Automated PR/MR creation for completed tasks
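
The existing pkg/repository interfaces already cover part of this lifecycle; the sketch below illustrates the operations a real provider would need to expose, with method names as assumptions that would map onto each platform's REST API:

// pkg/providers/provider.go (sketch; method names are illustrative assumptions)
import "context"

type TaskProvider interface {
    // ListAvailableTasks returns open tasks matching the agent's role and labels.
    ListAvailableTasks(ctx context.Context, filter TaskFilter) ([]*Task, error)
    // ClaimTask marks a task as taken by this agent (e.g. assignment plus label update).
    ClaimTask(ctx context.Context, taskID, agentID string) error
    // ReportProgress posts a status comment or updates task metadata.
    ReportProgress(ctx context.Context, taskID string, update ProgressUpdate) error
    // SubmitResult attaches artifacts and opens a PR/MR for the completed work.
    SubmitResult(ctx context.Context, taskID string, result *TaskExecutionResult) (prURL string, err error)
}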

Phase 5: AI Integration and Tool Support

5.1 LLM Integration

  • Context-aware task analysis based on repository content
  • Code generation and problem-solving capabilities
  • Natural language processing for task descriptions
  • Multi-step reasoning for complex tasks

5.2 Tool Integration

  • MCP server connectivity within sandbox
  • Development tool access (compilers, linters, formatters)
  • Testing framework integration
  • Documentation generation tools
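
Development tools can be driven through the same sandbox command interface defined in Phase 2. A sketch of running a Go project's checks before submitting results; tool choice would depend on the repository's language:

// pkg/strategies/developer.go (sketch)
import (
    "context"
    "fmt"
)

// runGoChecks vets and tests the working copy inside the sandbox.
func runGoChecks(ctx context.Context, sb execution.ExecutionSandbox, workDir string) error {
    for _, cmd := range []*execution.Command{
        {Name: "go", Args: []string{"vet", "./..."}, WorkDir: workDir},
        {Name: "go", Args: []string{"test", "./..."}, WorkDir: workDir},
    } {
        res, err := sb.ExecuteCommand(ctx, cmd)
        if err != nil {
            return err
        }
        if res.ExitCode != 0 {
            return fmt.Errorf("%s %v failed: %s", cmd.Name, cmd.Args, res.Stderr)
        }
    }
    return nil
}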

5.3 Quality Assurance

  • Automated testing of generated code
  • Code quality metrics and analysis
  • Security vulnerability scanning
  • Performance impact assessment

Phase 6: Testing and Validation

6.1 Unit Testing

  • Provider abstraction layer testing
  • Sandbox isolation verification
  • Task execution strategy validation
  • Error handling and recovery testing

6.2 Integration Testing

  • End-to-end task execution workflows
  • Agent-to-WHOOSH communication testing
  • Multi-provider failover scenarios
  • Concurrent task execution testing

6.3 Security Testing

  • Sandbox escape prevention
  • Resource limit enforcement
  • Network isolation validation
  • Secrets and credential protection

Phase 7: Production Deployment

7.1 Configuration Management

  • Environment-specific model configurations
  • Sandbox resource limit definitions
  • Provider API key management
  • Monitoring and logging setup

7.2 Monitoring and Observability

  • Task execution metrics and dashboards
  • Performance monitoring and alerting
  • Resource utilization tracking
  • Error rate and success metrics
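
A minimal instrumentation sketch using the Prometheus client library; the metric and label names are assumptions:

// pkg/engine/metrics.go (sketch; metric names are illustrative assumptions)
import "github.com/prometheus/client_golang/prometheus"

var taskDuration = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name: "chorus_task_duration_seconds",
        Help: "Wall-clock duration of task execution.",
    },
    []string{"role", "status"},
)

func init() {
    prometheus.MustRegister(taskDuration)
}

// recordTask is called after each execution attempt with the outcome and duration.
func recordTask(role, status string, seconds float64) {
    taskDuration.WithLabelValues(role, status).Observe(seconds)
}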

Implementation Priorities

Critical Path (Week 1-2)

  1. Model Provider Abstraction Layer
  2. Basic Docker Sandbox Implementation
  3. Replace Mock Task Execution
  4. Role-Based Execution Strategies

High Priority (Week 3-4)

  1. Real Repository Provider Implementation
  2. AI Integration with Ollama/OpenAI
  3. MCP Tool Integration
  4. Basic Testing Framework

Medium Priority (Week 5-6)

  1. Advanced Sandbox Types (VM, Process)
  2. Quality Assurance Pipeline
  3. Comprehensive Testing Suite
  4. Performance Optimization

Future Enhancements

  • Multi-language model support
  • Advanced reasoning capabilities
  • Distributed task execution
  • Machine learning model fine-tuning

Success Metrics

  • Task Completion Rate: >90% of assigned tasks successfully completed
  • Code Quality: Generated code passes all existing tests and linting
  • Security: Zero sandbox escapes or security violations
  • Performance: Task execution time within acceptable bounds
  • Reliability: <5% execution failure rate due to engine issues

Risk Mitigation

Security Risks

  • Sandbox escape → Multiple isolation layers, security audits
  • Credential exposure → Secure credential management, rotation
  • Resource exhaustion → Resource limits, monitoring, auto-scaling

Technical Risks

  • Model provider outages → Multi-provider failover, local fallbacks
  • Execution failures → Robust error handling, retry mechanisms
  • Performance bottlenecks → Profiling, optimization, horizontal scaling

Integration Risks

  • WHOOSH compatibility → Extensive integration testing, versioning
  • Repository provider changes → Provider abstraction, API versioning
  • Model compatibility → Provider abstraction, capability detection

This plan addresses the core limitation that CHORUS agents currently lack real task execution capabilities, and it charts the path to a robust, secure, and scalable execution engine suitable for production deployment.

Implementation Roadmap

Development Standards & Workflow

Semantic Versioning Strategy:

  • Patch (0.N.X): Bug fixes, small improvements, documentation updates
  • Minor (0.N.0): New features, phase completions, non-breaking changes
  • Major (N.0.0): Breaking changes, major architectural shifts

Git Workflow:

  1. Branch Creation: git checkout -b feature/phase-N-description
  2. Development: Implement with frequent commits using conventional commit format
  3. Testing: Run full test suite with make test before PR
  4. Code Review: Create PR with detailed description and test results
  5. Integration: Squash merge to main after approval
  6. Release: Tag with git tag v0.N.0 and update Makefile version

Quality Gates: Each phase must meet these criteria before merge:

  • Unit tests with >80% coverage
  • Integration tests for external dependencies
  • Security review for new attack surfaces
  • Performance benchmarks within acceptable bounds
  • Documentation updates (code comments + README)
  • Backward compatibility verification

Phase-by-Phase Implementation

Phase 1: Model Provider Abstraction (v0.2.0)

Branch: feature/phase-1-model-providers
Duration: 3-5 days
Deliverables:

pkg/ai/
├── provider.go        # Core provider interface & request/response types
├── ollama.go          # Local Ollama model integration
├── openai.go          # OpenAI API client wrapper
├── resetdata.go       # ResetData LaaS integration
├── factory.go         # Provider factory with auto-selection
└── provider_test.go   # Comprehensive provider tests

configs/
└── models.yaml        # Role-model mapping configuration

Key Features:

  • Abstract AI providers behind unified interface
  • Support multiple providers with automatic failover
  • Configuration-driven model selection per agent role
  • Proper error handling and retry logic

Phase 2: Execution Environment Abstraction (v0.3.0)

Branch: feature/phase-2-execution-sandbox
Duration: 5-7 days
Deliverables:

pkg/execution/
├── sandbox.go         # Core sandbox interface & types
├── docker.go          # Docker container implementation
├── security.go        # Security policies & enforcement
├── resources.go       # Resource monitoring & limits
└── sandbox_test.go    # Sandbox security & isolation tests

Key Features:

  • Docker-based task isolation with transparent repository access
  • Resource limits (CPU, memory, network, disk) with monitoring
  • Security boundary enforcement and escape prevention
  • Clean teardown and artifact collection

Phase 3: Core Task Execution Engine (v0.4.0)

Branch: feature/phase-3-task-execution
Duration: 7-10 days
Modified Files:

  • coordinator/task_coordinator.go:314 - Replace mock with real execution
  • pkg/repository/types.go - Extend interfaces for execution context

New Files:

pkg/strategies/
├── developer.go       # Code implementation & bug fixes
├── reviewer.go        # Code review & quality analysis
├── architect.go       # System design & tech decisions
└── tester.go          # Test creation & validation

pkg/engine/
├── executor.go        # Main execution orchestrator
├── workflow.go        # 7-step execution workflow
└── validation.go      # Result quality verification

Key Features:

  • Real task execution replacing 10-second sleep simulation
  • Role-specific execution strategies with appropriate tooling
  • Integration between AI providers, sandboxes, and task lifecycle
  • Comprehensive result validation and quality metrics

Phase 4: Repository Provider Implementation (v0.5.0)

Branch: feature/phase-4-real-providers
Duration: 10-14 days
Deliverables:

pkg/providers/
├── gitea.go           # Gitea API integration (primary)
├── github.go          # GitHub API integration
├── gitlab.go          # GitLab API integration
└── provider_test.go   # API integration tests

Key Features:

  • Replace MockTaskProvider with production implementations
  • Task claiming, status updates, and progress reporting via APIs
  • Automated PR/MR creation with proper branch management
  • Repository-specific configuration and credential management

Testing Strategy

Unit Testing:

  • Each provider/sandbox implementation has dedicated test suite
  • Mock external dependencies (APIs, Docker, etc.) for isolated testing
  • Property-based testing for core interfaces
  • Error condition and edge case coverage

Integration Testing:

  • End-to-end task execution workflows
  • Multi-provider failover scenarios
  • Agent-to-WHOOSH communication validation
  • Concurrent task execution under load

Security Testing:

  • Sandbox escape prevention validation
  • Resource exhaustion protection
  • Network isolation verification
  • Secrets and credential protection audits

Deployment & Monitoring

Configuration Management:

  • Environment-specific model configurations
  • Sandbox resource limits per environment
  • Provider API credentials via secure secret management
  • Feature flags for gradual rollout

Observability:

  • Task execution metrics (completion rate, duration, success/failure)
  • Resource utilization tracking (CPU, memory, network per task)
  • Error rate monitoring with alerting thresholds
  • Performance dashboards for capacity planning

Risk Mitigation

Technical Risks:

  • Provider Outages: Multi-provider failover with health checks
  • Resource Exhaustion: Strict limits with monitoring and auto-scaling
  • Execution Failures: Retry mechanisms with exponential backoff

Security Risks:

  • Sandbox Escapes: Multiple isolation layers and regular security audits
  • Credential Exposure: Secure rotation and least-privilege access
  • Data Exfiltration: Network isolation and egress monitoring

Integration Risks:

  • API Changes: Provider abstraction with versioning support
  • Performance Degradation: Comprehensive benchmarking at each phase
  • Compatibility Issues: Extensive integration testing with existing systems