2025-09-26 06:10:01 +00:00
1 changed files with 435 additions and 0 deletions
--- a/docs/development/task-execution-engine-plan.md
+++ b/docs/development/task-execution-engine-plan.md
@@ -0,0 +1,435 @@
 # CHORUS Task Execution Engine Development Plan
 ## Overview
 This plan outlines the development of a comprehensive task execution engine for CHORUS agents, replacing the current mock implementation with a fully functional system that can execute real work according to agent roles and specializations.
 ## Current State Analysis
 ### What's Implemented ✅
 - **Task Coordinator Framework** (`coordinator/task_coordinator.go`): Full task management lifecycle with role-based assignment, collaboration requests, and HMMM integration
 - **Agent Role System**: Role announcements, capability broadcasting, and expertise matching
 - **P2P Infrastructure**: Nodes can discover each other and communicate via pubsub
 - **Health Monitoring**: Comprehensive health checks and graceful shutdown
 ### Critical Gaps Identified ❌
 - **Task Execution Engine**: `executeTask()` only has a 10-second sleep simulation - no actual work performed
 - **Repository Integration**: Mock providers only - no real GitHub/GitLab task pulling
 - **Agent-to-Task Binding**: Task discovery relies on WHOOSH but agents don't connect to real work
 - **Role-Based Execution**: Agents announce roles but don't execute tasks according to their specialization
 - **AI Integration**: No LLM/reasoning integration for task completion
 ## Architecture Requirements
 ### Model and Provider Abstraction
 The execution engine must support multiple AI model providers and execution environments:
 **Model Provider Types:**
 - **Local Ollama**: Default for most roles (llama3.1:8b, codellama, etc.)
 - **OpenAI API**: For specialized models (chatgpt-5, gpt-4o, etc.)
 - **ResetData API**: For testing and fallback (llama3.1:8b via LaaS)
 - **Custom Endpoints**: Support for other provider APIs
 **Role-Model Mapping:**
 - Each role has a default model configuration
 - Specialized roles may require specific models/providers
 - Model selection transparent to execution logic
 - Support for MCP calls and tool usage regardless of provider
 ### Execution Environment Abstraction
 Tasks must execute in secure, isolated environments while maintaining transparency:
 **Sandbox Types:**
 - **Docker Containers**: Isolated execution environment per task
 - **Specialized VMs**: For tasks requiring full OS isolation
 - **Process Sandboxing**: Lightweight isolation for simple tasks
 **Transparency Requirements:**
 - Model perceives it's working on a local repository
 - Development tools available within sandbox
 - File system operations work normally from model's perspective
 - Network access controlled but transparent
 - Resource limits enforced but invisible
 ## Development Plan
 ### Phase 1: Model Provider Abstraction Layer
 #### 1.1 Create Provider Interface
 ```go
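 // pkg/ai/provider.go
 package ai

 import "context"

 // ModelProvider abstracts an AI backend (Ollama, OpenAI, ResetData, ...).
 // TaskRequest and TaskResponse are defined alongside it in this file.
 type ModelProvider interface {
 	// ExecuteTask runs a single task against the backing model.
 	ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
 	SupportsMCP() bool
 	SupportsTools() bool
 	GetCapabilities() []string
 }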
 ```
 #### 1.2 Implement Provider Types
 - **OllamaProvider**: Local model execution
 - **OpenAIProvider**: OpenAI API integration
 - **ResetDataProvider**: ResetData LaaS integration
 - **ProviderFactory**: Creates appropriate provider based on model config
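As a rough illustration of how the `ProviderFactory` could select an implementation by name, here is a minimal sketch. The `Constructor` type, the `Register`/`Create` methods, the `echoProvider` stand-in, and the fields of `TaskRequest`/`TaskResponse` are all assumptions made for this example; the plan itself only fixes the `ModelProvider` interface.

```go
package main

import (
	"context"
	"fmt"
)

// TaskRequest and TaskResponse are hypothetical placeholders; the plan does
// not define their fields.
type TaskRequest struct{ Prompt string }
type TaskResponse struct{ Output string }

// ModelProvider mirrors the interface from section 1.1.
type ModelProvider interface {
	ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
	SupportsMCP() bool
	SupportsTools() bool
	GetCapabilities() []string
}

// echoProvider is a stand-in used only to exercise the factory.
type echoProvider struct{ name, model string }

func (p *echoProvider) ExecuteTask(ctx context.Context, r *TaskRequest) (*TaskResponse, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return &TaskResponse{Output: p.name + "/" + p.model + ": " + r.Prompt}, nil
}
func (p *echoProvider) SupportsMCP() bool         { return false }
func (p *echoProvider) SupportsTools() bool       { return false }
func (p *echoProvider) GetCapabilities() []string { return []string{"echo"} }

// Constructor builds a provider for a given model name (assumed shape).
type Constructor func(model string) (ModelProvider, error)

// ProviderFactory maps provider names such as "ollama" to constructors.
type ProviderFactory struct{ constructors map[string]Constructor }

func NewProviderFactory() *ProviderFactory {
	return &ProviderFactory{constructors: make(map[string]Constructor)}
}

// Register associates a provider name with its constructor.
func (f *ProviderFactory) Register(name string, c Constructor) { f.constructors[name] = c }

// Create returns an error for unknown names rather than a silent default.
func (f *ProviderFactory) Create(name, model string) (ModelProvider, error) {
	c, ok := f.constructors[name]
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", name)
	}
	return c(model)
}

func main() {
	f := NewProviderFactory()
	f.Register("ollama", func(model string) (ModelProvider, error) {
		return &echoProvider{name: "ollama", model: model}, nil
	})
	p, err := f.Create("ollama", "llama3.1:8b")
	if err != nil {
		panic(err)
	}
	resp, err := p.ExecuteTask(context.Background(), &TaskRequest{Prompt: "hello"})
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Output)
}
```

Keeping construction behind a registry like this would let new provider types be added without touching the execution logic, which is the transparency goal stated in the architecture requirements.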
 #### 1.3 Role-Model Configuration
 ```yaml
 # Config structure for role-model mapping
 roles:
   developer:
     default_model: "codellama:13b"
     provider: "ollama"
     fallback_model: "llama3.1:8b"
     fallback_provider: "resetdata"
   architect:
     default_model: "gpt-4o"
     provider: "openai"
     fallback_model: "llama3.1:8b"
     fallback_provider: "ollama"
 ```
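The fallback fields in this configuration imply a selection rule: use the role's default provider when it is reachable, otherwise use the fallback pair. The sketch below shows one possible reading of that rule. `RoleModelConfig`, `Selection`, `SelectModel`, and the `available` health probe are hypothetical names, and YAML loading is left out so the example depends only on the standard library.

```go
package main

import "fmt"

// RoleModelConfig mirrors one entry under `roles:` in the YAML config.
type RoleModelConfig struct {
	DefaultModel     string
	Provider         string
	FallbackModel    string
	FallbackProvider string
}

// Selection is the provider/model pair chosen for a task.
type Selection struct{ Provider, Model string }

// SelectModel prefers the role's default provider and falls back when the
// `available` check (an assumed health probe) reports the default as down.
func SelectModel(roles map[string]RoleModelConfig, role string, available func(provider string) bool) (Selection, error) {
	cfg, ok := roles[role]
	if !ok {
		return Selection{}, fmt.Errorf("no model configuration for role %q", role)
	}
	if available(cfg.Provider) {
		return Selection{Provider: cfg.Provider, Model: cfg.DefaultModel}, nil
	}
	if cfg.FallbackProvider != "" && available(cfg.FallbackProvider) {
		return Selection{Provider: cfg.FallbackProvider, Model: cfg.FallbackModel}, nil
	}
	return Selection{}, fmt.Errorf("no available provider for role %q", role)
}

func main() {
	roles := map[string]RoleModelConfig{
		"developer": {DefaultModel: "codellama:13b", Provider: "ollama", FallbackModel: "llama3.1:8b", FallbackProvider: "resetdata"},
		"architect": {DefaultModel: "gpt-4o", Provider: "openai", FallbackModel: "llama3.1:8b", FallbackProvider: "ollama"},
	}
	// Simulate an OpenAI outage: every provider except "openai" is healthy.
	openAIDown := func(p string) bool { return p != "openai" }
	sel, err := SelectModel(roles, "architect", openAIDown)
	fmt.Println(sel.Provider, sel.Model, err)
}
```

With the simulated outage, the architect role resolves to the `ollama` fallback with `llama3.1:8b` instead of failing the task.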
 ### Phase 2: Execution Environment Abstraction
 #### 2.1 Create Sandbox Interface
 ```go
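 // pkg/execution/sandbox.go
 package execution

 import "context"

 // ExecutionSandbox isolates the commands and files of a single task.
 // SandboxConfig, Command, and CommandResult are defined in this package.
 type ExecutionSandbox interface {
 	Initialize(ctx context.Context, config *SandboxConfig) error
 	ExecuteCommand(ctx context.Context, cmd *Command) (*CommandResult, error)
 	CopyFiles(ctx context.Context, source, dest string) error
 	// Cleanup tears the sandbox down and releases its resources.
 	Cleanup() error
 }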
 ```
 #### 2.2 Implement Sandbox Types
 - **DockerSandbox**: Container-based isolation
 - **VMSandbox**: Full VM isolation for sensitive tasks
 - **ProcessSandbox**: Lightweight process-based isolation
 #### 2.3 Repository Mounting
 - Clone repository into sandbox environment
 - Mount as local filesystem from model's perspective
 - Implement secure file I/O operations
 - Handle git operations within sandbox
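For the Docker case, the isolation and mounting requirements could translate into a `docker run` invocation along the following lines. This is only a sketch: `SandboxLimits`, `DockerRunArgs`, the `/workspace` mount point, the fixed `--pids-limit` value, and the `golang:1.22` image are illustrative assumptions, not decisions made by this plan. The flags themselves are standard `docker run` options.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SandboxLimits is a hypothetical subset of what SandboxConfig might carry.
type SandboxLimits struct {
	Image        string  // container image providing the development tools
	RepoDir      string  // host path of the cloned repository
	MemoryMB     int     // hard memory limit in megabytes
	CPUs         float64 // CPU quota, e.g. 1.5 cores
	AllowNetwork bool    // false disables networking entirely
}

// DockerRunArgs builds the argument list for `docker run`. The repository is
// bind-mounted at a fixed path so the model sees an ordinary local checkout.
func DockerRunArgs(l SandboxLimits, command ...string) []string {
	args := []string{
		"run", "--rm",
		"--memory", strconv.Itoa(l.MemoryMB) + "m",
		"--cpus", strconv.FormatFloat(l.CPUs, 'f', -1, 64),
		"--pids-limit", "256",
		"--cap-drop", "ALL",
		"--security-opt", "no-new-privileges",
		"-v", l.RepoDir + ":/workspace",
		"-w", "/workspace",
	}
	if !l.AllowNetwork {
		args = append(args, "--network", "none")
	}
	args = append(args, l.Image)
	return append(args, command...)
}

func main() {
	limits := SandboxLimits{Image: "golang:1.22", RepoDir: "/tmp/task-42/repo", MemoryMB: 512, CPUs: 1.5}
	args := DockerRunArgs(limits, "go", "test", "./...")
	fmt.Println("docker " + strings.Join(args, " "))
}
```

Building the argument list as a pure function keeps the security-relevant flags in one reviewable place and makes them easy to assert on in unit tests without a running Docker daemon.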
 ### Phase 3: Core Task Execution Engine
 #### 3.1 Replace Mock Implementation
 Replace the current simulation in `coordinator/task_coordinator.go:314`:
 ```go
 // Current mock implementation
 time.Sleep(10 * time.Second) // Simulate work

 // New implementation
 result, err := tc.executionEngine.ExecuteTask(ctx, &TaskExecutionRequest{
 	Task:          activeTask.Task,
 	Agent:         tc.agentInfo,
 	Sandbox:       sandboxConfig,
 	ModelProvider: providerConfig,
 })
 ```
 #### 3.2 Task Execution Strategies
 Create role-specific execution patterns:
 - **DeveloperStrategy**: Code implementation, bug fixes, feature development
 - **ReviewerStrategy**: Code review, quality analysis, test coverage assessment
 - **ArchitectStrategy**: System design, technical decision making
 - **TesterStrategy**: Test creation, validation, quality assurance
 #### 3.3 Execution Workflow
 1. **Task Analysis**: Parse task requirements and complexity
 2. **Environment Setup**: Initialize appropriate sandbox
 3. **Repository Preparation**: Clone and mount repository
 4. **Model Selection**: Choose appropriate model/provider
 5. **Task Execution**: Run role-specific execution strategy
 6. **Result Validation**: Verify output quality and completeness
 7. **Cleanup**: Teardown sandbox and collect artifacts
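One property this workflow needs is that cleanup runs even when an earlier step fails, so that no sandbox is leaked. A minimal sketch of that control flow follows. The `Step` type, `RunWorkflow`, and `errStepFailed` are hypothetical names for this example, and the step bodies are stubs.

```go
package main

import (
	"errors"
	"fmt"
)

// errStepFailed is a placeholder failure used by the demo.
var errStepFailed = errors.New("step failed")

// Step is one stage of the workflow; the shape is illustrative only.
type Step struct {
	Name string
	Run  func() error
}

// RunWorkflow executes steps in order and stops at the first failure. The
// deferred cleanup always runs, so the sandbox is torn down on error too.
func RunWorkflow(steps []Step, cleanup func() error) (completed []string, err error) {
	defer func() {
		// Report a cleanup failure only if no step error is already set.
		if cerr := cleanup(); cerr != nil && err == nil {
			err = fmt.Errorf("cleanup: %w", cerr)
		}
	}()
	for _, s := range steps {
		if runErr := s.Run(); runErr != nil {
			return completed, fmt.Errorf("%s: %w", s.Name, runErr)
		}
		completed = append(completed, s.Name)
	}
	return completed, nil
}

func main() {
	ok := func() error { return nil }
	steps := []Step{
		{Name: "task analysis", Run: ok},
		{Name: "environment setup", Run: ok},
		{Name: "repository preparation", Run: ok},
		{Name: "model selection", Run: ok},
		{Name: "task execution", Run: func() error { return errStepFailed }},
		{Name: "result validation", Run: ok},
	}
	completed, err := RunWorkflow(steps, func() error {
		fmt.Println("cleanup: sandbox removed")
		return nil
	})
	fmt.Println("completed:", completed)
	fmt.Println("error:", err)
}
```

In the demo the execution step fails, validation is skipped, and the cleanup callback still fires before the error is returned.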
 ### Phase 4: Repository Provider Implementation
 #### 4.1 Real Repository Integration
 Replace `MockTaskProvider` with actual implementations:
 - **GiteaProvider**: Integration with the Gitea API
 - **GitHubProvider**: GitHub API integration
 - **GitLabProvider**: GitLab API integration
 #### 4.2 Task Lifecycle Management
 - Task claiming and status updates
 - Progress reporting back to repositories
 - Artifact attachment (patches, documentation, etc.)
 - Automated PR/MR creation for completed tasks
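Claiming and status updates are easier to keep consistent across Gitea, GitHub, and GitLab if the allowed status transitions are defined once. The sketch below assumes a small set of states and transitions; the plan does not specify them, so every state name and edge here is hypothetical.

```go
package main

import "fmt"

// TaskState is an assumed set of lifecycle states for a repository task.
type TaskState string

const (
	StateOpen       TaskState = "open"
	StateClaimed    TaskState = "claimed"
	StateInProgress TaskState = "in_progress"
	StateCompleted  TaskState = "completed"
	StateFailed     TaskState = "failed"
)

// allowed lists the legal next states for each state (assumed edges).
var allowed = map[TaskState][]TaskState{
	StateOpen:       {StateClaimed},
	StateClaimed:    {StateInProgress, StateOpen}, // an agent may release a claim
	StateInProgress: {StateCompleted, StateFailed},
	StateFailed:     {StateOpen}, // make the task available for another attempt
}

// Transition validates a status change before it is reported upstream.
func Transition(from, to TaskState) error {
	for _, next := range allowed[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("invalid transition %s -> %s", from, to)
}

func main() {
	state := StateOpen
	for _, next := range []TaskState{StateClaimed, StateInProgress, StateCompleted} {
		if err := Transition(state, next); err != nil {
			panic(err)
		}
		state = next
		fmt.Println("status:", state)
	}
	// A completed task cannot be reopened for execution in this sketch.
	fmt.Println(Transition(StateCompleted, StateInProgress))
}
```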
 ### Phase 5: AI Integration and Tool Support
 #### 5.1 LLM Integration
 - Context-aware task analysis based on repository content
 - Code generation and problem-solving capabilities
 - Natural language processing for task descriptions
 - Multi-step reasoning for complex tasks
 #### 5.2 Tool Integration
 - MCP server connectivity within sandbox
 - Development tool access (compilers, linters, formatters)
 - Testing framework integration
 - Documentation generation tools
 #### 5.3 Quality Assurance
 - Automated testing of generated code
 - Code quality metrics and analysis
 - Security vulnerability scanning
 - Performance impact assessment
 ### Phase 6: Testing and Validation
 #### 6.1 Unit Testing
 - Provider abstraction layer testing
 - Sandbox isolation verification
 - Task execution strategy validation
 - Error handling and recovery testing
 #### 6.2 Integration Testing
 - End-to-end task execution workflows
 - Agent-to-WHOOSH communication testing
 - Multi-provider failover scenarios
 - Concurrent task execution testing
 #### 6.3 Security Testing
 - Sandbox escape prevention
 - Resource limit enforcement
 - Network isolation validation
 - Secrets and credential protection
 ### Phase 7: Production Deployment
 #### 7.1 Configuration Management
 - Environment-specific model configurations
 - Sandbox resource limit definitions
 - Provider API key management
 - Monitoring and logging setup
 #### 7.2 Monitoring and Observability
 - Task execution metrics and dashboards
 - Performance monitoring and alerting
 - Resource utilization tracking
 - Error rate and success metrics
 ## Implementation Priorities
 ### Critical Path (Week 1-2)
 1. Model Provider Abstraction Layer
 2. Basic Docker Sandbox Implementation
 3. Replace Mock Task Execution
 4. Role-Based Execution Strategies
 ### High Priority (Week 3-4)
 5. Real Repository Provider Implementation
 6. AI Integration with Ollama/OpenAI
 7. MCP Tool Integration
 8. Basic Testing Framework
 ### Medium Priority (Week 5-6)
 9. Advanced Sandbox Types (VM, Process)
 10. Quality Assurance Pipeline
 11. Comprehensive Testing Suite
 12. Performance Optimization
 ### Future Enhancements
 - Multi-language model support
 - Advanced reasoning capabilities
 - Distributed task execution
 - Machine learning model fine-tuning
 ## Success Metrics
 - **Task Completion Rate**: >90% of assigned tasks successfully completed
 - **Code Quality**: Generated code passes all existing tests and linting
 - **Security**: Zero sandbox escapes or security violations
 - **Performance**: Task execution time within acceptable bounds
 - **Reliability**: <5% execution failure rate due to engine issues
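The two numeric targets above can be checked mechanically once execution counts are collected. The following sketch shows the arithmetic; `ExecutionStats` and its fields are hypothetical, and the sample counts are made up purely to exercise the calculation.

```go
package main

import "fmt"

// ExecutionStats holds raw counters; the field names are assumptions.
type ExecutionStats struct {
	Assigned       int // tasks assigned to agents
	Completed      int // tasks completed successfully
	EngineFailures int // failures attributed to the engine itself
}

// ratio avoids division by zero when no tasks have been assigned yet.
func ratio(n, total int) float64 {
	if total == 0 {
		return 0
	}
	return float64(n) / float64(total)
}

func (s ExecutionStats) CompletionRate() float64    { return ratio(s.Completed, s.Assigned) }
func (s ExecutionStats) EngineFailureRate() float64 { return ratio(s.EngineFailures, s.Assigned) }

// MeetsTargets applies the plan's thresholds: >90% completion and <5%
// engine-caused failures.
func (s ExecutionStats) MeetsTargets() bool {
	return s.CompletionRate() > 0.90 && s.EngineFailureRate() < 0.05
}

func main() {
	s := ExecutionStats{Assigned: 200, Completed: 186, EngineFailures: 6} // illustrative numbers
	fmt.Printf("completion %.1f%%, engine failures %.1f%%, targets met: %t\n",
		100*s.CompletionRate(), 100*s.EngineFailureRate(), s.MeetsTargets())
}
```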
 ## Risk Mitigation
 ### Security Risks
 - Sandbox escape → Multiple isolation layers, security audits
 - Credential exposure → Secure credential management, rotation
 - Resource exhaustion → Resource limits, monitoring, auto-scaling
 ### Technical Risks
 - Model provider outages → Multi-provider failover, local fallbacks
 - Execution failures → Robust error handling, retry mechanisms
 - Performance bottlenecks → Profiling, optimization, horizontal scaling
 ### Integration Risks
 - WHOOSH compatibility → Extensive integration testing, versioning
 - Repository provider changes → Provider abstraction, API versioning
 - Model compatibility → Provider abstraction, capability detection
 This plan addresses the core limitation that CHORUS agents currently cannot execute real tasks, and it does so by building a secure, scalable execution engine suitable for production deployment.
 ## Implementation Roadmap
 ### Development Standards & Workflow
 **Semantic Versioning Strategy:**
 - **Patch (0.N.X)**: Bug fixes, small improvements, documentation updates
 - **Minor (0.N.0)**: New features, phase completions, non-breaking changes
 - **Major (N.0.0)**: Breaking changes, major architectural shifts
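The bump rules above can be stated precisely in a few lines. This sketch assumes a hypothetical `Version` type and `Bump` helper; it is not part of the build tooling.

```go
package main

import "fmt"

// Version is a minimal semantic version triple.
type Version struct{ Major, Minor, Patch int }

func (v Version) String() string {
	return fmt.Sprintf("v%d.%d.%d", v.Major, v.Minor, v.Patch)
}

// Bump returns the next version for a "patch", "minor", or "major" change.
// Lower-order components reset to zero when a higher one is incremented.
func Bump(v Version, change string) (Version, error) {
	switch change {
	case "patch":
		return Version{v.Major, v.Minor, v.Patch + 1}, nil
	case "minor":
		return Version{v.Major, v.Minor + 1, 0}, nil
	case "major":
		return Version{v.Major + 1, 0, 0}, nil
	}
	return v, fmt.Errorf("unknown change type %q", change)
}

func main() {
	current := Version{0, 2, 0}
	next, err := Bump(current, "minor") // a phase completion is a minor bump
	if err != nil {
		panic(err)
	}
	fmt.Println(current, "->", next)
}
```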
 **Git Workflow:**
 1. **Branch Creation**: `git checkout -b feature/phase-N-description`
 2. **Development**: Implement with frequent commits using conventional commit format
 3. **Testing**: Run full test suite with `make test` before PR
 4. **Code Review**: Create PR with detailed description and test results
 5. **Integration**: Squash merge to main after approval
 6. **Release**: Tag with `git tag v0.N.0` and update Makefile version
 **Quality Gates:**
 Each phase must meet these criteria before merge:
 - ✅ Unit tests with >80% coverage
 - ✅ Integration tests for external dependencies
 - ✅ Security review for new attack surfaces
 - ✅ Performance benchmarks within acceptable bounds
 - ✅ Documentation updates (code comments + README)
 - ✅ Backward compatibility verification
 ### Phase-by-Phase Implementation
 #### Phase 1: Model Provider Abstraction (v0.2.0)
 **Branch:** `feature/phase-1-model-providers`
 **Duration:** 3-5 days
 **Deliverables:**
 ```
 pkg/ai/
 ├── provider.go        # Core provider interface & request/response types
 ├── ollama.go          # Local Ollama model integration
 ├── openai.go          # OpenAI API client wrapper
 ├── resetdata.go       # ResetData LaaS integration
 ├── factory.go         # Provider factory with auto-selection
 └── provider_test.go   # Comprehensive provider tests
 configs/
 └── models.yaml        # Role-model mapping configuration
 ```
 **Key Features:**
 - Abstract AI providers behind unified interface
 - Support multiple providers with automatic failover
 - Configuration-driven model selection per agent role
 - Proper error handling and retry logic
 #### Phase 2: Execution Environment Abstraction (v0.3.0)
 **Branch:** `feature/phase-2-execution-sandbox`
 **Duration:** 5-7 days
 **Deliverables:**
 ```
 pkg/execution/
 ├── sandbox.go         # Core sandbox interface & types
 ├── docker.go          # Docker container implementation
 ├── security.go        # Security policies & enforcement
 ├── resources.go       # Resource monitoring & limits
 └── sandbox_test.go    # Sandbox security & isolation tests
 ```
 **Key Features:**
 - Docker-based task isolation with transparent repository access
 - Resource limits (CPU, memory, network, disk) with monitoring
 - Security boundary enforcement and escape prevention
 - Clean teardown and artifact collection
 #### Phase 3: Core Task Execution Engine (v0.4.0)
 **Branch:** `feature/phase-3-task-execution`
 **Duration:** 7-10 days
 **Modified Files:**
 - `coordinator/task_coordinator.go:314` - Replace mock with real execution
 - `pkg/repository/types.go` - Extend interfaces for execution context
 **New Files:**
 ```
 pkg/strategies/
 ├── developer.go       # Code implementation & bug fixes
 ├── reviewer.go        # Code review & quality analysis
 ├── architect.go       # System design & tech decisions
 └── tester.go          # Test creation & validation
 pkg/engine/
 ├── executor.go        # Main execution orchestrator
 ├── workflow.go        # 7-step execution workflow
 └── validation.go      # Result quality verification
 ```
 **Key Features:**
 - Real task execution replacing 10-second sleep simulation
 - Role-specific execution strategies with appropriate tooling
 - Integration between AI providers, sandboxes, and task lifecycle
 - Comprehensive result validation and quality metrics
 #### Phase 4: Repository Provider Implementation (v0.5.0)
 **Branch:** `feature/phase-4-real-providers`
 **Duration:** 10-14 days
 **Deliverables:**
 ```
 pkg/providers/
 ├── gitea.go           # Gitea API integration (primary)
 ├── github.go          # GitHub API integration
 ├── gitlab.go          # GitLab API integration
 └── provider_test.go   # API integration tests
 ```
 **Key Features:**
 - Replace MockTaskProvider with production implementations
 - Task claiming, status updates, and progress reporting via APIs
 - Automated PR/MR creation with proper branch management
 - Repository-specific configuration and credential management
 ### Testing Strategy
 **Unit Testing:**
 - Each provider/sandbox implementation has dedicated test suite
 - Mock external dependencies (APIs, Docker, etc.) for isolated testing
 - Property-based testing for core interfaces
 - Error condition and edge case coverage
 **Integration Testing:**
 - End-to-end task execution workflows
 - Multi-provider failover scenarios
 - Agent-to-WHOOSH communication validation
 - Concurrent task execution under load
 **Security Testing:**
 - Sandbox escape prevention validation
 - Resource exhaustion protection
 - Network isolation verification
 - Secrets and credential protection audits
 ### Deployment & Monitoring
 **Configuration Management:**
 - Environment-specific model configurations
 - Sandbox resource limits per environment
 - Provider API credentials via secure secret management
 - Feature flags for gradual rollout
 **Observability:**
 - Task execution metrics (completion rate, duration, success/failure)
 - Resource utilization tracking (CPU, memory, network per task)
 - Error rate monitoring with alerting thresholds
 - Performance dashboards for capacity planning
 ### Risk Mitigation
 **Technical Risks:**
 - **Provider Outages**: Multi-provider failover with health checks
 - **Resource Exhaustion**: Strict limits with monitoring and auto-scaling
 - **Execution Failures**: Retry mechanisms with exponential backoff
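A retry mechanism with exponential backoff, as named in the last item, might look like the sketch below. `Retry`, `retryQuick`, and `errTransient` are hypothetical names, and the attempt count and base delay are demo values rather than recommended settings.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTransient is a placeholder for a recoverable failure.
var errTransient = errors.New("transient failure")

// Retry calls op up to `attempts` times, doubling the wait after each
// failure. It stops early if the context is cancelled while waiting.
func Retry(ctx context.Context, attempts int, base time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i == attempts {
			break // no point in sleeping after the final attempt
		}
		select {
		case <-time.After(delay):
			delay *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// retryQuick applies demo settings: 4 attempts starting at 1ms.
func retryQuick(op func() error) error {
	return Retry(context.Background(), 4, time.Millisecond, op)
}

func main() {
	calls := 0
	err := retryQuick(func() error {
		calls++
		if calls < 3 {
			return errTransient // fail twice, then succeed
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

A production version would likely add jitter and a maximum delay, and would retry only errors known to be transient.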
 **Security Risks:**
 - **Sandbox Escapes**: Multiple isolation layers and regular security audits
 - **Credential Exposure**: Secure rotation and least-privilege access
 - **Data Exfiltration**: Network isolation and egress monitoring
 **Integration Risks:**
 - **API Changes**: Provider abstraction with versioning support
 - **Performance Degradation**: Comprehensive benchmarking at each phase
 - **Compatibility Issues**: Extensive integration testing with existing systems