# CHORUS Task Execution Engine Development Plan

## Overview

This plan outlines the development of a comprehensive task execution engine for CHORUS agents, replacing the current mock implementation with a fully functional system that can execute real work according to agent roles and specializations.

## Current State Analysis

### What's Implemented ✅

- **Task Coordinator Framework** (`coordinator/task_coordinator.go`): Full task management lifecycle with role-based assignment, collaboration requests, and HMMM integration
- **Agent Role System**: Role announcements, capability broadcasting, and expertise matching
- **P2P Infrastructure**: Nodes can discover each other and communicate via pubsub
- **Health Monitoring**: Comprehensive health checks and graceful shutdown

### Critical Gaps Identified ❌

- **Task Execution Engine**: `executeTask()` only has a 10-second sleep simulation; no actual work is performed
- **Repository Integration**: Mock providers only; no real GitHub/GitLab task pulling
- **Agent-to-Task Binding**: Task discovery relies on WHOOSH, but agents don't connect to real work
- **Role-Based Execution**: Agents announce roles but don't execute tasks according to their specialization
- **AI Integration**: No LLM/reasoning integration for task completion

## Architecture Requirements

### Model and Provider Abstraction

The execution engine must support multiple AI model providers and execution environments.

**Model Provider Types:**
- **Local Ollama**: Default for most roles (llama3.1:8b, codellama, etc.)
- **OpenAI API**: For specialized hosted models (e.g., gpt-4o)
- **ResetData API**: For testing and fallback (llama3.1:8b via LaaS)
- **Custom Endpoints**: Support for other provider APIs

**Role-Model Mapping:**
- Each role has a default model configuration
- Specialized roles may require specific models/providers
- Model selection is transparent to the execution logic
- MCP calls and tool usage are supported regardless of provider

### Execution Environment Abstraction

Tasks must execute in secure, isolated environments while maintaining transparency.

**Sandbox Types:**
- **Docker Containers**: Isolated execution environment per task
- **Specialized VMs**: For tasks requiring full OS isolation
- **Process Sandboxing**: Lightweight isolation for simple tasks

**Transparency Requirements:**
- The model perceives it is working on a local repository
- Development tools are available within the sandbox
- File system operations work normally from the model's perspective
- Network access is controlled but transparent
- Resource limits are enforced but invisible

## Development Plan

### Phase 1: Model Provider Abstraction Layer

#### 1.1 Create Provider Interface

```go
// pkg/ai/provider.go
package ai

import "context"

// ModelProvider abstracts a model backend (Ollama, OpenAI, ResetData, ...).
// TaskRequest and TaskResponse are defined alongside this interface.
type ModelProvider interface {
	ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
	SupportsMCP() bool
	SupportsTools() bool
	GetCapabilities() []string
}
```

#### 1.2 Implement Provider Types

- **OllamaProvider**: Local model execution
- **OpenAIProvider**: OpenAI API integration
- **ResetDataProvider**: ResetData LaaS integration
- **ProviderFactory**: Creates the appropriate provider based on model config

#### 1.3 Role-Model Configuration

```yaml
# Config structure for role-model mapping
roles:
  developer:
    default_model: "codellama:13b"
    provider: "ollama"
    fallback_model: "llama3.1:8b"
    fallback_provider: "resetdata"
  architect:
    default_model: "gpt-4o"
    provider: "openai"
    fallback_model: "llama3.1:8b"
    fallback_provider: "ollama"
```
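To show how the factory from 1.2 and the mapping from 1.3 fit together, here is a minimal sketch of configuration loading and role-based provider selection. It assumes `gopkg.in/yaml.v3` for parsing; the `RoleConfig` field names mirror the YAML keys above, and the `New*Provider` constructors are placeholders for the implementations listed in 1.2, not settled API.

```go
// pkg/ai/factory.go (sketch; assumes gopkg.in/yaml.v3 for parsing)
package ai

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// RoleConfig mirrors one entry under "roles" in configs/models.yaml.
type RoleConfig struct {
	DefaultModel     string `yaml:"default_model"`
	Provider         string `yaml:"provider"`
	FallbackModel    string `yaml:"fallback_model"`
	FallbackProvider string `yaml:"fallback_provider"`
}

// ModelConfig is the root of the role-model mapping file.
type ModelConfig struct {
	Roles map[string]RoleConfig `yaml:"roles"`
}

// LoadModelConfig reads the role-model mapping from disk.
func LoadModelConfig(path string) (*ModelConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg ModelConfig
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

// ProviderForRole returns the primary provider for a role; callers fall back
// to FallbackProvider/FallbackModel when the primary is unhealthy.
func (c *ModelConfig) ProviderForRole(role string) (ModelProvider, error) {
	rc, ok := c.Roles[role]
	if !ok {
		return nil, fmt.Errorf("no model configuration for role %q", role)
	}
	switch rc.Provider {
	case "ollama":
		return NewOllamaProvider(rc.DefaultModel), nil // hypothetical constructor
	case "openai":
		return NewOpenAIProvider(rc.DefaultModel), nil // hypothetical constructor
	case "resetdata":
		return NewResetDataProvider(rc.DefaultModel), nil // hypothetical constructor
	default:
		return nil, fmt.Errorf("unknown provider %q", rc.Provider)
	}
}
```

Keeping the switch in one factory function means fallback logic and health checks can live in a single place rather than in every call site.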
### Phase 2: Execution Environment Abstraction

#### 2.1 Create Sandbox Interface

```go
// pkg/execution/sandbox.go
package execution

import "context"

// ExecutionSandbox isolates a single task's execution environment.
// SandboxConfig, Command, and CommandResult are defined alongside this interface.
type ExecutionSandbox interface {
	Initialize(ctx context.Context, config *SandboxConfig) error
	ExecuteCommand(ctx context.Context, cmd *Command) (*CommandResult, error)
	CopyFiles(ctx context.Context, source, dest string) error
	Cleanup() error
}
```

#### 2.2 Implement Sandbox Types

- **DockerSandbox**: Container-based isolation
- **VMSandbox**: Full VM isolation for sensitive tasks
- **ProcessSandbox**: Lightweight process-based isolation

#### 2.3 Repository Mounting

- Clone the repository into the sandbox environment
- Mount it as a local filesystem from the model's perspective
- Implement secure file I/O operations
- Handle git operations within the sandbox

### Phase 3: Core Task Execution Engine

#### 3.1 Replace Mock Implementation

Replace the current simulation in `coordinator/task_coordinator.go:314`:

```go
// Current mock implementation
time.Sleep(10 * time.Second) // Simulate work

// New implementation
result, err := tc.executionEngine.ExecuteTask(ctx, &TaskExecutionRequest{
	Task:          activeTask.Task,
	Agent:         tc.agentInfo,
	Sandbox:       sandboxConfig,
	ModelProvider: providerConfig,
})
```

#### 3.2 Task Execution Strategies

Create role-specific execution patterns:

- **DeveloperStrategy**: Code implementation, bug fixes, feature development
- **ReviewerStrategy**: Code review, quality analysis, test coverage assessment
- **ArchitectStrategy**: System design, technical decision making
- **TesterStrategy**: Test creation, validation, quality assurance

#### 3.3 Execution Workflow

Each task moves through seven steps; a code sketch of this loop follows the list.

1. **Task Analysis**: Parse task requirements and complexity
2. **Environment Setup**: Initialize the appropriate sandbox
3. **Repository Preparation**: Clone and mount the repository
4. **Model Selection**: Choose the appropriate model/provider
5. **Task Execution**: Run the role-specific execution strategy
6. **Result Validation**: Verify output quality and completeness
7. **Cleanup**: Tear down the sandbox and collect artifacts
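The sketch below shows one way the seven steps could be wired together. Every type in it is a local stand-in (an assumption for illustration); in the real engine the sandbox and strategy would be backed by the Phase 1 `ModelProvider` and Phase 2 `ExecutionSandbox` interfaces, and model selection (step 4) is folded into the strategy here for brevity.

```go
// pkg/engine/workflow.go (sketch): only the seven-step control flow is the point.
package engine

import (
	"context"
	"fmt"
)

// Minimal stand-ins (assumptions) for the real Phase 1-3 types.
type Task struct{ ID, Repository, Role string }
type Result struct{ Artifacts []string }

type Sandbox interface {
	PrepareRepo(ctx context.Context, repo string) error
	Cleanup() error
}

type Strategy interface {
	Execute(ctx context.Context, t Task, sb Sandbox) (*Result, error)
}

type Engine struct {
	NewSandbox func(ctx context.Context, t Task) (Sandbox, error) // Phase 2 factory
	Strategies map[string]Strategy                                // role -> Phase 3 strategy
	Validate   func(*Result) error                                // quality gate
}

// ExecuteTask walks one task through the seven workflow steps.
func (e *Engine) ExecuteTask(ctx context.Context, t Task) (*Result, error) {
	// 1. Task analysis: pick the strategy matching the task's role.
	strat, ok := e.Strategies[t.Role]
	if !ok {
		return nil, fmt.Errorf("no strategy for role %q", t.Role)
	}

	// 2. Environment setup: initialize an isolated sandbox for this task.
	sb, err := e.NewSandbox(ctx, t)
	if err != nil {
		return nil, err
	}
	// 7. Cleanup: always tear the sandbox down, even on failure.
	defer sb.Cleanup()

	// 3. Repository preparation: clone and mount the repository.
	if err := sb.PrepareRepo(ctx, t.Repository); err != nil {
		return nil, err
	}

	// 4-5. Model selection happens inside the strategy, which then runs the
	// role-specific work against the sandbox.
	res, err := strat.Execute(ctx, t, sb)
	if err != nil {
		return nil, err
	}

	// 6. Result validation: verify output quality before reporting success.
	if err := e.Validate(res); err != nil {
		return nil, err
	}
	return res, nil
}
```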
### Phase 4: Repository Provider Implementation

#### 4.1 Real Repository Integration

Replace `MockTaskProvider` with actual implementations (a Gitea sketch follows this section):

- **GiteaProvider**: Integration with the Gitea API
- **GitHubProvider**: GitHub API integration
- **GitLabProvider**: GitLab API integration

#### 4.2 Task Lifecycle Management

- Task claiming and status updates
- Progress reporting back to repositories
- Artifact attachment (patches, documentation, etc.)
- Automated PR/MR creation for completed tasks
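As a feel for the Gitea integration, here is a minimal sketch of task discovery via issue listing. The endpoint shape and `Authorization: token` header follow Gitea's public REST API; the `GiteaProvider` and `Issue` types, and the idea of treating open issues as claimable tasks, are assumptions for illustration.

```go
// pkg/providers/gitea.go (sketch)
package providers

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// GiteaProvider talks to a single repository on a Gitea instance.
type GiteaProvider struct {
	BaseURL string // e.g. https://gitea.example.com
	Token   string // API token with repository scope
	Owner   string
	Repo    string
}

// Issue holds the subset of Gitea's issue fields the engine needs.
type Issue struct {
	Number int    `json:"number"`
	Title  string `json:"title"`
	Body   string `json:"body"`
}

// OpenTasks lists open issues, which the coordinator treats as claimable tasks.
func (g *GiteaProvider) OpenTasks(ctx context.Context) ([]Issue, error) {
	url := fmt.Sprintf("%s/api/v1/repos/%s/%s/issues?state=open&type=issues",
		g.BaseURL, g.Owner, g.Repo)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "token "+g.Token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("gitea: unexpected status %s", resp.Status)
	}

	var issues []Issue
	if err := json.NewDecoder(resp.Body).Decode(&issues); err != nil {
		return nil, err
	}
	return issues, nil
}
```

Claiming, status updates, and PR creation would follow the same pattern against the corresponding Gitea endpoints, behind the shared provider interface.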
### Phase 5: AI Integration and Tool Support

#### 5.1 LLM Integration

- Context-aware task analysis based on repository content
- Code generation and problem-solving capabilities
- Natural language processing for task descriptions
- Multi-step reasoning for complex tasks

#### 5.2 Tool Integration

- MCP server connectivity within the sandbox
- Development tool access (compilers, linters, formatters)
- Testing framework integration
- Documentation generation tools

#### 5.3 Quality Assurance

- Automated testing of generated code
- Code quality metrics and analysis
- Security vulnerability scanning
- Performance impact assessment

### Phase 6: Testing and Validation

#### 6.1 Unit Testing

- Provider abstraction layer testing
- Sandbox isolation verification
- Task execution strategy validation
- Error handling and recovery testing

#### 6.2 Integration Testing

- End-to-end task execution workflows
- Agent-to-WHOOSH communication testing
- Multi-provider failover scenarios
- Concurrent task execution testing

#### 6.3 Security Testing

- Sandbox escape prevention
- Resource limit enforcement
- Network isolation validation
- Secrets and credential protection

### Phase 7: Production Deployment

#### 7.1 Configuration Management

- Environment-specific model configurations
- Sandbox resource limit definitions
- Provider API key management
- Monitoring and logging setup

#### 7.2 Monitoring and Observability

- Task execution metrics and dashboards
- Performance monitoring and alerting
- Resource utilization tracking
- Error rate and success metrics

## Implementation Priorities

### Critical Path (Weeks 1-2)

1. Model Provider Abstraction Layer
2. Basic Docker Sandbox Implementation
3. Replace Mock Task Execution
4. Role-Based Execution Strategies

### High Priority (Weeks 3-4)

5. Real Repository Provider Implementation
6. AI Integration with Ollama/OpenAI
7. MCP Tool Integration
8. Basic Testing Framework

### Medium Priority (Weeks 5-6)

9. Advanced Sandbox Types (VM, Process)
10. Quality Assurance Pipeline
11. Comprehensive Testing Suite
12. Performance Optimization

### Future Enhancements

- Multi-language model support
- Advanced reasoning capabilities
- Distributed task execution
- Machine learning model fine-tuning

## Success Metrics

- **Task Completion Rate**: >90% of assigned tasks successfully completed
- **Code Quality**: Generated code passes all existing tests and linting
- **Security**: Zero sandbox escapes or security violations
- **Performance**: Task execution time within acceptable bounds
- **Reliability**: <5% execution failure rate due to engine issues

## Risk Mitigation

### Security Risks

- Sandbox escape → Multiple isolation layers, security audits
- Credential exposure → Secure credential management, rotation
- Resource exhaustion → Resource limits, monitoring, auto-scaling

### Technical Risks

- Model provider outages → Multi-provider failover, local fallbacks
- Execution failures → Robust error handling, retry mechanisms
- Performance bottlenecks → Profiling, optimization, horizontal scaling

### Integration Risks

- WHOOSH compatibility → Extensive integration testing, versioning
- Repository provider changes → Provider abstraction, API versioning
- Model compatibility → Provider abstraction, capability detection

This plan addresses the core limitation that CHORUS agents currently lack real task execution capabilities, while building a robust, secure, and scalable execution engine suitable for production deployment.

## Implementation Roadmap

### Development Standards & Workflow

**Semantic Versioning Strategy:**
- **Patch (0.N.X)**: Bug fixes, small improvements, documentation updates
- **Minor (0.N.0)**: New features, phase completions, non-breaking changes
- **Major (N.0.0)**: Breaking changes, major architectural shifts

**Git Workflow:**
1. **Branch Creation**: `git checkout -b feature/phase-N-description`
2. **Development**: Implement with frequent commits using conventional commit format
3. **Testing**: Run the full test suite with `make test` before opening a PR
4. **Code Review**: Create a PR with a detailed description and test results
5. **Integration**: Squash merge to main after approval
6. **Release**: Tag with `git tag v0.N.0` and update the Makefile version

**Quality Gates:**

Each phase must meet these criteria before merge:
- ✅ Unit tests with >80% coverage
- ✅ Integration tests for external dependencies
- ✅ Security review for new attack surfaces
- ✅ Performance benchmarks within acceptable bounds
- ✅ Documentation updates (code comments + README)
- ✅ Backward compatibility verification

### Phase-by-Phase Implementation

#### Phase 1: Model Provider Abstraction (v0.2.0)

**Branch:** `feature/phase-1-model-providers`
**Duration:** 3-5 days

**Deliverables:**
```
pkg/ai/
├── provider.go        # Core provider interface & request/response types
├── ollama.go          # Local Ollama model integration
├── openai.go          # OpenAI API client wrapper
├── resetdata.go       # ResetData LaaS integration
├── factory.go         # Provider factory with auto-selection
└── provider_test.go   # Comprehensive provider tests

configs/
└── models.yaml        # Role-model mapping configuration
```

**Key Features:**
- Abstract AI providers behind a unified interface
- Support multiple providers with automatic failover
- Configuration-driven model selection per agent role
- Proper error handling and retry logic

#### Phase 2: Execution Environment Abstraction (v0.3.0)

**Branch:** `feature/phase-2-execution-sandbox`
**Duration:** 5-7 days

**Deliverables:**
```
pkg/execution/
├── sandbox.go        # Core sandbox interface & types
├── docker.go         # Docker container implementation
├── security.go       # Security policies & enforcement
├── resources.go      # Resource monitoring & limits
└── sandbox_test.go   # Sandbox security & isolation tests
```

**Key Features:**
- Docker-based task isolation with transparent repository access
- Resource limits (CPU, memory, network, disk) with monitoring
- Security boundary enforcement and escape prevention
- Clean teardown and artifact collection

#### Phase 3: Core Task Execution Engine (v0.4.0)

**Branch:** `feature/phase-3-task-execution`
**Duration:** 7-10 days

**Modified Files:**
- `coordinator/task_coordinator.go:314` - Replace mock with real execution
- `pkg/repository/types.go` - Extend interfaces for execution context

**New Files:**
```
pkg/strategies/
├── developer.go   # Code implementation & bug fixes
├── reviewer.go    # Code review & quality analysis
├── architect.go   # System design & tech decisions
└── tester.go      # Test creation & validation

pkg/engine/
├── executor.go    # Main execution orchestrator
├── workflow.go    # 7-step execution workflow
└── validation.go  # Result quality verification
```

**Key Features:**
- Real task execution replacing the 10-second sleep simulation
- Role-specific execution strategies with appropriate tooling
- Integration between AI providers, sandboxes, and the task lifecycle
- Comprehensive result validation and quality metrics
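To pin down what a file like `pkg/strategies/developer.go` might contain, here is a minimal sketch of the developer strategy: generate a patch with the model, apply it inside the sandbox, and gate success on the repository's own tests. Every type below is a local stand-in (an assumption) for the real Phase 1-3 interfaces; paths like `/workspace` are illustrative.

```go
// pkg/strategies/developer.go (sketch)
package strategies

import (
	"context"
	"fmt"
)

// Narrow, assumed views of the sandbox and model abstractions.
type Runner interface {
	Run(ctx context.Context, command string) (output string, err error)
	WriteFile(ctx context.Context, path, content string) error
}
type Model interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

type Task struct{ Title, Body string }

type Result struct {
	Patch       string
	TestsPassed bool
}

// DeveloperStrategy generates a patch with the model, applies it inside the
// sandbox, and gates success on the repository's test suite.
type DeveloperStrategy struct {
	Sandbox Runner
	Model   Model
}

func (s *DeveloperStrategy) Execute(ctx context.Context, t Task) (*Result, error) {
	prompt := fmt.Sprintf("Produce a unified diff implementing this task:\n%s\n\n%s", t.Title, t.Body)
	patch, err := s.Model.Complete(ctx, prompt)
	if err != nil {
		return nil, err
	}
	// Write and apply the model's patch inside the sandbox.
	if err := s.Sandbox.WriteFile(ctx, "/workspace/task.patch", patch); err != nil {
		return nil, err
	}
	if _, err := s.Sandbox.Run(ctx, "git -C /workspace apply task.patch"); err != nil {
		return nil, fmt.Errorf("patch failed to apply: %w", err)
	}
	// Quality gate: the change must keep the existing test suite green.
	if _, err := s.Sandbox.Run(ctx, "make test"); err != nil {
		return &Result{Patch: patch, TestsPassed: false}, err
	}
	return &Result{Patch: patch, TestsPassed: true}, nil
}
```

ReviewerStrategy, ArchitectStrategy, and TesterStrategy would follow the same shape with different prompts, tooling, and success criteria.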
#### Phase 4: Repository Provider Implementation (v0.5.0)

**Branch:** `feature/phase-4-real-providers`
**Duration:** 10-14 days

**Deliverables:**
```
pkg/providers/
├── gitea.go           # Gitea API integration (primary)
├── github.go          # GitHub API integration
├── gitlab.go          # GitLab API integration
└── provider_test.go   # API integration tests
```

**Key Features:**
- Replace MockTaskProvider with production implementations
- Task claiming, status updates, and progress reporting via APIs
- Automated PR/MR creation with proper branch management
- Repository-specific configuration and credential management

### Testing Strategy

**Unit Testing:**
- Each provider/sandbox implementation has a dedicated test suite
- Mock external dependencies (APIs, Docker, etc.) for isolated testing
- Property-based testing for core interfaces
- Error condition and edge case coverage

**Integration Testing:**
- End-to-end task execution workflows
- Multi-provider failover scenarios
- Agent-to-WHOOSH communication validation
- Concurrent task execution under load

**Security Testing** (an example isolation test is sketched at the end of this plan):
- Sandbox escape prevention validation
- Resource exhaustion protection
- Network isolation verification
- Secrets and credential protection audits

### Deployment & Monitoring

**Configuration Management:**
- Environment-specific model configurations
- Sandbox resource limits per environment
- Provider API credentials via secure secret management
- Feature flags for gradual rollout

**Observability:**
- Task execution metrics (completion rate, duration, success/failure)
- Resource utilization tracking (CPU, memory, network per task)
- Error rate monitoring with alerting thresholds
- Performance dashboards for capacity planning

### Risk Mitigation

**Technical Risks:**
- **Provider Outages**: Multi-provider failover with health checks
- **Resource Exhaustion**: Strict limits with monitoring and auto-scaling
- **Execution Failures**: Retry mechanisms with exponential backoff

**Security Risks:**
- **Sandbox Escapes**: Multiple isolation layers and regular security audits
- **Credential Exposure**: Secure rotation and least-privilege access
- **Data Exfiltration**: Network isolation and egress monitoring

**Integration Risks:**
- **API Changes**: Provider abstraction with versioning support
- **Performance Degradation**: Comprehensive benchmarking at each phase
- **Compatibility Issues**: Extensive integration testing with existing systems
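To ground the security testing and sandbox-escape mitigations above, a Phase 6 isolation test could look roughly like the sketch below. `NewDockerSandbox`, the `AllowNetwork` field, and the `Command`/`ExitCode` shapes are assumptions layered on the Phase 2 `ExecutionSandbox` interface.

```go
// pkg/execution/sandbox_isolation_test.go (sketch)
package execution

import (
	"context"
	"testing"
)

// TestNetworkEgressBlocked asserts that a sandbox created with networking
// disabled cannot reach the outside world, one of the Phase 6 security checks.
func TestNetworkEgressBlocked(t *testing.T) {
	ctx := context.Background()
	sb := NewDockerSandbox() // hypothetical constructor from docker.go
	if err := sb.Initialize(ctx, &SandboxConfig{AllowNetwork: false}); err != nil {
		t.Fatalf("initialize: %v", err)
	}
	defer sb.Cleanup()

	// The fetch must fail; a zero exit code means egress was not blocked.
	res, err := sb.ExecuteCommand(ctx, &Command{Line: "wget -T 5 -q https://example.com"})
	if err == nil && res.ExitCode == 0 {
		t.Fatal("sandbox reached the network despite AllowNetwork=false")
	}
}
```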