diff --git a/docs/development/task-execution-engine-plan.md b/docs/development/task-execution-engine-plan.md
new file mode 100644
index 0000000..fe0bf74
--- /dev/null
+++ b/docs/development/task-execution-engine-plan.md
@@ -0,0 +1,435 @@
+# CHORUS Task Execution Engine Development Plan
+
+## Overview
+This plan outlines the development of a comprehensive task execution engine for CHORUS agents, replacing the current mock implementation with a fully functional system that can execute real work according to agent roles and specializations.
+
+## Current State Analysis
+
+### What's Implemented ✅
+- **Task Coordinator Framework** (`coordinator/task_coordinator.go`): Full task management lifecycle with role-based assignment, collaboration requests, and HMMM integration
+- **Agent Role System**: Role announcements, capability broadcasting, and expertise matching
+- **P2P Infrastructure**: Nodes can discover each other and communicate via pubsub
+- **Health Monitoring**: Comprehensive health checks and graceful shutdown
+
+### Critical Gaps Identified ❌
+- **Task Execution Engine**: `executeTask()` only simulates work with a 10-second sleep; no actual work is performed
+- **Repository Integration**: Only mock providers exist; no tasks are pulled from real GitHub/GitLab repositories
+- **Agent-to-Task Binding**: Task discovery relies on WHOOSH, but agents don't connect to real work
+- **Role-Based Execution**: Agents announce roles but don't execute tasks according to their specialization
+- **AI Integration**: No LLM/reasoning integration for task completion
+
+## Architecture Requirements
+
+### Model and Provider Abstraction
+The execution engine must support multiple AI model providers and execution environments:
+
+**Model Provider Types:**
+- **Local Ollama**: Default for most roles (llama3.1:8b, codellama, etc.)
+- **OpenAI API**: For specialized models (chatgpt-5, gpt-4o, etc.)
+- **ResetData API**: For testing and fallback (llama3.1:8b via LaaS)
+- **Custom Endpoints**: Support for other provider APIs
+
+**Role-Model Mapping:**
+- Each role has a default model configuration
+- Specialized roles may require specific models/providers
+- Model selection is transparent to execution logic
+- Support for MCP calls and tool usage regardless of provider
+
+### Execution Environment Abstraction
+Tasks must execute in secure, isolated environments while maintaining transparency:
+
+**Sandbox Types:**
+- **Docker Containers**: Isolated execution environment per task
+- **Specialized VMs**: For tasks requiring full OS isolation
+- **Process Sandboxing**: Lightweight isolation for simple tasks
+
+**Transparency Requirements:**
+- The model perceives that it is working on a local repository
+- Development tools are available within the sandbox
+- File system operations work normally from the model's perspective
+- Network access is controlled but transparent
+- Resource limits are enforced but invisible
+
+## Development Plan
+
+### Phase 1: Model Provider Abstraction Layer
+
+#### 1.1 Create Provider Interface
+```go
+// pkg/ai/provider.go
+type ModelProvider interface {
+    ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
+    SupportsMCP() bool
+    SupportsTools() bool
+    GetCapabilities() []string
+}
+```
+
+#### 1.2 Implement Provider Types
+- **OllamaProvider**: Local model execution
+- **OpenAIProvider**: OpenAI API integration
+- **ResetDataProvider**: ResetData LaaS integration
+- **ProviderFactory**: Creates the appropriate provider based on the model config
+
+#### 1.3 Role-Model Configuration
+```yaml
+# Config structure for role-model mapping
+roles:
+  developer:
+    default_model: "codellama:13b"
+    provider: "ollama"
+    fallback_model: "llama3.1:8b"
+    fallback_provider: "resetdata"
+
+  architect:
+    default_model: "gpt-4o"
+    provider: "openai"
+    fallback_model: "llama3.1:8b"
+    fallback_provider: "ollama"
+```
+
+### Phase 2: Execution Environment Abstraction
+
+#### 2.1 Create Sandbox Interface
+```go
+// pkg/execution/sandbox.go
+type ExecutionSandbox interface {
+    Initialize(ctx context.Context, config *SandboxConfig) error
+    ExecuteCommand(ctx context.Context, cmd *Command) (*CommandResult, error)
+    CopyFiles(ctx context.Context, source, dest string) error
+    Cleanup() error
+}
+```
+
+#### 2.2 Implement Sandbox Types
+- **DockerSandbox**: Container-based isolation
+- **VMSandbox**: Full VM isolation for sensitive tasks
+- **ProcessSandbox**: Lightweight process-based isolation
+
+#### 2.3 Repository Mounting
+- Clone the repository into the sandbox environment
+- Mount it as a local filesystem from the model's perspective
+- Implement secure file I/O operations
+- Handle git operations within the sandbox
+
+### Phase 3: Core Task Execution Engine
+
+#### 3.1 Replace Mock Implementation
+Replace the current simulation in `coordinator/task_coordinator.go:314`:
+
+```go
+// Current mock implementation
+time.Sleep(10 * time.Second) // Simulate work
+
+// New implementation
+result, err := tc.executionEngine.ExecuteTask(ctx, &TaskExecutionRequest{
+    Task:          activeTask.Task,
+    Agent:         tc.agentInfo,
+    Sandbox:       sandboxConfig,
+    ModelProvider: providerConfig,
+})
+```
+
+#### 3.2 Task Execution Strategies
+Create role-specific execution patterns:
+
+- **DeveloperStrategy**: Code implementation, bug fixes, feature development
+- **ReviewerStrategy**: Code review, quality analysis, test coverage assessment
+- **ArchitectStrategy**: System design, technical decision making
+- **TesterStrategy**: Test creation, validation, quality assurance
+
+#### 3.3 Execution Workflow
+1. **Task Analysis**: Parse task requirements and complexity
+2. **Environment Setup**: Initialize the appropriate sandbox
+3. **Repository Preparation**: Clone and mount the repository
+4. **Model Selection**: Choose the appropriate model/provider
+5. **Task Execution**: Run the role-specific execution strategy
+6. **Result Validation**: Verify output quality and completeness
+7. **Cleanup**: Tear down the sandbox and collect artifacts
+
+### Phase 4: Repository Provider Implementation
+
+#### 4.1 Real Repository Integration
+Replace `MockTaskProvider` with actual implementations:
+- **GiteaProvider**: Gitea API integration
+- **GitHubProvider**: GitHub API integration
+- **GitLabProvider**: GitLab API integration
+
+#### 4.2 Task Lifecycle Management
+- Task claiming and status updates
+- Progress reporting back to repositories
+- Artifact attachment (patches, documentation, etc.)
+- Automated PR/MR creation for completed tasks
+
+### Phase 5: AI Integration and Tool Support
+
+#### 5.1 LLM Integration
+- Context-aware task analysis based on repository content
+- Code generation and problem-solving capabilities
+- Natural language processing for task descriptions
+- Multi-step reasoning for complex tasks
+
+#### 5.2 Tool Integration
+- MCP server connectivity within the sandbox
+- Development tool access (compilers, linters, formatters)
+- Testing framework integration
+- Documentation generation tools
+
+#### 5.3 Quality Assurance
+- Automated testing of generated code
+- Code quality metrics and analysis
+- Security vulnerability scanning
+- Performance impact assessment
+
+### Phase 6: Testing and Validation
+
+#### 6.1 Unit Testing
+- Provider abstraction layer testing
+- Sandbox isolation verification
+- Task execution strategy validation
+- Error handling and recovery testing
+
+#### 6.2 Integration Testing
+- End-to-end task execution workflows
+- Agent-to-WHOOSH communication testing
+- Multi-provider failover scenarios
+- Concurrent task execution testing
+
+#### 6.3 Security Testing
+- Sandbox escape prevention
+- Resource limit enforcement
+- Network isolation validation
+- Secrets and credential protection
+
+### Phase 7: Production Deployment
+
+#### 7.1 Configuration Management
+- Environment-specific model configurations
+- Sandbox resource limit definitions
+- Provider API key management
+- Monitoring and logging setup
+
+#### 7.2 Monitoring and Observability
+- Task execution metrics and dashboards
+- Performance monitoring and alerting
+- Resource utilization tracking
+- Error rate and success metrics
+
+## Implementation Priorities
+
+### Critical Path (Weeks 1-2)
+1. Model Provider Abstraction Layer
+2. Basic Docker Sandbox Implementation
+3. Replace Mock Task Execution
+4. Role-Based Execution Strategies
+
+### High Priority (Weeks 3-4)
+5. Real Repository Provider Implementation
+6. AI Integration with Ollama/OpenAI
+7. MCP Tool Integration
+8. Basic Testing Framework
+
+### Medium Priority (Weeks 5-6)
+9. Advanced Sandbox Types (VM, Process)
+10. Quality Assurance Pipeline
+11. Comprehensive Testing Suite
+12. Performance Optimization
+
+### Future Enhancements
+- Multi-language model support
+- Advanced reasoning capabilities
+- Distributed task execution
+- Machine learning model fine-tuning
+
+## Success Metrics
+
+- **Task Completion Rate**: >90% of assigned tasks successfully completed
+- **Code Quality**: Generated code passes all existing tests and linting
+- **Security**: Zero sandbox escapes or security violations
+- **Performance**: Task execution time within acceptable bounds
+- **Reliability**: <5% execution failure rate due to engine issues
+
+## Risk Mitigation
+
+### Security Risks
+- Sandbox escape → Multiple isolation layers, security audits
+- Credential exposure → Secure credential management, rotation
+- Resource exhaustion → Resource limits, monitoring, auto-scaling
+
+### Technical Risks
+- Model provider outages → Multi-provider failover, local fallbacks
+- Execution failures → Robust error handling, retry mechanisms
+- Performance bottlenecks → Profiling, optimization, horizontal scaling
+
+### Integration Risks
+- WHOOSH compatibility → Extensive integration testing, versioning
+- Repository provider changes → Provider abstraction, API versioning
+- Model compatibility → Provider abstraction, capability detection
+
+This plan addresses the core limitation that CHORUS agents cannot yet execute real tasks, and builds toward a robust, secure, and scalable execution engine suitable for production deployment.
+
+## Implementation Roadmap
+
+### Development Standards & Workflow
+
+**Semantic Versioning Strategy:**
+- **Patch (0.N.X)**: Bug fixes, small improvements, documentation updates
+- **Minor (0.N.0)**: New features, phase completions, non-breaking changes
+- **Major (N.0.0)**: Breaking changes, major architectural shifts
+
+**Git Workflow:**
+1. **Branch Creation**: `git checkout -b feature/phase-N-description`
+2. **Development**: Implement with frequent commits using the conventional commit format
+3. **Testing**: Run the full test suite with `make test` before opening a PR
+4. **Code Review**: Create a PR with a detailed description and test results
+5. **Integration**: Squash-merge into main after approval
+6. **Release**: Tag with `git tag v0.N.0` and update the Makefile version
+
+**Quality Gates:**
+Each phase must meet these criteria before merge:
+- ✅ Unit tests with >80% coverage
+- ✅ Integration tests for external dependencies
+- ✅ Security review for new attack surfaces
+- ✅ Performance benchmarks within acceptable bounds
+- ✅ Documentation updates (code comments + README)
+- ✅ Backward compatibility verification
+
+### Phase-by-Phase Implementation
+
+#### Phase 1: Model Provider Abstraction (v0.2.0)
+**Branch:** `feature/phase-1-model-providers`
+**Duration:** 3-5 days
+**Deliverables:**
+```
+pkg/ai/
+├── provider.go          # Core provider interface & request/response types
+├── ollama.go            # Local Ollama model integration
+├── openai.go            # OpenAI API client wrapper
+├── resetdata.go         # ResetData LaaS integration
+├── factory.go           # Provider factory with auto-selection
+└── provider_test.go     # Comprehensive provider tests
+
+configs/
+└── models.yaml          # Role-model mapping configuration
+```
+
+**Key Features:**
+- Abstract AI providers behind a unified interface
+- Support multiple providers with automatic failover
+- Configuration-driven model selection per agent role
+- Proper error handling and retry logic
+
+#### Phase 2: Execution Environment Abstraction (v0.3.0)
+**Branch:** `feature/phase-2-execution-sandbox`
+**Duration:** 5-7 days
+**Deliverables:**
+```
+pkg/execution/
+├── sandbox.go           # Core sandbox interface & types
+├── docker.go            # Docker container implementation
+├── security.go          # Security policies & enforcement
+├── resources.go         # Resource monitoring & limits
+└── sandbox_test.go      # Sandbox security & isolation tests
+```
+
+**Key Features:**
+- Docker-based task isolation with transparent repository access
+- Resource limits (CPU, memory, network, disk) with monitoring
+- Security boundary enforcement and escape prevention
+- Clean teardown and artifact collection
+
+#### Phase 3: Core Task Execution Engine (v0.4.0)
+**Branch:** `feature/phase-3-task-execution`
+**Duration:** 7-10 days
+**Modified Files:**
+- `coordinator/task_coordinator.go:314` - Replace mock with real execution
+- `pkg/repository/types.go` - Extend interfaces for execution context
+
+**New Files:**
+```
+pkg/strategies/
+├── developer.go         # Code implementation & bug fixes
+├── reviewer.go          # Code review & quality analysis
+├── architect.go         # System design & tech decisions
+└── tester.go            # Test creation & validation
+
+pkg/engine/
+├── executor.go          # Main execution orchestrator
+├── workflow.go          # 7-step execution workflow
+└── validation.go        # Result quality verification
+```
+
+**Key Features:**
+- Real task execution replacing the 10-second sleep simulation
+- Role-specific execution strategies with appropriate tooling
+- Integration between AI providers, sandboxes, and task lifecycle
+- Comprehensive result validation and quality metrics
+
+#### Phase 4: Repository Provider Implementation (v0.5.0)
+**Branch:** `feature/phase-4-real-providers`
+**Duration:** 10-14 days
+**Deliverables:**
+```
+pkg/providers/
+├── gitea.go             # Gitea API integration (primary)
+├── github.go            # GitHub API integration
+├── gitlab.go            # GitLab API integration
+└── provider_test.go     # API integration tests
+```
+
+**Key Features:**
+- Replace MockTaskProvider with production implementations
+- Task claiming, status updates, and progress reporting via APIs
+- Automated PR/MR creation with proper branch management
+- Repository-specific configuration and credential management
+
+### Testing Strategy
+
+**Unit Testing:**
+- Each provider/sandbox implementation has a dedicated test suite
+- Mock external dependencies (APIs, Docker, etc.) for isolated testing
+- Property-based testing for core interfaces
+- Error condition and edge case coverage
+
+**Integration Testing:**
+- End-to-end task execution workflows
+- Multi-provider failover scenarios
+- Agent-to-WHOOSH communication validation
+- Concurrent task execution under load
+
+**Security Testing:**
+- Sandbox escape prevention validation
+- Resource exhaustion protection
+- Network isolation verification
+- Secrets and credential protection audits
+
+### Deployment & Monitoring
+
+**Configuration Management:**
+- Environment-specific model configurations
+- Sandbox resource limits per environment
+- Provider API credentials via secure secret management
+- Feature flags for gradual rollout
+
+**Observability:**
+- Task execution metrics (completion rate, duration, success/failure)
+- Resource utilization tracking (CPU, memory, network per task)
+- Error rate monitoring with alerting thresholds
+- Performance dashboards for capacity planning
+
+### Risk Mitigation
+
+**Technical Risks:**
+- **Provider Outages**: Multi-provider failover with health checks
+- **Resource Exhaustion**: Strict limits with monitoring and auto-scaling
+- **Execution Failures**: Retry mechanisms with exponential backoff
+
+**Security Risks:**
+- **Sandbox Escapes**: Multiple isolation layers and regular security audits
+- **Credential Exposure**: Secure rotation and least-privilege access
+- **Data Exfiltration**: Network isolation and egress monitoring
+
+**Integration Risks:**
+- **API Changes**: Provider abstraction with versioning support
+- **Performance Degradation**: Comprehensive benchmarking at each phase
+- **Compatibility Issues**: Extensive integration testing with existing systems
\ No newline at end of file
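For the compatibility risks (and the earlier "Model compatibility → Provider abstraction, capability detection" item), the capability methods already on the Phase 1 `ModelProvider` interface are enough to pick a provider per task. A minimal sketch, assuming a hypothetical `Requirements` type, hypothetical capability strings such as `"code"`, and a hypothetical `selectProvider` helper; `capabilityReporter` is just the capability subset of `ModelProvider`, so any full `ModelProvider` implementation satisfies it without changes.

```go
package main

import (
	"fmt"
	"slices"
)

// capabilityReporter is the capability subset of the Phase 1 ModelProvider
// interface; any ModelProvider implementation satisfies it automatically.
type capabilityReporter interface {
	SupportsMCP() bool
	SupportsTools() bool
	GetCapabilities() []string
}

// Requirements is a hypothetical description of what a task needs.
type Requirements struct {
	MCP          bool
	Tools        bool
	Capabilities []string
}

// satisfies reports whether p meets every requirement in req.
func satisfies(p capabilityReporter, req Requirements) bool {
	if req.MCP && !p.SupportsMCP() {
		return false
	}
	if req.Tools && !p.SupportsTools() {
		return false
	}
	have := p.GetCapabilities()
	for _, c := range req.Capabilities {
		if !slices.Contains(have, c) {
			return false
		}
	}
	return true
}

// selectProvider returns the index of the first provider, in preference
// order, that satisfies req, or -1 if none does.
func selectProvider(providers []capabilityReporter, req Requirements) int {
	for i, p := range providers {
		if satisfies(p, req) {
			return i
		}
	}
	return -1
}

// staticProvider is a hypothetical provider with fixed capabilities, used
// only to exercise the selection logic.
type staticProvider struct {
	mcp, tools bool
	caps       []string
}

func (s staticProvider) SupportsMCP() bool         { return s.mcp }
func (s staticProvider) SupportsTools() bool       { return s.tools }
func (s staticProvider) GetCapabilities() []string { return s.caps }

func main() {
	providers := []capabilityReporter{
		staticProvider{caps: []string{"chat"}},
		staticProvider{mcp: true, tools: true, caps: []string{"chat", "code"}},
	}
	// A task needing tool calls and code generation skips the first provider.
	fmt.Println(selectProvider(providers, Requirements{Tools: true, Capabilities: []string{"code"}})) // 1
}
```

Matching on reported capabilities rather than on provider names or API versions means a provider whose API changes can keep serving tasks as long as it still reports what they need.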