docs: Add comprehensive implementation roadmap to task execution engine plan

- Add detailed phase-by-phase implementation strategy
- Define semantic versioning and Git workflow standards
- Specify quality gates and testing requirements
- Include risk mitigation and deployment strategies
- Provide clear deliverables and timelines for each phase
Author: anthonyrawlins
Date: 2025-09-25 10:40:30 +10:00
Parent: 14b5125c12
Commit: 9fc9a2e3a2

# CHORUS Task Execution Engine Development Plan
## Overview
This plan outlines the development of a comprehensive task execution engine for CHORUS agents, replacing the current mock implementation with a fully functional system that can execute real work according to agent roles and specializations.
## Current State Analysis
### What's Implemented ✅
- **Task Coordinator Framework** (`coordinator/task_coordinator.go`): Full task management lifecycle with role-based assignment, collaboration requests, and HMMM integration
- **Agent Role System**: Role announcements, capability broadcasting, and expertise matching
- **P2P Infrastructure**: Nodes can discover each other and communicate via pubsub
- **Health Monitoring**: Comprehensive health checks and graceful shutdown
### Critical Gaps Identified ❌
- **Task Execution Engine**: `executeTask()` only has a 10-second sleep simulation - no actual work performed
- **Repository Integration**: Mock providers only - no real GitHub/GitLab task pulling
- **Agent-to-Task Binding**: Task discovery relies on WHOOSH but agents don't connect to real work
- **Role-Based Execution**: Agents announce roles but don't execute tasks according to their specialization
- **AI Integration**: No LLM/reasoning integration for task completion
## Architecture Requirements
### Model and Provider Abstraction
The execution engine must support multiple AI model providers and execution environments:
**Model Provider Types:**
- **Local Ollama**: Default for most roles (llama3.1:8b, codellama, etc.)
- **OpenAI API**: For specialized models (chatgpt-5, gpt-4o, etc.)
- **ResetData API**: For testing and fallback (llama3.1:8b via LaaS)
- **Custom Endpoints**: Support for other provider APIs
**Role-Model Mapping:**
- Each role has a default model configuration
- Specialized roles may require specific models/providers
- Model selection transparent to execution logic
- Support for MCP calls and tool usage regardless of provider
### Execution Environment Abstraction
Tasks must execute in secure, isolated environments while maintaining transparency:
**Sandbox Types:**
- **Docker Containers**: Isolated execution environment per task
- **Specialized VMs**: For tasks requiring full OS isolation
- **Process Sandboxing**: Lightweight isolation for simple tasks
**Transparency Requirements:**
- Model perceives it's working on a local repository
- Development tools available within sandbox
- File system operations work normally from model's perspective
- Network access controlled but transparent
- Resource limits enforced but invisible
## Development Plan
### Phase 1: Model Provider Abstraction Layer
#### 1.1 Create Provider Interface
```go
// pkg/ai/provider.go
type ModelProvider interface {
	ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
	SupportsMCP() bool
	SupportsTools() bool
	GetCapabilities() []string
}
```
#### 1.2 Implement Provider Types
- **OllamaProvider**: Local model execution
- **OpenAIProvider**: OpenAI API integration
- **ResetDataProvider**: ResetData LaaS integration
- **ProviderFactory**: Creates appropriate provider based on model config
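A minimal sketch of the factory in 1.2, with one stub standing in for the three concrete providers; the type and field names here are illustrative, not the final `pkg/ai` API:

```go
package main

import (
	"context"
	"fmt"
)

// TaskRequest and TaskResponse are placeholder shapes for the types
// referenced by the ModelProvider interface in 1.1.
type TaskRequest struct{ Prompt string }
type TaskResponse struct{ Output string }

// ModelProvider mirrors the interface from 1.1.
type ModelProvider interface {
	ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
	SupportsMCP() bool
	SupportsTools() bool
	GetCapabilities() []string
}

// ModelConfig is a hypothetical per-role configuration entry.
type ModelConfig struct {
	Provider string // "ollama", "openai", or "resetdata"
	Model    string
}

// stubProvider stands in for the real Ollama/OpenAI/ResetData clients,
// which would each wrap their respective HTTP APIs.
type stubProvider struct{ name, model string }

func (s *stubProvider) ExecuteTask(ctx context.Context, req *TaskRequest) (*TaskResponse, error) {
	return &TaskResponse{Output: fmt.Sprintf("[%s/%s] %s", s.name, s.model, req.Prompt)}, nil
}
func (s *stubProvider) SupportsMCP() bool         { return true }
func (s *stubProvider) SupportsTools() bool       { return true }
func (s *stubProvider) GetCapabilities() []string { return []string{"chat"} }

// NewProvider picks a concrete implementation from configuration, so
// execution logic never branches on provider names itself.
func NewProvider(cfg ModelConfig) (ModelProvider, error) {
	switch cfg.Provider {
	case "ollama", "openai", "resetdata":
		return &stubProvider{name: cfg.Provider, model: cfg.Model}, nil
	default:
		return nil, fmt.Errorf("unknown provider %q", cfg.Provider)
	}
}

func main() {
	p, err := NewProvider(ModelConfig{Provider: "ollama", Model: "codellama:13b"})
	if err != nil {
		panic(err)
	}
	resp, _ := p.ExecuteTask(context.Background(), &TaskRequest{Prompt: "hello"})
	fmt.Println(resp.Output)
}
```

Keeping the switch inside a single factory function means adding a fourth provider touches one file, not every call site.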
#### 1.3 Role-Model Configuration
```yaml
# Config structure for role-model mapping
roles:
  developer:
    default_model: "codellama:13b"
    provider: "ollama"
    fallback_model: "llama3.1:8b"
    fallback_provider: "resetdata"
  architect:
    default_model: "gpt-4o"
    provider: "openai"
    fallback_model: "llama3.1:8b"
    fallback_provider: "ollama"
```
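The fallback behaviour implied by this mapping can be sketched as plain selection logic. The map below hard-codes the same two roles in memory (in practice it would be loaded from `configs/models.yaml`), and `healthy` is a hypothetical provider health check:

```go
package main

import "fmt"

// RoleModels mirrors one role entry from the YAML configuration.
type RoleModels struct {
	DefaultModel     string
	Provider         string
	FallbackModel    string
	FallbackProvider string
}

var roles = map[string]RoleModels{
	"developer": {DefaultModel: "codellama:13b", Provider: "ollama", FallbackModel: "llama3.1:8b", FallbackProvider: "resetdata"},
	"architect": {DefaultModel: "gpt-4o", Provider: "openai", FallbackModel: "llama3.1:8b", FallbackProvider: "ollama"},
}

// SelectModel returns the provider/model pair for a role, dropping to
// the fallback pair when the primary provider is reported unhealthy.
func SelectModel(role string, healthy func(provider string) bool) (provider, model string, err error) {
	cfg, ok := roles[role]
	if !ok {
		return "", "", fmt.Errorf("no model mapping for role %q", role)
	}
	if healthy(cfg.Provider) {
		return cfg.Provider, cfg.DefaultModel, nil
	}
	return cfg.FallbackProvider, cfg.FallbackModel, nil
}

func main() {
	// Simulate an OpenAI outage: the architect role falls back to Ollama.
	p, m, _ := SelectModel("architect", func(prov string) bool { return prov != "openai" })
	fmt.Println(p, m)
}
```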
### Phase 2: Execution Environment Abstraction
#### 2.1 Create Sandbox Interface
```go
// pkg/execution/sandbox.go
type ExecutionSandbox interface {
	Initialize(ctx context.Context, config *SandboxConfig) error
	ExecuteCommand(ctx context.Context, cmd *Command) (*CommandResult, error)
	CopyFiles(ctx context.Context, source, dest string) error
	Cleanup() error
}
```
#### 2.2 Implement Sandbox Types
- **DockerSandbox**: Container-based isolation
- **VMSandbox**: Full VM isolation for sensitive tasks
- **ProcessSandbox**: Lightweight process-based isolation
#### 2.3 Repository Mounting
- Clone repository into sandbox environment
- Mount as local filesystem from model's perspective
- Implement secure file I/O operations
- Handle git operations within sandbox
### Phase 3: Core Task Execution Engine
#### 3.1 Replace Mock Implementation
Replace the current simulation in `coordinator/task_coordinator.go:314`:
```go
// Current mock implementation
time.Sleep(10 * time.Second) // Simulate work

// New implementation
result, err := tc.executionEngine.ExecuteTask(ctx, &TaskExecutionRequest{
	Task:          activeTask.Task,
	Agent:         tc.agentInfo,
	Sandbox:       sandboxConfig,
	ModelProvider: providerConfig,
})
```
#### 3.2 Task Execution Strategies
Create role-specific execution patterns:
- **DeveloperStrategy**: Code implementation, bug fixes, feature development
- **ReviewerStrategy**: Code review, quality analysis, test coverage assessment
- **ArchitectStrategy**: System design, technical decision making
- **TesterStrategy**: Test creation, validation, quality assurance
#### 3.3 Execution Workflow
1. **Task Analysis**: Parse task requirements and complexity
2. **Environment Setup**: Initialize appropriate sandbox
3. **Repository Preparation**: Clone and mount repository
4. **Model Selection**: Choose appropriate model/provider
5. **Task Execution**: Run role-specific execution strategy
6. **Result Validation**: Verify output quality and completeness
7. **Cleanup**: Teardown sandbox and collect artifacts
### Phase 4: Repository Provider Implementation
#### 4.1 Real Repository Integration
Replace `MockTaskProvider` with actual implementations:
- **GiteaProvider**: Integration with the Gitea API
- **GitHubProvider**: GitHub API integration
- **GitLabProvider**: GitLab API integration
#### 4.2 Task Lifecycle Management
- Task claiming and status updates
- Progress reporting back to repositories
- Artifact attachment (patches, documentation, etc.)
- Automated PR/MR creation for completed tasks
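For the Gitea case, task discovery could start from the issue-listing endpoint under `/api/v1`. The helper below builds that URL; filtering on a `chorus-task` label is an assumed convention, not an existing one:

```go
package main

import (
	"fmt"
	"net/url"
)

// issuesURL builds the Gitea endpoint for listing open issues carrying
// a given label on one repository.
func issuesURL(base, owner, repo, label string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = fmt.Sprintf("/api/v1/repos/%s/%s/issues", owner, repo)
	q := u.Query()
	q.Set("state", "open")
	q.Set("labels", label)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, _ := issuesURL("https://gitea.example.com", "chorus", "engine", "chorus-task")
	fmt.Println(s)
	// → https://gitea.example.com/api/v1/repos/chorus/engine/issues?labels=chorus-task&state=open
}
```

Task claiming and status updates would then be PATCH calls against the individual issue, authenticated with a scoped API token rather than user credentials.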
### Phase 5: AI Integration and Tool Support
#### 5.1 LLM Integration
- Context-aware task analysis based on repository content
- Code generation and problem-solving capabilities
- Natural language processing for task descriptions
- Multi-step reasoning for complex tasks
#### 5.2 Tool Integration
- MCP server connectivity within sandbox
- Development tool access (compilers, linters, formatters)
- Testing framework integration
- Documentation generation tools
#### 5.3 Quality Assurance
- Automated testing of generated code
- Code quality metrics and analysis
- Security vulnerability scanning
- Performance impact assessment
### Phase 6: Testing and Validation
#### 6.1 Unit Testing
- Provider abstraction layer testing
- Sandbox isolation verification
- Task execution strategy validation
- Error handling and recovery testing
#### 6.2 Integration Testing
- End-to-end task execution workflows
- Agent-to-WHOOSH communication testing
- Multi-provider failover scenarios
- Concurrent task execution testing
#### 6.3 Security Testing
- Sandbox escape prevention
- Resource limit enforcement
- Network isolation validation
- Secrets and credential protection
### Phase 7: Production Deployment
#### 7.1 Configuration Management
- Environment-specific model configurations
- Sandbox resource limit definitions
- Provider API key management
- Monitoring and logging setup
#### 7.2 Monitoring and Observability
- Task execution metrics and dashboards
- Performance monitoring and alerting
- Resource utilization tracking
- Error rate and success metrics
## Implementation Priorities
### Critical Path (Week 1-2)
1. Model Provider Abstraction Layer
2. Basic Docker Sandbox Implementation
3. Replace Mock Task Execution
4. Role-Based Execution Strategies
### High Priority (Week 3-4)
5. Real Repository Provider Implementation
6. AI Integration with Ollama/OpenAI
7. MCP Tool Integration
8. Basic Testing Framework
### Medium Priority (Week 5-6)
9. Advanced Sandbox Types (VM, Process)
10. Quality Assurance Pipeline
11. Comprehensive Testing Suite
12. Performance Optimization
### Future Enhancements
- Multi-language model support
- Advanced reasoning capabilities
- Distributed task execution
- Machine learning model fine-tuning
## Success Metrics
- **Task Completion Rate**: >90% of assigned tasks successfully completed
- **Code Quality**: Generated code passes all existing tests and linting
- **Security**: Zero sandbox escapes or security violations
- **Performance**: Task execution time within acceptable bounds
- **Reliability**: <5% execution failure rate due to engine issues
## Risk Mitigation
### Security Risks
- **Sandbox escape**: Multiple isolation layers, security audits
- **Credential exposure**: Secure credential management, rotation
- **Resource exhaustion**: Resource limits, monitoring, auto-scaling
### Technical Risks
- **Model provider outages**: Multi-provider failover, local fallbacks
- **Execution failures**: Robust error handling, retry mechanisms
- **Performance bottlenecks**: Profiling, optimization, horizontal scaling
### Integration Risks
- **WHOOSH compatibility**: Extensive integration testing, versioning
- **Repository provider changes**: Provider abstraction, API versioning
- **Model compatibility**: Provider abstraction, capability detection
This comprehensive plan addresses the core limitation that CHORUS agents currently lack real task execution capabilities while building a robust, secure, and scalable execution engine suitable for production deployment.
## Implementation Roadmap
### Development Standards & Workflow
**Semantic Versioning Strategy:**
- **Patch (0.N.X)**: Bug fixes, small improvements, documentation updates
- **Minor (0.N.0)**: New features, phase completions, non-breaking changes
- **Major (N.0.0)**: Breaking changes, major architectural shifts
**Git Workflow:**
1. **Branch Creation**: `git checkout -b feature/phase-N-description`
2. **Development**: Implement with frequent commits using conventional commit format
3. **Testing**: Run full test suite with `make test` before PR
4. **Code Review**: Create PR with detailed description and test results
5. **Integration**: Squash merge to main after approval
6. **Release**: Tag with `git tag v0.N.0` and update Makefile version
**Quality Gates:**
Each phase must meet these criteria before merge:
- ✅ Unit tests with >80% coverage
- ✅ Integration tests for external dependencies
- ✅ Security review for new attack surfaces
- ✅ Performance benchmarks within acceptable bounds
- ✅ Documentation updates (code comments + README)
- ✅ Backward compatibility verification
### Phase-by-Phase Implementation
#### Phase 1: Model Provider Abstraction (v0.2.0)
**Branch:** `feature/phase-1-model-providers`
**Duration:** 3-5 days
**Deliverables:**
```
pkg/ai/
├── provider.go # Core provider interface & request/response types
├── ollama.go # Local Ollama model integration
├── openai.go # OpenAI API client wrapper
├── resetdata.go # ResetData LaaS integration
├── factory.go # Provider factory with auto-selection
└── provider_test.go # Comprehensive provider tests
configs/
└── models.yaml # Role-model mapping configuration
```
**Key Features:**
- Abstract AI providers behind unified interface
- Support multiple providers with automatic failover
- Configuration-driven model selection per agent role
- Proper error handling and retry logic
#### Phase 2: Execution Environment Abstraction (v0.3.0)
**Branch:** `feature/phase-2-execution-sandbox`
**Duration:** 5-7 days
**Deliverables:**
```
pkg/execution/
├── sandbox.go # Core sandbox interface & types
├── docker.go # Docker container implementation
├── security.go # Security policies & enforcement
├── resources.go # Resource monitoring & limits
└── sandbox_test.go # Sandbox security & isolation tests
```
**Key Features:**
- Docker-based task isolation with transparent repository access
- Resource limits (CPU, memory, network, disk) with monitoring
- Security boundary enforcement and escape prevention
- Clean teardown and artifact collection
#### Phase 3: Core Task Execution Engine (v0.4.0)
**Branch:** `feature/phase-3-task-execution`
**Duration:** 7-10 days
**Modified Files:**
- `coordinator/task_coordinator.go:314` - Replace mock with real execution
- `pkg/repository/types.go` - Extend interfaces for execution context
**New Files:**
```
pkg/strategies/
├── developer.go # Code implementation & bug fixes
├── reviewer.go # Code review & quality analysis
├── architect.go # System design & tech decisions
└── tester.go # Test creation & validation
pkg/engine/
├── executor.go # Main execution orchestrator
├── workflow.go # 7-step execution workflow
└── validation.go # Result quality verification
```
**Key Features:**
- Real task execution replacing 10-second sleep simulation
- Role-specific execution strategies with appropriate tooling
- Integration between AI providers, sandboxes, and task lifecycle
- Comprehensive result validation and quality metrics
#### Phase 4: Repository Provider Implementation (v0.5.0)
**Branch:** `feature/phase-4-real-providers`
**Duration:** 10-14 days
**Deliverables:**
```
pkg/providers/
├── gitea.go # Gitea API integration (primary)
├── github.go # GitHub API integration
├── gitlab.go # GitLab API integration
└── provider_test.go # API integration tests
```
**Key Features:**
- Replace MockTaskProvider with production implementations
- Task claiming, status updates, and progress reporting via APIs
- Automated PR/MR creation with proper branch management
- Repository-specific configuration and credential management
### Testing Strategy
**Unit Testing:**
- Each provider/sandbox implementation has dedicated test suite
- Mock external dependencies (APIs, Docker, etc.) for isolated testing
- Property-based testing for core interfaces
- Error condition and edge case coverage
**Integration Testing:**
- End-to-end task execution workflows
- Multi-provider failover scenarios
- Agent-to-WHOOSH communication validation
- Concurrent task execution under load
**Security Testing:**
- Sandbox escape prevention validation
- Resource exhaustion protection
- Network isolation verification
- Secrets and credential protection audits
### Deployment & Monitoring
**Configuration Management:**
- Environment-specific model configurations
- Sandbox resource limits per environment
- Provider API credentials via secure secret management
- Feature flags for gradual rollout
**Observability:**
- Task execution metrics (completion rate, duration, success/failure)
- Resource utilization tracking (CPU, memory, network per task)
- Error rate monitoring with alerting thresholds
- Performance dashboards for capacity planning
### Risk Mitigation
**Technical Risks:**
- **Provider Outages**: Multi-provider failover with health checks
- **Resource Exhaustion**: Strict limits with monitoring and auto-scaling
- **Execution Failures**: Retry mechanisms with exponential backoff
**Security Risks:**
- **Sandbox Escapes**: Multiple isolation layers and regular security audits
- **Credential Exposure**: Secure rotation and least-privilege access
- **Data Exfiltration**: Network isolation and egress monitoring
**Integration Risks:**
- **API Changes**: Provider abstraction with versioning support
- **Performance Degradation**: Comprehensive benchmarking at each phase
- **Compatibility Issues**: Extensive integration testing with existing systems