# CHORUS Task Execution Engine Development Plan

## Overview

This plan outlines the development of a comprehensive task execution engine for CHORUS agents, replacing the current mock implementation with a fully functional system that can execute real work according to agent roles and specializations.

## Current State Analysis

### What's Implemented ✅

- **Task Coordinator Framework** (`coordinator/task_coordinator.go`): Full task management lifecycle with role-based assignment, collaboration requests, and HMMM integration
- **Agent Role System**: Role announcements, capability broadcasting, and expertise matching
- **P2P Infrastructure**: Nodes can discover each other and communicate via pubsub
- **Health Monitoring**: Comprehensive health checks and graceful shutdown

### Critical Gaps Identified ❌

- **Task Execution Engine**: `executeTask()` only has a 10-second sleep simulation - no actual work performed
- **Repository Integration**: Mock providers only - no real GitHub/GitLab task pulling
- **Agent-to-Task Binding**: Task discovery relies on WHOOSH but agents don't connect to real work
- **Role-Based Execution**: Agents announce roles but don't execute tasks according to their specialization
- **AI Integration**: No LLM/reasoning integration for task completion

## Architecture Requirements

### Model and Provider Abstraction

The execution engine must support multiple AI model providers and execution environments:

**Model Provider Types:**
- **Local Ollama**: Default for most roles (llama3.1:8b, codellama, etc.)
- **OpenAI API**: For specialized models (chatgpt-5, gpt-4o, etc.)
- **ResetData API**: For testing and fallback (llama3.1:8b via LaaS)
- **Custom Endpoints**: Support for other provider APIs

**Role-Model Mapping:**
- Each role has a default model configuration
- Specialized roles may require specific models/providers
- Model selection transparent to execution logic
- Support for MCP calls and tool usage regardless of provider

### Execution Environment Abstraction

Tasks must execute in secure, isolated environments while maintaining transparency:

**Sandbox Types:**
- **Docker Containers**: Isolated execution environment per task
- **Specialized VMs**: For tasks requiring full OS isolation
- **Process Sandboxing**: Lightweight isolation for simple tasks

**Transparency Requirements:**
- Model perceives it's working on a local repository
- Development tools available within sandbox
- File system operations work normally from model's perspective
- Network access controlled but transparent
- Resource limits enforced but invisible

## Development Plan

### Phase 1: Model Provider Abstraction Layer

#### 1.1 Create Provider Interface
```go
// pkg/ai/provider.go
type ModelProvider interface {
    ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
    SupportsMCP() bool
    SupportsTools() bool
    GetCapabilities() []string
}
```
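
The interface above references `TaskRequest` and `TaskResponse` types that the plan does not define yet. A minimal sketch of what they could carry is shown below; the field names are illustrative assumptions, not a final API.

```go
// pkg/ai/types.go (sketch; field names are assumptions)
package ai

import "time"

// TaskRequest bundles everything a provider needs to work on one task.
type TaskRequest struct {
	TaskID      string            // identifier of the repository task
	Role        string            // agent role, e.g. "developer", "reviewer"
	Prompt      string            // rendered task description / instructions
	Repository  string            // repository the task belongs to
	WorkingDir  string            // path to the mounted repo inside the sandbox
	Context     map[string]string // extra metadata (labels, priority, etc.)
	EnableTools bool              // whether MCP/tool calls are allowed
	Timeout     time.Duration
}

// TaskResponse is the provider's result for a single task execution.
type TaskResponse struct {
	Success   bool
	Output    string   // model output / summary of work performed
	Artifacts []string // paths to files produced or modified
	TokensIn  int
	TokensOut int
	Duration  time.Duration
}
```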

#### 1.2 Implement Provider Types
- **OllamaProvider**: Local model execution
- **OpenAIProvider**: OpenAI API integration
- **ResetDataProvider**: ResetData LaaS integration
- **ProviderFactory**: Creates appropriate provider based on model config
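
A minimal sketch of how the factory could select a provider from the role configuration in 1.3; the `ProviderFactory` name comes from the plan, while the constructor names and the `RoleConfig` struct are assumptions.

```go
// pkg/ai/factory.go (sketch)
package ai

import "fmt"

// RoleConfig mirrors one entry of the role-model mapping shown in 1.3.
type RoleConfig struct {
	DefaultModel     string `yaml:"default_model"`
	Provider         string `yaml:"provider"`
	FallbackModel    string `yaml:"fallback_model"`
	FallbackProvider string `yaml:"fallback_provider"`
}

// ProviderFactory creates the appropriate ModelProvider for a role.
type ProviderFactory struct {
	Roles map[string]RoleConfig
}

// ProviderForRole resolves the primary provider for a role; callers can fall
// back to FallbackProvider/FallbackModel when execution fails.
func (f *ProviderFactory) ProviderForRole(role string) (ModelProvider, error) {
	cfg, ok := f.Roles[role]
	if !ok {
		return nil, fmt.Errorf("no model configuration for role %q", role)
	}
	switch cfg.Provider {
	case "ollama":
		return NewOllamaProvider(cfg.DefaultModel), nil // assumed constructor
	case "openai":
		return NewOpenAIProvider(cfg.DefaultModel), nil // assumed constructor
	case "resetdata":
		return NewResetDataProvider(cfg.DefaultModel), nil // assumed constructor
	default:
		return nil, fmt.Errorf("unknown provider %q for role %q", cfg.Provider, role)
	}
}
```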

#### 1.3 Role-Model Configuration
```yaml
# Config structure for role-model mapping
roles:
  developer:
    default_model: "codellama:13b"
    provider: "ollama"
    fallback_model: "llama3.1:8b"
    fallback_provider: "resetdata"

  architect:
    default_model: "gpt-4o"
    provider: "openai"
    fallback_model: "llama3.1:8b"
    fallback_provider: "ollama"
```
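
Loading this mapping can be a straightforward unmarshal into the factory from 1.2. A sketch, assuming `gopkg.in/yaml.v3` and the hypothetical `RoleConfig` struct above:

```go
// pkg/ai/config.go (sketch)
package ai

import (
	"os"

	"gopkg.in/yaml.v3"
)

type modelsFile struct {
	Roles map[string]RoleConfig `yaml:"roles"`
}

// LoadRoleConfig reads configs/models.yaml and returns a ready factory.
func LoadRoleConfig(path string) (*ProviderFactory, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var f modelsFile
	if err := yaml.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	return &ProviderFactory{Roles: f.Roles}, nil
}
```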

### Phase 2: Execution Environment Abstraction

#### 2.1 Create Sandbox Interface
```go
// pkg/execution/sandbox.go
type ExecutionSandbox interface {
    Initialize(ctx context.Context, config *SandboxConfig) error
    ExecuteCommand(ctx context.Context, cmd *Command) (*CommandResult, error)
    CopyFiles(ctx context.Context, source, dest string) error
    Cleanup() error
}
```

#### 2.2 Implement Sandbox Types
- **DockerSandbox**: Container-based isolation
- **VMSandbox**: Full VM isolation for sensitive tasks
- **ProcessSandbox**: Lightweight process-based isolation
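
A sketch of what `DockerSandbox` could look like if it shells out to the `docker` CLI rather than the Engine SDK (an assumption; the plan does not prescribe either). The fields on `SandboxConfig`, `Command`, and `CommandResult` are also assumptions, and `CopyFiles` is omitted for brevity.

```go
// pkg/execution/docker.go (sketch; resource limits enforced via docker flags)
package execution

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"strings"
)

type DockerSandbox struct {
	containerID string
}

// Initialize starts a long-lived container with the repository mounted at /workspace.
func (d *DockerSandbox) Initialize(ctx context.Context, cfg *SandboxConfig) error {
	args := []string{
		"run", "-d", "--rm",
		"--memory", cfg.MemoryLimit, // e.g. "2g" (assumed field)
		"--cpus", cfg.CPULimit,      // e.g. "1.5" (assumed field)
		"--network", "none",         // no network unless the task needs it
		"-v", cfg.RepoPath + ":/workspace",
		"-w", "/workspace",
		cfg.Image, "sleep", "infinity",
	}
	out, err := exec.CommandContext(ctx, "docker", args...).Output()
	if err != nil {
		return fmt.Errorf("starting sandbox container: %w", err)
	}
	d.containerID = strings.TrimSpace(string(out))
	return nil
}

// ExecuteCommand runs a command inside the container and captures its output.
func (d *DockerSandbox) ExecuteCommand(ctx context.Context, cmd *Command) (*CommandResult, error) {
	args := append([]string{"exec", d.containerID, cmd.Name}, cmd.Args...)
	var stdout, stderr bytes.Buffer
	c := exec.CommandContext(ctx, "docker", args...)
	c.Stdout, c.Stderr = &stdout, &stderr
	err := c.Run()
	exitCode := 0
	if exitErr, ok := err.(*exec.ExitError); ok {
		exitCode = exitErr.ExitCode()
		err = nil // a non-zero exit is a result, not a transport error
	}
	return &CommandResult{
		Stdout:   stdout.String(),
		Stderr:   stderr.String(),
		ExitCode: exitCode,
	}, err
}

// Cleanup removes the container; --rm handles most of this already.
func (d *DockerSandbox) Cleanup() error {
	return exec.Command("docker", "rm", "-f", d.containerID).Run()
}
```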

#### 2.3 Repository Mounting
- Clone repository into sandbox environment
- Mount as local filesystem from model's perspective
- Implement secure file I/O operations
- Handle git operations within sandbox
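
One way the repository preparation could work, sketched as a host-side shallow clone into a temporary directory that then backs the `-v` mount from the sandbox sketch above. Credential handling and branch strategy are omitted; the function name is an assumption.

```go
// pkg/execution/repo.go (sketch)
package execution

import (
	"context"
	"fmt"
	"os"
	"os/exec"
)

// PrepareRepository clones the task's repository into a temporary directory
// on the host; the returned path is later bind-mounted into the sandbox at /workspace.
func PrepareRepository(ctx context.Context, cloneURL, branch string) (string, error) {
	dir, err := os.MkdirTemp("", "chorus-task-*")
	if err != nil {
		return "", err
	}
	cmd := exec.CommandContext(ctx, "git", "clone", "--depth", "1",
		"--branch", branch, cloneURL, dir)
	if out, err := cmd.CombinedOutput(); err != nil {
		os.RemoveAll(dir)
		return "", fmt.Errorf("git clone failed: %v: %s", err, out)
	}
	return dir, nil
}
```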

### Phase 3: Core Task Execution Engine

#### 3.1 Replace Mock Implementation
Replace the current simulation in `coordinator/task_coordinator.go:314`:

```go
// Current mock implementation
time.Sleep(10 * time.Second) // Simulate work

// New implementation
result, err := tc.executionEngine.ExecuteTask(ctx, &TaskExecutionRequest{
    Task: activeTask.Task,
    Agent: tc.agentInfo,
    Sandbox: sandboxConfig,
    ModelProvider: providerConfig,
})
```

#### 3.2 Task Execution Strategies
Create role-specific execution patterns:

- **DeveloperStrategy**: Code implementation, bug fixes, feature development
- **ReviewerStrategy**: Code review, quality analysis, test coverage assessment
- **ArchitectStrategy**: System design, technical decision making
- **TesterStrategy**: Test creation, validation, quality assurance
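
The strategies could share a small interface so the executor stays role-agnostic. A sketch follows; only the strategy names come from the plan, the interface shape and the placeholder types are assumptions.

```go
// pkg/strategies/strategy.go (sketch)
package strategies

import "context"

// ExecutionStrategy is implemented once per agent role.
type ExecutionStrategy interface {
	// Role returns the role this strategy serves, e.g. "developer".
	Role() string
	// Execute performs the role-specific work inside the prepared sandbox
	// and returns a summary plus any produced artifacts.
	Execute(ctx context.Context, task *TaskContext) (*StrategyResult, error)
}

// TaskContext and StrategyResult are illustrative placeholders.
type TaskContext struct {
	Title       string
	Description string
	WorkDir     string // mounted repository inside the sandbox
}

type StrategyResult struct {
	Summary   string
	Artifacts []string
}
```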

#### 3.3 Execution Workflow
1. **Task Analysis**: Parse task requirements and complexity
2. **Environment Setup**: Initialize appropriate sandbox
3. **Repository Preparation**: Clone and mount repository
4. **Model Selection**: Choose appropriate model/provider
5. **Task Execution**: Run role-specific execution strategy
6. **Result Validation**: Verify output quality and completeness
7. **Cleanup**: Tear down sandbox and collect artifacts
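
How these seven steps could hang together in `pkg/engine/executor.go`, sketched as one call that wires the earlier sketches (provider factory, sandbox, strategies) together. Every helper and field not named in the plan is an assumption.

```go
// pkg/engine/executor.go (sketch; helper methods are assumed, not final APIs)
package engine

import (
	"context"
	"fmt"
)

func (e *Engine) ExecuteTask(ctx context.Context, req *TaskExecutionRequest) (*TaskResult, error) {
	// 1. Task analysis: derive role, complexity, and prompt from the task.
	plan := e.analyzeTask(req.Task)

	// 2. Environment setup: pick and initialize a sandbox.
	sandbox, err := e.sandboxes.Create(ctx, plan.SandboxConfig)
	if err != nil {
		return nil, fmt.Errorf("environment setup: %w", err)
	}
	defer sandbox.Cleanup() // 7. Cleanup always runs.

	// 3. Repository preparation: clone and mount the task's repository.
	if err := e.prepareRepository(ctx, sandbox, req.Task); err != nil {
		return nil, fmt.Errorf("repository preparation: %w", err)
	}

	// 4. Model selection: resolve provider/model for the agent's role.
	provider, err := e.providers.ProviderForRole(req.Agent.Role)
	if err != nil {
		return nil, fmt.Errorf("model selection: %w", err)
	}

	// 5. Task execution: run the role-specific strategy.
	strategy, ok := e.strategies[req.Agent.Role]
	if !ok {
		return nil, fmt.Errorf("no execution strategy for role %q", req.Agent.Role)
	}
	result, err := strategy.Execute(ctx, plan.ToContext(provider, sandbox))
	if err != nil {
		return nil, fmt.Errorf("task execution: %w", err)
	}

	// 6. Result validation: verify quality before reporting success.
	if err := e.validate(ctx, sandbox, result); err != nil {
		return nil, fmt.Errorf("result validation: %w", err)
	}

	// 7. Cleanup happens via defer; collect artifacts before returning.
	return e.collectArtifacts(sandbox, result), nil
}
```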

### Phase 4: Repository Provider Implementation

#### 4.1 Real Repository Integration
Replace `MockTaskProvider` with actual implementations:
- **GiteaProvider**: Integration with the Gitea API
- **GitHubProvider**: GitHub API integration
- **GitLabProvider**: GitLab API integration

#### 4.2 Task Lifecycle Management
- Task claiming and status updates
- Progress reporting back to repositories
- Artifact attachment (patches, documentation, etc.)
- Automated PR/MR creation for completed tasks
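
As an illustration of the lifecycle calls, a hedged sketch of progress reporting against the Gitea REST API. The issue-comment route shown is Gitea's standard endpoint; the `GiteaProvider` struct, its fields, and the method name are assumptions.

```go
// pkg/providers/gitea.go (sketch, excerpt)
package providers

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

type GiteaProvider struct {
	BaseURL string // e.g. "https://gitea.example.com"
	Token   string
	Client  *http.Client
}

// ReportProgress posts a progress update as a comment on the task's issue.
func (g *GiteaProvider) ReportProgress(ctx context.Context, owner, repo string, issue int64, update string) error {
	url := fmt.Sprintf("%s/api/v1/repos/%s/%s/issues/%d/comments", g.BaseURL, owner, repo, issue)
	body, _ := json.Marshal(map[string]string{"body": update})
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "token "+g.Token)
	req.Header.Set("Content-Type", "application/json")

	client := g.Client
	if client == nil {
		client = http.DefaultClient
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("gitea returned %s", resp.Status)
	}
	return nil
}
```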

### Phase 5: AI Integration and Tool Support

#### 5.1 LLM Integration
- Context-aware task analysis based on repository content
- Code generation and problem-solving capabilities
- Natural language processing for task descriptions
- Multi-step reasoning for complex tasks

#### 5.2 Tool Integration
- MCP server connectivity within sandbox
- Development tool access (compilers, linters, formatters)
- Testing framework integration
- Documentation generation tools

#### 5.3 Quality Assurance
- Automated testing of generated code
- Code quality metrics and analysis
- Security vulnerability scanning
- Performance impact assessment

### Phase 6: Testing and Validation

#### 6.1 Unit Testing
- Provider abstraction layer testing
- Sandbox isolation verification
- Task execution strategy validation
- Error handling and recovery testing
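
For the provider abstraction layer, a table-driven test sketch that exercises the hypothetical `ProviderFactory` from the Phase 1 sketches:

```go
// pkg/ai/factory_test.go (sketch)
package ai

import "testing"

func TestProviderForRole(t *testing.T) {
	factory := &ProviderFactory{Roles: map[string]RoleConfig{
		"developer": {DefaultModel: "codellama:13b", Provider: "ollama"},
		"architect": {DefaultModel: "gpt-4o", Provider: "openai"},
	}}

	tests := []struct {
		role    string
		wantErr bool
	}{
		{"developer", false},
		{"architect", false},
		{"unknown-role", true}, // unmapped roles must fail loudly
	}
	for _, tt := range tests {
		t.Run(tt.role, func(t *testing.T) {
			_, err := factory.ProviderForRole(tt.role)
			if (err != nil) != tt.wantErr {
				t.Fatalf("ProviderForRole(%q) error = %v, wantErr %v", tt.role, err, tt.wantErr)
			}
		})
	}
}
```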

#### 6.2 Integration Testing
- End-to-end task execution workflows
- Agent-to-WHOOSH communication testing
- Multi-provider failover scenarios
- Concurrent task execution testing

#### 6.3 Security Testing
- Sandbox escape prevention
- Resource limit enforcement
- Network isolation validation
- Secrets and credential protection

### Phase 7: Production Deployment

#### 7.1 Configuration Management
- Environment-specific model configurations
- Sandbox resource limit definitions
- Provider API key management
- Monitoring and logging setup

#### 7.2 Monitoring and Observability
- Task execution metrics and dashboards
- Performance monitoring and alerting
- Resource utilization tracking
- Error rate and success metrics

## Implementation Priorities

### Critical Path (Week 1-2)
1. Model Provider Abstraction Layer
2. Basic Docker Sandbox Implementation
3. Replace Mock Task Execution
4. Role-Based Execution Strategies

### High Priority (Week 3-4)
5. Real Repository Provider Implementation
6. AI Integration with Ollama/OpenAI
7. MCP Tool Integration
8. Basic Testing Framework

### Medium Priority (Week 5-6)
9. Advanced Sandbox Types (VM, Process)
10. Quality Assurance Pipeline
11. Comprehensive Testing Suite
12. Performance Optimization

### Future Enhancements
- Multi-language model support
- Advanced reasoning capabilities
- Distributed task execution
- Machine learning model fine-tuning

## Success Metrics

- **Task Completion Rate**: >90% of assigned tasks successfully completed
- **Code Quality**: Generated code passes all existing tests and linting
- **Security**: Zero sandbox escapes or security violations
- **Performance**: Task execution time within acceptable bounds
- **Reliability**: <5% execution failure rate due to engine issues

## Risk Mitigation

### Security Risks
- Sandbox escape → Multiple isolation layers, security audits
- Credential exposure → Secure credential management, rotation
- Resource exhaustion → Resource limits, monitoring, auto-scaling

### Technical Risks
- Model provider outages → Multi-provider failover, local fallbacks
- Execution failures → Robust error handling, retry mechanisms
- Performance bottlenecks → Profiling, optimization, horizontal scaling

### Integration Risks
- WHOOSH compatibility → Extensive integration testing, versioning
- Repository provider changes → Provider abstraction, API versioning
- Model compatibility → Provider abstraction, capability detection

This plan addresses the core limitation identified above: CHORUS agents currently lack real task execution capabilities. It builds toward a robust, secure, and scalable execution engine suitable for production deployment.

## Implementation Roadmap

### Development Standards & Workflow

**Semantic Versioning Strategy:**
- **Patch (0.N.X)**: Bug fixes, small improvements, documentation updates
- **Minor (0.N.0)**: New features, phase completions, non-breaking changes
- **Major (N.0.0)**: Breaking changes, major architectural shifts

**Git Workflow:**
1. **Branch Creation**: `git checkout -b feature/phase-N-description`
2. **Development**: Implement with frequent commits using conventional commit format
3. **Testing**: Run full test suite with `make test` before PR
4. **Code Review**: Create PR with detailed description and test results
5. **Integration**: Squash merge to main after approval
6. **Release**: Tag with `git tag v0.N.0` and update Makefile version

**Quality Gates:**
Each phase must meet these criteria before merge:
- ✅ Unit tests with >80% coverage
- ✅ Integration tests for external dependencies
- ✅ Security review for new attack surfaces
- ✅ Performance benchmarks within acceptable bounds
- ✅ Documentation updates (code comments + README)
- ✅ Backward compatibility verification

### Phase-by-Phase Implementation

#### Phase 1: Model Provider Abstraction (v0.2.0)
**Branch:** `feature/phase-1-model-providers`
**Duration:** 3-5 days
**Deliverables:**
```
pkg/ai/
├── provider.go        # Core provider interface & request/response types
├── ollama.go          # Local Ollama model integration
├── openai.go          # OpenAI API client wrapper
├── resetdata.go       # ResetData LaaS integration
├── factory.go         # Provider factory with auto-selection
└── provider_test.go   # Comprehensive provider tests

configs/
└── models.yaml        # Role-model mapping configuration
```

**Key Features:**
- Abstract AI providers behind unified interface
- Support multiple providers with automatic failover
- Configuration-driven model selection per agent role
- Proper error handling and retry logic

#### Phase 2: Execution Environment Abstraction (v0.3.0)
**Branch:** `feature/phase-2-execution-sandbox`
**Duration:** 5-7 days
**Deliverables:**
```
pkg/execution/
├── sandbox.go         # Core sandbox interface & types
├── docker.go          # Docker container implementation
├── security.go        # Security policies & enforcement
├── resources.go       # Resource monitoring & limits
└── sandbox_test.go    # Sandbox security & isolation tests
```

**Key Features:**
- Docker-based task isolation with transparent repository access
- Resource limits (CPU, memory, network, disk) with monitoring
- Security boundary enforcement and escape prevention
- Clean teardown and artifact collection

#### Phase 3: Core Task Execution Engine (v0.4.0)
**Branch:** `feature/phase-3-task-execution`
**Duration:** 7-10 days
**Modified Files:**
- `coordinator/task_coordinator.go:314` - Replace mock with real execution
- `pkg/repository/types.go` - Extend interfaces for execution context

**New Files:**
```
pkg/strategies/
├── developer.go       # Code implementation & bug fixes
├── reviewer.go        # Code review & quality analysis
├── architect.go       # System design & tech decisions
└── tester.go          # Test creation & validation

pkg/engine/
├── executor.go        # Main execution orchestrator
├── workflow.go        # 7-step execution workflow
└── validation.go      # Result quality verification
```

**Key Features:**
- Real task execution replacing 10-second sleep simulation
- Role-specific execution strategies with appropriate tooling
- Integration between AI providers, sandboxes, and task lifecycle
- Comprehensive result validation and quality metrics

#### Phase 4: Repository Provider Implementation (v0.5.0)
**Branch:** `feature/phase-4-real-providers`
**Duration:** 10-14 days
**Deliverables:**
```
pkg/providers/
├── gitea.go           # Gitea API integration (primary)
├── github.go          # GitHub API integration
├── gitlab.go          # GitLab API integration
└── provider_test.go   # API integration tests
```

**Key Features:**
- Replace MockTaskProvider with production implementations
- Task claiming, status updates, and progress reporting via APIs
- Automated PR/MR creation with proper branch management
- Repository-specific configuration and credential management

### Testing Strategy

**Unit Testing:**
- Each provider/sandbox implementation has a dedicated test suite
- Mock external dependencies (APIs, Docker, etc.) for isolated testing
- Property-based testing for core interfaces
- Error condition and edge case coverage

**Integration Testing:**
- End-to-end task execution workflows
- Multi-provider failover scenarios
- Agent-to-WHOOSH communication validation
- Concurrent task execution under load

**Security Testing:**
- Sandbox escape prevention validation
- Resource exhaustion protection
- Network isolation verification
- Secrets and credential protection audits

### Deployment & Monitoring

**Configuration Management:**
- Environment-specific model configurations
- Sandbox resource limits per environment
- Provider API credentials via secure secret management
- Feature flags for gradual rollout

**Observability:**
- Task execution metrics (completion rate, duration, success/failure)
- Resource utilization tracking (CPU, memory, network per task)
- Error rate monitoring with alerting thresholds
- Performance dashboards for capacity planning
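
If Prometheus is the metrics backend (an assumption; the plan does not name one), the execution metrics could be registered roughly like this. Metric names are illustrative only.

```go
// pkg/engine/metrics.go (sketch, assumes Prometheus via client_golang)
package engine

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Completed vs. failed tasks, labelled by agent role and outcome.
	tasksTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "chorus_task_executions_total",
		Help: "Task executions by role and outcome.",
	}, []string{"role", "outcome"})

	// Wall-clock execution time per task, labelled by role.
	taskDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "chorus_task_execution_seconds",
		Help:    "Task execution duration in seconds.",
		Buckets: prometheus.ExponentialBuckets(1, 2, 12), // 1s up to ~34m
	}, []string{"role"})
)

// observe records one finished execution; the executor calls it after step 7.
func observe(role, outcome string, seconds float64) {
	tasksTotal.WithLabelValues(role, outcome).Inc()
	taskDuration.WithLabelValues(role).Observe(seconds)
}
```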

### Risk Mitigation

**Technical Risks:**
- **Provider Outages**: Multi-provider failover with health checks
- **Resource Exhaustion**: Strict limits with monitoring and auto-scaling
- **Execution Failures**: Retry mechanisms with exponential backoff

**Security Risks:**
- **Sandbox Escapes**: Multiple isolation layers and regular security audits
- **Credential Exposure**: Secure rotation and least-privilege access
- **Data Exfiltration**: Network isolation and egress monitoring

**Integration Risks:**
- **API Changes**: Provider abstraction with versioning support
- **Performance Degradation**: Comprehensive benchmarking at each phase
- **Compatibility Issues**: Extensive integration testing with existing systems