Fix P2P Connectivity Regression + Dynamic Versioning System #12
435
docs/development/task-execution-engine-plan.md
Normal file
@@ -0,0 +1,435 @@
# CHORUS Task Execution Engine Development Plan

## Overview

This plan outlines the development of a comprehensive task execution engine for CHORUS agents, replacing the current mock implementation with a fully functional system that can execute real work according to agent roles and specializations.

## Current State Analysis

### What's Implemented ✅

- **Task Coordinator Framework** (`coordinator/task_coordinator.go`): Full task management lifecycle with role-based assignment, collaboration requests, and HMMM integration
- **Agent Role System**: Role announcements, capability broadcasting, and expertise matching
- **P2P Infrastructure**: Nodes can discover each other and communicate via pubsub
- **Health Monitoring**: Comprehensive health checks and graceful shutdown

### Critical Gaps Identified ❌

- **Task Execution Engine**: `executeTask()` only has a 10-second sleep simulation - no actual work performed
- **Repository Integration**: Mock providers only - no real GitHub/GitLab task pulling
- **Agent-to-Task Binding**: Task discovery relies on WHOOSH but agents don't connect to real work
- **Role-Based Execution**: Agents announce roles but don't execute tasks according to their specialization
- **AI Integration**: No LLM/reasoning integration for task completion

## Architecture Requirements

### Model and Provider Abstraction

The execution engine must support multiple AI model providers and execution environments:

**Model Provider Types:**

- **Local Ollama**: Default for most roles (llama3.1:8b, codellama, etc.)
- **OpenAI API**: For specialized models (chatgpt-5, gpt-4o, etc.)
- **ResetData API**: For testing and fallback (llama3.1:8b via LaaS)
- **Custom Endpoints**: Support for other provider APIs

**Role-Model Mapping:**

- Each role has a default model configuration
- Specialized roles may require specific models/providers
- Model selection transparent to execution logic
- Support for MCP calls and tool usage regardless of provider

### Execution Environment Abstraction

Tasks must execute in secure, isolated environments while maintaining transparency:

**Sandbox Types:**

- **Docker Containers**: Isolated execution environment per task
- **Specialized VMs**: For tasks requiring full OS isolation
- **Process Sandboxing**: Lightweight isolation for simple tasks

**Transparency Requirements:**

- Model perceives it's working on a local repository
- Development tools available within sandbox
- File system operations work normally from model's perspective
- Network access controlled but transparent
- Resource limits enforced but invisible

## Development Plan

### Phase 1: Model Provider Abstraction Layer

#### 1.1 Create Provider Interface

```go
// pkg/ai/provider.go
type ModelProvider interface {
	ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
	SupportsMCP() bool
	SupportsTools() bool
	GetCapabilities() []string
}
```

#### 1.2 Implement Provider Types

- **OllamaProvider**: Local model execution
- **OpenAIProvider**: OpenAI API integration
- **ResetDataProvider**: ResetData LaaS integration
- **ProviderFactory**: Creates appropriate provider based on model config
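
As a minimal sketch of how a `ProviderFactory` might dispatch on the configured provider name: the `ModelProvider` interface is copied from section 1.1, while `ModelConfig`, `NewProvider`, `stubProvider`, and the fields of `TaskRequest`/`TaskResponse` are hypothetical placeholders, since the plan does not define them. The stub only echoes the prompt and stands in for the real Ollama, OpenAI, and ResetData clients.

```go
package main

import (
	"context"
	"fmt"
)

// TaskRequest and TaskResponse are placeholders; the plan does not specify their fields.
type TaskRequest struct{ Prompt string }
type TaskResponse struct{ Output string }

// ModelProvider mirrors the interface from section 1.1.
type ModelProvider interface {
	ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
	SupportsMCP() bool
	SupportsTools() bool
	GetCapabilities() []string
}

// ModelConfig is a hypothetical config type naming a provider and a model.
type ModelConfig struct {
	Provider string
	Model    string
}

// stubProvider stands in for OllamaProvider, OpenAIProvider, and ResetDataProvider.
// It performs no network calls and simply echoes the prompt.
type stubProvider struct{ name, model string }

func (p *stubProvider) ExecuteTask(ctx context.Context, req *TaskRequest) (*TaskResponse, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return &TaskResponse{Output: fmt.Sprintf("[%s/%s] %s", p.name, p.model, req.Prompt)}, nil
}
func (p *stubProvider) SupportsMCP() bool         { return false }
func (p *stubProvider) SupportsTools() bool       { return false }
func (p *stubProvider) GetCapabilities() []string { return []string{"text"} }

// NewProvider returns a provider for the given config or an error for unknown names.
func NewProvider(cfg ModelConfig) (ModelProvider, error) {
	switch cfg.Provider {
	case "ollama", "openai", "resetdata":
		return &stubProvider{name: cfg.Provider, model: cfg.Model}, nil
	default:
		return nil, fmt.Errorf("unknown provider %q", cfg.Provider)
	}
}

func main() {
	p, err := NewProvider(ModelConfig{Provider: "ollama", Model: "llama3.1:8b"})
	if err != nil {
		panic(err)
	}
	resp, err := p.ExecuteTask(context.Background(), &TaskRequest{Prompt: "hello"})
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Output)
}
```

Keeping construction behind one function means the execution logic depends only on `ModelProvider`, which is what the "model selection transparent to execution logic" requirement asks for.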

#### 1.3 Role-Model Configuration

```yaml
# Config structure for role-model mapping
roles:
  developer:
    default_model: "codellama:13b"
    provider: "ollama"
    fallback_model: "llama3.1:8b"
    fallback_provider: "resetdata"

  architect:
    default_model: "gpt-4o"
    provider: "openai"
    fallback_model: "llama3.1:8b"
    fallback_provider: "ollama"
```
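
The mapping above could be resolved at runtime roughly as follows. This is a sketch under assumptions: `RoleModel`, `SelectModel`, and the `healthy` map are hypothetical names, and the values are hard-coded from the YAML rather than parsed, to keep the example free of third-party dependencies.

```go
package main

import (
	"errors"
	"fmt"
)

// RoleModel mirrors one entry under `roles:` in the YAML config.
type RoleModel struct {
	DefaultModel     string
	Provider         string
	FallbackModel    string
	FallbackProvider string
}

// roles holds the same values as the YAML example; a real loader would parse the file.
var roles = map[string]RoleModel{
	"developer": {
		DefaultModel: "codellama:13b", Provider: "ollama",
		FallbackModel: "llama3.1:8b", FallbackProvider: "resetdata",
	},
	"architect": {
		DefaultModel: "gpt-4o", Provider: "openai",
		FallbackModel: "llama3.1:8b", FallbackProvider: "ollama",
	},
}

// ErrNoProvider is returned when neither the default nor the fallback is usable.
var ErrNoProvider = errors.New("no healthy provider for role")

// SelectModel prefers the role's default provider and falls back when it is unhealthy.
// healthy maps provider name to availability; missing entries count as unavailable.
func SelectModel(role string, healthy map[string]bool) (provider, model string, err error) {
	cfg, ok := roles[role]
	if !ok {
		return "", "", fmt.Errorf("unknown role %q", role)
	}
	if healthy[cfg.Provider] {
		return cfg.Provider, cfg.DefaultModel, nil
	}
	if healthy[cfg.FallbackProvider] {
		return cfg.FallbackProvider, cfg.FallbackModel, nil
	}
	return "", "", ErrNoProvider
}

func main() {
	// OpenAI is marked down, so the architect role falls back to Ollama.
	provider, model, err := SelectModel("architect", map[string]bool{"ollama": true})
	if err != nil {
		panic(err)
	}
	fmt.Println(provider, model)
}
```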

### Phase 2: Execution Environment Abstraction

#### 2.1 Create Sandbox Interface

```go
// pkg/execution/sandbox.go
type ExecutionSandbox interface {
	Initialize(ctx context.Context, config *SandboxConfig) error
	ExecuteCommand(ctx context.Context, cmd *Command) (*CommandResult, error)
	CopyFiles(ctx context.Context, source, dest string) error
	Cleanup() error
}
```

#### 2.2 Implement Sandbox Types

- **DockerSandbox**: Container-based isolation
- **VMSandbox**: Full VM isolation for sensitive tasks
- **ProcessSandbox**: Lightweight process-based isolation

#### 2.3 Repository Mounting

- Clone repository into sandbox environment
- Mount as local filesystem from model's perspective
- Implement secure file I/O operations
- Handle git operations within sandbox

### Phase 3: Core Task Execution Engine

#### 3.1 Replace Mock Implementation

Replace the current simulation in `coordinator/task_coordinator.go:314`:

```go
// Current mock implementation
time.Sleep(10 * time.Second) // Simulate work

// New implementation
result, err := tc.executionEngine.ExecuteTask(ctx, &TaskExecutionRequest{
	Task:          activeTask.Task,
	Agent:         tc.agentInfo,
	Sandbox:       sandboxConfig,
	ModelProvider: providerConfig,
})
```

#### 3.2 Task Execution Strategies

Create role-specific execution patterns:

- **DeveloperStrategy**: Code implementation, bug fixes, feature development
- **ReviewerStrategy**: Code review, quality analysis, test coverage assessment
- **ArchitectStrategy**: System design, technical decision making
- **TesterStrategy**: Test creation, validation, quality assurance
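
One way the role-specific strategies could be wired together is a registry keyed by role name. The sketch below is illustrative only: the `Strategy` interface, `Registry`, `DefaultRegistry`, and the placeholder `simpleStrategy` are hypothetical and stand in for the real `DeveloperStrategy`, `ReviewerStrategy`, `ArchitectStrategy`, and `TesterStrategy` implementations.

```go
package main

import (
	"context"
	"fmt"
)

// Strategy is a hypothetical interface each role-specific strategy would satisfy.
type Strategy interface {
	Role() string
	Execute(ctx context.Context, task string) (string, error)
}

// simpleStrategy is a placeholder that only describes what the role would do.
type simpleStrategy struct{ role, verb string }

func (s simpleStrategy) Role() string { return s.role }

func (s simpleStrategy) Execute(ctx context.Context, task string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	return fmt.Sprintf("%s: %s %q", s.role, s.verb, task), nil
}

// Registry maps an agent role to its execution strategy.
type Registry struct{ byRole map[string]Strategy }

// NewRegistry indexes the given strategies by their role name.
func NewRegistry(strategies ...Strategy) *Registry {
	r := &Registry{byRole: make(map[string]Strategy)}
	for _, s := range strategies {
		r.byRole[s.Role()] = s
	}
	return r
}

// For returns the strategy registered for role, or an error if none exists.
func (r *Registry) For(role string) (Strategy, error) {
	s, ok := r.byRole[role]
	if !ok {
		return nil, fmt.Errorf("no execution strategy for role %q", role)
	}
	return s, nil
}

// DefaultRegistry registers one placeholder per role named in the plan.
func DefaultRegistry() *Registry {
	return NewRegistry(
		simpleStrategy{"developer", "implement"},
		simpleStrategy{"reviewer", "review"},
		simpleStrategy{"architect", "design"},
		simpleStrategy{"tester", "write tests for"},
	)
}

func main() {
	s, err := DefaultRegistry().For("reviewer")
	if err != nil {
		panic(err)
	}
	out, err := s.Execute(context.Background(), "fix login bug")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```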

#### 3.3 Execution Workflow

1. **Task Analysis**: Parse task requirements and complexity
2. **Environment Setup**: Initialize appropriate sandbox
3. **Repository Preparation**: Clone and mount repository
4. **Model Selection**: Choose appropriate model/provider
5. **Task Execution**: Run role-specific execution strategy
6. **Result Validation**: Verify output quality and completeness
7. **Cleanup**: Teardown sandbox and collect artifacts
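
The seven steps above can be sketched as a sequential pipeline in which the first six steps stop at the first error and cleanup always runs. `Step`, `RunWorkflow`, and `RunDemo` are hypothetical names, and the step bodies are no-ops; only the control flow is the point here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Step is one named stage of the workflow.
type Step struct {
	Name string
	Run  func(ctx context.Context) error
}

// RunWorkflow runs steps in order and stops at the first failure.
// cleanup (step 7) runs via defer, so it executes on success and on failure.
// It returns the names of the steps that completed.
func RunWorkflow(ctx context.Context, steps []Step, cleanup func() error) (done []string, err error) {
	defer func() {
		if cerr := cleanup(); cerr != nil {
			err = errors.Join(err, fmt.Errorf("cleanup: %w", cerr))
		}
	}()
	for _, s := range steps {
		if err := ctx.Err(); err != nil {
			return done, err
		}
		if err := s.Run(ctx); err != nil {
			return done, fmt.Errorf("%s: %w", s.Name, err)
		}
		done = append(done, s.Name)
	}
	return done, nil
}

// RunDemo builds steps 1-6 as no-ops and makes the step named failAt fail.
// Pass "" for a fully successful run.
func RunDemo(failAt string) (done []string, cleaned bool, err error) {
	names := []string{
		"Task Analysis", "Environment Setup", "Repository Preparation",
		"Model Selection", "Task Execution", "Result Validation",
	}
	steps := make([]Step, 0, len(names))
	for _, name := range names {
		name := name // capture per iteration for Go versions before 1.22
		steps = append(steps, Step{Name: name, Run: func(ctx context.Context) error {
			if name == failAt {
				return errors.New("simulated failure")
			}
			return nil
		}})
	}
	done, err = RunWorkflow(context.Background(), steps, func() error {
		cleaned = true
		return nil
	})
	return done, cleaned, err
}

func main() {
	done, cleaned, err := RunDemo("Task Execution")
	fmt.Println(done, cleaned, err)
}
```

Running cleanup in a `defer` is one design choice among several; it keeps sandbox teardown from being skipped when an earlier step fails.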

### Phase 4: Repository Provider Implementation

#### 4.1 Real Repository Integration

Replace `MockTaskProvider` with actual implementations:

- **GiteaProvider**: Gitea API integration
- **GitHubProvider**: GitHub API integration
- **GitLabProvider**: GitLab API integration

#### 4.2 Task Lifecycle Management

- Task claiming and status updates
- Progress reporting back to repositories
- Artifact attachment (patches, documentation, etc.)
- Automated PR/MR creation for completed tasks
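
Task claiming and status updates could be modeled as a small state machine that rejects illegal transitions, so two agents cannot claim the same task. The states and transitions below are assumptions made for illustration; the plan does not define a status vocabulary, and a real provider would map these onto issue labels or assignees in Gitea, GitHub, or GitLab.

```go
package main

import "fmt"

// TaskState is a hypothetical lifecycle status for a repository task.
type TaskState string

const (
	StateOpen       TaskState = "open"
	StateClaimed    TaskState = "claimed"
	StateInProgress TaskState = "in_progress"
	StateInReview   TaskState = "in_review" // PR/MR has been opened
	StateDone       TaskState = "done"
	StateFailed     TaskState = "failed"
)

// allowed lists the legal next states for each state.
var allowed = map[TaskState][]TaskState{
	StateOpen:       {StateClaimed},
	StateClaimed:    {StateInProgress, StateOpen},
	StateInProgress: {StateInReview, StateFailed},
	StateInReview:   {StateDone, StateInProgress},
	StateFailed:     {StateOpen},
}

// Task tracks the current state, the claiming agent, and past states.
type Task struct {
	ID        int
	State     TaskState
	ClaimedBy string
	History   []TaskState
}

// Transition moves the task to the next state if the move is legal.
func (t *Task) Transition(to TaskState) error {
	for _, next := range allowed[t.State] {
		if next == to {
			t.History = append(t.History, t.State)
			t.State = to
			return nil
		}
	}
	return fmt.Errorf("task %d: illegal transition %s -> %s", t.ID, t.State, to)
}

// Claim assigns an open task to an agent; claiming a non-open task fails.
func (t *Task) Claim(agent string) error {
	if err := t.Transition(StateClaimed); err != nil {
		return err
	}
	t.ClaimedBy = agent
	return nil
}

func main() {
	task := &Task{ID: 42, State: StateOpen}
	fmt.Println(task.Claim("agent-a")) // first claim succeeds
	fmt.Println(task.Claim("agent-b")) // second claim is rejected
}
```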

### Phase 5: AI Integration and Tool Support

#### 5.1 LLM Integration

- Context-aware task analysis based on repository content
- Code generation and problem-solving capabilities
- Natural language processing for task descriptions
- Multi-step reasoning for complex tasks

#### 5.2 Tool Integration

- MCP server connectivity within sandbox
- Development tool access (compilers, linters, formatters)
- Testing framework integration
- Documentation generation tools

#### 5.3 Quality Assurance

- Automated testing of generated code
- Code quality metrics and analysis
- Security vulnerability scanning
- Performance impact assessment

### Phase 6: Testing and Validation

#### 6.1 Unit Testing

- Provider abstraction layer testing
- Sandbox isolation verification
- Task execution strategy validation
- Error handling and recovery testing

#### 6.2 Integration Testing

- End-to-end task execution workflows
- Agent-to-WHOOSH communication testing
- Multi-provider failover scenarios
- Concurrent task execution testing

#### 6.3 Security Testing

- Sandbox escape prevention
- Resource limit enforcement
- Network isolation validation
- Secrets and credential protection

### Phase 7: Production Deployment

#### 7.1 Configuration Management

- Environment-specific model configurations
- Sandbox resource limit definitions
- Provider API key management
- Monitoring and logging setup

#### 7.2 Monitoring and Observability

- Task execution metrics and dashboards
- Performance monitoring and alerting
- Resource utilization tracking
- Error rate and success metrics

## Implementation Priorities

### Critical Path (Weeks 1-2)

1. Model Provider Abstraction Layer
2. Basic Docker Sandbox Implementation
3. Replace Mock Task Execution
4. Role-Based Execution Strategies

### High Priority (Weeks 3-4)

5. Real Repository Provider Implementation
6. AI Integration with Ollama/OpenAI
7. MCP Tool Integration
8. Basic Testing Framework

### Medium Priority (Weeks 5-6)

9. Advanced Sandbox Types (VM, Process)
10. Quality Assurance Pipeline
11. Comprehensive Testing Suite
12. Performance Optimization

### Future Enhancements

- Multi-language model support
- Advanced reasoning capabilities
- Distributed task execution
- Machine learning model fine-tuning

## Success Metrics

- **Task Completion Rate**: >90% of assigned tasks successfully completed
- **Code Quality**: Generated code passes all existing tests and linting
- **Security**: Zero sandbox escapes or security violations
- **Performance**: Task execution time within acceptable bounds
- **Reliability**: <5% execution failure rate due to engine issues

## Risk Mitigation

### Security Risks

- Sandbox escape → Multiple isolation layers, security audits
- Credential exposure → Secure credential management, rotation
- Resource exhaustion → Resource limits, monitoring, auto-scaling

### Technical Risks

- Model provider outages → Multi-provider failover, local fallbacks
- Execution failures → Robust error handling, retry mechanisms
- Performance bottlenecks → Profiling, optimization, horizontal scaling

### Integration Risks

- WHOOSH compatibility → Extensive integration testing, versioning
- Repository provider changes → Provider abstraction, API versioning
- Model compatibility → Provider abstraction, capability detection

This plan addresses the core limitation of CHORUS agents today, the lack of real task execution, while building a robust, secure, and scalable execution engine suitable for production deployment.

## Implementation Roadmap

### Development Standards & Workflow

**Semantic Versioning Strategy:**

- **Patch (0.N.X)**: Bug fixes, small improvements, documentation updates
- **Minor (0.N.0)**: New features, phase completions, non-breaking changes
- **Major (N.0.0)**: Breaking changes, major architectural shifts
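
To make the bump rules concrete, here is a small sketch of how a release helper might compute the next tag. `Version`, `ParseVersion`, `Bump`, and the `Bump*` constants are hypothetical names; the parser accepts only plain `vMAJOR.MINOR.PATCH` strings and does not handle pre-release or build suffixes.

```go
package main

import (
	"fmt"
	"strings"
)

// Version is a plain MAJOR.MINOR.PATCH triple.
type Version struct{ Major, Minor, Patch int }

// Change identifies which component a release bumps.
type Change int

const (
	BumpPatch Change = iota // bug fixes, small improvements, docs
	BumpMinor               // new features, phase completions
	BumpMajor               // breaking changes
)

// ParseVersion parses strings such as "v0.3.1" or "0.3.1".
func ParseVersion(s string) (Version, error) {
	var v Version
	n, err := fmt.Sscanf(strings.TrimPrefix(s, "v"), "%d.%d.%d", &v.Major, &v.Minor, &v.Patch)
	if err != nil || n != 3 {
		return Version{}, fmt.Errorf("invalid version %q", s)
	}
	return v, nil
}

// Bump returns the next version, resetting lower components to zero.
func (v Version) Bump(c Change) Version {
	switch c {
	case BumpMajor:
		return Version{Major: v.Major + 1}
	case BumpMinor:
		return Version{Major: v.Major, Minor: v.Minor + 1}
	default:
		return Version{Major: v.Major, Minor: v.Minor, Patch: v.Patch + 1}
	}
}

// String formats the version as a git tag.
func (v Version) String() string {
	return fmt.Sprintf("v%d.%d.%d", v.Major, v.Minor, v.Patch)
}

func main() {
	v, err := ParseVersion("v0.3.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(v.Bump(BumpMinor)) // completing a phase bumps the minor version
}
```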

**Git Workflow:**

1. **Branch Creation**: `git checkout -b feature/phase-N-description`
2. **Development**: Implement with frequent commits using conventional commit format
3. **Testing**: Run full test suite with `make test` before PR
4. **Code Review**: Create PR with detailed description and test results
5. **Integration**: Squash merge to main after approval
6. **Release**: Tag with `git tag v0.N.0` and update Makefile version

**Quality Gates:**

Each phase must meet these criteria before merge:

- ✅ Unit tests with >80% coverage
- ✅ Integration tests for external dependencies
- ✅ Security review for new attack surfaces
- ✅ Performance benchmarks within acceptable bounds
- ✅ Documentation updates (code comments + README)
- ✅ Backward compatibility verification

### Phase-by-Phase Implementation

#### Phase 1: Model Provider Abstraction (v0.2.0)

**Branch:** `feature/phase-1-model-providers`
**Duration:** 3-5 days
**Deliverables:**
```
pkg/ai/
├── provider.go         # Core provider interface & request/response types
├── ollama.go           # Local Ollama model integration
├── openai.go           # OpenAI API client wrapper
├── resetdata.go        # ResetData LaaS integration
├── factory.go          # Provider factory with auto-selection
└── provider_test.go    # Comprehensive provider tests

configs/
└── models.yaml         # Role-model mapping configuration
```

**Key Features:**

- Abstract AI providers behind unified interface
- Support multiple providers with automatic failover
- Configuration-driven model selection per agent role
- Proper error handling and retry logic
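
Automatic failover at call time could look roughly like the following: try each provider in priority order, return the first success, and report every failure if none succeeds. The `Completer` interface, `fakeProvider`, and `CompleteWithFailover` are hypothetical and deliberately simpler than the `ModelProvider` interface in `pkg/ai/provider.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// Completer is a simplified, hypothetical stand-in for a model provider.
type Completer interface {
	Name() string
	Complete(prompt string) (string, error)
}

// fakeProvider simulates a provider that is either up or down.
type fakeProvider struct {
	name string
	down bool
}

func (f fakeProvider) Name() string { return f.name }

func (f fakeProvider) Complete(prompt string) (string, error) {
	if f.down {
		return "", fmt.Errorf("%s: unavailable", f.name)
	}
	return f.name + " answered: " + prompt, nil
}

// CompleteWithFailover tries providers in order and returns the first success,
// along with the name of the provider that served the request.
func CompleteWithFailover(providers []Completer, prompt string) (out, used string, err error) {
	if len(providers) == 0 {
		return "", "", errors.New("no providers configured")
	}
	var errs []error
	for _, p := range providers {
		result, perr := p.Complete(prompt)
		if perr == nil {
			return result, p.Name(), nil
		}
		errs = append(errs, perr)
	}
	return "", "", fmt.Errorf("all providers failed: %w", errors.Join(errs...))
}

func main() {
	chain := []Completer{
		fakeProvider{name: "openai", down: true},
		fakeProvider{name: "ollama"},
	}
	out, used, err := CompleteWithFailover(chain, "ping")
	if err != nil {
		panic(err)
	}
	fmt.Println(used, "->", out)
}
```

Collecting every provider error with `errors.Join` (Go 1.20 or later) keeps the final error useful for diagnosing why the whole chain failed.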

#### Phase 2: Execution Environment Abstraction (v0.3.0)

**Branch:** `feature/phase-2-execution-sandbox`
**Duration:** 5-7 days
**Deliverables:**
```
pkg/execution/
├── sandbox.go         # Core sandbox interface & types
├── docker.go          # Docker container implementation
├── security.go        # Security policies & enforcement
├── resources.go       # Resource monitoring & limits
└── sandbox_test.go    # Sandbox security & isolation tests
```

**Key Features:**

- Docker-based task isolation with transparent repository access
- Resource limits (CPU, memory, network, disk) with monitoring
- Security boundary enforcement and escape prevention
- Clean teardown and artifact collection
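
As a sketch of how the Docker sandbox might translate resource limits into `docker run` flags, the function below only builds the argument list and does not start a container. `SandboxLimits` and `DockerRunArgs` are hypothetical names, and the `/workspace` mount point is an assumption; the flags themselves (`--rm`, `--memory`, `--cpus`, `--pids-limit`, `--network none`, `-v`, `-w`) are standard Docker CLI options. Disk quotas are left out because they depend on the storage driver.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SandboxLimits is a hypothetical set of per-task resource limits.
type SandboxLimits struct {
	MemoryMB     int     // hard memory cap in megabytes
	CPUs         float64 // fractional CPU count
	PidsLimit    int     // maximum number of processes
	AllowNetwork bool    // false disables networking entirely
}

// DockerRunArgs builds the arguments for `docker run` so that the repository
// at repoDir appears as a local checkout at /workspace inside the container.
func DockerRunArgs(image, repoDir string, lim SandboxLimits, cmd ...string) []string {
	args := []string{
		"run", "--rm",
		"--memory", fmt.Sprintf("%dm", lim.MemoryMB),
		"--cpus", fmt.Sprintf("%.2f", lim.CPUs),
		"--pids-limit", strconv.Itoa(lim.PidsLimit),
	}
	if !lim.AllowNetwork {
		args = append(args, "--network", "none")
	}
	args = append(args, "-v", repoDir+":/workspace", "-w", "/workspace", image)
	return append(args, cmd...)
}

func main() {
	args := DockerRunArgs("golang:1.22", "/tmp/repo",
		SandboxLimits{MemoryMB: 512, CPUs: 1.5, PidsLimit: 128},
		"go", "test", "./...")
	fmt.Println("docker " + strings.Join(args, " "))
}
```

Building the argument list separately from executing it makes the limits easy to unit test without a Docker daemon.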

#### Phase 3: Core Task Execution Engine (v0.4.0)

**Branch:** `feature/phase-3-task-execution`
**Duration:** 7-10 days
**Modified Files:**
- `coordinator/task_coordinator.go:314` - Replace mock with real execution
- `pkg/repository/types.go` - Extend interfaces for execution context

**New Files:**
```
pkg/strategies/
├── developer.go     # Code implementation & bug fixes
├── reviewer.go      # Code review & quality analysis
├── architect.go     # System design & tech decisions
└── tester.go        # Test creation & validation

pkg/engine/
├── executor.go      # Main execution orchestrator
├── workflow.go      # 7-step execution workflow
└── validation.go    # Result quality verification
```

**Key Features:**

- Real task execution replacing 10-second sleep simulation
- Role-specific execution strategies with appropriate tooling
- Integration between AI providers, sandboxes, and task lifecycle
- Comprehensive result validation and quality metrics

#### Phase 4: Repository Provider Implementation (v0.5.0)

**Branch:** `feature/phase-4-real-providers`
**Duration:** 10-14 days
**Deliverables:**
```
pkg/providers/
├── gitea.go            # Gitea API integration (primary)
├── github.go           # GitHub API integration
├── gitlab.go           # GitLab API integration
└── provider_test.go    # API integration tests
```

**Key Features:**

- Replace MockTaskProvider with production implementations
- Task claiming, status updates, and progress reporting via APIs
- Automated PR/MR creation with proper branch management
- Repository-specific configuration and credential management

### Testing Strategy

**Unit Testing:**

- Each provider/sandbox implementation has dedicated test suite
- Mock external dependencies (APIs, Docker, etc.) for isolated testing
- Property-based testing for core interfaces
- Error condition and edge case coverage

**Integration Testing:**

- End-to-end task execution workflows
- Multi-provider failover scenarios
- Agent-to-WHOOSH communication validation
- Concurrent task execution under load

**Security Testing:**

- Sandbox escape prevention validation
- Resource exhaustion protection
- Network isolation verification
- Secrets and credential protection audits

### Deployment & Monitoring

**Configuration Management:**

- Environment-specific model configurations
- Sandbox resource limits per environment
- Provider API credentials via secure secret management
- Feature flags for gradual rollout

**Observability:**

- Task execution metrics (completion rate, duration, success/failure)
- Resource utilization tracking (CPU, memory, network per task)
- Error rate monitoring with alerting thresholds
- Performance dashboards for capacity planning

### Risk Mitigation

**Technical Risks:**

- **Provider Outages**: Multi-provider failover with health checks
- **Resource Exhaustion**: Strict limits with monitoring and auto-scaling
- **Execution Failures**: Retry mechanisms with exponential backoff
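
A retry mechanism with exponential backoff, as named in the last item, might be sketched as below. `Retry` and `DemoRetry` are hypothetical helpers; the delay doubles after each failed attempt, the wait is abandoned if the context is cancelled, and jitter is omitted for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Retry calls op up to attempts times, sleeping base, 2*base, 4*base, ...
// between failed attempts. It stops early if ctx is cancelled during a wait.
func Retry(ctx context.Context, attempts int, base time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	delay := base
	var err error
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i == attempts {
			break // no point waiting after the final attempt
		}
		select {
		case <-time.After(delay):
			delay *= 2
		case <-ctx.Done():
			return errors.Join(ctx.Err(), err)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// DemoRetry runs an operation that fails the first `failures` times,
// allowing at most four attempts with a 1 ms base delay.
func DemoRetry(failures int) (calls int, err error) {
	err = Retry(context.Background(), 4, time.Millisecond, func() error {
		calls++
		if calls <= failures {
			return errors.New("transient failure")
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := DemoRetry(2)
	fmt.Println(calls, err) // succeeds on the third call
}
```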

**Security Risks:**

- **Sandbox Escapes**: Multiple isolation layers and regular security audits
- **Credential Exposure**: Secure rotation and least-privilege access
- **Data Exfiltration**: Network isolation and egress monitoring

**Integration Risks:**

- **API Changes**: Provider abstraction with versioning support
- **Performance Degradation**: Comprehensive benchmarking at each phase
- **Compatibility Issues**: Extensive integration testing with existing systems