# CHORUS Task Execution Engine Development Plan

## Overview

This plan outlines the development of a comprehensive task execution engine for CHORUS agents, replacing the current mock implementation with a fully functional system that can execute real work according to agent roles and specializations.

## Current State Analysis

### What's Implemented ✅

- **Task Coordinator Framework** (`coordinator/task_coordinator.go`): Full task management lifecycle with role-based assignment, collaboration requests, and HMMM integration
- **Agent Role System**: Role announcements, capability broadcasting, and expertise matching
- **P2P Infrastructure**: Nodes can discover each other and communicate via pubsub
- **Health Monitoring**: Comprehensive health checks and graceful shutdown

### Critical Gaps Identified ❌

- **Task Execution Engine**: `executeTask()` only has a 10-second sleep simulation - no actual work performed
- **Repository Integration**: Mock providers only - no real GitHub/GitLab task pulling
- **Agent-to-Task Binding**: Task discovery relies on WHOOSH but agents don't connect to real work
- **Role-Based Execution**: Agents announce roles but don't execute tasks according to their specialization
- **AI Integration**: No LLM/reasoning integration for task completion

## Architecture Requirements

### Model and Provider Abstraction

The execution engine must support multiple AI model providers and execution environments:

**Model Provider Types:**
- **Local Ollama**: Default for most roles (llama3.1:8b, codellama, etc.)
- **OpenAI API**: For specialized models (chatgpt-5, gpt-4o, etc.)
- **ResetData API**: For testing and fallback (llama3.1:8b via LaaS)
- **Custom Endpoints**: Support for other provider APIs

**Role-Model Mapping:**
- Each role has a default model configuration
- Specialized roles may require specific models/providers
- Model selection transparent to execution logic
- Support for MCP calls and tool usage regardless of provider

### Execution Environment Abstraction

Tasks must execute in secure, isolated environments while maintaining transparency:

**Sandbox Types:**
- **Docker Containers**: Isolated execution environment per task
- **Specialized VMs**: For tasks requiring full OS isolation
- **Process Sandboxing**: Lightweight isolation for simple tasks

**Transparency Requirements:**
- Model perceives it's working on a local repository
- Development tools available within sandbox
- File system operations work normally from model's perspective
- Network access controlled but transparent
- Resource limits enforced but invisible

## Development Plan

### Phase 1: Model Provider Abstraction Layer

#### 1.1 Create Provider Interface
```go
// pkg/ai/provider.go
type ModelProvider interface {
    ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
    SupportsMCP() bool
    SupportsTools() bool
    GetCapabilities() []string
}
```
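
The interface above references `TaskRequest` and `TaskResponse` types that the plan does not define yet. A minimal sketch of what they could carry is shown below; the field names are illustrative assumptions, not a final API.

```go
// pkg/ai/types.go (sketch; field names are assumptions)
package ai

import "time"

// TaskRequest bundles everything a provider needs to work on one task.
type TaskRequest struct {
	TaskID      string            // identifier of the repository task
	Role        string            // agent role, e.g. "developer", "reviewer"
	Prompt      string            // rendered task description / instructions
	Repository  string            // repository the task belongs to
	WorkingDir  string            // path to the mounted repo inside the sandbox
	Context     map[string]string // extra metadata (labels, priority, etc.)
	EnableTools bool              // whether MCP/tool calls are allowed
	Timeout     time.Duration
}

// TaskResponse is the provider's result for a single task execution.
type TaskResponse struct {
	Success   bool
	Output    string   // model output / summary of work performed
	Artifacts []string // paths to files produced or modified
	TokensIn  int
	TokensOut int
	Duration  time.Duration
}
```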

#### 1.2 Implement Provider Types
- **OllamaProvider**: Local model execution
- **OpenAIProvider**: OpenAI API integration
- **ResetDataProvider**: ResetData LaaS integration
- **ProviderFactory**: Creates appropriate provider based on model config
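
A minimal sketch of how the factory could select a provider from the role configuration in 1.3; the `ProviderFactory` name comes from the plan, while the constructor names and the `RoleConfig` struct are assumptions.

```go
// pkg/ai/factory.go (sketch)
package ai

import "fmt"

// RoleConfig mirrors one entry of the role-model mapping shown in 1.3.
type RoleConfig struct {
	DefaultModel     string `yaml:"default_model"`
	Provider         string `yaml:"provider"`
	FallbackModel    string `yaml:"fallback_model"`
	FallbackProvider string `yaml:"fallback_provider"`
}

// ProviderFactory creates the appropriate ModelProvider for a role.
type ProviderFactory struct {
	Roles map[string]RoleConfig
}

// ProviderForRole resolves the primary provider for a role; callers can fall
// back to FallbackProvider/FallbackModel when execution fails.
func (f *ProviderFactory) ProviderForRole(role string) (ModelProvider, error) {
	cfg, ok := f.Roles[role]
	if !ok {
		return nil, fmt.Errorf("no model configuration for role %q", role)
	}
	switch cfg.Provider {
	case "ollama":
		return NewOllamaProvider(cfg.DefaultModel), nil // assumed constructor
	case "openai":
		return NewOpenAIProvider(cfg.DefaultModel), nil // assumed constructor
	case "resetdata":
		return NewResetDataProvider(cfg.DefaultModel), nil // assumed constructor
	default:
		return nil, fmt.Errorf("unknown provider %q for role %q", cfg.Provider, role)
	}
}
```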

#### 1.3 Role-Model Configuration
```yaml
# Config structure for role-model mapping
roles:
  developer:
    default_model: "codellama:13b"
    provider: "ollama"
    fallback_model: "llama3.1:8b"
    fallback_provider: "resetdata"

  architect:
    default_model: "gpt-4o"
    provider: "openai"
    fallback_model: "llama3.1:8b"
    fallback_provider: "ollama"
```
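
Loading this mapping can be a straightforward unmarshal into the factory from 1.2. A sketch, assuming `gopkg.in/yaml.v3` and the hypothetical `RoleConfig` struct above:

```go
// pkg/ai/config.go (sketch)
package ai

import (
	"os"

	"gopkg.in/yaml.v3"
)

type modelsFile struct {
	Roles map[string]RoleConfig `yaml:"roles"`
}

// LoadRoleConfig reads configs/models.yaml and returns a ready factory.
func LoadRoleConfig(path string) (*ProviderFactory, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var f modelsFile
	if err := yaml.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	return &ProviderFactory{Roles: f.Roles}, nil
}
```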

### Phase 2: Execution Environment Abstraction

#### 2.1 Create Sandbox Interface
```go
// pkg/execution/sandbox.go
type ExecutionSandbox interface {
    Initialize(ctx context.Context, config *SandboxConfig) error
    ExecuteCommand(ctx context.Context, cmd *Command) (*CommandResult, error)
    CopyFiles(ctx context.Context, source, dest string) error
    Cleanup() error
}
```

#### 2.2 Implement Sandbox Types
- **DockerSandbox**: Container-based isolation
- **VMSandbox**: Full VM isolation for sensitive tasks
- **ProcessSandbox**: Lightweight process-based isolation
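
A sketch of what `DockerSandbox` could look like if it shells out to the `docker` CLI rather than the Engine SDK (an assumption; the plan does not prescribe either). The fields on `SandboxConfig`, `Command`, and `CommandResult` are also assumptions, and `CopyFiles` is omitted for brevity.

```go
// pkg/execution/docker.go (sketch; resource limits enforced via docker flags)
package execution

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"strings"
)

type DockerSandbox struct {
	containerID string
}

// Initialize starts a long-lived container with the repository mounted at /workspace.
func (d *DockerSandbox) Initialize(ctx context.Context, cfg *SandboxConfig) error {
	args := []string{
		"run", "-d", "--rm",
		"--memory", cfg.MemoryLimit, // e.g. "2g" (assumed field)
		"--cpus", cfg.CPULimit,      // e.g. "1.5" (assumed field)
		"--network", "none",         // no network unless the task needs it
		"-v", cfg.RepoPath + ":/workspace",
		"-w", "/workspace",
		cfg.Image, "sleep", "infinity",
	}
	out, err := exec.CommandContext(ctx, "docker", args...).Output()
	if err != nil {
		return fmt.Errorf("starting sandbox container: %w", err)
	}
	d.containerID = strings.TrimSpace(string(out))
	return nil
}

// ExecuteCommand runs a command inside the container and captures its output.
func (d *DockerSandbox) ExecuteCommand(ctx context.Context, cmd *Command) (*CommandResult, error) {
	args := append([]string{"exec", d.containerID, cmd.Name}, cmd.Args...)
	var stdout, stderr bytes.Buffer
	c := exec.CommandContext(ctx, "docker", args...)
	c.Stdout, c.Stderr = &stdout, &stderr
	err := c.Run()
	exitCode := 0
	if exitErr, ok := err.(*exec.ExitError); ok {
		exitCode = exitErr.ExitCode()
		err = nil // a non-zero exit is a result, not a transport error
	}
	return &CommandResult{
		Stdout:   stdout.String(),
		Stderr:   stderr.String(),
		ExitCode: exitCode,
	}, err
}

// Cleanup removes the container; --rm handles most of this already.
func (d *DockerSandbox) Cleanup() error {
	return exec.Command("docker", "rm", "-f", d.containerID).Run()
}
```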

#### 2.3 Repository Mounting
- Clone repository into sandbox environment
- Mount as local filesystem from model's perspective
- Implement secure file I/O operations
- Handle git operations within sandbox
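
One way the repository preparation could work, sketched as a host-side shallow clone into a temporary directory that then backs the `-v` mount from the sandbox sketch above. Credential handling and branch strategy are omitted; the function name is an assumption.

```go
// pkg/execution/repo.go (sketch)
package execution

import (
	"context"
	"fmt"
	"os"
	"os/exec"
)

// PrepareRepository clones the task's repository into a temporary directory
// on the host; the returned path is later bind-mounted into the sandbox at /workspace.
func PrepareRepository(ctx context.Context, cloneURL, branch string) (string, error) {
	dir, err := os.MkdirTemp("", "chorus-task-*")
	if err != nil {
		return "", err
	}
	cmd := exec.CommandContext(ctx, "git", "clone", "--depth", "1",
		"--branch", branch, cloneURL, dir)
	if out, err := cmd.CombinedOutput(); err != nil {
		os.RemoveAll(dir)
		return "", fmt.Errorf("git clone failed: %v: %s", err, out)
	}
	return dir, nil
}
```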

### Phase 3: Core Task Execution Engine

#### 3.1 Replace Mock Implementation
Replace the current simulation in `coordinator/task_coordinator.go:314`:

```go
// Current mock implementation
time.Sleep(10 * time.Second) // Simulate work

// New implementation
result, err := tc.executionEngine.ExecuteTask(ctx, &TaskExecutionRequest{
    Task: activeTask.Task,
    Agent: tc.agentInfo,
    Sandbox: sandboxConfig,
    ModelProvider: providerConfig,
})
```

#### 3.2 Task Execution Strategies
Create role-specific execution patterns:

- **DeveloperStrategy**: Code implementation, bug fixes, feature development
- **ReviewerStrategy**: Code review, quality analysis, test coverage assessment
- **ArchitectStrategy**: System design, technical decision making
- **TesterStrategy**: Test creation, validation, quality assurance
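
The strategies could share a small interface so the executor stays role-agnostic. A sketch follows; only the strategy names come from the plan, the interface shape and the placeholder types are assumptions.

```go
// pkg/strategies/strategy.go (sketch)
package strategies

import "context"

// ExecutionStrategy is implemented once per agent role.
type ExecutionStrategy interface {
	// Role returns the role this strategy serves, e.g. "developer".
	Role() string
	// Execute performs the role-specific work inside the prepared sandbox
	// and returns a summary plus any produced artifacts.
	Execute(ctx context.Context, task *TaskContext) (*StrategyResult, error)
}

// TaskContext and StrategyResult are illustrative placeholders.
type TaskContext struct {
	Title       string
	Description string
	WorkDir     string // mounted repository inside the sandbox
}

type StrategyResult struct {
	Summary   string
	Artifacts []string
}
```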

#### 3.3 Execution Workflow
1. **Task Analysis**: Parse task requirements and complexity
2. **Environment Setup**: Initialize appropriate sandbox
3. **Repository Preparation**: Clone and mount repository
4. **Model Selection**: Choose appropriate model/provider
5. **Task Execution**: Run role-specific execution strategy
6. **Result Validation**: Verify output quality and completeness
7. **Cleanup**: Tear down sandbox and collect artifacts
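
How these seven steps could hang together in `pkg/engine/executor.go`, sketched as one call that wires the earlier sketches (provider factory, sandbox, strategies) together. Every helper and field not named in the plan is an assumption.

```go
// pkg/engine/executor.go (sketch; helper methods are assumed, not final APIs)
package engine

import (
	"context"
	"fmt"
)

func (e *Engine) ExecuteTask(ctx context.Context, req *TaskExecutionRequest) (*TaskResult, error) {
	// 1. Task analysis: derive role, complexity, and prompt from the task.
	plan := e.analyzeTask(req.Task)

	// 2. Environment setup: pick and initialize a sandbox.
	sandbox, err := e.sandboxes.Create(ctx, plan.SandboxConfig)
	if err != nil {
		return nil, fmt.Errorf("environment setup: %w", err)
	}
	defer sandbox.Cleanup() // 7. Cleanup always runs.

	// 3. Repository preparation: clone and mount the task's repository.
	if err := e.prepareRepository(ctx, sandbox, req.Task); err != nil {
		return nil, fmt.Errorf("repository preparation: %w", err)
	}

	// 4. Model selection: resolve provider/model for the agent's role.
	provider, err := e.providers.ProviderForRole(req.Agent.Role)
	if err != nil {
		return nil, fmt.Errorf("model selection: %w", err)
	}

	// 5. Task execution: run the role-specific strategy.
	strategy, ok := e.strategies[req.Agent.Role]
	if !ok {
		return nil, fmt.Errorf("no execution strategy for role %q", req.Agent.Role)
	}
	result, err := strategy.Execute(ctx, plan.ToContext(provider, sandbox))
	if err != nil {
		return nil, fmt.Errorf("task execution: %w", err)
	}

	// 6. Result validation: verify quality before reporting success.
	if err := e.validate(ctx, sandbox, result); err != nil {
		return nil, fmt.Errorf("result validation: %w", err)
	}

	// 7. Cleanup happens via defer; collect artifacts before returning.
	return e.collectArtifacts(sandbox, result), nil
}
```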

### Phase 4: Repository Provider Implementation

#### 4.1 Real Repository Integration
Replace `MockTaskProvider` with actual implementations:
- **GiteaProvider**: Integration with the Gitea API
- **GitHubProvider**: GitHub API integration
- **GitLabProvider**: GitLab API integration

#### 4.2 Task Lifecycle Management
- Task claiming and status updates
- Progress reporting back to repositories
- Artifact attachment (patches, documentation, etc.)
- Automated PR/MR creation for completed tasks
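
As an illustration of the lifecycle calls, a hedged sketch of progress reporting against the Gitea REST API. The issue-comment route shown is Gitea's standard endpoint; the `GiteaProvider` struct, its fields, and the method name are assumptions.

```go
// pkg/providers/gitea.go (sketch, excerpt)
package providers

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

type GiteaProvider struct {
	BaseURL string // e.g. "https://gitea.example.com"
	Token   string
	Client  *http.Client
}

// ReportProgress posts a progress update as a comment on the task's issue.
func (g *GiteaProvider) ReportProgress(ctx context.Context, owner, repo string, issue int64, update string) error {
	url := fmt.Sprintf("%s/api/v1/repos/%s/%s/issues/%d/comments", g.BaseURL, owner, repo, issue)
	body, _ := json.Marshal(map[string]string{"body": update})
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "token "+g.Token)
	req.Header.Set("Content-Type", "application/json")

	client := g.Client
	if client == nil {
		client = http.DefaultClient
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("gitea returned %s", resp.Status)
	}
	return nil
}
```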

### Phase 5: AI Integration and Tool Support

#### 5.1 LLM Integration
- Context-aware task analysis based on repository content
- Code generation and problem-solving capabilities
- Natural language processing for task descriptions
- Multi-step reasoning for complex tasks

#### 5.2 Tool Integration
- MCP server connectivity within sandbox
- Development tool access (compilers, linters, formatters)
- Testing framework integration
- Documentation generation tools

#### 5.3 Quality Assurance
- Automated testing of generated code
- Code quality metrics and analysis
- Security vulnerability scanning
- Performance impact assessment

### Phase 6: Testing and Validation

#### 6.1 Unit Testing
- Provider abstraction layer testing
- Sandbox isolation verification
- Task execution strategy validation
- Error handling and recovery testing
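
For the provider abstraction layer, a table-driven test sketch that exercises the hypothetical `ProviderFactory` from the Phase 1 sketches:

```go
// pkg/ai/factory_test.go (sketch)
package ai

import "testing"

func TestProviderForRole(t *testing.T) {
	factory := &ProviderFactory{Roles: map[string]RoleConfig{
		"developer": {DefaultModel: "codellama:13b", Provider: "ollama"},
		"architect": {DefaultModel: "gpt-4o", Provider: "openai"},
	}}

	tests := []struct {
		role    string
		wantErr bool
	}{
		{"developer", false},
		{"architect", false},
		{"unknown-role", true}, // unmapped roles must fail loudly
	}
	for _, tt := range tests {
		t.Run(tt.role, func(t *testing.T) {
			_, err := factory.ProviderForRole(tt.role)
			if (err != nil) != tt.wantErr {
				t.Fatalf("ProviderForRole(%q) error = %v, wantErr %v", tt.role, err, tt.wantErr)
			}
		})
	}
}
```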

#### 6.2 Integration Testing
- End-to-end task execution workflows
- Agent-to-WHOOSH communication testing
- Multi-provider failover scenarios
- Concurrent task execution testing

#### 6.3 Security Testing
- Sandbox escape prevention
- Resource limit enforcement
- Network isolation validation
- Secrets and credential protection

### Phase 7: Production Deployment

#### 7.1 Configuration Management
- Environment-specific model configurations
- Sandbox resource limit definitions
- Provider API key management
- Monitoring and logging setup

#### 7.2 Monitoring and Observability
- Task execution metrics and dashboards
- Performance monitoring and alerting
- Resource utilization tracking
- Error rate and success metrics

## Implementation Priorities

### Critical Path (Week 1-2)
1. Model Provider Abstraction Layer
2. Basic Docker Sandbox Implementation
3. Replace Mock Task Execution
4. Role-Based Execution Strategies

### High Priority (Week 3-4)
5. Real Repository Provider Implementation
6. AI Integration with Ollama/OpenAI
7. MCP Tool Integration
8. Basic Testing Framework

### Medium Priority (Week 5-6)
9. Advanced Sandbox Types (VM, Process)
10. Quality Assurance Pipeline
11. Comprehensive Testing Suite
12. Performance Optimization

### Future Enhancements
- Multi-language model support
- Advanced reasoning capabilities
- Distributed task execution
- Machine learning model fine-tuning

## Success Metrics

- **Task Completion Rate**: >90% of assigned tasks successfully completed
- **Code Quality**: Generated code passes all existing tests and linting
- **Security**: Zero sandbox escapes or security violations
- **Performance**: Task execution time within acceptable bounds
- **Reliability**: <5% execution failure rate due to engine issues

## Risk Mitigation

### Security Risks
- Sandbox escape → Multiple isolation layers, security audits
- Credential exposure → Secure credential management, rotation
- Resource exhaustion → Resource limits, monitoring, auto-scaling

### Technical Risks
- Model provider outages → Multi-provider failover, local fallbacks
- Execution failures → Robust error handling, retry mechanisms
- Performance bottlenecks → Profiling, optimization, horizontal scaling

### Integration Risks
- WHOOSH compatibility → Extensive integration testing, versioning
- Repository provider changes → Provider abstraction, API versioning
- Model compatibility → Provider abstraction, capability detection

This plan addresses the core limitation identified above: CHORUS agents currently lack real task execution capabilities. It builds toward a robust, secure, and scalable execution engine suitable for production deployment.

## Implementation Roadmap

### Development Standards & Workflow

**Semantic Versioning Strategy:**
- **Patch (0.N.X)**: Bug fixes, small improvements, documentation updates
- **Minor (0.N.0)**: New features, phase completions, non-breaking changes
- **Major (N.0.0)**: Breaking changes, major architectural shifts

**Git Workflow:**
1. **Branch Creation**: `git checkout -b feature/phase-N-description`
2. **Development**: Implement with frequent commits using conventional commit format
3. **Testing**: Run full test suite with `make test` before PR
4. **Code Review**: Create PR with detailed description and test results
5. **Integration**: Squash merge to main after approval
6. **Release**: Tag with `git tag v0.N.0` and update Makefile version

**Quality Gates:**
Each phase must meet these criteria before merge:
- ✅ Unit tests with >80% coverage
- ✅ Integration tests for external dependencies
- ✅ Security review for new attack surfaces
- ✅ Performance benchmarks within acceptable bounds
- ✅ Documentation updates (code comments + README)
- ✅ Backward compatibility verification

### Phase-by-Phase Implementation

#### Phase 1: Model Provider Abstraction (v0.2.0)
**Branch:** `feature/phase-1-model-providers`
**Duration:** 3-5 days
**Deliverables:**
```
pkg/ai/
├── provider.go        # Core provider interface & request/response types
├── ollama.go          # Local Ollama model integration
├── openai.go          # OpenAI API client wrapper
├── resetdata.go       # ResetData LaaS integration
├── factory.go         # Provider factory with auto-selection
└── provider_test.go   # Comprehensive provider tests

configs/
└── models.yaml        # Role-model mapping configuration
```

**Key Features:**
- Abstract AI providers behind unified interface
- Support multiple providers with automatic failover
- Configuration-driven model selection per agent role
- Proper error handling and retry logic

#### Phase 2: Execution Environment Abstraction (v0.3.0)
**Branch:** `feature/phase-2-execution-sandbox`
**Duration:** 5-7 days
**Deliverables:**
```
pkg/execution/
├── sandbox.go         # Core sandbox interface & types
├── docker.go          # Docker container implementation
├── security.go        # Security policies & enforcement
├── resources.go       # Resource monitoring & limits
└── sandbox_test.go    # Sandbox security & isolation tests
```

**Key Features:**
- Docker-based task isolation with transparent repository access
- Resource limits (CPU, memory, network, disk) with monitoring
- Security boundary enforcement and escape prevention
- Clean teardown and artifact collection

#### Phase 3: Core Task Execution Engine (v0.4.0)
**Branch:** `feature/phase-3-task-execution`
**Duration:** 7-10 days
**Modified Files:**
- `coordinator/task_coordinator.go:314` - Replace mock with real execution
- `pkg/repository/types.go` - Extend interfaces for execution context

**New Files:**
```
pkg/strategies/
├── developer.go       # Code implementation & bug fixes
├── reviewer.go        # Code review & quality analysis
├── architect.go       # System design & tech decisions
└── tester.go          # Test creation & validation

pkg/engine/
├── executor.go        # Main execution orchestrator
├── workflow.go        # 7-step execution workflow
└── validation.go      # Result quality verification
```

**Key Features:**
- Real task execution replacing 10-second sleep simulation
- Role-specific execution strategies with appropriate tooling
- Integration between AI providers, sandboxes, and task lifecycle
- Comprehensive result validation and quality metrics

#### Phase 4: Repository Provider Implementation (v0.5.0)
**Branch:** `feature/phase-4-real-providers`
**Duration:** 10-14 days
**Deliverables:**
```
pkg/providers/
├── gitea.go           # Gitea API integration (primary)
├── github.go          # GitHub API integration
├── gitlab.go          # GitLab API integration
└── provider_test.go   # API integration tests
```

**Key Features:**
- Replace MockTaskProvider with production implementations
- Task claiming, status updates, and progress reporting via APIs
- Automated PR/MR creation with proper branch management
- Repository-specific configuration and credential management

### Testing Strategy

**Unit Testing:**
- Each provider/sandbox implementation has a dedicated test suite
- Mock external dependencies (APIs, Docker, etc.) for isolated testing
- Property-based testing for core interfaces
- Error condition and edge case coverage

**Integration Testing:**
- End-to-end task execution workflows
- Multi-provider failover scenarios
- Agent-to-WHOOSH communication validation
- Concurrent task execution under load

**Security Testing:**
- Sandbox escape prevention validation
- Resource exhaustion protection
- Network isolation verification
- Secrets and credential protection audits

### Deployment & Monitoring

**Configuration Management:**
- Environment-specific model configurations
- Sandbox resource limits per environment
- Provider API credentials via secure secret management
- Feature flags for gradual rollout

**Observability:**
- Task execution metrics (completion rate, duration, success/failure)
- Resource utilization tracking (CPU, memory, network per task)
- Error rate monitoring with alerting thresholds
- Performance dashboards for capacity planning
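
If Prometheus is the metrics backend (an assumption; the plan does not name one), the execution metrics could be registered roughly like this. Metric names are illustrative only.

```go
// pkg/engine/metrics.go (sketch, assumes Prometheus via client_golang)
package engine

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Completed vs. failed tasks, labelled by agent role and outcome.
	tasksTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "chorus_task_executions_total",
		Help: "Task executions by role and outcome.",
	}, []string{"role", "outcome"})

	// Wall-clock execution time per task, labelled by role.
	taskDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "chorus_task_execution_seconds",
		Help:    "Task execution duration in seconds.",
		Buckets: prometheus.ExponentialBuckets(1, 2, 12), // 1s up to ~34m
	}, []string{"role"})
)

// observe records one finished execution; the executor calls it after step 7.
func observe(role, outcome string, seconds float64) {
	tasksTotal.WithLabelValues(role, outcome).Inc()
	taskDuration.WithLabelValues(role).Observe(seconds)
}
```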

### Risk Mitigation

**Technical Risks:**
- **Provider Outages**: Multi-provider failover with health checks
- **Resource Exhaustion**: Strict limits with monitoring and auto-scaling
- **Execution Failures**: Retry mechanisms with exponential backoff

**Security Risks:**
- **Sandbox Escapes**: Multiple isolation layers and regular security audits
- **Credential Exposure**: Secure rotation and least-privilege access
- **Data Exfiltration**: Network isolation and egress monitoring

**Integration Risks:**
- **API Changes**: Provider abstraction with versioning support
- **Performance Degradation**: Comprehensive benchmarking at each phase
- **Compatibility Issues**: Extensive integration testing with existing systems