CHORUS Task Execution Engine Development Plan
Overview
This plan outlines the development of a comprehensive task execution engine for CHORUS agents, replacing the current mock implementation with a fully functional system that can execute real work according to agent roles and specializations.
Current State Analysis
What's Implemented ✅
- Task Coordinator Framework (coordinator/task_coordinator.go): Full task management lifecycle with role-based assignment, collaboration requests, and HMMM integration
- Agent Role System: Role announcements, capability broadcasting, and expertise matching
- P2P Infrastructure: Nodes can discover each other and communicate via pubsub
- Health Monitoring: Comprehensive health checks and graceful shutdown
Critical Gaps Identified ❌
- Task Execution Engine: executeTask() only has a 10-second sleep simulation - no actual work performed
- Repository Integration: Mock providers only - no real GitHub/GitLab task pulling
- Agent-to-Task Binding: Task discovery relies on WHOOSH but agents don't connect to real work
- Role-Based Execution: Agents announce roles but don't execute tasks according to their specialization
- AI Integration: No LLM/reasoning integration for task completion
Architecture Requirements
Model and Provider Abstraction
The execution engine must support multiple AI model providers and execution environments:
Model Provider Types:
- Local Ollama: Default for most roles (llama3.1:8b, codellama, etc.)
- OpenAI API: For specialized models (chatgpt-5, gpt-4o, etc.)
- ResetData API: For testing and fallback (llama3.1:8b via LaaS)
- Custom Endpoints: Support for other provider APIs
Role-Model Mapping:
- Each role has a default model configuration
- Specialized roles may require specific models/providers
- Model selection transparent to execution logic
- Support for MCP calls and tool usage regardless of provider
Execution Environment Abstraction
Tasks must execute in secure, isolated environments while maintaining transparency:
Sandbox Types:
- Docker Containers: Isolated execution environment per task
- Specialized VMs: For tasks requiring full OS isolation
- Process Sandboxing: Lightweight isolation for simple tasks
Transparency Requirements:
- Model perceives it's working on a local repository
- Development tools available within sandbox
- File system operations work normally from model's perspective
- Network access controlled but transparent
- Resource limits enforced but invisible
Development Plan
Phase 1: Model Provider Abstraction Layer
1.1 Create Provider Interface
// pkg/ai/provider.go
type ModelProvider interface {
    // ExecuteTask runs a single task against the underlying model and
    // returns the structured result.
    ExecuteTask(ctx context.Context, request *TaskRequest) (*TaskResponse, error)
    SupportsMCP() bool         // whether the provider can route MCP calls
    SupportsTools() bool       // whether the provider supports tool/function calling
    GetCapabilities() []string // capability tags used for role-model matching
}
1.2 Implement Provider Types
- OllamaProvider: Local model execution
- OpenAIProvider: OpenAI API integration
- ResetDataProvider: ResetData LaaS integration
- ProviderFactory: Creates appropriate provider based on model config
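As an illustration, the factory could switch on the configured provider name; a minimal sketch, assuming a ProviderConfig struct and per-provider constructors (all names hypothetical until Phase 1 lands):

// pkg/ai/factory.go - sketch; ProviderConfig and the constructors below
// are assumed names, not an existing API.
package ai

import "fmt"

type ProviderConfig struct {
    Provider string // "ollama", "openai", or "resetdata"
    Model    string
    Endpoint string
    APIKey   string
}

// NewProvider returns the ModelProvider matching the configured backend.
func NewProvider(cfg ProviderConfig) (ModelProvider, error) {
    switch cfg.Provider {
    case "ollama":
        return NewOllamaProvider(cfg.Endpoint, cfg.Model), nil
    case "openai":
        return NewOpenAIProvider(cfg.APIKey, cfg.Model), nil
    case "resetdata":
        return NewResetDataProvider(cfg.Endpoint, cfg.APIKey, cfg.Model), nil
    default:
        return nil, fmt.Errorf("unknown provider: %q", cfg.Provider)
    }
}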
1.3 Role-Model Configuration
# Config structure for role-model mapping
roles:
  developer:
    default_model: "codellama:13b"
    provider: "ollama"
    fallback_model: "llama3.1:8b"
    fallback_provider: "resetdata"
  architect:
    default_model: "gpt-4o"
    provider: "openai"
    fallback_model: "llama3.1:8b"
    fallback_provider: "ollama"
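Loading this file into Go is straightforward with gopkg.in/yaml.v3; a minimal sketch (struct and field names are illustrative):

// pkg/ai/config.go - illustrative loader for configs/models.yaml.
package ai

import (
    "os"

    "gopkg.in/yaml.v3"
)

type RoleModelConfig struct {
    DefaultModel     string `yaml:"default_model"`
    Provider         string `yaml:"provider"`
    FallbackModel    string `yaml:"fallback_model"`
    FallbackProvider string `yaml:"fallback_provider"`
}

type ModelsConfig struct {
    Roles map[string]RoleModelConfig `yaml:"roles"`
}

func LoadModelsConfig(path string) (*ModelsConfig, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    var cfg ModelsConfig
    if err := yaml.Unmarshal(data, &cfg); err != nil {
        return nil, err
    }
    return &cfg, nil
}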
Phase 2: Execution Environment Abstraction
2.1 Create Sandbox Interface
// pkg/execution/sandbox.go
type ExecutionSandbox interface {
    // Initialize provisions the sandbox (container, VM, or process) per config.
    Initialize(ctx context.Context, config *SandboxConfig) error
    // ExecuteCommand runs a command inside the sandbox and captures its output.
    ExecuteCommand(ctx context.Context, cmd *Command) (*CommandResult, error)
    // CopyFiles moves files across the host/sandbox boundary.
    CopyFiles(ctx context.Context, source, dest string) error
    // Cleanup tears down the sandbox and releases its resources.
    Cleanup() error
}
2.2 Implement Sandbox Types
- DockerSandbox: Container-based isolation
- VMSandbox: Full VM isolation for sensitive tasks
- ProcessSandbox: Lightweight process-based isolation
2.3 Repository Mounting
- Clone repository into sandbox environment
- Mount as local filesystem from model's perspective
- Implement secure file I/O operations
- Handle git operations within sandbox
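A rough sketch of the clone-and-mount flow, shelling out to git and the Docker CLI via os/exec (the image name, mount path, and resource limits are placeholders; a production implementation would more likely use the Docker SDK):

// pkg/execution/mount.go - sketch of repository preparation for a Docker
// sandbox; paths and limits below are placeholders.
package execution

import (
    "context"
    "fmt"
    "os"
    "os/exec"
)

// PrepareAndRun clones repoURL to a host temp dir, then starts a container
// with the clone bind-mounted at /workspace so the model sees a local repo.
func PrepareAndRun(ctx context.Context, repoURL, image string) (string, error) {
    dir, err := os.MkdirTemp("", "chorus-task-*")
    if err != nil {
        return "", err
    }
    if out, err := exec.CommandContext(ctx, "git", "clone", repoURL, dir).CombinedOutput(); err != nil {
        return "", fmt.Errorf("clone failed: %v: %s", err, out)
    }
    // Resource limits are enforced by Docker but invisible from inside.
    out, err := exec.CommandContext(ctx, "docker", "run", "-d",
        "--memory", "2g", "--cpus", "2",
        "-v", dir+":/workspace", "-w", "/workspace",
        image, "sleep", "infinity").Output()
    if err != nil {
        return "", fmt.Errorf("container start failed: %w", err)
    }
    return string(out), nil // container ID for later ExecuteCommand calls
}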
Phase 3: Core Task Execution Engine
3.1 Replace Mock Implementation
Replace the current simulation in coordinator/task_coordinator.go:314:
// Current mock implementation
time.Sleep(10 * time.Second) // Simulate work

// New implementation
result, err := tc.executionEngine.ExecuteTask(ctx, &TaskExecutionRequest{
    Task:          activeTask.Task,
    Agent:         tc.agentInfo,
    Sandbox:       sandboxConfig,
    ModelProvider: providerConfig,
})
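The request type referenced above does not exist yet; a sketch of its likely shape, with field types assumed from the existing coordinator structures:

// TaskExecutionRequest bundles everything the engine needs for one task.
// Field types are illustrative and should align with the coordinator and
// repository packages.
type TaskExecutionRequest struct {
    Task          *repository.Task   // the task pulled from the provider
    Agent         *AgentInfo         // role, capabilities, identity
    Sandbox       *SandboxConfig     // isolation type and resource limits
    ModelProvider *ai.ProviderConfig // model/provider selection for this role
}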
3.2 Task Execution Strategies
Create role-specific execution patterns:
- DeveloperStrategy: Code implementation, bug fixes, feature development
- ReviewerStrategy: Code review, quality analysis, test coverage assessment
- ArchitectStrategy: System design, technical decision making
- TesterStrategy: Test creation, validation, quality assurance
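These strategies can share a single interface and register themselves by role; a minimal sketch using a registration pattern (all names hypothetical):

// pkg/strategies/strategy.go - sketch; ExecutionContext and Result are
// placeholders for types carrying the sandbox handle, model provider,
// and task analysis.
package strategies

import "context"

type ExecutionContext struct{}
type Result struct{}

type ExecutionStrategy interface {
    // Execute performs role-specific work inside the prepared sandbox.
    Execute(ctx context.Context, ec *ExecutionContext) (*Result, error)
}

// registry is populated by each strategy file (developer.go, reviewer.go,
// architect.go, tester.go) from an init function.
var registry = map[string]ExecutionStrategy{}

func Register(role string, s ExecutionStrategy) { registry[role] = s }

// ForRole returns the strategy for an announced agent role, if any.
func ForRole(role string) (ExecutionStrategy, bool) {
    s, ok := registry[role]
    return s, ok
}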
3.3 Execution Workflow
- Task Analysis: Parse task requirements and complexity
- Environment Setup: Initialize appropriate sandbox
- Repository Preparation: Clone and mount repository
- Model Selection: Choose appropriate model/provider
- Task Execution: Run role-specific execution strategy
- Result Validation: Verify output quality and completeness
- Cleanup: Teardown sandbox and collect artifacts
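These seven steps map naturally onto a linear orchestrator; a condensed sketch (helper methods are hypothetical and error handling is abbreviated):

// pkg/engine/workflow.go - condensed sketch of the 7-step workflow; every
// helper called here is an assumed name, not an existing API.
func (e *Engine) ExecuteTask(ctx context.Context, req *TaskExecutionRequest) (*Result, error) {
    analysis, err := e.analyzeTask(ctx, req.Task) // 1. Task Analysis
    if err != nil {
        return nil, err
    }
    sandbox, err := e.newSandbox(ctx, req.Sandbox) // 2. Environment Setup
    if err != nil {
        return nil, err
    }
    defer sandbox.Cleanup() // 7. Cleanup (artifacts collected beforehand)
    if err := e.prepareRepo(ctx, sandbox, req.Task); err != nil { // 3. Repository Preparation
        return nil, err
    }
    provider, err := ai.NewProvider(*req.ModelProvider) // 4. Model Selection
    if err != nil {
        return nil, err
    }
    strategy, ok := strategies.ForRole(req.Agent.Role) // 5. Task Execution
    if !ok {
        return nil, fmt.Errorf("no strategy for role %q", req.Agent.Role)
    }
    result, err := strategy.Execute(ctx, e.newExecContext(sandbox, provider, analysis))
    if err != nil {
        return nil, err
    }
    return result, e.validate(ctx, result) // 6. Result Validation
}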
Phase 4: Repository Provider Implementation
4.1 Real Repository Integration
Replace MockTaskProvider with actual implementations:
- GiteaProvider: Integration with the Gitea API
- GitHubProvider: GitHub API integration
- GitLabProvider: GitLab API integration
4.2 Task Lifecycle Management
- Task claiming and status updates
- Progress reporting back to repositories
- Artifact attachment (patches, documentation, etc.)
- Automated PR/MR creation for completed tasks
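As one concrete example, progress reporting to Gitea can post an issue comment through its REST API; a sketch assuming a GiteaProvider with BaseURL and Token fields (the endpoint follows the Gitea v1 API):

// pkg/providers/gitea.go - sketch of posting a progress comment.
package providers

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"
)

type GiteaProvider struct {
    BaseURL string // e.g. https://gitea.example.com
    Token   string // API token with issue write scope
}

func (g *GiteaProvider) ReportProgress(ctx context.Context, owner, repo string, issue int, msg string) error {
    body, _ := json.Marshal(map[string]string{"body": msg})
    url := fmt.Sprintf("%s/api/v1/repos/%s/%s/issues/%d/comments", g.BaseURL, owner, repo, issue)
    req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
    if err != nil {
        return err
    }
    req.Header.Set("Authorization", "token "+g.Token)
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusCreated {
        return fmt.Errorf("gitea comment failed: %s", resp.Status)
    }
    return nil
}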
Phase 5: AI Integration and Tool Support
5.1 LLM Integration
- Context-aware task analysis based on repository content
- Code generation and problem-solving capabilities
- Natural language processing for task descriptions
- Multi-step reasoning for complex tasks
5.2 Tool Integration
- MCP server connectivity within sandbox
- Development tool access (compilers, linters, formatters)
- Testing framework integration
- Documentation generation tools
5.3 Quality Assurance
- Automated testing of generated code
- Code quality metrics and analysis
- Security vulnerability scanning
- Performance impact assessment
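Inside the sandbox, these checks reduce to running the project's own toolchain and gating on exit codes; a sketch for a Go project (the Command and CommandResult fields are assumed names from the Phase 2 interface):

// pkg/engine/validation.go - sketch of post-execution quality checks run
// through the sandbox; the command list is illustrative for a Go project.
func (e *Engine) runQualityChecks(ctx context.Context, sb execution.ExecutionSandbox) error {
    checks := []*execution.Command{
        {Name: "go", Args: []string{"vet", "./..."}},
        {Name: "go", Args: []string{"test", "./..."}},
    }
    for _, cmd := range checks {
        res, err := sb.ExecuteCommand(ctx, cmd)
        if err != nil {
            return err
        }
        if res.ExitCode != 0 {
            return fmt.Errorf("%s %v failed:\n%s", cmd.Name, cmd.Args, res.Stderr)
        }
    }
    return nil
}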
Phase 6: Testing and Validation
6.1 Unit Testing
- Provider abstraction layer testing
- Sandbox isolation verification
- Task execution strategy validation
- Error handling and recovery testing
6.2 Integration Testing
- End-to-end task execution workflows
- Agent-to-WHOOSH communication testing
- Multi-provider failover scenarios
- Concurrent task execution testing
6.3 Security Testing
- Sandbox escape prevention
- Resource limit enforcement
- Network isolation validation
- Secrets and credential protection
Phase 7: Production Deployment
7.1 Configuration Management
- Environment-specific model configurations
- Sandbox resource limit definitions
- Provider API key management
- Monitoring and logging setup
7.2 Monitoring and Observability
- Task execution metrics and dashboards
- Performance monitoring and alerting
- Resource utilization tracking
- Error rate and success metrics
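With prometheus/client_golang, the core execution metrics might be declared as follows (metric names are suggestions):

// pkg/engine/metrics.go - sketch of execution metrics via Prometheus.
package engine

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    tasksTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "chorus_tasks_total",
        Help: "Tasks executed, labeled by role and outcome.",
    }, []string{"role", "outcome"})

    taskDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name:    "chorus_task_duration_seconds",
        Help:    "Wall-clock task execution time.",
        Buckets: prometheus.ExponentialBuckets(1, 2, 12), // 1s up to ~68m
    }, []string{"role"})
)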
Implementation Priorities
Critical Path (Week 1-2)
- Model Provider Abstraction Layer
- Basic Docker Sandbox Implementation
- Replace Mock Task Execution
- Role-Based Execution Strategies
High Priority (Week 3-4)
- Real Repository Provider Implementation
- AI Integration with Ollama/OpenAI
- MCP Tool Integration
- Basic Testing Framework
Medium Priority (Week 5-6)
- Advanced Sandbox Types (VM, Process)
- Quality Assurance Pipeline
- Comprehensive Testing Suite
- Performance Optimization
Future Enhancements
- Multi-language model support
- Advanced reasoning capabilities
- Distributed task execution
- Machine learning model fine-tuning
Success Metrics
- Task Completion Rate: >90% of assigned tasks successfully completed
- Code Quality: Generated code passes all existing tests and linting
- Security: Zero sandbox escapes or security violations
- Performance: Task execution time within acceptable bounds
- Reliability: <5% execution failure rate due to engine issues
Risk Mitigation
Security Risks
- Sandbox escape → Multiple isolation layers, security audits
- Credential exposure → Secure credential management, rotation
- Resource exhaustion → Resource limits, monitoring, auto-scaling
Technical Risks
- Model provider outages → Multi-provider failover, local fallbacks
- Execution failures → Robust error handling, retry mechanisms
- Performance bottlenecks → Profiling, optimization, horizontal scaling
Integration Risks
- WHOOSH compatibility → Extensive integration testing, versioning
- Repository provider changes → Provider abstraction, API versioning
- Model compatibility → Provider abstraction, capability detection
This plan addresses the core limitation identified above: CHORUS agents currently lack real task execution capabilities. It charts a path to a robust, secure, and scalable execution engine suitable for production deployment.
Implementation Roadmap
Development Standards & Workflow
Semantic Versioning Strategy:
- Patch (0.N.X): Bug fixes, small improvements, documentation updates
- Minor (0.N.0): New features, phase completions, non-breaking changes
- Major (N.0.0): Breaking changes, major architectural shifts
Git Workflow:
- Branch Creation: git checkout -b feature/phase-N-description
- Development: Implement with frequent commits using conventional commit format
- Testing: Run full test suite with make test before PR
- Code Review: Create PR with detailed description and test results
- Integration: Squash merge to main after approval
- Release: Tag with git tag v0.N.0 and update Makefile version
Quality Gates: Each phase must meet these criteria before merge:
- ✅ Unit tests with >80% coverage
- ✅ Integration tests for external dependencies
- ✅ Security review for new attack surfaces
- ✅ Performance benchmarks within acceptable bounds
- ✅ Documentation updates (code comments + README)
- ✅ Backward compatibility verification
Phase-by-Phase Implementation
Phase 1: Model Provider Abstraction (v0.2.0)
Branch: feature/phase-1-model-providers
Duration: 3-5 days
Deliverables:
pkg/ai/
├── provider.go # Core provider interface & request/response types
├── ollama.go # Local Ollama model integration
├── openai.go # OpenAI API client wrapper
├── resetdata.go # ResetData LaaS integration
├── factory.go # Provider factory with auto-selection
└── provider_test.go # Comprehensive provider tests
configs/
└── models.yaml # Role-model mapping configuration
Key Features:
- Abstract AI providers behind unified interface
- Support multiple providers with automatic failover
- Configuration-driven model selection per agent role
- Proper error handling and retry logic
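Automatic failover can be a thin decorator over the ModelProvider interface; a minimal sketch with the retry policy simplified to a single fallback (a production version would distinguish retryable errors such as timeouts from permanent ones before failing over):

// pkg/ai/failover.go - sketch of a failover decorator; the remaining
// ModelProvider methods would delegate to the primary in the same way.
package ai

import "context"

type FailoverProvider struct {
    Primary  ModelProvider
    Fallback ModelProvider
}

// ExecuteTask tries the primary provider first and falls back on error.
func (f *FailoverProvider) ExecuteTask(ctx context.Context, req *TaskRequest) (*TaskResponse, error) {
    resp, err := f.Primary.ExecuteTask(ctx, req)
    if err == nil {
        return resp, nil
    }
    return f.Fallback.ExecuteTask(ctx, req)
}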
Phase 2: Execution Environment Abstraction (v0.3.0)
Branch: feature/phase-2-execution-sandbox
Duration: 5-7 days
Deliverables:
pkg/execution/
├── sandbox.go # Core sandbox interface & types
├── docker.go # Docker container implementation
├── security.go # Security policies & enforcement
├── resources.go # Resource monitoring & limits
└── sandbox_test.go # Sandbox security & isolation tests
Key Features:
- Docker-based task isolation with transparent repository access
- Resource limits (CPU, memory, network, disk) with monitoring
- Security boundary enforcement and escape prevention
- Clean teardown and artifact collection
Phase 3: Core Task Execution Engine (v0.4.0)
Branch: feature/phase-3-task-execution
Duration: 7-10 days
Modified Files:
- coordinator/task_coordinator.go:314 - Replace mock with real execution
- pkg/repository/types.go - Extend interfaces for execution context
New Files:
pkg/strategies/
├── developer.go # Code implementation & bug fixes
├── reviewer.go # Code review & quality analysis
├── architect.go # System design & tech decisions
└── tester.go # Test creation & validation
pkg/engine/
├── executor.go # Main execution orchestrator
├── workflow.go # 7-step execution workflow
└── validation.go # Result quality verification
Key Features:
- Real task execution replacing 10-second sleep simulation
- Role-specific execution strategies with appropriate tooling
- Integration between AI providers, sandboxes, and task lifecycle
- Comprehensive result validation and quality metrics
Phase 4: Repository Provider Implementation (v0.5.0)
Branch: feature/phase-4-real-providers
Duration: 10-14 days
Deliverables:
pkg/providers/
├── gitea.go # Gitea API integration (primary)
├── github.go # GitHub API integration
├── gitlab.go # GitLab API integration
└── provider_test.go # API integration tests
Key Features:
- Replace MockTaskProvider with production implementations
- Task claiming, status updates, and progress reporting via APIs
- Automated PR/MR creation with proper branch management
- Repository-specific configuration and credential management
Testing Strategy
Unit Testing:
- Each provider/sandbox implementation has dedicated test suite
- Mock external dependencies (APIs, Docker, etc.) for isolated testing
- Property-based testing for core interfaces
- Error condition and edge case coverage
Integration Testing:
- End-to-end task execution workflows
- Multi-provider failover scenarios
- Agent-to-WHOOSH communication validation
- Concurrent task execution under load
Security Testing:
- Sandbox escape prevention validation
- Resource exhaustion protection
- Network isolation verification
- Secrets and credential protection audits
Deployment & Monitoring
Configuration Management:
- Environment-specific model configurations
- Sandbox resource limits per environment
- Provider API credentials via secure secret management
- Feature flags for gradual rollout
Observability:
- Task execution metrics (completion rate, duration, success/failure)
- Resource utilization tracking (CPU, memory, network per task)
- Error rate monitoring with alerting thresholds
- Performance dashboards for capacity planning
Risk Mitigation
Technical Risks:
- Provider Outages: Multi-provider failover with health checks
- Resource Exhaustion: Strict limits with monitoring and auto-scaling
- Execution Failures: Retry mechanisms with exponential backoff
Security Risks:
- Sandbox Escapes: Multiple isolation layers and regular security audits
- Credential Exposure: Secure rotation and least-privilege access
- Data Exfiltration: Network isolation and egress monitoring
Integration Risks:
- API Changes: Provider abstraction with versioning support
- Performance Degradation: Comprehensive benchmarking at each phase
- Compatibility Issues: Extensive integration testing with existing systems