Major WHOOSH system refactoring and feature enhancements
- Migrated from HIVE branding to WHOOSH across all components
- Enhanced backend API with new services: AI models, BZZZ integration, templates, members
- Added comprehensive testing suite with security, performance, and integration tests
- Improved frontend with new components for project setup, AI models, and team management
- Updated MCP server implementation with WHOOSH-specific tools and resources
- Enhanced deployment configurations with production-ready Docker setups
- Added comprehensive documentation and setup guides
- Implemented age encryption service and UCXL integration

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
planning/TESTING_STRATEGY.md (new file, 897 lines)

# 🧪 CCLI Testing Strategy

**Project**: Gemini CLI Agent Integration
**Version**: 1.0
**Testing Philosophy**: **Fail Fast, Test Early, Protect Production**

## 🎯 Testing Objectives

### **Primary Goals**
1. **Zero Impact**: Ensure CLI agent integration doesn't affect existing Ollama agents
2. **Reliability**: Validate CLI agents work consistently under various conditions
3. **Performance**: Ensure CLI agents meet performance requirements
4. **Security**: Verify SSH connections and authentication are secure
5. **Scalability**: Test concurrent execution and resource usage

### **Quality Gates**
- **Unit Tests**: ≥90% code coverage for CLI agent components
- **Integration Tests**: 100% of CLI agent workflows tested end-to-end
- **Performance Tests**: CLI agents perform within 150% of Ollama baseline
- **Security Tests**: All SSH connections and authentication validated
- **Load Tests**: System stable under 10x normal load with CLI agents
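These thresholds are easiest to keep honest when they are enforced mechanically rather than eyeballed. The sketch below is one hypothetical way to do that: it reads the `coverage.xml` produced by `pytest --cov` and two benchmark summary JSON files (the script name, file locations, and JSON schema are assumptions for illustration, not defined elsewhere in this plan) and exits non-zero when the coverage or performance gate is missed.

```python
# File: scripts/check_quality_gates.py (hypothetical helper; paths and schema assumed)
import json
import sys
import xml.etree.ElementTree as ET

COVERAGE_GATE = 0.90     # ≥90% unit test coverage
PERF_RATIO_GATE = 1.5    # CLI agents within 150% of Ollama baseline


def coverage_from_xml(path: str) -> float:
    """Read the overall line-rate from a Cobertura-style coverage.xml."""
    root = ET.parse(path).getroot()
    return float(root.attrib["line-rate"])


def mean_duration(path: str) -> float:
    """Average duration from a benchmark summary shaped like {"durations": [seconds, ...]}."""
    with open(path) as f:
        durations = json.load(f)["durations"]
    return sum(durations) / len(durations)


def main() -> int:
    failures = []

    coverage = coverage_from_xml("coverage.xml")
    if coverage < COVERAGE_GATE:
        failures.append(f"coverage {coverage:.1%} below {COVERAGE_GATE:.0%} gate")

    cli = mean_duration("benchmarks/cli_gemini.json")   # assumed output location
    ollama = mean_duration("benchmarks/ollama.json")    # assumed output location
    if cli > ollama * PERF_RATIO_GATE:
        failures.append(
            f"CLI mean {cli:.2f}s exceeds {PERF_RATIO_GATE:.0%} of Ollama mean {ollama:.2f}s"
        )

    for failure in failures:
        print(f"❌ {failure}")
    if not failures:
        print("✅ All quality gates passed")
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())
```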
---

## 📋 Test Categories

### **1. 🔧 Unit Tests**

#### **1.1 CLI Agent Adapter Tests**
```python
# File: src/tests/test_gemini_cli_agent.py
import asyncio

import pytest
from unittest.mock import Mock

from src.agents.gemini_cli_agent import GeminiCliAgent, GeminiCliConfig

# All tests in this module are coroutines
pytestmark = pytest.mark.asyncio


class TestGeminiCliAgent:
    @pytest.fixture
    def agent_config(self):
        return GeminiCliConfig(
            host="test-host",
            node_path="/test/node",
            gemini_path="/test/gemini",
            node_version="v22.14.0",
            model="gemini-2.5-pro"
        )

    @pytest.fixture
    def agent(self, agent_config):
        return GeminiCliAgent(agent_config, "test_specialty")

    async def test_execute_task_success(self, agent, mocker):
        """Test successful task execution"""
        mock_ssh_execute = mocker.patch.object(agent, '_ssh_execute')
        mock_ssh_execute.return_value = Mock(
            stdout="Test response",
            returncode=0,
            duration=1.5
        )

        result = await agent.execute_task("Test prompt")

        assert result["status"] == "completed"
        assert result["response"] == "Test response"
        assert result["execution_time"] == 1.5
        assert result["model"] == "gemini-2.5-pro"

    async def test_execute_task_failure(self, agent, mocker):
        """Test task execution failure handling"""
        mock_ssh_execute = mocker.patch.object(agent, '_ssh_execute')
        mock_ssh_execute.side_effect = Exception("SSH connection failed")

        result = await agent.execute_task("Test prompt")

        assert result["status"] == "failed"
        assert "SSH connection failed" in result["error"]

    async def test_concurrent_task_limit(self, agent):
        """Test concurrent task execution limits"""
        agent.config.max_concurrent = 2

        # Start 2 tasks so the agent is at its concurrency limit
        task1 = asyncio.create_task(agent.execute_task("Task 1"))
        task2 = asyncio.create_task(agent.execute_task("Task 2"))

        # Third task should fail while the first two are still running
        with pytest.raises(Exception, match="maximum concurrent tasks"):
            await agent.execute_task("Task 3")

        await asyncio.gather(task1, task2, return_exceptions=True)
```

#### **1.2 SSH Executor Tests**
```python
# File: src/tests/test_ssh_executor.py
import asyncio

import pytest
from unittest.mock import AsyncMock

from src.executors.ssh_executor import SSHExecutor, SSHResult

pytestmark = pytest.mark.asyncio


class TestSSHExecutor:
    @pytest.fixture
    def executor(self):
        return SSHExecutor(connection_pool_size=2)

    async def test_connection_pooling(self, executor, mocker):
        """Test SSH connection pooling"""
        mock_connect = mocker.patch('asyncssh.connect')
        mock_conn = AsyncMock()
        mock_connect.return_value = mock_conn

        # Execute multiple commands on same host
        await executor.execute("test-host", "command1")
        await executor.execute("test-host", "command2")

        # Should reuse connection
        assert mock_connect.call_count == 1

    async def test_command_timeout(self, executor, mocker):
        """Test command timeout handling"""
        mock_connect = mocker.patch('asyncssh.connect')
        mock_conn = AsyncMock()
        mock_conn.run.side_effect = asyncio.TimeoutError()
        mock_connect.return_value = mock_conn

        with pytest.raises(Exception, match="SSH command timeout"):
            await executor.execute("test-host", "slow-command", timeout=1)
```

#### **1.3 Agent Factory Tests**
```python
# File: src/tests/test_cli_agent_factory.py
import pytest

from src.agents.cli_agent_factory import CliAgentFactory


class TestCliAgentFactory:
    def test_create_known_agent(self):
        """Test creating predefined agents"""
        agent = CliAgentFactory.create_agent("walnut-gemini", "general_ai")

        assert agent.config.host == "walnut"
        assert agent.config.node_version == "v22.14.0"
        assert agent.specialization == "general_ai"

    def test_create_unknown_agent(self):
        """Test error handling for unknown agents"""
        with pytest.raises(ValueError, match="Unknown CLI agent"):
            CliAgentFactory.create_agent("nonexistent-agent", "test")
```

### **2. 🔗 Integration Tests**

#### **2.1 End-to-End CLI Agent Execution**
```python
# File: src/tests/integration/test_cli_agent_integration.py
import pytest

from backend.app.core.whoosh_coordinator import WHOOSHCoordinator
from backend.app.models.agent import Agent, AgentType
from backend.app.models.task import TaskStatus  # import path assumed for the status enum

pytestmark = pytest.mark.asyncio


class TestCliAgentIntegration:
    @pytest.fixture
    async def coordinator(self):
        coordinator = WHOOSHCoordinator()
        await coordinator.initialize()
        return coordinator

    @pytest.fixture
    def cli_agent(self):
        return Agent(
            id="test-cli-agent",
            endpoint="cli://test-host",
            model="gemini-2.5-pro",
            specialty="general_ai",
            agent_type=AgentType.CLI_GEMINI,
            cli_config={
                "host": "test-host",
                "node_path": "/test/node",
                "gemini_path": "/test/gemini",
                "node_version": "v22.14.0"
            }
        )

    async def test_cli_task_execution(self, coordinator, cli_agent):
        """Test complete CLI task execution workflow"""
        task = coordinator.create_task(
            task_type=AgentType.CLI_GEMINI,
            context={"prompt": "What is 2+2?"},
            priority=3
        )

        result = await coordinator.execute_task(task, cli_agent)

        assert result["status"] == "completed"
        assert "response" in result
        assert task.status == TaskStatus.COMPLETED
```

#### **2.2 Mixed Agent Type Coordination**
```python
# File: src/tests/integration/test_mixed_agent_coordination.py
import asyncio

import pytest

from backend.app.models.agent import AgentType

pytestmark = pytest.mark.asyncio


class TestMixedAgentCoordination:
    # The `coordinator` fixture is shared with the other integration tests (e.g. via conftest.py)
    async def test_ollama_and_cli_agents_together(self, coordinator):
        """Test Ollama and CLI agents working together"""
        # Create tasks for both agent types
        ollama_task = coordinator.create_task(
            task_type=AgentType.PYTORCH_DEV,
            context={"prompt": "Generate Python code"},
            priority=3
        )

        cli_task = coordinator.create_task(
            task_type=AgentType.CLI_GEMINI,
            context={"prompt": "Analyze this code"},
            priority=3
        )

        # Execute tasks concurrently
        ollama_result, cli_result = await asyncio.gather(
            coordinator.process_task(ollama_task),
            coordinator.process_task(cli_task)
        )

        assert ollama_result["status"] == "completed"
        assert cli_result["status"] == "completed"
```

#### **2.3 MCP Server CLI Agent Support**
```typescript
// File: mcp-server/src/tests/integration/test_cli_agent_mcp.test.ts
// WHOOSHTools and mockWHOOSHClient are assumed to be provided by the MCP server test setup/imports.
describe('MCP CLI Agent Integration', () => {
  let whooshTools: WHOOSHTools;

  beforeEach(() => {
    whooshTools = new WHOOSHTools(mockWHOOSHClient);
  });

  test('should execute task on CLI agent', async () => {
    const result = await whooshTools.executeTool('whoosh_create_task', {
      type: 'cli_gemini',
      priority: 3,
      objective: 'Test CLI agent execution'
    });

    expect(result.isError).toBe(false);
    expect(result.content[0].text).toContain('Task created successfully');
  });

  test('should discover both Ollama and CLI agents', async () => {
    const result = await whooshTools.executeTool('whoosh_get_agents', {});

    expect(result.isError).toBe(false);
    const agents = JSON.parse(result.content[0].text);

    // Should include both types
    expect(agents.some(a => a.agent_type === 'ollama')).toBe(true);
    expect(agents.some(a => a.agent_type === 'cli_gemini')).toBe(true);
  });
});
```

### **3. 📊 Performance Tests**

#### **3.1 Response Time Benchmarking**
```bash
#!/bin/bash
# File: scripts/benchmark-response-times.sh

echo "🏃 CLI Agent Response Time Benchmarking"

# Test single task execution times
benchmark_single_task() {
    local agent_type=$1
    local iterations=10
    local total_time=0

    echo "Benchmarking $agent_type agent (${iterations} iterations)..."

    for i in $(seq 1 $iterations); do
        start_time=$(date +%s.%N)

        curl -s -X POST http://localhost:8000/api/tasks \
            -H "Content-Type: application/json" \
            -d "{
                \"agent_type\": \"$agent_type\",
                \"prompt\": \"What is the capital of France?\",
                \"priority\": 3
            }" > /dev/null

        end_time=$(date +%s.%N)
        duration=$(echo "$end_time - $start_time" | bc)
        total_time=$(echo "$total_time + $duration" | bc)

        echo "Iteration $i: ${duration}s"
    done

    average_time=$(echo "scale=2; $total_time / $iterations" | bc)
    echo "$agent_type average response time: ${average_time}s"
}

# Run benchmarks
benchmark_single_task "ollama"
benchmark_single_task "cli_gemini"

# Compare results
echo "📊 Performance Comparison Complete"
```

#### **3.2 Concurrent Execution Testing**
```python
# File: scripts/test_concurrent_execution.py
import asyncio
import aiohttp
import time
from typing import Tuple


async def test_concurrent_cli_agents():
    """Test concurrent CLI agent execution under load"""

    async def execute_task(session: aiohttp.ClientSession, task_id: int) -> Tuple[int, float, str]:
        start_time = time.time()

        async with session.post(
            'http://localhost:8000/api/tasks',
            json={
                'agent_type': 'cli_gemini',
                'prompt': f'Process task {task_id}',
                'priority': 3
            }
        ) as response:
            result = await response.json()
            duration = time.time() - start_time
            status = result.get('status', 'unknown')

            return task_id, duration, status

    # Test various concurrency levels
    concurrency_levels = [1, 2, 4, 8, 16]

    for concurrency in concurrency_levels:
        print(f"\n🔄 Testing {concurrency} concurrent CLI agent tasks...")

        async with aiohttp.ClientSession() as session:
            tasks = [
                execute_task(session, i)
                for i in range(concurrency)
            ]

            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Analyze results
            successful_tasks = [r for r in results if isinstance(r, tuple) and r[2] == 'completed']
            failed_tasks = [r for r in results if not isinstance(r, tuple) or r[2] != 'completed']

            if successful_tasks:
                avg_duration = sum(r[1] for r in successful_tasks) / len(successful_tasks)
                print(f"  ✅ {len(successful_tasks)}/{concurrency} tasks successful")
                print(f"  ⏱️ Average duration: {avg_duration:.2f}s")

            if failed_tasks:
                print(f"  ❌ {len(failed_tasks)} tasks failed")


if __name__ == "__main__":
    asyncio.run(test_concurrent_cli_agents())
```

#### **3.3 Resource Usage Monitoring**
```python
# File: scripts/monitor_resource_usage.py
import psutil
import time
import asyncio
from typing import Dict, List


class ResourceMonitor:
    def __init__(self):
        self.baseline_metrics = self.get_system_metrics()

    def get_system_metrics(self) -> Dict:
        """Get current system resource usage"""
        return {
            'cpu_percent': psutil.cpu_percent(interval=1),
            'memory_percent': psutil.virtual_memory().percent,
            'network_io': psutil.net_io_counters(),
            'ssh_connections': self.count_ssh_connections()
        }

    def count_ssh_connections(self) -> int:
        """Count active SSH connections (inbound on port 22 or outbound to port 22)"""
        connections = psutil.net_connections()
        ssh_conns = [
            c for c in connections
            if (c.laddr and c.laddr.port == 22) or (c.raddr and c.raddr.port == 22)
        ]
        return len(ssh_conns)

    async def monitor_during_cli_execution(self, duration_minutes: int = 10):
        """Monitor resource usage during CLI agent execution"""
        print(f"🔍 Monitoring resources for {duration_minutes} minutes...")

        metrics_history = []
        end_time = time.time() + (duration_minutes * 60)

        while time.time() < end_time:
            current_metrics = self.get_system_metrics()
            metrics_history.append({
                'timestamp': time.time(),
                **current_metrics
            })

            print(f"CPU: {current_metrics['cpu_percent']}%, "
                  f"Memory: {current_metrics['memory_percent']}%, "
                  f"SSH Connections: {current_metrics['ssh_connections']}")

            await asyncio.sleep(30)  # Sample every 30 seconds

        self.analyze_resource_usage(metrics_history)

    def analyze_resource_usage(self, metrics_history: List[Dict]):
        """Analyze resource usage patterns"""
        if not metrics_history:
            return

        avg_cpu = sum(m['cpu_percent'] for m in metrics_history) / len(metrics_history)
        max_cpu = max(m['cpu_percent'] for m in metrics_history)

        avg_memory = sum(m['memory_percent'] for m in metrics_history) / len(metrics_history)
        max_memory = max(m['memory_percent'] for m in metrics_history)

        max_ssh_conns = max(m['ssh_connections'] for m in metrics_history)

        print("\n📊 Resource Usage Analysis:")
        print(f"  CPU - Average: {avg_cpu:.1f}%, Peak: {max_cpu:.1f}%")
        print(f"  Memory - Average: {avg_memory:.1f}%, Peak: {max_memory:.1f}%")
        print(f"  SSH Connections - Peak: {max_ssh_conns}")

        # Check if within acceptable limits
        if max_cpu > 80:
            print("  ⚠️ High CPU usage detected")
        if max_memory > 85:
            print("  ⚠️ High memory usage detected")
        if max_ssh_conns > 20:
            print("  ⚠️ High SSH connection count")
```

### **4. 🔒 Security Tests**

#### **4.1 SSH Authentication Testing**
```python
# File: src/tests/security/test_ssh_security.py
import shlex

import pytest

from src.executors.ssh_executor import SSHExecutor

pytestmark = pytest.mark.asyncio


class TestSSHSecurity:
    async def test_key_based_authentication(self):
        """Test SSH key-based authentication"""
        executor = SSHExecutor()

        # Should succeed with proper key
        result = await executor.execute("walnut", "echo 'test'")
        assert result.returncode == 0

    async def test_connection_timeout(self):
        """Test SSH connection timeout handling"""
        executor = SSHExecutor()

        with pytest.raises(Exception, match="timeout"):
            await executor.execute("invalid-host", "echo 'test'", timeout=5)

    async def test_command_injection_prevention(self):
        """Test prevention of command injection"""
        executor = SSHExecutor()

        # Malicious input must be quoted so the embedded `rm` never runs
        malicious_input = "test'; rm -rf /; echo 'evil"
        result = await executor.execute("walnut", f"echo {shlex.quote(malicious_input)}")

        # The quoted string is echoed back verbatim; nothing else executes
        assert "evil" in result.stdout
        assert result.returncode == 0
```

#### **4.2 Network Security Testing**
```bash
#!/bin/bash
# File: scripts/test-network-security.sh

echo "🔒 Network Security Testing for CLI Agents"

# Test SSH connection encryption
test_ssh_encryption() {
    echo "Testing SSH connection encryption..."

    # Capture network traffic (including ASCII payload) during SSH session
    timeout 10s tcpdump -A -i any -c 20 port 22 > /tmp/ssh_traffic.log 2>&1 &
    tcpdump_pid=$!

    # Execute CLI command
    ssh walnut "echo 'test connection'" > /dev/null 2>&1

    # Stop traffic capture
    kill $tcpdump_pid 2>/dev/null

    # Verify encrypted traffic (should not contain plaintext)
    if grep -q "test connection" /tmp/ssh_traffic.log; then
        echo "❌ SSH traffic appears to be unencrypted"
        return 1
    else
        echo "✅ SSH traffic is properly encrypted"
        return 0
    fi
}

# Test connection limits
test_connection_limits() {
    echo "Testing SSH connection limits..."

    # Try to open many connections
    for i in {1..25}; do
        ssh -o ConnectTimeout=5 walnut "sleep 1" &
    done

    wait
    echo "✅ Connection limit testing completed"
}

# Run security tests
test_ssh_encryption
test_connection_limits

echo "🔒 Network security testing completed"
```

### **5. 🚀 Load Tests**

#### **5.1 Sustained Load Testing**
```python
# File: scripts/load_test_sustained.py
import asyncio
import aiohttp
import random
import time
from dataclasses import dataclass
from typing import List, Dict


@dataclass
class LoadTestConfig:
    duration_minutes: int = 30
    requests_per_second: int = 2
    cli_agent_percentage: int = 30  # 30% CLI, 70% Ollama


class SustainedLoadTester:
    def __init__(self, config: LoadTestConfig):
        self.config = config
        self.results = []

    async def generate_load(self):
        """Generate sustained load on the system"""
        end_time = time.time() + (self.config.duration_minutes * 60)
        task_counter = 0
        pending_tasks = []  # keep handles so we only await the requests we issued

        async with aiohttp.ClientSession() as session:
            while time.time() < end_time:
                # Determine agent type based on percentage
                use_cli = random.randint(1, 100) <= self.config.cli_agent_percentage
                agent_type = "cli_gemini" if use_cli else "ollama"

                # Create task
                pending_tasks.append(asyncio.create_task(
                    self.execute_single_request(session, agent_type, task_counter)
                ))

                task_counter += 1

                # Maintain request rate
                await asyncio.sleep(1.0 / self.config.requests_per_second)

            # Wait for all issued requests to complete
            await asyncio.gather(*pending_tasks, return_exceptions=True)

        self.analyze_results()

    async def execute_single_request(self, session: aiohttp.ClientSession,
                                     agent_type: str, task_id: int):
        """Execute a single request and record metrics"""
        start_time = time.time()

        try:
            async with session.post(
                'http://localhost:8000/api/tasks',
                json={
                    'agent_type': agent_type,
                    'prompt': f'Load test task {task_id}',
                    'priority': 3
                },
                timeout=aiohttp.ClientTimeout(total=60)
            ) as response:
                result = await response.json()
                duration = time.time() - start_time

                self.results.append({
                    'task_id': task_id,
                    'agent_type': agent_type,
                    'duration': duration,
                    'status': response.status,
                    'success': response.status == 200
                })

        except Exception as e:
            duration = time.time() - start_time
            self.results.append({
                'task_id': task_id,
                'agent_type': agent_type,
                'duration': duration,
                'status': 0,
                'success': False,
                'error': str(e)
            })

    def analyze_results(self):
        """Analyze load test results"""
        if not self.results:
            print("No results to analyze")
            return

        total_requests = len(self.results)
        successful_requests = sum(1 for r in self.results if r['success'])

        cli_results = [r for r in self.results if r['agent_type'] == 'cli_gemini']
        ollama_results = [r for r in self.results if r['agent_type'] == 'ollama']

        print("\n📊 Load Test Results:")
        print(f"  Total Requests: {total_requests}")
        print(f"  Success Rate: {successful_requests/total_requests*100:.1f}%")

        if cli_results:
            cli_avg_duration = sum(r['duration'] for r in cli_results) / len(cli_results)
            cli_success_rate = sum(1 for r in cli_results if r['success']) / len(cli_results)
            print(f"  CLI Agents - Count: {len(cli_results)}, "
                  f"Avg Duration: {cli_avg_duration:.2f}s, "
                  f"Success Rate: {cli_success_rate*100:.1f}%")

        if ollama_results:
            ollama_avg_duration = sum(r['duration'] for r in ollama_results) / len(ollama_results)
            ollama_success_rate = sum(1 for r in ollama_results if r['success']) / len(ollama_results)
            print(f"  Ollama Agents - Count: {len(ollama_results)}, "
                  f"Avg Duration: {ollama_avg_duration:.2f}s, "
                  f"Success Rate: {ollama_success_rate*100:.1f}%")


# Run load test
if __name__ == "__main__":
    config = LoadTestConfig(
        duration_minutes=30,
        requests_per_second=2,
        cli_agent_percentage=30
    )

    tester = SustainedLoadTester(config)
    asyncio.run(tester.generate_load())
```

### **6. 🧪 Chaos Testing**

#### **6.1 Network Interruption Testing**
```bash
#!/bin/bash
# File: scripts/chaos-test-network.sh

echo "🌪️ Chaos Testing: Network Interruptions"

# Function to simulate network latency
# Note: tc shapes all egress on eth0; the target_host argument is informational only
simulate_network_latency() {
    local target_host=$1
    local delay_ms=$2
    local duration_seconds=$3

    echo "Adding ${delay_ms}ms latency to $target_host for ${duration_seconds}s..."

    # Add network delay (requires root/sudo)
    sudo tc qdisc add dev eth0 root netem delay ${delay_ms}ms

    # Wait for specified duration
    sleep $duration_seconds

    # Remove network delay
    sudo tc qdisc del dev eth0 root netem

    echo "Network latency removed"
}

# Function to simulate network packet loss
simulate_packet_loss() {
    local loss_percentage=$1
    local duration_seconds=$2

    echo "Simulating ${loss_percentage}% packet loss for ${duration_seconds}s..."

    sudo tc qdisc add dev eth0 root netem loss ${loss_percentage}%
    sleep $duration_seconds
    sudo tc qdisc del dev eth0 root netem

    echo "Packet loss simulation ended"
}

# Test CLI agent resilience during network issues
test_cli_resilience_during_network_chaos() {
    echo "Testing CLI agent resilience during network chaos..."

    # Start background CLI agent tasks
    for i in {1..5}; do
        {
            curl -X POST http://localhost:8000/api/tasks \
                -H "Content-Type: application/json" \
                -d "{\"agent_type\": \"cli_gemini\", \"prompt\": \"Chaos test task $i\"}" \
                > /tmp/chaos_test_$i.log 2>&1
        } &
    done

    # Introduce network chaos
    sleep 2
    simulate_network_latency "walnut" 500 10  # 500ms delay for 10 seconds
    sleep 5
    simulate_packet_loss 10 5  # 10% packet loss for 5 seconds

    # Wait for all tasks to complete
    wait

    # Analyze results
    echo "Analyzing chaos test results..."
    for i in {1..5}; do
        if grep -q "completed" /tmp/chaos_test_$i.log; then
            echo "  Task $i: ✅ Completed successfully despite network chaos"
        else
            echo "  Task $i: ❌ Failed during network chaos"
        fi
    done
}

# Note: This script requires root privileges for network simulation
if [[ $EUID -eq 0 ]]; then
    test_cli_resilience_during_network_chaos
else
    echo "⚠️ Chaos testing requires root privileges for network simulation"
    echo "Run with: sudo $0"
fi
```

---

## 📊 Test Automation & CI/CD Integration

### **Automated Test Pipeline**
```yaml
# File: .github/workflows/ccli-tests.yml
name: CCLI Integration Tests

on:
  push:
    branches: [ feature/gemini-cli-integration ]
  pull_request:
    branches: [ master ]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-asyncio pytest-mock
      - name: Run unit tests
        run: pytest src/tests/ -v --cov=src --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v1

  integration-tests:
    runs-on: ubuntu-latest
    needs: unit-tests
    steps:
      - uses: actions/checkout@v2
      - name: Start test environment
        run: docker-compose -f docker-compose.test.yml up -d
      - name: Wait for services
        run: sleep 30
      - name: Run integration tests
        run: pytest src/tests/integration/ -v
      - name: Cleanup
        run: docker-compose -f docker-compose.test.yml down

  security-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run security scan
        run: |
          pip install bandit safety
          bandit -r src/
          safety check
```

### **Test Reporting Dashboard**
```python
# File: scripts/generate_test_report.py
import json
import datetime
from pathlib import Path


class TestReportGenerator:
    def __init__(self):
        self.results = {
            'timestamp': datetime.datetime.now().isoformat(),
            'test_suites': {}
        }

    def add_test_suite(self, suite_name: str, results: dict):
        """Add test suite results to the report"""
        self.results['test_suites'][suite_name] = {
            'total_tests': results.get('total', 0),
            'passed': results.get('passed', 0),
            'failed': results.get('failed', 0),
            'success_rate': results.get('passed', 0) / max(results.get('total', 1), 1),
            'duration': results.get('duration', 0),
            'details': results.get('details', [])
        }

    def generate_html_report(self, output_path: str):
        """Generate HTML test report"""
        html_content = self._build_html_report()

        with open(output_path, 'w') as f:
            f.write(html_content)

    def _build_html_report(self) -> str:
        """Build HTML report content"""
        # HTML report template with test results
        return f"""
<!DOCTYPE html>
<html>
<head>
    <title>CCLI Test Report</title>
    <style>
        body {{ font-family: Arial, sans-serif; margin: 40px; }}
        .success {{ color: green; }}
        .failure {{ color: red; }}
        .suite {{ margin: 20px 0; padding: 15px; border: 1px solid #ddd; }}
    </style>
</head>
<body>
    <h1>🧪 CCLI Test Report</h1>
    <p>Generated: {self.results['timestamp']}</p>
    {self._generate_suite_summaries()}
</body>
</html>
"""

    def _generate_suite_summaries(self) -> str:
        """Generate HTML for test suite summaries"""
        html = ""
        for suite_name, results in self.results['test_suites'].items():
            status_class = "success" if results['success_rate'] >= 0.95 else "failure"
            html += f"""
            <div class="suite">
                <h2>{suite_name}</h2>
                <p class="{status_class}">
                    {results['passed']}/{results['total_tests']} tests passed
                    ({results['success_rate']*100:.1f}%)
                </p>
                <p>Duration: {results['duration']:.2f}s</p>
            </div>
            """
        return html
```
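A minimal usage sketch for the generator above. The suite names and per-suite numbers here are placeholders; in practice they would come from parsed pytest/jest output rather than being hard-coded.

```python
# Hypothetical wiring: suite names and counts are illustrative placeholders
from generate_test_report import TestReportGenerator

report = TestReportGenerator()
report.add_test_suite("unit", {"total": 42, "passed": 41, "failed": 1, "duration": 12.7})
report.add_test_suite("integration", {"total": 18, "passed": 18, "failed": 0, "duration": 96.3})
report.generate_html_report("ccli-test-report.html")
print("Report written to ccli-test-report.html")
```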

---

## 🎯 Success Criteria & Exit Conditions

### **Test Completion Criteria**
- [ ] **Unit Tests**: ≥90% code coverage achieved
- [ ] **Integration Tests**: All CLI agent workflows tested successfully
- [ ] **Performance Tests**: CLI agents within 150% of Ollama baseline
- [ ] **Security Tests**: All SSH connections validated and secure
- [ ] **Load Tests**: System stable under 10x normal load
- [ ] **Chaos Tests**: System recovers gracefully from network issues

### **Go/No-Go Decision Points**
1. **After Unit Testing**: Proceed if >90% coverage and all tests pass
2. **After Integration Testing**: Proceed if CLI agents work with existing system
3. **After Performance Testing**: Proceed if performance within acceptable limits
4. **After Security Testing**: Proceed if no security vulnerabilities found
5. **After Load Testing**: Proceed if system handles production-like load

### **Rollback Triggers**
- Any test category has <80% success rate
- CLI agent performance >200% of Ollama baseline
- Security vulnerabilities discovered
- System instability under normal load
- Negative impact on existing Ollama agents
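The first two triggers are numeric and can be evaluated automatically from the load-test and benchmark output; the remaining three still require human judgment. Below is a hedged sketch of such a check, assuming an aggregated `results.json` with per-category success rates and mean durations (that file and its schema are assumptions for illustration, not defined elsewhere in this plan).

```python
# File: scripts/check_rollback_triggers.py (hypothetical; input schema is assumed)
# Expected results.json shape:
# {"categories": {"unit": {"success_rate": 0.97}, ...},
#  "durations": {"cli_gemini_mean": 8.4, "ollama_mean": 5.1}}
import json
import sys

MIN_SUCCESS_RATE = 0.80   # rollback if any category drops below 80%
MAX_PERF_RATIO = 2.0      # rollback if CLI agents exceed 200% of Ollama baseline


def main(path: str = "results.json") -> int:
    with open(path) as f:
        data = json.load(f)

    triggers = []

    for category, stats in data["categories"].items():
        if stats["success_rate"] < MIN_SUCCESS_RATE:
            triggers.append(f"{category}: success rate {stats['success_rate']:.1%} < 80%")

    ratio = data["durations"]["cli_gemini_mean"] / data["durations"]["ollama_mean"]
    if ratio > MAX_PERF_RATIO:
        triggers.append(f"CLI agents at {ratio:.0%} of Ollama baseline (> 200%)")

    for trigger in triggers:
        print(f"🛑 Rollback trigger: {trigger}")
    if not triggers:
        print("✅ No automatic rollback triggers fired")
    return 1 if triggers else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "results.json"))
```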

---

This comprehensive testing strategy ensures the CLI agent integration is thoroughly validated before production deployment while maintaining the stability and performance of the existing WHOOSH system.