Major WHOOSH system refactoring and feature enhancements
- Migrated from HIVE branding to WHOOSH across all components
- Enhanced backend API with new services: AI models, BZZZ integration, templates, members
- Added comprehensive testing suite with security, performance, and integration tests
- Improved frontend with new components for project setup, AI models, and team management
- Updated MCP server implementation with WHOOSH-specific tools and resources
- Enhanced deployment configurations with production-ready Docker setups
- Added comprehensive documentation and setup guides
- Implemented age encryption service and UCXL integration

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
planning/TESTING_STRATEGY.md (new file, 897 lines)

# 🧪 CCLI Testing Strategy

**Project**: Gemini CLI Agent Integration
**Version**: 1.0
**Testing Philosophy**: **Fail Fast, Test Early, Protect Production**

## 🎯 Testing Objectives

### **Primary Goals**
1. **Zero Impact**: Ensure CLI agent integration doesn't affect existing Ollama agents
2. **Reliability**: Validate CLI agents work consistently under various conditions
3. **Performance**: Ensure CLI agents meet performance requirements
4. **Security**: Verify SSH connections and authentication are secure
5. **Scalability**: Test concurrent execution and resource usage

### **Quality Gates**
- **Unit Tests**: ≥90% code coverage for CLI agent components
- **Integration Tests**: 100% of CLI agent workflows tested end-to-end
- **Performance Tests**: CLI agents perform within 150% of Ollama baseline
- **Security Tests**: All SSH connections and authentication validated
- **Load Tests**: System stable under 10x normal load with CLI agents
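These thresholds are easiest to keep honest when they are enforced mechanically rather than eyeballed. The sketch below is one hypothetical way to do that: it reads the `coverage.xml` produced by `pytest --cov` and two benchmark summary JSON files (the script name, file locations, and JSON schema are assumptions for illustration, not defined elsewhere in this plan) and exits non-zero when the coverage or performance gate is missed.

```python
# File: scripts/check_quality_gates.py (hypothetical helper; paths and schema assumed)
import json
import sys
import xml.etree.ElementTree as ET

COVERAGE_GATE = 0.90     # ≥90% unit test coverage
PERF_RATIO_GATE = 1.5    # CLI agents within 150% of Ollama baseline


def coverage_from_xml(path: str) -> float:
    """Read the overall line-rate from a Cobertura-style coverage.xml."""
    root = ET.parse(path).getroot()
    return float(root.attrib["line-rate"])


def mean_duration(path: str) -> float:
    """Average duration from a benchmark summary shaped like {"durations": [seconds, ...]}."""
    with open(path) as f:
        durations = json.load(f)["durations"]
    return sum(durations) / len(durations)


def main() -> int:
    failures = []

    coverage = coverage_from_xml("coverage.xml")
    if coverage < COVERAGE_GATE:
        failures.append(f"coverage {coverage:.1%} below {COVERAGE_GATE:.0%} gate")

    cli = mean_duration("benchmarks/cli_gemini.json")   # assumed output location
    ollama = mean_duration("benchmarks/ollama.json")    # assumed output location
    if cli > ollama * PERF_RATIO_GATE:
        failures.append(
            f"CLI mean {cli:.2f}s exceeds {PERF_RATIO_GATE:.0%} of Ollama mean {ollama:.2f}s"
        )

    for failure in failures:
        print(f"❌ {failure}")
    if not failures:
        print("✅ All quality gates passed")
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())
```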
---

## 📋 Test Categories

### **1. 🔧 Unit Tests**

#### **1.1 CLI Agent Adapter Tests**
```python
# File: src/tests/test_gemini_cli_agent.py
import asyncio

import pytest
from unittest.mock import Mock

from src.agents.gemini_cli_agent import GeminiCliAgent, GeminiCliConfig

# All tests in this module are coroutines
pytestmark = pytest.mark.asyncio


class TestGeminiCliAgent:
    @pytest.fixture
    def agent_config(self):
        return GeminiCliConfig(
            host="test-host",
            node_path="/test/node",
            gemini_path="/test/gemini",
            node_version="v22.14.0",
            model="gemini-2.5-pro"
        )

    @pytest.fixture
    def agent(self, agent_config):
        return GeminiCliAgent(agent_config, "test_specialty")

    async def test_execute_task_success(self, agent, mocker):
        """Test successful task execution"""
        mock_ssh_execute = mocker.patch.object(agent, '_ssh_execute')
        mock_ssh_execute.return_value = Mock(
            stdout="Test response",
            returncode=0,
            duration=1.5
        )

        result = await agent.execute_task("Test prompt")

        assert result["status"] == "completed"
        assert result["response"] == "Test response"
        assert result["execution_time"] == 1.5
        assert result["model"] == "gemini-2.5-pro"

    async def test_execute_task_failure(self, agent, mocker):
        """Test task execution failure handling"""
        mock_ssh_execute = mocker.patch.object(agent, '_ssh_execute')
        mock_ssh_execute.side_effect = Exception("SSH connection failed")

        result = await agent.execute_task("Test prompt")

        assert result["status"] == "failed"
        assert "SSH connection failed" in result["error"]

    async def test_concurrent_task_limit(self, agent):
        """Test concurrent task execution limits"""
        agent.config.max_concurrent = 2

        # Start 2 tasks so the agent is at its concurrency limit
        task1 = asyncio.create_task(agent.execute_task("Task 1"))
        task2 = asyncio.create_task(agent.execute_task("Task 2"))

        # Third task should fail while the first two are still running
        with pytest.raises(Exception, match="maximum concurrent tasks"):
            await agent.execute_task("Task 3")

        await asyncio.gather(task1, task2, return_exceptions=True)
```

#### **1.2 SSH Executor Tests**
```python
# File: src/tests/test_ssh_executor.py
import asyncio

import pytest
from unittest.mock import AsyncMock

from src.executors.ssh_executor import SSHExecutor, SSHResult

pytestmark = pytest.mark.asyncio


class TestSSHExecutor:
    @pytest.fixture
    def executor(self):
        return SSHExecutor(connection_pool_size=2)

    async def test_connection_pooling(self, executor, mocker):
        """Test SSH connection pooling"""
        mock_connect = mocker.patch('asyncssh.connect')
        mock_conn = AsyncMock()
        mock_connect.return_value = mock_conn

        # Execute multiple commands on same host
        await executor.execute("test-host", "command1")
        await executor.execute("test-host", "command2")

        # Should reuse connection
        assert mock_connect.call_count == 1

    async def test_command_timeout(self, executor, mocker):
        """Test command timeout handling"""
        mock_connect = mocker.patch('asyncssh.connect')
        mock_conn = AsyncMock()
        mock_conn.run.side_effect = asyncio.TimeoutError()
        mock_connect.return_value = mock_conn

        with pytest.raises(Exception, match="SSH command timeout"):
            await executor.execute("test-host", "slow-command", timeout=1)
```

#### **1.3 Agent Factory Tests**
```python
# File: src/tests/test_cli_agent_factory.py
import pytest

from src.agents.cli_agent_factory import CliAgentFactory


class TestCliAgentFactory:
    def test_create_known_agent(self):
        """Test creating predefined agents"""
        agent = CliAgentFactory.create_agent("walnut-gemini", "general_ai")

        assert agent.config.host == "walnut"
        assert agent.config.node_version == "v22.14.0"
        assert agent.specialization == "general_ai"

    def test_create_unknown_agent(self):
        """Test error handling for unknown agents"""
        with pytest.raises(ValueError, match="Unknown CLI agent"):
            CliAgentFactory.create_agent("nonexistent-agent", "test")
```

### **2. 🔗 Integration Tests**

#### **2.1 End-to-End CLI Agent Execution**
```python
# File: src/tests/integration/test_cli_agent_integration.py
import pytest

from backend.app.core.whoosh_coordinator import WHOOSHCoordinator
from backend.app.models.agent import Agent, AgentType
from backend.app.models.task import TaskStatus  # import path assumed for the status enum

pytestmark = pytest.mark.asyncio


class TestCliAgentIntegration:
    @pytest.fixture
    async def coordinator(self):
        coordinator = WHOOSHCoordinator()
        await coordinator.initialize()
        return coordinator

    @pytest.fixture
    def cli_agent(self):
        return Agent(
            id="test-cli-agent",
            endpoint="cli://test-host",
            model="gemini-2.5-pro",
            specialty="general_ai",
            agent_type=AgentType.CLI_GEMINI,
            cli_config={
                "host": "test-host",
                "node_path": "/test/node",
                "gemini_path": "/test/gemini",
                "node_version": "v22.14.0"
            }
        )

    async def test_cli_task_execution(self, coordinator, cli_agent):
        """Test complete CLI task execution workflow"""
        task = coordinator.create_task(
            task_type=AgentType.CLI_GEMINI,
            context={"prompt": "What is 2+2?"},
            priority=3
        )

        result = await coordinator.execute_task(task, cli_agent)

        assert result["status"] == "completed"
        assert "response" in result
        assert task.status == TaskStatus.COMPLETED
```

#### **2.2 Mixed Agent Type Coordination**
```python
# File: src/tests/integration/test_mixed_agent_coordination.py
import asyncio

import pytest

from backend.app.models.agent import AgentType

pytestmark = pytest.mark.asyncio


class TestMixedAgentCoordination:
    # The `coordinator` fixture is shared with the other integration tests (e.g. via conftest.py)
    async def test_ollama_and_cli_agents_together(self, coordinator):
        """Test Ollama and CLI agents working together"""
        # Create tasks for both agent types
        ollama_task = coordinator.create_task(
            task_type=AgentType.PYTORCH_DEV,
            context={"prompt": "Generate Python code"},
            priority=3
        )

        cli_task = coordinator.create_task(
            task_type=AgentType.CLI_GEMINI,
            context={"prompt": "Analyze this code"},
            priority=3
        )

        # Execute tasks concurrently
        ollama_result, cli_result = await asyncio.gather(
            coordinator.process_task(ollama_task),
            coordinator.process_task(cli_task)
        )

        assert ollama_result["status"] == "completed"
        assert cli_result["status"] == "completed"
```

#### **2.3 MCP Server CLI Agent Support**
```typescript
// File: mcp-server/src/tests/integration/test_cli_agent_mcp.test.ts
// WHOOSHTools and mockWHOOSHClient are assumed to be provided by the MCP server test setup/imports.
describe('MCP CLI Agent Integration', () => {
  let whooshTools: WHOOSHTools;

  beforeEach(() => {
    whooshTools = new WHOOSHTools(mockWHOOSHClient);
  });

  test('should execute task on CLI agent', async () => {
    const result = await whooshTools.executeTool('whoosh_create_task', {
      type: 'cli_gemini',
      priority: 3,
      objective: 'Test CLI agent execution'
    });

    expect(result.isError).toBe(false);
    expect(result.content[0].text).toContain('Task created successfully');
  });

  test('should discover both Ollama and CLI agents', async () => {
    const result = await whooshTools.executeTool('whoosh_get_agents', {});

    expect(result.isError).toBe(false);
    const agents = JSON.parse(result.content[0].text);

    // Should include both types
    expect(agents.some(a => a.agent_type === 'ollama')).toBe(true);
    expect(agents.some(a => a.agent_type === 'cli_gemini')).toBe(true);
  });
});
```

### **3. 📊 Performance Tests**

#### **3.1 Response Time Benchmarking**
```bash
#!/bin/bash
# File: scripts/benchmark-response-times.sh

echo "🏃 CLI Agent Response Time Benchmarking"

# Test single task execution times
benchmark_single_task() {
    local agent_type=$1
    local iterations=10
    local total_time=0

    echo "Benchmarking $agent_type agent (${iterations} iterations)..."

    for i in $(seq 1 $iterations); do
        start_time=$(date +%s.%N)

        curl -s -X POST http://localhost:8000/api/tasks \
            -H "Content-Type: application/json" \
            -d "{
                \"agent_type\": \"$agent_type\",
                \"prompt\": \"What is the capital of France?\",
                \"priority\": 3
            }" > /dev/null

        end_time=$(date +%s.%N)
        duration=$(echo "$end_time - $start_time" | bc)
        total_time=$(echo "$total_time + $duration" | bc)

        echo "Iteration $i: ${duration}s"
    done

    average_time=$(echo "scale=2; $total_time / $iterations" | bc)
    echo "$agent_type average response time: ${average_time}s"
}

# Run benchmarks
benchmark_single_task "ollama"
benchmark_single_task "cli_gemini"

# Compare results
echo "📊 Performance Comparison Complete"
```

#### **3.2 Concurrent Execution Testing**
```python
# File: scripts/test_concurrent_execution.py
import asyncio
import aiohttp
import time
from typing import Tuple


async def test_concurrent_cli_agents():
    """Test concurrent CLI agent execution under load"""

    async def execute_task(session: aiohttp.ClientSession, task_id: int) -> Tuple[int, float, str]:
        start_time = time.time()

        async with session.post(
            'http://localhost:8000/api/tasks',
            json={
                'agent_type': 'cli_gemini',
                'prompt': f'Process task {task_id}',
                'priority': 3
            }
        ) as response:
            result = await response.json()
            duration = time.time() - start_time
            status = result.get('status', 'unknown')

            return task_id, duration, status

    # Test various concurrency levels
    concurrency_levels = [1, 2, 4, 8, 16]

    for concurrency in concurrency_levels:
        print(f"\n🔄 Testing {concurrency} concurrent CLI agent tasks...")

        async with aiohttp.ClientSession() as session:
            tasks = [
                execute_task(session, i)
                for i in range(concurrency)
            ]

            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Analyze results
            successful_tasks = [r for r in results if isinstance(r, tuple) and r[2] == 'completed']
            failed_tasks = [r for r in results if not isinstance(r, tuple) or r[2] != 'completed']

            if successful_tasks:
                avg_duration = sum(r[1] for r in successful_tasks) / len(successful_tasks)
                print(f"  ✅ {len(successful_tasks)}/{concurrency} tasks successful")
                print(f"  ⏱️ Average duration: {avg_duration:.2f}s")

            if failed_tasks:
                print(f"  ❌ {len(failed_tasks)} tasks failed")


if __name__ == "__main__":
    asyncio.run(test_concurrent_cli_agents())
```

#### **3.3 Resource Usage Monitoring**
```python
# File: scripts/monitor_resource_usage.py
import psutil
import time
import asyncio
from typing import Dict, List


class ResourceMonitor:
    def __init__(self):
        self.baseline_metrics = self.get_system_metrics()

    def get_system_metrics(self) -> Dict:
        """Get current system resource usage"""
        return {
            'cpu_percent': psutil.cpu_percent(interval=1),
            'memory_percent': psutil.virtual_memory().percent,
            'network_io': psutil.net_io_counters(),
            'ssh_connections': self.count_ssh_connections()
        }

    def count_ssh_connections(self) -> int:
        """Count active SSH connections (inbound on port 22 or outbound to port 22)"""
        connections = psutil.net_connections()
        ssh_conns = [
            c for c in connections
            if (c.laddr and c.laddr.port == 22) or (c.raddr and c.raddr.port == 22)
        ]
        return len(ssh_conns)

    async def monitor_during_cli_execution(self, duration_minutes: int = 10):
        """Monitor resource usage during CLI agent execution"""
        print(f"🔍 Monitoring resources for {duration_minutes} minutes...")

        metrics_history = []
        end_time = time.time() + (duration_minutes * 60)

        while time.time() < end_time:
            current_metrics = self.get_system_metrics()
            metrics_history.append({
                'timestamp': time.time(),
                **current_metrics
            })

            print(f"CPU: {current_metrics['cpu_percent']}%, "
                  f"Memory: {current_metrics['memory_percent']}%, "
                  f"SSH Connections: {current_metrics['ssh_connections']}")

            await asyncio.sleep(30)  # Sample every 30 seconds

        self.analyze_resource_usage(metrics_history)

    def analyze_resource_usage(self, metrics_history: List[Dict]):
        """Analyze resource usage patterns"""
        if not metrics_history:
            return

        avg_cpu = sum(m['cpu_percent'] for m in metrics_history) / len(metrics_history)
        max_cpu = max(m['cpu_percent'] for m in metrics_history)

        avg_memory = sum(m['memory_percent'] for m in metrics_history) / len(metrics_history)
        max_memory = max(m['memory_percent'] for m in metrics_history)

        max_ssh_conns = max(m['ssh_connections'] for m in metrics_history)

        print("\n📊 Resource Usage Analysis:")
        print(f"  CPU - Average: {avg_cpu:.1f}%, Peak: {max_cpu:.1f}%")
        print(f"  Memory - Average: {avg_memory:.1f}%, Peak: {max_memory:.1f}%")
        print(f"  SSH Connections - Peak: {max_ssh_conns}")

        # Check if within acceptable limits
        if max_cpu > 80:
            print("  ⚠️ High CPU usage detected")
        if max_memory > 85:
            print("  ⚠️ High memory usage detected")
        if max_ssh_conns > 20:
            print("  ⚠️ High SSH connection count")
```

### **4. 🔒 Security Tests**

#### **4.1 SSH Authentication Testing**
```python
# File: src/tests/security/test_ssh_security.py
import shlex

import pytest

from src.executors.ssh_executor import SSHExecutor

pytestmark = pytest.mark.asyncio


class TestSSHSecurity:
    async def test_key_based_authentication(self):
        """Test SSH key-based authentication"""
        executor = SSHExecutor()

        # Should succeed with proper key
        result = await executor.execute("walnut", "echo 'test'")
        assert result.returncode == 0

    async def test_connection_timeout(self):
        """Test SSH connection timeout handling"""
        executor = SSHExecutor()

        with pytest.raises(Exception, match="timeout"):
            await executor.execute("invalid-host", "echo 'test'", timeout=5)

    async def test_command_injection_prevention(self):
        """Test prevention of command injection"""
        executor = SSHExecutor()

        # Malicious input must be quoted so the embedded `rm` never runs
        malicious_input = "test'; rm -rf /; echo 'evil"
        result = await executor.execute("walnut", f"echo {shlex.quote(malicious_input)}")

        # The quoted string is echoed back verbatim; nothing else executes
        assert "evil" in result.stdout
        assert result.returncode == 0
```

#### **4.2 Network Security Testing**
```bash
#!/bin/bash
# File: scripts/test-network-security.sh

echo "🔒 Network Security Testing for CLI Agents"

# Test SSH connection encryption
test_ssh_encryption() {
    echo "Testing SSH connection encryption..."

    # Capture network traffic (including ASCII payload) during SSH session
    timeout 10s tcpdump -A -i any -c 20 port 22 > /tmp/ssh_traffic.log 2>&1 &
    tcpdump_pid=$!

    # Execute CLI command
    ssh walnut "echo 'test connection'" > /dev/null 2>&1

    # Stop traffic capture
    kill $tcpdump_pid 2>/dev/null

    # Verify encrypted traffic (should not contain plaintext)
    if grep -q "test connection" /tmp/ssh_traffic.log; then
        echo "❌ SSH traffic appears to be unencrypted"
        return 1
    else
        echo "✅ SSH traffic is properly encrypted"
        return 0
    fi
}

# Test connection limits
test_connection_limits() {
    echo "Testing SSH connection limits..."

    # Try to open many connections
    for i in {1..25}; do
        ssh -o ConnectTimeout=5 walnut "sleep 1" &
    done

    wait
    echo "✅ Connection limit testing completed"
}

# Run security tests
test_ssh_encryption
test_connection_limits

echo "🔒 Network security testing completed"
```

### **5. 🚀 Load Tests**

#### **5.1 Sustained Load Testing**
```python
# File: scripts/load_test_sustained.py
import asyncio
import aiohttp
import random
import time
from dataclasses import dataclass
from typing import List, Dict


@dataclass
class LoadTestConfig:
    duration_minutes: int = 30
    requests_per_second: int = 2
    cli_agent_percentage: int = 30  # 30% CLI, 70% Ollama


class SustainedLoadTester:
    def __init__(self, config: LoadTestConfig):
        self.config = config
        self.results = []

    async def generate_load(self):
        """Generate sustained load on the system"""
        end_time = time.time() + (self.config.duration_minutes * 60)
        task_counter = 0
        pending_tasks = []  # keep handles so we only await the requests we issued

        async with aiohttp.ClientSession() as session:
            while time.time() < end_time:
                # Determine agent type based on percentage
                use_cli = random.randint(1, 100) <= self.config.cli_agent_percentage
                agent_type = "cli_gemini" if use_cli else "ollama"

                # Create task
                pending_tasks.append(asyncio.create_task(
                    self.execute_single_request(session, agent_type, task_counter)
                ))

                task_counter += 1

                # Maintain request rate
                await asyncio.sleep(1.0 / self.config.requests_per_second)

            # Wait for all issued requests to complete
            await asyncio.gather(*pending_tasks, return_exceptions=True)

        self.analyze_results()

    async def execute_single_request(self, session: aiohttp.ClientSession,
                                     agent_type: str, task_id: int):
        """Execute a single request and record metrics"""
        start_time = time.time()

        try:
            async with session.post(
                'http://localhost:8000/api/tasks',
                json={
                    'agent_type': agent_type,
                    'prompt': f'Load test task {task_id}',
                    'priority': 3
                },
                timeout=aiohttp.ClientTimeout(total=60)
            ) as response:
                result = await response.json()
                duration = time.time() - start_time

                self.results.append({
                    'task_id': task_id,
                    'agent_type': agent_type,
                    'duration': duration,
                    'status': response.status,
                    'success': response.status == 200
                })

        except Exception as e:
            duration = time.time() - start_time
            self.results.append({
                'task_id': task_id,
                'agent_type': agent_type,
                'duration': duration,
                'status': 0,
                'success': False,
                'error': str(e)
            })

    def analyze_results(self):
        """Analyze load test results"""
        if not self.results:
            print("No results to analyze")
            return

        total_requests = len(self.results)
        successful_requests = sum(1 for r in self.results if r['success'])

        cli_results = [r for r in self.results if r['agent_type'] == 'cli_gemini']
        ollama_results = [r for r in self.results if r['agent_type'] == 'ollama']

        print("\n📊 Load Test Results:")
        print(f"  Total Requests: {total_requests}")
        print(f"  Success Rate: {successful_requests/total_requests*100:.1f}%")

        if cli_results:
            cli_avg_duration = sum(r['duration'] for r in cli_results) / len(cli_results)
            cli_success_rate = sum(1 for r in cli_results if r['success']) / len(cli_results)
            print(f"  CLI Agents - Count: {len(cli_results)}, "
                  f"Avg Duration: {cli_avg_duration:.2f}s, "
                  f"Success Rate: {cli_success_rate*100:.1f}%")

        if ollama_results:
            ollama_avg_duration = sum(r['duration'] for r in ollama_results) / len(ollama_results)
            ollama_success_rate = sum(1 for r in ollama_results if r['success']) / len(ollama_results)
            print(f"  Ollama Agents - Count: {len(ollama_results)}, "
                  f"Avg Duration: {ollama_avg_duration:.2f}s, "
                  f"Success Rate: {ollama_success_rate*100:.1f}%")


# Run load test
if __name__ == "__main__":
    config = LoadTestConfig(
        duration_minutes=30,
        requests_per_second=2,
        cli_agent_percentage=30
    )

    tester = SustainedLoadTester(config)
    asyncio.run(tester.generate_load())
```

### **6. 🧪 Chaos Testing**

#### **6.1 Network Interruption Testing**
```bash
#!/bin/bash
# File: scripts/chaos-test-network.sh

echo "🌪️ Chaos Testing: Network Interruptions"

# Function to simulate network latency
# Note: tc shapes all egress on eth0; the target_host argument is informational only
simulate_network_latency() {
    local target_host=$1
    local delay_ms=$2
    local duration_seconds=$3

    echo "Adding ${delay_ms}ms latency to $target_host for ${duration_seconds}s..."

    # Add network delay (requires root/sudo)
    sudo tc qdisc add dev eth0 root netem delay ${delay_ms}ms

    # Wait for specified duration
    sleep $duration_seconds

    # Remove network delay
    sudo tc qdisc del dev eth0 root netem

    echo "Network latency removed"
}

# Function to simulate network packet loss
simulate_packet_loss() {
    local loss_percentage=$1
    local duration_seconds=$2

    echo "Simulating ${loss_percentage}% packet loss for ${duration_seconds}s..."

    sudo tc qdisc add dev eth0 root netem loss ${loss_percentage}%
    sleep $duration_seconds
    sudo tc qdisc del dev eth0 root netem

    echo "Packet loss simulation ended"
}

# Test CLI agent resilience during network issues
test_cli_resilience_during_network_chaos() {
    echo "Testing CLI agent resilience during network chaos..."

    # Start background CLI agent tasks
    for i in {1..5}; do
        {
            curl -X POST http://localhost:8000/api/tasks \
                -H "Content-Type: application/json" \
                -d "{\"agent_type\": \"cli_gemini\", \"prompt\": \"Chaos test task $i\"}" \
                > /tmp/chaos_test_$i.log 2>&1
        } &
    done

    # Introduce network chaos
    sleep 2
    simulate_network_latency "walnut" 500 10  # 500ms delay for 10 seconds
    sleep 5
    simulate_packet_loss 10 5  # 10% packet loss for 5 seconds

    # Wait for all tasks to complete
    wait

    # Analyze results
    echo "Analyzing chaos test results..."
    for i in {1..5}; do
        if grep -q "completed" /tmp/chaos_test_$i.log; then
            echo "  Task $i: ✅ Completed successfully despite network chaos"
        else
            echo "  Task $i: ❌ Failed during network chaos"
        fi
    done
}

# Note: This script requires root privileges for network simulation
if [[ $EUID -eq 0 ]]; then
    test_cli_resilience_during_network_chaos
else
    echo "⚠️ Chaos testing requires root privileges for network simulation"
    echo "Run with: sudo $0"
fi
```

---

## 📊 Test Automation & CI/CD Integration

### **Automated Test Pipeline**
```yaml
# File: .github/workflows/ccli-tests.yml
name: CCLI Integration Tests

on:
  push:
    branches: [ feature/gemini-cli-integration ]
  pull_request:
    branches: [ master ]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-asyncio pytest-mock
      - name: Run unit tests
        run: pytest src/tests/ -v --cov=src --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v1

  integration-tests:
    runs-on: ubuntu-latest
    needs: unit-tests
    steps:
      - uses: actions/checkout@v2
      - name: Start test environment
        run: docker-compose -f docker-compose.test.yml up -d
      - name: Wait for services
        run: sleep 30
      - name: Run integration tests
        run: pytest src/tests/integration/ -v
      - name: Cleanup
        run: docker-compose -f docker-compose.test.yml down

  security-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run security scan
        run: |
          pip install bandit safety
          bandit -r src/
          safety check
```

### **Test Reporting Dashboard**
```python
# File: scripts/generate_test_report.py
import json
import datetime
from pathlib import Path


class TestReportGenerator:
    def __init__(self):
        self.results = {
            'timestamp': datetime.datetime.now().isoformat(),
            'test_suites': {}
        }

    def add_test_suite(self, suite_name: str, results: dict):
        """Add test suite results to the report"""
        self.results['test_suites'][suite_name] = {
            'total_tests': results.get('total', 0),
            'passed': results.get('passed', 0),
            'failed': results.get('failed', 0),
            'success_rate': results.get('passed', 0) / max(results.get('total', 1), 1),
            'duration': results.get('duration', 0),
            'details': results.get('details', [])
        }

    def generate_html_report(self, output_path: str):
        """Generate HTML test report"""
        html_content = self._build_html_report()

        with open(output_path, 'w') as f:
            f.write(html_content)

    def _build_html_report(self) -> str:
        """Build HTML report content"""
        # HTML report template with test results
        return f"""
<!DOCTYPE html>
<html>
<head>
    <title>CCLI Test Report</title>
    <style>
        body {{ font-family: Arial, sans-serif; margin: 40px; }}
        .success {{ color: green; }}
        .failure {{ color: red; }}
        .suite {{ margin: 20px 0; padding: 15px; border: 1px solid #ddd; }}
    </style>
</head>
<body>
    <h1>🧪 CCLI Test Report</h1>
    <p>Generated: {self.results['timestamp']}</p>
    {self._generate_suite_summaries()}
</body>
</html>
"""

    def _generate_suite_summaries(self) -> str:
        """Generate HTML for test suite summaries"""
        html = ""
        for suite_name, results in self.results['test_suites'].items():
            status_class = "success" if results['success_rate'] >= 0.95 else "failure"
            html += f"""
            <div class="suite">
                <h2>{suite_name}</h2>
                <p class="{status_class}">
                    {results['passed']}/{results['total_tests']} tests passed
                    ({results['success_rate']*100:.1f}%)
                </p>
                <p>Duration: {results['duration']:.2f}s</p>
            </div>
            """
        return html
```
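A minimal usage sketch for the generator above. The suite names and per-suite numbers here are placeholders; in practice they would come from parsed pytest/jest output rather than being hard-coded.

```python
# Hypothetical wiring: suite names and counts are illustrative placeholders
from generate_test_report import TestReportGenerator

report = TestReportGenerator()
report.add_test_suite("unit", {"total": 42, "passed": 41, "failed": 1, "duration": 12.7})
report.add_test_suite("integration", {"total": 18, "passed": 18, "failed": 0, "duration": 96.3})
report.generate_html_report("ccli-test-report.html")
print("Report written to ccli-test-report.html")
```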

---

## 🎯 Success Criteria & Exit Conditions

### **Test Completion Criteria**
- [ ] **Unit Tests**: ≥90% code coverage achieved
- [ ] **Integration Tests**: All CLI agent workflows tested successfully
- [ ] **Performance Tests**: CLI agents within 150% of Ollama baseline
- [ ] **Security Tests**: All SSH connections validated and secure
- [ ] **Load Tests**: System stable under 10x normal load
- [ ] **Chaos Tests**: System recovers gracefully from network issues

### **Go/No-Go Decision Points**
1. **After Unit Testing**: Proceed if >90% coverage and all tests pass
2. **After Integration Testing**: Proceed if CLI agents work with existing system
3. **After Performance Testing**: Proceed if performance within acceptable limits
4. **After Security Testing**: Proceed if no security vulnerabilities found
5. **After Load Testing**: Proceed if system handles production-like load

### **Rollback Triggers**
- Any test category has <80% success rate
- CLI agent performance >200% of Ollama baseline
- Security vulnerabilities discovered
- System instability under normal load
- Negative impact on existing Ollama agents
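The first two triggers are numeric and can be evaluated automatically from the load-test and benchmark output; the remaining three still require human judgment. Below is a hedged sketch of such a check, assuming an aggregated `results.json` with per-category success rates and mean durations (that file and its schema are assumptions for illustration, not defined elsewhere in this plan).

```python
# File: scripts/check_rollback_triggers.py (hypothetical; input schema is assumed)
# Expected results.json shape:
# {"categories": {"unit": {"success_rate": 0.97}, ...},
#  "durations": {"cli_gemini_mean": 8.4, "ollama_mean": 5.1}}
import json
import sys

MIN_SUCCESS_RATE = 0.80   # rollback if any category drops below 80%
MAX_PERF_RATIO = 2.0      # rollback if CLI agents exceed 200% of Ollama baseline


def main(path: str = "results.json") -> int:
    with open(path) as f:
        data = json.load(f)

    triggers = []

    for category, stats in data["categories"].items():
        if stats["success_rate"] < MIN_SUCCESS_RATE:
            triggers.append(f"{category}: success rate {stats['success_rate']:.1%} < 80%")

    ratio = data["durations"]["cli_gemini_mean"] / data["durations"]["ollama_mean"]
    if ratio > MAX_PERF_RATIO:
        triggers.append(f"CLI agents at {ratio:.0%} of Ollama baseline (> 200%)")

    for trigger in triggers:
        print(f"🛑 Rollback trigger: {trigger}")
    if not triggers:
        print("✅ No automatic rollback triggers fired")
    return 1 if triggers else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "results.json"))
```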

---

This comprehensive testing strategy ensures the CLI agent integration is thoroughly validated before production deployment while maintaining the stability and performance of the existing WHOOSH system.