Initial commit: Complete Hive distributed AI orchestration platform

This comprehensive implementation includes: - FastAPI backend with MCP server integration - React/TypeScript frontend with Vite - PostgreSQL database with Redis caching - Grafana/Prometheus monitoring stack - Docker Compose orchestration - Full MCP protocol support for Claude Code integration Features: - Agent discovery and management across network - Visual workflow editor and execution engine - Real-time task coordination and monitoring - Multi-model support with specialized agents - Distributed development task allocation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-07 21:44:31 +10:00
commit d7ad321176
2631 changed files with 870175 additions and 0 deletions
--- a/PROJECT_PLAN.md
+++ b/PROJECT_PLAN.md
@@ -0,0 +1,499 @@
+# 🐝 Hive: Unified Distributed AI Orchestration Platform
+
+## Project Overview
+
+**Hive** is a comprehensive distributed AI orchestration platform that consolidates the best components from our distributed AI development ecosystem into a single, powerful system for coordinating AI agents, managing workflows, and monitoring cluster performance.
+
+## 🎯 Vision Statement
+
+Create a unified platform that combines:
+- **Distributed AI Development** coordination and monitoring
+- **Visual Workflow Orchestration** with n8n compatibility  
+- **Multi-Agent Task Distribution** across specialized AI agents
+- **Real-time Performance Monitoring** and alerting
+- **MCP Integration** for standardized AI tool protocols
+
+## 🏗️ System Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        HIVE ORCHESTRATOR                        │
+├─────────────────────────────────────────────────────────────────┤
+│  Frontend Dashboard (React + TypeScript)                       │
+│  ├── 🎛️  Agent Management & Monitoring                         │
+│  ├── 🎨  Visual Workflow Editor (n8n-compatible)               │
+│  ├── 📊  Real-time Performance Dashboard                       │
+│  ├── 📋  Task Queue & Project Management                       │
+│  └── ⚙️   System Configuration & Settings                      │
+├─────────────────────────────────────────────────────────────────┤
+│  Backend Services (FastAPI + Python)                           │
+│  ├── 🧠  Hive Coordinator (unified orchestration)              │
+│  ├── 🔄  Workflow Engine (n8n + MCP bridge)                    │
+│  ├── 📡  Agent Communication (compressed protocols)            │
+│  ├── 📈  Performance Monitor (metrics & alerts)                │
+│  ├── 🔒  Authentication & Authorization                        │
+│  └── 💾  Data Storage (workflows, configs, metrics)           │
+├─────────────────────────────────────────────────────────────────┤
+│  Agent Network (Ollama + Specialized Models)                   │
+│  ├── 🏗️  ACACIA (Infrastructure & DevOps)                     │
+│  ├── 🌐  WALNUT (Full-Stack Development)                       │
+│  ├── ⚙️   IRONWOOD (Backend & Optimization)                    │
+│  └── 🔌  [Expandable Agent Pool]                               │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## 📦 Component Integration Plan
+
+### 🔧 **Core Components from Existing Projects**
+
+#### **1. From distributed-ai-dev**
+- **AIDevCoordinator**: Task orchestration and agent management
+- **Agent Configuration**: YAML-based agent profiles and capabilities
+- **Performance Monitoring**: Real-time metrics and GPU monitoring
+- **Claudette Compression**: Efficient agent communication protocols
+- **Quality Control**: Multi-agent code review and validation
+
+#### **2. From McPlan**
+- **Visual Workflow Editor**: React Flow-based n8n-compatible designer
+- **Execution Engine**: Real-time workflow execution with progress tracking
+- **WebSocket Infrastructure**: Live updates and monitoring
+- **MCP Bridge**: n8n workflow → MCP tool conversion
+- **Database Models**: Workflow storage and execution history
+
+#### **3. From Cluster Monitoring**
+- **Hardware Abstraction**: Multi-GPU support and hardware profiling
+- **Alert System**: Configurable alerts with severity levels
+- **Dashboard Components**: React-based monitoring interfaces
+- **Time-series Storage**: Performance data retention and analysis
+
+#### **4. From n8n-integration**
+- **Workflow Patterns**: Proven n8n integration examples
+- **Model Registry**: 28+ available models across cluster endpoints
+- **Protocol Standards**: Established communication patterns
+
+### 🚀 **Unified Architecture Components**
+
+#### **1. Hive Coordinator Service**
+```python
+class HiveCoordinator:
+    """
+    Unified orchestration engine combining:
+    - Agent coordination and task distribution
+    - Workflow execution management  
+    - Real-time monitoring and alerting
+    - MCP server integration
+    """
+    
+    # Core Services
+    agent_manager: AgentManager
+    workflow_engine: WorkflowEngine
+    performance_monitor: PerformanceMonitor
+    mcp_bridge: MCPBridge
+    
+    # API Interfaces
+    rest_api: FastAPI
+    websocket_manager: WebSocketManager
+    
+    # Configuration
+    config: HiveConfig
+    database: HiveDatabase
+```
+
+#### **2. Database Schema Integration**
+```sql
+-- Agent Management (enhanced from distributed-ai-dev)
+agents (id, name, endpoint, specialization, capabilities, hardware_config)
+agent_metrics (agent_id, timestamp, performance_data, gpu_metrics)
+agent_capabilities (agent_id, capability, proficiency_score)
+
+-- Workflow Management (from McPlan)
+workflows (id, name, n8n_data, mcp_tools, created_by, version)
+executions (id, workflow_id, status, input_data, output_data, logs)
+execution_steps (execution_id, step_index, node_id, status, timing)
+
+-- Task Coordination (enhanced)
+tasks (id, title, description, priority, assigned_agent, status)
+task_dependencies (task_id, depends_on_task_id)
+projects (id, name, description, task_template, agent_assignments)
+
+-- System Management
+users (id, email, role, preferences, api_keys)
+alerts (id, type, severity, message, resolved, timestamp)
+system_config (key, value, category, description)
+```
+
+#### **3. Frontend Component Architecture**
+```typescript
+// Unified Dashboard Structure
+src/
+├── components/
+│   ├── dashboard/
+│   │   ├── AgentMonitor.tsx        // Real-time agent status
+│   │   ├── PerformanceDashboard.tsx // System metrics
+│   │   └── SystemAlerts.tsx        // Alert management
+│   ├── workflows/
+│   │   ├── WorkflowEditor.tsx      // Visual n8n editor
+│   │   ├── ExecutionMonitor.tsx    // Real-time execution
+│   │   └── WorkflowLibrary.tsx     // Workflow management
+│   ├── agents/
+│   │   ├── AgentManager.tsx        // Agent configuration
+│   │   ├── TaskQueue.tsx           // Task assignment
+│   │   └── CapabilityMatrix.tsx    // Skills management
+│   └── projects/
+│       ├── ProjectDashboard.tsx    // Project overview
+│       ├── TaskManagement.tsx      // Task coordination
+│       └── QualityControl.tsx      // Code review
+├── stores/
+│   ├── hiveStore.ts               // Global state management
+│   ├── agentStore.ts              // Agent-specific state
+│   ├── workflowStore.ts           // Workflow state
+│   └── performanceStore.ts        // Metrics state
+└── services/
+    ├── api.ts                     // REST API client
+    ├── websocket.ts               // Real-time updates
+    └── config.ts                  // Configuration management
+```
+
+#### **4. Configuration System**
+```yaml
+# hive.yaml - Unified Configuration
+hive:
+  cluster:
+    name: "Development Cluster"
+    region: "home.deepblack.cloud"
+    
+  agents:
+    acacia:
+      name: "ACACIA Infrastructure Specialist"
+      endpoint: "http://192.168.1.72:11434"
+      model: "deepseek-r1:7b"
+      specialization: "infrastructure"
+      capabilities: ["devops", "architecture", "deployment"]
+      hardware:
+        gpu_type: "AMD Radeon RX 7900 XTX"
+        vram_gb: 24
+        cpu_cores: 16
+      performance_targets:
+        min_tps: 15
+        max_response_time: 30
+        
+    walnut:
+      name: "WALNUT Full-Stack Developer"
+      endpoint: "http://192.168.1.27:11434"
+      model: "starcoder2:15b"
+      specialization: "full-stack"
+      capabilities: ["frontend", "backend", "ui-design"]
+      hardware:
+        gpu_type: "NVIDIA RTX 4090"
+        vram_gb: 24
+        cpu_cores: 12
+      performance_targets:
+        min_tps: 20
+        max_response_time: 25
+        
+    ironwood:
+      name: "IRONWOOD Backend Specialist"
+      endpoint: "http://192.168.1.113:11434"
+      model: "deepseek-coder-v2"
+      specialization: "backend"
+      capabilities: ["optimization", "databases", "apis"]
+      hardware:
+        gpu_type: "NVIDIA RTX 4080"
+        vram_gb: 16
+        cpu_cores: 8
+      performance_targets:
+        min_tps: 18
+        max_response_time: 35
+
+  workflows:
+    templates:
+      web_development:
+        agents: ["walnut", "ironwood"]
+        stages: ["planning", "frontend", "backend", "integration", "testing"]
+      infrastructure:
+        agents: ["acacia", "ironwood"]
+        stages: ["design", "provisioning", "deployment", "monitoring"]
+        
+  monitoring:
+    metrics_retention_days: 30
+    alert_thresholds:
+      cpu_usage: 85
+      memory_usage: 90
+      gpu_usage: 95
+      response_time: 60
+    health_check_interval: 30
+    
+  mcp_servers:
+    registry:
+      comfyui: "ws://localhost:8188/api/mcp"
+      code_review: "http://localhost:8000/mcp"
+      
+  security:
+    require_approval: true
+    api_rate_limit: 100
+    session_timeout: 3600
+```
+
+## 🗂️ Project Structure
+
+```
+hive/
+├── 📋 PROJECT_PLAN.md              # This document
+├── 🚀 DEPLOYMENT.md                # Infrastructure deployment guide
+├── 🔧 DEVELOPMENT.md               # Development setup and guidelines
+├── 📊 ARCHITECTURE.md              # Detailed technical architecture
+│
+├── backend/                        # Python FastAPI backend
+│   ├── app/
+│   │   ├── core/                   # Core services
+│   │   │   ├── hive_coordinator.py # Main orchestration engine
+│   │   │   ├── agent_manager.py    # Agent lifecycle management
+│   │   │   ├── workflow_engine.py  # n8n workflow execution
+│   │   │   ├── mcp_bridge.py       # MCP protocol integration
+│   │   │   └── performance_monitor.py # Metrics and alerting
+│   │   ├── api/                    # REST API endpoints
+│   │   │   ├── agents.py           # Agent management API
+│   │   │   ├── workflows.py        # Workflow API
+│   │   │   ├── executions.py       # Execution API
+│   │   │   ├── monitoring.py       # Metrics API
+│   │   │   └── projects.py         # Project management API
+│   │   ├── models/                 # Database models
+│   │   │   ├── agent.py
+│   │   │   ├── workflow.py
+│   │   │   ├── execution.py
+│   │   │   ├── task.py
+│   │   │   └── user.py
+│   │   ├── services/               # Business logic
+│   │   └── utils/                  # Helper functions
+│   ├── migrations/                 # Database migrations
+│   ├── tests/                      # Backend tests
+│   └── requirements.txt
+│
+├── frontend/                       # React TypeScript frontend
+│   ├── src/
+│   │   ├── components/             # React components
+│   │   ├── stores/                 # State management
+│   │   ├── services/               # API clients
+│   │   ├── types/                  # TypeScript definitions
+│   │   ├── hooks/                  # Custom React hooks
+│   │   └── utils/                  # Helper functions
+│   ├── public/
+│   ├── package.json
+│   └── vite.config.ts
+│
+├── config/                         # Configuration files
+│   ├── hive.yaml                   # Main configuration
+│   ├── agents/                     # Agent-specific configs
+│   ├── workflows/                  # Workflow templates
+│   └── monitoring/                 # Monitoring configs
+│
+├── scripts/                        # Utility scripts
+│   ├── setup.sh                    # Initial setup
+│   ├── deploy.sh                   # Deployment automation
+│   ├── migrate.py                  # Data migration from existing projects
+│   └── health_check.py             # System health validation
+│
+├── docker/                         # Container configuration
+│   ├── docker-compose.yml          # Development environment
+│   ├── docker-compose.prod.yml     # Production deployment
+│   ├── Dockerfile.backend
+│   ├── Dockerfile.frontend
+│   └── nginx.conf                  # Reverse proxy config
+│
+├── docs/                           # Documentation
+│   ├── api/                        # API documentation
+│   ├── user-guide/                 # User documentation
+│   ├── admin-guide/                # Administration guide
+│   └── developer-guide/            # Development documentation
+│
+└── tests/                          # Integration tests
+    ├── e2e/                        # End-to-end tests
+    ├── integration/                # Integration tests
+    └── performance/                # Performance tests
+```
+
+## 🔄 Migration Strategy
+
+### **Phase 1: Foundation (Week 1-2)**
+1. **Project Setup**
+   - Create unified project structure
+   - Set up development environment
+   - Initialize database schema
+   - Configure CI/CD pipeline
+
+2. **Core Integration**
+   - Merge AIDevCoordinator and McPlan execution engine
+   - Unify configuration systems (YAML + database)
+   - Integrate authentication systems
+   - Set up basic API endpoints
+
+### **Phase 2: Backend Services (Week 3-4)**
+1. **Agent Management**
+   - Implement unified agent registration and discovery
+   - Migrate agent hardware profiling and monitoring
+   - Add capability-based task assignment
+   - Integrate performance metrics collection
+
+2. **Workflow Engine**
+   - Port n8n workflow parsing and execution
+   - Implement MCP bridge functionality
+   - Add real-time execution monitoring
+   - Create workflow template system
+
+### **Phase 3: Frontend Development (Week 5-6)**
+1. **Dashboard Integration**
+   - Merge monitoring dashboards from both projects
+   - Create unified navigation and layout
+   - Implement real-time WebSocket updates
+   - Add responsive design for mobile access
+
+2. **Workflow Editor**
+   - Port React Flow visual editor
+   - Enhance with Hive-specific features
+   - Add template library and sharing
+   - Implement collaborative editing
+
+### **Phase 4: Advanced Features (Week 7-8)**
+1. **Quality Control**
+   - Implement multi-agent code review
+   - Add automated testing coordination
+   - Create approval workflow system
+   - Integrate security scanning
+
+2. **Performance Optimization**
+   - Add intelligent load balancing
+   - Implement caching strategies
+   - Optimize database queries
+   - Add performance analytics
+
+### **Phase 5: Production Deployment (Week 9-10)**
+1. **Infrastructure**
+   - Set up Docker Swarm deployment
+   - Configure SSL/TLS and domain routing
+   - Implement backup and recovery
+   - Add monitoring and alerting
+
+2. **Documentation & Training**
+   - Complete user documentation
+   - Create admin guides
+   - Record demo videos
+   - Conduct user training
+
+## 🎯 Success Metrics
+
+### **Technical Metrics**
+- **Agent Utilization**: >80% average utilization across cluster
+- **Response Time**: <30 seconds average for workflow execution
+- **Throughput**: >50 concurrent task executions
+- **Uptime**: 99.9% system availability
+- **Performance**: <2 second UI response time
+
+### **User Experience Metrics**
+- **Workflow Creation**: <5 minutes to create and deploy simple workflow
+- **Agent Discovery**: Automatic agent health detection within 30 seconds
+- **Error Recovery**: <1 minute mean time to recovery
+- **Learning Curve**: <2 hours for new user onboarding
+
+### **Business Metrics**
+- **Development Velocity**: 50% reduction in multi-agent coordination time
+- **Code Quality**: 90% automated test coverage
+- **Scalability**: Support for 10+ concurrent projects
+- **Maintainability**: <24 hours for feature additions
+
+## 🔧 Technology Stack
+
+### **Backend**
+- **Framework**: FastAPI + Python 3.11+
+- **Database**: PostgreSQL + Redis (caching)
+- **Message Queue**: Redis + Celery
+- **Monitoring**: Prometheus + Grafana
+- **Documentation**: OpenAPI/Swagger
+
+### **Frontend**
+- **Framework**: React 18 + TypeScript
+- **UI Library**: Tailwind CSS + Headless UI
+- **State Management**: Zustand + React Query
+- **Visualization**: React Flow + D3.js
+- **Build Tool**: Vite
+
+### **Infrastructure**
+- **Containers**: Docker + Docker Swarm
+- **Reverse Proxy**: Traefik v3
+- **SSL/TLS**: Let's Encrypt
+- **Storage**: NFS + PostgreSQL
+- **Monitoring**: Grafana + Prometheus
+
+### **Development**
+- **Version Control**: Git + GitLab
+- **CI/CD**: GitLab CI + Docker Registry
+- **Testing**: pytest + Jest + Playwright
+- **Code Quality**: Black + ESLint + TypeScript
+
+## 🚀 Quick Start Guide
+
+### **Development Setup**
+```bash
+# Clone and setup
+git clone <hive-repo>
+cd hive
+
+# Start development environment
+./scripts/setup.sh
+docker-compose up -d
+
+# Access services
+# Frontend: http://localhost:3000
+# Backend API: http://localhost:8000
+# Documentation: http://localhost:8000/docs
+```
+
+### **Production Deployment**
+```bash
+# Deploy to Docker Swarm
+./scripts/deploy.sh production
+
+# Access production services
+# Web Interface: https://hive.home.deepblack.cloud
+# API: https://hive.home.deepblack.cloud/api
+# Monitoring: https://grafana.home.deepblack.cloud
+```
+
+## 🔮 Future Enhancements
+
+### **Phase 6: Advanced AI Integration (Month 3-4)**
+- **Multi-modal AI**: Image, audio, and video processing
+- **Fine-tuning Pipeline**: Custom model training coordination
+- **Model Registry**: Centralized model management and versioning
+- **A/B Testing**: Automated model comparison and selection
+
+### **Phase 7: Enterprise Features (Month 5-6)**
+- **Multi-tenancy**: Organization and team isolation
+- **RBAC**: Role-based access control with LDAP integration
+- **Audit Logging**: Comprehensive activity tracking
+- **Compliance**: SOC2, GDPR compliance features
+
+### **Phase 8: Ecosystem Integration (Month 7-8)**
+- **Cloud Providers**: AWS, GCP, Azure integration
+- **CI/CD Integration**: GitHub Actions, Jenkins plugins
+- **API Gateway**: External API management and rate limiting
+- **Marketplace**: Community workflow and agent sharing
+
+## 📞 Support and Community
+
+### **Documentation**
+- **User Guide**: Step-by-step tutorials and examples
+- **API Reference**: Complete API documentation with examples
+- **Admin Guide**: Deployment, configuration, and maintenance
+- **Developer Guide**: Contributing, architecture, and extensions
+
+### **Community**
+- **Discord**: Real-time support and discussions
+- **GitHub**: Issue tracking and feature requests
+- **Wiki**: Community-contributed documentation
+- **Newsletter**: Monthly updates and best practices
+
+---
+
+**Hive represents the culmination of our distributed AI development efforts, providing a unified, scalable, and user-friendly platform for coordinating AI agents, managing workflows, and monitoring performance across our entire infrastructure.**
+
+🐝 *"Individual agents are strong, but the Hive is unstoppable."*