# 🏗️ WHOOSH-Bzzz Registration Architecture Design Plan

## 🔍 Current Architecture Problems

1. **Static Configuration**: Node IPs are hardcoded in `cluster_service.py`
2. **SSH Dependencies**: Requires SSH keys and network access, and adds security risk
3. **Docker Isolation**: SSH from inside a container can't reach nodes on the host network
4. **No Dynamic Discovery**: Nodes can't join or leave dynamically
5. **Stale Data**: No real-time hardware/status updates

## 🎯 Proposed Architecture: Registration-Based Cluster

Nodes join with a token, similar to Docker Swarm's `docker swarm join`:

```bash
# Bzzz clients register with the WHOOSH coordinator
WHOOSH_CLUSTER_TOKEN=abc123... \
WHOOSH_COORDINATOR_URL=https://whoosh.example.com \
bzzz-client
```

## 📋 Implementation Plan

### Phase 1: WHOOSH Coordinator Registration System

#### 1.1 Database Schema Changes

```sql
-- Cluster registration tokens
CREATE TABLE cluster_tokens (
    id SERIAL PRIMARY KEY,
    token VARCHAR(64) UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    is_active BOOLEAN DEFAULT true
);

-- Registered cluster nodes
CREATE TABLE cluster_nodes (
    id SERIAL PRIMARY KEY,
    node_id VARCHAR(64) UNIQUE NOT NULL,
    hostname VARCHAR(255) NOT NULL,
    ip_address INET NOT NULL,
    registration_token VARCHAR(64) REFERENCES cluster_tokens(token),

    -- Hardware info (reported by client)
    cpu_info JSONB,
    memory_info JSONB,
    gpu_info JSONB,
    disk_info JSONB,

    -- Status tracking
    status VARCHAR(20) DEFAULT 'online',
    last_heartbeat TIMESTAMP DEFAULT NOW(),
    first_registered TIMESTAMP DEFAULT NOW(),

    -- Capabilities
    services JSONB,     -- ollama, docker, etc.
    capabilities JSONB  -- models, tools, etc.
);
```
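This is a minimal sketch of how the coordinator might issue and check tokens against the `cluster_tokens` columns. It makes two assumptions. First, a token is 32 random bytes hex-encoded, which gives exactly 64 characters and fits `VARCHAR(64)`. Second, a token is accepted only while `is_active` is true and `expires_at` is NULL or still in the future. `ClusterToken`, `issue_token`, and `is_token_valid` are hypothetical names, and an in-memory object stands in for a database row.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ClusterToken:
    """Hypothetical in-memory mirror of a cluster_tokens row."""
    token: str
    description: str = ""
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    expires_at: Optional[datetime] = None  # None (NULL) means the token never expires
    is_active: bool = True


def issue_token(description: str = "", ttl: Optional[timedelta] = None) -> ClusterToken:
    """Create a token: 32 random bytes as hex = 64 chars, matching VARCHAR(64)."""
    now = datetime.now(timezone.utc)
    return ClusterToken(
        token=secrets.token_hex(32),
        description=description,
        created_at=now,
        expires_at=now + ttl if ttl is not None else None,
    )


def is_token_valid(row: ClusterToken, presented: str, now: Optional[datetime] = None) -> bool:
    """Accept a presented token only if it matches, is active, and has not expired."""
    now = now or datetime.now(timezone.utc)
    # Constant-time comparison on bytes avoids leaking match position via timing
    if not secrets.compare_digest(row.token.encode(), presented.encode()):
        return False
    if not row.is_active:
        return False
    return row.expires_at is None or now < row.expires_at
```

Under this sketch, revoking a token just means setting `is_active` to false. The row stays in place, so `cluster_nodes.registration_token` can keep referencing it.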
#### 1.2 Registration API Endpoints

```python
# /api/cluster/register (POST)
#   - Validates token
#   - Records node hardware info
#   - Returns node_id and heartbeat interval

# /api/cluster/heartbeat (POST)
#   - Updates last_heartbeat
#   - Updates current status/metrics
#   - Returns cluster commands/tasks

# /api/cluster/tokens (GET/POST)
#   - Generate/list/revoke cluster tokens
#   - Admin endpoint for token management
```

### Phase 2: Bzzz Client Registration Capability

#### 2.1 Environment Variables

```bash
WHOOSH_CLUSTER_TOKEN=token_here                   # Required for registration
WHOOSH_COORDINATOR_URL=https://whoosh.local:8000  # WHOOSH API endpoint
WHOOSH_NODE_ID=walnut-$(hostname)                 # Optional: custom node ID
WHOOSH_HEARTBEAT_INTERVAL=30                      # Seconds between heartbeats
```

#### 2.2 Hardware Detection Module

```python
# bzzz/system_info.py
def get_system_info():
    return {
        "cpu": detect_cpu(),              # lscpu parsing
        "memory": detect_memory(),        # /proc/meminfo
        "gpu": detect_gpu(),              # nvidia-smi, lspci
        "disk": detect_storage(),         # df, lsblk
        "services": detect_services(),    # docker, ollama, etc.
        "capabilities": detect_models(),  # available models
    }
```
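As one example of these detectors, `detect_memory()` could parse `/proc/meminfo`, where each line looks like `MemTotal:       16318480 kB` (the kernel's `kB` means 1024 bytes). The sketch below separates parsing from file access so the parser can be tested without Linux. `parse_meminfo` is a hypothetical helper. Reporting only `MemTotal` and `MemAvailable` is an assumption about what the coordinator needs.

```python
from pathlib import Path
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into {field: value}.

    Fields reported in kB are converted to bytes; unitless fields
    (e.g. HugePages_Total, a page count) are kept as-is.
    """
    result: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip blank or malformed lines
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        result[key.strip()] = value
    return result


def detect_memory(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Return total and available memory in bytes (Linux only)."""
    info = parse_meminfo(Path(path).read_text())
    return {
        "total_bytes": info.get("MemTotal", 0),
        "available_bytes": info.get("MemAvailable", 0),
    }
```

The returned dict is the kind of JSON-serializable value that could be stored in `cluster_nodes.memory_info`.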
"capabilities": detect_models() # available models } ``` #### 2.3 Registration Logic ```python # bzzz/cluster_client.py class WHOOSHClusterClient: def __init__(self): self.token = os.getenv('WHOOSH_CLUSTER_TOKEN') self.coordinator_url = os.getenv('WHOOSH_COORDINATOR_URL') self.node_id = os.getenv('WHOOSH_NODE_ID', f"{socket.gethostname()}-{uuid4()}") async def register(self): """Register with WHOOSH coordinator""" system_info = get_system_info() payload = { "token": self.token, "node_id": self.node_id, "hostname": socket.gethostname(), "ip_address": get_local_ip(), "system_info": system_info } # POST to /api/cluster/register async def heartbeat_loop(self): """Send periodic heartbeats with current status""" while True: current_status = get_current_status() # POST to /api/cluster/heartbeat await asyncio.sleep(self.heartbeat_interval) ``` ### Phase 3: Integration & Migration #### 3.1 Remove Hardcoded Nodes - Delete static `cluster_nodes` dict from `cluster_service.py` - Replace with dynamic database queries - Update all cluster APIs to use registered nodes #### 3.2 Frontend Updates - **Node Management UI**: View/approve/remove registered nodes - **Token Management**: Generate/revoke cluster tokens - **Real-time Status**: Live hardware metrics from heartbeats - **Registration Instructions**: Show token and join commands #### 3.3 Bzzz Client Integration - Add cluster client to Bzzz startup sequence - Environment variable configuration - Graceful handling of registration failures ## 🔄 Registration Flow ```mermaid sequenceDiagram participant B as Bzzz Client participant H as WHOOSH Coordinator participant DB as Database Note over H: Admin generates token H->>DB: INSERT cluster_token Note over B: Start with env vars B->>B: Detect system info B->>H: POST /api/cluster/register H->>DB: Validate token H->>DB: INSERT cluster_node H->>B: Return node_id, heartbeat_interval loop Every 30 seconds B->>B: Get current status B->>H: POST /api/cluster/heartbeat H->>DB: UPDATE 
## 🔐 Security Considerations

1. **Token-based Auth**: No SSH keys or passwords needed
2. **Token Expiration**: Configurable token lifetimes
3. **IP Validation**: Optional IP whitelist for token usage
4. **TLS Required**: All communication over HTTPS
5. **Token Rotation**: Ability to revoke/regenerate tokens

## ✅ Benefits of New Architecture

1. **Dynamic Discovery**: Nodes self-register; no pre-configuration needed
2. **Real-time Data**: Live hardware metrics from heartbeats
3. **Security**: No SSH credentials to manage, and nodes need no inbound ports
4. **Scalability**: Adding a node requires no coordinator-side configuration changes
5. **Fault Tolerance**: Nodes can rejoin after network issues
6. **Docker Friendly**: No host network access required
7. **Cloud Ready**: Works across NAT, VPCs, and separate networks, as long as nodes can reach the coordinator over HTTPS

## 🚀 Implementation Priority

1. **High Priority**: Database schema, registration endpoints, basic heartbeat
2. **Medium Priority**: Bzzz client integration, hardware detection
3. **Low Priority**: Advanced UI features, token management UI

## 📝 Implementation Status

- [ ] Phase 1.1: Database schema migration
- [ ] Phase 1.2: Registration API endpoints
- [ ] Phase 2.1: Bzzz environment variable support
- [ ] Phase 2.2: System hardware detection module
- [ ] Phase 2.3: Registration client logic
- [ ] Phase 3.1: Remove hardcoded cluster nodes
- [ ] Phase 3.2: Frontend cluster management UI
- [ ] Phase 3.3: Full Bzzz integration

## 🔗 Related Files

- `/backend/app/services/cluster_service.py` - Current hardcoded implementation
- `/backend/app/api/cluster.py` - Cluster API endpoints
- `/backend/migrations/` - Database schema changes
- `/frontend/src/components/cluster/` - Cluster UI components

---

**Created**: 2025-07-31
**Status**: Planning Phase
**Priority**: High
**Impact**: Solves fundamental hardware detection and cluster management issues