hive/planning/WHOOSH_BZZZ_REGISTRATION_ARCHITECTURE.md
anthonyrawlins 268214d971 Major WHOOSH system refactoring and feature enhancements
2025-08-27 08:34:48 +10:00


🏗️ WHOOSH-Bzzz Registration Architecture Design Plan

🔍 Current Architecture Problems

  1. Static Configuration: Hardcoded node IPs in cluster_service.py
  2. SSH Dependencies: Requires SSH keys, network access, security risks
  3. Docker Isolation: Can't SSH from container to host network
  4. No Dynamic Discovery: Nodes can't join/leave dynamically
  5. Stale Data: No real-time hardware/status updates

🎯 Proposed Architecture: Registration-Based Cluster

Nodes join the cluster with a shared token, similar to Docker Swarm's docker swarm join --token:

# Bzzz clients register with WHOOSH coordinator
WHOOSH_CLUSTER_TOKEN=abc123... WHOOSH_COORDINATOR_URL=https://whoosh.example.com bzzz-client

📋 Implementation Plan

Phase 1: WHOOSH Coordinator Registration System

1.1 Database Schema Changes

-- Cluster registration tokens
CREATE TABLE cluster_tokens (
    id SERIAL PRIMARY KEY,
    token VARCHAR(64) UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    is_active BOOLEAN DEFAULT true
);

-- Registered cluster nodes  
CREATE TABLE cluster_nodes (
    id SERIAL PRIMARY KEY,
    node_id VARCHAR(64) UNIQUE NOT NULL,
    hostname VARCHAR(255) NOT NULL,
    ip_address INET NOT NULL,
    registration_token VARCHAR(64) REFERENCES cluster_tokens(token),
    
    -- Hardware info (reported by client)
    cpu_info JSONB,
    memory_info JSONB, 
    gpu_info JSONB,
    disk_info JSONB,
    
    -- Status tracking
    status VARCHAR(20) DEFAULT 'online',
    last_heartbeat TIMESTAMP DEFAULT NOW(),
    first_registered TIMESTAMP DEFAULT NOW(),
    
    -- Capabilities
    services JSONB, -- ollama, docker, etc.
    capabilities JSONB -- models, tools, etc.
);
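
As a minimal sketch of how a row for cluster_tokens might be produced and checked, the snippet below generates a token that fits the VARCHAR(64) column and evaluates is_active and expires_at. The function names and the 24-hour default lifetime are assumptions for illustration, not part of the plan.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional


def new_cluster_token(ttl_hours: Optional[int] = 24) -> dict:
    """Build a row for cluster_tokens; ttl_hours=None means the token never expires."""
    now = datetime.now(timezone.utc)
    return {
        "token": secrets.token_hex(32),  # 32 random bytes -> 64 hex characters
        "created_at": now,
        "expires_at": now + timedelta(hours=ttl_hours) if ttl_hours is not None else None,
        "is_active": True,
    }


def token_is_usable(row: dict, now: Optional[datetime] = None) -> bool:
    """A token is usable while it is active and not past expires_at."""
    now = now or datetime.now(timezone.utc)
    if not row["is_active"]:
        return False
    return row["expires_at"] is None or now < row["expires_at"]
```

Revoking a token then amounts to setting is_active to false, which keeps the row available for the foreign key in cluster_nodes.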

1.2 Registration API Endpoints

# /api/cluster/register (POST)
# - Validates token
# - Records node hardware info
# - Returns node_id and heartbeat interval

# /api/cluster/heartbeat (POST)  
# - Updates last_heartbeat
# - Updates current status/metrics
# - Returns cluster commands/tasks

# /api/cluster/tokens (GET/POST)
# - Generate/list/revoke cluster tokens
# - Admin endpoint for token management
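
The register and heartbeat behaviour listed above could look roughly like the following. This is a framework-free sketch that uses an in-memory dictionary in place of the database; the ClusterRegistry class, the status codes, and the response field names are assumptions chosen for illustration.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple

HEARTBEAT_INTERVAL = 30  # seconds; mirrors the WHOOSH_HEARTBEAT_INTERVAL default


class ClusterRegistry:
    """In-memory stand-in for the cluster_tokens / cluster_nodes tables."""

    def __init__(self, active_tokens: Iterable[str]):
        self.active_tokens = set(active_tokens)
        self.nodes: Dict[str, dict] = {}

    def register(self, payload: dict) -> Tuple[int, dict]:
        """Handle POST /api/cluster/register: validate token, record the node."""
        if payload.get("token") not in self.active_tokens:
            return 401, {"error": "invalid or inactive token"}
        node_id = payload.get("node_id") or f"{payload['hostname']}-{uuid.uuid4()}"
        now = datetime.now(timezone.utc)
        existing = self.nodes.get(node_id, {})
        self.nodes[node_id] = {
            "hostname": payload["hostname"],
            "ip_address": payload["ip_address"],
            "system_info": payload.get("system_info", {}),
            "status": "online",
            # Keep the original timestamp when a known node re-registers.
            "first_registered": existing.get("first_registered", now),
            "last_heartbeat": now,
        }
        return 200, {"node_id": node_id, "heartbeat_interval": HEARTBEAT_INTERVAL}

    def heartbeat(self, node_id: str, status: dict) -> Tuple[int, dict]:
        """Handle POST /api/cluster/heartbeat: refresh timestamp and metrics."""
        node = self.nodes.get(node_id)
        if node is None:
            return 404, {"error": "unknown node; register again"}
        node["last_heartbeat"] = datetime.now(timezone.utc)
        node["metrics"] = status
        return 200, {"commands": []}
```

Returning a distinct response for an unknown node gives the client a clear signal to re-register instead of continuing to send heartbeats.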

Phase 2: Bzzz Client Registration Capability

2.1 Environment Variables

WHOOSH_CLUSTER_TOKEN=token_here       # Required for registration
WHOOSH_COORDINATOR_URL=https://whoosh.local:8000  # WHOOSH API endpoint
WHOOSH_NODE_ID=walnut-$(hostname)     # Optional: custom node ID
WHOOSH_HEARTBEAT_INTERVAL=30          # Seconds between heartbeats
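
One way the client might read and validate these variables at startup is sketched below. The ClusterConfig dataclass and load_cluster_config function are hypothetical names; the fallback node ID follows the hostname-plus-UUID pattern used in the registration logic.

```python
import os
import socket
from dataclasses import dataclass
from typing import Mapping, Optional
from uuid import uuid4


@dataclass(frozen=True)
class ClusterConfig:
    token: str
    coordinator_url: str
    node_id: str
    heartbeat_interval: int


def load_cluster_config(env: Optional[Mapping[str, str]] = None) -> ClusterConfig:
    """Read the WHOOSH_* variables, failing early if a required one is missing."""
    env = os.environ if env is None else env
    token = env.get("WHOOSH_CLUSTER_TOKEN", "")
    url = env.get("WHOOSH_COORDINATOR_URL", "")
    if not token or not url:
        raise ValueError("WHOOSH_CLUSTER_TOKEN and WHOOSH_COORDINATOR_URL must be set")
    interval = int(env.get("WHOOSH_HEARTBEAT_INTERVAL", "30"))
    if interval <= 0:
        raise ValueError("WHOOSH_HEARTBEAT_INTERVAL must be a positive integer")
    return ClusterConfig(
        token=token,
        coordinator_url=url.rstrip("/"),  # avoid double slashes when joining paths
        node_id=env.get("WHOOSH_NODE_ID") or f"{socket.gethostname()}-{uuid4()}",
        heartbeat_interval=interval,
    )
```

Note that a random fallback ID changes on every restart, so setting WHOOSH_NODE_ID explicitly is likely preferable for nodes that should keep their identity across rejoins.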

2.2 Hardware Detection Module

# bzzz/system_info.py
def get_system_info():
    return {
        "cpu": detect_cpu(),           # lscpu parsing
        "memory": detect_memory(),     # /proc/meminfo
        "gpu": detect_gpu(),           # nvidia-smi, lspci
        "disk": detect_storage(),      # df, lsblk
        "services": detect_services(), # docker, ollama, etc.
        "capabilities": detect_models() # available models
    }
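
As an illustration of one of the detectors above, detect_memory() could be built on a small parser for /proc/meminfo. This is only a sketch: parse_meminfo and the returned key names (total_bytes, available_bytes) are assumptions, and the function returns an empty dict on systems without /proc.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}, converting kB to bytes."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # the kernel reports these fields in units of 1024 bytes
        info[key.strip()] = value
    return info


def detect_memory(path: str = "/proc/meminfo") -> dict:
    """Return total and available memory in bytes, or {} if the file is unreadable."""
    try:
        with open(path, encoding="ascii") as fh:
            fields = parse_meminfo(fh.read())
    except OSError:
        return {}
    return {
        "total_bytes": fields.get("MemTotal"),
        "available_bytes": fields.get("MemAvailable"),
    }
```

Keeping the parser separate from the file read makes it testable with sample text on any platform.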

2.3 Registration Logic

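# bzzz/cluster_client.py
import asyncio
import os
import socket
from uuid import uuid4

# Assumes the module from section 2.2 is importable under this path.
from bzzz.system_info import get_system_info

# get_local_ip() and get_current_status() are assumed helpers that this
# plan does not define yet.

class WHOOSHClusterClient:
    def __init__(self):
        self.token = os.getenv('WHOOSH_CLUSTER_TOKEN')
        self.coordinator_url = os.getenv('WHOOSH_COORDINATOR_URL')
        self.node_id = os.getenv('WHOOSH_NODE_ID', f"{socket.gethostname()}-{uuid4()}")
        # Default of 30 seconds matches WHOOSH_HEARTBEAT_INTERVAL in section 2.1
        self.heartbeat_interval = int(os.getenv('WHOOSH_HEARTBEAT_INTERVAL', '30'))
        
    async def register(self):
        """Register with WHOOSH coordinator"""
        system_info = get_system_info()
        payload = {
            "token": self.token,
            "node_id": self.node_id, 
            "hostname": socket.gethostname(),
            "ip_address": get_local_ip(),
            "system_info": system_info
        }
        # POST payload to /api/cluster/register
        
    async def heartbeat_loop(self):
        """Send periodic heartbeats with current status"""
        while True:
            current_status = get_current_status()
            # POST current_status to /api/cluster/heartbeat
            await asyncio.sleep(self.heartbeat_interval)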
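
The POST step that the client leaves as a comment could be filled in along these lines. This sketch uses only the standard library; the helper names build_register_request and post_json are hypothetical, and the plan does not say which HTTP client Bzzz will actually use.

```python
import json
import urllib.request


def build_register_request(coordinator_url: str, payload: dict) -> urllib.request.Request:
    """Build the JSON POST for /api/cluster/register without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{coordinator_url.rstrip('/')}/api/cluster/register",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body (blocking call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because urlopen blocks, an async caller on Python 3.9+ would likely wrap it as await asyncio.to_thread(post_json, request) so the heartbeat loop is not stalled.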

Phase 3: Integration & Migration

3.1 Remove Hardcoded Nodes

  • Delete static cluster_nodes dict from cluster_service.py
  • Replace with dynamic database queries
  • Update all cluster APIs to use registered nodes

3.2 Frontend Updates

  • Node Management UI: View/approve/remove registered nodes
  • Token Management: Generate/revoke cluster tokens
  • Real-time Status: Live hardware metrics from heartbeats
  • Registration Instructions: Show token and join commands

3.3 Bzzz Client Integration

  • Add cluster client to Bzzz startup sequence
  • Environment variable configuration
  • Graceful handling of registration failures
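
Graceful handling of registration failures might take the form of retrying with exponential backoff, as sketched below. The function name, attempt count, and delay values are assumptions for illustration; the plan does not specify a retry policy.

```python
import asyncio
import random


async def register_with_retry(register, max_attempts=5, base_delay=1.0,
                              max_delay=60.0, sleep=asyncio.sleep):
    """Call the async `register` callable, backing off between failed attempts.

    Network-level failures (OSError and subclasses, timeouts) are retried;
    any other exception propagates immediately.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return await register()
        except (OSError, asyncio.TimeoutError) as exc:
            last_error = exc
            if attempt == max_attempts - 1:
                break
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries when many nodes restart together.
            await sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError(f"registration failed after {max_attempts} attempts") from last_error
```

With this in place, Bzzz startup could continue in a degraded mode when the RuntimeError is raised instead of exiting, which keeps a coordinator outage from taking nodes down.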

🔄 Registration Flow

sequenceDiagram
    participant B as Bzzz Client
    participant H as WHOOSH Coordinator
    participant DB as Database
    
    Note over H: Admin generates token
    H->>DB: INSERT cluster_token
    
    Note over B: Start with env vars
    B->>B: Detect system info
    B->>H: POST /api/cluster/register
    H->>DB: Validate token
    H->>DB: INSERT cluster_node
    H->>B: Return node_id, heartbeat_interval
    
    loop Every 30 seconds
        B->>B: Get current status
        B->>H: POST /api/cluster/heartbeat  
        H->>DB: UPDATE last_heartbeat
    end
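
The flow above updates last_heartbeat but does not say when a silent node stops counting as online. A sketch of one possible rule follows; the threshold of three missed heartbeats and the "offline" status value are assumptions, not part of the plan.

```python
from datetime import datetime, timedelta, timezone


def node_status(last_heartbeat: datetime, now: datetime,
                heartbeat_interval: int = 30, missed_allowed: int = 3) -> str:
    """Derive a node's status from the age of its last heartbeat."""
    age = now - last_heartbeat
    if age <= timedelta(seconds=heartbeat_interval * missed_allowed):
        return "online"
    return "offline"


# Example: a node last seen 45 seconds ago is still within the 90-second window.
now = datetime.now(timezone.utc)
print(node_status(now - timedelta(seconds=45), now))
```

Computing the status at query time avoids a background job, at the cost of the stored status column lagging behind unless it is also updated.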

🔐 Security Considerations

  1. Token-based Auth: No SSH keys or passwords needed
  2. Token Expiration: Configurable token lifetimes
  3. IP Validation: Optional IP whitelist for token usage
  4. TLS Required: All communication over HTTPS
  5. Token Rotation: Ability to revoke/regenerate tokens
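
Two of the checks above, token comparison and the optional IP whitelist, can be sketched with the standard library. The function names and the CIDR-list format for the whitelist are assumptions; the plan does not define how allowed addresses are stored.

```python
import hmac
import ipaddress
from typing import Iterable


def token_matches(presented: str, stored: str) -> bool:
    """Compare tokens in constant time to avoid leaking prefix matches via timing."""
    return hmac.compare_digest(presented.encode("utf-8"), stored.encode("utf-8"))


def ip_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if client_ip falls in any CIDR range; an empty list allows all."""
    ranges = list(allowlist)
    if not ranges:
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in ranges)
```

If the coordinator sits behind a reverse proxy, the address checked here would need to come from a trusted forwarding header, since the socket peer would be the proxy itself.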

Benefits of New Architecture

  1. Dynamic Discovery: Nodes self-register, no pre-configuration
  2. Real-time Data: Live hardware metrics from heartbeats
  3. Security: No SSH keys to distribute and no inbound ports to open on nodes
  4. Scalability: Works with any number of nodes
  5. Fault Tolerance: Nodes can rejoin after network issues
  6. Docker Friendly: No host network access required
  7. Cloud Ready: Works across NAT, VPCs, and different networks, since nodes only make outbound HTTPS requests

🚀 Implementation Priority

  1. High Priority: Database schema, registration endpoints, basic heartbeat
  2. Medium Priority: Bzzz client integration, hardware detection
  3. Low Priority: Advanced UI features, token management UI

📝 Implementation Status

  • Phase 1.1: Database schema migration
  • Phase 1.2: Registration API endpoints
  • Phase 2.1: Bzzz environment variable support
  • Phase 2.2: System hardware detection module
  • Phase 2.3: Registration client logic
  • Phase 3.1: Remove hardcoded cluster nodes
  • Phase 3.2: Frontend cluster management UI
  • Phase 3.3: Full Bzzz integration

Related Files

  • /backend/app/services/cluster_service.py - Current hardcoded implementation
  • /backend/app/api/cluster.py - Cluster API endpoints
  • /backend/migrations/ - Database schema changes
  • /frontend/src/components/cluster/ - Cluster UI components

Created: 2025-07-31
Status: Planning Phase
Priority: High
Impact: Solves fundamental hardware detection and cluster management issues