🏗️ Hive-Bzzz Registration Architecture Design Plan
🔍 Current Architecture Problems
- Static Configuration: Hardcoded node IPs in cluster_service.py
- SSH Dependencies: Requires SSH keys, network access, security risks
- Docker Isolation: Can't SSH from container to host network
- No Dynamic Discovery: Nodes can't join/leave dynamically
- Stale Data: No real-time hardware/status updates
🎯 Proposed Architecture: Registration-Based Cluster
Similar to Docker Swarm's docker swarm join with tokens:
# Bzzz clients register with Hive coordinator
HIVE_CLUSTER_TOKEN=abc123... HIVE_COORDINATOR_URL=https://hive.example.com bzzz-client
📋 Implementation Plan
Phase 1: Hive Coordinator Registration System
1.1 Database Schema Changes
-- Cluster registration tokens
CREATE TABLE cluster_tokens (
    id SERIAL PRIMARY KEY,
    token VARCHAR(64) UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    is_active BOOLEAN DEFAULT true
);
-- Registered cluster nodes
CREATE TABLE cluster_nodes (
    id SERIAL PRIMARY KEY,
    node_id VARCHAR(64) UNIQUE NOT NULL,
    hostname VARCHAR(255) NOT NULL,
    ip_address INET NOT NULL,
    registration_token VARCHAR(64) REFERENCES cluster_tokens(token),
    -- Hardware info (reported by client)
    cpu_info JSONB,
    memory_info JSONB,
    gpu_info JSONB,
    disk_info JSONB,
    -- Status tracking
    status VARCHAR(20) DEFAULT 'online',
    last_heartbeat TIMESTAMP DEFAULT NOW(),
    first_registered TIMESTAMP DEFAULT NOW(),
    -- Capabilities
    services JSONB,     -- ollama, docker, etc.
    capabilities JSONB  -- models, tools, etc.
);
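As a rough illustration of how a row for cluster_tokens might be produced, the sketch below generates a 64-character token that fits the VARCHAR(64) column. The function name `new_cluster_token` and the `ttl_days` parameter are hypothetical and not part of the plan; the returned dict simply mirrors the column names above.

```python
import secrets
from datetime import datetime, timedelta, timezone


def new_cluster_token(description, ttl_days=None):
    """Build a dict mirroring the cluster_tokens columns (hypothetical helper).

    ttl_days=None means the token never expires (expires_at stays NULL).
    """
    now = datetime.now(timezone.utc)
    expires_at = (now + timedelta(days=ttl_days)) if ttl_days is not None else None
    return {
        "token": secrets.token_hex(32),  # 32 random bytes -> 64 hex characters
        "description": description,
        "created_at": now,
        "expires_at": expires_at,
        "is_active": True,
    }
```

Using `secrets` rather than `random` keeps the token unpredictable, which matters because the token is the only credential a node presents when joining.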
1.2 Registration API Endpoints
# /api/cluster/register (POST)
# - Validates token
# - Records node hardware info
# - Returns node_id and heartbeat interval
# /api/cluster/heartbeat (POST)
# - Updates last_heartbeat
# - Updates current status/metrics
# - Returns cluster commands/tasks
# /api/cluster/tokens (GET/POST)
# - Generate/list/revoke cluster tokens
# - Admin endpoint for token management
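The "Validates token" step of /api/cluster/register could look roughly like the following. This is a sketch under assumptions: `ClusterToken` and `token_is_valid` are hypothetical names, the record is assumed to have already been looked up by token value in cluster_tokens, and timestamps are assumed to be timezone-aware.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ClusterToken:
    """In-memory view of one cluster_tokens row (hypothetical type)."""
    token: str
    is_active: bool = True
    expires_at: Optional[datetime] = None


def token_is_valid(record: Optional[ClusterToken],
                   now: Optional[datetime] = None) -> bool:
    """Accept a token only if it exists, is active, and has not expired."""
    if record is None:  # no row matched the presented token
        return False
    if not record.is_active:  # token was revoked
        return False
    now = now or datetime.now(timezone.utc)
    # A NULL expires_at is treated as "never expires"
    return record.expires_at is None or now < record.expires_at
```

Passing `now` explicitly keeps the check easy to unit-test without patching the clock.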
Phase 2: Bzzz Client Registration Capability
2.1 Environment Variables
HIVE_CLUSTER_TOKEN=token_here # Required for registration
HIVE_COORDINATOR_URL=https://hive.local:8000 # Hive API endpoint
HIVE_NODE_ID=walnut-$(hostname) # Optional: custom node ID
HIVE_HEARTBEAT_INTERVAL=30 # Seconds between heartbeats
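One possible way for the client to read these variables at startup is sketched below. `HiveClientConfig` and `load_config` are hypothetical names; treating HIVE_COORDINATOR_URL as required and defaulting the interval to 30 seconds are assumptions drawn from the example values above, not decisions the plan states.

```python
import os
import socket
from dataclasses import dataclass
from uuid import uuid4


@dataclass(frozen=True)
class HiveClientConfig:
    token: str
    coordinator_url: str
    node_id: str
    heartbeat_interval: int


def load_config(env=None):
    """Read HIVE_* settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ("HIVE_CLUSTER_TOKEN", "HIVE_COORDINATOR_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    interval = int(env.get("HIVE_HEARTBEAT_INTERVAL", "30"))
    if interval <= 0:
        raise ValueError("HIVE_HEARTBEAT_INTERVAL must be positive")
    return HiveClientConfig(
        token=env["HIVE_CLUSTER_TOKEN"],
        # Drop a trailing slash so endpoint paths can be appended directly
        coordinator_url=env["HIVE_COORDINATOR_URL"].rstrip("/"),
        # Fall back to hostname plus a random suffix when no ID is supplied
        node_id=env.get("HIVE_NODE_ID") or f"{socket.gethostname()}-{uuid4()}",
        heartbeat_interval=interval,
    )
```

Failing fast on a missing token or URL gives a clear startup error instead of a confusing registration failure later.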
2.2 Hardware Detection Module
# bzzz/system_info.py
def get_system_info():
    return {
        "cpu": detect_cpu(),              # lscpu parsing
        "memory": detect_memory(),        # /proc/meminfo
        "gpu": detect_gpu(),              # nvidia-smi, lspci
        "disk": detect_storage(),         # df, lsblk
        "services": detect_services(),    # docker, ollama, etc.
        "capabilities": detect_models()   # available models
    }
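To make one of these detectors concrete, here is a minimal sketch of how detect_memory() might parse /proc/meminfo on Linux. The helper `parse_meminfo` and the returned key names (`total_bytes`, `available_bytes`) are assumptions for illustration; the plan does not specify the shape of memory_info.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value}.

    Values carrying a "kB" unit are converted to bytes (the kernel's kB is 1024 bytes);
    unitless counters such as HugePages_Total are kept as-is.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip blank or malformed lines
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def detect_memory(path="/proc/meminfo"):
    """Return a small memory summary (hypothetical payload shape)."""
    with open(path) as f:
        fields = parse_meminfo(f.read())
    return {
        "total_bytes": fields.get("MemTotal"),
        "available_bytes": fields.get("MemAvailable"),
    }
```

Separating the parser from the file read lets the parsing logic be tested with sample text on any platform.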
2.3 Registration Logic
# bzzz/cluster_client.py
import asyncio
import os
import socket
from uuid import uuid4

from bzzz.system_info import get_system_info

# get_local_ip() and get_current_status() are helpers not shown in this plan
class HiveClusterClient:
    def __init__(self):
        self.token = os.getenv('HIVE_CLUSTER_TOKEN')
        self.coordinator_url = os.getenv('HIVE_COORDINATOR_URL')
        self.node_id = os.getenv('HIVE_NODE_ID', f"{socket.gethostname()}-{uuid4()}")
        # Seconds between heartbeats; the registration response may override this
        self.heartbeat_interval = int(os.getenv('HIVE_HEARTBEAT_INTERVAL', '30'))

    async def register(self):
        """Register with Hive coordinator"""
        system_info = get_system_info()
        payload = {
            "token": self.token,
            "node_id": self.node_id,
            "hostname": socket.gethostname(),
            "ip_address": get_local_ip(),
            "system_info": system_info
        }
        # POST to /api/cluster/register

    async def heartbeat_loop(self):
        """Send periodic heartbeats with current status"""
        while True:
            current_status = get_current_status()
            # POST to /api/cluster/heartbeat
            await asyncio.sleep(self.heartbeat_interval)
Phase 3: Integration & Migration
3.1 Remove Hardcoded Nodes
- Delete static cluster_nodes dict from cluster_service.py
- Replace with dynamic database queries
- Update all cluster APIs to use registered nodes
3.2 Frontend Updates
- Node Management UI: View/approve/remove registered nodes
- Token Management: Generate/revoke cluster tokens
- Real-time Status: Live hardware metrics from heartbeats
- Registration Instructions: Show token and join commands
3.3 Bzzz Client Integration
- Add cluster client to Bzzz startup sequence
- Environment variable configuration
- Graceful handling of registration failures
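"Graceful handling of registration failures" could be approached with retries and exponential backoff, so that a node started before the coordinator is reachable still joins eventually. The sketch below is one option, not the plan's chosen mechanism; `register_with_retry` and all of its parameters are hypothetical, and `register` stands for any coroutine function such as the client's register method.

```python
import asyncio
import random


async def register_with_retry(register, max_attempts=5, base_delay=1.0,
                              max_delay=60.0, sleep=asyncio.sleep):
    """Call register() until it succeeds or max_attempts is reached.

    Delays double after each failure (1s, 2s, 4s, ...) up to max_delay,
    with a little random jitter so many nodes do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await register()
        except (OSError, asyncio.TimeoutError):
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await sleep(delay + random.uniform(0, delay / 10))
```

Only network-style errors are retried here; a rejected token would more sensibly surface immediately, since retrying cannot fix it.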
🔄 Registration Flow
sequenceDiagram
participant B as Bzzz Client
participant H as Hive Coordinator
participant DB as Database
Note over H: Admin generates token
H->>DB: INSERT cluster_token
Note over B: Start with env vars
B->>B: Detect system info
B->>H: POST /api/cluster/register
H->>DB: Validate token
H->>DB: INSERT cluster_node
H->>B: Return node_id, heartbeat_interval
loop Every 30 seconds
B->>B: Get current status
B->>H: POST /api/cluster/heartbeat
H->>DB: UPDATE last_heartbeat
end
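The flow above updates last_heartbeat but does not say when a silent node stops counting as online. One simple rule, sketched here as an assumption, is to mark a node offline after it misses a few consecutive heartbeats. The function name `node_status` and the `missed_allowed` threshold are hypothetical.

```python
from datetime import datetime, timedelta


def node_status(last_heartbeat: datetime, now: datetime,
                heartbeat_interval: int = 30, missed_allowed: int = 3) -> str:
    """Return 'online' if the last heartbeat is recent enough, else 'offline'.

    A node is tolerated for up to `missed_allowed` heartbeat intervals of silence.
    """
    cutoff = timedelta(seconds=heartbeat_interval * missed_allowed)
    return "online" if now - last_heartbeat <= cutoff else "offline"
```

With the defaults, a node that last reported 60 seconds ago is still online, while one silent for more than 90 seconds is treated as offline.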
🔐 Security Considerations
- Token-based Auth: No SSH keys or passwords needed
- Token Expiration: Configurable token lifetimes
- IP Validation: Optional IP whitelist for token usage
- TLS Required: All communication over HTTPS
- Token Rotation: Ability to revoke/regenerate tokens
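The optional IP whitelist could be checked with the standard-library `ipaddress` module, as in the sketch below. The function name `ip_allowed` and the idea of storing the whitelist as a list of CIDR strings per token are assumptions; the schema above does not yet include such a column.

```python
import ipaddress


def ip_allowed(client_ip, allowed_cidrs):
    """Return True if client_ip is inside any allowed network.

    An empty allowed_cidrs list means the whitelist is disabled.
    """
    if not allowed_cidrs:
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(cidr, strict=False)
        for cidr in allowed_cidrs
    )
```

If the coordinator sits behind a reverse proxy, the address to check would need to come from a trusted forwarding header rather than the socket peer.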
✅ Benefits of New Architecture
- Dynamic Discovery: Nodes self-register, no pre-configuration
- Real-time Data: Live hardware metrics from heartbeats
- Security: No SSH keys to distribute and no inbound ports on nodes; each node only makes outbound HTTPS requests to the coordinator
- Scalability: Works with any number of nodes
- Fault Tolerance: Nodes can rejoin after network issues
- Docker Friendly: No host network access required
- Cloud Ready: Works across NAT, VPCs, different networks
🚀 Implementation Priority
- High Priority: Database schema, registration endpoints, basic heartbeat
- Medium Priority: Bzzz client integration, hardware detection
- Low Priority: Advanced UI features, token management UI
📝 Implementation Status
- Phase 1.1: Database schema migration
- Phase 1.2: Registration API endpoints
- Phase 2.1: Bzzz environment variable support
- Phase 2.2: System hardware detection module
- Phase 2.3: Registration client logic
- Phase 3.1: Remove hardcoded cluster nodes
- Phase 3.2: Frontend cluster management UI
- Phase 3.3: Full Bzzz integration
🔗 Related Files
- /backend/app/services/cluster_service.py - Current hardcoded implementation
- /backend/app/api/cluster.py - Cluster API endpoints
- /backend/migrations/ - Database schema changes
- /frontend/src/components/cluster/ - Cluster UI components
Created: 2025-07-31
Status: Planning Phase
Priority: High
Impact: Solves fundamental hardware detection and cluster management issues