# 🏗️ WHOOSH-Bzzz Registration Architecture Design Plan

## 🔍 Current Architecture Problems

1. **Static Configuration**: Hardcoded node IPs in `cluster_service.py`
2. **SSH Dependencies**: Requires SSH keys, network access, security risks
3. **Docker Isolation**: Can't SSH from container to host network
4. **No Dynamic Discovery**: Nodes can't join/leave dynamically
5. **Stale Data**: No real-time hardware/status updates

## 🎯 Proposed Architecture: Registration-Based Cluster

Similar to Docker Swarm's `docker swarm join` with tokens:

```bash
# Bzzz clients register with WHOOSH coordinator
WHOOSH_CLUSTER_TOKEN=abc123... WHOOSH_COORDINATOR_URL=https://whoosh.example.com bzzz-client
```

## 📋 Implementation Plan

### Phase 1: WHOOSH Coordinator Registration System

#### 1.1 Database Schema Changes
```sql
-- Cluster registration tokens
CREATE TABLE cluster_tokens (
    id SERIAL PRIMARY KEY,
    token VARCHAR(64) UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    is_active BOOLEAN DEFAULT true
);

-- Registered cluster nodes
CREATE TABLE cluster_nodes (
    id SERIAL PRIMARY KEY,
    node_id VARCHAR(64) UNIQUE NOT NULL,
    hostname VARCHAR(255) NOT NULL,
    ip_address INET NOT NULL,
    registration_token VARCHAR(64) REFERENCES cluster_tokens(token),

    -- Hardware info (reported by client)
    cpu_info JSONB,
    memory_info JSONB,
    gpu_info JSONB,
    disk_info JSONB,

    -- Status tracking
    status VARCHAR(20) DEFAULT 'online',
    last_heartbeat TIMESTAMP DEFAULT NOW(),
    first_registered TIMESTAMP DEFAULT NOW(),

    -- Capabilities
    services JSONB,      -- ollama, docker, etc.
    capabilities JSONB   -- models, tools, etc.
);
```

#### 1.2 Registration API Endpoints
```python
# /api/cluster/register (POST)
# - Validates token
# - Records node hardware info
# - Returns node_id and heartbeat interval

# /api/cluster/heartbeat (POST)
# - Updates last_heartbeat
# - Updates current status/metrics
# - Returns cluster commands/tasks

# /api/cluster/tokens (GET/POST)
# - Generate/list/revoke cluster tokens
# - Admin endpoint for token management
```

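As a rough sketch of what the register endpoint above might do, the following uses in-memory dicts in place of the `cluster_tokens` and `cluster_nodes` tables. The names `TokenRecord`, `token_is_valid` and `register_node`, the response shape, and the 30-second default interval are illustrative assumptions, not the WHOOSH implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TokenRecord:
    """Hypothetical in-memory stand-in for one cluster_tokens row."""
    token: str
    expires_at: Optional[datetime] = None
    is_active: bool = True


def token_is_valid(record: Optional[TokenRecord], now: datetime) -> bool:
    """A token is usable if it exists, is active, and has not expired."""
    if record is None or not record.is_active:
        return False
    return record.expires_at is None or now < record.expires_at


def register_node(payload: dict, tokens: dict, nodes: dict,
                  now: Optional[datetime] = None,
                  heartbeat_interval: int = 30) -> dict:
    """Validate the token, record the node, return node_id and interval."""
    now = now or datetime.now(timezone.utc)
    if not token_is_valid(tokens.get(payload.get("token")), now):
        return {"status": 403, "error": "invalid or expired token"}
    node_id = payload["node_id"]
    # Re-registering an existing node_id overwrites the row, so a node
    # can rejoin after a restart or network outage.
    nodes[node_id] = {
        "hostname": payload["hostname"],
        "ip_address": payload["ip_address"],
        "system_info": payload.get("system_info", {}),
        "status": "online",
        "last_heartbeat": now,
    }
    return {"status": 200, "node_id": node_id,
            "heartbeat_interval": heartbeat_interval}
```

In a real endpoint the two dicts would be database queries and the `status` field would map to an HTTP status code.
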
### Phase 2: Bzzz Client Registration Capability

#### 2.1 Environment Variables
```bash
WHOOSH_CLUSTER_TOKEN=token_here                    # Required for registration
WHOOSH_COORDINATOR_URL=https://whoosh.local:8000   # WHOOSH API endpoint
WHOOSH_NODE_ID=walnut-$(hostname)                  # Optional: custom node ID
WHOOSH_HEARTBEAT_INTERVAL=30                       # Seconds between heartbeats
```

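One way the client could read and validate these variables at startup is sketched below. The `ClusterConfig` type and `load_cluster_config` function are hypothetical names; treating the coordinator URL as required and rejecting non-HTTPS URLs are assumptions drawn from the "TLS Required" item in the security section.

```python
import os
import socket
from dataclasses import dataclass
from typing import Mapping, Optional
from uuid import uuid4


@dataclass(frozen=True)
class ClusterConfig:
    token: str
    coordinator_url: str
    node_id: str
    heartbeat_interval: int


def load_cluster_config(env: Optional[Mapping[str, str]] = None) -> ClusterConfig:
    """Build the client config from environment variables (or a test mapping)."""
    env = os.environ if env is None else env
    token = env.get("WHOOSH_CLUSTER_TOKEN", "")
    url = env.get("WHOOSH_COORDINATOR_URL", "")
    if not token:
        raise ValueError("WHOOSH_CLUSTER_TOKEN is required for registration")
    if not url.startswith("https://"):
        raise ValueError("WHOOSH_COORDINATOR_URL must be an https:// URL")
    # Fall back to hostname plus a random UUID when no node ID is given.
    node_id = env.get("WHOOSH_NODE_ID") or f"{socket.gethostname()}-{uuid4()}"
    interval = int(env.get("WHOOSH_HEARTBEAT_INTERVAL", "30"))
    if interval <= 0:
        raise ValueError("WHOOSH_HEARTBEAT_INTERVAL must be a positive integer")
    return ClusterConfig(token, url.rstrip("/"), node_id, interval)
```

Failing fast here keeps a misconfigured node from entering the heartbeat loop with an empty token.
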
#### 2.2 Hardware Detection Module
```python
# bzzz/system_info.py
def get_system_info():
    return {
        "cpu": detect_cpu(),              # lscpu parsing
        "memory": detect_memory(),        # /proc/meminfo
        "gpu": detect_gpu(),              # nvidia-smi, lspci
        "disk": detect_storage(),         # df, lsblk
        "services": detect_services(),    # docker, ollama, etc.
        "capabilities": detect_models()   # available models
    }
```

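To make one of these detectors concrete, here is a minimal sketch of `detect_memory()` built on `/proc/meminfo`, so it only applies to Linux hosts. The helper `parse_meminfo` and the returned keys `total_bytes` and `available_bytes` are assumptions chosen for illustration; the plan does not fix the shape of the memory report.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into {field: value}.

    Values reported with a "kB" unit are converted to bytes (x1024);
    unitless counters such as HugePages_Total are kept as-is.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts or not parts[0].isdigit():
            continue  # skip lines that are not "Name: <number> [unit]"
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def detect_memory(path: str = "/proc/meminfo") -> dict:
    """Return total and available memory in bytes (Linux only)."""
    with open(path, encoding="ascii") as f:
        fields = parse_meminfo(f.read())
    return {
        "total_bytes": fields.get("MemTotal"),
        "available_bytes": fields.get("MemAvailable"),
    }
```

Separating parsing from file access keeps the parser testable on machines without `/proc`.
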
#### 2.3 Registration Logic
```python
# bzzz/cluster_client.py
import asyncio
import os
import socket
from uuid import uuid4

from bzzz.system_info import get_system_info  # module from 2.2

# get_local_ip() and get_current_status() are helpers still to be written.


class WHOOSHClusterClient:
    def __init__(self):
        self.token = os.getenv('WHOOSH_CLUSTER_TOKEN')
        self.coordinator_url = os.getenv('WHOOSH_COORDINATOR_URL')
        self.node_id = os.getenv('WHOOSH_NODE_ID', f"{socket.gethostname()}-{uuid4()}")
        # Default matches 2.1; the register response also carries an interval
        self.heartbeat_interval = int(os.getenv('WHOOSH_HEARTBEAT_INTERVAL', '30'))

    async def register(self):
        """Register with WHOOSH coordinator"""
        system_info = get_system_info()
        payload = {
            "token": self.token,
            "node_id": self.node_id,
            "hostname": socket.gethostname(),
            "ip_address": get_local_ip(),
            "system_info": system_info
        }
        # POST to /api/cluster/register

    async def heartbeat_loop(self):
        """Send periodic heartbeats with current status"""
        while True:
            current_status = get_current_status()
            # POST to /api/cluster/heartbeat
            await asyncio.sleep(self.heartbeat_interval)
```

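The two `# POST to ...` steps are left open in the client above. A minimal standard-library sketch of that step follows; the helper names `build_json_request`, `_send` and `post_json` are hypothetical, and it assumes the coordinator accepts and returns JSON, which the plan does not state explicitly. A real client might prefer an async HTTP library instead of running `urllib` in a worker thread.

```python
import asyncio
import json
import urllib.request


def build_json_request(coordinator_url: str, path: str,
                       payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a coordinator endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        coordinator_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Blocking send; assumes the response body is a JSON object."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


async def post_json(coordinator_url: str, path: str, payload: dict) -> dict:
    """Send the request in a worker thread so the event loop stays free."""
    request = build_json_request(coordinator_url, path, payload)
    return await asyncio.to_thread(_send, request)
```

Inside `register()` this could be called as `await post_json(self.coordinator_url, "/api/cluster/register", payload)`.
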
### Phase 3: Integration & Migration

#### 3.1 Remove Hardcoded Nodes
- Delete static `cluster_nodes` dict from `cluster_service.py`
- Replace with dynamic database queries
- Update all cluster APIs to use registered nodes

#### 3.2 Frontend Updates
- **Node Management UI**: View/approve/remove registered nodes
- **Token Management**: Generate/revoke cluster tokens
- **Real-time Status**: Live hardware metrics from heartbeats
- **Registration Instructions**: Show token and join commands

#### 3.3 Bzzz Client Integration
- Add cluster client to Bzzz startup sequence
- Environment variable configuration
- Graceful handling of registration failures

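"Graceful handling of registration failures" could mean retrying with exponential backoff rather than aborting Bzzz startup. The sketch below is one possible approach, not a decided design: `backoff_delays`, `register_with_retry`, and the default delays are hypothetical, and `register` stands for any coroutine function that performs the registration call.

```python
import asyncio
import random


def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 5) -> list:
    """Delays of base, 2*base, 4*base, ... seconds, each capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


async def register_with_retry(register, delays=None,
                              sleep=asyncio.sleep, jitter: float = 0.1):
    """Call `register()` until it succeeds or the attempts run out.

    One attempt is made per entry in `delays`; the delay is slept only
    between attempts. Network-level failures (OSError and subclasses,
    timeouts) are retried; anything else propagates immediately.
    """
    delays = backoff_delays() if delays is None else list(delays)
    last_error = None
    for attempt, delay in enumerate(delays, start=1):
        try:
            return await register()
        except (OSError, asyncio.TimeoutError) as exc:
            last_error = exc
            if attempt < len(delays):
                # Small random jitter avoids many nodes retrying in lockstep.
                await sleep(delay + random.uniform(0, jitter * delay))
    raise RuntimeError("registration failed after retries") from last_error
```

The caller can catch the final `RuntimeError` and decide whether Bzzz should keep running unregistered or exit.
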
## 🔄 Registration Flow

```mermaid
sequenceDiagram
    participant B as Bzzz Client
    participant H as WHOOSH Coordinator
    participant DB as Database

    Note over H: Admin generates token
    H->>DB: INSERT cluster_token

    Note over B: Start with env vars
    B->>B: Detect system info
    B->>H: POST /api/cluster/register
    H->>DB: Validate token
    H->>DB: INSERT cluster_node
    H->>B: Return node_id, heartbeat_interval

    loop Every 30 seconds
        B->>B: Get current status
        B->>H: POST /api/cluster/heartbeat
        H->>DB: UPDATE last_heartbeat
    end
```

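The flow above only shows heartbeats arriving. To keep the `status` column from going stale when they stop, the coordinator could periodically compare `last_heartbeat` against the current time. The following sketch assumes a node counts as offline after three missed intervals; that threshold and the names `node_status` and `mark_stale_nodes` are illustrative and not part of the plan.

```python
from datetime import datetime, timedelta, timezone


def node_status(last_heartbeat: datetime, now: datetime,
                heartbeat_interval: int = 30, missed_allowed: int = 3) -> str:
    """Return "online" while the node is within the allowed silence window."""
    deadline = last_heartbeat + timedelta(seconds=heartbeat_interval * missed_allowed)
    return "online" if now <= deadline else "offline"


def mark_stale_nodes(nodes: dict, now: datetime,
                     heartbeat_interval: int = 30, missed_allowed: int = 3) -> list:
    """Refresh each node's status in place; return IDs that just went offline."""
    went_offline = []
    for node_id, node in nodes.items():
        status = node_status(node["last_heartbeat"], now,
                             heartbeat_interval, missed_allowed)
        if status == "offline" and node.get("status") != "offline":
            went_offline.append(node_id)
        node["status"] = status
    return went_offline
```

Because status is derived from the timestamp, a node that resumes heartbeats after a network issue flips back to `online` on the next pass.
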
## 🔐 Security Considerations

1. **Token-based Auth**: No SSH keys or passwords needed
2. **Token Expiration**: Configurable token lifetimes
3. **IP Validation**: Optional IP whitelist for token usage
4. **TLS Required**: All communication over HTTPS
5. **Token Rotation**: Ability to revoke/regenerate tokens

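A few of these points can be sketched with the Python standard library. The function names below are hypothetical, and the choices of `secrets.token_hex(32)` (64 hex characters, which fits the `VARCHAR(64)` token column), a constant-time comparison, and CIDR-based IP allowlists are assumptions about how the considerations might be realized.

```python
import hmac
import ipaddress
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional


def generate_cluster_token(lifetime: Optional[timedelta] = None,
                           now: Optional[datetime] = None) -> dict:
    """Create a token record; `lifetime=None` means the token never expires."""
    now = now or datetime.now(timezone.utc)
    return {
        "token": secrets.token_hex(32),  # 32 random bytes -> 64 hex chars
        "created_at": now,
        "expires_at": now + lifetime if lifetime is not None else None,
        "is_active": True,
    }


def token_matches(presented: str, stored: str) -> bool:
    """Compare tokens in constant time to avoid leaking a timing signal."""
    return hmac.compare_digest(presented.encode("utf-8"), stored.encode("utf-8"))


def ip_allowed(ip: str, allowed_networks: Optional[list] = None) -> bool:
    """Optional allowlist check; no list configured means any IP is accepted."""
    if not allowed_networks:
        return True
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)
```

Revocation then amounts to setting `is_active` to false on the stored record, and rotation to revoking the old token and generating a new one.
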
## ✅ Benefits of New Architecture

1. **Dynamic Discovery**: Nodes self-register, no pre-configuration
2. **Real-time Data**: Live hardware metrics from heartbeats
3. **Security**: No SSH keys to distribute and no inbound ports to open on nodes; nodes connect out to the coordinator
4. **Scalability**: Works with any number of nodes
5. **Fault Tolerance**: Nodes can rejoin after network issues
6. **Docker Friendly**: No host network access required
7. **Cloud Ready**: Works across NAT, VPCs, different networks

## 🚀 Implementation Priority

1. **High Priority**: Database schema, registration endpoints, basic heartbeat
2. **Medium Priority**: Bzzz client integration, hardware detection
3. **Low Priority**: Advanced UI features, token management UI

## 📝 Implementation Status

- [ ] Phase 1.1: Database schema migration
- [ ] Phase 1.2: Registration API endpoints
- [ ] Phase 2.1: Bzzz environment variable support
- [ ] Phase 2.2: System hardware detection module
- [ ] Phase 2.3: Registration client logic
- [ ] Phase 3.1: Remove hardcoded cluster nodes
- [ ] Phase 3.2: Frontend cluster management UI
- [ ] Phase 3.3: Full Bzzz integration

## 🔗 Related Files

- `/backend/app/services/cluster_service.py` - Current hardcoded implementation
- `/backend/app/api/cluster.py` - Cluster API endpoints
- `/backend/migrations/` - Database schema changes
- `/frontend/src/components/cluster/` - Cluster UI components

---

**Created**: 2025-07-31
**Status**: Planning Phase
**Priority**: High
**Impact**: Solves fundamental hardware detection and cluster management issues