Major WHOOSH system refactoring and feature enhancements
- Migrated from HIVE branding to WHOOSH across all components
- Enhanced backend API with new services: AI models, BZZZ integration, templates, members
- Added comprehensive testing suite with security, performance, and integration tests
- Improved frontend with new components for project setup, AI models, and team management
- Updated MCP server implementation with WHOOSH-specific tools and resources
- Enhanced deployment configurations with production-ready Docker setups
- Added comprehensive documentation and setup guides
- Implemented age encryption service and UCXL integration

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
221
planning/WHOOSH_BZZZ_REGISTRATION_ARCHITECTURE.md
Normal file
@@ -0,0 +1,221 @@
# 🏗️ WHOOSH-Bzzz Registration Architecture Design Plan

## 🔍 Current Architecture Problems

1. **Static Configuration**: Hardcoded node IPs in `cluster_service.py`
2. **SSH Dependencies**: Requires SSH keys, network access, security risks
3. **Docker Isolation**: Can't SSH from container to host network
4. **No Dynamic Discovery**: Nodes can't join/leave dynamically
5. **Stale Data**: No real-time hardware/status updates

## 🎯 Proposed Architecture: Registration-Based Cluster

Similar to Docker Swarm's `docker swarm join` with tokens:

```bash
# Bzzz clients register with WHOOSH coordinator
WHOOSH_CLUSTER_TOKEN=abc123... WHOOSH_COORDINATOR_URL=https://whoosh.example.com bzzz-client
```

## 📋 Implementation Plan

### Phase 1: WHOOSH Coordinator Registration System

#### 1.1 Database Schema Changes
```sql
-- Cluster registration tokens
CREATE TABLE cluster_tokens (
    id SERIAL PRIMARY KEY,
    token VARCHAR(64) UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    is_active BOOLEAN DEFAULT true
);

-- Registered cluster nodes
CREATE TABLE cluster_nodes (
    id SERIAL PRIMARY KEY,
    node_id VARCHAR(64) UNIQUE NOT NULL,
    hostname VARCHAR(255) NOT NULL,
    ip_address INET NOT NULL,
    registration_token VARCHAR(64) REFERENCES cluster_tokens(token),

    -- Hardware info (reported by client)
    cpu_info JSONB,
    memory_info JSONB,
    gpu_info JSONB,
    disk_info JSONB,

    -- Status tracking
    status VARCHAR(20) DEFAULT 'online',
    last_heartbeat TIMESTAMP DEFAULT NOW(),
    first_registered TIMESTAMP DEFAULT NOW(),

    -- Capabilities
    services JSONB,      -- ollama, docker, etc.
    capabilities JSONB   -- models, tools, etc.
);
```

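The token columns above could be exercised roughly as follows. This is a minimal sketch, not the project's implementation: `generate_cluster_token`, `expiry_in`, and `is_token_valid` are hypothetical names, the 64-hex-character format is an assumption chosen to fit `VARCHAR(64)`, and timestamps are assumed to be handled in UTC.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional


def generate_cluster_token() -> str:
    """Return 64 hex characters (32 random bytes), sized for VARCHAR(64)."""
    return secrets.token_hex(32)


def expiry_in(days: int) -> datetime:
    """Compute an expires_at value the given number of days from now (UTC)."""
    return datetime.now(timezone.utc) + timedelta(days=days)


def is_token_valid(is_active: bool,
                   expires_at: Optional[datetime],
                   now: Optional[datetime] = None) -> bool:
    """Mirror the is_active / expires_at columns.

    Assumption: a NULL expires_at means the token never expires.
    """
    now = now or datetime.now(timezone.utc)
    if not is_active:
        return False
    return expires_at is None or now < expires_at
```

Using `secrets` rather than `random` keeps the token unpredictable, which matters because the token is the only credential a node presents when it joins.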
#### 1.2 Registration API Endpoints
```python
# /api/cluster/register (POST)
# - Validates token
# - Records node hardware info
# - Returns node_id and heartbeat interval

# /api/cluster/heartbeat (POST)
# - Updates last_heartbeat
# - Updates current status/metrics
# - Returns cluster commands/tasks

# /api/cluster/tokens (GET/POST)
# - Generate/list/revoke cluster tokens
# - Admin endpoint for token management
```

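The register endpoint's three steps (validate token, record node, return `node_id` and heartbeat interval) might look like this as framework-independent logic. It is a sketch under stated assumptions: `handle_register` is a hypothetical name, in-memory collections stand in for the database tables, and the 401/422 status codes are illustrative choices rather than anything this plan specifies.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Set, Tuple

HEARTBEAT_INTERVAL_SECONDS = 30  # matches the 30-second loop in this plan


def handle_register(payload: Dict[str, Any],
                    active_tokens: Set[str],
                    nodes: Dict[str, Dict[str, Any]]) -> Tuple[int, Dict[str, Any]]:
    """Validate the token, record the node, return (http_status, body)."""
    # Step 1: reject unknown or revoked tokens (assumed status code: 401).
    if payload.get("token") not in active_tokens:
        return 401, {"error": "invalid or revoked token"}

    # Reject incomplete payloads before touching storage (assumed: 422).
    missing = [k for k in ("node_id", "hostname", "ip_address") if not payload.get(k)]
    if missing:
        return 422, {"error": "missing fields: " + ", ".join(missing)}

    # Step 2: record the node; a re-registration keeps first_registered.
    now = datetime.now(timezone.utc)
    previous = nodes.get(payload["node_id"], {})
    nodes[payload["node_id"]] = {
        "hostname": payload["hostname"],
        "ip_address": payload["ip_address"],
        "system_info": payload.get("system_info", {}),
        "status": "online",
        "last_heartbeat": now,
        "first_registered": previous.get("first_registered", now),
    }

    # Step 3: tell the client who it is and how often to check in.
    return 200, {"node_id": payload["node_id"],
                 "heartbeat_interval": HEARTBEAT_INTERVAL_SECONDS}
```

Keeping the logic free of any web framework makes it easy to unit test; a real endpoint would wrap it in whatever routing layer the backend already uses.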
### Phase 2: Bzzz Client Registration Capability

#### 2.1 Environment Variables
```bash
WHOOSH_CLUSTER_TOKEN=token_here                    # Required for registration
WHOOSH_COORDINATOR_URL=https://whoosh.local:8000   # WHOOSH API endpoint
WHOOSH_NODE_ID=walnut-$(hostname)                  # Optional: custom node ID
WHOOSH_HEARTBEAT_INTERVAL=30                       # Seconds between heartbeats
```

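One way the client might read and validate these variables at startup is sketched below. `ClusterConfig` and `load_config` are hypothetical names; treating `WHOOSH_COORDINATOR_URL` as mandatory is an assumption (the plan only marks the token as required), and the hostname-plus-UUID default for the node ID follows the client code in section 2.3.

```python
import os
import socket
from dataclasses import dataclass
from typing import Mapping, Optional
from uuid import uuid4


@dataclass(frozen=True)
class ClusterConfig:
    token: str
    coordinator_url: str
    node_id: str
    heartbeat_interval: int


def load_config(env: Optional[Mapping[str, str]] = None) -> ClusterConfig:
    """Build a ClusterConfig from WHOOSH_* variables, failing fast on bad input."""
    env = os.environ if env is None else env
    token = env.get("WHOOSH_CLUSTER_TOKEN")
    url = env.get("WHOOSH_COORDINATOR_URL")
    if not token or not url:
        # Assumption: both values are needed before registration can start.
        raise ValueError("WHOOSH_CLUSTER_TOKEN and WHOOSH_COORDINATOR_URL must be set")

    interval = int(env.get("WHOOSH_HEARTBEAT_INTERVAL", "30"))
    if interval <= 0:
        raise ValueError("WHOOSH_HEARTBEAT_INTERVAL must be a positive integer")

    node_id = env.get("WHOOSH_NODE_ID") or f"{socket.gethostname()}-{uuid4()}"
    return ClusterConfig(token=token,
                         coordinator_url=url.rstrip("/"),
                         node_id=node_id,
                         heartbeat_interval=interval)
```

Passing a mapping instead of reading `os.environ` directly keeps the function testable without mutating the process environment.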
#### 2.2 Hardware Detection Module
```python
# bzzz/system_info.py
def get_system_info():
    return {
        "cpu": detect_cpu(),             # lscpu parsing
        "memory": detect_memory(),       # /proc/meminfo
        "gpu": detect_gpu(),             # nvidia-smi, lspci
        "disk": detect_storage(),        # df, lsblk
        "services": detect_services(),   # docker, ollama, etc.
        "capabilities": detect_models()  # available models
    }
```

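As an illustration of one detector, `detect_memory()` could parse `/proc/meminfo` along these lines. This is only a sketch: `parse_meminfo` is a hypothetical helper, and the returned keys `total_bytes` and `available_bytes` are assumed names, not a schema defined by this plan.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}.

    Fields reported with a "kB" unit are converted to bytes
    (the kernel's "kB" here means 1024 bytes); unitless counters
    such as HugePages_Total are returned unchanged.
    """
    result: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        parts = rest.split()
        if not parts or not parts[0].isdigit():
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        result[name.strip()] = value
    return result


def detect_memory(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Read the (Linux-only) meminfo file and return a small summary."""
    with open(path, encoding="ascii") as fh:
        info = parse_meminfo(fh.read())
    return {
        "total_bytes": info.get("MemTotal", 0),
        "available_bytes": info.get("MemAvailable", 0),
    }
```

Separating parsing from file access means the parser can be tested with sample text on any platform, while `detect_memory()` itself only works where `/proc/meminfo` exists.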
#### 2.3 Registration Logic
```python
# bzzz/cluster_client.py
import asyncio
import os
import socket
from uuid import uuid4

from bzzz.system_info import get_system_info


class WHOOSHClusterClient:
    def __init__(self):
        self.token = os.getenv('WHOOSH_CLUSTER_TOKEN')
        self.coordinator_url = os.getenv('WHOOSH_COORDINATOR_URL')
        self.node_id = os.getenv('WHOOSH_NODE_ID', f"{socket.gethostname()}-{uuid4()}")
        # Seconds between heartbeats; default matches section 2.1
        self.heartbeat_interval = int(os.getenv('WHOOSH_HEARTBEAT_INTERVAL', '30'))

    async def register(self):
        """Register with WHOOSH coordinator"""
        system_info = get_system_info()
        payload = {
            "token": self.token,
            "node_id": self.node_id,
            "hostname": socket.gethostname(),
            "ip_address": get_local_ip(),  # helper not defined in this plan
            "system_info": system_info
        }
        # POST to /api/cluster/register

    async def heartbeat_loop(self):
        """Send periodic heartbeats with current status"""
        while True:
            current_status = get_current_status()  # helper not defined in this plan
            # POST to /api/cluster/heartbeat
            await asyncio.sleep(self.heartbeat_interval)
```

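The two `# POST to ...` placeholders could be filled in with the standard library alone, as in the sketch below. `build_post`, `send`, and `post_json` are hypothetical helpers, the JSON request and response bodies are an assumption about the wire format, and `asyncio.to_thread` requires Python 3.9 or newer.

```python
import asyncio
import json
import urllib.request
from typing import Any, Dict


def build_post(coordinator_url: str, path: str,
               payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request for a coordinator endpoint."""
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request (blocking) and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


async def post_json(coordinator_url: str, path: str,
                    payload: Dict[str, Any]) -> Dict[str, Any]:
    """Run the blocking call in a worker thread so the event loop stays free."""
    req = build_post(coordinator_url, path, payload)
    return await asyncio.to_thread(send, req)
```

Inside `register()` this would be called as `await post_json(self.coordinator_url, "/api/cluster/register", payload)`. An async HTTP library would avoid the worker thread, but the standard library keeps the sketch dependency-free.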
### Phase 3: Integration & Migration

#### 3.1 Remove Hardcoded Nodes
- Delete static `cluster_nodes` dict from `cluster_service.py`
- Replace with dynamic database queries
- Update all cluster APIs to use registered nodes

#### 3.2 Frontend Updates
- **Node Management UI**: View/approve/remove registered nodes
- **Token Management**: Generate/revoke cluster tokens
- **Real-time Status**: Live hardware metrics from heartbeats
- **Registration Instructions**: Show token and join commands

#### 3.3 Bzzz Client Integration
- Add cluster client to Bzzz startup sequence
- Environment variable configuration
- Graceful handling of registration failures

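"Graceful handling of registration failures" could mean retrying with exponential backoff instead of crashing at startup. The sketch below is one possible reading, not a decided design: `backoff_delay` and `register_with_retry` are hypothetical names, and the attempt count, base delay, and 60-second cap are arbitrary illustrative values.

```python
import asyncio
import random
from typing import Awaitable, Callable


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2**attempt, never more than cap seconds."""
    return min(cap, base * (2 ** attempt))


async def register_with_retry(
    register: Callable[[], Awaitable[object]],
    max_attempts: int = 5,
    sleep: Callable[[float], Awaitable[object]] = asyncio.sleep,
) -> bool:
    """Call register() until it succeeds or attempts run out.

    Returns True on success and False after the final failure, so the
    caller can decide whether to keep running without the cluster.
    """
    for attempt in range(max_attempts):
        try:
            await register()
            return True
        except (OSError, asyncio.TimeoutError):
            # Network errors (including urllib's URLError) derive from OSError.
            if attempt == max_attempts - 1:
                break
            delay = backoff_delay(attempt)
            # Small random jitter so many nodes do not retry in lockstep.
            await sleep(delay + random.uniform(0, delay / 10))
    return False
```

Returning a boolean rather than raising lets the Bzzz startup sequence continue in a degraded mode if the coordinator is unreachable, which is one way to interpret "graceful".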
## 🔄 Registration Flow

```mermaid
sequenceDiagram
    participant B as Bzzz Client
    participant H as WHOOSH Coordinator
    participant DB as Database

    Note over H: Admin generates token
    H->>DB: INSERT cluster_token

    Note over B: Start with env vars
    B->>B: Detect system info
    B->>H: POST /api/cluster/register
    H->>DB: Validate token
    H->>DB: INSERT cluster_node
    H->>B: Return node_id, heartbeat_interval

    loop Every 30 seconds
        B->>B: Get current status
        B->>H: POST /api/cluster/heartbeat
        H->>DB: UPDATE last_heartbeat
    end
```

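The flow records `last_heartbeat` but does not say when a silent node stops counting as online. One possible rule, shown here purely as an assumption, is to mark a node offline after it misses a few consecutive heartbeats. `node_status`, the `"offline"` value, and the three-interval grace period are all hypothetical.

```python
from datetime import datetime, timedelta


def node_status(last_heartbeat: datetime,
                now: datetime,
                heartbeat_interval: int = 30,
                missed_allowed: int = 3) -> str:
    """Derive a status string from the age of the last heartbeat.

    A node stays "online" until more than `missed_allowed` intervals
    have passed without a heartbeat, then becomes "offline".
    """
    deadline = last_heartbeat + timedelta(seconds=heartbeat_interval * missed_allowed)
    return "online" if now <= deadline else "offline"
```

The coordinator could apply this when listing nodes, or in a periodic job that updates the `status` column, so that a crashed node does not appear healthy indefinitely.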
## 🔐 Security Considerations

1. **Token-based Auth**: No SSH keys or passwords needed
2. **Token Expiration**: Configurable token lifetimes
3. **IP Validation**: Optional IP whitelist for token usage
4. **TLS Required**: All communication over HTTPS
5. **Token Rotation**: Ability to revoke/regenerate tokens

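Two of these checks, token comparison and the optional IP whitelist, can be sketched with the standard library. `tokens_match` and `ip_allowed` are hypothetical names, and treating an empty list as "no restriction" is an assumption about what "optional" means here.

```python
import hmac
import ipaddress
from typing import Iterable


def tokens_match(presented: str, stored: str) -> bool:
    """Compare tokens in constant time to avoid leaking match length via timing."""
    return hmac.compare_digest(presented.encode("utf-8"), stored.encode("utf-8"))


def ip_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """Check a client address against CIDR ranges such as "192.168.1.0/24".

    Assumption: an empty allowlist disables the check entirely.
    """
    networks = [ipaddress.ip_network(cidr, strict=False) for cidr in allowlist]
    if not networks:
        return True
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)
```

If the coordinator sits behind a reverse proxy, the address it sees may be the proxy's, so the source of `client_ip` would need care in a real deployment.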
## ✅ Benefits of New Architecture

1. **Dynamic Discovery**: Nodes self-register, no pre-configuration
2. **Real-time Data**: Live hardware metrics from heartbeats
3. **Security**: No SSH access, SSH key management, or inbound ports on nodes
4. **Scalability**: Works with any number of nodes
5. **Fault Tolerance**: Nodes can rejoin after network issues
6. **Docker Friendly**: No host network access required
7. **Cloud Ready**: Works across NAT, VPCs, different networks

## 🚀 Implementation Priority

1. **High Priority**: Database schema, registration endpoints, basic heartbeat
2. **Medium Priority**: Bzzz client integration, hardware detection
3. **Low Priority**: Advanced UI features, token management UI

## 📝 Implementation Status

- [ ] Phase 1.1: Database schema migration
- [ ] Phase 1.2: Registration API endpoints
- [ ] Phase 2.1: Bzzz environment variable support
- [ ] Phase 2.2: System hardware detection module
- [ ] Phase 2.3: Registration client logic
- [ ] Phase 3.1: Remove hardcoded cluster nodes
- [ ] Phase 3.2: Frontend cluster management UI
- [ ] Phase 3.3: Full Bzzz integration

## 🔗 Related Files

- `/backend/app/services/cluster_service.py` - Current hardcoded implementation
- `/backend/app/api/cluster.py` - Cluster API endpoints
- `/backend/migrations/` - Database schema changes
- `/frontend/src/components/cluster/` - Cluster UI components

---

**Created**: 2025-07-31
**Status**: Planning Phase
**Priority**: High
**Impact**: Solves fundamental hardware detection and cluster management issues