Major WHOOSH system refactoring and feature enhancements

- Migrated from HIVE branding to WHOOSH across all components
- Enhanced backend API with new services: AI models, BZZZ integration, templates, members
- Added comprehensive testing suite with security, performance, and integration tests
- Improved frontend with new components for project setup, AI models, and team management
- Updated MCP server implementation with WHOOSH-specific tools and resources
- Enhanced deployment configurations with production-ready Docker setups
- Added comprehensive documentation and setup guides
- Implemented age encryption service and UCXL integration

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
anthonyrawlins
2025-08-27 08:34:48 +10:00
parent 0e9844ef13
commit 268214d971
399 changed files with 57390 additions and 2045 deletions


@@ -0,0 +1,221 @@
# 🏗️ WHOOSH-Bzzz Registration Architecture Design Plan
## 🔍 Current Architecture Problems
1. **Static Configuration**: Hardcoded node IPs in `cluster_service.py`
2. **SSH Dependencies**: Requires SSH keys and network access to every node, which adds security risk
3. **Docker Isolation**: Can't SSH from container to host network
4. **No Dynamic Discovery**: Nodes can't join/leave dynamically
5. **Stale Data**: No real-time hardware/status updates
## 🎯 Proposed Architecture: Registration-Based Cluster
The design mirrors Docker Swarm's token-based `docker swarm join`: each node starts with a cluster token and the coordinator URL.
```bash
# Bzzz clients register with WHOOSH coordinator
WHOOSH_CLUSTER_TOKEN=abc123... WHOOSH_COORDINATOR_URL=https://whoosh.example.com bzzz-client
```
## 📋 Implementation Plan
### Phase 1: WHOOSH Coordinator Registration System
#### 1.1 Database Schema Changes
```sql
-- Cluster registration tokens
CREATE TABLE cluster_tokens (
    id SERIAL PRIMARY KEY,
    token VARCHAR(64) UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    is_active BOOLEAN DEFAULT true
);

-- Registered cluster nodes
CREATE TABLE cluster_nodes (
    id SERIAL PRIMARY KEY,
    node_id VARCHAR(64) UNIQUE NOT NULL,
    hostname VARCHAR(255) NOT NULL,
    ip_address INET NOT NULL,
    registration_token VARCHAR(64) REFERENCES cluster_tokens(token),

    -- Hardware info (reported by client)
    cpu_info JSONB,
    memory_info JSONB,
    gpu_info JSONB,
    disk_info JSONB,

    -- Status tracking
    status VARCHAR(20) DEFAULT 'online',
    last_heartbeat TIMESTAMP DEFAULT NOW(),
    first_registered TIMESTAMP DEFAULT NOW(),

    -- Capabilities
    services JSONB,      -- ollama, docker, etc.
    capabilities JSONB   -- models, tools, etc.
);
```
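A minimal sketch of how a row for `cluster_tokens` might be built, assuming tokens are random hex strings sized to fit the `VARCHAR(64)` column. The helper name `new_cluster_token` and the 24-hour default lifetime are illustrative assumptions, not part of the plan.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional


def new_cluster_token(description: str, ttl_hours: Optional[int] = 24) -> dict:
    """Build the column values for one cluster_tokens row (hypothetical helper)."""
    now = datetime.now(timezone.utc)
    # None means the token never expires, matching the nullable expires_at column.
    expires_at = now + timedelta(hours=ttl_hours) if ttl_hours is not None else None
    return {
        "token": secrets.token_hex(32),  # 32 random bytes -> 64 hex characters
        "description": description,
        "created_at": now,
        "expires_at": expires_at,
        "is_active": True,
    }
```

`secrets.token_hex` draws from the operating system's CSPRNG, which is why it is preferred here over `random` for anything used as a credential.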
#### 1.2 Registration API Endpoints
```python
# /api/cluster/register (POST)
# - Validates token
# - Records node hardware info
# - Returns node_id and heartbeat interval
# /api/cluster/heartbeat (POST)
# - Updates last_heartbeat
# - Updates current status/metrics
# - Returns cluster commands/tasks
# /api/cluster/tokens (GET/POST)
# - Generate/list/revoke cluster tokens
# - Admin endpoint for token management
```
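The registration endpoint's core logic could be sketched framework-free as below. This is an illustration under assumptions: the in-memory `tokens` and `nodes` dicts stand in for the two database tables, and the 401 status code and response field names are choices made for the example rather than something the plan specifies.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

HEARTBEAT_INTERVAL_SECONDS = 30  # assumed default, matching the 30 s used in this plan


def token_is_valid(token_row: Optional[dict], now: datetime) -> bool:
    """A token is usable if it exists, is active, and has not expired."""
    if token_row is None or not token_row.get("is_active", False):
        return False
    expires_at = token_row.get("expires_at")
    return expires_at is None or now < expires_at


def handle_register(payload: dict, tokens: dict, nodes: dict,
                    now: datetime) -> Tuple[int, dict]:
    """Return (HTTP status, JSON body) for a POST /api/cluster/register request."""
    if not token_is_valid(tokens.get(payload.get("token")), now):
        return 401, {"error": "invalid or expired token"}
    node_id = payload["node_id"]
    # Re-registering the same node_id overwrites the previous record.
    nodes[node_id] = {
        "hostname": payload["hostname"],
        "ip_address": payload["ip_address"],
        "system_info": payload.get("system_info", {}),
        "status": "online",
        "last_heartbeat": now,
    }
    return 200, {"node_id": node_id,
                 "heartbeat_interval": HEARTBEAT_INTERVAL_SECONDS}
```

Keeping the validation in a plain function makes it straightforward to unit-test before wiring it into whichever web framework the backend uses.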
### Phase 2: Bzzz Client Registration Capability
#### 2.1 Environment Variables
```bash
WHOOSH_CLUSTER_TOKEN=token_here # Required for registration
WHOOSH_COORDINATOR_URL=https://whoosh.local:8000 # WHOOSH API endpoint
WHOOSH_NODE_ID=walnut-$(hostname) # Optional: custom node ID
WHOOSH_HEARTBEAT_INTERVAL=30 # Seconds between heartbeats
```
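One way these variables might be read and validated at startup is sketched below. `ClusterConfig` and `load_config` are hypothetical names, and treating `WHOOSH_COORDINATOR_URL` as required alongside the token is an assumption; the plan only marks the token as required.

```python
import os
import socket
from dataclasses import dataclass
from typing import Mapping
from uuid import uuid4


@dataclass(frozen=True)
class ClusterConfig:
    token: str
    coordinator_url: str
    node_id: str
    heartbeat_interval: int  # seconds


def load_config(env: Mapping[str, str] = os.environ) -> ClusterConfig:
    """Read WHOOSH_* settings, failing fast when a required one is absent."""
    required = ("WHOOSH_CLUSTER_TOKEN", "WHOOSH_COORDINATOR_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    interval = int(env.get("WHOOSH_HEARTBEAT_INTERVAL", "30"))
    if interval <= 0:
        raise ValueError("WHOOSH_HEARTBEAT_INTERVAL must be positive")
    return ClusterConfig(
        token=env["WHOOSH_CLUSTER_TOKEN"],
        coordinator_url=env["WHOOSH_COORDINATOR_URL"].rstrip("/"),
        # Fall back to a generated ID when no custom node ID is supplied.
        node_id=env.get("WHOOSH_NODE_ID") or f"{socket.gethostname()}-{uuid4()}",
        heartbeat_interval=interval,
    )
```

Passing a plain dict as `env` keeps the loader testable without touching the real process environment.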
#### 2.2 Hardware Detection Module
```python
# bzzz/system_info.py
def get_system_info():
    """Collect hardware and service details to report at registration."""
    # The detect_* helpers are planned functions in this module.
    return {
        "cpu": detect_cpu(),              # lscpu parsing
        "memory": detect_memory(),        # /proc/meminfo
        "gpu": detect_gpu(),              # nvidia-smi, lspci
        "disk": detect_storage(),         # df, lsblk
        "services": detect_services(),    # docker, ollama, etc.
        "capabilities": detect_models(),  # available models
    }
```
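As an illustration of one detector, the sketch below parses `/proc/meminfo` for `detect_memory()`. The returned key names (`total_kb`, `available_kb`) and the separate `parse_meminfo` helper are assumptions made for the example, and the approach applies to Linux hosts only.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts with no unit.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def detect_memory(path: str = "/proc/meminfo") -> dict:
    """Return total and available memory in kB (Linux only)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {
        "total_kb": info.get("MemTotal"),
        "available_kb": info.get("MemAvailable"),
    }
```

Splitting parsing from file access means the parser can be exercised with a captured sample on any platform.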
#### 2.3 Registration Logic
```python
# bzzz/cluster_client.py
import asyncio
import os
import socket
from uuid import uuid4

from bzzz.system_info import get_system_info


class WHOOSHClusterClient:
    def __init__(self):
        self.token = os.getenv('WHOOSH_CLUSTER_TOKEN')
        self.coordinator_url = os.getenv('WHOOSH_COORDINATOR_URL')
        self.node_id = os.getenv('WHOOSH_NODE_ID', f"{socket.gethostname()}-{uuid4()}")
        # Seconds between heartbeats, read from the environment (default 30)
        self.heartbeat_interval = int(os.getenv('WHOOSH_HEARTBEAT_INTERVAL', '30'))

    async def register(self):
        """Register with WHOOSH coordinator"""
        system_info = get_system_info()
        payload = {
            "token": self.token,
            "node_id": self.node_id,
            "hostname": socket.gethostname(),
            "ip_address": get_local_ip(),  # helper still to be implemented
            "system_info": system_info
        }
        # POST to /api/cluster/register

    async def heartbeat_loop(self):
        """Send periodic heartbeats with current status"""
        while True:
            current_status = get_current_status()  # helper still to be implemented
            # POST to /api/cluster/heartbeat
            await asyncio.sleep(self.heartbeat_interval)
```
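The `# POST to /api/cluster/register` step could be filled in along these lines, shown here with only the standard library. `build_register_request` is a hypothetical helper, and the choice of `urllib` rather than an async HTTP client is an assumption made to keep the sketch dependency-free.

```python
import json
import urllib.request


def build_register_request(coordinator_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the JSON POST for /api/cluster/register."""
    url = coordinator_url.rstrip("/") + "/api/cluster/register"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Because `urllib.request.urlopen` blocks, an async client would need to run it off the event loop, for example with `await asyncio.to_thread(urllib.request.urlopen, request, timeout=10)`.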
### Phase 3: Integration & Migration
#### 3.1 Remove Hardcoded Nodes
- Delete static `cluster_nodes` dict from `cluster_service.py`
- Replace with dynamic database queries
- Update all cluster APIs to use registered nodes
#### 3.2 Frontend Updates
- **Node Management UI**: View/approve/remove registered nodes
- **Token Management**: Generate/revoke cluster tokens
- **Real-time Status**: Live hardware metrics from heartbeats
- **Registration Instructions**: Show token and join commands
#### 3.3 Bzzz Client Integration
- Add cluster client to Bzzz startup sequence
- Environment variable configuration
- Graceful handling of registration failures
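
Graceful handling of registration failures might look like the retry loop below, so that Bzzz can start before the coordinator is reachable. This is a sketch under assumptions: exponential backoff is one reasonable policy rather than something the plan prescribes, and `register_with_retry` and `backoff_delays` are hypothetical names.

```python
import asyncio


def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 6) -> list:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


async def register_with_retry(register, delays, sleep=asyncio.sleep):
    """Call the async register() until it succeeds or the delays run out.

    Makes len(delays) + 1 attempts in total, sleeping between attempts.
    """
    last_error = None
    for delay in [*delays, None]:
        try:
            return await register()
        except OSError as exc:  # covers connection errors and urllib's URLError
            last_error = exc
            if delay is not None:
                await sleep(delay)
    raise RuntimeError("registration failed after retries") from last_error
```

The `sleep` parameter exists so tests can substitute a fake and avoid real waiting.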
## 🔄 Registration Flow
```mermaid
sequenceDiagram
    participant B as Bzzz Client
    participant H as WHOOSH Coordinator
    participant DB as Database

    Note over H: Admin generates token
    H->>DB: INSERT cluster_token

    Note over B: Start with env vars
    B->>B: Detect system info
    B->>H: POST /api/cluster/register
    H->>DB: Validate token
    H->>DB: INSERT cluster_node
    H->>B: Return node_id, heartbeat_interval

    loop Every 30 seconds
        B->>B: Get current status
        B->>H: POST /api/cluster/heartbeat
        H->>DB: UPDATE last_heartbeat
    end
```
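The flow above records `last_heartbeat` but does not say when a silent node stops counting as online. One possible rule is sketched below; the threshold of three missed intervals and the `offline` status value are assumptions, not part of the plan.

```python
from datetime import datetime, timedelta, timezone


def node_status(last_heartbeat: datetime, now: datetime,
                interval_seconds: int = 30, missed_allowed: int = 3) -> str:
    """Return 'online' while the node is within its grace period, else 'offline'."""
    deadline = last_heartbeat + timedelta(seconds=interval_seconds * missed_allowed)
    return "online" if now <= deadline else "offline"
```

With the defaults, a node is reported offline once more than 90 seconds pass without a heartbeat.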
## 🔐 Security Considerations
1. **Token-based Auth**: No SSH keys or passwords needed
2. **Token Expiration**: Configurable token lifetimes
3. **IP Validation**: Optional IP whitelist for token usage
4. **TLS Required**: All communication over HTTPS
5. **Token Rotation**: Ability to revoke/regenerate tokens
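
Two of these checks can be sketched with the standard library: the optional IP restriction (item 3) and a constant-time token comparison. The function names and the convention that an empty list means "no restriction" are assumptions for illustration.

```python
import hmac
import ipaddress


def ip_allowed(client_ip: str, allowed_networks: list) -> bool:
    """Check client_ip against CIDR networks; an empty list allows any address."""
    if not allowed_networks:
        return True  # the restriction is optional
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


def tokens_match(presented: str, stored: str) -> bool:
    """Compare tokens in constant time to avoid leaking matches via timing."""
    return hmac.compare_digest(presented.encode("utf-8"), stored.encode("utf-8"))
```

For example, `ip_allowed("192.168.1.72", ["192.168.1.0/24"])` is true, while an address outside that (hypothetical) subnet is rejected.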
## ✅ Benefits of New Architecture
1. **Dynamic Discovery**: Nodes self-register, no pre-configuration
2. **Real-time Data**: Live hardware metrics from heartbeats
3. **Security**: No SSH access, no per-node SSH key management, and no inbound ports opened on nodes
4. **Scalability**: Works with any number of nodes
5. **Fault Tolerance**: Nodes can rejoin after network issues
6. **Docker Friendly**: No host network access required
7. **Cloud Ready**: Works across NAT, VPCs, different networks
## 🚀 Implementation Priority
1. **High Priority**: Database schema, registration endpoints, basic heartbeat
2. **Medium Priority**: Bzzz client integration, hardware detection
3. **Low Priority**: Advanced UI features, token management UI
## 📝 Implementation Status
- [ ] Phase 1.1: Database schema migration
- [ ] Phase 1.2: Registration API endpoints
- [ ] Phase 2.1: Bzzz environment variable support
- [ ] Phase 2.2: System hardware detection module
- [ ] Phase 2.3: Registration client logic
- [ ] Phase 3.1: Remove hardcoded cluster nodes
- [ ] Phase 3.2: Frontend cluster management UI
- [ ] Phase 3.3: Full Bzzz integration
## 🔗 Related Files
- `/backend/app/services/cluster_service.py` - Current hardcoded implementation
- `/backend/app/api/cluster.py` - Cluster API endpoints
- `/backend/migrations/` - Database schema changes
- `/frontend/src/components/cluster/` - Cluster UI components
---
**Created**: 2025-07-31
**Status**: Planning Phase
**Priority**: High
**Impact**: Solves fundamental hardware detection and cluster management issues