bzzz/deployments/bare-metal/config-ui/requirements.md
anthonyrawlins f5f96ba505 Major updates and improvements to BZZZ system
2025-09-17 18:06:57 +10:00

BZZZ Configuration Web Interface Requirements

Overview

This document specifies the requirements for a web-based configuration interface that guides users through setting up their BZZZ cluster after the initial installation.

User Information Requirements

1. Cluster Infrastructure Configuration

Network Settings

  • Subnet IP Range (CIDR notation)

    • Auto-detected from system
    • User can override (e.g., 192.168.1.0/24)
    • Validation for valid CIDR format
    • Conflict detection with existing networks
  • Node Discovery Method

    • Option 1: Automatic discovery via broadcast
    • Option 2: Manual IP address list
    • Option 3: DNS-based discovery
    • Integration with existing network infrastructure
  • Network Interface Selection

    • Dropdown of available interfaces
    • Auto-select primary interface
    • Show interface details (IP, status, speed)
    • Validation for interface accessibility
  • Port Configuration

    • BZZZ Go Service Port (default: 8080)
    • MCP Server Port (default: 3000)
    • Web UI Port (default: 8080, the same default as the BZZZ Go service; change one if both listen separately on the same host)
    • WebSocket Port (default: 8081)
    • Reserved port range exclusions
    • Port conflict detection
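
The subnet and port checks listed above could be sketched as follows. This is a minimal illustration using Python's standard `ipaddress` module, not the BZZZ implementation: the function names and the `DEFAULT_PORTS` keys are hypothetical, and the port values are simply the defaults listed above.

```python
import ipaddress
from collections import defaultdict

# Defaults copied from the list above; the dictionary keys are hypothetical names.
DEFAULT_PORTS = {
    "bzzz_service": 8080,
    "mcp_server": 3000,
    "web_ui": 8080,
    "websocket": 8081,
}


def parse_subnet(cidr):
    """Return an ip_network for a valid CIDR string, or raise ValueError."""
    # strict=True rejects inputs with host bits set, e.g. 192.168.1.5/24.
    return ipaddress.ip_network(cidr, strict=True)


def overlapping_networks(candidate, existing):
    """Return the existing networks that overlap the candidate subnet."""
    net = parse_subnet(candidate)
    others = (ipaddress.ip_network(e, strict=False) for e in existing)
    return [str(o) for o in others if o.version == net.version and net.overlaps(o)]


def port_conflicts(ports):
    """Map each port used by more than one service to the services using it."""
    by_port = defaultdict(list)
    for service, port in ports.items():
        by_port[port].append(service)
    return {p: sorted(s) for p, s in by_port.items() if len(s) > 1}
```

With the defaults as listed, `port_conflicts(DEFAULT_PORTS)` reports port 8080 for both the Go service and the Web UI, which is the kind of collision the conflict check is meant to surface.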

Firewall & Security

  • Firewall Configuration
    • Auto-configure firewall rules (ufw/iptables)
    • Manual firewall setup instructions
    • Port testing and validation
    • Network connectivity verification
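
Port testing can be as simple as attempting a TCP connection. The sketch below assumes a plain TCP reachability check is sufficient; `is_port_open` is a hypothetical helper name, and a real check might also need to distinguish a filtered port from a closed one.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A refused connection and a timeout both return `False` here; reporting them separately would help users tell "nothing is listening" apart from "the firewall is dropping packets".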

2. Authentication & Security Setup

SSH Key Management

  • SSH Key Options

    • Generate new SSH key pair
    • Upload existing public key
    • Use existing system SSH keys
    • Key distribution to cluster nodes
  • SSH Access Configuration

    • SSH username for cluster access
    • Sudo privileges configuration
    • SSH port (default: 22)
    • Key-based vs password authentication

Security Settings

  • TLS/SSL Configuration

    • Generate self-signed certificates
    • Upload existing certificates
    • Let's Encrypt integration
    • Certificate distribution
  • Authentication Methods

    • Token-based authentication
    • OAuth2 integration
    • LDAP/Active Directory
    • Local user management
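
For the token-based option, one common approach is to give the client a random token and store only its hash. The sketch below assumes that approach; the requirements do not specify a token scheme, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def issue_token():
    """Return (token, digest): hand the token to the client, store only the digest."""
    token = secrets.token_urlsafe(32)
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, digest


def verify_token(presented, stored_digest):
    """Check a presented token against the stored digest in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)
```

Storing the digest rather than the token means a leaked configuration store does not directly expose usable credentials.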

3. AI Model Configuration

OpenAI Integration

  • API Key Management

    • Secure API key input
    • Key validation and testing
    • Organization and project settings
    • Usage monitoring setup
  • Model Preferences

    • Default model selection (GPT-5)
    • Model-to-task mapping
    • Custom model parameters
    • Fallback model configuration
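
Model-to-task mapping with a fallback might resolve as in the sketch below. The lookup order (task mapping, then default, then fallback) is an assumption, as are the function name, the task names, and the lowercase `"gpt-5"` identifier spelling; only the GPT-5 default comes from the list above.

```python
DEFAULT_MODEL = "gpt-5"  # default named above; the exact identifier spelling is assumed


def resolve_model(task, task_models, available, default=DEFAULT_MODEL, fallback=None):
    """Pick the model for a task: task mapping first, then default, then fallback."""
    for candidate in (task_models.get(task), default, fallback):
        if candidate is not None and candidate in available:
            return candidate
    raise LookupError(f"no available model for task {task!r}")
```

For example, with `task_models = {"code_review": "model-a"}` (a hypothetical mapping), a `"code_review"` task uses `"model-a"` when it is available, and any unmapped task falls through to the default.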

Local AI Models (Ollama/Parallama)

  • Ollama/Parallama Installation

    • Option to install standard Ollama
    • Option to install Parallama (multi-GPU fork)
    • Auto-detect existing Ollama installations
    • Upgrade/migrate from Ollama to Parallama
  • Node Discovery & Configuration

    • Auto-discover Ollama/Parallama instances
    • Manual endpoint configuration
    • Model availability checking
    • Load balancing preferences
    • GPU assignment for Parallama
  • Multi-GPU Configuration (Parallama)

    • GPU topology detection
    • Model sharding across GPUs
    • Memory allocation per GPU
    • Performance optimization settings
    • GPU failure handling
  • Model Distribution Strategy

    • Which models on which nodes
    • GPU-specific model placement
    • Automatic model pulling
    • Storage requirements
    • Model update policies
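
Model availability checking can build on Ollama's `GET /api/tags` endpoint, which lists the models installed on an instance. The sketch below assumes the default Ollama port (11434) and assumes Parallama exposes the same endpoint, which is not confirmed here; the helper names are hypothetical.

```python
import json
import urllib.request


def parse_model_names(tags_payload):
    """Extract model names from the JSON body returned by GET /api/tags."""
    data = json.loads(tags_payload)
    return {m["name"] for m in data.get("models", [])}


def list_models(base_url="http://localhost:11434", timeout=5.0):
    """Fetch the names of the models installed on one Ollama endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(resp.read())


def missing_models(required, installed):
    """Return the required models that the endpoint does not have yet."""
    return sorted(set(required) - set(installed))
```

The result of `missing_models` is the list an automatic pull step would need to fetch for that node.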

4. Cost Management

Spending Limits

  • Daily Limits (USD)

    • Per-user limits
    • Per-project limits
    • Global daily limit
    • Warning thresholds
  • Monthly Limits (USD)

    • Budget allocation
    • Automatic budget reset
    • Cost tracking granularity
    • Billing integration
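
A limit check with a warning threshold could look like the sketch below. The 80% warning ratio, the scope names, and the `ok`/`warn`/`blocked` states are illustrative assumptions; the requirements only state that limits and warning thresholds exist.

```python
from decimal import Decimal


def check_spend(spent, limit, warn_ratio=Decimal("0.8")):
    """Classify spend against a limit as 'ok', 'warn', or 'blocked'."""
    # Decimal avoids binary floating-point rounding on currency amounts.
    spent, limit = Decimal(str(spent)), Decimal(str(limit))
    if spent >= limit:
        return "blocked"
    if spent >= limit * warn_ratio:
        return "warn"
    return "ok"


def check_all(spend, limits):
    """Evaluate every scope (user, project, global) and return the strictest result."""
    order = {"ok": 0, "warn": 1, "blocked": 2}
    results = {scope: check_spend(spend.get(scope, 0), cap) for scope, cap in limits.items()}
    worst = max(results.values(), key=order.__getitem__, default="ok")
    return worst, results
```

Evaluating all scopes together lets a request be rejected as soon as any one of the per-user, per-project, or global limits is exhausted.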

Cost Optimization

  • Usage Monitoring
    • Real-time cost tracking
    • Historical usage reports
    • Cost per model/task type
    • Optimization recommendations

5. Hardware & Resource Detection

System Resources

  • CPU Configuration

    • Core count and allocation
    • CPU affinity settings
    • Performance optimization
    • Load balancing
  • Memory Management

    • Available RAM detection
    • Memory allocation per service
    • Swap configuration
    • Memory monitoring
  • Storage Configuration

    • Available disk space
    • Storage paths for data/logs
    • Backup storage locations
    • Storage monitoring
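
Basic resource detection is available from the standard library. The sketch below assumes a Linux host (it reads `/proc/meminfo` for RAM and returns `None` elsewhere); the function names and the returned field names are hypothetical.

```python
import os
import shutil


def total_ram_bytes(meminfo_path="/proc/meminfo"):
    """Return total RAM in bytes on Linux, or None if it cannot be read."""
    try:
        with open(meminfo_path, encoding="ascii") as fh:
            for line in fh:
                if line.startswith("MemTotal:"):
                    return int(line.split()[1]) * 1024  # value is reported in kB
    except (OSError, ValueError, IndexError):
        pass
    return None


def detect_resources(data_path="/"):
    """Collect CPU count, total RAM, and disk space for one storage path."""
    disk = shutil.disk_usage(data_path)
    return {
        "cpu_cores": os.cpu_count(),
        "ram_bytes": total_ram_bytes(),
        "disk_total_bytes": disk.total,
        "disk_free_bytes": disk.free,
    }
```

Calling `detect_resources` once per configured storage path (data, logs, backups) gives the figures the UI needs to validate allocations before they are saved.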

GPU Resources

  • GPU Detection

    • NVIDIA CUDA support
    • AMD ROCm support
    • GPU memory allocation
    • Multi-GPU configuration
  • AI Workload Optimization

    • GPU scheduling
    • Model-to-GPU assignment
    • Power management
    • Temperature monitoring
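
NVIDIA GPU detection can be sketched by querying `nvidia-smi` in CSV mode. This covers only the CUDA side and assumes `nvidia-smi` is on the `PATH` and reports numeric memory values; AMD ROCm detection would need a separate probe. The helper names are hypothetical.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpu_rows(output):
    """Parse 'index, name, memory_mib' CSV rows into dictionaries."""
    gpus = []
    for line in output.strip().splitlines():
        # Split from both ends so a comma inside the GPU name is tolerated.
        index, rest = line.split(",", 1)
        name, memory = rest.rsplit(",", 1)
        gpus.append({"index": int(index), "name": name.strip(), "memory_mib": int(memory)})
    return gpus


def detect_nvidia_gpus():
    """Return the NVIDIA GPUs reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, timeout=10)
    return parse_gpu_rows(result.stdout) if result.returncode == 0 else []
```

The per-GPU memory figure is what model-to-GPU assignment would compare against a model's memory requirement.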

6. Service Configuration

Container Management

  • Docker Configuration

    • Container registry selection
    • Image pull policies
    • Resource limits per container
    • Container orchestration (Docker Swarm/K8s)
  • Registry Settings

    • Public registry (Docker Hub)
    • Private registry setup
    • Authentication for registries
    • Image versioning strategy

Update Management

  • Release Channels

    • Stable releases
    • Beta releases
    • Development builds
    • Custom release sources
  • Auto-Update Settings

    • Automatic updates enabled/disabled
    • Update scheduling
    • Rollback capabilities
    • Update notifications

7. Monitoring & Observability

Logging Configuration

  • Log Levels

    • Debug, Info, Warn, Error
    • Per-component log levels
    • Log rotation settings
    • Centralized logging
  • Log Destinations

    • Local file logging
    • Syslog integration
    • External log collectors
    • Log retention policies
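
Per-component log levels with rotation can be illustrated with Python's standard `logging` module. This is only a sketch of the idea: the `bzzz.<component>` logger naming, the component names, and the rotation defaults are assumptions, and the BZZZ Go service would use its own logging library.

```python
import logging
from logging.handlers import RotatingFileHandler

# Maps the level names listed above to the standard library's constants.
LEVELS = {"debug": logging.DEBUG, "info": logging.INFO,
          "warn": logging.WARNING, "error": logging.ERROR}


def configure_logging(path, default_level="info", component_levels=None,
                      max_bytes=10 * 1024 * 1024, backups=5):
    """Attach a rotating file handler and apply per-component log levels."""
    root = logging.getLogger("bzzz")
    root.setLevel(LEVELS[default_level])
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    root.addHandler(handler)
    # Child loggers inherit the default level unless overridden here.
    for component, level in (component_levels or {}).items():
        logging.getLogger(f"bzzz.{component}").setLevel(LEVELS[level])
    return root
```

For example, `configure_logging("bzzz.log", "info", {"mcp": "debug"})` keeps most components at Info while enabling Debug output for one component under investigation.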

Metrics & Monitoring

  • Metrics Collection

    • Prometheus integration
    • Custom metrics
    • Performance monitoring
    • Health checks
  • Alerting

    • Alert rules configuration
    • Notification channels
    • Escalation policies
    • Alert suppression

8. Cluster Topology

Node Roles

  • Coordinator Nodes

    • Primary coordinator selection
    • Coordinator failover
    • Load balancing
    • State synchronization
  • Worker Nodes

    • Worker node capabilities
    • Task scheduling preferences
    • Resource allocation
    • Worker health monitoring
  • Storage Nodes

    • Distributed storage setup
    • Replication factors
    • Data consistency
    • Backup strategies

High Availability

  • Failover Configuration

    • Automatic failover
    • Manual failover procedures
    • Split-brain prevention
    • Recovery strategies
  • Load Balancing

    • Load balancing algorithms
    • Health check configuration
    • Traffic distribution
    • Performance optimization
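
The requirements do not say how split-brain is to be prevented. One widely used technique is a majority quorum, sketched below as an assumption rather than as the BZZZ design: a partition may elect or keep a primary coordinator only if it can reach more than half of the configured coordinators.

```python
def quorum_size(total_coordinators):
    """Smallest number of nodes that forms a strict majority."""
    return total_coordinators // 2 + 1


def has_quorum(reachable, total_coordinators):
    """True if this partition (counting the local node) holds a majority."""
    return reachable >= quorum_size(total_coordinators)
```

Because two disjoint partitions cannot both hold a strict majority, at most one side continues as primary. This is also why an odd number of coordinators is usually preferred: four coordinators tolerate no more failures than three.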

Configuration Flow

Step 1: System Detection

  • Detect hardware resources
  • Identify network interfaces
  • Check system dependencies
  • Validate installation

Step 2: Network Configuration

  • Configure network settings
  • Set up firewall rules
  • Test connectivity
  • Validate port accessibility

Step 3: Security Setup

  • Configure authentication
  • Set up SSH access
  • Generate/install certificates
  • Test security settings

Step 4: AI Integration

  • Configure OpenAI API
  • Set up Ollama/Parallama endpoints
  • Configure model preferences
  • Test AI connectivity

Step 5: Resource Allocation

  • Allocate CPU/memory
  • Configure storage paths
  • Set up GPU resources
  • Configure monitoring

Step 6: Service Deployment

  • Deploy BZZZ services
  • Configure service parameters
  • Start services
  • Validate service health

Step 7: Cluster Formation

  • Discover other nodes
  • Join/create cluster
  • Configure replication
  • Test cluster connectivity

Step 8: Testing & Validation

  • Run connectivity tests
  • Test AI model access
  • Validate security settings
  • Performance benchmarking
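
The eight steps above form an ordered pipeline in which each step should pass before the next begins. A minimal sketch of such a runner follows; the `run_flow` name, the result shape, and the stop-on-first-failure policy are assumptions, since the requirements do not state how failures are handled.

```python
def run_flow(steps):
    """Run (name, check) pairs in order; stop at the first failing step."""
    completed = []
    for name, check in steps:
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing check counts as a failed step
            return {"completed": completed, "failed": name, "error": str(exc)}
        if not ok:
            return {"completed": completed, "failed": name, "error": None}
        completed.append(name)
    return {"completed": completed, "failed": None, "error": None}
```

Returning the list of completed steps lets the UI resume from the failed step instead of restarting the whole wizard.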

Technical Implementation

Frontend Framework

  • React/Next.js for modern UI
  • Material-UI or Tailwind CSS for components
  • Real-time updates via WebSocket
  • Progressive Web App capabilities

Backend API

  • Go REST API integrated with BZZZ service
  • Configuration validation and testing
  • Real-time status updates
  • Secure configuration storage

Configuration Persistence

  • YAML configuration files
  • Environment variable generation
  • Docker Compose generation
  • Systemd service configuration
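
Environment variable generation can be sketched as flattening the nested configuration into prefixed keys. The `BZZZ_` prefix, the key naming scheme, and the example configuration keys are hypothetical; values containing spaces or quotes would need escaping that this sketch omits.

```python
def flatten(config, prefix="BZZZ"):
    """Flatten a nested dict into env-style keys, e.g. network.subnet -> BZZZ_NETWORK_SUBNET."""
    items = {}
    for key, value in config.items():
        name = f"{prefix}_{key}".upper()
        if isinstance(value, dict):
            items.update(flatten(value, name))
        elif isinstance(value, bool):
            items[name] = "true" if value else "false"
        else:
            items[name] = str(value)
    return items


def render_env(config):
    """Render the flattened config as sorted KEY=value lines for an env file."""
    return "".join(f"{k}={v}\n" for k, v in sorted(flatten(config).items()))
```

Sorting the keys keeps the generated file stable between runs, so a diff shows only the settings that actually changed.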

Validation & Testing

  • Network connectivity testing
  • Service health validation
  • Configuration syntax checking
  • Resource availability verification

Together, these requirements are intended to let users set up and manage their BZZZ clusters regardless of their level of technical expertise.