# BZZZ Operations Guide **Version 2.0 - Phase 2B Edition** **Deployment, monitoring, and maintenance procedures** ## Quick Reference - **[Docker Deployment](#docker-deployment)** - Containerized deployment - **[Production Setup](#production-configuration)** - Production-ready configuration - **[Monitoring](#monitoring--observability)** - Metrics and alerting - **[Maintenance](#maintenance-procedures)** - Routine maintenance tasks - **[Troubleshooting](#troubleshooting)** - Common issues and solutions ## Docker Deployment ### Single Node Development ```bash # Clone repository git clone https://github.com/anthonyrawlins/bzzz.git cd bzzz # Build Docker image docker build -t bzzz:latest . # Run single node docker run -d \ --name bzzz-node \ -p 8080:8080 \ -p 4001:4001 \ -v $(pwd)/config:/app/config \ -v bzzz-data:/app/data \ bzzz:latest ``` ### Docker Compose Cluster ```yaml # docker-compose.yml version: '3.8' services: bzzz-node-1: build: . ports: - "8080:8080" - "4001:4001" environment: - BZZZ_NODE_ID=node-1 - BZZZ_ROLE=backend_developer volumes: - ./config:/app/config - bzzz-data-1:/app/data networks: - bzzz-network bzzz-node-2: build: . ports: - "8081:8080" - "4002:4001" environment: - BZZZ_NODE_ID=node-2 - BZZZ_ROLE=senior_software_architect - BZZZ_BOOTSTRAP_PEERS=/dns/bzzz-node-1/tcp/4001 volumes: - ./config:/app/config - bzzz-data-2:/app/data networks: - bzzz-network depends_on: - bzzz-node-1 networks: bzzz-network: driver: bridge volumes: bzzz-data-1: bzzz-data-2: ``` ### Docker Swarm Production ```yaml # docker-compose.swarm.yml version: '3.8' services: bzzz: image: bzzz:latest deploy: replicas: 3 placement: constraints: - node.role == worker preferences: - spread: node.id resources: limits: memory: 512M cpus: '1.0' reservations: memory: 256M cpus: '0.5' ports: - "8080:8080" environment: - BZZZ_CLUSTER_MODE=true networks: - bzzz-overlay volumes: - bzzz-config:/app/config - bzzz-data:/app/data networks: bzzz-overlay: driver: overlay encrypted: true volumes: bzzz-config: external: true bzzz-data: external: true ``` ## Production Configuration ### Environment Variables ```bash # Core configuration export BZZZ_NODE_ID="production-node-01" export BZZZ_AGENT_ID="prod-agent-backend" export BZZZ_ROLE="backend_developer" # Network configuration export BZZZ_API_HOST="0.0.0.0" export BZZZ_API_PORT="8080" export BZZZ_P2P_PORT="4001" # Security configuration export BZZZ_ADMIN_KEY_SHARES="5" export BZZZ_ADMIN_KEY_THRESHOLD="3" # Performance tuning export BZZZ_DHT_CACHE_SIZE="1000" export BZZZ_DHT_REPLICATION_FACTOR="3" export BZZZ_MAX_CONNECTIONS="500" ``` ### Production config.yaml ```yaml node: id: "${BZZZ_NODE_ID}" data_dir: "/app/data" agent: id: "${BZZZ_AGENT_ID}" role: "${BZZZ_ROLE}" max_tasks: 10 api: host: "${BZZZ_API_HOST}" port: ${BZZZ_API_PORT} cors_enabled: false rate_limit: 1000 timeout: "30s" p2p: port: ${BZZZ_P2P_PORT} bootstrap_peers: - "/dns/bootstrap-1.bzzz.network/tcp/4001" - "/dns/bootstrap-2.bzzz.network/tcp/4001" max_connections: ${BZZZ_MAX_CONNECTIONS} dht: cache_size: ${BZZZ_DHT_CACHE_SIZE} cache_ttl: "1h" replication_factor: ${BZZZ_DHT_REPLICATION_FACTOR} security: admin_election_timeout: "30s" heartbeat_interval: "5s" shamir_shares: ${BZZZ_ADMIN_KEY_SHARES} shamir_threshold: ${BZZZ_ADMIN_KEY_THRESHOLD} logging: level: "info" format: "json" file: "/app/logs/bzzz.log" max_size: "100MB" max_files: 10 ``` ## Monitoring & Observability ### Health Check Endpoint ```bash # Basic health check curl http://localhost:8080/health # Detailed status curl http://localhost:8080/api/agent/status # DHT metrics curl http://localhost:8080/api/dht/metrics ``` ### Prometheus Metrics Add to `prometheus.yml`: ```yaml scrape_configs: - job_name: 'bzzz' static_configs: - targets: ['localhost:8080'] metrics_path: '/metrics' scrape_interval: 15s ``` ### Grafana Dashboard Import the BZZZ dashboard from `monitoring/grafana-dashboard.json`: Key metrics to monitor: - **Decision throughput** - Decisions published per minute - **DHT performance** - Storage/retrieval latency - **P2P connectivity** - Connected peers count - **Memory usage** - Go runtime metrics - **Election events** - Admin election frequency ### Log Aggregation #### ELK Stack Configuration ```yaml # filebeat.yml filebeat.inputs: - type: log enabled: true paths: - /app/logs/bzzz.log json.keys_under_root: true json.add_error_key: true output.elasticsearch: hosts: ["elasticsearch:9200"] index: "bzzz-%{+yyyy.MM.dd}" logging.level: info ``` #### Structured Logging Query Examples ```json # Find all admin elections { "query": { "bool": { "must": [ {"match": {"level": "info"}}, {"match": {"component": "election"}}, {"range": {"timestamp": {"gte": "now-1h"}}} ] } } } # Find encryption errors { "query": { "bool": { "must": [ {"match": {"level": "error"}}, {"match": {"component": "crypto"}} ] } } } ``` ## Maintenance Procedures ### Regular Maintenance Tasks #### Daily Checks ```bash #!/bin/bash # daily-check.sh echo "BZZZ Daily Health Check - $(date)" # Check service status echo "=== Service Status ===" docker ps | grep bzzz # Check API health echo "=== API Health ===" curl -s http://localhost:8080/health | jq . # Check peer connectivity echo "=== Peer Status ===" curl -s http://localhost:8080/api/agent/peers | jq '.connected_peers | length' # Check recent errors echo "=== Recent Errors ===" docker logs bzzz-node --since=24h | grep ERROR | tail -5 echo "Daily check completed" ``` #### Weekly Tasks ```bash #!/bin/bash # weekly-maintenance.sh echo "BZZZ Weekly Maintenance - $(date)" # Rotate logs docker exec bzzz-node logrotate /app/config/logrotate.conf # Check disk usage echo "=== Disk Usage ===" docker exec bzzz-node df -h /app/data # DHT metrics review echo "=== DHT Metrics ===" curl -s http://localhost:8080/api/dht/metrics | jq '.stored_items, .cache_hit_rate' # Database cleanup (if needed) docker exec bzzz-node /app/scripts/cleanup-old-data.sh echo "Weekly maintenance completed" ``` #### Monthly Tasks ```bash #!/bin/bash # monthly-maintenance.sh echo "BZZZ Monthly Maintenance - $(date)" # Full backup ./backup-bzzz-data.sh # Performance review echo "=== Performance Metrics ===" curl -s http://localhost:8080/api/debug/status | jq '.performance' # Security audit echo "=== Security Check ===" ./scripts/security-audit.sh # Update dependencies (if needed) echo "=== Dependency Check ===" docker exec bzzz-node go list -m -u all echo "Monthly maintenance completed" ``` ### Backup Procedures #### Data Backup Script ```bash #!/bin/bash # backup-bzzz-data.sh BACKUP_DIR="/backup/bzzz" DATE=$(date +%Y%m%d_%H%M%S) NODE_ID=$(docker exec bzzz-node cat /app/config/node_id) echo "Starting backup for node: $NODE_ID" # Create backup directory mkdir -p "$BACKUP_DIR/$DATE" # Backup configuration docker cp bzzz-node:/app/config "$BACKUP_DIR/$DATE/config" # Backup data directory docker cp bzzz-node:/app/data "$BACKUP_DIR/$DATE/data" # Backup logs docker cp bzzz-node:/app/logs "$BACKUP_DIR/$DATE/logs" # Create manifest cat > "$BACKUP_DIR/$DATE/manifest.json" << EOF { "node_id": "$NODE_ID", "backup_date": "$(date -u +%Y-%m-%dT%H:%M:%SZ)", "version": "2.0", "components": ["config", "data", "logs"] } EOF # Compress backup cd "$BACKUP_DIR" tar -czf "bzzz-backup-$NODE_ID-$DATE.tar.gz" "$DATE" rm -rf "$DATE" echo "Backup completed: bzzz-backup-$NODE_ID-$DATE.tar.gz" ``` #### Restore Procedure ```bash #!/bin/bash # restore-bzzz-data.sh BACKUP_FILE="$1" if [ -z "$BACKUP_FILE" ]; then echo "Usage: $0 " exit 1 fi echo "Restoring from: $BACKUP_FILE" # Stop service docker stop bzzz-node # Extract backup tar -xzf "$BACKUP_FILE" -C /tmp/ # Find extracted directory BACKUP_DIR=$(find /tmp -maxdepth 1 -type d -name "202*" | head -1) # Restore configuration docker cp "$BACKUP_DIR/config" bzzz-node:/app/ # Restore data docker cp "$BACKUP_DIR/data" bzzz-node:/app/ # Start service docker start bzzz-node echo "Restore completed. Check service status." ``` ## Troubleshooting ### Common Issues #### Service Won't Start ```bash # Check logs docker logs bzzz-node # Check configuration docker exec bzzz-node /app/bzzz --config /app/config/config.yaml --validate # Check permissions docker exec bzzz-node ls -la /app/data ``` #### High Memory Usage ```bash # Check Go memory stats curl http://localhost:8080/api/debug/status | jq '.memory' # Check DHT cache size curl http://localhost:8080/api/dht/metrics | jq '.cache_size' # Restart with memory limit docker update --memory=512m bzzz-node docker restart bzzz-node ``` #### Peer Connectivity Issues ```bash # Check P2P status curl http://localhost:8080/api/agent/peers # Check network connectivity docker exec bzzz-node netstat -an | grep 4001 # Check firewall rules sudo ufw status | grep 4001 # Test bootstrap peers docker exec bzzz-node ping bootstrap-1.bzzz.network ``` #### DHT Storage Problems ```bash # Check DHT metrics curl http://localhost:8080/api/dht/metrics # Clear DHT cache curl -X POST http://localhost:8080/api/debug/clear-cache # Check disk space docker exec bzzz-node df -h /app/data ``` ### Performance Tuning #### High Load Optimization ```yaml # config.yaml adjustments for high load dht: cache_size: 10000 # Increase cache cache_ttl: "30m" # Shorter TTL for fresher data replication_factor: 5 # Higher replication p2p: max_connections: 1000 # More connections api: rate_limit: 5000 # Higher rate limit timeout: "60s" # Longer timeout ``` #### Low Resource Optimization ```yaml # config.yaml adjustments for resource-constrained environments dht: cache_size: 100 # Smaller cache cache_ttl: "2h" # Longer TTL replication_factor: 2 # Lower replication p2p: max_connections: 50 # Fewer connections logging: level: "warn" # Less verbose logging ``` ### Security Hardening #### Production Security Checklist - [ ] Change default ports - [ ] Enable TLS for API endpoints - [ ] Configure firewall rules - [ ] Set up log monitoring - [ ] Enable audit logging - [ ] Rotate Age keys regularly - [ ] Monitor for unusual admin elections - [ ] Implement rate limiting - [ ] Use non-root Docker user - [ ] Regular security updates #### Network Security ```bash # Firewall configuration sudo ufw allow 22 # SSH sudo ufw allow 8080/tcp # BZZZ API sudo ufw allow 4001/tcp # P2P networking sudo ufw enable # Docker security docker run --security-opt no-new-privileges \ --read-only \ --tmpfs /tmp:rw,noexec,nosuid,size=1g \ bzzz:latest ``` --- ## Cross-References - **[User Manual](USER_MANUAL.md)** - Basic usage and configuration - **[Developer Guide](DEVELOPER.md)** - Development and testing procedures - **[Architecture Documentation](ARCHITECTURE.md)** - System design and deployment patterns - **[Technical Report](TECHNICAL_REPORT.md)** - Performance characteristics and scaling - **[Security Documentation](SECURITY.md)** - Security best practices **BZZZ Operations Guide v2.0** - Production deployment and maintenance procedures for Phase 2B unified architecture.