bzzz/docs/BZZZv2B-OPERATIONS.md
anthonyrawlins ee6bb09511 Complete Phase 2B documentation suite and implementation
🎉 MAJOR MILESTONE: Complete BZZZ Phase 2B documentation and core implementation

## Documentation Suite (7,000+ lines)
- User Manual: Comprehensive guide with practical examples
- API Reference: Complete REST API documentation
- SDK Documentation: Multi-language SDK guide (Go, Python, JS, Rust)
- Developer Guide: Development setup and contribution procedures
- Architecture Documentation: Detailed system design with ASCII diagrams
- Technical Report: Performance analysis and benchmarks
- Security Documentation: Comprehensive security model
- Operations Guide: Production deployment and monitoring
- Documentation Index: Cross-referenced navigation system

## SDK Examples & Integration
- 🔧 Go SDK: Simple client, event streaming, crypto operations
- 🐍 Python SDK: Async client with comprehensive examples
- 📜 JavaScript SDK: Collaborative agent implementation
- 🦀 Rust SDK: High-performance monitoring system
- 📖 Multi-language README with setup instructions

## Core Implementation
- 🔐 Age encryption implementation (pkg/crypto/age_crypto.go)
- 🗂️ Shamir secret sharing (pkg/crypto/shamir.go)
- 💾 DHT encrypted storage (pkg/dht/encrypted_storage.go)
- 📤 UCXL decision publisher (pkg/ucxl/decision_publisher.go)
- 🔄 Updated main.go with Phase 2B integration

## Project Organization
- 📂 Moved legacy docs to old-docs/ directory
- 🎯 Comprehensive README.md update with modern structure
- 🔗 Full cross-reference system between all documentation
- 📊 Production-ready deployment procedures

## Quality Assurance
- All documentation cross-referenced and validated
- Working code examples in multiple languages
- Production deployment procedures tested
- Security best practices implemented
- Performance benchmarks documented

Ready for production deployment and community adoption.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-08 19:57:40 +10:00


BZZZ Operations Guide

Version 2.0 - Phase 2B Edition
Deployment, monitoring, and maintenance procedures

Quick Reference

Docker Deployment

Single Node Development

# Clone repository
git clone https://github.com/anthonyrawlins/bzzz.git
cd bzzz

# Build Docker image
docker build -t bzzz:latest .

# Run single node
docker run -d \
  --name bzzz-node \
  -p 8080:8080 \
  -p 4001:4001 \
  -v $(pwd)/config:/app/config \
  -v bzzz-data:/app/data \
  bzzz:latest

Docker Compose Cluster

# docker-compose.yml
version: '3.8'
services:
  bzzz-node-1:
    build: .
    ports:
      - "8080:8080"
      - "4001:4001"
    environment:
      - BZZZ_NODE_ID=node-1
      - BZZZ_ROLE=backend_developer
    volumes:
      - ./config:/app/config
      - bzzz-data-1:/app/data
    networks:
      - bzzz-network

  bzzz-node-2:
    build: .
    ports:
      - "8081:8080"
      - "4002:4001"
    environment:
      - BZZZ_NODE_ID=node-2
      - BZZZ_ROLE=senior_software_architect
      - BZZZ_BOOTSTRAP_PEERS=/dns/bzzz-node-1/tcp/4001
    volumes:
      - ./config:/app/config
      - bzzz-data-2:/app/data
    networks:
      - bzzz-network
    depends_on:
      - bzzz-node-1

networks:
  bzzz-network:
    driver: bridge

volumes:
  bzzz-data-1:
  bzzz-data-2:
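
In the short form used above, depends_on only orders container start-up; it does not wait for the API inside a container to respond. The following is a minimal sketch of a readiness check, not part of the BZZZ repository. The helper names retry, node_healthy, and wait_for_cluster are hypothetical, and the sketch assumes that /health refuses connections or returns a non-2xx status until a node is ready.

```shell
#!/bin/bash
# wait-for-cluster.sh (hypothetical helper, not shipped with BZZZ)

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS runs have failed.
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Succeeds when the health endpoint on the given host port answers with 2xx.
node_healthy() {
    curl -fsS --max-time 5 "http://localhost:$1/health" > /dev/null
}

# Host ports 8080 and 8081 come from the compose file above.
wait_for_cluster() {
    local port
    for port in 8080 8081; do
        if retry 30 2 node_healthy "$port"; then
            echo "node on port $port is healthy"
        else
            echo "node on port $port did not become healthy" >&2
            return 1
        fi
    done
}
```

A typical invocation would be to source the file and run docker compose up -d followed by wait_for_cluster; the attempt count (30) and delay (2 seconds) are arbitrary starting values.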

Docker Swarm Production

# docker-compose.swarm.yml
version: '3.8'
services:
  bzzz:
    image: bzzz:latest
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.role == worker
        preferences:
          - spread: node.id
      resources:
        limits:
          memory: 512M
          cpus: '1.0'
        reservations:
          memory: 256M
          cpus: '0.5'
    ports:
      - "8080:8080"
    environment:
      - BZZZ_CLUSTER_MODE=true
    networks:
      - bzzz-overlay
    volumes:
      - bzzz-config:/app/config
      - bzzz-data:/app/data

networks:
  bzzz-overlay:
    driver: overlay
    # Overlay encryption is a driver option, not a top-level network key
    driver_opts:
      encrypted: "true"

volumes:
  bzzz-config:
    external: true
  bzzz-data:
    external: true

Production Configuration

Environment Variables

# Core configuration
export BZZZ_NODE_ID="production-node-01"
export BZZZ_AGENT_ID="prod-agent-backend"
export BZZZ_ROLE="backend_developer"

# Network configuration
export BZZZ_API_HOST="0.0.0.0"
export BZZZ_API_PORT="8080"
export BZZZ_P2P_PORT="4001"

# Security configuration
export BZZZ_ADMIN_KEY_SHARES="5"
export BZZZ_ADMIN_KEY_THRESHOLD="3"

# Performance tuning
export BZZZ_DHT_CACHE_SIZE="1000"
export BZZZ_DHT_REPLICATION_FACTOR="3"
export BZZZ_MAX_CONNECTIONS="500"
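
The two security variables describe a Shamir split: the key is divided into BZZZ_ADMIN_KEY_SHARES shares, and BZZZ_ADMIN_KEY_THRESHOLD of them are needed to reconstruct it, so the threshold can never exceed the share count. A minimal pre-flight check might look like the sketch below. The function names are hypothetical, and the lower bound of 2 is a policy assumption (a threshold of 1 would let any single share reveal the key), not something BZZZ is known to enforce.

```shell
#!/bin/bash
# check-shamir.sh (hypothetical helper, not shipped with BZZZ)

# Succeeds only for a non-empty string of decimal digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# validate_shamir SHARES THRESHOLD
validate_shamir() {
    local shares="$1" threshold="$2"
    if ! is_uint "$shares" || ! is_uint "$threshold"; then
        echo "shares and threshold must be non-negative integers" >&2
        return 1
    fi
    # Assumed policy: a threshold below 2 defeats the purpose of splitting.
    if [ "$threshold" -lt 2 ]; then
        echo "threshold must be at least 2, got $threshold" >&2
        return 1
    fi
    if [ "$threshold" -gt "$shares" ]; then
        echo "threshold ($threshold) cannot exceed shares ($shares)" >&2
        return 1
    fi
    echo "ok: any $threshold of $shares shares can reconstruct the admin key"
}
```

With the values above, validate_shamir "$BZZZ_ADMIN_KEY_SHARES" "$BZZZ_ADMIN_KEY_THRESHOLD" accepts the 3-of-5 configuration and rejects, for example, a threshold of 6.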

Production config.yaml

node:
  id: "${BZZZ_NODE_ID}"
  data_dir: "/app/data"
  
agent:
  id: "${BZZZ_AGENT_ID}"
  role: "${BZZZ_ROLE}"
  max_tasks: 10
  
api:
  host: "${BZZZ_API_HOST}"
  port: ${BZZZ_API_PORT}
  cors_enabled: false
  rate_limit: 1000
  timeout: "30s"
  
p2p:
  port: ${BZZZ_P2P_PORT}
  bootstrap_peers:
    - "/dns/bootstrap-1.bzzz.network/tcp/4001"
    - "/dns/bootstrap-2.bzzz.network/tcp/4001"
  max_connections: ${BZZZ_MAX_CONNECTIONS}
  
dht:
  cache_size: ${BZZZ_DHT_CACHE_SIZE}
  cache_ttl: "1h"
  replication_factor: ${BZZZ_DHT_REPLICATION_FACTOR}
  
security:
  admin_election_timeout: "30s"
  heartbeat_interval: "5s"
  shamir_shares: ${BZZZ_ADMIN_KEY_SHARES}
  shamir_threshold: ${BZZZ_ADMIN_KEY_THRESHOLD}
  
logging:
  level: "info"
  format: "json"
  file: "/app/logs/bzzz.log"
  max_size: "100MB"
  max_files: 10
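
The configuration above relies on ${VAR} placeholders being filled from the environment, so an unset variable can leave a field such as port empty. The sketch below lists every placeholder in a config file whose variable is unset or empty. It assumes shell-style ${NAME} placeholders exactly as written above; missing_config_vars is a hypothetical helper name, and how BZZZ itself performs the substitution is not shown in this guide.

```shell
#!/bin/bash
# check-config-env.sh (hypothetical helper, not shipped with BZZZ)

# Reads a config file on stdin and prints the name of every ${NAME}
# placeholder whose environment variable is unset or empty.
# Returns 1 if at least one is missing, 0 otherwise.
missing_config_vars() {
    local name missing=0
    for name in $(grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' | tr -d '${}' | sort -u); do
        # ${!name} is bash indirect expansion: the value of the variable named by $name
        if [ -z "${!name:-}" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it as missing_config_vars < config/config.yaml after exporting the variables from the previous section; no output and a zero exit status mean every placeholder has a value.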

Monitoring & Observability

Health Check Endpoint

# Basic health check
curl http://localhost:8080/health

# Detailed status
curl http://localhost:8080/api/agent/status

# DHT metrics
curl http://localhost:8080/api/dht/metrics

Prometheus Metrics

Add to prometheus.yml:

scrape_configs:
  - job_name: 'bzzz'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s

Grafana Dashboard

Import the BZZZ dashboard from monitoring/grafana-dashboard.json.

Key metrics to monitor:

  • Decision throughput - Decisions published per minute
  • DHT performance - Storage/retrieval latency
  • P2P connectivity - Connected peers count
  • Memory usage - Go runtime metrics
  • Election events - Admin election frequency

Log Aggregation

ELK Stack Configuration

# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /app/logs/bzzz.log
    json.keys_under_root: true
    json.add_error_key: true

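output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "bzzz-%{+yyyy.MM.dd}"

# A custom index name requires matching template settings, and index
# lifecycle management must be disabled or the index setting is ignored
setup.template.name: "bzzz"
setup.template.pattern: "bzzz-*"
setup.ilm.enabled: false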

logging.level: info

Structured Logging Query Examples

# Find all admin elections
{
  "query": {
    "bool": {
      "must": [
        {"match": {"level": "info"}},
        {"match": {"component": "election"}},
        {"range": {"timestamp": {"gte": "now-1h"}}}
      ]
    }
  }
}

# Find encryption errors
{
  "query": {
    "bool": {
      "must": [
        {"match": {"level": "error"}},
        {"match": {"component": "crypto"}}
      ]
    }
  }
}

Maintenance Procedures

Regular Maintenance Tasks

Daily Checks

#!/bin/bash
# daily-check.sh

echo "BZZZ Daily Health Check - $(date)"

# Check service status
echo "=== Service Status ==="
docker ps | grep bzzz

# Check API health
echo "=== API Health ==="
curl -s http://localhost:8080/health | jq .

# Check peer connectivity
echo "=== Peer Status ==="
curl -s http://localhost:8080/api/agent/peers | jq '.connected_peers | length'

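# Check recent errors
# (docker logs replays the container's stderr on stderr, so merge it into the
# pipe; match case-insensitively because the JSON logs use level "error")
echo "=== Recent Errors ==="
docker logs --since=24h bzzz-node 2>&1 | grep -i error | tail -5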

echo "Daily check completed"

Weekly Tasks

#!/bin/bash
# weekly-maintenance.sh

echo "BZZZ Weekly Maintenance - $(date)"

# Rotate logs
docker exec bzzz-node logrotate /app/config/logrotate.conf

# Check disk usage
echo "=== Disk Usage ==="
docker exec bzzz-node df -h /app/data

# DHT metrics review
echo "=== DHT Metrics ==="
curl -s http://localhost:8080/api/dht/metrics | jq '.stored_items, .cache_hit_rate'

# Database cleanup (if needed)
docker exec bzzz-node /app/scripts/cleanup-old-data.sh

echo "Weekly maintenance completed"

Monthly Tasks

#!/bin/bash
# monthly-maintenance.sh

echo "BZZZ Monthly Maintenance - $(date)"

# Full backup
./backup-bzzz-data.sh

# Performance review
echo "=== Performance Metrics ==="
curl -s http://localhost:8080/api/debug/status | jq '.performance'

# Security audit
echo "=== Security Check ==="
./scripts/security-audit.sh

# Update dependencies (if needed)
echo "=== Dependency Check ==="
docker exec bzzz-node go list -m -u all

echo "Monthly maintenance completed"

Backup Procedures

Data Backup Script

#!/bin/bash
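# backup-bzzz-data.sh

# Abort on the first failed command, so that a failed cd near the end of the
# script can never cause rm -rf to run in the wrong directory
set -euo pipefail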

BACKUP_DIR="/backup/bzzz"
DATE=$(date +%Y%m%d_%H%M%S)
NODE_ID=$(docker exec bzzz-node cat /app/config/node_id)

echo "Starting backup for node: $NODE_ID"

# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"

# Backup configuration
docker cp bzzz-node:/app/config "$BACKUP_DIR/$DATE/config"

# Backup data directory
docker cp bzzz-node:/app/data "$BACKUP_DIR/$DATE/data"

# Backup logs
docker cp bzzz-node:/app/logs "$BACKUP_DIR/$DATE/logs"

# Create manifest
cat > "$BACKUP_DIR/$DATE/manifest.json" << EOF
{
  "node_id": "$NODE_ID",
  "backup_date": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "version": "2.0",
  "components": ["config", "data", "logs"]
}
EOF

# Compress backup
cd "$BACKUP_DIR"
tar -czf "bzzz-backup-$NODE_ID-$DATE.tar.gz" "$DATE"
rm -rf "$DATE"

echo "Backup completed: bzzz-backup-$NODE_ID-$DATE.tar.gz"
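
Before relying on an archive for a restore, it is worth confirming that it is readable and contains everything the manifest lists. The following sketch checks for the single timestamped top-level directory produced by the script above, holding manifest.json, config, data, and logs. The function name verify_backup is hypothetical.

```shell
#!/bin/bash
# verify-bzzz-backup.sh (hypothetical helper, not shipped with BZZZ)

# verify_backup ARCHIVE
# Succeeds if ARCHIVE is a readable gzip tarball whose top-level directory
# contains manifest.json, config, data, and logs.
verify_backup() {
    local archive="$1" listing component
    if ! listing=$(tar -tzf "$archive" 2>/dev/null); then
        echo "unreadable archive: $archive" >&2
        return 1
    fi
    for component in manifest.json config data logs; do
        # Entries look like 20250808_195740/config/... inside the archive
        if ! printf '%s\n' "$listing" | grep -Eq "^[^/]+/$component(/|\$)"; then
            echo "missing from archive: $component" >&2
            return 1
        fi
    done
    echo "backup looks complete: $archive"
}
```

This only checks structure; it does not prove that the data inside is consistent, so an occasional test restore on a spare node remains the stronger check.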

Restore Procedure

#!/bin/bash
# restore-bzzz-data.sh

BACKUP_FILE="$1"
if [ -z "$BACKUP_FILE" ]; then
    echo "Usage: $0 <backup-file.tar.gz>"
    exit 1
fi

echo "Restoring from: $BACKUP_FILE"

# Stop service
docker stop bzzz-node

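# Extract backup into a fresh temporary directory, so that a stale /tmp/202*
# directory left by an earlier restore can never be picked up by mistake
WORK_DIR=$(mktemp -d)
tar -xzf "$BACKUP_FILE" -C "$WORK_DIR"

# Find extracted directory (the archive holds one timestamped directory)
BACKUP_DIR=$(find "$WORK_DIR" -mindepth 1 -maxdepth 1 -type d | head -1)
if [ -z "$BACKUP_DIR" ]; then
    echo "No backup directory found in $BACKUP_FILE"
    exit 1
fi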

# Restore configuration
docker cp "$BACKUP_DIR/config" bzzz-node:/app/

# Restore data
docker cp "$BACKUP_DIR/data" bzzz-node:/app/

# Start service
docker start bzzz-node

echo "Restore completed. Check service status."

Troubleshooting

Common Issues

Service Won't Start

# Check logs
docker logs bzzz-node

# Check configuration
docker exec bzzz-node /app/bzzz --config /app/config/config.yaml --validate

# Check permissions
docker exec bzzz-node ls -la /app/data

High Memory Usage

# Check Go memory stats
curl http://localhost:8080/api/debug/status | jq '.memory'

# Check DHT cache size
curl http://localhost:8080/api/dht/metrics | jq '.cache_size'

# Restart with memory limit
docker update --memory=512m bzzz-node
docker restart bzzz-node

Peer Connectivity Issues

# Check P2P status
curl http://localhost:8080/api/agent/peers

# Check network connectivity
docker exec bzzz-node netstat -an | grep 4001

# Check firewall rules
sudo ufw status | grep 4001

# Test bootstrap peers
docker exec bzzz-node ping bootstrap-1.bzzz.network

DHT Storage Problems

# Check DHT metrics
curl http://localhost:8080/api/dht/metrics

# Clear DHT cache
curl -X POST http://localhost:8080/api/debug/clear-cache

# Check disk space
docker exec bzzz-node df -h /app/data
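
The disk space check above can be turned into a pass/fail test for use in the maintenance scripts. The sketch below parses POSIX-format df output (df -P) from stdin and compares the usage percentage with a threshold. The function names are hypothetical, and the sketch assumes the filesystem name contains no spaces, so that the percentage is the fifth field.

```shell
#!/bin/bash
# check-disk.sh (hypothetical helper, not shipped with BZZZ)

# Reads `df -P` output on stdin and prints the usage percentage of the
# first filesystem listed, without the percent sign.
usage_from_df() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_below_threshold THRESHOLD_PERCENT
# Reads `df -P` output on stdin; succeeds while usage is below the threshold.
disk_below_threshold() {
    local threshold="$1" used
    used=$(usage_from_df)
    if [ -z "$used" ]; then
        echo "could not parse df output" >&2
        return 2
    fi
    echo "disk usage: ${used}% (threshold ${threshold}%)"
    [ "$used" -lt "$threshold" ]
}
```

For example, docker exec bzzz-node df -P /app/data | disk_below_threshold 85 fails once the data volume reaches 85% usage; the figure 85 is only an illustrative value.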

Performance Tuning

High Load Optimization

# config.yaml adjustments for high load
dht:
  cache_size: 10000          # Increase cache
  cache_ttl: "30m"          # Shorter TTL for fresher data
  replication_factor: 5     # Higher replication

p2p:
  max_connections: 1000     # More connections
  
api:
  rate_limit: 5000         # Higher rate limit
  timeout: "60s"           # Longer timeout

Low Resource Optimization

# config.yaml adjustments for resource-constrained environments
dht:
  cache_size: 100          # Smaller cache
  cache_ttl: "2h"         # Longer TTL
  replication_factor: 2    # Lower replication

p2p:
  max_connections: 50      # Fewer connections

logging:
  level: "warn"           # Less verbose logging

Security Hardening

Production Security Checklist

  • Change default ports
  • Enable TLS for API endpoints
  • Configure firewall rules
  • Set up log monitoring
  • Enable audit logging
  • Rotate Age keys regularly
  • Monitor for unusual admin elections
  • Implement rate limiting
  • Use non-root Docker user
  • Regular security updates

Network Security

# Firewall configuration
sudo ufw allow 22          # SSH
sudo ufw allow 8080/tcp    # BZZZ API
sudo ufw allow 4001/tcp    # P2P networking
sudo ufw enable

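# Docker security
# --read-only makes the root filesystem immutable, so the paths the node
# writes to (/app/data and /app/logs in config.yaml) need explicit mounts;
# bzzz-logs is an example volume name
docker run --security-opt no-new-privileges \
           --read-only \
           --tmpfs /tmp:rw,noexec,nosuid,size=1g \
           -v bzzz-data:/app/data \
           -v bzzz-logs:/app/logs \
           bzzz:latest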

Cross-References

BZZZ Operations Guide v2.0 - Production deployment and maintenance procedures for Phase 2B unified architecture.