bzzz/docs/BZZZv2B-OPERATIONS.md
anthonyrawlins ee6bb09511 Complete Phase 2B documentation suite and implementation
2025-08-08 19:57:40 +10:00


# BZZZ Operations Guide
**Version 2.0 - Phase 2B Edition**
**Deployment, monitoring, and maintenance procedures**
## Quick Reference
- **[Docker Deployment](#docker-deployment)** - Containerized deployment
- **[Production Setup](#production-configuration)** - Production-ready configuration
- **[Monitoring](#monitoring--observability)** - Metrics and alerting
- **[Maintenance](#maintenance-procedures)** - Routine maintenance tasks
- **[Troubleshooting](#troubleshooting)** - Common issues and solutions
## Docker Deployment
### Single Node Development
```bash
# Clone repository
git clone https://github.com/anthonyrawlins/bzzz.git
cd bzzz
# Build Docker image
docker build -t bzzz:latest .
# Run single node
docker run -d \
  --name bzzz-node \
  -p 8080:8080 \
  -p 4001:4001 \
  -v $(pwd)/config:/app/config \
  -v bzzz-data:/app/data \
  bzzz:latest
```
### Docker Compose Cluster
```yaml
# docker-compose.yml
version: '3.8'
services:
  bzzz-node-1:
    build: .
    ports:
      - "8080:8080"
      - "4001:4001"
    environment:
      - BZZZ_NODE_ID=node-1
      - BZZZ_ROLE=backend_developer
    volumes:
      - ./config:/app/config
      - bzzz-data-1:/app/data
    networks:
      - bzzz-network
  bzzz-node-2:
    build: .
    ports:
      - "8081:8080"
      - "4002:4001"
    environment:
      - BZZZ_NODE_ID=node-2
      - BZZZ_ROLE=senior_software_architect
      - BZZZ_BOOTSTRAP_PEERS=/dns/bzzz-node-1/tcp/4001
    volumes:
      - ./config:/app/config
      - bzzz-data-2:/app/data
    networks:
      - bzzz-network
    depends_on:
      - bzzz-node-1
networks:
  bzzz-network:
    driver: bridge
volumes:
  bzzz-data-1:
  bzzz-data-2:
```
### Docker Swarm Production
```yaml
# docker-compose.swarm.yml
version: '3.8'
services:
  bzzz:
    image: bzzz:latest
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.role == worker
        preferences:
          - spread: node.id
      resources:
        limits:
          memory: 512M
          cpus: '1.0'
        reservations:
          memory: 256M
          cpus: '0.5'
    ports:
      - "8080:8080"
    environment:
      - BZZZ_CLUSTER_MODE=true
    networks:
      - bzzz-overlay
    volumes:
      - bzzz-config:/app/config
      - bzzz-data:/app/data
networks:
  bzzz-overlay:
    driver: overlay
    # Overlay encryption is a driver option, not a top-level network key
    driver_opts:
      encrypted: "true"
volumes:
  bzzz-config:
    external: true
  bzzz-data:
    external: true
```
## Production Configuration
### Environment Variables
```bash
# Core configuration
export BZZZ_NODE_ID="production-node-01"
export BZZZ_AGENT_ID="prod-agent-backend"
export BZZZ_ROLE="backend_developer"
# Network configuration
export BZZZ_API_HOST="0.0.0.0"
export BZZZ_API_PORT="8080"
export BZZZ_P2P_PORT="4001"
# Security configuration
export BZZZ_ADMIN_KEY_SHARES="5"
export BZZZ_ADMIN_KEY_THRESHOLD="3"
# Performance tuning
export BZZZ_DHT_CACHE_SIZE="1000"
export BZZZ_DHT_REPLICATION_FACTOR="3"
export BZZZ_MAX_CONNECTIONS="500"
```
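The Shamir settings above describe a 3-of-5 scheme: the admin key is split into 5 shares and any 3 reconstruct it. A k-of-n scheme only works when the threshold does not exceed the share count, so it can be worth checking the variables before starting a node. The following is a minimal sketch; the script name and the `validate_shamir_settings` function are hypothetical helpers, not part of BZZZ.

```shell
#!/bin/bash
# validate-bzzz-env.sh (hypothetical helper, not shipped with BZZZ)

# Returns 0 when BZZZ_ADMIN_KEY_SHARES and BZZZ_ADMIN_KEY_THRESHOLD
# form a usable k-of-n Shamir configuration, 1 otherwise.
validate_shamir_settings() {
    local shares="${BZZZ_ADMIN_KEY_SHARES:-}"
    local threshold="${BZZZ_ADMIN_KEY_THRESHOLD:-}"

    # Both values must be non-empty and consist of digits only.
    case "$shares" in
        ''|*[!0-9]*) echo "BZZZ_ADMIN_KEY_SHARES must be an integer" >&2; return 1 ;;
    esac
    case "$threshold" in
        ''|*[!0-9]*) echo "BZZZ_ADMIN_KEY_THRESHOLD must be an integer" >&2; return 1 ;;
    esac

    # The threshold must be at least 1 and cannot exceed the share count.
    if [ "$threshold" -lt 1 ] || [ "$threshold" -gt "$shares" ]; then
        echo "Threshold ($threshold) must be between 1 and shares ($shares)" >&2
        return 1
    fi
    return 0
}
```

A possible use is `validate_shamir_settings || exit 1` at the top of a deployment script, after the `export` lines above have been sourced.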
### Production config.yaml
```yaml
node:
  id: "${BZZZ_NODE_ID}"
  data_dir: "/app/data"
agent:
  id: "${BZZZ_AGENT_ID}"
  role: "${BZZZ_ROLE}"
  max_tasks: 10
api:
  host: "${BZZZ_API_HOST}"
  port: ${BZZZ_API_PORT}
  cors_enabled: false
  rate_limit: 1000
  timeout: "30s"
p2p:
  port: ${BZZZ_P2P_PORT}
  bootstrap_peers:
    - "/dns/bootstrap-1.bzzz.network/tcp/4001"
    - "/dns/bootstrap-2.bzzz.network/tcp/4001"
  max_connections: ${BZZZ_MAX_CONNECTIONS}
dht:
  cache_size: ${BZZZ_DHT_CACHE_SIZE}
  cache_ttl: "1h"
  replication_factor: ${BZZZ_DHT_REPLICATION_FACTOR}
security:
  admin_election_timeout: "30s"
  heartbeat_interval: "5s"
  shamir_shares: ${BZZZ_ADMIN_KEY_SHARES}
  shamir_threshold: ${BZZZ_ADMIN_KEY_THRESHOLD}
logging:
  level: "info"
  format: "json"
  file: "/app/logs/bzzz.log"
  max_size: "100MB"
  max_files: 10
```
## Monitoring & Observability
### Health Check Endpoint
```bash
# Basic health check
curl http://localhost:8080/health
# Detailed status
curl http://localhost:8080/api/agent/status
# DHT metrics
curl http://localhost:8080/api/dht/metrics
```
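A freshly started node may need a few seconds before `/health` responds, so deployment scripts usually poll rather than check once. The sketch below shows one way to do that; `retry`, `bzzz_healthy`, and the `BZZZ_HEALTH_URL` variable are hypothetical names introduced for illustration, and the default URL is the one used in the commands above.

```shell
#!/bin/bash
# wait-for-bzzz.sh (hypothetical helper, not shipped with BZZZ)

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $attempt/$max_attempts failed: $*" >&2
        # Only sleep when another attempt will follow.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Probe the health endpoint; -f makes curl fail on HTTP status >= 400.
bzzz_healthy() {
    curl -fsS --max-time 5 "${BZZZ_HEALTH_URL:-http://localhost:8080/health}" > /dev/null
}
```

For example, `retry 10 3 bzzz_healthy && echo "node is up"` polls for up to roughly 30 seconds before giving up.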
### Prometheus Metrics
Add to `prometheus.yml`:
```yaml
scrape_configs:
  - job_name: 'bzzz'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
```
### Grafana Dashboard
Import the BZZZ dashboard from `monitoring/grafana-dashboard.json`.
Key metrics to monitor:
- **Decision throughput** - Decisions published per minute
- **DHT performance** - Storage/retrieval latency
- **P2P connectivity** - Connected peers count
- **Memory usage** - Go runtime metrics
- **Election events** - Admin election frequency
### Log Aggregation
#### ELK Stack Configuration
```yaml
# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /app/logs/bzzz.log
    json.keys_under_root: true
    json.add_error_key: true
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "bzzz-%{+yyyy.MM.dd}"
logging.level: info
```
#### Structured Logging Query Examples
```json
# Find all admin elections
{
  "query": {
    "bool": {
      "must": [
        {"match": {"level": "info"}},
        {"match": {"component": "election"}},
        {"range": {"timestamp": {"gte": "now-1h"}}}
      ]
    }
  }
}
# Find encryption errors
{
  "query": {
    "bool": {
      "must": [
        {"match": {"level": "error"}},
        {"match": {"component": "crypto"}}
      ]
    }
  }
}
```
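When Elasticsearch is unavailable, the same questions can be asked of the log file directly. The sketch below assumes the JSON log format writes one object per line and uses the `level` and `component` field names from the queries above; `count_log_events` is a hypothetical helper.

```shell
#!/bin/bash
# count-log-events.sh (hypothetical helper, not shipped with BZZZ)

# count_log_events FILE LEVEL COMPONENT
# Prints how many lines in FILE carry both the given level and component.
# Assumes one JSON object per line; tolerates optional spaces around ':'.
count_log_events() {
    local file="$1" level="$2" component="$3"
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status at 0.
    grep -E "\"level\" *: *\"${level}\"" "$file" \
        | grep -Ec "\"component\" *: *\"${component}\"" || true
}
```

For instance, `count_log_events /app/logs/bzzz.log error crypto` mirrors the "Find encryption errors" query for a single node's log.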
## Maintenance Procedures
### Regular Maintenance Tasks
#### Daily Checks
```bash
#!/bin/bash
# daily-check.sh
echo "BZZZ Daily Health Check - $(date)"
# Check service status
echo "=== Service Status ==="
docker ps | grep bzzz
# Check API health
echo "=== API Health ==="
curl -s http://localhost:8080/health | jq .
# Check peer connectivity
echo "=== Peer Status ==="
curl -s http://localhost:8080/api/agent/peers | jq '.connected_peers | length'
# Check recent errors
echo "=== Recent Errors ==="
docker logs --since=24h bzzz-node 2>&1 | grep -i error | tail -5
echo "Daily check completed"
```
#### Weekly Tasks
```bash
#!/bin/bash
# weekly-maintenance.sh
echo "BZZZ Weekly Maintenance - $(date)"
# Rotate logs
docker exec bzzz-node logrotate /app/config/logrotate.conf
# Check disk usage
echo "=== Disk Usage ==="
docker exec bzzz-node df -h /app/data
# DHT metrics review
echo "=== DHT Metrics ==="
curl -s http://localhost:8080/api/dht/metrics | jq '.stored_items, .cache_hit_rate'
# Database cleanup (if needed)
docker exec bzzz-node /app/scripts/cleanup-old-data.sh
echo "Weekly maintenance completed"
```
#### Monthly Tasks
```bash
#!/bin/bash
# monthly-maintenance.sh
echo "BZZZ Monthly Maintenance - $(date)"
# Full backup
./backup-bzzz-data.sh
# Performance review
echo "=== Performance Metrics ==="
curl -s http://localhost:8080/api/debug/status | jq '.performance'
# Security audit
echo "=== Security Check ==="
./scripts/security-audit.sh
# Update dependencies (if needed)
echo "=== Dependency Check ==="
docker exec bzzz-node go list -m -u all
echo "Monthly maintenance completed"
```
### Backup Procedures
#### Data Backup Script
```bash
#!/bin/bash
# backup-bzzz-data.sh
BACKUP_DIR="/backup/bzzz"
DATE=$(date +%Y%m%d_%H%M%S)
NODE_ID=$(docker exec bzzz-node cat /app/config/node_id)
echo "Starting backup for node: $NODE_ID"
# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"
# Backup configuration
docker cp bzzz-node:/app/config "$BACKUP_DIR/$DATE/config"
# Backup data directory
docker cp bzzz-node:/app/data "$BACKUP_DIR/$DATE/data"
# Backup logs
docker cp bzzz-node:/app/logs "$BACKUP_DIR/$DATE/logs"
# Create manifest
cat > "$BACKUP_DIR/$DATE/manifest.json" << EOF
{
"node_id": "$NODE_ID",
"backup_date": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
"version": "2.0",
"components": ["config", "data", "logs"]
}
EOF
# Compress backup
cd "$BACKUP_DIR" || exit 1
tar -czf "bzzz-backup-$NODE_ID-$DATE.tar.gz" "$DATE"
rm -rf "$DATE"
echo "Backup completed: bzzz-backup-$NODE_ID-$DATE.tar.gz"
```
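A backup is only useful if it can be read back, so it may help to check each archive right after it is written. The sketch below lists the archive without extracting it and confirms that the manifest and the `config` and `data` components from the script above are present; `verify_backup` is a hypothetical helper.

```shell
#!/bin/bash
# verify-bzzz-backup.sh (hypothetical helper, not shipped with BZZZ)

# verify_backup ARCHIVE
# Succeeds only if ARCHIVE is a readable tar.gz that contains
# <timestamp>/manifest.json, <timestamp>/config and <timestamp>/data.
verify_backup() {
    local archive="$1" listing component
    # tar -tzf lists entries without extracting and fails on a corrupt archive.
    listing=$(tar -tzf "$archive") || {
        echo "Cannot read archive: $archive" >&2
        return 1
    }
    for component in manifest.json config data; do
        # Entries look like 20250808_195740/config/...
        if ! grep -Eq "^[^/]+/${component}(/|\$)" <<< "$listing"; then
            echo "Missing from backup: $component" >&2
            return 1
        fi
    done
    echo "Backup OK: $archive"
}
```

Calling `verify_backup "$BACKUP_DIR/bzzz-backup-$NODE_ID-$DATE.tar.gz"` as the last step of the backup script would surface a truncated or incomplete archive immediately instead of at restore time.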
#### Restore Procedure
```bash
#!/bin/bash
# restore-bzzz-data.sh
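BACKUP_FILE="$1"
if [ -z "$BACKUP_FILE" ]; then
    echo "Usage: $0 <backup-file.tar.gz>"
    exit 1
fi
echo "Restoring from: $BACKUP_FILE"
# Stop service
docker stop bzzz-node
# Extract backup into a fresh temporary directory
WORK_DIR=$(mktemp -d)
tar -xzf "$BACKUP_FILE" -C "$WORK_DIR"
# Find extracted directory; searching only WORK_DIR avoids
# picking up stale 202* directories left in /tmp
BACKUP_DIR=$(find "$WORK_DIR" -mindepth 1 -maxdepth 1 -type d -name "202*" | head -1)
if [ -z "$BACKUP_DIR" ]; then
    echo "No backup directory found in $BACKUP_FILE"
    rm -rf "$WORK_DIR"
    docker start bzzz-node
    exit 1
fi
# Restore configuration
docker cp "$BACKUP_DIR/config" bzzz-node:/app/
# Restore data
docker cp "$BACKUP_DIR/data" bzzz-node:/app/
# Clean up temporary files
rm -rf "$WORK_DIR"
# Start service
docker start bzzz-node
echo "Restore completed. Check service status."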
```
## Troubleshooting
### Common Issues
#### Service Won't Start
```bash
# Check logs
docker logs bzzz-node
# Check configuration
docker exec bzzz-node /app/bzzz --config /app/config/config.yaml --validate
# Check permissions
docker exec bzzz-node ls -la /app/data
```
#### High Memory Usage
```bash
# Check Go memory stats
curl http://localhost:8080/api/debug/status | jq '.memory'
# Check DHT cache size
curl http://localhost:8080/api/dht/metrics | jq '.cache_size'
# Restart with memory limit
docker update --memory=512m bzzz-node
docker restart bzzz-node
```
#### Peer Connectivity Issues
```bash
# Check P2P status
curl http://localhost:8080/api/agent/peers
# Check network connectivity
docker exec bzzz-node netstat -an | grep 4001
# Check firewall rules
sudo ufw status | grep 4001
# Test bootstrap peers
docker exec bzzz-node ping bootstrap-1.bzzz.network
```
#### DHT Storage Problems
```bash
# Check DHT metrics
curl http://localhost:8080/api/dht/metrics
# Clear DHT cache
curl -X POST http://localhost:8080/api/debug/clear-cache
# Check disk space
docker exec bzzz-node df -h /app/data
```
### Performance Tuning
#### High Load Optimization
```yaml
# config.yaml adjustments for high load
dht:
  cache_size: 10000       # Increase cache
  cache_ttl: "30m"        # Shorter TTL for fresher data
  replication_factor: 5   # Higher replication
p2p:
  max_connections: 1000   # More connections
api:
  rate_limit: 5000        # Higher rate limit
  timeout: "60s"          # Longer timeout
```
#### Low Resource Optimization
```yaml
# config.yaml adjustments for resource-constrained environments
dht:
  cache_size: 100         # Smaller cache
  cache_ttl: "2h"         # Longer TTL
  replication_factor: 2   # Lower replication
p2p:
  max_connections: 50     # Fewer connections
logging:
  level: "warn"           # Less verbose logging
```
### Security Hardening
#### Production Security Checklist
- [ ] Change default ports
- [ ] Enable TLS for API endpoints
- [ ] Configure firewall rules
- [ ] Set up log monitoring
- [ ] Enable audit logging
- [ ] Rotate Age keys regularly
- [ ] Monitor for unusual admin elections
- [ ] Implement rate limiting
- [ ] Use non-root Docker user
- [ ] Regular security updates
#### Network Security
```bash
# Firewall configuration
sudo ufw allow 22 # SSH
sudo ufw allow 8080/tcp # BZZZ API
sudo ufw allow 4001/tcp # P2P networking
sudo ufw enable
# Docker security
docker run --security-opt no-new-privileges \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=1g \
  bzzz:latest
```
---
## Cross-References
- **[User Manual](USER_MANUAL.md)** - Basic usage and configuration
- **[Developer Guide](DEVELOPER.md)** - Development and testing procedures
- **[Architecture Documentation](ARCHITECTURE.md)** - System design and deployment patterns
- **[Technical Report](TECHNICAL_REPORT.md)** - Performance characteristics and scaling
- **[Security Documentation](SECURITY.md)** - Security best practices
**BZZZ Operations Guide v2.0** - Production deployment and maintenance procedures for Phase 2B unified architecture.