# BZZZ Operations Guide

**Version 2.0 - Phase 2B Edition**
**Deployment, monitoring, and maintenance procedures**

## Quick Reference

- **[Docker Deployment](#docker-deployment)** - Containerized deployment
- **[Production Setup](#production-configuration)** - Production-ready configuration
- **[Monitoring](#monitoring--observability)** - Metrics and alerting
- **[Maintenance](#maintenance-procedures)** - Routine maintenance tasks
- **[Troubleshooting](#troubleshooting)** - Common issues and solutions

## Docker Deployment

### Single Node Development

```bash
# Clone repository
git clone https://github.com/anthonyrawlins/bzzz.git
cd bzzz

# Build Docker image
docker build -t bzzz:latest .

# Run single node (the bind-mount path is quoted in case it contains spaces)
docker run -d \
  --name bzzz-node \
  -p 8080:8080 \
  -p 4001:4001 \
  -v "$(pwd)/config:/app/config" \
  -v bzzz-data:/app/data \
  bzzz:latest
```

### Docker Compose Cluster

```yaml
# docker-compose.yml
version: '3.8'
services:
  bzzz-node-1:
    build: .
    ports:
      - "8080:8080"
      - "4001:4001"
    environment:
      - BZZZ_NODE_ID=node-1
      - BZZZ_ROLE=backend_developer
    volumes:
      - ./config:/app/config
      - bzzz-data-1:/app/data
    networks:
      - bzzz-network

  bzzz-node-2:
    build: .
    ports:
      - "8081:8080"
      - "4002:4001"
    environment:
      - BZZZ_NODE_ID=node-2
      - BZZZ_ROLE=senior_software_architect
      - BZZZ_BOOTSTRAP_PEERS=/dns/bzzz-node-1/tcp/4001
    volumes:
      - ./config:/app/config
      - bzzz-data-2:/app/data
    networks:
      - bzzz-network
    depends_on:
      - bzzz-node-1

networks:
  bzzz-network:
    driver: bridge

volumes:
  bzzz-data-1:
  bzzz-data-2:
```

### Docker Swarm Production

```yaml
# docker-compose.swarm.yml
version: '3.8'
services:
  bzzz:
    image: bzzz:latest
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.role == worker
        preferences:
          - spread: node.id
      resources:
        limits:
          memory: 512M
          cpus: '1.0'
        reservations:
          memory: 256M
          cpus: '0.5'
    ports:
      - "8080:8080"
    environment:
      - BZZZ_CLUSTER_MODE=true
    networks:
      - bzzz-overlay
    volumes:
      - bzzz-config:/app/config
      - bzzz-data:/app/data

networks:
  bzzz-overlay:
    driver: overlay
    # Overlay encryption is a driver option, not a top-level network key
    driver_opts:
      encrypted: "true"

volumes:
  bzzz-config:
    external: true
  bzzz-data:
    external: true
```

## Production Configuration

### Environment Variables

```bash
# Core configuration
export BZZZ_NODE_ID="production-node-01"
export BZZZ_AGENT_ID="prod-agent-backend"
export BZZZ_ROLE="backend_developer"

# Network configuration
export BZZZ_API_HOST="0.0.0.0"
export BZZZ_API_PORT="8080"
export BZZZ_P2P_PORT="4001"

# Security configuration
export BZZZ_ADMIN_KEY_SHARES="5"
export BZZZ_ADMIN_KEY_THRESHOLD="3"

# Performance tuning
export BZZZ_DHT_CACHE_SIZE="1000"
export BZZZ_DHT_REPLICATION_FACTOR="3"
export BZZZ_MAX_CONNECTIONS="500"
```

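The two security variables describe a Shamir split: `BZZZ_ADMIN_KEY_SHARES` is the total number of shares and `BZZZ_ADMIN_KEY_THRESHOLD` is how many are needed to reconstruct the admin key, so the threshold can never exceed the share count. A minimal pre-start sanity check might look like the sketch below. The function name `check_shamir_params` and the rule that the threshold should be at least 2 are assumptions made for illustration, not something BZZZ is documented to enforce.

```bash
#!/bin/bash
# Hypothetical helper: validates a Shamir (shares, threshold) pair.
# Prints "ok" and returns 0 when valid; prints a reason and returns 1 otherwise.
check_shamir_params() {
  local shares="$1" threshold="$2"

  # Both values must be non-empty strings of digits.
  case "$shares" in
    ''|*[!0-9]*) echo "shares must be a positive integer"; return 1 ;;
  esac
  case "$threshold" in
    ''|*[!0-9]*) echo "threshold must be a positive integer"; return 1 ;;
  esac

  # Assumed policy: a threshold of 1 lets any single share holder recover the key.
  if [ "$threshold" -lt 2 ]; then
    echo "threshold must be at least 2"
    return 1
  fi
  # Inherent to Shamir's scheme: you cannot require more shares than exist.
  if [ "$threshold" -gt "$shares" ]; then
    echo "threshold ($threshold) exceeds shares ($shares)"
    return 1
  fi
  echo "ok"
}

# Usage: check the exported values before starting the node.
# check_shamir_params "$BZZZ_ADMIN_KEY_SHARES" "$BZZZ_ADMIN_KEY_THRESHOLD" || exit 1
```

With the values shown in this section (5 shares, threshold 3) the check prints `ok`.
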
### Production config.yaml

```yaml
node:
  id: "${BZZZ_NODE_ID}"
  data_dir: "/app/data"

agent:
  id: "${BZZZ_AGENT_ID}"
  role: "${BZZZ_ROLE}"
  max_tasks: 10

api:
  host: "${BZZZ_API_HOST}"
  port: ${BZZZ_API_PORT}
  cors_enabled: false
  rate_limit: 1000
  timeout: "30s"

p2p:
  port: ${BZZZ_P2P_PORT}
  bootstrap_peers:
    - "/dns/bootstrap-1.bzzz.network/tcp/4001"
    - "/dns/bootstrap-2.bzzz.network/tcp/4001"
  max_connections: ${BZZZ_MAX_CONNECTIONS}

dht:
  cache_size: ${BZZZ_DHT_CACHE_SIZE}
  cache_ttl: "1h"
  replication_factor: ${BZZZ_DHT_REPLICATION_FACTOR}

security:
  admin_election_timeout: "30s"
  heartbeat_interval: "5s"
  shamir_shares: ${BZZZ_ADMIN_KEY_SHARES}
  shamir_threshold: ${BZZZ_ADMIN_KEY_THRESHOLD}

logging:
  level: "info"
  format: "json"
  file: "/app/logs/bzzz.log"
  max_size: "100MB"
  max_files: 10
```

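Because this file is built entirely from `${...}` placeholders, a single unset variable can leave a blank or malformed field. This guide does not say how the placeholders are expanded, so the sketch below only covers the part that is safe to assume: checking that every referenced variable is set before the node starts. The helper name `missing_vars` and the `BZZZ_REQUIRED_VARS` list are hypothetical; the list simply repeats the placeholder names used in the file above.

```bash
#!/bin/bash
# Hypothetical preflight helper: prints the name of every listed variable
# that is unset or empty (one per line) and returns 1 if any are missing.
missing_vars() {
  local name value missing=0
  for name in "$@"; do
    # Only accept plain identifiers, since the name is passed to eval below.
    case "$name" in
      ''|[0-9]*|*[!A-Za-z0-9_]*) echo "invalid variable name: $name" >&2; return 2 ;;
    esac
    # Look up the variable whose name is stored in $name (empty if unset).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}

# The variables referenced as ${...} placeholders in config.yaml.
BZZZ_REQUIRED_VARS="BZZZ_NODE_ID BZZZ_AGENT_ID BZZZ_ROLE BZZZ_API_HOST BZZZ_API_PORT BZZZ_P2P_PORT BZZZ_MAX_CONNECTIONS BZZZ_DHT_CACHE_SIZE BZZZ_DHT_REPLICATION_FACTOR BZZZ_ADMIN_KEY_SHARES BZZZ_ADMIN_KEY_THRESHOLD"

# Usage (the unquoted expansion is intentional, to split the list into words):
# missing_vars $BZZZ_REQUIRED_VARS || { echo "set the variables listed above" >&2; exit 1; }
```
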
## Monitoring & Observability

### Health Check Endpoint

```bash
# Basic health check
curl http://localhost:8080/health

# Detailed status
curl http://localhost:8080/api/agent/status

# DHT metrics
curl http://localhost:8080/api/dht/metrics
```

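A node may need a few seconds after `docker run` before `/health` responds, so deployment scripts usually poll rather than check once. The sketch below is one way to do that. The `retry` helper is hypothetical and generic; it knows nothing about BZZZ and simply reruns whatever command it is given.

```bash
#!/bin/bash
# Hypothetical helper: run a command until it succeeds.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example: poll the health endpoint up to 10 times, 3 seconds apart.
# -f makes curl exit non-zero on HTTP error responses (4xx/5xx).
# retry 10 3 curl -fsS -o /dev/null http://localhost:8080/health
```
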
### Prometheus Metrics

Add to `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'bzzz'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
```

### Grafana Dashboard

Import the BZZZ dashboard from `monitoring/grafana-dashboard.json`.

Key metrics to monitor:
- **Decision throughput** - Decisions published per minute
- **DHT performance** - Storage/retrieval latency
- **P2P connectivity** - Connected peers count
- **Memory usage** - Go runtime metrics
- **Election events** - Admin election frequency

### Log Aggregation

#### ELK Stack Configuration

```yaml
# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /app/logs/bzzz.log
    json.keys_under_root: true
    json.add_error_key: true

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "bzzz-%{+yyyy.MM.dd}"

# A custom index name also requires a matching template name and pattern,
# and index lifecycle management must be off for the index setting to apply.
setup.template.name: "bzzz"
setup.template.pattern: "bzzz-*"
setup.ilm.enabled: false

logging.level: info
```

#### Structured Logging Query Examples

Find all admin elections in the last hour:

```json
{
  "query": {
    "bool": {
      "must": [
        {"match": {"level": "info"}},
        {"match": {"component": "election"}},
        {"range": {"timestamp": {"gte": "now-1h"}}}
      ]
    }
  }
}
```

Find encryption errors:

```json
{
  "query": {
    "bool": {
      "must": [
        {"match": {"level": "error"}},
        {"match": {"component": "crypto"}}
      ]
    }
  }
}
```

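These query bodies can be sent to the Elasticsearch `_search` endpoint from the command line. The sketch below generates the level/component form of the query and shows, as a comment, how it might be posted. The helper name `build_log_query` is hypothetical, and the host and `bzzz-*` index pattern are assumed to match the Filebeat output settings in this guide.

```bash
#!/bin/bash
# Hypothetical helper: print an Elasticsearch bool query that matches log
# entries with the given level and component.
# Arguments are inserted verbatim, so pass plain words without quotes or backslashes.
build_log_query() {
  local level="$1" component="$2"
  printf '{"query":{"bool":{"must":[{"match":{"level":"%s"}},{"match":{"component":"%s"}}]}}}\n' \
    "$level" "$component"
}

# Example: search all daily bzzz-* indices for crypto errors.
# build_log_query error crypto |
#   curl -s -H 'Content-Type: application/json' \
#     -X POST 'http://elasticsearch:9200/bzzz-*/_search' --data-binary @-
```
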
## Maintenance Procedures

### Regular Maintenance Tasks

#### Daily Checks
```bash
#!/bin/bash
# daily-check.sh

echo "BZZZ Daily Health Check - $(date)"

# Check service status
echo "=== Service Status ==="
docker ps | grep bzzz

# Check API health
echo "=== API Health ==="
curl -s http://localhost:8080/health | jq .

# Check peer connectivity
echo "=== Peer Status ==="
curl -s http://localhost:8080/api/agent/peers | jq '.connected_peers | length'

# Check recent errors (docker logs replays container stderr on stderr, so merge it)
echo "=== Recent Errors ==="
docker logs bzzz-node --since=24h 2>&1 | grep ERROR | tail -5

echo "Daily check completed"
```

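The daily script prints its findings but always ends successfully, so a scheduler such as cron cannot tell a healthy run from a failed one. One possible refinement is to wrap each check so that failures are counted and reflected in the exit status. The `run_check` wrapper and `FAILURES` counter below are hypothetical names for this sketch.

```bash
#!/bin/bash
# Hypothetical wrapper: run one named check, report PASS/FAIL, count failures.
FAILURES=0

run_check() {
  local name="$1"
  shift
  # The check's own output is discarded; only its exit status matters here.
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $name"
  else
    echo "FAIL: $name"
    FAILURES=$((FAILURES + 1))
  fi
}

# Example wiring (commented out so the sketch has no side effects):
# run_check "container running" sh -c 'docker ps --format "{{.Names}}" | grep -qx bzzz-node'
# run_check "API health"        curl -fsS http://localhost:8080/health
# [ "$FAILURES" -eq 0 ] || exit 1   # non-zero exit lets cron or CI raise an alert
```
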
#### Weekly Tasks
```bash
#!/bin/bash
# weekly-maintenance.sh

echo "BZZZ Weekly Maintenance - $(date)"

# Rotate logs
docker exec bzzz-node logrotate /app/config/logrotate.conf

# Check disk usage
echo "=== Disk Usage ==="
docker exec bzzz-node df -h /app/data

# DHT metrics review
echo "=== DHT Metrics ==="
curl -s http://localhost:8080/api/dht/metrics | jq '.stored_items, .cache_hit_rate'

# Database cleanup (if needed)
docker exec bzzz-node /app/scripts/cleanup-old-data.sh

echo "Weekly maintenance completed"
```

#### Monthly Tasks
```bash
#!/bin/bash
# monthly-maintenance.sh

echo "BZZZ Monthly Maintenance - $(date)"

# Full backup
./backup-bzzz-data.sh

# Performance review
echo "=== Performance Metrics ==="
curl -s http://localhost:8080/api/debug/status | jq '.performance'

# Security audit
echo "=== Security Check ==="
./scripts/security-audit.sh

# List available dependency updates
# (only works if the image ships the Go toolchain and module sources)
echo "=== Dependency Check ==="
docker exec bzzz-node go list -m -u all

echo "Monthly maintenance completed"
```

### Backup Procedures

#### Data Backup Script
```bash
#!/bin/bash
# backup-bzzz-data.sh
# Abort on the first failed command so a partial backup is never archived
set -euo pipefail

BACKUP_DIR="/backup/bzzz"
DATE=$(date +%Y%m%d_%H%M%S)
NODE_ID=$(docker exec bzzz-node cat /app/config/node_id)

echo "Starting backup for node: $NODE_ID"

# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"

# Backup configuration
docker cp bzzz-node:/app/config "$BACKUP_DIR/$DATE/config"

# Backup data directory
docker cp bzzz-node:/app/data "$BACKUP_DIR/$DATE/data"

# Backup logs
docker cp bzzz-node:/app/logs "$BACKUP_DIR/$DATE/logs"

# Create manifest
cat > "$BACKUP_DIR/$DATE/manifest.json" << EOF
{
  "node_id": "$NODE_ID",
  "backup_date": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "version": "2.0",
  "components": ["config", "data", "logs"]
}
EOF

# Compress backup
cd "$BACKUP_DIR"
tar -czf "bzzz-backup-$NODE_ID-$DATE.tar.gz" "$DATE"
rm -rf "$DATE"

echo "Backup completed: bzzz-backup-$NODE_ID-$DATE.tar.gz"
```

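The backup script adds a new archive on every run and never removes old ones, so the backup directory grows without bound. A retention step could follow it. The sketch below keeps the newest archives by modification time; the function name `prune_backups` and the idea of a fixed keep count are assumptions for illustration, not part of the BZZZ tooling.

```bash
#!/bin/bash
# Hypothetical retention helper: keep the <keep> newest bzzz-backup-*.tar.gz
# archives in <dir> (by modification time) and delete the rest.
# Assumes archive paths contain no newline characters.
prune_backups() {
  local dir="$1" keep="$2"
  # ls -1t lists newest first; 2>/dev/null hides the error when nothing matches.
  ls -1t "$dir"/bzzz-backup-*.tar.gz 2>/dev/null | {
    count=0
    while IFS= read -r file; do
      count=$((count + 1))
      if [ "$count" -gt "$keep" ]; then
        rm -f -- "$file"
        echo "removed: $file"
      fi
    done
  }
}

# Usage: keep the 12 most recent archives in the backup directory.
# prune_backups /backup/bzzz 12
```
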
#### Restore Procedure
```bash
#!/bin/bash
# restore-bzzz-data.sh
set -euo pipefail

BACKUP_FILE="${1:-}"
if [ -z "$BACKUP_FILE" ]; then
  echo "Usage: $0 <backup-file.tar.gz>"
  exit 1
fi

echo "Restoring from: $BACKUP_FILE"

# Stop service
docker stop bzzz-node

# Extract backup into a fresh temporary directory, so a stale
# directory left in /tmp by an earlier restore can never be picked up
WORK_DIR=$(mktemp -d)
trap 'rm -rf "$WORK_DIR"' EXIT
tar -xzf "$BACKUP_FILE" -C "$WORK_DIR"

# Find extracted directory (named after the backup timestamp)
BACKUP_DIR=$(find "$WORK_DIR" -mindepth 1 -maxdepth 1 -type d | head -1)

# Restore configuration
docker cp "$BACKUP_DIR/config" bzzz-node:/app/

# Restore data
docker cp "$BACKUP_DIR/data" bzzz-node:/app/

# Start service
docker start bzzz-node

echo "Restore completed. Check service status."
```

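The restore procedure stops the service before it has looked inside the archive, so a truncated or wrong file costs downtime. A cheap guard is to inspect the archive first. The sketch below assumes the layout produced by the backup script in this guide (a timestamp directory containing `manifest.json`, `config` and `data`); the function name `verify_backup` is hypothetical.

```bash
#!/bin/bash
# Hypothetical pre-restore check: confirm the archive is readable and contains
# the manifest.json, config and data entries written by the backup script.
verify_backup() {
  local archive="$1" listing entry
  # tar -tzf lists the archive without extracting; it fails on a corrupt file.
  listing=$(tar -tzf "$archive" 2>/dev/null) || { echo "unreadable archive"; return 1; }
  for entry in manifest.json config data; do
    # Entries look like <timestamp>/<entry> or <timestamp>/<entry>/..., so
    # match the name as the second path component.
    if ! printf '%s\n' "$listing" | grep -Eq "^[^/]+/$entry(/|\$)"; then
      echo "missing entry: $entry"
      return 1
    fi
  done
  echo "ok"
}

# Usage: run before stopping the service.
# verify_backup "$BACKUP_FILE" || exit 1
```
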
## Troubleshooting

### Common Issues

#### Service Won't Start
```bash
# Check logs
docker logs bzzz-node

# Check configuration
docker exec bzzz-node /app/bzzz --config /app/config/config.yaml --validate

# Check permissions
docker exec bzzz-node ls -la /app/data
```

#### High Memory Usage
```bash
# Check Go memory stats (-s hides curl's progress meter when piping to jq)
curl -s http://localhost:8080/api/debug/status | jq '.memory'

# Check DHT cache size
curl -s http://localhost:8080/api/dht/metrics | jq '.cache_size'

# Apply a memory limit, then restart
docker update --memory=512m bzzz-node
docker restart bzzz-node
```

#### Peer Connectivity Issues
```bash
# Check P2P status
curl http://localhost:8080/api/agent/peers

# Check network connectivity
docker exec bzzz-node netstat -an | grep 4001

# Check firewall rules
sudo ufw status | grep 4001

# Test bootstrap peers
docker exec bzzz-node ping bootstrap-1.bzzz.network
```

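A successful `ping` only shows that the bootstrap host answers ICMP; it says nothing about whether TCP port 4001 is reachable, and some networks drop ICMP entirely. A direct TCP probe is a useful complement. The sketch below uses bash's built-in `/dev/tcp` redirection together with the coreutils `timeout` command, and it is meant to run on the host, since the container image may not include bash. The function name `tcp_open` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to <host>:<port> can be
# opened within <timeout> seconds (default 3).
# Requires bash (for /dev/tcp) and the coreutils `timeout` command.
tcp_open() {
  local host="$1" port="$2" wait="${3:-3}"
  # In the inner shell, $0 is the host and $1 is the port.
  timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage: probe the P2P port of a bootstrap peer.
# tcp_open bootstrap-1.bzzz.network 4001 && echo "reachable" || echo "unreachable"
```
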
#### DHT Storage Problems
```bash
# Check DHT metrics
curl http://localhost:8080/api/dht/metrics

# Clear DHT cache
curl -X POST http://localhost:8080/api/debug/clear-cache

# Check disk space
docker exec bzzz-node df -h /app/data
```

### Performance Tuning

#### High Load Optimization
```yaml
# config.yaml adjustments for high load
dht:
  cache_size: 10000       # Increase cache
  cache_ttl: "30m"        # Shorter TTL for fresher data
  replication_factor: 5   # Higher replication

p2p:
  max_connections: 1000   # More connections

api:
  rate_limit: 5000        # Higher rate limit
  timeout: "60s"          # Longer timeout
```

#### Low Resource Optimization
```yaml
# config.yaml adjustments for resource-constrained environments
dht:
  cache_size: 100         # Smaller cache
  cache_ttl: "2h"         # Longer TTL
  replication_factor: 2   # Lower replication

p2p:
  max_connections: 50     # Fewer connections

logging:
  level: "warn"           # Less verbose logging
```

### Security Hardening

#### Production Security Checklist
- [ ] Change default ports
- [ ] Enable TLS for API endpoints
- [ ] Configure firewall rules
- [ ] Set up log monitoring
- [ ] Enable audit logging
- [ ] Rotate Age keys regularly
- [ ] Monitor for unusual admin elections
- [ ] Implement rate limiting
- [ ] Run containers as a non-root user
- [ ] Apply security updates regularly

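#### Network Security
```bash
# Firewall configuration
# Note: ports published with `docker run -p` are handled by Docker's own
# iptables rules and are not filtered by ufw by default.
sudo ufw allow 22/tcp    # SSH
sudo ufw allow 8080/tcp  # BZZZ API
sudo ufw allow 4001/tcp  # P2P networking
sudo ufw enable

# Docker security
# --read-only makes the root filesystem immutable, so writable paths
# such as /app/data must be supplied as volumes or tmpfs mounts.
docker run --security-opt no-new-privileges \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=1g \
  -v bzzz-data:/app/data \
  bzzz:latest
```
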
---

## Cross-References

- **[User Manual](USER_MANUAL.md)** - Basic usage and configuration
- **[Developer Guide](DEVELOPER.md)** - Development and testing procedures
- **[Architecture Documentation](ARCHITECTURE.md)** - System design and deployment patterns
- **[Technical Report](TECHNICAL_REPORT.md)** - Performance characteristics and scaling
- **[Security Documentation](SECURITY.md)** - Security best practices

**BZZZ Operations Guide v2.0** - Production deployment and maintenance procedures for Phase 2B unified architecture.