| # BZZZ v2 Deployment Runbook
 | |
| 
 | |
| ## Overview
 | |
| 
 | |
| This runbook provides step-by-step procedures for deploying, operating, and maintaining BZZZ v2 infrastructure. It covers normal operations, emergency procedures, and troubleshooting guidelines.
 | |
| 
 | |
| ## Prerequisites
 | |
| 
 | |
| ### System Requirements
 | |
| 
 | |
| - **Cluster**: 3 nodes (WALNUT, IRONWOOD, ACACIA)
 | |
| - **OS**: Ubuntu 22.04 LTS or newer
 | |
| - **Docker**: Version 24+ with Swarm mode enabled
 | |
| - **Storage**: NFS mount at `/rust/` with 500GB+ available
 | |
| - **Network**: Internal 192.168.1.0/24 with external internet access
 | |
| - **Secrets**: OpenAI API key and database credentials
 | |
| 
 | |
| ### Access Requirements
 | |
| 
 | |
| - SSH access to all cluster nodes
 | |
| - Docker Swarm manager privileges
 | |
| - Sudo access for system configuration
 | |
| - GitLab access for CI/CD pipeline management
 | |
| 
 | |
| ## Pre-Deployment Checklist
 | |
| 
 | |
| ### Infrastructure Verification
 | |
| 
 | |
| ```bash
 | |
| # Verify Docker Swarm status
 | |
| docker node ls
 | |
| docker network ls | grep tengig
 | |
| 
 | |
| # Check available storage
 | |
| df -h /rust/
 | |
| 
 | |
| # Verify network connectivity
 | |
| ping -c 3 192.168.1.27  # WALNUT
 | |
| ping -c 3 192.168.1.113 # IRONWOOD  
 | |
| ping -c 3 192.168.1.xxx # ACACIA
 | |
| 
 | |
| # Test registry access
 | |
| docker pull registry.home.deepblack.cloud/hello-world || echo "Registry access test FAILED" >&2
 | |
| ```
 | |
| 
 | |
| ### Security Hardening
 | |
| 
 | |
| ```bash
 | |
| # Run security hardening script
 | |
| cd /home/tony/chorus/project-queues/active/BZZZ/infrastructure/security
 | |
| sudo ./security-hardening.sh
 | |
| 
 | |
| # Verify firewall status
 | |
| sudo ufw status verbose
 | |
| 
 | |
| # Check fail2ban status
 | |
| sudo fail2ban-client status
 | |
| ```
 | |
| 
 | |
| ## Deployment Procedures
 | |
| 
 | |
| ### 1. Initial Deployment (Fresh Install)
 | |
| 
 | |
| #### Step 1: Prepare Infrastructure
 | |
| 
 | |
| ```bash
 | |
| # Create directory structure
 | |
| mkdir -p /rust/bzzz-v2/{config,data,logs,backup}
 | |
| mkdir -p /rust/bzzz-v2/data/{blobs,conversations,dht,postgres,redis}
 | |
| mkdir -p /rust/bzzz-v2/config/{swarm,monitoring,security}
 | |
| 
 | |
| # Set permissions
 | |
| sudo chown -R tony:tony /rust/bzzz-v2
 | |
| chmod -R 755 /rust/bzzz-v2
 | |
| ```
 | |
| 
 | |
| #### Step 2: Configure Secrets and Configs
 | |
| 
 | |
| ```bash
 | |
| cd /home/tony/chorus/project-queues/active/BZZZ/infrastructure
 | |
| 
 | |
| # Create Docker secrets
 | |
| docker secret create bzzz_postgres_password config/secrets/postgres_password
 | |
| docker secret create bzzz_openai_api_key ~/chorus/business/secrets/openai-api-key
 | |
| docker secret create bzzz_grafana_admin_password config/secrets/grafana_admin_password
 | |
| 
 | |
| # Create Docker configs
 | |
| docker config create bzzz_v2_config config/bzzz-config.yaml
 | |
| docker config create bzzz_prometheus_config monitoring/configs/prometheus.yml
 | |
| docker config create bzzz_alertmanager_config monitoring/configs/alertmanager.yml
 | |
| ```
 | |
| 
 | |
| #### Step 3: Deploy Core Services
 | |
| 
 | |
| ```bash
 | |
| # Deploy main BZZZ v2 stack
 | |
| docker stack deploy -c docker-compose.swarm.yml bzzz-v2
 | |
| 
 | |
| # Wait for services to start (this may take 5-10 minutes)
 | |
| watch docker stack ps bzzz-v2
 | |
| ```
 | |
| 
 | |
| #### Step 4: Deploy Monitoring Stack
 | |
| 
 | |
| ```bash
 | |
| # Deploy monitoring services
 | |
| docker stack deploy -c monitoring/docker-compose.monitoring.yml bzzz-monitoring
 | |
| 
 | |
| # Verify monitoring services
 | |
| curl -f http://localhost:9090/-/healthy  # Prometheus
 | |
| curl -f http://localhost:3000/api/health # Grafana
 | |
| ```
 | |
| 
 | |
| #### Step 5: Verify Deployment
 | |
| 
 | |
| ```bash
 | |
| # Check all services are running
 | |
| docker service ls --filter label=com.docker.stack.namespace=bzzz-v2
 | |
| 
 | |
| # Test external endpoints
 | |
| curl -f https://bzzz.deepblack.cloud/health
 | |
| curl -f https://mcp.deepblack.cloud/health
 | |
| curl -f https://resolve.deepblack.cloud/health
 | |
| 
 | |
| # Check P2P mesh connectivity
 | |
| docker exec $(docker ps -q -f label=com.docker.swarm.service.name=bzzz-v2_bzzz-agent | head -1) \
 | |
|   curl -s http://localhost:9000/api/v2/peers | jq '.connected_peers | length'
 | |
| ```
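
Services can take several minutes to come up, so a single `curl -f` issued right after deploying may fail spuriously. A minimal sketch of a retry wrapper follows; the `retry` function name and its argument order are illustrative and not part of the BZZZ tooling.

```bash
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Hypothetical helper for illustration; not shipped with BZZZ.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "FAILED after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}
```

For example, `retry 30 10 curl -fsS https://bzzz.deepblack.cloud/health` polls the endpoint every 10 seconds for roughly five minutes before giving up.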
 | |
| 
 | |
| ### 2. Update Deployment (Rolling Update)
 | |
| 
 | |
| #### Step 1: Pre-Update Checks
 | |
| 
 | |
```bash
# Check current deployment health
docker stack ps bzzz-v2 | grep -v "Shutdown\|Failed"

# Back up current configuration listings.
# Compute the timestamp once so every file lands in the same directory.
# Note: this records names and IDs only; Docker does not expose secret values.
BACKUP_DIR="/rust/bzzz-v2/backup/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"
docker config ls | grep bzzz_ > "$BACKUP_DIR/configs.txt"
docker secret ls | grep bzzz_ > "$BACKUP_DIR/secrets.txt"
```
 | |
| 
 | |
| #### Step 2: Update Images
 | |
| 
 | |
| ```bash
 | |
| # Update to new image version
 | |
| export NEW_IMAGE_TAG="v2.1.0"
 | |
| 
 | |
| # Update Docker Compose file with new image tags
 | |
| sed -i "s/registry.home.deepblack.cloud\/bzzz:.*$/registry.home.deepblack.cloud\/bzzz:${NEW_IMAGE_TAG}/g" \
 | |
|   docker-compose.swarm.yml
 | |
| 
 | |
| # Deploy updated stack (rolling update)
 | |
| docker stack deploy -c docker-compose.swarm.yml bzzz-v2
 | |
| ```
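
Because `NEW_IMAGE_TAG` is interpolated directly into a `sed` expression, a malformed value can corrupt the compose file. The following is a minimal sketch of a guard, assuming tags follow the `vMAJOR.MINOR.PATCH` shape of the `v2.1.0` example. The `valid_image_tag` function is hypothetical, and the pattern should be adjusted if the registry uses a different tagging scheme.

```bash
#!/usr/bin/env bash
# Succeeds only for tags shaped like v2.1.0 (assumed convention).
# Hypothetical helper for illustration; not part of the BZZZ scripts.
valid_image_tag() {
  [[ ${1:-} =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]
}
```

A call such as `valid_image_tag "$NEW_IMAGE_TAG" || exit 1` placed before the `sed` command stops the update before the file is touched.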
 | |
| 
 | |
| #### Step 3: Monitor Update Progress
 | |
| 
 | |
```bash
# Watch rolling update progress
watch "docker service ps bzzz-v2_bzzz-agent | head -20"

# Check for any failed updates.
# `docker service ps` has no current-state filter, so match on the formatted state instead.
docker service ps bzzz-v2_bzzz-agent --no-trunc \
  --format '{{.Name}} {{.Node}} {{.CurrentState}} {{.Error}}' | grep -Ei 'failed|rejected'
```
 | |
| 
 | |
| ### 3. Migration from v1 to v2
 | |
| 
 | |
| ```bash
 | |
| # Use the automated migration script
 | |
| cd /home/tony/chorus/project-queues/active/BZZZ/infrastructure/migration-scripts
 | |
| 
 | |
| # Dry run first to preview changes
 | |
| ./migrate-v1-to-v2.sh --dry-run
 | |
| 
 | |
| # Execute full migration
 | |
| ./migrate-v1-to-v2.sh
 | |
| 
 | |
| # If rollback is needed
 | |
| ./migrate-v1-to-v2.sh --rollback
 | |
| ```
 | |
| 
 | |
| ## Monitoring and Health Checks
 | |
| 
 | |
| ### Health Check Commands
 | |
| 
 | |
| ```bash
 | |
| # Service health checks
 | |
| docker service ls --filter label=com.docker.stack.namespace=bzzz-v2
 | |
| docker service ps bzzz-v2_bzzz-agent --filter desired-state=running
 | |
| 
 | |
| # Application health checks
 | |
| curl -f https://bzzz.deepblack.cloud/health
 | |
| curl -f https://mcp.deepblack.cloud/health
 | |
| curl -f https://resolve.deepblack.cloud/health
 | |
| curl -f https://openai.deepblack.cloud/health
 | |
| 
 | |
| # P2P network health
 | |
| docker exec $(docker ps -q -f label=com.docker.swarm.service.name=bzzz-v2_bzzz-agent | head -1) \
 | |
|   curl -s http://localhost:9000/api/v2/dht/stats | jq '.'
 | |
| 
 | |
| # Database connectivity
 | |
| docker exec $(docker ps -q -f label=com.docker.swarm.service.name=bzzz-v2_postgres) \
 | |
|   pg_isready -U bzzz -d bzzz_v2
 | |
| 
 | |
| # Cache connectivity  
 | |
| docker exec $(docker ps -q -f label=com.docker.swarm.service.name=bzzz-v2_redis) \
 | |
|   redis-cli ping
 | |
| ```
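
The `docker service ls` output has to be read by eye to spot services running below their desired replica count. A small filter can do that comparison. This is a sketch that assumes input lines of the form `NAME CURRENT/DESIRED`, as produced by `--format '{{.Name}} {{.Replicas}}'`; the `degraded_services` name is illustrative.

```bash
#!/usr/bin/env bash
# Reads "NAME CURRENT/DESIRED" lines on stdin and prints the names of
# services whose running replica count is below the desired count.
# Hypothetical helper for illustration.
degraded_services() {
  awk '{
    split($2, r, "/")
    # "+ 0" forces a numeric comparison so that 9/10 is seen as degraded
    if (r[1] + 0 < r[2] + 0) print $1
  }'
}
```

Typical use would be `docker service ls --filter label=com.docker.stack.namespace=bzzz-v2 --format '{{.Name}} {{.Replicas}}' | degraded_services`; empty output means every service is at its desired count.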
 | |
| 
 | |
| ### Performance Monitoring
 | |
| 
 | |
| ```bash
 | |
| # Check resource usage
 | |
| docker stats --no-stream
 | |
| 
 | |
| # Monitor disk usage
 | |
| df -h /rust/bzzz-v2/data/
 | |
| 
 | |
| # Check network connections
 | |
| netstat -tuln | grep -E ":(9000|3001|3002|3003|9101|9102|9103)"
 | |
| 
 | |
| # Monitor OpenAI API usage
 | |
| curl -s http://localhost:9203/metrics | grep openai_cost
 | |
| ```
 | |
| 
 | |
| ## Troubleshooting Guide
 | |
| 
 | |
| ### Common Issues and Solutions
 | |
| 
 | |
| #### 1. Service Won't Start
 | |
| 
 | |
| **Symptoms:** Service stuck in `preparing` or constantly restarting
 | |
| 
 | |
| **Diagnosis:**
 | |
| ```bash
 | |
| # Check service logs
 | |
| docker service logs bzzz-v2_bzzz-agent --tail 50
 | |
| 
 | |
| # Check node resources
 | |
| docker node ls
 | |
| docker system df
 | |
| 
 | |
| # Verify secrets and configs
 | |
| docker secret ls | grep bzzz_
 | |
| docker config ls | grep bzzz_
 | |
| ```
 | |
| 
 | |
| **Solutions:**
 | |
| - Check resource constraints and availability
 | |
| - Verify secrets and configs are accessible
 | |
| - Ensure image is available and correct
 | |
| - Check node labels and placement constraints
 | |
| 
 | |
| #### 2. P2P Network Issues
 | |
| 
 | |
| **Symptoms:** Agents not discovering each other, DHT lookups failing
 | |
| 
 | |
| **Diagnosis:**
 | |
| ```bash
 | |
| # Check peer connections
 | |
| docker exec $(docker ps -q -f label=com.docker.swarm.service.name=bzzz-v2_bzzz-agent | head -1) \
 | |
|   curl -s http://localhost:9000/api/v2/peers
 | |
| 
 | |
| # Check DHT bootstrap nodes
 | |
| curl http://localhost:9101/health
 | |
| curl http://localhost:9102/health  
 | |
| curl http://localhost:9103/health
 | |
| 
 | |
| # Check network connectivity
 | |
| docker network inspect bzzz-internal
 | |
| ```
 | |
| 
 | |
| **Solutions:**
 | |
| - Restart DHT bootstrap services
 | |
| - Check firewall rules for P2P ports
 | |
| - Verify Docker Swarm overlay network
 | |
| - Check for port conflicts
 | |
| 
 | |
| #### 3. High OpenAI Costs
 | |
| 
 | |
| **Symptoms:** Cost alerts triggering, rate limits being hit
 | |
| 
 | |
| **Diagnosis:**
 | |
| ```bash
 | |
| # Check current usage
 | |
| curl -s http://localhost:9203/metrics | grep -E "openai_(cost|requests|tokens)"
 | |
| 
 | |
| # Check rate limiting
 | |
| docker service logs bzzz-v2_openai-proxy --tail 100 | grep "rate limit"
 | |
| ```
 | |
| 
 | |
| **Solutions:**
 | |
| - Adjust rate limiting parameters
 | |
| - Review conversation patterns for excessive API calls
 | |
| - Implement request caching
 | |
| - Consider model selection optimization
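
To put a number on the cost alerts, the matching samples from the metrics endpoint can be summed. This is a minimal sketch, assuming the endpoint serves the Prometheus text exposition format without trailing timestamps, so that the sample value is the last field on each line. The `sum_metric` helper is hypothetical; the runbook only shows that the relevant metric names begin with `openai_cost`.

```bash
#!/usr/bin/env bash
# sum_metric PREFIX: reads Prometheus text-format metrics on stdin and
# prints the sum of all samples whose metric name starts with PREFIX.
# Assumes no trailing timestamps, so the value is the last field.
sum_metric() {
  awk -v prefix="$1" '
    /^#/ { next }                           # skip HELP/TYPE comment lines
    index($0, prefix) == 1 { total += $NF } # line begins with the prefix
    END { printf "%.2f\n", total }
  '
}
```

For example, `curl -s http://localhost:9203/metrics | sum_metric openai_cost` prints one total across all label combinations.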
 | |
| 
 | |
| #### 4. Database Connection Issues
 | |
| 
 | |
| **Symptoms:** Service errors related to database connectivity
 | |
| 
 | |
| **Diagnosis:**
 | |
| ```bash
 | |
| # Check PostgreSQL status
 | |
| docker service logs bzzz-v2_postgres --tail 50
 | |
| 
 | |
| # Test connection from agent
 | |
| docker exec $(docker ps -q -f label=com.docker.swarm.service.name=bzzz-v2_bzzz-agent | head -1) \
 | |
|   pg_isready -h postgres -U bzzz
 | |
| 
 | |
| # Check connection limits
 | |
| docker exec $(docker ps -q -f label=com.docker.swarm.service.name=bzzz-v2_postgres) \
 | |
|   psql -U bzzz -d bzzz_v2 -c "SELECT count(*) FROM pg_stat_activity;"
 | |
| ```
 | |
| 
 | |
| **Solutions:**
 | |
| - Restart PostgreSQL service
 | |
| - Check connection pool settings
 | |
| - Increase max_connections if needed
 | |
| - Review long-running queries
 | |
| 
 | |
| #### 5. Storage Issues
 | |
| 
 | |
| **Symptoms:** Disk full alerts, content store errors
 | |
| 
 | |
| **Diagnosis:**
 | |
| ```bash
 | |
| # Check disk usage
 | |
| df -h /rust/bzzz-v2/data/
 | |
| du -sh /rust/bzzz-v2/data/blobs/
 | |
| 
 | |
| # Check content store health
 | |
| curl -s http://localhost:9202/metrics | grep content_store
 | |
| ```
 | |
| 
 | |
| **Solutions:**
 | |
| - Run garbage collection on old blobs
 | |
| - Clean up old conversation threads
 | |
| - Increase storage capacity
 | |
| - Adjust retention policies
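
The `df -h` output is meant for humans; for alerting or a pre-deployment gate, a numeric check is easier to script. The sketch below uses `df -P` for stable, single-line POSIX output. The `disk_usage_pct` and `check_disk` names, and the 85% threshold in the usage note, are illustrative choices.

```bash
#!/usr/bin/env bash
# Prints the used-space percentage (integer, no % sign) of the
# filesystem containing PATH.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk PATH THRESHOLD
# Returns 1 if usage exceeds THRESHOLD percent, 2 if usage is unreadable.
check_disk() {
  local pct
  pct=$(disk_usage_pct "$1")
  [[ $pct =~ ^[0-9]+$ ]] || return 2
  if (( pct > $2 )); then
    echo "WARNING: $1 is ${pct}% full (threshold $2%)" >&2
    return 1
  fi
}
```

For example, `check_disk /rust/bzzz-v2/data 85 || echo "free up space before deploying"` could be run ahead of a rolling update.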
 | |
| 
 | |
| ## Emergency Procedures
 | |
| 
 | |
| ### Service Outage Response
 | |
| 
 | |
| #### Priority 1: Complete Service Outage
 | |
| 
 | |
| ```bash
 | |
| # 1. Check cluster status
 | |
| docker node ls
 | |
| docker service ls --filter label=com.docker.stack.namespace=bzzz-v2
 | |
| 
 | |
| # 2. Emergency restart of critical services
 | |
| docker service update --force bzzz-v2_bzzz-agent
 | |
| docker service update --force bzzz-v2_postgres
 | |
| docker service update --force bzzz-v2_redis
 | |
| 
 | |
| # 3. If stack is corrupted, redeploy
 | |
| docker stack rm bzzz-v2
 | |
| sleep 60
 | |
| docker stack deploy -c docker-compose.swarm.yml bzzz-v2
 | |
| 
 | |
| # 4. Monitor recovery
 | |
| watch docker stack ps bzzz-v2
 | |
| ```
 | |
| 
 | |
| #### Priority 2: Partial Service Degradation
 | |
| 
 | |
```bash
# 1. Identify problematic services
# (`docker service ps` has no current-state filter, so match on the formatted state)
docker service ps bzzz-v2_bzzz-agent --no-trunc \
  --format '{{.Name}} {{.Node}} {{.CurrentState}} {{.Error}}' | grep -Ei 'failed|rejected'

# 2. Scale up healthy replicas
docker service update --replicas 3 bzzz-v2_bzzz-agent

# 3. Remove unhealthy tasks
docker service update --force bzzz-v2_bzzz-agent
```
 | |
| 
 | |
| ### Security Incident Response
 | |
| 
 | |
| #### Step 1: Immediate Containment
 | |
| 
 | |
| ```bash
 | |
| # 1. Block suspicious IPs
 | |
| sudo ufw insert 1 deny from SUSPICIOUS_IP
 | |
| 
 | |
| # 2. Check for compromise indicators
 | |
| sudo fail2ban-client status
 | |
| sudo tail -100 /var/log/audit/audit.log | grep -i "denied\|failed\|error"
 | |
| 
 | |
| # 3. Isolate affected services
 | |
| docker service update --replicas 0 AFFECTED_SERVICE
 | |
| ```
 | |
| 
 | |
| #### Step 2: Investigation
 | |
| 
 | |
| ```bash
 | |
| # 1. Check access logs
 | |
| docker service logs bzzz-v2_bzzz-agent --since 1h | grep -i "error\|failed\|unauthorized"
 | |
| 
 | |
| # 2. Review monitoring alerts
 | |
| curl -s http://localhost:9093/api/v2/alerts | jq '.[] | select(.status.state=="active")'
 | |
| 
 | |
| # 3. Examine network connections
 | |
| netstat -tuln
 | |
| ss -tulpn | grep -E ":(9000|3001|3002|3003)"
 | |
| ```
 | |
| 
 | |
| #### Step 3: Recovery
 | |
| 
 | |
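```bash
# 1. Update security rules
sudo ./infrastructure/security/security-hardening.sh

# 2. Rotate secrets if compromised
# Docker refuses to remove a secret that a service still references,
# so take the stack down first and give it time to clean up.
docker stack rm bzzz-v2
sleep 60
docker secret rm bzzz_postgres_password
openssl rand -base64 32 | docker secret create bzzz_postgres_password -
# If the database is already initialized, the role's password inside
# PostgreSQL must be changed to match; replacing the secret alone does not do that.

# 3. Restart services with new secrets
docker stack deploy -c docker-compose.swarm.yml bzzz-v2
```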
 | |
| 
 | |
| ### Data Recovery Procedures
 | |
| 
 | |
| #### Backup Restoration
 | |
| 
 | |
| ```bash
 | |
| # 1. Stop services
 | |
| docker stack rm bzzz-v2
 | |
| 
 | |
| # 2. Restore from backup
 | |
| BACKUP_DATE="20241201-120000"
 | |
| rsync -av /rust/bzzz-v2/backup/$BACKUP_DATE/ /rust/bzzz-v2/data/
 | |
| 
 | |
| # 3. Restart services
 | |
| docker stack deploy -c docker-compose.swarm.yml bzzz-v2
 | |
| ```
 | |
| 
 | |
| #### Database Recovery
 | |
| 
 | |
| ```bash
 | |
| # 1. Stop application services
 | |
| docker service scale bzzz-v2_bzzz-agent=0
 | |
| 
 | |
| # 2. Snapshot the current database state before overwriting it
 | |
| docker exec $(docker ps -q -f label=com.docker.swarm.service.name=bzzz-v2_postgres) \
 | |
|   pg_dump -U bzzz bzzz_v2 > /rust/bzzz-v2/backup/database-$(date +%Y%m%d-%H%M%S).sql
 | |
| 
 | |
| # 3. Restore database (replace database-backup.sql with the dump file to restore)
 | |
| docker exec -i $(docker ps -q -f label=com.docker.swarm.service.name=bzzz-v2_postgres) \
 | |
|   psql -U bzzz -d bzzz_v2 < /rust/bzzz-v2/backup/database-backup.sql
 | |
| 
 | |
| # 4. Restart application services
 | |
| docker service scale bzzz-v2_bzzz-agent=3
 | |
| ```
 | |
| 
 | |
| ## Maintenance Procedures
 | |
| 
 | |
| ### Routine Maintenance (Weekly)
 | |
| 
 | |
| ```bash
 | |
| #!/bin/bash
 | |
| # Weekly maintenance script
 | |
| 
 | |
| # 1. Check service health
 | |
| docker service ls --filter label=com.docker.stack.namespace=bzzz-v2
 | |
| docker system df
 | |
| 
 | |
| # 2. Clean up unused resources
 | |
| docker system prune -f
 | |
| docker volume prune -f
 | |
| 
 | |
| # 3. Backup critical data
 | |
| pg_dump -h localhost -U bzzz bzzz_v2 | gzip > \
 | |
|   /rust/bzzz-v2/backup/weekly-db-$(date +%Y%m%d).sql.gz
 | |
| 
 | |
| # 4. Rotate logs
 | |
| find /rust/bzzz-v2/logs -name "*.log" -mtime +7 -delete
 | |
| 
 | |
| # 5. Check certificate expiration
 | |
| openssl x509 -in /rust/bzzz-v2/config/tls/server/walnut.pem -noout -dates
 | |
| 
 | |
| # 6. Update security rules
 | |
| fail2ban-client reload
 | |
| 
 | |
| # 7. Generate maintenance report
 | |
| echo "Maintenance completed on $(date)" >> /rust/bzzz-v2/logs/maintenance.log
 | |
| ```
 | |
| 
 | |
| ### Scaling Procedures
 | |
| 
 | |
| #### Scale Up
 | |
| 
 | |
| ```bash
 | |
| # Increase replica count
 | |
| docker service scale bzzz-v2_bzzz-agent=5
 | |
| docker service scale bzzz-v2_mcp-server=5
 | |
| 
 | |
| # Add new node to cluster (run on new node)
 | |
| docker swarm join --token $WORKER_TOKEN $MANAGER_IP:2377
 | |
| 
 | |
| # Label new node
 | |
| docker node update --label-add bzzz.role=agent NEW_NODE_HOSTNAME
 | |
| ```
 | |
| 
 | |
| #### Scale Down
 | |
| 
 | |
```bash
# Gracefully reduce replicas
docker service scale bzzz-v2_bzzz-agent=2
docker service scale bzzz-v2_mcp-server=2

# Remove node from cluster
docker node update --availability drain NODE_HOSTNAME
# Wait for tasks to move, then run `docker swarm leave` on the node being removed;
# `docker node rm` refuses to remove a node that is not down.
docker node rm NODE_HOSTNAME
```
 | |
| 
 | |
| ## Performance Tuning
 | |
| 
 | |
| ### Database Optimization
 | |
| 
 | |
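```bash
# PostgreSQL tuning
# ALTER SYSTEM cannot run inside a transaction block, and psql sends a
# multi-statement -c string as one transaction, so pass each statement separately.
docker exec $(docker ps -q -f label=com.docker.swarm.service.name=bzzz-v2_postgres) \
  psql -U bzzz -d bzzz_v2 \
    -c "ALTER SYSTEM SET shared_buffers = '1GB';" \
    -c "ALTER SYSTEM SET max_connections = 200;" \
    -c "ALTER SYSTEM SET checkpoint_timeout = '15min';" \
    -c "SELECT pg_reload_conf();"

# The reload applies checkpoint_timeout. shared_buffers and max_connections
# only take effect after PostgreSQL is restarted.
```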
 | |
| 
 | |
| ### Storage Optimization
 | |
| 
 | |
| ```bash
 | |
| # Content store optimization
 | |
| find /rust/bzzz-v2/data/blobs -name "*.tmp" -mtime +1 -delete
 | |
| find /rust/bzzz-v2/data/blobs -type f -size 0 -delete
 | |
| 
 | |
| # Compress old logs
 | |
| find /rust/bzzz-v2/logs -name "*.log" -mtime +3 -exec gzip {} \;
 | |
| ```
 | |
| 
 | |
| ### Network Optimization
 | |
| 
 | |
```bash
# Optimize network buffer sizes.
# A drop-in file keeps the change idempotent; appending to /etc/sysctl.conf
# adds duplicate entries on every run. The file name below is an example.
sudo tee /etc/sysctl.d/99-bzzz-network.conf > /dev/null <<'EOF'
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
EOF
sudo sysctl --system
```
 | |
| 
 | |
| ## Contact Information
 | |
| 
 | |
| ### On-Call Procedures
 | |
| 
 | |
| - **Primary Contact**: DevOps Team Lead
 | |
| - **Secondary Contact**: Senior Site Reliability Engineer  
 | |
| - **Escalation**: Platform Engineering Manager
 | |
| 
 | |
| ### Communication Channels
 | |
| 
 | |
| - **Slack**: #bzzz-incidents
 | |
| - **Email**: devops@deepblack.cloud
 | |
| - **Phone**: Emergency On-Call Rotation
 | |
| 
 | |
| ### Documentation
 | |
| 
 | |
| - **Runbooks**: This document
 | |
| - **Architecture**: `/docs/BZZZ_V2_INFRASTRUCTURE_ARCHITECTURE.md`
 | |
| - **API Documentation**: https://bzzz.deepblack.cloud/docs
 | |
| - **Monitoring Dashboards**: https://grafana.deepblack.cloud
 | |
| 
 | |
| ---
 | |
| 
 | |
| *This runbook should be reviewed and updated monthly. Last updated: $(date)*