BZZZ Human Agent Portal (HAP) - Phase 1 Technical Specification

Version: 1.0
Author: Senior Software Architect
Date: 2025-08-29

Executive Summary

This specification defines the detailed technical architecture for transforming the BZZZ autonomous agent system from a monolithic single-binary architecture into a dual-binary system that supports both autonomous agents (bzzz-agent) and human agent portals (bzzz-hap), while preserving all existing functionality and the shared P2P infrastructure.


1. Current Architecture Analysis

1.1 Existing Monolithic Structure

BZZZ/
├── main.go                    # Single entry point (1,663 lines)
├── pkg/                       # 14 major subsystems
│   ├── agentid/              # Agent identity and crypto
│   ├── config/               # Configuration management  
│   ├── crypto/               # Age encryption, Shamir shares
│   ├── dht/                  # Distributed hash table
│   ├── election/             # Admin election system
│   ├── health/               # Health monitoring
│   ├── slurp/                # Task coordination (7 submodules)
│   ├── ucxi/                 # Context resolution server
│   ├── ucxl/                 # Universal Context eXchange Language
│   └── [9 other subsystems]
├── p2p/                      # libp2p networking
├── pubsub/                   # HMMM collaborative messaging
├── api/                      # HTTP API server
└── coordinator/              # Task coordination

1.2 Key Shared Infrastructure Components

  • P2P Mesh: libp2p with mDNS discovery
  • Agent Identity: Cryptographic agent records with role-based access
  • HMMM Messaging: Collaborative reasoning protocol integration
  • DHT Storage: Distributed storage with Age encryption
  • UCXL System: Context resolution and addressing
  • SLURP Coordination: Task distribution and leadership election
  • Configuration: YAML-based role definitions and capabilities

2. Multi-Binary Architecture Design

2.1 Target Structure

BZZZ/
├── cmd/
│   ├── agent/
│   │   └── main.go           # Autonomous agent binary entry point
│   └── hap/
│       └── main.go           # Human agent portal binary entry point
├── internal/
│   ├── common/
│   │   └── runtime/          # Shared initialization and runtime components
│   │       ├── agent.go      # Agent identity and role initialization
│   │       ├── config.go     # Configuration loading and validation
│   │       ├── p2p.go        # P2P node initialization
│   │       ├── services.go   # Core service initialization
│   │       ├── storage.go    # DHT and encrypted storage setup
│   │       └── shutdown.go   # Graceful shutdown management
│   ├── agent/                # Autonomous agent specific code
│   │   ├── runner.go         # Agent execution loop
│   │   └── handlers.go       # Autonomous task handlers
│   └── hap/                  # Human agent portal specific code
│       ├── terminal/         # Terminal interface
│       ├── forms/            # Message composition templates
│       ├── context/          # UCXL browsing interface
│       └── prompts/          # Human interaction prompts
├── pkg/                      # Unchanged - shared libraries
└── [existing directories]    # Unchanged

2.2 Build System Enhancement

# Updated Makefile targets
build-agent: build-ui embed-ui
	CGO_ENABLED=0 go build -ldflags="-s -w" -o $(BUILD_DIR)/bzzz-agent ./cmd/agent

build-hap: build-ui embed-ui  
	CGO_ENABLED=0 go build -ldflags="-s -w" -o $(BUILD_DIR)/bzzz-hap ./cmd/hap

build: build-agent build-hap

3. Shared Runtime Architecture

3.1 Runtime Initialization Pipeline

// internal/common/runtime/services.go
type RuntimeServices struct {
    Config            *config.Config
    Node              *p2p.Node
    PubSub            *pubsub.PubSub
    DHT               *dht.LibP2PDHT
    EncryptedStorage  *dht.EncryptedDHTStorage
    ElectionManager   *election.ElectionManager
    HealthManager     *health.Manager
    ShutdownManager   *shutdown.Manager
    DecisionPublisher *ucxl.DecisionPublisher
    UCXIServer        *ucxi.Server
    HTTPServer        *api.HTTPServer
    Logger            logging.Logger
}

type RuntimeConfig struct {
    ConfigPath      string
    BinaryType      BinaryType // Agent or HAP
    EnableSetupMode bool
    CustomPorts     PortConfig
}

type BinaryType int
const (
    BinaryTypeAgent BinaryType = iota
    BinaryTypeHAP
)
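
RuntimeConfig above references a PortConfig type, and later sections rely on a textual name for BinaryType (the agent record in section 8.1 calls String(), and the lock-file path in section 12.2 formats the value with %s). A minimal sketch of both follows; the field set of PortConfig is assumed from the HAP entry point in section 4.2, and the file name is illustrative.

// internal/common/runtime/config_types.go (illustrative sketch)

// PortConfig carries the listener ports a binary should use; the zero
// value means "use the defaults for this binary type".
type PortConfig struct {
    HTTPPort   int
    HealthPort int
    UCXIPort   int
}

// String returns a stable textual name for the binary type, used in
// agent records, lock files, and log output.
func (b BinaryType) String() string {
    switch b {
    case BinaryTypeAgent:
        return "agent"
    case BinaryTypeHAP:
        return "hap"
    default:
        return "unknown"
    }
}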

3.2 Core Runtime Interface

// internal/common/runtime/runtime.go
type Runtime interface {
    Initialize(ctx context.Context, cfg RuntimeConfig) (*RuntimeServices, error)
    Start(ctx context.Context, services *RuntimeServices) error
    Stop(ctx context.Context, services *RuntimeServices) error
    GetHealthStatus() health.Status
}

type StandardRuntime struct {
    services *RuntimeServices
    logger   logging.Logger
}

func NewRuntime(logger logging.Logger) Runtime {
    return &StandardRuntime{logger: logger}
}

3.3 Initialization Sequence

// internal/common/runtime/services.go
func (r *StandardRuntime) Initialize(ctx context.Context, cfg RuntimeConfig) (*RuntimeServices, error) {
    services := &RuntimeServices{}
    
    // Phase 1: Configuration
    if err := r.initializeConfig(cfg.ConfigPath, services); err != nil {
        return nil, fmt.Errorf("config initialization failed: %w", err)
    }
    
    // Phase 2: P2P Infrastructure  
    if err := r.initializeP2P(ctx, services); err != nil {
        return nil, fmt.Errorf("P2P initialization failed: %w", err)
    }
    
    // Phase 3: Core Services
    if err := r.initializeCoreServices(ctx, services); err != nil {
        return nil, fmt.Errorf("core services initialization failed: %w", err)
    }
    
    // Phase 4: Binary-specific configuration
    if err := r.applyBinarySpecificConfig(cfg.BinaryType, services); err != nil {
        return nil, fmt.Errorf("binary-specific config failed: %w", err)
    }
    
    // Phase 5: Health and Monitoring
    if err := r.initializeMonitoring(services); err != nil {
        return nil, fmt.Errorf("monitoring initialization failed: %w", err)  
    }
    
    return services, nil
}
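
The per-phase helpers above are intentionally not spelled out in full. As one example, applyBinarySpecificConfig could look like the sketch below; it assumes config.Config exposes the BinarySpecificConfig from section 6.1 under a Binary field (that field name is illustrative), and the port defaults mirror the values used elsewhere in this document.

// internal/common/runtime/services.go (illustrative sketch)
func (r *StandardRuntime) applyBinarySpecificConfig(binType BinaryType, services *RuntimeServices) error {
    switch binType {
    case BinaryTypeAgent:
        // Agents keep the existing defaults so current deployments are unaffected.
        services.Config.Binary.BinaryType = "agent"
    case BinaryTypeHAP:
        services.Config.Binary.BinaryType = "hap"
        // Shift HAP listeners off the agent defaults so both binaries can co-deploy.
        if services.Config.Binary.Ports.HTTPPort == 0 {
            services.Config.Binary.Ports.HTTPPort = 8090
        }
        if services.Config.Binary.Ports.HealthPort == 0 {
            services.Config.Binary.Ports.HealthPort = 8091
        }
    default:
        return fmt.Errorf("unknown binary type: %v", binType)
    }
    return nil
}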

4. Binary-Specific Implementations

4.1 Autonomous Agent Binary (cmd/agent/main.go)

package main

import (
    "context"
    "fmt"
    "log"
    "os"
    "os/signal"
    "syscall"
    
    "chorus.services/bzzz/internal/agent"
    "chorus.services/bzzz/internal/common/runtime"
    "chorus.services/bzzz/pkg/logging"
)

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    logger := logging.NewStandardLogger("bzzz-agent")
    rt := runtime.NewRuntime(logger)
    
    // Initialize shared runtime
    runtimeConfig := runtime.RuntimeConfig{
        ConfigPath:      getConfigPath(),
        BinaryType:      runtime.BinaryTypeAgent,
        EnableSetupMode: needsSetup(),
    }
    
    services, err := rt.Initialize(ctx, runtimeConfig)
    if err != nil {
        log.Fatalf("Failed to initialize runtime: %v", err)
    }
    
    // Start shared services
    if err := rt.Start(ctx, services); err != nil {
        log.Fatalf("Failed to start runtime: %v", err)
    }
    
    // Initialize agent-specific components
    agentRunner := agent.NewRunner(services, logger)
    if err := agentRunner.Start(ctx); err != nil {
        log.Fatalf("Failed to start agent runner: %v", err)
    }
    
    logger.Info("🤖 BZZZ Autonomous Agent started successfully")
    logger.Info("📍 Node ID: %s", services.Node.ID().ShortString())
    logger.Info("🎯 Agent ID: %s", services.Config.Agent.ID)
    
    // Wait for shutdown signals
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
    <-sigChan
    
    logger.Info("🛑 Shutting down agent...")
    if err := rt.Stop(ctx, services); err != nil {
        logger.Error("Shutdown error: %v", err)
    }
}

4.2 Human Agent Portal Binary (cmd/hap/main.go)

package main

import (
    "context" 
    "fmt"
    "log"
    "os"
    "os/signal"
    "syscall"
    
    "chorus.services/bzzz/internal/hap"
    "chorus.services/bzzz/internal/common/runtime"
    "chorus.services/bzzz/pkg/logging"
)

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    logger := logging.NewStandardLogger("bzzz-hap")
    rt := runtime.NewRuntime(logger)
    
    // Initialize shared runtime
    runtimeConfig := runtime.RuntimeConfig{
        ConfigPath:      getConfigPath(),
        BinaryType:      runtime.BinaryTypeHAP,
        EnableSetupMode: needsSetup(),
        CustomPorts: runtime.PortConfig{
            HTTPPort:   8090, // Different from agent to avoid conflicts
            HealthPort: 8091,
        },
    }
    
    services, err := rt.Initialize(ctx, runtimeConfig)
    if err != nil {
        log.Fatalf("Failed to initialize runtime: %v", err)
    }
    
    // Start shared services
    if err := rt.Start(ctx, services); err != nil {
        log.Fatalf("Failed to start runtime: %v", err)
    }
    
    // Initialize HAP-specific components
    hapInterface := hap.NewTerminalInterface(services, logger)
    if err := hapInterface.Start(ctx); err != nil {
        log.Fatalf("Failed to start HAP interface: %v", err)
    }
    
    logger.Info("👤 BZZZ Human Agent Portal started successfully")
    logger.Info("📍 Node ID: %s", services.Node.ID().ShortString())
    logger.Info("🎯 Agent ID: %s", services.Config.Agent.ID)
    logger.Info("💬 Terminal interface ready for human interaction")
    
    // Wait for shutdown signals
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
    <-sigChan
    
    logger.Info("🛑 Shutting down HAP...")
    if err := rt.Stop(ctx, services); err != nil {
        logger.Error("Shutdown error: %v", err)
    }
}

5. Interface Contracts and API Boundaries

5.1 Runtime Service Interface

// internal/common/runtime/interfaces.go
type RuntimeService interface {
    Name() string
    Initialize(ctx context.Context, config *config.Config) error
    Start(ctx context.Context) error 
    Stop(ctx context.Context) error
    IsHealthy() bool
    Dependencies() []string
}

type ServiceManager interface {
    Register(service RuntimeService)
    Start(ctx context.Context) error
    Stop(ctx context.Context) error
    GetService(name string) RuntimeService
    GetHealthStatus() map[string]bool
}
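
A minimal ServiceManager implementation is sketched below, under the assumption that services are registered in dependency order (dependencies before dependents) rather than resolved via the Dependencies() list. It needs only the context, fmt, and sync packages.

// internal/common/runtime/service_manager.go (illustrative sketch)
type serviceManager struct {
    mu       sync.Mutex
    services []RuntimeService            // preserves registration order
    byName   map[string]RuntimeService
}

func NewServiceManager() ServiceManager {
    return &serviceManager{byName: make(map[string]RuntimeService)}
}

func (m *serviceManager) Register(s RuntimeService) {
    m.mu.Lock()
    defer m.mu.Unlock()
    m.services = append(m.services, s)
    m.byName[s.Name()] = s
}

func (m *serviceManager) Start(ctx context.Context) error {
    // Start in registration order; callers register dependencies first.
    for _, s := range m.services {
        if err := s.Start(ctx); err != nil {
            return fmt.Errorf("starting %s: %w", s.Name(), err)
        }
    }
    return nil
}

func (m *serviceManager) Stop(ctx context.Context) error {
    // Stop in reverse order so dependents shut down before their dependencies.
    var firstErr error
    for i := len(m.services) - 1; i >= 0; i-- {
        if err := m.services[i].Stop(ctx); err != nil && firstErr == nil {
            firstErr = err
        }
    }
    return firstErr
}

func (m *serviceManager) GetService(name string) RuntimeService { return m.byName[name] }

func (m *serviceManager) GetHealthStatus() map[string]bool {
    status := make(map[string]bool, len(m.services))
    for _, s := range m.services {
        status[s.Name()] = s.IsHealthy()
    }
    return status
}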

5.2 Binary-Specific Execution Interface

// internal/common/runtime/execution.go
type ExecutionMode interface {
    Run(ctx context.Context, services *RuntimeServices) error
    Stop(ctx context.Context) error
    GetType() BinaryType
}

// Agent implementation
type AgentExecution struct {
    services *RuntimeServices
    runner   *agent.Runner
}

// HAP implementation  
type HAPExecution struct {
    services *RuntimeServices
    terminal *hap.TerminalInterface // "interface" is a reserved word in Go
}

5.3 Shared P2P Participation

Both binaries participate identically in the P2P mesh:

// Binary-agnostic P2P participation
type P2PParticipant interface {
    JoinMesh(ctx context.Context) error
    PublishMessage(topic string, data interface{}) error
    SubscribeToTopic(topic string, handler MessageHandler) error
    GetPeerID() string
    GetConnectedPeers() []string
}
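
Because the interface is binary-agnostic, the same call path serves agents and the HAP alike. The usage sketch below is illustrative only: the announcePresence helper and the payload fields are not part of the specification, while the "coordination" topic matches the one used in the integration test in section 10.2.

// Illustrative usage; requires the context and fmt packages.
func announcePresence(ctx context.Context, p P2PParticipant, agentID string, binType BinaryType) error {
    // Join the shared mesh first; both binaries follow the same path.
    if err := p.JoinMesh(ctx); err != nil {
        return fmt.Errorf("joining mesh: %w", err)
    }
    // Publish a small presence record so peers can distinguish agent from HAP nodes.
    return p.PublishMessage("coordination", map[string]string{
        "agent_id":    agentID,
        "binary_type": binType.String(),
        "event":       "online",
    })
}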

6. Configuration Strategy

6.1 Shared Configuration Structure

// pkg/config/binary_config.go
type BinarySpecificConfig struct {
    BinaryType    string            `yaml:"binary_type"` // "agent" or "hap"
    Ports         PortConfiguration `yaml:"ports"`
    Interface     InterfaceConfig   `yaml:"interface"`
    Capabilities  []string          `yaml:"capabilities"`
}

type PortConfiguration struct {
    HTTPPort     int `yaml:"http_port"`
    HealthPort   int `yaml:"health_port"`  
    UCXIPort     int `yaml:"ucxi_port"`
    AdminUIPort  int `yaml:"admin_ui_port,omitempty"`
}

type InterfaceConfig struct {
    Mode                string `yaml:"mode"` // "terminal", "web", "headless"
    AutoStartInterface  bool   `yaml:"auto_start_interface"`
    MessageTemplates    string `yaml:"message_templates_path,omitempty"`
    PromptLibrary       string `yaml:"prompt_library_path,omitempty"`
}
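
Centralizing the per-binary port defaults keeps setup, validation, and deployment manifests from drifting apart. The helper below is a sketch; the function name and file path are illustrative, while the values are the ones used in the HAP entry point (section 4.2) and the Kubernetes manifests (section 11.2).

// pkg/config/binary_defaults.go (illustrative sketch)
func DefaultPortsFor(binaryType string) PortConfiguration {
    switch binaryType {
    case "hap":
        // Offset from the agent defaults to avoid conflicts when co-deployed.
        return PortConfiguration{HTTPPort: 8090, HealthPort: 8091}
    default: // "agent"
        return PortConfiguration{HTTPPort: 8080, HealthPort: 8081}
    }
}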

6.2 Configuration Validation

// internal/common/runtime/config.go
type ConfigValidator struct {
    binaryType BinaryType
}

func (v *ConfigValidator) ValidateForBinary(cfg *config.Config) error {
    // Common validation
    if err := v.validateCommonConfig(cfg); err != nil {
        return fmt.Errorf("common config validation failed: %w", err)
    }
    
    // Binary-specific validation
    switch v.binaryType {
    case BinaryTypeAgent:
        return v.validateAgentConfig(cfg)
    case BinaryTypeHAP:  
        return v.validateHAPConfig(cfg)
    default:
        return fmt.Errorf("unknown binary type: %v", v.binaryType)
    }
}
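
The binary-specific validators are not defined above. A sketch of validateHAPConfig follows, assuming config.Config exposes the BinarySpecificConfig from section 6.1 under a Binary field (an illustrative name) and that the HAP requires an explicit interface mode and non-conflicting ports.

// internal/common/runtime/config.go (illustrative sketch)
func (v *ConfigValidator) validateHAPConfig(cfg *config.Config) error {
    bin := cfg.Binary // assumed accessor for the binary-specific section
    if bin.Interface.Mode == "" {
        return fmt.Errorf("hap config requires interface.mode (terminal, web, or headless)")
    }
    if bin.Ports.HTTPPort == 0 || bin.Ports.HealthPort == 0 {
        return fmt.Errorf("hap config requires explicit http_port and health_port")
    }
    if bin.Ports.HTTPPort == bin.Ports.HealthPort {
        return fmt.Errorf("http_port and health_port must differ")
    }
    return nil
}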

7. Data Flow Architecture

7.1 Message Flow Between Binaries

graph TB
    subgraph "Autonomous Agent (bzzz-agent)"
        AA[Agent Runner]
        AT[Task Processor] 
        AM[Message Handler]
    end
    
    subgraph "Human Agent Portal (bzzz-hap)"
        HI[Terminal Interface]
        HF[Form Templates]
        HP[Prompt Engine]
    end
    
    subgraph "Shared P2P Infrastructure"
        PS[PubSub System]
        DHT[Distributed Storage]
        EL[Election System]
    end
    
    AA --> PS
    AT --> DHT
    AM --> PS
    
    HI --> PS
    HF --> DHT  
    HP --> PS
    
    PS --> AA
    PS --> HI
    DHT --> AT
    DHT --> HF

7.2 Shared State Management

// internal/common/runtime/state.go
type SharedState struct {
    ActiveTasks     map[string]*TaskInfo
    PeerRegistry    map[string]*PeerInfo
    ElectionState   *ElectionInfo
    ConfigSnapshot  *config.Config
    HealthStatus    *SystemHealth
    mutex           sync.RWMutex
}

func (s *SharedState) UpdateTaskState(taskID string, state TaskState) error {
    s.mutex.Lock()
    defer s.mutex.Unlock()
    
    if task, exists := s.ActiveTasks[taskID]; exists {
        task.State = state
        task.LastUpdated = time.Now()
        return nil
    }
    return fmt.Errorf("task not found: %s", taskID)
}
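
The matching read path takes the read lock and hands back a copy, so callers cannot mutate shared state outside the lock. A minimal sketch:

// internal/common/runtime/state.go (illustrative sketch)
func (s *SharedState) GetTask(taskID string) (TaskInfo, bool) {
    s.mutex.RLock()
    defer s.mutex.RUnlock()

    task, exists := s.ActiveTasks[taskID]
    if !exists {
        return TaskInfo{}, false
    }
    return *task, true // return a copy; callers must not touch shared state directly
}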

8. Security and Access Control

8.1 Shared Cryptographic Identity

Both binaries use identical agent identity and cryptographic systems:

// pkg/agentid/shared_identity.go  
type SharedAgentIdentity struct {
    AgentID       string
    PrivateKey    crypto.PrivateKey
    PublicKey     crypto.PublicKey
    Role          string
    Capabilities  []string
    BinaryType    BinaryType // Added to distinguish binary type in P2P
}

func (id *SharedAgentIdentity) SignMessage(message []byte) ([]byte, error) {
    // Identical signing for both binaries
    return crypto.Sign(id.PrivateKey, message)
}

func (id *SharedAgentIdentity) CreateAgentRecord() (*agentid.AgentRecord, error) {
    return &agentid.AgentRecord{
        ID:            id.AgentID,
        PublicKey:     id.PublicKey,
        Role:          id.Role, 
        Capabilities:  id.Capabilities,
        BinaryType:    id.BinaryType.String(), // New field for P2P identification
        Timestamp:     time.Now(),
    }, nil
}

8.2 Role-Based Access Control

// pkg/crypto/rbac.go
type RoleBasedAccess struct {
    agentRole    string
    binaryType   BinaryType
    capabilities []string
}

func (r *RoleBasedAccess) CanAccessResource(resource string, operation string) bool {
    // Both binaries use same RBAC rules
    return r.checkPermission(resource, operation, r.agentRole, r.capabilities)
}

func (r *RoleBasedAccess) GetEncryptionRecipients(contentType string) ([]string, error) {
    // Same encryption recipients for both binaries
    return crypto.GetRecipientsForRole(r.agentRole, contentType)
}

9. Error Handling and Resilience

9.1 Shared Error Handling Strategy

// internal/common/runtime/errors.go
type RuntimeError struct {
    Code        ErrorCode
    Message     string
    BinaryType  BinaryType
    ServiceName string
    Timestamp   time.Time
    Cause       error
}

type ErrorCode int
const (
    ErrConfigInvalid ErrorCode = iota
    ErrP2PInitFailed
    ErrDHTUnavailable
    ErrElectionFailed
    ErrServiceStartFailed
)

func NewRuntimeError(code ErrorCode, service string, binType BinaryType, msg string, cause error) *RuntimeError {
    return &RuntimeError{
        Code:        code,
        Message:     msg,
        BinaryType:  binType,
        ServiceName: service,
        Timestamp:   time.Now(),
        Cause:       cause,
    }
}
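
For RuntimeError to participate in the %w error chains used throughout the initialization sequence, it needs Error and Unwrap methods. A minimal sketch:

// internal/common/runtime/errors.go (illustrative sketch)
func (e *RuntimeError) Error() string {
    return fmt.Sprintf("[%v/%s] %s: %v", e.BinaryType, e.ServiceName, e.Message, e.Cause)
}

// Unwrap lets errors.Is and errors.As reach the underlying cause.
func (e *RuntimeError) Unwrap() error {
    return e.Cause
}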

9.2 Circuit Breaker Pattern

// internal/common/runtime/resilience.go
type ServiceCircuitBreaker struct {
    serviceName     string
    failureCount    int
    lastFailureTime time.Time
    state           CircuitState
    maxFailures     int
    timeout         time.Duration
}

func (cb *ServiceCircuitBreaker) Call(operation func() error) error {
    if cb.state == CircuitOpen {
        if time.Since(cb.lastFailureTime) > cb.timeout {
            cb.state = CircuitHalfOpen
        } else {
            return fmt.Errorf("circuit breaker open for service: %s", cb.serviceName)
        }
    }
    
    err := operation()
    if err != nil {
        cb.recordFailure()
        return err
    }
    
    cb.recordSuccess()
    return nil
}
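
The circuit states and the record helpers referenced in Call are not shown above; a sketch consistent with that logic:

// internal/common/runtime/resilience.go (illustrative sketch)
type CircuitState int

const (
    CircuitClosed CircuitState = iota
    CircuitOpen
    CircuitHalfOpen
)

func (cb *ServiceCircuitBreaker) recordFailure() {
    cb.failureCount++
    cb.lastFailureTime = time.Now()
    if cb.failureCount >= cb.maxFailures {
        cb.state = CircuitOpen
    }
}

func (cb *ServiceCircuitBreaker) recordSuccess() {
    // Any success closes the breaker and clears the failure history.
    cb.failureCount = 0
    cb.state = CircuitClosed
}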

10. Testing Strategy

10.1 Shared Runtime Testing

// internal/common/runtime/runtime_test.go
func TestRuntimeInitialization(t *testing.T) {
    tests := []struct {
        name       string
        binaryType BinaryType
        configPath string
        wantError  bool
    }{
        {
            name:       "Agent runtime initialization",
            binaryType: BinaryTypeAgent,
            configPath: "testdata/agent_config.yaml",
            wantError:  false,
        },
        {
            name:       "HAP runtime initialization", 
            binaryType: BinaryTypeHAP,
            configPath: "testdata/hap_config.yaml",
            wantError:  false,
        },
    }
    
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            ctx := context.Background()
            logger := logging.NewTestLogger(t)
            runtime := NewRuntime(logger)
            
            cfg := RuntimeConfig{
                ConfigPath: tt.configPath,
                BinaryType: tt.binaryType,
            }
            
            services, err := runtime.Initialize(ctx, cfg)
            
            if tt.wantError && err == nil {
                t.Error("expected error but got none")
            }
            if !tt.wantError && err != nil {
                t.Errorf("unexpected error: %v", err)
            }
            if !tt.wantError && services == nil {
                t.Error("expected services but got nil")
            }
        })
    }
}

10.2 Integration Testing

// test/integration/dual_binary_test.go
func TestDualBinaryP2PInteraction(t *testing.T) {
    // Start agent binary
    agentCtx, agentCancel := context.WithCancel(context.Background())
    defer agentCancel()
    
    agentRuntime := startTestAgent(t, agentCtx, "testdata/agent_config.yaml")
    defer agentRuntime.Shutdown()
    
    // Start HAP binary
    hapCtx, hapCancel := context.WithCancel(context.Background())
    defer hapCancel()
    
    hapRuntime := startTestHAP(t, hapCtx, "testdata/hap_config.yaml")
    defer hapRuntime.Shutdown()
    
    // Wait for P2P mesh formation
    waitForPeerConnection(t, agentRuntime, hapRuntime, 10*time.Second)
    
    // Test message exchange
    testMessage := "test collaboration message"
    err := hapRuntime.SendMessage("coordination", testMessage)
    assert.NoError(t, err)
    
    // Verify agent receives message
    receivedMsg := waitForMessage(t, agentRuntime, 5*time.Second)
    assert.Equal(t, testMessage, receivedMsg)
}

11. Deployment Strategy

11.1 Docker Multi-Stage Build

# Dockerfile.multi-stage
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY . .

# Build both binaries
RUN go mod download
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o bzzz-agent ./cmd/agent
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o bzzz-hap ./cmd/hap

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/

# Copy both binaries
COPY --from=builder /app/bzzz-agent .
COPY --from=builder /app/bzzz-hap .

# Default to agent mode, can be overridden
CMD ["./bzzz-agent"]

11.2 Kubernetes Deployment

# deployments/kubernetes/agent-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bzzz-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bzzz-agent
  template:
    metadata:
      labels:
        app: bzzz-agent
        binary-type: agent
    spec:
      containers:
      - name: bzzz-agent
        image: bzzz:latest
        command: ["./bzzz-agent"]
        ports:
        - containerPort: 8080
        - containerPort: 8081
        env:
        - name: BZZZ_CONFIG_PATH
          value: "/config/agent-config.yaml"
        volumeMounts:
        - name: config
          mountPath: /config
      volumes:
      - name: config
        configMap:
          name: bzzz-agent-config

---
# deployments/kubernetes/hap-deployment.yaml  
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bzzz-hap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bzzz-hap
  template:
    metadata:
      labels:
        app: bzzz-hap
        binary-type: hap
    spec:
      containers:
      - name: bzzz-hap
        image: bzzz:latest
        command: ["./bzzz-hap"]
        ports:
        - containerPort: 8090
        - containerPort: 8091
        env:
        - name: BZZZ_CONFIG_PATH
          value: "/config/hap-config.yaml"
        volumeMounts:
        - name: config
          mountPath: /config
      volumes:
      - name: config
        configMap:
          name: bzzz-hap-config

12. Risk Analysis and Mitigation

12.1 Identified Architectural Risks

| Risk Category | Risk Description | Impact | Probability | Mitigation Strategy |
|---|---|---|---|---|
| Configuration Drift | Agent and HAP configs diverge, causing P2P incompatibility | High | Medium | Shared config validation, integration tests |
| Port Conflicts | Both binaries try to use same ports when co-deployed | Medium | High | Binary-specific default ports, config validation |
| Shared State Race Conditions | Concurrent access to DHT/PubSub from both binaries | High | Medium | Proper locking, message deduplication |
| P2P Identity Collision | Same agent ID used by both binaries simultaneously | High | Low | Startup checks, unique binary identifiers |
| Resource Contention | Memory/CPU competition when both binaries run on same node | Medium | Medium | Resource monitoring, circuit breakers |
| Deployment Complexity | Increased operational complexity with two binaries | Medium | High | Docker multi-stage builds, K8s deployments |

12.2 Specific Mitigation Strategies

Configuration Validation

// internal/common/runtime/validation.go
func ValidateMultiBinaryDeployment(agentConfig, hapConfig *config.Config) error {
    validators := []func(*config.Config, *config.Config) error{
        validateP2PCompatibility,
        validatePortAssignments, 
        validateAgentIdentities,
        validateEncryptionKeys,
    }
    
    for _, validator := range validators {
        if err := validator(agentConfig, hapConfig); err != nil {
            return err
        }
    }
    return nil
}

Startup Collision Detection

// internal/common/runtime/collision.go
func CheckForRunningInstance(agentID string, binaryType BinaryType) error {
    lockFile := fmt.Sprintf("/tmp/bzzz-%s-%s.lock", agentID, binaryType)
    
    if _, err := os.Stat(lockFile); err == nil {
        return fmt.Errorf("instance already running: %s %s", binaryType, agentID)
    }
    
    // Create lock file
    return os.WriteFile(lockFile, []byte(fmt.Sprintf("%d", os.Getpid())), 0644)
}
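
CheckForRunningInstance leaves two gaps: the lock must be released on graceful shutdown, and a lock left behind by a crashed process should not block a restart forever. A hedged sketch of both follows; the helper names are illustrative, and the code needs the fmt, os, strconv, strings, and syscall packages.

// internal/common/runtime/collision.go (illustrative sketch)
func ReleaseInstanceLock(agentID string, binaryType BinaryType) error {
    lockFile := fmt.Sprintf("/tmp/bzzz-%s-%s.lock", agentID, binaryType)
    if err := os.Remove(lockFile); err != nil && !os.IsNotExist(err) {
        return fmt.Errorf("releasing instance lock: %w", err)
    }
    return nil
}

// isStaleLock reports whether the PID recorded in the lock file is no longer running.
func isStaleLock(lockFile string) bool {
    data, err := os.ReadFile(lockFile)
    if err != nil {
        return false
    }
    pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
    if err != nil {
        return true // unreadable contents: treat as stale
    }
    proc, err := os.FindProcess(pid)
    if err != nil {
        return true
    }
    // On Unix, signal 0 probes liveness without delivering a signal.
    return proc.Signal(syscall.Signal(0)) != nil
}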

13. Performance Considerations

13.1 Memory Usage Optimization

// internal/common/runtime/optimization.go
type ResourceOptimizer struct {
    binaryType    BinaryType
    maxMemoryMB   int64
    gcPercent     int
}

func (o *ResourceOptimizer) OptimizeForBinary() {
    switch o.binaryType {
    case BinaryTypeAgent:
        // Agent needs more memory for task processing
        debug.SetGCPercent(100)
        debug.SetMemoryLimit(o.maxMemoryMB * 1024 * 1024)
    case BinaryTypeHAP:
        // HAP can be more memory conservative
        debug.SetGCPercent(50)
        debug.SetMemoryLimit((o.maxMemoryMB/2) * 1024 * 1024)
    }
}

13.2 P2P Message Optimization

// internal/common/runtime/p2p_optimization.go
func OptimizePubSubForBinary(ps *pubsub.PubSub, binaryType BinaryType) {
    switch binaryType {
    case BinaryTypeAgent:
        // Agents need fast task coordination
        ps.SetMessageTimeout(5 * time.Second)
        ps.SetMaxMessageSize(1024 * 1024) // 1MB
    case BinaryTypeHAP:
        // HAP can tolerate slower human-paced interaction
        ps.SetMessageTimeout(30 * time.Second)
        ps.SetMaxMessageSize(512 * 1024) // 512KB
    }
}

14. Success Metrics and Validation

14.1 Phase 1 Success Criteria

| Criteria | Measurement | Target | Validation Method |
|---|---|---|---|
| Build Success | Both binaries compile without errors | 100% | CI/CD pipeline |
| Runtime Compatibility | Agent maintains existing functionality | 100% | Regression test suite |
| P2P Interoperability | Both binaries join same mesh | 100% | Integration tests |
| Resource Isolation | No port/resource conflicts | 100% | Co-deployment tests |
| Configuration Validation | Invalid configs rejected | 100% | Unit tests |
| Graceful Shutdown | Clean shutdown under load | 100% | Stress tests |

14.2 Performance Benchmarks

// test/benchmarks/runtime_bench_test.go
func BenchmarkAgentStartup(b *testing.B) {
    for i := 0; i < b.N; i++ {
        ctx := context.Background()
        runtime := NewRuntime(logging.NewNullLogger())
        
        start := time.Now()
        _, err := runtime.Initialize(ctx, RuntimeConfig{
            BinaryType: BinaryTypeAgent,
            ConfigPath: "testdata/agent_config.yaml",
        })
        duration := time.Since(start)
        
        if err != nil {
            b.Fatalf("initialization failed: %v", err)
        }
        
        // Target: < 5 seconds startup
        if duration > 5*time.Second {
            b.Errorf("startup too slow: %v", duration)
        }
    }
}

15. Implementation Roadmap

15.1 Development Phases

Phase 1.1: Infrastructure Setup (Week 1)

  • Create cmd/ directory structure
  • Create internal/common/runtime/ package structure
  • Move existing main.go to cmd/agent/main.go
  • Update Makefile for dual-binary builds
  • Basic smoke tests for both binaries

Phase 1.2: Runtime Extraction (Week 2)

  • Extract shared initialization logic to runtime/services.go
  • Extract configuration loading to runtime/config.go
  • Extract P2P initialization to runtime/p2p.go
  • Extract health monitoring to runtime/monitoring.go
  • Comprehensive unit tests for runtime package

Phase 1.3: HAP Binary Implementation (Week 3)

  • Implement cmd/hap/main.go
  • Create stub HAP interface in internal/hap/
  • Implement basic terminal interaction
  • P2P mesh participation tests
  • Message send/receive validation

Phase 1.4: Integration & Validation (Week 4)

  • Dual-binary integration tests
  • Performance benchmarking
  • Resource conflict validation
  • Documentation updates
  • Deployment guide creation

15.2 Testing Strategy

# Phase 1 Testing Commands
make test-unit                    # Unit tests for all packages
make test-integration            # Integration tests between binaries  
make test-performance           # Performance benchmarks
make test-deployment            # Docker/K8s deployment tests
make test-regression           # Ensure existing functionality unchanged

16. Documentation Requirements

16.1 Developer Documentation

  • Architecture Overview: Updated system architecture diagrams
  • API Reference: Runtime service interfaces and contracts
  • Configuration Guide: Binary-specific configuration examples
  • Testing Guide: How to test dual-binary scenarios
  • Troubleshooting: Common issues and solutions

16.2 Operations Documentation

  • Deployment Guide: Docker and Kubernetes deployment patterns
  • Monitoring Setup: Health check endpoints and metrics
  • Performance Tuning: Resource optimization recommendations
  • Security Configuration: Role-based access control setup

17. Conclusion

This technical specification provides a comprehensive blueprint for transforming BZZZ from a monolithic single-binary system into a dual-binary architecture that supports both autonomous agents and human agent portals. The design maintains all existing functionality while enabling new human interaction capabilities through a shared runtime infrastructure.

Key Benefits:

  • Zero Regression: Autonomous agents retain 100% existing functionality
  • Shared Infrastructure: Maximum code reuse and consistency
  • Operational Flexibility: Deploy agents and HAP independently
  • Future Extensibility: Architecture supports additional binary types

Implementation Priority: This specification focuses on Phase 1 structural reorganization, which is marked as HIGH PRIORITY in the HAP Action Plan. Successful completion of Phase 1 will provide a solid foundation for subsequent phases that add sophisticated human interaction features.

The architecture balances complexity with maintainability, ensuring that the dual-binary system is operationally manageable while providing the flexibility needed for human-agent collaboration in the BZZZ ecosystem.