Package: discovery
Location: /home/tony/chorus/project-queues/active/CHORUS/discovery/
Overview
The discovery package provides mDNS-based peer discovery for automatic detection and connection of CHORUS agents on the local network. It enables zero-configuration peer discovery using multicast DNS (mDNS), allowing agents to find and connect to each other without manual configuration or central coordination.
Architecture
mDNS Overview
Multicast DNS (mDNS) is a protocol that resolves hostnames to IP addresses within small networks that do not include a local name server. It uses:
- Multicast IP: 224.0.0.251 (IPv4) or FF02::FB (IPv6)
- UDP Port: 5353
- Service Discovery: Advertises and discovers services on the local network
CHORUS Service Tag
Default Service Name: "CHORUS-peer-discovery"
This service tag identifies CHORUS peers on the network. All CHORUS agents advertise themselves with this tag and listen for other agents using the same tag.
Core Components
MDNSDiscovery
Main structure managing mDNS discovery operations.
type MDNSDiscovery struct {
    host        host.Host                 // libp2p host
    service     mdns.Service              // mDNS service
    notifee     *mdnsNotifee             // Peer notification handler
    ctx         context.Context           // Discovery context
    cancel      context.CancelFunc        // Context cancellation
    serviceTag  string                    // Service name (default: "CHORUS-peer-discovery")
}
Key Responsibilities:
- Advertise local agent as mDNS service
- Listen for mDNS announcements from other agents
- Automatically connect to discovered peers
- Handle peer connection lifecycle
mdnsNotifee
Internal notification handler for discovered peers.
type mdnsNotifee struct {
    h         host.Host                // libp2p host
    ctx       context.Context          // Context for operations
    peersChan chan peer.AddrInfo       // Channel for discovered peers (buffer: 10)
}
Implements the mDNS notification interface to receive peer discovery events.
Discovery Flow
1. Service Initialization
discovery, err := NewMDNSDiscovery(ctx, host, "CHORUS-peer-discovery")
if err != nil {
    return fmt.Errorf("failed to start mDNS discovery: %w", err)
}
Initialization Steps:
- Create discovery context with cancellation
- Initialize mdnsNotifee with peer channel
- Create mDNS service with service tag
- Start mDNS service (begins advertising and listening)
- Launch background peer connection handler
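The initialization steps above can be sketched with standard-library types only. This is a minimal illustration, not the CHORUS implementation: `peerInfo`, `sketchDiscovery`, `newSketchDiscovery`, and `startAndStop` are hypothetical stand-ins, and the libp2p host and mDNS service are left out.

```go
package main

import (
	"context"
	"fmt"
)

// peerInfo is a hypothetical stand-in for libp2p's peer.AddrInfo.
type peerInfo struct {
	ID    string
	Addrs []string
}

// sketchDiscovery mirrors the documented fields, minus the libp2p host and mDNS service.
type sketchDiscovery struct {
	ctx       context.Context
	cancel    context.CancelFunc
	peersChan chan peerInfo
	done      chan struct{} // closed when the background handler exits
}

// newSketchDiscovery derives a cancellable context, creates the buffered
// peer channel, and launches the background handler.
func newSketchDiscovery(parent context.Context) *sketchDiscovery {
	ctx, cancel := context.WithCancel(parent)
	d := &sketchDiscovery{
		ctx:       ctx,
		cancel:    cancel,
		peersChan: make(chan peerInfo, 10),
		done:      make(chan struct{}),
	}
	go d.handleDiscoveredPeers()
	return d
}

// handleDiscoveredPeers drains the channel until the context is cancelled.
func (d *sketchDiscovery) handleDiscoveredPeers() {
	defer close(d.done)
	for {
		select {
		case <-d.ctx.Done():
			return
		case p := <-d.peersChan:
			fmt.Printf("discovered %s (%d addrs)\n", p.ID, len(p.Addrs))
		}
	}
}

// Close cancels the context and waits for the handler goroutine to stop.
func (d *sketchDiscovery) Close() {
	d.cancel()
	<-d.done
}

// startAndStop runs one full lifecycle and reports whether the handler exited.
func startAndStop() bool {
	d := newSketchDiscovery(context.Background())
	d.peersChan <- peerInfo{ID: "peer-a", Addrs: []string{"/ip4/192.168.1.27/tcp/4001"}}
	d.Close()
	select {
	case <-d.done:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("clean shutdown:", startAndStop())
}
```

Tying the goroutine's lifetime to a context created in the constructor is what lets a single `Close()` call stop the background work.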
2. Service Advertisement
When the service starts, it automatically advertises:
Service Type: _CHORUS-peer-discovery._udp.local
Port: libp2p host port
Addresses: All local IP addresses (IPv4 and IPv6)
This allows other CHORUS agents on the network to discover this peer.
3. Peer Discovery
Discovery Process:
1. mDNS Service listens for multicast announcements
   ├─ Receives service announcement from peer
   └─ Extracts peer.AddrInfo (ID + addresses)
2. mdnsNotifee.HandlePeerFound() called
   ├─ Peer info sent to peersChan
   └─ Non-blocking send (drops if channel full)
3. handleDiscoveredPeers() goroutine receives
   ├─ Skip if peer is self
   ├─ Skip if already connected
   └─ Attempt connection
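The non-blocking send in step 2 is the standard Go `select` with a `default` branch. The sketch below assumes a buffer of 10, as documented for `peersChan`; `peerInfo`, `offer`, and `countDropped` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// peerInfo is a hypothetical stand-in for peer.AddrInfo.
type peerInfo struct{ ID string }

// offer tries to enqueue a peer without blocking and reports whether it was accepted.
func offer(ch chan peerInfo, p peerInfo) bool {
	select {
	case ch <- p:
		return true
	default:
		// Channel full: drop the peer; a later announcement can deliver it again.
		return false
	}
}

// countDropped offers n peers to a channel with the given buffer and no consumer.
func countDropped(buffer, n int) int {
	ch := make(chan peerInfo, buffer)
	dropped := 0
	for i := 0; i < n; i++ {
		if !offer(ch, peerInfo{ID: fmt.Sprintf("peer-%d", i)}) {
			dropped++
		}
	}
	return dropped
}

func main() {
	// With a buffer of 10 and no reader, 12 offers leave 2 peers dropped.
	fmt.Println("dropped:", countDropped(10, 12))
}
```

Dropping instead of blocking keeps the mDNS callback fast, at the cost of waiting for the next announcement when the handler falls behind.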
4. Automatic Connection
func (d *MDNSDiscovery) handleDiscoveredPeers() {
    for {
        select {
        case <-d.ctx.Done():
            return
        case peerInfo := <-d.notifee.peersChan:
            // Skip self
            if peerInfo.ID == d.host.ID() {
                continue
            }
            // Check if already connected (network.Connected, from go-libp2p/core/network, has value 1)
            if d.host.Network().Connectedness(peerInfo.ID) == network.Connected {
                continue
            }
            // Attempt connection with timeout
            connectCtx, cancel := context.WithTimeout(d.ctx, 10*time.Second)
            err := d.host.Connect(connectCtx, peerInfo)
            cancel()
            if err != nil {
                fmt.Printf("❌ Failed to connect to peer %s: %v\n",
                          peerInfo.ID.ShortString(), err)
            } else {
                fmt.Printf("✅ Successfully connected to peer %s\n",
                          peerInfo.ID.ShortString())
            }
        }
    }
}
Connection Features:
- 10-second timeout per connection attempt
- Idempotent: Safe to attempt connection to already-connected peer
- Self-filtering: Ignores own mDNS announcements
- Duplicate filtering: Checks existing connections before attempting
- Non-blocking: Runs in background goroutine
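The per-attempt timeout listed above can be exercised in isolation. In this sketch `dialFunc` is a hypothetical stand-in for `host.Connect`, and `slowDial` simulates a peer that answers after a fixed delay; neither is part of libp2p or CHORUS.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// dialFunc is a hypothetical stand-in for host.Connect.
type dialFunc func(ctx context.Context) error

// connectWithTimeout bounds one connection attempt and releases the timer when done.
func connectWithTimeout(parent context.Context, timeout time.Duration, dial dialFunc) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return dial(ctx)
}

// slowDial simulates a peer that takes dialDelay to answer.
func slowDial(dialDelay time.Duration) dialFunc {
	return func(ctx context.Context) error {
		select {
		case <-time.After(dialDelay):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// simulateConnect runs one attempt with the given timeout against a simulated peer.
func simulateConnect(timeout, dialDelay time.Duration) error {
	return connectWithTimeout(context.Background(), timeout, slowDial(dialDelay))
}

func main() {
	fmt.Println("fast peer:", simulateConnect(200*time.Millisecond, time.Millisecond))
	fmt.Println("slow peer:", simulateConnect(10*time.Millisecond, 200*time.Millisecond))
}
```

The slow peer returns `context deadline exceeded`, the same error text shown in the connection failure log under Error Handling.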
Usage
Basic Usage
import (
    "context"
    "fmt"
    "chorus/discovery"
    "github.com/libp2p/go-libp2p/core/host"
)
func setupDiscovery(ctx context.Context, h host.Host) (*discovery.MDNSDiscovery, error) {
    // Start mDNS discovery with default service tag
    disc, err := discovery.NewMDNSDiscovery(ctx, h, "")
    if err != nil {
        return nil, err
    }
    fmt.Println("🔍 mDNS discovery started")
    return disc, nil
}
Custom Service Tag
// Use custom service tag for specific environments
disc, err := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-dev-network")
if err != nil {
    return nil, err
}
Monitoring Discovered Peers
// Access peer channel for custom handling
peersChan := disc.PeersChan()
go func() {
    for peerInfo := range peersChan {
        fmt.Printf("🔍 Discovered peer: %s with %d addresses\n",
                  peerInfo.ID.ShortString(),
                  len(peerInfo.Addrs))
        // Custom peer processing
        handleNewPeer(peerInfo)
    }
}()
Graceful Shutdown
// Close discovery service
if err := disc.Close(); err != nil {
    log.Printf("Error closing discovery: %v", err)
}
Peer Information Structure
peer.AddrInfo
Discovered peers are represented as libp2p peer.AddrInfo:
type AddrInfo struct {
    ID    peer.ID           // Unique peer identifier
    Addrs []multiaddr.Multiaddr  // Peer addresses
}
Example Multiaddresses:
/ip4/192.168.1.100/tcp/4001/p2p/QmPeerID...
/ip6/fe80::1/tcp/4001/p2p/QmPeerID...
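To make the address layout concrete, the sketch below splits a multiaddress string into protocol/value pairs. It is a simplified illustration that assumes every protocol carries exactly one value, which holds for the examples above but not for all multiaddr protocols; real code would normally rely on the go-multiaddr library. `component` and `splitMultiaddr` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// component is one protocol/value pair of a multiaddress.
type component struct{ Protocol, Value string }

// splitMultiaddr splits "/proto/value/proto/value..." into pairs.
// Simplification: it assumes each protocol has exactly one value.
func splitMultiaddr(s string) ([]component, error) {
	if !strings.HasPrefix(s, "/") {
		return nil, fmt.Errorf("multiaddr must start with '/': %q", s)
	}
	parts := strings.Split(strings.TrimPrefix(s, "/"), "/")
	if len(parts)%2 != 0 {
		return nil, fmt.Errorf("expected protocol/value pairs in %q", s)
	}
	out := make([]component, 0, len(parts)/2)
	for i := 0; i < len(parts); i += 2 {
		out = append(out, component{Protocol: parts[i], Value: parts[i+1]})
	}
	return out, nil
}

func main() {
	comps, err := splitMultiaddr("/ip4/192.168.1.100/tcp/4001")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range comps {
		fmt.Printf("%s = %s\n", c.Protocol, c.Value)
	}
}
```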
Network Configuration
Firewall Requirements
Discovery and the resulting peer connections require the following ports to be open:
- UDP 5353: mDNS multicast
- TCP/UDP 4001 (or configured libp2p port): libp2p connections
Network Scope
mDNS operates on local network only:
- Same subnet required for discovery
- Does not traverse routers (by design)
- Ideal for LAN-based agent clusters
Multicast Group
mDNS uses standard multicast groups:
- IPv4: 224.0.0.251
- IPv6: FF02::FB
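Both mDNS groups are link-local multicast addresses, which is why announcements stay on the local segment and are not forwarded by routers. A small check using only the Go standard library (`isLinkLocalMulticast` is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"net"
)

// isLinkLocalMulticast reports whether addr parses as a link-local multicast IP.
func isLinkLocalMulticast(addr string) bool {
	ip := net.ParseIP(addr)
	return ip != nil && ip.IsLinkLocalMulticast()
}

func main() {
	for _, addr := range []string{"224.0.0.251", "ff02::fb", "192.168.1.27"} {
		fmt.Printf("%-13s link-local multicast: %v\n", addr, isLinkLocalMulticast(addr))
	}
}
```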
Integration with CHORUS
Cluster Formation
mDNS discovery enables automatic cluster formation:
Startup Sequence:
1. Agent starts with libp2p host
2. mDNS discovery initialized
3. Agent advertises itself via mDNS
4. Agent listens for other agents
5. Auto-connects to discovered peers
6. PubSub gossip network forms
7. Task coordination begins
Multi-Node Cluster Example
Network: 192.168.1.0/24
Node 1 (walnut):     192.168.1.27  - Agent: backend-dev
Node 2 (ironwood):   192.168.1.72  - Agent: frontend-dev
Node 3 (rosewood):   192.168.1.113 - Agent: devops-specialist
Discovery Flow:
1. All nodes start with CHORUS-peer-discovery tag
2. Each node multicasts to 224.0.0.251:5353
3. All nodes receive each other's announcements
4. Automatic connection establishment:
   walnut ↔ ironwood
   walnut ↔ rosewood
   ironwood ↔ rosewood
5. Full mesh topology formed
6. PubSub topics synchronized
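The full mesh in step 4 is every unordered pair of nodes, so n nodes form n(n-1)/2 connections. The sketch below enumerates the pairs for the three example nodes; `meshLinks` is a hypothetical helper, not part of CHORUS.

```go
package main

import "fmt"

// meshLinks lists every unordered pair of nodes, i.e. the links of a full mesh.
func meshLinks(nodes []string) [][2]string {
	var links [][2]string
	for i := 0; i < len(nodes); i++ {
		for j := i + 1; j < len(nodes); j++ {
			links = append(links, [2]string{nodes[i], nodes[j]})
		}
	}
	return links
}

func main() {
	links := meshLinks([]string{"walnut", "ironwood", "rosewood"})
	for _, l := range links {
		fmt.Printf("%s <-> %s\n", l[0], l[1])
	}
	fmt.Println("links:", len(links)) // 3 nodes -> 3 links
}
```

Because the link count grows quadratically (10 nodes already need 45 connections), connection manager limits become relevant as a LAN cluster grows.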
Error Handling
Service Start Failure
disc, err := discovery.NewMDNSDiscovery(ctx, h, serviceTag)
if err != nil {
    // Common causes:
    // - Port 5353 already in use
    // - Insufficient permissions (multicast access is required)
    // - Network interface unavailable
    return fmt.Errorf("failed to start mDNS discovery: %w", err)
}
Connection Failures
Connection failures are logged but do not stop the discovery process:
❌ Failed to connect to peer Qm... : context deadline exceeded
Common Causes:
- Peer behind firewall
- Network congestion
- Peer offline/restarting
- Connection limit reached
Behavior: Discovery continues; the connection is retried on the next mDNS announcement from that peer.
Channel Full
If peer discovery is faster than connection handling:
⚠️ Discovery channel full, skipping peer Qm...
Buffer Size: 10 peers
Mitigation: Non-critical; the peer will be rediscovered on the next announcement cycle.
Performance Characteristics
Discovery Latency
- Initial Advertisement: ~1-2 seconds after service start
- Discovery Response: Typically < 1 second on LAN
- Connection Establishment: 1-10 seconds (with 10s timeout)
- Re-announcement: Periodic (standard mDNS timing)
Resource Usage
- Memory: Minimal (~1MB per discovery service)
- CPU: Very low (event-driven)
- Network: Minimal (periodic multicast announcements)
- Concurrent Connections: Handled by libp2p connection manager
Configuration Options
Service Tag Customization
// The calls below are alternatives: use one tag per deployment.
// Only agents that share a tag discover each other.
// Production environment
disc, err := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-production")
// Development environment
disc, err := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-dev")
// Testing environment
disc, err := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-test")
Use Case: Isolate environments on same physical network.
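One way to select the tag at startup is to derive it from an environment name. This is a sketch under assumptions: the `CHORUS_ENV` variable and the `serviceTagFor` helper are hypothetical and not part of the discovery package; only the tag strings come from the examples above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultServiceTag is the documented default tag.
const defaultServiceTag = "CHORUS-peer-discovery"

// serviceTagFor maps an environment name to a service tag,
// falling back to the default when no environment is given.
func serviceTagFor(env string) string {
	env = strings.ToLower(strings.TrimSpace(env))
	if env == "" {
		return defaultServiceTag
	}
	return "CHORUS-" + env
}

func main() {
	// CHORUS_ENV is a hypothetical variable name used for this sketch.
	tag := serviceTagFor(os.Getenv("CHORUS_ENV"))
	fmt.Println("service tag:", tag)
}
```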
Connection Timeout Adjustment
The connection timeout is currently hardcoded to 10 seconds. To change it, edit the value in handleDiscoveredPeers():
// In handleDiscoveredPeers():
connectTimeout := 30 * time.Second  // Longer for slow networks
connectCtx, cancel := context.WithTimeout(d.ctx, connectTimeout)
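If the timeout were made configurable, the value could be parsed from a string with a safe fallback to the current 10-second default. This is a sketch of one possible approach, not an existing option of the package; `parseConnectTimeout` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// defaultConnectTimeout matches the documented hardcoded value.
const defaultConnectTimeout = 10 * time.Second

// parseConnectTimeout returns the parsed duration, or the default
// when the input is empty, invalid, or not positive.
func parseConnectTimeout(raw string) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return defaultConnectTimeout
	}
	return d
}

func main() {
	for _, raw := range []string{"30s", "", "-5s", "1m30s"} {
		fmt.Printf("%q -> %v\n", raw, parseConnectTimeout(raw))
	}
}
```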
Advanced Usage
Custom Peer Handling
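To apply custom logic, read discovered peers from the channel returned by PeersChan(). Note that if PeersChan() exposes the same channel that handleDiscoveredPeers() reads from (the struct above holds a single peersChan), each peer is delivered to only one of the two readers, so this pattern competes with automatic connection rather than fully replacing it: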
// Subscribe to peer channel
peersChan := disc.PeersChan()
go func() {
    for peerInfo := range peersChan {
        // Custom filtering
        if shouldConnectToPeer(peerInfo) {
            // Custom connection logic
            connectWithRetry(peerInfo)
        }
    }
}()
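The snippet above calls `connectWithRetry` without defining it. One plausible shape is a bounded retry with exponential backoff, sketched here with a stand-in dial function; `retryWithBackoff`, `flakyDial`, and `errUnreachable` are hypothetical names, and the real helper would wrap `host.Connect`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnreachable is a simulated dial failure.
var errUnreachable = errors.New("peer unreachable")

// retryWithBackoff calls attempt up to maxAttempts times,
// doubling the wait after each failed attempt.
func retryWithBackoff(maxAttempts int, initialWait time.Duration, attempt func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	wait := initialWait
	for i := 1; i <= maxAttempts; i++ {
		if err = attempt(); err == nil {
			return nil
		}
		if i < maxAttempts {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

// flakyDial fails the first `failures` calls, then succeeds.
func flakyDial(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errUnreachable
		}
		return nil
	}
}

func main() {
	fmt.Println("recovers:", retryWithBackoff(3, 10*time.Millisecond, flakyDial(2)))
	fmt.Println("gives up:", retryWithBackoff(3, 10*time.Millisecond, flakyDial(5)))
}
```

Bounding the attempts matters here because mDNS re-announcements already provide a slower, unbounded retry path.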
Discovery Metrics
type DiscoveryMetrics struct {
    PeersDiscovered    int
    ConnectionsSuccess int
    ConnectionsFailed  int
    LastDiscovery      time.Time
}
// Track metrics (guard with a mutex or atomics if updated from several goroutines)
var metrics DiscoveryMetrics
// In handleDiscoveredPeers():
metrics.PeersDiscovered++
if err := d.host.Connect(connectCtx, peerInfo); err != nil {
    metrics.ConnectionsFailed++
} else {
    metrics.ConnectionsSuccess++
}
metrics.LastDiscovery = time.Now()
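If these counters are read from another goroutine (for example a metrics endpoint) while the handler updates them, they need synchronization. A minimal mutex-guarded variant, using hypothetical names (`discoveryMetrics`, `record`, `snapshot`, `simulate`) and simulated connection results:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// discoveryMetrics guards the counters with a mutex.
type discoveryMetrics struct {
	mu                 sync.Mutex
	peersDiscovered    int
	connectionsSuccess int
	connectionsFailed  int
	lastDiscovery      time.Time
}

// record notes one discovered peer and the outcome of its connection attempt.
func (m *discoveryMetrics) record(connectErr error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.peersDiscovered++
	if connectErr != nil {
		m.connectionsFailed++
	} else {
		m.connectionsSuccess++
	}
	m.lastDiscovery = time.Now()
}

// snapshot returns the counters as (discovered, succeeded, failed).
func (m *discoveryMetrics) snapshot() (int, int, int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.peersDiscovered, m.connectionsSuccess, m.connectionsFailed
}

// simulate records n outcomes from concurrent goroutines; every third one fails.
func simulate(n int) (int, int, int) {
	var m discoveryMetrics
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			var err error
			if i%3 == 0 {
				err = fmt.Errorf("simulated failure %d", i)
			}
			m.record(err)
		}(i)
	}
	wg.Wait()
	return m.snapshot()
}

func main() {
	d, ok, failed := simulate(9)
	fmt.Printf("discovered=%d success=%d failed=%d\n", d, ok, failed)
}
```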
Comparison with Other Discovery Methods
mDNS vs DHT
| Feature | mDNS | DHT (Kademlia) | 
|---|---|---|
| Network Scope | Local network only | Global | 
| Setup | Zero-config | Requires bootstrap nodes | 
| Speed | Very fast (< 1s) | Slower (seconds to minutes) | 
| Privacy | Local only | Public network | 
| Reliability | High on LAN | Depends on DHT health | 
| Use Case | LAN clusters | Internet-wide P2P | 
CHORUS Choice: mDNS for local agent clusters; a DHT could be added for internet-wide coordination.
mDNS vs Bootstrap List
| Feature | mDNS | Bootstrap List | 
|---|---|---|
| Configuration | None | Manual list | 
| Maintenance | Automatic | Manual updates | 
| Scalability | Limited to LAN | Unlimited | 
| Flexibility | Dynamic | Static | 
| Failure Handling | Auto-discovery | Manual intervention | 
CHORUS Choice: mDNS for local discovery, bootstrap list as fallback.
libp2p Integration
Host Requirement
mDNS discovery requires a libp2p host:
import (
    "chorus/discovery"
    "github.com/libp2p/go-libp2p"
)
// Create libp2p host
h, err := libp2p.New(
    libp2p.ListenAddrStrings(
        "/ip4/0.0.0.0/tcp/4001",
        "/ip6/::/tcp/4001",
    ),
)
if err != nil {
    return err
}
// Initialize mDNS discovery with host
disc, err := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-peer-discovery")
Connection Manager Integration
mDNS discovery works with libp2p connection manager:
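// Assumes connmgr is github.com/libp2p/go-libp2p/p2p/net/connmgr,
// whose constructor returns an error and takes the grace period as an option.
cm, err := connmgr.NewConnManager(
    100, // Low water mark
    400, // High water mark
    connmgr.WithGracePeriod(time.Minute),
)
if err != nil {
    return err
}
h, err := libp2p.New(
    libp2p.ListenAddrStrings("/ip4/0.0.0.0/tcp/4001"),
    libp2p.ConnectionManager(cm),
)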
// mDNS-discovered connections managed by connection manager
disc, err := discovery.NewMDNSDiscovery(ctx, h, "")
Security Considerations
Trust Model
mDNS operates on local network trust:
- Assumes local network is trusted
- No authentication at mDNS layer
- Authentication handled by libp2p security transport
Attack Vectors
- Peer ID Spoofing: Mitigated by libp2p peer ID verification
- DoS via Fake Peers: Limited by channel buffer and connection timeout
- Network Snooping: mDNS announcements are plaintext (by design)
Best Practices
- Use libp2p Security: TLS or Noise transport for encrypted connections
- Peer Authentication: Verify peer identities after connection
- Network Isolation: Deploy on trusted networks
- Connection Limits: Use libp2p connection manager
- Monitoring: Log all discovery and connection events
Troubleshooting
No Peers Discovered
Symptoms: Service starts but no peers found.
Checks:
- Verify all agents on same subnet
- Check firewall rules (UDP 5353)
- Verify mDNS/multicast not blocked by network
- Check service tag matches across agents
- Verify no mDNS conflicts with other services
Connection Failures
Symptoms: Peers discovered but connections fail.
Checks:
- Verify libp2p port open (default: TCP 4001)
- Check connection manager limits
- Verify peer addresses are reachable
- Check for NAT/firewall between peers
- Verify sufficient system resources (file descriptors, memory)
High CPU/Network Usage
Symptoms: Excessive mDNS traffic or CPU usage.
Causes:
- Rapid peer restarts (re-announcements)
- Many peers on network
- Short announcement intervals
Solutions:
- Implement connection caching
- Adjust mDNS announcement timing
- Use connection limits
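The "connection caching" suggestion could take the form of a small cache that suppresses repeated dial attempts to the same peer within a time window. This is a sketch of one possible design, not an existing CHORUS feature; `attemptCache`, `shouldAttempt`, and the 30-second window are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// attemptCache remembers when each peer was last dialed so repeated
// announcements inside the TTL window do not trigger new attempts.
type attemptCache struct {
	mu   sync.Mutex
	ttl  time.Duration
	last map[string]time.Time
}

func newAttemptCache(ttl time.Duration) *attemptCache {
	return &attemptCache{ttl: ttl, last: make(map[string]time.Time)}
}

// shouldAttempt reports whether peerID may be dialed at time now,
// and records the attempt when it returns true.
func (c *attemptCache) shouldAttempt(peerID string, now time.Time) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if t, ok := c.last[peerID]; ok && now.Sub(t) < c.ttl {
		return false
	}
	c.last[peerID] = now
	return true
}

// demo replays announcements from one peer at 0s, 5s, and 40s with a 30s TTL.
func demo() []bool {
	c := newAttemptCache(30 * time.Second)
	start := time.Unix(0, 0)
	var out []bool
	for _, offset := range []time.Duration{0, 5 * time.Second, 40 * time.Second} {
		out = append(out, c.shouldAttempt("peer-a", start.Add(offset)))
	}
	return out
}

func main() {
	fmt.Println(demo()) // [true false true]
}
```

Passing the clock in as a parameter keeps the logic testable; a long-running agent would also need to evict old entries so the map does not grow without bound.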
Monitoring and Debugging
Discovery Events
// Log all discovery events
disc, err := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-peer-discovery")
if err != nil {
    return err
}
peersChan := disc.PeersChan()
go func() {
    for peerInfo := range peersChan {
        logger.Info("Discovered peer",
            "peer_id", peerInfo.ID.String(),
            "addresses", peerInfo.Addrs,
            "timestamp", time.Now())
    }
}()
Connection Status
// Monitor connection status until ctx is cancelled
func monitorConnections(ctx context.Context, h host.Host) {
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            peers := h.Network().Peers()
            fmt.Printf("📊 Connected to %d peers: %v\n",
                      len(peers), peers)
        }
    }
}
See Also
- coordinator/ - Task coordination using discovered peers
- pubsub/ - PubSub over discovered peer network
- internal/runtime/ - Runtime initialization with discovery
- libp2p Documentation - libp2p concepts and APIs
- mDNS RFC 6762 - mDNS protocol specification