# Package: discovery

**Location**: `/home/tony/chorus/project-queues/active/CHORUS/discovery/`

## Overview

The `discovery` package provides **mDNS-based peer discovery** for automatic detection and connection of CHORUS agents on the local network. It enables zero-configuration peer discovery using multicast DNS (mDNS), allowing agents to find and connect to each other without manual configuration or central coordination.

## Architecture

### mDNS Overview

Multicast DNS (mDNS) is a protocol that resolves hostnames to IP addresses within small networks that do not include a local name server. It uses:

- **Multicast IP**: 224.0.0.251 (IPv4) or FF02::FB (IPv6)
- **UDP Port**: 5353
- **Service Discovery**: Advertises and discovers services on the local network

### CHORUS Service Tag

**Default Service Name**: `"CHORUS-peer-discovery"`

This service tag identifies CHORUS peers on the network. All CHORUS agents advertise themselves with this tag and listen for other agents using the same tag.

## Core Components

### MDNSDiscovery

Main structure managing mDNS discovery operations.

```go
type MDNSDiscovery struct {
	host       host.Host          // libp2p host
	service    mdns.Service       // mDNS service
	notifee    *mdnsNotifee       // Peer notification handler
	ctx        context.Context    // Discovery context
	cancel     context.CancelFunc // Context cancellation
	serviceTag string             // Service name (default: "CHORUS-peer-discovery")
}
```

**Key Responsibilities:**

- Advertise local agent as mDNS service
- Listen for mDNS announcements from other agents
- Automatically connect to discovered peers
- Handle peer connection lifecycle

### mdnsNotifee

Internal notification handler for discovered peers.

```go
type mdnsNotifee struct {
	h         host.Host          // libp2p host
	ctx       context.Context    // Context for operations
	peersChan chan peer.AddrInfo // Channel for discovered peers (buffer: 10)
}
```

Implements the mDNS notification interface to receive peer discovery events.

## Discovery Flow

### 1. Service Initialization

```go
discovery, err := NewMDNSDiscovery(ctx, host, "CHORUS-peer-discovery")
if err != nil {
	return fmt.Errorf("failed to start mDNS discovery: %w", err)
}
```

**Initialization Steps:**

1. Create discovery context with cancellation
2. Initialize mdnsNotifee with peer channel
3. Create mDNS service with service tag
4. Start mDNS service (begins advertising and listening)
5. Launch background peer connection handler

### 2. Service Advertisement

When the service starts, it automatically advertises:

```
Service Type: _CHORUS-peer-discovery._udp.local
Port: libp2p host port
Addresses: All local IP addresses (IPv4 and IPv6)
```

This allows other CHORUS agents on the network to discover this peer.

### 3. Peer Discovery

**Discovery Process:**

```
1. mDNS Service listens for multicast announcements
   ├─ Receives service announcement from peer
   └─ Extracts peer.AddrInfo (ID + addresses)

2. mdnsNotifee.HandlePeerFound() called
   ├─ Peer info sent to peersChan
   └─ Non-blocking send (drops if channel full)

3. handleDiscoveredPeers() goroutine receives
   ├─ Skip if peer is self
   ├─ Skip if already connected
   └─ Attempt connection
```

### 4. Automatic Connection

```go
func (d *MDNSDiscovery) handleDiscoveredPeers() {
	for {
		select {
		case <-d.ctx.Done():
			return
		case peerInfo := <-d.notifee.peersChan:
			// Skip self
			if peerInfo.ID == d.host.ID() {
				continue
			}

			// Check if already connected (1 == network.Connected)
			if d.host.Network().Connectedness(peerInfo.ID) == 1 {
				continue
			}

			// Attempt connection with timeout
			connectCtx, cancel := context.WithTimeout(d.ctx, 10*time.Second)
			err := d.host.Connect(connectCtx, peerInfo)
			cancel()

			if err != nil {
				fmt.Printf("❌ Failed to connect to peer %s: %v\n",
					peerInfo.ID.ShortString(), err)
			} else {
				fmt.Printf("✅ Successfully connected to peer %s\n",
					peerInfo.ID.ShortString())
			}
		}
	}
}
```

**Connection Features:**

- **10-second timeout** per connection attempt
- **Idempotent**: Safe to attempt connection to already-connected peer
- **Self-filtering**: Ignores own mDNS announcements
- **Duplicate filtering**: Checks existing connections before attempting
- **Non-blocking**: Runs in background goroutine

## Usage

### Basic Usage

```go
import (
	"context"
	"fmt"

	"chorus/discovery"
	"github.com/libp2p/go-libp2p/core/host"
)

func setupDiscovery(ctx context.Context, h host.Host) (*discovery.MDNSDiscovery, error) {
	// Start mDNS discovery with default service tag
	disc, err := discovery.NewMDNSDiscovery(ctx, h, "")
	if err != nil {
		return nil, err
	}

	fmt.Println("🔍 mDNS discovery started")
	return disc, nil
}
```

### Custom Service Tag

```go
// Use custom service tag for specific environments
disc, err := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-dev-network")
if err != nil {
	return nil, err
}
```

### Monitoring Discovered Peers

```go
// Access peer channel for custom handling
peersChan := disc.PeersChan()

go func() {
	for peerInfo := range peersChan {
		fmt.Printf("🔍 Discovered peer: %s with %d addresses\n",
			peerInfo.ID.ShortString(), len(peerInfo.Addrs))

		// Custom peer processing
		handleNewPeer(peerInfo)
	}
}()
```

### Graceful Shutdown

```go
// Close discovery service
if err := disc.Close(); err != nil {
	log.Printf("Error closing discovery: %v", err)
}
```

## Peer Information Structure

### peer.AddrInfo

Discovered peers are represented as libp2p `peer.AddrInfo`:

```go
type AddrInfo struct {
	ID    peer.ID               // Unique peer identifier
	Addrs []multiaddr.Multiaddr // Peer addresses
}
```

**Example Multiaddresses:**

```
/ip4/192.168.1.100/tcp/4001/p2p/QmPeerID...
/ip6/fe80::1/tcp/4001/p2p/QmPeerID...
```

## Network Configuration

### Firewall Requirements

mDNS requires the following ports to be open:

- **UDP 5353**: mDNS multicast
- **TCP/UDP 4001** (or configured libp2p port): libp2p connections

### Network Scope

mDNS operates on the **local network** only:

- Same subnet required for discovery
- Does not traverse routers (by design)
- Ideal for LAN-based agent clusters

### Multicast Group

mDNS uses standard multicast groups:

- **IPv4**: 224.0.0.251
- **IPv6**: FF02::FB

## Integration with CHORUS

### Cluster Formation

mDNS discovery enables automatic cluster formation:

```
Startup Sequence:
1. Agent starts with libp2p host
2. mDNS discovery initialized
3. Agent advertises itself via mDNS
4. Agent listens for other agents
5. Auto-connects to discovered peers
6. PubSub gossip network forms
7. Task coordination begins
```

### Multi-Node Cluster Example

```
Network: 192.168.1.0/24

Node 1 (walnut):   192.168.1.27  - Agent: backend-dev
Node 2 (ironwood): 192.168.1.72  - Agent: frontend-dev
Node 3 (rosewood): 192.168.1.113 - Agent: devops-specialist

Discovery Flow:
1. All nodes start with CHORUS-peer-discovery tag
2. Each node multicasts to 224.0.0.251:5353
3. All nodes receive each other's announcements
4. Automatic connection establishment:
   walnut ↔ ironwood
   walnut ↔ rosewood
   ironwood ↔ rosewood
5. Full mesh topology formed
6. PubSub topics synchronized
```

## Error Handling

### Service Start Failure

```go
disc, err := discovery.NewMDNSDiscovery(ctx, h, serviceTag)
if err != nil {
	// Common causes:
	// - Port 5353 already in use
	// - Insufficient permissions (multicast access required)
	// - Network interface unavailable
	return fmt.Errorf("failed to start mDNS discovery: %w", err)
}
```

### Connection Failures

Connection failures are logged but do not stop the discovery process:

```
❌ Failed to connect to peer Qm... : context deadline exceeded
```

**Common Causes:**

- Peer behind firewall
- Network congestion
- Peer offline/restarting
- Connection limit reached

**Behavior**: Discovery continues; the connection is retried on the peer's next mDNS announcement.

### Channel Full

If peer discovery is faster than connection handling:

```
⚠️ Discovery channel full, skipping peer Qm...
```

**Buffer Size**: 10 peers

**Mitigation**: Non-critical; the peer will be rediscovered on the next announcement cycle.

## Performance Characteristics

### Discovery Latency

- **Initial Advertisement**: ~1-2 seconds after service start
- **Discovery Response**: Typically < 1 second on LAN
- **Connection Establishment**: 1-10 seconds (with 10s timeout)
- **Re-announcement**: Periodic (standard mDNS timing)

### Resource Usage

- **Memory**: Minimal (~1MB per discovery service)
- **CPU**: Very low (event-driven)
- **Network**: Minimal (periodic multicast announcements)
- **Concurrent Connections**: Handled by libp2p connection manager

## Configuration Options

### Service Tag Customization

```go
// Production environment
disc, _ := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-production")

// Development environment
disc, _ := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-dev")

// Testing environment
disc, _ := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-test")
```

**Use Case**: Isolate environments on the same physical network.

### Connection Timeout Adjustment

Currently hardcoded to 10 seconds.
To change it, edit the timeout in `handleDiscoveredPeers()`:

```go
// In handleDiscoveredPeers():
connectTimeout := 30 * time.Second // Longer for slow networks
connectCtx, cancel := context.WithTimeout(d.ctx, connectTimeout)
```

## Advanced Usage

### Custom Peer Handling

Bypass automatic connection and implement custom logic. Note that if `PeersChan()` exposes the same channel that `handleDiscoveredPeers()` reads from, each discovered peer is delivered to only one of the two readers.

```go
// Subscribe to peer channel
peersChan := disc.PeersChan()

go func() {
	for peerInfo := range peersChan {
		// Custom filtering
		if shouldConnectToPeer(peerInfo) {
			// Custom connection logic
			connectWithRetry(peerInfo)
		}
	}
}()
```

### Discovery Metrics

```go
type DiscoveryMetrics struct {
	PeersDiscovered    int
	ConnectionsSuccess int
	ConnectionsFailed  int
	LastDiscovery      time.Time
}

// Track metrics.
// Not goroutine-safe: guard with a mutex if read from other goroutines.
var metrics DiscoveryMetrics

// In handleDiscoveredPeers():
metrics.PeersDiscovered++
if err := d.host.Connect(connectCtx, peerInfo); err != nil {
	metrics.ConnectionsFailed++
} else {
	metrics.ConnectionsSuccess++
}
metrics.LastDiscovery = time.Now()
```

## Comparison with Other Discovery Methods

### mDNS vs DHT

| Feature | mDNS | DHT (Kademlia) |
|---------|------|----------------|
| Network Scope | Local network only | Global |
| Setup | Zero-config | Requires bootstrap nodes |
| Speed | Very fast (< 1s) | Slower (seconds to minutes) |
| Privacy | Local only | Public network |
| Reliability | High on LAN | Depends on DHT health |
| Use Case | LAN clusters | Internet-wide P2P |

**CHORUS Choice**: mDNS for local agent clusters; DHT could be added for internet-wide coordination.

### mDNS vs Bootstrap List

| Feature | mDNS | Bootstrap List |
|---------|------|----------------|
| Configuration | None | Manual list |
| Maintenance | Automatic | Manual updates |
| Scalability | Limited to LAN | Unlimited |
| Flexibility | Dynamic | Static |
| Failure Handling | Auto-discovery | Manual intervention |

**CHORUS Choice**: mDNS for local discovery, bootstrap list as fallback.
## libp2p Integration

### Host Requirement

mDNS discovery requires a libp2p host:

```go
import (
	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/host"
)

// Create libp2p host
h, err := libp2p.New(
	libp2p.ListenAddrStrings(
		"/ip4/0.0.0.0/tcp/4001",
		"/ip6/::/tcp/4001",
	),
)
if err != nil {
	return err
}

// Initialize mDNS discovery with host
disc, err := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-peer-discovery")
```

### Connection Manager Integration

mDNS discovery works with the libp2p connection manager:

```go
// connmgr is github.com/libp2p/go-libp2p/p2p/net/connmgr
cm, err := connmgr.NewConnManager(
	100, // Low water mark
	400, // High water mark
	connmgr.WithGracePeriod(time.Minute),
)
if err != nil {
	return err
}

h, err := libp2p.New(
	libp2p.ListenAddrStrings("/ip4/0.0.0.0/tcp/4001"),
	libp2p.ConnectionManager(cm),
)
if err != nil {
	return err
}

// mDNS-discovered connections are managed by the connection manager
disc, err := discovery.NewMDNSDiscovery(ctx, h, "")
```

## Security Considerations

### Trust Model

mDNS operates on **local network trust**:

- Assumes local network is trusted
- No authentication at mDNS layer
- Authentication handled by libp2p security transport

### Attack Vectors

1. **Peer ID Spoofing**: Mitigated by libp2p peer ID verification
2. **DoS via Fake Peers**: Limited by channel buffer and connection timeout
3. **Network Snooping**: mDNS announcements are plaintext (by design)

### Best Practices

1. **Use libp2p Security**: TLS or Noise transport for encrypted connections
2. **Peer Authentication**: Verify peer identities after connection
3. **Network Isolation**: Deploy on trusted networks
4. **Connection Limits**: Use libp2p connection manager
5. **Monitoring**: Log all discovery and connection events

## Troubleshooting

### No Peers Discovered

**Symptoms**: Service starts but no peers found.

**Checks:**

1. Verify all agents are on the same subnet
2. Check firewall rules (UDP 5353)
3. Verify mDNS/multicast is not blocked by the network
4. Check that the service tag matches across agents
5. Verify there are no mDNS conflicts with other services

### Connection Failures

**Symptoms**: Peers discovered but connections fail.

**Checks:**

1. Verify the libp2p port is open (default: TCP 4001)
2. Check connection manager limits
3. Verify peer addresses are reachable
4. Check for NAT/firewall between peers
5. Verify sufficient system resources (file descriptors, memory)

### High CPU/Network Usage

**Symptoms**: Excessive mDNS traffic or CPU usage.

**Causes:**

- Rapid peer restarts (re-announcements)
- Many peers on network
- Short announcement intervals

**Solutions:**

- Implement connection caching
- Adjust mDNS announcement timing
- Use connection limits

## Monitoring and Debugging

### Discovery Events

```go
// Log all discovery events
disc, _ := discovery.NewMDNSDiscovery(ctx, h, "CHORUS-peer-discovery")

peersChan := disc.PeersChan()
go func() {
	for peerInfo := range peersChan {
		logger.Info("Discovered peer",
			"peer_id", peerInfo.ID.String(),
			"addresses", peerInfo.Addrs,
			"timestamp", time.Now())
	}
}()
```

### Connection Status

```go
// Monitor connection status
func monitorConnections(h host.Host) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		peers := h.Network().Peers()
		fmt.Printf("📊 Connected to %d peers: %v\n",
			len(peers), peers)
	}
}
```

## See Also

- [coordinator/](coordinator.md) - Task coordination using discovered peers
- [pubsub/](../pubsub.md) - PubSub over discovered peer network
- [internal/runtime/](../internal/runtime.md) - Runtime initialization with discovery
- [libp2p Documentation](https://docs.libp2p.io/) - libp2p concepts and APIs
- [mDNS RFC 6762](https://tools.ietf.org/html/rfc6762) - mDNS protocol specification