Claude Code 131868bdca feat: Production readiness improvements for WHOOSH council formation
Major security, observability, and configuration improvements:

## Security Hardening
- Implemented configurable CORS (no more wildcards)
- Added comprehensive auth middleware for admin endpoints
- Enhanced webhook HMAC validation
- Added input validation and rate limiting
- Security headers and CSP policies

## Configuration Management
- Made N8N webhook URL configurable (WHOOSH_N8N_BASE_URL)
- Replaced all hardcoded endpoints with environment variables
- Added feature flags for LLM vs heuristic composition
- Gitea fetch hardening with EAGER_FILTER and FULL_RESCAN options

## API Completeness
- Implemented GetCouncilComposition function
- Added GET /api/v1/councils/{id} endpoint
- Council artifacts API (POST/GET /api/v1/councils/{id}/artifacts)
- /admin/health/details endpoint with component status
- Database lookup for repository URLs (no hardcoded fallbacks)

## Observability & Performance
- Added OpenTelemetry distributed tracing with goal/pulse correlation
- Performance optimization database indexes
- Comprehensive health monitoring
- Enhanced logging and error handling

## Infrastructure
- Production-ready P2P discovery (replaces mock implementation)
- Removed unused Redis configuration
- Enhanced Docker Swarm integration
- Added migration files for performance indexes

## Code Quality
- Comprehensive input validation
- Graceful error handling and failsafe fallbacks
- Backwards compatibility maintained
- Following security best practices

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-12 20:34:17 +10:00


// Package sdk provides the BACKBEAT Go SDK for enabling CHORUS services
// to become BACKBEAT-aware with beat synchronization and status emission.
//
// The BACKBEAT SDK enables services to:
//   - Subscribe to cluster-wide beat events with jitter tolerance
//   - Emit status claims with automatic metadata population
//   - Use beat budgets for timeout management
//   - Operate in local degradation mode when the pulse service is unavailable
//   - Integrate comprehensive observability and health reporting
//
// # Quick Start
//
//	config := sdk.DefaultConfig()
//	config.ClusterID = "chorus-dev"
//	config.AgentID = "my-service"
//	config.NATSUrl = "nats://localhost:4222"
//
//	client := sdk.NewClient(config)
//
//	client.OnBeat(func(beat sdk.BeatFrame) {
//		// Called every beat
//		client.EmitStatusClaim(sdk.StatusClaim{
//			State:    "executing",
//			Progress: 0.5,
//			Notes:    "Processing data",
//		})
//	})
//
//	ctx := context.Background()
//	client.Start(ctx)
//	defer client.Stop()
//
// # Beat Subscription
//
// Register callbacks for beat and downbeat events:
//
//	client.OnBeat(func(beat sdk.BeatFrame) {
//		// Called every beat (~1-4 times per second depending on tempo)
//		fmt.Printf("Beat %d\n", beat.BeatIndex)
//	})
//
//	client.OnDownbeat(func(beat sdk.BeatFrame) {
//		// Called at the start of each bar (every 4 beats typically)
//		fmt.Printf("Bar started: %s\n", beat.WindowID)
//	})
//
// # Status Emission
//
// Emit status claims to report current state and progress:
//
//	err := client.EmitStatusClaim(sdk.StatusClaim{
//		State:     "executing", // executing|planning|waiting|review|done|failed
//		BeatsLeft: 10,          // estimated beats remaining
//		Progress:  0.75,        // progress ratio (0.0-1.0)
//		Notes:     "Processing batch 5/10",
//	})
//
// # Beat Budgets
//
// Execute functions with beat-based timeouts:
//
//	err := client.WithBeatBudget(10, func() error {
//		// This function has 10 beats to complete
//		return performLongRunningTask()
//	})
//
//	if err != nil {
//		// Handle timeout or task error
//		log.Printf("Task failed or exceeded budget: %v", err)
//	}
//
// # Health and Observability
//
// Monitor client health and metrics:
//
//	health := client.Health()
//	fmt.Printf("Connected: %v\n", health.Connected)
//	fmt.Printf("Last Beat: %d\n", health.LastBeat)
//	fmt.Printf("Reconnects: %d\n", health.ReconnectCount)
//
// # Local Degradation
//
// The SDK automatically handles network issues by entering local degradation mode:
//   - Generates synthetic beats when the pulse service is unavailable
//   - Uses fallback timing to maintain callback schedules
//   - Automatically recovers when the pulse service returns
//   - Provides seamless operation during network partitions
//
// # Security
//
// The SDK implements BACKBEAT security requirements:
//   - Ed25519 signing of all status claims when a signing key is provided
//   - Required x-window-id and x-hlc headers
//   - Agent identification for proper message routing
//
// # Performance
//
// Designed for production use with:
//   - Beat callback latency target ≤5ms
//   - Timer drift ≤1% over 1 hour without a leader
//   - Goroutine-safe concurrent operations
//   - Bounded memory usage for metrics and errors
//
// # Examples
//
// See the examples subdirectory for complete usage patterns:
//   - examples/simple_agent.go: Basic integration
//   - examples/task_processor.go: Beat budget usage
//   - examples/service_monitor.go: Health monitoring
package sdk