# CHORUS License Validation System

**Package**: `internal/licensing`
**Purpose**: KACHING license authority integration with fail-closed validation
**Critical**: License validation is **MANDATORY** at startup - invalid license = immediate exit

## Overview

The CHORUS licensing system enforces software licensing through integration with the KACHING license authority. The system implements a **fail-closed** security model: if license validation fails, CHORUS will not start. This ensures that all running instances are properly licensed and authorized.

### Key Components

- **Validator**: Core license validation with KACHING server communication
- **LicenseGate**: Enhanced validation with caching, circuit breaker, and cluster lease management
- **LicenseConfig**: Configuration structure for licensing parameters

### Security Model: FAIL-CLOSED

```
┌─────────────────────────────────────────────────────────────┐
│                       CHORUS STARTUP                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Load Configuration                                      │
│  2. Initialize Logger                                       │
│  3. ⚠️ VALIDATE LICENSE (CRITICAL GATE)                     │
│     │                                                       │
│     ├─── SUCCESS ──→ Continue startup                       │
│     │                                                       │
│     └─── FAILURE ──→ return error → IMMEDIATE EXIT          │
│                                                             │
│  4. Initialize AI Provider                                  │
│  5. Start P2P Network                                       │
│  6. ... rest of initialization                              │
│                                                             │
└─────────────────────────────────────────────────────────────┘

NO BYPASS: License validation cannot be skipped or bypassed.
```

---

## Architecture

### 1. LicenseConfig Structure

**Location**: `internal/licensing/validator.go` and `pkg/config/config.go`

```go
// Core licensing configuration
type LicenseConfig struct {
    LicenseID  string // Unique license identifier
    ClusterID  string // Cluster/deployment identifier
    KachingURL string // KACHING server URL
}

// Extended configuration in pkg/config
type LicenseConfig struct {
    LicenseID        string    `yaml:"license_id"`
    ClusterID        string    `yaml:"cluster_id"`
    OrganizationName string    `yaml:"organization_name"`
    KachingURL       string    `yaml:"kaching_url"`
    IsActive         bool      `yaml:"is_active"`
    LastValidated    time.Time `yaml:"last_validated"`
    GracePeriodHours int       `yaml:"grace_period_hours"`
    LicenseType      string    `yaml:"license_type"`
    ExpiresAt        time.Time `yaml:"expires_at"`
    MaxNodes         int       `yaml:"max_nodes"`
}
```

**Configuration Fields**:

| Field | Required | Purpose |
|-------|----------|---------|
| `LicenseID` | ✅ Yes | Unique identifier for the license |
| `ClusterID` | ✅ Yes | Identifies the cluster/deployment |
| `KachingURL` | No | KACHING server URL (defaults to `http://localhost:8083`) |
| `OrganizationName` | No | Organization name for tracking |
| `LicenseType` | No | Type of license (e.g., "enterprise", "developer") |
| `ExpiresAt` | No | License expiration timestamp |
| `MaxNodes` | No | Maximum nodes allowed in cluster |

---

## Validation Flow

### Standard Validation Sequence

```
┌──────────────────────────────────────────────────────────────────┐
│                     License Validation Flow                      │
└──────────────────────────────────────────────────────────────────┘

1. NewValidator(config) → Initialize Validator
   │
   ├─→ Set KachingURL (default: http://localhost:8083)
   ├─→ Create HTTP client (timeout: 30s)
   └─→ Initialize LicenseGate
       │
       └─→ Initialize Circuit Breaker
       └─→ Set Grace Period (90 seconds from start)

2. Validate() → Perform validation
   │
   ├─→ ValidateWithContext(ctx)
   │   │
   │   ├─→ Check required fields (LicenseID, ClusterID)
   │   │
   │   ├─→ LicenseGate.Validate(ctx, agentID)
   │   │   │
   │   │   ├─→ Check cached lease (if valid, use it)
   │   │   │   ├─→ validateCachedLease()
   │   │   │   │   ├─→ POST /api/v1/licenses/validate-lease
   │   │   │   │   ├─→ Include: lease_token, cluster_id, agent_id
   │   │   │   │   └─→ Response: valid, remaining_replicas, expires_at
   │   │   │   │
   │   │   │   └─→ Cache hit? → SUCCESS
   │   │   │
   │   │   ├─→ Cache miss? → Request new lease
   │   │   │   │
   │   │   │   ├─→ breaker.Execute() [Circuit Breaker]
   │   │   │   │   │
   │   │   │   │   ├─→ requestOrRenewLease()
   │   │   │   │   │   ├─→ POST /api/v1/licenses/{id}/cluster-lease
   │   │   │   │   │   ├─→ Request: cluster_id, requested_replicas, duration_minutes
   │   │   │   │   │   └─→ Response: lease_token, max_replicas, expires_at, lease_id
   │   │   │   │   │
   │   │   │   │   ├─→ validateLease(lease, agentID)
   │   │   │   │   │   └─→ POST /api/v1/licenses/validate-lease
   │   │   │   │   │
   │   │   │   │   └─→ storeLease() → Cache the valid lease
   │   │   │   │
   │   │   │   └─→ Extend grace period (90s)
   │   │   │
   │   │   └─→ Validation failed?
   │   │       │
   │   │       ├─→ In grace period? → Log warning, ALLOW startup
   │   │       └─→ Outside grace period? → RETURN ERROR
   │   │
   │   └─→ Fallback to validateLegacy() on LicenseGate failure
   │       │
   │       ├─→ POST /v1/license/activate
   │       ├─→ Request: license_id, cluster_id, metadata
   │       └─→ Response: validation result
   │
   └─→ Return validation result

3. Result Handling (in runtime/shared.go)
   │
   ├─→ SUCCESS → Log "✅ License validation successful"
   │             → Continue initialization
   │
   └─→ FAILURE → return error → CHORUS EXITS IMMEDIATELY
```

---

## Component Details

### 1. Validator Component

**File**: `internal/licensing/validator.go`

The Validator is the primary component for license validation, providing communication with the KACHING license authority.

#### Key Methods

##### NewValidator(config LicenseConfig)

```go
func NewValidator(config LicenseConfig) *Validator
```

Creates a new license validator with:

- HTTP client with 30-second timeout
- Default KACHING URL if not specified
- Initialized LicenseGate for enhanced validation

##### Validate()

```go
func (v *Validator) Validate() error
```

Performs license validation with KACHING authority:

- Validates required configuration fields
- Uses LicenseGate for cached/enhanced validation
- Falls back to legacy validation if needed
- Returns error if validation fails

##### validateLegacy()

```go
func (v *Validator) validateLegacy() error
```

Legacy validation method (fallback):

- Direct HTTP POST to `/v1/license/activate`
- Sends license metadata (product, version, container flag)
- **Fail-closed**: Network error = validation failure
- Parses and validates response status

**Request Example**:

```json
{
  "license_id": "lic_abc123",
  "cluster_id": "cluster_xyz789",
  "metadata": {
    "product": "CHORUS",
    "version": "0.1.0-dev",
    "container": "true"
  }
}
```

**Response Example (Success)**:

```json
{
  "status": "ok",
  "message": "License valid",
  "expires_at": "2025-12-31T23:59:59Z"
}
```

**Response Example (Failure)**:

```json
{
  "status": "error",
  "message": "License expired"
}
```

---

### 2. LicenseGate Component

**File**: `internal/licensing/license_gate.go`

Enhanced license validation with caching, circuit breaker, and cluster lease management for production scalability.

#### Key Features

- **Caching**: Stores valid lease tokens to reduce KACHING load
- **Circuit Breaker**: Prevents cascade failures during KACHING outages
- **Grace Period**: 90-second startup grace period for transient failures
- **Cluster Leases**: Supports multi-replica deployments with lease tokens
- **Burst Protection**: Rate limiting and retry logic

#### Data Structures

##### cachedLease

```go
type cachedLease struct {
    LeaseToken string    `json:"lease_token"`
    ExpiresAt  time.Time `json:"expires_at"`
    ClusterID  string    `json:"cluster_id"`
    Valid      bool      `json:"valid"`
    CachedAt   time.Time `json:"cached_at"`
}
```

**Lease Validation**:

- Lease considered invalid 2 minutes before actual expiry (safety margin)
- Invalid leases are evicted from cache automatically

##### LeaseRequest

```go
type LeaseRequest struct {
    ClusterID         string `json:"cluster_id"`
    RequestedReplicas int    `json:"requested_replicas"`
    DurationMinutes   int    `json:"duration_minutes"`
}
```

##### LeaseResponse

```go
type LeaseResponse struct {
    LeaseToken  string    `json:"lease_token"`
    MaxReplicas int       `json:"max_replicas"`
    ExpiresAt   time.Time `json:"expires_at"`
    ClusterID   string    `json:"cluster_id"`
    LeaseID     string    `json:"lease_id"`
}
```

##### LeaseValidationRequest

```go
type LeaseValidationRequest struct {
    LeaseToken string `json:"lease_token"`
    ClusterID  string `json:"cluster_id"`
    AgentID    string `json:"agent_id"`
}
```

##### LeaseValidationResponse

```go
type LeaseValidationResponse struct {
    Valid             bool      `json:"valid"`
    RemainingReplicas int       `json:"remaining_replicas"`
    ExpiresAt         time.Time `json:"expires_at"`
}
```

#### Circuit Breaker Configuration

```go
breakerSettings := gobreaker.Settings{
    Name:        "license-validation",
    MaxRequests: 3,                // Allow 3 requests in half-open state
    Interval:    60 * time.Second, // Reset failure count every minute
    Timeout:     30 * time.Second, // Stay open for 30 seconds
    ReadyToTrip: func(counts gobreaker.Counts) bool {
        return counts.ConsecutiveFailures >= 3 // Trip after 3 failures
    },
    OnStateChange: func(name string, from, to gobreaker.State) {
        fmt.Printf("🔌 License validation circuit breaker: %s -> %s\n", from, to)
    },
}
```

**Circuit Breaker States**:

| State | Behavior | Transition |
|-------|----------|------------|
| **Closed** | Normal operation, requests pass through | 3 consecutive failures → **Open** |
| **Open** | All requests fail immediately (30s) | After timeout → **Half-Open** |
| **Half-Open** | Allow 3 test requests | Success → **Closed**, Failure → **Open** |

#### Key Methods

##### NewLicenseGate(config LicenseConfig)

```go
func NewLicenseGate(config LicenseConfig) *LicenseGate
```

Initializes license gate with:

- Circuit breaker with production settings
- HTTP client with 10-second timeout
- 90-second grace period from startup

##### Validate(ctx context.Context, agentID string)

```go
func (g *LicenseGate) Validate(ctx context.Context, agentID string) error
```

Primary validation method:

1. **Check cache**: If valid cached lease exists, validate it
2. **Cache miss**: Request new lease through circuit breaker
3. **Store result**: Cache successful lease for future requests
4. **Grace period**: Allow startup during grace period even if validation fails
5. **Extend grace**: Extend grace period on successful validation

##### validateCachedLease(ctx, lease, agentID)

```go
func (g *LicenseGate) validateCachedLease(ctx context.Context, lease *cachedLease, agentID string) error
```

Validates cached lease token:

- POST to `/api/v1/licenses/validate-lease`
- Invalidates cache if validation fails
- Returns error if lease is no longer valid

##### requestOrRenewLease(ctx)

```go
func (g *LicenseGate) requestOrRenewLease(ctx context.Context) (*LeaseResponse, error)
```

Requests new cluster lease:

- POST to `/api/v1/licenses/{license_id}/cluster-lease`
- Default: 1 replica, 60-minute duration
- Handles rate limiting (429 Too Many Requests)
- Returns lease token and metadata

##### GetCacheStats()

```go
func (g *LicenseGate) GetCacheStats() map[string]interface{}
```

Returns cache statistics for monitoring:

```json
{
  "cache_valid": true,
  "cache_hit": true,
  "expires_at": "2025-09-30T15:30:00Z",
  "cached_at": "2025-09-30T14:30:00Z",
  "in_grace_period": false,
  "breaker_state": "closed",
  "grace_until": "2025-09-30T14:31:30Z"
}
```

---

## KACHING Server Integration

### API Endpoints

#### 1. Legacy Activation Endpoint

**Endpoint**: `POST /v1/license/activate`
**Purpose**: Legacy license validation (fallback)

**Request**:

```json
{
  "license_id": "lic_abc123",
  "cluster_id": "cluster_xyz789",
  "metadata": {
    "product": "CHORUS",
    "version": "0.1.0-dev",
    "container": "true"
  }
}
```

**Response (Success)**:

```json
{
  "status": "ok",
  "message": "License valid",
  "expires_at": "2025-12-31T23:59:59Z"
}
```

**Response (Failure)**:

```json
{
  "status": "error",
  "message": "License expired"
}
```

#### 2. Cluster Lease Endpoint

**Endpoint**: `POST /api/v1/licenses/{license_id}/cluster-lease`
**Purpose**: Request cluster deployment lease

**Request**:

```json
{
  "cluster_id": "cluster_xyz789",
  "requested_replicas": 1,
  "duration_minutes": 60
}
```

**Response (Success)**:

```json
{
  "lease_token": "lease_def456",
  "max_replicas": 5,
  "expires_at": "2025-09-30T15:30:00Z",
  "cluster_id": "cluster_xyz789",
  "lease_id": "lease_def456"
}
```

**Response (Rate Limited)**:

```
HTTP 429 Too Many Requests
Retry-After: 60
```

#### 3. Lease Validation Endpoint

**Endpoint**: `POST /api/v1/licenses/validate-lease`
**Purpose**: Validate lease token for agent startup

**Request**:

```json
{
  "lease_token": "lease_def456",
  "cluster_id": "cluster_xyz789",
  "agent_id": "agent_001"
}
```

**Response (Success)**:

```json
{
  "valid": true,
  "remaining_replicas": 4,
  "expires_at": "2025-09-30T15:30:00Z"
}
```

**Response (Invalid)**:

```json
{
  "valid": false,
  "remaining_replicas": 0,
  "expires_at": "2025-09-30T14:30:00Z"
}
```

---

## Validation Sequence Diagram

```
┌─────────┐          ┌───────────┐          ┌──────────────┐         ┌──────────┐
│ CHORUS  │          │ Validator │          │ LicenseGate  │         │ KACHING  │
│ Runtime │          │           │          │              │         │  Server  │
└────┬────┘          └─────┬─────┘          └──────┬───────┘         └────┬─────┘
     │                     │                       │                      │
     │ InitializeRuntime() │                       │                      │
     │────────────────────>│                       │                      │
     │                     │                       │                      │
     │                     │ Validate()            │                      │
     │                     │──────────────────────>│                      │
     │                     │                       │                      │
     │                     │                       │ Check cache          │
     │                     │                       │────────┐             │
     │                     │                       │        │             │
     │                     │                       │<───────┘             │
     │                     │                       │                      │
     │                     │                       │ Cache miss           │
     │                     │                       │                      │
     │                     │                       │ POST /cluster-lease  │
     │                     │                       │─────────────────────>│
     │                     │                       │                      │
     │                     │                       │ Lease Response       │
     │                     │                       │<─────────────────────│
     │                     │                       │                      │
     │                     │                       │ POST /validate-lease │
     │                     │                       │─────────────────────>│
     │                     │                       │                      │
     │                     │                       │ Validation Response  │
     │                     │                       │<─────────────────────│
     │                     │                       │                      │
     │                     │                       │ Store in cache       │
     │                     │                       │────────┐             │
     │                     │                       │        │             │
     │                     │                       │<───────┘             │
     │                     │                       │                      │
     │                     │ SUCCESS               │                      │
     │                     │<──────────────────────│                      │
     │                     │                       │                      │
     │ Continue startup    │                       │                      │
     │<────────────────────│                       │                      │
     │                     │                       │                      │

     ┌─────────────────────────────────────────────────────────────────────┐
     │                          FAILURE SCENARIO                           │
     └─────────────────────────────────────────────────────────────────────┘
     │                     │                       │                      │
     │                     │                       │ POST /validate-lease │
     │                     │                       │─────────────────────>│
     │                     │                       │                      │
     │                     │                       │ INVALID LICENSE      │
     │                     │                       │<─────────────────────│
     │                     │                       │                      │
     │                     │                       │ Check grace period   │
     │                     │                       │────────┐             │
     │                     │                       │        │             │
     │                     │                       │<───────┘             │
     │                     │                       │                      │
     │                     │                       │ Outside grace period │
     │                     │                       │                      │
     │                     │ ERROR                 │                      │
     │                     │<──────────────────────│                      │
     │                     │                       │                      │
     │ return error        │                       │                      │
     │<────────────────────│                       │                      │
     │                     │                       │                      │
     │ EXIT                │                       │                      │
     │────────X            │                       │                      │
```

---

## Error Handling

### Error Categories

#### 1. Configuration Errors

**Condition**: Missing required configuration fields

```go
if v.config.LicenseID == "" || v.config.ClusterID == "" {
    return fmt.Errorf("license ID and cluster ID are required")
}
```

**Result**: Immediate validation failure → CHORUS exits

#### 2. Network Errors

**Condition**: Cannot contact KACHING server

```go
resp, err := v.client.Post(licenseURL, "application/json", bytes.NewReader(requestBody))
if err != nil {
    // FAIL-CLOSED: No network = No license = No operation
    return fmt.Errorf("unable to contact license authority: %w", err)
}
```

**Result**:

- Outside grace period: Immediate validation failure → CHORUS exits
- Inside grace period: Log warning, allow startup

**Fail-Closed Behavior**: Network unavailability does NOT allow bypass

#### 3. Invalid License Errors

**Condition**: KACHING rejects license

```go
if resp.StatusCode != http.StatusOK {
    message := "license validation failed"
    if msg, ok := licenseResponse["message"].(string); ok {
        message = msg
    }
    return fmt.Errorf("license validation failed: %s", message)
}
```

**Possible Messages**:

- "License expired"
- "License revoked"
- "License not found"
- "Cluster ID mismatch"
- "Maximum nodes exceeded"

**Result**: Immediate validation failure → CHORUS exits

#### 4. Rate Limiting Errors

**Condition**: Too many requests to KACHING

```go
if resp.StatusCode == http.StatusTooManyRequests {
    return nil, fmt.Errorf("rate limited by KACHING, retry after: %s",
        resp.Header.Get("Retry-After"))
}
```

**Result**:

- Circuit breaker may trip after repeated rate limiting
- Grace period allows startup if rate limiting is transient

#### 5. Circuit Breaker Errors

**Condition**: Circuit breaker is open (too many failures)

**Result**:

- All requests fail immediately
- Grace period allows startup if breaker trips during initialization
- Circuit breaker auto-recovers after timeout (30s)

---

## Error Messages Reference

### User-Facing Error Messages

| Error Message | Cause | Resolution |
|--------------|-------|------------|
| `license ID and cluster ID are required` | Missing configuration | Set `CHORUS_LICENSE_ID` and `CHORUS_CLUSTER_ID` |
| `unable to contact license authority` | Network error | Check KACHING server accessibility |
| `license validation failed: License expired` | Expired license | Renew license with vendor |
| `license validation failed: License revoked` | Revoked license | Contact vendor |
| `license validation failed: Cluster ID mismatch` | Wrong cluster | Use correct cluster configuration |
| `rate limited by KACHING` | Too many requests | Wait for rate limit reset |
| `lease token is invalid` | Expired or invalid lease | System will auto-request new lease |
| `lease validation failed with status 404` | Lease not found | System will auto-request new lease |
| `License validation failed but in grace period` | Transient failure during startup | System continues with warning |

---

## Grace Period Mechanism

### Purpose

The grace period allows CHORUS to start even when license validation temporarily fails, preventing service disruption due to transient network issues or KACHING server maintenance.

### Behavior

- **Duration**: 90 seconds from startup
- **Triggered**: When validation fails but grace period is active
- **Effect**: Validation returns success with warning log
- **Extension**: Grace period extends by 90s on each successful validation
- **Expiry**: After grace period expires, validation failures cause immediate exit

### Grace Period States

```
┌──────────────────────────────────────────────────────────────┐
│                    Grace Period Timeline                     │
└──────────────────────────────────────────────────────────────┘

T+0s    ┌─────────────────────────────────────────────┐
        │  GRACE PERIOD ACTIVE (90s)                  │
        │  Validation failures allowed with warning   │
        └─────────────────────────────────────────────┘

T+30s   │  Validation SUCCESS
        │  └──> Grace period extended to T+120s
        │
T+90s   │  Grace period expires (no successful validation)
        └──> Next validation failure causes exit
        │
T+120s  │  (Extended) Grace period expires
        └──> Next validation failure causes exit
        │
```

### Implementation

```go
// Initialize grace period at startup
func NewLicenseGate(config LicenseConfig) *LicenseGate {
    gate := &LicenseGate{...}
    gate.graceUntil.Store(time.Now().Add(90 * time.Second))
    return gate
}

// Check grace period during validation
if err != nil {
    if g.isInGracePeriod() {
        fmt.Printf("⚠️ License validation failed but in grace period: %v\n", err)
        return nil // Allow startup
    }
    return fmt.Errorf("license validation failed: %w", err)
}

// Extend grace period on success
g.extendGracePeriod() // Adds 90s to current time
```

---

## Startup Integration

### Location

**File**: `internal/runtime/shared.go`
**Function**: `InitializeRuntime()`

### Integration Point

```go
func InitializeRuntime(cfg *config.CHORUSConfig) (*RuntimeContext, error) {
    // ... early initialization ...

    // CRITICAL: Validate license before any P2P operations
    runtime.Logger.Info("🔐 Validating CHORUS license with KACHING...")
    licenseValidator := licensing.NewValidator(licensing.LicenseConfig{
        LicenseID:  cfg.License.LicenseID,
        ClusterID:  cfg.License.ClusterID,
        KachingURL: cfg.License.KachingURL,
    })

    if err := licenseValidator.Validate(); err != nil {
        // This error causes InitializeRuntime to return error
        // which causes main() to exit immediately
        return nil, fmt.Errorf("license validation failed: %v", err)
    }

    runtime.Logger.Info("✅ License validation successful - CHORUS authorized to run")

    // ... continue with P2P, AI provider initialization, etc ...
}
```

### Execution Order

```
1. Load configuration from YAML
2. Initialize logger
3. ⚠️ VALIDATE LICENSE ⚠️
   └─→ FAILURE → return error → main() exits
4. Initialize AI provider
5. Initialize metrics collector
6. Initialize SHHH sentinel
7. Initialize P2P network
8. Start HAP server
9. Enter main runtime loop
```

**Critical Note**: License validation occurs **BEFORE** any P2P networking or AI provider initialization. If validation fails, no network connections are made and no services are started.

---

## Configuration Examples

### Minimal Configuration

```yaml
license:
  license_id: "lic_abc123"
  cluster_id: "cluster_xyz789"
```

KACHING URL defaults to `http://localhost:8083`

### Production Configuration

```yaml
license:
  license_id: "lic_prod_abc123"
  cluster_id: "cluster_production_xyz789"
  kaching_url: "https://kaching.chorus.services"
  organization_name: "Acme Corporation"
  license_type: "enterprise"
  max_nodes: 10
```

### Development Configuration

```yaml
license:
  license_id: "lic_dev_abc123"
  cluster_id: "cluster_dev_local"
  kaching_url: "http://localhost:8083"
  organization_name: "Development Team"
  license_type: "developer"
  max_nodes: 1
```

### Environment Variables

Licensing configuration can also be set via environment variables:

```bash
export CHORUS_LICENSE_ID="lic_abc123"
export CHORUS_CLUSTER_ID="cluster_xyz789"
export CHORUS_KACHING_URL="http://localhost:8083"
```

---

## Monitoring and Observability

### Log Messages

#### Successful Validation

```
🔐 Validating CHORUS license with KACHING...
✅ License validation successful - CHORUS authorized to run
```

#### Validation with Cached Lease

```
🔐 Validating CHORUS license with KACHING...
[Using cached lease token: lease_def456]
✅ License validation successful - CHORUS authorized to run
```

#### Validation During Grace Period

```
🔐 Validating CHORUS license with KACHING...
⚠️ License validation failed but in grace period: unable to contact license authority
✅ License validation successful - CHORUS authorized to run
```

#### Circuit Breaker State Changes

```
🔌 License validation circuit breaker: closed -> open
🔌 License validation circuit breaker: open -> half-open
🔌 License validation circuit breaker: half-open -> closed
```

#### Validation Failure (Fatal)

```
🔐 Validating CHORUS license with KACHING...
❌ License validation failed: License expired
Error: license validation failed: License expired
[CHORUS exits]
```

### Cache Statistics API

```go
stats := licenseGate.GetCacheStats()
```

Returns:

```json
{
  "cache_valid": true,
  "cache_hit": true,
  "expires_at": "2025-09-30T15:30:00Z",
  "cached_at": "2025-09-30T14:30:00Z",
  "in_grace_period": false,
  "breaker_state": "closed",
  "grace_until": "2025-09-30T14:31:30Z"
}
```

### Recommended Monitoring Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `license_validation_success` | Counter | Successful validations |
| `license_validation_failure` | Counter | Failed validations |
| `license_validation_duration_ms` | Histogram | Validation latency |
| `license_cache_hit_rate` | Gauge | Percentage of cache hits |
| `license_grace_period_active` | Gauge | 1 if in grace period, 0 otherwise |
| `license_circuit_breaker_state` | Gauge | 0=closed, 1=half-open, 2=open |
| `license_lease_expiry_seconds` | Gauge | Seconds until lease expiry |

---

## Cluster Lease Management

### Lease Lifecycle

```
┌──────────────────────────────────────────────────────────────────┐
│                     Cluster Lease Lifecycle                      │
└──────────────────────────────────────────────────────────────────┘

1. REQUEST LEASE
   ├─→ POST /api/v1/licenses/{license_id}/cluster-lease
   ├─→ cluster_id: "cluster_xyz789"
   ├─→ requested_replicas: 1
   └─→ duration_minutes: 60

2. RECEIVE LEASE
   ├─→ lease_token: "lease_def456"
   ├─→ max_replicas: 5
   ├─→ expires_at: T+60m
   └─→ Store in cache

3. USE LEASE (per agent startup)
   ├─→ POST /api/v1/licenses/validate-lease
   ├─→ lease_token: "lease_def456"
   ├─→ cluster_id: "cluster_xyz789"
   ├─→ agent_id: "agent_001"
   └─→ Decrements remaining_replicas

4. LEASE EXPIRY
   ├─→ Cache invalidated at T+58m (2min safety margin)
   └─→ Next validation requests new lease

5. LEASE RENEWAL
   └─→ Automatic on cache invalidation
```

### Multi-Replica Support

The lease system supports multiple CHORUS agent replicas:

- **max_replicas**: Maximum concurrent agents allowed
- **remaining_replicas**: Available agent slots
- **agent_id**: Unique identifier for each agent instance

**Example**: License allows 5 replicas

- Request lease → `max_replicas: 5`
- Agent 1 validates → `remaining_replicas: 4`
- Agent 2 validates → `remaining_replicas: 3`
- Agent 6 validates → **FAILURE** (exceeds max_replicas)

---

## Security Considerations

### Fail-Closed Architecture

The licensing system implements **fail-closed** security:

- ✅ Network unavailable → Validation fails → CHORUS exits (unless in grace period)
- ✅ KACHING server down → Validation fails → CHORUS exits (unless in grace period)
- ✅ Invalid license → Validation fails → CHORUS exits (no grace period)
- ✅ Expired license → Validation fails → CHORUS exits (no grace period)
- ❌ No "development mode" bypass
- ❌ No "skip validation" flag

### Grace Period Security

The grace period is designed for transient failures, NOT as a bypass:

- Limited to 90 seconds initially
- Only extends on successful validation
- Does NOT apply to invalid/expired licenses
- Primarily for network/KACHING server availability issues

### License Token Security

- Lease tokens are short-lived (default: 60 minutes)
- Tokens cached in memory only (not persisted to disk)
- Tokens include cluster_id binding (cannot be used by other clusters)
- Agent ID tracking prevents token sharing between agents

### Network Security

- HTTPS recommended for production KACHING URLs
- 30-second timeout prevents hanging on network issues
- Circuit breaker prevents cascade failures

---

## Troubleshooting

### Issue: "license ID and cluster ID are required"

**Cause**: Missing configuration

**Resolution**:

```yaml
# config.yml
license:
  license_id: "your_license_id"
  cluster_id: "your_cluster_id"
```

Or via environment:

```bash
export CHORUS_LICENSE_ID="your_license_id"
export CHORUS_CLUSTER_ID="your_cluster_id"
```

---

### Issue: "unable to contact license authority"

**Cause**: KACHING server unreachable

**Resolution**:

1. Verify KACHING server is running
2. Check network connectivity: `curl http://localhost:8083/health`
3. Verify `kaching_url` configuration
4. Check firewall rules
5. If transient, grace period allows startup

---

### Issue: "license validation failed: License expired"

**Cause**: License has expired

**Resolution**:

1. Contact license vendor to renew
2. Update license_id in configuration
3. Restart CHORUS

**Note**: Grace period does NOT apply to expired licenses

---

### Issue: "rate limited by KACHING"

**Cause**: Too many validation requests

**Resolution**:

1. Check for rapid restart loops
2. Verify cache is working (should reduce requests)
3. Wait for rate limit reset (check Retry-After header)
4. Consider increasing lease duration_minutes

---

### Issue: Circuit breaker stuck in "open" state

**Cause**: Repeated validation failures

**Resolution**:

1. Check KACHING server health
2. Verify license configuration
3. Circuit breaker auto-recovers after 30 seconds
4. Check grace period status: may allow startup during recovery

---

### Issue: "lease token is invalid"

**Cause**: Lease expired or revoked

**Resolution**:

- System should auto-request new lease
- If persistent, check license status with vendor
- Verify cluster_id matches license configuration

---

## Testing

### Unit Testing

```go
// Test license validation success
func TestValidatorSuccess(t *testing.T) {
    // Mock KACHING server
    server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        json.NewEncoder(w).Encode(map[string]interface{}{
            "status":  "ok",
            "message": "License valid",
        })
    }))
    defer server.Close()

    validator := licensing.NewValidator(licensing.LicenseConfig{
        LicenseID:  "test_license",
        ClusterID:  "test_cluster",
        KachingURL: server.URL,
    })

    err := validator.Validate()
    assert.NoError(t, err)
}

// Test license validation failure
func TestValidatorFailure(t *testing.T) {
    server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusForbidden)
        json.NewEncoder(w).Encode(map[string]interface{}{
            "status":  "error",
            "message": "License expired",
        })
    }))
    defer server.Close()

    validator := licensing.NewValidator(licensing.LicenseConfig{
        LicenseID:  "test_license",
        ClusterID:  "test_cluster",
        KachingURL: server.URL,
    })

    err := validator.Validate()
    assert.Error(t, err)
    assert.Contains(t, err.Error(), "License expired")
}
```

### Integration Testing

```bash
# Start KACHING test server
docker run -p 8083:8083 kaching:latest

# Test CHORUS startup with valid license
export CHORUS_LICENSE_ID="test_lic_123"
export CHORUS_CLUSTER_ID="test_cluster"
./chorus-agent

# Expected output:
# 🔐 Validating CHORUS license with KACHING...
# ✅ License validation successful - CHORUS authorized to run

# Test CHORUS startup with invalid license
export CHORUS_LICENSE_ID="invalid_license"
./chorus-agent

# Expected output:
# 🔐 Validating CHORUS license with KACHING...
# ❌ License validation failed: License not found
# Error: license validation failed: License not found
# [Exit code 1]
```

---

## Future Enhancements

### Planned Features

1. **Offline License Support**
   - JWT-based license files for air-gapped deployments
   - Signature verification without KACHING connectivity

2. **License Renewal Automation**
   - Background renewal of expiring licenses
   - Alert system for upcoming expirations

3. **Multi-License Support**
   - Support for multiple license tiers
   - Feature flag based on license type

4. **License Analytics**
   - Usage metrics reporting to KACHING
   - License utilization dashboards

5. **Enhanced Lease Management**
   - Lease renewal before expiry
   - Dynamic replica scaling based on license

---

## API Constants

### Timeouts

```go
const (
    DefaultKachingURL = "http://localhost:8083"
    LicenseTimeout    = 30 * time.Second // Validator HTTP timeout
    GateCTimeout      = 10 * time.Second // LicenseGate HTTP timeout
)
```

### Grace Period

```go
const (
    GracePeriodDuration = 90 * time.Second
)
```

### Circuit Breaker

```go
const (
    MaxRequests          = 3                // Half-open state test requests
    FailureThreshold     = 3                // Consecutive failures to trip
    CircuitTimeout       = 30 * time.Second // Open state duration
    FailureResetInterval = 60 * time.Second // Failure count reset
)
```

### Lease Safety Margin

```go
const (
    LeaseSafetyMargin = 2 * time.Minute // Cache invalidation before expiry
)
```

---

## Related Documentation

- **KACHING License Server**: See KACHING documentation for server setup and API details
- **CHORUS Configuration**: `/docs/comprehensive/pkg/config.md`
- **CHORUS Runtime**: `/docs/comprehensive/internal/runtime.md`
- **Deployment Guide**: `/docs/deployment.md`

---

## Summary

The CHORUS licensing system provides robust, fail-closed license enforcement through integration with the KACHING license authority. Key characteristics:

- **Mandatory**: License validation is required at startup
- **Fail-Closed**: Invalid license or network failure prevents startup (outside grace period)
- **Cached**: Lease tokens cached to reduce KACHING load
- **Resilient**: Circuit breaker and grace period handle transient failures
- **Scalable**: Cluster lease system supports multi-replica deployments
- **Secure**: No bypass mechanisms, short-lived tokens, cluster binding

The system ensures that all running CHORUS instances are properly licensed while providing operational flexibility through caching and grace periods for transient failures.