# SLURP Encrypted Context Storage Architecture
This package implements the encrypted context storage architecture for the SLURP (Storage, Logic, Understanding, Retrieval, Processing) system. It provides production-ready storage with a multi-tier architecture, role-based encryption, and monitoring.
## Architecture Overview
The storage architecture consists of several key components working together to provide a robust, scalable, and secure storage system:
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ SLURP Storage Architecture │
├─────────────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌──────────────────┐ ┌─────────────────────────────────┐ │
│ │ Application │ │ Intelligence │ │ Leader │ │
│ │ Layer │ │ Engine │ │ Manager │ │
│ └─────────────────┘ └──────────────────┘ └─────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────────────────┤
│ ContextStore Interface │
├─────────────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌──────────────────┐ ┌─────────────────────────────────┐ │
│ │ Encrypted │ │ Cache │ │ Index │ │
│ │ Storage │ │ Manager │ │ Manager │ │
│ └─────────────────┘ └──────────────────┘ └─────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌──────────────────┐ ┌─────────────────────────────────┐ │
│ │ Local │ │ Distributed │ │ Backup │ │
│ │ Storage │ │ Storage │ │ Manager │ │
│ └─────────────────┘ └──────────────────┘ └─────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ Monitoring System │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────┘
```
## Core Components
### 1. Context Store (`context_store.go`)
The main orchestrator that coordinates between all storage layers:
- **Multi-tier storage** with local and distributed backends
- **Role-based access control** with transparent encryption/decryption
- **Automatic caching** with configurable TTL and eviction policies
- **Search indexing** integration for fast context retrieval
- **Batch operations** for efficient bulk processing
- **Background processes** for sync, compaction, and cleanup
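
The multi-tier read path can be pictured as a read-through lookup: try the fastest tier first and copy a hit back into the tiers that missed. The sketch below is only an illustration of that idea. The `Tier` interface, `mapTier`, `TieredGet`, and `ErrNotFound` are hypothetical names, not the API of `context_store.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel for a missing key.
var ErrNotFound = errors.New("context not found")

// Tier is a hypothetical minimal interface shared by each storage layer.
type Tier interface {
	Get(key string) ([]byte, error)
	Put(key string, value []byte) error
}

// mapTier is an in-memory stand-in for a real backend.
type mapTier map[string][]byte

func (m mapTier) Get(key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, ErrNotFound
	}
	return v, nil
}

func (m mapTier) Put(key string, value []byte) error {
	m[key] = value
	return nil
}

// TieredGet checks tiers in order (fastest first) and copies a hit back
// into every faster tier that missed, so the next read is served sooner.
func TieredGet(key string, tiers ...Tier) ([]byte, error) {
	for i, t := range tiers {
		v, err := t.Get(key)
		if errors.Is(err, ErrNotFound) {
			continue // fall through to the next, slower tier
		}
		if err != nil {
			return nil, err
		}
		for _, faster := range tiers[:i] {
			_ = faster.Put(key, v) // best-effort backfill
		}
		return v, nil
	}
	return nil, ErrNotFound
}

func main() {
	cache, local, dht := mapTier{}, mapTier{}, mapTier{"ctx/1": []byte("payload")}
	v, err := TieredGet("ctx/1", cache, local, dht)
	fmt.Println(string(v), err, len(cache), len(local))
}
```

Backfilling on a miss keeps hot keys in the fast tiers without a separate warm-up step, at the cost of extra writes on the first read.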
### 2. Encrypted Storage (`encrypted_storage.go`)
Role-based encrypted storage with enterprise-grade security:
- **Per-role encryption** using the existing CHORUS crypto system
- **Key rotation** with automatic re-encryption
- **Access control validation** with audit logging
- **Encryption metrics** tracking for performance monitoring
- **Key fingerprinting** for integrity verification
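
To make per-role encryption and key fingerprinting concrete, here is a minimal sketch that uses AES-256-GCM from the Go standard library as a stand-in. It is not the CHORUS crypto system, which this README describes as based on age encryption. `RoleKeys`, `EncryptForRole`, `DecryptForRole`, and `Fingerprint` are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// RoleKeys maps a role name to a 32-byte AES-256 key (hypothetical layout).
type RoleKeys map[string][]byte

// Fingerprint returns a short SHA-256 digest that identifies a key
// without revealing it.
func Fingerprint(key []byte) string {
	sum := sha256.Sum256(key)
	return hex.EncodeToString(sum[:8])
}

// aead builds an AES-GCM cipher for the given role's key.
func (rk RoleKeys) aead(role string) (cipher.AEAD, error) {
	key, ok := rk[role]
	if !ok {
		return nil, fmt.Errorf("no key for role %q", role)
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// EncryptForRole seals plaintext under the role's key. The random nonce is
// prepended to the ciphertext and the role name is bound as associated data.
func (rk RoleKeys) EncryptForRole(role string, plaintext []byte) ([]byte, error) {
	gcm, err := rk.aead(role)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, []byte(role)), nil
}

// DecryptForRole opens data sealed by EncryptForRole for the same role.
func (rk RoleKeys) DecryptForRole(role string, data []byte) ([]byte, error) {
	gcm, err := rk.aead(role)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, []byte(role))
}

// newKey generates a random 32-byte key for the demo.
func newKey() []byte {
	k := make([]byte, 32)
	if _, err := rand.Read(k); err != nil {
		panic(err)
	}
	return k
}

func main() {
	keys := RoleKeys{"developer": newKey(), "architect": newKey()}
	sealed, _ := keys.EncryptForRole("developer", []byte("design notes"))
	plain, err := keys.DecryptForRole("developer", sealed)
	fmt.Println(string(plain), err)
	_, err = keys.DecryptForRole("architect", sealed)
	fmt.Println("architect can read:", err == nil)
}
```

Storing the fingerprint next to each record lets a reader tell which key version sealed it, which is what makes staged re-encryption during key rotation tractable.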
### 3. Local Storage (`local_storage.go`)
High-performance local storage using LevelDB:
- **LevelDB backend** with optimized configuration
- **Compression support** with automatic size optimization
- **TTL support** for automatic data expiration
- **Background compaction** for storage optimization
- **Metrics collection** for performance monitoring
### 4. Distributed Storage (`distributed_storage.go`)
DHT-based distributed storage with consensus:
- **Consistent hashing** for data distribution
- **Replication** with configurable replication factor
- **Consensus protocols** for consistency guarantees
- **Node health monitoring** with automatic failover
- **Rebalancing** for optimal data distribution
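
As an illustration of consistent hashing with a replication factor, the sketch below builds a small hash ring with virtual nodes and picks replica holders by walking clockwise from the key. It is a generic textbook construction, not the algorithm in `distributed_storage.go`. `Ring`, `NewRing`, and `Replicas` are hypothetical names, and FNV-1a is chosen only because it ships with the standard library.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring with virtual nodes.
type Ring struct {
	points []uint32          // sorted hash positions
	owner  map[uint32]string // position -> node ID
}

func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places vnodes virtual points per node on the ring.
func NewRing(nodes []string, vnodes int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			p := hashOf(fmt.Sprintf("%s#%d", n, i))
			if _, taken := r.owner[p]; taken {
				continue // skip the rare hash collision
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Replicas walks clockwise from the key's position and returns up to
// count distinct nodes; the first one is the primary owner.
func (r *Ring) Replicas(key string, count int) []string {
	if len(r.points) == 0 {
		return nil
	}
	h := hashOf(key)
	start := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	seen := make(map[string]bool)
	var out []string
	for i := 0; i < len(r.points) && len(out) < count; i++ {
		n := r.owner[r.points[(start+i)%len(r.points)]]
		if !seen[n] {
			seen[n] = true
			out = append(out, n)
		}
	}
	return out
}

func main() {
	ring := NewRing([]string{"node-a", "node-b", "node-c"}, 64)
	fmt.Println(ring.Replicas("ucxl://example/context", 2))
}
```

The property that matters for rebalancing is that adding a node only moves the keys that the new node takes over; every other key keeps its previous owner.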
### 5. Cache Manager (`cache_manager.go`)
Redis-based high-performance caching:
- **Redis backend** with connection pooling
- **LRU/LFU eviction** policies
- **Compression** for large cache entries
- **TTL management** with refresh thresholds
- **Hit/miss metrics** for performance analysis
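
The LRU eviction policy and hit/miss counters listed above can be shown with a small in-process cache. This is only a sketch of the policy: the real cache manager is Redis-backed, where eviction is handled by the Redis server itself. `LRU`, `NewLRU`, and the counter fields are hypothetical names, and the type is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	key   string
	value []byte
}

// LRU is a fixed-capacity cache that evicts the least recently used entry.
type LRU struct {
	capacity     int
	order        *list.List               // front = most recently used
	items        map[string]*list.Element // key -> element in order
	hits, misses int
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached value and marks the entry as recently used.
func (c *LRU) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		c.misses++
		return nil, false
	}
	c.hits++
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put inserts or updates an entry, evicting the oldest one when full.
func (c *LRU) Put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := NewLRU(2)
	c.Put("a", []byte("1"))
	c.Put("b", []byte("2"))
	c.Get("a")              // "a" becomes most recently used
	c.Put("c", []byte("3")) // evicts "b"
	_, okA := c.Get("a")
	_, okB := c.Get("b")
	fmt.Println(okA, okB, c.hits, c.misses)
}
```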
### 6. Index Manager (`index_manager.go`)
Full-text search using Bleve:
- **Multiple indexes** with different configurations
- **Full-text search** with highlighting and faceting
- **Index optimization** with background maintenance
- **Query performance** tracking and optimization
- **Index rebuild** capabilities for data recovery
### 7. Database Schema (`schema.go`)
Comprehensive database schema for all storage needs:
- **Context records** with versioning and metadata
- **Encrypted context records** with role-based access
- **Hierarchy relationships** for context inheritance
- **Decision hop tracking** for temporal analysis
- **Access control records** with permission management
- **Search indexes** with performance optimization
- **Backup metadata** with integrity verification
### 8. Monitoring System (`monitoring.go`)
Production-ready monitoring with Prometheus integration:
- **Comprehensive metrics** for all storage operations
- **Health checks** for system components
- **Alert management** with notification systems
- **Performance profiling** with bottleneck detection
- **Structured logging** with configurable output
### 9. Backup Manager (`backup_manager.go`)
Enterprise backup and recovery system:
- **Scheduled backups** with cron expressions
- **Incremental backups** for efficiency
- **Backup validation** with integrity checks
- **Encryption support** for backup security
- **Retention policies** with automatic cleanup
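
A retention policy with automatic cleanup reduces to selecting the backups older than a cutoff. The sketch below shows that selection step only, under the assumption that each backup carries a creation timestamp. `BackupRecord`, `Expired`, `daysAgo`, and `referenceTime` are hypothetical names used for the example.

```go
package main

import (
	"fmt"
	"time"
)

// BackupRecord is a hypothetical stand-in for the package's backup metadata.
type BackupRecord struct {
	ID        string
	CreatedAt time.Time
}

const thirtyDays = 30 * 24 * time.Hour

// referenceTime is a fixed "now" so the example is deterministic.
var referenceTime = time.Date(2025, time.September, 6, 2, 0, 0, 0, time.UTC)

// daysAgo returns a timestamp n days before referenceTime.
func daysAgo(n int) time.Time {
	return referenceTime.Add(-time.Duration(n) * 24 * time.Hour)
}

// Expired returns the backups strictly older than retention as of now.
func Expired(backups []BackupRecord, retention time.Duration, now time.Time) []BackupRecord {
	cutoff := now.Add(-retention)
	var out []BackupRecord
	for _, b := range backups {
		if b.CreatedAt.Before(cutoff) {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	backups := []BackupRecord{
		{"old", daysAgo(45)},
		{"edge", daysAgo(30)}, // exactly at the cutoff, so it is kept
		{"new", daysAgo(5)},
	}
	for _, b := range Expired(backups, thirtyDays, referenceTime) {
		fmt.Println("delete", b.ID)
	}
}
```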
### 10. Batch Operations (`batch_operations.go`)
Optimized bulk operations:
- **Concurrent processing** with configurable worker pools
- **Error handling** with partial failure support
- **Progress tracking** for long-running operations
- **Transaction support** for consistency
- **Resource optimization** for large datasets
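
Concurrent processing with partial failure support is commonly built as a bounded worker pool that records per-item errors instead of aborting on the first one. The following sketch shows that pattern under simplified assumptions. `RunBatch`, `BatchResult`, `storeItem`, and `errEmpty` are hypothetical names and do not mirror `batch_operations.go`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// BatchResult records which items succeeded and which failed (hypothetical shape).
type BatchResult struct {
	Succeeded int
	Errors    map[int]error // item index -> error
}

// RunBatch applies fn to every item using at most workers goroutines and
// collects failures instead of stopping at the first one.
func RunBatch(items []string, workers int, fn func(string) error) BatchResult {
	if workers < 1 {
		workers = 1
	}
	res := BatchResult{Errors: make(map[int]error)}
	jobs := make(chan int)
	var mu sync.Mutex
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				err := fn(items[i])
				mu.Lock() // the result is shared between workers
				if err != nil {
					res.Errors[i] = err
				} else {
					res.Succeeded++
				}
				mu.Unlock()
			}
		}()
	}
	for i := range items {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return res
}

var errEmpty = errors.New("empty item")

// storeItem is a stand-in for a real store call; it rejects empty items.
func storeItem(item string) error {
	if item == "" {
		return errEmpty
	}
	return nil
}

func main() {
	res := RunBatch([]string{"a", "", "c"}, 2, storeItem)
	fmt.Println(res.Succeeded, len(res.Errors), res.Errors[1])
}
```

Keying errors by item index lets the caller retry only the failed items, which is the behaviour a `FailOnError: false` style flag implies.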
## Key Features
### Security
- **Role-based encryption** at the storage layer
- **Key rotation** with zero-downtime re-encryption
- **Access audit logging** for compliance
- **Secure key management** integration
- **Encryption performance** optimization
### Performance
- **Multi-tier caching** with Redis and in-memory layers
- **Batch operations** for bulk processing efficiency
- **Connection pooling** for database connections
- **Background optimization** with compaction and indexing
- **Query optimization** with proper indexing strategies
### Reliability
- **Distributed replication** with consensus protocols
- **Automatic failover** with health monitoring
- **Data consistency** guarantees across the cluster
- **Backup and recovery** with point-in-time restore
- **Error handling** with graceful degradation
### Monitoring
- **Prometheus metrics** for operational visibility
- **Health checks** for proactive monitoring
- **Performance profiling** for optimization insights
- **Structured logging** for debugging and analysis
- **Alert management** with notification systems
### Scalability
- **Horizontal scaling** with distributed storage
- **Consistent hashing** for data distribution
- **Load balancing** across storage nodes
- **Resource optimization** with compression and caching
- **Connection management** with pooling and limits
## Configuration
### Context Store Options
```go
type ContextStoreOptions struct {
	PreferLocal        bool          // Prefer local storage for reads
	AutoReplicate      bool          // Automatically replicate to distributed storage
	DefaultReplicas    int           // Default replication factor
	EncryptionEnabled  bool          // Enable role-based encryption
	CompressionEnabled bool          // Enable data compression
	CachingEnabled     bool          // Enable caching layer
	CacheTTL           time.Duration // Default cache TTL
	IndexingEnabled    bool          // Enable search indexing
	SyncInterval       time.Duration // Sync with distributed storage interval
	CompactionInterval time.Duration // Local storage compaction interval
	CleanupInterval    time.Duration // Cleanup expired data interval
	BatchSize          int           // Default batch operation size
	MaxConcurrentOps   int           // Maximum concurrent operations
	OperationTimeout   time.Duration // Default operation timeout
}
```
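
As a hedged illustration of how these options might be filled in and sanity-checked, the sketch below repeats a subset of the fields so that it compiles on its own. The values are illustrative only, not the package defaults, and `exampleOptions` and `validate` are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ContextStoreOptions repeats a subset of the documented fields so that
// this example compiles on its own.
type ContextStoreOptions struct {
	DefaultReplicas  int
	CachingEnabled   bool
	CacheTTL         time.Duration
	BatchSize        int
	MaxConcurrentOps int
	OperationTimeout time.Duration
}

// exampleOptions returns illustrative values, not the package defaults.
func exampleOptions() ContextStoreOptions {
	return ContextStoreOptions{
		DefaultReplicas:  3,
		CachingEnabled:   true,
		CacheTTL:         15 * time.Minute,
		BatchSize:        100,
		MaxConcurrentOps: 16,
		OperationTimeout: 30 * time.Second,
	}
}

// validate is a hypothetical helper that rejects unusable settings.
func validate(o ContextStoreOptions) error {
	switch {
	case o.DefaultReplicas < 1:
		return errors.New("DefaultReplicas must be at least 1")
	case o.CachingEnabled && o.CacheTTL <= 0:
		return errors.New("CacheTTL must be positive when caching is enabled")
	case o.BatchSize < 1 || o.MaxConcurrentOps < 1:
		return errors.New("BatchSize and MaxConcurrentOps must be at least 1")
	case o.OperationTimeout <= 0:
		return errors.New("OperationTimeout must be positive")
	}
	return nil
}

func main() {
	opts := exampleOptions()
	fmt.Println(validate(opts)) // valid settings
	opts.DefaultReplicas = 0
	fmt.Println(validate(opts)) // rejected
}
```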
### Performance Tuning
- **Cache size**: Configure based on available memory
- **Replication factor**: Balance durability and availability against write latency and storage cost
- **Batch sizes**: Optimize for your typical workload
- **Timeout values**: Set appropriate timeouts for your network
- **Background intervals**: Balance between performance and resource usage
## Integration with CHORUS Systems
### DHT Integration
The distributed storage layer builds on the existing CHORUS DHT system:
- Uses existing node discovery and communication protocols
- Leverages consistent hashing algorithms
- Integrates with leader election for coordination
### Crypto Integration
The encryption layer uses the existing CHORUS crypto system:
- Role-based key management
- Shamir's Secret Sharing for key distribution
- `age` encryption for data protection
- Audit logging for access tracking
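
For readers unfamiliar with Shamir's Secret Sharing, the sketch below shows the core idea: the secret is the constant term of a random polynomial, each share is one point on it, and any `threshold` shares recover the secret by Lagrange interpolation. This is a teaching example over a prime field using `math/big`, not the CHORUS implementation. `Split`, `Combine`, `Share`, `prime`, and `demoSecret` are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"math/big"
)

// prime is the field modulus 2^127 - 1 (a Mersenne prime); secrets must be smaller.
var prime = new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), 127), big.NewInt(1))

// demoSecret is an arbitrary value used by the example.
var demoSecret = big.NewInt(424242)

// Share is one point (X, Y) on the secret polynomial.
type Share struct{ X, Y *big.Int }

// Split hides secret as the constant term of a random polynomial of degree
// threshold-1 and returns n points on it.
func Split(secret *big.Int, n, threshold int) ([]Share, error) {
	if threshold < 1 || n < threshold || secret.Sign() < 0 || secret.Cmp(prime) >= 0 {
		return nil, errors.New("invalid parameters")
	}
	coeffs := []*big.Int{secret}
	for i := 1; i < threshold; i++ {
		c, err := rand.Int(rand.Reader, prime)
		if err != nil {
			return nil, err
		}
		coeffs = append(coeffs, c)
	}
	shares := make([]Share, n)
	for i := range shares {
		x := big.NewInt(int64(i + 1))
		y := new(big.Int)
		for j := len(coeffs) - 1; j >= 0; j-- { // Horner evaluation mod prime
			y.Mul(y, x)
			y.Add(y, coeffs[j])
			y.Mod(y, prime)
		}
		shares[i] = Share{x, y}
	}
	return shares, nil
}

// Combine recovers the secret by Lagrange interpolation at x = 0.
func Combine(shares []Share) *big.Int {
	secret := new(big.Int)
	for i, si := range shares {
		num, den := big.NewInt(1), big.NewInt(1)
		for j, sj := range shares {
			if i == j {
				continue
			}
			num.Mul(num, new(big.Int).Neg(sj.X))
			num.Mod(num, prime)
			den.Mul(den, new(big.Int).Sub(si.X, sj.X))
			den.Mod(den, prime)
		}
		term := new(big.Int).Mul(si.Y, num)
		term.Mul(term, new(big.Int).ModInverse(den, prime))
		secret.Add(secret, term)
		secret.Mod(secret, prime)
	}
	return secret
}

func main() {
	shares, err := Split(demoSecret, 5, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(Combine(shares[:3]).Cmp(demoSecret) == 0)  // first three shares
	fmt.Println(Combine(shares[2:5]).Cmp(demoSecret) == 0) // a different three
}
```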
### Election Integration
The leader coordination uses existing election systems:
- Context generation coordination
- Backup scheduling management
- Cluster-wide maintenance operations
## Usage Examples
### Basic Context Storage
```go
// Create the context store from its already-constructed dependencies
store := NewContextStore(nodeID, localStorage, distributedStorage,
	encryptedStorage, cacheManager, indexManager, backupManager,
	eventNotifier, options)

// Store a context, readable by the listed roles
if err := store.StoreContext(ctx, contextNode, []string{"developer", "architect"}); err != nil {
	return err
}

// Retrieve a context as a given role
// (named retrieved so it does not shadow the context package)
retrieved, err := store.RetrieveContext(ctx, ucxlAddress, "developer")
if err != nil {
	return err
}

// Search contexts
results, err := store.SearchContexts(ctx, &SearchQuery{
	Query: "authentication system",
	Tags:  []string{"security", "backend"},
	Limit: 10,
})
if err != nil {
	return err
}
```
### Batch Operations
```go
// Batch store multiple contexts
batch := &BatchStoreRequest{
	Contexts: []*ContextStoreItem{
		{Context: context1, Roles: []string{"developer"}},
		{Context: context2, Roles: []string{"architect"}},
	},
	Roles:       []string{"developer"}, // Default roles
	FailOnError: false,
}
result, err := store.BatchStore(ctx, batch)
```
### Backup Management
```go
// Create a backup
backupConfig := &BackupConfig{
	Name:           "daily-backup",
	Destination:    "/backups/contexts",
	IncludeIndexes: true,
	IncludeCache:   false,
	Encryption:     true,
	Retention:      30 * 24 * time.Hour,
}
backupInfo, err := backupManager.CreateBackup(ctx, backupConfig)
if err != nil {
	return err
}

// Schedule automatic backups
schedule := &BackupSchedule{
	ID:           "daily-schedule",
	Name:         "Daily Backup",
	Cron:         "0 2 * * *", // Daily at 2 AM
	BackupConfig: backupConfig,
	Enabled:      true,
}
if err := backupManager.ScheduleBackup(ctx, schedule); err != nil {
	return err
}
```
## Monitoring and Alerts
### Prometheus Metrics
The system exports comprehensive metrics to Prometheus:
- Operation counters and latencies
- Error rates and types
- Cache hit/miss ratios
- Storage size and utilization
- Replication health
- Encryption performance
### Health Checks
Built-in health checks monitor:
- Storage backend connectivity
- Cache system availability
- Index system health
- Distributed node connectivity
- Encryption system status
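
A health check of this kind usually amounts to running one probe per component under a timeout and reporting the failures by name. The sketch below illustrates that shape with stand-in probes. `Check`, `RunChecks`, and `runDemo` are hypothetical names, and the component names only echo the list above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Check probes one component and returns nil when it is healthy.
type Check func(ctx context.Context) error

// RunChecks runs every check with its own timeout and returns the
// failures keyed by component name; an empty map means all healthy.
func RunChecks(parent context.Context, timeout time.Duration, checks map[string]Check) map[string]error {
	failures := make(map[string]error)
	for name, check := range checks {
		ctx, cancel := context.WithTimeout(parent, timeout)
		done := make(chan error, 1) // buffered so a late probe never blocks
		go func(c Check, ctx context.Context) { done <- c(ctx) }(check, ctx)
		select {
		case err := <-done:
			if err != nil {
				failures[name] = err
			}
		case <-ctx.Done():
			failures[name] = ctx.Err()
		}
		cancel()
	}
	return failures
}

// runDemo wires up three stand-in probes: healthy, failing, and hanging.
func runDemo() map[string]error {
	checks := map[string]Check{
		"local": func(ctx context.Context) error { return nil },
		"cache": func(ctx context.Context) error { return errors.New("connection refused") },
		"dht": func(ctx context.Context) error {
			<-ctx.Done() // simulates a probe that never answers
			return ctx.Err()
		},
	}
	return RunChecks(context.Background(), 50*time.Millisecond, checks)
}

func main() {
	for name, err := range runDemo() {
		fmt.Printf("%s unhealthy: %v\n", name, err)
	}
}
```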
### Alert Rules
Pre-configured alert rules for:
- High error rates
- Storage capacity issues
- Replication failures
- Performance degradation
- Security violations
## Security Considerations
### Data Protection
- All context data is encrypted at rest using role-based keys
- Key rotation is performed automatically without service interruption
- Access is strictly controlled and audited
- Backup data is encrypted with separate keys
### Access Control
- Role-based access control at the storage layer
- Fine-grained permissions for different operations
- Access audit logging for compliance
- Time-based and IP-based access restrictions
### Network Security
- All distributed communications use encrypted channels
- Node authentication and authorization
- Protection against replay attacks
- Secure key distribution using Shamir's Secret Sharing
## Performance Characteristics
### Latency
- **Local operations**: Sub-millisecond latency
- **Cached operations**: 1-2ms latency
- **Distributed operations**: 10-50ms latency (network dependent)
- **Search operations**: 5-20ms latency (index size dependent)
### Scalability
- **Horizontal scaling**: Linear scaling with additional nodes
- **Storage capacity**: Petabyte-scale with proper cluster sizing
- **Concurrent operations**: Thousands of concurrent requests
- **Search performance**: Sub-second for most queries
### Resource Usage
- **Memory**: Configurable cache sizes, typically 1-8GB per node
- **Disk**: Local storage with compression, network replication
- **CPU**: Optimized for multi-core systems with worker pools
- **Network**: Efficient data distribution with minimal overhead
## Future Enhancements
### Planned Features
- **Geo-replication** for multi-region deployments
- **Query optimization** with machine learning insights
- **Advanced analytics** for context usage patterns
- **Integration APIs** for third-party systems
- **Performance auto-tuning** based on workload patterns
### Extensibility
The architecture is designed for extensibility:
- Plugin system for custom storage backends
- Configurable encryption algorithms
- Custom index analyzers for domain-specific search
- Extensible monitoring and alerting systems
- Custom batch operation processors
This storage architecture provides a solid foundation for the SLURP contextual intelligence system, offering enterprise-grade features while maintaining high performance and scalability.