HCFS/docs/ARCHITECTURE.md
Claude Code 8f19eaab25 Initial HCFS project scaffold
🚀 Generated with Claude Code

- Project plan and architecture documentation
- Python package structure with core modules
- API design and basic usage examples
- Development environment configuration
- Literature review and research foundation

Ready for Phase 1 implementation.

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-29 12:13:16 +10:00


# HCFS Architecture
## Overview
The Hierarchical Context File System (HCFS) is designed as a layered architecture that bridges filesystem navigation with semantic context storage and retrieval.
## System Components
### 1. Virtual Filesystem Layer (`src/hcfs/filesystem/`)
The virtual filesystem presents a standard POSIX-like directory structure backed by context blobs rather than traditional files.
**Key Components:**
- **HCFSFilesystem**: Main filesystem interface
- **HCFSFuseOperations**: FUSE-based filesystem operations (readdir, getattr, etc.)
**Responsibilities:**
- Present hierarchical path structure to agents
- Map filesystem operations to context queries
- Handle path-based navigation (`cd`, `ls`, etc.)
- Maintain current context scope per session
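The per-session scope described above could be tracked roughly as follows. This is a minimal sketch, not the HCFS implementation: `SessionScope` and its methods are hypothetical names, and it assumes context paths follow POSIX path rules.

```python
import posixpath


class SessionScope:
    """Hypothetical helper: tracks one agent session's current context path."""

    def __init__(self) -> None:
        self.cwd = "/"

    def resolve(self, path: str) -> str:
        # Relative paths are joined onto the current scope; absolute paths
        # replace it. normpath then collapses "." and ".." segments.
        return posixpath.normpath(posixpath.join(self.cwd, path))

    def cd(self, path: str) -> str:
        # Assumption: no existence check here; a real layer would validate
        # the target against storage before changing scope.
        self.cwd = self.resolve(path)
        return self.cwd
```

Keeping the scope in a small per-session object means two agents can navigate independently without sharing state.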
### 2. Storage Backend (`src/hcfs/storage/`)
The storage layer manages persistent context blob storage with versioning and metadata.
**Key Components:**
- **ContextStorage**: Abstract storage interface
- **SQLiteBackend**: SQLite-based implementation
- **StoredContextBlob**: Storage data models
- **ContextMetadata**: Metadata and versioning
**Responsibilities:**
- Persist context blobs with versioning
- Store path-to-context mappings
- Manage hierarchical inheritance relationships
- Provide ACID guarantees for context operations
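As an illustration of versioned persistence on SQLite, the sketch below stores every push as a new row keyed by `(path, version)`. The table name, column names, and the functions `open_store`, `push_version`, and `latest` are assumptions made for this example; the actual `SQLiteBackend` schema is not specified here.

```python
import sqlite3
from typing import Optional


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the (hypothetical) blob table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS context_blobs (
            path    TEXT    NOT NULL,
            version INTEGER NOT NULL,
            content TEXT    NOT NULL,
            PRIMARY KEY (path, version)
        )
        """
    )
    return conn


def push_version(conn: sqlite3.Connection, path: str, content: str) -> int:
    """Insert a new version for path and return its version number."""
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        row = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM context_blobs WHERE path = ?",
            (path,),
        ).fetchone()
        version = row[0] + 1
        conn.execute(
            "INSERT INTO context_blobs (path, version, content) VALUES (?, ?, ?)",
            (path, version, content),
        )
    return version


def latest(conn: sqlite3.Connection, path: str) -> Optional[str]:
    """Return the newest content stored at path, or None if nothing exists."""
    row = conn.execute(
        "SELECT content FROM context_blobs WHERE path = ? "
        "ORDER BY version DESC LIMIT 1",
        (path,),
    ).fetchone()
    return row[0] if row else None
```

Because old rows are never updated, earlier versions stay readable, and the composite primary key rejects two writers claiming the same version number.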
### 3. Indexing & Semantic Search (`src/hcfs/indexing/`)
The indexing layer provides semantic search capabilities over context blobs.
**Key Components:**
- **EmbeddingEngine**: Generate embeddings for context content
- **SemanticSearch**: Vector similarity search
- **HybridRanker**: Combines BM25 + embedding scores
**Responsibilities:**
- Generate and store embeddings for context blobs
- Provide semantic similarity search
- Rank results by relevance (hybrid BM25 + vector)
- Support context folding and summarization
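One common way to combine BM25 and embedding scores is a weighted sum after normalizing each score set to a shared range. The sketch below assumes that approach; `hybrid_rank`, the `alpha` weight, and min-max normalization are illustrative choices, not a description of how `HybridRanker` actually works.

```python
from typing import Dict, List, Tuple


def _min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every item as equally relevant.
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank blob IDs by alpha * BM25 + (1 - alpha) * vector similarity."""
    b, v = _min_max(bm25), _min_max(vector)
    combined = {
        blob_id: alpha * b.get(blob_id, 0.0) + (1 - alpha) * v.get(blob_id, 0.0)
        for blob_id in set(b) | set(v)
    }
    # Highest score first; ties are broken by blob ID for a stable order.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

Setting `alpha=1.0` reduces this to pure keyword ranking and `alpha=0.0` to pure vector ranking, which makes the weight easy to tune against real queries.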
### 4. Agent API (`src/hcfs/api/`)
The API layer exposes syscall-style functions for agent interaction.
**Key Components:**
- **ContextAPI**: Main agent-facing API
- **ContextBlob**: Context data models
- **ContextPath**: Path representation
- **ContextQuery**: Query models
**Core API Functions:**
```python
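from typing import Callable, List, Optional

# Signature stubs only; "ContextBlob" refers to the data model listed above.

# Navigation
def context_cd(path: str) -> bool: ...
def context_pwd() -> str: ...

# Retrieval
def context_get(depth: int = 1) -> List["ContextBlob"]: ...
def context_list(path: Optional[str] = None) -> List[str]: ...

# Manipulation
def context_push(path: str, blob: "ContextBlob") -> str: ...
def context_delete(path: str, blob_id: str) -> bool: ...

# Subscription
def context_subscribe(path: str, callback: Callable) -> str: ...
def context_unsubscribe(subscription_id: str) -> bool: ...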
```
### 5. Utilities (`src/hcfs/utils/`)
Common utilities and configuration management.
**Key Components:**
- **HCFSConfig**: Configuration management
- **path_utils**: Path manipulation utilities
- **logging**: Structured logging
## Data Flow
```
Agent → ContextAPI → HCFSFilesystem → ContextStorage
↓ ↓
SemanticSearch ← EmbeddingEngine
```
### Example: Context Retrieval
1. Agent calls `context_cd("/project/src/")`
2. ContextAPI validates path and sets current scope
3. HCFSFilesystem updates virtual directory state
4. Agent calls `context_get(depth=2)`
5. ContextAPI queries ContextStorage for context at `/project/src/` and its parent `/project/` (`depth=2` covers the current path plus one ancestor level)
6. SemanticSearch ranks and filters results
7. Merged context returned to agent
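The retrieval steps above depend on turning the current scope and a `depth` value into the list of paths to query. A minimal sketch of that expansion follows; `scope_paths` and `collect_context` are hypothetical names, a plain dict stands in for ContextStorage, and the ranking step is left out.

```python
import posixpath
from typing import Dict, List


def scope_paths(cwd: str, depth: int = 1) -> List[str]:
    """Return cwd followed by up to depth - 1 ancestors, nearest first."""
    paths: List[str] = []
    current = posixpath.normpath(cwd)
    for _ in range(max(depth, 0)):
        paths.append(current)
        if current == "/":
            break  # the root has no parent to walk to
        current = posixpath.dirname(current)
    return paths


def collect_context(store: Dict[str, List[str]], cwd: str, depth: int = 1) -> List[str]:
    """Gather blobs for every path in scope (dict used as a stand-in store)."""
    return [blob for path in scope_paths(cwd, depth) for blob in store.get(path, [])]
```

With `cwd="/project/src/"` and `depth=2`, `scope_paths` yields `/project/src` and `/project`, matching step 5.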
### Example: Context Publishing
1. Agent calls `context_push("/project/src/module.py", blob)`
2. ContextAPI validates blob and path
3. EmbeddingEngine generates embeddings for blob content
4. ContextStorage persists blob with versioning
5. Subscription notifications sent to interested agents
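The publish and notify steps can be sketched with an in-memory stand-in. Everything here is illustrative: `InMemoryContextHub` is a hypothetical class, the embedding step is omitted, and the rule that a subscription on a path also covers its descendants is an assumption.

```python
import uuid
from typing import Callable, Dict, List, Tuple

Callback = Callable[[str, str], None]  # receives (path, blob_id)


class InMemoryContextHub:
    """Hypothetical stand-in for the push and subscribe parts of the API."""

    def __init__(self) -> None:
        self._blobs: Dict[str, List[str]] = {}
        self._subs: Dict[str, Tuple[str, Callback]] = {}

    def subscribe(self, path: str, callback: Callback) -> str:
        sub_id = uuid.uuid4().hex
        # Strip a trailing slash so "/a/" and "/a" behave the same.
        self._subs[sub_id] = (path.rstrip("/") or "/", callback)
        return sub_id

    def unsubscribe(self, sub_id: str) -> bool:
        return self._subs.pop(sub_id, None) is not None

    def push(self, path: str, content: str) -> str:
        blob_id = uuid.uuid4().hex
        self._blobs.setdefault(path, []).append(content)
        # Notify subscribers whose path equals or is an ancestor of `path`.
        for prefix, callback in list(self._subs.values()):
            if prefix == "/" or path == prefix or path.startswith(prefix + "/"):
                callback(path, blob_id)
        return blob_id
```

Matching on `prefix + "/"` rather than the bare prefix keeps a subscription on `/project/src` from firing for `/project/srcx`.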
## Hierarchical Inheritance
Context blobs inherit from parent paths using configurable strategies:
- **Append**: Child context appends to parent context
- **Override**: Child context overrides parent context
- **Merge**: Intelligent merging based on content type
- **Isolate**: No inheritance, child context standalone
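The four strategies could be modeled as below, treating a context as an ordered list of text entries. This is a sketch under stated assumptions: the `Inherit` enum and `resolve_context` are hypothetical, and the Merge branch uses simple de-duplication as a placeholder for content-type-aware merging, which is not specified here.

```python
from enum import Enum
from typing import List


class Inherit(Enum):
    APPEND = "append"
    OVERRIDE = "override"
    MERGE = "merge"
    ISOLATE = "isolate"


def resolve_context(parent: List[str], child: List[str], strategy: Inherit) -> List[str]:
    """Combine parent and child entries according to the chosen strategy."""
    if strategy is Inherit.APPEND:
        return parent + child
    if strategy is Inherit.OVERRIDE:
        # Assumption: the child wins when it has entries; otherwise the
        # parent context shows through.
        return list(child) if child else list(parent)
    if strategy is Inherit.MERGE:
        # Placeholder merge: drop duplicates while preserving first-seen order.
        return list(dict.fromkeys(parent + child))
    if strategy is Inherit.ISOLATE:
        return list(child)
    raise ValueError(f"unknown strategy: {strategy!r}")
```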
## Concurrency & Consistency
- **Read Scalability**: Multiple agents can read simultaneously
- **Write Coordination**: Optimistic locking with conflict resolution
- **Versioning**: All context changes create new versions
- **Subscription**: Pub/sub notifications for context changes
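Optimistic locking is usually implemented as a compare-and-set on a version number: a writer states which version it read, and the write is rejected if the stored version has moved on. The sketch below assumes that scheme; `VersionedStore` and `VersionConflict` are hypothetical names, and conflict resolution is left to the caller (re-read, then retry).

```python
import threading
from typing import Dict, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    """Hypothetical in-memory store with compare-and-set writes."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, str]] = {}
        self._lock = threading.Lock()

    def read(self, path: str) -> Tuple[int, str]:
        # Version 0 with empty content means "nothing stored yet".
        return self._data.get(path, (0, ""))

    def write(self, path: str, content: str, expected_version: int) -> int:
        # The lock makes the version check and the update one atomic step.
        with self._lock:
            current, _ = self._data.get(path, (0, ""))
            if current != expected_version:
                raise VersionConflict(
                    f"{path}: expected v{expected_version}, found v{current}"
                )
            self._data[path] = (current + 1, content)
            return current + 1
```

Readers never take the lock, which fits the read-scalability goal; only writers pay for coordination, and only when they collide.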
## Performance Considerations
- **Caching**: LRU cache for frequently accessed contexts
- **Lazy Loading**: Context blobs loaded on-demand
- **Batch Operations**: Bulk context operations for efficiency
- **Index Optimization**: Separate indices for path, content, and metadata queries
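Caching and lazy loading fit together naturally: a blob is fetched from storage only on first access, then served from a bounded cache. The following sketch assumes a loader callback and a fixed capacity; `LRUContextCache` is a hypothetical name, not an HCFS class.

```python
from collections import OrderedDict
from typing import Callable


class LRUContextCache:
    """Hypothetical LRU cache that loads contexts on demand."""

    def __init__(self, loader: Callable[[str], str], capacity: int = 128) -> None:
        self._loader = loader
        self._capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, path: str) -> str:
        if path in self._items:
            # Cache hit: mark as most recently used.
            self._items.move_to_end(path)
            return self._items[path]
        # Cache miss: load lazily, then evict the least recently used entry
        # if the cache has grown past its capacity.
        value = self._loader(path)
        self._items[path] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)
        return value
```

A production cache would also need invalidation when a path receives a new version; that is omitted here.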
## Security Model
- **Path Permissions**: ACL-based access control per path
- **Agent Authentication**: Token-based agent identification
- **Content Validation**: Schema validation for context blobs
- **Audit Logging**: All context operations logged for accountability
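A per-path ACL check could look like the sketch below. The data layout and semantics are assumptions made for illustration: the ACL maps a path to each agent's allowed actions, the nearest ancestor with an entry decides, and access is denied by default. `is_allowed` is a hypothetical name.

```python
import posixpath
from typing import Dict, Set

# Assumed layout: path -> agent ID -> set of allowed actions
ACL = Dict[str, Dict[str, Set[str]]]


def is_allowed(acl: ACL, agent: str, path: str, action: str) -> bool:
    """Check `action` for `agent` against the nearest ACL entry above `path`."""
    current = posixpath.normpath(path)
    while True:
        if current in acl:
            # The closest entry wins, so a child path can tighten or loosen
            # what its parent grants.
            return action in acl[current].get(agent, set())
        if current == "/":
            return False  # no entry anywhere up the tree: deny by default
        current = posixpath.dirname(current)
```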