Initial HCFS project scaffold

🚀 Generated with Claude Code

- Project plan and architecture documentation
- Python package structure with core modules
- API design and basic usage examples
- Development environment configuration
- Literature review and research foundation

Ready for Phase 1 implementation.

Co-Authored-By: Claude <noreply@anthropic.com>
Claude Code
2025-07-29 12:13:16 +10:00
commit 8f19eaab25
14 changed files with 1150 additions and 0 deletions

314
docs/API_REFERENCE.md Normal file

@@ -0,0 +1,314 @@
# HCFS API Reference
## Overview
The HCFS API provides syscall-style functions for agents to navigate, query, and manipulate hierarchical context. All operations are designed to be familiar to agents accustomed to filesystem operations.
## Core Navigation API
### `context_cd(path: str) -> bool`
Change the current context directory. Similar to the shell `cd` command.
**Parameters:**
- `path`: Target path (absolute or relative)
**Returns:**
- `True` if path exists and is accessible
- `False` if path does not exist or is inaccessible
**Example:**
```python
# Navigate to project root
success = context_cd("/project")
# Navigate to subdirectory
success = context_cd("src/models")
# Navigate up one level
success = context_cd("..")
```
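The relative and `..` forms above imply that paths are resolved against the current directory. A minimal sketch of that resolution, assuming a hypothetical `ContextSession` class and a plain set of known paths (neither is part of the documented API):

```python
import posixpath


class ContextSession:
    """Hypothetical session object that tracks a current context path."""

    def __init__(self, known_paths):
        # Assumption: the set of absolute paths that currently hold context.
        self.known_paths = set(known_paths)
        self.cwd = "/"

    def cd(self, path: str) -> bool:
        # Resolve relative segments and ".." against the current directory.
        target = posixpath.normpath(posixpath.join(self.cwd, path))
        if target not in self.known_paths:
            return False  # the current directory is left unchanged on failure
        self.cwd = target
        return True

    def pwd(self) -> str:
        return self.cwd
```

In this sketch a failed `cd` leaves the working directory untouched, which mirrors shell behaviour; whether HCFS does the same is not stated in this reference.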
### `context_pwd() -> str`
Get the current context working directory.
**Returns:**
- Current absolute path as string
**Example:**
```python
current_path = context_pwd()
# Returns: "/project/src/models"
```
### `context_ls(path: str = None) -> List[str]`
List available context paths at the specified directory.
**Parameters:**
- `path`: Directory path (default: current directory)
**Returns:**
- List of child path names
**Example:**
```python
# List current directory
paths = context_ls()
# Returns: ["models/", "utils/", "tests/", "README.md"]
# List specific directory
paths = context_ls("/project/docs")
# Returns: ["api/", "architecture/", "examples/"]
```
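One way such a listing could be derived from a flat collection of stored paths is sketched below. The helper `list_children` and the sample paths are illustrative assumptions, not the HCFS implementation:

```python
from typing import Iterable, List


def list_children(all_paths: Iterable[str], directory: str) -> List[str]:
    """Return the immediate child names of `directory`, marking directories with '/'."""
    prefix = directory.rstrip("/") + "/"
    children = set()
    for path in all_paths:
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        if not rest:
            continue
        name, sep, _ = rest.partition("/")
        # A remaining separator means `name` has descendants, so list it as a directory.
        children.add(name + "/" if sep else name)
    return sorted(children)
```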
## Context Retrieval API
### `context_get(depth: int = 1, filters: dict = None) -> List[ContextBlob]`
Retrieve context blobs from current path and optionally parent paths.
**Parameters:**
- `depth`: Number of hierarchy levels to include, counting the current path (1 = current path only, 2 = current path plus its parent, and so on)
- `filters`: Optional filters (content_type, author, date_range, etc.)
**Returns:**
- List of `ContextBlob` objects ordered by relevance
**Example:**
```python
# Get context from current path only
context = context_get(depth=1)
# Get context from current path and 2 parent levels
context = context_get(depth=3)
# Get context with filters
context = context_get(
    depth=2,
    filters={
        "content_type": "documentation",
        "author": "claude",
        "since": "2025-01-01"
    }
)
```
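To make the `depth` semantics concrete, the sketch below computes which paths a given depth would cover. `paths_for_depth` is a hypothetical helper that assumes depth counts the current path as level 1, as in the examples above:

```python
import posixpath
from typing import List


def paths_for_depth(current: str, depth: int) -> List[str]:
    """Return the current path followed by up to `depth - 1` ancestors."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    paths = [posixpath.normpath(current)]
    # Walk upward until enough levels are collected or the root is reached.
    while len(paths) < depth and paths[-1] != "/":
        paths.append(posixpath.dirname(paths[-1]))
    return paths
```

A depth larger than the number of available ancestors simply stops at the root in this sketch.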
### `context_search(query: str, scope: str = None) -> List[ContextBlob]`
Perform semantic search across context blobs.
**Parameters:**
- `query`: Search query string
- `scope`: Path scope to limit search (default: current path and children)
**Returns:**
- List of `ContextBlob` objects ranked by relevance
**Example:**
```python
# Search within current scope
results = context_search("error handling patterns")
# Search within specific scope
results = context_search(
    "database connection",
    scope="/project/src/models"
)
```
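Scoping a search to a path and its children needs a little care, because a plain string-prefix test would also match sibling paths such as `/project/src/models_old`. A small sketch of a stricter check, using a hypothetical `in_scope` helper:

```python
def in_scope(path: str, scope: str) -> bool:
    """True if `path` is the scope itself or lies beneath it."""
    scope = scope.rstrip("/") or "/"
    if scope == "/":
        return path.startswith("/")
    # Compare on a path-segment boundary rather than on a raw string prefix.
    return path == scope or path.startswith(scope + "/")
```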
## Context Manipulation API
### `context_push(path: str, blob: ContextBlob) -> str`
Add or update context at the specified path.
**Parameters:**
- `path`: Target path for the context
- `blob`: ContextBlob object containing content and metadata
**Returns:**
- Blob ID of the created/updated context
**Example:**
```python
from hcfs.api import ContextBlob
# Create new context blob
blob = ContextBlob(
    content="This module handles user authentication",
    content_type="documentation",
    tags=["auth", "security", "users"],
    metadata={"priority": "high"}
)
# Push to specific path
blob_id = context_push("/project/src/auth.py", blob)
```
### `context_delete(path: str, blob_id: str = None) -> bool`
Delete context blob(s) at the specified path.
**Parameters:**
- `path`: Target path
- `blob_id`: Specific blob ID (if None, deletes all blobs at path)
**Returns:**
- `True` if deletion successful
- `False` if path/blob not found or permission denied
**Example:**
```python
# Delete specific blob
success = context_delete("/project/src/auth.py", blob_id)
# Delete all context at path
success = context_delete("/project/old_module/")
```
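The push and delete semantics described above (a blob ID returned on push, and deletion of either one blob or every blob at a path) can be illustrated with an in-memory stand-in. `InMemoryContextStore` is hypothetical and stores plain strings rather than `ContextBlob` objects to stay short:

```python
import uuid
from collections import defaultdict
from typing import Dict, Optional


class InMemoryContextStore:
    """Hypothetical stand-in for a storage backend; not the HCFS implementation."""

    def __init__(self) -> None:
        # Maps path -> {blob_id: content}.
        self._blobs: Dict[str, Dict[str, str]] = defaultdict(dict)

    def push(self, path: str, content: str) -> str:
        blob_id = uuid.uuid4().hex
        self._blobs[path][blob_id] = content
        return blob_id

    def delete(self, path: str, blob_id: Optional[str] = None) -> bool:
        if path not in self._blobs:
            return False
        if blob_id is None:
            del self._blobs[path]  # remove every blob at this path
            return True
        if blob_id not in self._blobs[path]:
            return False
        del self._blobs[path][blob_id]
        if not self._blobs[path]:
            del self._blobs[path]  # drop the path once it holds no blobs
        return True
```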
### `context_update(blob_id: str, updates: dict) -> bool`
Update an existing context blob.
**Parameters:**
- `blob_id`: ID of blob to update
- `updates`: Dictionary of fields to update
**Returns:**
- `True` if update successful
- `False` if blob not found or permission denied
**Example:**
```python
# Update blob content and tags
success = context_update(blob_id, {
    "content": "Updated documentation with new examples",
    "tags": ["auth", "security", "users", "examples"]
})
```
## Subscription API
### `context_subscribe(path: str, callback: Callable, filters: dict = None) -> str`
Subscribe to context changes at the specified path.
**Parameters:**
- `path`: Path to monitor
- `callback`: Function to call when changes occur
- `filters`: Optional filters for subscription
**Returns:**
- Subscription ID string
**Example:**
```python
def on_context_change(event):
    print(f"Context changed at {event.path}: {event.change_type}")

# Subscribe to create and update events under /project/src/
sub_id = context_subscribe(
    "/project/src/",
    callback=on_context_change,
    filters={"change_type": ["create", "update"]}
)
```
### `context_unsubscribe(subscription_id: str) -> bool`
Cancel a context subscription.
**Parameters:**
- `subscription_id`: ID returned from `context_subscribe`
**Returns:**
- `True` if unsubscribe successful
- `False` if subscription not found
**Example:**
```python
success = context_unsubscribe(sub_id)
```
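The subscribe/unsubscribe pair can be pictured as a small registry keyed by subscription ID. The sketch below is an assumption-laden illustration: `SubscriptionRegistry`, `ChangeEvent`, and `publish` are hypothetical names, and only the `change_type` filter from the example above is handled:

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ChangeEvent:
    path: str
    change_type: str  # for example "create", "update", or "delete"


class SubscriptionRegistry:
    """Hypothetical in-process pub/sub registry."""

    def __init__(self) -> None:
        self._subs: Dict[str, Tuple[str, Callable[[ChangeEvent], None], Optional[dict]]] = {}

    def subscribe(self, path: str, callback, filters: Optional[dict] = None) -> str:
        sub_id = uuid.uuid4().hex
        self._subs[sub_id] = (path.rstrip("/") or "/", callback, filters)
        return sub_id

    def unsubscribe(self, sub_id: str) -> bool:
        return self._subs.pop(sub_id, None) is not None

    def publish(self, event: ChangeEvent) -> int:
        """Deliver `event` to matching subscribers and return how many were called."""
        delivered = 0
        for path, callback, filters in list(self._subs.values()):
            under_path = path == "/" or event.path == path or event.path.startswith(path + "/")
            wanted = (filters or {}).get("change_type")
            if under_path and (wanted is None or event.change_type in wanted):
                callback(event)
                delivered += 1
        return delivered
```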
## Data Models
### ContextBlob
```python
from datetime import datetime
from typing import Any, Dict, List, Optional

class ContextBlob:
    id: str                        # Unique blob identifier
    content: str                   # Main content text
    content_type: str              # Type: "code", "documentation", "config", etc.
    tags: List[str]                # Searchable tags
    metadata: Dict[str, Any]       # Additional metadata
    author: str                    # Creator identifier
    created_at: datetime           # Creation timestamp
    updated_at: datetime           # Last update timestamp
    version: int                   # Version number
    parent_version: Optional[str]  # Parent blob ID if forked
```
### ContextPath
```python
from typing import List

class ContextPath:
    path: str              # Full path string
    components: List[str]  # Path components
    depth: int             # Depth from root
    is_absolute: bool      # True if absolute path
    exists: bool           # True if path has context
```
### ContextQuery
```python
from typing import Any, Dict

class ContextQuery:
    query: str               # Search query
    filters: Dict[str, Any]  # Search filters
    scope: str               # Search scope path
    limit: int               # Max results
    offset: int              # Results offset
    sort_by: str             # Sort field
    sort_order: str          # "asc" or "desc"
```
## Error Handling
All API functions raise specific exceptions for different error conditions:
- `PathNotFoundError`: Path does not exist
- `PermissionDeniedError`: Insufficient permissions
- `InvalidPathError`: Malformed path syntax
- `ContextNotFoundError`: Context blob not found
- `ValidationError`: Invalid data provided
- `StorageError`: Backend storage error
**Example:**
```python
from hcfs.api import PathNotFoundError, PermissionDeniedError
try:
    context = context_get(depth=2)
except PathNotFoundError:
    print("Current path has no context")
except PermissionDeniedError:
    print("Access denied to context")
```
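If the exceptions listed above shared a common base class, callers could handle any HCFS failure in one place. The sketch below assumes such a base, here called `HCFSError`; that name and the `safe_call` helper are hypothetical and not taken from this reference:

```python
class HCFSError(Exception):
    """Hypothetical common base class for the exceptions listed above."""


class PathNotFoundError(HCFSError): pass
class PermissionDeniedError(HCFSError): pass
class InvalidPathError(HCFSError): pass
class ContextNotFoundError(HCFSError): pass
class ValidationError(HCFSError): pass
class StorageError(HCFSError): pass


def safe_call(func, *args, default=None, **kwargs):
    """Run `func` and return `default` if it raises any HCFS error."""
    try:
        return func(*args, **kwargs)
    except HCFSError:
        return default
```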
## Configuration
API behavior can be configured via `HCFSConfig`:
```python
from hcfs.utils import HCFSConfig
config = HCFSConfig(
    max_depth=10,                 # Maximum traversal depth
    cache_size=1000,              # LRU cache size
    default_content_type="text",  # Default blob content type
    enable_versioning=True,       # Enable blob versioning
    subscription_timeout=300      # Subscription timeout (seconds)
)
```

146
docs/ARCHITECTURE.md Normal file

@@ -0,0 +1,146 @@
# HCFS Architecture
## Overview
The Hierarchical Context File System (HCFS) uses a layered architecture that bridges filesystem navigation with semantic context storage and retrieval.
## System Components
### 1. Virtual Filesystem Layer (`src/hcfs/filesystem/`)
The virtual filesystem presents a standard POSIX-like directory structure backed by context blobs rather than traditional files.
**Key Components:**
- **HCFSFilesystem**: Main filesystem interface
- **HCFSFuseOperations**: FUSE-based filesystem operations (readdir, getattr, etc.)
**Responsibilities:**
- Present hierarchical path structure to agents
- Map filesystem operations to context queries
- Handle path-based navigation (`cd`, `ls`, etc.)
- Maintain current context scope per session
### 2. Storage Backend (`src/hcfs/storage/`)
The storage layer manages persistent context blob storage with versioning and metadata.
**Key Components:**
- **ContextStorage**: Abstract storage interface
- **SQLiteBackend**: SQLite-based implementation
- **StoredContextBlob**: Storage data models
- **ContextMetadata**: Metadata and versioning
**Responsibilities:**
- Persist context blobs with versioning
- Store path-to-context mappings
- Manage hierarchical inheritance relationships
- Provide ACID guarantees for context operations
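As a rough illustration of versioned persistence on SQLite, the sketch below keeps one row per blob version and writes each new version inside a transaction. The table name, columns, and helper functions are assumptions made for this example, not the actual `SQLiteBackend` schema:

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per (blob_id, version).
SCHEMA = """
CREATE TABLE IF NOT EXISTS context_blobs (
    blob_id TEXT    NOT NULL,
    version INTEGER NOT NULL,
    path    TEXT    NOT NULL,
    content TEXT    NOT NULL,
    PRIMARY KEY (blob_id, version)
);
CREATE INDEX IF NOT EXISTS idx_blobs_path ON context_blobs (path);
"""


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def save_version(conn: sqlite3.Connection, blob_id: str, path: str, content: str) -> int:
    """Insert the next version of a blob and return its version number."""
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        row = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM context_blobs WHERE blob_id = ?",
            (blob_id,),
        ).fetchone()
        version = row[0] + 1
        conn.execute(
            "INSERT INTO context_blobs (blob_id, version, path, content) VALUES (?, ?, ?, ?)",
            (blob_id, version, path, content),
        )
    return version


def latest(conn: sqlite3.Connection, blob_id: str) -> Optional[str]:
    row = conn.execute(
        "SELECT content FROM context_blobs WHERE blob_id = ? ORDER BY version DESC LIMIT 1",
        (blob_id,),
    ).fetchone()
    return row[0] if row else None
```

Keeping every version as its own row makes history queries straightforward, at the cost of storage growth; a real backend might prune or compact old versions.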
### 3. Indexing & Semantic Search (`src/hcfs/indexing/`)
The indexing layer provides semantic search capabilities over context blobs.
**Key Components:**
- **EmbeddingEngine**: Generate embeddings for context content
- **SemanticSearch**: Vector similarity search
- **HybridRanker**: Combines BM25 + embedding scores
**Responsibilities:**
- Generate and store embeddings for context blobs
- Provide semantic similarity search
- Rank results by relevance (hybrid BM25 + vector)
- Support context folding and summarization
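One common way to combine lexical and vector scores is a weighted sum of normalized values. The sketch below assumes the BM25 and similarity scores have already been computed elsewhere; `hybrid_rank`, the min-max normalization, and the `alpha` weight are illustrative choices, not the documented behaviour of `HybridRanker`:

```python
from typing import Dict, List, Tuple


def _min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank blob IDs by alpha * BM25 + (1 - alpha) * vector similarity."""
    norm_bm25 = _min_max(bm25) if bm25 else {}
    norm_vec = _min_max(vector) if vector else {}
    ids = set(norm_bm25) | set(norm_vec)
    combined = {
        blob_id: alpha * norm_bm25.get(blob_id, 0.0)
        + (1 - alpha) * norm_vec.get(blob_id, 0.0)
        for blob_id in ids
    }
    # Highest score first; ties are broken by blob ID for a stable order.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```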
### 4. Agent API (`src/hcfs/api/`)
The API layer exposes syscall-style functions for agent interaction.
**Key Components:**
- **ContextAPI**: Main agent-facing API
- **ContextBlob**: Context data models
- **ContextPath**: Path representation
- **ContextQuery**: Query models
**Core API Functions:**
```python
# Navigation
context_cd(path: str) -> bool
context_pwd() -> str
# Retrieval
context_get(depth: int = 1) -> List[ContextBlob]
context_ls(path: str = None) -> List[str]
# Manipulation
context_push(path: str, blob: ContextBlob) -> str
context_delete(path: str, blob_id: str) -> bool
# Subscription
context_subscribe(path: str, callback: Callable) -> str
context_unsubscribe(subscription_id: str) -> bool
```
### 5. Utilities (`src/hcfs/utils/`)
Common utilities and configuration management.
**Key Components:**
- **HCFSConfig**: Configuration management
- **path_utils**: Path manipulation utilities
- **logging**: Structured logging
## Data Flow
```
Agent → ContextAPI → HCFSFilesystem → ContextStorage
↓ ↓
SemanticSearch ← EmbeddingEngine
```
### Example: Context Retrieval
1. Agent calls `context_cd("/project/src/")`
2. ContextAPI validates path and sets current scope
3. HCFSFilesystem updates virtual directory state
4. Agent calls `context_get(depth=2)`
5. ContextAPI queries ContextStorage for context at `/project/src/` and `/project/`
6. SemanticSearch ranks and filters results
7. Merged context returned to agent
### Example: Context Publishing
1. Agent calls `context_push("/project/src/module.py", blob)`
2. ContextAPI validates blob and path
3. EmbeddingEngine generates embeddings for blob content
4. ContextStorage persists blob with versioning
5. Subscription notifications sent to interested agents
## Hierarchical Inheritance
Context blobs inherit from parent paths using configurable strategies:
- **Append**: Child context appends to parent context
- **Override**: Child context overrides parent context
- **Merge**: Intelligent merging based on content type
- **Isolate**: No inheritance, child context standalone
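The four strategies can be sketched as a single resolution function over lists of context strings. This is one possible reading, under two assumptions: Override falls back to the parent when the child has no context of its own, and Merge is reduced to order-preserving de-duplication as a placeholder for content-aware merging. The `Inheritance` enum and `resolve` function are hypothetical:

```python
from enum import Enum
from typing import List


class Inheritance(Enum):
    APPEND = "append"
    OVERRIDE = "override"
    MERGE = "merge"
    ISOLATE = "isolate"


def resolve(parent: List[str], child: List[str], strategy: Inheritance) -> List[str]:
    """Combine parent and child context according to the chosen strategy."""
    if strategy is Inheritance.APPEND:
        return parent + child
    if strategy is Inheritance.OVERRIDE:
        # Assumption: the parent applies only when the child defines nothing.
        return list(child) if child else list(parent)
    if strategy is Inheritance.MERGE:
        # Placeholder merge: keep first occurrences, preserving order.
        return list(dict.fromkeys(parent + child))
    return list(child)  # ISOLATE: never consult the parent
```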
## Concurrency & Consistency
- **Read Scalability**: Multiple agents can read simultaneously
- **Write Coordination**: Optimistic locking with conflict resolution
- **Versioning**: All context changes create new versions
- **Subscription**: Pub/sub notifications for context changes
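Optimistic locking typically means a writer states which version it read, and the write is rejected if the stored version has moved on. A minimal in-memory sketch, with hypothetical names (`VersionedStore`, `VersionConflict`) and no conflict-resolution policy beyond raising an error:

```python
import threading
from typing import Dict, Optional, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    """Hypothetical store illustrating compare-and-set on a version number."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Optional[str]]] = {}
        self._lock = threading.Lock()

    def read(self, blob_id: str) -> Tuple[int, Optional[str]]:
        # Version 0 means the blob does not exist yet.
        with self._lock:
            return self._data.get(blob_id, (0, None))

    def write(self, blob_id: str, content: str, expected_version: int) -> int:
        # The version check and the update happen under one lock so they are atomic.
        with self._lock:
            current, _ = self._data.get(blob_id, (0, None))
            if current != expected_version:
                raise VersionConflict(
                    f"expected version {expected_version}, found {current}"
                )
            self._data[blob_id] = (current + 1, content)
            return current + 1
```

A caller that hits `VersionConflict` would re-read the blob, reapply its change, and retry; how HCFS resolves such conflicts is not specified here.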
## Performance Considerations
- **Caching**: LRU cache for frequently accessed contexts
- **Lazy Loading**: Context blobs loaded on-demand
- **Batch Operations**: Bulk context operations for efficiency
- **Index Optimization**: Separate indices for path, content, and metadata queries
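The LRU caching mentioned above can be illustrated with a small cache built on `collections.OrderedDict`. The `LRUCache` class is a generic sketch, not the cache HCFS uses:

```python
from collections import OrderedDict
from typing import Any, Hashable


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```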
## Security Model
- **Path Permissions**: ACL-based access control per path
- **Agent Authentication**: Token-based agent identification
- **Content Validation**: Schema validation for context blobs
- **Audit Logging**: All context operations logged for accountability