1. Research Motivation & Literature Review 🧠
- Semantic and context‑aware file systems: Gifford et al. (1991) proposed early semantic file systems using directory paths as semantic queries (Wikipedia). Later work explored tag‑based and ontology‑based systems for richer metadata and context-aware retrieval (Wikipedia).
- LLM‑driven semantic FS (LSFS): The recent ICLR 2025 LSFS proposes integrating vector DBs and semantic indexing into a filesystem that supports prompt-driven file operations and semantic rollback (OpenReview).
- Path-structure embeddings: Recent Transformer-based work shows file paths can be modeled as sequences for semantic anomaly detection—capturing hierarchy and semantics in embeddings (MDPI).
- Context modeling frameworks: Ontology-driven context models (e.g. OWL/SOCAM) support representing, reasoning about, and sharing context hierarchically (arXiv).
HCFS merges these prior insights into a hybrid: directory navigation defines the query scope, backed by semantic context blobs in a DB, so that agentic systems can zoom in or out contextually.
2. Objectives & Scope
- Design a virtual filesystem layer that maps hierarchical paths to context blobs.
- Build a context storage system (DB) to hold context units, versioned and indexed.
- Define APIs and syscalls for agents to:
  - navigate context scope (`cd`-style),
  - request context retrieval,
  - push new context,
  - merge or inherit context across levels.
- Enable decentralized context sharing: agents can publish updates at path-nodes; peer agents subscribe by tree-paths.
- Prototype on a controlled dataset / toy project tree to validate:
  - latency,
  - correct retrieval,
  - hierarchical inheritance semantics.
3. System Architecture Overview
3.1 Virtual Filesystem Layer (e.g. FUSE or AIOS integration)
- Presents standard POSIX (or AIOS‑style) tree structure.
- Each directory or file node has metadata pointers into context‑blob IDs.
- Traversal (e.g., `ls`, `cd`) triggers context lookup for that path.
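The node-to-blob mapping described in 3.1 can be sketched in memory before any FUSE or AIOS binding exists. The names `Node`, `ContextTree`, `bind`, `ls`, and `lookup` are hypothetical, and the sketch assumes absolute POSIX-style paths; it only illustrates how traversal could resolve a path to context-blob IDs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Node:
    """One directory or file node holding pointers (IDs) into the context store."""
    name: str
    blob_ids: list[str] = field(default_factory=list)
    children: dict[str, Node] = field(default_factory=dict)


class ContextTree:
    """Hypothetical in-memory stand-in for the virtual filesystem layer."""

    def __init__(self) -> None:
        self.root = Node("/")

    def _walk(self, path: str, create: bool = False) -> Node:
        if not path.startswith("/"):
            raise ValueError("this sketch only handles absolute paths")
        node = self.root
        # parts[0] is the leading "/", so skip it
        for part in PurePosixPath(path).parts[1:]:
            if part not in node.children:
                if not create:
                    raise FileNotFoundError(path)
                node.children[part] = Node(part)
            node = node.children[part]
        return node

    def bind(self, path: str, blob_id: str) -> None:
        """Attach a context-blob ID to a path, creating nodes as needed."""
        self._walk(path, create=True).blob_ids.append(blob_id)

    def ls(self, path: str) -> list[str]:
        """List child node names, as an `ls` on that directory would."""
        return sorted(self._walk(path).children)

    def lookup(self, path: str) -> list[str]:
        """Return the blob IDs bound directly to this path."""
        return list(self._walk(path).blob_ids)
```

A real layer would call something like `lookup` from its directory-read and attribute handlers; that wiring is left out here because it depends on the chosen FUSE or AIOS binding.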
3.2 Context Database Backend
- Two possible designs:
  - Relational/SQLite + versioned tables: simple, transactional, supports hierarchical inheritance via path parent pointers.
  - Graph DB (e.g., Neo4j): ideal for multi-parent contexts, symlink-like context inheritance.
- Context blobs include:
  - blob ID,
  - path(s) bound,
  - timestamp/version, author/agent,
  - embedding or semantic tags,
  - content or summary.
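As a rough illustration of the relational option, the sketch below uses Python's built-in `sqlite3` module with one table for path nodes (carrying parent pointers) and one for versioned blobs. The table names, column names, and helper functions are assumptions made for this example, not a settled schema; inheritance is resolved with a recursive query that follows parent pointers up to the root.

```python
import sqlite3

# Hypothetical schema: one row per path node, one row per blob version.
SCHEMA = """
CREATE TABLE nodes (
    path   TEXT PRIMARY KEY,
    parent TEXT REFERENCES nodes(path)
);
CREATE TABLE blobs (
    blob_id INTEGER PRIMARY KEY,
    path    TEXT NOT NULL REFERENCES nodes(path),
    version INTEGER NOT NULL,
    author  TEXT NOT NULL,
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    tags    TEXT,
    content TEXT NOT NULL,
    UNIQUE (path, version)
);
"""


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO nodes (path, parent) VALUES ('/', NULL)")
    return conn


def add_node(conn, path, parent):
    conn.execute("INSERT INTO nodes (path, parent) VALUES (?, ?)", (path, parent))


def push_blob(conn, path, author, content, tags=""):
    """Store a new version of the context at `path`; returns the version number."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM blobs WHERE path = ?", (path,)
    ).fetchone()
    conn.execute(
        "INSERT INTO blobs (path, version, author, tags, content) VALUES (?, ?, ?, ?, ?)",
        (path, latest + 1, author, tags, content),
    )
    return latest + 1


def inherited(conn, path):
    """Latest blob at `path` and at each ancestor that has one, nearest first."""
    return conn.execute(
        """
        WITH RECURSIVE chain(path, parent, depth) AS (
            SELECT path, parent, 0 FROM nodes WHERE path = ?
            UNION ALL
            SELECT n.path, n.parent, c.depth + 1
            FROM nodes n JOIN chain c ON n.path = c.parent
        )
        SELECT c.path, b.version, b.content
        FROM chain c JOIN blobs b ON b.path = c.path
        WHERE b.version = (SELECT MAX(version) FROM blobs WHERE path = c.path)
        ORDER BY c.depth
        """,
        (path,),
    ).fetchall()
```

Single parent pointers keep the recursive query simple; the multi-parent case mentioned for the graph option would need an edge table instead of a `parent` column.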
3.3 Indexing & Embeddings
- Generate embeddings of context blobs for semantic similarity retrieval, e.g. for context folding (OpenReview, MDPI).
- Use a combination of BM25 and embedding ranking (contextual retrieval) for accurate scope-based retrieval (TECHCOMMUNITY.MICROSOFT.COM).
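One way to combine the two rankings is reciprocal rank fusion, shown below as a minimal, dependency-free sketch. The plan does not specify a fusion method, so RRF is an assumption here, as are the function names, the whitespace tokenizer, and the caller-supplied toy vectors standing in for real embeddings. The sketch also assumes a non-empty document list.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query (whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ranks(scores):
    """Map item index to its 1-based rank, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return {i: r for r, i in enumerate(order, start=1)}


def hybrid_rank(query, query_vec, docs, doc_vecs, k=60):
    """Fuse lexical and embedding rankings; returns doc indices, best first."""
    lexical = ranks(bm25_scores(query, docs))
    semantic = ranks([cosine(query_vec, v) for v in doc_vecs])
    fused = {i: 1 / (k + lexical[i]) + 1 / (k + semantic[i]) for i in range(len(docs))}
    return sorted(fused, key=lambda i: -fused[i])
```

Rank fusion sidesteps the problem that BM25 scores and cosine similarities live on different scales; a weighted sum of normalized scores would be an equally plausible choice.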
3.4 API & Syscalls
- `context_cd(path)`: sets current context pointer.
- `context_get(depth=N)`: retrieves cumulative context from current node up N levels.
- `context_push(path, blob)`: insert new context tied to a path.
- `context_list(path)`: lists available context blobs at that path.
- `context_subscribe(path)`: agent registers to receive updates at a path.
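The five calls can be prototyped as methods on an in-memory session object, as in the hedged sketch below. The class name `ContextSession` and the `callback` argument to `context_subscribe` are assumptions added for illustration, and so is the choice that a subscription at a path also receives pushes to its descendants; the plan leaves that behavior open.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class ContextSession:
    """Hypothetical in-memory prototype of the context API."""

    def __init__(self):
        self.cwd = PurePosixPath("/")
        self.store = defaultdict(list)        # path -> list of blobs
        self.subscribers = defaultdict(list)  # path -> list of callbacks

    def context_cd(self, path):
        """Set the current context pointer."""
        self.cwd = PurePosixPath(path)

    def context_get(self, depth=0):
        """Blobs at the current node plus up to `depth` ancestors, nearest first."""
        chain = [self.cwd, *self.cwd.parents][: depth + 1]
        blobs = []
        for node in chain:
            blobs.extend(self.store.get(node, []))
        return blobs

    def context_push(self, path, blob):
        """Insert a blob at `path` and notify subscribers."""
        key = PurePosixPath(path)
        self.store[key].append(blob)
        # Assumption: subscribers at any ancestor are notified as well.
        for node in (key, *key.parents):
            for callback in self.subscribers.get(node, ()):
                callback(str(key), blob)

    def context_list(self, path):
        """Blobs bound directly to `path`, without inheritance."""
        return list(self.store.get(PurePosixPath(path), []))

    def context_subscribe(self, path, callback):
        """Register `callback(path, blob)` for updates at or below `path`."""
        self.subscribers[PurePosixPath(path)].append(callback)
```

A short usage example:

```python
session = ContextSession()
session.context_push("/", "org-wide conventions")
session.context_push("/proj/api", "API design notes")
session.context_cd("/proj/api")
print(session.context_get(depth=2))
```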
4. Project Timeline & Milestones
| Phase | Duration | Deliverables |
|---|---|---|
| Phase 0: Research & Design | 2 weeks | Literature review doc, architecture draft |
| Phase 1: Prototype FS layer | 4 weeks | Minimal FUSE‑based path→context mapping, CLI demo |
| Phase 2: Backend DB & storage | 4 weeks | Context blob storage, path linkage, versioning |
| Phase 3: Embedding & retrieval integration | 3 weeks | Embeddings + BM25 hybrid ranking for context relevance |
| Phase 4: API/Syscall layer scripting | 3 weeks | Python (or AIOS) service exposing navigation + push APIs |
| Phase 5: Agent integration & simulation | 3 weeks | Dummy AI agents navigating, querying, publishing context |
| Phase 6: Evaluation & refinement | 2 weeks | Usability, latency, retrieval relevance metrics |
| Phase 7: Write-up & publication | 2 weeks | Report, possible poster/paper submission |
5. Risks & Alternatives
- Semantic vs hierarchical mismatch: Flat tag systems (e.g. Tagsistant) offer semantic tagging but lack path-based inheritance (research.ijcaonline.org, OpenReview, Wikipedia, arXiv, Anthropic).
- Context explosion: many small blobs flooding the DB—mitigate via summarization/folding.
- Performance trade-offs: FS lookup latency must stay acceptable, and versioned graph storage might slow lookups down. Consider caching snapshots of the cumulative context at each node.
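The snapshot-caching idea can be sketched as a small wrapper that memoizes the cumulative context per path and drops stale entries when a blob is pushed. `SnapshotCache` and its `store` argument (a plain dict from path string to a list of blobs) are hypothetical; a real implementation would sit in front of the context database rather than a dict.

```python
from pathlib import PurePosixPath


class SnapshotCache:
    """Hypothetical per-node cache of cumulative (inherited) context."""

    def __init__(self, store):
        self.store = store   # assumed shape: {"/path": [blob, ...]}
        self.snapshots = {}  # PurePosixPath -> tuple of blobs
        self.misses = 0

    def get(self, path):
        """Cumulative context for `path`, nearest node first."""
        key = PurePosixPath(path)
        if key not in self.snapshots:
            self.misses += 1
            chain = [key, *key.parents]
            self.snapshots[key] = tuple(
                blob for node in chain for blob in self.store.get(str(node), [])
            )
        return self.snapshots[key]

    def push(self, path, blob):
        """Add a blob, then invalidate snapshots that inherit from `path`."""
        key = PurePosixPath(path)
        self.store.setdefault(str(key), []).append(blob)
        # A snapshot is stale if it belongs to `key` or to one of its descendants.
        stale = [p for p in self.snapshots if p == key or key in p.parents]
        for p in stale:
            del self.snapshots[p]
```

Invalidating on write keeps reads cheap, at the cost of a scan over cached paths on every push; a push near the root invalidates the most snapshots.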
6. Peer‑Reviewed References
- David Gifford et al., Semantic File Systems, ACM SIGOPS Operating Systems Review (1991) (Wikipedia)
- ICLR 2025: From Commands to Prompts: LLM-based Semantic File System for AIOS (LSFS) (OpenReview)
- Xiaoyu et al., Transformer-based path sequence modeling for file‑path anomaly detection (MDPI)
- Tao Gu et al., Ontology‑based Context Model in Intelligent Environments (SOCAM) (arXiv)
7. Next Steps
- Review cited literature, build an annotated bibliography.
- Choose backend stack (SQLite vs graph DB) and test embedding pipeline.
- Begin Phase 1: implementing minimal context‑aware FS mock.