Initial HCFS project scaffold

🚀 Generated with Claude Code

- Project plan and architecture documentation
- Python package structure with core modules
- API design and basic usage examples
- Development environment configuration
- Literature review and research foundation

Ready for Phase 1 implementation.

Co-Authored-By: Claude <noreply@anthropic.com>
Claude Code
2025-07-29 12:13:16 +10:00
commit 8f19eaab25
14 changed files with 1150 additions and 0 deletions

PROJECT_PLAN.md Normal file

@@ -0,0 +1,129 @@
---
# PROJECT\_PLAN.md
## 📘 Title
**Context-Aware Hierarchical Context File System (HCFS)**: Unifying file system paths with context blobs for agentic AI cognition
---
## 1. Research Motivation & Literature Review 🧠
* **Semantic and context-aware file systems**: Gifford et al. (1991) proposed early semantic file systems using directory paths as semantic queries ([Wikipedia][1]). Later work explored tag-based and ontology-based systems for richer metadata and context-aware retrieval ([Wikipedia][1]).
* **LLM-driven semantic FS (LSFS)**: LSFS, presented at ICLR 2025, proposes integrating vector DBs and semantic indexing into a filesystem that supports prompt-driven file operations and semantic rollback ([OpenReview][2]).
* **Path-structure embeddings**: Recent Transformer-based work shows file paths can be modeled as sequences for semantic anomaly detection—capturing hierarchy and semantics in embeddings ([MDPI][3]).
* **Context modeling frameworks**: Ontology-driven context models (e.g. OWL/SOCAM) support representing, reasoning about, and sharing context hierarchically ([arXiv][4]).
HCFS merges these prior insights into a hybrid: directory navigation defines the query scope, and semantic context blobs stored in a DB back each path, so agentic systems can zoom in or out of context by moving through the tree.
---
## 2. Objectives & Scope
1. Design a **virtual filesystem layer** that maps hierarchical paths to context blobs.
2. Build a **context storage system** (DB) to hold context units, versioned and indexed.
3. Define **APIs and syscalls** for agents to:
   * navigate context scope (`cd`-style),
* request context retrieval,
* push new context,
* merge or inherit context across levels.
4. Enable **decentralized context sharing**: agents can publish updates at path nodes; peer agents subscribe by tree paths.
5. Prototype on a controlled dataset / toy project tree to validate:
* latency,
* correct retrieval,
* hierarchical inheritance semantics.
---
## 3. System Architecture Overview
### 3.1 Virtual Filesystem Layer (e.g. FUSE or AIOS integration)
* Presents standard POSIX (or AIOS-style) tree structure.
* Each directory or file node has metadata pointers into context-blob IDs.
* Traversal (e.g., `ls`, `cd`) triggers context lookup for that path.
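
The path-to-blob lookup described above can be sketched without any FUSE binding. This is only an illustration under stated assumptions: `NODE_BLOBS`, `ancestors`, and `lookup` are hypothetical names, the blob IDs are made up, and a real layer would read the pointers from node metadata rather than a dict.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for per-node metadata pointers (path -> blob IDs).
NODE_BLOBS = {
    "/": ["blob-root"],
    "/projects": ["blob-projects"],
    "/projects/hcfs": ["blob-hcfs-1", "blob-hcfs-2"],
}


def ancestors(path):
    """Return the path itself followed by its ancestors, nearest first."""
    p = PurePosixPath(path)
    return [str(p)] + [str(parent) for parent in p.parents]


def lookup(path):
    """Collect blob IDs visible at `path`: its own first, then inherited ones."""
    ids = []
    for node in ancestors(path):
        ids.extend(NODE_BLOBS.get(node, []))
    return ids
```

Calling `lookup("/projects/hcfs/src")` on a node with no pointers of its own falls back to the blobs bound to its ancestors, which is one possible reading of hierarchical inheritance.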
### 3.2 Context Database Backend
* Two possible designs:
* **Relational/SQLite + versioned tables**: simple, transactional, supports hierarchical inheritance via path parent pointers.
* **Graph DB (e.g., Neo4j)**: ideal for multi-parent contexts, symlink-like context inheritance.
* Context blobs include:
* blob ID,
* path(s) bound,
* timestamp/version, author/agent,
* embedding or semantic tags,
* content or summary.
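
A minimal sketch of the relational option, assuming Python's standard `sqlite3` module. The table name `context_blobs`, its columns, and the helpers `open_store`, `push_blob`, and `latest_blob` are hypothetical; each row binds to a single path for simplicity, although the plan allows a blob to be bound to several.

```python
import sqlite3
import time

# Hypothetical schema: one row per blob version, keyed by path.
SCHEMA = """
CREATE TABLE IF NOT EXISTS context_blobs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    path       TEXT    NOT NULL,
    version    INTEGER NOT NULL,
    created_at REAL    NOT NULL,
    author     TEXT    NOT NULL,
    tags       TEXT    NOT NULL DEFAULT '',
    content    TEXT    NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_blobs_path ON context_blobs(path, version);
"""


def open_store(db_path=":memory:"):
    """Open (or create) the store and make sure the schema exists."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def push_blob(conn, path, author, content, tags=""):
    """Insert a new version for `path`; versions count up from 1 per path."""
    with conn:  # commits on success, rolls back on error
        (latest,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM context_blobs WHERE path = ?",
            (path,),
        ).fetchone()
        cur = conn.execute(
            "INSERT INTO context_blobs "
            "(path, version, created_at, author, tags, content) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (path, latest + 1, time.time(), author, tags, content),
        )
    return cur.lastrowid


def latest_blob(conn, path):
    """Return (version, author, content) of the newest blob at `path`, or None."""
    return conn.execute(
        "SELECT version, author, content FROM context_blobs "
        "WHERE path = ? ORDER BY version DESC LIMIT 1",
        (path,),
    ).fetchone()
```

Keeping every version as its own row makes rollback a query rather than a migration, at the cost of more rows per path.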
### 3.3 Indexing & Embeddings
* Generate embeddings of context blobs for semantic similarity retrieval (e.g. for context folding) ([OpenReview][5], [OpenReview][2], [MDPI][3]).
* Use a combination of BM25 and embedding-based ranking (contextual retrieval) for accurate scope-based retrieval ([TECHCOMMUNITY.MICROSOFT.COM][6]).
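
One way to combine the two signals is a weighted sum of normalised scores, sketched below in pure Python. This is an assumption, not a method fixed by the plan: the function names, the `alpha` weight, and the whitespace tokeniser are illustrative, the embedding vectors are expected to come from whatever model is chosen later, and rank-fusion schemes are an equally plausible alternative. The sketch assumes at least one non-empty document.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each whitespace-tokenised document for the query."""
    tokenised = [d.lower().split() for d in docs]
    n = len(tokenised)
    avg_len = sum(len(t) for t in tokenised) / n
    df = Counter(term for t in tokenised for term in set(t))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * c for a, c in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(c * c for c in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    """Rescale scores to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(xs), max(xs)
    return [0.0 for _ in xs] if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5):
    """Return document indices ordered by alpha*BM25 + (1 - alpha)*cosine."""
    lexical = min_max(bm25_scores(query, docs))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    combined = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
```

Restricting `docs` to the blobs visible from the current path before ranking would give the scope-based behaviour the plan describes.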
### 3.4 API & Syscalls
* `context_cd(path)`: sets current context pointer.
* `context_get(depth=N)`: retrieves cumulative context from current node up N levels.
* `context_push(path, blob)`: insert new context tied to a path.
* `context_list(path)`: lists available context blobs at that path.
* `context_subscribe(path)`: agent registers to receive updates at a path.
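
The five calls above can be mocked in memory to pin down their semantics before any FUSE or DB work. This is a sketch under assumptions: the class name `ContextSession` is hypothetical, paths are absolute, `context_subscribe` takes a `callback` argument that the plan does not specify, and subscribers are notified only for pushes to the exact path they registered.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class ContextSession:
    """In-memory mock of the planned API (not the real HCFS service)."""

    def __init__(self):
        self.blobs = defaultdict(list)        # path -> blobs pushed there
        self.subscribers = defaultdict(list)  # path -> callbacks
        self.cwd = "/"

    def context_cd(self, path):
        """Set the current context pointer."""
        self.cwd = str(PurePosixPath(path))
        return self.cwd

    def context_push(self, path, blob):
        """Attach a blob to a path and notify that path's subscribers."""
        path = str(PurePosixPath(path))
        self.blobs[path].append(blob)
        for callback in self.subscribers.get(path, []):
            callback(path, blob)

    def context_list(self, path):
        """List blobs bound directly to `path` (no inheritance)."""
        return list(self.blobs.get(str(PurePosixPath(path)), []))

    def context_get(self, depth=0):
        """Cumulative context from the current node up `depth` levels."""
        node = PurePosixPath(self.cwd)
        chain = [node, *node.parents][: depth + 1]
        result = []
        for p in chain:
            result.extend(self.blobs.get(str(p), []))
        return result

    def context_subscribe(self, path, callback):
        """Register `callback(path, blob)` for future pushes at `path`."""
        self.subscribers[str(PurePosixPath(path))].append(callback)
```

For example, after pushing `"plan"` at `/projects` and `"fs notes"` at `/projects/hcfs`, a session whose pointer is at `/projects/hcfs` sees only its own blob with `context_get(depth=0)` and both blobs, nearest first, with `context_get(depth=1)`.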
---
## 4. Project Timeline & Milestones
| Phase | Duration | Deliverables |
| ---------------------------------------------- | -------- | -------------------------------------------------------- |
| **Phase 0: Research & Design** | 2 weeks | Literature review doc, architecture draft |
| **Phase 1: Prototype FS layer** | 4 weeks | Minimal FUSE-based path→context mapping, CLI demo |
| **Phase 2: Backend DB & storage** | 4 weeks | Context blob storage, path linkage, versioning |
| **Phase 3: Embedding & retrieval integration** | 3 weeks | Embeddings + BM25 hybrid ranking for context relevance |
| **Phase 4: API/Syscall layer scripting** | 3 weeks | Python (or AIOS) service exposing navigation + push APIs |
| **Phase 5: Agent integration & simulation** | 3 weeks | Dummy AI agents navigating, querying, publishing context |
| **Phase 6: Evaluation & refinement** | 2 weeks | Usability, latency, retrieval relevance metrics |
| **Phase 7: Write-up & publication** | 2 weeks | Report, possible poster/paper submission |
---
## 5. Risks & Alternatives
* **Semantic vs hierarchical mismatch**: Flat tag systems (e.g. Tagsistant) offer semantic tagging but lack path-based inheritance ([research.ijcaonline.org][7], [OpenReview][2], [Wikipedia][1], [arXiv][8], [Anthropic][9], [OpenReview][5], [Wikipedia][10]).
* **Context explosion**: many small blobs can flood the DB; mitigate via summarization/folding.
* **Performance tradeoffs**: FS lookup latency must stay acceptable, and versioned graph storage might slow lookups down. Consider caching snapshots at each node.
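
The snapshot-caching idea could look like the sketch below. It assumes snapshots are keyed by absolute path and that a write at a node invalidates that node and all of its descendants, since descendants inherit from it; `SnapshotCache`, `compute`, and the `misses` counter are hypothetical names for illustration.

```python
class SnapshotCache:
    """Cache per-path snapshots; drop a subtree when one of its nodes changes."""

    def __init__(self, compute):
        self._compute = compute  # expensive function: path -> snapshot
        self._cache = {}
        self.misses = 0          # number of times `compute` was called

    def get(self, path):
        """Return the cached snapshot for `path`, computing it on a miss."""
        if path not in self._cache:
            self.misses += 1
            self._cache[path] = self._compute(path)
        return self._cache[path]

    def invalidate(self, path):
        """Drop the snapshot for `path` and for every path beneath it."""
        prefix = path.rstrip("/") + "/"
        stale = [k for k in self._cache if k == path or k.startswith(prefix)]
        for key in stale:
            del self._cache[key]
```

Invalidating by prefix keeps reads cheap but makes writes near the root expensive, so the tradeoff depends on how often high-level context changes.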
---
## 6. Peer-Reviewed References
* David Gifford et al., *Semantic file systems*, ACM SIGOPS Operating Systems Review (1991) ([Wikipedia][1])
* ICLR 2025: *From Commands to Prompts: LLM-based Semantic File System for AIOS* (LSFS) ([OpenReview][2])
* Xiaoyu et al., *Transformer-based path sequence modeling for file-path anomaly detection* ([MDPI][3])
* Tao Gu et al., *An Ontology-based Context Model in Intelligent Environments* (SOCAM) ([arXiv][4])
---
## 7. Next Steps
* Review cited literature, build an annotated bibliography.
* Choose backend stack (SQLite vs graph DB) and test embedding pipeline.
* Begin Phase 1: implementing minimal context-aware FS mock.
---
Let me know if you'd like me to flesh out a proof-of-concept scaffold (for example, in Python + SQLite + FUSE), or write a full proposal for funding or conference submission!
[1]: https://en.wikipedia.org/wiki/Semantic_file_system?utm_source=chatgpt.com "Semantic file system"
[2]: https://openreview.net/forum?id=2G021ZqUEZ&utm_source=chatgpt.com "From Commands to Prompts: LLM-based Semantic File System for AIOS"
[3]: https://www.mdpi.com/2079-8954/13/6/403?utm_source=chatgpt.com "Effective Context-Aware File Path Embeddings for Anomaly Detection - MDPI"
[4]: https://arxiv.org/abs/2003.05055?utm_source=chatgpt.com "An Ontology-based Context Model in Intelligent Environments"
[5]: https://openreview.net/pdf?id=2G021ZqUEZ&utm_source=chatgpt.com "From Commands to Prompts: LLM-based Semantic File System for AIOS - OpenReview"
[6]: https://techcommunity.microsoft.com/blog/azure-ai-services-blog/building-a-contextual-retrieval-system-for-improving-rag-accuracy/4271924?utm_source=chatgpt.com "Building a Contextual Retrieval System for Improving RAG Accuracy"
[7]: https://research.ijcaonline.org/volume121/number1/pxc3904433.pdf?utm_source=chatgpt.com "A Survey on Different File System Approach - research.ijcaonline.org"
[8]: https://arxiv.org/abs/1909.10123?utm_source=chatgpt.com "SplitFS: Reducing Software Overhead in File Systems for Persistent Memory"
[9]: https://www.anthropic.com/news/contextual-retrieval?utm_source=chatgpt.com "Introducing Contextual Retrieval \ Anthropic"
[10]: https://en.wikipedia.org/wiki/Tagsistant?utm_source=chatgpt.com "Tagsistant"