HCFS/PROJECT_PLAN.md
Claude Code 8f19eaab25 Initial HCFS project scaffold
🚀 Generated with Claude Code

- Project plan and architecture documentation
- Python package structure with core modules
- API design and basic usage examples
- Development environment configuration
- Literature review and research foundation

Ready for Phase 1 implementation.

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-29 12:13:16 +10:00

---
# PROJECT\_PLAN.md
## 📘 Title
**Context-Aware Hierarchical Context File System (HCFS)**: Unifying file system paths with context blobs for agentic AI cognition

---
## 1. Research Motivation & Literature Review 🧠
* **Semantic and context-aware file systems**: Gifford et al. (1991) proposed early semantic file systems using directory paths as semantic queries ([Wikipedia][1]). Later work explored tag-based and ontology-based systems for richer metadata and context-aware retrieval ([Wikipedia][1]).
* **LLM-driven semantic FS (LSFS)**: The ICLR 2025 paper on LSFS proposes integrating vector DBs and semantic indexing into a filesystem that supports prompt-driven file operations and semantic rollback ([OpenReview][2]).
* **Path-structure embeddings**: Recent Transformer-based work shows file paths can be modeled as sequences for semantic anomaly detection, capturing hierarchy and semantics in embeddings ([MDPI][3]).
* **Context modeling frameworks**: Ontology-driven context models (e.g. OWL/SOCAM) support representing, reasoning about, and sharing context hierarchically ([arXiv][4]).

HCFS merges these prior insights into a hybrid: directory navigation defines the query scope, backed by semantic context blobs in a DB, so that agentic systems can zoom in and out of context.
---
## 2. Objectives & Scope
1. Design a **virtual filesystem layer** that maps hierarchical paths to context blobs.
2. Build a **context storage system** (DB) to hold context units, versioned and indexed.
3. Define **APIs and syscalls** for agents to:
    * navigate context scope (`cd`-style),
    * request context retrieval,
    * push new context,
    * merge or inherit context across levels.
4. Enable **decentralized context sharing**: agents can publish updates at path nodes; peer agents subscribe by tree paths.
5. Prototype on a controlled dataset / toy project tree to validate:
    * latency,
    * correct retrieval,
    * hierarchical inheritance semantics.
---
## 3. System Architecture Overview
### 3.1 Virtual Filesystem Layer (e.g. FUSE or AIOS integration)
* Presents a standard POSIX (or AIOS-style) tree structure.
* Each directory or file node has metadata pointers to context-blob IDs.
* Traversal (e.g., `ls`, `cd`) triggers context lookup for that path.
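
The path-to-blob mapping described above can be illustrated without FUSE at all. The following is a minimal in-memory sketch, not the project's implementation: the `NODE_BLOBS` table, the blob IDs, and the helper names `normalize`, `lookup`, and `list_children` are all hypothetical and exist only to show how a traversal could resolve to context-blob IDs.

```python
import posixpath

# Hypothetical metadata table: node path -> IDs of the context blobs bound to it.
NODE_BLOBS = {
    "/": ["blob-root"],
    "/project": ["blob-proj-1", "blob-proj-2"],
    "/project/src": ["blob-src"],
}


def normalize(path: str) -> str:
    """Resolve '.' and '..' segments into a canonical absolute path."""
    return posixpath.normpath(posixpath.join("/", path))


def lookup(path: str) -> list[str]:
    """Return the blob IDs bound directly to one node (what a `cd` could trigger)."""
    return list(NODE_BLOBS.get(normalize(path), []))


def list_children(path: str) -> list[str]:
    """Return the immediate child node names, as an `ls` on that node would."""
    base = normalize(path)
    children = set()
    for node in NODE_BLOBS:
        if node != base and posixpath.dirname(node) == base:
            children.add(posixpath.basename(node))
    return sorted(children)


print(lookup("/project/src/.."))   # blobs bound to /project
print(list_children("/project"))   # child nodes of /project
```

A real FUSE or AIOS layer would replace the dictionary with lookups into the context database; the sketch only fixes the shape of the mapping.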
### 3.2 Context Database Backend
* Two possible designs:
    * **Relational/SQLite + versioned tables**: simple, transactional, supports hierarchical inheritance via path parent pointers.
    * **Graph DB (e.g., Neo4j)**: ideal for multi-parent contexts, symlink-like context inheritance.
* Context blobs include:
    * blob ID,
    * path(s) bound,
    * timestamp/version, author/agent,
    * embedding or semantic tags,
    * content or summary.
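
As one way to picture the relational option, here is a sketch using Python's built-in `sqlite3` module. The table names, column names, and helper functions are assumptions made for illustration (the plan does not fix a schema), and embeddings are left out to keep the example small. Inheritance follows parent pointers with a recursive query.

```python
import sqlite3

# Hypothetical schema: one row per node, one row per blob version.
SCHEMA = """
CREATE TABLE nodes (
    path        TEXT PRIMARY KEY,
    parent_path TEXT REFERENCES nodes(path)
);
CREATE TABLE context_blobs (
    blob_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    path       TEXT NOT NULL REFERENCES nodes(path),
    version    INTEGER NOT NULL,
    author     TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    tags       TEXT,
    content    TEXT NOT NULL,
    UNIQUE (path, version)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_node(conn, path, parent_path=None):
    conn.execute("INSERT INTO nodes (path, parent_path) VALUES (?, ?)",
                 (path, parent_path))


def push_blob(conn, path, author, content, tags=""):
    """Insert a new version for a path; versions count up from 1 per path."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM context_blobs WHERE path = ?",
        (path,),
    ).fetchone()
    conn.execute(
        "INSERT INTO context_blobs (path, version, author, tags, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (path, latest + 1, author, tags, content),
    )
    return latest + 1


def inherited_context(conn, path):
    """Walk parent pointers to the root; latest content per node, root first."""
    return conn.execute(
        """
        WITH RECURSIVE chain(path, parent_path, depth) AS (
            SELECT path, parent_path, 0 FROM nodes WHERE path = ?
            UNION ALL
            SELECT n.path, n.parent_path, c.depth + 1
            FROM nodes n JOIN chain c ON n.path = c.parent_path
        )
        SELECT c.path, b.content
        FROM chain c JOIN context_blobs b ON b.path = c.path
        WHERE b.version = (SELECT MAX(version) FROM context_blobs
                           WHERE path = c.path)
        ORDER BY c.depth DESC
        """,
        (path,),
    ).fetchall()


conn = open_db()
add_node(conn, "/")
add_node(conn, "/project", "/")
push_blob(conn, "/", "agent-a", "root v1")
push_blob(conn, "/project", "agent-a", "proj v1")
push_blob(conn, "/project", "agent-b", "proj v2")
print(inherited_context(conn, "/project"))
# [('/', 'root v1'), ('/project', 'proj v2')]
```

A single `parent_path` column only models single-parent trees; the multi-parent case mentioned for the graph option would need an edge table or a graph store instead.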
### 3.3 Indexing & Embeddings
* Generate embeddings of context blobs for semantic similarity retrieval (e.g. for context folding) ([OpenReview][5], [OpenReview][2], [MDPI][3]).
* Use a combination of BM25 and embedding ranking (contextual retrieval) for accurate scope-based retrieval ([TECHCOMMUNITY.MICROSOFT.COM][6]).
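
To make the hybrid idea concrete, the sketch below scores blobs with a small hand-written Okapi BM25 and with cosine similarity over a stand-in "embedding", then merges the two rankings with reciprocal rank fusion. Several things here are assumptions rather than part of the plan: the fusion method, the parameter values, and above all `toy_embed`, which is just a term-count vector standing in for a real embedding model.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_embed(text):
    """Stand-in embedding: sparse term counts. A real system would call a model."""
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def ranks(scores):
    """Map document index -> 1-based rank, best score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {doc: rank for rank, doc in enumerate(order, start=1)}


def hybrid_rank(query, docs, k=60):
    """Fuse the BM25 and embedding rankings with reciprocal rank fusion."""
    lexical = ranks(bm25_scores(query, docs))
    qv = toy_embed(query)
    semantic = ranks([cosine(qv, toy_embed(d)) for d in docs])
    fused = {i: 1 / (k + lexical[i]) + 1 / (k + semantic[i])
             for i in range(len(docs))}
    return sorted(fused, key=fused.get, reverse=True)


blobs = [
    "auth module handles login tokens",
    "database migration scripts",
    "frontend styling notes",
]
print(hybrid_rank("login tokens", blobs))  # index 0 ranks first
```

Rank fusion avoids having to put BM25 scores and cosine similarities on a common scale; a weighted sum of normalized scores would be another reasonable choice.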
### 3.4 API & Syscalls
* `context_cd(path)`: sets current context pointer.
* `context_get(depth=N)`: retrieves cumulative context from current node up N levels.
* `context_push(path, blob)`: insert new context tied to a path.
* `context_list(path)`: lists available context blobs at that path.
* `context_subscribe(path)`: agent registers to receive updates at a path.
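
The five calls listed above could fit together roughly as follows. This is an in-memory sketch under stated assumptions, not the planned service: the `ContextStore` class is hypothetical, `context_subscribe` takes an extra `callback` argument that the list above does not specify, and a subscription is assumed to cover the whole subtree under the subscribed path.

```python
import posixpath
from collections import defaultdict


class ContextStore:
    """Hypothetical in-memory stand-in for the HCFS API surface."""

    def __init__(self):
        self.cwd = "/"
        self._blobs = defaultdict(list)        # path -> blobs pushed there
        self._subscribers = defaultdict(list)  # path -> callbacks

    def _resolve(self, path):
        """Resolve a path against the current node into a canonical form."""
        node = posixpath.normpath(posixpath.join(self.cwd, path))
        return "/" + node.lstrip("/")  # normpath may keep a leading '//'

    @staticmethod
    def _ancestors(node):
        """[node, parent, ..., '/'] for a canonical absolute path."""
        chain = [node]
        while node != "/":
            node = posixpath.dirname(node)
            chain.append(node)
        return chain

    def context_cd(self, path):
        self.cwd = self._resolve(path)
        return self.cwd

    def context_push(self, path, blob):
        node = self._resolve(path)
        self._blobs[node].append(blob)
        # Assumption: subscribers of any ancestor path are notified too.
        for ancestor in self._ancestors(node):
            for callback in self._subscribers.get(ancestor, []):
                callback(node, blob)

    def context_list(self, path):
        return list(self._blobs.get(self._resolve(path), []))

    def context_get(self, depth=0):
        """Blobs from the current node up `depth` levels, outermost first."""
        chain = self._ancestors(self.cwd)[: depth + 1]
        result = []
        for node in reversed(chain):
            result.extend(self._blobs.get(node, []))
        return result

    def context_subscribe(self, path, callback):
        self._subscribers[self._resolve(path)].append(callback)


store = ContextStore()
store.context_push("/", "root notes")
store.context_push("/project", "project goals")
store.context_subscribe("/project", lambda node, blob: print("update:", node, blob))
store.context_cd("/project/src")
store.context_push(".", "src conventions")   # triggers the subscriber
print(store.context_get(depth=1))            # ['project goals', 'src conventions']
```

Returning inherited context outermost first is a design choice in this sketch: more specific context appears last, where it can override or refine what the broader levels say.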
---
## 4. Project Timeline & Milestones
| Phase | Duration | Deliverables |
| ---------------------------------------------- | -------- | -------------------------------------------------------- |
| **Phase 0: Research & Design**                 | 2 weeks  | Literature review doc, architecture draft                |
| **Phase 1: Prototype FS layer**                | 4 weeks  | Minimal FUSE-based path→context mapping, CLI demo        |
| **Phase 2: Backend DB & storage**              | 4 weeks  | Context blob storage, path linkage, versioning           |
| **Phase 3: Embedding & retrieval integration** | 3 weeks  | Embeddings + BM25 hybrid ranking for context relevance   |
| **Phase 4: API/Syscall layer scripting**       | 3 weeks  | Python (or AIOS) service exposing navigation + push APIs |
| **Phase 5: Agent integration & simulation**    | 3 weeks  | Dummy AI agents navigating, querying, publishing context |
| **Phase 6: Evaluation & refinement**           | 2 weeks  | Usability, latency, retrieval relevance metrics          |
| **Phase 7: Write-up & publication**            | 2 weeks  | Report, possible poster/paper submission                 |
---
## 5. Risks & Alternatives
* **Semantic vs hierarchical mismatch**: Flat tag systems (e.g. Tagsistant) offer semantic tagging but lack path-based inheritance ([research.ijcaonline.org][7], [OpenReview][2], [Wikipedia][1], [arXiv][8], [Anthropic][9], [OpenReview][5], [Wikipedia][10]).
* **Context explosion**: many small blobs can flood the DB; mitigate via summarization/folding.
* **Performance trade-offs**: FS lookup latency must stay acceptable, and versioned graph storage might slow lookups down. Consider caching snapshots at each node.
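
The per-node snapshot caching suggested above might look like the sketch below. It is only one possible approach: the `SnapshotCache` class and its `load_blobs` callable (a hypothetical backend function mapping a path to its blobs) are assumptions, and paths are assumed to be canonical absolute paths. A write to a node invalidates that node and everything beneath it, since descendants inherit its context.

```python
import posixpath


class SnapshotCache:
    """Caches merged context per node; a write invalidates the node's subtree."""

    def __init__(self, load_blobs):
        self._load = load_blobs   # hypothetical backend callable: path -> blobs
        self._snapshots = {}
        self.misses = 0

    def get(self, path):
        if not path.startswith("/") or "//" in path:
            raise ValueError("expected a canonical absolute path")
        if path not in self._snapshots:
            self.misses += 1
            merged = []
            node = path
            while True:
                # Prepend so that the root's blobs come first.
                merged = list(self._load(node)) + merged
                if node == "/":
                    break
                node = posixpath.dirname(node)
            self._snapshots[path] = merged
        return self._snapshots[path]

    def invalidate(self, path):
        """Drop snapshots for `path` and every node below it."""
        prefix = path.rstrip("/") + "/"
        for cached in list(self._snapshots):
            if cached == path or cached.startswith(prefix):
                del self._snapshots[cached]


backend = {"/": ["root"], "/project": ["proj"]}
cache = SnapshotCache(lambda p: backend.get(p, []))
print(cache.get("/project/src"))      # built from the backend (one miss)
print(cache.get("/project/src"))      # served from the cache
backend["/project"].append("proj v2")
cache.invalidate("/project")          # a push at /project drops dependent snapshots
print(cache.get("/project/src"))      # rebuilt with the new blob
```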
---
## 6. Peer-Reviewed References
* David Gifford et al., *Semantic file systems*, ACM SIGOPS Operating Systems Review (1991) ([Wikipedia][1])
* ICLR 2025: *From Commands to Prompts: LLM-based Semantic File System for AIOS* (LSFS) ([OpenReview][2])
* Xiaoyu et al., *Transformer-based path sequence modeling for filepath anomaly detection* ([MDPI][3])
* Tao Gu et al., *An Ontology-based Context Model in Intelligent Environments* (SOCAM) ([arXiv][4])
---
## 7. Next Steps
* Review cited literature, build an annotated bibliography.
* Choose backend stack (SQLite vs graph DB) and test embedding pipeline.
* Begin Phase 1: implementing a minimal context-aware FS mock.
---
[1]: https://en.wikipedia.org/wiki/Semantic_file_system "Semantic file system"
[2]: https://openreview.net/forum?id=2G021ZqUEZ "From Commands to Prompts: LLM-based Semantic File System for AIOS"
[3]: https://www.mdpi.com/2079-8954/13/6/403 "Effective Context-Aware File Path Embeddings for Anomaly Detection - MDPI"
[4]: https://arxiv.org/abs/2003.05055 "An Ontology-based Context Model in Intelligent Environments"
[5]: https://openreview.net/pdf?id=2G021ZqUEZ "From Commands to Prompts: LLM-based Semantic File System for AIOS - OpenReview"
[6]: https://techcommunity.microsoft.com/blog/azure-ai-services-blog/building-a-contextual-retrieval-system-for-improving-rag-accuracy/4271924 "Building a Contextual Retrieval System for Improving RAG Accuracy"
[7]: https://research.ijcaonline.org/volume121/number1/pxc3904433.pdf "A Survey on Different File System Approach - research.ijcaonline.org"
[8]: https://arxiv.org/abs/1909.10123 "SplitFS: Reducing Software Overhead in File Systems for Persistent Memory"
[9]: https://www.anthropic.com/news/contextual-retrieval "Introducing Contextual Retrieval \ Anthropic"
[10]: https://en.wikipedia.org/wiki/Tagsistant "Tagsistant"