Phase 2 build initial

2025-07-30 09:34:16 +10:00
parent 8f19eaab25
commit a6ee31f237
68 changed files with 18055 additions and 3 deletions
--- a/PROJECT_PLAN.md
+++ b/PROJECT_PLAN.md
@@ -115,8 +115,6 @@ Your HCFS merges these prior insights into a hybrid: directory navigation = quer

 ---

-Let me know if you’d like me to flesh out a proof‑of‑concept scaffold (for example, in Python + SQLite + FUSE), or write a full proposal for funding or conference submission!
-
 [1]: https://en.wikipedia.org/wiki/Semantic_file_system?utm_source=chatgpt.com "Semantic file system"
 [2]: https://openreview.net/forum?id=2G021ZqUEZ&utm_source=chatgpt.com "From Commands to Prompts: LLM-based Semantic File System for AIOS"
 [3]: https://www.mdpi.com/2079-8954/13/6/403?utm_source=chatgpt.com "Effective Context-Aware File Path Embeddings for Anomaly Detection - MDPI"
@@ -127,3 +125,134 @@ Let me know if you’d like me to flesh out a proof‑of‑concept scaffold (for
 [8]: https://arxiv.org/abs/1909.10123?utm_source=chatgpt.com "SplitFS: Reducing Software Overhead in File Systems for Persistent Memory"
 [9]: https://www.anthropic.com/news/contextual-retrieval?utm_source=chatgpt.com "Introducing Contextual Retrieval \ Anthropic"
 [10]: https://en.wikipedia.org/wiki/Tagsistant?utm_source=chatgpt.com "Tagsistant"
+
+
+---
+
+# Core Architecture Considerations
+This section covers two design decisions: the **FS design on top of FUSE** and **DB schema selection with versioning**.
+
+---
+
+## 🖥️ 1. FS Architecture: FUSE Layer & Path‑Context Mapping
+
+### Why FUSE makes sense
+
+* FUSE (Filesystem in Userspace) provides a widely used, flexible interface for prototyping new FS models without kernel hacking, enabling rapid development of virtual filesystems that you can mount and interact with via standard POSIX tools ([IBM Research][1], [Wikipedia][2]).
+* Performance varies with workload, because every FUSE request crosses the kernel/userspace boundary. Optimized designs, or alternative frameworks such as RFUSE, reduce kernel/userspace communication latency and raise throughput, making a user-space FS viable even for demanding workloads ([USENIX][3]).
+
+### Path‑to‑Context Mapping Schema
+
+You’d implement a mapping in which each path (directory or file) is bound to zero or more **context blob IDs**. Key behaviors:
+
+* Directory traversal (`cd`, `ls`) triggers path-based context lookups in your backend.
+* File reads (e.g. `readfile(context)`) return the merged or inherited context blob(s) for that node.
+* Inheritance: the context at `/a/b/c` implicitly inherits the contexts at `/a/b`, `/a`, and `/`, resolved as needed.
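Here is a minimal sketch of the inheritance rule. It assumes a hypothetical in-memory `store` that maps each path to its own list of blob contents; the real backend would be a DB query. `ancestor_chain` normalizes a path and lists every ancestor from `/` down to the node. `inherited_contexts` collects blobs in that order, so the most specific context comes last.

```python
from posixpath import normpath


def ancestor_chain(path: str) -> list[str]:
    """Return every ancestor of `path`, root first, ending with the path itself."""
    # Strip slashes before prefixing one, so POSIX's special "//" prefix never survives normpath.
    norm = normpath("/" + path.strip("/"))
    if norm == "/":
        return ["/"]
    parts = norm.strip("/").split("/")
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def inherited_contexts(store: dict[str, list[str]], path: str) -> list[tuple[str, str]]:
    """Collect (source_path, blob) pairs from the root down to `path`.

    `store` is a hypothetical stand-in for the backend: path -> that path's own blobs.
    Ancestors with no blobs simply contribute nothing.
    """
    return [(p, blob) for p in ancestor_chain(path) for blob in store.get(p, [])]


store = {"/": ["global rules"], "/a": ["project A notes"], "/a/b/c": ["task notes"]}
print(inherited_contexts(store, "/a/b/c"))
# → [('/', 'global rules'), ('/a', 'project A notes'), ('/a/b/c', 'task notes')]
```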
+
+### Caching & Merge Layers
+
+* Cache context snapshots at each directory layer to reduce repeated database hits.
+* Provide configurable merge strategies (union, override, summarization) that decide how a node's own context combines with what it inherits; summarization also keeps the merged result compact.
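The caching and merge ideas can be sketched as follows. `merge_union` and `merge_override` implement two of the listed strategies; summarization is left out because it needs a model. `SnapshotCache` memoizes one merged snapshot per directory. `load_layers` is a hypothetical backend call that returns a node's context layers, root first. Because a child inherits from its ancestors, a write at one path invalidates that path and all of its descendants.

```python
from typing import Callable


def merge_union(layers: list[str]) -> str:
    """Keep every distinct non-empty layer, root first, dropping exact duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for layer in layers:
        if layer and layer not in seen:
            seen.add(layer)
            kept.append(layer)
    return "\n".join(kept)


def merge_override(layers: list[str]) -> str:
    """The most specific (deepest) non-empty layer wins outright."""
    for layer in reversed(layers):
        if layer:
            return layer
    return ""


class SnapshotCache:
    """Memoizes one merged snapshot per directory."""

    def __init__(self, load_layers: Callable[[str], list[str]],
                 merge: Callable[[list[str]], str]) -> None:
        self._load_layers = load_layers  # hypothetical backend call: path -> layers, root first
        self._merge = merge
        self._snapshots: dict[str, str] = {}
        self.misses = 0  # number of backend loads, handy for checking cache behavior

    def get(self, path: str) -> str:
        if path not in self._snapshots:
            self.misses += 1
            self._snapshots[path] = self._merge(self._load_layers(path))
        return self._snapshots[path]

    def invalidate(self, path: str) -> None:
        """A write at `path` changes its own snapshot and every descendant's."""
        prefix = path.rstrip("/") + "/"
        for cached in list(self._snapshots):
            if cached == path or cached.startswith(prefix):
                del self._snapshots[cached]


layers = {"/a": ["root", "A"], "/a/b": ["root", "A", "B"]}
cache = SnapshotCache(lambda p: layers[p], merge_union)
print(cache.get("/a/b"))
```

Swapping `merge_union` for `merge_override` changes only the merge policy. The cache and invalidation logic stay the same.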
+
+---
+
+## 📦 2. DB Design: Relational vs Graph & Context Versioning
+
+### Relational (e.g. SQLite/PostgreSQL)
+
+* Strong transactional guarantees and simple schema: tables like `Blobs(blob_id, path, version, content, timestamp)` plus a `PathHierarchy(path, parent_path)` table for inheritance.
+* Good for simple single-parent hierarchies and transactional versioning (with version numbers or history tables).
+* Walking deep path hierarchies needs repeated self-joins or a recursive query (`WITH RECURSIVE`), which can get costly; semantic relationships and multi-parent inheritance are also more cumbersome to model.
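For the relational option, a recursive CTE walks the hierarchy in a single query instead of one self-join per level. The sketch below runs SQLite through Python's standard `sqlite3` module and uses the `Blobs` / `PathHierarchy` tables named above; the column types and demo data are assumptions. It returns the newest blob for each ancestor of a path, root first. It also assumes the hierarchy is a tree, since a cycle would make the recursion run forever.

```python
import sqlite3

SCHEMA = """
CREATE TABLE PathHierarchy (path TEXT PRIMARY KEY, parent_path TEXT);  -- root has NULL parent
CREATE TABLE Blobs (
  blob_id   INTEGER PRIMARY KEY,
  path      TEXT NOT NULL,
  version   INTEGER NOT NULL,
  content   TEXT NOT NULL,
  timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

# Walk upward from the requested path, then keep each ancestor's newest blob.
ANCESTOR_BLOBS = """
WITH RECURSIVE chain(path, depth) AS (
  SELECT path, 0 FROM PathHierarchy WHERE path = ?
  UNION ALL
  SELECT h.parent_path, c.depth + 1
  FROM PathHierarchy AS h JOIN chain AS c ON h.path = c.path
  WHERE h.parent_path IS NOT NULL
)
SELECT c.path, b.content
FROM chain AS c
JOIN Blobs AS b ON b.path = c.path
WHERE b.version = (SELECT MAX(b2.version) FROM Blobs AS b2 WHERE b2.path = c.path)
ORDER BY c.depth DESC
"""


def ancestor_contexts(conn: sqlite3.Connection, path: str) -> list[tuple[str, str]]:
    return conn.execute(ANCESTOR_BLOBS, (path,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO PathHierarchy VALUES (?, ?)",
                 [("/", None), ("/a", "/"), ("/a/b", "/a")])
conn.executemany("INSERT INTO Blobs (path, version, content) VALUES (?, ?, ?)",
                 [("/", 1, "root v1"), ("/a", 1, "a v1"), ("/a", 2, "a v2"), ("/a/b", 1, "b v1")])
print(ancestor_contexts(conn, "/a/b"))  # → [('/', 'root v1'), ('/a', 'a v2'), ('/a/b', 'b v1')]
```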
+
+### Graph Database (e.g. Neo4j)
+
+* Nodes represent paths and context blobs; edges represent parent-child, semantic relations, and "derived" links.
+* Ideal for multi-parent or symlink-like context inheritance, semantic network traversal, or hierarchy restructuring ([Wikipedia][4]).
+* Enables queries like: “find all context blobs reachable within N hops from path X,” or “retrieve peers with similar context semantics.”
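This makes “reachable within N hops” concrete. A graph DB expresses it natively; in Cypher it is a variable-length pattern such as `-[*1..2]->`. The same idea written as a plain breadth-first search looks like this. The adjacency dict, the `path:`/`blob:` node prefixes, and the second parent `/shared` are illustrative assumptions.

```python
from collections import deque


def reachable_within(graph: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search returning every node at most `max_hops` edges from `start`."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in hops:  # BFS reaches each node first along a shortest path
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def blobs_within(graph: dict[str, list[str]], path_node: str, n: int) -> list[str]:
    return sorted(node for node in reachable_within(graph, path_node, n) if node.startswith("blob:"))


# Edges: child path -> parent paths (note the two parents of /a/b) and path -> its blobs.
graph = {
    "path:/a/b": ["path:/a", "path:/shared", "blob:b1"],
    "path:/a": ["path:/", "blob:a1"],
    "path:/shared": ["blob:s1"],
    "path:/": ["blob:root1"],
}
print(blobs_within(graph, "path:/a/b", 2))  # → ['blob:a1', 'blob:b1', 'blob:s1']
```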
+
+### Hybrid Approaches
+
+* A relational backend augmented with semantic tables or converted into a graph as needed for richer queries ([memgraph.com][5], [link.springer.com][6]).
+* Example: relational tables for version history and the base path structure, plus a graph layer or (cloud-hosted) embeddings for semantic relationships.
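One way to read the hybrid split: relational rows hold structure and history, and each path also carries an embedding vector for “similar context” queries. Below is a minimal sketch of that lookup. It assumes the embeddings already exist; the toy 3-dimensional vectors are made up. A real system would get vectors from an embedding model and query a vector index rather than doing a linear scan.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_paths(embeddings: dict[str, list[float]], query: str, k: int = 2) -> list[tuple[str, float]]:
    """Rank the other paths by cosine similarity to `query`'s embedding (linear scan)."""
    q = embeddings[query]
    scored = [(p, cosine(q, v)) for p, v in embeddings.items() if p != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


embeddings = {
    "/projects/search": [0.9, 0.1, 0.0],
    "/projects/indexing": [0.8, 0.2, 0.1],
    "/personal/recipes": [0.0, 0.1, 0.9],
}
print(similar_paths(embeddings, "/projects/search", k=1))
```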
+
+### Context Versioning
+
+* Must support **hierarchical version control**: each blob should have metadata like `blob_id`, `version_id`, `parent_version`, `agent_id`, `timestamp`.
+* Implement simple version chains in a relational DB (a `parent_version` column pointing at the previous version), or in a graph DB as edges representing “version-of” relationships.
+* Store blobs immutably, so every change creates a new version; this makes rollbacks and context diffs straightforward.
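The versioning rules above (immutable blobs, `parent_version` links, rollback, diffs) fit in a small append-only structure. Below is a minimal in-memory sketch for a single path, with field names taken from the metadata list. The `VersionedContext` class is hypothetical. Rollback is modeled as a new version that copies old content, so history is never rewritten.

```python
import difflib
import itertools
import time
from dataclasses import dataclass, field
from typing import Optional

_next_blob_id = itertools.count(1)  # stand-in for a DB-assigned, globally unique blob_id


@dataclass(frozen=True)  # frozen: a stored version can never be modified
class BlobVersion:
    blob_id: int
    version_id: int
    parent_version: Optional[int]  # blob_id of the previous version; None for the first
    agent_id: str
    content: str
    timestamp: float = field(default_factory=time.time)


class VersionedContext:
    """Append-only version chain for one path."""

    def __init__(self) -> None:
        self._versions: list[BlobVersion] = []

    def push(self, agent_id: str, content: str) -> BlobVersion:
        head = self._versions[-1] if self._versions else None
        blob = BlobVersion(
            blob_id=next(_next_blob_id),
            version_id=head.version_id + 1 if head else 1,
            parent_version=head.blob_id if head else None,
            agent_id=agent_id,
            content=content,
        )
        self._versions.append(blob)
        return blob

    def get(self, version_id: int) -> BlobVersion:
        if not 1 <= version_id <= len(self._versions):
            raise KeyError(version_id)
        return self._versions[version_id - 1]  # version ids are dense, starting at 1

    def rollback(self, version_id: int, agent_id: str) -> BlobVersion:
        """Restore old content by appending a copy, keeping the full history."""
        return self.push(agent_id, self.get(version_id).content)

    def diff(self, old: int, new: int) -> list[str]:
        a = self.get(old).content.splitlines()
        b = self.get(new).content.splitlines()
        return list(difflib.unified_diff(a, b, f"v{old}", f"v{new}", lineterm=""))


ctx = VersionedContext()
v1 = ctx.push("agent-1", "goal: index docs")
v2 = ctx.push("agent-2", "goal: index docs\nscope: /a")
v3 = ctx.rollback(1, "agent-1")
print("\n".join(ctx.diff(1, 2)))
```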
+
+---
+
+## 🔍 Comparison Table
+
+| Feature                  | Relational DB                                                  | Graph DB                                                          |
+| ------------------------ | -------------------------------------------------------------- | ----------------------------------------------------------------- |
+| **Hierarchy support**    | Works well for a strict tree; multi-parent needs extra joins   | Native multi-parent links and traversals                          |
+| **Performance**          | Fast for simple lookups; slows with deep or repeated joins     | Traversals follow stored links, so cost tracks the nodes visited  |
+| **Versioning**           | Straightforward with version tables; chronology is easy        | Version edges; branching and merging are easier to model          |
+| **Semantic links**       | Require additional tables or indexes                           | First-class properties and relationships                          |
+| **Cost & tooling**       | Lightweight (SQLite) to full-featured (PostgreSQL); mature     | Requires a graph engine (Neo4j, etc.)                             |
+
+---
+
+## 🧠 Integration Architecture
+
+### FS Layer
+
+* Run a FUSE-based FS that presents standard directories and files.
+* On `lookup`, the FS resolves the path and queries the DB for its context blobs.
+* On `read`, the FS returns the merged context string; a `write` (or an explicit push) maps to `context_push(path, blob_content)`, which can also be exposed as an MCP endpoint.
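The three bullets above amount to a thin dispatch layer. The sketch below is deliberately independent of any particular FUSE binding. `ContextFS` mirrors the `lookup` / `read` / `write` operations with simplified signatures. `MemoryBackend` is a hypothetical in-memory stand-in that exposes `context_push(path, blob_content)`. A FUSE binding would wrap these methods and turn `FileNotFoundError` into an `ENOENT` reply.

```python
import errno


class MemoryBackend:
    """Hypothetical stand-in for the context DB: path -> list of blob contents."""

    def __init__(self) -> None:
        self._blobs: dict[str, list[str]] = {"/": []}

    def exists(self, path: str) -> bool:
        return path in self._blobs

    def blob_ids(self, path: str) -> list[int]:
        return list(range(len(self._blobs[path])))

    def merged_context(self, path: str) -> str:
        # Simplest merge: the node's own blobs in push order (no inheritance here).
        return "\n".join(self._blobs.get(path, []))

    def context_push(self, path: str, blob_content: str) -> None:
        self._blobs.setdefault(path, []).append(blob_content)


class ContextFS:
    """Maps FUSE-style operations onto backend calls."""

    def __init__(self, backend: MemoryBackend) -> None:
        self.backend = backend

    def lookup(self, path: str) -> dict:
        if not self.backend.exists(path):
            raise FileNotFoundError(errno.ENOENT, "no such context node", path)
        return {"path": path, "blob_ids": self.backend.blob_ids(path)}

    def read(self, path: str, size: int, offset: int) -> bytes:
        data = self.backend.merged_context(path).encode("utf-8")
        return data[offset:offset + size]

    def write(self, path: str, data: bytes, offset: int) -> int:
        # Simplification: each write() becomes a whole new blob and `offset` is ignored.
        self.backend.context_push(path, data.decode("utf-8"))
        return len(data)  # FUSE write handlers report the number of bytes written
```

Pushing on every `write()` is the weak point of this sketch. Editors often write a file in several chunks, so a fuller version would likely buffer the data and call `context_push` once, on `flush` or `release`.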
+
+### Backend DB Schema Sketch
+
+**Relational (SQL)**
+
+```sql
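+-- PostgreSQL dialect (SERIAL, TIMESTAMPTZ, now()).
+CREATE TABLE path (
+  path   TEXT PRIMARY KEY,
+  parent TEXT REFERENCES path(path)                 -- NULL only for the root "/"
+);
+
+CREATE TABLE context_blob (
+  blob_id     SERIAL PRIMARY KEY,
+  path        TEXT NOT NULL REFERENCES path(path),
+  version     INT  NOT NULL,                          -- per-path version counter
+  parent_blob INT  REFERENCES context_blob(blob_id),  -- previous version; NULL for the first
+  agent       TEXT,                                   -- agent that wrote this version
+  timestamp   TIMESTAMPTZ NOT NULL DEFAULT now(),
+  content     TEXT NOT NULL,
+  UNIQUE (path, version)                              -- at most one row per path and version
+);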
+```
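The schema above uses PostgreSQL's `SERIAL`. To try the same design locally, here is a SQLite port driven from Python's `sqlite3` module. It adds a hypothetical `context_push` that appends a new version and links it to the current head via `parent_blob`. The port's type substitutions and the helper names are assumptions, not part of the original sketch.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE path (
  path   TEXT PRIMARY KEY,
  parent TEXT REFERENCES path(path)                -- NULL for the root
);
CREATE TABLE context_blob (
  blob_id     INTEGER PRIMARY KEY AUTOINCREMENT,   -- auto-assigned, like SERIAL
  path        TEXT NOT NULL REFERENCES path(path),
  version     INTEGER NOT NULL,
  parent_blob INTEGER REFERENCES context_blob(blob_id),
  agent       TEXT,
  timestamp   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  content     TEXT NOT NULL,
  UNIQUE (path, version)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(SCHEMA)
    return conn


def context_push(conn: sqlite3.Connection, path: str, blob_content: str, agent: str) -> int:
    """Append the next version of `path`, linked to the current head; returns blob_id."""
    with conn:  # commit on success, roll back on error
        head = conn.execute(
            "SELECT blob_id, version FROM context_blob WHERE path = ? "
            "ORDER BY version DESC LIMIT 1",
            (path,),
        ).fetchone()
        parent_blob, version = (head[0], head[1] + 1) if head else (None, 1)
        # UNIQUE (path, version) rejects a concurrent writer that chose the same version.
        cur = conn.execute(
            "INSERT INTO context_blob (path, version, parent_blob, agent, content) "
            "VALUES (?, ?, ?, ?, ?)",
            (path, version, parent_blob, agent, blob_content),
        )
        return cur.lastrowid


def latest(conn: sqlite3.Connection, path: str) -> Optional[str]:
    row = conn.execute(
        "SELECT content FROM context_blob WHERE path = ? ORDER BY version DESC LIMIT 1",
        (path,),
    ).fetchone()
    return row[0] if row else None
```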
+
+**Graph (Property Graph)**
+
+* Node labels:
+
+  * `(:Path {path: "...", last_blob: id})`
+  * `(:Blob {blob_id, version, agent, timestamp, content})`
+* Edges:
+
+  * `(Path)-[:HAS_BLOB]->(Blob)`
+  * `(Blob)-[:PARENT_VERSION]->(Blob)`
+  * `(Path)-[:PARENT_PATH]->(Path)`
+
+---
+
+## 🧩 Summary & Recommendation
+
+* A **FUSE-based FS layer** is well-suited for interface compatibility and rapid prototyping; RFUSE-style frameworks may help with performance if you scale.
+* For the backend, if you expect **strict single-parent hierarchical contexts**, a relational DB is safe and simple.
+* If you want **multi-parent inheritance, semantic linking, branching, and merging**, a graph DB offers greater flexibility.
+* Versioning is supported in both: relational via version chains and history tables; graph via version edges.
+* Hybrid: use PostgreSQL with graph extensions, or layer a graph model over SQL, to support embedding-based and semantic traversal queries ([academia.edu][7], [sciencedirect.com][8], [milvus.io][9], [filesystems.org][10]).
+
+---
+
+[1]: https://research.ibm.com/publications/to-fuse-or-not-to-fuse-performance-of-user-space-file-systems "To fuse or not to fuse: Performance of user-space file systems"
+[2]: https://en.wikipedia.org/wiki/Filesystem_in_Userspace "Filesystem in Userspace"
+[3]: https://www.usenix.org/system/files/fast24-cho.pdf "RFUSE: Modernizing Userspace Filesystem Framework through Scalable ..."
+[4]: https://en.wikipedia.org/wiki/Graph_database "Graph database"
+[5]: https://memgraph.com/docs/ai-ecosystem/graph-rag "GraphRAG with Memgraph"
+[6]: https://link.springer.com/chapter/10.1007/978-3-031-74701-4_13 "Exploring the Hybrid Approach: Integrating Relational and Graph ..."
+[7]: https://www.academia.edu/13092788/SEMANTIC_BASED_DATA_STORAGE_WITH_NEXT_GENERATION_CATEGORIZER "SEMANTIC BASED DATA STORAGE WITH NEXT GENERATION CATEGORIZER"
+[8]: https://www.sciencedirect.com/science/article/pii/S1319157822002920 "FUSE based file system for efficient storage and retrieval of ..."
+[9]: https://milvus.io/ai-quick-reference/what-strategies-exist-for-longterm-memory-in-model-context-protocol-mcp "What strategies exist for long-term memory in Model Context Protocol (MCP)?"
+[10]: https://www.filesystems.org/docs/fuse/bharath-msthesis.pdf "To FUSE or not to FUSE? Analysis and Performance ... - File System"