HCFS/PROJECT_PLAN.md
Claude Code 8f19eaab25 Initial HCFS project scaffold
🚀 Generated with Claude Code

- Project plan and architecture documentation
- Python package structure with core modules
- API design and basic usage examples
- Development environment configuration
- Literature review and research foundation

Ready for Phase 1 implementation.

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-29 12:13:16 +10:00


# PROJECT_PLAN.md
## 📘 Title
Context-Aware Hierarchical Context File System (HCFS): Unifying file system paths with context blobs for agentic AI cognition

1. Research Motivation & Literature Review 🧠

  • Semantic and context-aware file systems: Gifford et al. (1991) proposed early semantic file systems using directory paths as semantic queries (Wikipedia). Later work explored tag-based and ontology-based systems for richer metadata and context-aware retrieval (Wikipedia).
  • LLM-driven semantic FS (LSFS): The recent LSFS paper (ICLR 2025) proposes integrating vector DBs and semantic indexing into a filesystem that supports prompt-driven file operations and semantic rollback (OpenReview).
  • Path-structure embeddings: Recent Transformer-based work shows file paths can be modeled as sequences for semantic anomaly detection—capturing hierarchy and semantics in embeddings (MDPI).
  • Context modeling frameworks: Ontology-driven context models (e.g. OWL/SOCAM) support representing, reasoning about, and sharing context hierarchically (arXiv).

Your HCFS merges these prior insights into a hybrid: directory navigation = query scope, backed by semantic context blobs in a DB, enabling agentic systems to zoom in/out contextually.


2. Objectives & Scope

  1. Design a virtual filesystem layer that maps hierarchical paths to context blobs.

  2. Build a context storage system (DB) to hold context units, versioned and indexed.

  3. Define APIs and syscalls for agents to:

    • navigate context scope (cd-style),
    • request context retrieval,
    • push new context,
    • merge or inherit context across levels.
  4. Enable decentralized context sharing: agents can publish updates at path nodes; peer agents subscribe by tree paths.

  5. Prototype on a controlled dataset / toy project tree to validate:

    • latency,
    • correct retrieval,
    • hierarchical inheritance semantics.

3. System Architecture Overview

3.1 Virtual Filesystem Layer (e.g. FUSE or AIOS integration)

  • Presents a standard POSIX (or AIOS-style) tree structure.
  • Each directory or file node has metadata pointers to context-blob IDs.
  • Traversal (e.g., ls, cd) triggers context lookup for that path.
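
The path-to-blob mapping described above can be sketched without FUSE at all. The following is a minimal in-memory illustration, not the planned implementation: the `NODE_BLOBS` table, the blob IDs, and the function names `lookup_blob_ids` and `list_dir` are all hypothetical, and a real layer would back the table with the context database.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for per-node metadata: each path maps to the
# context-blob IDs bound to it. A real layer would read this from the DB.
NODE_BLOBS = {
    "/": ["blob-root"],
    "/projects": ["blob-projects"],
    "/projects/hcfs": ["blob-hcfs-1", "blob-hcfs-2"],
}


def lookup_blob_ids(path: str) -> list[str]:
    """Return the blob IDs bound directly to `path` (empty list if none)."""
    # Joining onto "/" makes relative input absolute and drops trailing slashes.
    normalized = str(PurePosixPath("/") / path)
    return list(NODE_BLOBS.get(normalized, []))


def list_dir(path: str) -> list[str]:
    """Emulate `ls`: names of the direct children of `path`."""
    parent = PurePosixPath("/") / path
    return sorted(
        PurePosixPath(p).name
        for p in NODE_BLOBS
        if p != "/" and PurePosixPath(p).parent == parent
    )
```

In this sketch a traversal such as `ls /projects` would call `list_dir("/projects")`, and entering a directory would call `lookup_blob_ids` on the new path to fetch the pointers for that node.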

3.2 Context Database Backend

  • Two possible designs:

    • Relational/SQLite + versioned tables: simple, transactional, supports hierarchical inheritance via path parent pointers.
    • Graph DB (e.g., Neo4j): ideal for multi-parent contexts, symlink-like context inheritance.
  • Context blobs include:

    • blob ID,
    • path(s) bound,
    • timestamp/version, author/agent,
    • embedding or semantic tags,
    • content or summary.
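
To make the relational option concrete, here is one possible SQLite layout for the blob fields listed above, using Python's standard `sqlite3` module. The table name, column names, and helper functions are assumptions for illustration only, and the sketch simplifies "path(s) bound" to a single path per blob.

```python
import sqlite3

# Hypothetical schema: one row per (path, version) pair.
SCHEMA = """
CREATE TABLE context_blobs (
    id         INTEGER PRIMARY KEY,
    path       TEXT NOT NULL,
    version    INTEGER NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    author     TEXT NOT NULL,
    tags       TEXT,
    content    TEXT NOT NULL,
    UNIQUE (path, version)
);
CREATE INDEX idx_blobs_path ON context_blobs (path);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with the sketch schema applied."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def push_version(conn, path, author, content, tags=""):
    """Insert the next version of the blob at `path`; return its version number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM context_blobs WHERE path = ?",
        (path,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO context_blobs (path, version, author, tags, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (path, version, author, tags, content),
    )
    conn.commit()
    return version


def latest(conn, path):
    """Return the content of the newest version at `path`, or None."""
    row = conn.execute(
        "SELECT content FROM context_blobs WHERE path = ? "
        "ORDER BY version DESC LIMIT 1",
        (path,),
    ).fetchone()
    return row[0] if row else None
```

Keeping every version as its own row keeps history queryable with plain SQL. Multi-path binding would need a separate link table, which is where the graph-DB option may become more attractive.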

3.3 Indexing & Embeddings

  • Generate embeddings of context blobs for semantic similarity retrieval (e.g. for context folding) (OpenReview, MDPI).
  • Use a combination of BM25 and embedding ranking (contextual retrieval) for accurate scope-based retrieval (TECHCOMMUNITY.MICROSOFT.COM).
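
A hybrid ranking of this kind can be sketched in plain Python. The plan does not say how the two signals are fused, so the version below assumes a weighted sum of min-max-normalised scores, with a hypothetical `alpha` weight. It expects embedding vectors to be supplied by whatever model is chosen later, and all function names are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Assumes `docs` is non-empty and contains at least one token overall.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between one query vector and each document vector."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    return [cos(query_vec, v) for v in doc_vecs]


def hybrid_rank(sparse, dense, alpha=0.5):
    """Return document indices, best first, by a weighted sum of both signals."""
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    ns, nd = minmax(sparse), minmax(dense)
    combined = [alpha * s + (1 - alpha) * d for s, d in zip(ns, nd)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
```

Typical use would be `hybrid_rank(bm25_scores(query, docs), cosine_scores(query_vec, doc_vecs))`, restricted to the blobs in the current path scope. Rank-based fusion (for example reciprocal rank fusion) is a reasonable alternative if the two score distributions prove hard to normalise.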

3.4 API & Syscalls

  • context_cd(path): sets current context pointer.
  • context_get(depth=N): retrieves cumulative context from current node up N levels.
  • context_push(path, blob): insert new context tied to a path.
  • context_list(path): lists available context blobs at that path.
  • context_subscribe(path): agent registers to receive updates at a path.
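
The five calls above can be exercised against a small in-memory model before any FS or DB work exists. The sketch below is only one reading of the API. The class name `ContextSession` is hypothetical, and it assumes that `context_get` returns blobs root-most first, that relative paths are resolved from the root, and that a subscription to a path also receives updates pushed anywhere beneath it.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class ContextSession:
    """In-memory model of the proposed context API (illustrative only)."""

    def __init__(self):
        self._blobs = defaultdict(list)        # node -> list of blobs
        self._subscribers = defaultdict(list)  # node -> list of callbacks
        self.cwd = PurePosixPath("/")

    @staticmethod
    def _norm(path):
        # Assumption: relative paths are resolved from the root.
        return PurePosixPath("/") / path

    def context_cd(self, path):
        """Set the current context pointer."""
        self.cwd = self._norm(path)

    def context_push(self, path, blob):
        """Attach a blob to `path` and notify subscribers."""
        node = self._norm(path)
        self._blobs[node].append(blob)
        # Assumption: subscribers of any ancestor also see this update.
        for target in [node, *node.parents]:
            for callback in self._subscribers.get(target, []):
                callback(str(node), blob)

    def context_list(self, path):
        """List the blobs bound directly to `path`."""
        return list(self._blobs.get(self._norm(path), []))

    def context_get(self, depth=0):
        """Collect blobs from the current node and up to `depth` ancestors."""
        chain = [self.cwd, *self.cwd.parents][: depth + 1]
        result = []
        for node in reversed(chain):  # root-most ancestor first
            result.extend(self._blobs.get(node, []))
        return result

    def context_subscribe(self, path, callback):
        """Register `callback(path, blob)` for updates at or below `path`."""
        self._subscribers[self._norm(path)].append(callback)
```

With blobs pushed at `/`, `/projects`, and `/projects/hcfs`, calling `context_cd("/projects/hcfs")` followed by `context_get(depth=1)` would return the `/projects` blobs followed by the `/projects/hcfs` blobs. Whether nearer or farther context should come first is a design question the plan leaves open.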

4. Project Timeline & Milestones

| Phase | Duration | Deliverables |
|---|---|---|
| Phase 0: Research & Design | 2 weeks | Literature review doc, architecture draft |
| Phase 1: Prototype FS layer | 4 weeks | Minimal FUSE-based path→context mapping, CLI demo |
| Phase 2: Backend DB & storage | 4 weeks | Context blob storage, path linkage, versioning |
| Phase 3: Embedding & retrieval integration | 3 weeks | Embeddings + BM25 hybrid ranking for context relevance |
| Phase 4: API/Syscall layer scripting | 3 weeks | Python (or AIOS) service exposing navigation + push APIs |
| Phase 5: Agent integration & simulation | 3 weeks | Dummy AI agents navigating, querying, publishing context |
| Phase 6: Evaluation & refinement | 2 weeks | Usability, latency, retrieval relevance metrics |
| Phase 7: Write-up & publication | 2 weeks | Report, possible poster/paper submission |

5. Risks & Alternatives

  • Semantic vs hierarchical mismatch: Flat tag systems (e.g. Tagsistant) offer semantic tagging but lack path-based inheritance (research.ijcaonline.org, OpenReview, Wikipedia, arXiv, Anthropic).
  • Context explosion: many small blobs flooding the DB—mitigate via summarization/folding.
  • Performance tradeoffs: FS lookup latency must stay acceptable, and versioned graph storage might slow lookups down. Consider caching snapshots at each node.
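
The snapshot-caching mitigation could look roughly like the sketch below. It assumes that a snapshot is the cumulative context from the root down to a node, and that a push invalidates the snapshots of that node and all of its descendants. The class name `SnapshotCache` and its `misses` counter are hypothetical.

```python
from pathlib import PurePosixPath


class SnapshotCache:
    """Caches cumulative (root-to-node) context per node; illustrative only."""

    def __init__(self, blobs):
        # blobs: mapping of absolute path string -> list of blobs at that node
        self._blobs = {PurePosixPath(p): list(v) for p, v in blobs.items()}
        self._snapshots = {}
        self.misses = 0  # number of times a snapshot had to be rebuilt

    def cumulative(self, path):
        """Return the cached root-to-node context, rebuilding it if stale."""
        node = PurePosixPath(path)
        if node not in self._snapshots:
            self.misses += 1
            chain = list(node.parents)[::-1] + [node]  # root first
            self._snapshots[node] = tuple(
                blob for n in chain for blob in self._blobs.get(n, [])
            )
        return self._snapshots[node]

    def push(self, path, blob):
        """Add a blob and drop snapshots that include this node."""
        node = PurePosixPath(path)
        self._blobs.setdefault(node, []).append(blob)
        stale = [n for n in self._snapshots if n == node or node in n.parents]
        for n in stale:
            del self._snapshots[n]
```

Repeated reads of the same node then cost a dictionary lookup, while a write near the root invalidates more snapshots than a write at a leaf, so this would favour workloads where deep nodes change more often than shallow ones.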

6. Peer-Reviewed References

  • David Gifford et al., Semantic file systems, ACM SIGOPS Operating Systems Review (1991) (Wikipedia)
  • ICLR 2025: From Commands to Prompts: LLM-based Semantic File System for AIOS (LSFS) (OpenReview)
  • Xiaoyu et al., Transformer-based path sequence modeling for file-path anomaly detection (MDPI)
  • Tao Gu et al., Ontology-based Context Model in Intelligent Environments (SOCAM) (arXiv)

7. Next Steps

  • Review cited literature, build an annotated bibliography.
  • Choose backend stack (SQLite vs graph DB) and test embedding pipeline.
  • Begin Phase 1: implementing a minimal context-aware FS mock.

Let me know if you'd like me to flesh out a proof-of-concept scaffold (for example, in Python + SQLite + FUSE), or write a full proposal for funding or conference submission!