embeddenator-fs
A holographic filesystem implementation using Vector Symbolic Architecture (VSA) for encoding entire directory trees into high-dimensional sparse vectors with bit-perfect reconstruction guarantees.
Independent component extracted from the Embeddenator monolithic repository. Part of the Embeddenator workspace.
Repository: https://github.com/tzervas/embeddenator-fs
Status: Alpha - Core functionality complete, API may change. Suitable for experimental use and research.
What is EmbrFS?
EmbrFS (Embeddenator Filesystem) is a novel approach to filesystem storage that encodes files and directories as holographic "engrams" - bundled high-dimensional sparse vectors. Unlike traditional filesystems that store files as sequential blocks, EmbrFS distributes file information across a holographic representation, enabling:
- Bit-perfect reconstruction through algebraic correction layers
- Holographic properties - complete information distributed across the representation
- Hierarchical scalability - sub-engrams for bounded memory usage
- Read-only FUSE mounting - kernel-level filesystem integration
- Incremental operations - add/modify/remove files without full rebuilds
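As a rough illustration of what "bundling" sparse vectors means, the sketch below superposes several sparse ternary vectors by majority vote per index. The `SparseTernary` type and `bundle` function are hypothetical names invented for this example; they are not the embeddenator-vsa API, and the real SparseVec layout may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical sparse ternary vector: the indices holding +1 and the
/// indices holding -1 (every other position is implicitly 0).
#[derive(Debug, Clone, PartialEq)]
pub struct SparseTernary {
    pub pos: Vec<usize>,
    pub neg: Vec<usize>,
}

/// Bundle (superpose) vectors: sum the votes at each index and keep the sign.
pub fn bundle(vectors: &[SparseTernary]) -> SparseTernary {
    let mut votes: BTreeMap<usize, i32> = BTreeMap::new();
    for v in vectors {
        for &i in &v.pos {
            *votes.entry(i).or_insert(0) += 1;
        }
        for &i in &v.neg {
            *votes.entry(i).or_insert(0) -= 1;
        }
    }
    let mut out = SparseTernary { pos: Vec::new(), neg: Vec::new() };
    // BTreeMap iterates in index order, so the outputs stay sorted.
    for (i, vote) in votes {
        if vote > 0 {
            out.pos.push(i);
        } else if vote < 0 {
            out.neg.push(i);
        }
        // A vote of exactly 0 means the inputs cancelled; the index is dropped.
    }
    out
}
```

Because cancelled indices are dropped, bundling on its own is lossy, which is why a separate correction layer is needed for exact recovery.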
Realistic Scope & Limitations
What EmbrFS IS:
- A research-grade holographic encoding system for filesystems
- A read-only FUSE filesystem for browsing encoded directory trees
- An experimental VSA application demonstrating bit-perfect reconstruction
- A foundation for exploring holographic storage and retrieval patterns
What EmbrFS IS NOT:
- A replacement for production filesystems (ext4, btrfs, ZFS)
- A compression tool (encoding adds overhead, typically 0-5% for the correction layer)
- A writable filesystem (holographic engrams are immutable snapshots)
- A distributed storage system (single-machine only)
Current Limitations:
- Read-only FUSE operations (by design - engrams are immutable)
- No symbolic link support (returns ENOSYS)
- No extended attributes
- No write/modify operations through FUSE (modifications require re-encoding)
- Alpha API stability (breaking changes possible)
Features
Core Capabilities
- Holographic Encoding: Encodes files into SparseVec representations using VSA
- Bit-Perfect Reconstruction: 100% accurate file recovery via correction layer
  - Primary SparseVec encoding
  - Immediate verification on encode
  - Algebraic correction store for exact differences
- Hierarchical Architecture: Scales to large filesystems via sub-engram trees
- Incremental Operations:
  - add_files - Add new files without full rebuild
  - modify_files - Update existing files
  - remove_files - Soft-delete files
  - compact - Hard rebuild to reclaim space
- FUSE Integration (optional fuse feature):
  - Mount engrams as read-only filesystems
  - Standard Unix tools (ls, cat, grep) work transparently
  - Kernel-level integration via fuser library
- Correction Strategies:
  - BitFlips - Sparse bit-level corrections
  - TritFlips - Ternary value corrections
  - BlockReplace - Contiguous region replacement
  - Verbatim - Full data storage (fallback)
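To make the correction idea concrete, here is a minimal sketch of how a lossy decode could be patched back to the exact original. The variant names follow the strategy list above, but the payload layouts, the `diff` and `apply` functions, and the `MAX_BIT_FLIPS` threshold are all assumptions made for illustration, not the contents of correction.rs. TritFlips is left out because it depends on the ternary encoding details.

```rust
/// Hypothetical correction record (payload layouts are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub enum Correction {
    /// Bit positions, counted from the start of the buffer, to flip.
    BitFlips(Vec<usize>),
    /// Replace a contiguous byte region starting at `offset`.
    BlockReplace { offset: usize, bytes: Vec<u8> },
    /// Store the original data in full (fallback).
    Verbatim(Vec<u8>),
}

/// Arbitrary cut-off for this sketch: beyond this many flips, store a block.
const MAX_BIT_FLIPS: usize = 8;

/// Compare the decoded bytes with the original and pick a correction.
pub fn diff(decoded: &[u8], original: &[u8]) -> Correction {
    if decoded.len() != original.len() {
        return Correction::Verbatim(original.to_vec());
    }
    // Collect every differing bit position, in ascending order.
    let mut flips = Vec::new();
    for (i, (d, o)) in decoded.iter().zip(original).enumerate() {
        let x: u8 = d ^ o;
        for bit in 0..8usize {
            if (x & (1u8 << bit)) != 0 {
                flips.push(i * 8 + bit);
            }
        }
    }
    if flips.len() <= MAX_BIT_FLIPS {
        return Correction::BitFlips(flips);
    }
    // Otherwise replace the smallest byte range covering all differences.
    let first = flips[0] / 8;
    let last = flips[flips.len() - 1] / 8;
    Correction::BlockReplace { offset: first, bytes: original[first..=last].to_vec() }
}

/// Apply a correction to the decoded bytes, yielding the exact original.
pub fn apply(decoded: &[u8], correction: &Correction) -> Vec<u8> {
    match correction {
        Correction::BitFlips(flips) => {
            let mut out = decoded.to_vec();
            for &p in flips {
                out[p / 8] ^= 1u8 << (p % 8);
            }
            out
        }
        Correction::BlockReplace { offset, bytes } => {
            let mut out = decoded.to_vec();
            out[*offset..*offset + bytes.len()].copy_from_slice(bytes);
            out
        }
        Correction::Verbatim(bytes) => bytes.clone(),
    }
}
```

In this sketch a perfect decode yields an empty `BitFlips` list, so the stored correction stays small whenever the primary encoding is already close to the original.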
Architecture Highlights
User Tools (ls, cat, etc.)
↓
FUSE Kernel Interface (fuse_shim.rs)
↓
Holographic Filesystem Core (embrfs.rs)
↓
Correction Layer (correction.rs)
↓
VSA Primitives (embeddenator-vsa)
File Structure:
- embrfs.rs - Core filesystem logic (1,884 lines)
- fuse_shim.rs - FUSE integration (1,263 lines)
- correction.rs - Bit-perfect reconstruction (531 lines)
Test Coverage:
- 20 tests covering core functionality
- All tests passing in CI
- Unit tests for correction logic
- Integration tests for FUSE operations
Installation
Add to your Cargo.toml:
[dependencies]
embeddenator-fs = "0.20.0-alpha.3"
# Enable FUSE mounting support (Linux only)
embeddenator-fs = { version = "0.20.0-alpha.3", features = ["fuse"] }
Usage
Basic API
use ;
use Path;
// Create a new holographic filesystem
let mut fs = new;
// Ingest a directory tree
let options = default;
fs.ingest_directory?;
// Save the engram
fs.save?;
// Later: Load and extract
let fs = load?;
fs.extract_all?;
FUSE Mounting (Linux Only)
use ;
// Load an engram
let fs = load?;
// Mount as read-only filesystem
let mountpoint = new;
mount_embrfs?;
// Now access files normally:
// $ ls /mnt/embrfs
// $ cat /mnt/embrfs/file.txt
// $ grep "pattern" /mnt/embrfs/**/*.log
FUSE Limitations:
- Read-only operations only (writes return EROFS)
- No symbolic links (readlink returns ENOSYS)
- Simplified permission model
- Requires root or user_allow_other in /etc/fuse.conf
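The error behavior listed above can be pictured as a small dispatch table. This sketch is not taken from fuse_shim.rs and does not use the fuser crate. The `Request` enum and `check` function are hypothetical. That create and unlink are rejected the same way as writes is an assumption about a read-only mount. The errno values are the standard Linux ones.

```rust
/// Linux errno values: EROFS (read-only filesystem) and ENOSYS (not implemented).
pub const EROFS: i32 = 30;
pub const ENOSYS: i32 = 38;

/// Hypothetical subset of filesystem requests, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Request {
    Lookup,
    Read,
    ReadDir,
    Write,
    Create,
    Unlink,
    ReadLink,
}

/// Decide whether a request may proceed on a read-only engram mount.
pub fn check(req: Request) -> Result<(), i32> {
    match req {
        // Read-side operations are served from the engram.
        Request::Lookup | Request::Read | Request::ReadDir => Ok(()),
        // Anything that would mutate the immutable snapshot is refused.
        Request::Write | Request::Create | Request::Unlink => Err(EROFS),
        // Symbolic links are not supported.
        Request::ReadLink => Err(ENOSYS),
    }
}
```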
Command-Line Interface
The embeddenator-fs CLI provides convenient access to all filesystem operations:
Installation
# From source
# Or build locally
CLI Commands
Ingest files into engram
Extract files from engram
Query for similar files
List files in engram
Show engram information
Verify engram integrity
Incremental updates
# Add a new file
# Remove a file (soft delete)
# Modify an existing file
# Compact engram (hard rebuild)
CLI Features
- User-friendly progress indicators
- Verbose mode for detailed output
- Helpful error messages
- Performance statistics
- Bit-perfect verification
- Incremental operations
Examples
The examples/ directory contains runnable examples:
Basic Ingestion
Demonstrates simple file ingestion and extraction with verification.
Query Files
Shows how to query for similar files in an engram using VSA cosine similarity.
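For readers unfamiliar with the similarity measure, the following is a minimal sketch of cosine similarity over sparse ternary vectors, plus a ranking helper. The `Sparse` alias, `cosine`, and `rank` are hypothetical names for this example and do not reflect the embeddenator-vsa or embeddenator-retrieval APIs. It assumes every stored value is +1 or -1, so each vector's squared norm equals its number of entries.

```rust
use std::collections::HashMap;

/// Hypothetical sparse ternary vector: index -> +1 or -1 (zeros are not stored).
pub type Sparse = HashMap<usize, i8>;

/// Cosine similarity; iterates over the smaller vector only.
pub fn cosine(a: &Sparse, b: &Sparse) -> f64 {
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    let (small, large) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    let dot: i64 = small
        .iter()
        .filter_map(|(i, &s)| large.get(i).map(|&t| i64::from(s) * i64::from(t)))
        .sum();
    // With +/-1 entries, |v|^2 is simply the number of stored entries.
    dot as f64 / ((a.len() as f64) * (b.len() as f64)).sqrt()
}

/// Score every file vector against the query, most similar first.
pub fn rank<'a>(query: &Sparse, files: &'a [(String, Sparse)]) -> Vec<(&'a str, f64)> {
    let mut scored: Vec<(&str, f64)> = files
        .iter()
        .map(|(name, v)| (name.as_str(), cosine(query, v)))
        .collect();
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored
}
```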
Incremental Updates
Demonstrates add/modify/remove operations and compaction.
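The difference between a soft delete and a compaction can be sketched with a toy store. Everything here is hypothetical (the `Store` type, its fields, and its single-file method names) and stands in for the real engram only loosely: chunks are append-only, removal just marks a manifest entry, and `compact` rebuilds the store from live entries.

```rust
use std::collections::BTreeMap;

/// Manifest entry: which chunks make up a file, and whether it is tombstoned.
struct Entry {
    chunk_ids: Vec<usize>,
    deleted: bool,
}

/// Hypothetical engram stand-in: an append-only chunk list plus a manifest.
#[derive(Default)]
pub struct Store {
    chunks: Vec<Vec<u8>>,
    manifest: BTreeMap<String, Entry>,
}

impl Store {
    /// Append the file's chunks (4096 bytes each) without touching existing data.
    pub fn add_file(&mut self, path: &str, data: &[u8]) {
        let mut ids = Vec::new();
        for chunk in data.chunks(4096) {
            ids.push(self.chunks.len());
            self.chunks.push(chunk.to_vec());
        }
        self.manifest.insert(path.to_string(), Entry { chunk_ids: ids, deleted: false });
    }

    /// Soft delete: mark the entry, leave its chunks in place.
    pub fn remove_file(&mut self, path: &str) -> bool {
        if let Some(entry) = self.manifest.get_mut(path) {
            if !entry.deleted {
                entry.deleted = true;
                return true;
            }
        }
        false
    }

    /// Modify: append new chunks and repoint the manifest; old chunks become garbage.
    pub fn modify_file(&mut self, path: &str, data: &[u8]) {
        self.add_file(path, data);
    }

    /// Hard rebuild: keep only chunks still referenced by live entries.
    pub fn compact(&mut self) {
        let mut fresh = Store::default();
        for (path, entry) in &self.manifest {
            if entry.deleted {
                continue;
            }
            let data: Vec<u8> = entry
                .chunk_ids
                .iter()
                .flat_map(|&i| self.chunks[i].iter().copied())
                .collect();
            fresh.add_file(path, &data);
        }
        *self = fresh;
    }

    /// Read a live file back by concatenating its chunks.
    pub fn read(&self, path: &str) -> Option<Vec<u8>> {
        let entry = self.manifest.get(path)?;
        if entry.deleted {
            return None;
        }
        Some(entry.chunk_ids.iter().flat_map(|&i| self.chunks[i].iter().copied()).collect())
    }

    /// Number of chunks physically stored, including garbage.
    pub fn stored_chunks(&self) -> usize {
        self.chunks.len()
    }
}
```

In this model, removing or modifying a file never shrinks the store; only `compact` reclaims space, at roughly the cost of re-ingesting the live files.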
Batch Processing
Tests performance with larger numbers of files (100+ files, 4KB each).
Benchmarks
Performance benchmarks using Criterion:
Running Benchmarks
# Run all benchmarks
# Run specific benchmark
Benchmark Coverage
- Ingestion benchmarks: Single files (1KB-10MB), multiple small files (10-100), large files, nested directories
- Query benchmarks: Codebook queries, path-sweep queries, scaling with file count, index build time
- Incremental benchmarks: Add file, remove file, modify file, compact, sequential adds
Expected Performance
- Ingestion: 20-50 MB/s (debug), 50-100+ MB/s (release)
- Extraction: 50-100 MB/s (debug), 100-200+ MB/s (release)
- Queries: Sub-millisecond for small codebooks, milliseconds for large
- Incremental adds: ~1-5ms per file
- Compaction: Similar to full re-ingestion
Hierarchical Sub-Engrams
For large filesystems, use hierarchical encoding:
let options = IngestOptions ;
fs.ingest_directory?;
Performance Characteristics
Encoding Performance:
- Time: O(N) where N = total file size
- Space: O(chunks) + correction overhead (typically 0-5%)
- Chunk size: 4KB default (configurable)
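A quick worked estimate based on the figures above. The function names are hypothetical, 4KB is taken to mean 4096 bytes, and the 0-5% overhead is assumed to be relative to the original data size, which the text does not state explicitly.

```rust
/// Default chunk size, assuming "4KB" means 4096 bytes.
pub const DEFAULT_CHUNK_SIZE: u64 = 4096;

/// Number of chunks needed for `total_bytes` (ceiling division).
/// `chunk_size` must be non-zero.
pub fn chunk_count(total_bytes: u64, chunk_size: u64) -> u64 {
    (total_bytes + chunk_size - 1) / chunk_size
}

/// Typical correction overhead range in bytes: 0% to 5% of the input size.
pub fn correction_overhead_range(total_bytes: u64) -> (u64, u64) {
    (0, total_bytes / 20)
}
```

For example, a 10 MiB input splits into 2,560 chunks at the default size, and under these assumptions its correction store would typically stay below 512 KiB.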
Retrieval Performance:
- Beam-limited hierarchical search: O(beam_width × max_depth)
- LRU caching reduces repeated disk I/O
- Inverted index enables sub-linear candidate generation
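The O(beam_width × max_depth) bound can be seen in a small sketch of beam-limited descent through a tree of sub-engrams. The `Node` type, its precomputed `score` field, and `beam_search` are hypothetical; in a real system the score would come from comparing the query against each sub-engram. The sketch assumes `beam_width` is at least 1.

```rust
/// Hypothetical sub-engram node with a precomputed similarity to the query.
pub struct Node {
    pub name: String,
    pub score: f64,
    pub children: Vec<Node>,
}

impl Node {
    pub fn leaf(name: &str, score: f64) -> Node {
        Node { name: name.to_string(), score, children: Vec::new() }
    }
    pub fn dir(name: &str, score: f64, children: Vec<Node>) -> Node {
        Node { name: name.to_string(), score, children }
    }
}

/// Descend level by level, keeping only the `beam_width` best candidates.
/// Returns the childless nodes reached and the number of nodes expanded.
pub fn beam_search<'a>(
    root: &'a Node,
    beam_width: usize,
    max_depth: usize,
) -> (Vec<&'a str>, usize) {
    let mut frontier: Vec<&'a Node> = vec![root];
    let mut hits: Vec<&'a str> = Vec::new();
    let mut expanded = 0usize;
    for _ in 0..max_depth {
        let mut next: Vec<&'a Node> = Vec::new();
        for &node in &frontier {
            expanded += 1;
            if node.children.is_empty() {
                // A node without children counts as a result.
                hits.push(node.name.as_str());
            } else {
                next.extend(node.children.iter());
            }
        }
        // Keep the highest-scoring candidates for the next level.
        next.sort_by(|a, b| b.score.total_cmp(&a.score));
        next.truncate(beam_width);
        if next.is_empty() {
            break;
        }
        frontier = next;
    }
    (hits, expanded)
}
```

Each level expands at most `beam_width` nodes, so total work is bounded by beam_width × max_depth regardless of tree size. The trade-off is that a good match under a low-scoring parent can be pruned before it is ever visited.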
Correction Overhead:
- Observed: 0-5% typical for structured data
- Varies with data entropy and VSA dimensionality
- Statistics tracked per-engram
Development
# Clone and build
# Run tests
# Run tests with FUSE support
# Build documentation
# Check code quality
For cross-component development with other Embeddenator crates:
# Add to workspace Cargo.toml
[]
embeddenator-vsa = { path = "../embeddenator-vsa" }
embeddenator-retrieval = { path = "../embeddenator-retrieval" }
Documentation
- CHANGELOG.md - Version history and migration guides
- CONTRIBUTING.md - Development guidelines and PR process
- docs/ARCHITECTURE.md - Detailed system architecture
- docs/FUSE.md - FUSE implementation details
- docs/CORRECTION.md - Bit-perfect reconstruction system
- API Documentation - Complete rustdoc API reference
Related Projects
- embeddenator - Parent monorepo with CLI and additional tools
- embeddenator-vsa - Core VSA primitives
- embeddenator-retrieval - Retrieval and indexing
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
This project is in active development. Expect API changes in minor versions until 1.0.
License
MIT - See LICENSE file for details.
Copyright (c) 2024-2026 Tyler Zervas