agentvfs 0.1.6

Virtual filesystem CLI backed by embedded databases for AI agents
# Data Model & Schema

This document describes the logical data model used by avfs. The model is backend-agnostic; each storage backend provides its own concrete implementation of it.

## Logical Data Model

avfs organizes data into **collections** (similar to tables or trees). Each collection stores key-value pairs with specific semantics.

### Collections Overview

| Collection | Purpose | Key Format | Value Type |
|------------|---------|------------|------------|
| `files` | File/directory metadata | file_id (u64) | FileEntry |
| `paths` | Path to ID mapping | path (string) | file_id (u64) |
| `contents` | Content blobs (CAS) | hash (32 bytes) | ContentBlob |
| `versions` | Version history | file_id + version_num | VersionEntry |
| `tags` | Tag definitions | tag_id (u64) | TagInfo |
| `tag_names` | Tag name lookup | tag_name (string) | tag_id (u64) |
| `file_tags` | File-tag associations | file_id + tag_id | timestamp |
| `file_meta` | Custom metadata | file_id + key | value (string) |
| `settings` | Vault configuration | key (string) | value (string) |

## Data Structures

### FileEntry

Represents a file or directory in the virtual filesystem.

```rust
struct FileEntry {
    id: u64,
    parent_id: Option<u64>,     // None for root
    name: String,
    file_type: FileType,        // File or Directory
    content_hash: Option<Hash>, // None for directories
    size: u64,
    created_at: Timestamp,
    modified_at: Timestamp,
}

enum FileType {
    File,
    Directory,
}
```

**Serialized format** (bincode/MessagePack):
```
[id: 8 bytes][parent_id: 9 bytes][name: var][type: 1 byte]
[hash: 33 bytes][size: 8 bytes][created: 8 bytes][modified: 8 bytes]
```

### ContentBlob

Stores actual file content using content-addressable storage.

```rust
struct ContentBlob {
    hash: [u8; 32],      // SHA-256
    data: Vec<u8>,       // Raw content
    size: u64,
    ref_count: u32,      // Reference counting for GC
    created_at: Timestamp,
}
```

### VersionEntry

Records a point-in-time snapshot of a file.

```rust
struct VersionEntry {
    file_id: u64,
    version_num: u32,
    content_hash: Hash,
    size: u64,
    created_at: Timestamp,
}
```

**Key format**: `file_id (8 bytes) + version_num (4 bytes)`
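A minimal helper mirroring this key layout (illustrative; the actual implementation may differ):

```rust
// Illustrative encoder for the documented versions key layout.
fn encode_version_key(file_id: u64, version_num: u32) -> [u8; 12] {
    let mut key = [0u8; 12];
    key[0..8].copy_from_slice(&file_id.to_be_bytes());
    key[8..12].copy_from_slice(&version_num.to_be_bytes());
    key
}

fn main() {
    // Big-endian fields keep a file's versions adjacent and ordered,
    // which is what makes prefix scans over the versions tree work.
    assert!(encode_version_key(7, 1) < encode_version_key(7, 2));
    assert!(encode_version_key(7, 2) < encode_version_key(8, 0));
}
```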

### TagInfo

Defines a tag that can be applied to files.

```rust
struct TagInfo {
    id: u64,
    name: String,
    color: Option<String>,  // Hex color code
    created_at: Timestamp,
}
```

## Key Encoding

Keys are encoded consistently across backends:

### Numeric Keys
- u64 values are encoded as big-endian bytes for proper ordering
- Example: `1000` → `[0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0xE8]`
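A quick standalone check of this property, using only `std`:

```rust
fn main() {
    // to_be_bytes makes byte-wise (lexicographic) order match numeric order,
    // so sequential IDs stay adjacent in ordered key-value stores.
    assert_eq!(
        1000u64.to_be_bytes(),
        [0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0xE8]
    );
    // Little-endian would break this: 999 and 1000 compare correctly
    // only because the most significant byte comes first.
    assert!(999u64.to_be_bytes() < 1000u64.to_be_bytes());
}
```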

### String Keys
- UTF-8 encoded
- Paths use forward slashes, no trailing slash (except root)
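These rules can be captured in a small normalizer (a hypothetical helper for illustration, not avfs's actual code):

```rust
// Hypothetical path normalizer following the rules above:
// forward slashes, leading slash, no trailing slash except for root.
fn normalize_path(raw: &str) -> String {
    let parts: Vec<&str> = raw.split('/').filter(|p| !p.is_empty()).collect();
    if parts.is_empty() {
        "/".to_string() // root keeps its single slash
    } else {
        format!("/{}", parts.join("/"))
    }
}

fn main() {
    assert_eq!(normalize_path("/docs/notes/"), "/docs/notes");
    assert_eq!(normalize_path("docs//notes"), "/docs/notes");
    assert_eq!(normalize_path(""), "/");
}
```

Normalizing before every `paths` lookup keeps the string keys canonical, so each file has exactly one entry.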

### Composite Keys
- Components are length-prefixed or use fixed sizes
- Example file_tag key: `file_id (8 bytes) + tag_id (8 bytes)`

```rust
fn encode_file_tag_key(file_id: u64, tag_id: u64) -> [u8; 16] {
    let mut key = [0u8; 16];
    key[0..8].copy_from_slice(&file_id.to_be_bytes());
    key[8..16].copy_from_slice(&tag_id.to_be_bytes());
    key
}
```

## Value Serialization

Values are serialized using a compact binary format:

### Primary: bincode
- Fast, compact, Rust-native
- Used for internal structures

### Alternative: MessagePack
- More portable
- Better for potential cross-language access

### Configuration
```toml
[storage]
serialization = "bincode"  # or "msgpack", "cbor"
```

## Backend-Specific Implementations

### SQLite Schema

When using SQLite, collections map to tables:

```sql
-- File metadata
CREATE TABLE files (
    id              INTEGER PRIMARY KEY,
    parent_id       INTEGER REFERENCES files(id) ON DELETE CASCADE,
    name            TEXT NOT NULL,
    file_type       INTEGER NOT NULL,  -- 0=file, 1=directory
    content_hash    BLOB,
    size            INTEGER NOT NULL DEFAULT 0,
    created_at      INTEGER NOT NULL,  -- Unix timestamp
    modified_at     INTEGER NOT NULL,
    UNIQUE(parent_id, name)
);

CREATE INDEX idx_files_parent ON files(parent_id);
CREATE INDEX idx_files_hash ON files(content_hash);

-- Path lookup (denormalized for performance)
CREATE TABLE paths (
    path    TEXT PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE
);

-- Content-addressable storage
CREATE TABLE contents (
    hash        BLOB PRIMARY KEY,  -- 32 bytes SHA-256
    data        BLOB NOT NULL,
    size        INTEGER NOT NULL,
    ref_count   INTEGER NOT NULL DEFAULT 1,
    created_at  INTEGER NOT NULL
);

-- Version history
CREATE TABLE versions (
    file_id     INTEGER NOT NULL,
    version_num INTEGER NOT NULL,
    content_hash BLOB NOT NULL,
    size        INTEGER NOT NULL,
    created_at  INTEGER NOT NULL,
    PRIMARY KEY (file_id, version_num)
);

CREATE INDEX idx_versions_created ON versions(created_at);

-- Tags
CREATE TABLE tags (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    color       TEXT,
    created_at  INTEGER NOT NULL
);

CREATE TABLE file_tags (
    file_id     INTEGER NOT NULL,
    tag_id      INTEGER NOT NULL,
    created_at  INTEGER NOT NULL,
    PRIMARY KEY (file_id, tag_id)
);

CREATE INDEX idx_file_tags_tag ON file_tags(tag_id);

-- Custom metadata
CREATE TABLE file_meta (
    file_id     INTEGER NOT NULL,
    key         TEXT NOT NULL,
    value       TEXT NOT NULL,
    PRIMARY KEY (file_id, key)
);

-- Full-text search (FTS5); rowid is set explicitly to file_id on insert.
-- (FTS5's content_rowid option only applies to external-content tables
-- declared with content=, so it is omitted here.)
CREATE VIRTUAL TABLE fts_content USING fts5(
    path,
    content
);

-- Vault settings
CREATE TABLE settings (
    key     TEXT PRIMARY KEY,
    value   TEXT NOT NULL
);
```

### Sled/LMDB/RocksDB Schema

For key-value backends, collections map to separate trees/databases:

```
Tree: files
  Key: u64 (big-endian)
  Value: bincode(FileEntry)

Tree: paths
  Key: UTF-8 string
  Value: u64 (big-endian)

Tree: contents
  Key: 32 bytes (SHA-256)
  Value: raw bytes (content data)

Tree: contents_meta
  Key: 32 bytes (SHA-256)
  Value: bincode(ContentMeta { size, ref_count, created_at })

Tree: versions
  Key: file_id (8 bytes) + version_num (4 bytes)
  Value: bincode(VersionEntry)

Tree: tags
  Key: u64 (big-endian)
  Value: bincode(TagInfo)

Tree: tag_names
  Key: UTF-8 string
  Value: u64 (big-endian)

Tree: file_tags
  Key: file_id (8 bytes) + tag_id (8 bytes)
  Value: timestamp (8 bytes)

Tree: file_meta
  Key: file_id (8 bytes) + key_len (2 bytes) + key (UTF-8)
  Value: UTF-8 string

Tree: settings
  Key: UTF-8 string
  Value: UTF-8 string

Tree: _meta
  Key: "version" | "backend" | "created_at"
  Value: varies
```
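A benefit of this layout is that one file's versions form a contiguous key range. A standalone sketch of the prefix scan, using an in-memory `BTreeMap` as a stand-in for an ordered tree (sled, LMDB, and RocksDB expose equivalent range/iterator APIs):

```rust
use std::collections::BTreeMap;

// Build a versions-tree key: file_id (8 bytes) + version_num (4 bytes).
fn version_key(file_id: u64, version_num: u32) -> Vec<u8> {
    let mut k = file_id.to_be_bytes().to_vec();
    k.extend_from_slice(&version_num.to_be_bytes());
    k
}

// Collect all values whose key starts with the file_id prefix.
fn scan_versions(tree: &BTreeMap<Vec<u8>, Vec<u8>>, file_id: u64) -> Vec<Vec<u8>> {
    let prefix = file_id.to_be_bytes();
    tree.range(prefix.to_vec()..)                    // seek to first key >= prefix
        .take_while(|(k, _)| k.starts_with(&prefix)) // stop at the next file_id
        .map(|(_, v)| v.clone())
        .collect()
}

fn main() {
    let mut tree = BTreeMap::new();
    tree.insert(version_key(1, 1), b"v1".to_vec());
    tree.insert(version_key(1, 2), b"v2".to_vec());
    tree.insert(version_key(2, 1), b"other file".to_vec());
    assert_eq!(scan_versions(&tree, 1).len(), 2);
    assert_eq!(scan_versions(&tree, 3).len(), 0);
}
```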

## Common Queries

### List directory contents

```rust
// Find all files where parent_id = directory_id
fn list_directory(storage: &dyn StorageBackend, dir_id: u64) -> Result<Vec<FileEntry>> {
    // Scan files collection, filter by parent_id
    // (SQLite can use index, KV stores scan and filter)
}
```

**SQLite:**
```sql
SELECT * FROM files WHERE parent_id = ? ORDER BY file_type DESC, name;
```

### Get file by path

```rust
fn get_file_by_path(storage: &dyn StorageBackend, path: &str) -> Result<Option<FileEntry>> {
    // 1. Look up file_id in paths collection
    // 2. Get FileEntry from files collection
}
```

**SQLite:**
```sql
SELECT f.* FROM files f
JOIN paths p ON f.id = p.file_id
WHERE p.path = ?;
```

### Get file content

```rust
fn get_content(storage: &dyn StorageBackend, file_id: u64) -> Result<Vec<u8>> {
    // 1. Get FileEntry, extract content_hash
    // 2. Get content from contents collection
}
```

### Version history

```rust
fn get_versions(storage: &dyn StorageBackend, file_id: u64) -> Result<Vec<VersionEntry>> {
    // Scan versions with prefix = file_id bytes
}
```

**SQLite:**
```sql
SELECT * FROM versions WHERE file_id = ? ORDER BY version_num DESC;
```

### Files by tag

```rust
fn files_with_tag(storage: &dyn StorageBackend, tag_name: &str) -> Result<Vec<FileEntry>> {
    // 1. Get tag_id from tag_names
    // 2. Scan file_tags with tag_id suffix
    // 3. Get FileEntry for each file_id
}
```

**SQLite:**
```sql
SELECT f.* FROM files f
JOIN file_tags ft ON f.id = ft.file_id
JOIN tags t ON ft.tag_id = t.id
WHERE t.name = ?;
```

## Full-Text Search

### SQLite (FTS5)

Built-in full-text search:

```sql
-- Search
SELECT path, snippet(fts_content, 1, '<b>', '</b>', '...', 32) as snippet
FROM fts_content
WHERE fts_content MATCH ?
ORDER BY rank;

-- Index on insert
INSERT INTO fts_content(rowid, path, content)
VALUES (?, ?, ?);

-- Update on content change
DELETE FROM fts_content WHERE rowid = ?;
INSERT INTO fts_content(rowid, path, content) VALUES (?, ?, ?);
```

### Tantivy (Sled/LMDB/RocksDB)

Separate search index using tantivy:

```rust
// Index structure
let schema = Schema::builder()
    .add_text_field("path", TEXT | STORED)
    .add_text_field("content", TEXT)
    .add_u64_field("file_id", INDEXED | STORED)
    .build();

// Search
let query_parser = QueryParser::for_index(&index, vec![content_field]);
let query = query_parser.parse_query(query_str)?;
let results = searcher.search(&query, &TopDocs::with_limit(limit))?;
```

## ID Generation

### Auto-increment (SQLite)
SQLite assigns IDs automatically for `INTEGER PRIMARY KEY` columns (rowid aliasing); the `AUTOINCREMENT` keyword is only needed if IDs must never be reused after deletion.

### Monotonic IDs (KV stores)
For key-value backends, use a counter in the settings collection:

```rust
fn next_id(storage: &dyn StorageBackend, collection: &str) -> Result<u64> {
    storage.transaction(|txn| {
        let key = format!("_next_id_{}", collection);
        let current: u64 = txn.get("settings", key.as_bytes())?
            .map(|v| u64::from_be_bytes(v.try_into().unwrap()))
            .unwrap_or(0);
        let next = current + 1;
        txn.put("settings", key.as_bytes(), &next.to_be_bytes())?;
        Ok(next)
    })
}
```

## Maintenance Operations

### Garbage Collection

Remove unreferenced content blobs:

```rust
fn garbage_collect(storage: &dyn StorageBackend) -> Result<GcStats> {
    // 1. Scan contents_meta for ref_count = 0
    // 2. Delete from contents and contents_meta
}
```

**SQLite:**
```sql
DELETE FROM contents WHERE ref_count = 0;
```

### Version Pruning

Keep last N versions:

```rust
fn prune_versions(storage: &dyn StorageBackend, keep: u32) -> Result<PruneStats> {
    // For each file_id in versions:
    //   Get version count
    //   If > keep, delete oldest (version_num < max - keep)
    //   Decrement ref_count on deleted content hashes
}
```
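The selection step can be exercised in isolation. A standalone sketch (the function name is illustrative):

```rust
// Given one file's version numbers in any order, return the ones to delete
// so that only the newest `keep` survive.
fn versions_to_prune(mut nums: Vec<u32>, keep: usize) -> Vec<u32> {
    nums.sort_unstable();
    if nums.len() <= keep {
        return Vec::new(); // nothing to prune
    }
    let cut = nums.len() - keep;
    nums.truncate(cut); // everything below the cut-off is pruned
    nums
}

fn main() {
    assert_eq!(versions_to_prune(vec![3, 1, 5, 2, 4], 2), vec![1, 2, 3]);
    assert!(versions_to_prune(vec![1, 2], 5).is_empty());
}
```

The returned version numbers identify both the `versions` entries to delete and the content hashes whose `ref_count` must be decremented.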

**SQLite:**
```sql
-- Row-value comparisons require SQLite 3.15+
DELETE FROM versions
WHERE (file_id, version_num) NOT IN (
    SELECT file_id, version_num FROM versions v2
    WHERE v2.file_id = versions.file_id
    ORDER BY version_num DESC
    LIMIT ?
);
```

### Compaction

Reclaim disk space:

- **SQLite**: `VACUUM`
- **Sled**: Automatic compaction
- **LMDB**: Copy to new database
- **RocksDB**: `compact_range()`

## Migration

When migrating between backends, the migration tool:

1. Opens source with old backend
2. Creates destination with new backend
3. Iterates all collections, copying key-value pairs
4. Rebuilds search index
5. Verifies integrity (checksums)

Between key-value backends the serialized value format (bincode) is identical, so pairs can be copied byte-for-byte; migrating to or from SQLite additionally maps values between serialized structs and table columns.
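Steps 3 and 5 can be sketched for a single collection using in-memory maps (illustrative only; `DefaultHasher` stands in for whatever checksum the real tool uses, and a production tool would use a stable cryptographic hash):

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Tree = BTreeMap<Vec<u8>, Vec<u8>>;

// Order-dependent checksum over a whole collection.
fn checksum(tree: &Tree) -> u64 {
    let mut h = DefaultHasher::new();
    for (k, v) in tree {
        k.hash(&mut h);
        v.hash(&mut h);
    }
    h.finish()
}

// Copy one collection and verify the destination matches the source.
fn copy_collection(src: &Tree) -> Tree {
    let dst: Tree = src.iter().map(|(k, v)| (k.clone(), v.clone())).collect();
    assert_eq!(checksum(src), checksum(&dst)); // step 5: integrity check
    dst
}

fn main() {
    let mut src = Tree::new();
    src.insert(b"key".to_vec(), b"value".to_vec());
    let dst = copy_collection(&src);
    assert_eq!(src, dst);
}
```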