libnoa 0.1.0

AI-native distributed version control system with per-agent workspace isolation, JSONL append-only logs, snapshot-based history, and full git protocol compatibility
# Object Storage Design

## Overview

noa uses a content-addressed storage model inspired by Git but with a
pluggable backend architecture. Objects are addressed by SHA-256 hash
and stored as opaque blobs.

## Object Types

### Blob

Raw file content. Identified by `SHA256(content)`.

```rust
pub struct BlobId(pub String); // hex-encoded SHA-256
```

Blobs are stored whole, with no delta compression. Identical content always
hashes to the same ID, so it is stored exactly once.
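
The deduplication property follows directly from keying by hash. The sketch below is a minimal in-memory illustration, not noa's implementation: `MemBlobStore`, `blob_count`, and `toy_hash` are hypothetical names, and `toy_hash` is a 64-bit FNV-1a stand-in used only to keep the example dependency-free. noa itself would use hex-encoded SHA-256 here.

```rust
use std::collections::HashMap;

/// Hex-encoded content hash, mirroring the `BlobId` newtype above.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct BlobId(pub String);

/// Hypothetical in-memory store. `hash_fn` stands in for hex-encoded SHA-256.
pub struct MemBlobStore<F: Fn(&[u8]) -> String> {
    hash_fn: F,
    blobs: HashMap<BlobId, Vec<u8>>,
}

impl<F: Fn(&[u8]) -> String> MemBlobStore<F> {
    pub fn new(hash_fn: F) -> Self {
        Self { hash_fn, blobs: HashMap::new() }
    }

    /// Stores `data` under its content hash. Writing the same bytes twice
    /// returns the same ID and leaves a single stored copy.
    pub fn put_blob(&mut self, data: &[u8]) -> BlobId {
        let id = BlobId((self.hash_fn)(data));
        self.blobs.entry(id.clone()).or_insert_with(|| data.to_vec());
        id
    }

    pub fn get_blob(&self, id: &BlobId) -> Option<&[u8]> {
        self.blobs.get(id).map(|v| v.as_slice())
    }

    /// Number of distinct blobs currently stored.
    pub fn blob_count(&self) -> usize {
        self.blobs.len()
    }
}

/// Toy 64-bit FNV-1a digest. NOT collision-resistant; an assumption made
/// only so the sketch needs no external crate.
pub fn toy_hash(data: &[u8]) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{:016x}", h)
}
```

Writing `b"hello"` twice and `b"world"` once leaves two blobs in the store, and both writes of `b"hello"` return the same `BlobId`.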

### Tree

Directory listing. Maps entry names to child entries (blobs or subtrees).

```rust
pub struct TreeEntry {
    pub name: String,
    pub kind: TreeEntryKind, // Blob or Tree
    pub hash: String,        // SHA-256 of child
}

pub struct TreeId(pub String); // SHA-256(msgpack(entries))
```

Trees are serialized as MessagePack for compactness and fast deserialization.
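
Because a `TreeId` is the hash of the serialized entries, two listings of the same directory share an ID only if they serialize to identical bytes. This section does not say how noa orders entries. One common approach, sketched here purely as an assumption, is to sort entries by name and reject duplicate names before serializing. `canonicalize` is a hypothetical helper, and the `TreeEntryKind` enum is reconstructed from the comment in the struct above.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum TreeEntryKind {
    Blob,
    Tree,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TreeEntry {
    pub name: String,
    pub kind: TreeEntryKind,
    pub hash: String,
}

/// Puts entries into a canonical order so that the same directory contents
/// always serialize to the same bytes, whatever order they were listed in.
/// Returns an error if two entries share a name.
pub fn canonicalize(mut entries: Vec<TreeEntry>) -> Result<Vec<TreeEntry>, String> {
    // `String` ordering in Rust is bytewise, so it does not depend on locale.
    entries.sort_by(|a, b| a.name.cmp(&b.name));
    for pair in entries.windows(2) {
        if pair[0].name == pair[1].name {
            return Err(format!("duplicate entry name: {}", pair[0].name));
        }
    }
    Ok(entries)
}
```

With this step in front of serialization, listing `src` before `README.md` or after it yields the same entry sequence, and therefore the same tree hash.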

## Trait Definition

```rust
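use async_trait::async_trait;

// `Result` is presumably the crate's own result alias; it is not defined in this section.
#[async_trait]
pub trait ObjectStore: Send + Sync {
    /// Stores `data` and returns its content hash.
    async fn put_blob(&self, data: &[u8]) -> Result<BlobId>;
    /// Fetches a blob's bytes by ID.
    async fn get_blob(&self, id: &BlobId) -> Result<Vec<u8>>;
    /// Serializes `entries` and returns the hash of the serialized tree.
    async fn put_tree(&self, entries: Vec<TreeEntry>) -> Result<TreeId>;
    /// Fetches and deserializes a tree by ID.
    async fn get_tree(&self, id: &TreeId) -> Result<Vec<TreeEntry>>;
}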
```
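
To illustrate how the tree and blob lookups compose, the sketch below resolves a slash-separated path by walking trees from a root. It is not part of noa's API. It assumes a synchronous, read-only stand-in (`ReadStore`) for the async trait and plain string IDs, so that it runs without an async runtime. `MemStore` and `read_path` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub enum TreeEntryKind {
    Blob,
    Tree,
}

#[derive(Clone, Debug)]
pub struct TreeEntry {
    pub name: String,
    pub kind: TreeEntryKind,
    pub hash: String,
}

/// Synchronous, read-only stand-in for the async `ObjectStore` trait (assumption).
pub trait ReadStore {
    fn get_blob(&self, id: &str) -> Option<Vec<u8>>;
    fn get_tree(&self, id: &str) -> Option<Vec<TreeEntry>>;
}

/// Hypothetical in-memory fixture.
#[derive(Default)]
pub struct MemStore {
    pub blobs: HashMap<String, Vec<u8>>,
    pub trees: HashMap<String, Vec<TreeEntry>>,
}

impl ReadStore for MemStore {
    fn get_blob(&self, id: &str) -> Option<Vec<u8>> {
        self.blobs.get(id).cloned()
    }
    fn get_tree(&self, id: &str) -> Option<Vec<TreeEntry>> {
        self.trees.get(id).cloned()
    }
}

/// Resolves `path` (for example "src/main.rs") starting at the tree `root`.
/// Returns `None` if a component is missing, if a blob appears in the middle
/// of the path, or if the path ends at a directory.
pub fn read_path<S: ReadStore>(store: &S, root: &str, path: &str) -> Option<Vec<u8>> {
    let mut current = root.to_string();
    let mut parts = path.split('/').filter(|p| !p.is_empty()).peekable();
    while let Some(part) = parts.next() {
        let entries = store.get_tree(&current)?;
        let entry = entries.into_iter().find(|e| e.name == part)?;
        let is_last = parts.peek().is_none();
        match (entry.kind, is_last) {
            (TreeEntryKind::Blob, true) => return store.get_blob(&entry.hash),
            (TreeEntryKind::Tree, false) => current = entry.hash,
            _ => return None,
        }
    }
    None
}
```

Each path component costs one `get_tree` call, so a lookup touches one tree per directory level plus one blob.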

## Backends

### RedbObjectStore (Local)

Uses the [redb](https://github.com/cberner/redb) embedded key-value store.

- Two tables: `blobs` (key: hash bytes, value: content bytes) and
  `trees` (key: hash bytes, value: msgpack entries)
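- Zero-copy reads: values are borrowed from redb's pages rather than copied
  (note that recent redb releases no longer memory-map the database file)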
- ACID transactions with automatic crash recovery
- Single-writer, multi-reader via MVCC
- No external daemon required

### MinioObjectStore (Remote)

Uses the S3 API via the `aws-sdk-s3` crate.

- Path-style addressing: `<bucket>/blobs/<hash>`, `<bucket>/trees/<hash>`
- Works with S3-compatible backends such as MinIO, AWS S3, and Google Cloud
  Storage (through its S3-interoperable XML API)
- Automatic retries with exponential backoff
- Suitable for distributed deployments
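
The key layout and retry schedule above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the backend's code: `ObjectKind`, `object_key`, and `backoff_delay` are hypothetical names, the lowercase-hex check is an assumption about how hashes are encoded, and in practice the retry behaviour would normally come from the SDK's own retry configuration rather than a hand-written schedule.

```rust
use std::time::Duration;

pub enum ObjectKind {
    Blob,
    Tree,
}

/// Builds the object key inside the bucket: `blobs/<hash>` or `trees/<hash>`.
/// Returns `None` unless `hash` is 64 lowercase hex characters (assumed
/// encoding of a SHA-256 digest).
pub fn object_key(kind: ObjectKind, hash: &str) -> Option<String> {
    let valid = hash.len() == 64
        && hash.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if !valid {
        return None;
    }
    let prefix = match kind {
        ObjectKind::Blob => "blobs",
        ObjectKind::Tree => "trees",
    };
    Some(format!("{}/{}", prefix, hash))
}

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`,
/// never exceeding `cap`. Overflow is treated as "use the cap".
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(cap).min(cap)
}
```

With a 100 ms base and a 5 s cap, the delays run 100 ms, 200 ms, 400 ms, 800 ms, and so on until they reach the cap. Production retry loops usually add random jitter on top of this schedule so that many clients do not retry in lockstep.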

## Design Decisions

### Why SHA-256 instead of SHA-1?

Git defaults to SHA-1, for which a practical collision was demonstrated in
2017 (the SHAttered attack). SHA-256 has no known practical collision
attacks and is widely available.

### Why no delta compression?

1. **Simplicity**: Delta compression (Git's pack files) adds significant
   complexity (sliding window matching, thin packs, delta chains).
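2. **Write performance**: A blob write only hashes and stores the content;
   it never reads existing objects. Delta compression must first read
   candidate base objects.
3. **AI agent workload**: Agents frequently regenerate entire files, and old
   versions are short-lived, so delta chains would be short and numerous
   and would save little space.
4. **Deduplication without deltas**: Content addressing already stores
   identical objects once, because identical content maps to the same key
   in either backend. S3 and MinIO need no extra logic for this.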

### Why MessagePack for trees?

- Roughly 30-50% smaller than JSON for binary-heavy data, since raw bytes
  need no hex or base64 encoding
- Schema-less (no `.proto` definitions or code generation required)
- Rust ecosystem support via `rmp-serde`
- Fast deserialization
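
The size difference for binary data can be checked with a little arithmetic on the two encodings. The helpers below are hypothetical and compute encoded lengths only, following the MessagePack `bin` and `str` header sizes and JSON's quoted-string form. They do not serialize anything.

```rust
/// Encoded size of `n` raw bytes as a MessagePack `bin` value.
pub fn msgpack_bin_len(n: usize) -> usize {
    match n {
        0..=0xff => 2 + n,       // bin 8: type byte + 1-byte length
        0x100..=0xffff => 3 + n, // bin 16: type byte + 2-byte length
        _ => 5 + n,              // bin 32: type byte + 4-byte length
    }
}

/// Encoded size of an `n`-byte UTF-8 string as a MessagePack `str` value.
pub fn msgpack_str_len(n: usize) -> usize {
    match n {
        0..=31 => 1 + n,         // fixstr: length lives in the type byte
        32..=0xff => 2 + n,      // str 8
        0x100..=0xffff => 3 + n, // str 16
        _ => 5 + n,              // str 32
    }
}

/// Size of `n` bytes written as a hex string in JSON:
/// two hex characters per byte plus the two surrounding quotes.
pub fn json_hex_len(n: usize) -> usize {
    2 * n + 2
}
```

For a 32-byte SHA-256 digest, the raw `bin` form takes 34 bytes against 66 bytes for a JSON hex string, which is about 48% smaller. A hash kept as a 64-character hex `String`, as in `TreeEntry` above, takes 66 bytes in MessagePack as well, so the saving depends on storing hashes as raw bytes.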

### Why redb over SQLite?

- **Type safety**: redb uses Rust generics for table definitions
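- **Performance**: redb's read API returns borrowed values (zero-copy reads)
  instead of copying them out of the database
- **Simplicity**: Single dependency, no C library linkage
- **Crash safety**: redb uses copy-on-write B-trees rather than a write-ahead
  log, so there is no separate log to checkpoint or replay, unlike SQLite's
  WAL mode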

Trade-off: redb has a smaller community and fewer tooling options than SQLite.
For noa's use case (embedded binary storage), the trade-off is favorable.