syntheca 0.1.0

Content-addressable storage on top of apotheca. Bytes go in, BLAKE3 hash comes out.

syntheca

Content-addressable storage. Bytes go in, a BLAKE3 hash comes out; lookups by hash; identical bytes coalesce into a single stored entry.

This crate (syntheca, binary syn) is the Rust reference implementation of the syntheca protocol. It is a thin layer over apotheca: syntheca derives the apotheca name from blake3(bytes), treats apotheca's collision outcome as benign dedup when the bytes match, and adds an optional verify-on-read that rehashes under BLAKE3 (in addition to apotheca's mandatory SHA-256 verification). Phase 1: single hash function (BLAKE3, fixed), single apotheca pool, three operations (put, get, stat).

Install

The binary:

cargo install syntheca

The library:

[dependencies]
syntheca = "0.1"

CLI

syn exposes the three operations one-for-one. The default pool root is $HOME/.syntheca/; override with --pool <dir>.

syn put <path>          # store the file's bytes; prints the hash to stdout
syn put -               # store stdin; prints the hash to stdout
syn get <hash>          # bytes to stdout, verified
syn stat <hash>         # size and sha256 to stdout

<hash> is 64 lowercase hexadecimal digits (a 32-byte BLAKE3 digest); uppercase or mixed-case input is rejected. put is idempotent: re-putting identical bytes returns the same hash and does not modify the stored entry. A genuine BLAKE3 collision (two distinct byte sequences hashing to the same digest) fails with a hash-collision error and leaves the stored bytes unmodified; given BLAKE3's collision resistance, this is not expected to occur for honest inputs. get first runs apotheca's SHA-256 verification, then optionally rehashes under BLAKE3 (on by default); a mismatch is reported as an integrity error rather than silently propagated. stat does not read or re-hash the bytes.

Exit status is 0 on success, non-zero on collision, not-found, integrity error, malformed hash, or any I/O failure, with a diagnostic on stderr.

Library

use syntheca::{Pool, Hash};

let pool = Pool::open("/path/to/pool")?;

let hash = pool.put(b"hello")?;        // hash = blake3("hello")
let bytes = pool.get(&hash)?;          // verified before return
let stat = pool.stat(&hash)?;          // { size, sha256 } from apotheca
assert_eq!(bytes, b"hello");

// Hashes round-trip through 64-char lowercase hex.
let s = hash.to_hex();
let parsed = Hash::from_hex(&s)?;
assert_eq!(hash, parsed);

Hashes are 32-byte BLAKE3 digests. The hash function is fixed at the type level; per-pool selection is deferred. Equality is over the underlying octets; the canonical wire encoding is 64 lowercase hex digits.

use syntheca::{Pool, Options};

// Disable BLAKE3 verify-on-read. apotheca's SHA-256 verify still runs.
let pool = Pool::open_with("/path/to/pool", Options { verify_on_read: false })?;

Errors are split by operation: PutError (HashCollision, Apotheca), GetError (NotFound, IntegrityError, Apotheca), and StatError (NotFound, Apotheca). Lower-level apotheca errors propagate unchanged through the Apotheca variants. A malformed hash passed to Hash::from_hex yields HashParseError (WrongLength or InvalidChar), which is a parsing error, not a protocol error.
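
The hex rules for hashes (exactly 64 lowercase digits, decoding to 32 bytes, with a length error or an invalid-character error on failure) can be illustrated without the crate. The following is a standalone sketch, not syntheca's actual implementation: parse_hash_hex and to_hash_hex are hypothetical names, the error enum only mirrors the variant names listed above, and checking length before characters is an assumption.

```rust
/// Mirrors the variant names of the crate's HashParseError; this enum is a
/// local stand-in for illustration, not the crate's type.
#[derive(Debug, PartialEq, Eq)]
enum HashParseError {
    WrongLength,
    InvalidChar,
}

/// Decode one lowercase hex digit. Uppercase 'A'..='F' is rejected on purpose.
fn nibble(c: char) -> Result<u8, HashParseError> {
    match c {
        '0'..='9' => Ok(c as u8 - b'0'),
        'a'..='f' => Ok(c as u8 - b'a' + 10),
        _ => Err(HashParseError::InvalidChar),
    }
}

/// Parse 64 lowercase hex digits into 32 octets (hypothetical helper).
/// Assumption: the length check runs before the per-character check.
fn parse_hash_hex(s: &str) -> Result<[u8; 32], HashParseError> {
    let chars: Vec<char> = s.chars().collect();
    if chars.len() != 64 {
        return Err(HashParseError::WrongLength);
    }
    let mut out = [0u8; 32];
    for (i, pair) in chars.chunks(2).enumerate() {
        out[i] = (nibble(pair[0])? << 4) | nibble(pair[1])?;
    }
    Ok(out)
}

/// Encode 32 octets as 64 lowercase hex digits (hypothetical helper).
fn to_hash_hex(bytes: &[u8; 32]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

Because only lowercase digits are accepted, each 32-byte value has exactly one accepted spelling, so comparing hex strings and comparing octets agree.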

On-disk layout

A syntheca pool is an apotheca pool. Inside the pool root, each entry lives at store/<hex-blake3>/ with bytes and meta files (apotheca's layout). The directory name is the BLAKE3 hash of the bytes, encoded as 64 lowercase hex digits.

<pool>/
  store/
    <hex-blake3>/
      bytes              # the entry's bytes
      meta               # size and sha256 (apotheca's storage digest)
  tmp/
    <staging-id>/        # staging area for atomic put
      ...

bytes is octet-for-octet what was put. meta records apotheca's SHA-256 storage digest; the BLAKE3 hash is implicit in the directory name and is not stored separately. A read verifies bytes against the stored SHA-256 (apotheca) and, by default, rehashes under BLAKE3 to confirm the directory name still names what it claims to.
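
As a rough illustration of the layout above, the paths for an entry can be derived from the pool root and the hex-encoded hash alone. This is a sketch under assumptions: EntryPaths and entry_paths are hypothetical names, not part of the syntheca or apotheca API, and the hash string is assumed to have been validated already.

```rust
use std::path::{Path, PathBuf};

/// Locations of one entry inside a pool (hypothetical helper type).
struct EntryPaths {
    dir: PathBuf,   // <pool>/store/<hex-blake3>/
    bytes: PathBuf, // <pool>/store/<hex-blake3>/bytes
    meta: PathBuf,  // <pool>/store/<hex-blake3>/meta
}

/// Derive the entry paths from the pool root and a hex-encoded hash.
/// Assumption: `hex_blake3` is already a valid 64-digit lowercase hex string.
fn entry_paths(pool_root: &Path, hex_blake3: &str) -> EntryPaths {
    let dir = pool_root.join("store").join(hex_blake3);
    EntryPaths {
        bytes: dir.join("bytes"),
        meta: dir.join("meta"),
        dir,
    }
}
```

Nothing beyond the hash is needed to locate an entry, which is why the BLAKE3 value does not have to be recorded in meta.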

Two digests, one entry

Every entry has two associated hashes:

  • BLAKE3 — names the entry within syntheca, the unit of content equality, derived at put time and required at get/stat time.
  • SHA-256 — apotheca's storage-integrity digest; the value stat returns; the field apotheca uses to detect collisions and reject corrupted reads.

These are deliberately distinct. apotheca is the substrate and uses SHA-256 as its mandatory integrity hash for any caller; syntheca picks BLAKE3 for content-addressing on top. put collision detection rides on apotheca's SHA-256 comparison: differing bytes have differing SHA-256 digests (barring a simultaneous SHA-256 collision), so apotheca returns Collision even when the BLAKE3-derived names match, and syntheca surfaces this as HashCollision.
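
The put decision described above (new name: store; existing name with matching bytes: benign dedup; existing name with differing bytes: HashCollision) can be modelled in memory. This is only a sketch: ModelPool, PutOutcome, and toy_name are hypothetical, a direct byte comparison stands in for apotheca's SHA-256 comparison, and the deliberately weak toy_name stands in for BLAKE3 so that the collision path can be reached at all.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum PutOutcome {
    Stored,       // name was new; bytes written
    Deduplicated, // name existed with identical bytes; nothing written
}

/// Local stand-in that mirrors one variant name of the crate's PutError.
#[derive(Debug, PartialEq)]
enum PutError {
    HashCollision, // name existed with different bytes; nothing written
}

/// In-memory model of a pool, parameterised by the naming function.
struct ModelPool {
    name_of: fn(&[u8]) -> String,
    entries: HashMap<String, Vec<u8>>,
}

impl ModelPool {
    fn new(name_of: fn(&[u8]) -> String) -> Self {
        ModelPool { name_of, entries: HashMap::new() }
    }

    fn put(&mut self, bytes: &[u8]) -> Result<(String, PutOutcome), PutError> {
        let name = (self.name_of)(bytes);
        if let Some(existing) = self.entries.get(&name) {
            // Byte equality stands in for apotheca's SHA-256 comparison.
            return if existing.as_slice() == bytes {
                Ok((name, PutOutcome::Deduplicated))
            } else {
                Err(PutError::HashCollision)
            };
        }
        self.entries.insert(name.clone(), bytes.to_vec());
        Ok((name, PutOutcome::Stored))
    }
}

/// Deliberately weak naming function (first byte as hex), NOT BLAKE3.
/// It exists only to make a name collision easy to provoke in the model.
fn toy_name(bytes: &[u8]) -> String {
    format!("{:02x}", bytes.first().copied().unwrap_or(0))
}
```

With toy_name, putting b"hello" twice stores once and then deduplicates, while putting b"help" afterwards collides on the shared first byte and leaves the stored entry unchanged. With a collision-resistant naming function, that third branch should be unreachable for honest inputs.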

Status and scope

Phase 1 reference implementation. Conformant with the syntheca Phase 1 protocol: operations, the BLAKE3-name encoding, two-hash integrity, verify-on-read, and the syn CLI surface. Inherits Phase 1 conformance from apotheca for the substrate (single local backend, atomic put, etc.).

Out of scope here: enumeration (excluded by apotheca and therefore, transitively, by syntheca; there is no ls/list operation, so consumers maintain their own manifests); deletion (entries are write-once; GC is a higher-layer concern that operates on the underlying apotheca backends directly); alternative hash functions; multi-pool composition; configuration files; state chains; history; and schema validation. State and history live in projects above syntheca (metatheca, literium, dbaiv); syntheca stops below that line.

License

Licensed under either of MIT (LICENSE-MIT) or Apache-2.0 (LICENSE-APACHE) at your option.

See also

The protocol specification, decision rationale, and broader project framing live in the syntheca project group at https://gitlab.com/pantheca/syntheca. The substrate apotheca (named write-once store, no content-addressing) lives at https://gitlab.com/pantheca/apotheca.