syntheca
Content-addressable storage. Bytes go in, a BLAKE3 hash comes out; lookups by hash; identical bytes coalesce into a single stored entry.
This crate (syntheca, binary syn) is the Rust reference implementation
of the syntheca protocol. It is a thin layer over
apotheca: syntheca derives the
apotheca name from blake3(bytes), treats apotheca's collision outcome
as benign dedup when the bytes match, and adds an optional verify-on-read
that rehashes under BLAKE3 (in addition to apotheca's mandatory SHA-256
verification). Phase 1: single hash function (BLAKE3, fixed), single
apotheca pool, three operations (put, get, stat).
Install
The binary:
The library:
[dependencies]
syntheca = "0.1"
CLI
syn exposes the three operations one-for-one. The default pool root is
$HOME/.syntheca/; override with --pool <dir>.
<hash> is 64 lowercase hexadecimal digits (32-byte BLAKE3). Uppercase or
mixed-case input is rejected. put is idempotent: re-putting identical
bytes returns the same hash and does not modify the stored entry. A genuine
BLAKE3 collision (two distinct byte sequences hashing to the same digest)
fails with a hash-collision error and the stored bytes are not modified;
this does not occur from honest inputs against a collision-resistant hash.
get first runs apotheca's SHA-256 verification, then optionally rehashes
under BLAKE3 (default on); a mismatch is reported as an integrity error
rather than silently propagated. stat does not read or re-hash the bytes.
Exit status is 0 on success, non-zero on collision, not-found, integrity
error, malformed hash, or any I/O failure, with a diagnostic on stderr.
Library
use ;
let pool = open?;
let hash = pool.put?; // hash = blake3("hello")
let bytes = pool.get?; // verified before return
let stat = pool.stat?; // { size, sha256 } from apotheca
assert_eq!;
// Hashes round-trip through 64-char lowercase hex.
let s = hash.to_hex();
let parsed = Hash::from_hex(&s)?;
assert_eq!(parsed, hash);
Hashes are 32-byte BLAKE3 digests. The hash function is fixed at the type level; per-pool selection is deferred. Equality is over the underlying octets; the canonical wire encoding is 64 lowercase hex digits.
use ;
// Disable BLAKE3 verify-on-read. apotheca's SHA-256 verify still runs.
let pool = open_with?;
Errors split into PutError (HashCollision, Apotheca), GetError
(NotFound, IntegrityError, Apotheca), StatError (NotFound,
Apotheca). Lower-level apotheca errors propagate unchanged through the
Apotheca variants. A malformed hash on Hash::from_hex is HashParseError
(WrongLength or InvalidChar) — not a protocol error.
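The hex rules above (exactly 64 digits, lowercase only, two error variants) can be sketched with the standard library alone. This is an illustrative stand-in, not the crate's source: the `Hash` struct layout, the payloads carried by `WrongLength` and `InvalidChar`, and the `nibble` helper are assumptions made for the example.

```rust
/// Hypothetical stand-in for a 32-byte digest type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Hash([u8; 32]);

/// The two variants named above; the payloads are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub enum HashParseError {
    WrongLength(usize),
    InvalidChar(char),
}

/// Decode one lowercase hex digit. Uppercase is rejected on purpose.
fn nibble(c: char) -> Result<u8, HashParseError> {
    match c {
        '0'..='9' => Ok(c as u8 - b'0'),
        'a'..='f' => Ok(c as u8 - b'a' + 10),
        _ => Err(HashParseError::InvalidChar(c)),
    }
}

impl Hash {
    /// Parse exactly 64 lowercase hex digits into 32 octets.
    pub fn from_hex(s: &str) -> Result<Hash, HashParseError> {
        let n = s.chars().count();
        if n != 64 {
            return Err(HashParseError::WrongLength(n));
        }
        let mut out = [0u8; 32];
        let mut chars = s.chars();
        for byte in out.iter_mut() {
            // The length check above guarantees both digits exist.
            let hi = nibble(chars.next().unwrap())?;
            let lo = nibble(chars.next().unwrap())?;
            *byte = (hi << 4) | lo;
        }
        Ok(Hash(out))
    }

    /// Encode as 64 lowercase hex digits.
    pub fn to_hex(&self) -> String {
        self.0.iter().map(|b| format!("{:02x}", b)).collect()
    }
}
```

Checking the length before the characters is a choice made for this sketch; the source does not say which check the crate performs first.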
On-disk layout
A syntheca pool is an apotheca pool. Inside the pool root, each entry lives
at store/<hex-blake3>/ with bytes and meta files (apotheca's layout).
The directory name is the BLAKE3 hash of the bytes, encoded as 64 lowercase
hex digits.
<pool>/
  store/
    <hex-blake3>/
      bytes    # the entry's bytes
      meta     # size and sha256 (apotheca's storage digest)
  tmp/
    <staging-id>/    # staging area for atomic put
    ...
bytes is octet-for-octet what was put. meta records apotheca's SHA-256
storage digest; the BLAKE3 hash is implicit in the directory name and is
not stored separately. A read verifies bytes against the stored SHA-256
(apotheca) and, by default, rehashes under BLAKE3 to confirm the directory
name still names what it claims to.
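As a rough illustration of the layout described here, the paths for one entry can be derived from the pool root and the hex-encoded hash. The helper names `entry_dir`, `bytes_path`, and `meta_path` are hypothetical and not part of the syntheca or apotheca API; only the `store/<hex-blake3>/bytes` and `meta` structure comes from the text above.

```rust
use std::path::{Path, PathBuf};

/// Directory for one entry: <pool>/store/<hex-blake3>/
/// (hypothetical helper, named for this sketch only).
pub fn entry_dir(pool_root: &Path, hex_hash: &str) -> PathBuf {
    pool_root.join("store").join(hex_hash)
}

/// Path of the stored octets for an entry.
pub fn bytes_path(pool_root: &Path, hex_hash: &str) -> PathBuf {
    entry_dir(pool_root, hex_hash).join("bytes")
}

/// Path of the metadata file (size and SHA-256) for an entry.
pub fn meta_path(pool_root: &Path, hex_hash: &str) -> PathBuf {
    entry_dir(pool_root, hex_hash).join("meta")
}
```

Because the directory name is the hash itself, a reader holding only a hash can locate the entry without any index.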
Two digests, one entry
Every entry has two associated hashes:
- BLAKE3: names the entry within syntheca; the unit of content equality, derived at put time and required at get/stat time.
- SHA-256: apotheca's storage-integrity digest; the value stat returns; the field apotheca uses to detect collisions and reject corrupted reads.
These are deliberately distinct. apotheca is the substrate and uses
SHA-256 as its mandatory integrity hash for any caller; syntheca picks
BLAKE3 for content-addressing on top. put collision detection rides on
apotheca's SHA-256 comparison: differing bytes have differing SHA-256, so
apotheca returns Collision even when the BLAKE3-derived names match —
which syntheca surfaces as HashCollision.
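The put outcomes described above (fresh store, benign dedup, collision) can be modeled in memory. This is a toy under stated assumptions: `ToyPool` and `weak_name` are hypothetical, the naming function is a deliberately weak byte sum rather than BLAKE3 so that a collision can actually be produced, and bytes are compared directly where the real system relies on apotheca's SHA-256 comparison.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum PutError {
    HashCollision,
}

/// In-memory model of a pool: name -> stored bytes (hypothetical type).
pub struct ToyPool {
    name_of: fn(&[u8]) -> String,
    entries: HashMap<String, Vec<u8>>,
}

impl ToyPool {
    pub fn new(name_of: fn(&[u8]) -> String) -> Self {
        ToyPool { name_of, entries: HashMap::new() }
    }

    /// Store bytes under their derived name.
    /// Same name and same bytes: benign dedup, nothing is rewritten.
    /// Same name and different bytes: collision, stored bytes untouched.
    pub fn put(&mut self, bytes: &[u8]) -> Result<String, PutError> {
        let name = (self.name_of)(bytes);
        if let Some(existing) = self.entries.get(&name) {
            return if existing.as_slice() == bytes {
                Ok(name)
            } else {
                Err(PutError::HashCollision)
            };
        }
        self.entries.insert(name.clone(), bytes.to_vec());
        Ok(name)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}

/// Deliberately weak stand-in for a hash: byte sum mod 256, as hex.
/// NOT BLAKE3; chosen only so distinct inputs can share a name.
pub fn weak_name(bytes: &[u8]) -> String {
    let sum = bytes.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    format!("{:02x}", sum)
}
```

With `weak_name`, the inputs `b"ab"` and `b"ba"` share a name, so the second put reports `HashCollision` while a repeated put of `b"ab"` succeeds and leaves a single entry.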
Status and scope
Phase 1 reference implementation. Conformant with the syntheca Phase 1
protocol: operations, the BLAKE3-name encoding, two-hash integrity,
verify-on-read, and the syn CLI surface. Inherits Phase 1 conformance
from apotheca for the substrate (single local backend, atomic put, etc.).
Out of scope here: enumeration (apotheca exclusion, transitive — syntheca
has no ls/list operation; consumers maintain their own manifests),
deletion (write-once; GC is a higher-layer concern operating on the
underlying apotheca backends directly), alternative hash functions,
multi-pool composition, configuration files, state chains, history,
schema validation. State and history live in projects above syntheca
(metatheca, literium, dbaiv); syntheca stops below that line.
License
Licensed under either of MIT (LICENSE-MIT) or Apache-2.0 (LICENSE-APACHE) at your option.
See also
The protocol specification, decision rationale, and broader project framing
live in the syntheca project group at
https://gitlab.com/pantheca/syntheca. The substrate apotheca (named
write-once store, no content-addressing) lives at
https://gitlab.com/pantheca/apotheca.