riegeli-rs
A pure-Rust, byte-level compatible implementation of Google's Riegeli/records file format — a high-performance, seekable, compressed record store used in machine learning and data pipelines.
Why Riegeli?
Riegeli files combine fast sequential writes, random-access reads, and transparent compression in a single container format. Records are grouped into chunks, chunks are laid out across 64 KiB blocks with an integrity hash at every block boundary, and a reader can seek to any numeric position without scanning from the start of the file. This crate brings that format to the Rust ecosystem with no C++ dependency.
Quick start
[dependencies]
= "0.1"
Write three records and read them back:
use std::io::Cursor;
use ;
// Write
let mut buf = Vec::new();
let opts = new.compression;
let mut writer = new.unwrap;
writer.write_record.unwrap;
writer.write_record.unwrap;
writer.write_record.unwrap;
writer.close().unwrap();
// Read
let mut reader = new.unwrap;
while let Some = reader.read_record.unwrap
File format overview
A Riegeli file is divided into fixed-size blocks of 65 536 bytes. Every
block boundary carries a 24-byte BlockHeader containing a HighwayHash-64
integrity hash and previous_chunk / next_chunk file-offset pointers
(little-endian u64). These pointers allow a reader that lands on any block
boundary to locate the nearest chunk without scanning.
The first block header is at offset 0. Immediately after it, at offset 24, sits
a 40-byte file-signature chunk (chunk type 's').
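The block layout above can be illustrated with a small standard-library sketch. It is not this crate's implementation: the `BlockHeader` struct, the helper names, and the field order (hash first, then previous_chunk, then next_chunk) are assumptions made for illustration, and hash verification is left out because it needs a HighwayHash implementation.

```rust
/// Size of one Riegeli block, as stated in the overview.
const BLOCK_SIZE: u64 = 65_536;
/// Size of the header stored at every block boundary.
const BLOCK_HEADER_SIZE: usize = 24;

/// Hypothetical in-memory view of a block header (not the crate's own type).
#[derive(Debug, PartialEq, Eq)]
struct BlockHeader {
    header_hash: u64,
    previous_chunk: u64,
    next_chunk: u64,
}

/// Decodes three little-endian u64 fields.
/// Assumption: the fields appear in the order hash, previous_chunk, next_chunk.
/// The hash is returned as stored and is not verified here.
fn parse_block_header(bytes: &[u8; BLOCK_HEADER_SIZE]) -> BlockHeader {
    let field = |i: usize| {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(&bytes[i * 8..i * 8 + 8]);
        u64::from_le_bytes(raw)
    };
    BlockHeader {
        header_hash: field(0),
        previous_chunk: field(1),
        next_chunk: field(2),
    }
}

/// True if `pos` falls exactly on a block boundary.
fn is_block_boundary(pos: u64) -> bool {
    pos % BLOCK_SIZE == 0
}

/// Offset of the first block boundary at or after `pos`.
fn next_block_boundary(pos: u64) -> u64 {
    let rem = pos % BLOCK_SIZE;
    if rem == 0 {
        pos
    } else {
        pos + (BLOCK_SIZE - rem)
    }
}
```

With this layout, the first block header covers bytes 0 through 23 and the 40-byte file-signature chunk covers bytes 24 through 63.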
Data is stored in chunks. Each chunk begins with a 40-byte ChunkHeader:
| Field | Size | Description |
|---|---|---|
| header_hash | 8 bytes | HighwayHash-64 of the remaining 32 bytes |
| data_size | 8 bytes | Byte length of the chunk data |
| data_hash | 8 bytes | HighwayHash-64 of the chunk data |
| chunk_type_and_num_records | 8 bytes | Low 8 bits = chunk type, high 56 bits = record count |
| decoded_data_size | 8 bytes | Uncompressed payload size |
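As a rough illustration of the table, the sketch below decodes a 40-byte header into its five fields and splits the packed chunk_type_and_num_records word. The `ChunkHeader` struct and both function names are hypothetical, the fields are assumed to be little-endian u64 values in table order, and the two hashes are read but not verified.

```rust
const CHUNK_HEADER_SIZE: usize = 40;

/// Hypothetical decoded form of a chunk header (not the crate's own type).
#[derive(Debug, PartialEq, Eq)]
struct ChunkHeader {
    header_hash: u64,
    data_size: u64,
    data_hash: u64,
    chunk_type: u8,
    num_records: u64,
    decoded_data_size: u64,
}

/// Decodes the five 8-byte fields in table order.
/// Assumption: every field is a little-endian u64.
fn parse_chunk_header(bytes: &[u8; CHUNK_HEADER_SIZE]) -> ChunkHeader {
    let field = |i: usize| {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(&bytes[i * 8..i * 8 + 8]);
        u64::from_le_bytes(raw)
    };
    let packed = field(3);
    ChunkHeader {
        header_hash: field(0),
        data_size: field(1),
        data_hash: field(2),
        chunk_type: (packed & 0xFF) as u8, // low 8 bits
        num_records: packed >> 8,          // high 56 bits
        decoded_data_size: field(4),
    }
}

/// Packs a chunk type and record count into one u64.
/// Returns None if the count does not fit in 56 bits.
fn pack_type_and_count(chunk_type: u8, num_records: u64) -> Option<u64> {
    if num_records >> 56 != 0 {
        return None;
    }
    Some((num_records << 8) | u64::from(chunk_type))
}
```

Keeping the type in the low byte means a reader can identify the chunk kind from a single byte before interpreting the rest of the word.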
Chunk types:
| Type | Byte | Purpose |
|---|---|---|
| Simple | 'r' | Records stored sequentially |
| FileSignature | 's' | Marks the start of a valid Riegeli file |
| FileMetadata | 'm' | Optional serialized proto metadata |
| Padding | 'p' | Alignment padding |
| Transposed | 't' | Columnar proto decomposition |
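One way to model the chunk-type table in Rust is a plain enum with byte conversions. The enum and method names here are illustrative and may not match the types this crate exports.

```rust
/// Hypothetical enum mirroring the chunk-type table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChunkType {
    Simple,
    FileSignature,
    FileMetadata,
    Padding,
    Transposed,
}

impl ChunkType {
    /// Maps a type byte to a variant; unknown bytes yield None.
    fn from_byte(b: u8) -> Option<Self> {
        match b {
            b'r' => Some(Self::Simple),
            b's' => Some(Self::FileSignature),
            b'm' => Some(Self::FileMetadata),
            b'p' => Some(Self::Padding),
            b't' => Some(Self::Transposed),
            _ => None,
        }
    }

    /// Returns the byte stored in the low 8 bits of the packed header field.
    fn as_byte(self) -> u8 {
        match self {
            Self::Simple => b'r',
            Self::FileSignature => b's',
            Self::FileMetadata => b'm',
            Self::Padding => b'p',
            Self::Transposed => b't',
        }
    }
}
```

Returning `Option` rather than panicking leaves it to the caller to decide how to treat a chunk type it does not recognize.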
Compression (first byte of chunk data for compressed types):
| Algorithm | Byte |
|---|---|
| None | 0x00 |
| Brotli | 'b' |
| Zstd | 'z' |
| Snappy | 's' |
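The compression tag can be peeled off the front of the chunk data as sketched below. The names are hypothetical, and the bytes that follow the tag are returned untouched; how they are laid out for each codec is not covered here.

```rust
/// Hypothetical enum mirroring the compression table.
/// `Uncompressed` corresponds to the "None" row (byte 0x00).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Uncompressed,
    Brotli,
    Zstd,
    Snappy,
}

impl Compression {
    /// Maps a tag byte to a variant; unknown bytes yield None.
    fn from_byte(b: u8) -> Option<Self> {
        match b {
            0x00 => Some(Self::Uncompressed),
            b'b' => Some(Self::Brotli),
            b'z' => Some(Self::Zstd),
            b's' => Some(Self::Snappy),
            _ => None,
        }
    }
}

/// Splits chunk data into its compression tag and the remaining bytes.
/// Returns None for empty input or an unrecognized tag.
fn split_compression(chunk_data: &[u8]) -> Option<(Compression, &[u8])> {
    let (&tag, rest) = chunk_data.split_first()?;
    Some((Compression::from_byte(tag)?, rest))
}
```

Note that 's' means Snappy here and FileSignature in the chunk-type table; the two bytes live in different fields, so they do not collide.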
Public API
RecordWriter
let opts = new
.compression
.transpose // enable columnar encoding for proto records
.chunk_size // flush every ~1 MiB of record data
.initial_padding; // pad file size to a multiple on close
let mut writer = new?;
writer.write_record?;
writer.flush()?; // ensure buffered records are written
writer.close()?; // finalize the file; further writes return Err
RecordReader
let mut reader = new?;
// Sequential read
while let Some = reader.read_record?
// Position and seek
let pos = reader.last_pos(); // RecordPosition of last-read record
let n = pos.numeric(); // u64 suitable for storage
reader.seek_numeric(n)?; // return to that record
let same = reader.read_record()?; // re-read it
// Metadata
if let Some = reader.read_metadata?
// Field projection (transpose chunks only)
use ;
let proj = new
.add_field // include proto field 1
.add_field; // include proto field 2
let opts = new.field_projection;
let mut reader = new?;
RecordPosition
Returned by RecordReader::pos() and last_pos(). Call .numeric() to obtain
a u64 suitable for persistence, then pass to seek_numeric to restore
position.
Cargo features
| Feature | Default | Crate | Description |
|---|---|---|---|
| brotli | no | brotli | Brotli compression |
| zstd | yes | zstd | Zstd compression |
| snappy | no | snap | Snappy compression |
To use Brotli and Zstd with no Snappy:
= { = "...", = ["brotli", "zstd"] }
Implementation status
All phases are complete and byte-level compatible with the C++ reference implementation.
| Phase | Sprints | Scope |
|---|---|---|
| 1 | 1–7 | Varint, headers, hashing, simple chunks (all compression codecs), RecordWriter, RecordReader (seek, recovery, block-boundary handling) |
| 2 | 8–13 | Proto wire parsing, transpose encoder + decoder (full state machine with NoOp bridging, implicit transitions), interop hardening |
| 3 | 14–24 | Conformance suite, performance tuning, field projection, API restriction |
Conformance is verified against the C++ reference implementation via an FFI
test harness (riegeli-ffi) that calls the C++ reader and writer over a
cxx bridge. The test suite uses golden files produced by C++
to validate byte-level interoperability in both directions.
Build requirements
This crate generates Rust code from .proto files at build time using
protobuf-codegen, which requires a compatible protoc binary on your PATH.
Download the latest release from the
protobuf releases page
(look for protoc-<version>-<platform>.zip).
Dependencies
| Crate | Version | Required | Purpose |
|---|---|---|---|
| highway | 1.3 | always | HighwayHash-64 |
| brotli | 8 | optional | Brotli codec |
| zstd | 0.13 | optional | Zstd codec |
| snap | 1 | optional | Snappy codec |
Dev dependencies: proptest, criterion.
Benchmarks
See riegeli/benches/README.md for the full
head-to-head Rust vs. C++ benchmark matrix. Representative results on Linux
x86-64 (10 000 records, large payload):
| Config | Rust write | Rust read | C++ write | C++ read |
|---|---|---|---|---|
| simple+none | 1348 MB/s | 2832 MB/s | 747 MB/s | 1343 MB/s |
| simple+zstd:3 | 2123 MB/s | 3070 MB/s | 3693 MB/s | 5914 MB/s |
| transpose+none | 956 MB/s | 808 MB/s | 693 MB/s | 1149 MB/s |
| transpose+zstd:3 | 1605 MB/s | 845 MB/s | 3142 MB/s | 4123 MB/s |
C++ read throughput is measured through the FFI bridge and includes a per-record copy across the boundary, making it lower than native C++ performance.
License
Apache-2.0