# bllist
Durable, crash-safe, checksummed block-based linked list allocators stored in a single file.
`bllist` builds on [`bstack`](https://crates.io/crates/bstack) to provide persistent linked lists backed by fixed-size or variable-size blocks. Every block carries a CRC32 checksum; writes flush durably to disk before returning; and the file survives unclean shutdowns through automatic orphan recovery on the next open.
[](https://crates.io/crates/bllist)
[](https://docs.rs/bllist)
[](LICENSE)
---
## Features
- **Two allocator types** — `FixedBlockList` for uniform records, `DynamicBlockList` for variable-size records with bin-based reuse
- **CRC32 integrity** — every block is checksummed; corruption is detected on read
- **Crash safety** — all mutations are durable before returning; orphaned blocks are recovered on the next `open`
- **Power-of-two block sizes** — dynamic blocks have a total on-disk footprint (header + payload) that is always a power of two, enabling splitting and coalescing
- **Splitting** — when a free block is larger than needed (within `MAX_SPLIT` = 3 levels), it is halved and the spare half returned to its bin, reducing wasted space
- **Coalescing on open** — adjacent free blocks whose combined size is a power of two are merged into a single larger block, fighting long-term fragmentation
- **Zero-copy reads** — `read_into` and `pop_front_into` fill a caller-supplied buffer directly from the file
- **Thread-safe** — `Send + Sync`; concurrent reads are efficient via `pread` on Unix/Windows
- **Valid BStack files** — both list types produce valid `bstack` files; the BStack header and crash-recovery semantics are inherited for free
- **Cross-type protection** — `FixedBlockList` and `DynamicBlockList` use different file magics (`"BLLS"` vs `"BLLD"`) and cannot open each other's files
---
## Quick start
Add to `Cargo.toml`:
```toml
[dependencies]
bllist = "0.1"
```
### Fixed-size blocks
```rust
use bllist::FixedBlockList;
fn main() -> Result<(), bllist::Error> {
// 52 bytes of payload per block (64 bytes total on disk).
let list = FixedBlockList::<52>::open("data.blls")?;
list.push_front(b"hello")?;
list.push_front(b"world")?;
while let Some(data) = list.pop_front()? {
println!("{}", String::from_utf8_lossy(&data));
}
// prints "world", then "hello"
Ok(())
}
```
### Variable-size blocks
```rust
use bllist::DynamicBlockList;
fn main() -> Result<(), bllist::Error> {
// The total on-disk block size (header + payload) is a power of two.
// A 5-byte push occupies 32 bytes on disk (5+20=25 → 32, bin 5).
let list = DynamicBlockList::open("data.blld")?;
list.push_front(b"short")?;
list.push_front(b"a somewhat longer record")?;
while let Some(data) = list.pop_front()? {
println!("{}", String::from_utf8_lossy(&data));
}
Ok(())
}
```
---
## API overview
### `FixedBlockList<PAYLOAD_CAPACITY>`
`PAYLOAD_CAPACITY` is the number of **payload bytes** per block. Each block occupies `PAYLOAD_CAPACITY + 12` bytes on disk. `PAYLOAD_CAPACITY` must be `> 0`; a value of `0` is rejected at compile time.
| `open(path)` | Open or create; performs crash recovery |
| `push_front(data)` | Allocate, write, and prepend to the list |
| `pop_front()` → `Option<Vec<u8>>` | Unlink head, read payload, free block |
| `pop_front_into(buf)` → `bool` | Zero-copy pop into caller buffer |
| `alloc()` → `BlockRef` | Allocate a raw block (from free list or new) |
| `free(block)` | Return a block to the free list |
| `write(block, data)` | Write payload, preserve next pointer |
| `read(block)` → `Vec<u8>` | Read and checksum-verify payload |
| `read_into(block, buf)` | Zero-copy read into caller buffer |
| `set_next(block, next)` | Update next pointer, preserve payload |
| `get_next(block)` → `Option<BlockRef>` | Read next pointer (no CRC check) |
| `root()` → `Option<BlockRef>` | Current head of the active list |
| `payload_capacity()` | `PAYLOAD_CAPACITY` |
### `BlockRef`
A `Copy` handle encoding a block's logical byte offset in the BStack file. Treat it like a typed index; never forge offsets manually.
### `DynamicBlockList`
Blocks may hold any payload up to 2^31 − 20 bytes. The **total on-disk footprint** of a block (20-byte header + payload) is always a power of two, with a minimum of 32 bytes (bin 5, 12-byte payload). Allocation first checks the exact power-of-two bin, then searches up to `MAX_SPLIT` = 3 bins higher and splits if found, and finally extends the file. On open, adjacent free blocks whose combined size is a power of two are coalesced.
| `open(path)` | Open or create; validates header; coalesces free blocks and recovers orphans |
| `push_front(data)` | Allocate, write, and prepend to the list |
| `pop_front()` → `Option<Vec<u8>>` | Unlink head, read payload, free block |
| `pop_front_into(buf)` → `bool` | Zero-copy pop into caller buffer |
| `alloc(size)` → `DynBlockRef` | Allocate block; splits from larger bin or extends file |
| `free(block)` | Return a block to its bin |
| `write(block, data)` | Write payload, update `data_len` |
| `read(block)` → `Vec<u8>` | Read `data_len` bytes, checksum-verify |
| `read_into(block, buf)` | Zero-copy read into caller buffer |
| `set_next(block, next)` | Update next pointer, preserve payload |
| `get_next(block)` → `Option<DynBlockRef>` | Read next pointer (no CRC check) |
| `root()` → `Option<DynBlockRef>` | Current head of the active list |
| `capacity(block)` → `usize` | Payload capacity = `block_size − 20` |
| `data_len(block)` → `usize` | Bytes last written to this block |
| `data_start(block)` → `u64` | Logical offset of first payload byte (validates offset) |
| `data_end(block)` → `u64` | Logical offset past the last written byte (reads `data_len`) |
| `bstack()` → `&BStack` | Underlying file handle for raw read-only streaming |
| `block_size_for(size)` → `usize` | Smallest power-of-two total size ≥ `size + 20` (min 32) |
### `DynBlockRef`
A `Copy` handle encoding a dynamic block's logical byte offset. Analogous to `BlockRef` but for `DynamicBlockList`.
| `data_start()` → `u64` | Logical offset of the first payload byte (`self.0 + 20`); pure, no I/O |
---
## Streaming reads
`read()` and `read_into()` verify the CRC on every call and either allocate a
`Vec<u8>` or copy into a caller buffer. For large payloads — or when you need
to hand a byte range to another layer (e.g. `sendfile`, a scatter-gather
buffer, or an async runtime) — `DynamicBlockList` exposes three building
blocks that let you issue a single raw read:
| `block.data_start()` | none (pure arithmetic) | start of payload as `u64` |
| `list.data_end(block)?` | 1 × 4-byte read (`data_len`) | one-past-end of written data as `u64` |
| `list.bstack().get_into(start, buf)?` | 1 × `pread` of your chosen length | fills `buf` from the file |
```rust
use bllist::DynamicBlockList;
let list = DynamicBlockList::open("data.blld")?;
// … obtain `block` from push_front / pop_front / root traversal …
// Compute the byte range with no file I/O.
let start: u64 = block.data_start();
let end: u64 = list.data_end(block)?; // one 4-byte read
// Single pread into a caller-owned buffer — no CRC, no Vec allocation.
let mut buf = vec![0u8; (end - start) as usize];
list.bstack().get_into(start, &mut buf)?;
// Or fill a sub-range of an existing buffer:
list.bstack().get_into(start, &mut frame[offset..])?;
```
> **Only read-only BStack operations are safe**: `get`, `get_into`, `peek`,
> `len`. Never call `push`, `pop`, or `set` on the handle returned by
> `bstack()` — doing so can silently corrupt the list structure. Use `read()`
> or `read_into()` when CRC verification matters.
---
## File layouts
### `FixedBlockList` files (`"BLLS"`)
```
┌──────────────────────────┬───────────────────────────────────────────┐
│ BStack header (16 B) │ bllist header (24 B, logical offset 0) │
│ "BSTK" magic + clen │ "BLLS" + version + root + free_head │
├──────────────────────────┴───────────────────────────────────────────┤
│ Block 0 (PAYLOAD_CAPACITY+12 bytes, logical offset 24) │
│ checksum(4) │ next(8) │ payload(PAYLOAD_CAPACITY) │
├──────────────────────────────────────────────────────────────────────┤
│ Block 1 (PAYLOAD_CAPACITY+12 bytes, logical offset 24+PC+12) … │
└──────────────────────────────────────────────────────────────────────┘
```
- The **bllist header** stores the root block offset and single free-list-head offset.
- The **block checksum** covers bytes `[4..PAYLOAD_CAPACITY+12]` (next pointer + full payload field). Payload bytes beyond the last `write` are guaranteed to be zero.
- The **free list** is an embedded singly-linked list using the `next` field of freed blocks.
### `DynamicBlockList` files (`"BLLD"`)
```
┌──────────────────────────┬───────────────────────────────────────────────┐
│ BStack header (16 B) │ bllist-dynamic header (272 B, logical off 0) │
│ "BSTK" magic + clen │ "BLLD" + version + root + bin_heads[32] │
├──────────────────────────┴───────────────────────────────────────────────┤
│ Block (total size = 2^k bytes, k ≥ 5) │
│ checksum(4) │ next(8) │ block_size(4) │ data_len(4) │ payload(bs-20 B) │
├──────────────────────────────────────────────────────────────────────────┤
│ Block … │
└──────────────────────────────────────────────────────────────────────────┘
```
- The **bllist-dynamic header** stores the root offset and 32 bin free-list heads. Bin *k* holds free blocks whose **total on-disk size** equals 2^*k* bytes (bins 0–4 are always empty; minimum is bin 5 = 32 bytes).
- The **`block_size` field** stores the total on-disk size of the block (header + payload), always a power of two (≥ 32). Payload capacity = `block_size − 20`.
- The **block checksum** covers bytes `[4..block_size]` (next + block_size + data_len + full payload field, including zero-padded tail).
- `data_len` records how many bytes were written; bytes beyond it are guaranteed to be zero.
---
## Crash safety details
`bllist` is designed around two principles:
1. **Durable writes** — every `stack.set()` / `stack.push()` call issues `fsync` (or `F_FULLFSYNC` on macOS) before returning.
2. **CRC-detected partial writes** — the checksum over the block header and payload detects any block that was partially overwritten before a crash.
On `open`, the file is scanned for *orphaned* blocks (allocated but not reachable from either the active list or any free list). They are silently reclaimed.
| During `alloc` (file grow) | Block exists but is in no list | Reclaimed as orphan on next `open` |
| After `alloc`, before `push_front` links it | Block written but root not updated | Reclaimed as orphan on next `open` |
| After `pop_front` advances root, before `free` | Block exists but in no list | Reclaimed as orphan on next `open` |
No data that was fully committed (root updated) is ever lost.
---
## Choosing the right type
| Record size | Always the same | Varies |
| On-disk overhead per block | 12 bytes | 20 bytes |
| Block size on disk | `PAYLOAD_CAPACITY + 12` | Power of two ≥ `payload + 20` (min 32) |
| Free list | Single flat list | 32 power-of-two bins |
| Splitting | No | Up to `MAX_SPLIT` = 3 levels above target bin |
| Coalescing on open | No | Adjacent free blocks merged when sum is power of two |
| Orphan scan | O(n) slot enumeration | O(n) sequential scan + rebuild |
| File magic | `"BLLS"` | `"BLLD"` |
| On-disk format version | 1 | 2 |
### Choosing `PAYLOAD_CAPACITY` for `FixedBlockList`
- Minimum: `1`
- For small records: `52` (64 bytes on disk), `116` (128 bytes on disk)
- For larger records: set to your typical payload size directly
- `PAYLOAD_CAPACITY = 0` is rejected at compile time
---
## Direct file access — use with extreme caution
Both list types produce valid BStack files, so you can open them with
`bstack::BStack::open` or inspect the raw bytes with any file tool.
**Writing to the file outside of `bllist` is strongly discouraged.**
`bllist` does not re-validate structural invariants on every operation, so
direct writes can silently corrupt the list in ways that are not caught until
much later — or not caught at all.
Specific dangers:
| `BStack::push` | Appends raw bytes that are not a complete, aligned block; breaks slot enumeration and orphan recovery |
| `BStack::pop` | May truncate a block mid-stream or destroy the list header |
| `BStack::set` at header offsets | Overwrites root or free-list / bin-head pointers |
| `BStack::set` inside a block | Invalidates the block's CRC; `read` will return a checksum error |
| Raw file writes (`write(2)`, etc.) | Bypasses the advisory lock entirely; any of the above, plus potential torn writes |
**The exclusive advisory lock** (`flock` on Unix, `LockFileEx` on Windows)
held by a live list prevents a second process from opening the same file
through BStack simultaneously. It does **not** prevent raw file
descriptor access, so a process that opens the file without going through
BStack can bypass the lock and cause corruption.
**Safe read-only inspection** is possible: open the file with
`bstack::BStack::open` and use only `get`, `peek`, and `len`. These calls do
not write to the file and will not disturb the list state. Mutating calls
(`push`, `pop`, `set`) must not be used.