farena 0.3.1

A file-backed arena allocator using pread for memory byte storage
Documentation

farena

File-backed arena allocator using pread for random access.

Write data to a temporary file, then read it back by location. The data stays on disk instead of in memory, so your process doesn't use extra RAM.

pread lets us read from any offset without seeking, which means:

  • No file position to manage between reads
  • Thread-safe: multiple threads can read concurrently without locking

Use this when you need scratch space for bytes but can't afford to keep everything in memory.

Limitations

  • Each file is limited to 4GB (u32 offsets). For larger data, use multiple files.
  • FileArena is immutable once built. To add more data, create a new writer, then build a new FileArena containing all files.
  • Temp files use your system's temp directory (TMPDIR). This crate doesn't check if it's on real disk - make sure it's not a ramdisk like tmpfs or ramfs.
  • This crate does many random reads. Use a fast storage for best performance.
  • Each file in a FileArena keeps one file descriptor open for its lifetime. Creating arenas with thousands of files may hit your system's ulimit. Check with ulimit -n and monitor with lsof -p $$ | wc -l. Increase the limit or reduce file count if needed.

Building multi-file arenas

Use FileArenaBuilder to safely assemble arenas from multiple writers. It handles file placement automatically, so you don't need to worry about the ordering contract:

let mut w0 = FileArenaWriter::new(0)?;
let loc0 = w0.push("data0")?;
let f0 = w0.finish()?;

let mut w1 = FileArenaWriter::new(1)?;
let loc1 = w1.push("data1")?;
let f1 = w1.finish()?;

let mut builder = FileArenaBuilder::new();
builder.add(f1, loc1);  // Order doesn't matter
builder.add(f0, loc0);
let arena = builder.build()?;

Usage

use farena::{FileArenaWriter, Location};

// Write phase
let mut writer = FileArenaWriter::new(0)?;
let loc1 = writer.push("hello")?;
let loc2 = writer.push(" world")?;

// Read phase — into_arena() is a convenience for single-file arenas
let arena = writer.into_arena()?;

assert_eq!(arena.get(loc1)?, b"hello");
assert_eq!(arena.get(loc2)?, b" world");

Multiple files (low-level)

Prefer FileArenaBuilder above — it enforces the ordering contract automatically. FileArena::new is the low-level alternative.

Each writer gets a unique index. Files must be passed to FileArena::new in index order:

let mut w1 = FileArenaWriter::new(0)?;
let loc1 = w1.push("data1")?;
let f1 = w1.finish()?;

let mut w2 = FileArenaWriter::new(1)?;
let loc2 = w2.push("data2")?;
let f2 = w2.finish()?;

let arena = FileArena::new(vec![f1, f2])?;
assert_eq!(arena.get(loc1)?, b"data1");
assert_eq!(arena.get(loc2)?, b"data2");

Parallel writing

The design supports parallel writing. Each writer gets a unique index, and FileArenaBuilder handles assembling the arena:

let items = vec!["item1", "item2", "item3", "item4"];

// Use .into_par_iter() with rayon for parallel execution
let results: Vec<(Location, std::fs::File)> = (0..items.len())
    .into_iter()
    .map(|i| {
        let mut writer = FileArenaWriter::new(i as u16).unwrap();
        let loc = writer.push(items[i]).unwrap();
        let file = writer.finish().unwrap();
        (loc, file)
    })
    .collect();

// Builder places files in the correct order automatically
let mut builder = FileArenaBuilder::new();
for (loc, file) in results {
    builder.add(file, loc);
}
let arena = builder.build()?;

Graph/tree structures

A common pattern is storing node metadata in memory while keeping large payloads on disk. This is useful when:

  • Payloads are large and would consume too much memory
  • You need to traverse the structure without loading all data at once
  • You construct long text by concatenating payloads (e.g., thread content)

For example, a tree where each node has an ID and a text payload:

#[derive(Clone)]
struct Node {
    id: u64,
    payload_loc: Location,  // Text stored on disk
    children: Vec<u64>,
}

// Build your tree with Locations instead of storing text directly
let mut nodes = Vec::new();
let mut writer = FileArenaWriter::new(0)?;

// Write payloads, store locations
for (id, text) in &[("root", "root text"), ("child1", "child text")] {
    let loc = writer.push(*text)?;
    nodes.push(Node {
        id: hash(id),  // Your own hash function
        payload_loc: loc,
        children: vec![],
    });
}

let arena = writer.into_arena()?;

// Traverse and read payloads as needed
// Note: get_str_into appends, so we create a fresh buffer each iteration
for node in &nodes {
    let mut buf = String::new();
    arena.get_str_into(node.payload_loc, &mut buf)?;
    println!("Node {}: {}", node.id, buf);
}

// Or concatenate payloads into a single buffer
let mut full_text = String::new();
for node in &nodes {
    arena.get_str_into(node.payload_loc, &mut full_text)?;
}
// full_text now contains all payloads concatenated

Buffer reuse

Reuse the same buffer across multiple reads to avoid allocations:

let mut buf = Vec::new();

arena.get_into(loc1, &mut buf)?;
assert_eq!(buf, b"hello");

buf.clear();  // Reuse without reallocating
arena.get_into(loc2, &mut buf)?;
assert_eq!(buf, b" world");

Unsafe reads

If you know your stored data is valid UTF-8, use get_str_into_unchecked to skip the UTF-8 validation:

let mut buf = String::new();

// SAFETY: we pushed valid UTF-8 above
unsafe { arena.get_str_into_unchecked(loc, &mut buf) }?;
assert_eq!(buf, "known utf8");

Temp directory

Temp files are created in your system's temp directory (respects TMPDIR). Check your temp directory is on real disk with:

df -h ${TMPDIR:-/tmp}

The filesystem should not be tmpfs or ramfs.

License: MIT