farena 0.3.0

A file-backed arena allocator using pread for memory byte storage
# farena

File-backed arena allocator using `pread` for random access.

Write data to a temporary file, then read it back by location.
The data stays on disk instead of in memory, so your process doesn't use extra RAM.

`pread` lets us read from any offset without seeking, which means:
- No file position to manage between reads
- Thread-safe: multiple threads can read concurrently without locking
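
The two points above can be illustrated with the standard library alone. The sketch below is not this crate's implementation: `write_chunks` and `read_concurrently` are hypothetical helpers, and it assumes a Unix target, where `FileExt::read_exact_at` performs positioned reads via `pread`.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::os::unix::fs::FileExt; // Unix-only: positioned reads backed by pread(2)
use std::thread;

/// Writes `chunks` back to back into `file` and returns each chunk's (offset, len).
fn write_chunks(file: &mut File, chunks: &[&[u8]]) -> io::Result<Vec<(u64, usize)>> {
    let mut spans = Vec::with_capacity(chunks.len());
    let mut offset = 0u64;
    for chunk in chunks {
        file.write_all(chunk)?;
        spans.push((offset, chunk.len()));
        offset += chunk.len() as u64;
    }
    file.flush()?;
    Ok(spans)
}

/// Reads every span on its own thread, sharing one `&File` and taking no lock.
fn read_concurrently(file: &File, spans: &[(u64, usize)]) -> io::Result<Vec<Vec<u8>>> {
    thread::scope(|scope| {
        // Spawn all readers first so they actually run concurrently.
        let handles: Vec<_> = spans
            .iter()
            .map(|&(offset, len)| {
                scope.spawn(move || {
                    let mut buf = vec![0u8; len];
                    // Positioned read: the shared file cursor is never moved.
                    file.read_exact_at(&mut buf, offset)?;
                    Ok::<_, io::Error>(buf)
                })
            })
            .collect();
        // Join in spawn order, so results line up with `spans`.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("reader thread panicked"))
            .collect()
    })
}
```

Because each read names its own offset, no thread depends on where another thread left the file cursor, which is why no seek or mutex appears anywhere in the reader.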

Use this when you need scratch space for bytes but can't afford to keep everything in memory.

## Limitations

- Each file is limited to 4 GiB because offsets are stored as `u32`. For larger data, use multiple files.
- `FileArena` is immutable once built. To add more data, create a new writer,
  then build a new `FileArena` containing all files.
- Temp files use your system's temp directory (`TMPDIR`). This crate doesn't check
  whether that directory is on a real disk, so make sure it's not a RAM-backed
  filesystem like `tmpfs` or `ramfs`.
- This crate does many random reads. Use fast storage for best performance.
- Each file in a `FileArena` keeps one file descriptor open for its lifetime.
  Creating arenas with thousands of files may hit your system's open-file limit.
  Check the limit with `ulimit -n` and monitor usage with `lsof -p <pid> | wc -l`,
  where `<pid>` is your process ID. Increase the limit or reduce the file count if needed.
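
To see where the 4 GiB per-file limit comes from, here is a minimal sketch of offset bookkeeping with `u32`. Everything in it is an assumption for illustration: `SketchLocation` and `reserve` are hypothetical names, and the crate's real `Location` layout (including whether it stores a length, and at what width) is not shown in this README.

```rust
/// Hypothetical stand-in for the crate's `Location`; the real layout may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SketchLocation {
    file_index: u16, // which file in the arena (writer index)
    offset: u32,     // byte offset inside that file
    len: u32,        // payload length in bytes (assumed width)
}

/// Reserves `len` bytes starting at `*next_offset`.
/// Returns `None`, leaving `*next_offset` unchanged, when the payload's end
/// would no longer fit in a `u32`, i.e. when the file would pass the 4 GiB mark.
fn reserve(file_index: u16, next_offset: &mut u32, len: usize) -> Option<SketchLocation> {
    if len > u32::MAX as usize {
        return None; // a single payload this large can never fit
    }
    let len = len as u32;
    let end = next_offset.checked_add(len)?; // overflow means the file is full
    let loc = SketchLocation { file_index, offset: *next_offset, len };
    *next_offset = end;
    Some(loc)
}
```

When `reserve` returns `None`, the natural next step under this sketch is to start a new writer with the next index, which matches the "use multiple files" advice above.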

## Building multi-file arenas

Use `FileArenaBuilder` to safely assemble arenas from multiple writers.
It handles file placement automatically, so you don't need to satisfy the
ordering contract of `FileArena::new` (files passed in writer-index order) yourself:

```rust
use farena::{FileArenaBuilder, FileArenaWriter};

let mut w0 = FileArenaWriter::new(0)?;
let loc0 = w0.push("data0")?;
let f0 = w0.finish()?;

let mut w1 = FileArenaWriter::new(1)?;
let loc1 = w1.push("data1")?;
let f1 = w1.finish()?;

let mut builder = FileArenaBuilder::new();
builder.add(f1, loc1);  // Order doesn't matter
builder.add(f0, loc0);
let arena = builder.build()?;
```

## Usage

```rust
use farena::{FileArenaWriter, Location};

// Write phase
let mut writer = FileArenaWriter::new(0)?;
let loc1 = writer.push("hello")?;
let loc2 = writer.push(" world")?;

// Read phase — into_arena() is a convenience for single-file arenas
let arena = writer.into_arena()?;

assert_eq!(arena.get(loc1)?, b"hello");
assert_eq!(arena.get(loc2)?, b" world");
```

## Multiple files (low-level)

**Prefer `FileArenaBuilder` above** — it enforces the ordering
contract automatically. `FileArena::new` is the low-level alternative.

Each writer gets a unique index. Files must be passed to
`FileArena::new` in index order:

```rust
use farena::{FileArena, FileArenaWriter};

let mut w1 = FileArenaWriter::new(0)?;
let loc1 = w1.push("data1")?;
let f1 = w1.finish()?;

let mut w2 = FileArenaWriter::new(1)?;
let loc2 = w2.push("data2")?;
let f2 = w2.finish()?;

let arena = FileArena::new(vec![f1, f2])?;
assert_eq!(arena.get(loc1)?, b"data1");
assert_eq!(arena.get(loc2)?, b"data2");
```

## Parallel writing

The design supports parallel writing. Each writer gets a unique index,
and `FileArenaBuilder` handles assembling the arena:

```rust
use farena::{FileArenaBuilder, FileArenaWriter, Location};

let items = vec!["item1", "item2", "item3", "item4"];

// Sequential here; with rayon, swap `.into_iter()` for `.into_par_iter()`
let results: Vec<(Location, std::fs::File)> = (0..items.len())
    .into_iter()
    .map(|i| {
        let mut writer = FileArenaWriter::new(i as u16).unwrap();
        let loc = writer.push(items[i]).unwrap();
        let file = writer.finish().unwrap();
        (loc, file)
    })
    .collect();

// Builder places files in the correct order automatically
let mut builder = FileArenaBuilder::new();
for (loc, file) in results {
    builder.add(file, loc);
}
let arena = builder.build()?;
```

## Graph/tree structures

A common pattern is storing node metadata in memory while keeping
large payloads on disk. This is useful when:

- Payloads are large and would consume too much memory
- You need to traverse the structure without loading all data at once
- You construct long text by concatenating payloads (e.g., thread content)

For example, a tree where each node has an ID and a text payload:

```rust
use farena::{FileArenaWriter, Location};

#[derive(Clone)]
struct Node {
    id: u64,
    payload_loc: Location,  // Text stored on disk
    children: Vec<u64>,
}

// Build your tree with Locations instead of storing text directly
let mut nodes = Vec::new();
let mut writer = FileArenaWriter::new(0)?;

// Write payloads, store locations
for (id, text) in &[("root", "root text"), ("child1", "child text")] {
    let loc = writer.push(*text)?;
    nodes.push(Node {
        id: hash(id),  // Your own hash function
        payload_loc: loc,
        children: vec![],
    });
}

let arena = writer.into_arena()?;

// Traverse and read payloads as needed
// Note: get_str_into appends, so we create a fresh buffer each iteration
for node in &nodes {
    let mut buf = String::new();
    arena.get_str_into(node.payload_loc, &mut buf)?;
    println!("Node {}: {}", node.id, buf);
}

// Or concatenate payloads into a single buffer
let mut full_text = String::new();
for node in &nodes {
    arena.get_str_into(node.payload_loc, &mut full_text)?;
}
// full_text now contains all payloads concatenated
```

## Buffer reuse

Reuse the same buffer across multiple reads to avoid allocations:

```rust
let mut buf = Vec::new();

arena.get_into(loc1, &mut buf)?;
assert_eq!(buf, b"hello");

buf.clear();  // Reuse without reallocating
arena.get_into(loc2, &mut buf)?;
assert_eq!(buf, b" world");
```
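
Why `clear()` avoids reallocation can be shown with plain `std`: `Vec::clear` resets the length but keeps the capacity. The sketch below is not the crate's `get_into`; `append_range` is a hypothetical helper that, like the crate's `_into` methods, appends to the caller's buffer. It assumes a Unix target for `read_exact_at`.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // Unix-only positioned reads

/// Appends `len` bytes starting at `offset` in `file` to the end of `buf`.
fn append_range(file: &File, offset: u64, len: usize, buf: &mut Vec<u8>) -> io::Result<()> {
    let start = buf.len();
    // Grow the buffer in place; this allocates only if capacity is too small.
    buf.resize(start + len, 0);
    if let Err(err) = file.read_exact_at(&mut buf[start..], offset) {
        buf.truncate(start); // leave the buffer as we found it on failure
        return Err(err);
    }
    Ok(())
}
```

Once the buffer has grown to the size of the largest payload read so far, every later read of an equal or smaller payload after `clear()` reuses the same allocation.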

## Unsafe reads

If you know your stored data is valid UTF-8, use `get_str_into_unchecked`
to skip the UTF-8 validation:

```rust
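let mut writer = FileArenaWriter::new(0)?;
let loc = writer.push("known utf8")?;
let arena = writer.into_arena()?;

let mut buf = String::new();

// SAFETY: the only bytes stored at `loc` are the `&str` pushed above
unsafe { arena.get_str_into_unchecked(loc, &mut buf) }?;
assert_eq!(buf, "known utf8");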
```
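
What the unchecked variant trades away can be sketched with `std` only. These two helpers are hypothetical and are not the crate's methods; they show the general difference between appending bytes to a `String` with and without validation.

```rust
/// Appends `bytes` to `buf` after validating them as UTF-8.
/// On invalid input, returns the error and leaves `buf` untouched.
fn append_checked(bytes: &[u8], buf: &mut String) -> Result<(), std::str::Utf8Error> {
    buf.push_str(std::str::from_utf8(bytes)?);
    Ok(())
}

/// Appends `bytes` to `buf` without validation.
///
/// # Safety
/// `bytes` must be valid UTF-8; otherwise `buf` violates `String`'s invariant
/// and later use of it is undefined behavior.
unsafe fn append_unchecked(bytes: &[u8], buf: &mut String) {
    // SAFETY: the caller guarantees `bytes` is valid UTF-8.
    buf.push_str(unsafe { std::str::from_utf8_unchecked(bytes) });
}
```

The checked path scans every byte before appending; the unchecked path skips that scan, so it is only appropriate when every byte at the location came from a `&str` or was validated earlier.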

## Temp directory

Temp files are created in your system's temp directory (respects `TMPDIR`).
Check your temp directory is on real disk with:

```
df -h ${TMPDIR:-/tmp}
```

The filesystem should not be `tmpfs` or `ramfs`.
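
The same check can be done from Rust before creating any writers. This is a Linux-only sketch, not part of the crate: `fs_type_for` and `is_ram_backed` are hypothetical helpers that parse text in `/proc/mounts` format. It does not decode escaped characters in mount points (such as `\040` for a space), and it assumes the path passed in has already been canonicalized.

```rust
use std::path::Path;

/// Returns the filesystem type of the mount containing `path`, given text in
/// `/proc/mounts` format ("device mountpoint fstype options dump pass").
/// The deepest matching mount point wins; on ties the later line wins,
/// since later mounts shadow earlier ones.
fn fs_type_for(path: &Path, mounts: &str) -> Option<String> {
    let mut best: Option<(usize, &str)> = None;
    for line in mounts.lines() {
        let mut fields = line.split_whitespace();
        let (mount_point, fs_type) = match (fields.next(), fields.next(), fields.next()) {
            (Some(_device), Some(mp), Some(ty)) => (Path::new(mp), ty),
            _ => continue, // skip malformed lines
        };
        // `Path::starts_with` compares whole components, so "/tmp" does not match "/tmpx".
        let depth = mount_point.components().count();
        if path.starts_with(mount_point) && best.map_or(true, |(d, _)| depth >= d) {
            best = Some((depth, fs_type));
        }
    }
    best.map(|(_, ty)| ty.to_string())
}

/// True for the RAM-backed filesystems this README warns about.
fn is_ram_backed(fs_type: &str) -> bool {
    matches!(fs_type, "tmpfs" | "ramfs")
}
```

A caller would typically pass `std::env::temp_dir().canonicalize()?` as the path and `std::fs::read_to_string("/proc/mounts")?` as the mount table, then warn or abort if `is_ram_backed` returns `true`.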

License: MIT