cowfile
A copy-on-write abstraction for binary data backed by memory or files.
Overview
cowfile provides CowFile, a type that pairs a committed buffer with a pending write log. The buffer is
either a Vec<u8> or an OS-level copy-on-write memory map (MAP_PRIVATE on Unix, PAGE_WRITECOPY
on Windows). Modifications accumulate in the pending log and are applied to the committed buffer on
commit(). A final merged output can be produced at any time without committing.
This is designed for binary analysis and transformation pipelines where multiple passes modify a binary (e.g., deobfuscation, patching) without needing to copy the entire file between each pass.
Features
- Zero-copy base layer: Memory-mapped files or owned byte vectors as the committed buffer
- Pending write log: Only modified byte ranges are stored, not the entire file
- Two-tier commit model: data() returns committed state, read() composites pending writes
- Typed I/O: Read/write primitives (u8..u64, i8..i64, f32, f64) in little-endian or big-endian
- User-defined types: ReadFrom and WriteTo traits for custom struct serialization
- Cursor support: CowFileCursor implements std::io::Read, Write, and Seek
- Thread-safe: Send + Sync with internal RwLock synchronization
- Fork support: Create independent copies that share read pages via OS-level CoW
- Dual output: Produce final output as Vec<u8> or write directly to a file
Quick Start
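use cowfile::CowFile;
// Buffer size, offsets, and byte values in this example are illustrative.
// Create from owned bytes
let pf = CowFile::from_vec(vec![0u8; 16]);
// Writes go to the pending log (uses &self via interior mutability)
pf.write(0, &[0xAA, 0xBB]).unwrap();
pf.write(8, &[0xCC]).unwrap();
// data() shows committed state (unchanged)
assert_eq!(pf.data()[0], 0x00);
// read() composites pending writes over committed state
assert_eq!(pf.read(0, 2).unwrap(), [0xAA, 0xBB]);
// Commit applies pending writes to the buffer (requires &mut self)
let mut pf = pf;
pf.commit().unwrap();
assert_eq!(pf.data()[0], 0xAA);
// More modifications in a second pass
pf.write(4, &[0xDD]).unwrap();
// Produce final output with all modifications applied
let output = pf.to_vec().unwrap();
assert_eq!(output[0], 0xAA);
assert_eq!(output[4], 0xDD);
assert_eq!(output[8], 0xCC);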
From a file (memory-mapped)
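use cowfile::CowFile;
// "input.bin" and "output.bin" are placeholder paths; the written bytes are illustrative.
let pf = CowFile::open("input.bin").unwrap();
pf.write(0, &[0x4D, 0x5A]).unwrap(); // Pending write
let mut pf = pf;
pf.commit().unwrap(); // Only the first page is CoW'd
pf.to_file("output.bin").unwrap(); // Write output to disk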
Typed primitive I/O
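use cowfile::CowFile;
let pf = CowFile::from_vec(vec![0u8; 16]); // Illustrative buffer size
// Write and read little-endian u32 (offset and value are illustrative)
pf.write_le::<u32>(0, 0xDEADBEEF).unwrap();
assert_eq!(pf.read_le::<u32>(0).unwrap(), 0xDEADBEEF);
// Write and read big-endian u16
pf.write_be::<u16>(4, 0x1234).unwrap();
assert_eq!(pf.read_be::<u16>(4).unwrap(), 0x1234);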
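The byte-order distinction behind the little-endian and big-endian variants can be shown with the standard library alone. The following is a minimal sketch that does not use cowfile; the helper names (put_u32_le, get_u32_le, put_u16_be, get_u16_be) are hypothetical and operate on a plain byte slice.

```rust
/// Store `value` at `offset` in little-endian order (least significant byte first).
/// Panics if the four bytes do not fit inside `buf`.
fn put_u32_le(buf: &mut [u8], offset: usize, value: u32) {
    buf[offset..offset + 4].copy_from_slice(&value.to_le_bytes());
}

/// Load a little-endian u32 from `offset`. Panics if the range is out of bounds.
fn get_u32_le(buf: &[u8], offset: usize) -> u32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&buf[offset..offset + 4]);
    u32::from_le_bytes(raw)
}

/// Store `value` at `offset` in big-endian order (most significant byte first).
/// Panics if the two bytes do not fit inside `buf`.
fn put_u16_be(buf: &mut [u8], offset: usize, value: u16) {
    buf[offset..offset + 2].copy_from_slice(&value.to_be_bytes());
}

/// Load a big-endian u16 from `offset`. Panics if the range is out of bounds.
fn get_u16_be(buf: &[u8], offset: usize) -> u16 {
    let mut raw = [0u8; 2];
    raw.copy_from_slice(&buf[offset..offset + 2]);
    u16::from_be_bytes(raw)
}

fn main() {
    let mut buf = [0u8; 8];

    put_u32_le(&mut buf, 0, 0xDEAD_BEEF);
    // Little-endian: the low byte 0xEF lands at the lowest address.
    assert_eq!(buf[0..4], [0xEFu8, 0xBE, 0xAD, 0xDE]);
    assert_eq!(get_u32_le(&buf, 0), 0xDEAD_BEEF);

    put_u16_be(&mut buf, 4, 0x1234);
    // Big-endian: the high byte 0x12 lands at the lowest address.
    assert_eq!(buf[4..6], [0x12u8, 0x34]);
    assert_eq!(get_u16_be(&buf, 4), 0x1234);
}
```

The same value therefore produces different bytes on disk depending on the variant chosen, which is why the read and write calls must agree on byte order.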
User-defined types
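use cowfile::{CowFile, ReadFrom, WriteTo};
// Header stands for a user-defined struct that implements ReadFrom and WriteTo.
// Its definition is not shown; the field names and values below are placeholders.
let pf = CowFile::from_vec(vec![0u8; 16]);
let original = Header { magic: 0x464C457F, version: 1 };
pf.write_type(0, &original).unwrap();
let header: Header = pf.read_type(0).unwrap();
assert_eq!(header.magic, original.magic);
assert_eq!(header.version, original.version);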
Cursor (std::io compatibility)
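use std::io::{Read, Seek, SeekFrom, Write};
use cowfile::CowFile;
let pf = CowFile::from_vec(vec![0u8; 16]); // Illustrative buffer size
let mut cursor = pf.cursor();
// Use standard I/O traits (the payload and seek position are illustrative)
cursor.write_all(b"hello").unwrap();
cursor.seek(SeekFrom::Start(0)).unwrap();
let mut buf = [0u8; 5];
cursor.read_exact(&mut buf).unwrap();
assert_eq!(&buf, b"hello");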
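Because the cursor implements the standard traits, code written against Read + Write + Seek does not need to know about CowFile at all. The sketch below shows such a generic helper; patch_and_verify is a hypothetical name, and std::io::Cursor stands in for CowFileCursor so the example runs without the crate.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

/// Overwrite `patch` at `offset`, then seek back and read the same range again.
/// Works with any stream that implements the three standard traits.
fn patch_and_verify<S: Read + Write + Seek>(
    stream: &mut S,
    offset: u64,
    patch: &[u8],
) -> io::Result<Vec<u8>> {
    stream.seek(SeekFrom::Start(offset))?;
    stream.write_all(patch)?;
    // Writing advanced the position, so rewind before reading back.
    stream.seek(SeekFrom::Start(offset))?;
    let mut back = vec![0u8; patch.len()];
    stream.read_exact(&mut back)?;
    Ok(back)
}

fn main() -> io::Result<()> {
    // Stand-in stream: an in-memory buffer of 16 zero bytes.
    let mut stream = Cursor::new(vec![0u8; 16]);
    let back = patch_and_verify(&mut stream, 4, b"hello")?;
    assert_eq!(back, b"hello");
    assert_eq!(&stream.get_ref()[4..9], b"hello");
    Ok(())
}
```

Passing a CowFileCursor in place of the in-memory cursor should work the same way, assuming its writes land in the pending log and its reads composite them as the feature list describes.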
Forking
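use cowfile::CowFile;
let pf = CowFile::open("input.bin").unwrap(); // Placeholder path
pf.write(0, &[0xFF]).unwrap(); // Illustrative offset and data
// Fork re-opens the file and shares read pages via OS-level CoW
let forked = pf.fork().unwrap();
assert!(!forked.has_pending()); // Fork starts clean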
Architecture
Committed Buffer (immutable)        Pending Log (copy-on-write)
+---------------------+             +-------------------------+
| Vec<u8> or MmapMut  |    <---     | Vec<PendingWrite>       |
| (OS-level CoW)      |             | (applied on commit)     |
+---------------------+             +-------------------------+
data() -> &[u8] of committed buffer (zero-cost)
read() -> composites pending writes over committed state
commit() -> applies pending to buffer, clears log
discard() -> clears pending log without applying
to_vec() / to_file() -> materializes with pending applied
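The relationship between the two tiers can be illustrated with a toy model built only on the standard library. This is a sketch under stated assumptions, not the cowfile implementation: ToyCow and its PendingWrite layout are hypothetical, RefCell replaces the crate's RwLock (so the toy is single-threaded), and errors are reduced to Option.

```rust
use std::cell::RefCell;

/// One queued modification: `bytes` replace committed data starting at `offset`.
struct PendingWrite {
    offset: usize,
    bytes: Vec<u8>,
}

/// Toy stand-in for the two-tier model (hypothetical, single-threaded).
struct ToyCow {
    committed: Vec<u8>,
    pending: RefCell<Vec<PendingWrite>>,
}

impl ToyCow {
    fn from_vec(data: Vec<u8>) -> Self {
        ToyCow { committed: data, pending: RefCell::new(Vec::new()) }
    }

    /// Committed bytes only; pending writes are not visible here.
    fn data(&self) -> &[u8] {
        &self.committed
    }

    /// Queue a write through `&self`. Returns None if it would run past the end.
    fn write(&self, offset: usize, bytes: &[u8]) -> Option<()> {
        let end = offset.checked_add(bytes.len())?;
        if end > self.committed.len() {
            return None;
        }
        self.pending.borrow_mut().push(PendingWrite { offset, bytes: bytes.to_vec() });
        Some(())
    }

    /// Copy of the committed bytes with every pending write replayed in order,
    /// so later writes win where ranges overlap.
    fn to_vec(&self) -> Vec<u8> {
        let mut out = self.committed.clone();
        for w in self.pending.borrow().iter() {
            out[w.offset..w.offset + w.bytes.len()].copy_from_slice(&w.bytes);
        }
        out
    }

    /// Composited view of `len` bytes at `offset` (copies the whole buffer; a toy shortcut).
    fn read(&self, offset: usize, len: usize) -> Option<Vec<u8>> {
        let end = offset.checked_add(len)?;
        self.to_vec().get(offset..end).map(|s| s.to_vec())
    }

    /// Apply the pending log to the committed buffer, then clear the log.
    fn commit(&mut self) {
        self.committed = self.to_vec();
        self.pending.get_mut().clear();
    }

    /// Drop the pending log without applying it.
    fn discard(&mut self) {
        self.pending.get_mut().clear();
    }
}

fn main() {
    let mut toy = ToyCow::from_vec(vec![0u8; 8]);
    toy.write(2, &[0xAA, 0xBB]).unwrap();
    assert_eq!(toy.data()[2], 0x00); // committed state is unchanged
    assert_eq!(toy.read(2, 2).unwrap(), vec![0xAAu8, 0xBB]); // composited view
    toy.commit();
    assert_eq!(toy.data()[2], 0xAA);
    toy.write(0, &[0xFF]).unwrap();
    toy.discard(); // the second-pass write is dropped
    assert_eq!(toy.to_vec(), vec![0u8, 0, 0xAA, 0xBB, 0, 0, 0, 0]);
}
```

The toy also shows why write can take &self while commit and discard take &mut self: queuing only touches the log behind interior mutability, whereas applying or dropping the log changes what data() returns.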
API Summary
Constructors
| Method | Description |
|---|---|
| CowFile::from_vec(data) | Create from an owned Vec<u8> (zero-copy move) |
| CowFile::open(path) | Memory-map a file with copy-on-write semantics |
| CowFile::from_file(file) | Memory-map from an open std::fs::File |
Data Access
| Method | Description |
|---|---|
| data() | Returns &[u8] of committed buffer (pending not visible) |
| read(offset, len) | Read bytes with pending writes composited |
| read_byte(offset) | Read a single byte with pending composited |
| read_le::<T>(offset) | Read a primitive in little-endian order |
| read_be::<T>(offset) | Read a primitive in big-endian order |
| read_type::<T>(offset) | Read a user-defined ReadFrom type |
Writing
| Method | Description |
|---|---|
| write(offset, data) | Write bytes to the pending log (&self) |
| write_byte(offset, byte) | Write a single byte to the pending log |
| write_le::<T>(offset, val) | Write a primitive in little-endian order |
| write_be::<T>(offset, val) | Write a primitive in big-endian order |
| write_type(offset, val) | Write a user-defined WriteTo type |
Lifecycle
| Method | Description |
|---|---|
| commit() | Apply pending writes to the committed buffer (&mut self) |
| discard() | Clear pending writes without applying (&mut self) |
| has_pending() | Check if there are uncommitted writes |
| fork() | Create an independent copy (re-maps file if mmap-backed) |
Output
| Method | Description |
|---|---|
| to_vec() | Produce Vec<u8> with pending applied |
| to_file(path) | Write to disk with pending applied |
| into_vec() | Consume and return data (zero-copy if no pending and Vec-backed) |
| cursor() | Create a CowFileCursor implementing Read/Write/Seek |
Metadata
| Method | Description |
|---|---|
| len() | Total data length in bytes |
| is_empty() | Whether the data is empty |
| source_path() | Original file path (for open()-created instances) |
Thread Safety
CowFile is Send + Sync. The committed buffer can be read concurrently via data() from
multiple threads. Writes to the pending log are serialised by an internal RwLock. The dirty
flag uses an AtomicBool for a lock-free fast path when there are no pending writes.
Note that commit() and discard() require &mut self, so they need exclusive access.
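The combination of an RwLock-guarded log and an atomic dirty flag can be sketched as follows. This is an assumption-laden illustration of the pattern, not the crate's code: SharedLog, its single-byte log entries, and the chosen memory orderings are all hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical pending log shared between threads.
struct SharedLog {
    dirty: AtomicBool,
    pending: RwLock<Vec<(usize, u8)>>,
}

impl SharedLog {
    fn new() -> Self {
        SharedLog { dirty: AtomicBool::new(false), pending: RwLock::new(Vec::new()) }
    }

    /// Writers take the exclusive lock, so concurrent writes are serialized.
    fn write_byte(&self, offset: usize, byte: u8) {
        let mut log = self.pending.write().unwrap();
        log.push((offset, byte));
        // Publish the flag while still holding the lock.
        self.dirty.store(true, Ordering::Release);
    }

    /// Readers skip the lock entirely while nothing is pending.
    fn read_byte(&self, committed: &[u8], offset: usize) -> Option<u8> {
        let base = *committed.get(offset)?;
        if !self.dirty.load(Ordering::Acquire) {
            return Some(base); // lock-free fast path
        }
        let log = self.pending.read().unwrap();
        // Scan from the newest entry so the latest write to this offset wins.
        let hit = log.iter().rev().find(|(o, _)| *o == offset).map(|(_, b)| *b);
        Some(hit.unwrap_or(base))
    }
}

fn main() {
    let committed = Arc::new(vec![0u8; 4]);
    let log = Arc::new(SharedLog::new());
    assert_eq!(log.read_byte(&committed, 1), Some(0)); // fast path, no lock taken

    // Four threads each queue one write to a distinct offset.
    let handles: Vec<_> = (0..4usize)
        .map(|i| {
            let log = Arc::clone(&log);
            thread::spawn(move || log.write_byte(i, i as u8 + 1))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    for i in 0..4usize {
        assert_eq!(log.read_byte(&committed, i), Some(i as u8 + 1));
    }
}
```

In this sketch the flag is only ever set; a matching commit or discard would clear it under exclusive access, which is consistent with those methods requiring &mut self.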
License
Licensed under the Apache License, Version 2.0. See LICENSE-APACHE for details.