Crate anybytes


This library is still pre-0.1.0; the API is therefore in heavy flux, and everything should be considered alpha!

A small library for conveniently working with immutable bytes from different sources, providing zero-copy slicing and cloning.

Access itself is extremely cheap via no-op conversion to a &[u8].

The storage mechanism backing the bytes can be extended and is already implemented for a variety of sources, including other byte-handling crates (such as bytes), memory-mapped files, Strings, and zerocopy types.

See INVENTORY.md for notes on possible cleanup and future functionality.

§Overview

Bytes decouples data access from lifetime management through two traits: ByteSource and ByteOwner. A ByteSource can yield a slice of its bytes and then convert itself into a ByteOwner that keeps the underlying storage alive. This separation lets callers obtain a borrow of the bytes, drop any locks or external guards, and still retain the data by storing the owner behind an Arc. No runtime indirection is required when constructing a Bytes, and custom storage types integrate by implementing ByteSource.
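The idea of keeping an owner alive behind an Arc while handing out cheap slices can be sketched with the standard library alone. This is a simplified model, not the crate's implementation: `ModelOwner` and `ModelBytes` are hypothetical names, the two traits are collapsed into one, and the model uses a trait object plus an index range where the real crate is described as needing no runtime indirection on construction.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical owner trait: keeps storage alive and can expose its bytes.
trait ModelOwner: Send + Sync {
    fn bytes(&self) -> &[u8];
}

impl ModelOwner for Vec<u8> {
    fn bytes(&self) -> &[u8] {
        self.as_slice()
    }
}

impl ModelOwner for String {
    fn bytes(&self) -> &[u8] {
        self.as_bytes()
    }
}

/// Hypothetical handle: a shared owner plus the byte range visible through it.
/// Cloning copies only the `Arc` pointer and the range, never the bytes.
#[derive(Clone)]
struct ModelBytes {
    owner: Arc<dyn ModelOwner>,
    range: Range<usize>,
}

impl ModelBytes {
    /// Moves any owner behind an `Arc` and exposes all of its bytes.
    fn new<O: ModelOwner + 'static>(owner: O) -> Self {
        let len = owner.bytes().len();
        ModelBytes { owner: Arc::new(owner), range: 0..len }
    }

    /// Borrows the visible bytes.
    fn as_slice(&self) -> &[u8] {
        &self.owner.bytes()[self.range.clone()]
    }

    /// Zero-copy sub-slice: shares the owner and narrows the range.
    /// Returns `None` if `sub` does not fit inside the current view.
    fn slice(&self, sub: Range<usize>) -> Option<ModelBytes> {
        let start = self.range.start.checked_add(sub.start)?;
        let end = self.range.start.checked_add(sub.end)?;
        if start > end || end > self.range.end {
            return None;
        }
        Some(ModelBytes { owner: Arc::clone(&self.owner), range: start..end })
    }
}
```

In this model, a slice of a `ModelBytes` points into the same allocation as its parent, which is the property the zero-copy slicing above relies on.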

§Quick Start

use anybytes::Bytes;

fn main() {
    // create `Bytes` from a vector
    let bytes = Bytes::from(vec![1u8, 2, 3, 4]);

    // take a zero-copy slice
    let slice = bytes.slice(1..3);

    // convert it to a typed View
    let view = slice.view::<[u8]>().unwrap();
    assert_eq!(&*view, &[2, 3]);
}

The full example is available in examples/quick_start.rs.

§Arc Sources

Bytes can also be created directly from an Arc holding a byte container. This avoids allocating another Arc wrapper:

use anybytes::Bytes;
use std::sync::Arc;

let data = Arc::new(vec![1u8, 2, 3, 4]);
let bytes = Bytes::from(data.clone());
assert_eq!(bytes.as_ref(), data.as_slice());

Implementing ByteSource for Arc<[u8]> or Arc<Vec<u8>> is therefore unnecessary, since Bytes::from already reuses the provided Arc.
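What "reusing the provided Arc" means can be illustrated with plain standard-library calls. The helper `share_arc` is a hypothetical name used only for this sketch; it does not touch anybytes and only shows that cloning an Arc shares the existing allocation instead of creating a new one.

```rust
use std::sync::Arc;

/// Clones the `Arc` and reports the strong count while the clone is alive,
/// together with whether both handles point at the same allocation.
fn share_arc(data: &Arc<Vec<u8>>) -> (usize, bool) {
    let shared = Arc::clone(data);
    (Arc::strong_count(data), Arc::ptr_eq(data, &shared))
}
```

For a freshly created Arc the count rises from one to two while the clone exists, and both handles refer to the same heap allocation.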

§Reclaiming Ownership

Bytes::try_unwrap_owner allows recovering the original owner when no other references exist.

use anybytes::Bytes;

let bytes = Bytes::from(vec![1u8, 2, 3]);
let vec = bytes.try_unwrap_owner::<Vec<u8>>().expect("unique owner");
assert_eq!(vec, vec![1, 2, 3]);
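
The "no other references exist" condition is analogous to the standard library's `Arc::try_unwrap`. The sketch below uses only std; `reclaim` is a hypothetical helper, and the exact error value returned by `Bytes::try_unwrap_owner` on failure is not specified here.

```rust
use std::sync::Arc;

/// Tries to take the vector back out of its `Arc`. On failure the `Arc` is
/// handed back unchanged, so the caller keeps access to the data.
fn reclaim(shared: Arc<Vec<u8>>) -> Result<Vec<u8>, Arc<Vec<u8>>> {
    Arc::try_unwrap(shared)
}
```

While a second handle is alive, `reclaim` returns the Arc in the `Err` variant; once that handle is dropped, the same call yields the original vector.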

§Advanced Usage

Bytes can directly wrap memory-mapped files or other large buffers. Combined with the view module this enables simple parsing of structured data without copying:

use anybytes::Bytes;
use zerocopy::{FromBytes, Immutable, KnownLayout};

#[derive(FromBytes, Immutable, KnownLayout)]
#[repr(C)]
struct Header { magic: u32, count: u32 }

// `file` can be any type that implements `memmap2::MmapAsRawDesc` such as
// `&std::fs::File` or `&tempfile::NamedTempFile`.
fn read_header(file: &std::fs::File) -> std::io::Result<anybytes::view::View<Header>> {
    let bytes = unsafe { Bytes::map_file(file)? };
    Ok(bytes.view().unwrap())
}

To map only a portion of a file use the unsafe helper Bytes::map_file_region(file, offset, len).

§Byte Area

Use ByteArea to incrementally build immutable bytes on disk; each section can yield a handle that reconstructs its range after the area is frozen:

use anybytes::area::ByteArea;

let mut area = ByteArea::new().unwrap();
let mut sections = area.sections();
let mut section = sections.reserve::<u8>(4).unwrap();
section.copy_from_slice(b"test");
let handle = section.handle();
let bytes = section.freeze().unwrap();
drop(sections);
let all = area.freeze().unwrap();
assert_eq!(handle.bytes(&all).as_ref(), bytes.as_ref());
assert_eq!(handle.view(&all).unwrap().as_ref(), b"test".as_ref());

Call area.persist(path) to keep the temporary file instead of mapping it.

The area only aligns allocations to the element type and may share pages between adjacent sections to minimize wasted space. Multiple sections may be active simultaneously; their byte ranges do not overlap.
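
Aligning to the element type rather than to page boundaries can be sketched as a simple bump allocation over byte offsets. `align_up` and `reserve_range` are hypothetical helpers written for this illustration; they are not part of the ByteArea API and ignore the file-backed storage entirely.

```rust
use std::mem::{align_of, size_of};
use std::ops::Range;

/// Rounds `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Reserves room for `count` elements of `T` starting at the first suitably
/// aligned offset at or after `cursor`, then advances `cursor` past it.
fn reserve_range<T>(cursor: &mut usize, count: usize) -> Range<usize> {
    let start = align_up(*cursor, align_of::<T>());
    let end = start + count * size_of::<T>();
    *cursor = end;
    start..end
}
```

Under this scheme a `u32` section that follows three `u8` bytes starts at offset 4, wasting one padding byte instead of the remainder of a page, and consecutive ranges never overlap.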

See examples/byte_area.rs for a complete example that reserves different typed sections, mutates them simultaneously, and then either freezes the area into Bytes or persists it to disk.

§Features

By default the crate enables the mmap and zerocopy features. Other optional features provide additional integrations:

  • bytes – support for the bytes crate so bytes::Bytes can act as a ByteSource.
  • ownedbytes – adds compatibility with ownedbytes and implements its StableDeref trait.
  • mmap – enables memory-mapped file handling via the memmap2 crate.
  • zerocopy – exposes the view module for typed zero-copy access and allows using zerocopy types as sources.
  • pyo3 – builds the pyanybytes module to provide Python bindings for Bytes.
  • winnow – implements the Stream traits for Bytes and offers parsers (view, view_elems(count)) that return typed Views.

Enabling the pyo3 feature requires the Python development headers and libraries (for example libpython3.x). Running cargo test --all-features therefore needs these libraries installed; otherwise disable the feature during testing.

§Examples

§Comparison

Crate | Active | Extensible | mmap support | Zerocopy Integration | Pyo3 Integration | kani verified
anybytes 🚧
bytes [1]
ownedbytes
minibytes [2]

§Development

Run ./scripts/preflight.sh from the repository root before committing. The script formats the code and executes all tests using Python 3.12 for the pyo3 feature.

Kani proofs and deterministic fuzz smoke tests are executed with ./scripts/verify.sh, which should be run on a dedicated system. The script installs the Kani verifier, cargo-fuzz, and the nightly toolchain before running cargo kani --workspace --all-features followed by a bounded cargo +nightly fuzz run. Override the fuzz target or arguments by setting FUZZ_TARGET or FUZZ_ARGS in the environment. Verification can take a long time and isn’t needed for quick development iterations.

For exploratory fuzzing use cargo fuzz run bytes_mut_ops. The fuzz target mirrors a simple vector model to ensure helpers like take_prefix and pop_front remain consistent when exercised by randomized sequences.
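
The model-based approach can be sketched without cargo-fuzz. Everything below is an assumption made for illustration: `Window` is a hypothetical stand-in for the type under test, the semantics chosen for `pop_front` and `take_prefix` may differ from the crate's, and a small linear congruential generator replaces real fuzzer input.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for the type under test: shared data plus a window.
struct Window {
    data: Arc<[u8]>,
    range: Range<usize>,
}

impl Window {
    fn new(bytes: &[u8]) -> Self {
        Window { data: Arc::from(bytes), range: 0..bytes.len() }
    }

    fn as_slice(&self) -> &[u8] {
        &self.data[self.range.clone()]
    }

    /// Removes and returns the first byte, if any.
    fn pop_front(&mut self) -> Option<u8> {
        let byte = *self.as_slice().first()?;
        self.range.start += 1;
        Some(byte)
    }

    /// Splits off the first `len` bytes without copying them; returns `None`
    /// and leaves the window untouched if fewer than `len` bytes remain.
    fn take_prefix(&mut self, len: usize) -> Option<Window> {
        if len > self.range.len() {
            return None;
        }
        let start = self.range.start;
        self.range.start += len;
        Some(Window { data: Arc::clone(&self.data), range: start..start + len })
    }
}

/// Replays a deterministic pseudo-random operation sequence against both the
/// window and a plain `Vec<u8>` model, asserting that they agree after every
/// step. Returns the number of steps checked.
fn check_against_model(input: &[u8], mut seed: u64, steps: usize) -> usize {
    let mut window = Window::new(input);
    let mut model: Vec<u8> = input.to_vec();
    for _ in 0..steps {
        // Linear congruential generator standing in for fuzzer-provided bytes.
        seed = seed
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let choice = (seed >> 33) as usize;
        if choice % 2 == 0 {
            let expected = if model.is_empty() { None } else { Some(model.remove(0)) };
            assert_eq!(window.pop_front(), expected);
        } else {
            let len = (choice >> 1) % 4;
            let expected = if len <= model.len() {
                Some(model.drain(..len).collect::<Vec<u8>>())
            } else {
                None
            };
            let actual = window.take_prefix(len).map(|w| w.as_slice().to_vec());
            assert_eq!(actual, expected);
        }
        assert_eq!(window.as_slice(), model.as_slice());
    }
    steps
}
```

Calling `check_against_model(b"some input", 7, 64)` panics at the first step where the window and the vector disagree, which is the same kind of divergence a fuzz target reports.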

§Glossary

§Acknowledgements

This library started as a fork of the minibytes library in Facebook's Sapling SCM.

Thanks to @kylebarron for his feedback and ideas on Pyo3 integration.


  1. Recently added a new “Owned Bytes” variant, which still has all the downsides of a V-Table. 

  2. Recently published again. 

Re-exports§

pub use crate::area::ByteArea;
pub use crate::area::Section;
pub use crate::area::SectionWriter;
pub use crate::bytes::ByteOwner;
pub use crate::bytes::ByteSource;
pub use crate::bytes::Bytes;
pub use crate::bytes::WeakBytes;
pub use crate::view::View;

Modules§

area
Temporary byte area backed by a file.
bytes
Core byte container types and traits.
view
Types for zero-copy viewing of structured data.