Crate flatdata


Implementation of heremaps/flatdata in Rust.

Flatdata is a library providing data structures for convenient creation, storage and access of packed memory-mappable structures with minimal overhead.

The idea is that the user defines a schema of the data format using flatdata’s simple schema language, which supports plain structs, vectors and multivectors. The schema is then used to generate builders and readers for serialization and deserialization of the data. The data is serialized in a portable way that allows zero-overhead random access through memory-mapped storage. The memory-mapped approach makes it possible to use operating system facilities for loading, caching and paging the data and, most importantly, to access it as if it were in memory. Read more in “Why flatdata?”.

This crate provides:

The generator is part of the main heremaps/flatdata repository; the generate helper function is provided as a convenience wrapper around it.

For a comprehensive example, cf. coappearances schema and the corresponding usage.

Examples

First you design a schema for the data you want to store. Let’s say we want to store a list of prime factors for each natural number:

namespace prime {
// Represents a single prime factor of a number and how often it occurs.
struct Factor {
    value : u32 : 32;
    count : u32 : 8;
}

// Points towards the beginning of the list of prime factors of a number.
struct Number {
    @range(factors)
    first_factor_ref : u32;
}

// Stores a list of prime factors for numbers from 0 to N
archive Archive {
    @explicit_reference( Number.first_factor_ref, factors )
    numbers : vector<Number>;

    factors : vector<Factor>;
}
}
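To make the bit widths in the schema concrete: Factor declares 32 bits for value and 8 bits for count, so one element needs 40 bits, i.e. 5 bytes. The following is a minimal sketch of such a packed layout in plain Rust. It assumes a little-endian layout with value in the low bits, and the names pack and unpack are hypothetical; the code generated by flatdata does this work for you and may differ in detail.

```rust
/// Hypothetical in-memory mirror of the schema's `Factor` struct.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Factor {
    value: u32, // declared with 32 bits in the schema
    count: u32, // declared with 8 bits in the schema
}

const VALUE_BITS: u32 = 32;
const COUNT_BITS: u32 = 8;
/// 40 bits round up to 5 bytes per packed element.
const FACTOR_BYTES: usize = ((VALUE_BITS + COUNT_BITS) as usize + 7) / 8;

/// Packs `value` into bits 0..32 and `count` into bits 32..40
/// (assumed little-endian layout, for illustration only).
fn pack(f: Factor) -> [u8; FACTOR_BYTES] {
    assert!(f.count < (1 << COUNT_BITS), "count does not fit in 8 bits");
    let bits: u64 = u64::from(f.value) | (u64::from(f.count) << VALUE_BITS);
    let mut out = [0u8; FACTOR_BYTES];
    out.copy_from_slice(&bits.to_le_bytes()[..FACTOR_BYTES]);
    out
}

/// Reverses `pack`: widens the 5 bytes to a u64 and masks out both fields.
fn unpack(bytes: &[u8; FACTOR_BYTES]) -> Factor {
    let mut wide = [0u8; 8];
    wide[..FACTOR_BYTES].copy_from_slice(bytes);
    let bits = u64::from_le_bytes(wide);
    Factor {
        value: (bits & 0xFFFF_FFFF) as u32,
        count: ((bits >> VALUE_BITS) & 0xFF) as u32,
    }
}

fn main() {
    let f = Factor { value: 617, count: 1 };
    let packed = pack(f);
    // Five bytes instead of the eight a plain Rust struct of two u32 would use.
    assert_eq!(packed.len(), 5);
    assert_eq!(unpack(&packed), f);
    println!("{:?} -> {:?}", f, packed);
}
```

Because every element has the same fixed size, the i-th element of a vector starts at byte offset i * 5, which is what makes random access into a memory-mapped file cheap.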

Optionally, you can create a diagram of the schema using the dot generator from heremaps/flatdata.

Then you generate code, e.g. using the generate utility in a build.rs script, and include it in your project. Now you can create an archive (held in memory in this example) and fill it with data:

include!("prime_generated.rs");

use flatdata::MemoryResourceStorage;

pub fn calculate_prime_factors(
    builder: &mut prime::ArchiveBuilder,
    max_number: u32,
) -> std::io::Result<()> {
    let mut numbers = builder.start_numbers()?;
    let mut factors = builder.start_factors()?;
    numbers.grow()?.set_first_factor_ref(0);
    for mut x in 0..=max_number {
        // Calculate the prime factors in a very inefficient way (trial division).
        // The range is inclusive so that a prime x is recorded as its own factor.
        for y in 2..=x {
            let mut count = 0;
            while x % y == 0 {
                count += 1;
                x /= y;
            }
            if count > 0 {
                let mut factor = factors.grow()?;
                factor.set_value(y);
                factor.set_count(count);
            }
        }
        numbers.grow()?.set_first_factor_ref(factors.len() as u32);
    }
    numbers.close().expect("Failed to close");
    factors.close().expect("Failed to close");
    Ok(())
}

pub fn main() {
    let storage = MemoryResourceStorage::new("/primes");
    let mut builder =
        prime::ArchiveBuilder::new(storage.clone()).expect("failed to create builder");
    calculate_prime_factors(&mut builder, 10000).expect("Failed to write archive");
    // store archive for re-use
    // ...
    // in a different application open archive for use:
    let archive = prime::Archive::open(storage).expect("failed to open archive");
    let number = 1234;
    // The factors of `number` start at its own reference and end where
    // the factors of `number + 1` begin.
    let factor_range = archive.numbers().at(number).first_factor_ref() as usize
        ..archive.numbers().at(number + 1).first_factor_ref() as usize;
    let factors: Vec<_> = archive
        .factors()
        .slice(factor_range)
        .iter()
        .flat_map(|x| std::iter::repeat(x.value()).take(x.count() as usize))
        .collect();
    println!("List of prime factors for {}: {:?}", number, factors);
}

This will print

List of prime factors for 1234: [2, 617]
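The lookup in main relies on a layout convention: numbers holds one start offset per number plus one trailing sentinel, so the factors of n occupy the half-open range from entry n to entry n + 1. The sketch below reproduces that convention with ordinary Vecs, without flatdata. PrimeTable and its methods are hypothetical names used only for illustration.

```rust
use std::ops::Range;

/// Plain-Rust stand-in for the archive's two resources (hypothetical type).
struct PrimeTable {
    /// Start offset into `factors` per number, plus one trailing sentinel.
    first_factor_ref: Vec<u32>,
    /// (prime, exponent) pairs of all numbers, stored back to back.
    factors: Vec<(u32, u32)>,
}

impl PrimeTable {
    fn build(max_number: u32) -> Self {
        let mut first_factor_ref = vec![0u32]; // number 0 starts at offset 0
        let mut factors = Vec::new();
        for n in 0..=max_number {
            let mut rest = n;
            let mut p = 2;
            // Trial division; 0 and 1 have no prime factors.
            while rest > 1 {
                let mut count = 0;
                while rest % p == 0 {
                    count += 1;
                    rest /= p;
                }
                if count > 0 {
                    factors.push((p, count));
                }
                p += 1;
            }
            // Start offset of n + 1, which is also the end offset of n.
            first_factor_ref.push(factors.len() as u32);
        }
        PrimeTable { first_factor_ref, factors }
    }

    /// Half-open range of `factors` belonging to `n`.
    fn factor_range(&self, n: usize) -> Range<usize> {
        self.first_factor_ref[n] as usize..self.first_factor_ref[n + 1] as usize
    }

    fn factors_of(&self, n: usize) -> &[(u32, u32)] {
        &self.factors[self.factor_range(n)]
    }
}

fn main() {
    let table = PrimeTable::build(1234);
    println!("{:?}", table.factors_of(1234));
}
```

Storing only start offsets keeps each Number at a single field; the cost is the extra sentinel entry, which is why the builder writes one more Number than there are numbers.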

Optional Features

The following is a list of Cargo features that can be enabled or disabled:

  • tar: Enables support for reading TAR archives using the TarArchiveResourceStorage struct.

Re-exports

  • pub use crate::memory::PADDING_SIZE;
  • pub use crate::storage::check_optional_resource;
  • pub use crate::storage::check_resource;
  • pub use crate::storage::create_archive;
  • pub use crate::storage::create_external_vector;
  • pub use crate::storage::create_multi_vector;

Structs

  • Vector which flushes its content when growing.
  • Resource storage on disk using memory mapped files.
  • Resource storage in memory.
  • A read-only view on a multivector.
  • A container for writing an indexed sequence of heterogeneous data items.
  • Exposes blocks of raw data, providing auxiliary functionality like extracting substrings.
  • Read-only resource storage on disk using a memory mapped tar archive.
  • A container holding a contiguous sequence of flatdata structs of the same type T in memory, and providing read and write access to it.

Enums

Traits

  • A specialized Struct factory producing Index items. Used primarily by the MultiVector/MultiArrayView.
  • Marks structs that can be used stand-alone, e.g. structs without a range.
  • Marks structs that cannot be used stand-alone, e.g. structs with a range.
  • Hierarchical Resource Storage
  • Enhanced slices of flatdata Structs so that they can be created from bytes / converted to bytes. Note: TryFrom/AsRef cannot be used, since slice is a foreign type.
  • A factory trait used to bind lifetime to Ref implementations.
  • A type used as element of ‘MultiArrayView’.
  • A type used as element of MultiArrayView.
  • Shortcut trait for VariadicStructs that are able to produce references of any given lifetime
  • A type used to create VariadicStructs.

Functions

  • A helper function wrapping the flatdata generator.

Type Aliases

  • A handle to a resource storage used by archives
  • Index specifying a variadic type of MultiArrayView.