Implementation of heremaps/flatdata in Rust.
Flatdata is a library providing data structures for convenient creation, storage and access of packed memory-mappable structures with minimal overhead.
The idea is that the user defines a schema of the data format using flatdata’s very simple schema language, which supports plain structs, vectors and multivectors. The schema is then used to generate builders and readers for serialization and deserialization of the data. The data is serialized in a portable way that allows zero-overhead random access through memory-mapped storage. The memory-mapped approach makes it possible to use the operating system's facilities for loading, caching and paging of the data and, most importantly, to access it as if it were in memory. Read more in “Why flatdata?”.
This crate provides:
- data structures for writing data to archives: StructBuf, Vector, ExternalVector, MultiVector
- data structures for reading data from archives: ArrayView, MultiArrayView
- resource storage backends for using archives: MemoryResourceStorage, FileResourceStorage, TarArchiveResourceStorage
The generator is part of the main heremaps/flatdata repository; the generate helper function is provided as a convenience wrapper.
For a comprehensive example, see the coappearances schema and the corresponding usage.
§Examples
First you design a schema for the data you want to store. Let’s say we want to store a list of prime factors for each natural number:
namespace prime {
    // Represents a single prime factor of a number and how often it occurs.
    struct Factor {
        value : u32 : 32;
        count : u32 : 8;
    }

    // Points towards the beginning of the list of prime factors.
    struct Number {
        @range(factors)
        first_factor_ref : u32;
    }

    // Stores a list of prime factors for numbers from 0 to N
    archive Archive {
        @explicit_reference( Number.first_factor_ref, factors )
        numbers : vector<Number>;
        factors : vector<Factor>;
    }
}
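To make the bit widths in the schema more concrete (`value : u32 : 32` and `count : u32 : 8` add up to 40 bits per `Factor`), here is a minimal sketch of a bit-packed record using only the standard library. It is not flatdata's generated code: the helper names (`write_bits`, `read_bits`, `write_factor`, `read_factor`) are hypothetical, and the little-endian bit order is an assumption made for illustration, not a statement about the actual on-disk format.

```rust
// Illustration only: a hand-rolled bit-packed record, not flatdata's generated code.
const VALUE_BITS: usize = 32;
const COUNT_BITS: usize = 8;
const RECORD_BITS: usize = VALUE_BITS + COUNT_BITS; // 40 bits = 5 bytes per record

/// Writes the lowest `num_bits` bits of `value` at `bit_offset` (assumed little-endian bit order).
fn write_bits(data: &mut [u8], bit_offset: usize, num_bits: usize, value: u64) {
    for i in 0..num_bits {
        let bit = bit_offset + i;
        let mask = 1u8 << (bit % 8);
        if (value >> i) & 1 == 1 {
            data[bit / 8] |= mask;
        } else {
            data[bit / 8] &= !mask;
        }
    }
}

/// Reads `num_bits` bits starting at `bit_offset`.
fn read_bits(data: &[u8], bit_offset: usize, num_bits: usize) -> u64 {
    let mut result = 0u64;
    for i in 0..num_bits {
        let bit = bit_offset + i;
        let is_set = (data[bit / 8] >> (bit % 8)) & 1;
        result |= u64::from(is_set) << i;
    }
    result
}

/// Stores one (value, count) record at position `index`.
fn write_factor(data: &mut [u8], index: usize, value: u32, count: u8) {
    let base = index * RECORD_BITS;
    write_bits(data, base, VALUE_BITS, u64::from(value));
    write_bits(data, base + VALUE_BITS, COUNT_BITS, u64::from(count));
}

/// Random access: the record's position is computed from its index alone.
fn read_factor(data: &[u8], index: usize) -> (u32, u8) {
    let base = index * RECORD_BITS;
    let value = read_bits(data, base, VALUE_BITS) as u32;
    let count = read_bits(data, base + VALUE_BITS, COUNT_BITS) as u8;
    (value, count)
}

fn main() {
    // Two records occupy 10 bytes instead of the 16 a padded (u32, u32) pair would need.
    let mut buffer = vec![0u8; 2 * RECORD_BITS / 8];
    write_factor(&mut buffer, 0, 2, 1);
    write_factor(&mut buffer, 1, 617, 1);
    let (value, count) = read_factor(&buffer, 1);
    println!("factor {} occurs {} time(s)", value, count);
}
```

Because every record has the same fixed width, element `i` can be located without parsing anything before it, which is what makes random access into a memory-mapped buffer cheap.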
Optionally, create a diagram of the schema using the dot generator from heremaps/flatdata.
Then generate the code, e.g. with the generate utility in a build.rs script, and include it in your project.
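Now you can create an archive and fill it with data. The example uses MemoryResourceStorage, which keeps the archive in memory; FileResourceStorage would store it on disk instead: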
include!("prime_generated.rs");
use flatdata::MemoryResourceStorage;

pub fn calculate_prime_factors(
    builder: &mut prime::ArchiveBuilder,
    max_number: u32,
) -> std::io::Result<()> {
    let mut numbers = builder.start_numbers()?;
    let mut factors = builder.start_factors()?;
    // The factors of the first number start at index 0.
    numbers.grow()?.set_first_factor_ref(0);
    for mut x in 0..=max_number {
        // Let's calculate the prime factors in a very inefficient way.
        // The range is inclusive so that a prime number is recorded as its own factor.
        for y in 2..=x {
            let mut count = 0;
            while x % y == 0 {
                count += 1;
                x /= y;
            }
            if count > 0 {
                let mut factor = factors.grow()?;
                factor.set_value(y);
                factor.set_count(count);
            }
        }
        // Marks the end of this number's factors and the start of the next number's.
        numbers.grow()?.set_first_factor_ref(factors.len() as u32);
    }
    numbers.close().expect("Failed to close");
    factors.close().expect("Failed to close");
    Ok(())
}

pub fn main() {
    let storage = MemoryResourceStorage::new("/primes");
    let mut builder =
        prime::ArchiveBuilder::new(storage.clone()).expect("failed to create builder");
    calculate_prime_factors(&mut builder, 10000).expect("Failed to write archive");
    // store archive for re-use
    // ...
    // in a different application open archive for use:
    let archive = prime::Archive::open(storage).expect("failed to open archive");
    let number = 1234;
    let factor_range = archive.numbers().at(number).first_factor_ref() as usize
        ..archive.numbers().at(number + 1).first_factor_ref() as usize;
    let factors: Vec<_> = archive
        .factors()
        .slice(factor_range)
        .iter()
        .flat_map(|x| std::iter::repeat(x.value()).take(x.count() as usize))
        .collect();
    println!("List of prime factors for {}: {:?}", number, factors);
}
This will print
List of prime factors for 1234: [2, 617]
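The lookup in the example relies on a simple convention: entry `i` of `numbers` stores where the factors of `i` begin, and entry `i + 1` stores where they end, so one extra trailing entry is needed. The following sketch shows the same pattern with plain `Vec`s instead of a flatdata archive. The `Factor` struct and the `factors_of` and `expand` helpers are hypothetical stand-ins written for this illustration, not part of the crate.

```rust
/// Plain-Rust stand-in for the schema's `Factor` struct (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Factor {
    value: u32,
    count: u32,
}

/// Returns the factors of `number`. `first_factor_ref[i]` is the index of the
/// first factor of `i`; the following entry marks the end of that range.
fn factors_of<'a>(
    first_factor_ref: &[u32],
    factors: &'a [Factor],
    number: usize,
) -> &'a [Factor] {
    let begin = first_factor_ref[number] as usize;
    let end = first_factor_ref[number + 1] as usize;
    &factors[begin..end]
}

/// Expands (value, count) pairs into a flat list, e.g. 2^2 becomes [2, 2].
fn expand(factors: &[Factor]) -> Vec<u32> {
    factors
        .iter()
        .flat_map(|f| std::iter::repeat(f.value).take(f.count as usize))
        .collect()
}

fn main() {
    // Factors for the numbers 0..=4: 0 and 1 have none, 2 = 2, 3 = 3, 4 = 2^2.
    let factors = vec![
        Factor { value: 2, count: 1 },
        Factor { value: 3, count: 1 },
        Factor { value: 2, count: 2 },
    ];
    // One entry per number plus a trailing entry that closes the last range.
    let first_factor_ref = vec![0, 0, 0, 1, 2, 3];

    let expanded = expand(factors_of(&first_factor_ref, &factors, 4));
    println!("{:?}", expanded); // prints [2, 2]
}
```

Storing only the start offsets keeps each `Number` entry small; the price is that a range can only be resolved together with the next entry.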
§Optional Features
The following is a list of Cargo features that can be enabled or disabled:
- tar: Enables support for reading TAR archives using the TarArchiveResourceStorage struct.
Structs§
- ExternalVector: Vector which flushes its content when growing.
- FileResourceStorage: Resource storage on disk using memory mapped files.
- MemoryResourceStorage: Resource storage in memory.
- MultiArrayView: A read-only view on a multivector.
- MultiVector: A container for writing an indexed sequence of heterogeneous data items.
- RawData: Exposes blocks of raw data, providing auxiliary functionality like extracting substrings.
- TarArchiveResourceStorage: Read-only resource storage on disk using a memory mapped tar archive.
- Vector: A container holding a contiguous sequence of flatdata structs of the same type T in memory, and providing read and write access to it.
Enums§
- GeneratorError: Error type for the generate function.
- ResourceStorageError: Error indicating failures when reading and writing data from/to a Storage.
Traits§
- IndexStruct: A specialized Struct factory producing Index items. Used primarily by the MultiVector/MultiArrayView.
- NoOverlap: Marks structs that can be used stand-alone, e.g. structs without a range.
- Overlap: Marks structs that cannot be used stand-alone, e.g. structs with a range.
- ResourceStorage: Hierarchical Resource Storage.
- SliceExt: Enhanced slices of flatdata Structs so that they can be created from bytes / converted to bytes. Note: TryFrom/AsRef cannot be used, since slice is a foreign type.
- Struct: A factory trait used to bind lifetime to Ref implementations.
- VariadicIndex: A type used as element of MultiArrayView.
- VariadicRef: A type used as element of MultiArrayView.
- VariadicRefFactory: Shortcut trait for VariadicStructs that are able to produce references of any given lifetime.
- VariadicStruct: A type used to create VariadicStructs.
Functions§
- generate: A helper function wrapping the flatdata generator.
Type Aliases§
- StorageHandle: A handle to a resource storage used by archives.
- TypeIndex: Index specifying a variadic type of MultiArrayView.