Skip to main content

Crate embed_collections

Crate embed_collections 

Source
Expand description

§embed-collections

A collection of memory efficient data structures, for embedding environment and server applications that need tight memory management.

This crate provides two categories of modules:

  • Cache efficient collections:

    • const_vec: Fixed capacity inline vec
    • seg_list: A cache aware (short-live) list to store elements with adaptive size segments
    • various: A short-live list wrapping SegList with Option<T> to delay allocation.
    • btree: A cache aware B+tree for lone-live and large dataset, optimized for numeric types, with special entry API allows peeking adjacent values.
    • various_map: A short-live map wrapping BTreeMap (std) with Option<(K, V)> to delay allocation.
  • Intrusive collections:

    • Provide Pointer SmartPointer trait for smart pointer types: owned (Box), multiple ownership (Arc, Rc, Irc), raw pointers (NonNull<T>, *const T, *mut T)
    • Irc: Intrusive ref counter.
    • dlist: Intrusive Doubly Linked List (Queue / Stack).
    • slist: Intrusive Singly Linked List ( Queue / stack).
    • slist_owned: An intrusive slist but with safe and more compact interface
    • avl: Intrusive AVL Tree (Balanced Binary Search Tree), port to rust from ZFS

§SegList & Various

SegList and Various is designed for parameter passing.

More CPU-cache friendly compared to LinkedList. And because it does not re-allocate, it’s faster than Vec::push() when the number of elements is small. It’s nice to the memory allocator (always allocate with fixed size segment).

Benchmark: append + drain (x86_64, cache line 128 bytes):

(platform: intel i7-8550U)

ElementsSegListVecSegList vs Vec
1040.5 ns147.0 ns3.6x faster
3299.1 ns237.8 ns2.4x faster
100471.1 ns464.0 ns~1.0x
5002.77 µs895.5 ns3.1x slower

§B+tree

We provide a BTreeMap for single-threaded long-term in-memory storage. It’s a cache aware b+tree:

  • Nodes are filled up in 4 cache lines (256 bytes on x86_64).
    • Capacity in compile-time determined according to the size of Key, Value.
    • Reduce memory fragmentation by alignment.
  • Optimised for numeric key
    • Respecting numeric space for sequential insertion.
    • Reduce latency for sequential insertion.
  • Faster iteration and teardown
  • Limitation:
    • K should have clone (for propagate into the InterNode during split)
    • K & V should <= CACHE_LINE_SIZE - 16
      • It make sure InterNode can hold at least two children.
      • If K & V is large you should put into Box, for room saving, and for the speed to move value
  • Special API:
    • Peak and move to previous/next Entry (for modification).
    • Alter key of an OccupiedEntry.
    • Batch remove with range.
    • Movable Cursor (for readonly)

Compared to std::collections::btree (as of rust 1.94):

  • The std impl is pure btree (not b+tree) without horizontal links. Each key store only once at either leaf and inter nodes.
  • The std impl is optimised for point lookup.
  • The std impl has fixed Cap=11, node size varies according to T. (For T=U64, size is 288B for InterNode and 192B for LeafNode)
  • The std cursor API is still unstable (as of 1.94) and relatively complex to use.

benchmark

platform: intel i7-8550U, key: u32, value: u32, rust 1.92.

Measured in million ops for different size of dataset:

insert_seqbtreestd
1k88.95620.001
10k75.29116.04
100k45.95911.207
insert_randbtreestdavl(box)avl(arc)
1k21.31117.79211.1729.5397
10k14.26811.5876.36695.651
100k5.48143.06910.780.732
get_seqbtreestd
1k59.44834.248
10k37.22527.571
100k30.7719.907
get_randbtreestdavl(box)avl(arc)
1k47.3327.65124.25423.466
10k19.35816.86811.77110.806
100k5.25843.25691.44231.2712
remove_randbtreestd
1k20.96515.968
10k16.07311.701
100k5.02143.0724
iterbtreestd
1k1342.8346.8
10k1209.4303.83
100k152.5751.147
into_iterbtreestd
1k396.07143.81
10k397.0581.389
100k360.1856.742

§Intrusive Collections

intrusive collection is often used in c/c++ code, they does not need extra allocation. But the disadvantages includes: complexity to write, bad for cache hit when the node is too small

There’re three usage scenarios:

  1. Push smart pointer to the list, so that the list hold 1 ref count when the type is Arc / Rc, but you have to use UnsafeCell for internal mutation.

  2. Push Box to the list, the list own the items until they are popped, it’s better than std LinkedList because no additional allocation is needed.

  3. Push raw pointer (better use NonNull instead of *const T for smaller footprint) to the list, for temporary usage. You must ensure the list item not dropped be other refcount (for example, the item is holding by Arc in other structure).

§Difference to intrusive-collections crate

This crate choose to use trait instead of c like offset_of!, mainly because:

  • Mangling with offset conversion makes the code hard to read (for people not used to c style coding).

  • You don’t have to understand some complex macro style.

  • It’s dangerous to use pointer offset conversion when the embedded Node not perfectly aligned, and using memtion to return the node ref is more safer approach. For example, the default repr(Rust) might reorder the field, or you mistakenly use repr(packed).

use embed_collections::{dlist::{DLinkedList, DListItem, DListNode}, Pointer};
use std::cell::UnsafeCell;
use std::sync::Arc;

// The tag structure only for labeling, distinguish two lists
struct CacheTag;
struct IOTag;

struct MyItem {
    val: i32,
    cache_link: UnsafeCell<DListNode<MyItem, CacheTag>>,
    io_link: UnsafeCell<DListNode<MyItem, IOTag>>,
}

impl MyItem {
    fn new(val: i32) -> Self {
        Self {
            val,
            cache_link: UnsafeCell::new(DListNode::default()),
            io_link: UnsafeCell::new(DListNode::default()),
        }
    }
}

unsafe impl DListItem<CacheTag> for MyItem {
    fn get_node(&self) -> &mut DListNode<Self, CacheTag> {
        unsafe { &mut *self.cache_link.get() }
    }
}

unsafe impl DListItem<IOTag> for MyItem {
    fn get_node(&self) -> &mut DListNode<Self, IOTag> {
        unsafe { &mut *self.io_link.get() }
    }
}

let mut cache_list = DLinkedList::<Arc<MyItem>, CacheTag>::new();
let mut io_list = DLinkedList::<Arc<MyItem>, IOTag>::new();

let item1 = Arc::new(MyItem::new(10));
let item2 = Arc::new(MyItem::new(20));

// Push the same item to both lists
cache_list.push_back(item1.clone());
io_list.push_back(item1.clone());

cache_list.push_back(item2.clone());
io_list.push_back(item2.clone());

assert_eq!(cache_list.pop_front().unwrap().val, 10);
assert_eq!(io_list.pop_front().unwrap().val, 10);
assert_eq!(cache_list.pop_front().unwrap().val, 20);
assert_eq!(io_list.pop_front().unwrap().val, 20);

§Feature Flags

  • default: Enabled by default. Includes the std, various, seglist features.
  • full: Includes everything but std
  • std: Enables integration with the Rust standard library, including the println! macro for debugging. Disabling this feature enables no_std compilation.
  • avl: Enables the avl module.
  • btree: Enable the btree module.
  • dlist: Enables the doubly linked list (dlist) module.
  • irc: Enables the irc (intrusive ref count) module.
  • various: Enable various_map module
  • various + seglist: Enable various module
  • seglist: Enable seg_list module
  • slist: Enables the singly linked list (slist) and owned singly linked list (slist_owned) modules.

To compile with no_std and only the slist module, you would use: cargo build --no-default-features --features slist

Re-exports§

pub use const_vec::ConstVec;
pub use seg_list::SegList;seglist
pub use various::Various;various and seglist
pub use various_map::VariousMap;various
pub use btree::BTreeMap;btree

Modules§

avlavl
An intrusive AVL tree implementation.
btreebtree
B+Tree Map - A in memory cache-optimized B+Tree for single-threaded use.
const_vec
A fixed-size inline vector with const generic capacity.
dlistdlist
An intrusive doubly linked list implementation.
ircirc
Intrusive Reference Counter (Irc)
seg_listseglist
SegList : A segmented list with cache-friendly node sizes.
slistslist
An intrusive singly linked list implementation.
slist_ownedslist
An owned intrusive singly linked list.
variousvarious and seglist
Various: A small-list optimization container for efficient parameter passing.
various_mapvarious
VariousMap: Provide a tempoary map optimise for empty or one condition, to delay allocation.

Macros§

print_log
logging macro for development
trace_log
logging macro for development

Constants§

CACHE_LINE_SIZEx86-64 or AArch64 or ARM64EC or PowerPC64
Cache line size in bytes

Traits§

Pointer
Abstract pointer trait to support various pointer types in collections.
SmartPointer