intern_arc/lib.rs
1/*
2 * Copyright 2021 Actyx AG
3 *
4 * Licensed under the Apache License, Version 2.0 (the "License");
5 * you may not use this file except in compliance with the License.
6 * You may obtain a copy of the License at
7 *
8 * http://www.apache.org/licenses/LICENSE-2.0
9 *
10 * Unless required by applicable law or agreed to in writing, software
11 * distributed under the License is distributed on an "AS IS" BASIS,
12 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 * See the License for the specific language governing permissions and
14 * limitations under the License.
15 */
16//! Interning library based on atomic reference counting
17//!
18//! This library differs from [`arc-interner`](https://crates.io/crates/arc-interner)
19//! (which served as initial inspiration) in that
20//!
21//! - interners must be created and can be dropped (for convenience functions see below)
22//! - it does not dispatch based on [`TypeId`](https://doc.rust-lang.org/std/any/struct.TypeId.html)
23//! (each interner is for exactly one type)
24//! - it offers both [`Hash`](https://doc.rust-lang.org/std/hash/trait.Hash.html)-based
25//! and [`Ord`](https://doc.rust-lang.org/std/cmp/trait.Ord.html)-based storage
26//! - it handles unsized types without overhead, so you should inline
27//! [`str`](https://doc.rust-lang.org/std/primitive.str.html) instead of
28//! [`String`](https://doc.rust-lang.org/std/string/struct.String.html)
29//!
30//! Unfortunately, this combination of features makes it inevitable to use unsafe Rust.
31//! The construction of unsized values has been adapted from the standard library’s
32//! [`Arc`](https://doc.rust-lang.org/std/sync/struct.Arc.html) type; reference counting
33//! employs a [`parking_lot`](https://docs.rs/parking_lot) mutex since it must also ensure
34//! consistent access to and usage of the cleanup function that will remove the value from its interner.
35//! The test suite passes also under [miri](https://github.com/rust-lang/miri) to
36//! check against some classes of undefined behavior in the unsafe code (including memory leaks).
37//!
38//! ## API flavors
39//!
40//! The same API is provided in two flavors:
41//!
42//! - [`HashInterner`](struct.HashInterner.html) uses hash-based storage, namely the standard
43//! [`HashSet`](https://doc.rust-lang.org/std/collections/struct.HashSet.html)
44//! - [`OrdInterner`](struct.OrdInterner.html) uses ord-based storage, namely the standard
45//! [`BTreeSet`](https://doc.rust-lang.org/std/collections/struct.BTreeSet.html)
46//!
47//! Interning small values takes of the order of 100–200ns on a typical server CPU. The ord-based
48//! storage has an advantage when interning large values (like slices greater than 1kB). The
49//! hash-based storage has an advantage when keeping lots of values (many thousands and up)
50//! interned at the same time.
51//!
52//! Nothing beats your own benchmarking, though.
53//!
54//! ## Convenience access to interners
55//!
56//! When employing interning for example within a dedicated deserialiser thread, it is best to create
57//! and locally use an interner, avoiding further synchronisation overhead. You can also store interners
58//! in thread-local variables if you only care about deduplication per thread.
59//!
60//! That said, this crate also provides convenience functions based on a global type-indexed pool:
61//!
62//! ```
63//! use intern_arc::{global::hash_interner};
64//!
65//! let i1 = hash_interner().intern_ref("hello"); // -> InternedHash<str>
66//! let i2 = hash_interner().intern_sized(vec![1, 2, 3]); // -> InternedHash<Vec<i32>>
67//! # use std::any::{Any, TypeId}; // using TypeId to avoid influencing type interence
68//! # use intern_arc::InternedHash;
69//! # assert_eq!(i1.type_id(), TypeId::of::<InternedHash<str>>());
70//! # assert_eq!(i2.type_id(), TypeId::of::<InternedHash<Vec<i32>>>());
71//! ```
72//!
73//! You can also use the type-indexed pool yourself to control its lifetime:
74//!
75//! ```
76//! use intern_arc::{global::OrdInternerPool, InternedOrd};
77//!
78//! let mut pool = OrdInternerPool::new();
79//! let i: InternedOrd<[u8]> = pool.get_or_create().intern_box(vec![1, 2, 3].into());
80//! ```
81#![doc(html_logo_url = "https://developer.actyx.com/img/logo.svg")]
82#![doc(html_favicon_url = "https://developer.actyx.com/img/favicon.ico")]
83#![allow(unexpected_cfgs)]
84
85pub mod global;
86mod hash;
87mod ref_count;
88mod tree;
89
90pub use hash::{HashInterner, InternedHash};
91pub use tree::{InternedOrd, OrdInterner};
92
93#[cfg(loom)]
94mod loom {
95 pub use ::loom::{
96 alloc::{alloc, dealloc, Layout},
97 sync::{
98 atomic::{spin_loop_hint, AtomicPtr, AtomicUsize, Ordering::*},
99 MutexGuard, RwLock,
100 },
101 thread::{current, yield_now},
102 };
103
104 pub struct Mutex<T>(::loom::sync::Mutex<T>);
105 impl<T> Mutex<T> {
106 pub fn new(t: T) -> Self {
107 Self(::loom::sync::Mutex::new(t))
108 }
109 pub fn lock(&self) -> MutexGuard<'_, T> {
110 self.0.lock().unwrap()
111 }
112 }
113}
114
115#[cfg(not(loom))]
116#[allow(unused_imports, deprecated)]
117mod loom {
118 pub use parking_lot::{Mutex, MutexGuard, RwLock, RwLockUpgradableReadGuard};
119 pub use std::{
120 alloc::{alloc, dealloc, Layout},
121 sync::atomic::{spin_loop_hint, AtomicPtr, AtomicUsize, Ordering::*},
122 thread::{current, yield_now},
123 };
124}