sp_im/lib.rs
1// This Source Code Form is subject to the terms of the Mozilla Public
2// License, v. 2.0. If a copy of the MPL was not distributed with this
3// file, You can obtain one at http://mozilla.org/MPL/2.0/.
4
5//! # Immutable Data Structures for Rust
6//!
7//! This library implements several of the more commonly useful immutable data
8//! structures for Rust.
9//!
10//! ## What are immutable data structures?
11//!
12//! Immutable data structures are data structures which can be copied and
13//! modified efficiently without altering the original. The most uncomplicated
14//! example of this is the venerable [cons list][cons-list]. This crate offers a
15//! selection of more modern and flexible data structures with similar
16//! properties, tuned for the needs of Rust developers.
17//!
18//! Briefly, the following data structures are provided:
19//!
20//! * [Vectors][vector::Vector] based on [RRB trees][rrb-tree]
21//! * [Ordered maps][ordmap::OrdMap]/[sets][ordset::OrdSet] based on
22//!   [B-trees][b-tree]
23//!
24//! ## Why Would I Want This?
25//!
26//! While immutable data structures can be a game changer for other
27//! programming languages, the most obvious benefit - avoiding the
28//! accidental mutation of data - is already handled so well by Rust's
29//! type system that it's just not something a Rust programmer needs
30//! to worry about even when using data structures that would send a
31//! conscientious Clojure programmer into a panic.
32//!
33//! Immutable data structures offer other benefits, though, some of
34//! which are useful even in a language like Rust. The most prominent
35//! is *structural sharing*, which means that if two data structures
36//! are mostly copies of each other, most of the memory they take up
37//! will be shared between them. This implies that making copies of an
38//! immutable data structure is cheap: it's really only a matter of
39//! copying a pointer and increasing a reference counter, where in the
40//! case of [`Vec`][std::vec::Vec] you have to allocate the same
41//! amount of memory all over again and make a copy of every element
42//! it contains. For immutable data structures, extra memory isn't
43//! allocated until you modify either the copy or the original, and
44//! then only the memory needed to record the difference.
45//!
46//! Another goal of this library has been the idea that you shouldn't
47//! even have to think about what data structure to use in any given
48//! situation, until the point where you need to start worrying about
49//! optimisation - which, in practice, often never comes. Beyond the
50//! shape of your data (ie. whether to use a list or a map), it should
51//! be fine not to think too carefully about data structures - you can
52//! just pick the one that has the right shape and it should have
53//! acceptable performance characteristics for every operation you
54//! might need. Specialised data structures will always be faster at
55//! what they've been specialised for, but `im` aims to provide the
56//! data structures which deliver the least chance of accidentally
57//! using them for the wrong thing.
58//!
59//! For instance, [`Vec`][std::vec::Vec] beats everything at memory
60//! usage, indexing and operations that happen at the back of the
61//! list, but is terrible at insertion and removal, and gets worse the
62//! closer to the front of the list you get.
63//! [`VecDeque`][std::collections::VecDeque] adds a little bit of
64//! complexity in order to make operations at the front as efficient
65//! as operations at the back, but is still bad at insertion and
66//! especially concatenation. [`Vector`][vector::Vector] adds another
67//! bit of complexity, and could never match [`Vec`][std::vec::Vec] at
68//! what it's best at, but in return every operation you can throw at
69//! it can be completed in a reasonable amount of time - even normally
70//! expensive operations like copying and especially concatenation are
71//! reasonably cheap when using a [`Vector`][vector::Vector].
72//!
73//! It should be noted, however, that because of its simplicity,
74//! [`Vec`][std::vec::Vec] actually beats [`Vector`][vector::Vector] even at its
75//! strongest operations at small sizes, just because modern CPUs are
76//! hyperoptimised for things like copying small chunks of contiguous memory -
77//! you actually need to go past a certain size (usually in the vicinity of
78//! several hundred elements) before you get to the point where
79//! [`Vec`][std::vec::Vec] isn't always going to be the fastest choice.
80//! [`Vector`][vector::Vector] attempts to overcome this by actually just being
81//! an array at very small sizes, and being able to switch efficiently to the
82//! full data structure when it grows large enough. Thus,
83//! [`Vector`][vector::Vector] will actually be equivalent to
84//! [Vec][std::vec::Vec] until it grows past the size of a single chunk.
85//!
86//! The map [`OrdMap`][ordmap::OrdMap] - generally performs similarly its
87//! equivalents in the standard library, but tend to run a bit slower
88//! on the basic operations ([`OrdMap`][ordmap::OrdMap] currently tends to run
89//! 2-3x slower). On the other hand, they offer the cheap copy and structural
90//! sharing between copies that you'd expect from immutable data structures.
91//!
92//! In conclusion, the aim of this library is to provide a safe
93//! default choice for the most common kinds of data structures,
94//! allowing you to defer careful thinking about the right data
95//! structure for the job until you need to start looking for
96//! optimisations - and you may find, especially for larger data sets,
97//! that immutable data structures are still the right choice.
98//!
99//! ## Values
100//!
101//! Because we need to make copies of shared nodes in these data structures
102//! before updating them, the values you store in them must implement
103//! [`Clone`][std::clone::Clone].  For primitive values that implement
104//! [`Copy`][std::marker::Copy], such as numbers, everything is fine: this is
105//! the case for which the data structures are optimised, and performance is
106//! going to be great.
107//!
108//! On the other hand, if you want to store values for which cloning is
109//! expensive, or values that don't implement [`Clone`][std::clone::Clone], you
110//! need to wrap them in [`Rc`][std::rc::Rc] or [`Arc`][std::sync::Arc]. Thus,
111//! if you have a complex structure `BigBlobOfData` and you want to store a list
112//! of them as a `Vector<BigBlobOfData>`, you should instead use a
113//! `Vector<Rc<BigBlobOfData>>`, which is going to save you not only the time
114//! spent cloning the big blobs of data, but also the memory spent keeping
115//! multiple copies of it around, as [`Rc`][std::rc::Rc] keeps a single
116//! reference counted copy around instead.
117//!
118//! If you're storing smaller values that aren't
119//! [`Copy`][std::marker::Copy]able, you'll need to exercise judgement: if your
120//! values are going to be very cheap to clone, as would be the case for short
121//! [`String`][std::string::String]s or small [`Vec`][std::vec::Vec]s, you're
122//! probably better off storing them directly without wrapping them in an
123//! [`Rc`][std::rc::Rc], because, like the [`Rc`][std::rc::Rc], they're just
124//! pointers to some data on the heap, and that data isn't expensive to clone -
125//! you might actually lose more performance from the extra redirection of
126//! wrapping them in an [`Rc`][std::rc::Rc] than you would from occasionally
127//! cloning them.
128//!
129//! ### When does cloning happen?
130//!
131//! So when will your values actually be cloned? The easy answer is only if you
132//! [`clone`][std::clone::Clone::clone] the data structure itself, and then only
133//! lazily as you change it. Values are stored in tree nodes inside the data
134//! structure, each node of which contains up to 64 values. When you
135//! [`clone`][std::clone::Clone::clone] a data structure, nothing is actually
136//! copied - it's just the reference count on the root node that's incremented,
137//! to indicate that it's shared between two data structures. It's only when you
138//! actually modify one of the shared data structures that nodes are cloned:
139//! when you make a change somewhere in the tree, the node containing the change
140//! needs to be cloned, and then its parent nodes need to be updated to contain
141//! the new child node instead of the old version, and so they're cloned as
142//! well.
143//!
144//! We can call this "lazy" cloning - if you make two copies of a data structure
145//! and you never change either of them, there's never any need to clone the
146//! data they contain. It's only when you start making changes that cloning
147//! starts to happen, and then only on the specific tree nodes that are part of
148//! the change. Note that the implications of lazily cloning the data structure
149//! extend to memory usage as well as the CPU workload of copying the data
150//! around - cloning an immutable data structure means both copies share the
151//! same allocated memory, until you start making changes.
152//!
153//! Most crucially, if you never clone the data structure, the data inside it is
154//! also never cloned, and in this case it acts just like a mutable data
155//! structure, with minimal performance differences (but still non-zero, as we
156//! still have to check for shared nodes).
157//!
158//! ## Data Structures
159//!
160//! We'll attempt to provide a comprehensive guide to the available
161//! data structures below.
162//!
163//! ### Performance Notes
164//!
165//! "Big O notation" is the standard way of talking about the time
166//! complexity of data structure operations. If you're not familiar
167//! with big O notation, here's a quick cheat sheet:
168//!
169//! *O(1)* means an operation runs in constant time: it will take the
170//! same time to complete regardless of the size of the data
171//! structure.
172//!
173//! *O(n)* means an operation runs in linear time: if you double the
174//! size of your data structure, the operation will take twice as long
175//! to complete; if you quadruple the size, it will take four times as
176//! long, etc.
177//!
178//! *O(log n)* means an operation runs in logarithmic time: for
179//! *log<sub>2</sub>*, if you double the size of your data structure,
180//! the operation will take one step longer to complete; if you
181//! quadruple the size, it will need two steps more; and so on.
182//! However, the data structures in this library generally run in
183//! *log<sub>64</sub>* time, meaning you have to make your data
184//! structure 64 times bigger to need one extra step, and 4096 times
185//! bigger to need two steps. This means that, while they still count
186//! as O(log n), operations on all but really large data sets will run
187//! at near enough to O(1) that you won't usually notice.
188//!
189//! *O(n log n)* is the most expensive operation you'll see in this
190//! library: it means that for every one of the *n* elements in your
191//! data structure, you have to perform *log n* operations. In our
192//! case, as noted above, this is often close enough to O(n) that it's
193//! not usually as bad as it sounds, but even O(n) isn't cheap and the
194//! cost still increases logarithmically, if slowly, as the size of
195//! your data increases. O(n log n) basically means "are you sure you
196//! need to do this?"
197//!
198//! *O(1)** means 'amortised O(1),' which means that an operation
199//! usually runs in constant time but will occasionally be more
200//! expensive: for instance,
201//! [`Vector::push_back`][vector::Vector::push_back], if called in
202//! sequence, will be O(1) most of the time but every 64th time it
203//! will be O(log n), as it fills up its tail chunk and needs to
204//! insert it into the tree. Please note that the O(1) with the
205//! asterisk attached is not a common notation; it's just a convention
206//! I've used in these docs to save myself from having to type
207//! 'amortised' everywhere.
208//!
209//! ### Lists
210//!
211//! Lists are sequences of single elements which maintain the order in
212//! which you inserted them. The only list in this library is
213//! [`Vector`][vector::Vector], which offers the best all round
214//! performance characteristics: it's pretty good at everything, even
215//! if there's always another kind of list that's better at something.
216//!
217//! | Type | Algorithm | Constraints | Order | Push | Pop | Split | Append |
218//! Lookup | | --- | --- | --- | --- | --- | --- | --- | --- | --- |
219//! | [`Vector<A>`][vector::Vector] | [RRB tree][rrb-tree] |
220//! [`Clone`][std::clone::Clone] | insertion | O(1)\* | O(1)\* | O(log n) |
221//! O(log n) | O(log n) |
222//!
223//! ### Maps
224//!
225//! Maps are mappings of keys to values, where the most common read
226//! operation is to find the value associated with a given key. Maps
227//! may or may not have a defined order. Any given key can only occur
228//! once inside a map, and setting a key to a different value will
229//! overwrite the previous value.
230//!
231//! | Type | Algorithm | Key Constraints | Order | Insert | Remove | Lookup |
232//! | --- | --- | --- | --- | --- | --- | --- |
233//! | [`OrdMap<K, V>`][ordmap::OrdMap] | [B-tree][b-tree] |
234//! [`Clone`][std::clone::Clone] + [`Ord`][std::cmp::Ord] | sorted | O(log n) |
235//! O(log n) | O(log n) |
236//!
237//! ### Sets
238//!
239//! Sets are collections of unique values, and may or may not have a
240//! defined order. Their crucial property is that any given value can
241//! only exist once in a given set.
242//!
243//! | Type | Algorithm | Constraints | Order | Insert | Remove | Lookup |
244//! | --- | --- | --- | --- | --- | --- | --- |
245//! | [`OrdSet<A>`][ordset::OrdSet] | [B-tree][b-tree] |
246//! [`Clone`][std::clone::Clone] + [`Ord`][std::cmp::Ord] | sorted | O(log n) |
247//! O(log n) | O(log n) |
248//!
249//! ## In-place Mutation
250//!
251//! All of these data structures support in-place copy-on-write
252//! mutation, which means that if you're the sole user of a data
253//! structure, you can update it in place without taking the
254//! performance hit of making a copy of the data structure before
255//! modifying it (this is about an order of magnitude faster than
256//! immutable operations, almost as fast as
257//! [`std::collections`][std::collections]'s mutable data structures).
258//!
259//! Thanks to [`Rc`][std::rc::Rc]'s reference counting, we are able to
260//! determine whether a node in a data structure is being shared with
261//! other data structures, or whether it's safe to mutate it in place.
262//! When it's shared, we'll automatically make a copy of the node
263//! before modifying it. The consequence of this is that cloning a
264//! data structure becomes a lazy operation: the initial clone is
265//! instant, and as you modify the cloned data structure it will clone
266//! chunks only where you change them, so that if you change the
267//! entire thing you will eventually have performed a full clone.
268//!
269//! This also gives us a couple of other optimisations for free:
270//! implementations of immutable data structures in other languages
271//! often have the idea of local mutation, like Clojure's transients
272//! or Haskell's `ST` monad - a managed scope where you can treat an
273//! immutable data structure like a mutable one, gaining a
274//! considerable amount of performance because you no longer need to
275//! copy your changed nodes for every operation, just the first time
276//! you hit a node that's sharing structure. In Rust, we don't need to
277//! think about this kind of managed scope, it's all taken care of
278//! behind the scenes because of our low level access to the garbage
279//! collector (which, in our case, is just a simple
280//! [`Rc`][std::rc::Rc]).
281//!
282//! ## Thread Safety
283//!
284//! The data structures in the `im` crate are thread safe, through
285//! [`Arc`][std::sync::Arc]. This comes with a slight performance impact, so
286//! that if you prioritise speed over thread safety, you may want to use the
287//! `im-rc` crate instead, which is identical to `im` except that it uses
288//! [`Rc`][std::rc::Rc] instead of [`Arc`][std::sync::Arc], implying that the
289//! data structures in `im-rc` do not implement [`Send`][std::marker::Send] and
290//! [`Sync`][std::marker::Sync]. This yields approximately a 20-25% increase in
291//! general performance.
292//!
293//! ## Feature Flags
294//!
295//! `im` comes with optional support for the following crates through Cargo
296//! feature flags. You can enable them in your `Cargo.toml` file like this:
297//!
298//! ```no_compile
299//! [dependencies]
300//! im = { version = "*", features = ["proptest", "serde"] }
301//! ```
302//!
303//! | Feature | Description |
304//! | ------- | ----------- |
305//! | [`proptest`](https://crates.io/crates/proptest) | Strategies for all `im` datatypes under a `proptest` namespace, eg. `sp_im::vector::proptest::vector()` |
306//! | [`quickcheck`](https://crates.io/crates/quickcheck) | [`quickcheck::Arbitrary`](https://docs.rs/quickcheck/latest/quickcheck/trait.Arbitrary.html) implementations for all `im` datatypes (not available in `im-rc`) |
307//! | [`rayon`](https://crates.io/crates/rayon) | parallel iterator implementations for [`Vector`][vector::Vector] (not available in `im-rc`) |
308//! | [`serde`](https://crates.io/crates/serde) | [`Serialize`](https://docs.rs/serde/latest/serde/trait.Serialize.html) and [`Deserialize`](https://docs.rs/serde/latest/serde/trait.Deserialize.html) implementations for all `im` datatypes |
309//! | [`arbitrary`](https://crates.io/crates/arbitrary/) | [`arbitrary::Arbitrary`](https://docs.rs/arbitrary/latest/arbitrary/trait.Arbitrary.html) implementations for all `im` datatypes |
310//!
311//! [std::collections]: https://doc.rust-lang.org/std/collections/index.html
312//! [std::collections::VecDeque]: https://doc.rust-lang.org/std/collections/struct.VecDeque.html
313//! [std::vec::Vec]: https://doc.rust-lang.org/std/vec/struct.Vec.html
314//! [std::string::String]: https://doc.rust-lang.org/std/string/struct.String.html
315//! [std::rc::Rc]: https://doc.rust-lang.org/std/rc/struct.Rc.html
316//! [std::sync::Arc]: https://doc.rust-lang.org/std/sync/struct.Arc.html
317//! [std::cmp::Eq]: https://doc.rust-lang.org/std/cmp/trait.Eq.html
318//! [std::cmp::Ord]: https://doc.rust-lang.org/std/cmp/trait.Ord.html
319//! [std::clone::Clone]: https://doc.rust-lang.org/std/clone/trait.Clone.html
320//! [std::clone::Clone::clone]: https://doc.rust-lang.org/std/clone/trait.Clone.html#tymethod.clone
321//! [std::marker::Copy]: https://doc.rust-lang.org/std/marker/trait.Copy.html
322//! [std::marker::Send]: https://doc.rust-lang.org/std/marker/trait.Send.html
323//! [std::marker::Sync]: https://doc.rust-lang.org/std/marker/trait.Sync.html
324//! [ordmap::OrdMap]: ./struct.OrdMap.html
325//! [ordset::OrdSet]: ./struct.OrdSet.html
326//! [vector::Vector]: ./struct.Vector.html
327//! [vector::Vector::push_back]: ./vector/enum.Vector.html#method.push_back
328//! [rrb-tree]: https://infoscience.epfl.ch/record/213452/files/rrbvector.pdf
329//! [b-tree]: https://en.wikipedia.org/wiki/B-tree
330//! [cons-list]: https://en.wikipedia.org/wiki/Cons#Lists
331
332#![forbid(rust_2018_idioms)]
333#![deny(unsafe_code, nonstandard_style)]
334#![warn(unreachable_pub, missing_docs)]
335#![cfg_attr(has_specialisation, feature(specialization))]
336#![cfg_attr(not(any(feature = "std", test)), no_std)]
337
338#[macro_use]
339extern crate alloc;
340
341#[cfg(test)]
342#[macro_use]
343extern crate pretty_assertions;
344
345mod config;
346mod nodes;
347mod sort;
348mod sync;
349
350#[macro_use]
351mod util;
352
353#[macro_use]
354mod ord;
355pub use crate::ord::{
356  map as ordmap,
357  set as ordset,
358};
359
360#[macro_use]
361pub mod vector;
362
363pub mod iter;
364
365#[macro_use]
366pub mod conslist;
367
368pub mod shared;
369
370#[cfg(any(test, feature = "serde"))]
371#[doc(hidden)]
372pub mod ser;
373
374#[cfg(test)]
375#[doc(hidden)]
376pub mod arbitrary;
377
378#[cfg(all(threadsafe))]
379#[doc(hidden)]
380pub mod quickcheck;
381
382#[cfg(test)]
383#[macro_use]
384extern crate quickcheck;
385
386#[cfg(any(threadsafe, not(feature = "pool")))]
387mod fakepool;
388
389#[doc(inline)]
390pub use crate::vector::Vector;
391pub use crate::{
392  ordmap::OrdMap,
393  ordset::OrdSet,
394};
395
396#[cfg(test)]
397mod test;
398
399#[cfg(test)]
400mod tests;
401
402/// Update a value inside multiple levels of data structures.
403///
404/// This macro takes a [`Vector`][Vector], [`OrdMap`][OrdMap] or, a key or a
405/// series of keys, and a value, and returns the data structure with the new
406/// value at the location described by the keys.
407///
408/// If one of the keys in the path doesn't exist, the macro will panic.
409///
410/// # Examples
411///
412/// ```
413/// # #[macro_use] extern crate sp_im;
414/// # use std::sync::Arc;
415/// # fn main() {
416/// let vec_inside_vec = vector![vector![1, 2, 3], vector![4, 5, 6]];
417///
418/// let expected = vector![vector![1, 2, 3], vector![4, 5, 1337]];
419///
420/// assert_eq!(expected, update_in![vec_inside_vec, 1 => 2, 1337]);
421/// # }
422/// ```
423///
424/// [Vector]: ../vector/enum.Vector.html
425/// [OrdMap]: ../ordmap/struct.OrdMap.html
426#[macro_export]
427macro_rules! update_in {
428    ($target:expr, $path:expr => $($tail:tt) => *, $value:expr ) => {{
429        let inner = $target.get($path).expect("update_in! macro: key not found in target");
430        $target.update($path, update_in!(inner, $($tail) => *, $value))
431    }};
432
433    ($target:expr, $path:expr, $value:expr) => {
434        $target.update($path, $value)
435    };
436}
437
438/// Get a value inside multiple levels of data structures.
439///
440/// This macro takes a [`Vector`][Vector], [`OrdMap`][OrdMap] along with a key
441/// or a series of keys, and returns the value at the location inside the data
442/// structure described by the key sequence, or `None` if any of the keys didn't
443/// exist.
444///
445/// # Examples
446///
447/// ```
448/// # #[macro_use] extern crate sp_im;
449/// # use std::sync::Arc;
450/// # fn main() {
451/// let vec_inside_vec = vector![vector![1, 2, 3], vector![4, 5, 6]];
452///
453/// assert_eq!(Some(&6), get_in![vec_inside_vec, 1 => 2]);
454/// # }
455/// ```
456///
457/// [Vector]: ../vector/enum.Vector.html
458/// [OrdMap]: ../ordmap/struct.OrdMap.html
459#[macro_export]
460macro_rules! get_in {
461    ($target:expr, $path:expr => $($tail:tt) => * ) => {{
462        $target.get($path).and_then(|v| get_in!(v, $($tail) => *))
463    }};
464
465    ($target:expr, $path:expr) => {
466        $target.get($path)
467    };
468}
469
470#[cfg(test)]
471mod lib_test {
472  #[test]
473  fn update_in() {
474    let vector = vector![1, 2, 3, 4, 5];
475    assert_eq!(vector![1, 2, 23, 4, 5], update_in!(vector, 2, 23));
476    let ordmap = ordmap![1 => 1, 2 => 2, 3 => 3];
477    assert_eq!(ordmap![1 => 1, 2 => 23, 3 => 3], update_in!(ordmap, 2, 23));
478
479    let vecs = vector![vector![1, 2, 3], vector![4, 5, 6], vector![7, 8, 9]];
480    let vecs_target =
481      vector![vector![1, 2, 3], vector![4, 5, 23], vector![7, 8, 9]];
482    assert_eq!(vecs_target, update_in!(vecs, 1 => 2, 23));
483  }
484
485  #[test]
486  fn get_in() {
487    let vector = vector![1, 2, 3, 4, 5];
488    assert_eq!(Some(&3), get_in!(vector, 2));
489    let ordmap = ordmap![1 => 1, 2 => 2, 3 => 3];
490    assert_eq!(Some(&2), get_in!(ordmap, &2));
491
492    let vecs = vector![vector![1, 2, 3], vector![4, 5, 6], vector![7, 8, 9]];
493    assert_eq!(Some(&6), get_in!(vecs, 1 => 2));
494  }
495}