sorted_iter/
lib.rs

1//! This crate provides set and relational operations for all iterators in the standard library that are known
2//! at compile time to be sorted.
3//!
4//! # Set operations
5//! ```
6//! # extern crate maplit;
7//! # use maplit::*;
8//! # extern crate sorted_iter;
9//! use sorted_iter::SortedIterator;
10//!
11//! let primes = btreeset! { 2, 3, 5, 7, 11, 13u64 }.into_iter();
12//! let fibs = btreeset! { 1, 2, 3, 5, 8, 13u64 }.into_iter();
13//! let fib_primes = primes.intersection(fibs);
14//! ```
15//!
16//! It is possible to efficiently define set operations on sorted iterators. Sorted iterators are
17//! very common in the standard library. E.g. the elements of a [BTreeSet] or the keys of a [BTreeMap]
18//! are guaranteed to be sorted according to the element order, as are iterable ranges like `0..100`.
19//!
20//! There are also a number of operations on iterators that preserve the sort order. E.g. if an
21//! iterator is sorted, [take], [take_while] etc. are going to result in a sorted iterator as well.
22//!
23//! Since the complete types of iterators are typically visible in rust, it is possible to encode these
24//! rules at type level. This is what this crate does.
25//!
26//! For available set operations, see [SortedIterator].
27//! For sorted iterators in the std lib, see instances for the [SortedByItem] marker trait.
28//!
29//! # Relational operations
30//! ```
31//! # extern crate maplit;
32//! # use maplit::*;
33//! # extern crate sorted_iter;
34//! use sorted_iter::SortedPairIterator;
35//!
36//! let cities = btreemap! {
37//!   1 => "New York",
38//!   2 => "Tokyo",
39//!   3u8 => "Berlin"
40//! }.into_iter();
41//! let countries = btreemap! {
42//!   1 => "USA",
43//!   2 => "Japan",
44//!   3u8 => "Germany"
45//! }.into_iter();
46//! let cities_and_countries = cities.join(countries);
47//! ```
48//!
49//! Iterators of pairs that are sorted according to the first element / key are also very common in
50//! the standard library and elsewhere. E.g. the elements of a [BTreeMap] are guaranteed to be sorted
51//! according to the key order.
52//!
53//! The same rules as for sorted iterators apply for preservation of the sort order, except that there
54//! are some additional operations that preserve sort order. Anything that only operates on the value,
55//! like e.g. map or filter_map on the value, is guaranteed to preserve the sort order.
56//!
57//! The operations that can be defined on sorted pair operations are the relational operations known
58//! from relational algebra / SQL, namely join, left_join, right_join and outer_join.
59//!
60//! For available relational operations, see [SortedPairIterator].
61//! For sorted iterators in the std lib, see instances the for [SortedByKey] marker trait.
62//!
63//! # Transformations that retain order are allowed
64//! ```
65//! # extern crate sorted_iter;
66//! use sorted_iter::*;
67//!
68//! let odd = (1..31).step_by(2);
69//! let multiples_of_3 = (3..30).step_by(3);
70//! let either = odd.union(multiples_of_3);
71//! ```
72//!
73//! # Transformations that can change the order lose the sorted property
74//! ```compile_fail
75//! # extern crate sorted_iter;
76//! use sorted_iter::*;
77//!
78//! // we have no idea what map does to the order. could be anything!
79//! let a = (1..31).map(|x| -x);
80//! let b = (3..30).step_by(3);
81//! let either = a.union(b); // does not compile!
82//! ```
83//!
84//! # Assuming sort ordering
85//!
86//! For most std lib iterators, this library already provides instances. But there will occasionally be an iterator
87//! from a third party library where you *know* that it is properly sorted.
88//!
89//! For this case, there is an escape hatch:
90//!
91//! ```
92//! // the assume_ extensions have to be implicitly imported
93//! use sorted_iter::*;
94//! use sorted_iter::assume::*;
95//! let odd = vec![1,3,5,7u8].into_iter().assume_sorted_by_item();
96//! let even = vec![2,4,6,8u8].into_iter().assume_sorted_by_item();
97//! let all = odd.union(even);
98//!
99//! let cities = vec![(1u8, "New York")].into_iter().assume_sorted_by_key();
100//! let countries = vec![(1u8, "USA")].into_iter().assume_sorted_by_key();
101//! let cities_and_countries = cities.join(countries);
102//! ```
103//!
104//! # Marking your own iterators
105//!
106//! If you have a library and want to mark some iterators as sorted, this is possible by implementing the
107//! appropriate marker trait, [SortedByItem] or [SortedByKey].
108//!
109//! ```
110//! # extern crate sorted_iter;
111//! // marker traits are not at top level, since usually you don't need them
112//! use sorted_iter::sorted_iterator::SortedByItem;
113//! use sorted_iter::sorted_pair_iterator::SortedByKey;
114//!
115//! pub struct MySortedIter<T> { whatever: T }
116//! pub struct MySortedPairIter<K, V> { whatever: (K, V) }
117//!
118//! impl<T> SortedByItem for MySortedIter<T> {}
119//! impl<K, V> SortedByKey for MySortedPairIter<K, V> {}
120//! ```
121//!
122//! By reexporting the extension traits, you get a seamless experience for people using your library.
123//!
124//! ```
125//! extern crate sorted_iter;
126//! pub use sorted_iter::{SortedIterator, SortedPairIterator};
127//! ```
128//!
129//! ## Tests
130//!
131//! Tests are done using the fantastic [quickcheck] crate, by comparing against the operations defined on
132//! [BTreeSet] and [BTreeMap].
133//!
134//! [SortedIterator]: trait.SortedIterator.html
135//! [SortedPairIterator]: trait.SortedPairIterator.html
136//! [SortedByItem]: sorted_iterator/trait.SortedByItem.html
137//! [SortedByKey]: sorted_pair_iterator/trait.SortedByKey.html
138//! [quickcheck]: https://github.com/BurntSushi/quickcheck
139//! [BTreeSet]: https://doc.rust-lang.org/std/collections/struct.BTreeSet.html
140//! [BTreeMap]: https://doc.rust-lang.org/std/collections/struct.BTreeMap.html
141//! [take]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.take
142//! [take_while]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.take_while
143//! [Ord]: https://doc.rust-lang.org/std/cmp/trait.Ord.html
144#[cfg(test)]
145extern crate quickcheck;
146
147#[cfg(test)]
148#[macro_use(quickcheck)]
149extern crate quickcheck_macros;
150
151use core::cmp::Ordering;
152
153pub mod one_or_less_iterator;
154pub mod sorted_iterator;
155pub mod sorted_pair_iterator;
156
157pub use one_or_less_iterator::OneOrLess;
158
159use crate::sorted_iterator::*;
160use crate::sorted_pair_iterator::*;
161
162#[deny(missing_docs)]
163
164/// set operations for iterators where the items are sorted according to the natural order
165pub trait SortedIterator: Iterator + Sized {
166    /// Visits the values representing the union, i.e., all the values in `self` or `other`,
167    /// without duplicates.
168    fn union<J>(self, other: J) -> Union<Self, J>
169    where
170        J: SortedIterator<Item = Self::Item>,
171    {
172        Union {
173            a: self.peekable(),
174            b: other.peekable(),
175        }
176    }
177
178    /// Visits the values representing the intersection, i.e., the values that are both in `self`
179    /// and `other`.
180    fn intersection<J>(self, other: J) -> Intersection<Self, J>
181    where
182        J: SortedIterator<Item = Self::Item>,
183    {
184        Intersection {
185            a: self,
186            b: other.peekable(),
187        }
188    }
189
190    /// Visits the values representing the difference, i.e., the values that are in `self` but not
191    /// in `other`.
192    fn difference<J>(self, other: J) -> Difference<Self, J>
193    where
194        J: SortedIterator<Item = Self::Item>,
195    {
196        Difference {
197            a: self,
198            b: other.peekable(),
199        }
200    }
201
202    /// Visits the values representing the symmetric difference, i.e., the values that are in
203    /// `self` or in `other` but not in both.
204    fn symmetric_difference<J>(self, other: J) -> SymmetricDifference<Self, J>
205    where
206        J: SortedIterator<Item = Self::Item>,
207    {
208        SymmetricDifference {
209            a: self.peekable(),
210            b: other.peekable(),
211        }
212    }
213
214    /// Creates an iterator that pairs each element of `self` with `()`. This transforms a
215    /// `SortedIterator` into a [`SortedPairIterator`].
216    fn pairs(self) -> Pairs<Self> {
217        Pairs { i: self }
218    }
219
220    /// Returns `true` if `self` has no elements in common with `other`. This is equivalent to
221    /// checking for an empty intersection.
222    fn is_disjoint<J>(self, mut other: J) -> bool
223    where
224        J: SortedIterator<Item = Self::Item>,
225        Self::Item: Ord,
226    {
227        let mut next_b = other.next();
228        'next_a: for a in self {
229            while let Some(b) = &next_b {
230                match a.cmp(b) {
231                    Ordering::Less => continue 'next_a,
232                    Ordering::Equal => return false,
233                    Ordering::Greater => next_b = other.next(),
234                }
235            }
236            break;
237        }
238        true
239    }
240
241    /// Returns `true` if this sorted iterator is a subset of another, i.e., `other` contains at
242    /// least all the values in `self`.
243    fn is_subset<J>(self, mut other: J) -> bool
244    where
245        J: SortedIterator<Item = Self::Item>,
246        Self::Item: Ord,
247    {
248        'next_a: for a in self {
249            while let Some(b) = other.next() {
250                match a.cmp(&b) {
251                    Ordering::Less => break,
252                    Ordering::Equal => continue 'next_a,
253                    Ordering::Greater => continue,
254                }
255            }
256            return false;
257        }
258        true
259    }
260
261    /// Returns `true` if this sorted iterator is a superset of another, i.e., `self` contains at
262    /// least all the values in `other`.
263    fn is_superset<J>(self, other: J) -> bool
264    where
265        J: SortedIterator<Item = Self::Item>,
266        Self::Item: Ord,
267    {
268        other.is_subset(self)
269    }
270}
271
272impl<I> SortedIterator for I where I: Iterator + SortedByItem {}
273
274/// Union of multiple sorted iterators.
275///
276/// An advantage of this function over multiple calls to `SortedIterator::union`
277/// is that the number of merged sequences does not need to be known at the
278/// compile time. The drawback lies in the fact that all iterators have to be
279/// of the same type.
280///
281/// The algorithmic complexity of fully consuming the resulting iterator is
282/// *O(N log(K))* where *N* is the total number of items that the input iterators
283/// yield and *K* is the number of input iterators.
284///
285/// # Examples
286///
287/// ```
288/// # extern crate maplit;
289/// # use maplit::*;
290/// # extern crate sorted_iter;
291/// # use std::collections::BTreeSet;
292/// use sorted_iter::multiway_union;
293///
294/// let sequences = vec![
295///     btreeset! { 0, 5, 10, 15, 20, 25 }.into_iter(),
296///     btreeset! { 0, 1, 4, 9, 16, 25, 36 }.into_iter(),
297///     btreeset! { 4, 7, 11, 15, 18 }.into_iter(),
298/// ];
299///
300/// assert_eq!(
301///     multiway_union(sequences).collect::<BTreeSet<u64>>(),
302///     btreeset! { 0, 1, 4, 5, 7, 9, 10, 11, 15, 16, 18, 20, 25, 36 }
303/// );
304/// ```
305pub fn multiway_union<T, I>(iters: T) -> MultiwayUnion<I>
306where
307    I: SortedIterator,
308    T: IntoIterator<Item = I>,
309    I::Item: Ord,
310{
311    MultiwayUnion::from_iter(iters)
312}
313
314/// relational operations for iterators of pairs where the items are sorted according to the key
315pub trait SortedPairIterator<K, V>: Iterator + Sized {
316    fn join<W, J: SortedPairIterator<K, W>>(self, that: J) -> Join<Self, J> {
317        Join {
318            a: self.peekable(),
319            b: that.peekable(),
320        }
321    }
322
323    fn left_join<W, J: SortedPairIterator<K, W>>(self, that: J) -> LeftJoin<Self, J> {
324        LeftJoin {
325            a: self.peekable(),
326            b: that.peekable(),
327        }
328    }
329
330    fn right_join<W, J: SortedPairIterator<K, W>>(self, that: J) -> RightJoin<Self, J> {
331        RightJoin {
332            a: self.peekable(),
333            b: that.peekable(),
334        }
335    }
336
337    fn outer_join<W, J: SortedPairIterator<K, W>>(self, that: J) -> OuterJoin<Self, J> {
338        OuterJoin {
339            a: self.peekable(),
340            b: that.peekable(),
341        }
342    }
343
344    fn map_values<W, F: (FnMut(V) -> W)>(self, f: F) -> MapValues<Self, F> {
345        MapValues { i: self, f }
346    }
347
348    fn filter_map_values<W, F: (FnMut(V) -> W)>(self, f: F) -> FilterMapValues<Self, F> {
349        FilterMapValues { i: self, f }
350    }
351
352    fn keys(self) -> Keys<Self> {
353        Keys { i: self }
354    }
355}
356
357impl<K, V, I> SortedPairIterator<K, V> for I where I: Iterator<Item = (K, V)> + SortedByKey {}
358
359pub mod assume {
360    //! extension traits for unchecked conversions from iterators to sorted iterators
361    use super::*;
362
363    /// extension trait for any iterator to add a assume_sorted_by_item method
364    pub trait AssumeSortedByItemExt: Iterator + Sized {
365        /// assume that the iterator is sorted by its item order
366        fn assume_sorted_by_item(self) -> AssumeSortedByItem<Self> {
367            AssumeSortedByItem { i: self }
368        }
369    }
370
371    impl<I: Iterator + Sized> AssumeSortedByItemExt for I {}
372
373    /// extension trait for any iterator of pairs to add a assume_sorted_by_key method
374    pub trait AssumeSortedByKeyExt: Iterator + Sized {
375        fn assume_sorted_by_key(self) -> AssumeSortedByKey<Self> {
376            AssumeSortedByKey { i: self }
377        }
378    }
379
380    impl<K, V, I: Iterator<Item = (K, V)> + Sized> AssumeSortedByKeyExt for I {}
381}
382
383#[cfg(test)]
384mod tests {
385    use super::*;
386
387    type Element = i64;
388    type Reference = std::collections::BTreeSet<Element>;
389
390    #[quickcheck]
391    fn disjoint(a: Reference, b: Reference) -> bool {
392        a.is_disjoint(&b) == a.iter().is_disjoint(b.iter())
393    }
394
395    #[quickcheck]
396    fn subset(a: Reference, b: Reference) -> bool {
397        a.is_subset(&b) == a.iter().is_subset(b.iter())
398    }
399
400    #[quickcheck]
401    fn superset(a: Reference, b: Reference) -> bool {
402        a.is_superset(&b) == a.iter().is_superset(b.iter())
403    }
404}