sorted_iter/lib.rs
1//! This crate provides set and relational operations for all iterators in the standard library that are known
2//! at compile time to be sorted.
3//!
4//! # Set operations
5//! ```
6//! # extern crate maplit;
7//! # use maplit::*;
8//! # extern crate sorted_iter;
9//! use sorted_iter::SortedIterator;
10//!
11//! let primes = btreeset! { 2, 3, 5, 7, 11, 13u64 }.into_iter();
12//! let fibs = btreeset! { 1, 2, 3, 5, 8, 13u64 }.into_iter();
13//! let fib_primes = primes.intersection(fibs);
14//! ```
15//!
16//! It is possible to efficiently define set operations on sorted iterators. Sorted iterators are
17//! very common in the standard library. E.g. the elements of a [BTreeSet] or the keys of a [BTreeMap]
18//! are guaranteed to be sorted according to the element order, as are iterable ranges like `0..100`.
19//!
20//! There are also a number of operations on iterators that preserve the sort order. E.g. if an
21//! iterator is sorted, [take], [take_while] etc. are going to result in a sorted iterator as well.
22//!
23//! Since the complete types of iterators are typically visible in rust, it is possible to encode these
24//! rules at type level. This is what this crate does.
25//!
26//! For available set operations, see [SortedIterator].
27//! For sorted iterators in the std lib, see instances for the [SortedByItem] marker trait.
28//!
29//! # Relational operations
30//! ```
31//! # extern crate maplit;
32//! # use maplit::*;
33//! # extern crate sorted_iter;
34//! use sorted_iter::SortedPairIterator;
35//!
36//! let cities = btreemap! {
37//! 1 => "New York",
38//! 2 => "Tokyo",
39//! 3u8 => "Berlin"
40//! }.into_iter();
41//! let countries = btreemap! {
42//! 1 => "USA",
43//! 2 => "Japan",
44//! 3u8 => "Germany"
45//! }.into_iter();
46//! let cities_and_countries = cities.join(countries);
47//! ```
48//!
49//! Iterators of pairs that are sorted according to the first element / key are also very common in
50//! the standard library and elsewhere. E.g. the elements of a [BTreeMap] are guaranteed to be sorted
51//! according to the key order.
52//!
53//! The same rules as for sorted iterators apply for preservation of the sort order, except that there
54//! are some additional operations that preserve sort order. Anything that only operates on the value,
55//! like e.g. map or filter_map on the value, is guaranteed to preserve the sort order.
56//!
57//! The operations that can be defined on sorted pair operations are the relational operations known
58//! from relational algebra / SQL, namely join, left_join, right_join and outer_join.
59//!
60//! For available relational operations, see [SortedPairIterator].
61//! For sorted iterators in the std lib, see instances the for [SortedByKey] marker trait.
62//!
63//! # Transformations that retain order are allowed
64//! ```
65//! # extern crate sorted_iter;
66//! use sorted_iter::*;
67//!
68//! let odd = (1..31).step_by(2);
69//! let multiples_of_3 = (3..30).step_by(3);
70//! let either = odd.union(multiples_of_3);
71//! ```
72//!
73//! # Transformations that can change the order lose the sorted property
74//! ```compile_fail
75//! # extern crate sorted_iter;
76//! use sorted_iter::*;
77//!
78//! // we have no idea what map does to the order. could be anything!
79//! let a = (1..31).map(|x| -x);
80//! let b = (3..30).step_by(3);
81//! let either = a.union(b); // does not compile!
82//! ```
83//!
84//! # Assuming sort ordering
85//!
86//! For most std lib iterators, this library already provides instances. But there will occasionally be an iterator
87//! from a third party library where you *know* that it is properly sorted.
88//!
89//! For this case, there is an escape hatch:
90//!
91//! ```
92//! // the assume_ extensions have to be implicitly imported
93//! use sorted_iter::*;
94//! use sorted_iter::assume::*;
95//! let odd = vec![1,3,5,7u8].into_iter().assume_sorted_by_item();
96//! let even = vec![2,4,6,8u8].into_iter().assume_sorted_by_item();
97//! let all = odd.union(even);
98//!
99//! let cities = vec![(1u8, "New York")].into_iter().assume_sorted_by_key();
100//! let countries = vec![(1u8, "USA")].into_iter().assume_sorted_by_key();
101//! let cities_and_countries = cities.join(countries);
102//! ```
103//!
104//! # Marking your own iterators
105//!
106//! If you have a library and want to mark some iterators as sorted, this is possible by implementing the
107//! appropriate marker trait, [SortedByItem] or [SortedByKey].
108//!
109//! ```
110//! # extern crate sorted_iter;
111//! // marker traits are not at top level, since usually you don't need them
112//! use sorted_iter::sorted_iterator::SortedByItem;
113//! use sorted_iter::sorted_pair_iterator::SortedByKey;
114//!
115//! pub struct MySortedIter<T> { whatever: T }
116//! pub struct MySortedPairIter<K, V> { whatever: (K, V) }
117//!
118//! impl<T> SortedByItem for MySortedIter<T> {}
119//! impl<K, V> SortedByKey for MySortedPairIter<K, V> {}
120//! ```
121//!
122//! By reexporting the extension traits, you get a seamless experience for people using your library.
123//!
124//! ```
125//! extern crate sorted_iter;
126//! pub use sorted_iter::{SortedIterator, SortedPairIterator};
127//! ```
128//!
129//! ## Tests
130//!
131//! Tests are done using the fantastic [quickcheck] crate, by comparing against the operations defined on
132//! [BTreeSet] and [BTreeMap].
133//!
134//! [SortedIterator]: trait.SortedIterator.html
135//! [SortedPairIterator]: trait.SortedPairIterator.html
136//! [SortedByItem]: sorted_iterator/trait.SortedByItem.html
137//! [SortedByKey]: sorted_pair_iterator/trait.SortedByKey.html
138//! [quickcheck]: https://github.com/BurntSushi/quickcheck
139//! [BTreeSet]: https://doc.rust-lang.org/std/collections/struct.BTreeSet.html
140//! [BTreeMap]: https://doc.rust-lang.org/std/collections/struct.BTreeMap.html
141//! [take]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.take
142//! [take_while]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.take_while
143//! [Ord]: https://doc.rust-lang.org/std/cmp/trait.Ord.html
144#[cfg(test)]
145extern crate quickcheck;
146
147#[cfg(test)]
148#[macro_use(quickcheck)]
149extern crate quickcheck_macros;
150
151use core::cmp::Ordering;
152
153pub mod one_or_less_iterator;
154pub mod sorted_iterator;
155pub mod sorted_pair_iterator;
156
157pub use one_or_less_iterator::OneOrLess;
158
159use crate::sorted_iterator::*;
160use crate::sorted_pair_iterator::*;
161
162#[deny(missing_docs)]
163
164/// set operations for iterators where the items are sorted according to the natural order
165pub trait SortedIterator: Iterator + Sized {
166 /// Visits the values representing the union, i.e., all the values in `self` or `other`,
167 /// without duplicates.
168 fn union<J>(self, other: J) -> Union<Self, J>
169 where
170 J: SortedIterator<Item = Self::Item>,
171 {
172 Union {
173 a: self.peekable(),
174 b: other.peekable(),
175 }
176 }
177
178 /// Visits the values representing the intersection, i.e., the values that are both in `self`
179 /// and `other`.
180 fn intersection<J>(self, other: J) -> Intersection<Self, J>
181 where
182 J: SortedIterator<Item = Self::Item>,
183 {
184 Intersection {
185 a: self,
186 b: other.peekable(),
187 }
188 }
189
190 /// Visits the values representing the difference, i.e., the values that are in `self` but not
191 /// in `other`.
192 fn difference<J>(self, other: J) -> Difference<Self, J>
193 where
194 J: SortedIterator<Item = Self::Item>,
195 {
196 Difference {
197 a: self,
198 b: other.peekable(),
199 }
200 }
201
202 /// Visits the values representing the symmetric difference, i.e., the values that are in
203 /// `self` or in `other` but not in both.
204 fn symmetric_difference<J>(self, other: J) -> SymmetricDifference<Self, J>
205 where
206 J: SortedIterator<Item = Self::Item>,
207 {
208 SymmetricDifference {
209 a: self.peekable(),
210 b: other.peekable(),
211 }
212 }
213
214 /// Creates an iterator that pairs each element of `self` with `()`. This transforms a
215 /// `SortedIterator` into a [`SortedPairIterator`].
216 fn pairs(self) -> Pairs<Self> {
217 Pairs { i: self }
218 }
219
220 /// Returns `true` if `self` has no elements in common with `other`. This is equivalent to
221 /// checking for an empty intersection.
222 fn is_disjoint<J>(self, mut other: J) -> bool
223 where
224 J: SortedIterator<Item = Self::Item>,
225 Self::Item: Ord,
226 {
227 let mut next_b = other.next();
228 'next_a: for a in self {
229 while let Some(b) = &next_b {
230 match a.cmp(b) {
231 Ordering::Less => continue 'next_a,
232 Ordering::Equal => return false,
233 Ordering::Greater => next_b = other.next(),
234 }
235 }
236 break;
237 }
238 true
239 }
240
241 /// Returns `true` if this sorted iterator is a subset of another, i.e., `other` contains at
242 /// least all the values in `self`.
243 fn is_subset<J>(self, mut other: J) -> bool
244 where
245 J: SortedIterator<Item = Self::Item>,
246 Self::Item: Ord,
247 {
248 'next_a: for a in self {
249 while let Some(b) = other.next() {
250 match a.cmp(&b) {
251 Ordering::Less => break,
252 Ordering::Equal => continue 'next_a,
253 Ordering::Greater => continue,
254 }
255 }
256 return false;
257 }
258 true
259 }
260
261 /// Returns `true` if this sorted iterator is a superset of another, i.e., `self` contains at
262 /// least all the values in `other`.
263 fn is_superset<J>(self, other: J) -> bool
264 where
265 J: SortedIterator<Item = Self::Item>,
266 Self::Item: Ord,
267 {
268 other.is_subset(self)
269 }
270}
271
272impl<I> SortedIterator for I where I: Iterator + SortedByItem {}
273
274/// Union of multiple sorted iterators.
275///
276/// An advantage of this function over multiple calls to `SortedIterator::union`
277/// is that the number of merged sequences does not need to be known at the
278/// compile time. The drawback lies in the fact that all iterators have to be
279/// of the same type.
280///
281/// The algorithmic complexity of fully consuming the resulting iterator is
282/// *O(N log(K))* where *N* is the total number of items that the input iterators
283/// yield and *K* is the number of input iterators.
284///
285/// # Examples
286///
287/// ```
288/// # extern crate maplit;
289/// # use maplit::*;
290/// # extern crate sorted_iter;
291/// # use std::collections::BTreeSet;
292/// use sorted_iter::multiway_union;
293///
294/// let sequences = vec![
295/// btreeset! { 0, 5, 10, 15, 20, 25 }.into_iter(),
296/// btreeset! { 0, 1, 4, 9, 16, 25, 36 }.into_iter(),
297/// btreeset! { 4, 7, 11, 15, 18 }.into_iter(),
298/// ];
299///
300/// assert_eq!(
301/// multiway_union(sequences).collect::<BTreeSet<u64>>(),
302/// btreeset! { 0, 1, 4, 5, 7, 9, 10, 11, 15, 16, 18, 20, 25, 36 }
303/// );
304/// ```
305pub fn multiway_union<T, I>(iters: T) -> MultiwayUnion<I>
306where
307 I: SortedIterator,
308 T: IntoIterator<Item = I>,
309 I::Item: Ord,
310{
311 MultiwayUnion::from_iter(iters)
312}
313
314/// relational operations for iterators of pairs where the items are sorted according to the key
315pub trait SortedPairIterator<K, V>: Iterator + Sized {
316 fn join<W, J: SortedPairIterator<K, W>>(self, that: J) -> Join<Self, J> {
317 Join {
318 a: self.peekable(),
319 b: that.peekable(),
320 }
321 }
322
323 fn left_join<W, J: SortedPairIterator<K, W>>(self, that: J) -> LeftJoin<Self, J> {
324 LeftJoin {
325 a: self.peekable(),
326 b: that.peekable(),
327 }
328 }
329
330 fn right_join<W, J: SortedPairIterator<K, W>>(self, that: J) -> RightJoin<Self, J> {
331 RightJoin {
332 a: self.peekable(),
333 b: that.peekable(),
334 }
335 }
336
337 fn outer_join<W, J: SortedPairIterator<K, W>>(self, that: J) -> OuterJoin<Self, J> {
338 OuterJoin {
339 a: self.peekable(),
340 b: that.peekable(),
341 }
342 }
343
344 fn map_values<W, F: (FnMut(V) -> W)>(self, f: F) -> MapValues<Self, F> {
345 MapValues { i: self, f }
346 }
347
348 fn filter_map_values<W, F: (FnMut(V) -> W)>(self, f: F) -> FilterMapValues<Self, F> {
349 FilterMapValues { i: self, f }
350 }
351
352 fn keys(self) -> Keys<Self> {
353 Keys { i: self }
354 }
355}
356
357impl<K, V, I> SortedPairIterator<K, V> for I where I: Iterator<Item = (K, V)> + SortedByKey {}
358
359pub mod assume {
360 //! extension traits for unchecked conversions from iterators to sorted iterators
361 use super::*;
362
363 /// extension trait for any iterator to add a assume_sorted_by_item method
364 pub trait AssumeSortedByItemExt: Iterator + Sized {
365 /// assume that the iterator is sorted by its item order
366 fn assume_sorted_by_item(self) -> AssumeSortedByItem<Self> {
367 AssumeSortedByItem { i: self }
368 }
369 }
370
371 impl<I: Iterator + Sized> AssumeSortedByItemExt for I {}
372
373 /// extension trait for any iterator of pairs to add a assume_sorted_by_key method
374 pub trait AssumeSortedByKeyExt: Iterator + Sized {
375 fn assume_sorted_by_key(self) -> AssumeSortedByKey<Self> {
376 AssumeSortedByKey { i: self }
377 }
378 }
379
380 impl<K, V, I: Iterator<Item = (K, V)> + Sized> AssumeSortedByKeyExt for I {}
381}
382
383#[cfg(test)]
384mod tests {
385 use super::*;
386
387 type Element = i64;
388 type Reference = std::collections::BTreeSet<Element>;
389
390 #[quickcheck]
391 fn disjoint(a: Reference, b: Reference) -> bool {
392 a.is_disjoint(&b) == a.iter().is_disjoint(b.iter())
393 }
394
395 #[quickcheck]
396 fn subset(a: Reference, b: Reference) -> bool {
397 a.is_subset(&b) == a.iter().is_subset(b.iter())
398 }
399
400 #[quickcheck]
401 fn superset(a: Reference, b: Reference) -> bool {
402 a.is_superset(&b) == a.iter().is_superset(b.iter())
403 }
404}