Provides a composable, declarative way to consume an iterator.
If Iterator is the “source half” of a data pipeline, Collector is the “sink half” of the pipeline.
In other words, Iterator describes how to produce data, and Collector describes how to consume it.
§Motivation
Suppose we are given an array of i32 and we are asked to find its sum and maximum value.
What would be our approach?
- Approach 1: Two-pass
let nums = [1, 3, 2];
let sum: i32 = nums.into_iter().sum();
let max = nums.into_iter().max().unwrap();
assert_eq!(sum, 6);
assert_eq!(max, 3);
Cons: This performs two passes over the data, which is slower than a single pass.
That is fine for arrays, but can be much worse for HashSet, LinkedList,
or… data from an IO stream.
- Approach 2: Iterator::fold()
let nums = [1, 3, 2];
let (sum, max) = nums
.into_iter()
.fold((0, i32::MIN), |(sum, max), num| {
(sum + num, max.max(num))
});
assert_eq!(sum, 6);
assert_eq!(max, 3);
Cons: Not very declarative. The main logic is still rather procedural (we compute the sum and the max ourselves).
- Approach 3: Iterator::inspect()
let nums = [1, 3, 2];
let mut sum = 0;
let max = nums
.into_iter()
.inspect(|i| sum += i)
.max()
.unwrap();
assert_eq!(sum, 6);
assert_eq!(max, 3);
Cons: This approach has multiple drawbacks:
  - If the requirement changes to “calculate sum and find any negative value,” this approach may produce incorrect results. The “any” logic may short-circuit on finding the desired value, preventing the “sum” logic from summing every value. We might be able to rearrange things so that the “any” logic goes first, but if the requirement changes to “find any negative value and any even value,” there is no escape.
  - The state is kept outside the iterator, so the iterator can no longer go anywhere else (e.g. be sent to another thread or through a channel).
  - Very unintuitive and hack-y (hard to reason about).
  - And most importantly, not declarative enough.
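The short-circuit drawback can be reproduced with the standard library alone. The following is a minimal sketch, not part of this crate; the function name sum_and_any_negative is made up for illustration. Because any() stops at the first match, the inspect() closure never sees the remaining items.

```rust
/// Hypothetical helper: tries to compute the sum and an "any negative" flag
/// in one pass by smuggling the sum through `inspect()`.
fn sum_and_any_negative(nums: &[i32]) -> (i32, bool) {
    let mut sum = 0;
    let any_negative = nums
        .iter()
        // Side effect: accumulate every item that flows past this point.
        .inspect(|n| sum += **n)
        // `any()` short-circuits, so items after the first negative value
        // are never pulled through `inspect()`.
        .any(|n| *n < 0);
    (sum, any_negative)
}
```

For the input [1, -3, 2] the real sum is 0, but this function reports -2 because iteration stops at -3. For an input without negative values, such as [1, 3, 2], the result happens to be correct, which makes the bug easy to miss.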
This crate proposes a one-pass, declarative approach:
use better_collect::{prelude::*, num::Sum, cmp::Max};
let nums = [1, 3, 2];
let (sum, max) = nums
.into_iter()
.better_collect(Sum::<i32>::new().combine(Max::new()));
assert_eq!(sum, 6);
assert_eq!(max.unwrap(), 3);
This approach is both one-pass and declarative, and it is also composable (more on this later).
That was with integers only. What about a non-Copy type?
// Suppose we open a connection...
fn socket_stream() -> impl Iterator<Item = String> {
["the", "noble", "and", "the", "singer"]
.into_iter()
.map(String::from)
}
// Task: return the following:
// - An array of data from the stream.
// - How many bytes were read.
// - The last-seen data.
// Usually, we're pretty much stuck with a loop (a traditional `for`, `(try_)fold`, or `(try_)for_each`).
// No common existing tool can help us here:
let mut received = vec![];
let mut byte_read = 0_usize;
let mut last_seen = None;
for data in socket_stream() {
received.push(data.clone());
byte_read += data.len();
last_seen = Some(data);
}
let expected = (received, byte_read, last_seen);
// This crate's way:
use better_collect::{prelude::*, Last, num::Sum};
let ((received, byte_read), last_seen) = socket_stream()
.better_collect(
vec![]
.into_collector()
.cloning()
// Use `map_ref` so that our collector is a `RefCollector`
// (only a `RefCollector` is `combine`-able)
.combine(Sum::<usize>::new().map_ref(|data: &mut String| data.len()))
.combine(Last::new())
);
assert_eq!((received, byte_read, last_seen), expected);
Very declarative! We describe what we want to collect.
You might think this is just like Iterator::unzip(), but this crate does a bit better:
it can split the data and feed each collector separately WITHOUT additional allocation.
To demonstrate the difference, take this example:
use std::collections::HashSet;
use better_collect::prelude::*;
// Suppose we open a connection...
fn socket_stream() -> impl Iterator<Item = String> {
["the", "noble", "and", "the", "singer"]
.into_iter()
.map(String::from)
}
// Task: Collect UNIQUE chunks of data and concatenate them.
// `Iterator::unzip`
let (chunks, concatenated_data): (HashSet<_>, String) = socket_stream()
// Sad. We have to clone.
// We can't take a reference, since the referenced data is returned too.
.map(|chunk| (chunk.clone(), chunk))
.unzip();
let unzip_way = (concatenated_data, chunks);
// Another approach is to do two passes (collect into a `Vec`, then iterate),
// which still costs another allocation,
// or to use `Iterator::fold`, which is procedural.
// `Collector`
let collector_way = socket_stream()
// No clone. The data flows smoothly.
.better_collect(ConcatString::new().combine(HashSet::new()));
assert_eq!(unzip_way, collector_way);
§Traits
§Main traits
Unlike std::iter, which is built around a single main trait (Iterator), this crate defines two main traits. Roughly:
use std::ops::ControlFlow;
pub trait Collector {
type Item;
type Output
where
Self: Sized;
fn collect(&mut self, item: Self::Item) -> ControlFlow<()>;
fn finish(self) -> Self::Output
where
Self: Sized;
}
pub trait RefCollector: Collector {
fn collect_ref(&mut self, item: &mut Self::Item) -> ControlFlow<()>;
}
Collector is similar to Extend, but it also returns a ControlFlow
value to indicate whether it should stop accumulating items after a call to
collect().
This serves as a hint for adaptors like combine() or chain()
to “vectorize” the remaining items to another collector.
In short, it is like a composable Extend.
RefCollector is a collector that does not require ownership of an item
to process it.
This allows items to flow through multiple collectors without being consumed,
avoiding unnecessary cloning.
It powers combine(), which creates a pipeline of collectors,
letting each item pass through safely by reference until the final collector
takes ownership.
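To make the division of labor concrete, here is a self-contained sketch that does not use this crate. The two traits are simplified copies of the rough definitions above (the Sized bounds are dropped), and TotalLen, FirstN, Pair, pair, and drive are hypothetical names invented for illustration. Pair only approximates what combine() is described as doing; the crate's real implementation may differ.

```rust
use std::ops::ControlFlow;

// Simplified stand-ins for the traits sketched above (assumption: no `Sized` bounds).
trait Collector {
    type Item;
    type Output;
    fn collect(&mut self, item: Self::Item) -> ControlFlow<()>;
    fn finish(self) -> Self::Output;
}

trait RefCollector: Collector {
    fn collect_ref(&mut self, item: &mut Self::Item) -> ControlFlow<()>;
}

/// Hypothetical collector: sums the byte lengths of the strings it sees.
/// It only needs to look at each item, so it can be a `RefCollector`.
struct TotalLen(usize);

impl Collector for TotalLen {
    type Item = String;
    type Output = usize;
    fn collect(&mut self, mut item: String) -> ControlFlow<()> {
        self.collect_ref(&mut item)
    }
    fn finish(self) -> usize {
        self.0
    }
}

impl RefCollector for TotalLen {
    fn collect_ref(&mut self, item: &mut String) -> ControlFlow<()> {
        self.0 += item.len();
        ControlFlow::Continue(())
    }
}

/// Hypothetical collector: keeps the first `limit` items, then signals `Break`.
struct FirstN {
    items: Vec<String>,
    limit: usize,
}

impl Collector for FirstN {
    type Item = String;
    type Output = Vec<String>;
    fn collect(&mut self, item: String) -> ControlFlow<()> {
        if self.items.len() < self.limit {
            self.items.push(item);
        }
        if self.items.len() < self.limit {
            ControlFlow::Continue(())
        } else {
            ControlFlow::Break(())
        }
    }
    fn finish(self) -> Vec<String> {
        self.items
    }
}

/// Hypothetical pairing: the first collector borrows each item,
/// the second one takes ownership. No clone is needed.
struct Pair<A, B> {
    first: A,
    second: B,
    first_done: bool,
    second_done: bool,
}

fn pair<A, B>(first: A, second: B) -> Pair<A, B> {
    Pair { first, second, first_done: false, second_done: false }
}

impl<A, B> Collector for Pair<A, B>
where
    A: RefCollector,
    B: Collector<Item = A::Item>,
{
    type Item = A::Item;
    type Output = (A::Output, B::Output);
    fn collect(&mut self, mut item: Self::Item) -> ControlFlow<()> {
        // Stop feeding a side once it has reported `Break`.
        if !self.first_done {
            self.first_done = self.first.collect_ref(&mut item).is_break();
        }
        if !self.second_done {
            self.second_done = self.second.collect(item).is_break();
        }
        if self.first_done && self.second_done {
            ControlFlow::Break(())
        } else {
            ControlFlow::Continue(())
        }
    }
    fn finish(self) -> Self::Output {
        (self.first.finish(), self.second.finish())
    }
}

/// Hypothetical driver: feeds items until the collector asks to stop.
fn drive<C: Collector>(iter: impl IntoIterator<Item = C::Item>, mut collector: C) -> C::Output {
    for item in iter {
        if collector.collect(item).is_break() {
            break;
        }
    }
    collector.finish()
}
```

In this sketch, calling drive(words, pair(TotalLen(0), FirstN { items: vec![], limit: 2 })) keeps feeding TotalLen after FirstN has stopped, and the driver only exits early once both sides have reported Break.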
§Other traits
BetterCollect extends Iterator with the
better_collect() method, which feeds all items from an iterator
into a Collector and returns the collector’s result.
To use this method, the BetterCollect trait must be imported.
IntoCollector is a conversion trait that converts a type into a Collector.
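The two helper traits follow familiar Rust patterns: an extension trait with a blanket implementation, and a conversion trait in the spirit of IntoIterator. The sketch below shows those patterns in isolation and does not use this crate. CollectExt, collect_with, VecCollector, and the associated type Target are hypothetical names, and the choice to keep a Vec's existing elements when converting it is an assumption of the sketch, not a documented guarantee.

```rust
use std::ops::ControlFlow;

// Simplified stand-in for the crate's `Collector` trait.
trait Collector {
    type Item;
    type Output;
    fn collect(&mut self, item: Self::Item) -> ControlFlow<()>;
    fn finish(self) -> Self::Output;
}

// Stand-in for a conversion trait like `IntoCollector`
// (the associated type name `Target` is hypothetical).
trait IntoCollector {
    type Item;
    type Target: Collector<Item = Self::Item>;
    fn into_collector(self) -> Self::Target;
}

/// Hypothetical collector that pushes every item into a `Vec`.
struct VecCollector<T>(Vec<T>);

impl<T> Collector for VecCollector<T> {
    type Item = T;
    type Output = Vec<T>;
    fn collect(&mut self, item: T) -> ControlFlow<()> {
        self.0.push(item);
        // A `Vec` can always accept more items.
        ControlFlow::Continue(())
    }
    fn finish(self) -> Vec<T> {
        self.0
    }
}

impl<T> IntoCollector for Vec<T> {
    type Item = T;
    type Target = VecCollector<T>;
    fn into_collector(self) -> VecCollector<T> {
        // Assumption: existing elements are kept, similar to `Extend`.
        VecCollector(self)
    }
}

// Stand-in for an extension trait like `BetterCollect`.
trait CollectExt: Iterator + Sized {
    fn collect_with<C>(self, mut collector: C) -> C::Output
    where
        C: Collector<Item = Self::Item>,
    {
        for item in self {
            if collector.collect(item).is_break() {
                break;
            }
        }
        collector.finish()
    }
}

// Blanket implementation: every iterator gets `collect_with`
// once `CollectExt` is in scope.
impl<I: Iterator> CollectExt for I {}
```

The blanket implementation is why the method only becomes available after the trait is imported: method resolution considers a trait's methods only when the trait is in scope.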
More types, traits and functions can be found in this crate’s documentation.
§Features
- alloc: Enables collectors and implementations for types in the alloc crate (e.g., Vec, VecDeque, BTreeSet).
- std (default): Enables the alloc feature and implementations for std-only types (e.g., HashMap). When this feature is disabled, the crate builds in no_std mode.
- unstable: Enables experimental and unstable features. Items gated behind this feature do not follow normal semver guarantees and may change or be removed at any time.
  Although the crate as a whole is technically still experimental, the items under unstable are even more experimental, and using them is generally discouraged until their designs are finalized and they are no longer behind this flag.
Modules§
- aggregate (unstable): Module containing items for aggregation.
- cmp: Collectors for comparing items.
- collections (alloc): Collectors for collections in the standard library.
- num: Numeric-related Collectors.
- prelude: Re-exports commonly used items from this crate.
- string (alloc): String-related Collectors.
- sync (std): This module corresponds to std::sync.
- unit: Collectors for the unit type ().
- vec (alloc): Collectors for Vec.
Macros§
- aggregate_struct (unstable): Combines multiple aggregate ops into a single “struct-based” aggregate.
Structs§
- All: A Collector that tests whether all collected items satisfy a predicate.
- AllRef: A RefCollector that tests whether all collected items satisfy a predicate.
- Any: A Collector that tests whether any collected item satisfies a predicate.
- AnyRef: A RefCollector that tests whether any collected item satisfies a predicate.
- Chain: A Collector that feeds the first collector until it stops accumulating, then feeds the second collector.
- Cloning: A RefCollector that clones every collected item.
- Combine: A Collector that lets both collectors collect the same item.
- Copying: A RefCollector that copies every collected item.
- Count: A RefCollector that counts the number of items it collects.
- Driver (unstable): An Iterator that “drives” the underlying iterator to feed the underlying collector.
- Filter: A Collector that uses a closure to determine whether an item should be collected.
- Find: A Collector that searches for the first item satisfying a predicate.
- Fold: A Collector that accumulates items using a function.
- FoldRef: A RefCollector that accumulates items using a function.
- Funnel: A RefCollector that maps a mutable reference to an item into another mutable reference.
- Fuse: A Collector that can “safely” collect items even after the underlying collector has stopped accumulating, without triggering undesired behaviors.
- Last: A Collector that stores the last item it collects.
- Map: A Collector that calls a closure on each item before collecting.
- MapOutput: Creates a Collector that transforms the final accumulated result.
- MapRef: A RefCollector that calls a closure on each item by mutable reference before collecting.
- Nest (unstable): A Collector that collects all outputs produced by an inner collector.
- NestExact (unstable): A Collector that collects all outputs produced by an inner collector.
- Partition: A Collector that distributes items between two collectors based on a predicate.
- Product: A Collector that computes the product of all collected items.
- Reduce: A Collector that reduces all collected items into a single value by repeatedly applying a reduction function.
- Sink: A RefCollector that collects items… but no one knows where they go.
- Skip: A Collector that skips the first n collected items before it begins accumulating them.
- Sum: A Collector that computes the sum of all collected items.
- Take: A Collector that stops accumulating after collecting the first n items.
- TakeWhile: A Collector that accumulates items as long as a predicate returns true.
- TryFold: A Collector that accumulates items using a function as long as the function returns successfully.
- TryFoldRef: A RefCollector that accumulates items by mutable reference using a function as long as the function returns successfully.
- Unbatching: A Collector with a custom collection logic.
- UnbatchingRef: A RefCollector with a custom collection logic.
- Unzip: A Collector that destructures each 2-tuple (A, B) item and distributes its fields: A goes to the first collector, and B goes to the second collector.
Traits§
- BetterCollect: Extends Iterator with the better_collect method for working seamlessly with Collectors.
- Collector: Collects items and produces a final output.
- CollectorByMut: A type that can be converted into a collector by mutable reference.
- CollectorByRef: A type that can be converted into a collector by shared reference.
- IntoCollector: Conversion into a Collector.
- RefCollector: A Collector that can also collect items by mutable reference.