//! Module contains traits and `struct`s for collectors.
//!
//! # Unspecified behaviors
//!
//! Unless stated otherwise by the collector’s implementation, after any of
//! [`Collector::collect()`], [`Collector::collect_many()`], or
//! [`CollectorBase::break_hint()`] have returned [`Break(())`] once,
//! the behavior of subsequent calls to any method other than
//! [`finish()`](CollectorBase::finish) is unspecified.
//! Such calls may panic, overflow, or even resume accumulation
//! (similar to how [`Iterator::next()`] might yield again after returning [`None`]).
//! Callers should generally call [`finish()`](CollectorBase::finish) once a collector
//! has signaled a stop.
//! If callers cannot guarantee this, wrap the collector with [`fuse()`](CollectorBase::fuse).
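//!
//! The [`Iterator`] analogy can be reproduced with the standard library alone
//! (a self-contained sketch; no collector types involved): an unfused iterator may
//! yield again after returning [`None`], while [`Iterator::fuse`] pins it shut.
//! [`fuse()`](CollectorBase::fuse) plays the analogous role for collectors.
//!
//! ```
//! // A deliberately "unfused" iterator: yields 1, then `None`, then 2.
//! struct Flaky(u8);
//!
//! impl Iterator for Flaky {
//!     type Item = u8;
//!     fn next(&mut self) -> Option<u8> {
//!         self.0 += 1;
//!         match self.0 {
//!             1 => Some(1),
//!             2 => None,    // looks exhausted...
//!             3 => Some(2), // ...but resumes afterwards, which is allowed
//!             _ => None,
//!         }
//!     }
//! }
//!
//! let mut flaky = Flaky(0);
//! assert_eq!(flaky.next(), Some(1));
//! assert_eq!(flaky.next(), None);
//! assert_eq!(flaky.next(), Some(2)); // unspecified territory
//!
//! let mut fused = Flaky(0).fuse();
//! assert_eq!(fused.next(), Some(1));
//! assert_eq!(fused.next(), None);
//! assert_eq!(fused.next(), None); // `fuse()` guarantees `None` forever
//! ```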
//!
//! This looseness allows for optimizations (for example, omitting an internal "stopped" flag).
//!
//! Although the behavior is unspecified, none of the aforementioned methods are `unsafe`.
//! Implementors must **not** cause memory corruption, undefined behavior,
//! or any other safety violations, and callers must **not** rely on such outcomes.
//!
//! # Limitations and workarounds
//!
//! In some cases, you may need to explicitly annotate the parameter types in closures,
//! especially for adaptors that take generic functions.
//! This is due to current limitations in Rust’s type inference for closure parameters.
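//!
//! For example (a standard-library-only sketch; the same fix applies to this
//! crate's adaptors), a closure bound to a variable before being passed to a
//! generic function often needs its parameter type spelled out:
//!
//! ```
//! fn apply(f: impl for<'a> Fn(&'a str) -> usize) -> usize {
//!     f("hello")
//! }
//!
//! // Binding the closure first without `: &str` typically fails to satisfy
//! // the higher-ranked `for<'a>` bound ("implementation of `Fn` is not
//! // general enough"); the annotation keeps the closure fully general.
//! let f = |s: &str| s.len();
//! assert_eq!(apply(f), 5);
//! ```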
//!
//! Moreover, if you ever... (TODO: How to deal with "`collect` method not found,"
//! and "implementation of `FnMut` is not general enough")
//!
//! # Example
//!
//! Suppose we are building a tokenizer to process text for an NLP model.
//! We will skip all complicated details for now and simply collect every word we see.
//!
//! ```
//! use std::{ops::ControlFlow, collections::HashMap};
//! use better_collect::prelude::*;
//!
//! #[derive(Default)]
//! struct Tokenizer {
//!     indices: HashMap<String, usize>,
//!     words: Vec<String>,
//! }
//!
//! impl Tokenizer {
//!     fn tokenize(&self, sentence: &str) -> Vec<usize> {
//!         sentence
//!             .split_whitespace()
//!             .map(|word| self.indices.get(word).copied().unwrap_or(0))
//!             .collect()
//!     }
//! }
//!
//! // We have to implement this trait first.
//! impl CollectorBase for Tokenizer {
//!     // For now, for simplicity, we just return the struct itself.
//!     type Output = Self;
//!
//!     fn finish(self) -> Self::Output {
//!         // Just return itself.
//!         self
//!     }
//! }
//!
//! impl Collector<String> for Tokenizer {
//!     fn collect(&mut self, word: String) -> ControlFlow<()> {
//!         self.indices
//!             .entry(word)
//!             .or_insert_with_key(|word| {
//!                 self.words.push(word.clone());
//!                 // Index 0 is reserved for out-of-vocabulary words,
//!                 // so the first word gets index 1 (`len()` after the push).
//!                 self.words.len()
//!             });
//!
//!         // The tokenizer never stops accumulating.
//!         ControlFlow::Continue(())
//!     }
//! }
//!
//! let sentence = "the noble and the singer";
//! let tokenizer = sentence
//!     .split_whitespace()
//!     .map(String::from)
//!     .feed_into(Tokenizer::default());
//!
//! // "the" should only appear once.
//! assert_eq!(tokenizer.words, ["the", "noble", "and", "singer"]);
//! assert_eq!(tokenizer.tokenize("the singer and the swordswoman"), [1, 4, 3, 1, 0]);
//! ```
//!
//! [`Break(())`]: std::ops::ControlFlow::Break
pub use *;
pub use *;
pub use *;
pub use *;
pub use *;
pub use *;
pub use *;
pub const
pub const