The centerpiece of this crate is the ByteChunker, which takes a regular
expression and wraps a Read type, becoming an iterator
over the bytes read from the wrapped type, yielding chunks delimited by
the supplied regular expression.
The example program below uses a ByteChunker to do a crude word
tally on text coming in on the standard input.

    use std::{collections::BTreeMap, error::Error};
    use regex_chunker::ByteChunker;

    fn main() -> Result<(), Box<dyn Error>> {
        let mut counts: BTreeMap<String, usize> = BTreeMap::new();
        let stdin = std::io::stdin();

        // The regex is a stab at something matching strings of
        // "between-word" characters in general English text.
        let chunker = ByteChunker::new(stdin, r#"[ "\r\n.,!?:;/]+"#)?;
        for chunk in chunker {
            let word = String::from_utf8_lossy(&chunk?).to_lowercase();
            *counts.entry(word).or_default() += 1;
        }

        println!("{:#?}", &counts);
        Ok(())
    }

Enabling the async feature also exposes the stream module, which
features an async version of the ByteChunker, wrapping an AsyncRead
and implementing Stream. (This also pulls in several crates of
tokio machinery, which is why it’s behind a feature flag.)
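
To make the idea of chunking a byte stream on delimiters concrete without
depending on the crate, here is a minimal std-only sketch. It is not how
ByteChunker is implemented: DelimChunker, is_between_words and
collect_words are hypothetical names invented for this illustration, and a
plain byte predicate stands in for the regular expression. The sketch also
drops empty chunks, which is an assumption of the sketch and may differ
from the crate's handling of leading or trailing matches.

```rust
use std::io::{self, BufReader, Bytes, Read};

/// Hypothetical stand-in for a streaming chunker: it splits a byte source
/// on runs of delimiter bytes rather than on regex matches.
struct DelimChunker<R: Read> {
    bytes: Bytes<BufReader<R>>,
    is_delim: fn(u8) -> bool,
}

impl<R: Read> DelimChunker<R> {
    fn new(source: R, is_delim: fn(u8) -> bool) -> Self {
        DelimChunker {
            bytes: BufReader::new(source).bytes(),
            is_delim,
        }
    }
}

impl<R: Read> Iterator for DelimChunker<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut chunk = Vec::new();
        // Pull bytes one at a time; only the current chunk is accumulated.
        for byte in self.bytes.by_ref() {
            match byte {
                Err(e) => return Some(Err(e)),
                Ok(b) if (self.is_delim)(b) => {
                    // A delimiter ends the current chunk. Empty chunks are
                    // skipped, so a run of delimiters acts as one separator.
                    if !chunk.is_empty() {
                        return Some(Ok(chunk));
                    }
                }
                Ok(b) => chunk.push(b),
            }
        }
        // Source exhausted: emit whatever is left, if anything.
        if chunk.is_empty() {
            None
        } else {
            Some(Ok(chunk))
        }
    }
}

/// Same "between-word" characters as the regex in the example above.
fn is_between_words(b: u8) -> bool {
    matches!(
        b,
        b' ' | b'"' | b'\r' | b'\n' | b'.' | b',' | b'!' | b'?' | b':' | b';' | b'/'
    )
}

/// Chunk an in-memory string (a &[u8] implements Read) into lowercase words.
fn collect_words(text: &str) -> io::Result<Vec<String>> {
    DelimChunker::new(text.as_bytes(), is_between_words)
        .map(|chunk| chunk.map(|bytes| String::from_utf8_lossy(&bytes).to_lowercase()))
        .collect()
}

fn main() -> io::Result<()> {
    let words = collect_words("Hello, world! Hello again.")?;
    println!("{:?}", words); // prints ["hello", "world", "hello", "again"]
    Ok(())
}
```

Because the iterator yields io::Result items, a read error surfaces at the
chunk where it occurs, much as the `chunk?` in the word-tally example
propagates errors from the real chunker.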
Modules
- stream (async feature): Asynchronous analogs to the base *Chunker types that wrap Tokio’s AsyncRead types and implement Stream.
Structs
- ByteChunker: The ByteChunker takes a bytes::Regex, wraps a byte source (that is, a type that implements std::io::Read) and iterates over chunks of bytes from that source that are delimited by the regular expression. It operates very much like bytes::Regex::split, except that it works on an incoming stream of bytes instead of a necessarily-already-in-memory slice.
- CustomChunker: A chunker that has additionally been supplied with an Adapter, so it can produce arbitrary types. The CustomChunker does not have a separate constructor; it is built by combining a ByteChunker with an Adapter using ByteChunker::with_adapter.
- SimpleCustomChunker: A version of CustomChunker that takes a SimpleAdapter type.
- StringAdapter: An example Adapter type for producing a chunker that yields Strings.
Enums
- ErrorResponse: Type for specifying a Chunker’s behavior upon encountering an error.
- MatchDisposition: Specify what the chunker should do with the matched text.
- RcErr: Wraps various types of errors that can happen in the internals of a Chunker. The way Chunkers respond to and report these errors can be controlled through builder-pattern methods that take the ErrorResponse and Utf8FailureMode types.
- Utf8FailureMode: Type for specifying a StringAdapter’s behavior upon encountering non-UTF-8 data.
Traits
- Adapter: Trait used to implement a CustomChunker by transforming the output of a ByteChunker.
- SimpleAdapter: Simpler, less flexible version of the Adapter trait.