Crate regex_chunker
The centerpiece of this crate is the ByteChunker, which takes a regular expression and wraps a Read type, becoming an iterator over the bytes read from the wrapped type, yielding chunks delimited by the supplied regular expression.
The example program below uses a ByteChunker to do a crude word tally on text coming in on the standard input.
use std::{collections::BTreeMap, error::Error};
use regex_chunker::ByteChunker;

fn main() -> Result<(), Box<dyn Error>> {
    let mut counts: BTreeMap<String, usize> = BTreeMap::new();
    let stdin = std::io::stdin();
    // The regex is a stab at something matching strings of
    // "between-word" characters in general English text.
    let chunker = ByteChunker::new(stdin, r#"[ "\r\n.,!?:;/]+"#)?;
    for chunk in chunker {
        let word = String::from_utf8_lossy(&chunk?).to_lowercase();
        *counts.entry(word).or_default() += 1;
    }
    println!("{:#?}", &counts);
    Ok(())
}
Enabling the async feature also exposes the stream module, which features an async version of the ByteChunker, wrapping an AsyncRead and implementing Stream. (This also pulls in several crates of tokio machinery, which is why it’s behind a feature flag.)
Modules
- stream (requires the async feature)
Structs
- The ByteChunker takes a bytes::Regex, wraps a byte source (that is, a type that implements std::io::Read) and iterates over chunks of bytes from that source that are delimited by the regular expression. It operates very much like bytes::Regex::split, except that it works on an incoming stream of bytes instead of a necessarily-already-in-memory slice.
- A chunker that has additionally been supplied with an Adapter, so it can produce arbitrary types. A CustomChunker does not have a separate constructor; it is built by combining a ByteChunker with an Adapter using ByteChunker::with_adapter.
- A version of CustomChunker that takes a SimpleAdapter type.
- An example Adapter type for producing a chunker that yields Strings.
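The streaming behavior described for ByteChunker can be pictured with a std-only sketch. This is not the crate's implementation or API: DelimChunker and words are hypothetical names introduced here, a fixed set of delimiter bytes stands in for the regular expression, and the 8-byte read buffer is deliberately tiny so that chunks span several reads. It only illustrates how an iterator can yield delimited chunks from a Read source without holding the whole input in memory.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for illustration only; this is NOT the crate's
/// ByteChunker. It splits on runs of delimiter bytes instead of a regex.
struct DelimChunker<R: Read> {
    source: R,
    delims: Vec<u8>,
    buf: Vec<u8>, // bytes read from `source` but not yet yielded
    eof: bool,    // set once `source` reports end of input
}

impl<R: Read> DelimChunker<R> {
    fn new(source: R, delims: &[u8]) -> Self {
        DelimChunker { source, delims: delims.to_vec(), buf: Vec::new(), eof: false }
    }
}

impl<R: Read> Iterator for DelimChunker<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Look for the first delimiter byte in the buffered data.
            let start = self.buf.iter().position(|b| self.delims.contains(b));
            if let Some(start) = start {
                // Find where the run of delimiter bytes ends.
                let end = self.buf[start..]
                    .iter()
                    .position(|b| !self.delims.contains(b))
                    .map(|n| start + n);
                match end {
                    Some(end) => {
                        // Yield the bytes before the run, then drop the run.
                        let chunk = self.buf[..start].to_vec();
                        self.buf.drain(..end);
                        return Some(Ok(chunk));
                    }
                    None if self.eof => {
                        // The run reaches the end of the input.
                        let chunk = self.buf[..start].to_vec();
                        self.buf.clear();
                        return Some(Ok(chunk));
                    }
                    // The run may continue in the next read; fetch more.
                    None => {}
                }
            } else if self.eof {
                // No delimiter left: yield the remainder once, then stop.
                if self.buf.is_empty() {
                    return None;
                }
                return Some(Ok(std::mem::take(&mut self.buf)));
            }
            // Pull a little more data from the source.
            let mut tmp = [0u8; 8];
            match self.source.read(&mut tmp) {
                Ok(0) => self.eof = true,
                Ok(n) => self.buf.extend_from_slice(&tmp[..n]),
                Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
                Err(e) => return Some(Err(e)),
            }
        }
    }
}

/// Collect the chunks of `input` as lowercase strings, splitting on the
/// same delimiter bytes that appear in the example program's regex.
fn words(input: &[u8]) -> io::Result<Vec<String>> {
    DelimChunker::new(input, b" \"\r\n.,!?:;/")
        .map(|chunk| chunk.map(|bytes| String::from_utf8_lossy(&bytes).to_lowercase()))
        .collect()
}
```

Calling words(b"One, two  three") drives the iterator over an in-memory slice, which works because &[u8] implements Read; any other Read type, such as standard input, could be passed to DelimChunker::new the same way. How the real ByteChunker treats edge cases such as a trailing delimiter may differ from this sketch.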
Enums
- Type for specifying a Chunker’s behavior upon encountering an error.
- Specify what the chunker should do with the matched text.
- Wraps various types of errors that can happen in the internals of a Chunker. The way Chunkers respond to and report these errors can be controlled through builder-pattern methods that take the ErrorResponse and Utf8FailureMode types.
- Type for specifying a StringAdapter’s behavior upon encountering non-UTF-8 data.
Traits
- Trait used to implement a CustomChunker by transforming the output of a ByteChunker.
- Simpler, less flexible version of the Adapter trait.