you know how every chunking library claims to be fast? yeah, we actually meant it.
memchunk splits text at semantic boundaries (periods, newlines, the usual suspects) and does it stupid fast. we're talking "chunk the entire english wikipedia in 120ms" fast.
want to know how? read the blog post where we nerd out about SIMD instructions and lookup tables.
📦 Installation
🚀 Usage
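use memchunk::chunk;

let text = b"Hello world. How are you? I'm fine.\nThanks for asking.";

// With defaults (4KB chunks, split at \n . ?)
let chunks: Vec<&[u8]> = chunk(text).collect();

// With custom size (1024 is an illustrative value)
let chunks: Vec<&[u8]> = chunk(text).size(1024).collect();

// With custom delimiters (illustrative set)
let chunks: Vec<&[u8]> = chunk(text).delimiters(b"\n.?!").collect();

// With both (illustrative values)
let chunks: Vec<&[u8]> = chunk(text).size(8192).delimiters(b"\n").collect();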
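curious what "split at a delimiter within a size limit" looks like in practice? here's a rough, std-only sketch of the general idea. it is not memchunk's actual implementation (no SIMD, nothing clever), and `naive_chunks` and `is_delim` are made-up names for illustration. the one assumption worth calling out: each chunk is cut right after the last delimiter found inside the size window, with a hard cut at the size limit when no delimiter shows up.

```rust
/// Hypothetical helper, not part of memchunk: split `text` into chunks of at
/// most `size` bytes, ending each chunk just after the last delimiter found
/// inside the current window when there is one.
fn naive_chunks<'a>(text: &'a [u8], size: usize, delimiters: &[u8]) -> Vec<&'a [u8]> {
    assert!(size > 0, "chunk size must be non-zero");

    // 256-entry lookup table: is_delim[b] is true when byte b is a delimiter.
    let mut is_delim = [false; 256];
    for &d in delimiters {
        is_delim[d as usize] = true;
    }

    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let window_end = usize::min(start + size, text.len());
        let end = if window_end == text.len() {
            // The remainder fits in a single chunk.
            window_end
        } else {
            // Search backward for the last delimiter in the window and cut
            // right after it; fall back to a hard cut if none is found.
            match text[start..window_end]
                .iter()
                .rposition(|&b| is_delim[b as usize])
            {
                Some(i) => start + i + 1,
                None => window_end,
            }
        };
        chunks.push(&text[start..end]);
        start = end;
    }
    chunks
}
```

calling `naive_chunks(text, 20, b"\n.?")` on the sample text from this section gives chunks that each end at a sentence boundary, and joining the chunks back together reproduces the input byte for byte.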
📝 Citation
If you use memchunk in your research, please cite it as follows:
📄 License
Licensed under either of Apache License, Version 2.0 or MIT license at your option.