🌊 ripline
Fast line-based iteration, almost entirely lifted from ripgrep's grep_searcher.
All credit to Andrew Gallant and the ripgrep contributors.
Why?
- Doesn't rely on a closure like the bstr::for_line* methods (useful in some awkward lifetime scenarios).
- No silently capped line lengths, unlike rust-linereader.
- Brings the LineIter with it for working with memmap files.

Not all of this functionality was exposed in the grep_searcher crate, and rightly so, as a lot of it had grep-specific configuration embedded in the logic (e.g. binary detection).
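To illustrate the first point, here is a minimal, std-only sketch of iterating over lines without a closure. `ByteLines` and `longest_line` are hypothetical names made up for this illustration; they are not ripline's API, only a stand-in for an iterator over an in-memory byte slice such as the contents of a memory map. Each yielded line borrows directly from the input, so a reference to it can outlive the loop body.

```rust
/// Hypothetical stand-in for a line iterator over an in-memory byte slice
/// (for example, the contents of a memory map). Not ripline's actual type.
struct ByteLines<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> ByteLines<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        ByteLines { bytes, pos: 0 }
    }
}

impl<'a> Iterator for ByteLines<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<&'a [u8]> {
        let bytes: &'a [u8] = self.bytes;
        if self.pos >= bytes.len() {
            return None;
        }
        let rest = &bytes[self.pos..];
        // Yield the line including its terminator; a final line without a
        // trailing b'\n' is yielded as-is.
        let end = match rest.iter().position(|&b| b == b'\n') {
            Some(i) => i + 1,
            None => rest.len(),
        };
        self.pos += end;
        Some(&rest[..end])
    }
}

/// Returns the longest line, borrowed straight from the input.
/// On ties, the earliest line wins.
fn longest_line(bytes: &[u8]) -> Option<&[u8]> {
    let mut longest: Option<&[u8]> = None;
    for line in ByteLines::new(bytes) {
        match longest {
            Some(l) if l.len() >= line.len() => {}
            // The reference escapes the loop body; no callback is involved.
            _ => longest = Some(line),
        }
    }
    longest
}

fn main() {
    let data = b"short\na much longer line\nmid\n";
    let longest = longest_line(data).unwrap();
    print!("{}", String::from_utf8_lossy(longest));
}
```

With a plain `for` loop, `break`, `?`, and early `return` also work directly, whereas a callback has to signal them through its return value.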
What have I changed?
Not much. I took out some of the ripgrep-specific logic, such as the binary detection and some search-related configs, and consolidated a few of the helper structs from the other grep_* crates.
Example
See examples for more.
use stdout;
use ;
use ;
use ColorChoice;
Crude and untrustworthy benchmarks
From examples/ripline_benchmarks.rs. The initial benchmark script was taken from rust-linereader, which is also included in the benchmarks as LR:*.
The input used was all_train.csv, unzipped and concatenated five times, creating a ~25G file.
| Method | Time | Lines/sec | Bandwidth |
|---|---|---|---|
| read() | 2.01s | 17439155/s | 12303.42 MB/s |
| LR::next_batch() | 2.11s | 16576174/s | 11694.59 MB/s |
| LR::next_line() | 2.65s | 13196734/s | 9310.37 MB/s |
| ripline_line_buffer() | 2.64s | 13277194/s | 9367.14 MB/s |
| ripline_mmap() | 2.16s | 16183503/s | 11417.55 MB/s |
| bstr_for_line() | 2.47s | 14174502/s | 10000.19 MB/s |
| read_until() | 2.86s | 12230594/s | 8628.75 MB/s |
| read_line() | 4.16s | 8417415/s | 5938.53 MB/s |
| lines() | 5.05s | 6930901/s | 4889.79 MB/s |
Note that read and next_batch are not counting lines.
Hardware: Ubuntu 20 AMD Ryzen 9 3950X 16-Core Processor w/ 64 GB DDR4 memory and 1TB NVMe Drive
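For readers who want to see how columns like those in the table can be derived, here is a rough sketch of timing a `read_until()` loop. It is not the script in examples/ripline_benchmarks.rs: the `count_lines` helper and the small in-memory input are assumptions for illustration, and 1 MB is assumed to mean 1024 * 1024 bytes, which may differ from the unit used above.

```rust
use std::io::{self, BufRead, Cursor};
use std::time::Instant;

/// Counts lines and bytes from any buffered reader using `read_until`,
/// reusing a single buffer across iterations.
/// Hypothetical helper, not part of ripline.
fn count_lines<R: BufRead>(mut reader: R) -> io::Result<(u64, u64)> {
    let mut buf = Vec::new();
    let (mut lines, mut bytes) = (0u64, 0u64);
    loop {
        buf.clear();
        let n = reader.read_until(b'\n', &mut buf)?;
        if n == 0 {
            break; // end of input
        }
        lines += 1;
        bytes += n as u64;
    }
    Ok((lines, bytes))
}

fn main() -> io::Result<()> {
    // Small in-memory input for illustration; the benchmark above reads
    // a ~25G file from disk instead.
    let input = b"a,b,c\n1,2,3\n4,5,6\n".repeat(100_000);

    let start = Instant::now();
    let (lines, bytes) = count_lines(Cursor::new(input.as_slice()))?;
    let secs = start.elapsed().as_secs_f64();

    // Assumption: 1 MB = 1024 * 1024 bytes.
    let mb = bytes as f64 / (1024.0 * 1024.0);
    println!(
        "| read_until() | {:.2}s | {:.0}/s | {:.2} MB/s |",
        secs,
        lines as f64 / secs,
        mb / secs
    );
    Ok(())
}
```

Numbers from an in-memory input like this say nothing about disk-bound performance; the sketch only shows how the time, lines/sec, and bandwidth figures relate to each other.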