text-file-sort
This crate implements a sort algorithm for text files composed of lines or line records, for example CSV or TSV files.
A line record is a line composed of fields separated by a delimiter. Examples of such files are pg_dump, CSV, and GTFS data files. The motivation for writing this crate was the need to sort pg_dump files of the OpenStreetMap database, which contain billions of lines, by the primary key of each table before converting the data to PBF format.
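To make the idea of sorting line records by a key field concrete, here is a minimal in-memory sketch that uses only the standard library. It is not this crate's API: the function name `sort_lines_by_numeric_field` and its parameters are hypothetical, and it assumes the key is an integer column such as a primary key.

```rust
/// Hypothetical helper, not part of text-file-sort.
/// Sorts delimiter-separated lines by the integer value of one field.
/// `field_index` is zero-based. Lines whose key field is missing or is not
/// an integer compare as `None`, so they sort before all valid keys.
fn sort_lines_by_numeric_field(lines: &mut [String], delimiter: char, field_index: usize) {
    lines.sort_by_key(|line| {
        line.split(delimiter)
            .nth(field_index)
            .and_then(|field| field.trim().parse::<i64>().ok())
    });
}
```

This only works when the whole file fits in memory, which is exactly the limitation that does not hold for files with billions of lines.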
This implementation can sort very large files. It takes advantage of multiple CPU cores and lets the caller control memory usage.
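A common way to sort files larger than memory is an external merge sort: split the input into sorted runs on disk, then merge the runs. The sketch below illustrates that general technique with the standard library only. It is an assumption about the kind of approach involved, not a description of this crate's internals. It is single-threaded, compares whole lines, and the names `external_sort`, `write_run`, and `max_lines_in_memory` are hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Lines, Write};
use std::path::{Path, PathBuf};

/// Sort one chunk in memory, write it to `tmp_dir` as a run, and empty the chunk.
fn write_run(chunk: &mut Vec<String>, tmp_dir: &Path, run_index: usize) -> io::Result<PathBuf> {
    chunk.sort();
    let path = tmp_dir.join(format!("run-{run_index}.txt"));
    let mut writer = BufWriter::new(File::create(&path)?);
    for line in chunk.drain(..) {
        writeln!(writer, "{line}")?;
    }
    writer.flush()?;
    Ok(path)
}

/// Sort the lines of `input` into `output`, buffering at most
/// `max_lines_in_memory` lines at a time (the memory bound of this sketch).
fn external_sort(
    input: &Path,
    output: &Path,
    tmp_dir: &Path,
    max_lines_in_memory: usize,
) -> io::Result<()> {
    let limit = max_lines_in_memory.max(1);

    // Phase 1: split the input into sorted runs on disk.
    let mut runs: Vec<PathBuf> = Vec::new();
    let mut chunk: Vec<String> = Vec::new();
    for line in BufReader::new(File::open(input)?).lines() {
        chunk.push(line?);
        if chunk.len() >= limit {
            let index = runs.len();
            runs.push(write_run(&mut chunk, tmp_dir, index)?);
        }
    }
    if !chunk.is_empty() {
        let index = runs.len();
        runs.push(write_run(&mut chunk, tmp_dir, index)?);
    }

    // Phase 2: k-way merge. The min-heap holds the current line of each run.
    let mut readers: Vec<Lines<BufReader<File>>> = Vec::new();
    for path in &runs {
        readers.push(BufReader::new(File::open(path)?).lines());
    }
    let mut heap = BinaryHeap::new();
    for (i, reader) in readers.iter_mut().enumerate() {
        if let Some(line) = reader.next() {
            heap.push(Reverse((line?, i)));
        }
    }
    let mut writer = BufWriter::new(File::create(output)?);
    while let Some(Reverse((line, i))) = heap.pop() {
        writeln!(writer, "{line}")?;
        // Refill the heap from the run that just supplied the smallest line.
        if let Some(next) = readers[i].next() {
            heap.push(Reverse((next?, i)));
        }
    }
    writer.flush()?;

    // Close the run files before deleting them.
    drop(readers);
    for path in runs {
        std::fs::remove_file(path)?;
    }
    Ok(())
}
```

In this sketch the chunk size is the knob that bounds memory. A multi-core variant could sort and write the runs on separate threads, which is omitted here for brevity.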
Issues
Issues are welcome and appreciated. Please submit to https://github.com/navigatorsguild/text-file-sort/issues
Benchmarks
Benchmarks generated by benchmark-rs
Examples
use std::path::PathBuf;
use text_file_sort::sort::Sort;
// optimized for use with Jemalloc
use tikv_jemallocator::Jemalloc;
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;
// parallel record sort
License: MIT OR Apache-2.0