Crate lazycsv

The lazycsv crate provides a performant CSV parser.

Benchmarks

§Primary Focuses

lazycsv is a parser that performs optimistic optimization. It is primarily optimized for CSV input that is either unquoted or only minimally quoted, especially when dequoting is unnecessary. In such cases, it can outperform BurntSushi/rust-csv by around 20%.

However, if the input is expected to require dequotation, it’s generally better to use BurntSushi/rust-csv, which performs eager dequoting during the parsing phase. Since lazycsv is a lazy parser, it defers dequoting entirely. If dequotation is performed later, this effectively results in scanning the input twice, which leads to a performance penalty.

  • Vectorized: The parser uses SIMD operations, which makes it very fast.
  • Minimal hidden costs: No API introduces invisible overhead; each operation does only what it needs to do.
  • Zero copy, zero allocation by default: The parser doesn’t allocate any memory during parsing; allocation happens only when a cell is dequoted.
  • Lazy decoding: Input is not copied or unquoted until requested. This is useful when you only need to access a few cells in a large CSV file.
  • #![no_std] eligible: The crate is #![no_std] compatible and can be used on systems without an allocator.
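The zero-copy and lazy-decoding points above can be illustrated with a small std-only sketch. It is not lazycsv's implementation: `dequote` is a hypothetical helper that assumes a cell is either unquoted or wrapped in one pair of double quotes, with `""` as the only escape. It returns a borrowed slice whenever possible and allocates only when an escaped quote actually has to be rewritten.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of lazycsv): dequotes one raw cell.
/// Borrows from the input unless an escaped quote forces a rewrite.
fn dequote(raw: &str) -> Cow<'_, str> {
    // Unquoted or malformed cell: hand the input slice back untouched.
    let inner = match raw.strip_prefix('"').and_then(|r| r.strip_suffix('"')) {
        Some(inner) => inner,
        None => return Cow::Borrowed(raw),
    };
    if inner.contains("\"\"") {
        // Escaped quotes present: this is the only path that allocates.
        Cow::Owned(inner.replace("\"\"", "\""))
    } else {
        // Quoted but nothing to unescape: still zero-copy.
        Cow::Borrowed(inner)
    }
}
```

Under these assumptions, `dequote("plain")` and `dequote("\"a,b\"")` both borrow from the input, while a cell containing `""` yields an owned `String`. A caller that reads only a few cells pays the dequoting cost only for those cells.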

§Supported Features

lazycsv primarily supports a subset of RFC 4180 with minor extensions.

§According to RFC 4180:

  • No escape mechanisms other than quoting are supported.
  • Padding cells with whitespace is not allowed.
  • Double quotes are not allowed inside unquoted cells.
  • Quotes must always appear at the very beginning of a cell.
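As a rough illustration of the rules listed above, the following std-only sketch checks whether a single raw cell would be well formed under them. `is_well_formed_cell` is a hypothetical name, not a lazycsv API, and the crate itself may handle malformed input differently.

```rust
/// Hypothetical checker (not part of lazycsv) for one raw cell,
/// following the RFC 4180 restrictions listed above.
fn is_well_formed_cell(raw: &str) -> bool {
    match raw.strip_prefix('"') {
        // Unquoted cell: a double quote anywhere is rejected. This also
        // rejects whitespace padding before an opening quote.
        None => !raw.contains('"'),
        Some(rest) => match rest.strip_suffix('"') {
            // Quoted cell: inner quotes must come in doubled pairs, so
            // nothing left after splitting on `""` may contain a quote.
            Some(inner) => inner.split("\"\"").all(|part| !part.contains('"')),
            // Opening quote without a closing quote at the very end.
            None => false,
        },
    }
}
```

With this sketch, `abc`, `"a,b"`, and `"a""b"` are accepted, while `a"b`, ` "a"` (leading space), and `"a\"b"` (backslash escape) are rejected.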

§Additional Restrictions:

  • Only ASCII and UTF-8 encodings are supported.

§Additional Support:

  • Using LF (\n) instead of CRLF (\r\n) as the newline is permitted.
  • Customizing the separator character is possible.

§Examples

use lazycsv::{Csv, CsvIterItem};

// Iterating over rows
let csv = Csv::new(b"a,b,c\n1,2,3");
for row in csv.into_rows() {
    let [first, second, third] = row?;
    println!(
        "{}, {}, {}",
        first.try_as_str()?,
        second.try_as_str()?,
        third.try_as_str()?,
    );
}

// Or if you want to avoid buffering:
let csv2 = Csv::new(b"a,b,c\n1,2,3");
for item in csv2 {
    if let CsvIterItem::Cell(cell) = item {
        println!("{}", cell.try_as_str()?);
    }
}

§Crate features

  • std - When enabled (the default), this will permit features specific to the standard library. Currently, the only thing used from the standard library is runtime SIMD CPU feature detection. This means that this feature must be enabled to get AVX2 accelerated routines on x86_64 targets without enabling the avx2 feature at compile time, for example. When std is not enabled, this crate will still attempt to use SSE2 accelerated routines on x86_64. It will also use AVX2 accelerated routines when the avx2 feature is enabled at compile time. In general, enable this feature if you can.
  • alloc - When enabled (the default), APIs in this crate that require some kind of allocation, i.e. Cell::try_as_str, become available. Otherwise, this crate is designed from the ground up to be usable in core-only contexts, so the alloc feature doesn’t add much currently. Notably, disabling std but enabling alloc will not result in the use of AVX2 on x86_64 targets unless the avx2 feature is enabled at compile time. (With std enabled, AVX2 can be used even without the avx2 feature enabled at compile time by way of runtime CPU feature detection.)
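The difference between compile-time and runtime AVX2 selection described above can be sketched with the standard library's detection macro. This is only an illustration of the general mechanism, assuming a hypothetical `pick_backend` function; it is not lazycsv's actual dispatch code.

```rust
/// Hypothetical selector (not lazycsv's dispatch code) for x86_64.
#[cfg(target_arch = "x86_64")]
fn pick_backend() -> &'static str {
    if cfg!(target_feature = "avx2") {
        // AVX2 was enabled at compile time; no runtime check is needed.
        "avx2"
    } else if std::is_x86_feature_detected!("avx2") {
        // Runtime CPU feature detection, which requires the standard library.
        "avx2"
    } else {
        // SSE2 is part of the x86_64 baseline, so it is always available.
        "sse2"
    }
}

/// Fallback for every other architecture in this sketch.
#[cfg(not(target_arch = "x86_64"))]
fn pick_backend() -> &'static str {
    "scalar"
}
```

Without the standard library, only the `cfg!` branch would remain available, which matches the note that AVX2 then has to be enabled at compile time.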

Structs§

Cell
A cell in a CSV row.
Csv
A stateful CSV parser.
CsvRowIter
An iterator that buffers and yields rows of cells.

Enums§

CsvIterItem
An item yielded by Csv, indicating either a cell or a line break.
RowIterError
Errors returned by CsvRowIter.