The lazycsv crate provides a performant CSV parser.
§Primary Focuses
lazycsv is a parser that performs optimistic optimization. It is primarily optimized for parsing CSV input that is either unquoted or only minimally quoted, especially when dequoting is unnecessary. In such cases, it can outperform BurntSushi/rust-csv by around 20%.
However, if the input is expected to require dequotation, it’s generally better to use BurntSushi/rust-csv, which performs eager dequoting during the parsing phase. Since lazycsv is a lazy parser, it defers dequoting entirely. If dequotation is performed later, this effectively results in scanning the input twice, which leads to a performance penalty.
- Vectorized: The parser uses SIMD operations, which makes it very performant.
- Minimal hidden costs: No API introduces hidden overhead, and each operation does only what it needs to do.
- Zero copy, zero allocation by default: The parser doesn’t allocate any memory during parsing and only allocates when dequoting a cell.
- Lazy Decoding: Input is not copied or unquoted until requested. This is useful when you only need to access a few cells in a large CSV file.
- #![no_std] eligible: The crate is #![no_std] compatible, and it can be used in systems without an allocator.
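To make the lazy, allocation-on-demand idea concrete, here is a minimal sketch using only the standard library. The `dequote` function below is a hypothetical helper written for illustration, not lazycsv's API or implementation. It assumes RFC 4180 style escaping, where a literal quote inside a quoted cell is written as `""`.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of lazycsv): dequotes one raw CSV cell on demand.
///
/// Unquoted cells, and quoted cells without escaped quotes, are returned as
/// borrows of the input. An owned buffer is built only when the cell contains
/// an escaped quote (`""`), because only then do the bytes differ from the input.
fn dequote(raw: &[u8]) -> Cow<'_, [u8]> {
    // Not quoted: hand back the input untouched.
    if raw.len() < 2 || raw[0] != b'"' || raw[raw.len() - 1] != b'"' {
        return Cow::Borrowed(raw);
    }
    let inner = &raw[1..raw.len() - 1];
    // Quoted, but no escaped quotes inside: still a pure borrow.
    if !inner.contains(&b'"') {
        return Cow::Borrowed(inner);
    }
    // Escaped quotes present: collapse each `""` pair into a single `"`.
    let mut out = Vec::with_capacity(inner.len());
    let mut i = 0;
    while i < inner.len() {
        out.push(inner[i]);
        // Skip the second quote of an escaped pair.
        if inner[i] == b'"' && inner.get(i + 1) == Some(&b'"') {
            i += 1;
        }
        i += 1;
    }
    Cow::Owned(out)
}
```

Note that the cell's bytes are scanned again here, after the parser has already located the cell boundaries. This is the double scan described above, and it is why eager dequoting tends to be the better fit for heavily quoted input.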
§Supported Features
lazycsv primarily supports a subset of RFC 4180 with minor extensions.
§According to RFC 4180:
- No escape mechanisms other than quoting are supported.
- Padding cells with whitespace is not allowed.
- Using double quotes without quoting is not allowed.
- Quotes must always appear at the very beginning of a cell.
§Additional Restrictions:
- Only ASCII and UTF-8 encodings are supported.
§Additional Supports:
- Using LF (\n) instead of CRLF (\r\n) as the newline is permitted.
- Customizing the separator character is possible.
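The quoting rules above can be expressed as a small check on a single raw cell. This is a sketch under stated assumptions: `is_well_formed_cell` is a hypothetical function, not part of lazycsv, and it says nothing about how lazycsv itself reacts to input that breaks these rules.

```rust
/// Hypothetical check (not part of lazycsv): reports whether one raw cell
/// follows the quoting rules listed above.
fn is_well_formed_cell(raw: &[u8]) -> bool {
    match raw.first() {
        // Quoted cell: the quote is the very first byte, the cell must end
        // with a closing quote, and every quote in between must be doubled.
        Some(b'"') => {
            if raw.len() < 2 || raw[raw.len() - 1] != b'"' {
                return false;
            }
            let inner = &raw[1..raw.len() - 1];
            let mut i = 0;
            while i < inner.len() {
                if inner[i] == b'"' {
                    // A lone quote inside a quoted cell is not a valid escape.
                    if inner.get(i + 1) != Some(&b'"') {
                        return false;
                    }
                    i += 1; // step over the second quote of the pair
                }
                i += 1;
            }
            true
        }
        // Unquoted (or empty) cell: no double quotes may appear at all.
        // This also rejects whitespace padding placed before a quote.
        _ => !raw.contains(&b'"'),
    }
}
```

For example, `"a""b"` passes, while `a"b` (a quote in an unquoted cell) and ` "a"` (padding before the opening quote) do not.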
§Examples
use lazycsv::{Csv, CsvIterItem};

// Iterating over rows
let csv = Csv::new(b"a,b,c\n1,2,3");
for row in csv.into_rows() {
    let [first, second, third] = row?;
    println!(
        "{}, {}, {}",
        first.try_as_str()?,
        second.try_as_str()?,
        third.try_as_str()?,
    );
}

// Or if you want to avoid buffering:
let csv2 = Csv::new(b"a,b,c\n1,2,3");
for item in csv2 {
    if let CsvIterItem::Cell(cell) = item {
        println!("{}", cell.try_as_str()?);
    }
}
§Crate features
- std - When enabled (the default), this will permit features specific to the standard library. Currently, the only thing used from the standard library is runtime SIMD CPU feature detection. This means that this feature must be enabled to get AVX2 accelerated routines on x86_64 targets without enabling the avx2 feature at compile time, for example. When std is not enabled, this crate will still attempt to use SSE2 accelerated routines on x86_64. It will also use AVX2 accelerated routines when the avx2 feature is enabled at compile time. In general, enable this feature if you can.
- alloc - When enabled (the default), APIs in this crate requiring some kind of allocation will become available (e.g. Cell::try_as_str). Otherwise, this crate is designed from the ground up to be usable in core-only contexts, so the alloc feature doesn’t add much currently. Notably, disabling std but enabling alloc will not result in the use of AVX2 on x86_64 targets unless the avx2 feature is enabled at compile time. (With std enabled, AVX2 can be used even without the avx2 feature enabled at compile time by way of runtime CPU feature detection.)
Structs§
- Cell - A cell in a CSV row.
- Csv - A stateful CSV parser.
- CsvRowIter - An iterator that buffers and yields rows of cells.
Enums§
- CsvIterItem - An item yielded by Csv, indicates either a cell or a line break.
- RowIterError - Errors returned by CsvRowIter.