Crate csv_lib

Source
Expand description

§Csv_lib Crate

A Rust library to read/write CSV files in the fastest way I know.

For further information, you can check the repo here

§3rd Party Crates Used:

CrateLink
Memmap2memmap2 crate
Encoding_rsencoding_rs crate
Memchrmemchr crate

§Features

  • Zero-copy parsing
  • Custom delimiters support
  • Escape string support
  • Direct mapping from memory

§FFI Support

  • Optional feature to allow usage from another language (e.g., C#).
  • Consider the cost of context switching per iteration.

FFI definition. (Read this, if you want to implent it in another language): You must crate a new project. Use cargo to add this library. Then, define your export functions in your project. You can create the same functions this library have and make a call. Be sure to add the #[no_mangle] attribute to prevent name mangling, and extern "C" to specify the calling convention. In Cargo.toml , you must add the #[crate-type = "cdylib"] attribute to the library target.This will tell cargo to compile a native library. For example in Windows, you will compile a .dll.

Then copy the library to your project, the one you do in other language.

To connect, or program the definitions of functions, the FFI module usefully, so i add it here:

use crate::csv::csv_reader::CsvReaderWithMap;
use crate::models::csv_config::CsvConfig;
use std::ffi::c_char;
use std::ptr;
use encoding_rs::Encoding;

/// Resolves a code page number to an Encoding.
fn encode_solver(codepage: u32) -> &'static Encoding {
    match codepage {
        932 => encoding_rs::SHIFT_JIS,
        936 => encoding_rs::GBK,
        949 => encoding_rs::EUC_KR,
        950 => encoding_rs::BIG5,
        866 => encoding_rs::IBM866,
        874 => encoding_rs::WINDOWS_874,
        1200 => encoding_rs::UTF_16LE,
        1201 => encoding_rs::UTF_16BE,
        1250 => encoding_rs::WINDOWS_1250,
        1251 => encoding_rs::WINDOWS_1251,
        1252 => encoding_rs::WINDOWS_1252,
        1253 => encoding_rs::WINDOWS_1253,
        1254 => encoding_rs::WINDOWS_1254,
        1255 => encoding_rs::WINDOWS_1255,
        1256 => encoding_rs::WINDOWS_1256,
        1257 => encoding_rs::WINDOWS_1257,
        1258 => encoding_rs::WINDOWS_1258,
        10000 => encoding_rs::MACINTOSH,
        10017 => encoding_rs::X_MAC_CYRILLIC,
        20866 => encoding_rs::KOI8_R,
        20932 => encoding_rs::EUC_JP,
        21866 => encoding_rs::KOI8_U,
        28592 => encoding_rs::ISO_8859_2,
        28593 => encoding_rs::ISO_8859_3,
        28594 => encoding_rs::ISO_8859_4,
        28595 => encoding_rs::ISO_8859_5,
        28596 => encoding_rs::ISO_8859_6,
        28597 => encoding_rs::ISO_8859_7,
        28598 => encoding_rs::ISO_8859_8,
        28603 => encoding_rs::ISO_8859_13,
        28605 => encoding_rs::ISO_8859_15,
        38598 => encoding_rs::ISO_8859_8_I,
        50220 => encoding_rs::ISO_2022_JP,
        54936 => encoding_rs::GB18030,
        65001 => encoding_rs::UTF_8,
        _ => encoding_rs::WINDOWS_1252,
    }
}

#[cfg(feature = "ffi")]
#[no_mangle]
pub unsafe extern "C" fn csv_reader_new(
    path: *const c_char,
    delimiter: u8,
    string_separator: u8,
    line_ending: u8,
    encoding: u32,
) -> *mut CsvReaderWithMap {
    if path.is_null() {
        return ptr::null_mut();
    }
    let path = std::ffi::CStr::from_ptr(path).to_string_lossy().into_owned();
    let reader = CsvReaderWithMap::open(
        path,
        &CsvConfig::new(
            delimiter,
            string_separator,
            line_ending,
            encode_solver(encoding),
            Vec::new(),
            false,
        ),
    );
    match reader {
        Ok(reader) => Box::into_raw(Box::new(reader)),
        Err(e) => {
            eprintln!("Error opening CSV file: {}", e);
            ptr::null_mut()
        }
    }
}

#[cfg(feature = "ffi")]
#[no_mangle]
pub unsafe extern "C" fn destroy_csv_reader(reader: *mut CsvReaderWithMap) {
    if !reader.is_null() {
        drop(Box::from_raw(reader));
    }
}

#[cfg(feature = "ffi")]
#[no_mangle]
pub unsafe extern "C" fn reader_next_row(reader: *mut CsvReaderWithMap, encoder: u32) -> *mut c_char {
    if reader.is_null() {
        return ptr::null_mut();
    }
    let rdr = &mut *reader;
    let enc = encode_solver(encoder);
    if let Some(row) = rdr.next_raw() {
        let (decoded, _, had_errors) = enc.decode(&row);
        if had_errors {
            return ptr::null_mut();
        }
        match std::ffi::CString::new(decoded.as_bytes()) {
            Ok(cstring) => cstring.into_raw(),
            Err(_) => ptr::null_mut(),
        }
    } else {
        ptr::null_mut()
    }
}

§Performance

This library is designed to process large CSV files.
Successfully tested on a 1 billion lines CSV file. To test performance, run in release config, it improves a lot the performance

§Contact

If you have any questions, contact me on LinkedIn

Modules§

csv
extensions
helpers
io
models