Crate qsv_sniffer

This crate provides methods to infer details about a CSV file: its delimiter, quote character, number of fields, field data types, and so on.

§Overview

The Sniffer type is the primary entry point for using this crate. Its Sniffer::open_path and Sniffer::open_reader methods return a configured csv::Reader.

Alternatively, the Sniffer::sniff_path and Sniffer::sniff_reader methods return a Metadata object containing the deduced details about the underlying CSV input.

This sniffer detects the following metadata about a CSV file:

  • Delimiter – byte character between fields in a record
  • Has a header row? – whether or not the first row of the data file provides column headers
  • Number of preamble rows – number of rows in a CSV file before the data starts (occasionally used in data files to introduce the data)
  • Quote – byte character (either ", ', or `) used to quote fields, or an indication that the file has no quotes
  • Flexible – whether or not records are all of the same length
  • Is UTF-8 encoded? – whether the file is UTF-8 encoded
  • Number of delimiters/fields – maximum number of delimiters in each row (which determines the number of fields in each row)
  • Field names – the name of each field
  • Types – the inferred data type of each field in the data table

See Metadata for full information about what the sniffer returns.
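To make two of the properties above concrete, here is a minimal std-only sketch of how a delimiter and the "flexible" flag could be guessed from a text sample. It is a toy heuristic written for illustration, not this crate's detection algorithm: the names `CANDIDATES`, `guess_delimiter`, and `is_flexible` are hypothetical, the candidate set is an assumption, and quoted fields are ignored.

```rust
/// Candidate delimiters for this toy heuristic (an assumption for the sketch,
/// not the crate's actual candidate set).
const CANDIDATES: [u8; 4] = [b',', b';', b'\t', b'|'];

/// Guess the delimiter of `sample`: pick the candidate that occurs the same
/// non-zero number of times on every non-empty line, preferring the candidate
/// with the highest per-line count. Quoting is ignored for simplicity.
fn guess_delimiter(sample: &str) -> Option<u8> {
    let lines: Vec<&str> = sample.lines().filter(|l| !l.is_empty()).collect();
    let mut best: Option<(u8, usize)> = None;
    for &cand in CANDIDATES.iter() {
        // Occurrences of this candidate on each line.
        let counts: Vec<usize> = lines
            .iter()
            .map(|l| l.bytes().filter(|&b| b == cand).count())
            .collect();
        // Skip candidates that are absent from the first line (or an empty sample).
        let first = match counts.first() {
            Some(&n) if n > 0 => n,
            _ => continue,
        };
        // Keep the candidate only if every line agrees on the count.
        if counts.iter().all(|&n| n == first) && best.map_or(true, |(_, n)| first > n) {
            best = Some((cand, first));
        }
    }
    best.map(|(delim, _)| delim)
}

/// Return true when the records in `sample` do not all have the same number
/// of fields after splitting on `delim` (again ignoring quoting).
fn is_flexible(sample: &str, delim: u8) -> bool {
    let mut field_counts = sample
        .lines()
        .filter(|l| !l.is_empty())
        .map(|l| l.bytes().filter(|&b| b == delim).count() + 1);
    match field_counts.next() {
        Some(first) => field_counts.any(|n| n != first),
        None => false,
    }
}
```

With these definitions, `guess_delimiter("a;b;c\n1;2;3\n")` evaluates to `Some(b';')`, and `is_flexible("a,b\n1,2,3\n", b',')` evaluates to `true`. Real-world files need quote handling and tolerance for preamble rows, which is the kind of work the sniffer does for you.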

§Setup

Add this to your Cargo.toml:

[dependencies]
qsv-sniffer = "*"  # pin this to the latest release listed on crates.io

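and this to your crate root (needed only on the 2015 edition; from the 2018 edition onward the crate is in scope automatically):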

extern crate qsv_sniffer;

§Example

This example shows how to write a simple command-line tool for discovering the metadata of a CSV file:

extern crate qsv_sniffer;

use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} <file>", args[0]);
        ::std::process::exit(1);
    }

    // sniff the path provided by the first argument
    match qsv_sniffer::Sniffer::new().sniff_path(&args[1]) {
        Ok(metadata) => {
            println!("{}", metadata);
        },
        Err(err) => {
            eprintln!("ERROR: {}", err);
        }
    }
}

This example is provided as the primary binary for this crate. In the source directory, this can be run as:

$ cargo run -- tests/data/library-visitors.csv

Modules§

  • error – Error types and conversions for the csv-sniffer crate.
  • metadata – CSV metadata types.

Structs§

  • Sniffer – The primary entry point for this crate.

Enums§

  • DatePreference – Argument used when calling date_preference on Sniffer.
  • SampleSize – Argument used when calling sample_size on Sniffer.
  • Type – The valid field types for fields in a CSV record.
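As an illustration of what inferring a field type can involve, the following std-only sketch classifies each sampled value and then widens the column type until it covers every value. This is an assumption-laden toy, not this crate's implementation: the `GuessedType` enum and the `classify`, `widen`, and `infer_column` functions are hypothetical names, and the crate's own type enum may have different variants.

```rust
/// Hypothetical field types for this sketch; the crate's own enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GuessedType {
    Unsigned,
    Signed,
    Float,
    Boolean,
    Text,
}

/// Classify a single value by trying the narrowest type first.
/// Note that Rust's f64 parser also accepts strings such as "inf" and "NaN".
fn classify(value: &str) -> GuessedType {
    let v = value.trim();
    if v.parse::<u64>().is_ok() {
        GuessedType::Unsigned
    } else if v.parse::<i64>().is_ok() {
        GuessedType::Signed
    } else if v.parse::<f64>().is_ok() {
        GuessedType::Float
    } else if v.eq_ignore_ascii_case("true") || v.eq_ignore_ascii_case("false") {
        GuessedType::Boolean
    } else {
        GuessedType::Text
    }
}

/// Combine two guesses into the narrowest type that can hold both.
fn widen(a: GuessedType, b: GuessedType) -> GuessedType {
    use GuessedType::*;
    match (a, b) {
        _ if a == b => a,
        (Unsigned, Signed) | (Signed, Unsigned) => Signed,
        (Unsigned, Float) | (Float, Unsigned) | (Signed, Float) | (Float, Signed) => Float,
        // Anything else (for example a number mixed with a boolean) falls back to text.
        _ => Text,
    }
}

/// Infer a type for a whole column from its sampled values.
/// Returns None when no values were sampled.
fn infer_column(values: &[&str]) -> Option<GuessedType> {
    values.iter().map(|v| classify(v)).reduce(widen)
}
```

For example, `infer_column(&["1", "-2"])` yields `Some(GuessedType::Signed)`, while mixing `"1"` with `"abc"` falls back to `Some(GuessedType::Text)`. Widening pairwise keeps the sketch order-independent, so the result does not depend on which rows happen to be sampled first.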