Struct FromCsvConfig

pub struct FromCsvConfig {
    pub delimiter: u8,
    pub has_headers: bool,
    pub trim: bool,
    pub max_rows: usize,
    pub infer_schema: bool,
    pub sample_rows: usize,
    pub list_key: Option<String>,
}

Configuration for CSV parsing.

This structure controls CSV parsing behavior: the field delimiter, header handling, whitespace trimming, the row limit that guards against oversized input, optional schema inference, and the key name used for the parsed list.

§Examples

§Default Configuration

let config = FromCsvConfig::default();
assert_eq!(config.delimiter, b',');
assert!(config.has_headers);
assert!(config.trim);
assert_eq!(config.max_rows, 1_000_000);
assert_eq!(config.list_key, None);
assert!(!config.infer_schema);
assert_eq!(config.sample_rows, 100);

§Tab-Delimited without Headers

let config = FromCsvConfig {
    delimiter: b'\t',
    has_headers: false,
    ..Default::default()
};

§Custom Row Limit for Large Datasets

let config = FromCsvConfig {
    max_rows: 10_000_000, // Allow up to 10M rows
    ..Default::default()
};

§Disable Whitespace Trimming

let config = FromCsvConfig {
    trim: false,
    ..Default::default()
};

§Enable Schema Inference

let config = FromCsvConfig {
    infer_schema: true,
    sample_rows: 200, // Sample first 200 rows
    ..Default::default()
};

§Custom List Key for Irregular Plurals

// For "Person" type, use "people" instead of default "persons"
let config = FromCsvConfig {
    list_key: Some("people".to_string()),
    ..Default::default()
};

Fields§

§delimiter: u8

Field delimiter byte (default: b',', a comma).

Common alternatives:

  • b'\t' - Tab-separated values (TSV)
  • b';' - Semicolon-separated (common in European locales)
  • b'|' - Pipe-separated
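To make the effect of the delimiter byte concrete, here is a minimal sketch that assumes a naive parser splitting each line on the delimiter with no support for quoted fields. split_fields is a hypothetical helper, not part of this crate's API; a real CSV reader also handles quoting and escapes.

```rust
/// Hypothetical helper: naively split one CSV line on `delimiter`.
/// Quoted fields (e.g. "a,b") are NOT handled; this only shows how the
/// choice of delimiter byte changes where fields begin and end.
fn split_fields(line: &str, delimiter: u8) -> Vec<String> {
    line.split(delimiter as char)
        .map(|field| field.to_string())
        .collect()
}

fn main() {
    // The same record written in three dialects yields the same fields.
    let comma = split_fields("1,Alice,Berlin", b',');
    let tab = split_fields("1\tAlice\tBerlin", b'\t');
    let semi = split_fields("1;Alice;Berlin", b';');
    assert_eq!(comma, tab);
    assert_eq!(tab, semi);

    // With the wrong delimiter, the whole line collapses into one field.
    let wrong = split_fields("1;Alice;Berlin", b',');
    assert_eq!(wrong, vec!["1;Alice;Berlin".to_string()]);
    println!("{:?}", comma);
}
```

The last case is the usual symptom of a mismatched delimiter: every row parses as a single column.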
§has_headers: bool

Whether the first row contains column headers (default: true).

When true, the first row is interpreted as column names and not included in the data. When false, all rows are treated as data.
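The documented has_headers behavior can be sketched as a function over already-split rows. split_header is a hypothetical helper written for illustration only; it assumes rows have already been split into fields.

```rust
/// Hypothetical helper: separate the header row (if any) from the data
/// rows, following the documented `has_headers` semantics.
fn split_header<'a>(
    rows: &[Vec<&'a str>],
    has_headers: bool,
) -> (Option<Vec<&'a str>>, Vec<Vec<&'a str>>) {
    if has_headers {
        match rows.split_first() {
            // The first row becomes the column names and is excluded from the data.
            Some((header, data)) => (Some(header.clone()), data.to_vec()),
            None => (None, Vec::new()),
        }
    } else {
        // Every row, including the first, is treated as data.
        (None, rows.to_vec())
    }
}

fn main() {
    let rows = vec![vec!["id", "name"], vec!["1", "Alice"]];

    let (header, data) = split_header(&rows, true);
    assert_eq!(header, Some(vec!["id", "name"]));
    assert_eq!(data.len(), 1);

    let (header, data) = split_header(&rows, false);
    assert_eq!(header, None);
    assert_eq!(data.len(), 2);
}
```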

§trim: bool

Whether to trim leading/trailing whitespace from fields (default: true).

When true, fields like " value " become "value". This is generally recommended to handle inconsistently formatted CSV files.
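A minimal sketch of per-field trimming, assuming it behaves like Rust's str::trim. That is an assumption: str::trim strips all Unicode whitespace, and the crate may use a narrower definition. normalize_field is a hypothetical name.

```rust
/// Hypothetical helper: apply the `trim` setting to one raw field.
/// Uses `str::trim`, which removes leading and trailing Unicode whitespace.
fn normalize_field(raw: &str, trim: bool) -> &str {
    if trim { raw.trim() } else { raw }
}

fn main() {
    assert_eq!(normalize_field(" value ", true), "value");
    assert_eq!(normalize_field(" value ", false), " value ");
    // Interior whitespace is kept; only the ends are stripped.
    assert_eq!(normalize_field("  New York ", true), "New York");
}
```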

§max_rows: usize

Maximum number of rows to parse (default: 1,000,000).

This security limit prevents memory exhaustion from maliciously large CSV files. Processing stops with an error if more rows are encountered.
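One way such a limit can be enforced is to count rows as they are consumed and fail as soon as one row too many arrives, so memory never grows past the limit. The sketch below is illustrative only: RowLimitExceeded and collect_rows are hypothetical names, the crate's real error type may differ, and whether the header row counts toward the limit is not specified here (the sketch counts data rows only).

```rust
/// Hypothetical error type; the crate's actual error variant may differ.
#[derive(Debug, PartialEq)]
struct RowLimitExceeded {
    max_rows: usize,
}

/// Collect data rows, returning an error as soon as a row beyond
/// `max_rows` is encountered instead of buffering the whole input.
fn collect_rows<I>(rows: I, max_rows: usize) -> Result<Vec<String>, RowLimitExceeded>
where
    I: IntoIterator<Item = String>,
{
    let mut out = Vec::new();
    for row in rows {
        if out.len() == max_rows {
            return Err(RowLimitExceeded { max_rows });
        }
        out.push(row);
    }
    Ok(out)
}

fn main() {
    let rows = || (1..=3).map(|i| format!("{i},item"));
    // Exactly at the limit: accepted.
    assert_eq!(collect_rows(rows(), 3).map(|v| v.len()), Ok(3));
    // One row over the limit: rejected.
    assert_eq!(collect_rows(rows(), 2), Err(RowLimitExceeded { max_rows: 2 }));
}
```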

§Security Impact

  • DoS Protection: Prevents attackers from causing memory exhaustion
  • Memory Bound: Caps worst-case memory usage at roughly max_rows × columns × average field size
  • Recommended Values:
    • Small deployments: 100,000 - 1,000,000 rows
    • Large deployments: 1,000,000 - 10,000,000 rows
    • Batch processing: Adjust based on available RAM
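When choosing a limit, a back-of-the-envelope estimate of raw field data is rows × columns × average field size. The sketch below does that arithmetic with overflow checks; the column count and field size are illustrative numbers, and per-value allocation and document overhead are ignored, so actual usage will be higher.

```rust
/// Rough estimate of raw field storage: rows × columns × average field bytes.
/// Allocation and document overhead are ignored, so real usage is higher.
/// Returns None if the product overflows `usize`.
fn estimate_bytes(max_rows: usize, columns: usize, avg_field_bytes: usize) -> Option<usize> {
    max_rows.checked_mul(columns)?.checked_mul(avg_field_bytes)
}

fn main() {
    // Default limit, 10 columns, ~16 bytes per field (illustrative values).
    let bytes = estimate_bytes(1_000_000, 10, 16).unwrap();
    assert_eq!(bytes, 160_000_000);
    println!("~{} MB", bytes / 1_000_000); // prints "~160 MB"
}
```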

§Example

// For processing very large datasets on a high-memory server
let config = FromCsvConfig {
    max_rows: 50_000_000,
    ..Default::default()
};

§infer_schema: bool

Whether to automatically infer column types from data (default: false).

When true, the parser samples the first sample_rows rows to determine the most specific type for each column. When false, the type of each value is inferred individually.

§Type Inference Hierarchy (most to least specific)

  1. Null: All values are empty/null
  2. Bool: All values are “true” or “false”
  3. Int: All values parse as integers
  4. Float: All values parse as floats
  5. String: Fallback for all other cases
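The hierarchy above can be read as a sequence of checks from most to least specific. This is a minimal sketch under stated assumptions, not the crate's implementation: ColumnType and infer_column_type are hypothetical, "true"/"false" are matched case-sensitively, and empty cells are assumed to count as null and are skipped by the Bool/Int/Float checks (the crate may treat them differently).

```rust
/// Hypothetical column type mirroring the documented hierarchy.
#[derive(Debug, PartialEq)]
enum ColumnType {
    Null,
    Bool,
    Int,
    Float,
    String,
}

/// Infer one column's type from at most `sample_rows` values.
/// Assumption: empty cells are skipped by the Bool/Int/Float checks.
fn infer_column_type(values: &[&str], sample_rows: usize) -> ColumnType {
    let sample: Vec<&str> = values
        .iter()
        .take(sample_rows)
        .copied()
        .filter(|v| !v.is_empty())
        .collect();

    if sample.is_empty() {
        return ColumnType::Null;
    }
    // Int is checked before Float because every integer string also parses as a float.
    if sample.iter().all(|v| *v == "true" || *v == "false") {
        ColumnType::Bool
    } else if sample.iter().all(|v| v.parse::<i64>().is_ok()) {
        ColumnType::Int
    } else if sample.iter().all(|v| v.parse::<f64>().is_ok()) {
        ColumnType::Float
    } else {
        ColumnType::String
    }
}

fn main() {
    assert_eq!(infer_column_type(&["", ""], 100), ColumnType::Null);
    assert_eq!(infer_column_type(&["true", "false"], 100), ColumnType::Bool);
    assert_eq!(infer_column_type(&["1", "-7", ""], 100), ColumnType::Int);
    assert_eq!(infer_column_type(&["1", "2.5"], 100), ColumnType::Float);
    assert_eq!(infer_column_type(&["1", "abc"], 100), ColumnType::String);
}
```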

§Example

let config = FromCsvConfig {
    infer_schema: true,
    sample_rows: 100,
    ..Default::default()
};

§sample_rows: usize

Number of rows to sample for schema inference (default: 100).

Only used when infer_schema is true. Larger samples give more accurate type detection at the cost of slower initial processing.

§Trade-offs

  • Small (10-50): Fast inference, may miss edge cases
  • Medium (100-500): Balanced accuracy and performance
  • Large (1000+): High accuracy, slower for large datasets
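The "may miss edge cases" trade-off is easy to reproduce: if the only non-integer value sits beyond the sampled rows, a small sample cannot see it. The check below is a hypothetical, integer-only simplification used purely for illustration; how the parser then handles a value that contradicts the sampled type is not specified here.

```rust
/// Hypothetical check: do all non-empty values among the first
/// `sample_rows` values parse as integers?
fn looks_like_int(values: &[String], sample_rows: usize) -> bool {
    values
        .iter()
        .take(sample_rows)
        .filter(|v| !v.is_empty())
        .all(|v| v.parse::<i64>().is_ok())
}

fn main() {
    // 150 integer rows followed by a non-numeric value at row 151.
    let mut column: Vec<String> = (1..=150).map(|i| i.to_string()).collect();
    column.push("n/a".to_string());

    // A 100-row sample never reaches "n/a" and suggests Int...
    assert!(looks_like_int(&column, 100));
    // ...while a 500-row sample covers the whole column and does not.
    assert!(!looks_like_int(&column, 500));
}
```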
§list_key: Option<String>

Custom key name for the matrix list in the document (default: None).

When None, the list key is automatically generated by adding ‘s’ to the lowercased type name (e.g., “Person” → “persons”). When Some, uses the specified custom key instead.
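The naming rule can be written as a small function. resolve_list_key is a hypothetical helper that mirrors the documented behavior; with the real config you would pass config.list_key.as_deref() as the second argument.

```rust
/// Hypothetical helper mirroring the documented rule: use the custom key
/// verbatim if set, otherwise lowercase the type name and append 's'.
fn resolve_list_key(type_name: &str, list_key: Option<&str>) -> String {
    match list_key {
        Some(custom) => custom.to_string(),
        None => format!("{}s", type_name.to_lowercase()),
    }
}

fn main() {
    assert_eq!(resolve_list_key("Person", None), "persons");
    assert_eq!(resolve_list_key("Person", Some("people")), "people");
    // Custom keys are used as given, so their casing is preserved.
    assert_eq!(resolve_list_key("Item", Some("MyCustomList")), "MyCustomList");
}
```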

§Use Cases

  • Irregular Plurals: “Person” → “people” instead of “persons”
  • Collective Nouns: “Data” → “dataset” instead of “datas”
  • Custom Naming: Any non-standard naming convention
  • Case-Sensitive Keys: Preserve specific casing requirements

§Examples

§Irregular Plural

let csv = "id,name\n1,Alice\n";
let config = FromCsvConfig {
    list_key: Some("people".to_string()),
    ..Default::default()
};
let doc = from_csv_with_config(csv, "Person", &["name"], config).unwrap();
assert!(doc.get("people").is_some()); // Uses custom plural
assert!(doc.get("persons").is_none()); // Default plural not used

§Collective Noun

let csv = "id,value\n1,42\n";
let config = FromCsvConfig {
    list_key: Some("dataset".to_string()),
    ..Default::default()
};
let doc = from_csv_with_config(csv, "Data", &["value"], config).unwrap();
assert!(doc.get("dataset").is_some());

§Case-Sensitive Key

let csv = "id,value\n1,test\n";
let config = FromCsvConfig {
    list_key: Some("MyCustomList".to_string()),
    ..Default::default()
};
let doc = from_csv_with_config(csv, "Item", &["value"], config).unwrap();
assert!(doc.get("MyCustomList").is_some());

Trait Implementations§

impl Clone for FromCsvConfig

fn clone(&self) -> FromCsvConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for FromCsvConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for FromCsvConfig

fn default() -> Self

Returns the “default value” for a type.
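Clone and Default together make it convenient to build one base configuration and derive per-input variants from it. The sketch uses a hypothetical stand-in struct, CsvConfigSketch, with the same fields and documented default values, so it compiles without the crate; it is not the crate's actual Default implementation.

```rust
/// Hypothetical stand-in with FromCsvConfig's fields and documented defaults.
#[derive(Clone, Debug, PartialEq)]
struct CsvConfigSketch {
    delimiter: u8,
    has_headers: bool,
    trim: bool,
    max_rows: usize,
    infer_schema: bool,
    sample_rows: usize,
    list_key: Option<String>,
}

impl Default for CsvConfigSketch {
    fn default() -> Self {
        Self {
            delimiter: b',',
            has_headers: true,
            trim: true,
            max_rows: 1_000_000,
            infer_schema: false,
            sample_rows: 100,
            list_key: None,
        }
    }
}

fn main() {
    // A shared base configuration...
    let base = CsvConfigSketch { infer_schema: true, ..Default::default() };

    // ...cloned and tweaked for a TSV input, leaving `base` unchanged.
    let tsv = CsvConfigSketch { delimiter: b'\t', ..base.clone() };
    assert_eq!(tsv.delimiter, b'\t');
    assert!(tsv.infer_schema);
    assert_eq!(base.delimiter, b',');
    println!("{:?}", tsv);
}
```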
