Struct FromCsvConfig
pub struct FromCsvConfig {
    pub delimiter: u8,
    pub has_headers: bool,
    pub trim: bool,
    pub max_rows: usize,
    pub infer_schema: bool,
    pub sample_rows: usize,
    pub list_key: Option<String>,
    pub max_columns: usize,
    pub max_cell_size: usize,
    pub max_total_size: usize,
    pub max_header_size: usize,
}

Configuration for CSV parsing.

This structure controls all aspects of CSV parsing behavior, including delimiters, headers, whitespace handling, security limits, and custom list naming.

§Examples

§Default Configuration

let config = FromCsvConfig::default();
assert_eq!(config.delimiter, b',');
assert!(config.has_headers);
assert!(config.trim);
assert_eq!(config.max_rows, 1_000_000);
assert_eq!(config.list_key, None);

§Tab-Delimited without Headers

let config = FromCsvConfig {
    delimiter: b'\t',
    has_headers: false,
    ..Default::default()
};

§Custom Row Limit for Large Datasets

let config = FromCsvConfig {
    max_rows: 10_000_000, // Allow up to 10M rows
    ..Default::default()
};

§Disable Whitespace Trimming

let config = FromCsvConfig {
    trim: false,
    ..Default::default()
};

§Enable Schema Inference

let config = FromCsvConfig {
    infer_schema: true,
    sample_rows: 200, // Sample first 200 rows
    ..Default::default()
};

§Custom List Key for Irregular Plurals

// For "Person" type, use "people" instead of default "persons"
let config = FromCsvConfig {
    list_key: Some("people".to_string()),
    ..Default::default()
};

Fields§

§delimiter: u8

Field delimiter character (default: b',').

Common alternatives:

  • b'\t' - Tab-separated values (TSV)
  • b';' - Semicolon-separated (common in European locales)
  • b'|' - Pipe-separated
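For instance, a semicolon-delimited file (common when exporting from spreadsheet software in European locales, where ',' is the decimal separator) needs only the delimiter overridden:

```rust
// Parse semicolon-separated input, e.g. "id;name\n1;Alice"
let config = FromCsvConfig {
    delimiter: b';',
    ..Default::default()
};
```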
§has_headers: bool

Whether the first row contains column headers (default: true).

When true, the first row is interpreted as column names and not included in the data. When false, all rows are treated as data.

§trim: bool

Whether to trim leading/trailing whitespace from fields (default: true).

When true, fields like " value " become "value". This is generally recommended to handle inconsistently formatted CSV files.
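The effect on each field matches Rust's str::trim; a minimal standalone illustration (not using the parser itself):

```rust
fn main() {
    // Leading/trailing whitespace is removed from each field.
    assert_eq!("  Alice  ".trim(), "Alice");

    // Interior whitespace is preserved; only the edges are trimmed.
    assert_eq!("  New  York ".trim(), "New  York");
}
```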

§max_rows: usize

Maximum number of rows to parse (default: 1,000,000).

This security limit prevents memory exhaustion from maliciously large CSV files. Processing stops with an error if more rows are encountered.

§Security Impact

  • DoS Protection: Prevents attackers from causing memory exhaustion
  • Memory Bound: Limits worst-case memory usage to approximately max_rows × columns × avg_cell_size
  • Recommended Values:
    • Small deployments: 100,000 - 1,000,000 rows
    • Large deployments: 1,000,000 - 10,000,000 rows
    • Batch processing: Adjust based on available RAM
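A back-of-envelope sizing check, using hypothetical workload numbers (20 columns averaging 50 bytes per cell), shows how to pick a max_rows that fits a RAM budget:

```rust
fn main() {
    // Hypothetical workload parameters -- substitute your own measurements.
    let max_rows: usize = 1_000_000;
    let columns: usize = 20;
    let avg_cell_size: usize = 50; // bytes

    // Worst-case resident size of the parsed data, per the bound above.
    let worst_case_bytes = max_rows * columns * avg_cell_size;
    println!("worst case: ~{} MB", worst_case_bytes / (1024 * 1024));
}
```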

§Example

// For processing very large datasets on a high-memory server
let config = FromCsvConfig {
    max_rows: 50_000_000,
    ..Default::default()
};
§infer_schema: bool

Whether to automatically infer column types from data (default: false).

When true, the parser samples the first sample_rows rows to determine the most specific type for each column. When false, standard per-value type inference is used.

§Type Inference Hierarchy (most to least specific)

  1. Null: All values are empty/null
  2. Bool: All values are “true” or “false”
  3. Int: All values parse as integers
  4. Float: All values parse as floats
  5. String: Fallback for all other cases
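The hierarchy amounts to a fold over the sampled values, widening a column's type whenever a value doesn't fit the current guess. The following is a standalone sketch of that narrowing logic, not the crate's actual implementation:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum ColType { Null, Bool, Int, Float, Str }

/// Most specific type that describes a single value.
fn value_type(v: &str) -> ColType {
    if v.is_empty() { ColType::Null }
    else if v == "true" || v == "false" { ColType::Bool }
    else if v.parse::<i64>().is_ok() { ColType::Int }
    else if v.parse::<f64>().is_ok() { ColType::Float }
    else { ColType::Str }
}

/// Widen two types to the least specific of the pair; Null defers to the other.
fn widen(a: ColType, b: ColType) -> ColType {
    use ColType::*;
    match (a, b) {
        (Null, t) | (t, Null) => t,
        (x, y) if x == y => x,
        (Int, Float) | (Float, Int) => Float,
        _ => Str,
    }
}

/// Fold the sampled values of one column down to a single inferred type.
fn infer_column(sample: &[&str]) -> ColType {
    sample.iter().fold(ColType::Null, |acc, v| widen(acc, value_type(v)))
}

fn main() {
    assert_eq!(infer_column(&["1", "2", "3"]), ColType::Int);
    assert_eq!(infer_column(&["1", "2.5"]), ColType::Float);   // Int widens to Float
    assert_eq!(infer_column(&["true", "false"]), ColType::Bool);
    assert_eq!(infer_column(&["1", "abc"]), ColType::Str);     // mixed falls back to Str
    assert_eq!(infer_column(&["", ""]), ColType::Null);
}
```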

§Example

let config = FromCsvConfig {
    infer_schema: true,
    sample_rows: 100,
    ..Default::default()
};
§sample_rows: usize

Number of rows to sample for schema inference (default: 100).

Only used when infer_schema is true. Larger sample sizes yield more accurate type detection at the cost of slower initial processing.

§Trade-offs

  • Small (10-50): Fast inference, may miss edge cases
  • Medium (100-500): Balanced accuracy and performance
  • Large (1000+): High accuracy, slower for large datasets
§list_key: Option<String>

Custom key name for the matrix list in the document (default: None).

When None, the list key is automatically generated by adding ‘s’ to the lowercased type name (e.g., “Person” → “persons”). When Some, uses the specified custom key instead.
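The default rule is plain concatenation, which is why irregular plurals need an override. A sketch of the documented behavior (default_list_key is a hypothetical helper, not part of the crate's API):

```rust
/// Default list key per the rule above: lowercase the type name, append 's'.
fn default_list_key(type_name: &str) -> String {
    format!("{}s", type_name.to_lowercase())
}

fn main() {
    assert_eq!(default_list_key("Person"), "persons"); // not "people"
    assert_eq!(default_list_key("Data"), "datas");     // not "dataset"
}
```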

§Use Cases

  • Irregular Plurals: “Person” → “people” instead of “persons”
  • Collective Nouns: “Data” → “dataset” instead of “datas”
  • Custom Naming: Any non-standard naming convention
  • Case-Sensitive Keys: Preserve specific casing requirements

§Examples

§Irregular Plural

let csv = "id,name\n1,Alice\n";
let config = FromCsvConfig {
    list_key: Some("people".to_string()),
    ..Default::default()
};
let doc = from_csv_with_config(csv, "Person", &["name"], config).unwrap();
assert!(doc.get("people").is_some()); // Uses custom plural
assert!(doc.get("persons").is_none()); // Default plural not used

§Collective Noun

let csv = "id,value\n1,42\n";
let config = FromCsvConfig {
    list_key: Some("dataset".to_string()),
    ..Default::default()
};
let doc = from_csv_with_config(csv, "Data", &["value"], config).unwrap();
assert!(doc.get("dataset").is_some());

§Case-Sensitive Key

let csv = "id,value\n1,test\n";
let config = FromCsvConfig {
    list_key: Some("MyCustomList".to_string()),
    ..Default::default()
};
let doc = from_csv_with_config(csv, "Item", &["value"], config).unwrap();
assert!(doc.get("MyCustomList").is_some());
§max_columns: usize

Maximum number of columns allowed (default: 10,000).

This security limit prevents “column bomb” attacks where malicious CSV files contain excessive columns that cause memory exhaustion and slow processing.

§Security Impact

  • DoS Protection: Prevents attackers from creating CSVs with 50,000+ columns
  • Memory Bound: Limits worst-case memory usage for column metadata
  • Industry Comparison: Excel (16,384), Google Sheets (18,278), PostgreSQL (~1,600)
  • Recommended Values:
    • Web uploads: 1,000 - 10,000 columns
    • Internal processing: 10,000 - 50,000 columns
    • Scientific data: Adjust based on requirements

§Example

// For processing wide scientific datasets
let config = FromCsvConfig {
    max_columns: 50_000,
    ..Default::default()
};
§max_cell_size: usize

Maximum size of a single cell in bytes (default: 1MB).

This security limit prevents “cell bomb” attacks where malicious CSV files contain enormous individual cells that cause memory exhaustion.

§Security Impact

  • DoS Protection: Prevents attackers from using 10MB+ cells
  • Memory Bound: Each cell is read into memory as a String
  • Cumulative: Multiple large cells multiply the impact
  • Recommended Values:
    • Web uploads: 64KB - 1MB
    • Internal processing: 1MB - 10MB
    • Text-heavy data: Adjust based on requirements

§Example

// For processing long text fields (e.g., descriptions, comments)
let config = FromCsvConfig {
    max_cell_size: 5_242_880, // 5MB
    ..Default::default()
};
§max_total_size: usize

Maximum total CSV size in bytes after decompression (default: 100MB).

This security limit prevents “decompression bomb” attacks where compressed CSV files decompress to enormous sizes. A 1MB gzipped file could decompress to 1GB+, bypassing file size checks.

§Security Impact

  • DoS Protection: Prevents decompression bombs
  • Memory Bound: Tracks total bytes read during parsing
  • Transparent: Works even if CSV library handles decompression
  • Recommended Values:
    • Web uploads: 10MB - 100MB
    • Internal processing: 100MB - 1GB
    • Big data: Adjust based on available RAM

§Example

// For processing large datasets on high-memory servers
let config = FromCsvConfig {
    max_total_size: 1_073_741_824, // 1GB
    ..Default::default()
};
§max_header_size: usize

Maximum size of header row in bytes (default: 1MB).

This security limit prevents “header bomb” attacks where malicious CSV files have enormous column names or excessive total header size.

§Security Impact

  • DoS Protection: Prevents huge column names (e.g., 1MB per column)
  • Memory Bound: Limits memory for header parsing
  • Combined with max_columns: Total size = column_count × avg_name_length
  • Recommended Values:
    • Web uploads: 64KB - 1MB
    • Internal processing: 1MB - 10MB
    • Verbose column naming: Adjust based on requirements

§Example

// For datasets with very descriptive column names
let config = FromCsvConfig {
    max_header_size: 5_242_880, // 5MB
    ..Default::default()
};

Implementations§

impl FromCsvConfig

pub fn unlimited() -> Self

Creates a config with NO security limits (use for trusted input only).

§Security Warning

This configuration disables ALL security limits. Only use this for:

  • Trusted internal data sources
  • Controlled batch processing environments
  • Known-good CSV files

DO NOT use this for:

  • User uploads
  • Web service inputs
  • Untrusted data sources
§Examples
// For internal batch processing with trusted data
let config = FromCsvConfig::unlimited();

pub fn strict() -> Self

Creates a config with strict limits for untrusted input.

§Security

This configuration provides stricter limits suitable for:

  • Web service uploads
  • User-submitted CSV files
  • Untrusted data sources
  • Rate-limited APIs
§Limits
  • max_rows: 1,000,000 (same as default)
  • max_columns: 1,000 (stricter than default 10,000)
  • max_cell_size: 64KB (stricter than default 1MB)
  • max_total_size: 10MB (stricter than default 100MB)
  • max_header_size: 64KB (stricter than default 1MB)
§Examples
// For user uploads in a web service
let config = FromCsvConfig::strict();

Trait Implementations§

impl Clone for FromCsvConfig

fn clone(&self) -> FromCsvConfig

Returns a duplicate of the value. Read more

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more

impl Debug for FromCsvConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

impl Default for FromCsvConfig

fn default() -> Self

Returns the “default value” for a type. Read more

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.