dataload 0.1.0

A flexible data loading library for CSV and Excel files with automatic delimiter detection
# dataload


A flexible Rust library for loading CSV and Excel files into [Polars](https://pola.rs) DataFrames.

## Features


- **Automatic file type detection** via magic bytes and file extensions
- **Smart delimiter detection** for CSV files (comma, tab, semicolon, pipe)
- **Excel support** for xlsx, xls, xlsm, xlsb, and ods formats
- **Builder-pattern API** for flexible configuration
- **Feature flags** to minimize dependencies
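
As a rough illustration of the first bullet, detection by magic bytes with an extension fallback could look like the sketch below. `FileKind` and `sniff_kind` are hypothetical names, not part of dataload's API, and the real implementation may differ. The sketch relies only on two well-known facts: xlsx, xlsm, xlsb, and ods files are ZIP containers (starting with `PK\x03\x04`), and legacy xls files start with the OLE2 compound file signature.

```rust
use std::path::Path;

/// Hypothetical classification; not dataload's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileKind {
    /// ZIP container: xlsx, xlsm, xlsb, and ods are all ZIP archives.
    ZipSpreadsheet,
    /// OLE2 compound file: legacy xls.
    LegacyExcel,
    /// Plain delimited text, recognized by extension only.
    DelimitedText,
    Unknown,
}

const ZIP_MAGIC: &[u8] = b"PK\x03\x04";
const OLE2_MAGIC: &[u8] = &[0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Classify a file from its leading bytes, falling back to its extension.
pub fn sniff_kind(bytes: &[u8], file_name: &str) -> FileKind {
    // Magic bytes take priority because extensions can be wrong or missing.
    if bytes.starts_with(ZIP_MAGIC) {
        return FileKind::ZipSpreadsheet;
    }
    if bytes.starts_with(OLE2_MAGIC) {
        return FileKind::LegacyExcel;
    }

    // Text formats have no signature, so use the (case-insensitive) extension.
    let ext = Path::new(file_name)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("csv") | Some("tsv") | Some("txt") => FileKind::DelimitedText,
        Some("xlsx") | Some("xlsm") | Some("xlsb") | Some("ods") => FileKind::ZipSpreadsheet,
        Some("xls") => FileKind::LegacyExcel,
        _ => FileKind::Unknown,
    }
}
```

Checking the signature first means a workbook saved with the wrong extension is still routed to the spreadsheet path in this sketch.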

## Installation


Add to your `Cargo.toml`:

```toml
[dependencies]
dataload = "0.1"
```

To disable Excel support (reduces compile time and dependencies):

```toml
[dependencies]
dataload = { version = "0.1", default-features = false, features = ["csv"] }
```

## Quick Start


```rust
use dataload::{DataLoader, load_file, load_bytes};
use std::path::Path;

// Simple one-liner
let df = load_file(Path::new("data.csv"))?;

// From bytes
let csv_data = b"name,age\nAlice,30\nBob,25";
let df = load_bytes(csv_data, "data.csv")?;

// With custom options
let df = DataLoader::new()
    .with_delimiter(dataload::Delimiter::Tab)
    .with_header(false)
    .with_skip_rows(1)
    .with_max_rows(Some(1000))
    .load_file(Path::new("data.tsv"))?;
```

## CSV Loading


The library automatically detects the delimiter by analyzing the file content:

```rust
use dataload::load_bytes;

// Auto-detects comma delimiter
let df = load_bytes(b"a,b,c\n1,2,3", "data.csv")?;

// Auto-detects tab delimiter
let df = load_bytes(b"a\tb\tc\n1\t2\t3", "data.tsv")?;

// Auto-detects semicolon delimiter
let df = load_bytes(b"a;b;c\n1;2;3", "data.csv")?;
```

Or specify a delimiter explicitly:

```rust
use dataload::{DataLoader, Delimiter};

let df = DataLoader::new()
    .with_delimiter(Delimiter::Pipe)
    .load_bytes(b"a|b|c\n1|2|3", "data.txt")?;
```
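
This README does not document how detection works internally. One common heuristic, sketched here under that assumption, is to count each candidate delimiter on the first few lines and prefer the one that appears the same non-zero number of times on every sampled line. `guess_delimiter` is a hypothetical helper, not a dataload function, and it ignores quoted fields for brevity.

```rust
/// Candidate delimiters: the four that the feature list says are detected.
const CANDIDATES: [u8; 4] = [b',', b'\t', b';', b'|'];

/// Guess the delimiter from the first `sample_lines` non-empty lines.
/// Returns `None` when no candidate appears consistently.
pub fn guess_delimiter(data: &[u8], sample_lines: usize) -> Option<u8> {
    let lines: Vec<&[u8]> = data
        .split(|&b| b == b'\n')
        .filter(|line| !line.is_empty())
        .take(sample_lines)
        .collect();

    // Best candidate so far, with its per-line occurrence count.
    let mut best: Option<(u8, usize)> = None;

    for &cand in CANDIDATES.iter() {
        let counts: Vec<usize> = lines
            .iter()
            .map(|line| line.iter().filter(|&&b| b == cand).count())
            .collect();

        // Skip candidates that are absent from the first sampled line.
        let first = match counts.first() {
            Some(&c) if c > 0 => c,
            _ => continue,
        };

        // A real delimiter should split every line into the same number of fields.
        let consistent = counts.iter().all(|&c| c == first);
        if consistent && best.map_or(true, |(_, n)| first > n) {
            best = Some((cand, first));
        }
    }

    best.map(|(delim, _)| delim)
}
```

The consistency check is what lets a heuristic like this pick `;` for input such as `x;y;z\n1,5;2,5;3,5`, where commas appear only as decimal separators.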

## Excel Loading


```rust
use dataload::DataLoader;
use std::path::Path;

// Load first sheet (default)
let df = DataLoader::new()
    .load_file(Path::new("report.xlsx"))?;

// Load specific sheet by name
let df = DataLoader::new()
    .with_sheet_name("Sales Data")
    .load_file(Path::new("report.xlsx"))?;

// Load specific sheet by index (0-based)
let df = DataLoader::new()
    .with_sheet_index(2)
    .load_file(Path::new("report.xlsx"))?;

// List available sheets (`list_sheets` takes the raw file bytes)
let file_bytes = std::fs::read("report.xlsx")?;
let sheets = dataload::list_sheets(&file_bytes)?;
```

## Configuration Options


| Option | Description | Default |
|--------|-------------|---------|
| `delimiter` | CSV delimiter (`Auto`, `Comma`, `Tab`, `Semicolon`, `Pipe`, `Custom(u8)`) | `Auto` |
| `has_header` | First row is header | `true` |
| `skip_rows` | Rows to skip from start | `0` |
| `max_rows` | Maximum rows to read | `None` (all) |
| `sheet_index` | Excel sheet index (0-based) | `None` (first) |
| `sheet_name` | Excel sheet name | `None` |
| `infer_schema` | Infer column types | `true` |
| `infer_schema_length` | Rows for type inference | `Some(1000)` |
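
To make the defaults in the table concrete, here is a minimal, self-contained sketch of an options struct whose `Default` implementation mirrors each row, with two builder-style setters. `LoadOptions` and its field types (for example `usize` for row counts) are assumptions made for illustration; dataload's real types are not shown in this README and may differ.

```rust
/// Mirrors the delimiter variants listed in the table above.
#[derive(Debug, Clone, PartialEq)]
pub enum Delimiter {
    Auto,
    Comma,
    Tab,
    Semicolon,
    Pipe,
    Custom(u8),
}

/// Hypothetical options struct; field names follow the table's option names.
#[derive(Debug, Clone, PartialEq)]
pub struct LoadOptions {
    pub delimiter: Delimiter,
    pub has_header: bool,
    pub skip_rows: usize,
    pub max_rows: Option<usize>,
    pub sheet_index: Option<usize>,
    pub sheet_name: Option<String>,
    pub infer_schema: bool,
    pub infer_schema_length: Option<usize>,
}

impl Default for LoadOptions {
    /// Defaults copied from the "Default" column of the table.
    fn default() -> Self {
        LoadOptions {
            delimiter: Delimiter::Auto,
            has_header: true,
            skip_rows: 0,
            max_rows: None,    // read all rows
            sheet_index: None, // first sheet
            sheet_name: None,
            infer_schema: true,
            infer_schema_length: Some(1000),
        }
    }
}

impl LoadOptions {
    /// Builder-style setter: consumes and returns `self` so calls can be chained.
    pub fn with_header(mut self, has_header: bool) -> Self {
        self.has_header = has_header;
        self
    }

    /// Builder-style setter for the row limit.
    pub fn with_max_rows(mut self, max_rows: Option<usize>) -> Self {
        self.max_rows = max_rows;
        self
    }
}
```

With this shape, `LoadOptions::default().with_header(false).with_max_rows(Some(10))` overrides two options and leaves every other row of the table at its default.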

## Error Handling


All operations return `Result<T, DataLoadError>`:

```rust
use dataload::{load_file, DataLoadError};
use std::path::Path;

match load_file(Path::new("data.csv")) {
    Ok(df) => println!("Loaded {} rows", df.height()),
    Err(DataLoadError::Io(e)) => eprintln!("File error: {e}"),
    Err(DataLoadError::UnsupportedFileType(ext)) => eprintln!("Unknown type: {ext}"),
    Err(e) => eprintln!("Error: {e}"),
}
```

## Feature Flags


| Feature | Description | Default |
|---------|-------------|---------|
| `csv` | CSV/TSV file support | Yes |
| `excel` | Excel file support (xlsx, xls, etc.) | Yes |

## License


Licensed under the Apache License, Version 2.0.

## Contributing


Contributions are welcome! Please feel free to submit a Pull Request.