# dataload
A flexible Rust library for loading CSV and Excel files into [Polars](https://pola.rs) DataFrames.
## Features
- **Automatic file type detection** via magic bytes and file extensions
- **Smart delimiter detection** for CSV files (comma, tab, semicolon, pipe)
- **Excel support** for xlsx, xls, xlsm, xlsb, and ods formats
- **Builder-pattern API** for flexible configuration
- **Feature flags** to minimize dependencies
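
To make the file type detection bullet concrete, here is a minimal sketch of sniffing magic bytes first and falling back to the extension. The names `FileKind` and `sniff_kind` are hypothetical and are not part of dataload's API; the crate's real detection logic may differ. The sketch relies only on two well-known signatures: ZIP containers (used by xlsx, xlsm, xlsb, and ods) start with `PK\x03\x04`, and legacy xls files use the OLE2 compound file header.

```rust
use std::path::Path;

/// Hypothetical classification, for illustration only (not dataload's types).
#[derive(Debug, PartialEq)]
enum FileKind {
    /// ZIP-based spreadsheet: xlsx, xlsm, xlsb, ods.
    ZipSpreadsheet,
    /// OLE2 compound file: legacy xls.
    LegacyXls,
    /// Plain delimited text: csv, tsv, txt.
    DelimitedText,
}

/// Check magic bytes first, then fall back to the file extension.
fn sniff_kind(bytes: &[u8], file_name: &str) -> Option<FileKind> {
    const ZIP_MAGIC: &[u8] = b"PK\x03\x04";
    const OLE2_MAGIC: &[u8] = &[0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

    if bytes.starts_with(ZIP_MAGIC) {
        return Some(FileKind::ZipSpreadsheet);
    }
    if bytes.starts_with(OLE2_MAGIC) {
        return Some(FileKind::LegacyXls);
    }

    // Text formats have no signature, so the extension decides.
    let ext = Path::new(file_name)
        .extension()?
        .to_str()?
        .to_ascii_lowercase();
    match ext.as_str() {
        "csv" | "tsv" | "txt" => Some(FileKind::DelimitedText),
        "xlsx" | "xlsm" | "xlsb" | "ods" => Some(FileKind::ZipSpreadsheet),
        "xls" => Some(FileKind::LegacyXls),
        _ => None,
    }
}
```

Checking the content before the extension means a mislabeled file (for example, an xls workbook saved as `report.csv`) is still classified by what it actually contains.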
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
dataload = "0.1"
```
To disable Excel support (reduces compile time and dependencies):
```toml
[dependencies]
dataload = { version = "0.1", default-features = false, features = ["csv"] }
```
## Quick Start
```rust
use dataload::{DataLoader, load_file, load_bytes};
use std::path::Path;
// Simple one-liner
let df = load_file(Path::new("data.csv"))?;
// From bytes
let csv_data = b"name,age\nAlice,30\nBob,25";
let df = load_bytes(csv_data, "data.csv")?;
// With custom options
let df = DataLoader::new()
    .with_delimiter(dataload::Delimiter::Tab)
    .with_header(false)
    .with_skip_rows(1)
    .with_max_rows(Some(1000))
    .load_file(Path::new("data.tsv"))?;
```
## CSV Loading
The library automatically detects the delimiter by analyzing the file content:
```rust
use dataload::load_bytes;
// Auto-detects comma delimiter
let df = load_bytes(b"a,b,c\n1,2,3", "data.csv")?;
// Auto-detects tab delimiter
let df = load_bytes(b"a\tb\tc\n1\t2\t3", "data.tsv")?;
// Auto-detects semicolon delimiter
let df = load_bytes(b"a;b;c\n1;2;3", "data.csv")?;
```
Or specify a delimiter explicitly:
```rust
use dataload::{DataLoader, Delimiter};
let df = DataLoader::new()
    .with_delimiter(Delimiter::Pipe)
    .load_bytes(b"a|b|c\n1|2|3", "data.txt")?;
```
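
For intuition about automatic delimiter detection, the sketch below shows one common heuristic: count each candidate delimiter per line and keep the candidate whose count is non-zero and identical on every sampled line. This is an assumption about how such detection can work, not dataload's actual algorithm; `guess_delimiter` and `CANDIDATES` are hypothetical names, and the sketch ignores quoted fields.

```rust
/// Candidate delimiters: comma, tab, semicolon, pipe.
const CANDIDATES: [u8; 4] = [b',', b'\t', b';', b'|'];

/// Return the candidate that occurs the same non-zero number of times on
/// each of the first few non-empty lines, preferring the highest count.
/// Simplification: delimiters inside quoted fields are not handled.
fn guess_delimiter(sample: &str) -> Option<u8> {
    let lines: Vec<&str> = sample
        .lines()
        .filter(|line| !line.is_empty())
        .take(10)
        .collect();
    let first = lines.first()?;

    CANDIDATES
        .iter()
        .copied()
        .filter_map(|delim| {
            let count = first.bytes().filter(|&b| b == delim).count();
            let consistent = count > 0
                && lines
                    .iter()
                    .all(|line| line.bytes().filter(|&b| b == delim).count() == count);
            consistent.then_some((delim, count))
        })
        .max_by_key(|&(_, count)| count)
        .map(|(delim, _)| delim)
}
```

When no candidate is consistent across the sample, the function returns `None`; in that situation a caller would need to fall back to a default or to an explicitly configured delimiter.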
## Excel Loading
```rust
use dataload::DataLoader;
use std::path::Path;
// Load first sheet (default)
let df = DataLoader::new()
    .load_file(Path::new("report.xlsx"))?;
// Load specific sheet by name
let df = DataLoader::new()
    .with_sheet_name("Sales Data")
    .load_file(Path::new("report.xlsx"))?;
// Load specific sheet by index (0-based)
let df = DataLoader::new()
    .with_sheet_index(2)
    .load_file(Path::new("report.xlsx"))?;
// List available sheets (assumes the workbook has been read into memory first)
let file_bytes = std::fs::read("report.xlsx")?;
let sheets = dataload::list_sheets(&file_bytes)?;
```
## Configuration Options
| Option | Description | Default |
|--------|-------------|---------|
| `delimiter` | CSV delimiter (`Auto`, `Comma`, `Tab`, `Semicolon`, `Pipe`, `Custom(u8)`) | `Auto` |
| `has_header` | First row is header | `true` |
| `skip_rows` | Rows to skip from start | `0` |
| `max_rows` | Maximum rows to read | `None` (all) |
| `sheet_index` | Excel sheet index (0-based) | `None` (first) |
| `sheet_name` | Excel sheet name | `None` |
| `infer_schema` | Infer column types | `true` |
| `infer_schema_length` | Rows for type inference | `Some(1000)` |
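
As a rough illustration of how these options and their defaults fit a builder-style API, the sketch below models them as a plain struct. `LoaderOptions` is a hypothetical stand-in rather than dataload's actual type, and the field types (`usize`, `Option<String>`, and so on) are assumptions; only the option names, the default values, and the `with_*` method names used in Quick Start come from this README.

```rust
/// Stand-in for the crate's delimiter enum, with the variants listed above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Delimiter {
    Auto,
    Comma,
    Tab,
    Semicolon,
    Pipe,
    Custom(u8),
}

/// Hypothetical options struct; field types are assumptions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct LoaderOptions {
    delimiter: Delimiter,
    has_header: bool,
    skip_rows: usize,
    max_rows: Option<usize>,
    sheet_index: Option<usize>,
    sheet_name: Option<String>,
    infer_schema: bool,
    infer_schema_length: Option<usize>,
}

impl Default for LoaderOptions {
    /// Defaults copied from the table above.
    fn default() -> Self {
        Self {
            delimiter: Delimiter::Auto,
            has_header: true,
            skip_rows: 0,
            max_rows: None,
            sheet_index: None,
            sheet_name: None,
            infer_schema: true,
            infer_schema_length: Some(1000),
        }
    }
}

#[allow(dead_code)]
impl LoaderOptions {
    // Each setter takes `self` by value and returns it, so calls can be chained.
    fn with_delimiter(mut self, delimiter: Delimiter) -> Self {
        self.delimiter = delimiter;
        self
    }

    fn with_header(mut self, has_header: bool) -> Self {
        self.has_header = has_header;
        self
    }

    fn with_skip_rows(mut self, skip_rows: usize) -> Self {
        self.skip_rows = skip_rows;
        self
    }

    fn with_max_rows(mut self, max_rows: Option<usize>) -> Self {
        self.max_rows = max_rows;
        self
    }
}
```

Because every setter consumes and returns the value, any option that is not set explicitly keeps the default from the table.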
## Error Handling
All operations return `Result<T, DataLoadError>`:
```rust
use dataload::{load_file, DataLoadError};
use std::path::Path;
match load_file(Path::new("data.csv")) {
    Ok(df) => println!("Loaded {} rows", df.height()),
    Err(DataLoadError::Io(e)) => eprintln!("File error: {e}"),
    Err(DataLoadError::UnsupportedFileType(ext)) => eprintln!("Unknown type: {ext}"),
    Err(e) => eprintln!("Error: {e}"),
}
```
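
The match above implies an error enum with at least an `Io` and an `UnsupportedFileType` variant. The sketch below shows how an enum of that shape typically supports the `?` operator through a `From<std::io::Error>` impl. `LoadError` and `read_supported` are hypothetical stand-ins; the real `DataLoadError` definition is not shown in this README and may carry different payloads or more variants.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in error type mirroring the two variants matched above.
#[derive(Debug)]
enum LoadError {
    Io(io::Error),
    UnsupportedFileType(String),
}

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoadError::Io(e) => write!(f, "I/O error: {e}"),
            LoadError::UnsupportedFileType(ext) => {
                write!(f, "unsupported file type: {ext}")
            }
        }
    }
}

impl std::error::Error for LoadError {}

/// Lets `?` convert an `io::Error` into `LoadError::Io` automatically.
impl From<io::Error> for LoadError {
    fn from(e: io::Error) -> Self {
        LoadError::Io(e)
    }
}

/// Hypothetical helper: reject unknown extensions, then read the file.
fn read_supported(path: &Path) -> Result<Vec<u8>, LoadError> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("")
        .to_ascii_lowercase();
    if !matches!(ext.as_str(), "csv" | "tsv") {
        return Err(LoadError::UnsupportedFileType(ext));
    }
    // `?` uses the `From<io::Error>` impl above.
    Ok(fs::read(path)?)
}
```

With a conversion like this in place, callers can either match on specific variants, as in the example above, or propagate the error with `?` from their own functions.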
## Feature Flags
| Feature | Description | Default |
|---------|-------------|---------|
| `csv` | CSV/TSV file support | ✓ |
| `excel` | Excel file support (xlsx, xls, etc.) | ✓ |
## License
Licensed under the Apache License, Version 2.0.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.