csvkit: CSV Processing Toolkit in Rust
csvkit is a CSV processing toolkit written in Rust. Its interface closely mirrors the Python csvkit API, and it relies on Rust's performance and memory safety to make CSV manipulation fast and reliable.
Key Features
- Familiar API: Designed to be as close as possible to the Python csvkit API, making it easy for Python users to transition.
- High Performance: Utilizes Rust's performance benefits for fast and efficient CSV processing.
- Reliability: Leverages Rust's strong type system and memory safety guarantees for robust and stable operation.
- Modular Design: Separates Reader and Writer functionalities, enabling flexible CSV processing pipelines.
Installation
- Install Rust: If you haven't already, install Rust from rustup.rs.
- Add csvkit from GitHub:
cargo add --git https://github.com/StatPan/csvkit
Usage
Reader (DictReader)
The DictReader reads CSV files and returns each row as a HashMap<String, String>.
use csvkit::reader::{DictReader, ReaderOptions};
use std::collections::HashMap;
use std::fs::File;
use std::io::BufReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("data.csv")?;
    let reader = BufReader::new(file);
    let options = ReaderOptions::default();
    let dict_reader = DictReader::new(reader, options)?;

    // Each item is a Result, so `?` propagates any read or parse error.
    for record in dict_reader {
        let row: HashMap<String, String> = record?;
        println!("{:?}", row);
    }
    Ok(())
}
- DictReader::new(reader, options): Creates a DictReader from a BufReader and ReaderOptions.
- for record in dict_reader: DictReader implements the Iterator trait, so you can iterate over each row in the CSV file.
- record?: Each row is returned as a Result<HashMap<String, String>, Box<dyn Error>>, so the ? operator is used for error handling.
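To make the iterator contract described above concrete, here is a minimal, std-only sketch of a dict-style reader. SketchDictReader is a hypothetical name and is not csvkit's implementation: it splits on a plain comma and ignores quoting. It only illustrates how a header row can be paired with each data line to yield Result<HashMap<String, String>, Box<dyn Error>> items.

```rust
use std::collections::HashMap;
use std::error::Error;
use std::io::BufRead;

/// Hypothetical stand-in for a dict-style reader (not csvkit's code).
/// Splits on a plain comma and does not handle quoted fields.
struct SketchDictReader<R: BufRead> {
    headers: Vec<String>,
    lines: std::io::Lines<R>,
}

impl<R: BufRead> SketchDictReader<R> {
    /// Consumes the first line as the header row.
    fn new(reader: R) -> Result<Self, Box<dyn Error>> {
        let mut lines = reader.lines();
        let header_line = lines.next().ok_or("missing header row")??;
        let headers: Vec<String> = header_line
            .split(',')
            .map(|h| h.trim().to_string())
            .collect();
        Ok(Self { headers, lines })
    }
}

impl<R: BufRead> Iterator for SketchDictReader<R> {
    type Item = Result<HashMap<String, String>, Box<dyn Error>>;

    fn next(&mut self) -> Option<Self::Item> {
        // `?` on the Option ends iteration when the input is exhausted.
        let line = match self.lines.next()? {
            Ok(line) => line,
            Err(e) => return Some(Err(e.into())),
        };
        // Pair each header with the value in the same column position.
        let row: HashMap<String, String> = self
            .headers
            .iter()
            .cloned()
            .zip(line.split(',').map(|v| v.trim().to_string()))
            .collect();
        Some(Ok(row))
    }
}
```

Because every item is a Result, a caller can stop at the first failure with ? inside a loop, or collect the whole iterator into a Result<Vec<_>, _>.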
Writer (DictWriter)
The DictWriter writes rows, each given as a HashMap<String, String>, to a CSV file.
use csvkit::writer::{DictWriter, WriterOptions};
use std::collections::HashMap;
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::create("output.csv")?;
    let fieldnames = vec!["header1".to_string(), "header2".to_string()];
    let options = WriterOptions::default();
    let mut writer = DictWriter::new(file, fieldnames, options);

    // Write the header row first, then one row per HashMap.
    writer.writeheader()?;

    let mut row1 = HashMap::new();
    row1.insert("header1".to_string(), "value1".to_string());
    row1.insert("header2".to_string(), "value2".to_string());
    writer.writerow(row1)?;

    let mut row2 = HashMap::new();
    row2.insert("header1".to_string(), "value3".to_string());
    row2.insert("header2".to_string(), "value4".to_string());
    writer.writerow(row2)?;

    Ok(())
}
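A HashMap has no guaranteed iteration order, so a dict-style writer needs the fieldnames list to decide which column each value goes in. The sketch below shows that idea with the standard library only. write_row_in_order is a hypothetical helper, not part of csvkit, and whether csvkit's DictWriter treats missing keys the same way is an assumption; no quoting or escaping is applied here.

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Hypothetical helper (not csvkit's API): writes one row with its values
/// arranged in `fieldnames` order. Keys missing from `row` become empty
/// fields. No quoting or escaping is performed.
fn write_row_in_order<W: Write>(
    out: &mut W,
    fieldnames: &[String],
    row: &HashMap<String, String>,
) -> io::Result<()> {
    let fields: Vec<&str> = fieldnames
        .iter()
        .map(|name| row.get(name).map(String::as_str).unwrap_or(""))
        .collect();
    writeln!(out, "{}", fields.join(","))
}
```

The insertion order of the keys in the map does not matter in this sketch; only the order of fieldnames affects the output line.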
Writerows
The writerows method allows you to write multiple rows at once.
use csvkit::writer::{DictWriter, WriterOptions};
use std::collections::HashMap;
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::create("output.csv")?;
    let fieldnames = vec!["header1".to_string(), "header2".to_string()];
    let options = WriterOptions::default();
    let mut writer = DictWriter::new(file, fieldnames, options);
    writer.writeheader()?;

    // Collect all rows first, then write them in a single call.
    let mut rows: Vec<HashMap<String, String>> = Vec::new();

    let mut row1 = HashMap::new();
    row1.insert("header1".to_string(), "value1".to_string());
    row1.insert("header2".to_string(), "value2".to_string());
    rows.push(row1);

    let mut row2 = HashMap::new();
    row2.insert("header1".to_string(), "value3".to_string());
    row2.insert("header2".to_string(), "value4".to_string());
    rows.push(row2);

    writer.writerows(rows)?;
    Ok(())
}
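Building each row with repeated insert calls gets verbose as the number of rows grows. One way to shorten it is a small helper that turns key/value pairs into a HashMap<String, String>. make_row is a hypothetical convenience function written for this example; it is not provided by csvkit.

```rust
use std::collections::HashMap;

/// Hypothetical convenience helper (not part of csvkit): builds one row
/// from a slice of (field name, value) pairs.
fn make_row(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs
        .iter()
        .map(|&(key, value)| (key.to_string(), value.to_string()))
        .collect()
}
```

With it, the rows vector from the example above could be written as vec![make_row(&[("header1", "value1"), ("header2", "value2")]), make_row(&[("header1", "value3"), ("header2", "value4")])] and then passed to writerows unchanged.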
Options
You can control the CSV processing behavior using the ReaderOptions and WriterOptions structs.
- delimiter: The field delimiter (default: ,)
- quotechar: The quote character (default: ")
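To illustrate what these two options control, the following std-only sketch splits a single line using a configurable delimiter and quote character. split_fields is a hypothetical function, not csvkit's parser, and the use of char for both parameters is an assumption, since the field types of ReaderOptions and WriterOptions are not shown here. A doubled quote character inside a quoted field is treated as a literal quote, which is the common CSV convention.

```rust
/// Hypothetical splitter (not csvkit's parser). Splits one line into fields.
/// A delimiter inside a quoted field is kept as data, and a doubled quote
/// character inside a quoted field becomes a single literal quote.
fn split_fields(line: &str, delimiter: char, quotechar: char) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();

    while let Some(c) = chars.next() {
        if c == quotechar {
            if in_quotes && chars.peek() == Some(&quotechar) {
                // Escaped quote: emit one quote and skip the second.
                current.push(quotechar);
                chars.next();
            } else {
                // Opening or closing quote: toggle state, emit nothing.
                in_quotes = !in_quotes;
            }
        } else if c == delimiter && !in_quotes {
            // End of field: move the buffer into the result.
            fields.push(std::mem::take(&mut current));
        } else {
            current.push(c);
        }
    }
    fields.push(current);
    fields
}
```

Changing the two arguments is enough to read other dialects in this sketch, for example a semicolon delimiter with single-quote quoting.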