dsq-core: Core library for dsq data processing
This crate provides the core functionality for dsq, a data processing tool that extends
jq-style syntax to structured data formats such as Parquet, Avro, CSV, and more.
dsq is built on Polars DataFrames for high-performance, columnar data manipulation
across all supported file formats.
§Features
- Format Flexibility: Support for CSV, TSV, Parquet, Avro, JSON Lines, Arrow, and JSON
- Performance: Built on Polars DataFrames with lazy evaluation and columnar operations
- Type Safety: Proper type handling with clear error messages
§Quick Start
```rust
use dsq_core::{Value, ops, io};

// Read a CSV file
let data = io::read_file_sync("data.csv", &io::ReadOptions::default())?;

// Apply operations
let result = ops::OperationPipeline::new()
    .select(vec!["name".to_string(), "age".to_string()])
    .sort(vec![ops::SortOptions::desc("age".to_string())])
    .head(10)
    .execute(data)?;

// Write to Parquet
io::write_file_sync(&result, "output.parquet", &io::WriteOptions::default())?;
```

§Architecture
The library is organized into several key modules:
- `value` - Core value type that bridges JSON and DataFrames
- `ops` - Data operations (select, filter, aggregate, join, transform)
- `io` - Input/output for various file formats
- `filter` - jq-compatible filter compilation and execution
- `error` - Error handling and result types
- `format` - File format detection and metadata
§Examples
§Basic DataFrame Operations
```rust
use dsq_core::{Value, ops::basic::*};
use polars::prelude::*;

let df = df! {
    "name" => ["Alice", "Bob", "Charlie"],
    "age" => [30, 25, 35],
    "department" => ["Engineering", "Sales", "Engineering"]
}?;
let data = Value::DataFrame(df);

// Select columns
let selected = select_columns(&data, &["name".to_string(), "age".to_string()])?;

// Sort by age
let sorted = sort_by_columns(&selected, &[SortOptions::desc("age")])?;

// Take first 2 rows
let result = head(&sorted, 2)?;
```

§Aggregation Operations
```rust
use dsq_core::{Value, ops::aggregate::*};

// Group by department and calculate statistics
let aggregated = group_by_agg(
    &data,
    &["department".to_string()],
    &[
        AggregationFunction::Count,
        AggregationFunction::Mean("age".to_string()),
        AggregationFunction::Sum("salary".to_string()),
    ],
)?;
```

§Join Operations
```rust
use dsq_core::{Value, ops::join::*};

let keys = JoinKeys::on(vec!["id".to_string()]);
let options = JoinOptions {
    join_type: JoinType::Inner,
    ..Default::default()
};
let joined = join(&left_data, &right_data, &keys, &options)?;
```

§Format Conversion
```rust
use dsq_core::io;

// Convert CSV to Parquet
io::convert_file(
    "data.csv",
    "data.parquet",
    &io::ReadOptions::default(),
    &io::WriteOptions::default(),
)?;
```

§Filter Execution
```rust
use dsq_core::filter::{FilterExecutor, ExecutorConfig};

let mut executor = FilterExecutor::with_config(ExecutorConfig {
    lazy_evaluation: true,
    dataframe_optimizations: true,
    ..Default::default()
});

// Execute a jq-style filter on a DataFrame
let result = executor.execute_str(
    r#"map(select(.age > 30)) | sort_by(.name)"#,
    data,
)?;
```

§Error Handling
All operations return `Result<T>`, where errors are represented by the `Error` type:
```rust
use dsq_core::{Error, Result, TypeError, FormatError};

match some_operation() {
    Ok(value) => println!("Success: {:?}", value),
    Err(Error::Type(TypeError::InvalidConversion { from, to })) => {
        eprintln!("Cannot convert from {} to {}", from, to);
    }
    Err(Error::Format(FormatError::Unknown(format))) => {
        eprintln!("Unknown format: {}", format);
    }
    Err(e) => eprintln!("Other error: {}", e),
}
```

§Performance Tips
- Use lazy evaluation for large datasets with `LazyFrame`
- Prefer columnar operations over row-by-row processing
- Use appropriate data types to minimize memory usage
- Consider using streaming for very large files that don’t fit in memory
- Enable DataFrame-specific optimizations in the filter executor
§Feature Flags
This crate supports several optional features:
- `default` - Includes all commonly used functionality
- `io-csv` - CSV/TSV reading and writing support
- `io-parquet` - Parquet format support
- `io-json` - JSON and JSON Lines support
- `io-avro` - Avro format support (planned)
- `io-arrow` - Arrow IPC format support
- `filter` - jq-compatible filter compilation and execution
- `repl` - Interactive REPL support (for CLI usage)
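A consumer can opt into a subset of these features. A hypothetical `Cargo.toml` entry (the version number is a placeholder) might look like:

```toml
[dependencies]
# Disable defaults and enable only Parquet I/O plus the filter engine.
dsq-core = { version = "0.1", default-features = false, features = ["io-parquet", "filter"] }
```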
Re-exports§
pub use crate::error::Error;
pub use crate::error::FilterError;
pub use crate::error::FormatError;
pub use crate::error::Result;
pub use crate::error::TypeError;
pub use ops::recommended_batch_size;
pub use ops::supports_operation;
pub use ops::Operation;
pub use ops::OperationPipeline;
pub use ops::OperationType;
pub use ops::basic::count;
pub use ops::basic::filter_values;
pub use ops::basic::head;
pub use ops::basic::reverse;
pub use ops::basic::select_columns;
pub use ops::basic::slice;
pub use ops::basic::sort_by_columns;
pub use ops::basic::tail;
pub use ops::basic::unique;
pub use ops::basic::SortOptions;
pub use ops::aggregate::group_by;
pub use ops::aggregate::group_by_agg;
pub use ops::aggregate::pivot;
pub use ops::aggregate::unpivot;
pub use ops::aggregate::AggregationFunction;
pub use ops::aggregate::WindowFunction;
pub use ops::join::inner_join;
pub use ops::join::join;
pub use ops::join::left_join;
pub use ops::join::outer_join;
pub use ops::join::right_join;
pub use ops::join::JoinKeys;
pub use ops::join::JoinOptions;
pub use ops::join::JoinType;
pub use ops::transform::Transform;
pub use utils::array;
pub use utils::object;
Modules§
- `error` - Error types and handling
- `ops` - Operations module for dsq
- `prelude` - Prelude module for convenient imports
- `utils` - Utility functions for working with dsq
Structs§
- `BuildInfo` - Build information structure
- `FormatOptions` - Options for format detection
Enums§
- `DataFormat` - Supported data formats for reading and writing
- `Value` - Core value type that bridges JSON and DataFrames
Constants§
- `BUILD_INFO` - Build information for dsq-core
- `VERSION` - Version information
Functions§
- `detect_format_from_content` - Detect format from file content (magic bytes)