SQL Stream
A CLI tool for executing SQL queries against CSV/JSON files using a zero-copy, streaming architecture powered by Apache DataFusion and Apache Arrow.
Installation
From Cargo (Recommended)
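Assuming the crate is published to crates.io under the same name as the binary (the Release workflow below publishes there), install with:

```shell
# Installs the sql-stream binary into ~/.cargo/bin
cargo install sql-stream
```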
From GitHub Releases
Download pre-built binaries for your platform from the releases page:
Linux (x86_64)
macOS (Intel)
macOS (Apple Silicon)
Windows
- Download `sql-stream-windows-x86_64.exe` from the releases page
- Rename to `sql-stream.exe` and add it to your PATH
From Source
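From a checkout of the repository, build in release mode:

```shell
# Builds an optimized binary
cargo build --release
```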
The binary will be available at target/release/sql-stream.
Usage
Basic Query on CSV
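A minimal invocation (the file name is illustrative) queries the default table, `data`:

```shell
sql-stream -f data.csv -q "SELECT * FROM data LIMIT 10"
```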
Custom Table Name
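With `-t`, the file is registered under a custom table name (file and column names below are illustrative):

```shell
sql-stream -f employees.csv -t employees -q "SELECT name, salary FROM employees"
```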
JSON Files
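JSON files are queried the same way (file name illustrative):

```shell
sql-stream -f events.json -q "SELECT * FROM data LIMIT 10"
```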
Aggregations and Group By
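A sketch of a grouped aggregation (column names are illustrative):

```shell
sql-stream -f sales.csv -q "SELECT region, SUM(amount) AS total FROM data GROUP BY region"
```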
Enable Verbose Logging
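Pass `-v` to any invocation to get debug-level logs:

```shell
sql-stream -f data.csv -q "SELECT COUNT(*) FROM data" -v
```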
Command Line Options
```
sql-stream -f <FILE> -q <SQL> [OPTIONS]

Options:
  -f, --file <FILE>         Path to CSV or JSON file (required)
  -q, --query <SQL>         SQL query to execute (required)
  -t, --table-name <NAME>   Table name for SQL queries (default: "data")
  -v, --verbose             Enable verbose debug logging
  -h, --help                Print help information
  -V, --version             Print version information
```
Examples
Data Analysis
```shell
# Find top 5 highest salaries (file and column names are illustrative)
sql-stream -f employees.csv -q "SELECT name, salary FROM data ORDER BY salary DESC LIMIT 5"

# Calculate average age by city
sql-stream -f people.csv -q "SELECT city, AVG(age) AS avg_age FROM data GROUP BY city"

# Filter and count
sql-stream -f people.csv -q "SELECT COUNT(*) FROM data WHERE age > 30"
```
Complex Queries
```shell
# Multiple aggregations (file and column names are illustrative)
sql-stream -f sales.csv -q "SELECT region, COUNT(*) AS orders, SUM(amount) AS total, AVG(amount) AS avg_order FROM data GROUP BY region ORDER BY total DESC"
```
Architecture
SQL Stream is built with a modular architecture:
- Engine Module (`src/engine.rs`): DataFusion SessionContext management and query execution
- CLI Module (`src/cli.rs`): Argument parsing with clap
- Error Module (`src/error.rs`): Type-safe error handling with thiserror
- Main Binary (`src/main.rs`): Async runtime and orchestration
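The engine module's core flow can be sketched with DataFusion's public API (a minimal sketch, assuming the `SessionContext` interface; the function name `run_query` is illustrative, not the project's actual API):

```rust
use datafusion::error::Result;
use datafusion::prelude::{CsvReadOptions, SessionContext};

// Minimal sketch of the engine flow: register a CSV file as a named
// table, run a SQL query against it, and print the result batches.
async fn run_query(path: &str, table: &str, sql: &str) -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_csv(table, path, CsvReadOptions::new()).await?;
    let df = ctx.sql(sql).await?; // planned and optimized lazily
    df.show().await?;            // executes and streams results to stdout
    Ok(())
}
```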
Technology Stack
- Apache DataFusion (45.x): SQL query engine
- Apache Arrow (53.x): In-memory columnar format
- Tokio: Async runtime
- Clap (v4): CLI parsing
- Tracing: Structured logging
Performance Considerations
- Zero-Copy Streaming: DataFusion processes data using Arrow's columnar format without unnecessary copying
- Lazy Evaluation: Queries are optimized before execution
- Parallel Processing: Multi-threaded execution for CPU-intensive operations
- Memory Efficiency: Streaming results prevent loading entire datasets into memory
For performance profiling, see docs/PROFILING.md.
Development
Prerequisites
- Rust 1.70 or later
- Cargo
Building
```shell
# Debug build
cargo build

# Release build
cargo build --release
```
Testing
```shell
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run specific test (substitute the test name)
cargo test <test_name>
```
Code Quality
```shell
# Format code
cargo fmt

# Run linter
cargo clippy

# Generate documentation
cargo doc --open
```
CI/CD
The project includes GitHub Actions workflows:
- CI Workflow: runs on every push and pull request
  - Format checking (`cargo fmt`)
  - Linting (`cargo clippy`)
  - Cross-platform tests (Linux, macOS, Windows)
- Release Workflow: automated releases on version tags
  - Builds binaries for Linux (x86_64, musl), Windows (x86_64), macOS (Intel, Apple Silicon)
  - Uploads release artifacts to GitHub
  - Publishes to crates.io
Using as a Library
SQL Stream can be used as a Rust library in your projects:
Add the crate to your `Cargo.toml`:

```toml
[dependencies]
sql-stream = "0.1"
```

Then query a file from async code. The module path and method names below are illustrative; see the API documentation for the exact public interface:

```rust
use sql_stream::QueryEngine; // illustrative path

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = QueryEngine::new("data.csv", "data").await?;
    let results = engine.execute("SELECT * FROM data LIMIT 10").await?;
    println!("{results:?}");
    Ok(())
}
```
See the API documentation for more details.
Contributing
Contributions are welcome! Please ensure:
- All tests pass (`cargo test`)
- Code is formatted (`cargo fmt`)
- No clippy warnings (`cargo clippy`)
- Documentation is updated
License
This project is dual-licensed under MIT OR Apache-2.0.
Acknowledgments
Built with: