SQL Stream
A CLI tool for executing SQL queries against CSV/JSON files using a zero-copy, streaming architecture powered by Apache DataFusion and Apache Arrow.
Installation
From Cargo (Recommended)
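Assuming the crate is published to crates.io under the same name as the binary (the Release workflow below publishes there), install with:

```shell
# Installs the sql-stream binary into ~/.cargo/bin
cargo install sql-stream
```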
From GitHub Releases
Download pre-built binaries for your platform from the releases page:
Linux (x86_64)
macOS (Intel)
macOS (Apple Silicon)
Windows
- Download `sql-stream-windows-x86_64.exe` from the releases page
- Rename to `sql-stream.exe` and add it to your PATH
From Source
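From a checkout of the repository, build in release mode:

```shell
# Builds an optimized binary
cargo build --release
```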
The binary will be available at target/release/sql-stream.
Usage
Basic Query on CSV
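A minimal invocation (the file name is illustrative) queries the default table, `data`:

```shell
sql-stream -f data.csv -q "SELECT * FROM data LIMIT 10"
```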
Custom Table Name
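With `-t`, the file is registered under a custom table name (file and column names below are illustrative):

```shell
sql-stream -f employees.csv -t employees -q "SELECT name, salary FROM employees"
```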
JSON Files
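JSON files are queried the same way (file name illustrative):

```shell
sql-stream -f events.json -q "SELECT * FROM data LIMIT 10"
```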
Aggregations and Group By
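A sketch of a grouped aggregation (column names are illustrative):

```shell
sql-stream -f sales.csv -q "SELECT region, SUM(amount) AS total FROM data GROUP BY region"
```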
Enable Verbose Logging
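Pass `-v` to any invocation to get debug-level logs:

```shell
sql-stream -f data.csv -q "SELECT COUNT(*) FROM data" -v
```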
Command Line Options
```
sql-stream -f <FILE> -q <SQL> [OPTIONS]

Options:
  -f, --file <FILE>         Path to CSV or JSON file (required)
  -q, --query <SQL>         SQL query to execute (required)
  -t, --table-name <NAME>   Table name for SQL queries (default: "data")
  -v, --verbose             Enable verbose debug logging
  -h, --help                Print help information
  -V, --version             Print version information
```
Examples
Data Analysis
```shell
# Find top 5 highest salaries (file and column names are illustrative)
sql-stream -f employees.csv -q "SELECT name, salary FROM data ORDER BY salary DESC LIMIT 5"

# Calculate average age by city
sql-stream -f people.csv -q "SELECT city, AVG(age) AS avg_age FROM data GROUP BY city"

# Filter and count
sql-stream -f people.csv -q "SELECT COUNT(*) FROM data WHERE age > 30"
```
Complex Queries
```shell
# Multiple aggregations (file and column names are illustrative)
sql-stream -f sales.csv -q "SELECT region, COUNT(*) AS orders, SUM(amount) AS total, AVG(amount) AS avg_order FROM data GROUP BY region ORDER BY total DESC"
```
Architecture
SQL Stream is built with a modular architecture:
- Engine Module (`src/engine.rs`): DataFusion SessionContext management and query execution
- CLI Module (`src/cli.rs`): Argument parsing with clap
- Error Module (`src/error.rs`): Type-safe error handling with thiserror
- Main Binary (`src/main.rs`): Async runtime and orchestration
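The engine module's core flow can be sketched with DataFusion's public API (a minimal sketch, assuming the `SessionContext` interface; the function name `run_query` is illustrative, not the project's actual API):

```rust
use datafusion::error::Result;
use datafusion::prelude::{CsvReadOptions, SessionContext};

// Minimal sketch of the engine flow: register a CSV file as a named
// table, run a SQL query against it, and print the result batches.
async fn run_query(path: &str, table: &str, sql: &str) -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_csv(table, path, CsvReadOptions::new()).await?;
    let df = ctx.sql(sql).await?; // planned and optimized lazily
    df.show().await?;            // executes and streams results to stdout
    Ok(())
}
```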
Technology Stack
- Apache DataFusion (45.x): SQL query engine
- Apache Arrow (53.x): In-memory columnar format
- Tokio: Async runtime
- Clap (v4): CLI parsing
- Tracing: Structured logging
Performance Considerations
- Zero-Copy Streaming: DataFusion processes data using Arrow's columnar format without unnecessary copying
- Lazy Evaluation: Queries are optimized before execution
- Parallel Processing: Multi-threaded execution for CPU-intensive operations
- Memory Efficiency: Streaming results prevent loading entire datasets into memory
For performance profiling, see docs/PROFILING.md.
Development
Prerequisites
- Rust 1.70 or later
- Cargo
Building
```shell
# Debug build
cargo build

# Release build
cargo build --release
```
Testing
```shell
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run specific test (substitute the test name)
cargo test <test_name>
```
Code Quality
```shell
# Format code
cargo fmt

# Run linter
cargo clippy

# Generate documentation
cargo doc --open
```
CI/CD
The project includes GitHub Actions workflows:
- CI Workflow: runs on every push and pull request
  - Format checking (`cargo fmt`)
  - Linting (`cargo clippy`)
  - Cross-platform tests (Linux, macOS, Windows)
- Release Workflow: automated releases on version tags
  - Builds binaries for Linux (x86_64, musl), Windows (x86_64), macOS (Intel, Apple Silicon)
  - Uploads release artifacts to GitHub
  - Publishes to crates.io
Using as a Library
SQL Stream can be used as a Rust library in your projects:
Add the crate to your `Cargo.toml`:

```toml
[dependencies]
sql-stream = "0.1"
```

Then query a file from async code. The module path and method names below are illustrative; see the API documentation for the exact public interface:

```rust
use sql_stream::QueryEngine; // illustrative path

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = QueryEngine::new("data.csv", "data").await?;
    let results = engine.execute("SELECT * FROM data LIMIT 10").await?;
    println!("{results:?}");
    Ok(())
}
```
See the API documentation for more details.
Contributing
Contributions are welcome! Please ensure:
- All tests pass (`cargo test`)
- Code is formatted (`cargo fmt`)
- No clippy warnings (`cargo clippy`)
- Documentation is updated
License
This project is dual-licensed under MIT OR Apache-2.0.
Acknowledgments
Built with: