sql-stream 0.1.0

A production-grade CLI tool for executing SQL queries against CSV/JSON files using streaming architecture
Documentation
# SQL Stream


A CLI tool for executing SQL queries against CSV/JSON files using a zero-copy, streaming architecture powered by Apache DataFusion and Apache Arrow.

## Installation


### From Cargo (Recommended)


```bash
cargo install sql-stream
```

### From GitHub Releases


Download pre-built binaries for your platform from the [releases page](https://github.com/1cbyc/sql-stream/releases):

**Linux (x86_64)**
```bash
wget https://github.com/1cbyc/sql-stream/releases/latest/download/sql-stream-linux-x86_64
chmod +x sql-stream-linux-x86_64
sudo mv sql-stream-linux-x86_64 /usr/local/bin/sql-stream
```

**macOS (Intel)**
```bash
curl -L https://github.com/1cbyc/sql-stream/releases/latest/download/sql-stream-macos-x86_64 -o sql-stream
chmod +x sql-stream
sudo mv sql-stream /usr/local/bin/
```

**macOS (Apple Silicon)**
```bash
curl -L https://github.com/1cbyc/sql-stream/releases/latest/download/sql-stream-macos-aarch64 -o sql-stream
chmod +x sql-stream
sudo mv sql-stream /usr/local/bin/
```

**Windows**
- Download `sql-stream-windows-x86_64.exe` from the releases page
- Rename to `sql-stream.exe` and add to your PATH

### From Source


```bash
git clone https://github.com/1cbyc/sql-stream.git
cd sql-stream
cargo build --release
```

The binary will be available at `target/release/sql-stream`.

## Usage


### Basic Query on CSV


```bash
sql-stream -f data.csv -q "SELECT * FROM data LIMIT 10"
```

### Custom Table Name


```bash
sql-stream -f employees.csv -t employees -q "SELECT name, salary FROM employees WHERE age > 30"
```

### JSON Files


```bash
sql-stream -f data.json -q "SELECT COUNT(*) as total FROM data"
```

### Aggregations and Group By


```bash
sql-stream -f sales.csv -q "SELECT region, SUM(revenue) as total_revenue FROM data GROUP BY region ORDER BY total_revenue DESC"
```

### Enable Verbose Logging


```bash
sql-stream -f data.csv -q "SELECT * FROM data" --verbose
```

## Command Line Options


```
sql-stream -f <FILE> -q <SQL> [OPTIONS]

Options:
  -f, --file <FILE>           Path to CSV or JSON file (required)
  -q, --query <SQL>           SQL query to execute (required)
  -t, --table-name <NAME>     Table name for SQL queries (default: "data")
  -v, --verbose               Enable verbose debug logging
  -h, --help                  Print help information
  -V, --version               Print version information
```

## Examples


### Data Analysis


```bash
# Find top 5 highest salaries

sql-stream -f employees.csv -q "SELECT name, salary FROM data ORDER BY salary DESC LIMIT 5"

# Calculate average age by city

sql-stream -f employees.csv -q "SELECT city, AVG(age) as avg_age FROM data GROUP BY city"

# Filter and count

sql-stream -f sales.json -q "SELECT product, COUNT(*) as count FROM data WHERE revenue > 1000 GROUP BY product"
```

### Complex Queries


```bash
# Multiple aggregations

sql-stream -f data.csv -q "
  SELECT 
    category,
    COUNT(*) as count,
    AVG(price) as avg_price,
    MAX(price) as max_price,
    MIN(price) as min_price
  FROM data
  GROUP BY category
  HAVING count > 10
  ORDER BY avg_price DESC
"
```

## Architecture


SQL Stream is built with a modular architecture:

- **Engine Module** (`src/engine.rs`): DataFusion SessionContext management and query execution
- **CLI Module** (`src/cli.rs`): Argument parsing with clap
- **Error Module** (`src/error.rs`): Type-safe error handling with thiserror
- **Main Binary** (`src/main.rs`): Async runtime and orchestration

### Technology Stack


- **Apache DataFusion** (45.x): SQL query engine
- **Apache Arrow** (53.x): In-memory columnar format
- **Tokio**: Async runtime
- **Clap** (v4): CLI parsing
- **Tracing**: Structured logging

## Performance Considerations


- **Zero-Copy Streaming**: DataFusion processes data using Arrow's columnar format without unnecessary copying
- **Lazy Evaluation**: Queries are optimized before execution
- **Parallel Processing**: Multi-threaded execution for CPU-intensive operations
- **Memory Efficiency**: Streaming results prevent loading entire datasets into memory

For performance profiling, see [docs/PROFILING.md](docs/PROFILING.md).

## Development


### Prerequisites


- Rust 1.70 or later
- Cargo

### Building


```bash
# Debug build

cargo build

# Release build

cargo build --release
```

### Testing


```bash
# Run all tests

cargo test

# Run with output

cargo test -- --nocapture

# Run specific test

cargo test test_csv_simple_select
```

### Code Quality


```bash
# Format code

cargo fmt

# Run linter

cargo clippy -- -D warnings

# Generate documentation

cargo doc --open
```

## CI/CD


The project includes GitHub Actions workflows:

- **CI Workflow**: Runs on every push and pull request
  - Format checking (`cargo fmt`)
  - Linting (`cargo clippy`)
  - Cross-platform tests (Linux, macOS, Windows)
  
- **Release Workflow**: Automated releases on version tags
  - Builds binaries for Linux (x86_64, musl), Windows (x86_64), macOS (Intel, Apple Silicon)
  - Uploads release artifacts to GitHub
  - Publishes to crates.io

## Using as a Library


SQL Stream can be used as a Rust library in your projects:

```toml
[dependencies]
sql-stream = "0.1"
```

```rust
use sql_stream::{QueryEngine, SqlStreamError};

#[tokio::main]

async fn main() -> Result<(), SqlStreamError> {
    let mut engine = QueryEngine::new()?;
    
    // Register a CSV file as a table
    engine.register_file("data.csv", "my_table").await?;
    
    // Execute a SQL query
    let results = engine.execute_query("SELECT * FROM my_table WHERE age > 30").await?;
    
    // Print results
    engine.print_results(results).await?;
    
    Ok(())
}
```

See the [API documentation](https://docs.rs/sql-stream) for more details.

## Contributing


Contributions are welcome! Please ensure:

1. All tests pass (`cargo test`)
2. Code is formatted (`cargo fmt`)
3. No clippy warnings (`cargo clippy`)
4. Documentation is updated

## License


This project is dual-licensed under MIT OR Apache-2.0.

## Acknowledgments


Built with:
- [Apache DataFusion]https://github.com/apache/arrow-datafusion
- [Apache Arrow]https://github.com/apache/arrow-rs
- [Tokio]https://tokio.rs/