Pirkle — Query CSV and SQLite with PRQL
Pirkle is a fast, lightweight command-line tool that brings the power of PRQL (Pipelined Relational Query Language) to CSV and SQLite files. Transform, filter, and join your data with expressive, readable queries that compile to optimized SQL.
Why Pirkle?
- 🚀 Fast: Built in Rust with optimized SQLite backend
- 📊 Flexible: Query CSV files as if they were database tables
- 🔗 Powerful: Join multiple files and data sources
- 📝 Readable: PRQL's pipeline syntax is intuitive and maintainable
- 🎯 Versatile: Multiple output formats (table, CSV, JSON, logfmt)
Table of Contents
- Features
- Installation
- Quick Start
- Usage
- Common Use Cases
- Performance Tips
- Troubleshooting
- Requirements
- Example Data
- License
Features
- Query CSV files as structured tables
- Join CSV and SQLite files together
- Write expressive queries using PRQL
- Output as a pretty table, CSV, JSON, or logfmt
- Inspect the generated SQL
- View schema information for files
- Lightweight, fast, and written in Rust
Installation
Prebuilt Binaries
Download the latest release for your platform:
| Platform | Download |
|---|---|
| Windows | pirkle-x86_64-pc-windows-msvc.zip |
| macOS (Apple Silicon) | pirkle-aarch64-apple-darwin.tar.gz |
| macOS (Intel) | pirkle-x86_64-apple-darwin.tar.gz |
| Linux (x86_64) | pirkle-x86_64-unknown-linux-musl.tar.gz |
Package Managers
Cargo works today; Homebrew support is coming soon:
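A sketch of the install commands (assuming the crate is published to crates.io under the name `pirkle`):

```bash
# Cargo (Rust)
cargo install pirkle

# Homebrew (coming soon)
# brew install pirkle
```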
From Source
Install using Rust:
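A sketch of a source build (the repository URL is a placeholder; substitute the real one):

```bash
git clone https://example.com/pirkle.git
cd pirkle
cargo install --path .
```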
Quick Start
Install Pirkle, run a first query against a CSV file, and inspect the file's structure:
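A sketch of that session (flags mirror those documented under Usage; the CSV is assumed to register as a table named after its file):

```bash
# Install pirkle
cargo install pirkle

# Query a CSV file
pirkle examples/employees.csv --query "from employees | take 5"

# View file structure
pirkle examples/employees.csv --schema
```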
Usage
Basic Queries
CSV files are auto-loaded as SQLite tables, so CSV and SQLite files are queried the same way. Pass the query with --query, or after a -- delimiter:
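A sketch of each form (the SQLite file `data.db` and its `users` table are hypothetical):

```bash
# Query a CSV file. CSV files are auto-loaded as SQLite tables.
pirkle examples/employees.csv --query "from employees | filter country == 'USA' | take 10"

# Query a SQLite file
pirkle data.db --query "from users | take 10"

# Alternative syntax using the -- delimiter
pirkle examples/employees.csv -- "from employees | take 10"
```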
Reading from Standard Input
Pirkle supports reading CSV data from standard input, making it easy to pipe data from other commands:
You can pipe data straight into Pirkle, mix stdin with file arguments, or give the piped data a custom table name:
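A sketch of each pattern (the join columns are illustrative):

```bash
# Pipe data into pirkle
cat examples/employees.csv | pirkle --query "from stdin | take 5"

# Use stdin with files
cat examples/orders.csv | pirkle examples/customers.csv --query "
from stdin
join customers (stdin.customer_id == customers.id)
take 5
"

# Custom table name for stdin data
cat examples/employees.csv | pirkle stdin:people --query "from people | take 5"
```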
Pipeline Integration
Pirkle integrates seamlessly with Unix pipelines:
Data can arrive from curl/API responses, from another command's output, or from a longer multi-stage pipeline:
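A sketch of each pattern (URLs, file names, and columns are illustrative):

```bash
# From curl/API responses
curl -s https://example.com/data.csv | pirkle --query "from stdin | take 10"

# From other command output
cat examples/employees.csv | pirkle --query "from stdin | filter age > 30"

# Complex pipeline
grep -v '^#' raw.csv | pirkle --query "from stdin | select {name, total}" --format csv | sort
```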
Key features:
- Auto-detection: Data on stdin is loaded as a table named `stdin`
- Explicit reference: Use the filename `stdin` to read from stdin
- Custom naming: Use `stdin:tablename` for custom table names
- Query from stdin: If no query is provided with `--query` or `--`, Pirkle will read the query from stdin
- Multiple references: Use the same stdin data with different table names
Pirkle intelligently determines how to use stdin based on your command arguments, making it a flexible tool for data pipelines.
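For instance, reading the query itself from stdin, or referring to the same piped data under two names (a sketch; the exact multiple-reference spelling is an assumption based on the `stdin:tablename` form):

```bash
# With no --query or -- argument, the query is read from stdin
echo "from employees | take 5" | pirkle examples/employees.csv

# Same stdin data under two table names (self-join)
cat examples/employees.csv | pirkle stdin:a stdin:b --query "
from a
join b (a.department == b.department)
take 5
"
```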
Viewing Schema Information
To see the structure of your tables, pass the --schema flag along with one or more files:
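A sketch of the invocation and the kind of output to expect (columns come from the example files; exact formatting and inferred types may differ):

```bash
pirkle examples/employees.csv examples/departments.csv --schema
```

```
employees (
  id,
  name,
  department,
  age,
  salary,
  country
)

departments (
  id,
  name
)
```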
Show SQL without executing
You can use the --show-sql flag to see the SQL that would be generated without executing the query:
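For example (a sketch; the exact SQL depends on the PRQL compiler version):

```bash
pirkle examples/employees.csv --show-sql --query "from employees | filter age > 30 | select {name, salary}"
```

which prints something along the lines of:

```sql
SELECT name, salary FROM employees WHERE age > 30;
```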
This also works with PRQL files:
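A sketch with a stored query (the file name is hypothetical; sample queries live under examples/queries/):

```bash
pirkle examples/employees.csv --show-sql --query examples/queries/top_earners.prql
```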
Output Formats
| Format | Use Case | Example |
|---|---|---|
| `table` | Human-readable terminal output | Data exploration |
| `csv` | Spreadsheet import, further processing | `pirkle data.csv --format csv > result.csv` |
| `jsonl` | API integration, log analysis | `pirkle logs.csv --format jsonl \| jq '.'` |
| `logfmt` | Structured logging, monitoring | Integration with log aggregators |
The default is the human-readable table format.
To output CSV:
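A sketch (the rows reuse the sample data from the logfmt example below):

```bash
pirkle examples/employees.csv --query "from employees | filter salary >= 85000" --format csv
```

```
id,name,department,age,salary,country
1,John Smith,Engineering,32,85000,USA
3,Robert Johnson,Engineering,41,92000,USA
9,James Brown,Sales,39,85000,USA
```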
Other supported formats:
JSON Lines (`jsonl`) emits one JSON object per row:
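A sketch of the stream, again using the logfmt sample rows (field order and number formatting may differ):

```json
{"id": 1, "name": "John Smith", "department": "Engineering", "age": 32, "salary": 85000, "country": "USA"}
{"id": 3, "name": "Robert Johnson", "department": "Engineering", "age": 41, "salary": 92000, "country": "USA"}
{"id": 9, "name": "James Brown", "department": "Sales", "age": 39, "salary": 85000, "country": "USA"}
```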
logfmt emits one key="value" record per row:

```
id="1" name="John Smith" department="Engineering" age="32" salary="85000" country="USA"
id="3" name="Robert Johnson" department="Engineering" age="41" salary="92000" country="USA"
id="9" name="James Brown" department="Sales" age="39" salary="85000" country="USA"
```
Using PRQL files
You can use prewritten PRQL query files:
Pass the path of a .prql file directly to --query:
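A sketch (the query file name is hypothetical):

```bash
pirkle examples/employees.csv --query examples/queries/top_earners.prql
```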
Joining tables
To join tables, use the join operation:
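A sketch joining the example employee and department files (PRQL syntax follows recent releases; the join condition assumes employees.department stores the department name):

```bash
pirkle examples/employees.csv examples/departments.csv --query "
from employees
join departments (employees.department == departments.name)
select {employees.name, employees.salary, departments.id}
"
```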
Common Use Cases
Data Analysis
Find average salary and employee count by department:
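A sketch; aggregate spellings such as `count this` follow recent PRQL releases:

```bash
pirkle examples/employees.csv --query "
from employees
group {department} (
  aggregate {
    avg_salary = average salary,
    employees = count this,
  }
)
sort {-avg_salary}
"
```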
Data Cleaning
Remove duplicates and filter valid records:
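A sketch; `group {id} (take 1)` is the usual PRQL idiom for deduplication, and the validity check is illustrative:

```bash
pirkle examples/employees.csv --query "
from employees
filter name != null && salary > 0
group {id} (take 1)
"
```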
Joining Data Sources
Combine sales data with customer information:
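A sketch using the bundled customer and order files (the customer_id, id, and total columns are illustrative):

```bash
pirkle examples/orders.csv examples/customers.csv --query "
from orders
join customers (orders.customer_id == customers.id)
select {customers.name, orders.total}
"
```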
Time Series Analysis
Analyze daily sales trends:
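A sketch (the order_date and total columns are illustrative):

```bash
pirkle examples/orders.csv --query "
from orders
group {order_date} (
  aggregate {daily_total = sum total}
)
sort order_date
"
```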
Data Exploration
Quick summary statistics:
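A sketch of a whole-table aggregate over the example employee data:

```bash
pirkle examples/employees.csv --query "
from employees
aggregate {
  rows = count this,
  min_age = min age,
  max_age = max age,
  avg_salary = average salary,
}
"
```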
Performance Tips
- Schema inference: Pirkle automatically detects column types for optimal performance
- Memory usage: Large CSV files are streamed efficiently through SQLite
- Query optimization: PRQL compiles to optimized SQL; complex queries often perform better than you'd expect
- File formats: SQLite files are queried directly without loading into memory
- Early filtering: For large datasets, filter early in your pipeline to reduce processing overhead (see the sketch below)
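For example, putting the filter at the top of the pipeline lets it compile into the SQL WHERE clause before any sorting or shaping happens (a sketch):

```bash
pirkle examples/employees.csv --query "
from employees
filter country == 'USA'   # reduce rows first
sort {-salary}            # then sort and shape the smaller result
select {name, salary}
"
```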
Troubleshooting
Common Issues
- File not found errors: ensure file paths are correct.
- Query syntax errors: use `--show-sql` to inspect the SQL generated from your PRQL.
- Large file performance: for very large files, filter early in the pipeline.
- Memory issues with large datasets: process data in chunks or use more specific filters.
Example Data
Included example files:
- `examples/employees.csv`: Employee data with department, salary, and country information
- `examples/departments.csv`: Department names and IDs
- `examples/customers.csv`, `examples/orders.csv`: Customer-order relationship data
- `examples/queries/*.prql`: Sample PRQL queries
License
MIT