duckquill
A Rust text2sql engine for agents, packaged as duckquill.
The primary user is an agent running in Codex CLI or Claude Code that opens this repo, reads AGENTS.md plus .codex/skills/object-storage-ops/SKILL.md, and drives one binary through a schema-first workflow.
Core idea
raw file -> parquet -> schema -> query -> benchmark
Why this shape:
- Parquet lets DuckDB prune columns and push predicates
- schema grounds SQL before execution
- parquet_selective stays the default path
- full_download exists as a measurable fallback, not the main happy path
Architecture and product intent:
Who this is for
Codex CLI / Claude Code
Use this repo when the agent needs:
- one Cargo package
- one binary entrypoint
- local CLI loops faster than standing up a bigger service
- repo-local guidance for parquet/object-storage workflows
Preferred agent loop:
- cargo run -- convert
- cargo run -- schema
- cargo run -- query
- cargo run -- serve only when the HTTP contract itself needs verification
- cargo run -- benchmark before changing guidance around full_download
Example prompt:
Use the object-storage-ops skill. Convert ./testdata/owid-covid-latest.csv to parquet, inspect schema, then run a parquet_selective query.
Humans
Humans can use the binary directly, but the docs are optimized for agent operators first.
What the binary exposes
CLI:
- serve
- convert
- schema
- query
- benchmark
HTTP:
- GET /health
- POST /schema
- POST /query
- POST /convert
- POST /benchmark
Query modes:
- hybrid: for local parquet files at or below 10 MiB, run the materialize-first full_download path; otherwise resolve to parquet_selective
- parquet_selective
- full_download
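The hybrid rule above can be written down as a small decision function. This is a minimal sketch, not the binary's actual code: the QueryMode enum, the resolve_mode helper, and the HYBRID_MAX_LOCAL_BYTES constant are hypothetical names chosen for illustration. Only the 10 MiB threshold and the three mode names come from this README.

```rust
/// The three query modes named in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryMode {
    Hybrid,
    ParquetSelective,
    FullDownload,
}

/// 10 MiB threshold stated in the README (the constant name is an assumption).
pub const HYBRID_MAX_LOCAL_BYTES: u64 = 10 * 1024 * 1024;

/// Hypothetical helper: resolve the requested mode into the mode that runs.
/// `is_local` is false for remote/object-storage datasets.
pub fn resolve_mode(requested: QueryMode, is_local: bool, size_bytes: u64) -> QueryMode {
    match requested {
        // Small local parquet file: materialize first.
        QueryMode::Hybrid if is_local && size_bytes <= HYBRID_MAX_LOCAL_BYTES => {
            QueryMode::FullDownload
        }
        // Larger local files and all remote datasets stay parquet-first.
        QueryMode::Hybrid => QueryMode::ParquetSelective,
        // Explicit modes are passed through unchanged.
        explicit => explicit,
    }
}
```

Under these assumptions, a local file of exactly 10 MiB resolves to full_download, while one byte more resolves to parquet_selective.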
Install
If the crate is published on crates.io:
If you want to install the current GitHub version before a crates.io release:
For local development from this checkout:
Quickstart
1) Convert a real fixture to parquet
If a CSV is not UTF-8, pass an explicit encoding label:
For Korean public-data workflows, the same flag is where you would pass labels such as cp949 or euc-kr.
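Before converting, an agent may want to check whether a CSV needs an explicit encoding label at all. The sketch below uses only the Rust standard library; needs_encoding_label and file_needs_encoding_label are hypothetical helper names and are not part of the binary. Note the asymmetry: bytes that fail UTF-8 validation are definitely not UTF-8, but bytes that pass could still be another encoding that happens to validate.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical pre-check: true when the bytes are not valid UTF-8,
/// which means an explicit encoding label (e.g. cp949) is needed.
pub fn needs_encoding_label(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_err()
}

/// Same check for a file on disk. Reads the whole file into memory,
/// which is acceptable for small fixtures but not for very large CSVs.
pub fn file_needs_encoding_label(path: &Path) -> io::Result<bool> {
    let bytes = fs::read(path)?;
    Ok(needs_encoding_label(&bytes))
}
```

For example, the two bytes 0xC7 0xD1 (a Hangul syllable in CP949) fail UTF-8 validation, so the helper reports that a label is required.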
2) Inspect schema first
3) Query from the CLI
4) Benchmark the current approach
See BENCHMARK.md for the measured local results and the Hugging Face comparison case.
HTTP examples
Start the server
Inspect schema via HTTP
/schema now returns typed columns plus a small preview_rows sample to help agents ground SQL generation with real row context.
Query via HTTP
Use hybrid when you want the binary to choose the small-file fast path automatically:
- local parquet files <= 10 MiB resolve to full_download
- larger local parquet files and remote/object-storage datasets resolve to parquet_selective
Object storage
The binary supports S3 / MinIO directly.
Convert to S3:
TEXT2SQL_S3_REGION=ap-northeast-2 \
TEXT2SQL_S3_ENDPOINT=http://127.0.0.1:9000 \
TEXT2SQL_S3_ACCESS_KEY_ID=minioadmin \
TEXT2SQL_S3_SECRET_ACCESS_KEY=minioadmin \
TEXT2SQL_S3_ALLOW_HTTP=true \
For agent-facing object-storage workflow guidance, use:
.codex/skills/object-storage-ops/SKILL.md
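To make the TEXT2SQL_S3_* variables above concrete, here is a minimal sketch of collecting them into one settings value. The S3Settings struct and both functions are hypothetical and only illustrate the shape of the configuration; the variable names come from this README. How the binary actually parses TEXT2SQL_S3_ALLOW_HTTP is not documented here, so treating a case-insensitive "true" as enabled is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical settings struct mirroring the TEXT2SQL_S3_* variables.
#[derive(Debug, PartialEq)]
pub struct S3Settings {
    pub region: Option<String>,
    pub endpoint: Option<String>,
    pub access_key_id: Option<String>,
    pub secret_access_key: Option<String>,
    pub allow_http: bool,
}

/// Build settings from any key/value map, which keeps the logic testable
/// without touching the real process environment.
pub fn s3_settings_from(vars: &HashMap<String, String>) -> S3Settings {
    let get = |key: &str| vars.get(key).cloned();
    S3Settings {
        region: get("TEXT2SQL_S3_REGION"),
        endpoint: get("TEXT2SQL_S3_ENDPOINT"),
        access_key_id: get("TEXT2SQL_S3_ACCESS_KEY_ID"),
        secret_access_key: get("TEXT2SQL_S3_SECRET_ACCESS_KEY"),
        // Assumption: only a case-insensitive "true" enables plain HTTP.
        allow_http: vars
            .get("TEXT2SQL_S3_ALLOW_HTTP")
            .map(|v| v.eq_ignore_ascii_case("true"))
            .unwrap_or(false),
    }
}

/// Convenience wrapper that reads the current process environment.
pub fn s3_settings_from_env() -> S3Settings {
    let vars: HashMap<String, String> = std::env::vars().collect();
    s3_settings_from(&vars)
}
```

Keeping the map-based function separate from the environment wrapper makes it easy to check a MinIO configuration in a unit test before running a live convert.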
Current guidance
- keep parquet_selective as the default for larger datasets and object storage
- use hybrid when you want the binary to auto-pick full_download for local parquet files <= 10 MiB and otherwise stay parquet-first
- use full_download for debugging or when you intentionally want full materialization first
- quoted local parquet globs already work for multi-file queries, e.g. --dataset './tmp/shard-*.parquet'
- blank numeric CSV cells already aggregate as NULL after convert; TRY_CAST is not required for that current path
- use --csv-encoding <label> when the CSV is not UTF-8
- benchmark before changing that recommendation
- do not claim selective reads lose matching data unless tests prove it
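The guidance on blank numeric cells relies on standard SQL aggregate behavior: SUM skips NULL inputs and yields NULL only when every input is NULL. The sketch below models that behavior in plain Rust so the expectation is easy to reason about. It is an illustration under stated assumptions, not the convert implementation; parse_numeric_cell and sum_skipping_nulls are hypothetical names.

```rust
use std::num::ParseFloatError;

/// Hypothetical model of a converted numeric cell:
/// a blank cell becomes None (NULL), anything else must parse as f64.
pub fn parse_numeric_cell(cell: &str) -> Result<Option<f64>, ParseFloatError> {
    let trimmed = cell.trim();
    if trimmed.is_empty() {
        Ok(None)
    } else {
        trimmed.parse::<f64>().map(Some)
    }
}

/// Models SQL SUM: NULL cells are skipped, and the result is None (NULL)
/// when no non-NULL value was seen. Non-numeric text is an error here
/// rather than being silently coerced to NULL.
pub fn sum_skipping_nulls(cells: &[&str]) -> Result<Option<f64>, ParseFloatError> {
    let mut total: Option<f64> = None;
    for cell in cells {
        if let Some(value) = parse_numeric_cell(cell)? {
            total = Some(total.unwrap_or(0.0) + value);
        }
    }
    Ok(total)
}
```

In this model a column of "1.5", "", "2.5" sums to 4.0, and a column of only blank cells sums to NULL.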
Real fixtures in this repo
- testdata/owid-covid-latest.csv
- testdata/owid-covid-latest.json
- testdata/canada-wastewater-aggregate.csv
- testdata/keyfoods_0708.xlsx
Acceptance snapshots
Convert

Schema

Query

Benchmark

Verification
Current status
What is proven locally:
- convert works
- schema works
- query works
- benchmark works
- a real Hugging Face parquet comparison case is documented in BENCHMARK.md
What is still environment-blocked on this machine:
- writable remote MinIO/S3 acceptance for live end-to-end object-storage benchmarking