dft - Batteries included DataFusion
🚧 DOCS UNDER CONSTRUCTION
Documentation is undergoing a significant revamp. The new documentation will be finalized as part of the v0.3 release in late spring or early summer 2025.
Overview
dft is a batteries-included suite of DataFusion applications that provides:
- Data Source Integration: Query files from S3, local filesystems, or HuggingFace datasets
- Table Format Support: Native support for Delta Lake
- Extensibility: UDFs defined in WASM (and soon Python)
- Helper Functions: Built-in functions for JSON and Parquet data processing
The project offers four complementary interfaces:
- Text User Interface (TUI): An interactive SQL IDE with real-time query analysis, benchmarking, and catalog exploration (requires `tui` feature)
- Command Line Interface (CLI): A scriptable engine for executing queries from files or command line
- FlightSQL Server: A standards-compliant SQL interface for programmatic access (requires `flightsql` feature)
- HTTP Server: A REST API for SQL queries and catalog exploration (requires `http` feature)
All interfaces share the same execution engine, allowing you to develop locally with the TUI and then seamlessly deploy with the server implementations.
dft builds upon datafusion-cli with enhanced interactivity, additional integrations, and ready-to-use server implementations.
User Guide
Installation
From crates.io (Recommended)
# Core CLI and server interfaces
# With TUI interface
# For full functionality with all features (including TUI)
If you don't have Rust installed, follow the installation instructions.
Feature Flags
Common feature combinations:
# Core with S3 support
# With TUI interface
# TUI with S3 and data lake formats
# Data lake formats
# With JSON and Parquet functions
See the Features documentation for all available features.
Note: The TUI (Text User Interface) is optional and requires the tui feature flag. The CLI, FlightSQL server, and HTTP server are always available.
Running the apps
# Interactive TUI (requires `tui` feature)
# CLI with direct query execution
# CLI with file-based query
# Benchmark a query (with stats)
# Concurrent benchmark (measures throughput under load)
# Save benchmark results to CSV
# Start FlightSQL Server (requires `flightsql` feature)
# Start HTTP Server (requires `http` feature)
# Generate TPC-H data in the configured DB path
Benchmarking
dft includes built-in benchmarking to measure query performance with detailed timing breakdowns:
# Serial benchmark (default) - measures query performance in isolation
# Concurrent benchmark - measures throughput under load
# Custom iteration count
# Save results to CSV for analysis
# Compare serial vs concurrent performance
Benchmark Output:
- Timing breakdown by phase: logical planning, physical planning, execution
- Statistics: min, max, mean, median for each phase
- Row counts validation across all runs
- CSV export with `concurrency_mode` column for result comparison
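To make the summary above concrete, here is a minimal Python sketch of how per-phase statistics and the row-count check could be computed from a set of runs. It is not dft's implementation: the `summarize` function, the dictionary keys, and the sample timings are all hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Phase names mirror the breakdown listed above; the key spellings are hypothetical.
PHASES = ("logical_planning", "physical_planning", "execution")


def summarize(runs):
    """Return min/max/mean/median per phase for a list of benchmark runs.

    Each run is a dict holding a duration in milliseconds for every phase
    plus the number of rows the query returned.
    """
    # Every run should return the same number of rows; otherwise the
    # timings are not comparable.
    row_counts = {run["rows"] for run in runs}
    if len(row_counts) != 1:
        raise ValueError(f"row counts differ across runs: {sorted(row_counts)}")

    summary = {}
    for phase in PHASES:
        samples = [run[phase] for run in runs]
        summary[phase] = {
            "min": min(samples),
            "max": max(samples),
            "mean": mean(samples),
            "median": median(samples),
        }
    return summary


# Made-up sample timings (milliseconds), purely illustrative.
runs = [
    {"logical_planning": 1.0, "physical_planning": 2.0, "execution": 10.0, "rows": 100},
    {"logical_planning": 3.0, "physical_planning": 2.0, "execution": 30.0, "rows": 100},
    {"logical_planning": 2.0, "physical_planning": 5.0, "execution": 20.0, "rows": 100},
]
summary = summarize(runs)
```

Reporting the median alongside the mean is useful because a single slow run (for example, a cold cache on the first iteration) shifts the mean much more than the median.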
Serial vs Concurrent:
- Serial: Pure query execution time without contention (baseline performance)
- Concurrent: Throughput measurement with parallel execution (reveals bottlenecks and contention)
- Concurrent mode uses adaptive concurrency: min(iterations, CPU cores)
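The difference between the two modes, and the `min(iterations, CPU cores)` rule, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than dft's code: a thread pool stands in for the real executor, `query` is any zero-argument callable, and the function names are hypothetical.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def adaptive_concurrency(iterations, cpu_cores=None):
    """Worker count following the min(iterations, CPU cores) rule."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    # Clamp to at least one worker so a pool can always be created.
    return max(1, min(iterations, cores))


def _timed(query):
    """Run the callable once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    query()
    return time.perf_counter() - start


def run_serial(query, iterations):
    """Run the query one iteration at a time (no contention)."""
    return [_timed(query) for _ in range(iterations)]


def run_concurrent(query, iterations):
    """Run the iterations on a pool sized by adaptive_concurrency."""
    workers = adaptive_concurrency(iterations)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: _timed(query), range(iterations)))
```

Capping the worker count at the number of iterations avoids idle workers on short benchmarks, while capping it at the core count avoids oversubscribing the machine on long ones.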
Setting Up Tables with DDL
dft can automatically load table definitions at startup, giving you a persistent "database-like" experience.
Using DDL Files
- Create a DDL file (default: ~/.config/dft/ddl.sql)
- Add your table and view definitions:
-- S3 data source (requires s3 feature)
CREATE EXTERNAL TABLE users
STORED AS NDJSON
LOCATION 's3://bucket/users';
-- Parquet files
CREATE EXTERNAL TABLE transactions
STORED AS PARQUET
LOCATION 's3://bucket/transactions';
-- Local files
CREATE EXTERNAL TABLE listings
STORED AS PARQUET
LOCATION 'file://folder/listings';
-- Create views from tables (the view name users_listings is illustrative)
CREATE VIEW users_listings AS
SELECT * FROM users
LEFT JOIN listings USING (user_id);
-- Delta Lake table (requires deltalake feature)
CREATE EXTERNAL TABLE delta_table
STORED AS DELTATABLE
LOCATION 's3://bucket/delta_table';
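Conceptually, loading a DDL file amounts to running each statement in order before the first query. The Python sketch below shows one way to split such a file into statements. It is not dft's loader: `split_ddl` is a hypothetical helper, and it assumes every statement ends with a semicolon at the end of a line and that comments occupy whole lines starting with `--`.

```python
def split_ddl(text):
    """Split a DDL script into individual statements.

    Assumes each statement ends with ';' at the end of a line and that
    comments are whole lines beginning with '--'.
    """
    statements = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and full-line SQL comments.
        if not stripped or stripped.startswith("--"):
            continue
        current.append(stripped)
        if stripped.endswith(";"):
            statements.append(" ".join(current))
            current = []
    # Keep a trailing statement that lacks a terminating semicolon.
    if current:
        statements.append(" ".join(current))
    return statements


ddl = """
-- Parquet files
CREATE EXTERNAL TABLE transactions
STORED AS PARQUET
LOCATION 's3://bucket/transactions';

-- Local files
CREATE EXTERNAL TABLE listings
STORED AS PARQUET
LOCATION 'file://folder/listings';
"""
statements = split_ddl(ddl)
```

Order matters when definitions depend on each other: a view must come after the tables it selects from, so the statements are kept in file order.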
Loading DDL
- TUI (requires `tui` feature): DDL is automatically loaded at startup
- CLI: Add `--run-ddl` flag to execute DDL before your query
- Custom Path: Configure a custom DDL path in your config file
[] = "/path/to/my/ddl.sql"
Quick Reference
| Feature | Documentation |
|---|---|
| Core Features | Features Guide |
| Database | Database Guide |
| TUI Interface | TUI Guide |
| CLI Usage | CLI Guide |
| FlightSQL Server | FlightSQL Guide |
| HTTP Server | HTTP Guide |
| Configuration Options | Config Reference |