datafusion-server crate
A multi-session query server for a variety of data sources, implemented in Rust.
- Asynchronous architecture built on the Tokio ecosystem
- Apache Arrow with Apache DataFusion
- Supports SQL queries across multiple data sources
- Python plugin feature for data source connectors and post-processors
- Horizontal scaling architecture between servers using the Arrow Flight gRPC feature
Please see the Documentation for an introductory tutorial and a full usage guide. Additionally, the REST API documentation is available according to the OpenAPI specification. Also, refer to the CHANGELOG for the latest information.
System Overview
License
Licensed under the MIT License.
Copyright © 2022 - 2025 SAL Ltd. - https://sal.co.jp
Supported environments
- Linux
- BSD-based Unix, including macOS / Mac OS X
- System V (SVR) based Unix
- Windows, including WSL2 / Cygwin
and other LLVM-supported environments.
Using the pre-built Docker image (currently available for the amd64 architecture only)
Prerequisites
- Docker CE / EE v20+
Pull container image from GitHub container registry
or pull the version built without the Python plugin.
Executing container
If you only use the sample data bundled in the container, omit the -v ./data:/var/xapi-server/data option.
Building the container yourself
Prerequisites
- Docker CE / EE v20+
Build two containers, datafusion-server and datafusion-server-without-plugin
Executing container
If you only use the sample data bundled in the container, omit the -v ./bin/data:/var/xapi-server/data option.
Build from source code for use in your project
Prerequisites
- Rust Toolchain 1.81+ (Edition 2021) from https://www.rust-lang.org
- or the Rust official container from https://hub.docker.com/_/rust
How to run
Example of Cargo.toml
[package]
name = "server-executor"
version = "0.1.0"
edition = "2021"

[dependencies]
datafusion-server = "0.20.9"
clap = { version = "4.5", features = ["derive"] }
Example of src/main.rs
use std::path::PathBuf;
use clap::Parser;
use datafusion_server::settings::Settings;
For details, see main.rs and Config.toml.
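As a rough illustration of what the entry point has to do before it hands control to the server, the sketch below resolves the configuration file path from the command line using only the standard library. It is not the crate's own main.rs: the function name `config_path_from_args`, the flags `-f` / `--config`, and the `./config.toml` fallback are assumptions made for this example, and the real example relies on clap's `Parser` derive instead.

```rust
use std::path::PathBuf;

/// Resolves the configuration file path from command-line arguments.
///
/// Accepts `-f FILE`, `--config FILE`, or `--config=FILE`.
/// The flag names and the `./config.toml` fallback are assumptions for
/// illustration only; they are not taken from datafusion-server itself.
pub fn config_path_from_args<I>(args: I) -> PathBuf
where
    I: IntoIterator<Item = String>,
{
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        if arg == "-f" || arg == "--config" {
            // The value is expected in the next argument.
            if let Some(value) = iter.next() {
                return PathBuf::from(value);
            }
        } else if let Some(value) = arg.strip_prefix("--config=") {
            // The value is attached to the flag itself.
            return PathBuf::from(value);
        }
    }
    // No usable flag was found: fall back to the assumed default.
    PathBuf::from("./config.toml")
}
```

In a binary this would typically be called as `config_path_from_args(std::env::args().skip(1))`, and the resulting path would then be passed to whatever loads the server settings.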
Example of config.toml
# Configuration file of datafusion-server
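[server]
port = 4000
flight_grpc_port = 50051
base_url = "/"
data_dir = "./data"
plugin_dir = "./plugins"

[session]
default_keep_alive = 3600 # in seconds
upload_limit_size = 20 # MB

[log]
# trace, debug, info, warn, error
level = "debug"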
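To make the units in the sample configuration concrete, here is a small standard-library sketch that holds the sample values and converts them. The struct `SampleConfig` and all of its field names are illustrative assumptions and may not match the server's actual `Settings` type. Reading 4000 as the HTTP port and 50051 as the Arrow Flight gRPC port is likewise an assumption, as is treating "MB" as 1024 * 1024 bytes.

```rust
use std::time::Duration;

/// Log levels listed in the sample configuration.
const LOG_LEVELS: [&str; 5] = ["trace", "debug", "info", "warn", "error"];

/// Hypothetical in-memory view of the sample configuration values.
/// Field names are assumptions for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct SampleConfig {
    pub port: u16,
    pub flight_grpc_port: u16,
    pub base_url: String,
    pub data_dir: String,
    pub plugin_dir: String,
    pub keep_alive_secs: u64,
    pub upload_limit_mb: u64,
    pub log_level: String,
}

impl Default for SampleConfig {
    /// Mirrors the values shown in the sample configuration file.
    fn default() -> Self {
        Self {
            port: 4000,
            flight_grpc_port: 50051,
            base_url: "/".to_string(),
            data_dir: "./data".to_string(),
            plugin_dir: "./plugins".to_string(),
            keep_alive_secs: 3600,
            upload_limit_mb: 20,
            log_level: "debug".to_string(),
        }
    }
}

impl SampleConfig {
    /// Session keep-alive as a `Duration` (the sample value is in seconds).
    pub fn keep_alive(&self) -> Duration {
        Duration::from_secs(self.keep_alive_secs)
    }

    /// Upload limit in bytes, assuming 1 MB = 1024 * 1024 bytes.
    pub fn upload_limit_bytes(&self) -> u64 {
        self.upload_limit_mb * 1024 * 1024
    }

    /// Basic sanity checks: distinct ports and a known log level.
    pub fn validate(&self) -> Result<(), String> {
        if self.port == self.flight_grpc_port {
            return Err("both listeners are configured on the same port".to_string());
        }
        if !LOG_LEVELS.contains(&self.log_level.as_str()) {
            return Err(format!("unknown log level: {}", self.log_level));
        }
        Ok(())
    }
}
```

With the sample values, `SampleConfig::default().validate()` succeeds, the keep-alive is one hour, and the upload limit is 20 * 1024 * 1024 bytes under the stated assumption.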
Debug build and run
datafusion-server with Python plugins feature
Requires Python interpreter v3.7+
How to run
Example of Cargo.toml
[dependencies]
datafusion-server = { version = "0.20.9", features = ["plugin"] }
Debug build and run
Release build with full optimization
Example of Cargo.toml
[profile.release]
opt-level = 'z'
strip = true
lto = "fat"
codegen-units = 1

[dependencies]
datafusion-server = { version = "0.20.9", features = ["plugin"] }
Build for release
Clean workspace
Usage
Multiple data sources with SQL query
- Many kinds of data source formats can be used (Parquet, JSON, ndJSON, CSV, ...).
- Data can be retrieved from the local file system and from external REST services.
- Processing by JSONPath can be performed if necessary.
- Query execution across multiple data sources.
- The SQL query engine is Arrow DataFusion.
- See https://arrow.apache.org/datafusion/user-guide/sql/index.html for more information.
- Responses can be returned in Arrow, JSON, and CSV formats.
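Because several source formats are supported, a client often needs to decide which format to declare for a given location. The sketch below is one possible client-side helper that guesses the format from a file extension. The enum `DataFormat`, the function `guess_format`, and the extension mapping (including `jsonl` as an alias for ndJSON) are assumptions made for this example. The server's own format selection may work differently.

```rust
use std::path::Path;

/// Source formats named in the list above (hypothetical client-side type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataFormat {
    Parquet,
    Json,
    NdJson,
    Csv,
}

/// Guesses a data source format from the extension of `location`.
/// Returns `None` when there is no extension or it is not recognized.
/// The extension mapping is an assumption for illustration only.
pub fn guess_format(location: &str) -> Option<DataFormat> {
    // Compare extensions case-insensitively, so `.CSV` and `.csv` match.
    let ext = Path::new(location)
        .extension()?
        .to_str()?
        .to_ascii_lowercase();
    match ext.as_str() {
        "parquet" => Some(DataFormat::Parquet),
        "json" => Some(DataFormat::Json),
        "ndjson" | "jsonl" => Some(DataFormat::NdJson),
        "csv" => Some(DataFormat::Csv),
        _ => None,
    }
}
```

For a location without a recognizable extension, such as a REST endpoint, the helper returns `None` and the caller has to state the format explicitly.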
Example (local file)
Example (remote REST API)
Example (Python datasource connector plugin)