datafusion-server crate
A multi-session query server for a variety of data sources, implemented in Rust.
- Asynchronous architecture built on the Tokio ecosystem
- Apache Arrow with Arrow DataFusion
- Supports SQL queries across multiple data sources
- Python plugin feature for data source connectors and post-processors
- Horizontal scaling architecture between servers using the Arrow Flight gRPC feature
Please see the Documentation for an introductory tutorial and a full usage guide. Additionally, REST API documentation following the OpenAPI specification is available.
License
Licensed under the MIT License
Copyright (c) 2022 - 2024 SAL Ltd. - https://sal.co.jp
Supported environments
- Linux
- BSD-based Unix, including macOS 10.6+
- SVR4-based Unix
- Windows 10+, including WSL 2
and other LLVM-supported environments.
Using the pre-built Docker image (currently available for the amd64 architecture only)
Prerequisites
- Docker CE / EE v20+
Pull the container image from the GitHub Container Registry
or pull the version built without the Python plugin.
Running the container
If you only use the sample data included in the container, omit the -v ./data:/var/xapi-server/data option.
Building the container yourself
Prerequisites
- Docker CE / EE v20+
Build two containers, datafusion-server and datafusion-server-without-plugin
Running the container
If you only use the sample data included in the container, omit the -v ./bin/data:/var/xapi-server/data option.
Build from source code for use in your project
Prerequisites
- Rust Toolchain 1.74+ (Edition 2021) from https://www.rust-lang.org
- or the Rust official container from https://hub.docker.com/_/rust
How to run
Example of Cargo.toml
[package]
name = "server-executor"
version = "0.1.0"
edition = "2021"
[dependencies]
datafusion-server = "0.13.1"
Example of src/main.rs
use std::path::PathBuf;
use clap::Parser;
use datafusion_server::settings::Settings;
For details, see main.rs and Config.toml.
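The imports above suggest that the executable takes the path of a configuration file from the command line. As a rough, dependency-free illustration of that step, the sketch below resolves the path with std::env::args instead of a command-line parsing crate. The function name config_path, the --config flag, and the ./config.toml fallback are assumptions made for this example and are not taken from the crate.

```rust
use std::path::PathBuf;

/// Returns the configuration file path found in `args`.
/// Looks for a hypothetical `--config <FILE>` pair and falls back to
/// `./config.toml` (an assumed default) when the flag is absent or incomplete.
fn config_path<I: IntoIterator<Item = String>>(args: I) -> PathBuf {
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        if arg == "--config" {
            if let Some(path) = args.next() {
                return PathBuf::from(path);
            }
        }
    }
    PathBuf::from("./config.toml")
}

fn main() {
    // Skip the program name, then resolve the configuration path.
    let path = config_path(std::env::args().skip(1));
    println!("using configuration file: {}", path.display());
}
```

The resolved path would then be handed to whatever settings loader the project uses.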
Example of config.toml
# Configuration file of datafusion-server
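[server]
port = 4000
flight_grpc_port = 50051
base_url = "/"
data_dir = "./data"
plugin_dir = "./plugins"
[session]
default_keep_alive = 3600 # in seconds
upload_limit_size = 20 # MB
[log]
# trace, debug, info, warn, error
level = "debug"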
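To make the meaning of the example values easier to see, the following sketch models them as a plain Rust struct with unit conversions and a small sanity check. It is only an illustration: the struct ServerConfig, its field names, the reading of 4000 as the HTTP port and 50051 as the Arrow Flight gRPC port, and the interpretation of MB as 1024 * 1024 bytes are all assumptions, and the crate's own settings type may look quite different.

```rust
use std::time::Duration;

/// Hypothetical typed view of the example configuration values.
#[derive(Debug, Clone, PartialEq)]
struct ServerConfig {
    http_port: u16,       // assumed meaning of 4000
    grpc_port: u16,       // assumed meaning of 50051
    base_url: String,
    data_dir: String,
    plugin_dir: String,
    keep_alive_secs: u64, // "in seconds" in the example
    upload_limit_mb: u64, // "MB" in the example
    log_level: String,
}

impl Default for ServerConfig {
    /// Mirrors the values shown in the example configuration.
    fn default() -> Self {
        ServerConfig {
            http_port: 4000,
            grpc_port: 50051,
            base_url: "/".to_string(),
            data_dir: "./data".to_string(),
            plugin_dir: "./plugins".to_string(),
            keep_alive_secs: 3600,
            upload_limit_mb: 20,
            log_level: "debug".to_string(),
        }
    }
}

impl ServerConfig {
    /// Session keep-alive as a Duration.
    fn keep_alive(&self) -> Duration {
        Duration::from_secs(self.keep_alive_secs)
    }

    /// Upload limit in bytes, assuming 1 MB = 1024 * 1024 bytes.
    fn upload_limit_bytes(&self) -> u64 {
        self.upload_limit_mb * 1024 * 1024
    }

    /// Checks the log level against the five levels listed in the example
    /// and rejects identical HTTP and gRPC ports.
    fn validate(&self) -> Result<(), String> {
        const LEVELS: [&str; 5] = ["trace", "debug", "info", "warn", "error"];
        if !LEVELS.contains(&self.log_level.as_str()) {
            return Err(format!("unknown log level: {}", self.log_level));
        }
        if self.http_port == self.grpc_port {
            return Err("HTTP and gRPC ports must differ".to_string());
        }
        Ok(())
    }
}

fn main() {
    let config = ServerConfig::default();
    config.validate().expect("example values should be valid");
    println!(
        "keep-alive: {:?}, upload limit: {} bytes",
        config.keep_alive(),
        config.upload_limit_bytes()
    );
}
```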
Debug build and run
datafusion-server with the Python plugin feature
Requires Python interpreter v3.7+
How to run
Example of Cargo.toml
[package]
name = "server-executor"
version = "0.1.0"
edition = "2021"
[dependencies]
datafusion-server = { version = "0.13.1", features = ["plugin"] }
Debug build and run
Release build with full optimization
Example of Cargo.toml
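[package]
name = "server-executor"
version = "0.1.0"
edition = "2021"
[profile.release]
opt-level = 'z'
strip = true
lto = "fat"
codegen-units = 1
[dependencies]
datafusion-server = { version = "0.13.1", features = ["plugin"] }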
Build for release
Clean workspace
Usage
Multiple data sources with SQL query
- Many data source formats can be used (Parquet, JSON, ndJSON, CSV, ...).
- Data can be retrieved from the local file system and from external REST services.
- JSONPath processing can be applied if necessary.
- Queries can be executed across multiple data sources.
- The SQL query engine is Arrow DataFusion.
- See https://arrow.apache.org/datafusion/user-guide/sql/index.html for details.
- Responses can be returned in Arrow, JSON, or CSV format.
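As a small illustration of the source formats listed above, the sketch below guesses a format from a file name's extension. This is not a description of how datafusion-server identifies formats (the server may well require the format to be stated explicitly in the request); the enum SourceFormat, the function guess_format, and the extension mapping are hypothetical names chosen for this example.

```rust
use std::path::Path;

/// The source formats named in the list above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SourceFormat {
    Parquet,
    Json,
    NdJson,
    Csv,
}

/// Guesses the format from the file extension, ignoring ASCII case.
/// Returns None when there is no extension or it is not recognized.
fn guess_format(location: &str) -> Option<SourceFormat> {
    let ext = Path::new(location)
        .extension()?
        .to_str()?
        .to_ascii_lowercase();
    match ext.as_str() {
        "parquet" => Some(SourceFormat::Parquet),
        "json" => Some(SourceFormat::Json),
        "ndjson" => Some(SourceFormat::NdJson),
        "csv" => Some(SourceFormat::Csv),
        _ => None,
    }
}

fn main() {
    for name in ["sales.parquet", "events.ndjson", "README"] {
        println!("{} -> {:?}", name, guess_format(name));
    }
}
```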
Example (local file)
Example (remote REST API)
Example (Python datasource connector plugin)