Elusion 🦀 DataFrame / Data Engineering / Data Analysis Library for Everybody!

Best Way to learn Elusion:
Udemy Course - Click to start learning on Udemy!
Elusion is a high-performance DataFrame / Data Engineering / Data Analysis library designed for in-memory data formats such as CSV, JSON, PARQUET, and DELTA, for Azure Blob Storage connections, and for creating JSON files from REST APIs that can be forwarded to a DataFrame.
All DataFrame operations, reading, and writing can be placed in the PipelineScheduler for automated Data Engineering pipelines.
DataFrame operations are built atop the DataFusion SQL query engine, Azure BLOB HTTPS operations are built atop Azure Storage with BLOB and DFS (Data Lake Storage Gen2) endpoints available, Pipeline Scheduling is built atop Tokio Cron Scheduler, REST API is built atop Reqwest, and Report Creation is built atop Plotly and AG GRID. (scroll down for examples)
Tailored for Data Engineers and Data Analysts seeking a powerful abstraction over data transformations. Elusion streamlines complex operations like filtering, joining, aggregating, and more with its intuitive, chainable DataFrame API, and provides a robust interface for managing and querying data efficiently. It also has Integrated Plotting and Interactive Dashboard features.
Core Philosophy
Elusion wants you to be you!
Elusion offers flexibility in constructing queries without enforcing specific patterns or chaining orders, unlike SQL, PySpark, Polars, or Pandas. You can build your queries in any sequence that best fits your logic, writing functions in a manner that makes sense to you. Regardless of the order of function calls, Elusion ensures consistent results.
Platform Compatibility
Tested on macOS, Linux and Windows

Security
The codebase has undergone rigorous auditing and security testing, ensuring that it is fully prepared for production.
Key Features
🔄 Job Scheduling (PipelineScheduler)
Flexible Intervals: From 1 minute to 30 days scheduling intervals. Graceful Shutdown: Built-in Ctrl+C signal handling for clean termination. Async Support: Built on tokio for non-blocking operations.
🌐 External Data Sources Integration
- Azure Blob Storage: Direct integration with Azure Blob Storage for Reading and Writing data files.
- REST APIs: Create JSON files from REST API endpoints with customizable Headers, Params, Date Ranges, Pagination...
🚀 High-Performance DataFrame Query Operations
Seamless Data Loading: Easily load and process data from CSV, PARQUET, JSON, and DELTA table files. SQL-Like Transformations: Execute transformations such as SELECT, AGG, STRING FUNCTIONS, JOIN, FILTER, HAVING, GROUP BY, ORDER BY, DATETIME and WINDOW with ease.
🚀 Caching and Views
The caching and views functionality offers several significant advantages over regular querying:
Reduced Computation Time:
Complex queries (especially with joins, aggregations, and string functions) only need to be computed once. Subsequent requests use pre-computed results, which can be 10-100x faster.
Memory Management:
Large intermediate results are managed better in cached form (prevents stack overflow). The query execution plan doesn't need to be rebuilt each time. Memory allocation patterns become more predictable.
Query Optimization:
Results are stored in an optimized format (Arrow RecordBatches). Repeated access doesn't require re-parsing SQL or rebuilding execution plans.
Interactive Analysis:
Data scientists can explore data interactively without waiting for the same queries to execute repeatedly. This makes iterative analysis practical even with large datasets.
Dashboards and Reports:
Multiple visualizations can share the same underlying data without redundant computation. Refresh only when needed (TTL-based expiration).
Resource Utilization:
Reduced CPU usage for repeated queries. Less I/O pressure for file-based data sources. More efficient use of memory (prevents re-allocations).
Concurrency:
Multiple users/processes can access the same cached results. Reduces contention for system resources.
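The TTL-based expiration mentioned above can be sketched with a few lines of std-only Rust. This is an illustration of the caching idea, not Elusion's actual implementation; all names here are hypothetical.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct QueryCache {
    entries: HashMap<String, (Instant, String)>, // query -> (stored_at, result)
    ttl: Duration,
}

impl QueryCache {
    fn new(ttl_secs: u64) -> Self {
        Self { entries: HashMap::new(), ttl: Duration::from_secs(ttl_secs) }
    }

    // Return the cached result if present and not expired; otherwise compute and store it.
    fn get_or_compute(&mut self, query: &str, compute: impl FnOnce() -> String) -> String {
        if let Some((stored, result)) = self.entries.get(query) {
            if stored.elapsed() < self.ttl {
                return result.clone(); // cache hit: no recomputation
            }
        }
        let result = compute();
        self.entries.insert(query.to_string(), (Instant::now(), result.clone()));
        result
    }
}

fn main() {
    let mut cache = QueryCache::new(3600); // 1 hour TTL
    let r1 = cache.get_or_compute("SELECT 1", || "computed".to_string());
    let r2 = cache.get_or_compute("SELECT 1", || "recomputed".to_string());
    println!("{} {}", r1, r2); // second call is served from cache
}
```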
📉 Aggregations and Analytics
Comprehensive Aggregations: Utilize built-in functions like SUM, AVG, MEAN, MEDIAN, MIN, COUNT, MAX, and more. Advanced Scalar Math: Perform calculations using functions such as ABS, FLOOR, CEIL, SQRT, ISNAN, ISZERO, PI, POWER, and others.
🔗 Flexible Joins
Diverse Join Types: Perform joins using INNER, LEFT, RIGHT, FULL, and other join types. Intuitive Syntax: Easily specify join conditions and aliases for clarity and simplicity.
🪟 Window Functions
Analytical Capabilities: Implement window functions like RANK, DENSE_RANK, ROW_NUMBER, and custom partition-based calculations to perform advanced analytics.
🔄 Pivot and Unpivot Functions
Data Reshaping: Transform your data structure using PIVOT and UNPIVOT functions to suit your analytical needs.
📊 Create REPORTS
Create HTML files with Interactive Dashboards containing multiple interactive Plots and Tables. Plots available: TimeSeries, Bar, Pie, Donut, Histogram, Scatter, Box... Tables can paginate, filter, resize and reorder columns... Export table data to EXCEL and CSV.
🧹 Clean Query Construction
Readable Queries: Construct SQL queries that are both readable and reusable. Advanced Query Support: Utilize Common Table Expressions (CTEs), subqueries, and set operations such as APPEND, UNION, UNION ALL, INTERSECT, and EXCEPT. For multiple DataFrame operations: APPEND_MANY, UNION_MANY, UNION_ALL_MANY.
🛠️ Easy-to-Use API
Chainable Interface: Build queries using a chainable and intuitive API for streamlined development. Debugging Support: Access readable debug outputs of the generated SQL for easy verification and troubleshooting. Data Preview: Quickly preview your data by displaying a subset of rows in the terminal. Composable Queries: Seamlessly chain transformations to create reusable and testable workflows.
Installation
To add Elusion to your Rust project, include the following lines in your Cargo.toml under [dependencies]:
elusion = "3.7.4"
tokio = { version = "1.42.1", features = ["rt-multi-thread"] }
Rust version needed
>= 1.81
Feature Flags
Elusion uses Cargo feature flags to keep the library lightweight and modular. You can enable only the features you need, which helps reduce dependencies and compile time.
Available Features
azure:
Enables Azure BLOB storage connectivity.
dashboard:
Enables data visualization and dashboard creation capabilities. This adds the plotly dependency.
api:
Enables HTTP API integration for fetching data from web services. This adds the reqwest and urlencoding dependencies.
all:
Enables all available features.
Usage:
- In your Cargo.toml, specify which features you want to enable:
- Add the DASHBOARD feature when specifying the dependency:
[dependencies]
elusion = { version = "3.7.4", features = ["dashboard"] }
When building your project, use the DASHBOARD feature:
cargo build --features dashboard
cargo run --features dashboard
- Add the AZURE feature when specifying the dependency:
[dependencies]
elusion = { version = "3.7.4", features = ["azure"] }
When building your project, use the AZURE feature:
cargo build --features azure
cargo run --features azure
- Add the API feature when specifying the dependency:
elusion = { version = "3.7.4", features = ["api"] }
This enables HTTP client functionality to fetch data from APIs:
cargo build --features api
cargo run --features api
- Using NO features (minimal dependencies):
elusion = "3.7.4"
- Using multiple specific features:
elusion = { version = "3.7.4", features = ["dashboard", "api"] }
Or build with multiple features:
cargo build --features "dashboard api"
cargo run --features "dashboard api"
- Using all features:
elusion = { version = "3.7.4", features = ["all"] }
Feature Implications
When a feature is not enabled, the corresponding methods will still be available in your code, but they will return an error indicating that the feature needs to be enabled. This approach ensures API compatibility regardless of which features you choose to enable.
For example, if you try to use API functions without the "api" feature enabled:
let api = ElusionApi::new();
let result = api.from_api(...).await; // (...) marks arguments elided from this excerpt
You'll receive an error: Error: API feature not enabled. Recompile with --features api
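The behavior described above follows a common Rust feature-gating pattern, sketched below with a hypothetical stub (not Elusion's actual source): when a feature is off, a conditionally compiled fallback returns the error instead of the real implementation.

```rust
// Hypothetical illustration of the feature-gating pattern described above.
#[cfg(not(feature = "api"))]
fn from_api() -> Result<(), String> {
    // Fallback compiled only when the `api` feature is disabled.
    Err("API feature not enabled. Recompile with --features api".to_string())
}

#[cfg(feature = "api")]
fn from_api() -> Result<(), String> {
    Ok(()) // the real HTTP client logic would live here
}

fn main() {
    // Without `--features api`, this prints the feature error.
    println!("{:?}", from_api());
}
```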
Compilation Benefits
Faster Compilation: Only compile the dependencies you need Reduced Binary Size: Final executable only includes the code you use Fewer Dependencies: Minimize dependency tree complexity Customized Build: Tailor the library to your specific needs
NORMALIZATION
DataFrame (your files) column names will be normalized with LOWERCASE(), TRIM() and REPLACE(" ","_")
All DataFrame query expressions, functions, aliases and column names will be normalized with LOWERCASE(), TRIM() and REPLACE(" ","_")
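The normalization rule above amounts to a one-liner; this std-only sketch (the function name is illustrative, not Elusion's internal API) shows the effect:

```rust
// Sketch of the normalization rule: trim, lowercase, replace spaces with underscores.
fn normalize_column_name(name: &str) -> String {
    name.trim().to_lowercase().replace(' ', "_")
}

fn main() {
    println!("{}", normalize_column_name("  Customer Name "));
}
```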
Schema
SCHEMA IS DYNAMICALLY INFERRED
Usage examples:
MAIN function
// Import everything needed
use elusion::prelude::*;

#[tokio::main]
async fn main() -> ElusionResult<()> {
    // your code goes here
    Ok(())
}
CREATING DATA FRAMES and QUICK EXAMPLES TO GET YOU STARTED
- Loading data into CustomDataFrame can be from:
- Empty() DataFrames
- In-Memory data formats: CSV, JSON, PARQUET, DELTA
- Azure Blob Storage endpoints (BLOB, DFS)
-> NEXT is an example for reading data from local files;
examples for Azure Blob Storage and ODBC are further below
LOADING data from Files into CustomDataFrame (in-memory data formats)
- File extensions are automatically recognized
- All you have to do is provide the path to your file
let csv_data = "C:\\Borivoj\\RUST\\Elusion\\sales_data.csv";
let parquet_path = "C:\\Borivoj\\RUST\\Elusion\\prod_data.parquet";
let json_path = "C:\\Borivoj\\RUST\\Elusion\\db_data.json";
let delta_path = "C:\\Borivoj\\RUST\\Elusion\\agg_sales"; // for DELTA you just specify folder name without extension
Creating CustomDataFrame
2 arguments needed: Path, Table Alias
let df_sales = CustomDataFrame::new(csv_data, "sales").await?; // table aliases here are illustrative
let df_customers = CustomDataFrame::new(json_path, "customers").await?;
LOADING data from Databases into CustomDataFrame (scroll down for full example)
let pg_df = CustomDataFrame::from_db(...).await?; // connection and query arguments elided here
LOADING data from Azure BLOB Storage into CustomDataFrame (scroll down for full example)
let df = CustomDataFrame::from_azure_with_sas_token(...).await?; // URL and SAS token arguments elided here
CREATE EMPTY DATA FRAME
Create empty() DataFrame and populate it with data
let temp_df = CustomDataFrame::empty().await?;
let date_table = temp_df
    .datetime_functions(...)   // (...) marks arguments elided from this excerpt
    .elusion(...).await?;
date_table.display().await?;
RESULT:
+--------------+---------------------+---------------------+--------------+------------------+
| current_date | week_start | next_week_start | current_year | current_week_num |
+--------------+---------------------+---------------------+--------------+------------------+
| 2025-03-07 | 2025-03-03T00:00:00 | 2025-03-10T00:00:00 | 2025.0 | 10.0 |
+--------------+---------------------+---------------------+--------------+------------------+
CREATE DATE TABLE
Create Date Table from Range of Dates
let date_table = CustomDataFrame::create_date_range_table(...).await?; // start date, end date and table alias arguments elided here
date_table.display().await?;
RESULT:
+------------+------+-------+-----+---------+----------+-------------+------------------+-------------+------------+-------------+---------------+------------+------------+
| date | year | month | day | quarter | week_num | day_of_week | day_of_week_name | day_of_year | week_start | month_start | quarter_start | year_start | is_weekend |
+------------+------+-------+-----+---------+----------+-------------+------------------+-------------+------------+-------------+---------------+------------+------------+
| 2025-01-01 | 2025 | 1 | 1 | 1 | 1 | 3 | Wednesday | 1 | 2024-12-29 | 2025-01-01 | 2025-01-01 | 2025-01-01 | false |
| 2025-01-02 | 2025 | 1 | 2 | 1 | 1 | 4 | Thursday | 2 | 2024-12-29 | 2025-01-01 | 2025-01-01 | 2025-01-01 | false |
| 2025-01-03 | 2025 | 1 | 3 | 1 | 1 | 5 | Friday | 3 | 2024-12-29 | 2025-01-01 | 2025-01-01 | 2025-01-01 | false |
| 2025-01-04 | 2025 | 1 | 4 | 1 | 1 | 6 | Saturday | 4 | 2024-12-29 | 2025-01-01 | 2025-01-01 | 2025-01-01 | true |
| 2025-01-05 | 2025 | 1 | 5 | 1 | 1 | 0 | Sunday | 5 | 2025-01-05 | 2025-01-01 | 2025-01-01 | 2025-01-01 | true |
| 2025-01-06 | 2025 | 1 | 6 | 1 | 2 | 1 | Monday | 6 | 2025-01-05 | 2025-01-01 | 2025-01-01 | 2025-01-01 | false |
| 2025-01-07 | 2025 | 1 | 7 | 1 | 2 | 2 | Tuesday | 7 | 2025-01-05 | 2025-01-01 | 2025-01-01 | 2025-01-01 | false |
| 2025-01-08 | 2025 | 1 | 8 | 1 | 2 | 3 | Wednesday | 8 | 2025-01-05 | 2025-01-01 | 2025-01-01 | 2025-01-01 | false |
| 2025-01-09 | 2025 | 1 | 9 | 1 | 2 | 4 | Thursday | 9 | 2025-01-05 | 2025-01-01 | 2025-01-01 | 2025-01-01 | false |
| .......... | .... | . | . | . | . | . | ................ | .......... | .......... | .......... | ............. | ...........| .......... |
+------------+------+-------+-----+---------+----------+-------------+------------------+-------------+------------+-------------+---------------+------------+------------+
CREATE DATE TABLE WITH CUSTOM FORMATS
You can create Date Table with Custom formats (ISO, Compact, Human Readable...) and week, month, quarter, year Ranges (start-end)
let date_table = CustomDataFrame::create_formatted_date_range_table(...).await?; // date range, format and range-column arguments elided here
date_table.display().await?;
RESULT:
+-------------+------+-------+-----+---------+----------+-------------+------------------+-------------+------------+-------------+-------------+-------------+-------------+---------------+-------------+-------------+-------------+
| date | year | month | day | quarter | week_num | day_of_week | day_of_week_name | day_of_year | is_weekend | week_start | week_end | month_start | month_end | quarter_start | quarter_end | year_start | year_end |
+-------------+------+-------+-----+---------+----------+-------------+------------------+-------------+------------+-------------+-------------+-------------+-------------+---------------+-------------+-------------+-------------+
| 1 Jan 2025 | 2025 | 1 | 1 | 1 | 1 | 2 | Wednesday | 1 | false | 30 Dec 2024 | 5 Jan 2025 | 1 Jan 2025 | 31 Jan 2025 | 1 Jan 2025 | 31 Mar 2025 | 1 Jan 2025 | 31 Dec 2025 |
| 2 Jan 2025 | 2025 | 1 | 2 | 1 | 1 | 3 | Thursday | 2 | false | 30 Dec 2024 | 5 Jan 2025 | 1 Jan 2025 | 31 Jan 2025 | 1 Jan 2025 | 31 Mar 2025 | 1 Jan 2025 | 31 Dec 2025 |
| 3 Jan 2025 | 2025 | 1 | 3 | 1 | 1 | 4 | Friday | 3 | false | 30 Dec 2024 | 5 Jan 2025 | 1 Jan 2025 | 31 Jan 2025 | 1 Jan 2025 | 31 Mar 2025 | 1 Jan 2025 | 31 Dec 2025 |
| 4 Jan 2025 | 2025 | 1 | 4 | 1 | 1 | 5 | Saturday | 4 | true | 30 Dec 2024 | 5 Jan 2025 | 1 Jan 2025 | 31 Jan 2025 | 1 Jan 2025 | 31 Mar 2025 | 1 Jan 2025 | 31 Dec 2025 |
| 5 Jan 2025 | 2025 | 1 | 5 | 1 | 1 | 6 | Sunday | 5 | true | 30 Dec 2024 | 5 Jan 2025 | 1 Jan 2025 | 31 Jan 2025 | 1 Jan 2025 | 31 Mar 2025 | 1 Jan 2025 | 31 Dec 2025 |
| 6 Jan 2025 | 2025 | 1 | 6 | 1 | 2 | 0 | Monday | 6 | false | 6 Jan 2025 | 12 Jan 2025 | 1 Jan 2025 | 31 Jan 2025 | 1 Jan 2025 | 31 Mar 2025 | 1 Jan 2025 | 31 Dec 2025 |
| 7 Jan 2025 | 2025 | 1 | 7 | 1 | 2 | 1 | Tuesday | 7 | false | 6 Jan 2025 | 12 Jan 2025 | 1 Jan 2025 | 31 Jan 2025 | 1 Jan 2025 | 31 Mar 2025 | 1 Jan 2025 | 31 Dec 2025 |
| 8 Jan 2025 | 2025 | 1 | 8 | 1 | 2 | 2 | Wednesday | 8 | false | 6 Jan 2025 | 12 Jan 2025 | 1 Jan 2025 | 31 Jan 2025 | 1 Jan 2025 | 31 Mar 2025 | 1 Jan 2025 | 31 Dec 2025 |
| 9 Jan 2025 | 2025 | 1 | 9 | 1 | 2 | 3 | Thursday | 9 | false | 6 Jan 2025 | 12 Jan 2025 | 1 Jan 2025 | 31 Jan 2025 | 1 Jan 2025 | 31 Mar 2025 | 1 Jan 2025 | 31 Dec 2025 |
| ........... | .... | .. | .. | . | . | . | ......... | ... | ..... | ........... | .......... | .......... | ........... | .......... | ........... | .......... | ........... |
+-------------+------+-------+-----+---------+----------+-------------+------------------+-------------+------------+-------------+-------------+-------------+-------------+---------------+-------------+-------------+-------------+
ALL AVAILABLE DATE FORMATS
IsoDate, // YYYY-MM-DD
IsoDateTime, // YYYY-MM-DD HH:MM:SS
UsDate, // MM/DD/YYYY
EuropeanDate, // DD.MM.YYYY
EuropeanDateDash, // DD-MM-YYYY
BritishDate, // DD/MM/YYYY
HumanReadable, // 1 Jan 2025
HumanReadableTime, // 1 Jan 2025 00:00
SlashYMD, // YYYY/MM/DD
DotYMD, // YYYY.MM.DD
CompactDate, // YYYYMMDD
YearMonth, // YYYY-MM
MonthYear, // MM-YYYY
MonthNameYear, // January 2025
Custom // Custom format string
For Custom date formats, here are some of the common format specifiers:
%Y - Full year
%y - Short year
%m - Month as number
%b - Abbreviated month name
%B - Full month name
%d - Day of month
%e - Day of month, space-padded
%a - Abbreviated weekday name
%A - Full weekday name
%H - Hour (24-hour clock)
%I - Hour (12-hour clock)
%M - Minute
%S - Second
%p - AM/PM
EXAMPLES (format strings below are illustrative reconstructions from the specifiers above):
Custom("%d %b %Y %H:%M"),       // "01 Jan 2025 00:00"
Custom("%Y-%m-%dT%H:%M:%S%z"),  // ISO 8601 with T separator and timezone
Custom("%m/%d/%Y %I:%M %p"),    // US date with 12-hour time
Custom("%A, %B %-d, %Y"),       // Custom format with weekday: "Monday, January 1, 2025"
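To see how such specifiers expand, here is a tiny hand-rolled, std-only illustration covering a few of them (real formatting is handled for you by the library; this is not its implementation):

```rust
// Minimal illustration of strftime-style specifier expansion for %Y, %m, %B, %b, %d.
fn format_date(year: i32, month: u32, day: u32, fmt: &str) -> String {
    const MONTHS: [&str; 12] = ["January", "February", "March", "April", "May", "June",
                                "July", "August", "September", "October", "November", "December"];
    fmt.replace("%Y", &year.to_string())                      // full year
       .replace("%m", &format!("{:02}", month))               // month as zero-padded number
       .replace("%B", MONTHS[(month - 1) as usize])           // full month name
       .replace("%b", &MONTHS[(month - 1) as usize][..3])     // abbreviated month name
       .replace("%d", &format!("{:02}", day))                 // zero-padded day of month
}

fn main() {
    println!("{}", format_date(2025, 1, 1, "%Y-%m-%d"));   // IsoDate style
    println!("{}", format_date(2025, 1, 1, "%d %b %Y"));   // HumanReadable style
}
```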
EXTRACTING VALUES: extract_value_from_df()
Example of how you can extract values from a DataFrame and use them within a REST API call
//create calendar dataframe
let date_calendar = CustomDataFrame::create_formatted_date_range_table(...).await?; // (...) marks arguments elided from this excerpt; aliases are illustrative
// take columns from Calendar
let week_range_2025 = date_calendar
    .select(...)
    .order_by(...)
    .elusion("week_ranges")
    .await?;
// create empty dataframe
let temp_df = CustomDataFrame::empty().await?;
//populate empty dataframe with current week number
let current_week = temp_df
    .datetime_functions(...)
    .elusion("current_week").await?;
// join data frames to get the range for the current week
let week_for_api = week_range_2025
    .join(...)
    .select(...)
    .elusion("week_for_api")
    .await?;
// Extract a Date Value from the DataFrame based on column name and row index
let date_from = week_for_api.extract_value_from_df("datefrom", 0).await?; // receiver/argument form is illustrative
let date_to = week_for_api.extract_value_from_df("dateto", 0).await?;
//PRINT results for preview
week_for_api.display().await?;
println!("Date from: {}", date_from);
println!("Date to: {}", date_to);
RESULT:
+------------------+------------------+
| datefrom | dateto |
+------------------+------------------+
| 3 Mar 2025 00:00 | 9 Mar 2025 00:00 |
+------------------+------------------+
Date from: 3 Mar 2025 00:00
Date to: 9 Mar 2025 00:00
NOW WE CAN USE THESE EXTRACTED VALUES:
let post_df = ElusionApi::new();
post_df.from_api_with_dates(...).await?; // (...) marks arguments elided from this excerpt
EXTRACTING ROWS: extract_row_from_df()
Example of how you can extract a Row from a DataFrame and use it within a REST API call.
//create calendar dataframe
let date_calendar = CustomDataFrame::create_formatted_date_range_table(...).await?; // (...) marks arguments elided from this excerpt; aliases are illustrative
//take columns from calendar
let week_range_2025 = date_calendar
    .select(...)
    .order_by(...)
    .elusion("week_ranges")
    .await?;
// create empty dataframe
let temp_df = CustomDataFrame::empty().await?;
//populate empty dataframe with current week number
let current_week = temp_df
    .datetime_functions(...)
    .elusion("current_week").await?;
// join data frames to get the range for the current week
let week_for_api = week_range_2025
    .join(...)
    .select(...)
    .elusion("week_for_api")
    .await?;
// Extract Row Values from the DataFrame based on row index
let row_values = week_for_api.extract_row_from_df(0).await?; // receiver/argument form is illustrative
// PRINT row for preview
println!("DataFrame row: {:?}", row_values);
RESULT:
DataFrame row:
NOW WE CAN USE THIS EXTRACTED ROW:
let post_df = ElusionApi::new();
post_df.from_api_with_dates(...).await?; // (...) marks arguments elided from this excerpt
CREATE VIEWS and CACHING
Materialized Views:
For long-term storage of complex query results. When results need to be referenced by name. For data that changes infrequently. Example: Monthly sales summaries, customer metrics, product analytics
Query Caching:
For transparent performance optimization. When the same query might be run multiple times in a session. For interactive analysis scenarios. Example: Dashboard queries, repeated data exploration.
let sales = "C:\\Borivoj\\RUST\\Elusion\\SalesData2022.csv";
let products = "C:\\Borivoj\\RUST\\Elusion\\Products.csv";
let customers = "C:\\Borivoj\\RUST\\Elusion\\Customers.csv";
let sales_df = CustomDataFrame::new(sales, "sales").await?; // table aliases here are illustrative
let customers_df = CustomDataFrame::new(customers, "customers").await?;
let products_df = CustomDataFrame::new(products, "products").await?;
// Example 1: Using materialized view for customer count
// The TTL parameter (3600) specifies how long the view remains valid in seconds (1 hour)
customers_df.clone()
    .select(...)
    .limit(...)
    .create_view(...)   // view name and TTL-seconds arguments elided here
    .await?;
// Access the view by name - no recomputation needed
let customer_count = from_view(...).await?;
customer_count.display().await?;
// Example 2: Using query caching with complex joins and aggregations
// First execution computes and stores the result
let join_result = sales_df.clone()
    .join_many(...)
    .select(...)
    .agg(...)
    .group_by(...)
    .having_many(...)
    .order_by_many(...)
    .elusion_with_cache(...)   // result alias argument elided here
    .await?;
join_result.display().await?;
// Other useful cache/view management functions:
invalidate_cache(...);     // Clear cache for specific tables
clear_cache();             // Clear entire cache
refresh_view(...).await?;  // Refresh a materialized view
drop_view(...).await?;     // Remove a materialized view
list_views().await;        // Get info about all views
DATAFRAME WRANGLING (lets start from scratch...)
SELECT
ALIAS column names in SELECT() function (AS is case insensitive)
let df_as = select_df
    .select(...);   // (...) marks column lists elided from this excerpt
let df_select_all = select_df.select(...);
let df_count_all = select_df.select(...);
let df_distinct = select_df.select(...);
Where to use which Functions:
Scalar and Operators -> in SELECT() function
Aggregation Functions -> in AGG() function
String Column Functions -> in STRING_FUNCTIONS() function
DateTime Functions -> in DATETIME_FUNCTIONS() function
Numerical Operators (supported +, -, * , / , %)
let num_ops_sales = sales_order_df
    .select(...)
    .filter(...)
    .order_by(...)
    .limit(...);
let num_ops_res = num_ops_sales.elusion(...).await?;
num_ops_res.display().await?;
FILTER (used before aggregations)
let filter_df = sales_order_df
    .select(...)
    .filter_many(...)
    .order_by(...)
    .limit(...);
let filtered = filter_df.elusion(...).await?;
filtered.display().await?;
// example 2
const FILTER_CUSTOMER: &str = "customer_name == 'Customer IRRVL'";
let filter_query = sales_order_df
    .select(...)
    .agg(...)
    .filter(FILTER_CUSTOMER)
    .group_by_all()
    .order_by_many(...);
HAVING (used after aggregations)
//Example 1 with aggregated column names
let example1 = sales_df
    .join_many(...)
    .select(...)
    .agg(...)
    .group_by(...)
    .having_many(...)
    .order_by_many(...);
let result = example1.elusion(...).await?;
result.display().await?;
//Example 2 with aggregation in having
let df_having = sales_df
    .join(...)
    .select(...)
    .agg(...)
    .group_by(...)
    .having_many(...)
    .order_by(...)
    .limit(...);
let result = df_having.elusion(...).await?;
result.display().await?;
SCALAR functions
let scalar_df = sales_order_df
    .select(...)
    .filter(...)
    .order_by(...)
    .limit(...);
let scalar_res = scalar_df.elusion(...).await?;
scalar_res.display().await?;
AGGREGATE functions with nested Scalar functions
let scalar_df = sales_order_df
    .select(...)
    .agg(...)
    .group_by(...)
    .filter(...)
    .order_by(...)
    .limit(...);
let scalar_res = scalar_df.elusion(...).await?;
scalar_res.display().await?;
STRING functions
let df = sales_df
    .select(...)
    .string_functions(...);
let result_df = df.elusion(...).await?;
result_df.display().await?;
Numerical Operators, Scalar Functions, Aggregated Functions...
let mix_query = sales_order_df
    .select(...)
    .agg(...)
    .filter(...)
    .group_by_all()
    .order_by_many(...);
let mix_res = mix_query.elusion(...).await?;
mix_res.display().await?;
Supported Aggregation functions
SUM, AVG, MEAN, MEDIAN, MIN, COUNT, MAX,
LAST_VALUE, FIRST_VALUE,
GROUPING, STRING_AGG, ARRAY_AGG, VAR, VAR_POP,
VAR_POPULATION, VAR_SAMP, VAR_SAMPLE,
BIT_AND, BIT_OR, BIT_XOR, BOOL_AND, BOOL_OR
Supported Scalar Math Functions
ABS, FLOOR, CEIL, SQRT, ISNAN, ISZERO,
PI, POW, POWER, RADIANS, RANDOM, ROUND,
FACTORIAL, ACOS, ACOSH, ASIN, ASINH,
COS, COSH, COT, DEGREES, EXP,
SIN, SINH, TAN, TANH, TRUNC, CBRT,
ATAN, ATAN2, ATANH, GCD, LCM, LN,
LOG, LOG10, LOG2, NANVL, SIGNUM
JOIN
JOIN examples with single condition and 2 dataframes, AGGREGATION, GROUP BY
let single_join = df_sales
    .join(...)
    .select(...)
    .agg(...)
    .group_by(...)
    .having(...)
    .order_by(...) // true is ascending, false is descending
    .limit(...);
let join_df1 = single_join.elusion(...).await?;
join_df1.display().await?;
JOIN with single conditions and 3 dataframes, AGGREGATION, GROUP BY, HAVING, SELECT, ORDER BY
let many_joins = df_sales
    .join_many(...)
    .select(...)
    .agg(...)
    .group_by(...)
    .having_many(...)
    .order_by_many(...)
    .limit(...);
let join_df3 = many_joins.elusion(...).await?;
join_df3.display().await?;
JOIN with multiple conditions and 2 data frames
let result_join = orders_df
    .join(...)
    .select(...)
    .string_functions(...)
    .agg(...)
    .group_by(...);
let res_joins = result_join.elusion(...).await?;
res_joins.display().await?;
JOIN_MANY with multiple conditions and 3 data frames
let result_join_many = order_join_df
    .join_many(...)
    .select(...)
    .string_functions(...)
    .agg(...)
    .group_by_all()
    .having(...)
    .order_by(...);
let res_joins_many = result_join_many.elusion(...).await?;
res_joins_many.display().await?;
JOIN_MANY with single condition and 3 dataframes, STRING FUNCTIONS, AGGREGATION, GROUP BY, HAVING_MANY, ORDER BY
let str_func_joins = df_sales
    .join_many(...)
    .select(...)
    .string_functions(...)
    .agg(...)
    .group_by_all()
    .having_many(...)
    .order_by_many(...);
let join_str_df3 = str_func_joins.elusion(...).await?;
join_str_df3.display().await?;
Currently implemented join types
"INNER", "LEFT", "RIGHT", "FULL",
"LEFT SEMI", "RIGHT SEMI",
"LEFT ANTI", "RIGHT ANTI", "LEFT MARK"
STRING FUNCTIONS
let string_functions_df = df_sales
    .join_many(...)
    .select(...)
    .string_functions(...)
    .agg(...)
    .filter(...)
    .group_by_all()
    .having(...)
    .order_by(...);
let str_df = string_functions_df.elusion(...).await?;
str_df.display().await?;
Currently Available String functions
1.Basic String Functions:
TRIM - Remove leading/trailing spaces
LTRIM - Remove leading spaces
RTRIM - Remove trailing spaces
UPPER - Convert to uppercase
LOWER - Convert to lowercase
LENGTH or LEN - Get string length
LEFT - Extract leftmost characters
RIGHT - Extract rightmost characters
SUBSTRING - Extract part of string
2. String concatenation:
CONCAT - Concatenate strings
CONCAT_WS - Concatenate with separator
3. String Position and Search:
POSITION - Find position of substring
STRPOS - Find position of substring
INSTR - Find position of substring
LOCATE - Find position of substring
4. String Replacement and Modification:
REPLACE - Replace all occurrences of substring
TRANSLATE - Replace characters
OVERLAY - Replace portion of string
REPEAT - Repeat string
REVERSE - Reverse string characters
5. String Pattern Matching:
LIKE - Pattern matching with wildcards
REGEXP or RLIKE - Pattern matching with regular expressions
6. String Padding:
LPAD - Pad string on left
RPAD - Pad string on right
SPACE - Generate spaces
7. String Case Formatting:
INITCAP - Capitalize first letter of each word
8. String Extraction:
SPLIT_PART - Split string and get nth part
SUBSTR - Get substring
9. String Type Conversion:
TO_CHAR - Convert to string
CAST - Type conversion
CONVERT - Type conversion
10. Control Flow:
CASE
DATETIME FUNCTIONS
Work best with YYYY-MM-DD format
let dt_query = sales_order_df
    .select(...)
    .datetime_functions(...)
    .order_by(...);
let dt_res = dt_query.elusion(...).await?;
dt_res.display().await?;
Currently Available DateTime Functions
CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
NOW
TODAY
DATE_PART
DATE_TRUNC
DATE_BIN
MAKE_DATE
DATE_FORMAT
WINDOW functions
Aggregate, Ranking and Analytical functions
let window_query = df_sales
    .join(...)
    .select(...)
    //aggregated window functions
    .window(...)
    .window(...)
    .window(...)
    .window(...)
    .window(...)
    //ranking window functions
    .window(...)
    .window(...)
    .window(...)
    .window(...)
    .window(...)
    .window(...)
    // analytical window functions
    .window(...)
    .window(...)
    .window(...)
    .window(...)
    .window(...);
let window_df = window_query.elusion(...).await?;
window_df.display().await?;
Rolling Window Functions
let rollin_query = df_sales
    .join(...)
    .select(...)
    //aggregated rolling windows
    .window(...)
    .window(...);
let rollin_df = rollin_query.elusion(...).await?;
rollin_df.display().await?;
JSON functions
.json()
function works with Columns that only have simple JSON values
***NOTE: make sure to write AS with capital letters
example json structure: [{"Key1":"Value1","Key2":"Value2","Key3":"Value3"}]
example usage
let path = "C:\\Borivoj\\RUST\\Elusion\\jsonFile.csv";
let json_df = CustomDataFrame::new(path, "json_data").await?; // table alias illustrative
let df_extracted = json_df.json(...)
    .select(...)
    .elusion(...).await?;
df_extracted.display().await?;
RESULT:
+---------------+---------------+---------------+---------------+---------------+
| column_name_1 | column_name_2 | column_name_3 | some_column1 | some_column2 |
+---------------+---------------+---------------+---------------+---------------+
| registrations | 2022-09-15 | CustomerCode | 779-0009E3370 | 646443D134762 |
| registrations | 2023-09-11 | CustomerCode | 770-00009ED61 | 463497C334762 |
| registrations | 2017-10-01 | CustomerCode | 889-000049C9E | 634697C134762 |
| registrations | 2019-03-26 | CustomerCode | 000-00006C4D5 | 446397D134762 |
| registrations | 2021-08-31 | CustomerCode | 779-0009E3370 | 463643D134762 |
| registrations | 2019-05-09 | CustomerCode | 770-00009ED61 | 634697C934762 |
| registrations | 2005-10-24 | CustomerCode | 889-000049C9E | 123397C334762 |
| registrations | 2023-02-14 | CustomerCode | 000-00006C4D5 | 932393D134762 |
| registrations | 2021-01-20 | CustomerCode | 779-0009E3370 | 323297C334762 |
| registrations | 2018-07-17 | CustomerCode | 000-00006C4D5 | 322097C921462 |
+---------------+---------------+---------------+---------------+---------------+
.json_array()
function works with columns that have an Array of objects, using the pattern "column.'$ValueField:IdField=IdValue' AS column_alias"
The function parameters: column: the column containing the JSON array; ValueField: the field to extract from matching objects; IdField: the field to use as identifier; IdValue: the value to match on the identifier field; column_alias: the output column name
example json structure
example usage
let multiple_values = df_json.json_array(...)
    .select(...)
    .elusion(...)
    .await?;
multiple_values.display().await?;
RESULT:
+-----------------+-------------------+----------+-------+-------+-------+--------+
| date | made_by | timeline | etr_1 | etr_2 | etr_3 | id |
+-----------------+-------------------+----------+-------+-------+-------+--------+
| 2022-09-15 | Borivoj Grujicic | 1.0 | 1.0 | 1.0 | 1.0 | 77E10C |
| 2023-09-11 | | 5.0 | | | | 770C24 |
| 2017-10-01 | | | | | | 7795FA |
| 2019-03-26 | | 1.0 | | | | 77F2E6 |
| 2021-08-31 | | 5.0 | | | | 77926E |
| 2019-05-09 | | | | | | 77CC0F |
| 2005-10-24 | | | | | | 7728BA |
| 2023-02-14 | | | | | | 77F7F8 |
| 2021-01-20 | | | | | | 7731F6 |
| 2018-07-17 | | 3.0 | | | | 77FB18 |
+-----------------+-------------------+----------+-------+-------+-------+--------+
APPEND, APPEND_MANY
APPEND: Combines rows from two dataframes, keeping all rows
APPEND_MANY: Combines rows from many dataframes, keeping all rows
let df1 = "C:\\Borivoj\\RUST\\Elusion\\API\\df1.json";
let df2 = "C:\\Borivoj\\RUST\\Elusion\\API\\df2.json";
let df3 = "C:\\Borivoj\\RUST\\Elusion\\API\\df3.json";
let df4 = "C:\\Borivoj\\RUST\\Elusion\\API\\df4.json";
let df5 = "C:\\Borivoj\\RUST\\Elusion\\API\\df5.json";
let df1 = CustomDataFrame::new(df1, "df1").await?; // table aliases here are illustrative
let df2 = CustomDataFrame::new(df2, "df2").await?;
let df3 = CustomDataFrame::new(df3, "df3").await?;
let df4 = CustomDataFrame::new(df4, "df4").await?;
let df5 = CustomDataFrame::new(df5, "df5").await?;
let res_df1 = df1.select(...).string_functions(...);
let result_df1 = res_df1.elusion(...).await?;
let res_df2 = df2.select(...).string_functions(...);
let result_df2 = res_df2.elusion(...).await?;
let res_df3 = df3.select(...).string_functions(...);
let result_df3 = res_df3.elusion(...).await?;
let res_df4 = df4.select(...).string_functions(...);
let result_df4 = res_df4.elusion(...).await?;
let res_df5 = df5.select(...).string_functions(...);
let result_df5 = res_df5.elusion(...).await?;
//APPEND
let append_df = result_df1.append(result_df2).await?;
//APPEND_MANY
let append_many_df = result_df1.append_many(vec![result_df2, result_df3, result_df4, result_df5]).await?;
UNION, UNION ALL, EXCEPT, INTERSECT
UNION: Combines rows from both, removing duplicates
UNION ALL: Combines rows from both, keeping duplicates
EXCEPT: Difference of two sets (only rows in left minus those in right).
INTERSECT: Intersection of two sets (only rows in both).
//UNION
let df1 = sales_df.clone()
    .join(...)
    .select(...)
    .string_functions(...);
let df2 = sales_df.clone()
    .join(...)
    .select(...)
    .string_functions(...);
let result_df1 = df1.elusion(...).await?;
let result_df2 = df2.elusion(...).await?;
let union_df = result_df1.union(result_df2).await?;
let union_df_final = union_df.limit(...).elusion(...).await?;
union_df_final.display().await?;
//UNION ALL
let union_all_df = result_df1.union_all(result_df2).await?;
//EXCEPT
let except_df = result_df1.except(result_df2).await?;
//INTERSECT
let intersect_df = result_df1.intersect(result_df2).await?;
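The set semantics above can be illustrated on plain vectors with std collections (a sketch of the semantics only, not how Elusion executes them on DataFrames):

```rust
use std::collections::HashSet;

fn main() {
    let left = vec![1, 2, 2, 3];
    let right = vec![2, 3, 4];

    // UNION: all rows from both, duplicates removed
    let union: HashSet<i32> = left.iter().chain(right.iter()).cloned().collect();

    // UNION ALL: all rows from both, duplicates kept
    let union_all: Vec<i32> = left.iter().chain(right.iter()).cloned().collect();

    // EXCEPT: rows in left that are not in right
    let right_set: HashSet<i32> = right.iter().cloned().collect();
    let except: Vec<i32> = left.iter().filter(|x| !right_set.contains(*x)).cloned().collect();

    // INTERSECT: rows present in both
    let left_set: HashSet<i32> = left.iter().cloned().collect();
    let intersect: Vec<i32> = right.iter().filter(|x| left_set.contains(*x)).cloned().collect();

    println!("{} {} {:?} {:?}", union.len(), union_all.len(), except, intersect);
}
```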
UNION_MANY, UNION_ALL_MANY
UNION_MANY: Combines rows from many dataframes, removing duplicates
UNION_ALL_MANY: Combines rows from many dataframes, keeping duplicates
let df1 = "C:\\Borivoj\\RUST\\Elusion\\API\\df1.json";
let df2 = "C:\\Borivoj\\RUST\\Elusion\\API\\df2.json";
let df3 = "C:\\Borivoj\\RUST\\Elusion\\API\\df3.json";
let df4 = "C:\\Borivoj\\RUST\\Elusion\\API\\df4.json";
let df5 = "C:\\Borivoj\\RUST\\Elusion\\API\\df5.json";
let df1 = CustomDataFrame::new(df1, "df1").await?; // table aliases here are illustrative
let df2 = CustomDataFrame::new(df2, "df2").await?;
let df3 = CustomDataFrame::new(df3, "df3").await?;
let df4 = CustomDataFrame::new(df4, "df4").await?;
let df5 = CustomDataFrame::new(df5, "df5").await?;
let res_df1 = df1.select(...).string_functions(...);
let result_df1 = res_df1.elusion(...).await?;
let res_df2 = df2.select(...).string_functions(...);
let result_df2 = res_df2.elusion(...).await?;
let res_df3 = df3.select(...).string_functions(...);
let result_df3 = res_df3.elusion(...).await?;
let res_df4 = df4.select(...).string_functions(...);
let result_df4 = res_df4.elusion(...).await?;
let res_df5 = df5.select(...).string_functions(...);
let result_df5 = res_df5.elusion(...).await?;
//UNION_MANY
let union_many_df = result_df1.union_many(vec![result_df2, result_df3, result_df4, result_df5]).await?;
//UNION_ALL_MANY
let union_all_many_df = result_df1.union_all_many(vec![result_df2, result_df3, result_df4, result_df5]).await?;
PIVOT and UNPIVOT
Pivot and Unpivot are ASYNC functions
They should be used separately from other functions: 1. directly on the initial CustomDataFrame, 2. after .elusion() evaluation.
The future needs to be in a final state, so .await? must be used
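Conceptually, a pivot turns (row, column, value) triples into a wide table, and unpivot reverses it. A minimal std-only sketch of that reshaping (illustrative only, not Elusion's implementation):

```rust
use std::collections::BTreeMap;

// PIVOT: turn (row, column, value) triples into a row -> {column: sum} table
fn pivot(rows: &[(&str, &str, f64)]) -> BTreeMap<String, BTreeMap<String, f64>> {
    let mut table: BTreeMap<String, BTreeMap<String, f64>> = BTreeMap::new();
    for (row, col, val) in rows.iter().copied() {
        *table
            .entry(row.to_string())
            .or_default()
            .entry(col.to_string())
            .or_insert(0.0) += val;
    }
    table
}

// UNPIVOT: flatten the wide table back into (row, column, value) triples
fn unpivot(table: &BTreeMap<String, BTreeMap<String, f64>>) -> Vec<(String, String, f64)> {
    table
        .iter()
        .flat_map(|(row, cols)| cols.iter().map(move |(col, v)| (row.clone(), col.clone(), *v)))
        .collect()
}

fn main() {
    let rows = [("2022", "Road", 10.0), ("2022", "Mountain", 5.0), ("2023", "Road", 7.0)];
    let wide = pivot(&rows);
    assert_eq!(wide["2022"]["Road"], 10.0);
    // unpivot restores one long row per (row, column) cell
    assert_eq!(unpivot(&wide).len(), 3);
}
```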
// PIVOT
// directly on initial CustomDataFrame
let sales_p = "C:\\Borivoj\\RUST\\Elusion\\SalesData2022.csv";
let df_sales = CustomDataFrame::new(sales_p, "sales").await?;
let pivoted = df_sales
.pivot.await?;
let result_pivot = pivoted.elusion.await?;
result_pivot.display.await?;
// after .elusion() evaluation
let sales_path = "C:\\Borivoj\\RUST\\Elusion\\sales_order_report.csv";
let sales_order_df = CustomDataFrame::new(sales_path, "sales_order").await?;
let scalar_df = sales_order_df
.select
.filter
.order_by
.limit;
// elusion evaluation
let scalar_res = scalar_df.elusion.await?;
let pivoted_scalar = scalar_res
.pivot.await?;
let pivoted_res = pivoted_scalar.elusion.await?;
pivoted_res.display.await?;
// UNPIVOT
let unpivoted = result_pivot
.unpivot.await?;
let result_unpivot = unpivoted.elusion.await?;
result_unpivot.display.await?;
// example 2
let unpivot_scalar = scalar_res
.unpivot.await?;
let result_unpivot_scalar = unpivot_scalar.elusion.await?;
result_unpivot_scalar.display.await?;
Statistical Functions
These functions give you a quick statistical overview of your DataFrame columns and correlations
Currently available: display_stats(), display_null_analysis(), display_correlation_matrix()
df.display_stats.await?;
=== Column Statistics ===
--------------------------------------------------------------------------------
Column: abs_billable_value
------------------------------------------------------------------------------
| Metric | Value | Min | Max |
------------------------------------------------------------------------------
| Records | 10 | - | - |
| Non-null Records | 10 | - | - |
| Mean | 1025.71 | - | - |
| Standard Dev | 761.34 | - | - |
| Value Range | - | 67.4 | 2505.23 |
------------------------------------------------------------------------------
Column: sqrt_billable_value
------------------------------------------------------------------------------
| Metric | Value | Min | Max |
------------------------------------------------------------------------------
| Records | 10 | - | - |
| Non-null Records | 10 | - | - |
| Mean | 29.48 | - | - |
| Standard Dev | 13.20 | - | - |
| Value Range | - | 8.21 | 50.05 |
------------------------------------------------------------------------------
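The Mean and Standard Dev rows above come from the usual formulas; a quick std-only sketch (using the sample standard deviation here — whether Elusion uses the sample or population variant is not shown in this README):

```rust
// Mean of a numeric column
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

// Sample standard deviation (divides by n - 1)
fn std_dev(xs: &[f64]) -> f64 {
    let m = mean(xs);
    let var = xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0);
    var.sqrt()
}

fn main() {
    let xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];
    assert!((mean(&xs) - 5.0).abs() < 1e-9);
    assert!((std_dev(&xs) - (32.0f64 / 7.0).sqrt()).abs() < 1e-9);
}
```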
// Display null analysis
// Keep None if you want all columns to be analyzed
df.display_null_analysis.await?;
----------------------------------------------------------------------------------------
| Column | Total Rows | Null Count | Null Percentage |
----------------------------------------------------------------------------------------
| total_billable | 10 | 0 | 0.00% |
| order_count | 10 | 0 | 0.00% |
| customer_name | 10 | 0 | 0.00% |
| order_date | 10 | 0 | 0.00% |
| abs_billable_value | 10 | 0 | 0.00% |
----------------------------------------------------------------------------------------
// Display correlation matrix
df.display_correlation_matrix.await?;
=== Correlation Matrix ===
-------------------------------------------------------------------------------------------
| | abs_billable_va | sqrt_billable_v | double_billable | percentage_bill |
-------------------------------------------------------------------------------------------
| abs_billable_va | 1.00 | 0.98 | 1.00 | 1.00 |
| sqrt_billable_v | 0.98 | 1.00 | 0.98 | 0.98 |
| double_billable | 1.00 | 0.98 | 1.00 | 1.00 |
| percentage_bill | 1.00 | 0.98 | 1.00 | 1.00 |
-------------------------------------------------------------------------------------------
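The matrix holds pairwise Pearson correlation coefficients, which is why a column and its doubled copy correlate at exactly 1.00. A std-only sketch of the statistic itself:

```rust
// Pearson correlation coefficient between two equal-length columns
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    // covariance numerator and the two standard-deviation denominators
    let cov: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    let sx = x.iter().map(|a| (a - mx).powi(2)).sum::<f64>().sqrt();
    let sy = y.iter().map(|b| (b - my).powi(2)).sum::<f64>().sqrt();
    cov / (sx * sy)
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0];
    // a column and its doubled copy correlate perfectly, as in the matrix above
    let doubled: Vec<f64> = x.iter().map(|v| v * 2.0).collect();
    assert!((pearson(&x, &doubled) - 1.0).abs() < 1e-9);
    // a negated copy correlates at -1.0
    let inverted: Vec<f64> = x.iter().map(|v| -v).collect();
    assert!((pearson(&x, &inverted) + 1.0).abs() < 1e-9);
}
```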
AZURE Blob Storage Connector
Storage connector is available with BLOB and DFS URL endpoints, authenticated with a provided SAS token
Currently supported file types: .JSON and .CSV
The DFS endpoint is "Data Lake Storage Gen2" and behaves more like a real file system. This makes reading operations more efficient, especially at large scale.
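The two endpoints differ only in the host segment of the container URL, as the examples below show. A small illustrative helper (account and container names are placeholders):

```rust
// Compose an Azure container URL for either the BLOB or the DFS endpoint.
// Account and container names are placeholders, not real resources.
fn azure_container_url(account: &str, container: &str, use_dfs: bool) -> String {
    let endpoint = if use_dfs { "dfs" } else { "blob" };
    format!("https://{account}.{endpoint}.core.windows.net/{container}")
}

fn main() {
    assert_eq!(
        azure_container_url("myaccount", "data", false),
        "https://myaccount.blob.core.windows.net/data"
    );
    assert_eq!(
        azure_container_url("myaccount", "data", true),
        "https://myaccount.dfs.core.windows.net/data"
    );
}
```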
BLOB endpoint example
let blob_url= "https://your_storage_account_name.blob.core.windows.net/your-container-name";
let sas_token = "your_sas_token";
let df = from_azure_with_sas_token.await?;
let data_df = df.select;
let test_data = data_df.elusion.await?;
test_data.display.await?;
DFS endpoint example
let dfs_url= "https://your_storage_account_name.dfs.core.windows.net/your-container-name";
let sas_token = "your_sas_token";
let df = from_azure_with_sas_token.await?;
let data_df = df.select;
let test_data = data_df.elusion.await?;
test_data.display.await?;
Pipeline Scheduler
Time is set according to UTC
Currently available job frequencies
"1min", "2min", "5min", "10min", "15min", "30min",
"1h", "2h", "3h", "4h", "5h", "6h", "7h", "8h", "9h", "10h", "11h", "12h", "24h",
"2days", "3days", "4days", "5days", "6days", "7days", "14days", "30days"
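For reference, each frequency string maps to a fixed interval. A hypothetical std-only helper showing what the strings mean (PipelineScheduler accepts the strings directly; this parser is only an illustration):

```rust
use std::time::Duration;

// Map a frequency string like "30min", "24h" or "7days" to a Duration.
// Hypothetical helper for illustration; not part of the Elusion API.
fn frequency_to_duration(freq: &str) -> Option<Duration> {
    // split the leading digits from the unit suffix
    let (num, unit) = freq.split_at(freq.find(|c: char| !c.is_ascii_digit())?);
    let n: u64 = num.parse().ok()?;
    let secs = match unit {
        "min" => 60,
        "h" => 3_600,
        "days" => 86_400,
        _ => return None,
    };
    Some(Duration::from_secs(n * secs))
}

fn main() {
    assert_eq!(frequency_to_duration("30min"), Some(Duration::from_secs(1_800)));
    assert_eq!(frequency_to_duration("24h"), Some(Duration::from_secs(86_400)));
    assert_eq!(frequency_to_duration("7days"), Some(Duration::from_secs(604_800)));
    assert_eq!(frequency_to_duration("1week"), None); // unsupported unit
}
```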
PipelineScheduler Example (parsing data from Azure BLOB Storage, a DataFrame operation, and writing to Parquet)
use elusion::prelude::*;
JSON files
Currently supported files can include: Fields, Arrays, Objects.
Best performance with flat JSON ("key":"value")
For JSON, all field types are inferred as VARCHAR/TEXT/STRING
// example json structure with key:value pairs
let json_path = "C:\\Borivoj\\RUST\\Elusion\\test.json";
let json_df = CustomDataFrame::new(json_path, "json").await?;
let df = json_df.select.limit;
let result = df.elusion.await?;
result.display.await?;
// example json structure with Fields and Arrays
let json_path = "C:\\Borivoj\\RUST\\Elusion\\test2.json";
let json_df = CustomDataFrame::new(json_path, "json2").await?;
REST API
Creating JSON files from REST APIs
Customizable Headers, Params, Pagination, Date Ranges...
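The headers and params used in the examples below are plain HashMaps built with insert; the key/value types and the entries shown here are illustrative assumptions:

```rust
use std::collections::HashMap;

// Build a header map like the ones passed to from_api_with_headers.
// Keys and values are illustrative, not required by any particular API.
fn example_headers() -> HashMap<String, String> {
    let mut headers = HashMap::new();
    headers.insert("Accept".to_string(), "application/json".to_string());
    headers.insert("User-Agent".to_string(), "elusion-example".to_string());
    headers
}

// Build a query-param map like the ones passed to from_api_with_params.
fn example_params() -> HashMap<String, String> {
    let mut params = HashMap::new();
    params.insert("page".to_string(), "1".to_string());
    params.insert("per_page".to_string(), "50".to_string());
    params
}

fn main() {
    let headers = example_headers();
    let params = example_params();
    assert_eq!(headers.get("Accept").map(String::as_str), Some("application/json"));
    assert_eq!(params.get("page").map(String::as_str), Some("1"));
}
```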
FROM API
// example 1
let posts_df = new;
posts_df
.from_api.await?;
// example 2
let users_df = new;
users_df.from_api.await?;
// example 3
let ceo = new;
ceo.from_api.await?;
FROM API WITH HEADERS
// example 1
let mut headers = HashMap::new();
headers.insert;
let bin_df = new;
bin_df.from_api_with_headers.await?;
// example 2
let mut headers = HashMap::new();
headers.insert;
headers.insert;
let git_hub = new;
git_hub.from_api_with_headers.await?;
// example 3
let mut headers = HashMap::new();
headers.insert;
headers.insert;
let pokemon_df = new;
pokemon_df.from_api_with_headers.await?;
FROM API WITH PARAMS
// Using OpenLibrary API with params
let mut params = HashMap::new();
params.insert;
params.insert;
let open_lib = new;
open_lib.from_api_with_params.await?;
// Random User Generator API with params
let mut params = HashMap::new();
params.insert;
params.insert;
let generator = new;
generator.from_api_with_params.await?;
// JSON Placeholder with multiple endpoints
let mut params = HashMap::new();
params.insert;
params.insert;
let multi = new;
multi.from_api_with_params.await?;
// NASA Astronomy Picture of the Day
let mut params = HashMap::new();
params.insert;
params.insert;
let nasa = new;
nasa.from_api_with_params.await?;
// example 5
let mut params = HashMap::new();
params.insert;
params.insert;
params.insert;
params.insert;
params.insert;
params.insert;
let api = new;
api.from_api_with_params.await?;
FROM API WITH PARAMS AND HEADERS
let mut params = HashMap::new();
params.insert;
params.insert;
let mut headers = HashMap::new();
headers.insert;
headers.insert;
let commits_df = new;
commits_df.from_api_with_params_and_headers.await?;
FROM API WITH DATES
// example 1
let post_df = new;
post_df.from_api_with_dates.await?;
// Example 2: COVID-19 historical data
let covid_df = new;
covid_df.from_api_with_dates.await?;
FROM API WITH PAGINATION
// example 1
let reqres = new;
reqres.from_api_with_pagination.await?;
FROM API WITH SORT
let movie_db = new;
movie_db.from_api_with_sort.await?;
FROM API WITH HEADERS AND SORT
let mut headers = HashMap::new();
headers.insert;
headers.insert;
let movie_db = new;
movie_db.from_api_with_headers_and_sort.await?;
WRITERS
Writing to Parquet File
We have 2 writing modes: Overwrite and Append
// overwrite existing file
result_df
.write_to_parquet
.await?;
// append to existing file
result_df
.write_to_parquet
.await?;
Writing to CSV File
CSV Writing options are mandatory
has_headers: TRUE is dynamically set for Overwrite mode, and FALSE for Append mode.
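The has_headers rule above reduces to a simple check on the write mode: a fresh overwrite needs a header row, while an append must not repeat one. An illustrative sketch of that rule (not Elusion's internals):

```rust
// Decide whether a CSV write should emit a header row, per the rule above.
fn has_headers_for(mode: &str) -> bool {
    match mode {
        "overwrite" => true, // fresh file needs a header row
        "append" => false,   // existing file already has one
        other => panic!("unknown write mode: {other}"),
    }
}

fn main() {
    assert!(has_headers_for("overwrite"));
    assert!(!has_headers_for("append"));
}
```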
let custom_csv_options = CsvWriteOptions ;
We have 2 writing modes: Overwrite and Append
// overwrite existing file
result_df
.write_to_csv
.await?;
// append to existing file
result_df
.write_to_csv
.await?;
Writing to JSON File
JSON writer can only overwrite, so only 2 arguments are needed:
1. Path, 2. Whether you want pretty-printed JSON (true or false)
df.write_to_json.await?;
Writing to DELTA table / lake
We can write to Delta in 2 modes: Overwrite and Append
The partitioning column is OPTIONAL. If you decide to partition by a column, make sure you don't need it later, as you won't be able to read it back into the DataFrame
Once a Delta table is written with a partitioning column, any APPEND to it must use the same partitioning column
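The append constraint amounts to: the append's partition columns must equal the ones the table was originally written with. A hypothetical validation sketch of that rule:

```rust
// Check that an append's partition columns match the existing table's.
// Hypothetical helper for illustration; not part of the Elusion API.
fn partitions_compatible(existing: &Option<Vec<String>>, append: &Option<Vec<String>>) -> bool {
    existing == append
}

fn main() {
    let original = Some(vec!["year".to_string()]);
    // same column: compatible
    assert!(partitions_compatible(&original, &Some(vec!["year".to_string()])));
    // dropping or changing the partition column breaks the append
    assert!(!partitions_compatible(&original, &None));
    assert!(!partitions_compatible(&original, &Some(vec!["month".to_string()])));
}
```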
// Overwrite
result_df
.write_to_delta_table
.await
.expect;
// Append
result_df
.write_to_delta_table
.await
.expect;
Writing Parquet to Azure BLOB Storage
We have 2 writing options: "overwrite" and "append"
Writer defaults are Compression: SNAPPY and Parquet 2.0
Threshold file size is 1GB
let df = new.await?;
let query = df.select;
let data = query.elusion.await?;
let url_to_folder = "https://your_storage_account_name.dfs.core.windows.net/your-container-name/folder/sales.parquet";
let sas_write_token = "your_sas_token"; // make sure SAS token has writing permissions
data.write_parquet_to_azure_with_sas.await?;
// append version
data.write_parquet_to_azure_with_sas.await?;
Writing JSON to Azure BLOB Storage
Can only create a new file or overwrite an existing one
Threshold file size is 1GB
let df = new.await?;
let query = df.select;
let data = query.elusion.await?;
let url_to_folder = "https://your_storage_account_name.dfs.core.windows.net/your-container-name/folder/data.json";
let sas_write_token = "your_sas_token"; // make sure SAS token has writing permissions
data.write_json_to_azure_with_sas.await?;
REPORTING
CREATING REPORT with Interactive Plots/Visuals and Tables
Export Table data to EXCEL and CSV
Currently available Interactive Plots: TimeSeries, Box, Bar, Histogram, Pie, Donut, Scatter...
Interactive Tables can: Paginate pages, Filter, Reorder, Resize columns...
let ord = "C:\\Borivoj\\RUST\\Elusion\\sales_order_report.csv";
let sales_order_df = CustomDataFrame::new(ord, "sales_order").await?;
let mix_query = sales_order_df.clone
.select
.agg
.filter
.group_by_all
.order_by_many;
let mix_res = mix_query.elusion.await?;
//INTERACTIVE PLOTS
// Line plot showing sales over time
let line = mix_res.plot_line.await?;
// Bar plot showing aggregated values
let bars = mix_res
.plot_bar.await?;
// Time series showing sales trend
let time_series = mix_res
.plot_time_series.await?;
// Histogram showing distribution of abs billable values
let histogram = mix_res
.plot_histogram.await?;
// Box plot showing abs billable value distribution
let box_plot = mix_res
.plot_box.await?;
// Scatter plot showing relationship between original and doubled values
let scatter = mix_res
.plot_scatter.await?;
// Pie chart showing sales distribution
let pie = mix_res
.plot_pie.await?;
// Donut chart alternative view
let donut = mix_res
.plot_donut.await?;
// Create Tables to add to report
let summary_table = mix_res.clone //Clone for multiple usages
.select
.order_by_many
.elusion
.await?;
let transactions_table = mix_res
.select
.order_by_many
.elusion
.await?;
// Create comprehensive dashboard with all plots
let plots = ;
// Add tables array
let tables = ;
let layout = ReportLayout ;
let table_options = TableOptions ;
// Generate the enhanced interactive report with all plots and tables
create_report.await?;
Dashboard Demo
License
Elusion is distributed under the MIT License. However, since it builds upon DataFusion, which is distributed under the Apache License 2.0, some parts of this project are subject to the terms of the Apache License 2.0. For full details, see the LICENSE.txt file.
Acknowledgments
This library leverages the power of Rust's type system and libraries like DataFusion, Apache Arrow, Arrow ODBC, Tokio Cron Scheduler, Tokio... for efficient query processing. Special thanks to the open-source community for making this project possible.
