floe-core 0.4.5

Core library for Floe, a YAML-driven technical ingestion tool.
Documentation

Floe logo

Floe

Floe is a single-node, YAML-driven data ingestion framework written in Rust. You describe your data contract in a config file; Floe reads raw files, enforces your schema and quality rules, and writes clean rows to your sink — routing invalid rows to a separate rejected output.

How it works

Floe architecture

Each floe run executes a four-stage pipeline per entity:

Stage What happens
1. Resolve inputs Discover and download source files from local or cloud storage
2. File-level checks Validate schema structure, file format, and headers
3. Row-level checks Apply type casting and not_null checks row by row
4. Entity-level checks Apply unique / primary-key checks across all input rows plus existing accepted data

Rows that pass all checks go to the accepted sink. Rows that fail go to the rejected sink. A JSON run report is written after every run.

Inputs: CSV · TSV · JSON · Parquet · ORC · Avro · XLSX · XML · Fixed-width
Accepted outputs: Parquet · Delta Lake · Apache Iceberg
Storage: local · S3 · ADLS · GCS
Catalogs: AWS Glue · Iceberg REST (Polaris, Nessie, Snowflake) · Databricks Unity Catalog

Install

macOS / Linux — Homebrew

brew tap malon64/floe
brew install floe

Windows — Scoop

scoop bucket add floe https://github.com/malon64/scoop-floe
scoop install floe

Docker

docker pull ghcr.io/malon64/floe:latest
docker run --rm -v "$PWD:/work" ghcr.io/malon64/floe:latest run -c /work/config.yml

Or download a prebuilt binary from GitHub Releases, or cargo install floe-cli.
Full installation guide

Quick start

floe validate -c config.yml   # validate config and schema
floe run      -c config.yml   # run the pipeline

Config reference · Example config

Documentation

Topic Doc
Configuration reference docs/config.md
How it works (deep dive) docs/how-it-works.md
Checks (not_null, unique, cast) docs/checks.md
Delta sink + Unity Catalog docs/sinks/delta.md
Iceberg sink + Glue / REST docs/sinks/iceberg.md
Sink options docs/sinks/options.md
Write modes docs/write_modes.md
S3 storage docs/storages/s3.md
ADLS storage docs/storages/adls.md
GCS storage docs/storages/gcs.md
Environement specific profile config docs/profiles.md docs/variables.md
Run reports docs/report.md
CLI reference docs/cli.md
Orchestration (Dagster / Airflow) docs/summary.md
Support matrix docs/support-matrix.md
Changelog CHANGELOG.md

License

MIT