Floe

Floe is a Polars-powered data contract runtime for reliable file ingestion. It validates raw files or extracted datasets before they enter your trusted layer, routing accepted rows to lakehouse sinks and rejected rows to quarantine with audit reports.

Use Floe when you already have a platform such as Databricks, Fabric-style lakehouses, Snowflake/Open Catalog, MotherDuck, Airflow, or Dagster, but you need a lightweight entry gate for file contracts, quality checks, rejected rows, and run evidence.

Floe complements extract/load tools such as dlt, ingestr, and Airbyte: they get data out of source systems; Floe decides what is allowed into trusted storage.

What Floe does

Defines source, schema, checks, accepted output, rejected output, write mode, metadata, and reporting in one human-readable YAML contract.
Reads common file exports and extracted datasets from local or cloud storage.
Applies schema, casting, nullability, and uniqueness checks before write.
Writes accepted rows to Parquet, Delta Lake, Apache Iceberg, or DuckDB.
Writes invalid rows separately and emits deterministic JSON run reports.
Runs as a CLI binary, Docker image, Python library, or orchestrated job.

How it works

Each floe run executes a deterministic gate per entity:

Stage	What happens
1. Resolve inputs	Discover and download source files from local or cloud storage
2. File-level checks	Validate schema structure, file format, and headers
3. Row-level checks	Apply type casting and `not_null` checks row by row
4. Entity-level checks	Apply `unique` / primary-key checks across all input rows plus existing accepted data
5. Write outputs	Route valid rows to accepted sinks, invalid rows to rejected sinks, and write reports
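
The row-level and entity-level stages above can be illustrated with a small, standard-library Python sketch. It is not Floe's implementation (Floe runs on Rust, Polars, and Arrow) and does not use Floe's YAML contract format: `SCHEMA`, `UNIQUE_KEY`, `GateResult`, `cast_row`, and `run_gate` are hypothetical names, and the policy of keeping the first occurrence of a duplicate key is an assumption made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical mini-contract: column -> (caster, nullable). Not Floe's YAML schema.
SCHEMA = {
    "id": (int, False),
    "email": (str, False),
    "age": (int, True),
}
UNIQUE_KEY = "id"


@dataclass
class GateResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (raw_row, reason) pairs


def cast_row(raw):
    """Row-level checks: cast each column and enforce not_null."""
    out = {}
    for col, (caster, nullable) in SCHEMA.items():
        value = raw.get(col)
        if value is None or value == "":
            if not nullable:
                raise ValueError(f"not_null failed: {col}")
            out[col] = None
            continue
        try:
            out[col] = caster(value)
        except (TypeError, ValueError):
            raise ValueError(f"cast failed: {col}") from None
    return out


def run_gate(rows, existing_keys=()):
    """Route each row to accepted or rejected, with a reason for rejections."""
    result = GateResult()
    # Entity-level check: keys already accepted earlier count as taken.
    seen = set(existing_keys)
    for raw in rows:
        try:
            row = cast_row(raw)
        except ValueError as err:
            result.rejected.append((raw, str(err)))
            continue
        key = row[UNIQUE_KEY]
        if key in seen:  # assumption: the first occurrence wins
            result.rejected.append((raw, f"unique failed: {UNIQUE_KEY}"))
            continue
        seen.add(key)
        result.accepted.append(row)
    return result


rows = [
    {"id": "1", "email": "a@example.com", "age": "31"},
    {"id": "2", "email": "", "age": "40"},             # not_null violation
    {"id": "x", "email": "c@example.com", "age": ""},  # cast failure
    {"id": "1", "email": "d@example.com", "age": ""},  # duplicate within the input
    {"id": "7", "email": "e@example.com", "age": ""},  # collides with existing data
]
result = run_gate(rows, existing_keys={7})
print(len(result.accepted), "accepted,", len(result.rejected), "rejected")
```

Only the first row passes every check. The other four land in `rejected` with a reason string, which mirrors the idea of quarantining invalid rows instead of dropping them silently.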

Floe uses Rust, Polars, and Arrow for single-node columnar execution. At the sink boundary, Arrow RecordBatches are handed to table-format writers without an extra serialization hop.

Inputs: CSV · TSV · JSON · Parquet · ORC · Avro · XLSX · XML · Fixed-width
Accepted outputs: Parquet · Delta Lake · Apache Iceberg · DuckDB / MotherDuck
Storage: local · S3 · ADLS · GCS
Catalogs: AWS Glue · Iceberg REST (Polaris, Nessie, Snowflake) · Databricks Unity Catalog

Feature index

Capability	Start here
Contracts and full YAML reference	docs/config.md
Pipeline phases and execution details	docs/how-it-works.md
Checks: schema mismatch, cast, `not_null`, `unique`	docs/checks.md
Supported inputs, outputs, storage, and catalogs	docs/support-matrix.md
Write modes: `overwrite`, `append`, `merge_scd1`, `merge_scd2`	docs/write_modes.md
Parquet, Delta, Iceberg, and DuckDB sinks	docs/sinks/parquet.md, docs/sinks/delta.md, docs/sinks/iceberg.md, docs/sinks/duckdb.md
S3, ADLS, and GCS storage	docs/storages/s3.md, docs/storages/adls.md, docs/storages/gcs.md
Incremental file state	docs/incremental.md
Profiles and variables	docs/profiles.md, docs/variables.md
PII masking	docs/pii.md
Reports, logs, and OpenLineage	docs/report.md, docs/logging.md, docs/lineage.md
Airflow and Dagster manifests	docs/manifest.md, orchestrators/airflow-floe/README.md, orchestrators/dagster-floe/README.md
Python and notebooks	docs/python-bindings.md
Installation and CLI usage	docs/installation.md, docs/cli.md
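
The `merge_scd1` and `merge_scd2` write modes listed in the index appear to be named after slowly changing dimension (SCD) types 1 and 2. The sketch below shows the general difference between the two in plain Python. It is not Floe's merge logic: the function signatures and the `valid_from` / `valid_to` column names are hypothetical, and docs/write_modes.md remains the reference for Floe's actual behaviour.

```python
def merge_scd1(current, incoming, key):
    """SCD type 1: overwrite rows with matching keys, insert new keys, keep no history."""
    merged = {row[key]: dict(row) for row in current}
    for row in incoming:
        merged[row[key]] = dict(row)
    return list(merged.values())


def merge_scd2(current, incoming, key, as_of):
    """SCD type 2: close the open version of a changed row and append a new version."""
    out = [dict(row) for row in current]
    # Open versions are the rows whose validity interval has not been closed yet.
    open_rows = {row[key]: row for row in out if row["valid_to"] is None}
    for row in incoming:
        old = open_rows.get(row[key])
        if old is not None:
            payload = {k: v for k, v in old.items() if k not in ("valid_from", "valid_to")}
            if payload == row:
                continue  # unchanged row: keep the existing open version
            old["valid_to"] = as_of  # close the previous version
        new_row = {**row, "valid_from": as_of, "valid_to": None}
        out.append(new_row)
        open_rows[row[key]] = new_row
    return out


incoming = [{"id": 1, "city": "Paris"}, {"id": 2, "city": "Nice"}]

scd1 = merge_scd1([{"id": 1, "city": "Lyon"}], incoming, "id")

history = [{"id": 1, "city": "Lyon", "valid_from": "2024-01-01", "valid_to": None}]
scd2 = merge_scd2(history, incoming, "id", as_of="2024-06-01")
```

With type 1 the Lyon value is simply replaced. With type 2 the Lyon row stays in the table with a closed validity interval, next to the new Paris row.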

Install

macOS / Linux — Homebrew

brew tap malon64/floe
brew install floe

Windows — Scoop

scoop bucket add floe https://github.com/malon64/scoop-floe
scoop install floe

Docker

docker pull ghcr.io/malon64/floe:latest
docker run --rm -v "$PWD:/work" ghcr.io/malon64/floe:latest run -c /work/config.yml

Alternatively, download a prebuilt binary from GitHub Releases, or build from source with cargo install floe-cli.
→ Full installation guide: docs/installation.md

The DuckDB sink ships as a separate companion so that the default artifacts stay lean. To use it, pick one of: the ghcr.io/malon64/floe-duckdb image, a floe-duckdb binary on your PATH, or the floe-duckdb wheel, which is distributed outside PyPI. The lean floe build automatically delegates DuckDB-sink runs to the companion. → DuckDB support

Quick start

floe validate -c config.yml   # validate config and schema
floe run      -c config.yml   # run the pipeline
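
When the two commands above are driven from a script, validation can act as a precondition for the run. The following is a minimal Python sketch using only the standard library. The helper names `floe_commands` and `validate_then_run` are hypothetical, and the sketch assumes that `floe validate` exits with a non-zero status when the config is invalid, which this README does not state.

```python
import subprocess


def floe_commands(config_path):
    """Build the two quick-start invocations as argument lists."""
    return [
        ["floe", "validate", "-c", config_path],
        ["floe", "run", "-c", config_path],
    ]


def validate_then_run(config_path, runner=subprocess.run):
    """Call `floe validate`, then `floe run`, stopping at the first failure.

    Assumption: floe signals failure with a non-zero exit code. With
    check=True, subprocess.run raises CalledProcessError in that case,
    so `floe run` is never reached after a failed validation.
    """
    for cmd in floe_commands(config_path):
        runner(cmd, check=True)


# Usage (requires the floe binary on PATH):
# validate_then_run("config.yml")
```

The `runner` parameter exists only so the sequencing can be exercised without the binary installed, for example by passing a function that records the commands it receives.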

→ Config reference · Example config

For the full documentation entry point, see docs/summary.md.

License

MIT