floe-cli 0.1.6

CLI for Floe, a YAML-driven technical ingestion tool.

Floe

Technical ingestion on a single node, driven by YAML contracts.

Floe is a Rust + Polars tool for technical ingestion on a single node. It ingests raw files into typed datasets using YAML contracts, applying schema enforcement and simple data quality rules with clear, auditable outputs.

What Floe solves

  • Schema enforcement and type casting (strict vs coerce; see the sketch after this list)
  • Nullability checks (not_null)
  • Uniqueness checks (unique)
  • Policy behavior: warn / reject / abort
  • Accepted vs rejected outputs for clean separation
  • JSON run reports for observability and audit
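
The strict vs coerce choice is worth a sketch. The key name below is a guess for illustration only; docs/config.md has the real spelling and placement:

schema:
  casting: "coerce"     # hypothetical key: "strict" treats type mismatches as
                        # violations, "coerce" casts compatible values instead
  columns:
    - name: "created_at"
      type: "datetime"  # under coerce, parseable strings are cast to datetime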

Why Polars + Rust

  • Polars provides fast, columnar execution on a single node without JVM overhead.
  • Rust gives predictable performance and low-level control while keeping memory usage tight.
  • The combo fits contract-driven ingestion: schema checks, deterministic outputs, and stable reports.

Minimal config example

version: "0.1"
report:
  path: "./reports"
entities:
  - name: "customer"
    source:
      format: "csv"
      path: "./example/in/customer"
    sink:
      accepted:
        format: "parquet"
        path: "./example/out/accepted/customer"
      rejected:
        format: "csv"
        path: "./example/out/rejected/customer"
    policy:
      severity: "reject"  # warn | reject | abort
    schema:
      columns:
        - name: "customer_id"
          type: "string"
          nullable: false
          unique: true
        - name: "created_at"
          type: "datetime"
          nullable: true

Full example: example/config.yml

Config reference: docs/config.md

Quickstart (Homebrew)

Install

brew tap malon64/floe
brew install floe
floe --version

Validate

floe validate -c example/config.yml

Run

floe run -c example/config.yml

Troubleshooting

If Homebrew is unavailable:

  • GitHub Releases: download the prebuilt binary from the latest release
  • Cargo: cargo install floe-cli

More CLI details: docs/cli.md

Sample console output

run id: run-123
report base: ./reports
==> entity customer (severity=reject, format=csv)
  REJECTED customers.csv rows=10 accepted=8 rejected=2 elapsed_ms=12 accepted_out=customer rejected_out=customers_rejected.csv
Totals: files=1 rows=10 accepted=8 rejected=2
Overall: rejected (exit_code=0)
Run summary: ./reports/run_run-123/run.summary.json

Outputs explained

  • Accepted output: entities[].sink.accepted.path
  • Rejected output: entities[].sink.rejected.path
  • Reports: <report.path>/run_<run_id>/<entity.name>/run.json

Reports include per-entity JSON, a run summary, and key counters (rows, accepted/rejected, errors).
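
For illustration, the summary for the sample run above could carry counters like these; the field names here are assumptions, and docs/report.md has the authoritative schema:

{
  "run_id": "run-123",
  "overall": "rejected",
  "totals": { "files": 1, "rows": 10, "accepted": 8, "rejected": 2 }
}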

Report details: docs/report.md

Severity policy

  • warn: keep all rows and report violations
  • reject: reject only rows with violations; keep valid rows
  • abort: reject the entire file on first violation
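
For example, switching the sample customer entity from row-level rejection to fail-fast is a one-line change in its policy block:

policy:
  severity: "abort"   # any violation rejects the whole file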

Checks and policy details: docs/checks.md

Supported formats

Inputs:

  • CSV (local and S3)
  • Parquet (local and S3)
  • NDJSON (local and S3)

Outputs:

  • Accepted: Parquet, Delta
  • Rejected: CSV
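
Writing accepted rows as Delta instead of Parquet is a sink-level change. A sketch, assuming the format string is spelled "delta" (check docs/config.md for the exact value):

sink:
  accepted:
    format: "delta"
    path: "./example/out/accepted/customer"
  rejected:
    format: "csv"
    path: "./example/out/rejected/customer"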

Sink details:

Cloud integration and storages

Floe resolves all paths through a storage registry in the config. By default, paths use local://. To use cloud storage, define a storage entry (with credentials or bucket info) and reference it from a source or sink. Currently only S3 is implemented; Google Cloud Storage, Azure Data Lake Storage, and dbfs:// (Databricks) are on the roadmap.

Example (S3 storage):

storages:
  default: local
  definitions:
    - name: local
      type: local
    - name: s3_raw
      type: s3
      bucket: my-bucket
      region: eu-west-1
      # credentials via standard AWS env vars or profile
entities:
  - name: customer
    source:
      storage: s3_raw
      path: raw/customer/
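
The same registry applies on the write side. A sketch extending the entity above, assuming sinks accept a storage key the same way sources do:

    sink:
      accepted:
        storage: s3_raw   # assumption: mirrors the source-side key
        format: parquet
        path: curated/customer/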

Storage guide: docs/storages/s3.md

Roadmap (near term)

  • Cloud integration for storage and compute
  • Python library release
  • Orchestrator integrations (Airflow, Dagster)
  • More input/output formats, including database sources and sinks
  • Data platform integrations (Databricks, Microsoft Fabric, Snowflake)

Feature tracking: docs/features.md

License

MIT