floe-cli 0.1.4

CLI for Floe, a YAML-driven technical ingestion tool.
floe-cli 0.1.4 ships a binary only; it is not a library crate.

Floe

Floe is a YAML-driven technical ingestion tool for single-node, medium-sized datasets. It targets data engineering workflows where raw files (initially CSV) are ingested into a clean, typed dataset with simple technical rules:

  • Schema enforcement (types + nullable)
  • Data quality checks (not_null, unique, cast mismatch)
  • Row-level rejection by default
  • Run report (JSON) with counts and aggregated violations
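The rules above can be illustrated with a small, self-contained Rust sketch. It is not taken from floe-core: the names `ColumnType`, `Column`, `Report`, and `ingest`, the `column:rule` violation keys, and the choice to treat an empty field as null are all assumptions made for illustration.

```rust
use std::collections::{BTreeMap, HashSet};

/// Column types supported by this sketch (hypothetical, not Floe's type list).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColumnType {
    Int,
    Text,
}

/// One column rule: name, expected type, nullability, uniqueness.
#[derive(Debug, Clone)]
pub struct Column {
    pub name: &'static str,
    pub ty: ColumnType,
    pub nullable: bool,
    pub unique: bool,
}

/// Counts for one run, with violations aggregated by "column:rule".
#[derive(Debug, Default)]
pub struct Report {
    pub accepted: usize,
    pub rejected: usize,
    pub violations: BTreeMap<String, usize>,
}

/// Splits rows into accepted and rejected sets. A row is rejected as a whole
/// when any of its columns breaks a rule. An empty field is treated as null
/// (an assumption of this sketch).
pub fn ingest<'a>(
    schema: &[Column],
    rows: &[Vec<&'a str>],
) -> (Vec<Vec<&'a str>>, Vec<Vec<&'a str>>, Report) {
    // One set of already-seen values per column, used by the unique check.
    let mut seen: Vec<HashSet<&'a str>> = vec![HashSet::new(); schema.len()];
    let mut accepted = Vec::new();
    let mut rejected = Vec::new();
    let mut report = Report::default();

    for row in rows {
        let mut failed: Vec<String> = Vec::new();
        for (i, col) in schema.iter().enumerate() {
            // A missing trailing field is handled like an empty one.
            let value = row.get(i).copied().unwrap_or("");
            if value.is_empty() {
                if !col.nullable {
                    failed.push(format!("{}:not_null", col.name));
                }
                continue;
            }
            if col.ty == ColumnType::Int && value.parse::<i64>().is_err() {
                failed.push(format!("{}:cast_mismatch", col.name));
                continue;
            }
            // `insert` returns false when the value was already present.
            if col.unique && !seen[i].insert(value) {
                failed.push(format!("{}:unique", col.name));
            }
        }
        if failed.is_empty() {
            report.accepted += 1;
            accepted.push(row.clone());
        } else {
            report.rejected += 1;
            for key in failed {
                *report.violations.entry(key).or_insert(0) += 1;
            }
            rejected.push(row.clone());
        }
    }
    (accepted, rejected, report)
}
```

In this sketch a value is recorded as seen even when its row is later rejected for another column. A real implementation might prefer to track uniqueness over accepted rows only.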

Floe is intentionally not a distributed engine and is not meant to replace Spark. This repository is a learning project in Rust, with a working core pipeline that is intentionally small and readable.

What you can do today

  • Validate configs with floe validate
  • Run local CSV ingestion with floe run
  • Emit accepted/rejected outputs
  • Generate per-entity run reports

Non-goals (for now)

  • Distributed execution or orchestration
  • Advanced rule engines or UDFs
  • Multi-format IO beyond CSV input + Parquet output
  • Incremental state beyond a single run

Repository layout

  • crates/floe-core/: core library (config parsing, checks, IO, reporting)
  • crates/floe-cli/: CLI interface
  • docs/: docs for checks, reports, CLI, and features
  • example/: sample configs and input data

Quick start

floe validate -c example/config.yml --entities customer
floe run -c example/config.yml --entities customer
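When scripting the two commands above from Rust, one option is to run `validate` first and only call `run` if it succeeds. The sketch below uses only the flags shown in the quick start. The helper names `floe_command` and `validate_then_run` are hypothetical, and it assumes that a `floe` binary is on `PATH` and that a failed validation exits with a non-zero status, which this README does not confirm.

```rust
use std::process::Command;

/// Builds `floe <subcommand> -c <config> --entities <entity>` without running it.
/// Assumes a `floe` binary is available on PATH.
pub fn floe_command(subcommand: &str, config: &str, entity: &str) -> Command {
    let mut cmd = Command::new("floe");
    cmd.arg(subcommand).args(["-c", config, "--entities", entity]);
    cmd
}

/// Runs `validate` first and proceeds to `run` only if validation succeeds.
/// Returns Ok(true) when both commands exit with status 0. Treating a
/// non-zero exit status as failure is an assumption of this sketch.
pub fn validate_then_run(config: &str, entity: &str) -> std::io::Result<bool> {
    if !floe_command("validate", config, entity).status()?.success() {
        return Ok(false);
    }
    Ok(floe_command("run", config, entity).status()?.success())
}
```

Calling `validate_then_run("example/config.yml", "customer")` would then mirror the two quick start commands in order.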

Docs

  • Checks: docs/checks.md
  • Config: docs/config.md
  • Reports: docs/report.md
  • CLI: docs/cli.md
  • Features: docs/features.md
  • Release: docs/release.md
  • Docs index: the list above is the current, version-agnostic docs set.

License

MIT