floe-cli-0.1.4 is not a library.
Floe
Floe is a YAML-driven technical ingestion tool for single-node, medium-sized datasets. It targets data engineering workflows where raw files (initially CSV) are ingested into a clean, typed dataset with simple technical rules:
- Schema enforcement (types + nullable)
- Data quality checks (not_null, unique, cast mismatch)
- Row-level rejection by default
- Run report (JSON) with counts and aggregated violations
Floe is intentionally not a distributed engine and is not meant to replace Spark. This repository is a learning project in Rust, with a working core pipeline that is intentionally small and readable.
What you can do today
- Validate configs with
floe validate - Run local CSV ingestion with
floe run - Emit accepted/rejected outputs
- Generate per-entity run reports
Non-goals (for now)
- Distributed execution or orchestration
- Advanced rule engines or UDFs
- Multi-format IO beyond CSV input + parquet output
- Incremental state beyond a single run
Repository layout
crates/floe-core/: core library (config parsing, checks, IO, reporting)crates/floe-cli/: CLI interfacedocs/: docs for checks, reports, CLI, and featuresexample/: sample configs and input data
Quick start
Docs
- Checks:
docs/checks.md - Config:
docs/config.md - Reports:
docs/report.md - CLI:
docs/cli.md - Features:
docs/features.md - Release:
docs/release.md - Docs index: see the list above for the current, version-agnostic docs set.
License
MIT