etl-unit
A semantic data model for ETL — qualities and measurements over
subjects and time, built on Polars.
etl-unit separates the logical schema (what your data means) from
the physical data (what's in the parquet/CSV columns). You declare
the schema once, point it at any source whose physical columns can be
mapped to your canonical names, and the same downstream code (subset
requests, derivations, signal policies, time-window resampling) works
unchanged.
At a glance
EtlSchema + BoundSource(s) ──► Universe ──► SubsetRequest ──► DataFrame
(logical)   (physical)         (resolved)   (filter/query)    (results)
What's in the box
- EtlSchema — declarative schema: subjects, qualities, measurements, derivations. Build programmatically via EtlSchema::new(...) or load from a TOML file via from_toml_file().
- MeasurementUnit — a time-varying observation. Carries a SignalPolicy (how raw samples become a stable signal: instant vs. sliding-window) and a sample_rate (native cadence).
- QualityUnit — static metadata about a subject (1:1 with subject).
- Derivation — shape-preserving derived measurements. Three axes:
  - Pointwise — combine multiple units on the same (subject, time) row (e.g. any_on, count_non_zero, sum, ratio).
  - OverTime — time-axis transforms within a subject (Derivative, RollingMean, Lag, Lead).
  - OverSubjects — cross-subject transforms at a single time point (Rank, Quantile/Deciles, ZScore).
- BoundSource — physical→canonical column binding. Supports direct mapping, computed columns, and unpivoting wide data into long form.
- Universe — the resolved data layer: all subjects, all measurements, per-source binding rules baked in. Cheap to query repeatedly.
- EtlUnitSubsetRequest — declarative query: filter subjects, filter qualities, pick measurements, time range, optional interval-bucketed reporting.
- SubsetUniverse — the materialized result: a Polars DataFrame plus typed metadata (per-measurement stats, stage trace, interval stats, …).
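Putting the pieces together, the flow in the diagram above looks roughly like this. Only the type names and from_toml_file come from this README; every other method name and argument is a hypothetical placeholder, not the crate's actual API:

// Hypothetical sketch: method names and arguments below are placeholders,
// not the crate's real API; it only shows the intended order of the stages.
let schema = EtlSchema::from_toml_file("pump_station.toml")?;   // logical
let source = BoundSource::parquet("telemetry.parquet");         // physical binding (placeholder constructor)
let universe = Universe::resolve(&schema, vec![source])?;       // resolved (placeholder)
let request = EtlUnitSubsetRequest::default();                  // filter/query (placeholder)
let subset: SubsetUniverse = universe.subset(&request)?;        // materialize (placeholder)
let df = subset.dataframe();                                    // Polars DataFrame (placeholder accessor)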
Quick start
# Cargo.toml
[dependencies]
etl-unit = "0.1"
polars = { version = "0.51", features = ["lazy", "dtype-datetime", "temporal"] }
Build a schema programmatically
use etl_unit::EtlSchema;

// Names are illustrative; they mirror the pump_station TOML example below.
let schema = EtlSchema::new("pump_station")
    .subject("station_id")
    .time("observation_time")
    .measurement_with_defaults("sump")
    .measurement_with_defaults("engine_1")
    .measurement_with_defaults("engine_2")
    .build()?;
# Ok::<(), Box<dyn std::error::Error>>(())
measurement_with_defaults sets a 60-second instant signal policy and a
60-second sample rate. For production, set them explicitly via
.with_policy(...) and .with_sample_rate(...) so you don't hide
configuration mistakes behind defaults.
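A hedged sketch of what that explicit configuration might look like. Only .with_policy(...) and .with_sample_rate(...) are named above; the .measurement(...) call, the SignalPolicy constructor, and the Duration arguments are placeholders, not the crate's documented API:

use std::time::Duration;

// Placeholder sketch: check the crate docs for the real signatures.
let schema = EtlSchema::new("pump_station")
    .subject("station_id")
    .time("observation_time")
    .measurement("sump")
        .with_sample_rate(Duration::from_secs(60))
        .with_policy(SignalPolicy::instant(Duration::from_secs(60)))
    .build()?;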
Load a schema from TOML
use etl_unit::EtlSchema;

let schema = EtlSchema::from_toml_file("pump_station.toml")?;
# Ok::<(), Box<dyn std::error::Error>>(())
# pump_station.toml
# Key and table names below are illustrative; see the crate docs for the
# exact TOML shape.
name = "pump_station"
subject = "station_id"
time = "observation_time"

[measurements.engine_1]
kind = "measure"
sample_rate = "60s"
policy = { max_staleness = "60s", mode = { kind = "instant" } }

[measurements.engine_2]
kind = "measure"
sample_rate = "60s"
policy = { max_staleness = "60s", mode = { kind = "instant" } }

[derivations.any_engine_on]
kind = "categorical"
pointwise = { op = "any_on", inputs = ["engine_1", "engine_2"] }
Measurements, qualities, and derivations are keyed by canonical name
([measurements.sump], not [[measurements]]).
TOML is a convenience layer
from_toml_file is a thin loader that uses the same EtlSchema types
underneath. For richer config (sources, chart hints, partitioning,
upstream fetchers, intent examples, …) you'll typically want a
config-layer crate above etl-unit that owns the TOML shape and
translates it into EtlSchema via the builder. The planned
etl-unit-pipeline crate covers exactly that path.
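A minimal sketch of that pattern, assuming serde and the toml crate in the config layer; the AppConfig shape and its field names are invented for illustration, and the builder calls mirror the Quick start above:

use serde::Deserialize;
use etl_unit::EtlSchema;

// Invented config shape; a real config layer owns whatever TOML it needs
// and translates it into EtlSchema via the builder.
#[derive(Deserialize)]
struct AppConfig {
    name: String,
    subject: String,
    time: String,
    measurements: Vec<String>,
}

fn schema_from_config(path: &str) -> Result<EtlSchema, Box<dyn std::error::Error>> {
    let cfg: AppConfig = toml::from_str(&std::fs::read_to_string(path)?)?;
    // Builder calls as in the Quick start; exact signatures per the crate docs.
    let mut builder = EtlSchema::new(cfg.name.as_str())
        .subject(cfg.subject.as_str())
        .time(cfg.time.as_str());
    for m in &cfg.measurements {
        builder = builder.measurement_with_defaults(m);
    }
    Ok(builder.build()?)
}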
Signal policies
Every measurement carries a SignalPolicy that tells the runtime how to
turn raw, irregularly-timed samples into a stable signal:
- Instant — keep the most recent value within max_staleness; emit null if no sample arrives in that window.
- Sliding — apply an aggregation (mean / max / min / sum / first / last) over a rolling window of duration D, requiring at least min_samples.
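Conceptually, a policy carries something like the following. This is a sketch of the information described in the two bullets above, not the crate's actual type definition:

use std::time::Duration;

// Conceptual sketch only; variant and field names mirror the prose above.
enum SignalPolicy {
    Instant { max_staleness: Duration },
    Sliding { agg: SlidingAgg, window: Duration, min_samples: usize },
}

enum SlidingAgg { Mean, Max, Min, Sum, First, Last }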
Pair this with sample_rate (the measurement's native cadence) and
upsample / downsample strategies on the source, and the runtime will
align everything onto a regular time grid.
Composition
Multiple BoundSources under one EtlSchema build a single Universe.
Sources with different sampling rates, different physical column names,
even different parquet partition layouts compose into one logical view
keyed on (subject, time). The composition is declarative: bring your
own sources; the universe is resolved when you build it.
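For intuition, here is what that composition amounts to in plain Polars (not etl-unit API; column names and values are made up): two sources with different physical column names are mapped onto the canonical names and stacked into one long frame keyed on (subject, time).

use polars::prelude::*;

// Plain-Polars illustration of the idea; etl-unit's BoundSource expresses
// the physical→canonical mapping declaratively instead of by hand like this.
fn compose() -> PolarsResult<DataFrame> {
    // Source A uses its own physical column names.
    let source_a = df!(
        "station" => &["s1", "s1"],
        "ts" => &[1_000_i64, 2_000],
        "sump" => &[0.4_f64, 0.7]
    )?
    .lazy()
    .select([
        col("station").alias("station_id"),   // physical -> canonical
        col("ts").alias("observation_time"),
        col("sump"),
    ]);

    // Source B already uses the canonical names.
    let source_b = df!(
        "station_id" => &["s2"],
        "observation_time" => &[1_500_i64],
        "sump" => &[0.1_f64]
    )?
    .lazy();

    // One logical view keyed on (subject, time).
    concat([source_a, source_b], UnionArgs::default())?.collect()
}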
What this is not
- Not a query planner / SQL replacement — it's a thin semantic layer on Polars. The heavy lifting (LazyFrame, parallel execution, columnar arithmetic) is all Polars.
- Not a storage layer — bring your own parquet/CSV/in-memory data.
- Not async — the core is sync. (Source acquisition layers above etl-unit typically are async.)
- Not a chart renderer — ChartHints is metadata for downstream renderers; the renderer itself is out of scope.
License
Dual-licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.