df-derive-core 0.3.0

Shared dataframe runtime trait identity for df-derive.
Documentation

df-derive

Crates.io Docs.rs CI Downloads License

df-derive derives fast conversions from Rust structs into Polars DataFrames. The normal user-facing crate now includes a default runtime trait surface, so most projects can write #[derive(ToDataFrame)] without a local trait module or #[df_derive(trait = "...")] override.

What This Crate Does

Deriving ToDataFrame on structs and tuple structs generates allocation-conscious code to:

  • Convert a single value to a polars::prelude::DataFrame
  • Convert slices through a columnar batch path
  • Inspect generated column names and DataTypes through T::schema()

The derive supports nested structs flattened with dot notation, nullable shapes with Option<T>, list shapes with Vec<T>, tuple structs, tuple-typed fields, generic structs, borrowed fields, smart pointers, datetime types, duration types, byte blobs, and decimal backends.

Quick Start

[dependencies]
df-derive = "0.3"
polars = "0.53"

# If your models use these types:
chrono = { version = "0.4", features = ["serde"] }
rust_decimal = { version = "1.42", default-features = false, features = ["std"] }

With the default df-derive facade, generated impls use hidden runtime re-exports for implementation details such as polars-arrow; downstream crates do not need to depend on polars-arrow directly. Keep polars direct when your code names Polars types. The default runtime enables the Polars dtype features required by the supported matrix below.

use df_derive::prelude::*;

#[derive(ToDataFrame)]
struct Trade {
    symbol: String,
    price: f64,
    size: u64,
}

fn main() -> polars::prelude::PolarsResult<()> {
    let rows = vec![
        Trade { symbol: "AAPL".into(), price: 187.23, size: 100 },
        Trade { symbol: "MSFT".into(), price: 411.61, size: 200 },
    ];

    let df = rows.as_slice().to_dataframe()?;
    println!("{df}");
    Ok(())
}

The default runtime API is available as df_derive::dataframe::*. The prelude exports the derive macro plus ToDataFrame, Columnar, ToDataFrameVec, and Decimal128Encode; it also exports the trait as ToDataFrameTrait for code that wants an unambiguous type-namespace alias.

Crate Layout

This repository uses a serde-like three-crate architecture:

  • df-derive: the normal facade crate. It re-exports the derive macro from df-derive-macros and the runtime API from df-derive-core.
  • df-derive-core: a normal library crate that owns the shared dataframe::{ToDataFrame, Columnar, ToDataFrameVec, Decimal128Encode} trait identity, the () impls, and the optional reference Decimal128Encode for rust_decimal::Decimal impl.
  • df-derive-macros: the proc-macro implementation. Power users can depend on this directly and target df-derive-core, paft, or a custom runtime.

Because df-derive-core owns the default trait identity, models derived in different crates can compose as nested ToDataFrame types when they use the facade/default runtime.

Generated API

For each struct or tuple struct T, the macro generates:

  • impl ToDataFrame for T
    • fn to_dataframe(&self) -> PolarsResult<DataFrame>
    • fn empty_dataframe() -> PolarsResult<DataFrame>
    • fn schema() -> PolarsResult<Vec<(String, DataType)>>
  • impl Columnar for T
    • fn columnar_to_dataframe(items: &[Self]) -> PolarsResult<DataFrame>
    • fn columnar_from_refs(items: &[&Self]) -> PolarsResult<DataFrame>

The direct &[Self] method is generated so top-level slice conversion does not allocate a temporary Vec<&Self>. The borrowed &[&Self] method remains for nested and generic composition.

Supported Types And Shapes

Container and wrapper support:

  • Named structs: each field becomes one or more columns.
  • Nested structs: fields flatten recursively with dot notation.
  • Vec of primitives and structs: Vec<T> becomes a Polars List column; Vec<Nested> becomes one list column per nested field.
  • Option<T>: scalar and list columns carry null validity.
  • Tuple structs: unnamed fields become field_0, field_1, and so on.
  • Tuple-typed fields: pair: (A, B) flattens to pair.field_0, pair.field_1; Option<(A, B)> and Vec<(A, B)> distribute the outer wrapper across the element columns.
  • Empty structs: an instance produces shape (1, 0) and an empty slice produces shape (0, 0).
  • Generics: generic structs are supported; the macro injects the necessary ToDataFrame + Columnar bounds, plus Decimal128Encode for generic parameters annotated with decimal(...).
  • Transparent pointers: Box<T>, Rc<T>, Arc<T>, borrowed references &T, and Cow<'_, T> with a sized inner peel transparently and preserve the bare field's column shape and dtype.

Common leaf types:

  • Primitives: String, &str, bool, signed and unsigned integer types including i128/u128 and isize/usize, std::num::NonZero* integer types, f32, and f64.
  • Time: chrono::DateTime<Tz> and chrono::NaiveDateTime encode as Datetime(Milliseconds, None) by default; use #[df_derive(time_unit = "ms" | "us" | "ns")] to override. DateTime<Tz> values are encoded as UTC instants, so use #[df_derive(as_string)] if the textual timezone or offset matters.
  • Date and time-of-day: chrono::NaiveDate encodes as Date, and chrono::NaiveTime encodes as Time. These encodings are fixed and do not accept time_unit.
  • Duration: std::time::Duration, core::time::Duration, and chrono::Duration encode as Duration(Nanoseconds) by default; use time_unit to choose milliseconds, microseconds, or nanoseconds. Bare Duration is rejected as ambiguous.
  • Decimal: bare Decimal and rust_decimal::Decimal encode as Decimal(38, 10) by default. Custom decimal backends opt in with #[df_derive(decimal(precision = N, scale = S))].
  • Binary blobs: #[df_derive(as_binary)] opts Vec<u8>, &[u8], or Cow<'_, [u8]> shapes into Polars Binary; unannotated Vec<u8> remains List(UInt8).

Dtype Support Matrix

The default df-derive facade and df-derive-core runtime enable the Polars features in this table on their polars dependency. If you use df-derive-macros with a custom runtime and no df-derive-core dependency, enable the matching features on that runtime's direct polars dependency.

Rust leaf family Polars dtype emitted Polars feature for custom runtimes
bool Boolean none
String, &str, as_str, as_string String none
i8, NonZeroI8 Int8 dtype-i8
i16, NonZeroI16 Int16 dtype-i16
i32, i64, isize, matching NonZero* Int32 / Int64 none
i128, NonZeroI128 Int128 dtype-i128
u8, NonZeroU8 UInt8 dtype-u8
u16, NonZeroU16 UInt16 dtype-u16
u32, u64, usize, matching NonZero* UInt32 / UInt64 none
u128, NonZeroU128 UInt128 dtype-u128
f32, f64 Float32 / Float64 none
chrono::DateTime<Tz>, chrono::NaiveDateTime Datetime dtype-datetime, plus timezones for timezone-aware values
chrono::NaiveDate Date dtype-date
chrono::NaiveTime Time dtype-time
std::time::Duration, core::time::Duration, chrono::Duration Duration dtype-duration
Decimal, rust_decimal::Decimal, custom decimal backends Decimal dtype-decimal
#[df_derive(as_binary)] byte buffers Binary none

Option<T>, Vec<T>, tuples, and nested structs preserve the leaf dtype; each Vec layer wraps the leaf in List(...).

For Polars 0.53, dtype-decimal enables the decimal column machinery and its internal Int128 backing path. You only need an explicit dtype-i128 feature when your derived structs expose i128 / NonZeroI128 fields as Int128 columns.

Useful field attributes:

  • #[df_derive(skip)]: omit a field from generated schema and DataFrame output.
  • #[df_derive(as_string)]: format values with Display into a string column using a reused scratch buffer.
  • #[df_derive(as_str)]: borrow via AsRef<str> without Display formatting or an intermediate scratch buffer.
  • #[df_derive(as_binary)]: encode byte-buffer shapes as Binary.
  • #[df_derive(decimal(precision = N, scale = S))]: choose a decimal dtype or opt a custom decimal backend into Decimal128Encode.
  • #[df_derive(time_unit = "ms" | "us" | "ns")]: choose datetime or duration units.

skip is useful for caches, source metadata, handles, or unsupported helper fields that should remain on the Rust struct but not become DataFrame columns. It is mutually exclusive with conversion attributes because skipped fields are not analyzed or emitted. Tuple struct fields can be skipped too; remaining tuple columns keep their original field_{index} names.

as_string is useful for enums or validated newtypes that should appear as string columns. It formats each value into a reusable String scratch buffer before pushing the resulting &str into the column builder; the builder still copies bytes into the output column, and the scratch can grow to fit the largest formatted value. If a field already implements AsRef<str>, prefer as_str: it borrows through the same columnar buffer used for bare String/&str fields and skips both Display formatting and the scratch buffer. The two attributes are mutually exclusive.

as_binary accepts Vec<u8>, Option<Vec<u8>>, Vec<Vec<u8>>, Vec<Option<Vec<u8>>>, Option<Vec<Vec<u8>>>, and the same shapes over &[u8] and Cow<'_, [u8]>. Bare u8, Option<u8>, Vec<Option<u8>>, non-u8 leaves, and String are rejected. The binary attribute is mutually exclusive with as_str, as_string, decimal(...), and time_unit.

Enums and unions are not supported as derive targets; use as_string or as_str on enum fields. Direct fields of type () are rejected, but () is supported as a generic payload and contributes zero columns.

Tuple fields cannot carry field-level conversion attributes such as as_str, as_binary, decimal(...), or time_unit; hoist that value into a named struct when you need an attributed field. Nested tuples inside an outer Option or Vec are rejected for now; use a named struct for those shapes.

Column Naming

  • Named struct fields use the Rust field name, such as symbol.
  • Nested structs use dot notation recursively, such as address.city.
  • Vec<Nested> fields use the outer field plus nested field name, such as quotes.close.
  • Tuple-typed fields use field.field_0, field.field_1, and recurse for unwrapped nested tuples.
  • Tuple structs use field_0, field_1, and so on.

Limitations And Guidance

  • Maps such as HashMap<_, _> and BTreeMap<_, _> are not supported; use Vec<(K, V)> or a named row struct when you need a tabular representation.
  • Sets such as HashSet<_> and BTreeSet<_> are not supported; use Vec<T> when you need a list representation.
  • Sequence collections such as VecDeque<T> and LinkedList<T> are not supported; use Vec<T> instead.
  • All nested custom structs must also derive ToDataFrame.
  • Obvious direct self-recursive nested fields using Self, the bare deriving type name, self::Type, or crate::Type are rejected after transparent wrapper peeling, including shapes such as Node, Box<Node>, Option<Box<Node>>, and tuple fields containing the same. Use identifier fields or a separate flat representation for recursive data structures.
  • Consecutive Option layers above a Vec collapse to one list-level validity bit, so None and Some(None) are indistinguishable in the resulting list column.
  • Borrowed byte slices and Cow<'_, [u8]> require #[df_derive(as_binary)]; other borrowed slice forms are rejected. Use Vec<T> for list columns.

Runtime Discovery And Overrides

Explicit container attributes always win:

#[derive(df_derive::ToDataFrame)]
#[df_derive(
    trait = "my_runtime::dataframe::ToDataFrame",
    columnar = "my_runtime::dataframe::Columnar",
    decimal128_encode = "my_runtime::dataframe::Decimal128Encode",
)]
struct Row {
    amount: MyDecimal,
}

If only trait = "x::ToDataFrame" is provided, the macro infers x::Columnar and x::Decimal128Encode unless those paths are explicitly overridden.

Explicit paths to the built-in facade/core runtimes, df_derive::dataframe::ToDataFrame or df_derive_core::dataframe::ToDataFrame (including dependency renames), still use the default-runtime dependency roots from that same dataframe module's hidden __private re-exports. They do not require a direct polars-arrow dependency just because the trait path was written explicitly.

columnar = "..." must be paired with trait = "..."; a standalone Columnar override would create mixed runtime impls that are incompatible with both runtimes' ToDataFrameVec extension traits. Explicit trait + columnar pairs also cannot mix the built-in df_derive/df_derive_core dataframe runtime with a custom runtime. Use the matching built-in Columnar path, omit columnar so it is inferred from the built-in trait, or provide a fully custom pair.

Without overrides, the macro discovers a dataframe module in this order:

  1. df_derive::dataframe
  2. df_derive_core::dataframe
  3. paft_utils::dataframe
  4. paft::dataframe
  5. crate::core::dataframe

Discovery uses proc_macro_crate::crate_name, so dependency renames are respected. For example, a dependency declared as dfd = { package = "df-derive", version = "0.3" } is emitted as ::dfd::dataframe.

The final crate::core::dataframe fallback is for legacy/local runtimes in crates that use df-derive-macros directly without df-derive, df-derive-core, paft-utils, or paft. Any runtime reached by this default discovery path must expose dataframe::__private::{polars, polars_arrow} for generated-code dependency roots.

Power-User Runtime Choices

Use the facade for the default runtime:

use df_derive::prelude::*;

#[derive(ToDataFrame)]
struct Row {
    id: u32,
}

Use the macro crate directly with the shared core runtime:

[dependencies]
df-derive-core = "0.3"
df-derive-macros = "0.3"
polars = "0.53"
use df_derive_core::dataframe::{ToDataFrame as _, ToDataFrameVec as _};
use df_derive_macros::ToDataFrame;

#[derive(ToDataFrame)]
struct Row {
    id: u32,
}

Use a custom runtime by providing compatible traits and overriding paths. Outside the built-in facade/core paths described above, custom runtimes selected with #[df_derive(trait = "...")] must name a compatible direct polars dependency. They also need a compatible direct polars-arrow dependency when the derived fields use shapes that require generated Arrow array builders, such as list, nullable primitive, string, or binary columns. Scalar-only numeric/bool derives do not need polars-arrow. The minimum trait surface is:

mod runtime {
    pub mod dataframe {
        use polars::prelude::{DataFrame, DataType, PolarsResult};

        pub trait ToDataFrame {
            fn to_dataframe(&self) -> PolarsResult<DataFrame>;
            fn empty_dataframe() -> PolarsResult<DataFrame>;
            fn schema() -> PolarsResult<Vec<(String, DataType)>>;
        }

        pub trait Columnar: Sized {
            fn columnar_to_dataframe(items: &[Self]) -> PolarsResult<DataFrame> {
                let refs: Vec<&Self> = items.iter().collect();
                Self::columnar_from_refs(&refs)
            }

            fn columnar_from_refs(items: &[&Self]) -> PolarsResult<DataFrame>;
        }

        pub trait Decimal128Encode {
            fn try_to_i128_mantissa(&self, target_scale: u32) -> Option<i128>;
        }
    }
}

Decimal Backends

df-derive-core provides Decimal128Encode for rust_decimal::Decimal behind the rust_decimal feature, which is enabled by default on both df-derive and df-derive-core.

To disable it:

df-derive = { version = "0.3", default-features = false }

Custom decimal backends should implement Decimal128Encode and use #[df_derive(decimal(precision = N, scale = S))] on fields that should be encoded as Polars decimal columns. Implementations must return an i128 mantissa rescaled to the requested scale, using round-half-to-even when scaling down. Returning None surfaces as a Polars compute error. The generated code verifies that the returned mantissa fits the declared precision before constructing the Polars decimal column.

Unannotated decimal detection is syntax-based. A procedural macro receives tokens, not rustc's resolved type information, so bare Decimal and canonical rust_decimal::Decimal are treated as decimals automatically. Qualified paths such as domain::Decimal are treated as nested custom structs unless you opt them into decimal encoding with decimal(...).

Temporal detection is syntax-based for the same reason. Bare or canonical chrono::NaiveDate, chrono::NaiveTime, chrono::NaiveDateTime, chrono::DateTime<Tz>, chrono::Duration, and chrono::TimeDelta are treated as temporal types, along with std::time::Duration and core::time::Duration. Qualified domain paths such as domain::NaiveDate remain custom structs.

If your decimal trait lives somewhere other than the discovered runtime module, point at it explicitly:

#[derive(df_derive::ToDataFrame)]
#[df_derive(
    trait = "my_runtime::dataframe::ToDataFrame",
    decimal128_encode = "my_runtime::decimal_backend::Decimal128Encode",
)]
struct Tx {
    #[df_derive(decimal(precision = 38, scale = 10))]
    amount: MyDecimal,
}

Compatibility

  • Rust edition: 2024
  • Minimum supported Rust version: 1.90. This is above the edition's 1.85 floor because the Polars 0.53 dependency graph uses language features that first compile on Rust 1.90.
  • Polars: 0.53
  • polars-arrow: 0.53 through the default runtime facade. Custom runtimes selected with explicit trait overrides need a compatible direct dependency only for derived field shapes that emit public Arrow array builders; explicit facade/core runtime paths keep using the hidden default-runtime re-export.
  • Polars feature flags: the default df-derive facade and df-derive-core runtime enable every Polars dtype flag required by the support matrix above. If you use df-derive-macros with a custom runtime and no df-derive-core dependency, enable the matching Polars feature flags on that runtime's polars dependency.

Performance Notes

Using df_derive::dataframe::Columnar instead of paft::dataframe::Columnar has no inherent runtime performance penalty. The macro generates the hot column-building code at the impl site either way; the runtime path only selects which trait receives the impl.

The generated columnar_to_dataframe(&[Self]) path avoids the old top-level Vec<&Self> allocation. Nested and generic emitters still use columnar_from_refs(&[&Self]) so borrowed composition remains clone-free.

The generated hot path is shape-dependent. Primitive scalar fields are populated in one row loop. Nested fields collect references and call the nested type's columnar implementation, so each nested field may add a scan over the outer items. Tuple-typed fields are emitted per projection path, so tuple elements may each add their own scan; Vec-bearing tuple projections also scan the outer items to build offsets, validity, and leaf buffers. This cost model matters most for wide nested schemas and tuple-heavy shapes.

Criterion benches in df-derive/benches/ cover wide rows, nested structs, deep Vec shapes, decimals, strings, borrowed data, tuple fields, and targeted tuple-heavy / nested-heavy cost-model shapes.

Performance is continuously monitored with Bencher.

Examples

Run any example with:

cargo run -p df-derive --example quickstart
cargo run -p df-derive --example <example_name>

Available examples:

  • quickstart: basic usage with single values and slices.
  • nested: nested structs flattened with dot notation.
  • vec_custom: Vec<T> fields and custom nested structs as list columns.
  • tuple: tuple structs and field_0/field_1 naming.
  • datetime_decimal: chrono datetime values and rust_decimal::Decimal.
  • as_string: #[df_derive(as_string)] for enums and custom values.
  • generics: generic structs, default type parameters, and () payloads.
  • nested_options: nested optional structs.
  • deep_vec: deep Vec<Vec<Vec<T>>> list nesting.
  • multi_option_vec: multiple Option layers above a Vec.
  • nested_generics: generic structs used as nested fields and list items.

License

MIT. See LICENSE.