df-derive
df-derive derives fast conversions from Rust structs into Polars
DataFrames. The normal user-facing crate now includes a default runtime
trait surface, so most projects can write #[derive(ToDataFrame)] without a
local trait module or #[df_derive(trait = "...")] override.
What This Crate Does
Deriving ToDataFrame on structs and tuple structs generates
allocation-conscious code to:
- Convert a single value to a polars::prelude::DataFrame
- Convert slices through a columnar batch path
- Inspect generated column names and DataTypes through T::schema()
The derive supports nested structs flattened with dot notation, nullable
shapes with Option<T>, list shapes with Vec<T>, tuple structs,
tuple-typed fields, generic structs, borrowed fields, smart pointers, datetime
types, duration types, byte blobs, and decimal backends.
Quick Start
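    [dependencies]
    df-derive = "0.3"
    polars = "0.53"

    # If your models use these types:
    chrono = { version = "0.4", features = ["serde"] }
    rust_decimal = { version = "1.42", default-features = false, features = ["std"] }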
With the default df-derive facade, generated impls use hidden runtime
re-exports for implementation details such as polars-arrow; downstream
crates do not need to depend on polars-arrow directly. Keep polars as a direct
dependency when your code names Polars types. The default runtime enables the Polars
dtype features required by the supported matrix below.
    use df_derive::prelude::*;
The default runtime API is available as df_derive::dataframe::*. The prelude
exports the derive macro plus ToDataFrame, Columnar, ToDataFrameVec, and
Decimal128Encode; it also exports the trait as ToDataFrameTrait for code
that wants an unambiguous type-namespace alias.
Crate Layout
This repository uses a serde-like three-crate architecture:
- df-derive: the normal facade crate. It re-exports the derive macro from df-derive-macros and the runtime API from df-derive-core.
- df-derive-core: a normal library crate that owns the shared dataframe::{ToDataFrame, Columnar, ToDataFrameVec, Decimal128Encode} trait identity, the () impls, and the optional reference Decimal128Encode for rust_decimal::Decimal impl.
- df-derive-macros: the proc-macro implementation. Power users can depend on this directly and target df-derive-core, paft, or a custom runtime.
Because df-derive-core owns the default trait identity, models derived in
different crates can compose as nested ToDataFrame types when they use the
facade/default runtime.
Generated API
For each struct or tuple struct T, the macro generates:
- impl ToDataFrame for T
  - fn to_dataframe(&self) -> PolarsResult<DataFrame>
  - fn empty_dataframe() -> PolarsResult<DataFrame>
  - fn schema() -> PolarsResult<Vec<(String, DataType)>>
- impl Columnar for T
  - fn columnar_to_dataframe(items: &[Self]) -> PolarsResult<DataFrame>
  - fn columnar_from_refs(items: &[&Self]) -> PolarsResult<DataFrame>
The direct &[Self] method is generated so top-level slice conversion does
not allocate a temporary Vec<&Self>. The borrowed &[&Self] method remains
for nested and generic composition.
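
The split between the two entry points can be illustrated with a minimal std-only sketch. Quote, QuoteColumns, build, columns_from_slice, and columns_from_refs are hypothetical names, and plain Vec buffers stand in for the Polars column builders; this is not the code the macro emits.

```rust
/// Hypothetical row type used only for this illustration.
pub struct Quote {
    pub symbol: String,
    pub close: f64,
}

/// Plain `Vec` buffers standing in for Polars column builders.
#[derive(Debug, PartialEq)]
pub struct QuoteColumns<'a> {
    pub symbol: Vec<&'a str>,
    pub close: Vec<f64>,
}

/// Shared column-building loop: one pass over the rows fills every buffer.
fn build<'a>(rows: impl ExactSizeIterator<Item = &'a Quote>) -> QuoteColumns<'a> {
    let mut symbol = Vec::with_capacity(rows.len());
    let mut close = Vec::with_capacity(rows.len());
    for row in rows {
        symbol.push(row.symbol.as_str());
        close.push(row.close);
    }
    QuoteColumns { symbol, close }
}

/// Analogue of `columnar_to_dataframe(&[Self])`: iterates the slice directly,
/// so no temporary `Vec<&Quote>` is allocated.
pub fn columns_from_slice(items: &[Quote]) -> QuoteColumns<'_> {
    build(items.iter())
}

/// Analogue of `columnar_from_refs(&[&Self])`: accepts references that a
/// parent collected from its nested fields, without cloning any row.
pub fn columns_from_refs<'a>(items: &[&'a Quote]) -> QuoteColumns<'a> {
    build(items.iter().copied())
}
```

Both functions produce the same columns; the slice version simply skips the intermediate vector of references that a top-level caller would otherwise have to build.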
Supported Types And Shapes
Container and wrapper support:
- Named structs: each field becomes one or more columns.
- Nested structs: fields flatten recursively with dot notation.
- Vec of primitives and structs: Vec<T> becomes a Polars List column; Vec<Nested> becomes one list column per nested field.
- Option<T>: scalar and list columns carry null validity.
- Tuple structs: unnamed fields become field_0, field_1, and so on.
- Tuple-typed fields: pair: (A, B) flattens to pair.field_0, pair.field_1; Option<(A, B)> and Vec<(A, B)> distribute the outer wrapper across the element columns.
- Empty structs: an instance produces shape (1, 0) and an empty slice produces shape (0, 0).
- Generics: generic structs are supported; the macro injects the necessary ToDataFrame + Columnar bounds, plus Decimal128Encode for generic parameters annotated with decimal(...).
- Transparent pointers: Box<T>, Rc<T>, Arc<T>, borrowed references &T, and Cow<'_, T> with a sized inner peel transparently and preserve the bare field's column shape and dtype.
Common leaf types:
- Primitives: String, &str, bool, signed and unsigned integer types including i128/u128 and isize/usize, std::num::NonZero* integer types, f32, and f64.
- Time: chrono::DateTime<Tz> and chrono::NaiveDateTime encode as Datetime(Milliseconds, None) by default; use #[df_derive(time_unit = "ms" | "us" | "ns")] to override. DateTime<Tz> values are encoded as UTC instants, so use #[df_derive(as_string)] if the textual timezone or offset matters.
- Date and time-of-day: chrono::NaiveDate encodes as Date, and chrono::NaiveTime encodes as Time. These encodings are fixed and do not accept time_unit.
- Duration: std::time::Duration, core::time::Duration, and chrono::Duration encode as Duration(Nanoseconds) by default; use time_unit to choose milliseconds, microseconds, or nanoseconds. Bare Duration is rejected as ambiguous.
- Decimal: bare Decimal and rust_decimal::Decimal encode as Decimal(38, 10) by default. Custom decimal backends opt in with #[df_derive(decimal(precision = N, scale = S))].
- Binary blobs: #[df_derive(as_binary)] opts Vec<u8>, &[u8], or Cow<'_, [u8]> shapes into Polars Binary; unannotated Vec<u8> remains List(UInt8).
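
As a rough illustration of what the time_unit choice means for a duration column, the sketch below turns a std::time::Duration into a whole count of the selected unit. TimeUnit and duration_count are hypothetical names, not the crate's API. The sketch truncates sub-unit remainders the way the std accessors do and returns None when the count does not fit an i64; the crate's own rounding and overflow behaviour is not specified here.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the three units accepted by `time_unit`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TimeUnit {
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

impl TimeUnit {
    /// Parse the attribute spelling: "ms", "us", or "ns".
    pub fn parse(text: &str) -> Option<Self> {
        match text {
            "ms" => Some(TimeUnit::Milliseconds),
            "us" => Some(TimeUnit::Microseconds),
            "ns" => Some(TimeUnit::Nanoseconds),
            _ => None,
        }
    }
}

/// Express a duration as a whole count of `unit`.
/// Assumption of this sketch: remainders are truncated and counts that do
/// not fit a signed 64-bit integer yield `None`.
pub fn duration_count(value: Duration, unit: TimeUnit) -> Option<i64> {
    let raw: u128 = match unit {
        TimeUnit::Milliseconds => value.as_millis(),
        TimeUnit::Microseconds => value.as_micros(),
        TimeUnit::Nanoseconds => value.as_nanos(),
    };
    if raw <= i64::MAX as u128 {
        Some(raw as i64)
    } else {
        None
    }
}
```

The same value therefore yields different integers per unit, which is why the unit is part of the column dtype rather than a display detail.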
Dtype Support Matrix
The default df-derive facade and df-derive-core runtime enable the Polars
features in this table on their polars dependency. If you use
df-derive-macros with a custom runtime and no df-derive-core dependency,
enable the matching features on that runtime's direct polars dependency.
| Rust leaf family | Polars dtype emitted | Polars feature for custom runtimes |
|---|---|---|
| bool | Boolean | none |
| String, &str, as_str, as_string | String | none |
| i8, NonZeroI8 | Int8 | dtype-i8 |
| i16, NonZeroI16 | Int16 | dtype-i16 |
| i32, i64, isize, matching NonZero* | Int32 / Int64 | none |
| i128, NonZeroI128 | Int128 | dtype-i128 |
| u8, NonZeroU8 | UInt8 | dtype-u8 |
| u16, NonZeroU16 | UInt16 | dtype-u16 |
| u32, u64, usize, matching NonZero* | UInt32 / UInt64 | none |
| u128, NonZeroU128 | UInt128 | dtype-u128 |
| f32, f64 | Float32 / Float64 | none |
| chrono::DateTime<Tz>, chrono::NaiveDateTime | Datetime | dtype-datetime, plus timezones for timezone-aware values |
| chrono::NaiveDate | Date | dtype-date |
| chrono::NaiveTime | Time | dtype-time |
| std::time::Duration, core::time::Duration, chrono::Duration | Duration | dtype-duration |
| Decimal, rust_decimal::Decimal, custom decimal backends | Decimal | dtype-decimal |
| #[df_derive(as_binary)] byte buffers | Binary | none |
Option<T>, Vec<T>, tuples, and nested structs preserve the leaf dtype;
each Vec layer wraps the leaf in List(...).
For Polars 0.53, dtype-decimal enables the decimal column machinery and its
internal Int128 backing path. You only need an explicit dtype-i128 feature
when your derived structs expose i128 / NonZeroI128 fields as Int128
columns.
Useful field attributes:
- #[df_derive(skip)]: omit a field from generated schema and DataFrame output.
- #[df_derive(as_string)]: format values with Display into a string column using a reused scratch buffer.
- #[df_derive(as_str)]: borrow via AsRef<str> without Display formatting or an intermediate scratch buffer.
- #[df_derive(as_binary)]: encode byte-buffer shapes as Binary.
- #[df_derive(decimal(precision = N, scale = S))]: choose a decimal dtype or opt a custom decimal backend into Decimal128Encode.
- #[df_derive(time_unit = "ms" | "us" | "ns")]: choose datetime or duration units.
skip is useful for caches, source metadata, handles, or unsupported helper
fields that should remain on the Rust struct but not become DataFrame columns.
It is mutually exclusive with conversion attributes because skipped fields are
not analyzed or emitted. Tuple struct fields can be skipped too; remaining
tuple columns keep their original field_{index} names.
as_string is useful for enums or validated newtypes that should appear as
string columns. It formats each value into a reusable String scratch buffer
before pushing the resulting &str into the column builder; the builder still
copies bytes into the output column, and the scratch can grow to fit the
largest formatted value. If a field already implements AsRef<str>, prefer
as_str: it borrows through the same columnar buffer used for bare
String/&str fields and skips both Display formatting and the scratch
buffer. The two attributes are mutually exclusive.
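
The difference between the two attributes can be sketched with std alone. Side, display_column, and borrowed_column are hypothetical names; a Vec<String> and a Vec<&str> stand in for the Polars string builder, so this illustrates the described strategy and is not the generated code.

```rust
use std::fmt::{self, Display, Write};

/// Hypothetical enum of the kind `as_string` is meant for.
pub enum Side {
    Buy,
    Sell,
}

impl Display for Side {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Side::Buy => "buy",
            Side::Sell => "sell",
        })
    }
}

/// `as_string`-style path: format each value into one reused scratch buffer,
/// then copy the resulting `&str` into the column.
pub fn display_column<T: Display>(values: &[T]) -> Vec<String> {
    let mut scratch = String::new();
    let mut column = Vec::with_capacity(values.len());
    for value in values {
        // `clear` keeps the capacity, so the buffer only ever grows to fit
        // the largest formatted value.
        scratch.clear();
        write!(scratch, "{}", value).expect("Display impl returned an error");
        // Stand-in for the builder copying the bytes into the output column.
        column.push(scratch.as_str().to_owned());
    }
    column
}

/// `as_str`-style path: borrow through `AsRef<str>`; no formatting and no
/// scratch buffer are involved.
pub fn borrowed_column<T: AsRef<str>>(values: &[T]) -> Vec<&str> {
    values.iter().map(|value| value.as_ref()).collect()
}
```

The first function pays for a Display call per row; the second only hands out borrowed string slices, which is why as_str is the better fit when AsRef<str> is already implemented.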
as_binary accepts Vec<u8>, Option<Vec<u8>>, Vec<Vec<u8>>,
Vec<Option<Vec<u8>>>, Option<Vec<Vec<u8>>>, and the same shapes over
&[u8] and Cow<'_, [u8]>. Bare u8, Option<u8>,
Vec<Option<u8>>, non-u8 leaves, and String are rejected. The binary
attribute is mutually exclusive with as_str, as_string, decimal(...),
and time_unit.
Enums and unions are not supported as derive targets; use as_string or
as_str on enum fields. Direct fields of type () are rejected, but () is
supported as a generic payload and contributes zero columns.
Tuple fields cannot carry field-level conversion attributes such as as_str,
as_binary, decimal(...), or time_unit; hoist that value into a named
struct when you need an attributed field. Nested tuples inside an outer
Option or Vec are rejected for now; use a named struct for those shapes.
Column Naming
- Named struct fields use the Rust field name, such as symbol.
- Nested structs use dot notation recursively, such as address.city.
- Vec<Nested> fields use the outer field plus nested field name, such as quotes.close.
- Tuple-typed fields use field.field_0, field.field_1, and recurse for unwrapped nested tuples.
- Tuple structs use field_0, field_1, and so on.
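
The naming rules above can be expressed as a small std-only sketch. The helper names column_name, tuple_field, and tuple_columns are hypothetical and exist only to make the rules concrete; the macro computes names at compile time in its own way.

```rust
/// Join path segments into a flattened column name using dot notation.
pub fn column_name(path: &[&str]) -> String {
    path.join(".")
}

/// Name of the unnamed element at `index` in a tuple struct or tuple field.
pub fn tuple_field(index: usize) -> String {
    format!("field_{}", index)
}

/// Column names for a tuple-typed field `field` with `arity` elements.
pub fn tuple_columns(field: &str, arity: usize) -> Vec<String> {
    (0..arity)
        .map(|index| {
            let leaf = tuple_field(index);
            column_name(&[field, leaf.as_str()])
        })
        .collect()
}
```

For example, a field named pair of type (A, B) maps to pair.field_0 and pair.field_1, and a nested struct field city under address maps to address.city.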
Limitations And Guidance
- Maps such as HashMap<_, _> and BTreeMap<_, _> are not supported; use Vec<(K, V)> or a named row struct when you need a tabular representation.
- Sets such as HashSet<_> and BTreeSet<_> are not supported; use Vec<T> when you need a list representation.
- Sequence collections such as VecDeque<T> and LinkedList<T> are not supported; use Vec<T> instead.
- All nested custom structs must also derive ToDataFrame.
- Obvious direct self-recursive nested fields using Self, the bare deriving type name, self::Type, or crate::Type are rejected after transparent wrapper peeling, including shapes such as Node, Box<Node>, Option<Box<Node>>, and tuple fields containing the same. Use identifier fields or a separate flat representation for recursive data structures.
- Consecutive Option layers above a Vec collapse to one list-level validity bit, so None and Some(None) are indistinguishable in the resulting list column.
- Borrowed byte slices and Cow<'_, [u8]> require #[df_derive(as_binary)]; other borrowed slice forms are rejected. Use Vec<T> for list columns.
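
The Option-collapse limitation is easiest to see in code. The sketch below is a std-only illustration with a hypothetical helper, list_slot, that maps a doubly optional list to the single nullable slot a list column can hold.

```rust
/// Reduce `Option<Option<Vec<T>>>` to the one nullable slot a list column has.
/// `None` in the result plays the role of a cleared list-level validity bit.
pub fn list_slot<T>(value: &Option<Option<Vec<T>>>) -> Option<&[T]> {
    match value {
        Some(Some(items)) => Some(items.as_slice()),
        // Both missing layers land on the same null entry, so the column
        // cannot tell them apart afterwards.
        Some(None) | None => None,
    }
}
```

An empty list is still distinct from a null entry: Some(Some(vec![])) keeps its validity bit set and stores a zero-length list.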
Runtime Discovery And Overrides
Explicit container attributes always win:
If only trait = "x::ToDataFrame" is provided, the macro infers
x::Columnar and x::Decimal128Encode unless those paths are explicitly
overridden.
Explicit paths to the built-in facade/core runtimes,
df_derive::dataframe::ToDataFrame or
df_derive_core::dataframe::ToDataFrame (including dependency renames), still
use the default-runtime dependency roots from that same dataframe module's
hidden __private re-exports. They do not require a direct polars-arrow
dependency just because the trait path was written explicitly.
columnar = "..." must be paired with trait = "..."; a standalone
Columnar override would create mixed runtime impls that are incompatible
with both runtimes' ToDataFrameVec extension traits.
Explicit trait + columnar pairs also cannot mix the built-in
df_derive/df_derive_core dataframe runtime with a custom runtime. Use the
matching built-in Columnar path, omit columnar so it is inferred from the
built-in trait, or provide a fully custom pair.
Without overrides, the macro discovers a dataframe module in this order:
1. df_derive::dataframe
2. df_derive_core::dataframe
3. paft_utils::dataframe
4. paft::dataframe
5. crate::core::dataframe
Discovery uses proc_macro_crate::crate_name, so dependency renames are
respected. For example, a dependency declared as
dfd = { package = "df-derive", version = "0.3" } is emitted as
::dfd::dataframe.
The final crate::core::dataframe fallback is for legacy/local runtimes in
crates that use df-derive-macros directly without df-derive,
df-derive-core, paft-utils, or paft. Any runtime reached by this default
discovery path must expose dataframe::__private::{polars, polars_arrow} for
generated-code dependency roots.
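
The discovery order and rename handling can be approximated by a first-match lookup. This is a simplified std-only sketch: discover is a hypothetical function, and the deps slice stands in for what proc_macro_crate::crate_name would report for each package (package name paired with the name it is imported under). The real macro works on the crate manifest, not on a slice.

```rust
/// Runtime packages probed in priority order.
const CANDIDATES: [&str; 4] = ["df-derive", "df-derive-core", "paft-utils", "paft"];

/// Pick the dataframe module path that generated code would target.
/// `deps` pairs each package name with the crate name it is imported under,
/// so a dependency rename changes the emitted path.
pub fn discover(deps: &[(&str, &str)]) -> String {
    for package in CANDIDATES.iter() {
        if let Some((_, local)) = deps.iter().find(|(name, _)| name == package) {
            return format!("::{}::dataframe", local);
        }
    }
    // No known runtime crate found: fall back to a local module.
    "crate::core::dataframe".to_string()
}
```

With a renamed facade, discover(&[("df-derive", "dfd")]) returns ::dfd::dataframe, matching the rename example above; with no runtime dependency at all it returns the crate::core::dataframe fallback.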
Power-User Runtime Choices
Use the facade for the default runtime:
    use df_derive::prelude::*;
Use the macro crate directly with the shared core runtime:
    [dependencies]
    df-derive-macros = "0.3"
    df-derive-core = "0.3"
    polars = "0.53"
use ;
use ToDataFrame;
Use a custom runtime by providing compatible traits and overriding paths.
Outside the built-in facade/core paths described above, custom runtimes
selected with #[df_derive(trait = "...")] must name a compatible direct
polars dependency. They also need a compatible direct polars-arrow
dependency when the derived fields use shapes that require generated Arrow
array builders, such as list, nullable primitive, string, or binary columns.
Scalar-only numeric/bool derives do not need polars-arrow. The minimum trait
surface is:
Decimal Backends
df-derive-core provides Decimal128Encode for rust_decimal::Decimal behind
the rust_decimal feature, which is enabled by default on both df-derive
and df-derive-core.
To disable it:
    df-derive = { version = "0.3", default-features = false }
Custom decimal backends should implement Decimal128Encode and use
#[df_derive(decimal(precision = N, scale = S))] on fields that should be
encoded as Polars decimal columns. Implementations must return an i128
mantissa rescaled to the requested scale, using round-half-to-even when
scaling down. Returning None surfaces as a Polars compute error. The
generated code verifies that the returned mantissa fits the declared precision
before constructing the Polars decimal column.
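
For authors of custom backends, the rescaling contract can be sketched on raw i128 mantissas. The functions rescale_half_even and fits_precision are hypothetical helpers written for this illustration; they are not part of df-derive-core, and the exact Decimal128Encode method signature is not reproduced here.

```rust
/// Rescale `mantissa` from `from_scale` to `to_scale` decimal places.
/// Rounds half to even when digits are dropped; returns `None` on overflow.
pub fn rescale_half_even(mantissa: i128, from_scale: u32, to_scale: u32) -> Option<i128> {
    if to_scale >= from_scale {
        // Scaling up is exact: multiply by 10^(to_scale - from_scale).
        let factor = 10i128.checked_pow(to_scale - from_scale)?;
        return mantissa.checked_mul(factor);
    }
    // Scaling down: divide by 10^(from_scale - to_scale), then round.
    let divisor = 10i128.checked_pow(from_scale - to_scale)?;
    let quotient = mantissa / divisor; // truncates toward zero
    let remainder = mantissa % divisor; // carries the sign of `mantissa`
    // Compare twice the dropped magnitude with the divisor to find ties.
    // The magnitude is below 10^38, so doubling cannot overflow a u128.
    let twice = remainder.unsigned_abs() * 2;
    let divisor_abs = divisor.unsigned_abs();
    let round_away =
        twice > divisor_abs || (twice == divisor_abs && quotient % 2 != 0);
    if round_away {
        // Step one unit away from zero, following the original sign.
        quotient.checked_add(if mantissa < 0 { -1 } else { 1 })
    } else {
        Some(quotient)
    }
}

/// Check that `mantissa` has at most `precision` decimal digits.
pub fn fits_precision(mantissa: i128, precision: u32) -> bool {
    match 10u128.checked_pow(precision) {
        Some(limit) => mantissa.unsigned_abs() < limit,
        // 10^precision exceeds u128, so every i128 fits.
        None => true,
    }
}
```

Under these assumptions, 1.25 at scale 2 becomes 1.2 at scale 1 while 1.35 becomes 1.4, and a mantissa of 100000 is rejected for precision 5.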
Unannotated decimal detection is syntax-based. A procedural macro receives
tokens, not rustc's resolved type information, so bare Decimal and canonical
rust_decimal::Decimal are treated as decimals automatically. Qualified paths
such as domain::Decimal are treated as nested custom structs unless you opt
them into decimal encoding with decimal(...).
Temporal detection is syntax-based for the same reason. Bare or canonical
chrono::NaiveDate, chrono::NaiveTime, chrono::NaiveDateTime,
chrono::DateTime<Tz>, chrono::Duration, and chrono::TimeDelta are treated
as temporal types, along with std::time::Duration and
core::time::Duration. Qualified domain paths such as domain::NaiveDate
remain custom structs.
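
A simplified picture of syntax-based detection is a match on the written type path. The sketch below is std-only and works on strings, whereas a real proc macro inspects parsed tokens; LeafKind and classify are hypothetical names, and the bare Duration case follows the leaf-type list, which states that it is rejected as ambiguous.

```rust
/// Hypothetical classification of a field type by its written path.
#[derive(Debug, PartialEq)]
pub enum LeafKind {
    Decimal,
    Temporal,
    AmbiguousDuration,
    CustomStruct,
}

/// Classify a type from its spelling alone, without name resolution.
pub fn classify(written_type: &str) -> LeafKind {
    // Drop generic arguments such as the `<Tz>` in `chrono::DateTime<Tz>`.
    let path = written_type.split('<').next().unwrap_or(written_type).trim();
    match path {
        "Decimal" | "rust_decimal::Decimal" => LeafKind::Decimal,
        "chrono::NaiveDate" | "chrono::NaiveTime" | "chrono::NaiveDateTime"
        | "chrono::DateTime" | "chrono::Duration" | "chrono::TimeDelta"
        | "std::time::Duration" | "core::time::Duration" => LeafKind::Temporal,
        "NaiveDate" | "NaiveTime" | "NaiveDateTime" | "DateTime" | "TimeDelta" => {
            LeafKind::Temporal
        }
        // Could be std, core, or chrono: the spelling alone cannot decide.
        "Duration" => LeafKind::AmbiguousDuration,
        // Anything else, including `domain::Decimal`, is a nested struct.
        _ => LeafKind::CustomStruct,
    }
}
```

Because only the spelling is visible, domain::Decimal falls through to the custom-struct case unless the field carries decimal(...).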
If your decimal trait lives somewhere other than the discovered runtime module, point at it explicitly:
Compatibility
- Rust edition: 2024
- Minimum supported Rust version: 1.90. This is above the edition's 1.85 floor because the Polars 0.53 dependency graph uses language features that first compile on Rust 1.90.
- Polars: 0.53
- polars-arrow: 0.53 through the default runtime facade. Custom runtimes selected with explicit trait overrides need a compatible direct dependency only for derived field shapes that emit public Arrow array builders; explicit facade/core runtime paths keep using the hidden default-runtime re-export.
- Polars feature flags: the default df-derive facade and df-derive-core runtime enable every Polars dtype flag required by the support matrix above. If you use df-derive-macros with a custom runtime and no df-derive-core dependency, enable the matching Polars feature flags on that runtime's polars dependency.
Performance Notes
Using df_derive::dataframe::Columnar instead of paft::dataframe::Columnar
has no inherent runtime performance penalty. The macro generates the hot
column-building code at the impl site either way; the runtime path only
selects which trait receives the impl.
The generated columnar_to_dataframe(&[Self]) path avoids the old top-level
Vec<&Self> allocation. Nested and generic emitters still use
columnar_from_refs(&[&Self]) so borrowed composition remains clone-free.
The generated hot path is shape-dependent. Primitive scalar fields are populated in one row loop. Nested fields collect references and call the nested type's columnar implementation, so each nested field may add a scan over the outer items. Tuple-typed fields are emitted per projection path, so tuple elements may each add their own scan; Vec-bearing tuple projections also scan the outer items to build offsets, validity, and leaf buffers. This cost model matters most for wide nested schemas and tuple-heavy shapes.
Criterion benches in df-derive/benches/ cover wide rows, nested structs,
deep Vec shapes, decimals, strings, borrowed data, tuple fields, and targeted
tuple-heavy / nested-heavy cost-model shapes.
Performance is continuously monitored with Bencher.
Examples
Run any example with:
Available examples:
- quickstart: basic usage with single values and slices.
- nested: nested structs flattened with dot notation.
- vec_custom: Vec<T> fields and custom nested structs as list columns.
- tuple: tuple structs and field_0/field_1 naming.
- datetime_decimal: chrono datetime values and rust_decimal::Decimal.
- as_string: #[df_derive(as_string)] for enums and custom values.
- generics: generic structs, default type parameters, and () payloads.
- nested_options: nested optional structs.
- deep_vec: deep Vec<Vec<Vec<T>>> list nesting.
- multi_option_vec: multiple Option layers above a Vec.
- nested_generics: generic structs used as nested fields and list items.
License
MIT. See LICENSE.