df-derive-macros 0.3.0

Procedural derive macro implementation for df-derive.
# df-derive

[![Crates.io](https://img.shields.io/crates/v/df-derive.svg)](https://crates.io/crates/df-derive)
[![Docs.rs](https://docs.rs/df-derive/badge.svg)](https://docs.rs/df-derive)
[![CI](https://github.com/gramistella/df-derive/actions/workflows/ci.yml/badge.svg)](https://github.com/gramistella/df-derive/actions/workflows/ci.yml)
[![Downloads](https://img.shields.io/crates/d/df-derive)](https://crates.io/crates/df-derive)
[![License](https://img.shields.io/crates/l/df-derive)](LICENSE)

`df-derive` derives fast conversions from Rust structs into Polars
`DataFrame`s. The normal user-facing crate now includes a default runtime
trait surface, so most projects can write `#[derive(ToDataFrame)]` without a
local trait module or `#[df_derive(trait = "...")]` override.

## What This Crate Does

Deriving `ToDataFrame` on structs and tuple structs generates
allocation-conscious code to:

- Convert a single value to a `polars::prelude::DataFrame`
- Convert slices through a columnar batch path
- Inspect generated column names and `DataType`s through `T::schema()`

The derive supports nested structs flattened with dot notation, nullable
shapes with `Option<T>`, list shapes with `Vec<T>`, tuple structs,
tuple-typed fields, generic structs, borrowed fields, smart pointers, datetime
types, duration types, byte blobs, and decimal backends.

## Quick Start

```toml
[dependencies]
df-derive = "0.3"
polars = "0.53"

# If your models use these types:
chrono = { version = "0.4", features = ["serde"] }
rust_decimal = { version = "1.42", default-features = false, features = ["std"] }
```

With the default `df-derive` facade, generated impls use hidden runtime
re-exports for implementation details such as `polars-arrow`; downstream
crates do not need to depend on `polars-arrow` directly. Keep `polars` direct
when your code names Polars types. The default runtime enables the Polars
dtype features required by the supported matrix below.

```rust
use df_derive::prelude::*;

#[derive(ToDataFrame)]
struct Trade {
    symbol: String,
    price: f64,
    size: u64,
}

fn main() -> polars::prelude::PolarsResult<()> {
    let rows = vec![
        Trade { symbol: "AAPL".into(), price: 187.23, size: 100 },
        Trade { symbol: "MSFT".into(), price: 411.61, size: 200 },
    ];

    let df = rows.as_slice().to_dataframe()?;
    println!("{df}");
    Ok(())
}
```

The default runtime API is available as `df_derive::dataframe::*`. The prelude
exports the derive macro plus `ToDataFrame`, `Columnar`, `ToDataFrameVec`, and
`Decimal128Encode`; it also exports the trait as `ToDataFrameTrait` for code
that wants an unambiguous type-namespace alias.

## Crate Layout

This repository uses a serde-like three-crate architecture:

- `df-derive`: the normal facade crate. It re-exports the derive macro from
  `df-derive-macros` and the runtime API from `df-derive-core`.
- `df-derive-core`: a normal library crate that owns the shared
  `dataframe::{ToDataFrame, Columnar, ToDataFrameVec, Decimal128Encode}` trait
  identity, the `()` impls, and the optional reference
  `Decimal128Encode for rust_decimal::Decimal` impl.
- `df-derive-macros`: the proc-macro implementation. Power users can depend
  on this directly and target `df-derive-core`, `paft`, or a custom runtime.

Because `df-derive-core` owns the default trait identity, models derived in
different crates can compose as nested `ToDataFrame` types when they use the
facade/default runtime.

## Generated API

For each struct or tuple struct `T`, the macro generates:

- `impl ToDataFrame for T`
  - `fn to_dataframe(&self) -> PolarsResult<DataFrame>`
  - `fn empty_dataframe() -> PolarsResult<DataFrame>`
  - `fn schema() -> PolarsResult<Vec<(String, DataType)>>`
- `impl Columnar for T`
  - `fn columnar_to_dataframe(items: &[Self]) -> PolarsResult<DataFrame>`
  - `fn columnar_from_refs(items: &[&Self]) -> PolarsResult<DataFrame>`

The direct `&[Self]` method is generated so top-level slice conversion does
not allocate a temporary `Vec<&Self>`. The borrowed `&[&Self]` method remains
for nested and generic composition.
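
The difference between the two `Columnar` entry points can be sketched without Polars. The block below is a simplified, std-only model and not the macro's generated code: `TradeColumns`, `ColumnarSketch`, and `build_columns` are hypothetical names, and a plain struct of `Vec`s stands in for a `DataFrame`.

```rust
// Std-only stand-in for a DataFrame: one Vec per column (hypothetical type).
#[derive(Debug, PartialEq)]
struct TradeColumns {
    symbol: Vec<String>,
    price: Vec<f64>,
}

struct Trade {
    symbol: String,
    price: f64,
}

// Simplified mirror of the two-method `Columnar` shape described above.
trait ColumnarSketch: Sized {
    type Columns;

    // Default direct path: collects references first, which allocates a Vec<&Self>.
    fn columnar_to_columns(items: &[Self]) -> Self::Columns {
        let refs: Vec<&Self> = items.iter().collect();
        Self::columnar_from_refs(&refs)
    }

    fn columnar_from_refs(items: &[&Self]) -> Self::Columns;
}

// Shared row loop over anything that yields `&Trade`.
fn build_columns<'a>(rows: impl ExactSizeIterator<Item = &'a Trade>) -> TradeColumns {
    let mut out = TradeColumns {
        symbol: Vec::with_capacity(rows.len()),
        price: Vec::with_capacity(rows.len()),
    };
    for row in rows {
        out.symbol.push(row.symbol.clone());
        out.price.push(row.price);
    }
    out
}

impl ColumnarSketch for Trade {
    type Columns = TradeColumns;

    // Overridden direct path: walks the slice without the temporary Vec<&Self>.
    fn columnar_to_columns(items: &[Self]) -> TradeColumns {
        build_columns(items.iter())
    }

    // Borrowed path, kept for nested and generic composition.
    fn columnar_from_refs(items: &[&Self]) -> TradeColumns {
        build_columns(items.iter().copied())
    }
}
```

Both paths produce the same columns in this model; only the direct override skips the intermediate `Vec<&Self>`.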

## Supported Types And Shapes

Container and wrapper support:

- **Named structs**: each field becomes one or more columns.
- **Nested structs**: fields flatten recursively with dot notation.
- **Vec of primitives and structs**: `Vec<T>` becomes a Polars `List` column;
  `Vec<Nested>` becomes one list column per nested field.
- **`Option<T>`**: scalar and list columns carry null validity.
- **Tuple structs**: unnamed fields become `field_0`, `field_1`, and so on.
- **Tuple-typed fields**: `pair: (A, B)` flattens to
  `pair.field_0`, `pair.field_1`; `Option<(A, B)>` and `Vec<(A, B)>`
  distribute the outer wrapper across the element columns.
- **Empty structs**: an instance produces shape `(1, 0)` and an empty slice
  produces shape `(0, 0)`.
- **Generics**: generic structs are supported; the macro injects the
  necessary `ToDataFrame + Columnar` bounds, plus `Decimal128Encode` for
  generic parameters annotated with `decimal(...)`.
- **Transparent pointers**: `Box<T>`, `Rc<T>`, `Arc<T>`, borrowed references
  `&T`, and `Cow<'_, T>` with a sized inner type are peeled transparently and
  preserve the bare field's column shape and dtype.

Common leaf types:

- **Primitives**: `String`, `&str`, `bool`, signed and unsigned integer types
  including `i128`/`u128` and `isize`/`usize`, `std::num::NonZero*` integer
  types, `f32`, and `f64`.
- **Time**: `chrono::DateTime<Tz>` and `chrono::NaiveDateTime` encode as
  `Datetime(Milliseconds, None)` by default; use
  `#[df_derive(time_unit = "ms" | "us" | "ns")]` to override.
  `DateTime<Tz>` values are encoded as UTC instants, so use
  `#[df_derive(as_string)]` if the textual timezone or offset matters.
- **Date and time-of-day**: `chrono::NaiveDate` encodes as `Date`, and
  `chrono::NaiveTime` encodes as `Time`. These encodings are fixed and do not
  accept `time_unit`.
- **Duration**: `std::time::Duration`, `core::time::Duration`, and
  `chrono::Duration` encode as `Duration(Nanoseconds)` by default; use
  `time_unit` to choose milliseconds, microseconds, or nanoseconds. Bare
  `Duration` is rejected as ambiguous.
- **Decimal**: bare `Decimal` and `rust_decimal::Decimal` encode as
  `Decimal(38, 10)` by default. Custom decimal backends opt in with
  `#[df_derive(decimal(precision = N, scale = S))]`.
- **Binary blobs**: `#[df_derive(as_binary)]` opts `Vec<u8>`, `&[u8]`, or
  `Cow<'_, [u8]>` shapes into Polars `Binary`; unannotated `Vec<u8>` remains
  `List(UInt8)`.

## Dtype Support Matrix

The default `df-derive` facade and `df-derive-core` runtime enable the Polars
features in this table on their `polars` dependency. If you use
`df-derive-macros` with a custom runtime and no `df-derive-core` dependency,
enable the matching features on that runtime's direct `polars` dependency.

| Rust leaf family | Polars dtype emitted | Polars feature for custom runtimes |
| --- | --- | --- |
| `bool` | `Boolean` | none |
| `String`, `&str`, `as_str`, `as_string` | `String` | none |
| `i8`, `NonZeroI8` | `Int8` | `dtype-i8` |
| `i16`, `NonZeroI16` | `Int16` | `dtype-i16` |
| `i32`, `i64`, `isize`, matching `NonZero*` | `Int32` / `Int64` | none |
| `i128`, `NonZeroI128` | `Int128` | `dtype-i128` |
| `u8`, `NonZeroU8` | `UInt8` | `dtype-u8` |
| `u16`, `NonZeroU16` | `UInt16` | `dtype-u16` |
| `u32`, `u64`, `usize`, matching `NonZero*` | `UInt32` / `UInt64` | none |
| `u128`, `NonZeroU128` | `UInt128` | `dtype-u128` |
| `f32`, `f64` | `Float32` / `Float64` | none |
| `chrono::DateTime<Tz>`, `chrono::NaiveDateTime` | `Datetime` | `dtype-datetime`, plus `timezones` for timezone-aware values |
| `chrono::NaiveDate` | `Date` | `dtype-date` |
| `chrono::NaiveTime` | `Time` | `dtype-time` |
| `std::time::Duration`, `core::time::Duration`, `chrono::Duration` | `Duration` | `dtype-duration` |
| `Decimal`, `rust_decimal::Decimal`, custom decimal backends | `Decimal` | `dtype-decimal` |
| `#[df_derive(as_binary)]` byte buffers | `Binary` | none |

`Option<T>`, `Vec<T>`, tuples, and nested structs preserve the leaf dtype;
each `Vec` layer wraps the leaf in `List(...)`.
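
As a rough illustration of that wrapping rule, the following std-only sketch models how wrapper layers map to a column dtype. `Dtype`, `Shape`, and `column_dtype` are hypothetical names for this example only; they are not Polars or `df-derive` types.

```rust
// Toy dtype model (hypothetical; the real dtypes are Polars `DataType` values).
#[derive(Debug, Clone, PartialEq)]
enum Dtype {
    Int32,
    Float64,
    List(Box<Dtype>),
}

// Toy description of a field's wrapper layers around a leaf.
enum Shape {
    Leaf(Dtype),
    Nullable(Box<Shape>), // models Option<T>
    Sequence(Box<Shape>), // models Vec<T>
}

fn column_dtype(shape: &Shape) -> Dtype {
    match shape {
        Shape::Leaf(dtype) => dtype.clone(),
        // Option adds validity but leaves the dtype unchanged.
        Shape::Nullable(inner) => column_dtype(inner),
        // Each Vec layer adds one List(...) wrapper.
        Shape::Sequence(inner) => Dtype::List(Box::new(column_dtype(inner))),
    }
}
```

Under this model, `Vec<Option<Vec<i32>>>` maps to `List(List(Int32))`, and `Option<f64>` stays `Float64`.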

For Polars 0.53, `dtype-decimal` enables the decimal column machinery and its
internal `Int128` backing path. You only need an explicit `dtype-i128` feature
when your derived structs expose `i128` / `NonZeroI128` fields as `Int128`
columns.

Useful field attributes:

- `#[df_derive(skip)]`: omit a field from generated schema and DataFrame output.
- `#[df_derive(as_string)]`: format values with `Display` into a string column using a reused scratch buffer.
- `#[df_derive(as_str)]`: borrow via `AsRef<str>` without `Display` formatting or an intermediate scratch buffer.
- `#[df_derive(as_binary)]`: encode byte-buffer shapes as Binary.
- `#[df_derive(decimal(precision = N, scale = S))]`: choose a decimal dtype or opt a custom decimal backend into `Decimal128Encode`.
- `#[df_derive(time_unit = "ms" | "us" | "ns")]`: choose datetime or duration units.

`skip` is useful for caches, source metadata, handles, or unsupported helper
fields that should remain on the Rust struct but not become DataFrame columns.
It is mutually exclusive with conversion attributes because skipped fields are
not analyzed or emitted. Tuple struct fields can be skipped too; remaining
tuple columns keep their original `field_{index}` names.
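
The tuple-struct naming rule can be modeled in a few lines. This is a sketch of the rule as stated above, not the macro's code; `tuple_struct_column_names` is a hypothetical helper that takes one `skip` flag per field.

```rust
// Returns the column names of a tuple struct, given one `skip` flag per field.
// Skipped fields are dropped, but the remaining names keep their original index.
fn tuple_struct_column_names(skip: &[bool]) -> Vec<String> {
    skip.iter()
        .enumerate()
        .filter(|(_, skipped)| !**skipped)
        .map(|(index, _)| format!("field_{index}"))
        .collect()
}
```

For a three-field tuple struct whose middle field is skipped, this yields `field_0` and `field_2`, with no renumbering.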

`as_string` is useful for enums or validated newtypes that should appear as
string columns. It formats each value into a reusable `String` scratch buffer
before pushing the resulting `&str` into the column builder; the builder still
copies bytes into the output column, and the scratch can grow to fit the
largest formatted value. If a field already implements `AsRef<str>`, prefer
`as_str`: it borrows through the same columnar buffer used for bare
`String`/`&str` fields and skips both `Display` formatting and the scratch
buffer. The two attributes are mutually exclusive.
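
The contrast between the two attributes can be sketched with the standard library alone. This is an illustrative model, not the generated code: `StrColumn`, `display_column`, `as_ref_column`, and `Side` are hypothetical, and `StrColumn` only imitates a builder that copies bytes into one output buffer.

```rust
use std::fmt::{self, Display, Write};

// Toy string column: all bytes in one buffer plus end offsets (hypothetical type).
#[derive(Default)]
struct StrColumn {
    data: String,
    ends: Vec<usize>,
}

impl StrColumn {
    // Copies the bytes of `s` into the column, as a builder would.
    fn push(&mut self, s: &str) {
        self.data.push_str(s);
        self.ends.push(self.data.len());
    }

    fn get(&self, row: usize) -> &str {
        let start = if row == 0 { 0 } else { self.ends[row - 1] };
        &self.data[start..self.ends[row]]
    }
}

// `as_string`-style path: format through `Display` into one reused scratch buffer.
fn display_column<T: Display>(values: &[T]) -> StrColumn {
    let mut column = StrColumn::default();
    let mut scratch = String::new();
    for value in values {
        scratch.clear(); // keeps the allocation, drops the old contents
        write!(scratch, "{value}").expect("Display impl returned an error");
        column.push(&scratch);
    }
    column
}

// `as_str`-style path: borrow through `AsRef<str>`, no formatting and no scratch.
fn as_ref_column<T: AsRef<str>>(values: &[T]) -> StrColumn {
    let mut column = StrColumn::default();
    for value in values {
        column.push(value.as_ref());
    }
    column
}

// Example enum that should appear as a string column.
enum Side {
    Buy,
    Sell,
}

impl Display for Side {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Side::Buy => "buy",
            Side::Sell => "sell",
        })
    }
}
```

In this model the scratch `String` is allocated once and grows to the longest formatted value, while the `AsRef<str>` path hands the borrowed slice straight to the column.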

`as_binary` accepts `Vec<u8>`, `Option<Vec<u8>>`, `Vec<Vec<u8>>`,
`Vec<Option<Vec<u8>>>`, `Option<Vec<Vec<u8>>>`, and the same shapes over
`&[u8]` and `Cow<'_, [u8]>`. Bare `u8`, `Option<u8>`,
`Vec<Option<u8>>`, non-`u8` leaves, and `String` are rejected. The binary
attribute is mutually exclusive with `as_str`, `as_string`, `decimal(...)`,
and `time_unit`.

Enums and unions are not supported as derive targets; use `as_string` or
`as_str` on enum fields. Direct fields of type `()` are rejected, but `()` is
supported as a generic payload and contributes zero columns.

Tuple fields cannot carry field-level conversion attributes such as `as_str`,
`as_binary`, `decimal(...)`, or `time_unit`; hoist that value into a named
struct when you need an attributed field. Nested tuples inside an outer
`Option` or `Vec` are rejected for now; use a named struct for those shapes.

## Column Naming

- Named struct fields use the Rust field name, such as `symbol`.
- Nested structs use dot notation recursively, such as `address.city`.
- `Vec<Nested>` fields use the outer field plus nested field name, such as
  `quotes.close`.
- Tuple-typed fields use `field.field_0`, `field.field_1`, and recurse for
  unwrapped nested tuples.
- Tuple structs use `field_0`, `field_1`, and so on.
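
The naming rules above can be modeled as a small recursive walk. The sketch below is std-only and hypothetical (`FieldType`, `join`, `flatten`, and `column_names` are names invented for this example); it is meant to show how the rules compose, not how the macro is implemented.

```rust
// Toy description of a field's type for naming purposes only.
enum FieldType {
    Leaf,
    Struct(Vec<(&'static str, FieldType)>),
    Tuple(Vec<FieldType>),
}

// Joins a prefix and a segment with dot notation; the root has an empty prefix.
fn join(prefix: &str, name: &str) -> String {
    if prefix.is_empty() {
        name.to_owned()
    } else {
        format!("{prefix}.{name}")
    }
}

fn flatten(prefix: &str, ty: &FieldType, out: &mut Vec<String>) {
    match ty {
        FieldType::Leaf => out.push(prefix.to_owned()),
        // Named fields contribute their Rust field name.
        FieldType::Struct(fields) => {
            for (name, inner) in fields {
                flatten(&join(prefix, name), inner, out);
            }
        }
        // Tuple elements contribute `field_0`, `field_1`, and so on.
        FieldType::Tuple(elements) => {
            for (index, inner) in elements.iter().enumerate() {
                flatten(&join(prefix, &format!("field_{index}")), inner, out);
            }
        }
    }
}

fn column_names(root: &FieldType) -> Vec<String> {
    let mut out = Vec::new();
    flatten("", root, &mut out);
    out
}
```

A struct with `symbol`, a nested `address { city, zip }`, and `pair: (A, B)` flattens to `symbol`, `address.city`, `address.zip`, `pair.field_0`, and `pair.field_1` in this model.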

## Limitations And Guidance

- Maps such as `HashMap<_, _>` and `BTreeMap<_, _>` are not supported; use
  `Vec<(K, V)>` or a named row struct when you need a tabular representation.
- Sets such as `HashSet<_>` and `BTreeSet<_>` are not supported; use
  `Vec<T>` when you need a list representation.
- Sequence collections such as `VecDeque<T>` and `LinkedList<T>` are not
  supported; use `Vec<T>` instead.
- All nested custom structs must also derive `ToDataFrame`.
- Obvious direct self-recursive nested fields using `Self`, the bare deriving
  type name, `self::Type`, or `crate::Type` are rejected after transparent
  wrapper peeling, including shapes such as `Node`, `Box<Node>`,
  `Option<Box<Node>>`, and tuple fields containing the same. Use identifier
  fields or a separate flat representation for recursive data structures.
- Consecutive `Option` layers above a `Vec` collapse to one list-level
  validity bit, so `None` and `Some(None)` are indistinguishable in the
  resulting list column.
- Borrowed byte slices and `Cow<'_, [u8]>` require `#[df_derive(as_binary)]`;
  other borrowed slice forms are rejected. Use `Vec<T>` for list columns.
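
The `Option`-collapse limitation is easy to see in plain Rust. The sketch below is a std-only model, with `collapse_list` as a hypothetical helper: it reduces `Option<Option<Vec<T>>>` to the single "list present or null" view that one list-level validity bit can represent.

```rust
// Collapses consecutive Option layers above a Vec into one nullable slice.
// `None` and `Some(None)` both become `None`, so the distinction is lost.
fn collapse_list<T>(value: &Option<Option<Vec<T>>>) -> Option<&[T]> {
    value.as_ref().and_then(|inner| inner.as_deref())
}
```

If the difference between "outer missing" and "inner missing" matters, it has to be carried in a separate column, since the collapsed form cannot recover it.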

## Runtime Discovery And Overrides

Explicit container attributes always win:

```rust
#[derive(df_derive::ToDataFrame)]
#[df_derive(
    trait = "my_runtime::dataframe::ToDataFrame",
    columnar = "my_runtime::dataframe::Columnar",
    decimal128_encode = "my_runtime::dataframe::Decimal128Encode",
)]
struct Row {
    amount: MyDecimal,
}
```

If only `trait = "x::ToDataFrame"` is provided, the macro infers
`x::Columnar` and `x::Decimal128Encode` unless those paths are explicitly
overridden.
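
The inference rule can be pictured as "replace the last path segment unless an override is given". The following std-only sketch works on path strings purely for illustration; the macro itself operates on parsed paths, and `RuntimePaths` and `resolve_paths` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
struct RuntimePaths {
    to_dataframe: String,
    columnar: String,
    decimal128_encode: String,
}

// Returns None when `trait_path` has no `::` separator to infer a module from.
fn resolve_paths(
    trait_path: &str,
    columnar: Option<&str>,
    decimal128_encode: Option<&str>,
) -> Option<RuntimePaths> {
    // Everything before the final `::` segment is treated as the runtime module.
    let (module, _trait_name) = trait_path.rsplit_once("::")?;
    Some(RuntimePaths {
        to_dataframe: trait_path.to_owned(),
        // Explicit overrides win; otherwise the sibling is inferred from the module.
        columnar: columnar.map_or_else(|| format!("{module}::Columnar"), str::to_owned),
        decimal128_encode: decimal128_encode
            .map_or_else(|| format!("{module}::Decimal128Encode"), str::to_owned),
    })
}
```

With only `trait = "x::ToDataFrame"`, this model resolves `x::Columnar` and `x::Decimal128Encode`; passing an explicit `decimal128_encode` replaces just that one path.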

Explicit paths to the built-in facade/core runtimes,
`df_derive::dataframe::ToDataFrame` or
`df_derive_core::dataframe::ToDataFrame` (including dependency renames), still
use the default-runtime dependency roots from that same `dataframe` module's
hidden `__private` re-exports. They do not require a direct `polars-arrow`
dependency just because the trait path was written explicitly.

`columnar = "..."` must be paired with `trait = "..."`; a standalone
`Columnar` override would create mixed runtime impls that are incompatible
with both runtimes' `ToDataFrameVec` extension traits.
Explicit `trait` + `columnar` pairs also cannot mix the built-in
`df_derive`/`df_derive_core` dataframe runtime with a custom runtime. Use the
matching built-in `Columnar` path, omit `columnar` so it is inferred from the
built-in `trait`, or provide a fully custom pair.

Without overrides, the macro discovers a `dataframe` module in this order:

1. `df_derive::dataframe`
2. `df_derive_core::dataframe`
3. `paft_utils::dataframe`
4. `paft::dataframe`
5. `crate::core::dataframe`

Discovery uses `proc_macro_crate::crate_name`, so dependency renames are
respected. For example, a dependency declared as
`dfd = { package = "df-derive", version = "0.3" }` is emitted as
`::dfd::dataframe`.

The final `crate::core::dataframe` fallback is for legacy/local runtimes in
crates that use `df-derive-macros` directly without `df-derive`,
`df-derive-core`, `paft-utils`, or `paft`. Any runtime reached by this default
discovery path must expose `dataframe::__private::{polars, polars_arrow}` for
generated-code dependency roots.
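
The discovery order can be approximated as a first-match search. This std-only sketch is a simplified model: the real lookup goes through `proc_macro_crate::crate_name`, which is not reproduced here. `DISCOVERY_ORDER` and `discover_dataframe_module` are hypothetical names, the `deps` slice stands in for the consuming crate's dependency table as `(package name, local crate name)` pairs, and applying the `::name::dataframe` form to every candidate is an assumption extrapolated from the `::dfd::dataframe` example.

```rust
// Package names probed in the documented order.
const DISCOVERY_ORDER: [&str; 4] = ["df-derive", "df-derive-core", "paft-utils", "paft"];

// `deps` lists (package name, local crate name) pairs, so renames are visible.
fn discover_dataframe_module(deps: &[(&str, &str)]) -> String {
    for package in DISCOVERY_ORDER {
        if let Some((_, local_name)) = deps.iter().find(|(name, _)| *name == package) {
            // The local (possibly renamed) crate name is what gets emitted.
            return format!("::{local_name}::dataframe");
        }
    }
    // Legacy/local fallback when none of the known runtimes is a dependency.
    "crate::core::dataframe".to_owned()
}
```

With `df-derive` renamed to `dfd`, the model resolves `::dfd::dataframe` even if `paft` is also present, because `df-derive` is probed first.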

## Power-User Runtime Choices

Use the facade for the default runtime:

```rust
use df_derive::prelude::*;

#[derive(ToDataFrame)]
struct Row {
    id: u32,
}
```

Use the macro crate directly with the shared core runtime:

```toml
[dependencies]
df-derive-core = "0.3"
df-derive-macros = "0.3"
polars = "0.53"
```

```rust
use df_derive_core::dataframe::{ToDataFrame as _, ToDataFrameVec as _};
use df_derive_macros::ToDataFrame;

#[derive(ToDataFrame)]
struct Row {
    id: u32,
}
```

Use a custom runtime by providing compatible traits and overriding paths.
Outside the built-in facade/core paths described above, custom runtimes
selected with `#[df_derive(trait = "...")]` must name a compatible direct
`polars` dependency. They also need a compatible direct `polars-arrow`
dependency when the derived fields use shapes that require generated Arrow
array builders, such as list, nullable primitive, string, or binary columns.
Scalar-only numeric/bool derives do not need `polars-arrow`. The minimum trait
surface is:

```rust
mod runtime {
    pub mod dataframe {
        use polars::prelude::{DataFrame, DataType, PolarsResult};

        pub trait ToDataFrame {
            fn to_dataframe(&self) -> PolarsResult<DataFrame>;
            fn empty_dataframe() -> PolarsResult<DataFrame>;
            fn schema() -> PolarsResult<Vec<(String, DataType)>>;
        }

        pub trait Columnar: Sized {
            fn columnar_to_dataframe(items: &[Self]) -> PolarsResult<DataFrame> {
                let refs: Vec<&Self> = items.iter().collect();
                Self::columnar_from_refs(&refs)
            }

            fn columnar_from_refs(items: &[&Self]) -> PolarsResult<DataFrame>;
        }

        pub trait Decimal128Encode {
            fn try_to_i128_mantissa(&self, target_scale: u32) -> Option<i128>;
        }
    }
}
```

## Decimal Backends

`df-derive-core` provides `Decimal128Encode for rust_decimal::Decimal` behind
the `rust_decimal` feature, which is enabled by default on both `df-derive`
and `df-derive-core`.

To disable it:

```toml
df-derive = { version = "0.3", default-features = false }
```

Custom decimal backends should implement `Decimal128Encode` and use
`#[df_derive(decimal(precision = N, scale = S))]` on fields that should be
encoded as Polars decimal columns. Implementations must return an `i128`
mantissa rescaled to the requested scale, using round-half-to-even when
scaling down. Returning `None` surfaces as a Polars compute error. The
generated code verifies that the returned mantissa fits the declared precision
before constructing the Polars decimal column.
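
A minimal sketch of a conforming backend, assuming a hypothetical fixed-point type, might look like the following. The trait is redeclared locally so the block is self-contained; `FixedPoint` and `fits_precision` are invented for this example, and `fits_precision` only imitates the kind of check the generated code is described as performing.

```rust
// Local copy of the trait shape so the sketch compiles on its own.
trait Decimal128Encode {
    fn try_to_i128_mantissa(&self, target_scale: u32) -> Option<i128>;
}

// Hypothetical backend: the value is `mantissa / 10^scale`.
struct FixedPoint {
    mantissa: i128,
    scale: u32,
}

impl Decimal128Encode for FixedPoint {
    fn try_to_i128_mantissa(&self, target_scale: u32) -> Option<i128> {
        if target_scale >= self.scale {
            // Scaling up is exact; overflow is reported as None.
            let factor = 10i128.checked_pow(target_scale - self.scale)?;
            self.mantissa.checked_mul(factor)
        } else {
            // Scaling down: divide, then round half to even on the remainder.
            // Simplification: scale gaps above 38 digits return None here.
            let divisor = 10i128.checked_pow(self.scale - target_scale)?;
            let quotient = self.mantissa / divisor; // truncates toward zero
            let remainder = self.mantissa % divisor; // carries the mantissa's sign
            let twice = remainder.unsigned_abs() * 2;
            let half = divisor.unsigned_abs();
            let round_away = twice > half || (twice == half && quotient % 2 != 0);
            if round_away {
                quotient.checked_add(self.mantissa.signum())
            } else {
                Some(quotient)
            }
        }
    }
}

// Imitates a precision check: the mantissa must have at most `precision` digits.
fn fits_precision(mantissa: i128, precision: u32) -> bool {
    match 10i128.checked_pow(precision) {
        Some(limit) => mantissa.unsigned_abs() < limit.unsigned_abs(),
        None => true, // 10^precision exceeds i128, so every i128 fits
    }
}
```

In this sketch, `1.25` rescaled to one fractional digit gives mantissa `12` (the tie goes to the even neighbor), while `1.35` gives `14`.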

Unannotated decimal detection is syntax-based. A procedural macro receives
tokens, not rustc's resolved type information, so bare `Decimal` and canonical
`rust_decimal::Decimal` are treated as decimals automatically. Qualified paths
such as `domain::Decimal` are treated as nested custom structs unless you opt
them into decimal encoding with `decimal(...)`.

Temporal detection is syntax-based for the same reason. Bare or canonical
`chrono::NaiveDate`, `chrono::NaiveTime`, `chrono::NaiveDateTime`,
`chrono::DateTime<Tz>`, `chrono::Duration`, and `chrono::TimeDelta` are treated
as temporal types, along with `std::time::Duration` and
`core::time::Duration`. Qualified domain paths such as `domain::NaiveDate`
remain custom structs.
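
Syntax-based detection can be pictured as matching on the written path rather than on the resolved type. The sketch below is a toy model over strings, whereas the macro inspects tokens; `SyntaxClass` and `classify_type_path` are hypothetical, and `Unrecognized` simply means "not matched by these two rules", after which other rules (such as nested-struct handling) would apply.

```rust
#[derive(Debug, PartialEq)]
enum SyntaxClass {
    Decimal,
    Temporal,
    AmbiguousDuration,
    Unrecognized,
}

fn classify_type_path(type_path: &str) -> SyntaxClass {
    // Drop generic arguments such as `<Utc>` and compare the written path only.
    let path = type_path.split('<').next().unwrap_or(type_path).trim();
    match path {
        "Decimal" | "rust_decimal::Decimal" => SyntaxClass::Decimal,
        // A bare `Duration` could be std, core, or chrono, so it is not guessed.
        "Duration" => SyntaxClass::AmbiguousDuration,
        "NaiveDate" | "chrono::NaiveDate" | "NaiveTime" | "chrono::NaiveTime"
        | "NaiveDateTime" | "chrono::NaiveDateTime" | "DateTime" | "chrono::DateTime"
        | "TimeDelta" | "chrono::TimeDelta" | "chrono::Duration"
        | "std::time::Duration" | "core::time::Duration" => SyntaxClass::Temporal,
        // Any other qualified path, such as `domain::Decimal`, is not special-cased.
        _ => SyntaxClass::Unrecognized,
    }
}
```

Because only the spelling is visible, a type alias or a re-export under another path is not recognized in this model, which is why the explicit `decimal(...)` opt-in exists.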

If your decimal trait lives somewhere other than the discovered runtime module,
point at it explicitly:

```rust
#[derive(df_derive::ToDataFrame)]
#[df_derive(
    trait = "my_runtime::dataframe::ToDataFrame",
    decimal128_encode = "my_runtime::decimal_backend::Decimal128Encode",
)]
struct Tx {
    #[df_derive(decimal(precision = 38, scale = 10))]
    amount: MyDecimal,
}
```

## Compatibility

- **Rust edition**: 2024
- **Minimum supported Rust version**: 1.90. This is above the edition's
  1.85 floor because the Polars 0.53 dependency graph uses language features
  that first compile on Rust 1.90.
- **Polars**: 0.53
- **polars-arrow**: 0.53 through the default runtime facade. Custom runtimes
  selected with explicit trait overrides need a compatible direct dependency
  only for derived field shapes that emit public Arrow array builders; explicit
  facade/core runtime paths keep using the hidden default-runtime re-export.
- **Polars feature flags**: the default `df-derive` facade and
  `df-derive-core` runtime enable every Polars dtype flag required by the
  support matrix above. If you use `df-derive-macros` with a custom runtime
  and no `df-derive-core` dependency, enable the matching Polars feature
  flags on that runtime's `polars` dependency.

## Performance Notes

Using `df_derive::dataframe::Columnar` instead of `paft::dataframe::Columnar`
has no inherent runtime performance penalty. The macro generates the hot
column-building code at the impl site either way; the runtime path only
selects which trait receives the impl.

The generated `columnar_to_dataframe(&[Self])` path avoids the old top-level
`Vec<&Self>` allocation. Nested and generic emitters still use
`columnar_from_refs(&[&Self])` so borrowed composition remains clone-free.

The generated hot path is shape-dependent. Primitive scalar fields are
populated in one row loop. Nested fields collect references and call the
nested type's columnar implementation, so each nested field may add a scan
over the outer items. Tuple-typed fields are emitted per projection path, so
tuple elements may each add their own scan; Vec-bearing tuple projections also
scan the outer items to build offsets, validity, and leaf buffers. This cost
model matters most for wide nested schemas and tuple-heavy shapes.

Criterion benches in `df-derive/benches/` cover wide rows, nested structs,
deep Vec shapes, decimals, strings, borrowed data, tuple fields, and targeted
tuple-heavy / nested-heavy cost-model shapes.

Performance is continuously monitored with
[Bencher](https://bencher.dev/perf/df-derive).

## Examples

Run any example with:

```sh
cargo run -p df-derive --example quickstart
cargo run -p df-derive --example <example_name>
```

Available examples:

- **`quickstart`**: basic usage with single values and slices.
- **`nested`**: nested structs flattened with dot notation.
- **`vec_custom`**: `Vec<T>` fields and custom nested structs as list columns.
- **`tuple`**: tuple structs and `field_0`/`field_1` naming.
- **`datetime_decimal`**: chrono datetime values and `rust_decimal::Decimal`.
- **`as_string`**: `#[df_derive(as_string)]` for enums and custom values.
- **`generics`**: generic structs, default type parameters, and `()` payloads.
- **`nested_options`**: nested optional structs.
- **`deep_vec`**: deep `Vec<Vec<Vec<T>>>` list nesting.
- **`multi_option_vec`**: multiple `Option` layers above a `Vec`.
- **`nested_generics`**: generic structs used as nested fields and list items.

## License

MIT. See `LICENSE`.