//! # `serde_arrow` - convert sequences of Rust objects to / from Arrow arrays
//!
//! The Arrow in-memory format is a powerful way to work with data-frame-like structures. However,
//! the API of the underlying Rust crates can at times be cumbersome to use due to the statically
//! typed nature of Rust. `serde_arrow` offers a simple way to convert Rust objects into Arrow
//! arrays and back. `serde_arrow` relies on [Serde](https://serde.rs) to interpret Rust objects.
//! Therefore, adding support for `serde_arrow` to custom types is as easy as using Serde's derive
//! macros.
//!
//! `serde_arrow` mainly targets the [`arrow`](https://github.com/apache/arrow-rs) crate, but also
//! supports the deprecated [`arrow2`](https://github.com/jorgecarleitao/arrow2) crate. The Arrow
//! implementations can be selected via [features](#features).
//!
//! `serde_arrow` relies on a schema to translate between Rust and Arrow as their type systems do
//! not directly match. The schema is expressed as a collection of Arrow fields with additional
//! metadata describing the arrays. E.g., to convert a vector of Rust strings representing
//! timestamps to an Arrow `Timestamp` array, the schema should contain a field with data type
//! `Timestamp`. `serde_arrow` can derive the schema from the data or the Rust types
//! themselves via schema tracing, but does not require it. It is always possible to specify the
//! schema manually. See the [`schema` module][schema] and [`SchemaLike`][schema::SchemaLike] for
//! further details.
//!
#![cfg_attr(
    all(has_arrow, has_arrow2),
    doc = r#"
## Overview

| Operation        | [`arrow-*`](#features)                                            | [`arrow2-*`](#features)                             | `marrow`                                            |
|:-----------------|:------------------------------------------------------------------|:----------------------------------------------------|:----------------------------------------------------|
| Rust to Arrow    | [`to_record_batch`], [`to_arrow`]                                 | [`to_arrow2`]                                       | [`to_marrow`]                                       |
| Arrow to Rust    | [`from_record_batch`], [`from_arrow`]                             | [`from_arrow2`]                                     | [`from_marrow`]                                     |
| [`ArrayBuilder`] | [`ArrayBuilder::from_arrow`]                                      | [`ArrayBuilder::from_arrow2`]                       | [`ArrayBuilder::from_marrow`]                       |
| [`Serializer`]   | [`ArrayBuilder::from_arrow`] + [`Serializer::new`]                | [`ArrayBuilder::from_arrow2`] + [`Serializer::new`] | [`ArrayBuilder::from_marrow`] + [`Serializer::new`] |
| [`Deserializer`] | [`Deserializer::from_record_batch`], [`Deserializer::from_arrow`] | [`Deserializer::from_arrow2`]                       | [`Deserializer::from_marrow`]                       |
"#
)]
//!
//! See also:
//!
//! - the [quickstart guide][_impl::docs::quickstart] for more examples of how to use this package
//! - the [status summary][_impl::docs::status] for an overview of the supported Arrow and Rust
//!   constructs
//!
//! ## Example
//!
//! ```rust
//! # use serde::{Deserialize, Serialize};
//! # #[cfg(has_arrow)]
//! # fn main() -> serde_arrow::Result<()> {
//! # use serde_arrow::_impl::arrow;
//! use arrow::datatypes::FieldRef;
//! use serde_arrow::schema::{SchemaLike, TracingOptions};
//!
//! ##[derive(Serialize, Deserialize)]
//! struct Record {
//!     a: f32,
//!     b: i32,
//! }
//!
//! let records = vec![
//!     Record { a: 1.0, b: 1 },
//!     Record { a: 2.0, b: 2 },
//!     Record { a: 3.0, b: 3 },
//! ];
//!
//! // Determine Arrow schema
//! let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;
//!
//! // Build the record batch
//! let batch = serde_arrow::to_record_batch(&fields, &records)?;
//! # Ok(())
//! # }
//! # #[cfg(not(has_arrow))]
//! # fn main() { }
//! ```
//!
//! The `RecordBatch` can then be written to disk, e.g., as Parquet using the [`ArrowWriter`] from
//! the [`parquet`] crate.
//!
//! [`ArrowWriter`]:
//!     https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html
//! [`parquet`]: https://docs.rs/parquet/latest/parquet/
//!
//! # Features
//!
//! The version of `arrow` or `arrow2` used can be selected via features. By default, no Arrow
//! implementation is used. In that case only the base features of `serde_arrow` are available.
//!
//! The `arrow-*` and `arrow2-*` feature groups are compatible with each other. I.e., it is possible
//! to use `arrow` and `arrow2` together. Within each group, the highest version is selected if
//! multiple features are activated. E.g., when selecting `arrow2-0-16` and `arrow2-0-17`,
//! `arrow2=0.17` will be used.
//!
//! Note that because the highest version is selected, the features are not additive. In particular,
//! it is not possible to use `serde_arrow::to_arrow` for multiple different `arrow` versions at the
//! same time. See the next section for how to use `serde_arrow` in library code.
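#![doc = r#"
The selection rule described above can be modelled in a few lines of plain Rust. This is only an
illustrative sketch, not how `serde_arrow` resolves features internally: the helpers
`parse_feature` and `selected_version` are hypothetical names and are not part of the crate. The
sketch assumes feature names of the form `<group>-<version parts separated by dashes>`, as in the
table of available features.

```rust
/// Split a feature name such as `arrow2-0-17` into its group (`arrow2`) and
/// its numeric version parts (`[0, 17]`).
///
/// Hypothetical helper for illustration only; not part of `serde_arrow`.
fn parse_feature(feature: &str) -> Option<(&str, Vec<u32>)> {
    let (group, version) = feature.split_once('-')?;
    // Every dash-separated part must be a number, otherwise the name is rejected
    let version = version
        .split('-')
        .map(|part| part.parse::<u32>().ok())
        .collect::<Option<Vec<u32>>>()?;
    Some((group, version))
}

/// Return the highest version among the activated features of `group`,
/// or `None` if no feature of that group is active.
///
/// Hypothetical helper for illustration only; not part of `serde_arrow`.
fn selected_version(features: &[&str], group: &str) -> Option<Vec<u32>> {
    features
        .iter()
        .filter_map(|feature| parse_feature(feature))
        .filter(|(g, _)| *g == group)
        .map(|(_, version)| version)
        // Vectors compare lexicographically, so [0, 17] > [0, 16]
        .max()
}
```

With this sketch, `selected_version(&["arrow2-0-16", "arrow2-0-17"], "arrow2")` evaluates to
`Some(vec![0, 17])`, mirroring the statement that `arrow2=0.17` is used when both features are
active. Because only the maximum survives, activating an additional feature can change which
version is used, which is why the features are not additive.
"#]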
//!
//! Available features:
//!
//! | Arrow Feature | Arrow Version |
//! |---------------|---------------|
// arrow-version:insert: //! | `arrow-{version}`    | `arrow={version}`    |
//! | `arrow-56`    | `arrow=56`    |
//! | `arrow-55`    | `arrow=55`    |
//! | `arrow-54`    | `arrow=54`    |
//! | `arrow-53`    | `arrow=53`    |
//! | `arrow-52`    | `arrow=52`    |
//! | `arrow-51`    | `arrow=51`    |
//! | `arrow-50`    | `arrow=50`    |
//! | `arrow-49`    | `arrow=49`    |
//! | `arrow-48`    | `arrow=48`    |
//! | `arrow-47`    | `arrow=47`    |
//! | `arrow-46`    | `arrow=46`    |
//! | `arrow-45`    | `arrow=45`    |
//! | `arrow-44`    | `arrow=44`    |
//! | `arrow-43`    | `arrow=43`    |
//! | `arrow-42`    | `arrow=42`    |
//! | `arrow-41`    | `arrow=41`    |
//! | `arrow-40`    | `arrow=40`    |
//! | `arrow-39`    | `arrow=39`    |
//! | `arrow-38`    | `arrow=38`    |
//! | `arrow-37`    | `arrow=37`    |
//! | `arrow2-0-17` | `arrow2=0.17` |
//! | `arrow2-0-16` | `arrow2=0.16` |
//!
//! # Usage in libraries
//!
//! In libraries, it is not recommended to use the `arrow` and `arrow2` functions directly. Rather,
//! it is recommended to rely on the [`marrow`]-based functionality, as the features of [`marrow`]
//! are designed to be strictly additive.
//!
//! For example, to build a record batch, first build the corresponding marrow types and then use
//! them to build the record batch:
//!
//! ```rust
//! # use serde::{Deserialize, Serialize};
//! # fn main() -> serde_arrow::Result<()> {
//! # #[cfg(has_arrow)] {
//! # use serde_arrow::_impl::arrow;
//! # use std::sync::Arc;
//! # use serde_arrow::schema::{SchemaLike, TracingOptions};
//! #
//! # #[derive(Serialize, Deserialize)]
//! # struct Record {
//! #     a: f32,
//! #     b: i32,
//! # }
//! #
//! # let records = vec![
//! #     Record { a: 1.0, b: 1 },
//! #     Record { a: 2.0, b: 2 },
//! #     Record { a: 3.0, b: 3 },
//! # ];
//! #
//! // Determine Arrow schema
//! let fields = Vec::<marrow::datatypes::Field>::from_type::<Record>(TracingOptions::default())?;
//!
//! // Build the marrow arrays
//! let arrays = serde_arrow::to_marrow(&fields, &records)?;
//!
//! // Build the record batch
//! let arrow_fields = fields.iter()
//!     .map(arrow::datatypes::Field::try_from)
//!     .collect::<Result<Vec<_>, _>>()?;
//!
//! let arrow_arrays = arrays.into_iter()
//!     .map(arrow::array::ArrayRef::try_from)
//!     .collect::<Result<Vec<_>, _>>()?;
//!
//! let record_batch = arrow::array::RecordBatch::try_new(
//!     Arc::new(arrow::datatypes::Schema::new(arrow_fields)),
//!     arrow_arrays,
//! );
//! # }
//! # Ok(())
//! # }
//! ```

// be more forgiving without any active implementation
#[cfg_attr(not(any(has_arrow, has_arrow2)), allow(unused))]
mod internal;

/// *Internal. Do not use*
///
/// This module is an internal implementation detail and not subject to any
/// compatibility promises. It re-exports the arrow impls selected via features
/// to allow usage in doc tests or benchmarks.
///
#[rustfmt::skip]
pub mod _impl {

    #[cfg(has_arrow2_0_17)]
    #[doc(hidden)]
    pub use arrow2_0_17 as arrow2;

    #[cfg(has_arrow2_0_16)]
    pub use arrow2_0_16 as arrow2;

    #[allow(unused)]
    macro_rules! build_arrow_crate {
        ($arrow_array:ident, $arrow_schema:ident) => {
            /// A "fake" arrow crate re-exporting the relevant definitions of the
            /// used arrow-* subcrates
            #[doc(hidden)]
            pub mod arrow {
                /// The raw arrow packages
                pub mod _raw {
                    pub use {$arrow_array as array, $arrow_schema as schema};
                }
                pub mod array {
                    pub use $arrow_array::{RecordBatch, array::{Array, ArrayRef}};
                }
                pub mod datatypes {
                    pub use $arrow_schema::{DataType, Field, FieldRef, Schema, TimeUnit};
                }
                pub mod error {
                    pub use $arrow_schema::ArrowError;
                }
            }
        };
    }

    // arrow-version:insert:     #[cfg(has_arrow_{version})] build_arrow_crate!(arrow_array_{version}, arrow_schema_{version});
    #[cfg(has_arrow_56)] build_arrow_crate!(arrow_array_56, arrow_schema_56);
    #[cfg(has_arrow_55)] build_arrow_crate!(arrow_array_55, arrow_schema_55);
    #[cfg(has_arrow_54)] build_arrow_crate!(arrow_array_54, arrow_schema_54);
    #[cfg(has_arrow_53)] build_arrow_crate!(arrow_array_53, arrow_schema_53);
    #[cfg(has_arrow_52)] build_arrow_crate!(arrow_array_52, arrow_schema_52);
    #[cfg(has_arrow_51)] build_arrow_crate!(arrow_array_51, arrow_schema_51);
    #[cfg(has_arrow_50)] build_arrow_crate!(arrow_array_50, arrow_schema_50);
    #[cfg(has_arrow_49)] build_arrow_crate!(arrow_array_49, arrow_schema_49);
    #[cfg(has_arrow_48)] build_arrow_crate!(arrow_array_48, arrow_schema_48);
    #[cfg(has_arrow_47)] build_arrow_crate!(arrow_array_47, arrow_schema_47);
    #[cfg(has_arrow_46)] build_arrow_crate!(arrow_array_46, arrow_schema_46);
    #[cfg(has_arrow_45)] build_arrow_crate!(arrow_array_45, arrow_schema_45);
    #[cfg(has_arrow_44)] build_arrow_crate!(arrow_array_44, arrow_schema_44);
    #[cfg(has_arrow_43)] build_arrow_crate!(arrow_array_43, arrow_schema_43);
    #[cfg(has_arrow_42)] build_arrow_crate!(arrow_array_42, arrow_schema_42);
    #[cfg(has_arrow_41)] build_arrow_crate!(arrow_array_41, arrow_schema_41);
    #[cfg(has_arrow_40)] build_arrow_crate!(arrow_array_40, arrow_schema_40);
    #[cfg(has_arrow_39)] build_arrow_crate!(arrow_array_39, arrow_schema_39);
    #[cfg(has_arrow_38)] build_arrow_crate!(arrow_array_38, arrow_schema_38);
    #[cfg(has_arrow_37)] build_arrow_crate!(arrow_array_37, arrow_schema_37);

    /// Documentation
    pub mod docs {
        #[doc(hidden)]
        pub mod defs;

        pub mod quickstart;

        #[doc = include_str!("../Status.md")]
        #[cfg(not(doctest))]
        pub mod status {}
    }

    // Reexport for tests
    #[doc(hidden)]
    pub use crate::internal::{
        error::{PanicOnError, PanicOnErrorError},
        serialization::array_builder::ArrayBuilder,
    };
}

#[cfg(all(test, has_arrow, has_arrow2))]
mod test_with_arrow;

#[cfg(test)]
mod test;

pub use crate::internal::error::{Error, Result};

pub use crate::internal::deserializer::Deserializer;
pub use crate::internal::serializer::Serializer;

pub use crate::internal::array_builder::ArrayBuilder;

#[cfg(has_arrow)]
mod arrow_impl;

#[cfg(has_arrow)]
pub use arrow_impl::{from_arrow, from_record_batch, to_arrow, to_record_batch};

#[cfg(has_arrow2)]
mod arrow2_impl;

#[cfg(has_arrow2)]
pub use arrow2_impl::{from_arrow2, to_arrow2};

#[deny(missing_docs)]
mod marrow_impl;

pub use marrow_impl::{from_marrow, to_marrow};

#[deny(missing_docs)]
/// Helpers that may be useful when using `serde_arrow`
pub mod utils {
    pub use crate::internal::utils::{Item, Items};
}

#[deny(missing_docs)]
/// Deserialization of items
pub mod deserializer {
    pub use crate::internal::deserializer::{DeserializerItem, DeserializerIterator};
}

/// The mapping between Rust and Arrow types
///
/// To convert between Rust objects and Arrow types, `serde_arrow` requires
/// schema information as a list of Arrow fields with additional metadata. See
/// [`SchemaLike`][crate::schema::SchemaLike] for details on how to specify the
/// schema.
///
/// The default mapping of Rust types to [Arrow types][arrow-types] is as
/// follows:
///
/// [arrow-types]:
///     https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html
///
/// - Unit (`()`): `Null`
/// - Booleans (`bool`): `Boolean`
/// - Integers (`u8`, .., `u64`, `i8`, .., `i64`): `UInt8`, .., `UInt64`,
///   `Int8`, .., `Int64`
/// - Floats (`f32`, `f64`): `Float32`, `Float64`
/// - Strings (`str`, `String`, ..): `LargeUtf8` with i64 offsets
/// - Sequences: `LargeList` with i64 offsets
/// - Structs / Maps / Tuples: `Struct` type
/// - Enums: dense Unions. Each variant is mapped to a separate field. Its type
///   depends on the variant type: Field-less variants are mapped to `Null`.
///   Newtype variants are mapped according to their inner type. Other variant
///   types are mapped to struct types.
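#[doc = r#"
The list above can also be read as a lookup table. The following is a minimal sketch in plain
Rust that restates it as code. It is not how `serde_arrow` traces schemas: the enum `RustKind`
and the function `default_arrow_type` are hypothetical names introduced for illustration, and the
returned strings are informal labels rather than values of the Arrow `DataType` enum.

```rust
/// Simplified model of the Rust constructs in the list above.
///
/// Hypothetical type for illustration only; not part of `serde_arrow`.
enum RustKind {
    Unit,
    Bool,
    /// `u8`, .., `u64` (`signed: false`) and `i8`, .., `i64` (`signed: true`)
    Int { bits: u8, signed: bool },
    /// `f32` and `f64`
    Float { bits: u8 },
    Str,
    Sequence,
    /// Structs, maps, and tuples share the same default
    Struct,
    Enum,
}

/// Informal name of the Arrow type that the list above assigns to `kind`.
///
/// Hypothetical helper for illustration only; not part of `serde_arrow`.
fn default_arrow_type(kind: &RustKind) -> String {
    match kind {
        RustKind::Unit => "Null".to_string(),
        RustKind::Bool => "Boolean".to_string(),
        RustKind::Int { bits, signed: true } => format!("Int{}", bits),
        RustKind::Int { bits, signed: false } => format!("UInt{}", bits),
        RustKind::Float { bits } => format!("Float{}", bits),
        // Strings and sequences default to the variants with i64 offsets
        RustKind::Str => "LargeUtf8".to_string(),
        RustKind::Sequence => "LargeList".to_string(),
        RustKind::Struct => "Struct".to_string(),
        RustKind::Enum => "Union (dense)".to_string(),
    }
}
```

For instance, `default_arrow_type(&RustKind::Int { bits: 64, signed: true })` returns `"Int64"`.
The sketch leaves out nested element, field, and variant types, which are determined recursively
by the same rules.
"#]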
#[deny(missing_docs)]
pub mod schema {
    pub use crate::internal::schema::{
        Overwrites, SchemaLike, SerdeArrowSchema, Strategy, TracingOptions, STRATEGY_KEY,
    };

    /// Support for [canonical extension types][ext-docs]. This module is experimental without semver guarantees.
    ///
    /// [ext-docs]: https://arrow.apache.org/docs/format/CanonicalExtensions.html
    pub mod ext {
        pub use crate::internal::schema::extensions::{
            Bool8Field, FixedShapeTensorField, VariableShapeTensorField,
        };
    }
}

/// Re-export of the used marrow version
pub use marrow;
347
348/// Re-export of the used marrow version
349pub use marrow;