serde_arrow/lib.rs
//! # `serde_arrow` - convert sequences of Rust objects to / from Arrow arrays
//!
//! The Arrow in-memory format is a powerful way to work with data-frame-like structures. However,
//! the API of the underlying Rust crates can at times be cumbersome to use due to the statically
//! typed nature of Rust. `serde_arrow` offers a simple way to convert Rust objects into Arrow
//! arrays and back. `serde_arrow` relies on [Serde](https://serde.rs) to interpret Rust objects.
//! Therefore, adding support for `serde_arrow` to custom types is as easy as using Serde's derive
//! macros.
//!
//! `serde_arrow` mainly targets the [`arrow`](https://github.com/apache/arrow-rs) crate, but also
//! supports the deprecated [`arrow2`](https://github.com/jorgecarleitao/arrow2) crate. The Arrow
//! implementations can be selected via [features](#features).
//!
//! `serde_arrow` relies on a schema to translate between Rust and Arrow, as their type systems do
//! not directly match. The schema is expressed as a collection of Arrow fields with additional
//! metadata describing the arrays. E.g., to convert a vector of Rust strings representing
//! timestamps to an Arrow `Timestamp` array, the schema should contain a field with data type
//! `Timestamp`. `serde_arrow` can derive the schema from the data or from the Rust types
//! themselves via schema tracing, but does not require it. It is always possible to specify the
//! schema manually. See the [`schema` module][schema] and [`SchemaLike`][schema::SchemaLike] for
//! further details.
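//!
//! As a minimal sketch of schema tracing, the schema can be derived either from the Rust type or
//! from sample data. The `Record` type and its values are made up for illustration, and the
//! example assumes that one of the `arrow-*` features is enabled:
//!
//! ```rust
//! # use serde::{Deserialize, Serialize};
//! # #[cfg(has_arrow)]
//! # fn main() -> serde_arrow::Result<()> {
//! # use serde_arrow::_impl::arrow;
//! use arrow::datatypes::FieldRef;
//! use serde_arrow::schema::{SchemaLike, TracingOptions};
//!
//! // Hypothetical record type, used only for this illustration
//! #[derive(Serialize, Deserialize)]
//! struct Record {
//!     a: f32,
//!     b: i32,
//! }
//!
//! let records = vec![Record { a: 1.0, b: 1 }, Record { a: 2.0, b: 2 }];
//!
//! // Trace the schema from the Rust type itself ...
//! let from_type = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;
//!
//! // ... or from sample data
//! let from_samples = Vec::<FieldRef>::from_samples(&records, TracingOptions::default())?;
//!
//! // Both approaches yield one field per struct member
//! assert_eq!(from_type.len(), 2);
//! assert_eq!(from_samples.len(), 2);
//! # Ok(())
//! # }
//! # #[cfg(not(has_arrow))]
//! # fn main() { }
//! ```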
//!
#![cfg_attr(
    all(has_arrow, has_arrow2),
    doc = r#"
| Operation | [`arrow-*`](#features) | [`arrow2-*`](#features) | `marrow` |
|:-----------------|:------------------------------------------------------------------|:----------------------------------------------------|:----------------------------------------------------|
| Rust to Arrow | [`to_record_batch`], [`to_arrow`] | [`to_arrow2`] | [`to_marrow`] |
| Arrow to Rust | [`from_record_batch`], [`from_arrow`] | [`from_arrow2`] | [`from_marrow`] |
| [`ArrayBuilder`] | [`ArrayBuilder::from_arrow`] | [`ArrayBuilder::from_arrow2`] | [`ArrayBuilder::from_marrow`] |
| [`Serializer`] | [`ArrayBuilder::from_arrow`] + [`Serializer::new`] | [`ArrayBuilder::from_arrow2`] + [`Serializer::new`] | [`ArrayBuilder::from_marrow`] + [`Serializer::new`] |
| [`Deserializer`] | [`Deserializer::from_record_batch`], [`Deserializer::from_arrow`] | [`Deserializer::from_arrow2`] | [`Deserializer::from_marrow`] |
"#
)]
//!
//! See also:
//!
//! - the [quickstart guide][_impl::docs::quickstart] for more examples of how to use this package
//! - the [status summary][_impl::docs::status] for an overview of the supported Arrow and Rust
//!   constructs
//!
//! ## Example
//!
//! ```rust
//! # use serde::{Deserialize, Serialize};
//! # #[cfg(has_arrow)]
//! # fn main() -> serde_arrow::Result<()> {
//! # use serde_arrow::_impl::arrow;
//! use arrow::datatypes::FieldRef;
//! use serde_arrow::schema::{SchemaLike, TracingOptions};
//!
//! ##[derive(Serialize, Deserialize)]
//! struct Record {
//!     a: f32,
//!     b: i32,
//! }
//!
//! let records = vec![
//!     Record { a: 1.0, b: 1 },
//!     Record { a: 2.0, b: 2 },
//!     Record { a: 3.0, b: 3 },
//! ];
//!
//! // Determine Arrow schema
//! let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;
//!
//! // Build the record batch
//! let batch = serde_arrow::to_record_batch(&fields, &records)?;
//! # Ok(())
//! # }
//! # #[cfg(not(has_arrow))]
//! # fn main() { }
//! ```
//!
//! The `RecordBatch` can then be written to disk, e.g., as Parquet using the [`ArrowWriter`] from
//! the [`parquet`] crate.
//!
//! [`ArrowWriter`]:
//!     https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html
//! [`parquet`]: https://docs.rs/parquet/latest/parquet/
//!
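//! The reverse direction works analogously. The following is a minimal sketch of a round trip,
//! assuming one of the `arrow-*` features is enabled. It reuses the `Record` type from the example
//! above and additionally derives `Debug` and `PartialEq` so that the result can be compared:
//!
//! ```rust
//! # use serde::{Deserialize, Serialize};
//! # #[cfg(has_arrow)]
//! # fn main() -> serde_arrow::Result<()> {
//! # use serde_arrow::_impl::arrow;
//! use arrow::datatypes::FieldRef;
//! use serde_arrow::schema::{SchemaLike, TracingOptions};
//!
//! #[derive(Debug, PartialEq, Serialize, Deserialize)]
//! struct Record {
//!     a: f32,
//!     b: i32,
//! }
//!
//! let records = vec![Record { a: 1.0, b: 1 }, Record { a: 2.0, b: 2 }];
//!
//! // Rust to Arrow
//! let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;
//! let batch = serde_arrow::to_record_batch(&fields, &records)?;
//!
//! // Arrow to Rust: deserialize the rows of the record batch into Rust objects
//! let round_tripped: Vec<Record> = serde_arrow::from_record_batch(&batch)?;
//! assert_eq!(round_tripped, records);
//! # Ok(())
//! # }
//! # #[cfg(not(has_arrow))]
//! # fn main() { }
//! ```
//!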
//! # Features
//!
//! The version of `arrow` or `arrow2` used can be selected via features. By default, no Arrow
//! implementation is used. In that case, only the base features of `serde_arrow` are available.
//!
//! The `arrow-*` and `arrow2-*` feature groups are compatible with each other, i.e., it is possible
//! to use `arrow` and `arrow2` together. Within each group, the highest version is selected if
//! multiple features are activated. E.g., when selecting `arrow2-0-16` and `arrow2-0-17`,
//! `arrow2=0.17` will be used.
//!
//! Note that because the highest version is selected, the features are not additive. In particular,
//! it is not possible to use `serde_arrow::to_arrow` for multiple different `arrow` versions at the
//! same time. See the next section for how to use `serde_arrow` in library code.
//!
//! Available features:
//!
//! | Arrow Feature | Arrow Version |
//! |---------------|---------------|
// arrow-version:insert: //! | `arrow-{version}`    | `arrow={version}`    |
//! | `arrow-56`    | `arrow=56`    |
//! | `arrow-55`    | `arrow=55`    |
//! | `arrow-54`    | `arrow=54`    |
//! | `arrow-53`    | `arrow=53`    |
//! | `arrow-52`    | `arrow=52`    |
//! | `arrow-51`    | `arrow=51`    |
//! | `arrow-50`    | `arrow=50`    |
//! | `arrow-49`    | `arrow=49`    |
//! | `arrow-48`    | `arrow=48`    |
//! | `arrow-47`    | `arrow=47`    |
//! | `arrow-46`    | `arrow=46`    |
//! | `arrow-45`    | `arrow=45`    |
//! | `arrow-44`    | `arrow=44`    |
//! | `arrow-43`    | `arrow=43`    |
//! | `arrow-42`    | `arrow=42`    |
//! | `arrow-41`    | `arrow=41`    |
//! | `arrow-40`    | `arrow=40`    |
//! | `arrow-39`    | `arrow=39`    |
//! | `arrow-38`    | `arrow=38`    |
//! | `arrow-37`    | `arrow=37`    |
//! | `arrow2-0-17` | `arrow2=0.17` |
//! | `arrow2-0-16` | `arrow2=0.16` |
//!
//! # Usage in libraries
//!
//! In libraries, it is not recommended to use the `arrow` and `arrow2` functions directly. Rather,
//! it is recommended to rely on the [`marrow`]-based functionality, as the features of [`marrow`]
//! are designed to be strictly additive.
//!
//! For example, to build a record batch, first build the corresponding marrow types and then use
//! them to build the record batch:
//!
//! ```rust
//! # use serde::{Deserialize, Serialize};
//! # fn main() -> serde_arrow::Result<()> {
//! # #[cfg(has_arrow)] {
//! # use serde_arrow::_impl::arrow;
//! # use std::sync::Arc;
//! # use serde_arrow::schema::{SchemaLike, TracingOptions};
//! #
//! # #[derive(Serialize, Deserialize)]
//! # struct Record {
//! #     a: f32,
//! #     b: i32,
//! # }
//! #
//! # let records = vec![
//! #     Record { a: 1.0, b: 1 },
//! #     Record { a: 2.0, b: 2 },
//! #     Record { a: 3.0, b: 3 },
//! # ];
//! #
//! // Determine Arrow schema
//! let fields = Vec::<marrow::datatypes::Field>::from_type::<Record>(TracingOptions::default())?;
//!
//! // Build the marrow arrays
//! let arrays = serde_arrow::to_marrow(&fields, &records)?;
//!
//! // Build the record batch
//! let arrow_fields = fields.iter()
//!     .map(arrow::datatypes::Field::try_from)
//!     .collect::<Result<Vec<_>, _>>()?;
//!
//! let arrow_arrays = arrays.into_iter()
//!     .map(arrow::array::ArrayRef::try_from)
//!     .collect::<Result<Vec<_>, _>>()?;
//!
//! let record_batch = arrow::array::RecordBatch::try_new(
//!     Arc::new(arrow::datatypes::Schema::new(arrow_fields)),
//!     arrow_arrays,
//! );
//! # }
//! # Ok(())
//! # }
//! ```

// be more forgiving without any active implementation
#[cfg_attr(not(any(has_arrow, has_arrow2)), allow(unused))]
mod internal;

/// *Internal. Do not use*
///
/// This module is an internal implementation detail and not subject to any
/// compatibility promises. It re-exports the arrow impls selected via features
/// to allow usage in doc tests or benchmarks.
///
#[rustfmt::skip]
pub mod _impl {

    #[cfg(has_arrow2_0_17)]
    #[doc(hidden)]
    pub use arrow2_0_17 as arrow2;

    #[cfg(has_arrow2_0_16)]
    pub use arrow2_0_16 as arrow2;

    #[allow(unused)]
    macro_rules! build_arrow_crate {
        ($arrow_array:ident, $arrow_schema:ident) => {
            /// A "fake" arrow crate re-exporting the relevant definitions of the
            /// used arrow-* subcrates
            #[doc(hidden)]
            pub mod arrow {
                /// The raw arrow packages
                pub mod _raw {
                    pub use {$arrow_array as array, $arrow_schema as schema};
                }
                pub mod array {
                    pub use $arrow_array::{RecordBatch, array::{Array, ArrayRef}};
                }
                pub mod datatypes {
                    pub use $arrow_schema::{DataType, Field, FieldRef, Schema, TimeUnit};
                }
                pub mod error {
                    pub use $arrow_schema::ArrowError;
                }
            }
        };
    }

    // arrow-version:insert:     #[cfg(has_arrow_{version})] build_arrow_crate!(arrow_array_{version}, arrow_schema_{version});
    #[cfg(has_arrow_56)] build_arrow_crate!(arrow_array_56, arrow_schema_56);
    #[cfg(has_arrow_55)] build_arrow_crate!(arrow_array_55, arrow_schema_55);
    #[cfg(has_arrow_54)] build_arrow_crate!(arrow_array_54, arrow_schema_54);
    #[cfg(has_arrow_53)] build_arrow_crate!(arrow_array_53, arrow_schema_53);
    #[cfg(has_arrow_52)] build_arrow_crate!(arrow_array_52, arrow_schema_52);
    #[cfg(has_arrow_51)] build_arrow_crate!(arrow_array_51, arrow_schema_51);
    #[cfg(has_arrow_50)] build_arrow_crate!(arrow_array_50, arrow_schema_50);
    #[cfg(has_arrow_49)] build_arrow_crate!(arrow_array_49, arrow_schema_49);
    #[cfg(has_arrow_48)] build_arrow_crate!(arrow_array_48, arrow_schema_48);
    #[cfg(has_arrow_47)] build_arrow_crate!(arrow_array_47, arrow_schema_47);
    #[cfg(has_arrow_46)] build_arrow_crate!(arrow_array_46, arrow_schema_46);
    #[cfg(has_arrow_45)] build_arrow_crate!(arrow_array_45, arrow_schema_45);
    #[cfg(has_arrow_44)] build_arrow_crate!(arrow_array_44, arrow_schema_44);
    #[cfg(has_arrow_43)] build_arrow_crate!(arrow_array_43, arrow_schema_43);
    #[cfg(has_arrow_42)] build_arrow_crate!(arrow_array_42, arrow_schema_42);
    #[cfg(has_arrow_41)] build_arrow_crate!(arrow_array_41, arrow_schema_41);
    #[cfg(has_arrow_40)] build_arrow_crate!(arrow_array_40, arrow_schema_40);
    #[cfg(has_arrow_39)] build_arrow_crate!(arrow_array_39, arrow_schema_39);
    #[cfg(has_arrow_38)] build_arrow_crate!(arrow_array_38, arrow_schema_38);
    #[cfg(has_arrow_37)] build_arrow_crate!(arrow_array_37, arrow_schema_37);

    /// Documentation
    pub mod docs {
        #[doc(hidden)]
        pub mod defs;

        pub mod quickstart;

        #[doc = include_str!("../Status.md")]
        #[cfg(not(doctest))]
        pub mod status {}
    }

    // Reexport for tests
    #[doc(hidden)]
    pub use crate::internal::{
        error::{PanicOnError, PanicOnErrorError},
        serialization::array_builder::ArrayBuilder,
    };
}

#[cfg(all(test, has_arrow, has_arrow2))]
mod test_with_arrow;

#[cfg(test)]
mod test;

pub use crate::internal::error::{Error, Result};

pub use crate::internal::deserializer::Deserializer;
pub use crate::internal::serializer::Serializer;

pub use crate::internal::array_builder::ArrayBuilder;

#[cfg(has_arrow)]
mod arrow_impl;

#[cfg(has_arrow)]
pub use arrow_impl::{from_arrow, from_record_batch, to_arrow, to_record_batch};

#[cfg(has_arrow2)]
mod arrow2_impl;

#[cfg(has_arrow2)]
pub use arrow2_impl::{from_arrow2, to_arrow2};

#[deny(missing_docs)]
mod marrow_impl;

pub use marrow_impl::{from_marrow, to_marrow};

#[deny(missing_docs)]
/// Helpers that may be useful when using `serde_arrow`
pub mod utils {
    pub use crate::internal::utils::{Item, Items};
}

#[deny(missing_docs)]
/// Deserialization of items
pub mod deserializer {
    pub use crate::internal::deserializer::{DeserializerItem, DeserializerIterator};
}

/// The mapping between Rust and Arrow types
///
/// To convert between Rust objects and Arrow types, `serde_arrow` requires
/// schema information as a list of Arrow fields with additional metadata. See
/// [`SchemaLike`][crate::schema::SchemaLike] for details on how to specify the
/// schema.
///
/// The default mapping of Rust types to [Arrow types][arrow-types] is as
/// follows:
///
/// [arrow-types]:
///     https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html
///
/// - Unit (`()`): `Null`
/// - Booleans (`bool`): `Boolean`
/// - Integers (`u8`, .., `u64`, `i8`, .., `i64`): `UInt8`, .., `UInt64`,
///   `Int8`, .., `Int64`
/// - Floats (`f32`, `f64`): `Float32`, `Float64`
/// - Strings (`str`, `String`, ..): `LargeUtf8` with i64 offsets
/// - Sequences: `LargeList` with i64 offsets
/// - Structs / Maps / Tuples: `Struct` type
/// - Enums: dense Unions. Each variant is mapped to a separate field. Its type
///   depends on the kind of variant: Field-less variants are mapped to `Null`.
///   Newtype variants are mapped according to their inner type. Other variant
///   types are mapped to struct types.
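///
/// As a minimal sketch, the default mapping can be inspected by tracing a
/// type and looking at the resulting fields. The `Record` type is made up for
/// illustration, and the example assumes that the `marrow` field type exposes
/// its `data_type` as a public member:
///
/// ```rust
/// # fn main() -> serde_arrow::Result<()> {
/// use serde::{Deserialize, Serialize};
/// use serde_arrow::marrow::datatypes::{DataType, Field};
/// use serde_arrow::schema::{SchemaLike, TracingOptions};
///
/// // Hypothetical record type, used only for this illustration
/// #[derive(Serialize, Deserialize)]
/// struct Record {
///     flag: bool,
///     value: f64,
///     name: String,
/// }
///
/// let fields = Vec::<Field>::from_type::<Record>(TracingOptions::default())?;
///
/// // One field per struct member, typed according to the list above
/// assert_eq!(fields.len(), 3);
/// assert!(matches!(fields[0].data_type, DataType::Boolean));
/// assert!(matches!(fields[1].data_type, DataType::Float64));
/// assert!(matches!(fields[2].data_type, DataType::LargeUtf8));
/// # Ok(())
/// # }
/// ```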
#[deny(missing_docs)]
pub mod schema {
    pub use crate::internal::schema::{
        Overwrites, SchemaLike, SerdeArrowSchema, Strategy, TracingOptions, STRATEGY_KEY,
    };

    /// Support for [canonical extension types][ext-docs]. This module is experimental without semver guarantees.
    ///
    /// [ext-docs]: https://arrow.apache.org/docs/format/CanonicalExtensions.html
    pub mod ext {
        pub use crate::internal::schema::extensions::{
            Bool8Field, FixedShapeTensorField, VariableShapeTensorField,
        };
    }
}

/// Re-export of the used marrow version
pub use marrow;