/*!
This crate provides tools to deserialize structs from XML; most notably, it provides a [derive macro][derive@DeserializeXml] to automate that process (by implementing [`DeserializeXml`] for you).
**Note:** the implementation is highly limited and inelegant. I wrote this purely to help
power a feed reader I'm working on as a personal project; don't expect anything
"production-ready..." (See the [caveats](#caveats) below.)
# Examples
## Basic
Here's how you could use this crate to easily parse a very simple XML structure:
```
use deserialize_xml::DeserializeXml;
#[derive(Default, Debug, DeserializeXml)]
struct StringOnly {
title: String,
author: String,
}
let input = "<stringonly><title>Title</title><author>Author</author></stringonly>";
// `from_str` here was provided by `#[derive(DeserializeXml)]` above
let result = StringOnly::from_str(input).unwrap();
assert_eq!(result.title, "Title");
assert_eq!(result.author, "Author");
```
## Advanced
This example shows more advanced functionality:
```
use deserialize_xml::DeserializeXml;
#[derive(Default, Debug, DeserializeXml)]
// This attribute indicates we should parse this struct upon encountering an <item> tag
#[deserialize_xml(tag = "item")]
struct StringOnly {
title: String,
author: String,
}
#[derive(Default, Debug, DeserializeXml)]
struct Channel {
title: String,
// This allows us to use an idiomatic name for the
// struct member instead of the raw tag name
#[deserialize_xml(tag = "lastUpdated")]
last_updated: String,
ttl: u32,
// (unfortunately, we need to repeat `tag = "item"` here for now)
#[deserialize_xml(tag = "item")]
entries: Vec<StringOnly>,
}
let input = r#"<channel>
<title>test channel please ignore</title>
<lastUpdated>2022-09-22</lastUpdated>
<ttl>3600</ttl>
<item><title>Article 1</title><author>Guy</author></item>
<item><title>Article 2</title><author>Dudette</author></item>
</channel>"#;
let result = Channel::from_str(input).unwrap();
assert_eq!(result.title, "test channel please ignore");
assert_eq!(result.last_updated, "2022-09-22");
assert_eq!(result.ttl, 3600);
assert_eq!(result.entries.len(), 2);
assert_eq!(result.entries[0].title, "Article 1");
assert_eq!(result.entries[0].author, "Guy");
assert_eq!(result.entries[1].title, "Article 2");
assert_eq!(result.entries[1].author, "Dudette");
```
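
The two examples above rely on a default tag name whenever no `tag` attribute is given. Judging only from those examples, a struct is matched against its lowercased name (`StringOnly` is parsed from `<stringonly>`), and a field is matched against its own name unless `tag` overrides it. The following is a minimal, standard-library-only sketch of that rule; `struct_tag` and `field_tag` are hypothetical helpers that are not part of this crate, and the lowercasing rule is an assumption inferred from the examples, not taken from the macro's source.

```rust
/// Hypothetical helper (not part of this crate): the tag a derived struct
/// parser looks for. Assumption inferred from the examples: the default is
/// the struct name lowercased; an explicit `tag` attribute replaces it.
fn struct_tag(struct_name: &str, tag_attr: Option<&str>) -> String {
    match tag_attr {
        Some(tag) => tag.to_string(),
        None => struct_name.to_lowercase(),
    }
}

/// Hypothetical helper (not part of this crate): the tag a struct field is
/// matched against. The default is the field name; `tag` overrides it.
fn field_tag(field_name: &str, tag_attr: Option<&str>) -> String {
    tag_attr.unwrap_or(field_name).to_string()
}
```

Under these assumptions, `struct_tag("StringOnly", Some("item"))` and `field_tag("entries", Some("item"))` both yield `item`, which is why the attribute has to be repeated on the field and on the struct in the Advanced example.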
# Caveats
- The support for `Vec<T>`/`Option<T>` is _very_ limited at the moment. Namely, the macro performs a
_textual_ check to see if the member type is, e.g., `Vec<T>`; if so, it creates an empty vec and
pushes the results of [`DeserializeXml::from_reader`] for the inner type (`T`) when it encounters
the matching tag. Note the emphasis on _textual_ check: the macro will fail if you "spell" `Vec<T>`
differently (e.g., by aliasing it), or use your own container type. (The same limitations apply for
`Option<T>`.)
- The macro only supports structs.
- An implementation of [`DeserializeXml`] is provided for `String`s and numeric
types (e.g., `u8`, `i8`, ...). To add support for your own type, see [this
section](#implementing-deserializexml-for-your-own-struct).
- Struct fields of type `Option<T>`, where `T` is also a struct to which
`#[derive(DeserializeXml)]` has been applied, are seemingly skipped during parsing unless the `tag`
attribute is set correctly. (This might also arise in other edge cases, but this one is
instructive.) This is easiest to illustrate with an example:
```
use deserialize_xml::DeserializeXml;
#[derive(Default, Debug, DeserializeXml)]
struct Post {
title: String,
// The inner type has a weird name, but the generated parser uses the field name
// by default, so it will look for <attachment> tags--all good, or so you think...
attachment: Option<WeirdName>,
}
#[derive(Default, Debug, DeserializeXml)]
#[deserialize_xml(tag = "attachment")] // (*) - necessary!
struct WeirdName {
path: String,
mime_type: String,
}
let input = r#"<post>
<title>A Modest Proposal</title>
<attachment>
<path>./proposal_banner.jpg</path>
<mime_type>image/jpeg</mime_type>
</attachment>
</post>"#;
// So far, this looks like a very standard example...
let result = Post::from_str(input).unwrap();
assert_eq!(result.title, "A Modest Proposal");
// ...but without the line marked (*) above, result.attachment is None!
let attachment = result.attachment.unwrap();
assert_eq!(attachment.path, "./proposal_banner.jpg");
assert_eq!(attachment.mime_type, "image/jpeg");
```
Without line `(*)`, what goes wrong? [`Post::from_reader`][DeserializeXml::from_reader] (which is
called by [`Post::from_str`][DeserializeXml::from_str]) will look for `<attachment>` tags and
dutifully call [`WeirdName::from_reader`][DeserializeXml::from_reader] when it sees one. However,
[`WeirdName::from_reader`][DeserializeXml::from_reader] has no knowledge that someone else is
referring to it as `attachment`, so the body of that implementation assumes it should only parse
`<weirdname>` tags. Since it won't find any, we won't parse our `<attachment>`. By adding the
`#[deserialize_xml(tag = "attachment")]` attribute to `WeirdName`, we ensure that the implementation
of [`WeirdName::from_reader`][DeserializeXml::from_reader] looks for `<attachment>` tags instead
of `<weirdname>` tags. Unfortunately, at the moment there is no convenient way to associate
`WeirdName` with multiple tags.
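
To make the first caveat above (the _textual_ check for `Vec<T>`/`Option<T>`) more concrete, here is a minimal sketch of what a purely textual container check can look like and why an alias defeats it. It is an illustration only: `textual_inner_type` is a hypothetical function that works on the field type exactly as it is spelled in the source, and the derive macro's actual check may differ in its details.

```rust
/// Hypothetical illustration (not part of this crate): given a field type as
/// written in the source, return the inner type if the spelling is literally
/// `container<...>`. No type resolution happens, so aliases are not seen.
fn textual_inner_type<'a>(written_type: &'a str, container: &str) -> Option<&'a str> {
    // The spelling must start with the container name itself...
    let rest = written_type.trim().strip_prefix(container)?;
    // ...immediately followed by a single pair of angle brackets.
    let inner = rest.trim_start().strip_prefix('<')?.strip_suffix('>')?;
    Some(inner.trim())
}
```

In this sketch, `textual_inner_type("Vec<StringOnly>", "Vec")` finds `StringOnly`, while a field declared with an alias such as `type Entries = Vec<StringOnly>;` is spelled `Entries` and yields `None`, even though both name the same type.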
# Implementing `DeserializeXml` for your own struct
Of course, you can implement [`DeserializeXml`] yourself from scratch, but doing so tends to
involve frequently repeating some boilerplate XML parser manipulation code. Instead, see the
documentation and implementation of [`impl_deserialize_xml_helper`] for a more ergonomic way of
handling the common case.
*/
/// Derive macro to automatically implement [`DeserializeXml`] for structs.
///
/// See the [crate documentation][crate] for more information and examples.
pub use DeserializeXml;
pub use Read;
pub use Peekable;
pub use xml;
pub type Result<T> = Result;
/// Helper macro to minimize boilerplate in custom [`DeserializeXml`] implementations.
///
/// As a motivating example, consider the task of parsing the date from a tag of the form
/// `<date>1918-11-11T11:00:00+01:00</date>`. To do so, one could create a type and implement
/// [`DeserializeXml`] for it from scratch, but doing so involves dealing with some uninteresting
/// XML details (e.g., pop the start tag from the reader, ensure that the next tag is a [Characters
/// event](`xml::reader::XmlEvent::Characters`), extract the actual contents from that event,
/// etc.). Conceptually, one would rather ignore those complications and instead provide a
/// function that parses the string `1918-11-11T11:00:00+01:00` to the appropriate type. This macro
/// provides such an interface; it handles all necessary XML manipulation and calls the
/// user-provided logic to produce a value from the tag contents. The result is an implementation
/// of [`DeserializeXml`] for the specified type. This macro takes three arguments:
///
/// 1. `type`: the type for which [`DeserializeXml`] should be implemented.
///
/// 2. `tag_contents_ident`: the identifier to be used for the variable that represents the tag contents.
/// **Note:** this is only required due to Rust's hygiene requirement for macros; if in doubt, just
/// provide `tag_contents` for this argument.
///
/// 3. `body`: a block that produces a [`Result<type>`](crate::Result), where `type` is what was
/// provided as the first argument. A variable which holds the tag contents as a `String` is
/// available for use in this block; its name will be whatever value you provided for
/// `tag_contents_ident`. Note that a blanket error conversion implementation, `impl<T:
/// std::error::Error> From<T> for deserialize_xml::Error`, is provided, so in many cases
/// using the `?` operator on any intermediate errors will propagate them correctly.
///
/// ## Example
///
/// Here's an example of how we can use this macro to support parsing dates:
/// ```
/// use deserialize_xml::{DeserializeXml, impl_deserialize_xml_helper};
/// use chrono::prelude::*;
///
/// // See Caveats section for why this outer struct is necessary
/// #[derive(Default, Debug, DeserializeXml)]
/// #[deserialize_xml(tag="outer")]
/// struct CustomImplHelperOuter {
/// #[deserialize_xml(tag="inner")]
/// dt: CustomImplHelperInner,
/// }
///
/// #[derive(Default, Debug)]
/// struct CustomImplHelperInner(DateTime<Utc>);
///
/// impl_deserialize_xml_helper!(
/// CustomImplHelperInner, /* type */
/// tag_contents, /* tag_contents_ident */
/// { /* body */
/// // Note: variable `tag_contents` is available here because
/// // that is what was passed for the second argument
/// let dt = tag_contents.parse::<DateTime<Utc>>()?;
/// Ok(CustomImplHelperInner(dt))
/// // Notice that our logic was entirely XML-agnostic!
/// });
///
/// let str_input = "<outer><inner>1918-11-11T11:00:00+01:00</inner></outer>";
/// // CustomImplHelperOuter::from_str -> generated by derive macro; calls the below
/// // CustomImplHelperInner::from_reader -> generated by `impl_deserialize_xml_helper`
/// let result = CustomImplHelperOuter::from_str(str_input).unwrap();
/// assert_eq!(result.dt.0.year(), 1918);
/// assert_eq!(result.dt.0.month(), 11);
/// assert_eq!(result.dt.0.day(), 11);
/// assert_eq!(result.dt.0.hour(), 10);
/// ```
///
/// ## Caveats
///
/// - This macro assumes that the implementation it generates will be called from _within_ an
/// implementation of [`DeserializeXml`] generated by the [derive macro](derive@DeserializeXml)
/// also available from this crate. In other words, the implementation generated by
/// [`impl_deserialize_xml_helper`] can't handle parsing a complete XML document itself; it can
/// only parse the XML fragment associated with the type, and it depends on some other source
/// telling it when to start. This is a somewhat artificial constraint that could probably be
/// removed; however, my guess is that the common case is wanting to parse a large struct while
/// possibly providing custom parsers for some of that struct's fields, so I hope this won't be
/// too cumbersome in practice.
// Implementations for some fundamental types
impl_deserialize_xml_helper!;
// Code generating code generating code...
generate_numeric_impl!;
generate_numeric_impl!;
generate_numeric_impl!;
generate_numeric_impl!;
generate_numeric_impl!;
generate_numeric_impl!;
generate_numeric_impl!;
generate_numeric_impl!;
generate_numeric_impl!;
generate_numeric_impl!;
generate_numeric_impl!;
generate_numeric_impl!;