deserialize_xml 0.2.1

Facilitates parsing structs from XML, particularly via a derive macro to automate the implementation
/*!
This crate provides tools to deserialize structs from XML; most notably, it provides a [derive macro][derive@DeserializeXml] to automate that process (by implementing [`DeserializeXml`] for you).

**Note:** the implementation is highly limited and inelegant. I wrote this purely to help
power a feed reader I'm working on as a personal project; don't expect anything
"production-ready..." (See the [caveats](#caveats) below.)

# Examples

## Basic

Here's how you could use this crate to easily parse a very simple XML structure:

```
use deserialize_xml::DeserializeXml;

#[derive(Default, Debug, DeserializeXml)]
struct StringOnly {
    title: String,
    author: String,
}

let input = "<stringonly><title>Title</title><author>Author</author></stringonly>";
// `from_str` comes with the `DeserializeXml` impl generated by `#[derive(DeserializeXml)]` above
let result = StringOnly::from_str(input).unwrap();
assert_eq!(result.title, "Title");
assert_eq!(result.author, "Author");
```
## Advanced

This example shows more advanced functionality:

```
use deserialize_xml::DeserializeXml;

#[derive(Default, Debug, DeserializeXml)]
// This attribute indicates we should parse this struct upon encountering an <item> tag
#[deserialize_xml(tag = "item")]
struct StringOnly {
    title: String,
    author: String,
}

#[derive(Default, Debug, DeserializeXml)]
struct Channel {
    title: String,
    // This allows us to use an idiomatic name for the
    // struct member instead of the raw tag name
    #[deserialize_xml(tag = "lastUpdated")]
    last_updated: String,
    ttl: u32,
    // (unfortunately, we need to repeat `tag = "item"` here for now)
    #[deserialize_xml(tag = "item")]
    entries: Vec<StringOnly>,
}

let input = r#"<channel>
      <title>test channel please ignore</title>
      <lastUpdated>2022-09-22</lastUpdated>
      <ttl>3600</ttl>
      <item><title>Article 1</title><author>Guy</author></item>
      <item><title>Article 2</title><author>Dudette</author></item>
    </channel>"#;

let result = Channel::from_str(input).unwrap();
assert_eq!(result.title, "test channel please ignore");
assert_eq!(result.last_updated, "2022-09-22");
assert_eq!(result.ttl, 3600);
assert_eq!(result.entries.len(), 2);
assert_eq!(result.entries[0].title, "Article 1");
assert_eq!(result.entries[0].author, "Guy");
assert_eq!(result.entries[1].title, "Article 2");
assert_eq!(result.entries[1].author, "Dudette");
```
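
To make the field-to-tag routing in this example more concrete, here is a minimal std-only sketch. It is *not* the code the derive macro generates: the names `ChannelModel` and `dispatch` are hypothetical, child elements are pre-flattened into `(tag, text)` pairs (the real implementation works on `xml-rs` events and parses each `<item>` into a `StringOnly`), and skipping unrecognized tags is an assumption of the sketch.

```rust
// Hypothetical illustration only: models how each child tag of <channel> is
// routed to a field. This is not the derive macro's actual output.
#[derive(Default, Debug)]
struct ChannelModel {
    title: String,
    last_updated: String,
    ttl: u32,
    // Stands in for `entries: Vec<StringOnly>`; each <item> is kept as raw text.
    entries: Vec<String>,
}

fn dispatch(children: &[(&str, &str)]) -> Result<ChannelModel, std::num::ParseIntError> {
    let mut channel = ChannelModel::default();
    for &(tag, text) in children {
        match tag {
            "title" => channel.title = text.to_string(),
            // The field is `last_updated` but the tag is `lastUpdated`, which is
            // why the real example needs a `tag` attribute on that field.
            "lastUpdated" => channel.last_updated = text.to_string(),
            // Scalar fields are parsed from the tag contents.
            "ttl" => channel.ttl = text.parse()?,
            // A `Vec` field gets one push per matching element; it is matched on
            // `tag = "item"`, not on the field name `entries`.
            "item" => channel.entries.push(text.to_string()),
            // Assumption of this sketch: unknown tags are skipped.
            _ => {}
        }
    }
    Ok(channel)
}
```

For instance, `dispatch(&[("ttl", "3600"), ("item", "a"), ("item", "b")])` yields a `ttl` of `3600` and two entries, `"a"` and `"b"`.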

# Caveats

- The support for `Vec<T>`/`Option<T>` is _very_ limited at the moment. Namely, the macro performs a
_textual_ check to see if the member type is, e.g., `Vec<T>`; if so, it creates an empty vec and
pushes the results of [`DeserializeXml::from_reader`] for the inner type (`T`) when it encounters
the matching tag. Note the emphasis on _textual_ check: the macro will fail if you "spell" `Vec<T>`
differently (e.g., by aliasing it), or use your own container type. (The same limitations apply for
`Option<T>`.)

- The macro only supports structs.

- An implementation of [`DeserializeXml`] is provided for `String` and the primitive integer
types (`u8` through `u128`, `i8` through `i128`, `usize`, and `isize`); floating-point types are
not covered. To add support for your own type, see [this
section](#implementing-deserializexml-for-your-own-struct).

- Struct fields of type `Option<T>`, where `T` is also a struct to which
`#[derive(DeserializeXml)]` has been applied, are seemingly skipped during parsing unless `T`'s
`tag` attribute matches the tag used for the field. (This might also arise in other edge cases,
but this one is instructive.) This is easiest to illustrate with an example:

```
use deserialize_xml::DeserializeXml;

#[derive(Default, Debug, DeserializeXml)]
struct Post {
    title: String,
    // The inner type has a weird name, but the generated parser uses the field name
    // by default, so it will look for <attachment> tags--all good, or so you think...
    attachment: Option<WeirdName>,
}

#[derive(Default, Debug, DeserializeXml)]
#[deserialize_xml(tag = "attachment")] // (*) - necessary!
struct WeirdName {
    path: String,
    mime_type: String,
}

let input = r#"<post>
      <title>A Modest Proposal</title>
      <attachment>
        <path>./proposal_banner.jpg</path>
        <mime_type>image/jpeg</mime_type>
      </attachment>
    </post>"#;

// So far, this looks like a very standard example...
let result = Post::from_str(input).unwrap();
assert_eq!(result.title, "A Modest Proposal");
// ...but without the line marked (*) above, result.attachment is None!
let attachment = result.attachment.unwrap();
assert_eq!(attachment.path, "./proposal_banner.jpg");
assert_eq!(attachment.mime_type, "image/jpeg");
```

Without line `(*)`, what goes wrong? [`Post::from_reader`][DeserializeXml::from_reader] (which is
called by [`Post::from_str`][DeserializeXml::from_str]) will look for `<attachment>` tags and
dutifully call [`WeirdName::from_reader`][DeserializeXml::from_reader] when it sees one. However,
[`WeirdName::from_reader`][DeserializeXml::from_reader] has no knowledge that someone else is
referring to it as `attachment`, so the body of that implementation assumes it should only parse
`<weirdname>` tags. Since it never finds one, our `<attachment>` is never parsed. By adding the
`#[deserialize_xml(tag = "attachment")]` attribute to `WeirdName`, we ensure that the implementation
of [`WeirdName::from_reader`][DeserializeXml::from_reader] instead looks for `<attachment>` tags,
not `<weirdname>` tags. Unfortunately, at the moment there is no convenient way to associate
`WeirdName` with multiple tags.
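
The mismatch above comes down to two independent decisions, which the following std-only sketch models. It is an illustration under assumptions, not the macro's code: the helper names are hypothetical, and it assumes (as the examples in this documentation suggest) that a field defaults to its own name as the tag, while a derived type defaults to its lowercased type name.

```rust
// Hypothetical model of tag resolution; not the derive macro's real code.

/// Tag the parent struct looks for when filling a field: the field's
/// `tag` attribute if present, otherwise the field name.
fn field_tag(field_name: &str, field_tag_attr: Option<&str>) -> String {
    match field_tag_attr {
        Some(tag) => tag.to_string(),
        None => field_name.to_string(),
    }
}

/// Tag a derived type accepts in its own `from_reader`: the type's `tag`
/// attribute if present, otherwise the lowercased type name.
fn type_tag(type_name: &str, type_tag_attr: Option<&str>) -> String {
    match type_tag_attr {
        Some(tag) => tag.to_string(),
        None => type_name.to_lowercase(),
    }
}

/// The nested value is only parsed when both decisions agree.
fn nested_value_is_parsed(
    field_name: &str,
    field_tag_attr: Option<&str>,
    type_name: &str,
    type_tag_attr: Option<&str>,
) -> bool {
    field_tag(field_name, field_tag_attr) == type_tag(type_name, type_tag_attr)
}
```

Plugging in the `Post`/`WeirdName` example: without line `(*)` the comparison is `"attachment"` against `"weirdname"`; with it, both sides are `"attachment"`. The same model explains why the advanced example repeats `tag = "item"` on both `entries` and `StringOnly`.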

# Implementing `DeserializeXml` for your own struct

Of course, you can implement [`DeserializeXml`] yourself from scratch, but doing so tends to
involve frequently repeating some boilerplate XML parser manipulation code. Instead, see the
documentation and implementation of [`impl_deserialize_xml_helper`] for a more ergonomic way of
handling the common case.
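
For intuition about what that boilerplate is, here is a std-only sketch of the pattern [`impl_deserialize_xml_helper`] wraps around user logic: consume the start tag, take the character data, hand it to a parsing closure, then consume the end tag. The `Event` enum and `parse_leaf` function are hypothetical stand-ins (the real macro matches on `xml::reader::XmlEvent` and returns [`Result`](crate::Result)), so treat this as a model rather than the actual implementation.

```rust
// Hypothetical, std-only model of the helper's bookkeeping. `Event` stands in
// for `xml::reader::XmlEvent`; it is not part of this crate.
#[derive(Debug)]
enum Event {
    Start(String),
    Text(String),
    End(String),
}

/// Consumes exactly one start event, one text event, and one end event,
/// passing the text to the XML-agnostic `parse` closure in between.
fn parse_leaf<T, I, F>(events: &mut I, parse: F) -> Result<T, String>
where
    I: Iterator<Item = Event>,
    F: FnOnce(String) -> Result<T, String>,
{
    match events.next() {
        Some(Event::Start(_)) => {}
        other => return Err(format!("expected start tag, but saw {:?}", other)),
    }
    let contents = match events.next() {
        Some(Event::Text(s)) => s,
        other => return Err(format!("expected character data, but saw {:?}", other)),
    };
    // Run the user's logic before checking the end tag, as the real macro does.
    let result = parse(contents);
    match events.next() {
        Some(Event::End(_)) => result,
        other => Err(format!("expected end tag, but saw {:?}", other)),
    }
}
```

Note that, in this model, an element with no character data (such as `<title></title>`) is rejected, because no `Text` event follows the start tag.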
*/

/// Derive macro to automatically implement [`DeserializeXml`] for structs.
///
/// See the [crate documentation][crate] for more information and examples.
pub use ::deserialize_xml_derive::DeserializeXml;
#[doc(hidden)]
pub use ::std::io::Read;
#[doc(hidden)]
pub use ::std::iter::Peekable;
#[doc(hidden)]
pub use ::xml;

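/// The error type for XML deserialization failures.
#[derive(Debug)]
pub struct Error {
    message: String,
}

/// Shorthand for [`std::result::Result`] with this crate's [`Error`] as the error type.
pub type Result<T> = ::std::result::Result<T, Error>;

impl Error {
    /// Creates an error carrying the given message.
    pub fn new(message: String) -> Self {
        Error { message }
    }
}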

impl ::std::fmt::Display for Error {
    fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
        write!(f, "error occurred while parsing XML: {}", &self.message)
    }
}

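// Blanket conversion so that `?` works on standard errors (e.g. `ParseIntError` or
// `xml::reader::Error`). Only the underlying error's text is stored, because the
// `Display` impl above already adds the "error occurred while parsing XML" prefix.
impl<T: std::error::Error> From<T> for Error {
    fn from(err: T) -> Self {
        Error {
            message: err.to_string(),
        }
    }
}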

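/// Types that can be constructed from a stream of XML events.
///
/// Usually implemented via the [derive macro](derive@DeserializeXml) or
/// [`impl_deserialize_xml_helper`] rather than by hand.
pub trait DeserializeXml: Default {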
    /// The beating heart of this trait. Implementations are expected to maintain the
    /// following invariant: when [`DeserializeXml::from_reader`] is called, the
    /// implementation will consume the next event from the reader (which will be a
    /// [`StartElement`](xml::reader::XmlEvent::StartElement) event), as well as all events up
    /// to and including the corresponding [`EndElement`](xml::reader::XmlEvent::EndElement)
    /// event. The implementation should use the data from those events to construct and
    /// return a value of the type for which this trait is being implemented.
    ///
    /// Note: technically the invariant doesn't hold for the initial call, which might need
    /// to consume some introductory events (e.g.,
    /// [`StartDocument`](xml::reader::XmlEvent::StartDocument)) before parsing in earnest.
    /// That should be of no concern, though, since the provided derive macro is the intended
    /// entry point, and it already handles this.
    fn from_reader<R: Read>(reader: &mut Peekable<::xml::reader::Events<R>>) -> Result<Self>;
    /// A convenience function: builds a reader over `s` (trimming whitespace and reporting
    /// CDATA sections as character data) and passes it to [`DeserializeXml::from_reader`].
    fn from_str(s: &str) -> Result<Self> {
        let config = ::xml::reader::ParserConfig::new()
            .trim_whitespace(true)
            .cdata_to_characters(true);
        let mut reader = ::xml::reader::EventReader::new_with_config(s.as_bytes(), config)
            .into_iter()
            .peekable();
        Self::from_reader(&mut reader)
    }
}

/// Helper macro to minimize boilerplate in custom [`DeserializeXml`] implementations.
///
/// As a motivating example, consider the task of parsing the date from a tag of the form
/// `<date>1918-11-11T11:00:00+01:00</date>`. To do so, one could create a type and implement
/// [`DeserializeXml`] for it from scratch, but doing so involves dealing with some uninteresting
/// XML details (e.g., pop the start tag from the reader, ensure that the next event is a
/// [Characters event](xml::reader::XmlEvent::Characters), extract the actual contents from that
/// event, etc.). Conceptually, one would rather ignore those complications and instead provide a
/// function that parses the string `1918-11-11T11:00:00+01:00` to the appropriate type. This macro
/// provides such an interface; it handles all necessary XML manipulation and calls the
/// user-provided logic to produce a value from the tag contents. The result is an implementation
/// of [`DeserializeXml`] for the specified type. This macro takes three arguments:
///
/// 1. `type`: the type for which [`DeserializeXml`] should be implemented.
///
/// 2. `tag_contents_ident`: the identifier to be used for the variable that represents the tag contents.
///    **Note:** this is only required due to Rust's hygiene requirement for macros; if in doubt, just
///    provide `tag_contents` for this argument.
///
/// 3. `body`: a block that produces a [`Result<type>`](crate::Result), where `type` is what was
///    provided as the first argument. A variable which holds the tag contents as a `String` is
///    available for use in this block; its name will be whatever value you provided for
///    `tag_contents_ident`. Note that a blanket error conversion implementation,
///    `impl<T: std::error::Error> From<T> for deserialize_xml::Error`, is provided, so in many
///    cases applying the `?` operator to intermediate errors will propagate them correctly.
///
/// ## Example
///
/// Here's an example of how we can use this macro to support parsing dates:
/// ```
/// use deserialize_xml::{DeserializeXml, impl_deserialize_xml_helper};
/// use chrono::prelude::*;
///
/// // See Caveats section for why this outer struct is necessary
/// #[derive(Default, Debug, DeserializeXml)]
/// #[deserialize_xml(tag="outer")]
/// struct CustomImplHelperOuter {
///     #[deserialize_xml(tag="inner")]
///     dt: CustomImplHelperInner,
/// }
///
/// #[derive(Default, Debug)]
/// struct CustomImplHelperInner(DateTime<Utc>);
///
/// impl_deserialize_xml_helper!(
///     CustomImplHelperInner, /* type */
///     tag_contents,          /* tag_contents_ident */
///     {                      /* body */
///     // Note: variable `tag_contents` is available here because
///     // that is what was passed for the second argument
///     let dt = tag_contents.parse::<DateTime<Utc>>()?;
///     Ok(CustomImplHelperInner(dt))
///     // Notice that our logic was entirely XML-agnostic!
/// });
///
/// let str_input = "<outer><inner>1918-11-11T11:00:00+01:00</inner></outer>";
/// // CustomImplHelperOuter::from_str    -> generated by derive macro; calls the below
/// // CustomImplHelperInner::from_reader -> generated by `impl_deserialize_xml_helper`
/// let result = CustomImplHelperOuter::from_str(str_input).unwrap();
/// assert_eq!(result.dt.0.year(), 1918);
/// assert_eq!(result.dt.0.month(), 11);
/// assert_eq!(result.dt.0.day(), 11);
/// assert_eq!(result.dt.0.hour(), 10);
/// ```
///
/// ## Caveats
///
/// - This macro assumes that the implementation it generates will be called from _within_ an
/// implementation of [`DeserializeXml`] generated by the [derive macro](derive@DeserializeXml)
/// also available from this crate. In other words, the implementation generated by
/// [`impl_deserialize_xml_helper`] can't handle parsing a complete XML document itself; it can
/// only parse the XML fragment associated with the type, and it depends on some other source
/// telling it when to start. This is a somewhat artificial constraint that could probably be
/// removed; however, my guess is that the common case is wanting to parse a large struct while
/// possibly providing custom parsers for some of that struct's fields, so I hope this won't be
/// too cumbersome in practice.
#[macro_export]
macro_rules! impl_deserialize_xml_helper {

    ($type:ty, $tag_contents_ident:ident, $body:block) => {
        // My thanks to the following sources for teaching me about $crate and making it possible to
        // use this macro _within_ this crate:
        // https://doc.rust-lang.org/1.5.0/book/macros.html#the-variable-crate
        // https://stackoverflow.com/questions/44950574/using-crate-in-rusts-procedural-macros
        // The trait path is fully qualified so that callers don't need `DeserializeXml` in scope.
        impl $crate::DeserializeXml for $type {
            fn from_reader<R: $crate::Read>(reader: &mut $crate::Peekable<$crate::xml::reader::Events<R>>) -> $crate::Result<Self> {
                use $crate::xml::reader::XmlEvent::*;
                let unexpected_end_msg = "XML stream ended unexpectedly";
                let msg_prefix = format!("when parsing {}:", stringify!($type));

                // I think most of these cases should be impossible assuming the "top-level" caller
                // is one of the DeserializeXml methods from the derive-macro implementation, but
                // let's make sure we cover them just in case.
                match reader.next() {
                    Some(Ok(StartElement { .. })) => (),
                    // TODO: should we change this so that this implementation can parse a complete
                    // XML document by itself? (Currently it dies here on the StartDocument event.)
                    Some(Ok(event)) => {
                        let msg = format!("{} expected start tag, but saw {:?}", msg_prefix, event);
                        Err($crate::Error::new(msg))?
                    },
                    Some(Err(e)) => Err(e)?,
                    None => Err($crate::Error::new(unexpected_end_msg.to_string()))?,
                };

                let $tag_contents_ident = match reader.next() {
                    Some(Ok(Characters(s))) => s,
                    Some(Ok(event)) => {
                        let msg = format!(
                            "{} expected to see 'xml::reader::XmlEvent::Characters', but saw {:?}",
                            msg_prefix,
                            event
                        );
                        Err($crate::Error::new(msg))?
                    }
                    Some(Err(e)) => Err(e)?,
                    None => Err($crate::Error::new(unexpected_end_msg.to_string()))?,
                };

                let result: $crate::Result<Self> = $body;

                match reader.next() {
                    // The XML parsing library ensures that closing tags match starting tags for us
                    Some(Ok(EndElement { .. })) => result,
                    Some(Ok(event)) => {
                        let msg = format!("{} expected closing tag, but saw {:?}", msg_prefix, event);
                        Err($crate::Error::new(msg))?
                    },
                    Some(Err(e)) => Err(e)?,
                    None => Err($crate::Error::new(unexpected_end_msg.to_string()))?,
                }
            }
        }
    }
}

// Implementations for some fundamental types

impl_deserialize_xml_helper!(String, tag_contents, { Ok(tag_contents) });

// Code generating code generating code...
macro_rules! generate_numeric_impl {
    ($type:ty) => {
        impl_deserialize_xml_helper!($type, tag_contents, { Ok(tag_contents.parse::<$type>()?) });
    };
}

generate_numeric_impl!(u8);
generate_numeric_impl!(i8);
generate_numeric_impl!(u16);
generate_numeric_impl!(i16);
generate_numeric_impl!(u32);
generate_numeric_impl!(i32);
generate_numeric_impl!(u64);
generate_numeric_impl!(i64);
generate_numeric_impl!(u128);
generate_numeric_impl!(i128);
generate_numeric_impl!(usize);
generate_numeric_impl!(isize);