cbor_edn/
lib.rs

1//! # Tools for processing CBOR Diagnostic Notation (EDN)
2//!
3//! The parser used by this crate is a PEG (Parsing Expression Grammar) parser built from the ABNF
4//! used in the [EDN specification].
5//!
6//! The crate's main types represent not only the parsed items but also all the parts that have no
7//! bearing on the translation to CBOR (spaces, commas, comments) and
8//! choices that may or may not influence the CBOR (encoding indicators). This allows detailed
9//! manipulation (for example inside comments) and a delayed processing of application oriented
10//! literals.
11//!
12//! Parsed values are expected to round-trip to identical representations when serialized. Most
13//! manipulations of the values will ensure that their serialization output can also be
14//! round-tripped from the internal format to the EDN serialization and back into the internal
15//! format, but not every manipulation can guarantee this. (For example, removing all optional
16//! commas while retaining comments erases the distinction between a comment that stood before a
17//! comma and one that stood after it.)
18//!
19//! Correct parsing does not guarantee that the value can also be encoded into CBOR. While there
20//! are aspects that could be handled at parsing time and are not (e.g. tag numbers exceeding the
21//! encodable number space), there are cases that can not be handled by a library without further
22//! context or privileges (e.g. the e'' application oriented literal that needs application context,
23//! or the ref'' application oriented literal that defers to relative files, accessing which can
24//! involve file or network access). Consequently, conversion to CBOR through the various
25//! `.to_cbor()` methods is inherently fallible.
26//!
27//! [EDN specification]: https://www.ietf.org/archive/id/draft-ietf-cbor-edn-literals-15.html
28//!
29//! ## Completeness
30//!
31//! Known limitations are:
32//!
33//! * Support for inspecting and constructing CBOR items is incomplete. The most common types can
34//!   be constructed; constructing or inspecting more exotic items is possible through parsing
35//!   hand-crafted EDN/CBOR and using the generated serializations, respectively.
36//!
37//! * Options for attaching comments and space are limited and immature:
38//!
39//!   * [`Item::with_comment()`] & [`StandaloneItem::set_comment`] can be used to add comments, but
40//!     mainly produce [top-level items][StandaloneItem]. Deeper items are not configurable that
41//!     way, as the comments don't live in the item but its container.
42//!
43//!   * Comments can be added to items through visitors such as [`Item::visit_map_elements`]; both
44//!     the success and the error path of a visiting function can set comments around a tag.
45//!
46//!   * Replacing an item with hand-crafted EDN (possibly from a serialized item) is always an
47//!     option.
48//!
49//! * Indenting EDN works for the easy cases, but more exotic cases such as overflowing the limited
50//!   width, long keys, or hash comments, easily disrupt the visual result.
51//!
52//! ## Security
53//!
54//! This library does not access network or file system in any surprising ways and does not
55//! endanger memory safety on its own. The main threat in using it is unbounded resource use: even
56//! without packed CBOR, heavy nesting can easily overflow the stack, and the float conversions are
57//! costly in time. Unless resource usage per user is limited, it is recommended to cap the length
58//! of untrusted input at a number of repeated `{` characters that does not yet overflow the stack.
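/*!

A cheap pre-check before parsing untrusted input could look like the sketch below. It is a
standalone illustration, not part of this crate: the names `MAX_OPENERS` and
`within_nesting_budget` are hypothetical, the limit is an arbitrary placeholder, and the
sketch assumes that every level of nesting is introduced by one of `[`, `{`, `(` or `<`.

```rust
/// Hypothetical limit; tune it to the stack size that is actually available.
const MAX_OPENERS: usize = 128;

/// Returns true if `edn` contains at most `MAX_OPENERS` opening delimiters.
///
/// The number of opening delimiters is an upper bound on the nesting depth.
/// Delimiters inside strings and comments are counted as well, which only
/// makes the check stricter than necessary.
fn within_nesting_budget(edn: &str) -> bool {
    let openers = edn
        .chars()
        .filter(|c| matches!(c, '[' | '{' | '(' | '<'))
        .count();
    openers <= MAX_OPENERS
}
```

With the placeholder limit, `within_nesting_budget("[1, 2, {3: 4}]")` is `true`, while a run
of 129 `[` characters is rejected. Counting openers instead of tracking depth trades some
false rejections for a check that cannot be fooled by unbalanced input.
*/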
59//!
60//! The crate has not been audited internally or externally. As the
61//! [licenses](https://spdx.org/licenses/MIT.html)
62//! [state](https://spdx.org/licenses/Apache-2.0.html), the software is provided "as is".
63//!
64//! ## CLI application
65//!
66//! Some functionality is available through a binary included with this crate:
67//!
68//! <!-- See https://github.com/assert-rs/snapbox/issues/172 -->
69//! ```console
70//! $ echo "[1, 2, 'x', ip'2001:db1::/64']" | cbor-edn diag2diag
71//! [1, 2, 'x', ip'2001:db1::/64']
72//! ```
73#![forbid(unsafe_code)]
74
75use std::borrow::Cow;
76
77mod visitor;
78use visitor::{
79    ApplicationLiteralsVisitor, ArrayElementVisitor, MapElementVisitor, MapValueHandler,
80    ProcessResult, TagVisitor, Visitor,
81};
82
83pub mod application;
84pub mod error;
85mod float;
86mod space;
87use space::{Comment, SDetails, MS, MSC, S, SOC};
88mod number;
89use number::{Number, NumberParts, NumberValue, Sign};
90mod string;
91use string::{CborString, PreprocessedStringComponent, String1e};
92
93#[cfg(test)]
94mod tests;
95
96use error::*;
97
98const U8MAX: u64 = u8::MAX as _;
99const U16MAX: u64 = u16::MAX as _;
100const U32MAX: u64 = u32::MAX as _;
101
102/// A CBOR Item, including any space and comments surrounding it in a serialization.
103#[derive(Debug, Clone, PartialEq)]
104pub struct StandaloneItem<'a>(S<'a>, Item<'a>, S<'a>);
105
106/// A CBOR Item.
107///
108/// This type represents a CBOR item in EDN. By virtue of EDN's expressiveness, it is capable not
109/// only of expressing any well-formed CBOR, but also of preserving encoding details that are not
110/// preferred (e.g. a small integer encoded in more bytes than necessary). Some transformations on
111/// the EDN may lose such details; components that perform a translation such as recoding `(_
112/// h'18', h'6402')` into `<<100, 2>>` have a choice to either not perform the translation or to
113/// discard some encoding details.
114#[derive(Debug, Clone, PartialEq)]
115pub struct Item<'a>(InnerItem<'a>);
116
117/// # Conversion between the in-memory format and serializations
118impl<'a> StandaloneItem<'a> {
119    /// Ingests CBOR Diagnostic Notation (EDN) representing a single CBOR item
120    ///
121    /// Note that this will only return syntactic errors. Content errors that make it impossible to
122    /// produce this as CBOR, such as non-matching encoding indicators or unknown application
123    /// oriented literals, are not reported.
124    pub fn parse(s: &'a str) -> Result<Self, ParseError> {
125        cbordiagnostic::one_item(s).map_err(ParseError)
126    }
127
128    /// Produce an EDN String from the item
129    pub fn serialize(&self) -> String {
130        Unparse::serialize(self)
131    }
132
133    /// Parse a complete CBOR item.
134    ///
135    /// Providing excessive data results in an error.
136    pub fn from_cbor(cbor: &[u8]) -> Result<Self, CborError> {
137        Ok(Self(S::default(), Item::from_cbor(cbor)?, S::default()))
138    }
139
140    /// Parse a complete CBOR item.
141    ///
142    /// Any remaining bytes are returned as part of the result.
143    pub fn from_cbor_with_rest(cbor: &[u8]) -> Result<(Self, &[u8]), CborError> {
144        let (item, rest) = Item::from_cbor_with_rest(cbor)?;
145        Ok((Self(S::default(), item, S::default()), rest))
146    }
147
148    /// Encode into a binary CBOR representation
149    pub fn to_cbor(&self) -> Result<Vec<u8>, InconsistentEdn> {
150        Ok(Unparse::to_cbor(self)?.collect())
151    }
152}
153
154/// # Helpers for conversion between standalone and bare items
155impl<'a> StandaloneItem<'a> {
156    /// Discards the comments and space around the single item, returning only the item itself.
157    pub fn into_item(self) -> Item<'a> {
158        self.1
159    }
160
161    /// Accesses the single item.
162    pub fn item(&self) -> &Item<'a> {
163        &self.1
164    }
165
166    /// Mutably accesses the single item.
167    pub fn item_mut(&mut self) -> &mut Item<'a> {
168        &mut self.1
169    }
170
171    fn inner(&self) -> &InnerItem<'a> {
172        self.1.inner()
173    }
174
175    /// Clone the item, turning any [`Cow::Borrowed`] into owned versions, which can then satisfy
176    /// any lifetime.
177    pub fn cloned<'any>(&self) -> StandaloneItem<'any> {
178        StandaloneItem(self.0.cloned(), self.1.cloned(), self.2.cloned())
179    }
180}
181
182/// # Conversion between the in-memory format and serializations
183///
184/// Note that unlike [`StandaloneItem`], this does not provide EDN parsing: Any standalone EDN CBOR
185/// item may contain outer blank space or comments, which can only be represented in a
186/// [`StandaloneItem`].
187impl Item<'_> {
188    /// Produce an EDN String from the item
189    pub fn serialize(&self) -> String {
190        Unparse::serialize(self)
191    }
192
193    /// Parse a complete CBOR item.
194    ///
195    /// Providing excessive data results in an error.
196    pub fn from_cbor(cbor: &[u8]) -> Result<Self, CborError> {
197        match Self::from_cbor_with_rest(cbor) {
198            Ok((s, &[])) => Ok(s),
199            Ok(_) => Err(CborError("Data after item")),
200            Err(e) => Err(e),
201        }
202    }
203
204    /// Parse a complete CBOR item.
205    ///
206    /// Any remaining bytes are returned as part of the result.
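/**

Every CBOR item starts with a head that carries the major type and an argument. The sketch
below shows that first decoding step in isolation. It is a standalone approximation under
stated assumptions, not this crate's internal routine: the names `split_head` and
`negative_value` are hypothetical, and the encoding indicator that the real code tracks
alongside the argument is left out.

```rust
/// Splits a CBOR item head into (major type, argument, remaining bytes).
///
/// The argument is `None` for additional information 31 (indefinite length or
/// break). Returns `None` for truncated input and for the reserved additional
/// information values 28..=30.
fn split_head(cbor: &[u8]) -> Option<(u8, Option<u64>, &[u8])> {
    let (&initial, rest) = cbor.split_first()?;
    let major = initial >> 5;
    let additional = initial & 0x1f;
    // Number of bytes after the initial byte that hold the argument.
    let width = match additional {
        0..=23 => return Some((major, Some(u64::from(additional)), rest)),
        24 => 1,
        25 => 2,
        26 => 4,
        27 => 8,
        31 => return Some((major, None, rest)),
        _ => return None,
    };
    if rest.len() < width {
        return None;
    }
    let (arg_bytes, rest) = rest.split_at(width);
    // Big-endian accumulation of the argument bytes.
    let argument = arg_bytes
        .iter()
        .fold(0u64, |acc, &b| (acc << 8) | u64::from(b));
    Some((major, Some(argument), rest))
}

/// Value of a major type 1 (negative integer) item: -1 minus the argument.
fn negative_value(argument: u64) -> i128 {
    -1i128 - i128::from(argument)
}
```

For the bytes `18 64`, `split_head` yields major type 0 with argument `Some(100)` and an empty
rest; `negative_value(0)` is `-1`. Widening to `i128` keeps the most negative encodable value
representable.
*/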
207    pub fn from_cbor_with_rest(cbor: &[u8]) -> Result<(Self, &[u8]), CborError> {
208        let (major, argument, spec, mut tail) = process_cbor_major_argument(cbor)?;
209
210        let mut s = match (major, argument, spec) {
211            (Major::Unsigned, Some(argument), spec) => Self::new_integer_decimal_with_spec(
212                argument,
213                spec.or_none_if_default_for_arg(argument),
214            ),
215            (Major::Negative, Some(argument), spec) => Self::new_integer_decimal_with_spec(
216                -1i128 - i128::from(argument),
217                spec.or_none_if_default_for_arg(argument),
218            ),
219            (Major::FloatSimple, Some(n @ 0..=19), Spec::S_i) => {
220                Simple::Numeric(Box::new(Self::new_integer_decimal(n).into())).into()
221            }
222            (Major::FloatSimple, Some(20), Spec::S_i) => Simple::False.into(),
223            (Major::FloatSimple, Some(21), Spec::S_i) => Simple::True.into(),
224            (Major::FloatSimple, Some(22), Spec::S_i) => Simple::Null.into(),
225            (Major::FloatSimple, Some(23), Spec::S_i) => Simple::Undefined.into(),
226            (Major::FloatSimple, Some(n @ 32..=255), Spec::S_0) => {
227                Simple::Numeric(Box::new(Self::new_integer_decimal(n).into())).into()
228            }
229            // 0..=31 in S_0 or 24..=31 in S_i
230            (Major::FloatSimple, _, Spec::S_i | Spec::S_0) => {
231                return Err(CborError("Invalid simple value"))
232            }
233            (Major::FloatSimple, Some(0x7c00), Spec::S_1) => {
234                Number(Cow::from("Infinity")).with_spec(Some(Spec::S_1))
235            }
236            (Major::FloatSimple, Some(0xfc00), Spec::S_1) => {
237                Number(Cow::from("-Infinity")).with_spec(Some(Spec::S_1))
238            }
239            (Major::FloatSimple, Some(0x7e00), Spec::S_1) => {
240                Number(Cow::from("NaN")).with_spec(Some(Spec::S_1))
241            }
242            (Major::FloatSimple, Some(n), Spec::S_1) => {
243                let f =
244                    float::f16_bits_to_f64(n.try_into().expect("Range limited by construction"));
245                Number::new_float(f).with_spec(Some(Spec::S_1))
246            }
247            (Major::FloatSimple, Some(n), Spec::S_2) => {
248                let n: u32 = n.try_into().expect("Range limited by construction");
249                let f = f64::from(f32::from_bits(n));
250                Number::new_float(f).with_spec(Some(Spec::S_2))
251            }
252            (Major::FloatSimple, Some(n), Spec::S_3) => {
253                let f = f64::from_bits(n);
254                Number::new_float(f).with_spec(Some(Spec::S_3))
255            }
256            (Major::FloatSimple, None, _ /* S_ not written for exhaustiveness */)
257            | (Major::FloatSimple, _ /* None not written for exhaustiveness */, Spec::S_) => {
258                return Err(CborError(
259                    "Break code only expected at end of indefinite length items",
260                ))
261            }
262            (Major::Tagged, Some(n), s) => {
263                // FIXME this is recursing on the stack rather than on the heap
264                let (item, new_tail) = StandaloneItem::from_cbor_with_rest(tail)?;
265                tail = new_tail;
266                item.tagged_with_spec(n, s.or_none_if_default_for_arg(n))
267            }
268            (Major::Unsigned | Major::Negative | Major::Tagged, None, _) => {
269                return Err(CborError(
270                    "Integer/Tag with indefinite length encoding is not well-formed",
271                ))
272            }
273            (Major::ByteString, Some(n), spec) => {
274                let data = n
275                    .try_into()
276                    .ok()
277                    .and_then(|n| tail.get(..n))
278                    .ok_or(CborError("Announced bytes unavailable"))?;
279                tail = &tail[data.len()..];
280                Self::new_bytes_hex_with_spec(data, spec.or_none_if_default_for_arg(n))
281            }
282            (Major::TextString, Some(n), spec) => {
283                let data = n
284                    .try_into()
285                    .ok()
286                    .and_then(|n| tail.get(..n))
287                    .ok_or(CborError("Announced bytes unavailable"))?;
288                let data = core::str::from_utf8(data)
289                    .map_err(|_| CborError("Text string must be valid UTF-8"))?;
290                tail = &tail[data.len()..];
291                Self::new_text_with_spec(data, spec.or_none_if_default_for_arg(n))
292            }
293            (
294                Major::ByteString | Major::TextString,
295                None,
296                _, /* S_ not written for exhaustiveness */
297            ) => {
298                let mut items = vec![];
299                while tail.first() != Some(&0xff) {
300                    let (inner_major, argument, spec, new_tail) =
301                        process_cbor_major_argument(tail)?;
302                    let Some(argument) = argument.and_then(|a| usize::try_from(a).ok()) else {
303                        return Err(CborError(
304                            "Indefinite length strings can only contain definite lengths and must fit in data",
305                        ));
306                    };
307                    if inner_major != major {
308                        return Err(CborError(
309                            "Indefinite length strings can only contain matching items",
310                        ));
311                    }
312                    if new_tail.len() < argument {
313                        return Err(CborError(
314                            "Announced bytes unavailable inside indefinite length byte string",
315                        ));
316                    }
317                    // with split_at_checked, we could combine the check into the split
318                    let (item_data, new_tail) = new_tail.split_at(argument);
319                    tail = new_tail;
320                    items.push(match major {
321                        Major::ByteString => {
322                            CborString::new_bytes_hex_with_spec(item_data, Some(spec))
323                        }
324                        Major::TextString => CborString::new_text_with_spec(
325                            core::str::from_utf8(item_data)
326                                .map_err(|_| CborError("Text string must be valid UTF-8"))?,
327                            Some(spec),
328                        ),
329                        _ => unreachable!(),
330                    });
331                }
332                if tail.is_empty() {
333                    return Err(CborError(
334                        "Indefinite length byte string terminated after item",
335                    ));
336                }
337                tail = &tail[1..];
338
339                let mut items = items.drain(..);
340                if let Some(first_item) = items.next() {
341                    InnerItem::StreamString(
342                        Default::default(),
343                        NonemptyMscVec::new(first_item, items),
344                    )
345                    .into()
346                } else {
347                    todo!() // FIXME: well-formed empty input such as 0x5f 0xff panics here
348                }
349            }
350            (Major::Array, mut length, spec) => {
351                // FIXME this is recursing on the stack rather than on the heap
352                let mut items = vec![];
353                while length != Some(0) && tail.first() != Some(&0xff) {
354                    let (item, new_tail) = Self::from_cbor_with_rest(tail)?;
355                    items.push(item);
356                    tail = new_tail;
357                    if let Some(ref mut n) = &mut length {
358                        *n -= 1;
359                    }
360                }
361                if length.is_none() {
362                    if tail.is_empty() {
363                        return Err(CborError(
364                            "Indefinite length array terminated after item",
365                        ));
366                    }
367                    tail = &tail[1..];
368                }
369                let spec = match length {
370                    Some(l) => spec.or_none_if_default_for_arg(l),
371                    None => Some(spec), // which is always indefinite length
372                };
373                InnerItem::Array(SpecMscVec::new(spec, items.into_iter())).into()
374            }
375            (Major::Map, mut length, spec) => {
376                // FIXME this is recursing on the stack rather than on the heap
377                let mut items = vec![];
378                while length != Some(0) && tail.first() != Some(&0xff) {
379                    let (key, new_tail) = Self::from_cbor_with_rest(tail)?;
380                    tail = new_tail;
381                    let (value, new_tail) = Self::from_cbor_with_rest(tail)?;
382                    tail = new_tail;
383                    items.push(Kp::new(key, value));
384                    if let Some(ref mut n) = &mut length {
385                        *n -= 1;
386                    }
387                }
388                if length.is_none() {
389                    if tail.is_empty() {
390                        return Err(CborError(
391                            "Indefinite length map terminated after item",
392                        ));
393                    }
394                    tail = &tail[1..];
395                }
396                let spec = match length {
397                    Some(l) => spec.or_none_if_default_for_arg(l),
398                    None => Some(spec), // which is always indefinite length
399                };
400                InnerItem::Map(SpecMscVec::new(spec, items.into_iter())).into()
401            }
402        };
403
404        s.set_delimiters(DelimiterPolicy::SingleLineRegularSpacing);
405        Ok((s, tail))
406    }
407
408    fn visit(&mut self, visitor: &mut impl Visitor) -> ProcessResult {
409        let mut result = visitor.process(self);
410        if result.take_recurse() {
411            self.0.visit(visitor);
412        }
413        result
414    }
415
416    /// Clone the item, turning any [`Cow::Borrowed`] into owned versions, which can then satisfy
417    /// any lifetime.
418    pub fn cloned<'any>(&self) -> Item<'any> {
419        Item(self.0.cloned())
420    }
421}
422
423/// # Conversion between the in-memory format and serializations
424impl<'a> Item<'a> {
425    fn inner(&self) -> &InnerItem<'a> {
426        &self.0
427    }
428
429    fn inner_mut(&mut self) -> &mut InnerItem<'a> {
430        &mut self.0
431    }
432}
433
434/// # Creating items from data or by wrapping other items
435impl<'a> StandaloneItem<'a> {
436    fn tagged_with_spec(self, tag: u64, spec: Option<Spec>) -> Item<'a> {
437        InnerItem::Tagged(tag, spec, Box::new(self)).into()
438    }
439
440    /// Wrap the item into a CBOR tag.
441    pub fn tagged(self, tag: u64) -> Item<'a> {
442        InnerItem::Tagged(tag, None, Box::new(self)).into()
443    }
444}
445
446/// # Creating items from data or by wrapping other items
447impl<'a> Item<'a> {
448    fn new_integer_decimal_with_spec(value: impl Into<i128>, spec: Option<Spec>) -> Self {
449        Number(format!("{}", value.into()).into()).with_spec(spec)
450    }
451
452    /// Create a new item that is integer valued in CBOR and expressed in decimal in EDN.
453    ///
454    /// Note that while values exceeding i65 are accepted, they can not be encoded into CBOR.
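/**

The encodable range can be checked up front. The helper below is a standalone sketch and not
part of this crate (the name `fits_cbor_integer` is hypothetical); it relies only on the CBOR
integer range, which spans major type 0 (unsigned) and major type 1 (negative).

```rust
/// Returns true if `value` can be encoded as a CBOR integer.
fn fits_cbor_integer(value: i128) -> bool {
    // Major type 0 covers 0..=2^64-1; major type 1 covers -2^64..=-1.
    let min = -(1i128 << 64);
    let max = (1i128 << 64) - 1;
    (min..=max).contains(&value)
}
```

Both `i128::from(u64::MAX)` and `-1 - i128::from(u64::MAX)` pass the check, while values one
step beyond either end do not.
*/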
455    pub fn new_integer_decimal(value: impl Into<i128>) -> Self {
456        Self::new_integer_decimal_with_spec(value, None)
457    }
458
459    /// Create a new item that is float valued in CBOR and expressed in decimal in EDN.
460    pub fn new_float_decimal(value: f64) -> Self {
461        Number::new_float(value).with_spec(None)
462    }
463
464    /// Create a new item that is integer valued in CBOR and expressed in hexadecimal in EDN.
465    ///
466    /// Negative values have not been implemented in this constructor.
467    pub fn new_integer_hex(value: impl Into<u64>) -> Self {
468        InnerItem::Number(Number(format!("0x{:x}", value.into()).into()), None).into()
469    }
470
471    fn new_bytes_hex_with_spec(value: &[u8], spec: Option<Spec>) -> Self {
472        InnerItem::String(CborString::new_bytes_hex_with_spec(value, spec)).into()
473    }
474
475    /// Create a new item that is a byte string in CBOR (identical to the passed in value) and
476    /// expressed as a `h'...'` string in EDN.
477    pub fn new_bytes_hex(value: &[u8]) -> Self {
478        Self::new_bytes_hex_with_spec(value, None)
479    }
480
481    fn new_text_with_spec(value: &str, spec: Option<Spec>) -> Self {
482        InnerItem::String(CborString::new_text_with_spec(value, spec)).into()
483    }
484
485    /// Create a new item that is a text string in CBOR (identical to the passed in value) and
486    /// expressed as a single double-quoted string in EDN.
487    ///
488    /// ```rust
489    /// # use cbor_edn::*;
490    /// assert_eq!(
491    ///     Item::new_text("Hello \"World\"\0").serialize(),
492    ///     r#""Hello \"World\"\u{0}""#,
493    /// );
494    /// ```
495    pub fn new_text(value: &str) -> Self {
496        Self::new_text_with_spec(value, None)
497    }
498
499    pub fn new_application_literal(identifier: &str, value: &str) -> Result<Self, InconsistentEdn> {
500        if cbordiagnostic::app_prefix(identifier).is_err() {
501            // FIXME bad error type
502            return Err(InconsistentEdn(
503                "Identifier is not a valid application string identifier",
504            ));
505        };
506        Ok(InnerItem::String(CborString::new_application_literal(identifier, value, None)).into())
507    }
508
509    /// Create a CBOR array out of the items
510    pub fn new_array(items: impl Iterator<Item = Item<'a>>) -> Self {
511        InnerItem::Array(SpecMscVec::new(None, items)).into()
512    }
513
514    /// Create a CBOR map out of the key-value pairs
515    pub fn new_map(items: impl Iterator<Item = (Item<'a>, Item<'a>)>) -> Self {
516        InnerItem::Map(SpecMscVec::new(
517            None,
518            items.map(|(key, value)| Kp::new(key, value)),
519        ))
520        .into()
521    }
522
523    /// Wrap the item into a CBOR tag.
524    pub fn tagged(self, tag: u64) -> Item<'a> {
525        StandaloneItem::from(self).tagged(tag)
526    }
527}
528
529/// # Accessing and modifying an item in place
530impl StandaloneItem<'_> {
531    /// Replace any comment before the item with the new comment
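/**

The comment text is wrapped before it is stored. The standalone sketch below mirrors the
wrapping logic of the method body so that it can be tried in isolation; the free function
`wrap_comment` is a hypothetical name and does not exist in this crate.

```rust
/// Wraps `comment` in EDN comment syntax.
fn wrap_comment(comment: &str) -> String {
    if comment.contains('/') {
        // A slash would end a `/ ... /` comment early, so fall back to `#`
        // line comments and prefix every continuation line.
        format!("# {}\n", comment.replace('\n', "\n# "))
    } else {
        format!("/ {} /", comment)
    }
}
```

A plain `"note"` becomes `/ note /`, whereas a comment that contains a slash is emitted as one
`#` line per input line, each terminated by a newline.
*/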
532    pub fn with_comment(self, comment: &str) -> Self {
533        let wrapped_comment = if comment.contains('/') {
534            format!("# {}\n", comment.replace('\n', "\n# "))
535        } else {
536            format!("/ {} /", comment)
537        };
538        Self(S(wrapped_comment.into()), self.1, self.2)
539    }
540
541    /// Replace any comment before the item with the new comment
542    pub fn set_comment(&mut self, comment: &str) {
543        let wrapped_comment = if comment.contains('/') {
544            format!("# {}\n", comment.replace('\n', "\n# "))
545        } else {
546            format!("/ {} /", comment)
547        };
548        self.0 = S(wrapped_comment.into());
549    }
550
551    /// Alters how space and comments are placed inside the item.
552    ///
553    /// See the policy values for details.
554    pub fn set_delimiters(&mut self, policy: DelimiterPolicy) {
555        // On the top level, let's not add the leading \n, because that would cause an empty line
556        // above the sole element formatted like this.
557        self.0.set_delimiters(policy, false);
558        self.1.set_delimiters(policy);
559        self.2.set_delimiters(policy, false);
560    }
561
562    fn visit(&mut self, visitor: &mut impl Visitor) {
563        self.1
564            .visit(visitor)
565            .use_space_before(&mut self.0)
566            .use_space_after(&mut self.2)
567            .done();
568    }
569
570    /// For each item in the tree that is a single application literal, call a callback.
571    ///
572    /// This is primarily used to apply custom EDN filtering:
573    ///
574    /// ```rust
575    /// # use cbor_edn::*;
576    /// let mut full = StandaloneItem::parse("[0 /unmodified/, german'zweiundvierzig']").unwrap();
577    /// full.visit_application_literals(&mut |id, value: String, item: &mut cbor_edn::Item| {
578    ///     if id == "german" {
579    ///         let numeric = match value.as_str() {
580    ///             "dreiundzwanzig" => 23,
581    ///             "zweiundvierzig" => 42,
582    ///             _ => todo!(),
583    ///         };
584    ///         *item = Item::new_integer_decimal(numeric).into();
585    ///     }
586    ///     Ok(())
587    /// });
588    /// assert_eq!(full.serialize(), "[0 /unmodified/, 42]");
589    /// ```
590    pub fn visit_application_literals<F, RF>(&mut self, mut f: RF)
591    where
592        F: for<'b> FnMut(String, String, &mut Item<'b>) -> Result<(), String> + ?Sized,
593        RF: std::ops::DerefMut<Target = F>,
594    {
595        self.visit(&mut ApplicationLiteralsVisitor {
596            user_fn: f.deref_mut(),
597        });
598    }
599
600    /// For each item in the full tree (including embedded representations) that is tagged, call a
601    /// callback.
602    ///
603    /// Any error string is placed in a comment next to the item. The function should return Ok(())
604    /// on any tags it is not interested in visiting.
605    ///
606    /// This is primarily used to apply custom EDN application; see [application::dt_tag_to_aol] for an
607    /// example.
608    pub fn visit_tag<F, RF>(&mut self, mut f: RF)
609    where
610        F: for<'b> FnMut(u64, &mut Item<'b>) -> Result<(), String> + ?Sized,
611        RF: std::ops::DerefMut<Target = F>,
612    {
613        self.visit(&mut TagVisitor {
614            user_fn: f.deref_mut(),
615        });
616    }
617}
618
619/// # Accessing and modifying an item in place
620impl<'a> Item<'a> {
621    /// Access application-extension identifier and string value
622    ///
623    /// This only succeeds if the item is expressed using a single application oriented literal.
624    pub fn get_application_literal(&self) -> Result<(String, String), TypeMismatch> {
625        let InnerItem::String(CborString { ref items, .. }) = self.inner() else {
626            return Err(TypeMismatch::expecting("application-oriented literal"));
627        };
628        let [chunk] = items.as_slice() else {
629            return Err(TypeMismatch::expecting(
630                "single application-oriented literal",
631            ));
632        };
633        let PreprocessedStringComponent::AppString(identifier, value) = chunk
634            .preprocess()
635            // The only reason this would err is if there is embedded CBOR in there, and then
636            // that'd just mean it's not what we requested
637            .map_err(|_| TypeMismatch::expecting("application-oriented literal"))?
638        else {
639            return Err(TypeMismatch::expecting("application-oriented literal"));
640        };
641
642        Ok((identifier, value))
643    }
644
645    /// Access a byte literal value
646    ///
647    /// This only succeeds if the item is a single byte string on the CBOR level, no matter how
648    /// many EDN concatenations or even chunks. The EDN standard byte encodings (hex, base64 etc.)
649    /// are supported, other application-oriented literals need to be resolved first.
650    pub fn get_bytes(&self) -> Result<Vec<u8>, TypeMismatch> {
651        let mut result = vec![];
652
653        let mut append_items = |items: &Vec<String1e>| -> Result<(), TypeMismatch> {
654            for item in items {
655                if item
656                    .encoded_major_type()
657                    .map_err(|_| TypeMismatch::expecting("encodable item"))?
658                    != Major::ByteString
659                {
660                    return Err(TypeMismatch::expecting("byte literal"));
661                }
662                result.extend(
663                    item.bytes_value()
664                        .map_err(|_| TypeMismatch::expecting("byte literal or compatible"))?,
665                );
666            }
667            Ok(())
668        };
669
670        match self.inner() {
671            InnerItem::String(CborString { ref items, .. }) => append_items(items)?,
672            InnerItem::StreamString(_, ref chunks) => {
673                for CborString { ref items, .. } in chunks.iter() {
674                    append_items(items)?;
675                }
676            }
677            _ => return Err(TypeMismatch::expecting("byte literal")),
678        }
679
680        Ok(result)
681    }
682
683    /// Accesses a string literal value.
684    ///
685    /// This only succeeds if the item is a single text string on the CBOR level, no matter how
686    /// many EDN concatenations or even chunks. The EDN standard byte encodings (hex, base64 etc.)
687    /// are tolerated in subsequent items as required for expressing otherwise hard to read parts.
688    ///
689    /// ```
690    /// let item = cbor_edn::StandaloneItem::parse(
691    ///     r#" (_ "hello" h'20' "world" ) "#
692    /// ).unwrap();
693    /// let item = item.item();
694    /// assert_eq!("hello world", &item.get_string().unwrap());
695    /// ```
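/**

The method body collects the bytes of all parts first and validates them as UTF-8 once at the
end. The standalone sketch below shows that joining step on plain byte slices; the name
`join_text_chunks` is hypothetical, and the sketch makes no claim about which inputs the EDN
parser itself accepts.

```rust
/// Joins already-decoded chunks and validates the result as UTF-8 once.
fn join_text_chunks(chunks: &[&[u8]]) -> Result<String, std::string::FromUtf8Error> {
    String::from_utf8(chunks.concat())
}
```

Joining the bytes of `hello`, a single `0x20` and the bytes of `world` gives `hello world`. A
lone `0xc3` chunk is rejected, because it is only the first half of a two-byte sequence.
*/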
696    pub fn get_string(&self) -> Result<String, TypeMismatch> {
697        let mut result = vec![];
698
699        let mut append_items = |items: &Vec<String1e>| -> Result<(), TypeMismatch> {
700            for item in items {
701                result.extend(
702                    item.bytes_value()
703                        .map_err(|_| TypeMismatch::expecting("text literal or compatible"))?,
704                );
705            }
706            Ok(())
707        };
708
709        // Just checking the first item because they can not be mixed "except that byte string
710        // literal notation can be used inside a sequence of concatenated text string notation
711        // literals"
712        let check_first = |item: &String1e<'_>| -> Result<(), TypeMismatch> {
713            if item
714                .encoded_major_type()
715                .map_err(|_| TypeMismatch::expecting("encodable item"))?
716                != Major::TextString
717            {
718                return Err(TypeMismatch::expecting("text literal"));
719            }
720            Ok(())
721        };
722
723        match self.inner() {
724            InnerItem::String(CborString { ref items, .. }) => {
725                check_first(items.first().expect("Part of the type guarantees"))?;
726                append_items(items)?;
727            }
728            InnerItem::StreamString(_, ref chunks) => {
729                check_first(
730                    chunks
731                        .first
732                        .items
733                        .first()
734                        .expect("Part of the type guarantees"),
735                )?;
736                for CborString { ref items, .. } in chunks.iter() {
737                    append_items(items)?;
738                }
739            }
740            _ => return Err(TypeMismatch::expecting("text literal")),
741        }
742
743        String::from_utf8(result).map_err(|_| TypeMismatch::expecting("valid UTF-8"))
744    }
745
746    /// Access the tag number
747    ///
748    /// This only succeeds if the item is a tagged item. Use [`Self::get_tagged()`] to get the
749    /// corresponding tagged item.
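    ///
    /// # Example
    ///
    /// A minimal sketch, assuming that `1(1700000000)` parses as an integer wrapped in tag 1:
    ///
    /// ```rust
    /// let parsed = cbor_edn::StandaloneItem::parse("1(1700000000)").unwrap();
    /// let item = parsed.item();
    /// // The tag number and the tagged content are accessed separately.
    /// assert_eq!(item.get_tag().unwrap(), 1);
    /// assert_eq!(item.get_tagged().unwrap().item().get_integer().unwrap(), 1700000000);
    /// // A plain integer is not a tagged item.
    /// let plain = cbor_edn::StandaloneItem::parse("5").unwrap();
    /// assert!(plain.item().get_tag().is_err());
    /// ```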
750    pub fn get_tag(&self) -> Result<u64, TypeMismatch> {
751        let InnerItem::Tagged(tag, _, _) = self.inner() else {
752            return Err(TypeMismatch::expecting("tagged item"));
753        };
754        Ok(*tag)
755    }
756
757    /// Access the inner item of a tag
758    ///
759    /// This only succeeds if the item is a tagged item. Use [`Self::get_tag()`] to get the
760    /// corresponding tag number.
761    pub fn get_tagged(&self) -> Result<&StandaloneItem<'a>, TypeMismatch> {
762        let InnerItem::Tagged(_, _, ref item) = self.inner() else {
763            return Err(TypeMismatch::expecting("tagged item"));
764        };
765        Ok(item)
766    }
767
768    /// Mutably access the inner item of a tag
769    ///
770    /// This only succeeds if the item is a tagged item. Use [`Self::get_tag()`] to get the
771    /// corresponding tag number.
772    pub fn get_tagged_mut(&mut self) -> Result<&mut StandaloneItem<'a>, TypeMismatch> {
773        let InnerItem::Tagged(_, _, ref mut item) = self.inner_mut() else {
774            return Err(TypeMismatch::expecting("tagged item"));
775        };
776        Ok(item)
777    }
778
779    /// Access the integer value of an item
780    ///
781    /// This only succeeds if the item is integer valued; the returned range is an i65 (expressed
782    /// as an i128 for simplicity).
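    ///
    /// # Example
    ///
    /// A minimal sketch of why the result does not fit an `i64`, independent of this crate's
    /// types: CBOR encodes a negative integer as an unsigned argument `n` that stands for
    /// `-1 - n`. The helper `decode_negative` is hypothetical and only mirrors that arithmetic.
    ///
    /// ```rust
    /// // Hypothetical helper: value of a CBOR negative integer with argument `n`.
    /// fn decode_negative(n: u64) -> i128 {
    ///     -1 - i128::from(n)
    /// }
    ///
    /// assert_eq!(decode_negative(0), -1);
    /// // The most negative encodable value is -2^64, which is below `i64::MIN`.
    /// assert_eq!(decode_negative(u64::MAX), -(1i128 << 64));
    /// assert!(decode_negative(u64::MAX) < i128::from(i64::MIN));
    /// ```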
783    pub fn get_integer(&self) -> Result<i128, TypeMismatch> {
784        let InnerItem::Number(ref number, _) = self.inner() else {
785            return Err(TypeMismatch::expecting("integer"));
786        };
787        match number.value() {
788            NumberValue::Float(_) => Err(TypeMismatch::expecting("integer")),
789            NumberValue::Positive(n) => Ok(n.into()),
790            NumberValue::Negative(n) => Ok(-1 - i128::from(n)),
791            // FIXME: that's definitely not a type mismatch
792            NumberValue::Big(n) => n
793                .try_into()
794                .map_err(|_| TypeMismatch::expecting("integer in i128 range")),
795        }
796    }
797
798    /// Access the float value of an item
799    ///
800    /// This only succeeds if the item is float valued.
801    pub fn get_float(&self) -> Result<f64, TypeMismatch> {
802        let InnerItem::Number(ref number, _) = self.inner() else {
803            return Err(TypeMismatch::expecting("float"));
804        };
805        match number.value() {
806            NumberValue::Float(f) => Ok(f),
807            NumberValue::Positive(_) => Err(TypeMismatch::expecting("float (not integer)")),
808            NumberValue::Negative(_) => Err(TypeMismatch::expecting("float (not integer)")),
809            NumberValue::Big(_) => Err(TypeMismatch::expecting("float (not integer)")),
810        }
811    }
812
813    /// Access the items inside an array
814    ///
815    /// This only succeeds if the item is an array.
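    ///
    /// # Example
    ///
    /// A minimal sketch, assuming that `[1, 2, 3]` parses as an array of three integers:
    ///
    /// ```rust
    /// let parsed = cbor_edn::StandaloneItem::parse("[1, 2, 3]").unwrap();
    /// let item = parsed.item();
    /// let values: Vec<i128> = item
    ///     .get_array_items()
    ///     .unwrap()
    ///     .map(|element| element.get_integer().unwrap())
    ///     .collect();
    /// assert_eq!(values, [1, 2, 3]);
    /// // Non-array items are rejected rather than treated as one-element arrays.
    /// let plain = cbor_edn::StandaloneItem::parse("1").unwrap();
    /// assert!(plain.item().get_array_items().is_err());
    /// ```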
816    pub fn get_array_items(&self) -> Result<impl Iterator<Item = &Item<'a>>, TypeMismatch> {
817        let InnerItem::Array(smv) = self.inner() else {
818            return Err(TypeMismatch::expecting("array"));
819        };
820
821        Ok(smv.iter())
822    }
823
824    /// Mutably access the items inside an array
825    ///
826    /// This only succeeds if the item is an array.
827    pub fn get_array_items_mut(
828        &mut self,
829    ) -> Result<impl Iterator<Item = &mut Item<'a>>, TypeMismatch> {
830        let InnerItem::Array(smv) = self.inner_mut() else {
831            return Err(TypeMismatch::expecting("array"));
832        };
833
834        Ok(smv.iter_mut())
835    }
836
837    /// Access the items inside a map
838    ///
839    /// This only succeeds if the item is a map.
840    pub fn get_map_items(
841        &self,
842    ) -> Result<impl Iterator<Item = (&Item<'a>, &Item<'a>)>, TypeMismatch> {
843        let InnerItem::Map(smv) = self.inner() else {
844            return Err(TypeMismatch::expecting("map"));
845        };
846
847        Ok(smv.iter().map(|kp| (&kp.key, &kp.value)))
848    }
849
850    /// Mutably access the items inside a map
851    ///
852    /// This only succeeds if the item is a map.
853    pub fn get_map_items_mut(
854        &mut self,
855    ) -> Result<impl Iterator<Item = (&mut Item<'a>, &mut Item<'a>)>, TypeMismatch> {
856        let InnerItem::Map(smv) = self.inner_mut() else {
857            return Err(TypeMismatch::expecting("map"));
858        };
859
860        Ok(smv.iter_mut().map(|kp| (&mut kp.key, &mut kp.value)))
861    }
862
863    /// Removes any encoding indicators present in the item.
864    ///
865    /// This does not affect space or comments; in particular, an item containing only the
866    /// necessary space may be left with extraneous (but harmless) space that was previously needed
867    /// to set an encoding indicator apart from a value.
868    pub fn discard_encoding_indicators(&mut self) {
869        self.inner_mut().discard_encoding_indicators();
870    }
871
872    /// Alters how space and comments are placed inside the item.
873    ///
874    /// Being a plain [`Item`], this only affects inner space; it can not have any around itself.
875    ///
876    /// See the policy values for details.
877    pub fn set_delimiters(&mut self, policy: DelimiterPolicy) {
878        self.0.set_delimiters(policy);
879    }
880
881    /// Turn the item into a [`StandaloneItem`] and add a single new comment
882    pub fn with_comment(self, comment: &str) -> StandaloneItem<'a> {
883        let wrapped_comment = if comment.contains('/') {
884            format!("# {}\n", comment.replace('\n', "\n# "))
885        } else {
886            format!("/ {} /", comment)
887        };
888        StandaloneItem(S(wrapped_comment.into()), self, S::default())
889    }
890
891    /// Calls a callback on any key item inside the map.
892    ///
893    /// Calling this on a non-map item returns a [type mismatch error][TypeMismatch].
894    ///
895    /// An error string returned by the callback is stored in the tree as a comment next to the
896    /// key. A successful result may also contain text that gets placed next to the key, and may
897    /// contain a callback that gets applied in the same fashion to the value after the key.
898    ///
899    /// # Example
900    ///
901    /// The [`application::comment_ccs`] method is an example of a callback function.
902    ///
903    /// # Future development
904    ///
905    /// Once `feature(try_trait)` is usable, those return types can be simplified; until then,
906    /// using a [`Result`] enables easy propagation of errors out of the callbacks.
907    pub fn visit_map_elements<F, RF>(&mut self, mut f: RF) -> Result<(), TypeMismatch>
908    where
909        F: for<'b> FnMut(
910                &mut Item<'b>,
911            ) -> Result<(Option<String>, Option<MapValueHandler>), String>
912            + ?Sized,
913        RF: std::ops::DerefMut<Target = F>,
914    {
915        if !matches!(self.0, InnerItem::Map(_)) {
916            return Err(TypeMismatch::expecting("map"));
917        }
918        let f = f.deref_mut();
919        self.visit(&mut MapElementVisitor::new(f)).done();
920        Ok(())
921    }
922
923    /// Calls a callback on any item inside the array.
924    ///
925    /// Calling this on a non-array item returns a [type mismatch error][TypeMismatch].
926    ///
927    /// An error string returned by the callback is stored in the tree as a comment next to the
928    /// item, as is the string in the successful variant.
929    ///
930    /// # Example
931    ///
932    /// The [`application::comment_lang_tag`] method is an example of a callback function. It is
933    /// relatively complex (see below).
934    ///
935    /// # Future development
936    ///
937    /// Once `feature(try_trait)` is usable, those return types can be simplified; until then,
938    /// using a [`Result`] enables easy propagation of errors out of the callbacks.
939    ///
940    /// This function is relatively impractical to use: When a callback needs to know its position
941    /// in the array (which is a frequent occurrence in inhomogeneous arrays), it needs to use
942    /// internal state to count up; in doing so it needs to be a closure rather than a function,
943    /// and due to [suboptimal lifetimes](https://codeberg.org/chrysn/cbor-edn/issues/9) that means
944    /// that the callback may easily need to be boxed.
945    pub fn visit_array_elements<F, RF>(&mut self, mut f: RF) -> Result<(), TypeMismatch>
946    where
947        F: for<'b> FnMut(&mut Item<'b>) -> Result<Option<String>, String> + ?Sized,
948        RF: std::ops::DerefMut<Target = F>,
949    {
950        if !matches!(self.0, InnerItem::Array(_)) {
951            return Err(TypeMismatch::expecting("array"));
952        }
953        let f = f.deref_mut();
954        self.visit(&mut ArrayElementVisitor::new(f)).done();
955        Ok(())
956    }
957}
958
959impl Unparse for StandaloneItem<'_> {
960    fn serialize_write(&self, formatter: &mut core::fmt::Formatter) -> core::fmt::Result {
961        self.0.serialize_write(formatter)?;
962        self.1.serialize_write(formatter)?;
963        self.2.serialize_write(formatter)?;
964        Ok(())
965    }
966
967    fn to_cbor(&self) -> Result<impl Iterator<Item = u8>, InconsistentEdn> {
968        self.1.to_cbor()
969    }
970}
971
972impl Unparse for Item<'_> {
973    fn serialize_write(&self, formatter: &mut core::fmt::Formatter) -> core::fmt::Result {
974        self.0.serialize_write(formatter)
975    }
976
977    fn to_cbor(&self) -> Result<impl Iterator<Item = u8>, InconsistentEdn> {
978        self.0.to_cbor()
979    }
980}
981
982impl<'a> From<InnerItem<'a>> for StandaloneItem<'a> {
983    fn from(inner: InnerItem<'a>) -> Self {
984        Item::from(inner).into()
985    }
986}
987
988impl<'a> From<Item<'a>> for StandaloneItem<'a> {
989    fn from(inner: Item<'a>) -> Self {
990        Self(S::default(), inner, S::default())
991    }
992}
993
994impl<'a> From<InnerItem<'a>> for Item<'a> {
995    fn from(inner: InnerItem<'a>) -> Self {
996        Item(inner)
997    }
998}
999
1000/// A CBOR Sequence.
1001#[derive(Debug, Clone, PartialEq)]
1002pub struct Sequence<'a> {
1003    s0: S<'a>,
1004    items: Option<NonemptyMscVec<'a, Item<'a>>>,
1005}
1006
1007impl<'a> Sequence<'a> {
1008    /// Ingests CBOR Diagnostic Notation (EDN) representing a CBOR sequence
1009    ///
1010    /// Note that this will only return syntactic errors. Content errors that make it impossible to
1011    /// produce this as CBOR, such as non-matching encoding indicators or unknown application
1012    /// oriented literals, are not reported.
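    ///
    /// # Example
    ///
    /// A minimal sketch of the distinction. The `german''` literal is made up for illustration
    /// and assumed to be unknown to the library:
    ///
    /// ```rust
    /// use cbor_edn::Sequence;
    /// // Two small integers parse and encode as one byte each.
    /// let plain = Sequence::parse("1, 2").unwrap();
    /// assert_eq!(plain.to_cbor().ok(), Some(vec![0x01, 0x02]));
    /// // An unknown application-oriented literal is syntactically fine, but can not be encoded.
    /// let literal = Sequence::parse("german'zweiundvierzig'").unwrap();
    /// assert!(literal.to_cbor().is_err());
    /// ```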
1013    pub fn parse(s: &'a str) -> Result<Self, ParseError> {
1014        cbordiagnostic::seq(s).map_err(ParseError)
1015    }
1016
1017    /// Produce an EDN String from the sequence
1018    pub fn serialize(&self) -> String {
1019        Unparse::serialize(self)
1020    }
1021
    /// Decodes a CBOR sequence from its binary representation.
    ///
    /// The resulting sequence is formatted with [`DelimiterPolicy::SingleLineRegularSpacing`].
1022    pub fn from_cbor(cbor: &[u8]) -> Result<Self, CborError> {
1023        let mut tail = cbor;
1024        // Could this be more efficient if we returned an iterator? Yes. Would it be easier to
1025        // maintain? Probably not.
1026        let mut items = vec![];
1027        while !tail.is_empty() {
1028            let (item, new_tail) = Item::from_cbor_with_rest(tail)?;
1029            items.push(item);
1030            tail = new_tail;
1031        }
1032        let mut s = Self::new(items.into_iter());
1033        s.set_delimiters(DelimiterPolicy::SingleLineRegularSpacing);
1034        Ok(s)
1035    }
1036
1037    /// Encode into a binary CBOR representation
1038    pub fn to_cbor(&self) -> Result<Vec<u8>, InconsistentEdn> {
1039        Ok(Unparse::to_cbor(self)?.collect())
1040    }
1041
1042    /// Construct a CBOR sequence from items
1043    pub fn new(mut items: impl Iterator<Item = Item<'a>>) -> Self {
1044        Sequence {
1045            s0: Default::default(),
1046            items: items.next().map(|first| NonemptyMscVec::new(first, items)),
1047        }
1048    }
1049
1050    /// For each item in the tree that is any element of the sequence, call a callback.
1051    ///
1052    /// This is primarily used to apply custom EDN filtering:
1053    ///
1054    /// ```rust
1055    /// # use cbor_edn::*;
1056    /// let mut full = Sequence::parse("0 /unmodified/, german'zweiundvierzig'").unwrap();
1057    /// full.visit_application_literals(&mut |id, value: String, item: &mut cbor_edn::Item| {
1058    ///     if id == "german" {
1059    ///         let numeric = match value.as_str() {
1060    ///             "dreiundzwanzig" => 23,
1061    ///             "zweiundvierzig" => 42,
1062    ///             _ => todo!(),
1063    ///         };
1064    ///         *item = Item::new_integer_decimal(numeric).into();
1065    ///     }
1066    ///     Ok(())
1067    /// });
1068    /// assert_eq!(full.serialize(), "0 /unmodified/, 42");
1069    /// ```
1070    pub fn visit_application_literals<F, RF>(&mut self, mut f: RF)
1071    where
1072        F: for<'b> FnMut(String, String, &mut Item<'b>) -> Result<(), String> + ?Sized,
1073        RF: std::ops::DerefMut<Target = F>,
1074    {
1075        self.visit(&mut ApplicationLiteralsVisitor {
1076            user_fn: f.deref_mut(),
1077        });
1078    }
1079
1080    /// For each item in the full tree of any element (including embedded representations) that is
1081    /// tagged, call a callback.
1082    ///
1083    /// Any error string is placed in a comment next to the item. The function should return Ok(())
1084    /// on any tags it is not interested in visiting.
1085    ///
1086    /// This is primarily used to apply custom EDN application; see [application::dt_tag_to_aol] for an
1087    /// example.
1088    pub fn visit_tag<F, RF>(&mut self, mut f: RF)
1089    where
1090        F: for<'b> FnMut(u64, &mut Item<'b>) -> Result<(), String> + ?Sized,
1091        RF: std::ops::DerefMut<Target = F>,
1092    {
1093        self.visit(&mut TagVisitor {
1094            user_fn: f.deref_mut(),
1095        });
1096    }
1097
1098    /// Access the items of the sequence
1099    pub fn items(&self) -> impl Iterator<Item = &Item<'a>> {
1100        self.items.as_ref().map(|i| i.iter()).into_iter().flatten()
1101    }
1102
1103    /// Mutably access the items of the sequence
1104    pub fn items_mut(&mut self) -> impl Iterator<Item = &mut Item<'a>> {
1105        self.items
1106            .as_mut()
1107            .map(|i| i.iter_mut())
1108            .into_iter()
1109            .flatten()
1110    }
1111
1112    #[deprecated(note = "renamed to items_mut()")]
1113    pub fn get_items_mut(&mut self) -> impl Iterator<Item = &mut Item<'a>> {
1114        self.items_mut()
1115    }
1116
1117    /// Removes any encoding indicators present in the sequence.
1118    ///
1119    /// This does not affect space or comments; in particular, an item containing only the
1120    /// necessary space may be left with extraneous (but harmless) space that was previously needed
1121    /// to set an encoding indicator apart from a value.
1122    pub fn discard_encoding_indicators(&mut self) {
1123        for i in self.items_mut() {
1124            i.discard_encoding_indicators()
1125        }
1126    }
1127
1128    /// Alters how space and comments are placed inside the sequence.
1129    ///
1130    /// See the policy values for details.
1131    pub fn set_delimiters(&mut self, policy: DelimiterPolicy) {
1132        // On the top level, let's not add the leading \n, because that would cause an empty line
1133        // above the sole element formatted like this.
1134        self.s0.set_delimiters(policy, false);
1135        if let Some(items) = self.items.as_mut() {
1136            items.first.set_delimiters(policy);
1137            for (msc, item) in items.tail.iter_mut() {
1138                msc.set_delimiters(policy, true);
1139                item.set_delimiters(policy);
1140            }
1141            items.soc.set_delimiters(policy, !items.tail.is_empty());
1142        }
1143    }
1144
1145    fn visit(&mut self, visitor: &mut impl Visitor) {
1146        if let Some(nmv) = self.items.as_mut() {
1147            nmv.visit(visitor).use_space_after(&mut self.s0).done();
1148        }
1149    }
1150
1151    /// Clone the sequence, turning any [`Cow::Borrowed`] into owned versions, which can then satisfy
1152    /// any lifetime.
1153    pub fn cloned<'any>(&self) -> Sequence<'any> {
1154        Sequence {
1155            s0: self.s0.cloned(),
1156            items: self.items.as_ref().map(|i| i.cloned()),
1157        }
1158    }
1159}
1160
1161impl Unparse for Sequence<'_> {
1162    fn serialize_write(&self, formatter: &mut core::fmt::Formatter) -> core::fmt::Result {
1163        self.s0.serialize_write(formatter)?;
1164        if let Some(items) = self.items.as_ref() {
1165            items.serialize_write(formatter)?;
1166        }
1167        Ok(())
1168    }
1169
1170    fn to_cbor(&self) -> Result<impl Iterator<Item = u8>, InconsistentEdn> {
1171        let chain = self.items.as_ref().map(|items| items.to_cbor());
1172        let chain = chain.transpose();
1173        chain.map(|optit| optit.into_iter().flatten())
1174    }
1175}
1176
1177/// Rule set for the `set_delimiters()` family of methods
1178#[derive(Copy, Clone, Debug, PartialEq)]
1179#[non_exhaustive]
1180pub enum DelimiterPolicy {
1181    /// Remove all comments, optional space and commas; place commas exactly where in their absence there
1182    /// would need to be space instead.
1183    DiscardAll,
1184    /// Like [`DiscardAll`][DelimiterPolicy::DiscardAll], but leave comments in place.
1185    DiscardAllButComments,
1186    /// Set commas where separation is mandatory, followed by a single space; set a single space after colons of key-value pairs.
1187    ///
1188    /// All other space and commas are removed. Comments are retained, including space between
1189    /// adjacent comments.
1190    SingleLineRegularSpacing,
1191    /// Replace all space with automated indentation. Comments are left in place, including line
1192    /// breaks, space and commas inside or between adjacent comments.
1193    ///
1194    /// For an easy default construction, see the [`.indented()`](Self::indented) method.
1195    IndentedRegularSpacing {
1196        /// Indentation level at the start
1197        base_indent: usize,
1198        /// Indentation added per nesting level
1199        indent_level: usize,
1200        /// Maximum width of lines that is left as a single item.
1201        ///
1202        /// If zero, this will wrap all nested structures; otherwise, it will leave small items
1203        /// with `SingleLineRegularSpacing`.
1204        ///
1205        /// Note that this measures line width in bytes; this is not exact if non-ASCII characters
1206        /// are involved, but a good enough estimate for most EDN content.
1207        max_width: usize,
1208    },
1209    /// Set a single space wherever one is allowed.
1210    ///
1211    /// This is not a practical policy overall, but some functions may set this for their
1212    /// downstream items.
1213    SingleSpace,
1214}
1215
1216impl DelimiterPolicy {
1217    /// Constructor for [`Self::IndentedRegularSpacing`] with default settings
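    ///
    /// # Example
    ///
    /// A minimal sketch that checks the defaults against the explicit variant, assuming
    /// `DelimiterPolicy` is exported at the crate root as declared in this file:
    ///
    /// ```rust
    /// use cbor_edn::DelimiterPolicy;
    /// assert_eq!(
    ///     DelimiterPolicy::indented(),
    ///     DelimiterPolicy::IndentedRegularSpacing {
    ///         base_indent: 0,
    ///         indent_level: 4,
    ///         max_width: 80,
    ///     }
    /// );
    /// ```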
1218    pub fn indented() -> Self {
1219        Self::IndentedRegularSpacing {
1220            base_indent: 0,
1221            indent_level: 4,
1222            max_width: 80,
1223        }
1224    }
1225}
1226
1227/// Trait through which a parsed CBOR diagnostic notation item can be turned back into a string
1228trait Unparse: Sized {
1229    /// Write the full item into a given formatter
1230    ///
1231    /// This is mainly used to implement this trait, but rarely called from the outside.
1232    fn serialize_write(&self, formatter: &mut core::fmt::Formatter) -> core::fmt::Result;
1233
1234    /// Produce a String from the full item
1235    ///
1236    /// No reason is known to not use the provided method; this is what is usually called on an
1237    /// item implementing this trait.
1238    fn serialize(&self) -> String {
1239        struct Unparsed<'a, T: Unparse>(&'a T);
1240        impl<T: Unparse> core::fmt::Display for Unparsed<'_, T> {
1241            fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
1242                self.0.serialize_write(f)
1243            }
1244        }
1245
1246        format!("{}", Unparsed(self))
1247    }
1248
1249    fn to_cbor(&self) -> Result<impl Iterator<Item = u8>, InconsistentEdn>;
1250}
1251
1252/// This represents a `T *(MSC T) SOC` sequence.
1253///
1254/// This type is common to CBOR sequences, streamstrings and array/map, with different mechanisms of
1255/// optionality around them ("just have the whole thing None", "there must be at least one" and
1256/// "the empty variant has a different type (specms vs. spec) next to it").
1257#[derive(Debug, Clone, PartialEq)]
1258struct NonemptyMscVec<'a, T: Unparse> {
1259    // Most users of this are somehow inside Item, and T is usually an item itself -- so we box the
1260    // T here to avoid recursively sized types.
1261    first: Box<T>,
1262    tail: Vec<(MSC<'a>, T)>,
1263    soc: SOC<'a>,
1264}
1265
1266impl<'a, T: Unparse> NonemptyMscVec<'a, T> {
1267    /// Creates a new instance from just the items, with default space.
1268    fn new(first: T, tail: impl Iterator<Item = T>) -> Self {
1269        Self {
1270            first: Box::new(first),
1271            tail: tail.map(|i| (Default::default(), i)).collect(),
1272            soc: Default::default(),
1273        }
1274    }
1275
1276    /// Creates a new instance, taking explicitly all space components (as used in a parser).
1277    fn new_parsing(first: T, tail: Vec<(MSC<'a>, T)>, soc: SOC<'a>) -> Self {
1278        Self {
1279            first: Box::new(first),
1280            tail,
1281            soc,
1282        }
1283    }
1284
1285    fn len(&self) -> usize {
1286        1 + self.tail.len()
1287    }
1288
1289    fn iter(&self) -> impl Iterator<Item = &T> {
1290        core::iter::once(&*self.first).chain(self.tail.iter().map(|(_msc, t)| t))
1291    }
1292}
1293
1294impl<'a> NonemptyMscVec<'a, Item<'a>> {
1295    fn visit(&mut self, visitor: &mut impl Visitor) -> ProcessResult {
1296        let mut own_result = self.first.visit(visitor);
1297        let mut last_result: Option<ProcessResult> = None;
1298        for (msc, item) in self.tail.iter_mut() {
1299            if let Some(result) = last_result.take() {
1300                result.use_space_after(msc).done();
1301            } else {
1302                own_result = own_result.use_space_after(msc);
1303            }
1304            let item_result = item.visit(visitor);
1305            let replaced = last_result.replace(item_result.use_space_before(msc));
1306            assert!(replaced.is_none());
1307        }
1308        if let Some(result) = last_result.take() {
1309            result.use_space_after(&mut self.soc).done();
1310        } else {
1311            own_result = own_result.use_space_after(&mut self.soc);
1312        }
1313
1314        own_result
1315    }
1316
1317    fn cloned<'any>(&self) -> NonemptyMscVec<'any, Item<'any>> {
1318        NonemptyMscVec {
1319            first: Box::new(self.first.cloned()),
1320            tail: self
1321                .tail
1322                .iter()
1323                .map(|(msc, i)| (msc.cloned(), i.cloned()))
1324                .collect(),
1325            soc: self.soc.cloned(),
1326        }
1327    }
1328}
1329// Those ↑ and ↓ are identical, but we don't have a trait for being visit'able and having a
1330// cloned()… should we?
1331impl<'a> NonemptyMscVec<'a, Kp<'a>> {
1332    fn visit(&mut self, visitor: &mut impl Visitor) -> ProcessResult {
1333        let mut own_result = self.first.visit(visitor);
1334        let mut last_result: Option<ProcessResult> = None;
1335        for (msc, item) in self.tail.iter_mut() {
1336            if let Some(result) = last_result.take() {
1337                result.use_space_after(msc).done();
1338            } else {
1339                own_result = own_result.use_space_after(msc);
1340            }
1341            let item_result = item.visit(visitor);
1342            let replaced = last_result.replace(item_result.use_space_before(msc));
1343            assert!(replaced.is_none());
1344        }
1345        if let Some(result) = last_result.take() {
1346            result.use_space_after(&mut self.soc).done();
1347        } else {
1348            own_result = own_result.use_space_after(&mut self.soc);
1349        }
1350
1351        own_result
1352    }
1353
1354    fn cloned<'any>(&self) -> NonemptyMscVec<'any, Kp<'any>> {
1355        NonemptyMscVec {
1356            first: Box::new(self.first.cloned()),
1357            tail: self
1358                .tail
1359                .iter()
1360                .map(|(msc, i)| (msc.cloned(), i.cloned()))
1361                .collect(),
1362            soc: self.soc.cloned(),
1363        }
1364    }
1365}
1366// ↓ And that's only needed around strings
1367impl<'a> NonemptyMscVec<'a, CborString<'a>> {
1368    fn cloned<'any>(&self) -> NonemptyMscVec<'any, CborString<'any>> {
1369        NonemptyMscVec {
1370            first: Box::new(self.first.cloned()),
1371            tail: self
1372                .tail
1373                .iter()
1374                .map(|(msc, i)| (msc.cloned(), i.cloned()))
1375                .collect(),
1376            soc: self.soc.cloned(),
1377        }
1378    }
1379}
1380
1381// With feature(precise_capturing), we can use the impl … + use syntax, and unify over T.
1382// fn iter_mut(&mut self) -> impl Iterator<Item = &mut T> + use<'_, 'a, T> {
1383macro_rules! nmv_concrete_impl {
1384    ($t:ident) => {
1385        impl<'a> NonemptyMscVec<'a, $t<'a>> {
1386            fn iter_mut(&mut self) -> impl Iterator<Item = &mut $t<'a>> {
1387                let first: &mut $t<'a> = &mut self.first;
1388                let tail = &mut self.tail;
1389                core::iter::once(first).chain(tail.iter_mut().map(|(_msc, i)| i))
1390            }
1391        }
1392    };
1393}
1394nmv_concrete_impl!(Item);
1395nmv_concrete_impl!(CborString);
1396
1397impl<T: Unparse> Unparse for NonemptyMscVec<'_, T> {
1398    fn serialize_write(&self, formatter: &mut core::fmt::Formatter) -> core::fmt::Result {
1399        self.first.serialize_write(formatter)?;
1400        for (msc, item) in self.tail.iter() {
1401            msc.serialize_write(formatter)?;
1402            item.serialize_write(formatter)?;
1403        }
1404        self.soc.serialize_write(formatter)?;
1405        Ok(())
1406    }
1407
1408    fn to_cbor(&self) -> Result<impl Iterator<Item = u8>, InconsistentEdn> {
1409        // Collecting in a vec of inner iterators to flush out the error early
1410        let collected: Result<Vec<_>, _> = self.iter().map(Unparse::to_cbor).collect();
1411        Ok(collected?.into_iter().flatten())
1412    }
1413}
1414
1415/// An empty-allowing extension of [`NonemptyMscVec`] where the empty and nonempty versions differ
1416/// in that the empty version has a spec and the nonempty version has a specms.
1417///
1418/// Note that this is a bit funny in that the space after specms is always empty when there is
1419/// Some spec (because then its inner MS consumes it), whereas when there is no spec, the space
1420/// lands in the `.s`.
1421#[derive(Debug, Clone, PartialEq)]
1422enum SpecMscVec<'a, T: Unparse> {
1423    Present {
1424        spec: Option<(Spec, MS<'a>)>,
1425        s: S<'a>,
1426        items: NonemptyMscVec<'a, T>,
1427    },
1428    Absent {
1429        spec: Option<Spec>,
1430        s: S<'a>,
1431    },
1432}
1433
1434impl<T: Unparse> SpecMscVec<'_, T> {
1435    /// Construct a new list from a spec and items
1436    fn new(spec: Option<Spec>, mut items: impl Iterator<Item = T>) -> Self {
1437        if let Some(first) = items.next() {
1438            // The Some is a bit weird here because the type of SpecMscVec expects Spec to be
1439            // non-nullable; we'll see how this develops once that is removed.
1440            SpecMscVec::Present {
1441                spec: spec.map(|spec| (spec, Default::default())),
1442                s: Default::default(),
1443                items: NonemptyMscVec::new(first, items),
1444            }
1445        } else {
1446            SpecMscVec::Absent {
1447                spec,
1448                s: Default::default(),
1449            }
1450        }
1451    }
1452
1453    fn len(&self) -> usize {
1454        match self {
1455            SpecMscVec::Present { items, .. } => items.len(),
1456            SpecMscVec::Absent { .. } => 0,
1457        }
1458    }
1459
1460    fn spec(&self) -> Option<Spec> {
1461        match self {
1462            SpecMscVec::Present {
1463                spec: Some((spec, _ms)),
1464                ..
1465            } => Some(*spec),
1466            SpecMscVec::Present { spec: None, .. } => None,
1467            SpecMscVec::Absent { spec, .. } => *spec,
1468        }
1469    }
1470
1471    fn iter(&self) -> impl Iterator<Item = &T> {
1472        let (first, tail) = match self {
1473            SpecMscVec::Absent { .. } => (None, None),
1474            SpecMscVec::Present {
1475                items: NonemptyMscVec { first, tail, .. },
1476                ..
1477            } => (Some(first.as_ref()), Some(tail)),
1478        };
1479        first
1480            .into_iter()
1481            .chain(tail.into_iter().flatten().map(|(_msc, i)| i))
1482    }
1483
1484    /// Discards the own spec.
1485    ///
1486    /// On presence, this discards a single blank character from the MS that becomes the S (for the
1487    /// common case of the MS just having that mandatory space), but retains any other space
1488    /// including comments.
1489    fn discard_own_encoding_indicator(&mut self) {
1490        match self {
1491            SpecMscVec::Absent { spec, .. } => *spec = None,
1492            SpecMscVec::Present { spec, s, .. } => {
1493                if let Some((_spec, ms)) = spec.take() {
1494                    if ms != Default::default() {
1495                        // Most of the time, s is already empty, but during manipulation, it can
1496                        // get some value too.
1497                        s.prefix(ms.0);
1498                    }
1499                }
1500            }
1501        }
1502    }
1503}
1504
1505// With feature(precise_capturing), we can use the impl … + use syntax, and unify over T. When
1506// restoring the generic form, beware that this will require an explicit lifetime on the impl
1507// (instead of `impl<T: …> SpecMscVec<'_, T>`).
1508//
1509// fn iter_mut(&mut self) -> impl Iterator<Item = &mut T> + use<'_, 'a, T> {
1510macro_rules! smv_concrete_impl {
1511    ($t:ident) => {
1512        impl<'a> SpecMscVec<'a, $t<'a>> {
1513            fn iter_mut(&mut self) -> impl Iterator<Item = &mut $t<'a>> {
1514                let (first, tail) = match self {
1515                    SpecMscVec::Absent { .. } => (None, None),
1516                    SpecMscVec::Present {
1517                        items: NonemptyMscVec { first, tail, .. },
1518                        ..
1519                    } => (Some(first.as_mut()), Some(tail)),
1520                };
1521                first
1522                    .into_iter()
1523                    .chain(tail.into_iter().flatten().map(|(_msc, i)| i))
1524            }
1525
1526            // This one is not suffering from feature(precise_capturing) but from our .cloned()
1527            // not being a trait method.
1528            fn cloned<'any>(&self) -> SpecMscVec<'any, $t<'any>> {
1529                match self {
1530                    SpecMscVec::Present { spec, s, items } => SpecMscVec::Present {
1531                        spec: spec.as_ref().map(|(spec, ms)| (*spec, ms.cloned())),
1532                        s: s.cloned(),
1533                        items: items.cloned(),
1534                    },
1535                    SpecMscVec::Absent { spec, s } => SpecMscVec::Absent {
1536                        spec: *spec,
1537                        s: s.cloned(),
1538                    },
1539                }
1540            }
1541        }
1542    };
1543}
1544smv_concrete_impl!(Item);
1545smv_concrete_impl!(Kp);
1546
1547impl<'a> SpecMscVec<'a, Item<'a>> {
1548    fn visit(&mut self, visitor: &mut impl Visitor) {
1549        match self {
1550            SpecMscVec::Present { spec: _, s, items } => {
1551                // Everything up to and after the last item is processed internally
1552                items.visit(visitor).use_space_before(s).done();
1553            }
1554            SpecMscVec::Absent { spec: _, s: _ } => (),
1555        }
1556    }
1557}
1558// Those ↑ and ↓ are identical, but we don't have a trait for being visit'able … should we?
1559impl<'a> SpecMscVec<'a, Kp<'a>> {
1560    fn visit(&mut self, visitor: &mut impl Visitor) {
1561        match self {
1562            SpecMscVec::Present { spec: _, s, items } => {
1563                // Everything up to and after the last item is processed internally
1564                items.visit(visitor).use_space_before(s).done();
1565            }
1566            SpecMscVec::Absent { spec: _, s: _ } => (),
1567        }
1568    }
1569}
1570
1571impl<T: Unparse> Unparse for SpecMscVec<'_, T> {
1572    fn serialize_write(&self, formatter: &mut core::fmt::Formatter) -> core::fmt::Result {
1573        match self {
1574            SpecMscVec::Present { spec, s, items } => {
1575                if let Some((spec, msc)) = spec {
1576                    spec.serialize_write(formatter)?;
1577                    msc.serialize_write(formatter)?;
1578                }
1579                s.serialize_write(formatter)?;
1580                items.serialize_write(formatter)?;
1581                Ok(())
1582            }
1583            SpecMscVec::Absent { spec, s } => {
1584                if let Some(spec) = spec {
1585                    spec.serialize_write(formatter)?;
1586                }
1587                s.serialize_write(formatter)?;
1588                Ok(())
1589            }
1590        }
1591    }
1592
1593    // This writes just the CBOR items; it is up to the caller to process the spec.
1594    fn to_cbor(&self) -> Result<impl Iterator<Item = u8>, InconsistentEdn> {
1595        // FIXME: Or is this just now the point to split Unparse and not implement the CBOR side?
1596
1597        // Collecting in a vec of inner iterators to flush out the error early
1598        let collected: Result<Vec<_>, _> = self.iter().map(Unparse::to_cbor).collect();
1599        Ok(collected?.into_iter().flatten())
1600    }
1601}
1602
1603/// A key-value pair of CBOR items, separated by a ":" with optional [S]pace on either side
1604#[derive(Debug, Clone, PartialEq)]
1605struct Kp<'a> {
1606    key: Item<'a>,
1607    s0: S<'a>,
1608    s1: S<'a>,
1609    value: Item<'a>,
1610}
1611
1612impl<'a> Kp<'a> {
1613    fn new(key: Item<'a>, value: Item<'a>) -> Self {
1614        Self {
1615            key,
1616            s0: Default::default(),
1617            s1: Default::default(),
1618            value,
1619        }
1620    }
1621
1622    fn visit(&mut self, visitor: &mut impl Visitor) -> ProcessResult {
1623        let key_result = self.key.visit(visitor);
1624        let value_result = self.value.visit(visitor);
1625        key_result
1626            .use_space_after(&mut self.s0)
1627            .chain(value_result.use_space_before(&mut self.s1))
1628    }
1629
1630    fn cloned<'any>(&self) -> Kp<'any> {
1631        Kp {
1632            key: self.key.cloned(),
1633            s0: self.s0.cloned(),
1634            s1: self.s1.cloned(),
1635            value: self.value.cloned(),
1636        }
1637    }
1638}
1639
1640impl Unparse for Kp<'_> {
1641    fn serialize_write(&self, formatter: &mut core::fmt::Formatter) -> core::fmt::Result {
1642        self.key.serialize_write(formatter)?;
1643        self.s0.serialize_write(formatter)?;
1644        formatter.write_str(":")?;
1645        self.s1.serialize_write(formatter)?;
1646        self.value.serialize_write(formatter)?;
1647        Ok(())
1648    }
1649
1650    fn to_cbor(&self) -> Result<impl Iterator<Item = u8>, InconsistentEdn> {
1651        Ok([self.key.to_cbor()?, self.value.to_cbor()?]
1652            .into_iter()
1653            .flatten())
1654    }
1655}
1656
1657#[derive(Debug, Clone, PartialEq)]
1658enum Simple<'a> {
1659    False,
1660    True,
1661    Null,
1662    Undefined,
1663    // Note that later processing may be upset if the string is not a Number item, but cpa'something' may make sense
1664    Numeric(Box<StandaloneItem<'a>>),
1665}
1666impl Simple<'_> {
1667    pub(crate) fn cloned<'any>(&self) -> Simple<'any> {
1668        match self {
1669            Simple::False => Simple::False,
1670            Simple::True => Simple::True,
1671            Simple::Null => Simple::Null,
1672            Simple::Undefined => Simple::Undefined,
1673            Simple::Numeric(standalone_item) => Simple::Numeric(Box::new(standalone_item.cloned())),
1674        }
1675    }
1676}
1677
1678impl Unparse for Simple<'_> {
1679    fn serialize_write(&self, formatter: &mut core::fmt::Formatter) -> core::fmt::Result {
1680        match self {
1681            Simple::False => formatter.write_str("false")?,
1682            Simple::True => formatter.write_str("true")?,
1683            Simple::Null => formatter.write_str("null")?,
1684            Simple::Undefined => formatter.write_str("undefined")?,
1685            Simple::Numeric(i) => {
1686                formatter.write_str("simple(")?;
1687                i.serialize_write(formatter)?;
1688                formatter.write_str(")")?;
1689            }
1690        }
1691        Ok(())
1692    }
1693
1694    fn to_cbor(&self) -> Result<impl Iterator<Item = u8>, InconsistentEdn> {
1695        let mut result = Vec::new();
1696        match self {
1697            Simple::False => result.push(0xf4),
1698            Simple::True => result.push(0xf5),
1699            Simple::Null => result.push(0xf6),
1700            Simple::Undefined => result.push(0xf7),
1701            Simple::Numeric(i) => {
1702                let InnerItem::Number(ref number, spec) = i.inner() else {
1703                    return Err(InconsistentEdn(
1704                        "Items inside simple() need to be numbers for serialization.",
1705                    ));
1706                };
1707                let NumberValue::Positive(number) = number.value() else {
1708                    return Err(InconsistentEdn(
1709                        "Only non-negative integers can be in a Simple",
1710                    ));
1711                };
1712                if number > 255 {
1713                    return Err(InconsistentEdn("Simple value exceeds valid range of 0..=255"));
1714                }
1715                let requested = Spec::encode_argument(spec.as_ref(), Major::FloatSimple, number)?;
1716                let permissible = Spec::encode_argument(None, Major::FloatSimple, number)?;
1717                if requested != permissible {
1718                    return Err(InconsistentEdn(
1719                        "Encoding indicators on simple value must use the preferred encoding",
1720                    ));
1721                }
1722                result.extend(permissible);
1723            }
1724        };
1725        Ok(result.into_iter())
1726    }
1727}
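
// Illustrative sketch, not part of this crate's API: how a simple value number maps to CBOR
// bytes in preferred encoding, following the range check and head encoding used in
// `Simple::to_cbor` above. The name `sketch_encode_simple` is hypothetical; like the code
// above, it only checks the 0..=255 range and nothing else.
#[allow(dead_code)]
fn sketch_encode_simple(number: u64) -> Option<Vec<u8>> {
    // Major type 7 (floats and simple values), shifted into the top three bits.
    const MAJOR_FLOAT_SIMPLE: u8 = 7 << 5;
    match number {
        // Values below 24 fit into the additional information of the initial byte.
        0..=23 => Some(vec![MAJOR_FLOAT_SIMPLE | number as u8]),
        // Larger values use additional information 24 and one following byte.
        24..=255 => Some(vec![MAJOR_FLOAT_SIMPLE | 24, number as u8]),
        // Anything else can not be expressed as a simple value.
        _ => None,
    }
}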
1728
1729impl<'a> From<Simple<'a>> for Item<'a> {
1730    fn from(input: Simple<'a>) -> Self {
1731        InnerItem::Simple(input).into()
1732    }
1733}
1734
1735/// An arbitrary CBOR item
1736#[derive(Clone, Debug, PartialEq)]
1737enum InnerItem<'a> {
1738    Map(SpecMscVec<'a, Kp<'a>>),
1739    Array(SpecMscVec<'a, Item<'a>>),
1740    Tagged(u64, Option<Spec>, Box<StandaloneItem<'a>>),
1741    /// Stored as a string, but we could also explicitly capture the variation:
1742    /// * is a sign present? (even in an integer negative 0?)
1743    /// * what is the base?
1744    /// * how many leading zeros are there?
1745    /// * is there an explicit power (and if so, does it have an explicit sign, or leading zeros?)
1746    /// * note that there are no inner spaces or underscores: no "1 000 000" or "1_000_000", the
1747    ///   latter would conflict with encoding indicators.
1748    ///
1749    /// (and it can be arbitrarily long, exceeding a u64)
1750    Number(Number<'a>, Option<Spec>),
1751    Simple(Simple<'a>),
1752    String(CborString<'a>),
1753    StreamString(MS<'a>, NonemptyMscVec<'a, CborString<'a>>),
1754}
1755
1756impl InnerItem<'_> {
1757    /// Discard any encoding indicators ([Spec]) that may be part of the item
1758    fn discard_encoding_indicators(&mut self) {
1759        match self {
1760            InnerItem::Map(items) => {
1761                for i in items.iter_mut() {
1762                    i.key.discard_encoding_indicators();
1763                    i.value.discard_encoding_indicators();
1764                }
1765                items.discard_own_encoding_indicator();
1766            }
1767            InnerItem::Array(items) => {
1768                for i in items.iter_mut() {
1769                    i.discard_encoding_indicators();
1770                }
1771                items.discard_own_encoding_indicator();
1772            }
1773            InnerItem::Tagged(_n, spec, item) => {
1774                *spec = None;
1775                item.item_mut().discard_encoding_indicators();
1776            }
1777            InnerItem::Number(_n, spec) => {
1778                *spec = None;
1779            }
1780            InnerItem::Simple(Simple::Numeric(i)) => i.item_mut().discard_encoding_indicators(),
1781            InnerItem::Simple(_) => {}
1782            InnerItem::String(items) => {
1783                items.discard_encoding_indicators();
1784            }
1785            InnerItem::StreamString(_ms, items) => {
1786                // FIXME: Shouldn't this just become String? (StreamString is kind of an encoding
1787                // indicator)
1788                for i in items.iter_mut() {
1789                    i.discard_encoding_indicators();
1790                }
1791            }
1792        }
1793    }
1794
1795    fn set_delimiters(&mut self, policy: DelimiterPolicy) {
1796        use DelimiterPolicy::*;
1797
1798        let nested_policy = if let IndentedRegularSpacing {
1799            base_indent,
1800            indent_level,
1801            max_width,
1802        } = policy
1803        {
1804            // Try fitting it in one line; that doesn't do anything that won't be changed by proper
1805            // indentation later anyway, so we don't need to roll back.
1806            self.set_delimiters(SingleLineRegularSpacing);
1807            if self.serialize().len() + base_indent < max_width {
1808                return;
1809            }
1810
1811            IndentedRegularSpacing {
1812                base_indent: base_indent + indent_level,
1813                indent_level,
1814                max_width,
1815            }
1816        } else {
1817            policy
1818        };
1819
1820        match self {
1821            InnerItem::Map(items) => match items {
1822                SpecMscVec::Absent { s, .. } => s.set_delimiters(nested_policy, false),
1823                SpecMscVec::Present { s, items, .. } => {
1824                    s.set_delimiters(nested_policy, true);
1825                    let set_on_item = |kp: &mut Kp| {
1826                        kp.key.set_delimiters(nested_policy);
1827                        kp.value.set_delimiters(nested_policy);
1828                        kp.s0.set_delimiters(nested_policy, false);
1829                        if matches!(policy, SingleLineRegularSpacing) {
1830                            kp.s1.0 = " ".into();
1831                        } else {
1832                            // Or true … but that may need an extra case in the top-level
1833                            // indented-to-single-line logic
1834                            kp.s1.set_delimiters(nested_policy, false);
1835                        }
1836                    };
1837                    set_on_item(&mut items.first);
1838                    for (msc, item) in items.tail.iter_mut() {
1839                        set_on_item(item);
1840                        msc.set_delimiters(nested_policy, true);
1841                    }
1842                    items.soc.set_delimiters(policy, true);
1843                }
1844            },
1845            InnerItem::Array(items) => match items {
1846                SpecMscVec::Absent { s, .. } => s.set_delimiters(nested_policy, false),
1847                SpecMscVec::Present { s, items, .. } => {
1848                    s.set_delimiters(nested_policy, true);
1849                    items.first.set_delimiters(nested_policy);
1850                    for (msc, item) in items.tail.iter_mut() {
1851                        item.set_delimiters(nested_policy);
1852                        msc.set_delimiters(nested_policy, true);
1853                    }
1854                    items.soc.set_delimiters(policy, true);
1855                }
1856            },
1857            InnerItem::Tagged(_n, _spec, item) => {
1858                item.set_delimiters(nested_policy);
1859            }
1860            InnerItem::Number(_n, _spec) => {}
1861            InnerItem::Simple(Simple::Numeric(item)) => {
1862                // Setting the nested_policy on the item as a whole would lead to unsightly indentation --
1863                // setting it piecemeal instead.
1864                item.0.set_delimiters(nested_policy, false);
1865                item.1.set_delimiters(nested_policy);
1866                item.2.set_delimiters(nested_policy, false);
1867            }
1868            InnerItem::Simple(_) => {}
1869            InnerItem::String(CborString { items, separators }) => {
1870                for i in items {
1871                    i.set_delimiters(nested_policy);
1872                }
1873                for (sep_pre, sep_post) in separators {
1874                    match nested_policy {
1875                        SingleLineRegularSpacing => {
1876                            sep_pre.set_delimiters(SingleSpace, true);
1877                            sep_post.set_delimiters(SingleSpace, false);
1878                        }
1879                        _ => {
1880                            sep_pre.set_delimiters(nested_policy, true);
1881                            sep_post.set_delimiters(nested_policy, false);
1882                        }
1883                    }
1884                }
1885            }
1886            InnerItem::StreamString(ms, NonemptyMscVec { first, tail, soc }) => {
1887                ms.set_delimiters(nested_policy, true);
1888                first.set_delimiters(nested_policy);
1889                for (ms, item) in tail {
1890                    ms.set_delimiters(nested_policy, true);
1891                    item.set_delimiters(nested_policy);
1892                }
1893                soc.set_delimiters(policy, true);
1894            }
1895        }
1896    }
1897
1898    fn visit(&mut self, visitor: &mut impl Visitor) {
1899        match self {
1900            InnerItem::Map(spec_msc_vec) => {
1901                spec_msc_vec.visit(visitor);
1902            }
1903            InnerItem::Array(spec_msc_vec) => {
1904                spec_msc_vec.visit(visitor);
1905            }
1906            InnerItem::Tagged(_number, _spec, standalone_item) => {
1907                // This mainly returns no ProcessResult because comments can well be placed inside
1908                // the item -- but if someone really wants to act on the outside, that could be
1909                // taken through here.
1910                standalone_item.visit(visitor);
1911            }
1912            InnerItem::Number(_number, _spec) => (),
1913            InnerItem::Simple(_simple) => (),
1914            InnerItem::String(_cbor_string) => (),
1915            InnerItem::StreamString(_ms, _nonempty_msc_vec) => (),
1916        }
1917    }
1918
1919    fn cloned<'any>(&self) -> InnerItem<'any> {
1920        match self {
1921            InnerItem::Map(spec_msc_vec) => InnerItem::Map(spec_msc_vec.cloned()),
1922            InnerItem::Array(spec_msc_vec) => InnerItem::Array(spec_msc_vec.cloned()),
1923            InnerItem::Tagged(tag, spec, standalone_item) => {
1924                InnerItem::Tagged(*tag, *spec, Box::new(standalone_item.cloned()))
1925            }
1926            InnerItem::Number(number, spec) => InnerItem::Number(number.cloned(), *spec),
1927            InnerItem::Simple(simple) => InnerItem::Simple(simple.cloned()),
1928            InnerItem::String(cbor_string) => InnerItem::String(cbor_string.cloned()),
1929            InnerItem::StreamString(ms, nonempty_msc_vec) => {
1930                InnerItem::StreamString(ms.cloned(), nonempty_msc_vec.cloned())
1931            }
1932        }
1933    }
1934}
1935
1936impl Unparse for InnerItem<'_> {
1937    fn serialize_write(&self, formatter: &mut core::fmt::Formatter) -> core::fmt::Result {
1938        match self {
1939            InnerItem::Map(items) => {
1940                write!(formatter, "{{")?;
1941                items.serialize_write(formatter)?;
1942                write!(formatter, "}}")?;
1943                Ok(())
1944            }
1945            InnerItem::Array(items) => {
1946                write!(formatter, "[")?;
1947                items.serialize_write(formatter)?;
1948                write!(formatter, "]")?;
1949                Ok(())
1950            }
1951            InnerItem::Tagged(n, spec, item) => {
1952                write!(formatter, "{}", n)?;
1953                if let Some(spec) = spec {
1954                    spec.serialize_write(formatter)?;
1955                }
1956                formatter.write_str("(")?;
1957                item.serialize_write(formatter)?;
1958                formatter.write_str(")")?;
1959                Ok(())
1960            }
1961            InnerItem::Number(n, spec) => {
1962                formatter.write_str(&n.0)?;
1963                if let Some(spec) = spec {
1964                    spec.serialize_write(formatter)?;
1965                }
1966                Ok(())
1967            }
1968            InnerItem::Simple(s) => s.serialize_write(formatter),
1969            InnerItem::String(s) => s.serialize_write(formatter),
1970            InnerItem::StreamString(ms, nmv) => {
1971                formatter.write_str("(_")?;
1972                ms.serialize_write(formatter)?;
1973                nmv.serialize_write(formatter)?;
1974                formatter.write_str(")")?;
1975                Ok(())
1976            }
1977        }
1978    }
1979
1980    fn to_cbor(&self) -> Result<impl Iterator<Item = u8>, InconsistentEdn> {
1981        let mut result = vec![];
1982        match self {
1983            InnerItem::Map(smv) => {
1984                let len = smv.len();
1985                let spec = smv.spec();
1986                let (head, tail) = Spec::encode_item_count(spec.as_ref(), Major::Map, len)?;
1987                result.extend(head);
1988                for i in smv.iter() {
1989                    result.extend(i.to_cbor()?);
1990                }
1991                result.extend(tail);
1992            }
1993            InnerItem::Array(smv) => {
1994                let len = smv.len();
1995                let spec = smv.spec();
1996                let (head, tail) = Spec::encode_item_count(spec.as_ref(), Major::Array, len)?;
1997                result.extend(head);
1998                for i in smv.iter() {
1999                    result.extend(i.to_cbor()?);
2000                }
2001                result.extend(tail);
2002            }
2003            InnerItem::Tagged(n, spec, item) => {
2004                result.extend(Spec::encode_argument(spec.as_ref(), Major::Tagged, *n)?);
2005                result.extend(item.to_cbor()?);
2006            }
2007            InnerItem::Number(n, spec) => match n.value() {
2008                NumberValue::Positive(n) => {
2009                    result.extend(Spec::encode_argument(spec.as_ref(), Major::Unsigned, n)?)
2010                }
2011                NumberValue::Negative(n) => {
2012                    result.extend(Spec::encode_argument(spec.as_ref(), Major::Negative, n)?)
2013                }
2014                NumberValue::Float(n) => result.extend(float::encode(n, *spec)?),
2015                NumberValue::Big(n) => match spec {
2016                    None => {
2017                        let (tag, positive) = if n >= num_bigint::BigInt::ZERO {
2018                            (2, n)
2019                        } else {
2020                            (3, -n - num_bigint::BigInt::from(1u8)) // tag 3 carries -1 - n
2021                        };
2022                        use num_traits::ops::bytes::ToBytes;
2023                        result.extend(Spec::encode_argument(None, Major::Tagged, tag)?);
2024                        let bytes = positive.to_be_bytes();
2025                        result.extend(Spec::encode_argument(
2026                            None,
2027                            Major::ByteString,
2028                            bytes
2029                                .len()
2030                                .try_into()
2031                                .expect("Even on 128-bit systems, EDN does not exceed 64bit sizes"),
2032                        )?);
2033                        result.extend(bytes);
2034                    }
2035                    _ => {
2036                        return Err(InconsistentEdn(
2037                            "Encoding indicators not specified for bignums",
2038                        ))
2039                    }
2040                },
2041            },
2042            InnerItem::Simple(s) => result.extend(s.to_cbor()?),
2043            InnerItem::String(s) => result.extend(s.to_cbor()?),
2044            InnerItem::StreamString(_ms, NonemptyMscVec { first, tail, .. }) => {
2045                let major = first.encoded_major_type()?;
2046                if !matches!(major, Major::TextString | Major::ByteString) {
2047                    // Syntax can't catch this: Might be an application oriented literal that is
2048                    // not string-valued
2049                    return Err(InconsistentEdn(
2050                        "Item in indefinite length string that is neither bytes nor string",
2051                    ));
2052                }
2053                result.push(((major as u8) << 5) | 31);
2054                result.extend(first.to_cbor()?);
2055                for item in tail.iter() {
2056                    if item.1.encoded_major_type()? != major {
2057                        return Err(InconsistentEdn("Item in indefinite length string has different encoding than head element"));
2058                    }
2059                    result.extend(item.1.to_cbor()?);
2060                }
2061                result.push(0xff);
2062            }
2063        }
2064        Ok(result.into_iter())
2065    }
2066}
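
// Illustrative sketch, not part of this crate's API: the framing that the `StreamString` arm
// above produces for an indefinite-length byte string. An initial byte with additional
// information 31 opens the string, each chunk follows as a definite-length string of the same
// major type, and 0xff terminates it. The name `sketch_indefinite_byte_string` is
// hypothetical, and for brevity only chunks shorter than 24 bytes are supported.
#[allow(dead_code)]
fn sketch_indefinite_byte_string(chunks: &[&[u8]]) -> Option<Vec<u8>> {
    // Major type 2 (byte string), shifted into the top three bits.
    const MAJOR_BYTE_STRING: u8 = 2 << 5;
    let mut out = vec![MAJOR_BYTE_STRING | 31];
    for chunk in chunks {
        if chunk.len() > 23 {
            // Longer chunks would need a multi-byte length, which this sketch leaves out.
            return None;
        }
        out.push(MAJOR_BYTE_STRING | chunk.len() as u8);
        out.extend_from_slice(chunk);
    }
    // The "break" stop code closes the indefinite-length item.
    out.push(0xff);
    Some(out)
}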
2067
2068#[derive(PartialEq, Debug, Copy, Clone)]
2069enum Major {
2070    Unsigned = 0,
2071    Negative = 1,
2072    ByteString = 2,
2073    TextString = 3,
2074    Array = 4,
2075    Map = 5,
2076    Tagged = 6,
2077    FloatSimple = 7,
2078}
2079
2080impl Major {
2081    /// Given a byte, return its major type and the additional information.
2082    fn from_byte(byte: u8) -> (Self, u8) {
2083        (
2084            match byte >> 5 {
2085                0 => Major::Unsigned,
2086                1 => Major::Negative,
2087                2 => Major::ByteString,
2088                3 => Major::TextString,
2089                4 => Major::Array,
2090                5 => Major::Map,
2091                6 => Major::Tagged,
2092                7 => Major::FloatSimple,
2093                _ => unreachable!(),
2094            },
2095            byte & 0x1f,
2096        )
2097    }
2098}
2099
2100/// An encoding indicator
2101///
2102/// Encoding indicators are typically rendered with an underscore, eg. in `4_1`, `_1` is the
2103/// encoding indicator [`Spec::S_1`], and tells that the number 4 was encoded in more bytes than
2104/// would have been needed.
2105///
2106/// While encoding indicators are described as an extensible registry, new values would interfere
2107/// so deeply with this crate's operation that they would need a code change; consequently, unknown
2108/// values are rejected at parsing time.
2109#[derive(Copy, Clone, Debug, PartialEq)]
2110#[allow(non_camel_case_types)] // reason: underscores are part of what we express here
2111enum Spec {
2112    S_,
2113    S_i,
2114    S_0,
2115    S_1,
2116    S_2,
2117    S_3,
2118}
2119
2120impl Spec {
2121    /// Given an item count, produce the encoded item count for a given Major type (only makes
2122    /// sense for an array and map), as well as any terminator that'd be necessary after the list
2123    /// in case of indefinite length encoding
2124    fn encode_item_count(
2125        self_: Option<&Self>,
2126        major: Major,
2127        count: usize,
2128    ) -> Result<(Vec<u8>, &[u8]), InconsistentEdn> {
2129        debug_assert!(matches!(major, Major::Map | Major::Array), "Encoding an item count only makes sense for maps and arrays; strings work a bit differently.");
2130        Ok((
2131            Spec::encode_argument(self_, major, count.try_into().expect("Even on 128bit architectures we can't have more than 64bit long counts of items"))?,
2132            if matches!(self_, Some(Spec::S_)) { [0xff].as_slice() } else { [].as_slice() },
2133        ))
2134    }
2135
2136    fn encode_argument(
2137        self_: Option<&Self>,
2138        major: Major,
2139        argument: u64,
2140    ) -> Result<Vec<u8>, InconsistentEdn> {
2141        let full_spec = match (self_, argument) {
2142            (None, 0..=23) => Self::S_i,
2143            (None, 0..=U8MAX) => Self::S_0,
2144            (None, 0..=U16MAX) => Self::S_1,
2145            (None, 0..=U32MAX) => Self::S_2,
2146            (None, _) => Self::S_3,
2147            (Some(s), _) => *s,
2148        };
2149
2150        let immediate_value = match full_spec {
2151            Self::S_ => 31,
2152            Self::S_i => {
2153                if argument < 24 {
2154                    argument as u8
2155                } else {
2156                    return Err(InconsistentEdn(
2157                        "Immediate encoding demanded but value exceeds 23",
2158                    ));
2159                }
2160            }
2161            Self::S_0 => 24,
2162            Self::S_1 => 25,
2163            Self::S_2 => 26,
2164            Self::S_3 => 27,
2165        };
2166        let first = core::iter::once(((major as u8) << 5) | immediate_value);
2167        Ok(match full_spec {
2168            Self::S_ | Self::S_i => first.collect(),
2169            Self::S_0 => first.chain(u8::try_from(argument)?.to_be_bytes()).collect(),
2170            Self::S_1 => first
2171                .chain(u16::try_from(argument)?.to_be_bytes())
2172                .collect(),
2173            Self::S_2 => first
2174                .chain(u32::try_from(argument)?.to_be_bytes())
2175                .collect(),
2176            Self::S_3 => first.chain(argument.to_be_bytes()).collect(),
2177        })
2178    }
2179
2180    fn serialize_write(&self, formatter: &mut core::fmt::Formatter) -> core::fmt::Result {
2181        match self {
2182            Self::S_ => formatter.write_str("_"),
2183            Self::S_i => formatter.write_str("_i"),
2184            Self::S_0 => formatter.write_str("_0"),
2185            Self::S_1 => formatter.write_str("_1"),
2186            Self::S_2 => formatter.write_str("_2"),
2187            Self::S_3 => formatter.write_str("_3"),
2188        }
2189    }
2190
2191    /// Return None if the integer argument leads to self being selected in preferred encoding
2192    /// anyway.
2193    ///
2194    /// We can't do this in [process_cbor_major_argument] because floats are not so trivial to
2195    /// classify.
2196    fn or_none_if_default_for_arg(self, arg: u64) -> Option<Self> {
2197        const U8MAXPLUS: u64 = U8MAX + 1;
2198        const U16MAXPLUS: u64 = U16MAX + 1;
2199        const U32MAXPLUS: u64 = U32MAX + 1;
2200        match (self, arg) {
2201            (Spec::S_i, 0..=23) => None,
2202            (Spec::S_0, 24..=U8MAX) => None,
2203            (Spec::S_1, U8MAXPLUS..=U16MAX) => None,
2204            (Spec::S_2, U16MAXPLUS..=U32MAX) => None,
2205            (Spec::S_3, U32MAXPLUS..=u64::MAX) => None,
2206            (s, _) => Some(s),
2207        }
2208    }
2209}
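
// Illustrative sketch, not part of this crate's API: the shortest-form (preferred) head
// encoding that `Spec::encode_argument` selects when no encoding indicator is given. The name
// `sketch_encode_preferred_head` is hypothetical, and the major type is passed as a plain
// number 0..=7 to keep the sketch self-contained.
#[allow(dead_code)]
fn sketch_encode_preferred_head(major: u8, argument: u64) -> Vec<u8> {
    debug_assert!(major < 8, "CBOR has only eight major types");
    let initial = major << 5;
    let mut out = Vec::new();
    match argument {
        // Immediate encoding (`_i`): the argument lives in the initial byte.
        0..=23 => out.push(initial | argument as u8),
        // One following byte (`_0`).
        24..=0xff => {
            out.push(initial | 24);
            out.push(argument as u8);
        }
        // Two following bytes (`_1`).
        0x100..=0xffff => {
            out.push(initial | 25);
            out.extend_from_slice(&(argument as u16).to_be_bytes());
        }
        // Four following bytes (`_2`).
        0x1_0000..=0xffff_ffff => {
            out.push(initial | 26);
            out.extend_from_slice(&(argument as u32).to_be_bytes());
        }
        // Eight following bytes (`_3`).
        _ => {
            out.push(initial | 27);
            out.extend_from_slice(&argument.to_be_bytes());
        }
    }
    out
}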
2210
2211impl core::str::FromStr for Spec {
2212    type Err = &'static str;
2213
2214    fn from_str(s: &str) -> Result<Self, Self::Err> {
2215        match s {
2216            "" => Ok(Self::S_),
2217            "i" => Ok(Self::S_i),
2218            "0" => Ok(Self::S_0),
2219            "1" => Ok(Self::S_1),
2220            "2" => Ok(Self::S_2),
2221            "3" => Ok(Self::S_3),
2222            _ => Err("Unsupported encoding indicator"),
2223        }
2224    }
2225}
2226
2227/// From a byte string, process the first and subsequent bytes into a major type, an argument, and
2228/// a spec
2229///
2230/// Spec will be S_ iff the [`Option<u64>`] is `None`.
2231#[allow(clippy::type_complexity)]
2232// reason: All items make sense here, and it is an internal function used in situations when you
2233// would expect those very items.
2234fn process_cbor_major_argument(
2235    cbor: &[u8],
2236) -> Result<(Major, Option<u64>, Spec, &[u8]), CborError> {
2237    // It would be tempting to use minicbor or another CBOR implementation, but they don't
2238    // expose which option was chosen for argument, so we are on our own, because we need that
2239    // information for encoding indicators.
2240    let head = cbor
2241        .first()
2242        .ok_or(CborError("Expected item, out of data"))?;
2243
2244    let (major, additional) = Major::from_byte(*head);
2245    let tail = &cbor[1..];
2246
2247    let (argument, spec, skip): (Option<u64>, _, _) = match additional {
2248        0..=23 => (Some(additional.into()), Spec::S_i, 0),
2249        24 => (
2250            Some(
2251                tail.first()
2252                    .copied()
2253                    .ok_or(CborError("Missing 1 byte"))?
2254                    .into(),
2255            ),
2256            Spec::S_0,
2257            1,
2258        ),
2259        25 => (
2260            Some(
2261                u16::from_be_bytes(
2262                    tail.get(..2)
2263                        .ok_or(CborError("Missing 2 bytes"))?
2264                        .try_into()
2265                        .unwrap(),
2266                )
2267                .into(),
2268            ),
2269            Spec::S_1,
2270            2,
2271        ),
2272        26 => (
2273            Some(
2274                u32::from_be_bytes(
2275                    tail.get(..4)
2276                        .ok_or(CborError("Missing 4 bytes"))?
2277                        .try_into()
2278                        .unwrap(),
2279                )
2280                .into(),
2281            ),
2282            Spec::S_2,
2283            4,
2284        ),
2285        27 => (
2286            Some(u64::from_be_bytes(
2287                tail.get(..8)
2288                    .ok_or(CborError("Missing 8 bytes"))?
2289                    .try_into()
2290                    .unwrap(),
2291            )),
2292            Spec::S_3,
2293            8,
2294        ),
2295        31 => (None, Spec::S_, 0),
2296        _ => return Err(CborError("Reserved header byte")),
2297    };
2298
2299    Ok((major, argument, spec, &tail[skip..]))
2300}
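
// Illustrative sketch, separate from the function above: the same argument value can be
// encoded at several widths, which is why the width has to be reported next to the value.
// This standalone helper (hypothetical name, plain integers instead of the crate's `Major`
// and `Spec` types) returns the major type, the argument, and the number of bytes the
// argument occupied after the initial byte. For example, `[0x01]`, `[0x18, 0x01]` and
// `[0x19, 0x00, 0x01]` all carry the argument 1, at widths 0, 1 and 2.
#[allow(dead_code)]
fn split_head_sketch(cbor: &[u8]) -> Option<(u8, Option<u64>, usize)> {
    let (&head, tail) = cbor.split_first()?;
    // High 3 bits: major type; low 5 bits: additional information (RFC 8949, Section 3)
    let (major, additional) = (head >> 5, head & 0x1f);
    let (argument, extra) = match additional {
        // The argument is the additional information itself
        0..=23 => (Some(u64::from(additional)), 0),
        // Additional information 24 to 27 announces 1, 2, 4 or 8 big-endian bytes
        24..=27 => {
            let len = 1usize << (additional - 24);
            let bytes = tail.get(..len)?;
            let value = bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b));
            (Some(value), len)
        }
        // Indefinite length: no argument
        31 => (None, 0),
        // 28 to 30 are reserved
        _ => return None,
    };
    Some((major, argument, extra))
}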

peg::parser! { grammar cbordiagnostic() for str {

// seq             = S [item *(MSC item) SOC]
    pub rule seq() -> Sequence<'input>
        = s0:S() items:(first:item() tail:(msc:MSC() inner:item() { (msc, inner) })* soc:SOC() { NonemptyMscVec::new_parsing(first, tail, soc) })? {
            Sequence { s0, items }
        }


// one-item        = S item S
    pub rule one_item() -> StandaloneItem<'input>
        = s1:S() i:item() s2:S() { StandaloneItem(s1, i, s2) }

// item            = map / array / tagged
//                 / number / simple
//                 / string / streamstring
    rule item() -> Item<'input>
        = inner:(map() / array() / tagged() /
          number() / simple() /
          string:string() { InnerItem::String(string) } / streamstring()) { inner.into() }

// string1         = (tstr / bstr) spec
    rule string1() -> String1e<'input>
        = value:$(tstr() / bstr()) spec:spec() {?
            Ok(if value.starts_with("<<") {
                // FIXME: How can we propagate the parsing we already did instead of parsing again
                // and having bad error handling?
                String1e::EmbeddedChunk(cbordiagnostic::seq(&value[2..value.len() - 2]).map_err(|_| "Parse error in embedded CBOR")?, spec)
            } else {
                String1e::TextChunk(Cow::Borrowed(value), spec)
            })
        }
// string1e        = string1 / ellipsis
    rule string1e() -> String1e<'input>
        = string1() / ellipsis()
// ellipsis        = 3*"." ; "..." or more dots
    rule ellipsis() -> String1e<'input>
        = dots:$("."*<3,>) { String1e::Ellipsis(dots.len()) }
// string          = string1e *(S "+" S string1e)
    rule string() -> CborString<'input>
        = head:string1e() tail:(separator:S() "+" s1:S() inner:string1e() { (separator, s1, inner) })* {
            CborString {
                items: core::iter::once(head).chain(tail.iter().map(|(_sep_pre, _sep_post, inner)| inner).cloned()).collect(),
                separators: tail.iter().map(|(sep_pre, sep_post, _inner)| (sep_pre.clone(), sep_post.clone())).collect()
            }
        }

// number          = (hexfloat / hexint / octint / binint
//                    / decnumber / nonfin) spec
    rule number() -> InnerItem<'input>
        = num:$((hexfloat() / hexint() / octint() / binint() / decnumber() / nonfin())) spec:spec() {InnerItem::Number(Number(Cow::Borrowed(num)), spec)}

// sign            = "+" / "-"
    rule sign() -> Sign
        = "+" { Sign::Plus } / "-" { Sign::Minus }

// decnumber       = [sign] (1*DIGIT ["." *DIGIT] / "." 1*DIGIT)
//                          ["e" [sign] 1*DIGIT]
    pub rule decnumber() -> NumberParts<'input>
        = sign:sign()? prepost:(predot:$(DIGIT()+) postdot:("." postdot:$(DIGIT()*) { postdot })? { (predot, postdot) } / "." postdot:$(DIGIT()+) { ("", Some(postdot)) })
                         exponent:(['e'|'E'] sign:sign()? exponent:$(DIGIT()+) {(sign, exponent)})?
        {
            let (predot, postdot) = prepost;
            NumberParts {
                base: 10,
                sign,
                predot,
                postdot,
                exponent,
            }
        }
// hexfloat        = [sign] "0x" (1*HEXDIG ["." *HEXDIG] / "." 1*HEXDIG)
//                          "p" [sign] 1*DIGIT
    pub rule hexfloat() -> NumberParts<'input>
        = sign:sign()?
        "0" ['x'|'X']
        prepost:(
            predot:$(HEXDIG()+) postdot:("." postdot:$(HEXDIG()*) { postdot })?
            { (Some(predot), postdot) }
            / "." postdot:$(HEXDIG()+)
            { (None, Some(postdot)) }
        )
        ['p'|'P']
        expsign:sign()?
        exp:$(DIGIT()+)
        {
            NumberParts {
                base: 16,
                sign,
                predot: prepost.0.unwrap_or(""),
                postdot: prepost.1,
                exponent: Some((expsign, exp))
            }
        }
// hexint          = [sign] "0x" 1*HEXDIG
    pub rule hexint() -> NumberParts<'input>
        = sign:sign()? "0" ['x'|'X'] predot:$(HEXDIG()+) { NumberParts {base: 16, sign, predot, postdot: None, exponent: None} }
// octint          = [sign] "0o" 1*ODIGIT
    pub rule octint() -> NumberParts<'input>
        = sign:sign()? "0" ['o'|'O'] predot:$(ODIGIT()+) { NumberParts {base: 8, sign, predot, postdot: None, exponent: None} }
// binint          = [sign] "0b" 1*BDIGIT
    pub rule binint() -> NumberParts<'input>
        = sign:sign()? "0" ['b'|'B'] predot:$(BDIGIT()+) { NumberParts {base: 2, sign, predot, postdot: None, exponent: None} }
// nonfin          = %s"Infinity"
//                 / %s"-Infinity"
//                 / %s"NaN"
    rule nonfin()
        = "Infinity" / "-Infinity" / "NaN"
// simple          = %s"false"
//                 / %s"true"
//                 / %s"null"
//                 / %s"undefined"
//                 / %s"simple(" S item S ")"
    rule simple() -> InnerItem<'input>
        = "false" { InnerItem::Simple(Simple::False) }
                / "true" { InnerItem::Simple(Simple::True) }
                / "null" { InnerItem::Simple(Simple::Null) }
                / "undefined" { InnerItem::Simple(Simple::Undefined) }
                / "simple(" s1:S() i:item() s2:S() ")" {InnerItem::Simple(Simple::Numeric(Box::new(StandaloneItem(s1, i, s2))))}
// uint            = "0" / DIGIT1 *DIGIT
    rule uint() -> u64
        = n:$("0" / DIGIT1() DIGIT()*) {? n.parse().or(Err("Exceeding tag space")) }
// tagged          = uint spec "(" S item S ")"
    rule tagged() -> InnerItem<'input>
        = tag:uint() tagspec:spec() "(" s0:S() value:item() s1:S() ")" { InnerItem::Tagged(tag, tagspec, Box::new(StandaloneItem(s0, value, s1))) }

// app-prefix      = lcalpha *lcalnum ; including h and b64
//                 / ucalpha *ucalnum ; tagged variant, if defined
    pub rule app_prefix() =
        quiet!{lcalpha() lcalnum()* / ucalpha() ucalnum()*} / expected!("application prefix")
// app-string      = app-prefix sqstr
    pub rule app_string() -> (&'input str, String)
        = prefix:$(app_prefix()) data:sqstr() { (prefix, data) }
// sqstr           = SQUOTE *single-quoted SQUOTE
    pub rule sqstr() -> String // Yes it is String: Just because they can contain binary doesn't mean
                               // that the ABNF allows it -- no '\xff'.
        = SQUOTE() sqstr:single_quoted()* SQUOTE() { sqstr.iter().filter_map(|c| *c).collect() }
// bstr            = app-string / sqstr / embedded
//                   ; app-string could be any type
    rule bstr()
        = app_string() / sqstr() / embedded()
// tstr            = DQUOTE *double-quoted DQUOTE
    pub rule tstr() -> String
        = DQUOTE() text:double_quoted()* DQUOTE() { text.iter().filter_map(|c| *c).collect() }

// embedded        = "<<" seq ">>"
    rule embedded()
        = "<<" seq() ">>"

// array           = "[" (specms S item *(MSC item) SOC / spec S) "]"
    rule array() -> InnerItem<'input>
        = "[" array:(
            spec:specms() s:S() first:item() tail:(msc:MSC() inner:item() { (msc, inner) })* soc:SOC()
            { SpecMscVec::Present { spec, s, items: NonemptyMscVec::new_parsing(first, tail, soc) } }
            / spec:spec() s:S()
            { SpecMscVec::Absent { spec, s } }
            ) "]"
        { InnerItem::Array(array) }
// map             = "{" (specms S keyp *(MSC keyp) SOC / spec S) "}"
    rule map() -> InnerItem<'input>
        = "{" map:(
            spec:specms() s:S() first:keyp() tail:(msc:MSC() inner:keyp() { (msc, inner) })* soc:SOC()
            { SpecMscVec::Present { spec, s, items: NonemptyMscVec::new_parsing(first, tail, soc) } }
            / spec:spec() s:S()
            { SpecMscVec::Absent { spec, s } }
            ) "}"
        { InnerItem::Map(map) }
// keyp            = item S ":" S item
    rule keyp() -> Kp<'input>
        = key:item() s0:S() ":" s1:S() value:item() { Kp { key, s0, s1, value } }

// ; We allow %x09 HT in prose, but not in strings
// blank           = %x09 / %x0A / %x0D / %x20
    rule blank() -> ()
        = quiet!{"\x09" / "\x0A" / "\x0D" / "\x20"} / expected!("tabs, spaces or newlines")

// non-slash       = blank / %x21-2e / %x30-D7FF / %xE000-10FFFF
    rule non_slash() -> ()
        = blank() / ['\x21'..='\x2e' | '\x30'..='\u{D7FF}' | '\u{E000}'..='\u{10FFFF}'] {}
// non-lf          = %x09 / %x0D / %x20-D7FF / %xE000-10FFFF
    rule non_lf() -> ()
        = ['\x09' | '\x0D' | '\x20'..='\u{D7FF}' | '\u{E000}'..='\u{10FFFF}'] {}

// comment         = "/" *non-slash "/"
//                 / "#" *non-lf %x0A
    rule comment() -> Comment
        = quiet!{"/" body:$(non_slash()*) "/" { Comment::Slashed } / "#" body:$(non_lf()*) "\x0A" { Comment::Hashed }} / expected!("comment")

// ; optional space
// S               = *blank *(comment *blank)
    // This rule is expressed twice because it is very common to need `s0:S()`, but for comment
    // reshaping we occasionally need the internals
    rule S() -> S<'input>
        = data:S_details() { S(Cow::Borrowed(data.data)) }
    pub(crate) rule S_details() -> SDetails<'input>
        = sliced:with_slice(<blank()* comments:(comment:comment() blank()* { comment })* { comments.last().cloned() }>) { SDetails { data: sliced.1, last_comment_style: sliced.0 } }
// ; mandatory space
// MS              = (blank/comment) S
    rule MS() -> MS<'input>
        = data:$( (blank() / comment() ) S()) { MS(Cow::Borrowed(data)) }
// ; mandatory comma and/or space
// MSC             = ("," S) / (MS ["," S])
    rule MSC() -> MSC<'input>
        = data:$( ("," S()) / (MS() ("," S())?) ) { MSC(Cow::Borrowed(data)) }

// ; optional comma and/or space
// SOC             = S ["," S]
    rule SOC() -> SOC<'input>
        = data:$( SOC_details() ) { SOC(Cow::Borrowed(data)) }
    pub(crate) rule SOC_details() -> (SDetails<'input>, Option<SDetails<'input>>)
        = before:S_details() after:("," after:S_details() { after })? { (before, after) }

// ; check semantically that strings are either all text or all bytes
// ; note that there must be at least one string to distinguish
// streamstring    = "(_" MS string *(MSC string) SOC ")"
    rule streamstring() -> InnerItem<'input>
        = "(_" ms:MS() first:string() tail:(msc:MSC() inner:string() { (msc, inner) })* soc:SOC() ")" {
            InnerItem::StreamString(ms, NonemptyMscVec::new_parsing(first, tail, soc))
        }

// spec            = ["_" *wordchar]
    rule spec() -> Option<Spec>
        = quiet!{("_" spec:$(wordchar()*) {? spec.parse() })? } / expected!(r#"a valid encoding indicator ("_", "_i", "_0", "_1", "_2" or "_3")"#)
// specms          = ["_" *wordchar MS]
    rule specms() -> Option<(Spec, MS<'input>)>
        = quiet!{("_" spec:$(wordchar()*) ms:MS() {? spec.parse().map(|spec| (spec, ms)) })? } / expected!(r#"a valid encoding indicator ("_", "_i", "_0", "_1", "_2" or "_3")"#)

// double-quoted   = unescaped
//                 / SQUOTE
//                 / "\" DQUOTE
//                 / "\" escapable
    rule double_quoted() -> Option<char>
        = unescaped() /
            SQUOTE() { Some('\'') } /
            "\\" DQUOTE() { Some('"') } /
            "\\" e:escapable() { Some(e) }

// single-quoted   = unescaped
//                 / DQUOTE
//                 / "\" SQUOTE
//                 / "\" escapable
    rule single_quoted() -> Option<char>
        = unescaped() / DQUOTE() { Some('"') } / "\\" SQUOTE() { Some('\'') } / "\\" e:escapable() { Some(e) }

// escapable       = %s"b" ; BS backspace U+0008
//                 / %s"f" ; FF form feed U+000C
//                 / %s"n" ; LF line feed U+000A
//                 / %s"r" ; CR carriage return U+000D
//                 / %s"t" ; HT horizontal tab U+0009
//                 / "/"   ; / slash (solidus) U+002F (JSON!)
//                 / "\"   ; \ backslash (reverse solidus) U+005C
//                 / (%s"u" hexchar) ;  uXXXX      U+XXXX
    rule escapable() -> char
        = "b" { '\x08' }
            / "f" { '\x0c' }
            / "n" { '\n' }
            / "r" { '\r' }
            / "t" { '\t' }
            / "/" { '/' }
            / "\\" { '\\' }
            / h:("u" h:hexchar() { h }) { h }

// hexchar         = "{" (1*"0" [ hexscalar ] / hexscalar) "}"
//                 / non-surrogate
//                 / (high-surrogate "\" %s"u" low-surrogate)
    rule hexchar() -> char
        =
            "{" hex:$("0"+ hexscalar()? / hexscalar()) "}"
            {
                char::try_from(
                    u32::from_str_radix(hex, 16)
                        .expect("Syntax ensures this works")
                    )
                    .expect("Syntax rules out surrogate sequences and numbers beyond Unicode specification")
            }
            / hex:$(non_surrogate())
            {
                char::try_from(
                    u32::from(
                        u16::from_str_radix(hex, 16)
                            .expect("Syntax ensures this works")
                        )
                    )
                    .expect("Syntax rules out surrogate sequences and numbers beyond Unicode specification")
            }
            / hl:(h:$(high_surrogate()) "\\" "u" l:$(low_surrogate()) { format!("{h}{l}") /* conveniently, syntax ensures it's always 4 nibbles */ } )
            {
                encoding_rs::UTF_16BE.decode(
                    &u32::from_str_radix(&hl, 16)
                        .expect("Syntax ensures this works")
                        .to_be_bytes()
                        // now it is UTF-16
                    )
                    .0
                    .chars()
                    .next()
                    .expect("Syntax ensures this produces exactly one valid character")
            }
// non-surrogate   = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG)
//                 / ("D" ODIGIT 2HEXDIG )
    rule non_surrogate()
        = ((DIGIT() / "A"/"B"/"C" / "E"/"F" / "a"/"b"/"c" / "e"/"f") HEXDIG()*<3,3>)
                / (("D" / "d") ODIGIT() HEXDIG()*<2,2> )
// high-surrogate  = "D" ("8"/"9"/"A"/"B") 2HEXDIG
    rule high_surrogate()
        = ("D" / "d") ("8"/"9"/"A"/"B"/"a"/"b") HEXDIG()*<2,2>
// low-surrogate   = "D" ("C"/"D"/"E"/"F") 2HEXDIG
    rule low_surrogate()
        = ("D" / "d") ("C"/"D"/"E"/"F" / "c"/"d"/"e"/"f") HEXDIG()*<2,2>
// hexscalar       = "10" 4HEXDIG / HEXDIG1 4HEXDIG
//                 / non-surrogate / 1*3HEXDIG
    rule hexscalar()
        = "10" HEXDIG()*<4,4> / HEXDIG1() HEXDIG()*<4,4> / non_surrogate() / HEXDIG()*<1,3>

// ; Note that no other C0 characters are allowed, including %x09 HT
// unescaped       = %x0A ; new line
//                 / %x0D ; carriage return -- ignored on input
//                 / %x20-21
//                      ; omit 0x22 "
//                 / %x23-26
//                      ; omit 0x27 '
//                 / %x28-5B
//                      ; omit 0x5C \
//                 / %x5D-D7FF ; skip surrogate code points
//                 / %xE000-10FFFF
    // Returning an option to express that the carriage return is ignored
    rule unescaped() -> Option<char> = "\r" { None } / good:[ '\x0a' | '\x0D' | '\x20'..='\x21' | '\x23'..='\x26' | '\x28'..='\x5b' | '\x5d'..='\u{d7ff}' | '\u{e000}'..='\u{10ffff}' ] { Some(good) }

// DQUOTE          = %x22    ; " double quote
    rule DQUOTE() = "\""
// SQUOTE          = "'"     ; ' single quote
    rule SQUOTE() = "'"

// DIGIT           = %x30-39 ; 0-9
// DIGIT1          = %x31-39 ; 1-9
// ODIGIT          = %x30-37 ; 0-7
// BDIGIT          = %x30-31 ; 0-1
// HEXDIG          = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
// HEXDIG1         = DIGIT1 / "A" / "B" / "C" / "D" / "E" / "F"
    rule DIGIT() = quiet!{['0'..='9']} / expected!("digits")
    rule DIGIT1() = quiet!{['1'..='9']} / expected!("digits excluding 0")
    rule ODIGIT() = ['0'..='7']
    rule BDIGIT() = ['0'..='1']
    rule HEXDIG() -> u8 = n:$(DIGIT() / ['A'..='F' | 'a'..='f']) { u8::from_str_radix(n, 16).expect("Syntax ensures this is OK") }
    rule HEXDIG1() = DIGIT1() / ['A'..='F' | 'a'..='f']

// ; Note: double-quoted strings as in "A" are case-insensitive in ABNF
// lcalpha         = %x61-7A ; a-z
// lcalnum         = lcalpha / DIGIT
// ucalpha         = %x41-5A ; A-Z
// ucalnum         = ucalpha / DIGIT
// wordchar        = "_" / lcalnum / ucalpha ; [_a-z0-9A-Z]
    rule lcalpha() = ['a'..='z']
    rule lcalnum() = ['a'..='z'] / DIGIT()
    rule ucalpha() = ['A'..='Z']
    rule ucalnum() = ['A'..='Z'] / DIGIT()
    rule wordchar() = "_" / lcalnum() / ucalpha()

// Not starting a new grammar for these: their names are unique enough, and they reuse many of the
// other definitions

// app-string-h    = S *(HEXDIG S HEXDIG S / ellipsis S)
//                   ["#" *non-lf]
    pub rule app_string_h() -> Vec<u8> = S() byte:(high:HEXDIG() S() low:HEXDIG() S() { (high << 4) | low } / ellipsis() S() {? Err("Hex string was abbreviated") })*
        ("#" non_lf()*)?
        { byte }

    /// Return both the value and slice matched by the rule.
    ///
    /// This is the canonical workaround to get both a slice and a value, as discussed in
    /// <https://github.com/kevinmehall/rust-peg/issues/377#issuecomment-2158664327>
    rule with_slice<T>(r: rule<T>) -> (T, &'input str)
        = value:&r() input:$(r()) { (value, input) }
}}
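
// Illustrative sketch of what the surrogate branch of `hexchar` computes: an escape such as
// `\uD83D\uDE00` names two UTF-16 code units that together form one character. The grammar
// does this through `encoding_rs`; this standalone helper (hypothetical name, not used by the
// grammar) swaps in the standard library's `char::decode_utf16` for the same combination.
#[allow(dead_code)]
fn combine_surrogates_sketch(high: u16, low: u16) -> Option<char> {
    let mut decoded = char::decode_utf16([high, low]);
    // A valid pair decodes to exactly one character; anything else is rejected
    match (decoded.next(), decoded.next()) {
        (Some(Ok(c)), None) => Some(c),
        _ => None,
    }
}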