// cri-ref 0.0.2: Embedded-friendly equivalents of URIs
//! Traits to access a CRI (or CRI reference) using accessor methods.
//!
//! While a CRI *could* be implemented with type state (where a full CRI has a `.pull()` that gives
//! a scheme and a tail), this makes for needlessly complex and verbose code.
//!
//! The hope with the sketched API is that, when it is used on, say, a CRI that is a plain memory
//! slice (or a pointer to a known-well-formed CRI, as CRIs are self-delimiting), the compiler
//! eliminates redundant work. There, calling `.path()` would incur a call to
//! `.host_and_pathindex()` and in turn to `.scheme_and_hostindex()`. A program that calls
//! `.scheme()`, `.host()` and `.path()` in succession would ideally be compiled so that the costly
//! nested calls are skipped and the already-available results are reused. Whether that is actually
//! done will need to be verified by analyzing a compiled program.
//!
//! This may need some help from `const`ification and `#[inline]` annotations.
//!
//! Compared to a type-stated version, this is easier to use when not trying to squeeze out the
//! last bit of performance, but may be harder to use when squeezing, as some manual labor might be
//! needed to ensure that, for example, nothing too register-heavy is done between iterating over
//! the path and asking for the query.

use crate::traits;
use crate::characterclasses::{PATH_UE, QUERY_UE, FRAGMENT_UE, HOST_UE};

mod resolved;
pub use resolved::RuntimeResolved;

/// The different values the `discard` component of a CRI reference can have
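/**
A hypothetical sketch of how a decoder might map the serialized discard item onto this enum. The sketch rests on these assumptions:

* `DiscardItem` stands in for whatever a CBOR decoder yields.
* `decode_discard` is not part of this crate.
* The `Discard` below is a local copy of this enum.
* `false` is not a valid discard value.
* Unsigned values too large for the `u8` payload are rejected rather than truncated.

```rust
use std::convert::TryFrom;

/// Local stand-in mirroring this module's `Discard` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Discard {
    All,
    Some(u8),
}

/// Hypothetical decoded form of the serialized discard item.
enum DiscardItem {
    Bool(bool),
    Unsigned(u64),
}

/// Maps a decoded discard item to a `Discard`, or `None` if it cannot be represented.
fn decode_discard(item: DiscardItem) -> Option<Discard> {
    match item {
        // `true` in the serialization means "discard all path components".
        DiscardItem::Bool(true) => Some(Discard::All),
        // Assumption: `false` is not a valid discard value.
        DiscardItem::Bool(false) => None,
        // Values that do not fit the `u8` payload are rejected, not truncated.
        DiscardItem::Unsigned(n) => u8::try_from(n).ok().map(Discard::Some),
    }
}
```
*/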
#[derive(Debug, Copy, Clone, PartialEq)]
pub enum Discard {
    /// Discard all existing path components.
    ///
    /// Corresponds to `true` in the serialization.
    All,
    /// Discard as many path components as indicated in the argument.
    Some(u8),
}

/// Commonalities between CriRef and Cri
///
/// This interface comes with no error handling, as it assumes that the underlying object *is* a
/// CRI. When processing a received unchecked CRI, see [AllegedCri] for an example of how to handle
/// that.
///
/// ## Invariants
///
/// The accessor-based interface shown here does not inherently express some of the invariants
/// that the model is built on, and that are enforced by the CDDL syntax. All implementations are
/// expected to adhere to them. However, because the trait is safe to implement, users can not
/// declare code unreachable based on them. Instead, they are expected to panic if they run into
/// unexpected situations. (Most users will likely not observe the misbehavior, as, for example,
/// they won't even query the port once they find no host.) Likewise, implementations may panic if
/// components are accessed that can not be present according to the other components' output.
///
/// For implementers:
///
/// * The host (and port) are only accessed if the other components indicate that they are available.
///
/// For users:
///
/// * There can only be Some port if there is a host.
pub trait CriBase {
    type Scheme<'a>: traits::Scheme where Self: 'a;
    type Host<'a>: traits::Host where Self: 'a;

    type PathItem<'a>: traits::TextOrPet<PATH_UE> where Self: 'a;
    type QueryItem<'a>: traits::TextOrPet<QUERY_UE> where Self: 'a;
    type FragmentItem<'a>: traits::TextOrPet<FRAGMENT_UE> where Self: 'a;
    type UserInfoItem<'a>: traits::TextOrPet<HOST_UE> where Self: 'a;

    type PathIter<'a>: Iterator<Item=Self::PathItem<'a>> + ExactSizeIterator where Self: 'a;
    type QueryIter<'a>: Iterator<Item=Self::QueryItem<'a>> where Self: 'a;

    fn path(&self) -> Self::PathIter<'_>;
    fn query(&self) -> Self::QueryIter<'_>;
    fn fragment(&self) -> Option<Self::FragmentItem<'_>>;

    fn userinfo(&self) -> Option<Self::UserInfoItem<'_>>;
    fn host(&self) -> Self::Host<'_>;
    fn port(&self) -> Option<u16>;
}

/// A CRI reference (which may be full or relative)
///
/// This interface comes with no error handling, as it assumes that the underlying object *is* a
/// CRI. When processing a received unchecked CRI, see [AllegedCri] for an example of how to handle
/// that.
///
/// Some CriRef implementations may inherently not support a scheme or a host. For example, the
/// relative CriRef implied by CoAP Location-* options can only ever express CRI references with
/// discard=True and no scheme or host. Such implementations would use the Never (`!`) type for
/// Scheme and Host.
///
/// ## Invariants
///
/// Building on CriBase, a CriRef has an additional invariant not expressed in its interface:
///
/// For users:
///
/// * If [Discard] is not [Discard::All], then scheme, host and port are expected to be None.
pub trait CriRef: CriBase {
    /// The discard component of the CRI reference.
    fn discard(&self) -> Discard;
    /// The scheme of the CRI reference, if one is set.
    fn scheme(&self) -> Option<Self::Scheme<'_>>;
    /// The type of authority of the CRI reference
    ///
    /// This may be absent if the scheme is absent (indicating that the base URI's authority is
    /// left as is).
    fn authority(&self) -> Option<traits::Authority>;

    /// Attempt to express the CRI reference as something that will probably pass for a URI
    /// reference.
    ///
    /// Two forms of CRI references are inexpressible as URI references:
    ///
    /// * discard=0 but path present: These append a path segment (e.g. http://example.com/foo +
    ///   append bar = http://example.com/foo/bar)
    ///
    ///   In this case, the "→/" character sequence (which is not valid in a URI) is produced.
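    /**
The discard handling at the start of this method can be sketched as a standalone function. The function returns what is written before the first path segment. This is a simplified, hypothetical mirror with these limitations:

* `discard_prefix` and the local `Discard` copy are illustrations.
* A path segment is a plain `&str`, so `contains(':')` stands in for `contains_unescaped(':')`.
* The effect of a scheme or authority on the leading slash is ignored.

```rust
/// Local stand-in mirroring this module's `Discard` enum.
#[derive(Clone, Copy)]
enum Discard {
    All,
    Some(u8),
}

/// Returns the text emitted before the first path segment for a given discard value.
/// `first_segment` is `None` when the reference has no path.
fn discard_prefix(discard: Discard, first_segment: Option<&str>) -> String {
    match (discard, first_segment) {
        // Appending without discarding has no URI equivalent: emit the "→/" marker,
        // but only if there is a path segment to append.
        (Discard::Some(0), Some(_)) => "→/".to_string(),
        (Discard::Some(0), None) => String::new(),
        // "./" keeps a colon in the first segment from being read as a scheme.
        (Discard::Some(1), Some(seg)) if seg.contains(':') => "./".to_string(),
        (Discard::Some(1), _) => String::new(),
        // Discarding n >= 2 segments takes n - 1 "../" steps.
        (Discard::Some(n), _) => "../".repeat(usize::from(n) - 1),
        // Discarding everything makes the path absolute (only emitted if there is a path).
        (Discard::All, Some(_)) => "/".to_string(),
        (Discard::All, None) => String::new(),
    }
}
```
    */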
    fn format_uri_ref_like(&self, w: &mut impl core::fmt::Write) -> core::fmt::Result {
        use traits::*;

        #[derive(PartialEq, Debug)]
        enum Pending {
            // Path elements can just be produced like that
            Not,
            // A →/ is pending (but not necessary to emit if there are no path components)
            DiscardZero,
            // A slash is probably pending...
            InsidePath,
            BecauseOfAuthority,
            BecauseOfScheme, // but only if an authority is set
            BecauseOfDiscardAll, // but that's expressed a bit differently as we can't know what it's resolved against
        }
        use Pending::*;
        let mut separator_slash_pending = Not;

        match self.discard() {
            Discard::Some(0) => {
                // Can't act immediately -- may either be OK to continue with '?' / '#', or cause
                // an error.
                separator_slash_pending = DiscardZero;
            }
            Discard::Some(1) => {
                // This is not the sharpest criterion for the necessity of a `./`, but it is
                // sufficient, and especially it is consistent with the test vectors.
                if self.path().next().map(|p| p.contains_unescaped(':')) == Some(true) {
                    write!(w, "./")?;
                }
            }
            Discard::Some(n) => {
                for _ in 1..n {
                    write!(w, "../")?;
                }
            }
            Discard::All => {
                // Can't act immediately -- a '/' needs to be emitted before the path unless
                // scheme-and-no-host, but we don't know that yet.
                separator_slash_pending = BecauseOfDiscardAll;
            }
        }

        if let Some(scheme) = self.scheme() {
            write!(w, "{}:", scheme.to_text_scheme())?;
            separator_slash_pending = BecauseOfScheme;
        }

        match self.authority() {
            Some(Authority::HostPort) => {
                write!(w, "//")?;
                if let Some(a) = self.userinfo() {
                    write!(w, "{}@", a.to_uri_component())?;
                }
                self.host().format_uri_host(w)?;
                if let Some(port) = self.port() {
                    write!(w, ":{}", port)?;
                }
                separator_slash_pending = BecauseOfAuthority;
            }
            Some(Authority::NoAuthoritySlashStart) => {
                separator_slash_pending = BecauseOfScheme; // FIXME: Or BecauseOfAuthority? Might make a difference when the path list is empty.
            }
            Some(Authority::NoAuthoritySlashless) => {
                separator_slash_pending = Not;
            }
            None => {
                assert!(separator_slash_pending != BecauseOfScheme, "If a scheme was given, there needs to be some authority");
            }
        }
        for p in self.path() {
            match separator_slash_pending {
                Not => (),
                DiscardZero => write!(w, "→/")?,
                BecauseOfDiscardAll | BecauseOfAuthority | BecauseOfScheme | InsidePath => write!(w, "/")?,
            };
            separator_slash_pending = InsidePath;
            write!(w, "{}", p.to_uri_component())?;
        }
        let mut is_first_query = true;
        for q in self.query() {
            write!(w, "{}{}", if is_first_query { "?" } else { "&" }, q.to_uri_component())?;
            is_first_query = false;
        }
        if let Some(f) = self.fragment() {
            write!(w, "#{}", f.to_uri_component())?;
        }
        Ok(())
    }

    /// Render the URI-reference-like form produced by [Self::format_uri_ref_like] into a string.
    fn render_uri_ref_like(&self) -> String {
        let mut s = String::new();
        self.format_uri_ref_like(&mut s).expect("Strings accept all writes");
        s
    }
}

/// A Cri (a full one, not a CRI reference)
///
/// Unlike a CriRef, this has a scheme unconditionally, and no applicable value for discard.
pub trait Cri: CriBase {
    /// The scheme of the CRI
    ///
    /// This is always present in a (full) CRI.
    fn scheme(&self) -> Self::Scheme<'_>;
    /// The type of authority of the CRI
    ///
    /// There is always a value for this in a (full) CRI, although some variants indicate that no
    /// authority is actually present.
    fn authority(&self) -> traits::Authority;

    /// Write the corresponding URI into a writer
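    /**
This method's separator logic is illustrated below by a hypothetical, simplified version. It works on plain-text components that need no escaping. `format_simple_uri` and the local `Authority` enum are assumptions for this sketch, and user info is omitted. The point is how the authority variant decides whether a `/` precedes the first path segment.

```rust
use std::fmt::Write;

/// Simplified stand-in for the authority variants used by `format_uri`.
enum Authority<'a> {
    /// Host (already in URI form) and optional port.
    HostPort(&'a str, Option<u16>),
    /// No authority, but the path starts with a slash.
    NoAuthoritySlashStart,
    /// No authority, and the path starts without a slash.
    NoAuthoritySlashless,
}

/// Writes a URI from plain-text components that need no percent-encoding.
fn format_simple_uri<'a, W: Write>(
    w: &mut W,
    scheme: &str,
    authority: Authority<'a>,
    path: &[&str],
    query: &[&str],
    fragment: Option<&str>,
) -> std::fmt::Result {
    write!(w, "{}:", scheme)?;
    // Whether a "/" goes before the next path segment.
    let mut slash_pending = match authority {
        Authority::HostPort(host, port) => {
            write!(w, "//{}", host)?;
            if let Some(port) = port {
                write!(w, ":{}", port)?;
            }
            true
        }
        Authority::NoAuthoritySlashStart => true,
        Authority::NoAuthoritySlashless => false,
    };
    for segment in path {
        if slash_pending {
            w.write_char('/')?;
        }
        slash_pending = true;
        w.write_str(segment)?;
    }
    // The first query item is introduced by "?", later ones by "&".
    for (i, q) in query.iter().enumerate() {
        write!(w, "{}{}", if i == 0 { "?" } else { "&" }, q)?;
    }
    if let Some(f) = fragment {
        write!(w, "#{}", f)?;
    }
    Ok(())
}
```

With `NoAuthoritySlashStart` and path segments `etc`, `hosts` this produces `file:/etc/hosts`. With `NoAuthoritySlashless` and a single segment `example:foo` it produces `urn:example:foo`.
    */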
    fn format_uri(&self, w: &mut impl core::fmt::Write) -> core::fmt::Result {
        use traits::*;

        write!(w, "{}:", self.scheme().to_text_scheme())?;
        let mut separator_slash_pending;
        match self.authority() {
            Authority::HostPort => {
                write!(w, "//")?;
                if let Some(userinfo) = self.userinfo() {
                    write!(w, "{}@", userinfo.to_uri_component())?;
                }
                self.host().format_uri_host(w)?;
                if let Some(port) = self.port() {
                    write!(w, ":{}", port)?;
                }
                separator_slash_pending = true;
            }
            Authority::NoAuthoritySlashStart => {
                separator_slash_pending = true;
            }
            Authority::NoAuthoritySlashless => {
                separator_slash_pending = false;
            }
        }
        for p in self.path() {
            if separator_slash_pending {
                write!(w, "/")?;
            }
            separator_slash_pending = true;
            write!(w, "{}", p.to_uri_component())?;
        }
        let mut is_first_query = true;
        for q in self.query() {
            write!(w, "{}{}", if is_first_query { "?" } else { "&" }, q.to_uri_component())?;
            is_first_query = false;
        }
        if let Some(f) = self.fragment() {
            write!(w, "#{}", f.to_uri_component())?;
        }
        Ok(())
    }

    /// Write the corresponding URI into a string
    fn render_uri(&self) -> String {
        let mut s = String::new();
        self.format_uri(&mut s).expect("Strings accept all writes");
        s
    }

    /// Resolve a reference against this base
    ///
    /// This produces a runtime resolution -- containing just the pointers, and determining any
    /// attributes at runtime.
    ///
    /// Concrete types might get more optimized versions of this; in particular, some might modify
    /// a base in place when following a link.
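    /**
The internals of `RuntimeResolved` are not shown in this file. The following is therefore only a hypothetical sketch of runtime resolution, limited to the path. The sketch rests on these assumptions:

* `ResolvedPath` borrows the base and reference segments. The resolved path is recomputed from them on each call instead of being copied.
* The local `Discard` mirrors this module's enum.
* A discard value larger than the base path leaves an empty path.
* References carrying their own scheme or authority are not modeled.

```rust
/// Local stand-in mirroring this module's `Discard` enum.
#[derive(Clone, Copy)]
enum Discard {
    All,
    Some(u8),
}

/// Hypothetical lazily resolved path: borrows both inputs, copies nothing.
struct ResolvedPath<'a> {
    base: &'a [&'a str],
    discard: Discard,
    reference: &'a [&'a str],
}

impl<'a> ResolvedPath<'a> {
    /// Iterates over the resolved path segments, computed anew on every call.
    fn path(&self) -> impl Iterator<Item = &'a str> + 'a {
        let base: &'a [&'a str] = self.base;
        let reference: &'a [&'a str] = self.reference;
        // Number of base segments that survive the discard.
        let keep = match self.discard {
            Discard::All => 0,
            // Assumption: discarding more segments than the base has leaves nothing.
            Discard::Some(n) => base.len().saturating_sub(usize::from(n)),
        };
        base[..keep].iter().chain(reference.iter()).copied()
    }
}
```

With a base path of `/foo/bar`, discarding one segment and appending `baz` gives `/foo/baz`. This matches the URI resolution of the relative reference `baz`.
    */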
    fn resolve<R: CriRef>(&self, reference: R) -> RuntimeResolved<Self, R> {
        RuntimeResolved { base: self, reference }
    }

    /// Compare for equality
    ///
    /// This returns true if two CRIs are sure to be equal in the CRI normalization model, and is
    /// equivalent to URI equality after syntax based normalization. No scheme based normalization
    /// is performed. It relies on some of the CRI well-formedness requirements being met (no PET
    /// on characters that don't need it, no initial empty path segments on NoAuthoritySlashless
    /// URNs).
    fn equals(&self, other: &impl Cri) -> bool {
        use crate::traits::{TextOrPet, Scheme, Host};

        fn equal_option_pet<const U: crate::characterclasses::AsciiSet>(a: Option<impl TextOrPet<U>>, b: Option<impl TextOrPet<U>>) -> bool {
            // FIXME that's way too verbose for what should be `!=` but can't be because we can't
            // blanket implement PartialEq over a trait. Maybe encapsulate return values in an own
            // type rather than an Option?
            match (a, b) {
                (None, None) => true,
                (Some(_), None) | (None, Some(_)) => false,
                (Some(s), Some(o)) => s.equals(&o),
            }
        }

        if !self.scheme().equals(other.scheme()) || self.authority() != other.authority() {
            return false;
        }
        if self.authority() == traits::Authority::HostPort {
            if !equal_option_pet(self.userinfo(), other.userinfo()) || !self.host().equals(&other.host()) || self.port() != other.port() {
                return false;
            }
        }
        if !TextOrPet::iter_equals(self.path(), other.path()) || !TextOrPet::iter_equals(self.query(), other.query()) || !equal_option_pet(self.fragment(), other.fragment()) {
            return false;
        }
        true
    }
}

// /// A (full, i.e. non-relative) CRI whose well-formedness is not determined from the start
// ///
// /// An AllegedCri can be consumed as a CRI in the typical scheme / host_port / path / query /
// /// fragment fashion, and will not err out in the course of that (the [Cri] trait not providing
// /// fallible access). The data produced will represent a CRI (as per the trait's type constraints),
// /// but if the CRI provided at construction is not well-formed, it will be arbitrary.
// ///
// /// After 
// pub struct AllegedCri<'a> {
//     data: &'a[u8],
//     // To be tested: We *could* place the already obtained cursor positions in memo fields here.
//     // That probably does not actually force the compiler to keep it (as long as all use of a local
//     // struct is inlined, I wouldn't know what forces the compiler to have this on the stack in
//     // full). Both versions rely on some advanced optimizations in the compiler -- if the memos are
//     // in the struct, it'll need to inspect the program flow and see that they are only accessed in
//     // sequence and thus that it suffices to keep the currently used one in a register or on the
//     // stack. If there are no memos, the compiler needs to see instead that there is an expensive
//     // computation done twice over the immutable struct, and that the result can be stored
//     // in between.
// 
//     // Under the same assumptions, even this could be calculated on demand.
//     verdict: core::cell::Cell<Verdict>,
// }
// 
// impl<'a> AllegedCri<'a> {
//     fn new(data: &'a [u8]) -> Self {
//         AllegedCri {
//             data,
//             verdict: core::cell::Cell::new(Verdict::Undecided)
//         }
//     }
// 
//     pub fn wellformed_so_far(&self) -> bool {
//         self.verdict.get() != Verdict::Erroneous
//     }
// 
//     pub fn wellformed(&self) -> bool {
//         let _ = self.fragment();
//         self.wellformed_so_far()
//     }
// }
// 
// #[derive(PartialEq)]
// enum Verdict {
//     Undecided,
//     OK,
//     Erroneous,
// }