litext 0.2.0

Just what you need for extracting string literal contents at compile time
//! # litext
//!
//! A procedural macro library for extracting string literal contents from tokens.
//! Built for proc-macro authors who need to pull string content from TokenStream input.
//!
//! ## Overview
//!
//! This library provides the `extract` function which takes a TokenStream representing
//! a string literal and returns the inner text content without the surrounding quotes.
//! It handles both regular string literals and raw string literals.
//!
//! This is a proc-macro helper library. It is designed for proc-macro authors who need
//! to extract string content from tokens during macro expansion. It is not intended for
//! use in regular Rust code.
//!
//! The implementation is zero-dependency and contained in a single file.
//!
//! ## Usage
//!
//! Add to your proc-macro crate:
//!
//! ```toml
//! [dependencies]
//! litext = "0.2"
//! ```
//!
//! Use it in your proc-macro by passing the raw input tokens to `litext!`:
//!
//! ```ignore
//! use litext::{litext, TokenStream};
//!
//! pub fn my_macro(input: TokenStream) -> TokenStream {
//!     let string_content = litext!(input);
//!     // ... use string_content
//! }
//! ```
//!
//! ## Features
//!
//! The `extract` function supports the following string literal formats:
//!
//! - Regular double-quoted strings: `"hello world"`
//! - Raw strings with hashes: `r#"hello world"#`
//! - Raw strings with multiple hashes: `r##"hello # world"##`
//!
//! The `extract` function returns errors for:
//!
//! - Empty input: "litext! expected a string literal, got nothing"
//! - Multiple tokens: "litext! expected exactly one string literal"
//! - Not a string literal: "litext! expected a string literal, found ..."
//! - Byte strings: "litext: expected a string literal, not a byte string"
//!
//! ## Performance
//!
//! The extract function runs at compile time during macro expansion.
//! There is no runtime overhead in your final binary.
//!
//! ## Re-exports
//!
//! This crate re-exports `proc_macro::TokenStream`, `proc_macro::TokenTree`,
//! and `proc_macro::Literal` for convenience in proc-macro code.

extern crate proc_macro;
pub use proc_macro::{Literal, TokenStream, TokenTree};

/// The whole point.
///
/// Extracts the inner text content from a token stream representing a string literal.
///
/// This macro is the primary interface for extracting string content from tokens.
/// It takes a `TokenStream` that should represent a single string literal and
/// returns the text content without the surrounding quotes. If the input is invalid,
/// the macro triggers a compile error at the call site.
///
/// The macro wraps the [`extract`] function and provides a more convenient syntax
/// for proc-macro authors. It handles both regular string literals (`"hello"`) and
/// raw string literals (`r#"hello"#`).
///
/// # Arguments
///
/// The `$input:expr` argument must be a `TokenStream` containing exactly one token
/// that represents a string literal. This can be:
///
/// - A regular string: `"hello world"`
/// - A raw string: `r#"hello world"#`
/// - A raw string with multiple hashes: `r##"hello # world"##`
///
/// # Return Value
///
/// On success, the macro expands to a `String` containing the inner content of
/// the string literal, without quotes or raw string markers.
///
/// On failure, the macro causes an early return from the enclosing function: the
/// `Err` arm of the expanded `match` returns the error `TokenStream`, which holds
/// a `compile_error!` invocation. This is appropriate when the macro is used
/// within a function returning `TokenStream`.
///
/// # Errors
///
/// The macro produces compile-time errors for:
///
/// - Empty input: `litext! expected a string literal, got nothing`
/// - Multiple tokens: `litext! expected exactly one string literal`
/// - Not a string literal: `litext! expected a string literal, found ...`
/// - Byte strings: `litext: expected a string literal, not a byte string`
///
/// # Examples
///
/// Basic usage in a proc-macro function:
///
/// ```ignore
/// use litext::{litext, TokenStream};
///
/// pub fn my_macro(input: TokenStream) -> TokenStream {
///     let content = litext!(input);
///     // content is a String containing the string content
///     // ... use content
/// }
/// ```
///
/// The input TokenStream is typically passed from the macro function's input parameter,
/// which contains the tokens parsed from the macro invocation.
///
/// ```ignore
/// use litext::{litext, TokenStream};
///
/// // Given input: my_macro!("hello")
/// pub fn my_macro(input: TokenStream) -> TokenStream {
///     // input contains the tokens for "hello"
///     let content = litext!(input);
///     // content is "hello"
/// }
/// ```
#[macro_export]
macro_rules! litext {
    ($input:expr) => {
        match $crate::extract($input) {
            Ok(s) => s,
            Err(e) => return e,
        }
    };
}

/// Extracts the inner text content from a token stream representing a string literal.
///
/// This function is the core implementation behind the litext! macro. It takes a TokenStream
/// that should represent a single token and processes it to extract the string content.
/// The function handles the various forms of string literals that Rust supports and
/// validates that the input is indeed a string literal.
///
/// The function performs several validation steps. First, it checks that there is
/// exactly one token in the input stream. Then it examines that token to determine
/// if it is a valid string literal. If the token is a Literal, it attempts to parse
/// it as a string. If the token is an Ident, Punct, or Group, it returns an appropriate
/// error message indicating that a string literal was expected.
///
/// This function is public and can be used directly if you need to process TokenStream
/// values from other proc_macro code. However, most users will prefer to use the
/// litext! macro which provides a more convenient interface.
///
/// # Arguments
///
/// The input parameter should be a TokenStream containing exactly one token.
/// This token is expected to be a string literal, either a regular string or a raw string.
///
/// # Return Value
///
/// On success, returns Ok containing a String with the inner text content of the string
/// literal, without the surrounding quotes or the raw string markers.
///
/// On failure, returns Err containing a TokenStream that will cause a compile error
/// when expanded. The error TokenStream contains a compile_error! macro invocation
/// with a descriptive message about what went wrong.
///
/// # Examples
///
/// Using the function directly with a proc_macro token stream:
///
/// ```ignore
/// use proc_macro::TokenStream;
/// use litext::extract;
///
/// let input: TokenStream = "Hello".parse().unwrap();
/// let result = extract(input);
/// assert!(result.is_err()); // "Hello" is not a string literal
/// ```
///
/// The function is primarily used internally by the litext! macro, but the public
/// API allows for more advanced proc_macro development if needed.
pub fn extract(input: TokenStream) -> Result<String, TokenStream> {
    let mut iter = input.into_iter();

    let token = match iter.next() {
        Some(t) => t,
        None => return Err(err("litext! expected a string literal, got nothing")),
    };

    if iter.next().is_some() {
        return Err(err("litext! expected exactly one string literal"));
    }

    match token {
        TokenTree::Literal(lit) => parse_lit(lit),
        TokenTree::Ident(_) => Err(err("litext! expected a string literal, found identifier")),
        TokenTree::Punct(_) => Err(err("litext! expected a string literal, found punctuation")),
        TokenTree::Group(_) => Err(err("litext! expected a string literal, found group")),
    }
}

/// Parses a Literal token to extract the string content from a string literal.
///
/// This function takes a proc_macro Literal token that should represent a string literal
/// and attempts to extract its inner content. It handles both regular double-quoted strings
/// and raw strings using the r#...# syntax. The function validates that the literal is actually
/// a string literal and not a byte string or other type of literal.
///
/// The parsing logic first checks if the literal is a raw string by looking for the 'r' prefix.
/// If it is a raw string, it uses the parse_raw helper function to extract the content.
/// If it is not a raw string but does start and end with double quotes, it extracts the
/// content between the quotes. Any other format results in an error.
///
/// One important aspect of this function is that it explicitly rejects byte strings (those
/// prefixed with b) and raw byte strings (prefixed with br). This is because the goal of
/// litext is to work with text strings, not byte sequences. Byte strings would produce
/// different output that might confuse users who expect text content.
///
/// # Arguments
///
/// The lit parameter is a proc_macro Literal token. This token is expected to represent
/// some kind of literal value in Rust source code, but the function will verify that
/// it is specifically a string literal before processing.
///
/// # Return Value
///
/// On success, returns Ok containing a String with the content of the string literal
/// without the surrounding quotation marks. The string is owned and can be used as
/// needed by the caller.
///
/// On failure, returns Err containing a TokenStream that will trigger a compile-time
/// error when expanded. The error message will indicate what type of literal was
/// provided instead of a string literal.
///
/// # Implementation Notes
///
/// The function uses string manipulation rather than the Literal API's typed methods
/// because proc_macro's Literal type does not provide direct access to the string
/// content in a convenient form. By converting to a string and parsing it manually,
/// we gain full control over the extraction process and can handle all the edge cases
/// that might occur with various string formats.
fn parse_lit(lit: proc_macro::Literal) -> Result<String, TokenStream> {
    let raw = lit.to_string();

    if raw.starts_with('b') && raw.len() > 1 {
        let c = raw.chars().nth(1).unwrap();
        if c == '"' || c == 'r' {
            return Err(err("litext: expected a string literal, not a byte string"));
        }
    }

    if raw.starts_with('r') {
        return parse_raw(&raw).ok_or_else(|| err("litext! malformed raw string literal"));
    }

    if raw.starts_with('"') && raw.ends_with('"') && raw.len() >= 2 {
        return unescape(&raw[1..raw.len() - 1]);
    }

    Err(err("litext! expected a string literal"))
}
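
The prefix checks above can be exercised outside a proc-macro context. Below is a minimal standalone sketch of the same classification; `classify` is a hypothetical name used only for illustration (the real `parse_lit` dispatches to parsing or returns errors rather than labels):

```rust
// Standalone sketch of parse_lit's prefix classification. `classify` is a
// hypothetical helper; the real function parses or errors instead of labeling.
fn classify(raw: &str) -> &'static str {
    let mut cs = raw.chars();
    match (cs.next(), cs.next()) {
        // b"..." and br"..." are byte strings, which litext rejects.
        (Some('b'), Some('"')) | (Some('b'), Some('r')) => "byte string",
        (Some('r'), _) => "raw string",
        (Some('"'), _) => "string",
        _ => "other",
    }
}

fn main() {
    assert_eq!(classify(r#""hi""#), "string");
    assert_eq!(classify(r##"r#"hi"#"##), "raw string");
    assert_eq!(classify(r#"b"hi""#), "byte string");
    assert_eq!(classify("42"), "other");
}
```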

/// Parses a raw string literal format to extract the inner content.
///
/// Raw strings in Rust use the syntax r#"content"# where the number of hash marks
/// can be increased to allow the content to contain quote characters without escaping.
/// This function handles all variations of raw string syntax, from r"..." to
/// r##"..."## and beyond.
///
/// The function works by first identifying how many hash marks precede the opening
/// quote, then stripping those hash marks along with the quotes themselves from both
/// ends of the string, and finally returning the content that lies between them.
///
/// The implementation uses Rust's strip_prefix and strip_suffix methods which return
/// Option types, allowing the function to elegantly handle the case where the string
/// does not match the expected format by returning None. This makes the function
/// pure and composable with the rest of the parsing logic.
///
/// # Arguments
///
/// The raw parameter is the textual form of what appears to be a raw string
/// literal, exactly as produced by tokenization: a leading 'r', zero or more
/// hash characters, the double-quoted content, and the matching closing hashes.
///
/// # Return Value
///
/// If the string is a valid raw string format, returns Some containing the content
/// between the quotes and hash marks. If the string does not match the expected raw
/// string format, returns None.
///
/// The function never panics: it always returns either Some with the content or
/// None when the format is invalid, so no error message is needed at this level.
fn parse_raw(raw: &str) -> Option<String> {
    let rest = raw.strip_prefix('r')?;
    let hashes = rest.chars().take_while(|c| *c == '#').count();
    let hash_str = "#".repeat(hashes);
    let inner = rest
        .strip_prefix(&hash_str)?
        .strip_prefix('"')?
        .strip_suffix(&hash_str)?
        .strip_suffix('"')?;
    Some(inner.to_string())
}
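
Because the stripping logic is a pure string transformation, it can be demonstrated outside the proc-macro crate. A standalone sketch (duplicating the logic, since `proc_macro` is unavailable in ordinary binaries; note the closing delimiter is `"##...`, so the trailing hashes must be stripped before the closing quote):

```rust
// Standalone sketch of the raw string stripping logic: strip the leading 'r',
// the opening hashes and quote, then the closing hashes and quote.
fn parse_raw(raw: &str) -> Option<String> {
    let rest = raw.strip_prefix('r')?;
    let hashes = rest.chars().take_while(|c| *c == '#').count();
    let hash_str = "#".repeat(hashes);
    let inner = rest
        .strip_prefix(&hash_str)?
        .strip_prefix('"')?
        .strip_suffix(&hash_str)? // the literal ends `"##`: hashes come last,
        .strip_suffix('"')?;      // so strip them before the closing quote
    Some(inner.to_string())
}

fn main() {
    assert_eq!(parse_raw(r#"r"plain""#).as_deref(), Some("plain"));
    assert_eq!(
        parse_raw(r##"r#"has "quotes""#"##).as_deref(),
        Some(r#"has "quotes""#)
    );
    assert_eq!(parse_raw("\"not raw\""), None);
}
```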

/// Unescapes a string by processing escape sequences.
///
/// This function converts escape sequences in a string literal to their actual
/// character values. It handles common escape sequences like `\n`, `\r`, `\t`,
/// `\\`, `\"`, and `\0`, as well as hex escapes (`\xNN`) and unicode escapes
/// (`\u{NNNN}`).
///
/// The function processes the input string character by character. When a backslash
/// is encountered, it looks at the next character to determine which escape sequence
/// to process. Invalid escape sequences result in compile-time errors.
///
/// # Arguments
///
/// The `s` parameter is a string slice containing escape sequences to process.
/// Common escape sequences include:
/// - `\n` - newline
/// - `\r` - carriage return
/// - `\t` - tab
/// - `\\` - backslash
/// - `\"` - double quote
/// - `\0` - null character
///
/// Hex escapes: `\xNN` where NN is a two-digit hex number
/// Unicode escapes: `\u{NNNN}` where NNNN is a unicode codepoint
///
/// # Return Value
///
/// On success, returns `Ok` containing a new String with all escape sequences resolved
/// to their actual character values.
///
/// On failure, returns `Err` containing a TokenStream that will cause a compile
/// error when expanded. The error message indicates which escape sequence was invalid.
///
/// # Examples
///
/// ```ignore
/// let result = unescape("hello\\nworld").unwrap();
/// assert_eq!(result, "hello\nworld");
/// ```
///
/// With unicode:
///
/// ```ignore
/// let result = unescape("hello\\u{1F980}world").unwrap();
/// assert_eq!(result, "hello🦀world");
/// ```
fn unescape(s: &str) -> Result<String, TokenStream> {
    let mut output = String::with_capacity(s.len());
    let mut chars = s.chars();

    while let Some(c) = chars.next() {
        if c != '\\' {
            output.push(c);
            continue;
        }

        match chars.next() {
            Some('n') => output.push('\n'),
            Some('r') => output.push('\r'),
            Some('t') => output.push('\t'),
            Some('\\') => output.push('\\'),
            Some('"') => output.push('"'),
            Some('\'') => output.push('\''),
            Some('0') => output.push('\0'),

            Some('x') => {
                let h1 = chars
                    .next()
                    .ok_or_else(|| err("litext: invalid \\x escape"))?;
                let h2 = chars
                    .next()
                    .ok_or_else(|| err("litext: invalid \\x escape"))?;
                let hex = format!("{}{}", h1, h2);
                let byte =
                    u8::from_str_radix(&hex, 16).map_err(|_| err("litext: invalid \\x escape"))?;
                if byte > 0x7F {
                    return Err(err("litext: \\x escape must be in range 0x00..=0x7F"));
                }
                output.push(byte as char);
            }

            Some('u') => {
                match chars.next() {
                    Some('{') => {}
                    _ => return Err(err("litext: invalid \\u escape, expected '{'")),
                }
                let mut hex = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(c) => hex.push(c),
                        None => return Err(err("litext: unterminated \\u escape")),
                    }
                }
                let codepoint = u32::from_str_radix(&hex, 16)
                    .map_err(|_| err("litext: invalid \\u codepoint"))?;
                let ch = char::from_u32(codepoint)
                    .ok_or_else(|| err("litext: invalid unicode codepoint"))?;
                output.push(ch);
            }

            Some('\n') => {
                // Line continuation: skip any whitespace following the escaped
                // newline, as Rust does for `\` at the end of a line.
                while chars.as_str().starts_with(char::is_whitespace) {
                    chars.next();
                }
            }

            _ => return Err(err("litext: invalid escape sequence")),
        }
    }

    Ok(output)
}
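
The `\u{...}` branch can be sketched in isolation. Here `parse_unicode_escape` is a hypothetical helper that uses `String` errors in place of `TokenStream` so it runs outside a proc-macro context:

```rust
// Minimal sketch of the \u{...} handling: consume '{', collect hex digits up
// to '}', then convert the codepoint. Uses String errors, not TokenStream.
fn parse_unicode_escape(chars: &mut std::str::Chars) -> Result<char, String> {
    if chars.next() != Some('{') {
        return Err("invalid \\u escape, expected '{'".into());
    }
    let mut hex = String::new();
    for c in chars.by_ref() {
        if c == '}' {
            let cp = u32::from_str_radix(&hex, 16)
                .map_err(|_| "invalid \\u codepoint".to_string())?;
            return char::from_u32(cp).ok_or_else(|| "invalid unicode codepoint".into());
        }
        hex.push(c);
    }
    Err("unterminated \\u escape".into())
}

fn main() {
    // The iterator is left positioned just past the closing brace.
    let mut it = "{1F980} rest".chars();
    assert_eq!(parse_unicode_escape(&mut it), Ok('🦀'));
    assert_eq!(it.as_str(), " rest");
}
```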

/// Creates a compile-time error token stream for reporting errors in the macro.
///
/// This helper function constructs a TokenStream that, when expanded, will cause
/// a compile error with the specified message. It wraps the message in a compile_error!
/// macro invocation, which is a special macro in Rust that halts compilation and
/// displays the provided message.
///
/// The function takes a message string and converts it into a format suitable for
/// embedding in a compile_error! macro call. This involves converting the message
/// to a string literal format that the compile_error! macro can accept. The format!
/// macro call with {:?} ensures that the message is properly escaped and quoted as
/// a string literal within the macro invocation.
///
/// This function is used throughout the library whenever validation fails and a
/// meaningful error message needs to be returned. Rather than returning error
/// information as data, returning a TokenStream that will cause a compile error
/// is more idiomatic for proc_macro crates because it provides immediate feedback
/// to the developer at compile time.
///
/// # Arguments
///
/// The msg parameter is a string slice containing the error message that should
/// be displayed to the developer. This message should be a clear description of
/// what went wrong and ideally how to fix it. Messages are kept concise but
/// informative.
///
/// # Return Value
///
/// Returns a TokenStream containing a compile_error! macro invocation with the
/// provided message. This TokenStream can be returned from the extract function
/// or used in any context where an error TokenStream is expected.
///
/// # Implementation Notes
///
/// The function uses .parse().unwrap() to convert the formatted string into a
/// TokenStream. This should not panic in practice because the format string
/// produced by format!("compile_error!({:?})", msg) is always valid Rust syntax
/// that can be parsed into a TokenStream. The compile_error! macro itself will
/// be invoked during expansion, producing the actual error message.
fn err(msg: &str) -> TokenStream {
    format!("compile_error!({:?})", msg).parse().unwrap()
}