// litext/lib.rs

//! # litext
//!
//! A procedural macro library for extracting string literal contents from tokens.
//! Built for proc-macro authors who need to pull string content from `TokenStream` input.
//!
//! ## Overview
//!
//! This library provides the [`extract`] and [`extract_litstr`] functions, as well
//! as the [`litext!`] macro, which is the primary interface for most use cases.
//! It handles both regular string literals and raw string literals.
//!
//! This is a proc-macro helper library: it is designed for proc-macro authors who
//! need to extract string content from tokens during macro expansion, and it is
//! not intended for use in regular Rust code.
//!
//! The implementation is zero-dependency and contained in a single file.
//!
//! ## Usage
//!
//! In your proc macro, pass the raw tokens to `litext!`:
//!
//! ```ignore
//! use litext::{litext, TokenStream};
//!
//! pub fn my_macro(input: TokenStream) -> TokenStream {
//!     let string_content = litext!(input); // or litext!(input as String) to be explicit
//!     // ... use string_content
//! }
//! ```
//!
//! To also capture the source span of the literal, use the `as LitStr` form:
//!
//! ```ignore
//! use litext::{litext, TokenStream};
//!
//! pub fn my_macro(input: TokenStream) -> TokenStream {
//!     let lit = litext!(input as LitStr);
//!     // lit.value() for the string, lit.span() for the source location
//! }
//! ```
//!
//! ## Features
//!
//! The following string literal formats are supported:
//!
//! - Regular double-quoted strings: `"hello world"`
//! - Raw strings with hashes: `r#"hello world"#`
//! - Raw strings with multiple hashes: `r##"hello # world"##`
//!
//! Errors are returned for:
//!
//! - Empty input: `litext! expected a string literal, got nothing`
//! - Multiple tokens: `litext! expected exactly one string literal`
//! - Not a string literal: `litext! expected a string literal, found ...`
//! - Byte strings: `litext: expected a string literal, not a byte string`
//!
//! ## Performance
//!
//! The macro and extract functions run at compile time during macro expansion,
//! so there is no runtime overhead in your final binary.
//!
//! ## Re-exports
//!
//! This crate re-exports `proc_macro::TokenStream`, `proc_macro::TokenTree`,
//! `proc_macro::Literal`, `proc_macro::Span`, and [`LitStr`] for convenience in proc-macro code.

#![warn(missing_docs)]

extern crate proc_macro;
pub use proc_macro::{Literal, Span, TokenStream, TokenTree};

mod litstr;
pub use litstr::*;

/// The whole point.
///
/// Extracts the inner text content from a token stream representing a string literal.
///
/// This macro is the primary interface for extracting string content from tokens.
/// It takes a `TokenStream` that should represent a single string literal and
/// returns either a [`String`] or a [`LitStr`] depending on the form used.
/// If the input is invalid, the macro triggers a compile error at the call site.
///
/// # Forms
///
/// ## `litext!(input)` and `litext!(input as String)`
///
/// Both forms are equivalent. They extract the string content and return a
/// plain [`String`]. The `as String` form is purely cosmetic; it can be useful
/// for readability when it isn't immediately obvious what the macro returns.
///
/// ## `litext!(input as LitStr)`
///
/// Returns a [`LitStr`] bundling the extracted string value with its source
/// [`Span`]. Use this form when you need to emit diagnostics that point back
/// at the exact location of the literal in the macro input.
///
/// # Arguments
///
/// The `input` argument must be an identifier referring to a `TokenStream`
/// containing exactly one token that represents a string literal. This can be:
///
/// - A regular string: `"hello world"`
/// - A raw string: `r#"hello world"#`
/// - A raw string with multiple hashes: `r##"hello # world"##`
///
/// # Errors
///
/// Both forms produce compile-time errors for:
///
/// - Empty input: `litext! expected a string literal, got nothing`
/// - Multiple tokens: `litext! expected exactly one string literal`
/// - Not a string literal: `litext! expected a string literal, found ...`
/// - Byte strings: `litext: expected a string literal, not a byte string`
///
/// # Examples
///
/// Basic usage, returning a `String`:
///
/// ```ignore
/// use litext::{litext, TokenStream};
///
/// pub fn my_macro(input: TokenStream) -> TokenStream {
///     let content = litext!(input);
///     // or equivalently:
///     let content = litext!(input as String);
/// }
/// ```
///
/// Using `as LitStr` to retain span information for diagnostics:
///
/// ```ignore
/// use litext::{litext, TokenStream};
///
/// pub fn my_macro(input: TokenStream) -> TokenStream {
///     let lit = litext!(input as LitStr);
///     let value = lit.value();
///     let span = lit.span();
/// }
/// ```
#[macro_export]
macro_rules! litext {
    ($input:ident $(as String)?) => {
        match $crate::extract($input) {
            Ok(s) => s,
            Err(e) => return e,
        }
    };

    ($input:ident as LitStr) => {
        match $crate::extract_litstr($input) {
            Ok(lit) => lit,
            Err(e) => return e,
        }
    };
}
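The expansion relies on `return` targeting the enclosing function, which is why `litext!` must be invoked inside a function that returns `TokenStream`. The same early-return-on-`Err` shape can be sketched in plain Rust with stand-in `String` types (illustrative only, not part of the crate):

```rust
// Stand-in for the macro's expansion: match on a Result and early-return
// the error from the enclosing function, as litext! does with TokenStreams.
fn process(input: Result<String, String>) -> String {
    let content = match input {
        Ok(s) => s,
        Err(e) => return e, // mirrors the macro's `Err(e) => return e`
    };
    format!("got: {content}")
}

fn main() {
    assert_eq!(process(Ok("hi".to_string())), "got: hi");
    assert_eq!(process(Err("boom".to_string())), "boom");
}
```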

/// Extracts the inner text content from a token stream representing a string literal.
///
/// This function is the core implementation behind the [`litext!`] macro. It takes a
/// `TokenStream` that should contain a single token and extracts its string content.
/// It handles the various forms of string literals that Rust supports and validates
/// that the input is indeed a string literal.
///
/// Validation proceeds in two steps. First, the function checks that there is exactly
/// one token in the input stream. Then it examines that token: if it is a `Literal`,
/// it attempts to parse it as a string; if it is an `Ident`, `Punct`, or `Group`, it
/// returns an error indicating that a string literal was expected.
///
/// This function is public and can be used directly if you need to process
/// `TokenStream` values from other proc_macro code, but most users will prefer the
/// more convenient [`litext!`] macro.
///
/// # Arguments
///
/// The `input` parameter should be a `TokenStream` containing exactly one token,
/// expected to be a string literal (either regular or raw).
///
/// # Return Value
///
/// On success, returns `Ok` containing a `String` with the inner text content of the
/// literal, without the surrounding quotes or raw string markers.
///
/// On failure, returns `Err` containing a `TokenStream` that causes a compile error
/// when expanded: a `compile_error!` invocation with a descriptive message about
/// what went wrong.
///
/// # Examples
///
/// Using the function directly with a proc_macro token stream:
///
/// ```ignore
/// use proc_macro::TokenStream;
/// use litext::extract;
///
/// let input: TokenStream = "Hello".parse().unwrap();
/// let result = extract(input);
/// assert!(result.is_err()); // "Hello" is not a string literal
/// ```
pub fn extract(input: TokenStream) -> Result<String, TokenStream> {
    let mut iter = input.into_iter();

    let token = match iter.next() {
        Some(t) => t,
        None => return Err(err("litext! expected a string literal, got nothing")),
    };

    if iter.next().is_some() {
        return Err(err("litext! expected exactly one string literal"));
    }

    match token {
        TokenTree::Literal(lit) => parse_lit(lit),
        TokenTree::Ident(_) => Err(err("litext! expected a string literal, found identifier")),
        TokenTree::Punct(_) => Err(err("litext! expected a string literal, found punctuation")),
        TokenTree::Group(_) => Err(err("litext! expected a string literal, found group")),
    }
}
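The exactly-one-token check above is independent of proc_macro and works for any iterator. A minimal standalone sketch (the helper name `exactly_one` is hypothetical, not part of the crate):

```rust
// Take exactly one item from an iterator, rejecting zero or many items:
// the same validation shape extract() applies to its TokenStream.
fn exactly_one<I: Iterator>(mut iter: I) -> Result<I::Item, &'static str> {
    let first = iter.next().ok_or("got nothing")?;
    if iter.next().is_some() {
        return Err("expected exactly one");
    }
    Ok(first)
}

fn main() {
    assert_eq!(exactly_one([1].into_iter()), Ok(1));
    assert_eq!(exactly_one(std::iter::empty::<i32>()), Err("got nothing"));
    assert_eq!(exactly_one([1, 2].into_iter()), Err("expected exactly one"));
}
```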

/// Extracts the inner text content from a token stream, returning a [`LitStr`]
/// that bundles the string value with its source span.
///
/// This is the span-aware counterpart to [`extract`]. Use this when you need
/// to emit diagnostics that point back at the original string literal in the
/// macro input, for example to report a custom error at the exact location
/// of a malformed value.
///
/// # Arguments
///
/// The `input` parameter must be a `TokenStream` containing exactly one token
/// representing a string literal (regular or raw). The same formats accepted
/// by [`extract`] are accepted here.
///
/// # Return Value
///
/// On success, returns `Ok` containing a [`LitStr`] whose `value` field holds
/// the unquoted string content and whose `span` field records where the literal
/// appeared in the source.
///
/// On failure, returns `Err` containing a `TokenStream` that triggers a
/// compile error, identical in behaviour to [`extract`].
///
/// # Example
///
/// ```ignore
/// use litext::{extract_litstr, TokenStream};
///
/// pub fn my_macro(input: TokenStream) -> TokenStream {
///     match extract_litstr(input) {
///         Ok(lit) => { /* use lit.value() and lit.span() */ }
///         Err(e) => return e,
///     }
/// }
/// ```
pub fn extract_litstr(input: TokenStream) -> Result<LitStr, TokenStream> {
    let mut iter = input.into_iter();

    let token = match iter.next() {
        Some(t) => t,
        None => return Err(err("litext! expected a string literal, got nothing")),
    };

    if iter.next().is_some() {
        return Err(err("litext! expected exactly one string literal"));
    }

    match token {
        TokenTree::Literal(lit) => {
            let span = lit.span();
            let value = parse_lit(lit)?;
            Ok(LitStr::new(value, span))
        }
        TokenTree::Ident(_) => Err(err("litext! expected a string literal, found identifier")),
        TokenTree::Punct(_) => Err(err("litext! expected a string literal, found punctuation")),
        TokenTree::Group(_) => Err(err("litext! expected a string literal, found group")),
    }
}

/// Parses a `Literal` token to extract the content of a string literal.
///
/// This function takes a proc_macro `Literal` that should represent a string literal
/// and extracts its inner content. It handles both regular double-quoted strings and
/// raw strings using the `r#"..."#` syntax, and validates that the literal is actually
/// a string literal rather than a byte string or some other kind of literal.
///
/// The parsing logic first checks whether the literal is a raw string by looking for
/// the `r` prefix; raw strings are handled by the `parse_raw` helper. Otherwise, if
/// the literal starts and ends with double quotes, the content between the quotes is
/// extracted and unescaped. Any other format results in an error.
///
/// Byte strings (prefixed with `b`) and raw byte strings (prefixed with `br`) are
/// explicitly rejected: litext works with text strings, not byte sequences, and byte
/// strings would produce different output than users expecting text content want.
///
/// # Arguments
///
/// The `lit` parameter is a proc_macro `Literal` token. It may represent any kind of
/// literal value; the function verifies that it is specifically a string literal
/// before processing.
///
/// # Return Value
///
/// On success, returns `Ok` containing an owned `String` with the content of the
/// string literal, without the surrounding quotation marks.
///
/// On failure, returns `Err` containing a `TokenStream` that triggers a compile-time
/// error when expanded, with a message indicating what kind of literal was provided
/// instead of a string literal.
///
/// # Implementation Notes
///
/// The function parses the literal's textual form (via `to_string`) rather than using
/// typed accessors, because proc_macro's `Literal` type does not expose the string
/// content in a convenient form. Manual parsing gives full control over the
/// extraction and the edge cases of the various string formats.
fn parse_lit(lit: proc_macro::Literal) -> Result<String, TokenStream> {
    let raw = lit.to_string();

    // Reject byte strings (b"...") and raw byte strings (br"...").
    if raw.starts_with('b') && raw.len() > 1 {
        let c = raw.chars().nth(1).unwrap();
        if c == '"' || c == 'r' {
            return Err(err("litext: expected a string literal, not a byte string"));
        }
    }

    // Raw strings: r"...", r#"..."#, r##"..."##, ...
    if raw.starts_with('r') {
        return parse_raw(&raw).ok_or_else(|| err("litext! malformed raw string literal"));
    }

    // Regular strings: strip the quotes, then resolve escape sequences.
    if raw.starts_with('"') && raw.ends_with('"') && raw.len() >= 2 {
        return unescape(&raw[1..raw.len() - 1]);
    }

    Err(err("litext! expected a string literal"))
}
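The prefix dispatch above can be exercised on plain strings. A standalone sketch with a hypothetical `LitKind` enum (illustrative, not part of the crate's API):

```rust
#[derive(Debug, PartialEq)]
enum LitKind { Str, RawStr, ByteStr, Other }

// Classify a literal by its textual form, mirroring parse_lit's checks:
// byte strings first, then raw strings, then plain quoted strings.
fn classify(raw: &str) -> LitKind {
    if raw.starts_with("b\"") || raw.starts_with("br") {
        LitKind::ByteStr
    } else if raw.starts_with('r') {
        LitKind::RawStr
    } else if raw.starts_with('"') && raw.ends_with('"') && raw.len() >= 2 {
        LitKind::Str
    } else {
        LitKind::Other
    }
}

fn main() {
    assert_eq!(classify("\"hi\""), LitKind::Str);
    assert_eq!(classify(r##"r#"hi"#"##), LitKind::RawStr);
    assert_eq!(classify("b\"hi\""), LitKind::ByteStr);
    assert_eq!(classify("42"), LitKind::Other);
}
```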

/// Parses a raw string literal to extract the inner content.
///
/// Raw strings in Rust use the syntax `r#"content"#`, where the number of hash marks
/// can be increased so the content may contain quote characters without escaping.
/// This function handles all variations of raw string syntax, from `r"..."` to
/// `r##"..."##` and beyond.
///
/// The function works by first counting how many hash marks follow the leading `r`,
/// then stripping those hash marks and the quotes from both ends of the string, and
/// finally returning the content that lies between them.
///
/// The implementation uses `strip_prefix` and `strip_suffix`, which return `Option`,
/// so input that does not match the expected format simply yields `None` and
/// composes cleanly with the rest of the parsing logic.
///
/// # Arguments
///
/// The `raw` parameter is the literal's textual form as produced by tokenization,
/// including the leading `r`, any hash marks, and the surrounding quotes (for
/// example, `r#"hello"#`).
///
/// # Return Value
///
/// If the string is a valid raw string, returns `Some` containing the content
/// between the quotes and hash marks. If the string does not match the expected
/// raw string format, returns `None`. There are no other error conditions.
fn parse_raw(raw: &str) -> Option<String> {
    let rest = raw.strip_prefix('r')?;
    let hashes = rest.chars().take_while(|c| *c == '#').count();
    let hash_str = "#".repeat(hashes);
    // Strip in mirror order: the literal ends with `"` followed by the
    // hashes, so the hashes must come off the suffix before the quote.
    let inner = rest
        .strip_prefix(&hash_str)?
        .strip_prefix('"')?
        .strip_suffix(&hash_str)?
        .strip_suffix('"')?;
    Some(inner.to_string())
}
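The mirror-order stripping can be checked outside a proc-macro context. A standalone copy of the same logic (re-implemented here purely for illustration), including a malformed input that fails partway through the chain:

```rust
// Strip `r`, N hashes, and a quote from the front, then N hashes and a
// quote from the back; unbalanced input yields None at some step.
fn strip_raw(raw: &str) -> Option<String> {
    let rest = raw.strip_prefix('r')?;
    let hashes = rest.chars().take_while(|c| *c == '#').count();
    let hash_str = "#".repeat(hashes);
    let inner = rest
        .strip_prefix(&hash_str)?
        .strip_prefix('"')?
        .strip_suffix(&hash_str)?
        .strip_suffix('"')?;
    Some(inner.to_string())
}

fn main() {
    assert_eq!(strip_raw(r##"r#"hello"#"##), Some("hello".to_string()));
    assert_eq!(strip_raw("r\"plain\""), Some("plain".to_string()));
    assert_eq!(strip_raw("r#\"unbalanced\""), None); // missing closing hash
}
```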

/// Unescapes a string by processing escape sequences.
///
/// This function converts escape sequences in a string literal to their actual
/// character values. It handles common escape sequences like `\n`, `\r`, `\t`,
/// `\\`, `\"`, and `\0`, as well as hex escapes (`\xNN`) and unicode escapes
/// (`\u{NNNN}`).
///
/// The input is processed character by character. When a backslash is encountered,
/// the next character determines which escape sequence to process. Invalid escape
/// sequences result in compile-time errors.
///
/// # Arguments
///
/// The `s` parameter is a string slice containing escape sequences to process.
/// Common escape sequences include:
/// - `\n` - newline
/// - `\r` - carriage return
/// - `\t` - tab
/// - `\\` - backslash
/// - `\"` - double quote
/// - `\0` - null character
///
/// Hex escapes: `\xNN`, where `NN` is a two-digit hex number in `0x00..=0x7F`.
/// Unicode escapes: `\u{NNNN}`, where `NNNN` is a hex unicode codepoint.
///
/// # Return Value
///
/// On success, returns `Ok` containing a new `String` with all escape sequences
/// resolved to their actual character values.
///
/// On failure, returns `Err` containing a `TokenStream` that causes a compile
/// error when expanded; the message indicates which escape sequence was invalid.
///
/// # Examples
///
/// ```ignore
/// let result = unescape("hello\\nworld").unwrap();
/// assert_eq!(result, "hello\nworld");
/// ```
///
/// With unicode:
///
/// ```ignore
/// let result = unescape("hello\\u{1F980}world").unwrap();
/// assert_eq!(result, "hello🦀world");
/// ```
fn unescape(s: &str) -> Result<String, TokenStream> {
    let mut output = String::with_capacity(s.len());
    let mut chars = s.chars();

    while let Some(c) = chars.next() {
        if c != '\\' {
            output.push(c);
            continue;
        }

        match chars.next() {
            Some('n') => output.push('\n'),
            Some('r') => output.push('\r'),
            Some('t') => output.push('\t'),
            Some('\\') => output.push('\\'),
            Some('"') => output.push('"'),
            Some('\'') => output.push('\''), // \' is valid in string literals too
            Some('0') => output.push('\0'),

            // \xNN: exactly two hex digits, restricted to the ASCII range
            // as in Rust string literals.
            Some('x') => {
                let h1 = chars
                    .next()
                    .ok_or_else(|| err("litext: invalid \\x escape"))?;
                let h2 = chars
                    .next()
                    .ok_or_else(|| err("litext: invalid \\x escape"))?;
                let hex = format!("{}{}", h1, h2);
                let byte =
                    u8::from_str_radix(&hex, 16).map_err(|_| err("litext: invalid \\x escape"))?;
                if byte > 0x7F {
                    return Err(err("litext: \\x escape must be in range 0x00..=0x7F"));
                }
                output.push(byte as char);
            }

            // \u{...}: hex digits between braces, validated as a codepoint.
            Some('u') => {
                match chars.next() {
                    Some('{') => {}
                    _ => return Err(err("litext: invalid \\u escape, expected '{'")),
                }
                let mut hex = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(c) => hex.push(c),
                        None => return Err(err("litext: unterminated \\u escape")),
                    }
                }
                let codepoint = u32::from_str_radix(&hex, 16)
                    .map_err(|_| err("litext: invalid \\u codepoint"))?;
                let ch = char::from_u32(codepoint)
                    .ok_or_else(|| err("litext: invalid unicode codepoint"))?;
                output.push(ch);
            }

            // Backslash followed by a newline is a line continuation:
            // skip the newline and any whitespace that follows it.
            Some('\n') => {
                while chars.as_str().starts_with(char::is_whitespace) {
                    chars.next();
                }
            }

            _ => return Err(err("litext: invalid escape sequence")),
        }
    }

    Ok(output)
}
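The `\u{...}` branch reduces to radix parsing plus `char::from_u32` validation. A standalone sketch of just that branch (the helper name is hypothetical):

```rust
// Parse a full \u{...} escape: hex digits between braces become a char,
// with char::from_u32 rejecting surrogates and out-of-range values.
fn parse_unicode_escape(body: &str) -> Option<char> {
    let hex = body.strip_prefix("\\u{")?.strip_suffix('}')?;
    let codepoint = u32::from_str_radix(hex, 16).ok()?;
    char::from_u32(codepoint)
}

fn main() {
    assert_eq!(parse_unicode_escape("\\u{41}"), Some('A'));
    assert_eq!(parse_unicode_escape("\\u{1F980}"), Some('🦀'));
    assert_eq!(parse_unicode_escape("\\u{D800}"), None); // surrogate
}
```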

/// Creates a compile-time error token stream for reporting errors in the macro.
///
/// This helper constructs a `TokenStream` that, when expanded, causes a compile
/// error with the specified message. It wraps the message in a `compile_error!`
/// invocation, the standard macro for halting compilation with a message.
///
/// The message is embedded using `format!` with `{:?}`, which escapes and quotes
/// it as a valid Rust string literal inside the `compile_error!` call.
///
/// This helper is used throughout the library whenever validation fails and a
/// meaningful error message needs to be returned. Returning a `TokenStream` that
/// causes a compile error, rather than returning error information as data, is
/// idiomatic for proc-macro crates because it surfaces the problem to the
/// developer immediately at compile time.
///
/// # Arguments
///
/// The `msg` parameter is a string slice containing the error message to display:
/// a concise description of what went wrong and, ideally, how to fix it.
///
/// # Return Value
///
/// Returns a `TokenStream` containing a `compile_error!` invocation with the
/// provided message. It can be returned from [`extract`] or any other context
/// where an error `TokenStream` is expected.
///
/// # Implementation Notes
///
/// The `.parse().unwrap()` cannot panic in practice: the string produced by
/// `format!("compile_error!({:?})", msg)` is always valid Rust syntax. The
/// `compile_error!` macro itself is invoked during expansion, producing the
/// actual error message.
fn err(msg: &str) -> TokenStream {
    format!("compile_error!({:?})", msg).parse().unwrap()
}
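The `{:?}` formatting is what makes the unwrap safe: `Debug` for `&str` emits a quoted, escaped Rust string literal, so the formatted text always parses. A standalone check using plain strings (no proc_macro involved):

```rust
// Debug-format a message the way err() does and confirm the result is
// a well-formed compile_error! invocation, even with embedded quotes.
fn error_tokens(msg: &str) -> String {
    format!("compile_error!({:?})", msg)
}

fn main() {
    assert_eq!(error_tokens("plain"), "compile_error!(\"plain\")");
    assert_eq!(
        error_tokens("expected a \"string\" literal"),
        r#"compile_error!("expected a \"string\" literal")"#
    );
}
```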