litext 0.2.0

Just what you need for extracting string literal contents at compile time
//! # litext
//!
//! A procedural macro library for extracting string literal contents from tokens.
//! Built for proc-macro authors who need to pull string content from TokenStream input.
//!
//! ## Overview
//!
//! This library provides the `extract` function which takes a TokenStream representing
//! a string literal and returns the inner text content without the surrounding quotes.
//! It handles both regular string literals and raw string literals.
//!
//! This is a proc-macro helper library. It is designed for proc-macro authors who need
//! to extract string content from tokens during macro expansion. It is not intended for
//! use in regular Rust code.
//!
//! The implementation is zero-dependency and contained in a single file.
//!
//! ## Usage
//!
//! Add to your proc-macro crate:
//!
//! ```toml
//! [dependencies]
//! litext = "0.2"
//! ```
//!
//! Use it in your proc-macro by passing the raw input tokens to `litext!`:
//!
//! ```ignore
//! use litext::{litext, TokenStream};
//!
//! pub fn my_macro(input: TokenStream) -> TokenStream {
//!     let string_content = litext!(input);
//!     // ... use string_content
//! }
//! ```
//!
//! ## Features
//!
//! The `extract` function supports the following string literal formats:
//!
//! - Regular double-quoted strings: `"hello world"`
//! - Raw strings with hashes: `r#"hello world"#`
//! - Raw strings with multiple hashes: `r##"hello # world"##`
//!
//! The `extract` function returns errors for:
//!
//! - Empty input: "litext! expected a string literal, got nothing"
//! - Multiple tokens: "litext! expected exactly one string literal"
//! - Not a string literal: "litext! expected a string literal, found ..."
//! - Byte strings: "litext: expected a string literal, not a byte string"
//!
//! ## Performance
//!
//! The extract function runs at compile time during macro expansion.
//! There is no runtime overhead in your final binary.
//!
//! ## Re-exports
//!
//! This crate re-exports `proc_macro::TokenStream`, `proc_macro::TokenTree`,
//! and `proc_macro::Literal` for convenience in proc-macro code.

extern crate proc_macro;
pub use proc_macro::{Literal, TokenStream, TokenTree};

/// The whole point.
///
/// Extracts the inner text content from a token stream representing a string literal.
///
/// This macro is the primary interface for extracting string content from tokens.
/// It takes a `TokenStream` that should represent a single string literal and
/// returns the text content without the surrounding quotes. If the input is invalid,
/// the macro triggers a compile error at the call site.
///
/// The macro wraps the [`extract`] function and provides a more convenient syntax
/// for proc-macro authors. It handles both regular string literals (`"hello"`) and
/// raw string literals (`r#"hello"#`).
///
/// # Arguments
///
/// The `$input:expr` argument must be a `TokenStream` containing exactly one token
/// that represents a string literal. This can be:
///
/// - A regular string: `"hello world"`
/// - A raw string: `r#"hello world"#`
/// - A raw string with multiple hashes: `r##"hello # world"##`
///
/// # Return Value
///
/// On success, the macro expands to a `String` containing the inner content of
/// the string literal, without quotes or raw string markers.
///
/// On failure, the macro causes an early return from the enclosing function: the
/// `Err` arm of the expanded `match` returns the error `TokenStream`, which holds
/// a `compile_error!` invocation. This is appropriate when the macro is used
/// within a function returning `TokenStream`.
///
/// # Errors
///
/// The macro produces compile-time errors for:
///
/// - Empty input: `litext! expected a string literal, got nothing`
/// - Multiple tokens: `litext! expected exactly one string literal`
/// - Not a string literal: `litext! expected a string literal, found ...`
/// - Byte strings: `litext: expected a string literal, not a byte string`
///
/// # Examples
///
/// Basic usage in a proc-macro function:
///
/// ```ignore
/// use litext::{litext, TokenStream};
///
/// pub fn my_macro(input: TokenStream) -> TokenStream {
///     let content = litext!(input);
///     // content is a String containing the string content
///     // ... use content
/// }
/// ```
///
/// The input TokenStream is typically passed from the macro function's input parameter,
/// which contains the tokens parsed from the macro invocation.
///
/// ```ignore
/// use litext::{litext, TokenStream};
///
/// // Given input: my_macro!("hello")
/// pub fn my_macro(input: TokenStream) -> TokenStream {
///     // input contains the tokens for "hello"
///     let content = litext!(input);
///     // content is "hello"
/// }
/// ```
#[macro_export]
macro_rules! litext {
    ($input:expr) => {
        match $crate::extract($input) {
            Ok(s) => s,
            Err(e) => return e,
        }
    };
}

/// Extracts the inner text content from a token stream representing a string literal.
///
/// This function is the core implementation behind the litext! macro. It takes a TokenStream
/// that should represent a single token and processes it to extract the string content.
/// The function handles the various forms of string literals that Rust supports and
/// validates that the input is indeed a string literal.
///
/// The function performs several validation steps. First, it checks that there is
/// exactly one token in the input stream. Then it examines that token to determine
/// if it is a valid string literal. If the token is a Literal, it attempts to parse
/// it as a string. If the token is an Ident, Punct, or Group, it returns an appropriate
/// error message indicating that a string literal was expected.
///
/// This function is public and can be used directly if you need to process TokenStream
/// values from other proc_macro code. However, most users will prefer to use the
/// litext! macro which provides a more convenient interface.
///
/// # Arguments
///
/// The input parameter should be a TokenStream containing exactly one token.
/// This token is expected to be a string literal, either a regular string or a raw string.
///
/// # Return Value
///
/// On success, returns Ok containing a String with the inner text content of the string
/// literal, without the surrounding quotes or the raw string markers.
///
/// On failure, returns Err containing a TokenStream that will cause a compile error
/// when expanded. The error TokenStream contains a compile_error! macro invocation
/// with a descriptive message about what went wrong.
///
/// # Examples
///
/// Using the function directly with a proc_macro token stream:
///
/// ```ignore
/// use proc_macro::TokenStream;
/// use litext::extract;
///
/// let input: TokenStream = "Hello".parse().unwrap();
/// let result = extract(input);
/// assert!(result.is_err()); // "Hello" is not a string literal
/// ```
///
/// The function is primarily used internally by the litext! macro, but the public
/// API allows for more advanced proc_macro development if needed.
pub fn extract(input: TokenStream) -> Result<String, TokenStream> {
    let mut iter = input.into_iter();

    let token = match iter.next() {
        Some(t) => t,
        None => return Err(err("litext! expected a string literal, got nothing")),
    };

    if iter.next().is_some() {
        return Err(err("litext! expected exactly one string literal"));
    }

    match token {
        TokenTree::Literal(lit) => parse_lit(lit),
        TokenTree::Ident(_) => Err(err("litext! expected a string literal, found identifier")),
        TokenTree::Punct(_) => Err(err("litext! expected a string literal, found punctuation")),
        TokenTree::Group(_) => Err(err("litext! expected a string literal, found group")),
    }
}

/// Parses a Literal token to extract the string content from a string literal.
///
/// This function takes a proc_macro Literal token that should represent a string literal
/// and attempts to extract its inner content. It handles both regular double-quoted strings
/// and raw strings using the r#...# syntax. The function validates that the literal is actually
/// a string literal and not a byte string or other type of literal.
///
/// The parsing logic first checks if the literal is a raw string by looking for the 'r' prefix.
/// If it is a raw string, it uses the parse_raw helper function to extract the content.
/// If it is not a raw string but does start and end with double quotes, it extracts the
/// content between the quotes. Any other format results in an error.
///
/// One important aspect of this function is that it explicitly rejects byte strings (those
/// prefixed with b) and raw byte strings (prefixed with br). This is because the goal of
/// litext is to work with text strings, not byte sequences. Byte strings would produce
/// different output that might confuse users who expect text content.
///
/// # Arguments
///
/// The lit parameter is a proc_macro Literal token. This token is expected to represent
/// some kind of literal value in Rust source code, but the function will verify that
/// it is specifically a string literal before processing.
///
/// # Return Value
///
/// On success, returns Ok containing a String with the content of the string literal
/// without the surrounding quotation marks. The string is owned and can be used as
/// needed by the caller.
///
/// On failure, returns Err containing a TokenStream that will trigger a compile-time
/// error when expanded. The error message will indicate what type of literal was
/// provided instead of a string literal.
///
/// # Implementation Notes
///
/// The function uses string manipulation rather than the Literal API's typed methods
/// because proc_macro's Literal type does not provide direct access to the string
/// content in a convenient form. By converting to a string and parsing it manually,
/// we gain full control over the extraction process and can handle all the edge cases
/// that might occur with various string formats.
fn parse_lit(lit: proc_macro::Literal) -> Result<String, TokenStream> {
    let raw = lit.to_string();

    if raw.starts_with('b') && raw.len() > 1 {
        let c = raw.chars().nth(1).unwrap();
        if c == '"' || c == 'r' {
            return Err(err("litext: expected a string literal, not a byte string"));
        }
    }

    if raw.starts_with('r') {
        return parse_raw(&raw).ok_or_else(|| err("litext! malformed raw string literal"));
    }

    if raw.starts_with('"') && raw.ends_with('"') && raw.len() >= 2 {
        return unescape(&raw[1..raw.len() - 1]);
    }

    Err(err("litext! expected a string literal"))
}
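
The prefix checks above can be exercised outside a proc-macro context. Below is a minimal standalone sketch of the same classification; `classify` is a hypothetical name used only for illustration (the real `parse_lit` dispatches to parsing or returns errors rather than labels):

```rust
// Standalone sketch of parse_lit's prefix classification. `classify` is a
// hypothetical helper; the real function parses or errors instead of labeling.
fn classify(raw: &str) -> &'static str {
    let mut cs = raw.chars();
    match (cs.next(), cs.next()) {
        // b"..." and br"..." are byte strings, which litext rejects.
        (Some('b'), Some('"')) | (Some('b'), Some('r')) => "byte string",
        (Some('r'), _) => "raw string",
        (Some('"'), _) => "string",
        _ => "other",
    }
}

fn main() {
    assert_eq!(classify(r#""hi""#), "string");
    assert_eq!(classify(r##"r#"hi"#"##), "raw string");
    assert_eq!(classify(r#"b"hi""#), "byte string");
    assert_eq!(classify("42"), "other");
}
```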

/// Parses a raw string literal format to extract the inner content.
///
/// Raw strings in Rust use the syntax r#"content"# where the number of hash marks
/// can be increased to allow the content to contain quote characters without escaping.
/// This function handles all variations of raw string syntax, from r"..." to
/// r##"..."## and beyond.
///
/// The function works by first identifying how many hash marks precede the opening
/// quote, then stripping those hash marks along with the quotes themselves from both
/// ends of the string, and finally returning the content that lies between them.
///
/// The implementation uses Rust's strip_prefix and strip_suffix methods which return
/// Option types, allowing the function to elegantly handle the case where the string
/// does not match the expected format by returning None. This makes the function
/// pure and composable with the rest of the parsing logic.
///
/// # Arguments
///
/// The raw parameter is the textual form of what appears to be a raw string
/// literal, exactly as produced by tokenization: a leading 'r', zero or more
/// hash characters, the double-quoted content, and the matching closing hashes.
///
/// # Return Value
///
/// If the string is a valid raw string format, returns Some containing the content
/// between the quotes and hash marks. If the string does not match the expected raw
/// string format, returns None.
///
/// The function never panics: it always returns either Some with the content or
/// None when the format is invalid, so no error message is needed at this level.
fn parse_raw(raw: &str) -> Option<String> {
    let rest = raw.strip_prefix('r')?;
    let hashes = rest.chars().take_while(|c| *c == '#').count();
    let hash_str = "#".repeat(hashes);
    let inner = rest
        .strip_prefix(&hash_str)?
        .strip_prefix('"')?
        .strip_suffix(&hash_str)?
        .strip_suffix('"')?;
    Some(inner.to_string())
}
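
Because the stripping logic is a pure string transformation, it can be demonstrated outside the proc-macro crate. A standalone sketch (duplicating the logic, since `proc_macro` is unavailable in ordinary binaries; note the closing delimiter is `"##...`, so the trailing hashes must be stripped before the closing quote):

```rust
// Standalone sketch of the raw string stripping logic: strip the leading 'r',
// the opening hashes and quote, then the closing hashes and quote.
fn parse_raw(raw: &str) -> Option<String> {
    let rest = raw.strip_prefix('r')?;
    let hashes = rest.chars().take_while(|c| *c == '#').count();
    let hash_str = "#".repeat(hashes);
    let inner = rest
        .strip_prefix(&hash_str)?
        .strip_prefix('"')?
        .strip_suffix(&hash_str)? // the literal ends `"##`: hashes come last,
        .strip_suffix('"')?;      // so strip them before the closing quote
    Some(inner.to_string())
}

fn main() {
    assert_eq!(parse_raw(r#"r"plain""#).as_deref(), Some("plain"));
    assert_eq!(
        parse_raw(r##"r#"has "quotes""#"##).as_deref(),
        Some(r#"has "quotes""#)
    );
    assert_eq!(parse_raw("\"not raw\""), None);
}
```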

/// Unescapes a string by processing escape sequences.
///
/// This function converts escape sequences in a string literal to their actual
/// character values. It handles common escape sequences like `\n`, `\r`, `\t`,
/// `\\`, `\"`, and `\0`, as well as hex escapes (`\xNN`) and unicode escapes
/// (`\u{NNNN}`).
///
/// The function processes the input string character by character. When a backslash
/// is encountered, it looks at the next character to determine which escape sequence
/// to process. Invalid escape sequences result in compile-time errors.
///
/// # Arguments
///
/// The `s` parameter is a string slice containing escape sequences to process.
/// Common escape sequences include:
/// - `\n` - newline
/// - `\r` - carriage return
/// - `\t` - tab
/// - `\\` - backslash
/// - `\"` - double quote
/// - `\0` - null character
///
/// Hex escapes: `\xNN` where NN is a two-digit hex number
/// Unicode escapes: `\u{NNNN}` where NNNN is a unicode codepoint
///
/// # Return Value
///
/// On success, returns `Ok` containing a new String with all escape sequences resolved
/// to their actual character values.
///
/// On failure, returns `Err` containing a TokenStream that will cause a compile
/// error when expanded. The error message indicates which escape sequence was invalid.
///
/// # Examples
///
/// ```ignore
/// let result = unescape("hello\\nworld").unwrap();
/// assert_eq!(result, "hello\nworld");
/// ```
///
/// With unicode:
///
/// ```ignore
/// let result = unescape("hello\\u{1F980}world").unwrap();
/// assert_eq!(result, "hello🦀world");
/// ```
fn unescape(s: &str) -> Result<String, TokenStream> {
    let mut output = String::with_capacity(s.len());
    let mut chars = s.chars();

    while let Some(c) = chars.next() {
        if c != '\\' {
            output.push(c);
            continue;
        }

        match chars.next() {
            Some('n') => output.push('\n'),
            Some('r') => output.push('\r'),
            Some('t') => output.push('\t'),
            Some('\\') => output.push('\\'),
            Some('"') => output.push('"'),
            Some('\'') => output.push('\''),
            Some('0') => output.push('\0'),

            Some('x') => {
                let h1 = chars
                    .next()
                    .ok_or_else(|| err("litext: invalid \\x escape"))?;
                let h2 = chars
                    .next()
                    .ok_or_else(|| err("litext: invalid \\x escape"))?;
                let hex = format!("{}{}", h1, h2);
                let byte =
                    u8::from_str_radix(&hex, 16).map_err(|_| err("litext: invalid \\x escape"))?;
                if byte > 0x7F {
                    return Err(err("litext: \\x escape must be in range 0x00..=0x7F"));
                }
                output.push(byte as char);
            }

            Some('u') => {
                match chars.next() {
                    Some('{') => {}
                    _ => return Err(err("litext: invalid \\u escape, expected '{'")),
                }
                let mut hex = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(c) => hex.push(c),
                        None => return Err(err("litext: unterminated \\u escape")),
                    }
                }
                let codepoint = u32::from_str_radix(&hex, 16)
                    .map_err(|_| err("litext: invalid \\u codepoint"))?;
                let ch = char::from_u32(codepoint)
                    .ok_or_else(|| err("litext: invalid unicode codepoint"))?;
                output.push(ch);
            }

            Some('\n') => {
                // Line continuation: skip any whitespace following the escaped
                // newline, as Rust does for `\` at the end of a line.
                while chars.as_str().starts_with(char::is_whitespace) {
                    chars.next();
                }
            }

            _ => return Err(err("litext: invalid escape sequence")),
        }
    }

    Ok(output)
}
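
The `\u{...}` branch can be sketched in isolation. Here `parse_unicode_escape` is a hypothetical helper that uses `String` errors in place of `TokenStream` so it runs outside a proc-macro context:

```rust
// Minimal sketch of the \u{...} handling: consume '{', collect hex digits up
// to '}', then convert the codepoint. Uses String errors, not TokenStream.
fn parse_unicode_escape(chars: &mut std::str::Chars) -> Result<char, String> {
    if chars.next() != Some('{') {
        return Err("invalid \\u escape, expected '{'".into());
    }
    let mut hex = String::new();
    for c in chars.by_ref() {
        if c == '}' {
            let cp = u32::from_str_radix(&hex, 16)
                .map_err(|_| "invalid \\u codepoint".to_string())?;
            return char::from_u32(cp).ok_or_else(|| "invalid unicode codepoint".into());
        }
        hex.push(c);
    }
    Err("unterminated \\u escape".into())
}

fn main() {
    // The iterator is left positioned just past the closing brace.
    let mut it = "{1F980} rest".chars();
    assert_eq!(parse_unicode_escape(&mut it), Ok('🦀'));
    assert_eq!(it.as_str(), " rest");
}
```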

/// Creates a compile-time error token stream for reporting errors in the macro.
///
/// This helper function constructs a TokenStream that, when expanded, will cause
/// a compile error with the specified message. It wraps the message in a compile_error!
/// macro invocation, which is a special macro in Rust that halts compilation and
/// displays the provided message.
///
/// The function takes a message string and converts it into a format suitable for
/// embedding in a compile_error! macro call. This involves converting the message
/// to a string literal format that the compile_error! macro can accept. The format!
/// macro call with {:?} ensures that the message is properly escaped and quoted as
/// a string literal within the macro invocation.
///
/// This function is used throughout the library whenever validation fails and a
/// meaningful error message needs to be returned. Rather than returning error
/// information as data, returning a TokenStream that will cause a compile error
/// is more idiomatic for proc_macro crates because it provides immediate feedback
/// to the developer at compile time.
///
/// # Arguments
///
/// The msg parameter is a string slice containing the error message that should
/// be displayed to the developer. This message should be a clear description of
/// what went wrong and ideally how to fix it. Messages are kept concise but
/// informative.
///
/// # Return Value
///
/// Returns a TokenStream containing a compile_error! macro invocation with the
/// provided message. This TokenStream can be returned from the extract function
/// or used in any context where an error TokenStream is expected.
///
/// # Implementation Notes
///
/// The function uses .parse().unwrap() to convert the formatted string into a
/// TokenStream. This should not panic in practice because the format string
/// produced by format!("compile_error!({:?})", msg) is always valid Rust syntax
/// that can be parsed into a TokenStream. The compile_error! macro itself will
/// be invoked during expansion, producing the actual error message.
fn err(msg: &str) -> TokenStream {
    format!("compile_error!({:?})", msg).parse().unwrap()
}