Module stream


A high-performance, allocation-free, streaming JSON string unescaper.

This module provides utilities to unescape JSON strings, with a focus on performance and flexibility. It is designed to work with data sources that deliver content in chunks, such as network sockets or file readers, without requiring heap allocations or holding onto previous chunks.

§Key Features

  • Streaming Unescaping: The main type, UnescapeStream, processes byte slices incrementally. This is ideal for I/O-bound applications.
  • Zero Heap Allocations: The entire process occurs on the stack, using a small internal buffer to “stitch” together escape sequences that are split across chunk boundaries.
  • Data-Source Agnostic: The API uses a “push” model. You pass byte slices to the unescaper as you receive them, so you can reuse your input buffers between calls.
  • Robust Error Handling: Reports detailed errors, including the position and kind of failure.
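To make the "stitching" idea concrete, here is a minimal sketch of a fixed-size stack buffer that collects one `\uXXXX` escape split across chunk boundaries. It is an illustration only and not the crate's implementation: the names `Stitch`, `BadEscape`, `feed`, and `decode` are hypothetical, and the sketch handles a single six-byte escape, not surrogate pairs.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
struct BadEscape;

/// Hypothetical fixed-capacity buffer for one `\uXXXX` escape (6 bytes).
/// It lives entirely on the stack; nothing here allocates.
struct Stitch {
    buf: [u8; 6],
    len: usize,
}

impl Stitch {
    fn new() -> Self {
        Stitch { buf: [0; 6], len: 0 }
    }

    /// Copies bytes from the front of `chunk` until the escape is complete.
    /// Returns how many bytes were consumed, plus the decoded UTF-16 code
    /// unit (or an error) once all six bytes have been seen.
    fn feed(&mut self, chunk: &[u8]) -> (usize, Option<Result<u16, BadEscape>>) {
        let take = (self.buf.len() - self.len).min(chunk.len());
        self.buf[self.len..self.len + take].copy_from_slice(&chunk[..take]);
        self.len += take;
        if self.len < self.buf.len() {
            // Still incomplete: wait for the next chunk.
            return (take, None);
        }
        // Complete: reset so the buffer can be reused for the next escape.
        self.len = 0;
        (take, Some(decode(&self.buf)))
    }
}

/// Decodes exactly `\u` followed by four hex digits.
fn decode(escape: &[u8; 6]) -> Result<u16, BadEscape> {
    if &escape[..2] != b"\\u" {
        return Err(BadEscape);
    }
    let mut unit = 0u16;
    for &b in &escape[2..] {
        let digit = (b as char).to_digit(16).ok_or(BadEscape)?;
        unit = unit * 16 + digit as u16;
    }
    Ok(unit)
}
```

Feeding `\u00` and then `41rest` consumes four bytes from the first chunk and two from the second, leaving `rest` for normal processing. The caller keeps ownership of both chunks throughout, which is what lets input buffers be reused.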

§How It Works

The core of the streaming logic is the UnescapeStream struct. When you process a slice using unescape_next, it returns a tuple containing two parts:

  1. An Option<Result<char, UnescapeError>>: This handles the “continuity” between the previous slice and the current one. It will be Some(_) only if the previous slice ended with an incomplete escape sequence that was resolved by the start of the new slice. The Result will contain the unescaped character on success or an error if the combined bytes form an invalid sequence.
  2. An UnescapeNext iterator: This iterator yields the unescaped parts for the remainder of the current slice.

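After processing all slices, you must call finish to check for any leftover partial escape sequence, which would indicate that the JSON string was truncated or malformed at the end of the stream.

The Example section below calls try_unescape_next instead. As used there, it reports a boundary error through the outer Result, so the first tuple element is a plain Option<char>.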
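The flow described above (carry state across slices, then check for leftovers at the end) can be sketched with a deliberately tiny state machine. This is not the crate's code: `MiniUnescaper`, `MiniError`, and `unescape_chunks` are hypothetical names, only the escapes `\n`, `\t`, `\"`, `\\`, and `\/` are understood, and the output is collected into a `Vec` for brevity even though the real API avoids heap allocation.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum MiniError {
    /// A backslash was followed by a byte this sketch does not understand.
    InvalidEscape(u8),
    /// The stream ended right after a backslash.
    UnexpectedEof,
}

/// Hypothetical, simplified stand-in for a streaming unescaper.
#[derive(Default)]
struct MiniUnescaper {
    /// True when the bytes seen so far end with a lone backslash.
    pending_backslash: bool,
}

impl MiniUnescaper {
    /// Processes one chunk. State survives between calls, so an escape
    /// split across two chunks is resolved by the start of the second one.
    fn push(&mut self, chunk: &[u8], out: &mut Vec<u8>) -> Result<(), MiniError> {
        for &b in chunk {
            if self.pending_backslash {
                self.pending_backslash = false;
                out.push(match b {
                    b'n' => b'\n',
                    b't' => b'\t',
                    b'"' | b'\\' | b'/' => b,
                    other => return Err(MiniError::InvalidEscape(other)),
                });
            } else if b == b'\\' {
                self.pending_backslash = true;
            } else {
                out.push(b);
            }
        }
        Ok(())
    }

    /// Must be called after the last chunk: a dangling backslash is an error.
    fn finish(self) -> Result<(), MiniError> {
        if self.pending_backslash {
            Err(MiniError::UnexpectedEof)
        } else {
            Ok(())
        }
    }
}

/// Drives the sketch over a list of chunks and returns the unescaped text.
fn unescape_chunks(chunks: &[&[u8]]) -> Result<String, MiniError> {
    let mut state = MiniUnescaper::default();
    let mut out = Vec::new();
    for chunk in chunks {
        state.push(chunk, &mut out)?;
    }
    state.finish()?;
    // Lossy conversion keeps the sketch short; real code should validate UTF-8.
    Ok(String::from_utf8_lossy(&out).into_owned())
}
```

With the chunks `Hello\` and `nWorld`, the backslash is remembered at the end of the first chunk and the `n` at the start of the second completes the newline. Skipping the final `finish` call would let an input such as `dangling\` pass silently, which is why the check at the end of the stream matters.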

§Example

use json_escape::{stream::UnescapeStream, token::UnescapedToken};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A JSON string split into multiple parts.
    // The surrogate pair `\uD83D\uDE00` (😀) is split across parts.
    let parts = vec![
        br#"{"message": "Hello, W\"orld! \uD83D"#.as_slice(),
        br#"\uDE00"}"#.as_slice(),
    ];

    let mut unescaper = UnescapeStream::new();
    let mut unescaped_string = String::new();

    for part in parts {
        // Process the next part of the stream.
        let (boundary_char, rest_of_part) = unescaper.try_unescape_next(part)?;

        // 1. Handle the character that may have spanned the boundary.
        if let Some(boundary_char) = boundary_char {
            unescaped_string.push(boundary_char);
        }

        // 2. Process the rest of the current part.
        for result in rest_of_part {
            let unescaped_part = result?;
            match unescaped_part {
                UnescapedToken::Literal(literal) => {
                    unescaped_string.push_str(std::str::from_utf8(literal)?)
                }
                UnescapedToken::Unescaped(ch) => unescaped_string.push(ch),
            }
        }
    }

    // IMPORTANT: Always call finish() to detect errors at the end of the stream.
    unescaper.finish()?;

    assert_eq!(unescaped_string, r#"{"message": "Hello, W"orld! 😀"}"#);

    Ok(())
}

Structs§

FnMutChunkSource
A ChunkSource implementation that wraps a mutable closure (FnMut).
ReadChunkSource
A ChunkSource that reads from any std::io::Read type.
UnescapeNext
An iterator over the unescaped parts of a single byte slice.
UnescapeStream
A streaming JSON string unescaper that operates over byte slices.

Enums§

UnescapeFnError
An error that can occur during the unescape_from_source operation.

Traits§

ChunkSource
This trait is designed to handle byte streams efficiently, especially when the source needs to lend slices borrowed from its own internal buffer between calls. A simple closure (FnMut() -> Option<Result<B, E>>) cannot express this lifetime relationship, because the return type of an FnMut cannot be tied to the borrow of the closure's own state for each call. This trait solves that by making the source a mutable object whose method you call repeatedly.
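The lifetime point can be illustrated with a small lending-style trait. This is a sketch under assumptions, not the crate's actual ChunkSource definition: `LendingSource`, `next_chunk`, `BufferedReader`, and `collect_all` are hypothetical names chosen for the example.

```rust
use std::io::Read;

/// Hypothetical lending-style trait; the crate's real `ChunkSource`
/// may use different names and signatures.
trait LendingSource {
    type Error;

    /// The returned slice borrows from `self`, so it is only valid until
    /// the next call. Lifetime elision ties it to the `&mut self` borrow,
    /// which is the relationship an `FnMut` return type cannot name.
    fn next_chunk(&mut self) -> Option<Result<&[u8], Self::Error>>;
}

/// Hypothetical source that reads into a small buffer it owns.
struct BufferedReader<R> {
    reader: R,
    buf: [u8; 8],
}

impl<R: Read> LendingSource for BufferedReader<R> {
    type Error = std::io::Error;

    fn next_chunk(&mut self) -> Option<Result<&[u8], Self::Error>> {
        // Note: a production reader would likely retry on `Interrupted`.
        match self.reader.read(&mut self.buf) {
            Ok(0) => None,                     // end of input
            Ok(n) => Some(Ok(&self.buf[..n])), // lend the filled prefix
            Err(e) => Some(Err(e)),
        }
    }
}

/// Drains a source by calling it repeatedly. Each chunk is fully used
/// before the next call, so the internal buffer can be overwritten safely.
fn collect_all<S: LendingSource>(source: &mut S) -> Result<Vec<u8>, S::Error> {
    let mut out = Vec::new();
    while let Some(chunk) = source.next_chunk() {
        out.extend_from_slice(chunk?);
    }
    Ok(out)
}
```

Because each lent slice ends its borrow before the next `next_chunk` call, the same eight-byte buffer serves the whole stream. A consumer such as an unescaper would process each chunk in place of the `extend_from_slice` call.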