React to elements in a JSON stream
Parse JSON and execute callbacks based on patterns, even before the entire document is available.
For a fast start,
- first look at the concepts and examples in this README,
- then learn about [crate::scan()], and
- about the context stack and matching by [crate::iter_match()].
scan_json is designed to support zero-allocation and no_std environments, but a transitive dependency through jiter currently requires std.
Concepts
The library uses the streaming JSON parser RJiter. While parsing, it maintains context, which is the path of element names from the root to the current nesting level.
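To make the notion of context concrete, here is a minimal sketch of a path of element names and a check against its innermost names. The function `context_ends_with` is a hypothetical helper written for this illustration; it is not the signature of [iter_match] and does not use any scan_json types.

```rust
/// Hypothetical helper for illustration, not part of scan_json.
/// `context` lists element names from the root to the current level.
/// `pattern` lists expected names starting at the current level and
/// going outward, so `["content", "message"]` means
/// "a `content` key directly inside `message`".
fn context_ends_with(context: &[&str], pattern: &[&str]) -> bool {
    // The context must be at least as deep as the pattern.
    context.len() >= pattern.len()
        // Walk the context from the innermost name towards the root
        // and compare it with the pattern element by element.
        && context
            .iter()
            .rev()
            .zip(pattern.iter())
            .all(|(name, expected)| name == expected)
}
```

With the context `["choices", "message", "content"]`, the pattern `["content", "message"]` matches, while the same pattern does not match `["choices", "delta", "content"]`.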
The workflow for each key:
- First, call find_action and execute if found
- If the key value is an object or array, update the context and parse the next level
- Afterwards, call find_end_action and execute if found
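The order of these steps can be sketched on a toy in-memory tree. This is an illustration under simplifying assumptions, not the library's implementation: the `Value` enum and the `walk` function are hypothetical, arrays are left out, the tree is already fully parsed (the real scan works on a stream), and the two lookups are replaced by log entries that show when a begin action and an end action would fire.

```rust
/// Toy JSON value for this sketch only.
enum Value {
    /// A string, number, boolean or null; its content does not matter here.
    Atom,
    /// An object as an ordered list of (key, value) members.
    Object(Vec<(&'static str, Value)>),
}

/// Visits every key of `value` and records the order of the workflow steps.
/// `context` holds the key names from the root to the current nesting level.
fn walk(value: &Value, context: &mut Vec<&'static str>, log: &mut Vec<String>) {
    if let Value::Object(members) = value {
        for (key, child) in members {
            // Full path of the current key, used only for the log.
            let mut path = context.clone();
            path.push(*key);
            let path = path.join("/");

            // Step 1: the point where a begin action would be looked up.
            log.push(format!("begin {}", path));

            // Step 2: for a nested object, extend the context, parse the
            // next level, then restore the context.
            if let Value::Object(_) = child {
                context.push(*key);
                walk(child, context, log);
                context.pop();
            }

            // Step 3: the point where an end action would be looked up.
            log.push(format!("end {}", path));
        }
    }
}
```

For an object with a nested `message` containing `content`, followed by `id`, the log reads: begin message, begin message/content, end message/content, end message, begin id, end id.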
An action receives two arguments:
- rjiter: A mutable reference to the RJiter parser object. An action can modify JSON parsing behavior by consuming the current key's value
- baton: This can be either:
  - A simple Copy type (like i32, bool, ()) passed by value for read-only or stateless operations
  - &RefCell<B> for mutable state that needs to be shared across action calls
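The two baton flavors can be compared in a small standalone sketch. It uses only the standard library; `run_twice`, `weight`, and `count_call` are hypothetical names, and the action signature here is simplified (no rjiter argument), so it is not the library's action type. The point it illustrates is that a shared reference such as `&RefCell<B>` is itself `Copy`, so both kinds of baton can be handed to every call in the same way.

```rust
use std::cell::RefCell;

/// Hypothetical driver: calls `action` twice with the same baton and
/// sums the results. The baton only needs to be `Copy`.
fn run_twice<B: Copy, F: Fn(B) -> i32>(action: F, baton: B) -> i32 {
    action(baton) + action(baton)
}

/// Read-only action: the baton is a plain `Copy` value passed by value.
fn weight(baton: i32) -> i32 {
    baton * 2
}

/// Stateful action: the baton is `&RefCell<u32>`, so each call can
/// mutate the state that all calls share.
fn count_call(baton: &RefCell<u32>) -> i32 {
    let mut calls = baton.borrow_mut();
    *calls += 1;
    *calls as i32
}
```

Calling `run_twice(weight, 5)` passes the value 5 to both calls, while `run_twice(count_call, &counter)` lets both calls update the same `counter`.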
Example of an action
find_action uses the library helper [iter_match] to detect the content key and return the on_content function.
The action peeks the value and writes it to the output. Because the value is consumed, the action returns the ValueIsConsumed flag to scan so it can update its internal state.
use ;
use StructuralPseudoname;
use ContextIter;
use RJiter;
use RefCell;
use Write;
use U8Pool;
// Find action function that matches "content" key
let find_action = ;
Complete example: Identity transformation
The identity transformation copies JSON input to output, retaining the original structure.
The function [crate::idtransform::idtransform()] is not just a library function,
but also an example of advanced scan use. Read the source code for details.
Additionally, the function [crate::idtransform::copy_atom()] can be useful.
Complete example: converting an LLM stream
Summary:
- Initialize the parser
- Create the black box with a Vec, which is used as dyn Write in actions
- Create handlers for message, content, and a handler for the end of message
- Combine all together in the scan function
The example demonstrates that scan can be used to handle LLM streaming output:
- The input consists of several top-level JSON objects not wrapped in an array
- The server-sent events (SSE) tokens, such as the data: prefix, are ignored
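For comparison, the same input shape can be handled by a line-based pre-filter. This is a different and simpler technique than the one used here, where scan itself skips the tokens while parsing; the function `sse_payloads` is hypothetical and assumes that each event occupies exactly one line.

```rust
/// Hypothetical pre-filter, not part of scan_json: returns the JSON
/// payload of every `data:` line, skipping blank lines and the
/// terminating `[DONE]` marker.
fn sse_payloads(stream: &str) -> Vec<&str> {
    stream
        .lines()
        .map(str::trim)
        // Keep only lines that carry the `data:` prefix and drop the prefix.
        .filter_map(|line| line.strip_prefix("data:"))
        .map(str::trim)
        // Drop empty payloads and the end-of-stream marker.
        .filter(|payload| !payload.is_empty() && *payload != "[DONE]")
        .collect()
}
```

Each returned slice is one top-level JSON object. Such a pre-filter needs complete lines, whereas skipping the tokens inside the scanner works on the stream as it arrives.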
use RefCell;
use Write;
use ;
use StructuralPseudoname;
use ContextIter;
use RJiter;
use U8Pool;
// ---------------- Sample LLM output as `scan_llm_output` input
let json = r#"{
"id": "chatcmpl-Ahpq4nZeP9mESaKsCVdmZdK96IrUH",
"object": "chat.completion",
"created": 1735010736,
"model": "gpt-4o-mini-2024-07-18",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 10,
"total_tokens": 19,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 0,
"audio_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
}
},
"system_fingerprint": "fp_0aa8d3e20b"
}"#;
let writer_cell = scan_llm_output;
let message = Stringfrom_utf8.unwrap;
assert_eq!;
// ---------------- Another sample of LLM output, the streaming version
let json = r#"
data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" How"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" can"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" I"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" assist"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" you"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" today"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":"?"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: [DONE]
"#;
let writer_cell = scan_llm_output;
let message = Stringfrom_utf8.unwrap;
assert_eq!;
Colophon
License: MIT
Author: Oleg Parashchenko, olpa@ https://uucode.com/
Contact: via email or Ailets Discord
scan_json is a part of the ailets.org project.