React to elements in a JSON stream
Start processing JSON before the entire JSON document is available.
scan_jsonon crates.io- Documentation on docs.rs
- Entry point: [
crate::scan()]
Concepts
The library uses the streaming JSON parser RJiter.
The scan function checks for registered handlers (actions) at the begin and end of every JSON key. The check is performed by a matcher. Together, a matcher plus an action form a trigger.
An action gets two RefCell references as arguments:
baton_cell: A black box for side effects by the actionrjiter_cell:RJiterparser object. An action can interfere with JSON parsing by consuming the value of the current key
Example of a trigger
The trigger matches the key content and calls the on_content function.
The action's black box contains a Write trait object. The action writes the string value of the current JSON key content to this writer.
Getting the value requires using the RJiter parser to consume the next token. The action returns StreamOp::ValueIsConsumed to inform the caller that it has consumed the value, so that the caller can update its internal state.
The type annotation Trigger<BoxedAction<dyn Write>> is not needed in this code fragment, but it is often required when using closure handlers and several triggers.
use ;
use RefCell;
use Write;
let content_trigger: = new;
Complete example: Identity transformation
The identity transformation copies JSON input to output, retaining the original structure.
The function [crate::idtransform::idtransform()] is not just a library function,
but also an example of advanced scan use. Read the source code for details.
Additionally, the function [crate::idtransform::copy_atom()] can be useful.
Complete example: LLM output
Summary:
- Initialize the parser
- Create the black box with a
Vec, which is used asdyn Writein actions - Create triggers for
message,content, and a trigger for the end ofmessage - Combine all together in the
scanfunction
The example demonstrates that scan can be used to handle LLM streaming output:
- The input is several JSON objects on the top-level, without being wrapped in an array
- The server-side-events tokens are ignored
use RefCell;
use Write;
use scan;
use ;
use Options;
// ---------------- Sample LLM output
let json = r#"{
"id": "chatcmpl-Ahpq4nZeP9mESaKsCVdmZdK96IrUH",
"object": "chat.completion",
"created": 1735010736,
"model": "gpt-4o-mini-2024-07-18",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 10,
"total_tokens": 19,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 0,
"audio_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
}
},
"system_fingerprint": "fp_0aa8d3e20b"
}"#;
let writer_cell = scan_llm_output;
let message = Stringfrom_utf8.unwrap;
assert_eq!;
// ---------------- Sample LLM output (streaming)
let json = r#"
data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" How"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" can"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" I"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" assist"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" you"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":" today"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{"content":"?"},"logprobs":null,"finish_reason":null}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: {"choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}],"id":"chatcmpl-AgMB1khICnwswjgqIl2X2jr587Nep","object":"chat.completion.chunk","created":1734658387,"model":"gpt-4o-mini-2024-07-18","system_fingerprint":"fp_d02d531b47"}
data: [DONE]
"#;
let writer_cell = scan_llm_output;
let message = Stringfrom_utf8.unwrap;
assert_eq!;
Colophon
License: MIT
Author: Oleg Parashchenko, olpa@ https://uucode.com/
Contact: via email or Ailets Discord
scan_json is a part of the ailets.org project.