# Surfing 🏄
A Rust library for parsing JSON objects from text streams.
## Overview
Surfing provides utilities to extract JSON objects from text streams, making it particularly useful for:
- Processing log files containing JSON entries mixed with plain text
- Extracting JSON objects from console output
- Handling streaming JSON data that might arrive in chunks
- Filtering JSON content from mixed data sources, such as LLM outputs
## Features
- Extract JSON objects and arrays from mixed text content
- Support for processing partial JSON (streaming)
- Serde integration for direct deserialization (optional feature)
- Streaming deserializer for handling JSON in data streams
- Zero dependencies (aside from `anyhow` for error handling)
## Installation
Add this to your `Cargo.toml`:
```toml
# Basic functionality
[dependencies]
surfing = "0.1.0"

# Or with Serde support
[dependencies]
surfing = { version = "0.1.0", features = ["serde"] }
```
## Usage
### Simple Utility Function
For simple use cases, use the high-level utility function:
```rust
use surfing::extract_json_to_string;

let input = "Log entry: {\"level\":\"info\",\"message\":\"Server started\"} End of line";
// Extract the embedded JSON object, discarding the surrounding text
let json = extract_json_to_string(input).unwrap();
assert_eq!(json, "{\"level\":\"info\",\"message\":\"Server started\"}");
```
### Processing Streaming Data
Handle JSON that might arrive in chunks:
```rust
use std::io::BufWriter;
use surfing::JSONParser;

let mut parser = JSONParser::new();
let mut buffer = Vec::new();
let mut writer = BufWriter::new(&mut buffer);

// Feed chunks as they arrive; the parser keeps its state between calls.
// NOTE: `parse_chunk` is an assumed method name; see the crate docs.
parser.parse_chunk("Log: {\"key\":", &mut writer).unwrap();
parser.parse_chunk("\"value\"} trailing text", &mut writer).unwrap();
drop(writer); // flush the buffered writer before reading the buffer

let json = String::from_utf8(buffer).unwrap();
assert_eq!(json, "{\"key\":\"value\"}");
```
### Using with Standard Output
Process JSON and write directly to stdout:
```rust
use std::io::stdout;
use surfing::JSONParser;

let mut parser = JSONParser::new();

// Lock stdout for better performance with multiple writes
let stdout = stdout();
let mut handle = stdout.lock();

let stream = ["Status: ", "{\"ready\":", "true}", " -- done"];

// This would print only the JSON part to the console.
// NOTE: `parse_chunk` is an assumed method name; see the crate docs.
for chunk in stream.iter() {
    parser.parse_chunk(chunk, &mut handle).unwrap();
}
```
## Performance Considerations
### Buffering
For optimal performance when processing large files or streams:
- Use `BufWriter` or `BufReader` to reduce the number of system calls
- Process data in chunks of an appropriate size (typically 4-8 KB)
- Reuse parser instances when processing multiple chunks to maintain state (see the sketch below)
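To make that advice concrete, here is a minimal sketch of chunked file processing. It assumes a hypothetical input file `app.log`, a 4 KB chunk size, and the same assumed `parse_chunk` method used in the examples above; treat it as an outline, not the crate's confirmed API.

```rust
use std::fs::File;
use std::io::{stdout, BufReader, Read};
use surfing::JSONParser;

fn main() -> std::io::Result<()> {
    // Hypothetical input file containing JSON mixed with plain text
    let mut reader = BufReader::new(File::open("app.log")?);
    let stdout = stdout();
    let mut out = stdout.lock();

    // One parser instance for the whole stream, so JSON objects that
    // straddle chunk boundaries are still recovered
    let mut parser = JSONParser::new();
    let mut chunk = [0u8; 4096]; // 4 KB chunks, per the guidance above

    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break; // end of file
        }
        // NOTE: `parse_chunk` is an assumed method name; see the crate docs
        let text = String::from_utf8_lossy(&chunk[..n]);
        parser.parse_chunk(&text, &mut out).expect("parse failed");
    }
    Ok(())
}
```

Reusing one `JSONParser` across all chunks is what lets a single JSON object span several `read` calls; constructing a fresh parser per chunk would discard that state.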
### Memory Usage
The parser stores minimal state:
- Current JSON nesting level
- A small buffer for tracking markers
This makes it suitable for processing large streams with minimal memory overhead.
## Serde Integration
When the `serde` feature is enabled, you can deserialize directly from mixed text:
```rust
use serde::Deserialize;
use surfing::from_mixed_text; // import path assumed; see the crate docs

#[derive(Deserialize)]
struct LogEntry {
    level: String,
    message: String,
}

// Text with embedded JSON
let input = "Log entry: {\"level\":\"info\",\"message\":\"Started server\"} End of line";

// Directly deserialize the JSON part into a struct
let entry: LogEntry = from_mixed_text(input).unwrap();
assert_eq!(entry.level, "info");
assert_eq!(entry.message, "Started server");
```
## Streaming Deserialization
Process and deserialize streaming data in two ways:
### High-level `StreamingDeserializer`
For a more convenient API, use the `StreamingDeserializer`:
```rust
use serde::Deserialize;
use surfing::StreamingDeserializer;

#[derive(Deserialize)]
struct User {
    name: String,
}

// Create a deserializer for User structs
// (the turbofish and the `Option` return below are assumptions; see the docs)
let mut deserializer = StreamingDeserializer::<User>::new();

// Process chunks as they arrive (chunk contents are illustrative)
let chunks = ["prefix {\"name\":", "\"Alice\"} suffix", "plain text only"];

// First chunk - incomplete JSON, so nothing is produced yet
let result = deserializer.process_chunk(chunks[0]);
assert!(result.is_none());

// Second chunk - completes the JSON
let result = deserializer.process_chunk(chunks[1]);
assert!(result.is_some());

let user = result.unwrap();
assert_eq!(user.name, "Alice");

// Third chunk - no more JSON to extract
let result = deserializer.process_chunk(chunks[2]);
assert!(result.is_none());
```
### Low-level API
```rust
use serde::Deserialize;
use surfing::{from_mixed_text_with_parser, JSONParser}; // import paths assumed

#[derive(Deserialize)]
struct Config {
    name: String,
    port: u16,
}

let mut parser = JSONParser::new();

// Process the chunks as they arrive
let chunk1 = "Config: {\"name\":\"";
let chunk2 = "api-server\",\"port\":8080}";

// First chunk (incomplete) - no full object yet, so expect an error
// (argument order and the Err-until-complete behavior are assumptions)
match from_mixed_text_with_parser::<Config>(&mut parser, chunk1) {
    Ok(_) => unreachable!("the JSON object is not complete yet"),
    Err(_) => {} // expected: keep feeding chunks
}

// Second chunk completes the JSON
let config: Config = from_mixed_text_with_parser(&mut parser, chunk2).unwrap();
assert_eq!(config.name, "api-server");
assert_eq!(config.port, 8080);
```
## Examples
Check the `examples` directory for more detailed usage scenarios:
- `basic.rs` - Simple extraction from mixed text
- `streaming.rs` - Processing data in chunks
- `stdout.rs` - Filtering JSON to standard output
- `simple.rs` - Using the high-level utility functions
- `serde_integration.rs` - Using Serde to deserialize extracted JSON
- `streaming_serde.rs` - Using `StreamingDeserializer` for stream processing
## License
This project is licensed under the MIT License - see the LICENSE file for details.