# Qubit JSON
A lenient JSON decoder for Rust, designed for text inputs that are not fully trusted.
## Overview
Qubit JSON provides a small and predictable decoding layer on top of
`serde_json`. Its core type, `LenientJsonDecoder`, normalizes a limited set of
common input issues before parsing and deserializing JSON values.
The crate is intended for cases where JSON text may come from sources such as:
- Markdown-wrapped text
- Markdown code blocks using backtick or tilde fences
- copied snippets
- CLI output streams
- other text channels that may wrap otherwise valid JSON
It is intentionally narrow. The crate does not try to be a general JSON repair
engine, and it does not attempt to guess missing quotes, commas, or braces.
## Design Goals
- **Lenient but predictable**: only handle a small set of well-defined input
problems
- **Object-oriented API**: use a reusable `LenientJsonDecoder` instance instead
of a loose bag of helper functions
- **Serde-first**: delegate actual parsing and deserialization to `serde_json`
- **Clear errors**: report stable error kinds with enough context for callers
- **Low overhead**: avoid unnecessary allocation when normalization can borrow
the original input
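
The low-overhead goal can be illustrated with `std::borrow::Cow`. The sketch below is not the crate's implementation; `escape_control_chars` is a hypothetical helper written only for this example. It assumes that "control character" means a code point below U+0020, and it allocates only when a rewrite is actually needed.

```rust
use std::borrow::Cow;

/// Hypothetical helper, not the crate's API: escapes raw control characters
/// (below U+0020, an assumption) that appear inside JSON string literals.
fn escape_control_chars(input: &str) -> Cow<'_, str> {
    let mut out: Option<String> = None; // allocated on the first rewrite only
    let mut in_string = false; // currently inside a JSON string literal
    let mut after_backslash = false; // previous char opened an escape sequence

    for (idx, c) in input.char_indices() {
        let mut escape = false;
        if !in_string {
            in_string = c == '"';
        } else if after_backslash {
            // The character after a backslash belongs to an escape sequence.
            after_backslash = false;
        } else if c == '\\' {
            after_backslash = true;
        } else if c == '"' {
            in_string = false;
        } else {
            escape = (c as u32) < 0x20;
        }

        if escape {
            // Copy the untouched prefix the first time a rewrite is needed.
            let buf = out.get_or_insert_with(|| input[..idx].to_string());
            match c {
                '\n' => buf.push_str("\\n"),
                '\r' => buf.push_str("\\r"),
                '\t' => buf.push_str("\\t"),
                _ => buf.push_str(&format!("\\u{:04x}", c as u32)),
            }
        } else if let Some(buf) = out.as_mut() {
            buf.push(c);
        }
    }

    match out {
        Some(buf) => Cow::Owned(buf),
        None => Cow::Borrowed(input),
    }
}
```

Pretty-printed JSON, where newlines appear only between tokens, takes the borrowed path in this sketch; only a raw control character inside a string literal triggers a copy.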
## Features
### `LenientJsonDecoder`
- Reusable decoder object that holds immutable decoding options
- `decode<T>()`: decodes any JSON top-level value into `T`
- `decode_value()`: decodes into `serde_json::Value`
- `decode_object<T>()`: requires a top-level JSON object
- `decode_array<T>()`: requires a top-level JSON array
### `JsonDecodeOptions`
- Presets and builder helpers: `lenient()`, `strict()`,
`json_code_fences_only()`, `with_max_input_bytes()`
- `trim_whitespace`: trims leading and trailing whitespace
- `strip_utf8_bom`: strips a leading UTF-8 BOM
- `strip_markdown_code_fence`: strips one outer backtick or tilde Markdown
code fence, including fences indented by up to three spaces
- `strip_markdown_code_fence_requires_closing`: strips a code fence only when a
valid closing fence exists
- `strip_markdown_code_fence_json_only`: only strip fenced blocks whose first
info-string token is empty, `json`, or `jsonc`
- `escape_control_chars_in_strings`: escapes ASCII control characters inside
JSON string literals
- `max_input_bytes`: optional byte-size limit applied before normalization
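
To make the three code-fence options more concrete, here is a minimal sketch of fence stripping using only the standard library. It is an illustration under stated assumptions, not the crate's code: `strip_code_fence` and its `json_only` and `requires_closing` parameters are hypothetical names, the fence is assumed to need at least three fence characters, and the closing fence is assumed to be the last non-blank line.

```rust
/// Hypothetical sketch, not the crate's implementation: strips one outer
/// backtick or tilde fence and returns a slice of the original input.
fn strip_code_fence(input: &str, json_only: bool, requires_closing: bool) -> &str {
    // A fenced block needs an opening line followed by content.
    let Some((first_line, rest)) = input.split_once('\n') else {
        return input;
    };

    // The opening fence may be indented by up to three spaces.
    let indent = first_line.len() - first_line.trim_start_matches(' ').len();
    if indent > 3 {
        return input;
    }
    let opener = &first_line[indent..];

    // The fence is a run of at least three backticks or three tildes.
    let Some(fence_char) = opener.chars().next().filter(|c| *c == '`' || *c == '~') else {
        return input;
    };
    let fence_len = opener.chars().take_while(|c| *c == fence_char).count();
    if fence_len < 3 {
        return input;
    }

    // Only the first info-string token is inspected.
    let info = opener[fence_len..].split_whitespace().next().unwrap_or("");
    let is_json = info.is_empty()
        || info.eq_ignore_ascii_case("json")
        || info.eq_ignore_ascii_case("jsonc");
    if json_only && !is_json {
        return input;
    }

    // Assumption: the closing fence is the last non-blank line.
    let body = rest.trim_end();
    let (content, last_line) = match body.rsplit_once('\n') {
        Some((content, last)) => (content, last),
        None => ("", body),
    };
    let closer = last_line.trim();
    let is_closing = closer.len() >= fence_len && closer.chars().all(|c| c == fence_char);

    if is_closing {
        content
    } else if requires_closing {
        input // leave the text untouched when no valid closing fence exists
    } else {
        rest // drop only the opening fence line
    }
}
```

Because every branch returns a slice of `input`, this step never allocates. Text without a recognizable opening fence passes through unchanged.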
### Explicit Error Model
- `InputTooLarge`: raw input size exceeds the configured limit
- `EmptyInput`: input becomes empty after normalization
- `InvalidJson`: normalized text is not valid JSON syntax
- `UnexpectedTopLevel`: top-level JSON kind does not match the requested method
- `Deserialize`: JSON is valid but cannot be deserialized into the target type
- `JsonDecodeError.stage`: indicates the failing stage (`normalize`, `parse`,
`top_level_check`, `deserialize`)
- `JsonDecodeError.input_bytes` / `max_input_bytes`: optional byte context for
diagnostics
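
The sketch below shows how a caller might work with an error model of this shape. The types are hypothetical stand-ins that only mirror the names listed above; the crate's real definitions, field types, and the pairing of kinds with stages may differ.

```rust
use std::fmt;

// Hypothetical stand-ins mirroring the documented names; not the crate's types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    InputTooLarge,
    EmptyInput,
    InvalidJson,
    UnexpectedTopLevel,
    Deserialize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Normalize,
    Parse,
    TopLevelCheck,
    Deserialize,
}

#[derive(Debug)]
struct DecodeError {
    kind: ErrorKind,
    stage: Stage,
    input_bytes: Option<usize>,
    max_input_bytes: Option<usize>,
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?} at stage {:?}", self.kind, self.stage)?;
        // Byte context is optional, so print it only when both values exist.
        if let (Some(n), Some(max)) = (self.input_bytes, self.max_input_bytes) {
            write!(f, " ({n} bytes, limit {max})")?;
        }
        Ok(())
    }
}

/// Example caller policy: everything except a type mismatch is attributed
/// to the input text itself.
fn is_input_problem(err: &DecodeError) -> bool {
    !matches!(err.kind, ErrorKind::Deserialize)
}
```

Matching on a stable kind rather than on message text keeps caller logic independent of how the underlying `serde_json` messages are worded.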
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
qubit-json = "0.2.0"
serde = { version = "1.0", features = ["derive"] }
```
The direct `serde` dependency is only needed when deriving `Deserialize` for
typed decoding, as shown in the first quick-start example below.
## Quick Start
### Decode a JSON Object from a Markdown Code Fence
```rust
use serde::Deserialize;
use qubit_json::LenientJsonDecoder;

// Target type for typed decoding.
#[derive(Debug, Deserialize)]
struct User {
    name: String,
    age: u8,
}

fn main() {
    let decoder = LenientJsonDecoder::default();

    // The input wraps a JSON object in a Markdown code fence.
    let user: User = decoder
        .decode_object("```json\n{\"name\":\"Alice\",\"age\":30}\n```")
        .expect("decoder should extract and decode the fenced JSON object");

    assert_eq!(user.name, "Alice");
    assert_eq!(user.age, 30);
}
```
### Decode JSON Containing Raw Control Characters in Strings
```rust
use qubit_json::LenientJsonDecoder;

fn main() {
    let decoder = LenientJsonDecoder::default();

    // The `\n` in this Rust literal is a raw newline inside the JSON string,
    // which strict JSON parsers reject.
    let value = decoder
        .decode_value("{\"text\":\"line 1\nline 2\"}")
        .expect("decoder should escape raw control characters inside strings");

    assert_eq!(value["text"], "line 1\nline 2");
}
```
### Customize Decoder Options
```rust
use qubit_json::{LenientJsonDecoder, JsonDecodeOptions};

fn main() {
    // Start from a preset, then cap the raw input size at 1024 bytes.
    let decoder = LenientJsonDecoder::new(
        JsonDecodeOptions::json_code_fences_only().with_max_input_bytes(1024),
    );

    let value = decoder
        .decode_value("{\"ok\":true}")
        .expect("plain JSON should still decode with custom options");

    assert_eq!(value["ok"], true);
}
```
## Normalization Rules
With the corresponding options enabled, the decoder applies the following steps, in order, before parsing:
1. enforce the optional raw input byte-size limit
2. validate that the input is not empty
3. trim surrounding whitespace
4. strip a leading UTF-8 BOM
5. trim surrounding whitespace again
6. strip one outer backtick or tilde Markdown code fence
7. trim surrounding whitespace again
8. escape ASCII control characters inside JSON string literals
9. trim surrounding whitespace again
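
Steps 1 to 5 can be sketched with the standard library alone. This is an illustration, not the crate's code: `normalize_prefix` is a hypothetical name, plain string errors stand in for the crate's error type, and the final emptiness check is an assumption about how whitespace-only input is treated.

```rust
/// Hypothetical sketch of steps 1-5; string errors stand in for the
/// crate's error type.
fn normalize_prefix(input: &str, max_input_bytes: Option<usize>) -> Result<&str, &'static str> {
    // 1. Enforce the optional limit on the raw input, before any other work.
    if let Some(max) = max_input_bytes {
        if input.len() > max {
            return Err("input too large");
        }
    }

    // 2. Reject empty input.
    if input.is_empty() {
        return Err("empty input");
    }

    // 3. Trim surrounding whitespace. U+FEFF is not Unicode whitespace,
    //    so a BOM survives this step.
    let text = input.trim();

    // 4. Strip a leading UTF-8 BOM if present.
    let text = text.strip_prefix('\u{FEFF}').unwrap_or(text);

    // 5. Trim again: whitespace that followed the BOM is now at the edge.
    let text = text.trim();

    // Assumption: input that is empty after these steps is also rejected.
    if text.is_empty() {
        return Err("empty input");
    }
    Ok(text)
}
```

The repeated trimming in the list above follows the same reasoning: each stripping step can expose whitespace that was previously interior, so the edges are trimmed again before the next step runs.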
The decoder does not:
- add missing quotes
- add missing commas
- add missing braces or brackets
- rewrite arbitrary malformed JSON into guessed valid JSON
## When to Use
Qubit JSON is a good fit when:
- you need a reusable, configurable JSON decoder object
- your inputs are mostly valid JSON but may be wrapped or slightly noisy
- you want stable error categories around `serde_json`
It is not a good fit when:
- you need aggressive repair for heavily malformed JSON
- your inputs are not actually JSON
- a plain `serde_json::from_str()` call is already sufficient
## License
This project is licensed under the Apache 2.0 License. See [LICENSE](LICENSE)
for details.
## Alignment Notes
This README reflects the current object model:
- `LenientJsonDecoder` owns an internal `LenientJsonNormalizer`.
- Public decoding APIs are `decode`, `decode_object`, `decode_array`,
`decode_value`.
- Normalization and error handling are implemented in
`src/lenient_json_normalizer.rs`
and `src/json_decode_error.rs`, which are covered by tests in `tests/`.
- Product requirements and implementation behavior are aligned with
`doc/json_prd.zh_CN.md` and `doc/json_design.zh_CN.md`.