qubit-json 0.3.1

Lenient JSON decoder for non-fully-trusted JSON text inputs

Qubit JSON


Lenient JSON decoder for Rust, designed for text inputs that are not fully trusted.

Overview

Qubit JSON provides a small and predictable decoding layer on top of serde_json. Its core type, LenientJsonDecoder, normalizes a limited set of common input issues before parsing and deserializing JSON values.

The crate is intended for cases where JSON text may come from sources such as:

  • Markdown-wrapped text
  • Markdown code blocks using backtick or tilde fences
  • copied snippets
  • CLI output streams
  • other text channels that may wrap otherwise valid JSON

It is intentionally narrow. The crate does not try to be a general JSON repair engine, and it does not attempt to guess missing quotes, commas, or braces.

Design Goals

  • Lenient but predictable: only handle a small set of well-defined input problems
  • Object-oriented API: use a reusable LenientJsonDecoder instance instead of a loose bag of helper functions
  • Serde-first: delegate actual parsing and deserialization to serde_json
  • Clear errors: report stable error kinds with enough context for callers
  • Low overhead: avoid unnecessary allocation when normalization can borrow the original input

Features

LenientJsonDecoder

  • Reusable decoder object that holds immutable decoding options
  • decode<T>(): decodes any JSON top-level value into T
  • decode_value(): decodes into serde_json::Value
  • decode_object<T>(): requires a top-level JSON object
  • decode_array<T>(): requires a top-level JSON array

JsonDecodeOptions

  • Presets and builder helpers: lenient(), strict(), json_code_fences_only(), with_max_input_bytes()
  • trim_whitespace: trims leading and trailing whitespace
  • strip_utf8_bom: strips a leading UTF-8 BOM
  • strip_markdown_code_fence: strips one outer backtick or tilde Markdown code fence, including fences indented by up to three spaces
  • strip_markdown_code_fence_requires_closing: strips the code fence only when a valid closing fence exists
  • strip_markdown_code_fence_json_only: strips only fenced blocks whose info string is empty or whose first info-string token is json or jsonc
  • escape_control_chars_in_strings: escapes ASCII control characters inside JSON string literals
  • max_input_bytes: optional byte-size limit applied before normalization

Explicit Error Model

  • InputTooLarge: raw input size exceeds the configured limit
  • EmptyInput: input becomes empty after normalization
  • InvalidJson: normalized text is not valid JSON syntax
  • UnexpectedTopLevel: top-level JSON kind does not match the requested method
  • Deserialize: JSON is valid but cannot be deserialized into the target type
  • JsonDecodeError.stage: indicates the failing stage (normalize, parse, top_level_check, deserialize)
  • JsonDecodeError.input_bytes / max_input_bytes: optional byte context for diagnostics
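One way to read the list above is as a mapping from error kind to the pipeline stage that reports it. The sketch below spells out that reading with local stand-in enums. The names Kind, Stage, and stage_of are hypothetical and are not the crate's own types, and the kind-to-stage mapping is an assumption inferred from the descriptions above, not taken from the crate's source.

```rust
/// Local stand-ins for the error kinds listed above.
/// Illustrative only: these are not the crate's types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    InputTooLarge,
    EmptyInput,
    InvalidJson,
    UnexpectedTopLevel,
    Deserialize,
}

/// Local stand-ins for the four stages named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Normalize,
    Parse,
    TopLevelCheck,
    Deserialize,
}

/// Assumed mapping from an error kind to the stage that would report it.
fn stage_of(kind: Kind) -> Stage {
    match kind {
        // Size and emptiness are checked before any JSON parsing happens.
        Kind::InputTooLarge | Kind::EmptyInput => Stage::Normalize,
        // The normalized text is not valid JSON syntax.
        Kind::InvalidJson => Stage::Parse,
        // Valid JSON, but not the object or array the caller asked for.
        Kind::UnexpectedTopLevel => Stage::TopLevelCheck,
        // Valid JSON of the right shape that does not fit the target type.
        Kind::Deserialize => Stage::Deserialize,
    }
}
```

A caller could use such a mapping to decide, for example, that failures in the normalize stage should be reported as input problems, while failures in the deserialize stage point at a mismatch between the data and the target type.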

Installation

Add this to your Cargo.toml:

[dependencies]
qubit-json = "0.3.1"
serde = { version = "1.0", features = ["derive"] }

The direct serde dependency is only needed when deriving Deserialize for typed decoding, as shown in the first quick-start example below.

Quick Start

Decode a JSON Object from a Markdown Code Fence

use serde::Deserialize;
use qubit_json::LenientJsonDecoder;

#[derive(Debug, Deserialize)]
struct User {
    name: String,
    age: u8,
}

fn main() {
    let decoder = LenientJsonDecoder::default();
    let user: User = decoder
        .decode_object("```json\n{\"name\":\"Alice\",\"age\":30}\n```")
        .expect("decoder should extract and decode the fenced JSON object");

    assert_eq!(user.name, "Alice");
    assert_eq!(user.age, 30);
}

Decode JSON Containing Raw Control Characters in Strings

use qubit_json::LenientJsonDecoder;

fn main() {
    let decoder = LenientJsonDecoder::default();
    let value = decoder
        .decode_value("{\"text\":\"line 1\nline 2\"}")
        .expect("decoder should escape raw control characters inside strings");

    assert_eq!(value["text"], "line 1\nline 2");
}
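To make the escaping step more concrete, here is a minimal standard-library sketch of how raw control characters inside string literals could be escaped. The function name and the exact escape forms are assumptions chosen for illustration. The crate's actual implementation may differ.

```rust
use std::borrow::Cow;

/// Escapes raw ASCII control characters (U+0000 to U+001F) that appear inside
/// JSON string literals. Hypothetical helper, not the crate's implementation.
fn escape_control_chars_in_strings(input: &str) -> Cow<'_, str> {
    // Fast path: without any control character there is nothing to escape,
    // so the original input is returned without allocating.
    if !input.chars().any(|c| (c as u32) < 0x20) {
        return Cow::Borrowed(input);
    }

    let mut out = String::with_capacity(input.len() + 8);
    let mut in_string = false; // currently inside a "..." literal
    let mut escaped = false; // previous character was a backslash in a string

    for c in input.chars() {
        if !in_string {
            // Outside strings, whitespace such as newlines is legal JSON.
            if c == '"' {
                in_string = true;
            }
            out.push(c);
            continue;
        }
        if escaped {
            // The character after a backslash is copied unchanged.
            escaped = false;
            out.push(c);
            continue;
        }
        match c {
            '\\' => {
                escaped = true;
                out.push(c);
            }
            '"' => {
                in_string = false;
                out.push(c);
            }
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            _ => out.push(c),
        }
    }
    Cow::Owned(out)
}
```

The result of such a pass can be handed to serde_json::from_str, which would otherwise reject a raw newline inside a string literal.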

Customize Decoder Options

use qubit_json::{LenientJsonDecoder, JsonDecodeOptions};

fn main() {
    let decoder = LenientJsonDecoder::new(
        JsonDecodeOptions::json_code_fences_only().with_max_input_bytes(1024),
    );

    let value = decoder
        .decode_value("{\"ok\":true}")
        .expect("plain JSON should still decode with custom options");

    assert_eq!(value["ok"], true);
}

Normalization Rules

Before parsing, the decoder applies the following pipeline. Steps controlled by an option run only when that option is enabled:

  1. enforce the optional raw input byte-size limit
  2. validate that the input is not empty
  3. trim surrounding whitespace
  4. strip a leading UTF-8 BOM
  5. trim surrounding whitespace again
  6. strip one outer backtick or tilde Markdown code fence
  7. trim surrounding whitespace again
  8. escape ASCII control characters inside JSON string literals
  9. trim surrounding whitespace again

The decoder does not:

  • add missing quotes
  • add missing commas
  • add missing braces or brackets
  • rewrite arbitrary malformed JSON into guessed valid JSON
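The trimming, BOM, and fence steps of the pipeline above can all be expressed as slicing, which is what lets normalization borrow the original input instead of allocating. The following sketch shows that idea using only the standard library. The function name strip_outer_fence is hypothetical, and the sketch is deliberately simpler than the rules described in this README: it always requires a closing fence, ignores the info string, and does not handle the indentation rule for fences.

```rust
/// Trims whitespace, strips a leading UTF-8 BOM, and strips one outer
/// backtick or tilde code fence. Every step only narrows the slice, so the
/// result borrows from `input`. Hypothetical simplification of the pipeline.
fn strip_outer_fence(input: &str) -> &str {
    let text = input.trim();
    let text = text.strip_prefix('\u{feff}').unwrap_or(text).trim();

    // Identify the fence character; anything else means there is no fence.
    let fence_char = match text.chars().next() {
        Some(c @ ('`' | '~')) => c,
        _ => return text,
    };
    // An opening fence needs at least three fence characters.
    let fence_len = text.chars().take_while(|&c| c == fence_char).count();
    if fence_len < 3 {
        return text;
    }

    // The opening line (fence plus optional info string) ends at the first newline.
    let body = match text.find('\n') {
        Some(idx) => &text[idx + 1..],
        None => return text,
    };

    // Split the body into the content and its last line.
    let (content, last_line) = match body.rfind('\n') {
        Some(idx) => (&body[..idx], &body[idx + 1..]),
        None => ("", body),
    };

    // The last line must be a closing fence at least as long as the opening one.
    let closing = last_line.trim();
    let is_closing = closing.len() >= fence_len && closing.chars().all(|c| c == fence_char);
    if is_closing {
        content.trim()
    } else {
        text
    }
}
```

Input without a fence, or with an unclosed fence, comes back trimmed but otherwise untouched, which matches the narrow scope described above: the text is unwrapped, never repaired.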

When to Use

Qubit JSON is a good fit when:

  • you need a reusable, configurable JSON decoder object
  • your inputs are mostly valid JSON but may be wrapped or slightly noisy
  • you want stable error categories around serde_json

It is not a good fit when:

  • you need aggressive repair for heavily malformed JSON
  • your inputs are not actually JSON
  • a plain serde_json::from_str() call is already sufficient

License

This project is licensed under the Apache 2.0 License. See LICENSE for details.

Alignment Notes

This README reflects the current object model:

  • LenientJsonDecoder owns an internal LenientJsonNormalizer.
  • Public decoding APIs are decode, decode_object, decode_array, decode_value.
  • Normalization and error handling are implemented in src/lenient_json_normalizer.rs and src/json_decode_error.rs, which are covered by tests in tests/.
  • Product requirements and implementation behavior are aligned with doc/json_prd.zh_CN.md and doc/json_design.zh_CN.md.