json_threat_protection/
lib.rs

//! A crate to protect against malicious JSON payloads.
//!
//! This crate provides functionality to validate JSON payloads against a set of constraints:
//! * Maximum depth of the JSON structure.
//! * Maximum length of strings.
//! * Maximum number of entries in arrays.
//! * Maximum number of entries in objects.
//! * Maximum length of object entry names.
//! * Whether to allow duplicate object entry names.
//!
//! This crate is designed to process untrusted JSON payloads,
//! so it does not use recursion to validate the JSON structure.
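//!
//! Avoiding recursion matters because the call stack is bounded,
//! while the nesting depth of an untrusted payload is not.
//! The following is a minimal sketch of that idea, not this crate's actual implementation:
//! the hypothetical helper `max_nesting_depth` tracks nesting with a counter inside a loop,
//! so even a very deeply nested input cannot overflow the stack.
//! It only counts brackets outside of strings and does not check the JSON syntax.
//!
//! ```rust
//! // Hypothetical helper (not part of this crate): returns the deepest
//! // array/object nesting level found in `data`.
//! fn max_nesting_depth(data: &[u8]) -> usize {
//!     let mut depth = 0usize;
//!     let mut max_depth = 0usize;
//!     let mut in_string = false;
//!     let mut escaped = false;
//!
//!     for &byte in data {
//!         if in_string {
//!             // Inside a string, brackets are plain characters.
//!             if escaped {
//!                 escaped = false;
//!             } else if byte == b'\\' {
//!                 escaped = true;
//!             } else if byte == b'"' {
//!                 in_string = false;
//!             }
//!             continue;
//!         }
//!         match byte {
//!             b'"' => in_string = true,
//!             b'[' | b'{' => {
//!                 depth += 1;
//!                 max_depth = max_depth.max(depth);
//!             }
//!             // Unbalanced closers are ignored here; a real validator would reject them.
//!             b']' | b'}' => depth = depth.saturating_sub(1),
//!             _ => {}
//!         }
//!     }
//!     max_depth
//! }
//! ```
//!
//! Because the only state is a few local variables, the stack usage stays constant
//! no matter how deeply the input is nested.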
//!
//! # Examples
//!
//! ```rust
//! use json_threat_protection as jtp;
//!
//! fn reject_highly_nested_json(data: &[u8], depth: usize) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).with_max_depth(depth).validate()
//! }
//!
//! fn reject_too_long_strings(data: &[u8], max_string_length: usize) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).with_max_string_length(max_string_length).validate()
//! }
//!
//! fn reject_too_many_array_entries(data: &[u8], max_array_entries: usize) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).with_max_array_entries(max_array_entries).validate()
//! }
//!
//! fn reject_too_many_object_entries(data: &[u8], max_object_entries: usize) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).with_max_object_entries(max_object_entries).validate()
//! }
//!
//! fn reject_too_long_object_entry_names(data: &[u8], max_object_entry_name_length: usize) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).with_max_object_entry_name_length(max_object_entry_name_length).validate()
//! }
//!
//! fn reject_duplicate_object_entry_names(data: &[u8]) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).disallow_duplicate_object_entry_name().validate()
//! }
//! ```
//!
//! # Default constraints
//!
//! By default, the validator only checks the JSON syntax without applying any constraints,
//! and it allows duplicate object entry names.
//!
//! You can set a limit to [`NO_LIMIT`] to disable a specific constraint.
//!
//! # Incremental validation
//!
//! The `Validator` struct is designed to be used incrementally,
//! so you can validate huge JSON payloads across multiple function calls
//! without blocking the current thread for a long time.
//!
//! ```rust
//! use json_threat_protection as jtp;
//!
//! fn validate_incrementally(data: &[u8]) -> Result<(), jtp::Error> {
//!     let mut validator = jtp::from_slice(data);
//!
//!     // Validate the JSON payload 2000 steps at a time.
//!     // `validate_with_steps` returns `Ok(true)` once validation has finished without errors,
//!     // `Ok(false)` if validation must continue,
//!     // and `Err` if an error occurred.
//!     while !validator.validate_with_steps(2000)? {
//!         // do something else, such as processing other tasks
//!     }
//!
//!     Ok(())
//! }
//! ```
//!
//! This feature is useful when you want to validate a JSON payload in a non-blocking way.
//! A typical use case is building FFI bindings for other software
//! that must validate JSON payloads without blocking its thread.
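//!
//! The calling convention (`Ok(false)` to continue, `Ok(true)` when finished, `Err` on failure)
//! can be illustrated without this crate.
//! The following is a simplified sketch under stated assumptions, not this crate's implementation:
//! the hypothetical `BracketChecker` only checks that brackets are balanced,
//! ignores string contents, keeps its position between calls,
//! and processes at most `steps` bytes per call.
//!
//! ```rust
//! // Hypothetical resumable checker (not part of this crate).
//! struct BracketChecker<'a> {
//!     data: &'a [u8],
//!     position: usize,
//!     depth: usize,
//! }
//!
//! impl<'a> BracketChecker<'a> {
//!     fn new(data: &'a [u8]) -> Self {
//!         BracketChecker { data, position: 0, depth: 0 }
//!     }
//!
//!     // Processes at most `steps` bytes. Returns `Ok(true)` when the input is
//!     // exhausted and balanced, `Ok(false)` when more calls are needed, and
//!     // `Err(offset)` for an unmatched closer or an unclosed opener.
//!     fn check_with_steps(&mut self, steps: usize) -> Result<bool, usize> {
//!         for _ in 0..steps {
//!             let byte = match self.data.get(self.position) {
//!                 Some(&byte) => byte,
//!                 // End of input: every opener must have been closed.
//!                 None if self.depth == 0 => return Ok(true),
//!                 None => return Err(self.position),
//!             };
//!             match byte {
//!                 b'[' | b'{' => self.depth += 1,
//!                 b']' | b'}' => {
//!                     self.depth = self.depth.checked_sub(1).ok_or(self.position)?;
//!                 }
//!                 _ => {}
//!             }
//!             self.position += 1;
//!         }
//!         Ok(false)
//!     }
//! }
//!
//! // Drives the checker to completion and reports how many calls were needed.
//! // `steps_per_call` must be greater than zero, otherwise no progress is made.
//! fn check_incrementally(data: &[u8], steps_per_call: usize) -> Result<usize, usize> {
//!     let mut checker = BracketChecker::new(data);
//!     let mut calls = 1;
//!     while !checker.check_with_steps(steps_per_call)? {
//!         calls += 1; // a real caller would yield to other work here
//!     }
//!     Ok(calls)
//! }
//! ```
//!
//! All progress lives in the struct rather than on the call stack,
//! which is what lets the caller stop after any call and resume later.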
//!
//! # Error handling
//!
//! This crate has only a few places that might panic; most errors are returned as `Err`.
//! Unintended bugs might also be returned as `Err` with an explicit error kind
//! to indicate that the crate ran into buggy code.
//!
//! Whatever the error kind is,
//! it always contains the position where the error occurred: line, column, and offset.
//! The `offset` is the byte offset from the beginning of the JSON payload.
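//!
//! To show how the three position fields relate, here is a minimal sketch
//! that derives a line and column from a byte offset.
//! The helper `line_and_column` is hypothetical and not part of this crate,
//! and its conventions (1-based values, columns counted in bytes) are assumptions;
//! the numbering this crate actually reports may differ.
//!
//! ```rust
//! // Hypothetical helper (not part of this crate): maps a byte offset to a
//! // 1-based (line, column) pair, counting columns in bytes.
//! fn line_and_column(data: &[u8], offset: usize) -> (usize, usize) {
//!     // Clamp so that an offset past the end points just after the last byte.
//!     let end = offset.min(data.len());
//!     let mut line = 1;
//!     let mut column = 1;
//!     for &byte in &data[..end] {
//!         if byte == b'\n' {
//!             line += 1;
//!             column = 1;
//!         } else {
//!             column += 1;
//!         }
//!     }
//!     (line, column)
//! }
//! ```
//!
//! The byte offset is the unambiguous field of the three,
//! since line and column depend on conventions like these.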
//!
//! # Special behavior compared to [serde_json](https://crates.io/crates/serde_json)
//!
//! This crate does its best to stay consistent with `serde_json`'s behavior.
//! It uses [cargo-fuzz](https://crates.io/crates/cargo-fuzz)
//! to process the same JSON payload with both this crate and `serde_json`
//! and compares the validation results.
//!
//! However, there are some differences between this crate and `serde_json` so far:
//! * This crate allows numbers of any precision,
//!   even if they cannot be represented in Rust's native number types ([`i64`], [`u64`], [`i128`], [`u128`], [`f64`], [`f128`]).
//!   [serde_json](https://crates.io/crates/serde_json)
//!   without the [arbitrary_precision](https://github.com/serde-rs/json/blob/3f1c6de4af28b1f6c5100da323f2bffaf7c2083f/Cargo.toml#L69-L75)
//!   feature enabled returns an error for such numbers.
//!
//! # Performance
//!
//! This crate is designed to be fast and efficient,
//! and has its own benchmark suite under the `benches` directory.
//! You can run the benchmarks with the following command:
//! ```bash
//! JSON_FILE=/path/to/file.json cargo bench --bench memory -- --verbose
//! ```
//!
//! This suite validates the JSON syntax using both this crate and `serde_json`.
//! You can get your own performance numbers by pointing `JSON_FILE` at your dataset.
//!
//! # Fuzzing
//!
//! This crate is fuzzed using the [cargo-fuzz](https://crates.io/crates/cargo-fuzz) tool;
//! the fuzzing program is under the `fuzz` directory.
//!
//! The initial seed corpus is from [nlohmann/json_test_data](https://github.com/nlohmann/json_test_data/),
//! and the extra corpus follows [nlohmann/json/blob/develop/tests/fuzzing](https://github.com/nlohmann/json/blob/develop/tests/fuzzing.md).
//!
mod lexer;
pub mod read;
mod validator;

use read::{IoRead, Read, SliceRead, StrRead};

/// Represents no limit for a specific constraint.
pub const NO_LIMIT: usize = usize::MAX;
pub use lexer::LexerError;
pub use read::ReadError;
pub use validator::ValidatorError as Error;

/// The JSON validator.
pub struct Validator<R: Read> {
    inner: validator::Validator<R>,
}

impl<R: Read> Validator<R> {
    /// Creates a new `Validator` instance with the given reader without any constraints.
    /// In most cases, prefer the [`from_slice`], [`from_str`], or [`from_reader`] functions.
    pub fn new(read: R) -> Self {
        Validator {
            inner: validator::Validator::new(
                read, NO_LIMIT, NO_LIMIT, NO_LIMIT, NO_LIMIT, NO_LIMIT, true,
            ),
        }
    }

    /// Sets the maximum depth of the JSON structure.
    pub fn with_max_depth(mut self, max_depth: usize) -> Self {
        let inner = self.inner.with_max_depth(max_depth);
        self.inner = inner;
        self
    }

    /// Sets the maximum length of strings.
    pub fn with_max_string_length(mut self, max_string_length: usize) -> Self {
        let inner = self.inner.with_max_string_length(max_string_length);
        self.inner = inner;
        self
    }

    /// Sets the maximum number of entries in arrays.
    pub fn with_max_array_entries(mut self, max_array_length: usize) -> Self {
        let inner = self.inner.with_max_array_entries(max_array_length);
        self.inner = inner;
        self
    }

    /// Sets the maximum number of entries in objects.
    pub fn with_max_object_entries(mut self, max_object_length: usize) -> Self {
        let inner = self.inner.with_max_object_entries(max_object_length);
        self.inner = inner;
        self
    }

    /// Sets the maximum length of object entry names.
    pub fn with_max_object_entry_name_length(
        mut self,
        max_object_entry_name_length: usize,
    ) -> Self {
        let inner = self
            .inner
            .with_max_object_entry_name_length(max_object_entry_name_length);
        self.inner = inner;
        self
    }

    /// Allows duplicate object entry names.
    pub fn allow_duplicate_object_entry_name(mut self) -> Self {
        let inner = self.inner.allow_duplicate_object_entry_name();
        self.inner = inner;
        self
    }

    /// Disallows duplicate object entry names.
    pub fn disallow_duplicate_object_entry_name(mut self) -> Self {
        let inner = self.inner.disallow_duplicate_object_entry_name();
        self.inner = inner;
        self
    }

    /// Validates the JSON payload in a single call, and consumes the current [`Validator`] instance.
    ///
    /// # Returns
    ///
    /// * `Ok(())` - If the JSON payload is valid and does not violate any constraints.
    /// * `Err` - If the JSON payload is invalid or violates any constraints.
    ///
    /// # Errors
    ///
    /// * [`Error`] - If the JSON payload is invalid or violates any constraints.
    ///
    /// In extreme cases, the returned `Err` might indicate that
    /// this crate ran into buggy code.
    /// Please report it to the crate maintainer if you see this error.
    pub fn validate(self) -> Result<(), validator::ValidatorError> {
        self.inner.validate()
    }

    /// Validates the JSON payload in multiple calls.
    ///
    /// # Arguments
    ///
    /// * `steps` - The number of steps to validate the JSON payload,
    ///   which roughly corresponds to the number of tokens processed.
    ///
    /// # Returns
    ///
    /// * `Ok(true)` - If the validation finished without errors.
    /// * `Ok(false)` - If the validation is not finished yet, and you should call this function again.
    /// * `Err` - If the JSON payload is invalid or violates any constraints.
    ///
    /// # Errors
    ///
    /// * [`Error`] - If the JSON payload is invalid or violates any constraints.
    ///
    /// In extreme cases, the returned `Err` might indicate that
    /// this crate ran into buggy code.
    /// Please report it to the crate maintainer if you see this error.
    ///
    /// # WARNING
    ///
    /// The validator is invalidated once this method
    /// returns an `Err` or `Ok(true)`,
    /// and calling any further methods on that [`Validator`] instance is undefined behavior.
    pub fn validate_with_steps(&mut self, steps: usize) -> Result<bool, validator::ValidatorError> {
        self.inner.validate_with_steps(steps)
    }
}

/// Creates a new `Validator` instance with the given slice of bytes without any constraints.
pub fn from_slice(slice: &[u8]) -> Validator<SliceRead> {
    Validator::new(SliceRead::new(slice))
}

/// Creates a new `Validator` instance with the given `&str` without any constraints.
pub fn from_str(string: &str) -> Validator<StrRead> {
    Validator::new(StrRead::new(string))
}

/// Creates a new `Validator` instance with the given reader without any constraints.
///
/// # Arguments
///
/// * `reader` - The value that implements the [`std::io::Read`] trait.
///
/// # Performance
///
/// A `Validator` instance constructed from a reader is slower than
/// one constructed with the [`from_slice`] or [`from_str`] functions.
///
/// It is also recommended to wrap the reader in a [`std::io::BufReader`]
/// to improve performance instead of using the reader directly.
///
/// # Examples
///
/// ```rust
/// use std::io::BufReader;
///
/// fn validate_from_reader<R: std::io::Read>(reader: R) -> Result<(), json_threat_protection::Error> {
///     // Buffer the reader, as recommended above, before handing it to the validator.
///     let buf_reader = BufReader::new(reader);
///     json_threat_protection::from_reader(buf_reader).validate()
/// }
/// ```
pub fn from_reader<R: std::io::Read>(reader: R) -> Validator<IoRead<R>> {
    Validator::new(IoRead::new(reader))
}