//! A crate to protect against malicious JSON payloads.
//!
//! This crate validates JSON payloads against a set of constraints:
//! * Maximum depth of the JSON structure.
//! * Maximum length of strings.
//! * Maximum number of entries in arrays.
//! * Maximum number of entries in objects.
//! * Maximum length of object entry names.
//! * Whether to allow duplicate object entry names.
//!
//! This crate is designed to process untrusted JSON payloads,
//! so it does not use recursion to validate the JSON structure;
//! a deeply nested payload therefore cannot exhaust the call stack.
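//!
//! Validating without recursion can be sketched with an explicit, heap-allocated stack
//! in place of the call stack. The `check_depth` function below is a hypothetical,
//! simplified illustration, not this crate's implementation: it only tracks bracket
//! nesting (skipping string contents) and does not validate the rest of the JSON grammar.
//!
//! ```rust
//! /// Hypothetical helper: returns the deepest nesting level, or an error message if the
//! /// brackets are unbalanced or the nesting exceeds `max_depth`. The stack is a `Vec`
//! /// on the heap, so deeply nested input cannot overflow the call stack.
//! fn check_depth(json: &str, max_depth: usize) -> Result<usize, String> {
//!     let mut stack: Vec<u8> = Vec::new();
//!     let mut deepest: usize = 0;
//!     let mut in_string = false;
//!     let mut escaped = false;
//!     for (offset, &byte) in json.as_bytes().iter().enumerate() {
//!         if in_string {
//!             // Brackets inside strings do not affect nesting.
//!             if escaped {
//!                 escaped = false;
//!             } else if byte == b'\\' {
//!                 escaped = true;
//!             } else if byte == b'"' {
//!                 in_string = false;
//!             }
//!             continue;
//!         }
//!         match byte {
//!             b'"' => in_string = true,
//!             b'[' | b'{' => {
//!                 stack.push(byte);
//!                 deepest = deepest.max(stack.len());
//!                 if stack.len() > max_depth {
//!                     return Err(format!("max depth {} exceeded at offset {}", max_depth, offset));
//!                 }
//!             }
//!             b']' | b'}' => {
//!                 // The closing bracket must match the most recent opening bracket.
//!                 let expected = if byte == b']' { b'[' } else { b'{' };
//!                 if stack.pop() != Some(expected) {
//!                     return Err(format!("unbalanced bracket at offset {}", offset));
//!                 }
//!             }
//!             _ => {}
//!         }
//!     }
//!     if stack.is_empty() && !in_string {
//!         Ok(deepest)
//!     } else {
//!         Err("unexpected end of input".to_string())
//!     }
//! }
//! ```
//!
//! For example, `check_depth(r#"{"a":"[[["}"#, 1)` returns `Ok(1)` because the brackets
//! are inside a string, and 100,000 nested arrays can be checked without growing the call stack.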
//!
//! # Examples
//!
//! ```rust
//! use json_threat_protection as jtp;
//!
//! fn reject_highly_nested_json(data: &[u8], depth: usize) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).with_max_depth(depth).validate()
//! }
//!
//! fn reject_too_long_strings(data: &[u8], max_string_length: usize) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).with_max_string_length(max_string_length).validate()
//! }
//!
//! fn reject_too_many_array_entries(data: &[u8], max_array_entries: usize) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).with_max_array_entries(max_array_entries).validate()
//! }
//!
//! fn reject_too_many_object_entries(data: &[u8], max_object_entries: usize) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).with_max_object_entries(max_object_entries).validate()
//! }
//!
//! fn reject_too_long_object_entry_names(data: &[u8], max_object_entry_name_length: usize) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).with_max_object_entry_name_length(max_object_entry_name_length).validate()
//! }
//!
//! fn reject_duplicate_object_entry_names(data: &[u8]) -> Result<(), jtp::Error> {
//!     jtp::from_slice(data).disallow_duplicate_object_entry_name().validate()
//! }
//! ```
//!
//! # Default constraints
//!
//! By default, the validator only checks the JSON syntax without any constraints,
//! and it also allows duplicate object entry names.
//!
//! Set a limit to [`NO_LIMIT`] to disable that specific constraint.
//!
//! # Incremental validation
//!
//! The `Validator` struct is designed to be used incrementally,
//! so you can validate huge JSON payloads across multiple function calls
//! without blocking the current thread for a long time.
//!
//! ```rust
//! use json_threat_protection as jtp;
//!
//! fn validate_incrementally(data: &[u8]) -> Result<(), jtp::Error> {
//!     let mut validator = jtp::from_slice(data);
//!
//!     // Run up to 2000 steps per call.
//!     // `validate_with_steps` returns `Ok(true)` once validation has finished without errors,
//!     // `Ok(false)` if it must be called again to continue,
//!     // and `Err` if the payload is invalid or violates a constraint.
//!     while !validator.validate_with_steps(2000)? {
//!         // do something else here, such as processing other tasks
//!     }
//!
//!     Ok(())
//! }
//! ```
//!
//! This feature is useful when a JSON payload must be validated in a non-blocking way.
//! A typical use case is building FFI bindings for other software
//! that must validate JSON payloads without blocking its thread.
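//!
//! The `Ok(true)` / `Ok(false)` / `Err` contract of [`Validator::validate_with_steps`]
//! describes a resumable state machine: progress is kept inside the validator,
//! so each call can stop after a fixed budget and pick up where it left off.
//! The self-contained sketch below illustrates that pattern with a hypothetical
//! `ResumableChecker` that only checks `[`/`]` balance; it is not this crate's implementation.
//!
//! ```rust
//! /// Hypothetical resumable checker: verifies that `[` and `]` are balanced,
//! /// examining at most `steps` bytes per call.
//! struct ResumableChecker<'a> {
//!     input: &'a [u8],
//!     pos: usize,   // offset where the next call resumes
//!     depth: usize, // nesting depth carried over between calls
//! }
//!
//! impl<'a> ResumableChecker<'a> {
//!     fn new(input: &'a [u8]) -> Self {
//!         ResumableChecker { input, pos: 0, depth: 0 }
//!     }
//!
//!     /// `Ok(true)`: finished and balanced; `Ok(false)`: call again; `Err`: invalid input.
//!     fn run_steps(&mut self, steps: usize) -> Result<bool, String> {
//!         let end = self.input.len().min(self.pos.saturating_add(steps));
//!         while self.pos < end {
//!             match self.input[self.pos] {
//!                 b'[' => self.depth += 1,
//!                 b']' => {
//!                     if self.depth == 0 {
//!                         return Err(format!("unexpected ']' at offset {}", self.pos));
//!                     }
//!                     self.depth -= 1;
//!                 }
//!                 _ => {}
//!             }
//!             self.pos += 1;
//!         }
//!         if self.pos < self.input.len() {
//!             return Ok(false); // budget used up, input remains
//!         }
//!         if self.depth == 0 {
//!             Ok(true)
//!         } else {
//!             Err("unclosed '['".to_string())
//!         }
//!     }
//! }
//! ```
//!
//! Driving `ResumableChecker::new(b"[[1,2],[3]]")` with `run_steps(4)` takes three calls:
//! the first two return `Ok(false)` and the third returns `Ok(true)`.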
//!
//! # Error handling
//!
//! This crate panics in only a few places; most errors are returned as `Err`.
//! Some unintended bugs are also returned as `Err`, with an explicit error kind
//! indicating that the crate ran into buggy code.
//!
//! Whatever the error kind, the error always contains the position where it occurred:
//! the line, the column, and the offset,
//! where `offset` is the byte offset from the beginning of the JSON payload.
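//!
//! The line and column can be derived from the byte offset.
//! The sketch below shows that relationship with a hypothetical `line_column` helper,
//! assuming 1-based lines and 1-based, byte-counted columns;
//! the exact conventions of this crate's error positions may differ.
//!
//! ```rust
//! /// Hypothetical helper: converts a byte offset into a 1-based (line, column) pair,
//! /// counting columns in bytes and starting a new line after each `\n`.
//! fn line_column(input: &[u8], offset: usize) -> (usize, usize) {
//!     let mut line = 1;
//!     let mut column = 1;
//!     for &byte in &input[..offset.min(input.len())] {
//!         if byte == b'\n' {
//!             line += 1;
//!             column = 1;
//!         } else {
//!             column += 1;
//!         }
//!     }
//!     (line, column)
//! }
//! ```
//!
//! For the payload `"{\n  \"a\": x\n}"`, the invalid token `x` is at byte offset 9,
//! which this helper maps to line 2, column 8.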
//!
//! # Special behavior compared to [serde_json](https://crates.io/crates/serde_json)
//!
//! This crate does its best to stay consistent with `serde_json`'s behavior:
//! it uses [cargo-fuzz](https://crates.io/crates/cargo-fuzz)
//! to process the same JSON payloads with both this crate and `serde_json`
//! and compares the validation results.
//!
//! However, there are currently some differences between this crate and `serde_json`:
//! * This crate allows numbers of any precision,
//! even if they cannot be represented in Rust's native number types ([`i64`], [`u64`], [`i128`], [`u128`], [`f64`], [`f128`]).
//! [serde_json](https://crates.io/crates/serde_json)
//! without the [arbitrary_precision](https://github.com/serde-rs/json/blob/3f1c6de4af28b1f6c5100da323f2bffaf7c2083f/Cargo.toml#L69-L75)
//! feature enabled may return an error for such numbers.
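//!
//! A number can be validated without converting it into a native type by checking it
//! against the JSON number grammar from RFC 8259. The sketch below uses a hypothetical
//! `is_json_number` helper, not this crate's lexer; because it only inspects the bytes,
//! it accepts numbers of any length, while `"18446744073709551616".parse::<u64>()`
//! (one more than [`u64::MAX`]) fails.
//!
//! ```rust
//! /// Advances past any ASCII digits starting at `i` and returns the new position.
//! fn skip_digits(bytes: &[u8], mut i: usize) -> usize {
//!     while i < bytes.len() && bytes[i].is_ascii_digit() {
//!         i += 1;
//!     }
//!     i
//! }
//!
//! /// Hypothetical helper: returns `true` if `s` matches the RFC 8259 number grammar
//! /// `-? (0 | [1-9][0-9]*) (.[0-9]+)? ([eE][+-]?[0-9]+)?`, regardless of magnitude.
//! fn is_json_number(s: &str) -> bool {
//!     let bytes = s.as_bytes();
//!     let mut i: usize = 0;
//!     if bytes.get(i) == Some(&b'-') {
//!         i += 1;
//!     }
//!     // Integer part: a single `0`, or a non-zero digit followed by any digits.
//!     match bytes.get(i) {
//!         Some(b'0') => i += 1,
//!         Some(b'1'..=b'9') => i = skip_digits(bytes, i),
//!         _ => return false,
//!     }
//!     // Optional fraction: `.` followed by at least one digit.
//!     if bytes.get(i) == Some(&b'.') {
//!         let start = i + 1;
//!         i = skip_digits(bytes, start);
//!         if i == start {
//!             return false;
//!         }
//!     }
//!     // Optional exponent: `e` or `E`, an optional sign, then at least one digit.
//!     if matches!(bytes.get(i), Some(b'e') | Some(b'E')) {
//!         i += 1;
//!         if matches!(bytes.get(i), Some(b'+') | Some(b'-')) {
//!             i += 1;
//!         }
//!         let start = i;
//!         i = skip_digits(bytes, start);
//!         if i == start {
//!             return false;
//!         }
//!     }
//!     // The whole input must be consumed.
//!     i == bytes.len()
//! }
//! ```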
//!
//! # Performance
//!
//! This crate is designed to be fast and efficient,
//! and has its own benchmark suite under the `benches` directory.
//! You can run the benchmarks with the following command:
//! ```bash
//! JSON_FILE=/path/to/file.json cargo bench --bench memory -- --verbose
//! ```
//!
//! This suite validates the JSON syntax with both this crate and `serde_json`;
//! you can get your own performance numbers by pointing `JSON_FILE` at your dataset.
//!
//! # Fuzzing
//!
//! This crate is fuzzed using the [cargo-fuzz](https://crates.io/crates/cargo-fuzz) tool;
//! the fuzzing program is under the `fuzz` directory.
//!
//! The initial seed corpus comes from [nlohmann/json_test_data](https://github.com/nlohmann/json_test_data/),
//! and the extra corpus follows [nlohmann/json/blob/develop/tests/fuzzing](https://github.com/nlohmann/json/blob/develop/tests/fuzzing.md).
//!
mod lexer;
pub mod read;
mod validator;

use read::{IoRead, Read, SliceRead, StrRead};

/// Represents no limit for a specific constraint.
pub const NO_LIMIT: usize = usize::MAX;
pub use lexer::LexerError;
pub use read::ReadError;
pub use validator::ValidatorError as Error;

/// The JSON validator.
pub struct Validator<R: Read> {
    inner: validator::Validator<R>,
}

impl<R: Read> Validator<R> {
    /// Creates a new `Validator` instance with the given reader without any constraints.
    ///
    /// Prefer the [`from_slice`], [`from_str`], or [`from_reader`] functions,
    /// which construct the reader for you.
    pub fn new(read: R) -> Self {
        Validator {
            inner: validator::Validator::new(
                read, NO_LIMIT, NO_LIMIT, NO_LIMIT, NO_LIMIT, NO_LIMIT, true,
            ),
        }
    }

    /// Sets the maximum depth of the JSON structure.
    pub fn with_max_depth(mut self, max_depth: usize) -> Self {
        let inner = self.inner.with_max_depth(max_depth);
        self.inner = inner;
        self
    }

    /// Sets the maximum length of strings.
    pub fn with_max_string_length(mut self, max_string_length: usize) -> Self {
        let inner = self.inner.with_max_string_length(max_string_length);
        self.inner = inner;
        self
    }

    /// Sets the maximum number of entries in arrays.
    pub fn with_max_array_entries(mut self, max_array_length: usize) -> Self {
        let inner = self.inner.with_max_array_entries(max_array_length);
        self.inner = inner;
        self
    }

    /// Sets the maximum number of entries in objects.
    pub fn with_max_object_entries(mut self, max_object_length: usize) -> Self {
        let inner = self.inner.with_max_object_entries(max_object_length);
        self.inner = inner;
        self
    }

    /// Sets the maximum length of object entry names.
    pub fn with_max_object_entry_name_length(
        mut self,
        max_object_entry_name_length: usize,
    ) -> Self {
        let inner = self
            .inner
            .with_max_object_entry_name_length(max_object_entry_name_length);
        self.inner = inner;
        self
    }

    /// Allows duplicate object entry names.
    pub fn allow_duplicate_object_entry_name(mut self) -> Self {
        let inner = self.inner.allow_duplicate_object_entry_name();
        self.inner = inner;
        self
    }

    /// Disallows duplicate object entry names.
    pub fn disallow_duplicate_object_entry_name(mut self) -> Self {
        let inner = self.inner.disallow_duplicate_object_entry_name();
        self.inner = inner;
        self
    }

    /// Validates the JSON payload in a single call, consuming the [`Validator`] instance.
    ///
    /// # Returns
    ///
    /// * `Ok(())` - If the JSON payload is valid and does not violate any constraints.
    /// * `Err` - If the JSON payload is invalid or violates any constraints.
    ///
    /// # Errors
    ///
    /// * [`Error`] - If the JSON payload is invalid or violates any constraints.
    ///
    /// In rare cases, the returned error may indicate that
    /// this crate ran into buggy code.
    /// Please report it to the crate maintainer if you see such an error.
    pub fn validate(self) -> Result<(), Error> {
        self.inner.validate()
    }

    /// Validates the JSON payload incrementally, across multiple calls.
    ///
    /// # Arguments
    ///
    /// * `steps` - The number of steps to run in this call,
    ///   which roughly corresponds to the number of tokens processed.
    ///
    /// # Returns
    ///
    /// * `Ok(true)` - If the validation has finished without errors.
    /// * `Ok(false)` - If the validation has not finished yet; call this method again to continue.
    /// * `Err` - If the JSON payload is invalid or violates any constraints.
    ///
    /// # Errors
    ///
    /// * [`Error`] - If the JSON payload is invalid or violates any constraints.
    ///
    /// In rare cases, the returned error may indicate that
    /// this crate ran into buggy code.
    /// Please report it to the crate maintainer if you see such an error.
    ///
    /// # WARNING
    ///
    /// The validator is invalidated once this method
    /// returns an `Err` or `Ok(true)`,
    /// and calling any method on the [`Validator`] instance afterwards is undefined behavior.
    pub fn validate_with_steps(&mut self, steps: usize) -> Result<bool, Error> {
        self.inner.validate_with_steps(steps)
    }
}

/// Creates a new `Validator` instance with the given slice of bytes without any constraints.
pub fn from_slice(slice: &[u8]) -> Validator<SliceRead> {
    Validator::new(SliceRead::new(slice))
}

/// Creates a new `Validator` instance with the given `&str` without any constraints.
pub fn from_str(string: &str) -> Validator<StrRead> {
    Validator::new(StrRead::new(string))
}

/// Creates a new `Validator` instance with the given reader without any constraints.
///
/// # Arguments
///
/// * `reader` - A value that implements the [`std::io::Read`] trait.
///
/// # Performance
///
/// A `Validator` constructed from a reader is slower than one created with
/// the [`from_slice`] or [`from_str`] functions.
///
/// It is also recommended to wrap the reader in a [`std::io::BufReader`]
/// to improve performance, instead of passing an unbuffered reader directly.
///
/// # Examples
///
/// ```rust
/// use std::io::BufReader;
///
/// fn validate_from_reader<R: std::io::Read>(reader: R) -> Result<(), json_threat_protection::Error> {
///     // Wrap the caller's reader in a `BufReader` to reduce the number of underlying read calls.
///     let buf_reader = BufReader::new(reader);
///     json_threat_protection::from_reader(buf_reader).validate()
/// }
/// ```
pub fn from_reader<R: std::io::Read>(reader: R) -> Validator<IoRead<R>> {
    Validator::new(IoRead::new(reader))
}