php_serde/
lib.rs

#![forbid(missing_docs)]
#![allow(clippy::float_cmp)]
//! # PHP serialization format support for serde
//!
//! PHP uses a custom serialization format through its
//! [`serialize()`](https://www.php.net/manual/en/function.serialize.php)
//! and
//! [`unserialize()`](https://www.php.net/manual/en/function.unserialize.php)
//! functions. This crate adds partial support for this format using `serde`.
//!
//! An overview of the format can be seen at
//! <https://stackoverflow.com/questions/14297926/structure-of-a-serialized-php-string>;
//! details are available at
//! <http://www.phpinternalsbook.com/php5/classes_objects/serialization.html>.
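//!
//! As a rough illustration of the format (a sketch only, not how this crate's
//! serializer is implemented), a byte string is written as
//! `s:<length in bytes>:"<bytes>";` and a list-style array as
//! `a:<element count>:{i:0;<value>i:1;<value>...}`. The helpers `php_str` and
//! `php_list` below are hypothetical and use only the standard library.
//!
//! ```rust
//! /// Hypothetical helper: encodes a byte string as `s:<byte length>:"<bytes>";`.
//! fn php_str(bytes: &[u8]) -> Vec<u8> {
//!     let mut out = format!("s:{}:\"", bytes.len()).into_bytes();
//!     out.extend_from_slice(bytes);
//!     out.extend_from_slice(b"\";");
//!     out
//! }
//!
//! /// Hypothetical helper: wraps already-encoded values into an array whose
//! /// integer keys count up from zero.
//! fn php_list(values: &[Vec<u8>]) -> Vec<u8> {
//!     let mut out = format!("a:{}:{{", values.len()).into_bytes();
//!     for (idx, value) in values.iter().enumerate() {
//!         // Every element is preceded by its explicit integer key.
//!         out.extend_from_slice(format!("i:{};", idx).as_bytes());
//!         out.extend_from_slice(value);
//!     }
//!     out.push(b'}');
//!     out
//! }
//! ```
//!
//! With these helpers, `php_list(&[php_str(b"user"), php_str(b"")])` produces
//! `a:2:{i:0;s:4:"user";i:1;s:0:"";}`. Note that the length prefix counts
//! bytes, not characters.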
//!
//! ## What is supported?
//!
//! * Basic and compound types:
//!
//!   | PHP type                | Rust type                                             |
//!   | ---                     | ---                                                   |
//!   | boolean                 | `bool`                                                |
//!   | integer                 | `i64` (automatic conversion to other types supported) |
//!   | float                   | `f64` (automatic conversion to `f32` supported)       |
//!   | strings                 | `Vec<u8>` (PHP strings are not necessarily UTF-8)     |
//!   | null                    | decoded as `None`                                     |
//!   | array (non-associative) | tuple `struct`s or `Vec<_>`                           |
//!   | array (associative)     | regular `struct`s or `HashMap<_, _>`                  |
//!
//! * Rust `String`s are transparently converted to PHP byte strings as UTF-8.
//!
//! ### Out-of-order arrays
//!
//! PHP arrays can be created "out of order", as they store every array index as an
//! explicit integer in the array. Thus the following code
//!
//! ```php
//! $arr = array();
//! $arr[0] = "zero";
//! $arr[3] = "three";
//! $arr[2] = "two";
//! $arr[1] = "one";
//! ```
//!
//! results in an array that is equivalent to `["zero", "one", "two", "three"]`
//! when accessed by index, even though its entries are stored (and serialized)
//! in insertion order.
//!
//! Because deserialization does not buffer values, these arrays cannot be directly
//! deserialized into a `Vec`. Instead they should be deserialized into a map, which
//! can then be turned into a `Vec` if desired.
//!
//! A second concern is "holes" in the array, e.g. if the entry with key `1` is
//! missing. How to fill these is typically up to the user.
//!
//! The helper function `deserialize_unordered_array` can be used with serde's
//! `deserialize_with` attribute to automatically buffer and order the entries, as
//! well as plugging holes by closing any gaps.
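//!
//! The manual map-to-`Vec` conversion mentioned above might look like the
//! following sketch. It uses only the standard library and assumes the PHP
//! array has already been deserialized into a `BTreeMap<i64, T>`. The names
//! `map_to_vec` and `map_to_vec_padded` are hypothetical helpers for
//! illustration, not part of this crate.
//!
//! ```rust
//! use std::collections::BTreeMap;
//! use std::convert::TryFrom;
//!
//! /// Hypothetical helper: orders the entries by key and closes any gaps.
//! fn map_to_vec<T>(map: BTreeMap<i64, T>) -> Vec<T> {
//!     // A `BTreeMap` yields its values in ascending key order.
//!     map.into_values().collect()
//! }
//!
//! /// Hypothetical helper: keeps every value at its index and fills holes
//! /// with `None`. A very large key causes a correspondingly large allocation.
//! fn map_to_vec_padded<T>(map: BTreeMap<i64, T>) -> Vec<Option<T>> {
//!     let mut out = Vec::new();
//!     for (key, value) in map {
//!         // Negative keys have no position in a `Vec` and are skipped here.
//!         if let Ok(idx) = usize::try_from(key) {
//!             // Keys arrive in ascending order, so this only ever grows `out`.
//!             out.resize_with(idx, || None);
//!             out.push(Some(value));
//!         }
//!     }
//!     out
//! }
//! ```
//!
//! For a map holding only the keys `0`, `2` and `3`, `map_to_vec` returns the
//! three values in key order, while `map_to_vec_padded` returns four entries
//! with `None` at index `1`.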
//!
//! ## What is missing?
//!
//! * PHP objects
//! * Array keys that are neither strings nor numbers, except when deserializing
//!   into a `HashMap`
//! * Mixed arrays. All keys of an array are assumed to have the same type
//!   (Note: If this is required, consider extending this library with a variant
//!    type).
//!
//! ## Example use
//!
//! Consider an example data structure storing a session token, serialized by
//! the following PHP code:
//!
//! ```php
//! <?php
//! $serialized = serialize(array("user", "", array()));
//! echo($serialized);
//! ```
//!
//! This produces the following output:
//!
//! ```text
//! a:3:{i:0;s:4:"user";i:1;s:0:"";i:2;a:0:{}}
//! ```
//!
//! The data can be reconstructed using the following Rust code:
//!
//! ```rust
//! use serde::Deserialize;
//! use php_serde::from_bytes;
//!
//! #[derive(Debug, Deserialize, Eq, PartialEq)]
//! struct Data(Vec<u8>, Vec<u8>, SubData);
//!
//! #[derive(Debug, Deserialize, Eq, PartialEq)]
//! struct SubData();
//!
//! let input = br#"a:3:{i:0;s:4:"user";i:1;s:0:"";i:2;a:0:{}}"#;
//! assert_eq!(
//!     from_bytes::<Data>(input).unwrap(),
//!     Data(b"user".to_vec(), b"".to_vec(), SubData())
//! );
//! ```
//!
//! Likewise, structs are supported as well, if the PHP arrays use keys:
//!
//! ```php
//! <?php
//! $serialized = serialize(
//!     array("foo" => true,
//!           "bar" => "xyz",
//!           "sub" => array("x" => 42))
//! );
//! echo($serialized);
//! ```
//!
//! In Rust:
//!
//! ```rust
//!# use serde::Deserialize;
//!# use php_serde::from_bytes;
//! #[derive(Debug, Deserialize, Eq, PartialEq)]
//! struct Outer {
//!     foo: bool,
//!     bar: String,
//!     sub: Inner,
//! }
//!
//! #[derive(Debug, Deserialize, Eq, PartialEq)]
//! struct Inner {
//!     x: i64,
//! }
//!
//! let input = br#"a:3:{s:3:"foo";b:1;s:3:"bar";s:3:"xyz";s:3:"sub";a:1:{s:1:"x";i:42;}}"#;
//! let expected = Outer {
//!     foo: true,
//!     bar: "xyz".to_owned(),
//!     sub: Inner { x: 42 },
//! };
//!
//! let deserialized: Outer = from_bytes(input).expect("deserialization failed");
//!
//! assert_eq!(deserialized, expected);
//! ```
//!
//! ### Optional values
//!
//! Missing values can be left optional, as in this example:
//!
//! ```php
//! <?php
//! $location_a = array();
//! $location_b = array("province" => "Newfoundland and Labrador, CA");
//! $location_c = array("postalcode" => "90002",
//!                     "country" => "United States of America");
//! echo(serialize($location_a) . "\n");
//! # -> a:0:{}
//! echo(serialize($location_b) . "\n");
//! # -> a:1:{s:8:"province";s:29:"Newfoundland and Labrador, CA";}
//! echo(serialize($location_c) . "\n");
//! # -> a:2:{s:10:"postalcode";s:5:"90002";s:7:"country";
//! #         s:24:"United States of America";}
//! ```
//!
//! The following declaration of `Location` will be able to parse all three
//! example inputs.
//!
//! ```rust
//!# use serde::Deserialize;
//! #[derive(Debug, Deserialize, Eq, PartialEq)]
//! struct Location {
//!     province: Option<String>,
//!     postalcode: Option<String>,
//!     country: Option<String>,
//! }
//! ```
//!
//! ### Full roundtrip example
//!
//! ```rust
//! use serde::{Deserialize, Serialize};
//! use php_serde::{to_vec, from_bytes};
//!
//! #[derive(Debug, Deserialize, Eq, PartialEq, Serialize)]
//! struct UserProfile {
//!     id: u32,
//!     name: String,
//!     tags: Vec<String>,
//! }
//!
//! let orig = UserProfile {
//!     id: 42,
//!     name: "Bob".to_owned(),
//!     tags: vec!["foo".to_owned(), "bar".to_owned()],
//! };
//!
//! let serialized = to_vec(&orig).expect("serialization failed");
//! let expected = br#"a:3:{s:2:"id";i:42;s:4:"name";s:3:"Bob";s:4:"tags";a:2:{i:0;s:3:"foo";i:1;s:3:"bar";}}"#;
//! assert_eq!(serialized, &expected[..]);
//!
//! let profile: UserProfile = from_bytes(&serialized).expect("deserialization failed");
//! assert_eq!(profile, orig);
//! ```

// Rustc lints
// <https://doc.rust-lang.org/rustc/lints/listing/allowed-by-default.html>
#![warn(
    anonymous_parameters,
    bare_trait_objects,
    elided_lifetimes_in_paths,
    rust_2018_idioms,
    trivial_casts,
    trivial_numeric_casts,
    unsafe_code,
    unused_extern_crates,
    unused_import_braces
)]
// Clippy lints
// <https://rust-lang.github.io/rust-clippy/current/>
#![warn(
    clippy::all,
    clippy::dbg_macro,
    clippy::float_cmp_const,
    clippy::get_unwrap,
    clippy::mem_forget,
    clippy::nursery,
    clippy::pedantic,
    clippy::todo,
    clippy::unwrap_used,
    clippy::wrong_pub_self_convention
)]
// Allow some clippy lints
#![allow(
    clippy::default_trait_access,
    clippy::doc_markdown,
    clippy::if_not_else,
    clippy::must_use_candidate,
    clippy::needless_pass_by_value,
    clippy::pub_enum_variant_names,
    clippy::use_self,
    clippy::cargo_common_metadata,
    clippy::missing_errors_doc,
    clippy::enum_glob_use,
    clippy::struct_excessive_bools,
    clippy::module_name_repetitions,
    clippy::used_underscore_binding,
    clippy::future_not_send,
    clippy::missing_const_for_fn,
    clippy::type_complexity,
    clippy::option_if_let_else
)]
// Allow some lints while testing
#![cfg_attr(
    test,
    allow(clippy::unwrap_used, clippy::blacklisted_name, clippy::float_cmp)
)]

mod de;
mod error;
mod ser;

pub use de::{deserialize_unordered_array, from_bytes};
pub use error::{Error, Result};
pub use ser::{to_vec, to_writer};

#[cfg(test)]
mod tests {
    use super::{from_bytes, to_vec};
    use proptest::prelude::any;
    use proptest::proptest;
    use serde::{Deserialize, Serialize};
    use std::collections::HashMap;

    macro_rules! roundtrip {
        ($ty:ty, $value:expr) => {
            let val = $value;

            let serialized = to_vec(&val).expect("Serialization failed");
            eprintln!("{}", String::from_utf8_lossy(serialized.as_slice()));

            let deserialized: $ty =
                from_bytes(serialized.as_slice()).expect("Deserialization failed");

            assert_eq!(deserialized, val);
        };
    }

    #[test]
    fn roundtrip_newtype() {
        #[derive(Debug, Deserialize, Eq, PartialEq, Serialize)]
        struct MyNewtype(i32);

        roundtrip!(MyNewtype, MyNewtype(0));
        roundtrip!(MyNewtype, MyNewtype(1));
        roundtrip!(MyNewtype, MyNewtype(-1));
    }

    proptest! {
        #[test]
        fn roundtrip_unit(v in any::<()>()) {
            roundtrip!((), v);
        }

        #[test]
        fn roundtrip_bool(v in any::<bool>()) {
            roundtrip!(bool, v);
        }

        #[test]
        fn roundtrip_u8(v in any::<u8>()) {
            roundtrip!(u8, v);
        }

        #[test]
        fn roundtrip_u16(v in any::<u16>()) {
            roundtrip!(u16, v);
        }

        #[test]
        fn roundtrip_u32(v in any::<u32>()) {
            roundtrip!(u32, v);
        }

        #[test]
        fn roundtrip_u64(v in 0..(std::i64::MAX as u64)) {
            roundtrip!(u64, v);
        }

        #[test]
        fn roundtrip_i8(v in any::<i8>()) {
            roundtrip!(i8, v);
        }

        #[test]
        fn roundtrip_i16(v in any::<i16>()) {
            roundtrip!(i16, v);
        }

        #[test]
        fn roundtrip_i32(v in any::<i32>()) {
            roundtrip!(i32, v);
        }

        #[test]
        fn roundtrip_i64(v in any::<i64>()) {
            roundtrip!(i64, v);
        }

        #[test]
        fn roundtrip_f32(v in any::<f32>()) {
            roundtrip!(f32, v);
        }

        #[test]
        fn roundtrip_f64(v in any::<f64>()) {
            roundtrip!(f64, v);
        }

        #[test]
        fn roundtrip_bytes(v in any::<Vec<u8>>()) {
            roundtrip!(Vec<u8>, v);
        }

        #[test]
        fn roundtrip_char(v in any::<char>()) {
            roundtrip!(char, v);
        }

        #[test]
        fn roundtrip_string(v in any::<String>()) {
            roundtrip!(String, v);
        }

        #[test]
        fn roundtrip_option(v in any::<Option<i32>>()) {
            roundtrip!(Option<i32>, v);
        }

        #[test]
        fn roundtrip_same_type_tuple(v in any::<(u32, u32)>()) {
            roundtrip!((u32, u32), v);
        }

        #[test]
        fn roundtrip_mixed_type_tuple(v in any::<(String, i32)>()) {
            roundtrip!((String, i32), v);
        }

        #[test]
        fn roundtrip_string_string_hashmap(v in proptest::collection::hash_map(any::<String>(), any::<String>(), 0..100)) {
            roundtrip!(HashMap<String, String>, v);
        }
    }

    use std::io::prelude::*;
    use std::io::Result;
    use std::io::SeekFrom;
    use std::process::Command;
    use tempfile::tempfile;

    fn through_php(bytes: &[u8]) -> Result<Vec<u8>> {
        let mut file = tempfile()?;
        file.write_all(bytes)?;
        file.seek(SeekFrom::Start(0))?;

        let res = Command::new("php")
            .stdin(file)
            .args(&[
                "-r",
                "print(serialize(unserialize(file_get_contents('php://stdin'))));",
            ])
            .output()?;

        Ok(res.stdout)
    }

    macro_rules! php_roundtrip {
        ($ty:ty, $value:expr) => {
            let val = $value;
            let serialized = to_vec(&val).expect("Serialization failed");
            eprintln!(
                "serialized={:?}",
                String::from_utf8_lossy(serialized.as_slice())
            );
            let output = through_php(serialized.as_slice()).expect("Failed to deser&ser with php");
            eprintln!("output={:?}", String::from_utf8_lossy(output.as_slice()));
            let deserialized: $ty = from_bytes(output.as_slice()).expect("Deserialization failed");
            // php output will differ sometimes, only checking that deserialized value is correct
            // assert_eq!(serialized, output);
            assert_eq!(deserialized, val);
        };
    }

    proptest! {
        #[test]
        #[ignore]
        fn php_roundtrip_unit(v in any::<()>()) {
            php_roundtrip!((), v);
        }

        #[test]
        #[ignore]
        fn php_roundtrip_bool(v in any::<bool>()) {
            php_roundtrip!(bool, v);
        }

        #[test]
        #[ignore]
        fn php_roundtrip_i64(v in any::<i64>()) {
            php_roundtrip!(i64, v);
        }

        #[test]
        #[ignore]
        fn php_roundtrip_u64(v in 0..(std::i64::MAX as u64)) {
            php_roundtrip!(u64, v);
        }

        #[test]
        #[ignore]
        fn php_roundtrip_f64(v in any::<f64>()) {
            php_roundtrip!(f64, v);
        }

        // Ignored like the other PHP roundtrip tests, as it needs a `php` binary.
        #[test]
        #[ignore]
        fn php_roundtrip_char(v in any::<char>()) {
            php_roundtrip!(char, v);
        }

        #[test]
        #[ignore]
        fn php_roundtrip_string(v in any::<String>()) {
            php_roundtrip!(String, v);
        }

        #[test]
        #[ignore]
        fn php_roundtrip_option(v in any::<Option<i32>>()) {
            php_roundtrip!(Option<i32>, v);
        }

        #[test]
        #[ignore]
        fn php_roundtrip_same_type_tuple(v in any::<(u32, u32)>()) {
            php_roundtrip!((u32, u32), v);
        }

        #[test]
        #[ignore]
        fn php_roundtrip_mixed_type_tuple(v in any::<(String, i32)>()) {
            php_roundtrip!((String, i32), v);
        }

        // This would fail on input v = {"0": ""} because PHP converts numeric
        // string keys to integer keys:
        // serialized="a:1:{s:1:\"0\";s:0:\"\";}"
        // output="a:1:{i:0;s:0:\"\";}"
        // #[test]
        // #[ignore]
        // fn php_roundtrip_string_string_hashmap(v in proptest::collection::hash_map(any::<String>(), any::<String>(), 0..100)) {
        //     php_roundtrip!(HashMap<String, String>, v);
        // }
    }
}