////////////////////////////////////////////////////////////////////////////////
// This Source Code Form is subject to the terms of the Mozilla Public /
// License, v. 2.0. If a copy of the MPL was not distributed with this /
// file, You can obtain one at https://mozilla.org/MPL/2.0/. /
// /
////////////////////////////////////////////////////////////////////////////////

//! A very overengineered Rust crate for compressing and decompressing data in
//! the RefPack format utilized by many EA games of the early 2000s.
//!
//! # RefPack
//! RefPack, also known as QFS, is a semi-standardized compression format
//! utilized by many games published by Electronic Arts from the 90s to the late
//! 2000s. In many cases, it was deployed with a custom header format.
//!
//! ## Structure
//! RefPack shares many similarities with LZ77 compression: it is a lossless
//! compression format that relies on length-distance pairs referring to bytes
//! already present in the decompression buffer. Where it differs from LZ77 is
//! that, rather than using a single format for "Literal" control codes and a
//! single format for "Pointer" control codes, RefPack uses 4 distinct control
//! codes for different sizes of pointers and literal blocks. A fifth control
//! code marks the end of the stream, so the output size does not need to be
//! specified before decompression.
//!
//! ### Codes
//! RefPack uses one "Literal" control code that, as in LZ77, carries only raw
//! bytes, but its length is limited to multiples of 4. The remaining three
//! control codes are "Pointer" control codes of varying sizes, for small,
//! medium, and large back-references and lengths. To compensate for the
//! limited precision of the "Literal" control code, each "Pointer" control code
//! can also write up to 3 literal bytes to the stream.
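//!
//! As a rough illustration of how the two literal mechanisms fit together, the
//! sketch below splits a run of pending literal bytes into standalone literal
//! blocks (each a multiple of 4) plus a remainder of 0 to 3 bytes that would be
//! carried by the next "Pointer" or stop code. It assumes the commonly
//! documented maximum of 112 bytes per literal block, and `split_literals` is a
//! hypothetical helper, not part of this crate's API.
//!
//! ```rust
//! // Assumed maximum literal block length (commonly documented for RefPack).
//! const MAX_LITERAL_BLOCK: usize = 112;
//!
//! // Hypothetical helper: returns the lengths of the standalone literal
//! // blocks and the 0-3 byte remainder left for the following command.
//! fn split_literals(len: usize) -> (Vec<usize>, usize) {
//!     let remainder = len % 4;
//!     // Only a multiple of 4 can be written with literal blocks.
//!     let mut aligned = len - remainder;
//!     let mut blocks = Vec::new();
//!     while aligned > 0 {
//!         // 112 is itself a multiple of 4, so every block stays aligned.
//!         let block = aligned.min(MAX_LITERAL_BLOCK);
//!         blocks.push(block);
//!         aligned -= block;
//!     }
//!     (blocks, remainder)
//! }
//! ```
//!
//! For example, 230 pending bytes become blocks of 112, 112, and 4 bytes, with
//! 2 bytes left over for the next command.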
//!
//! See [Command](crate::data::control::Command) for further details.
//!
//! ## Decompression
//! Decompression simply requires reading from a stream of RefPack data until
//! a stopcode is reached.
//!
//! See [decompression](crate::data::decompression) for further details.
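//!
//! To make the decompression loop concrete, here is a minimal sketch that
//! decodes headerless RefPack data from a byte slice. It is not this crate's
//! implementation: the bit layouts are assumptions taken from the commonly
//! documented RefPack/QFS format (2-, 3-, and 4-byte "Pointer" codes, 1-byte
//! "Literal" codes in `0xE0..=0xFB`, and stopcodes in `0xFC..=0xFF`), and
//! `decode` is a hypothetical name.
//!
//! ```rust
//! // Hypothetical decoder for headerless RefPack data; returns `None` on
//! // truncated or otherwise malformed input instead of panicking.
//! fn decode(input: &[u8]) -> Option<Vec<u8>> {
//!     let mut out = Vec::new();
//!     let mut pos = 0;
//!     loop {
//!         let first = *input.get(pos)?;
//!         // Fetch the n-th byte of the current command as a usize.
//!         let byte = |n: usize| input.get(pos + n).map(|&b| usize::from(b));
//!         let b0 = usize::from(first);
//!         // (literal byte count, copy length, copy offset, command size)
//!         let (literal, copy_len, offset, size) = match first {
//!             0x00..=0x7F => {
//!                 let b1 = byte(1)?;
//!                 (b0 & 0x03, ((b0 & 0x1C) >> 2) + 3, ((b0 & 0x60) << 3) + b1 + 1, 2)
//!             }
//!             0x80..=0xBF => {
//!                 let (b1, b2) = (byte(1)?, byte(2)?);
//!                 (b1 >> 6, (b0 & 0x3F) + 4, ((b1 & 0x3F) << 8) + b2 + 1, 3)
//!             }
//!             0xC0..=0xDF => {
//!                 let (b1, b2, b3) = (byte(1)?, byte(2)?, byte(3)?);
//!                 let offset = ((b0 & 0x10) << 12) + (b1 << 8) + b2 + 1;
//!                 (b0 & 0x03, ((b0 & 0x0C) << 6) + b3 + 5, offset, 4)
//!             }
//!             // Literal block: 4 to 112 bytes, always a multiple of 4.
//!             0xE0..=0xFB => (((b0 & 0x1F) << 2) + 4, 0, 0, 1),
//!             // Stopcode: 0 to 3 trailing literal bytes, then end of stream.
//!             0xFC..=0xFF => (b0 & 0x03, 0, 0, 1),
//!         };
//!         pos += size;
//!         // Literal bytes are copied straight from the input.
//!         out.extend_from_slice(input.get(pos..pos + literal)?);
//!         pos += literal;
//!         if first >= 0xFC {
//!             return Some(out);
//!         }
//!         // Copy the back-reference byte by byte, so a match may overlap the
//!         // bytes it is producing (offset < length), as in LZ77.
//!         let start = out.len().checked_sub(offset)?;
//!         for i in 0..copy_len {
//!             let value = out[start + i];
//!             out.push(value);
//!         }
//!     }
//! }
//! ```
//!
//! For example, the bytes `[0xE0, b'a', b'b', b'c', b'd', 0x00, 0x03, 0xFC]`
//! describe a 4-byte literal block, a 3-byte copy from 4 bytes back, and a
//! stopcode, and decode to `abcdabc`.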
//!
//! ## Compression
//! Compressing via RefPack is largely similar to LZ77 compression algorithms:
//! a sliding window is moved over the data to search for repeating blocks, and
//! the results are written to the stream using the control codes described
//! above.
//!
//! See [compression](crate::data::compression) for further details.
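//!
//! The core of the sliding-window search can be sketched as a brute-force
//! longest-match lookup. This is a naive illustration, not this crate's
//! algorithm: practical encoders usually avoid the quadratic scan with hash
//! chains or similar structures. `longest_match` is hypothetical, and the
//! window and length limits are assumptions based on the commonly documented
//! limits of the largest "Pointer" code.
//!
//! ```rust
//! // Assumed limits of the largest RefPack pointer code (commonly documented).
//! const MAX_OFFSET: usize = 131_072;
//! const MAX_LENGTH: usize = 1028;
//! // Shortest match any pointer code can express.
//! const MIN_LENGTH: usize = 3;
//!
//! // Hypothetical brute-force search: returns `(offset, length)` of the longest
//! // earlier match for the bytes starting at `pos`, preferring the nearest
//! // match on ties, or `None` if no match of at least `MIN_LENGTH` exists.
//! fn longest_match(data: &[u8], pos: usize) -> Option<(usize, usize)> {
//!     let window_start = pos.saturating_sub(MAX_OFFSET);
//!     let max_len = data.len().saturating_sub(pos).min(MAX_LENGTH);
//!     let mut best: Option<(usize, usize)> = None;
//!     for candidate in window_start..pos {
//!         let mut len = 0;
//!         // A match may run past `pos` and overlap the bytes being encoded.
//!         while len < max_len && data[candidate + len] == data[pos + len] {
//!             len += 1;
//!         }
//!         if len >= MIN_LENGTH && best.map_or(true, |(_, best_len)| len >= best_len) {
//!             best = Some((pos - candidate, len));
//!         }
//!     }
//!     best
//! }
//! ```
//!
//! A real encoder must also check that each `(offset, length)` pair fits one
//! of the pointer codes. Under the assumed layout, for instance, a 3-byte match
//! more than 1024 bytes back cannot be encoded.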
//!
//! ## Headers
//! While the actual data block of RefPack has only one known implementation,
//! multiple types of headers for the format have been identified. See
//! [header](crate::header) for the header types supported by this crate.
//!
//! ## Other Implementations
//!
//! RefPack has been implemented in various other languages and for various
//! games:
//!
//! - [RefPack.cpp (download)](http://download.wcnews.com/files/documents/sourcecode/shadowforce/transfer/asommers/mfcapp_src/engine/compress/RefPack.cpp):
//!   Original canonical implementation of RefPack by Frank Barchard for Origin
//!   Software. Utilized by some early Origin Software games.
//! - [JDBPF](https://github.com/actioninja/JDBPF/blob/90644a3286580aa7676779a2d2e5a3c9de9a31ff/src/ssp/dbpf/converter/DBPFPackager.java#L398C9-L398C9):
//!   Early SimCity 4 Java library for reading DBPF files, which use RefPack.
//! - [JDBPFX](https://github.com/actioninja/JDBPF/blob/90644a3286580aa7676779a2d2e5a3c9de9a31ff/src/ssp/dbpf/converter/DBPFPackager.java#L398C9-L398C9):
//!   Later, currently maintained fork of JDBPF.
//! - [DBPFSharp](https://github.com/0xC0000054/DBPFSharp/blob/3038b9c15b0ddd3ccfb4b72bc6ac4541eee677fb/src/DBPFSharp/QfsCompression.cs#L100):
//!   SimCity 4 DBPF library written in C#.
//! - [Sims2Tools](https://github.com/whoward69/Sims2Tools/blob/0baaf2dce985474215cf0f64096a8dd9950c2757/DbpfLibrary/Utils/Decompressor.cs#L54C1-L54C1):
//!   Sims 2 DBPF library written in C#.
//!
//! # This Crate
//!
//! This crate is a Rust implementation designed to compress and decompress
//! RefPack data with any header format. It uses generics to support arbitrary
//! header formats, so the library can be used directly without writing "glue"
//! code to parse header info.
//!
//! Put simply, this means that you can use any supported format however you
//! like without any performance overhead from dynamic dispatch, and you can
//! implement your own formats that remain compatible with the same compression
//! algorithms.
//!
//! # Usage
//!
//! `refpack-rs` exposes two functions, `compress` and `decompress`, along with
//! `easy` variants that are easier to use but less flexible.
//!
//! `compress` and `decompress` take mutable references to a reader and a
//! writer, which implement `std::io::Read` and `std::io::Write`,
//! respectively.
//!
//! `decompress` reads from the reader until it encounters a stopcode (a byte
//! within `0xFC..=0xFF`), while `compress` reads the number of bytes given by
//! its length argument.
//!
//! All compression and decompression functions accept one generic argument
//! constrained to the [Format](crate::format::Format) trait. Implementations
//! should be "unconstructable" types, with the recommended type being an empty
//! enum.
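//!
//! The pattern of generic functions parameterized by uninhabited marker types
//! can be sketched independently of this crate. The trait, enums, and magic
//! values below are hypothetical stand-ins rather than the real
//! [Format](crate::format::Format) trait. Each call is monomorphized for its
//! format type, so there is no dynamic dispatch, and an empty enum can be
//! named as a type but never constructed.
//!
//! ```rust
//! // Hypothetical stand-in for a format trait (not this crate's `Format`).
//! trait HeaderFormat {
//!     // Made-up magic bytes written in front of the compressed data.
//!     const MAGIC: &'static [u8];
//! }
//!
//! // Empty enums have no values: they can only be used as type parameters.
//! enum FormatA {}
//! enum FormatB {}
//!
//! impl HeaderFormat for FormatA {
//!     const MAGIC: &'static [u8] = &[0xAA];
//! }
//!
//! impl HeaderFormat for FormatB {
//!     const MAGIC: &'static [u8] = &[0xBB, 0xBB];
//! }
//!
//! // Monomorphized once per format type, so `F::MAGIC` is resolved at compile
//! // time rather than through a vtable.
//! fn write_header<F: HeaderFormat>(out: &mut Vec<u8>) {
//!     out.extend_from_slice(F::MAGIC);
//! }
//! ```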
//!
//! ## Implementations
//!
//! | Format | Games | Header |
//! |--------|-------|--------|
//! | [Reference](crate::format::Reference) | Various 90s Origin Software and EA games | [Reference](crate::header::Reference) |
//! | [Maxis](crate::format::Maxis) | The Sims, The Sims Online, SimCity 4, The Sims 2 | [Maxis](crate::header::Maxis) |
//! | [SimEA](crate::format::SimEA) | The Sims 3, The Sims 4 | [SimEA](crate::header::SimEA) |
//!
//! ### Example
//!
//! ```
//! use std::io::Cursor;
//!
//! use refpack::format::Reference;
//!
//! # fn main() {
//! let mut source_reader = Cursor::new(b"Hello World!".to_vec());
//! let mut out_buf = Cursor::new(vec![]);
//! refpack::compress::<Reference>(
//!     source_reader.get_ref().len(),
//!     &mut source_reader,
//!     &mut out_buf,
//!     refpack::CompressionOptions::Optimal,
//! )
//! .unwrap();
//! # }
//! ```
//!
//! The easy variants are `easy_compress` and `easy_decompress`. They take a
//! `&[u8]` (plus a `CompressionOptions` for `easy_compress`) and return a
//! `Result<Vec<u8>, RefPackError>`.
//!
//! Internally they simply call `compress` and `decompress` with a `Cursor` over
//! the input and output buffers, but they are more convenient to use in many
//! cases.
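//!
//! The wrapping described above can be sketched generically: a stream-based
//! function is adapted to slices by putting a `Cursor` around the input and
//! collecting the output in a `Cursor<Vec<u8>>`. The `easy` function and its
//! closure-based signature are hypothetical. The real `easy_compress` and
//! `easy_decompress` call this crate's own functions and return
//! `RefPackError` rather than `io::Error`.
//!
//! ```rust
//! use std::io::{self, Cursor};
//!
//! // Hypothetical adapter: runs a stream-based codec over an in-memory input
//! // and returns everything it wrote.
//! fn easy<F>(input: &[u8], codec: F) -> io::Result<Vec<u8>>
//! where
//!     F: FnOnce(&mut Cursor<&[u8]>, &mut Cursor<Vec<u8>>) -> io::Result<()>,
//! {
//!     let mut reader = Cursor::new(input);
//!     let mut writer = Cursor::new(Vec::new());
//!     codec(&mut reader, &mut writer)?;
//!     // Unwrap the writer's Cursor to hand back the owned output buffer.
//!     Ok(writer.into_inner())
//! }
//! ```
//!
//! For instance, passing `std::io::copy` as the codec (mapping its byte count
//! to `()`) returns the input unchanged.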

// I like clippy to yell at me about everything!
#![warn(clippy::pedantic, clippy::cargo)]
// Due to the high amount of byte conversions, sometimes intentional lossy conversions are
// necessary.
#![allow(clippy::cast_possible_truncation)]
// same as above
#![allow(clippy::cast_lossless)]
// and above
#![allow(clippy::cast_possible_wrap)]
// above
#![allow(clippy::cast_precision_loss)]
// Annoying and wrong, RefPack is a compression scheme.
#![allow(clippy::doc_markdown)]
// Default::default() is more idiomatic imo
#![allow(clippy::default_trait_access)]
// too many lines is a dumb metric
#![allow(clippy::too_many_lines)]
// causes weirdness with header and reader
#![allow(clippy::similar_names)]
// all uses of #[inline(always)] have been benchmarked thoroughly
#![allow(clippy::inline_always)]

pub mod data;
mod error;
pub mod format;
pub mod header;

pub use crate::data::compression::{CompressionOptions, compress, easy_compress};
pub use crate::data::decompression::{decompress, easy_decompress};
pub use crate::error::{Error as RefPackError, Result as RefPackResult};

#[cfg(test)]
mod test {
    use proptest::collection::vec;
    use proptest::num::u8;
    use proptest::prop_assert_eq;
    use test_strategy::proptest;

    use crate::data::compression::CompressionOptions;
    use crate::format::{Maxis, Reference, SimEA};
    use crate::{easy_compress, easy_decompress};

    #[proptest]
    fn reference_symmetrical_read_write(
        #[strategy(vec(0..=1u8, 1..1000))] data: Vec<u8>,
        compression_options: CompressionOptions,
    ) {
        let compressed = easy_compress::<Reference>(&data, compression_options).unwrap();

        let got = easy_decompress::<Reference>(&compressed).unwrap();

        prop_assert_eq!(data, got);
    }

    // excluded from normal runs to avoid massive CI runs, *very* long running
    // tests
    #[proptest]
    #[ignore]
    fn reference_large_symmetrical_read_write(
        #[strategy(vec(0..=1u8, 1..16_000_000))] data: Vec<u8>,
        compression_options: CompressionOptions,
    ) {
        let compressed = easy_compress::<Reference>(&data, compression_options).unwrap();

        let got = easy_decompress::<Reference>(&compressed).unwrap();

        prop_assert_eq!(data, got);
    }

    #[proptest]
    fn maxis_symmetrical_read_write(
        #[strategy(vec(0..=1u8, 1..1000))] data: Vec<u8>,
        compression_options: CompressionOptions,
    ) {
        let compressed = easy_compress::<Maxis>(&data, compression_options).unwrap();

        let got = easy_decompress::<Maxis>(&compressed).unwrap();

        prop_assert_eq!(data, got);
    }

    // excluded from normal runs to avoid massive CI runs, *very* long running
    // tests
    #[proptest]
    #[ignore]
    fn maxis_large_symmetrical_read_write(
        #[strategy(vec(0..=1u8, 1..16_000_000))] data: Vec<u8>,
        compression_options: CompressionOptions,
    ) {
        let compressed = easy_compress::<Maxis>(&data, compression_options).unwrap();

        let got = easy_decompress::<Maxis>(&compressed).unwrap();

        prop_assert_eq!(data, got);
    }

    #[proptest]
    fn simea_symmetrical_read_write(
        // this should include inputs of > 16mb, but testing those inputs is extremely slow
        #[strategy(vec(0..=1u8, 1..1000))] data: Vec<u8>,
        compression_options: CompressionOptions,
    ) {
        let compressed = easy_compress::<SimEA>(&data, compression_options).unwrap();

        let got = easy_decompress::<SimEA>(&compressed).unwrap();

        prop_assert_eq!(data, got);
    }

    // excluded from normal runs to avoid massive CI runs, *very* long running
    // tests
    #[proptest]
    #[ignore]
    fn simea_large_symmetrical_read_write(
        #[strategy(vec(0..=1u8, 1..16_000_000))] data: Vec<u8>,
        compression_options: CompressionOptions,
    ) {
        let compressed = easy_compress::<SimEA>(&data, compression_options).unwrap();

        let got = easy_decompress::<SimEA>(&compressed).unwrap();

        prop_assert_eq!(data, got);
    }

    /// the decoder should not panic while decoding garbage data
    #[proptest]
    fn reference_decompress_garbage(#[strategy(vec(0..=1u8, 1..1000))] data: Vec<u8>) {
        let _ = easy_decompress::<Reference>(&data);
    }

    /// the decoder should not panic while decoding garbage data
    #[proptest]
    fn maxis_decompress_garbage(#[strategy(vec(0..=1u8, 1..1000))] mut data: Vec<u8>) {
        if data.len() >= 6 {
            // set the correct flags
            data[4] = 0x10;
            // set the correct magic
            data[5] = 0xFB;
        }
        let _ = easy_decompress::<Maxis>(&data);
    }

    /// the decoder should not panic while decoding garbage data
    #[proptest]
    fn simea_decompress_garbage(#[strategy(vec(0..=1u8, 1..1000))] mut data: Vec<u8>) {
        if data.len() >= 2 {
            // set the correct flags
            data[0] = 0x00;
            // set the correct magic
            data[1] = 0xFB;
        }
        let _ = easy_decompress::<SimEA>(&data);
    }
}