mismall 2.0.0 - Docs.rs

# Reference: Debugging Huffman Decoding and Archive Extraction Issues

## Problem Description

The `InvalidData` and `UnexpectedEof` errors seen during archive extraction stemmed from a mismatch between the input that the `decode` function expected and the input that the extraction code in `src/archive.rs` passed to it.

**Original Flaw:**
1.  The `create_archive` function correctly calculates the master header's `compressed_size` to include embedded headers and their payloads.
2.  During extraction (`extract_archive` and `extract_file`), the logic would:
    *   Read an `embedded_header` using `Headers::from_reader(reader)`. This advanced the main `reader` past the embedded header.
    *   Then, it would attempt to reconstruct a stream for the `decode` function by concatenating the `embedded_header.clone().to_bytes()` with the `compressed_data_buf` (read separately). This created a `Cursor` containing `[EMBEDDED_HEADER_BYTES, COMPRESSED_DATA]`.
3.  The `decode` function itself, however, internally called `Headers::from_reader` to parse the header from the *beginning* of its input stream.
4.  As a result, the embedded header was parsed twice: once by `extract_archive`, which left the main reader correctly positioned, and again by `decode`, from the reconstructed `Cursor` with the cloned header prepended. On the second parse, `decode` either misinterpreted compressed data as header fields or skipped part of the real payload. The visible symptoms were parsing failures: `UnexpectedEof` (because `decode` consumed bytes it should not have) or `InvalidData`.
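
The general failure mode, a header being parsed a second time from a reader that has already moved past it, can be reproduced in a few lines. The sketch below is illustrative only: `ToyHeader`, its 4-byte layout, and `decode_parsing_header` are hypothetical stand-ins, not the crate's `Headers` type or its `decode` function.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical header: a single little-endian u32 giving the payload length.
struct ToyHeader {
    payload_len: u32,
}

impl ToyHeader {
    fn from_reader<R: Read>(reader: &mut R) -> io::Result<Self> {
        let mut buf = [0u8; 4];
        reader.read_exact(&mut buf)?;
        Ok(Self { payload_len: u32::from_le_bytes(buf) })
    }
}

/// Old-style decoder shape (assumed): it parses the header itself.
fn decode_parsing_header<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let header = ToyHeader::from_reader(reader)?;
    let mut payload = vec![0u8; header.payload_len as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}

/// Parses the header in the caller, then lets the decoder parse it again.
fn double_parse_demo() -> io::ErrorKind {
    // Stream layout: header (payload_len = 3) followed by the payload "abc".
    let mut stream = Cursor::new(vec![3, 0, 0, 0, b'a', b'b', b'c']);

    // First parse: the caller consumes the header, leaving the reader at "abc".
    let _header = ToyHeader::from_reader(&mut stream).unwrap();

    // Second parse: the decoder tries to read a 4-byte header from the
    // 3 remaining payload bytes, so `read_exact` fails.
    decode_parsing_header(&mut stream).unwrap_err().kind()
}
```

With this toy layout the second parse runs out of bytes and reports `UnexpectedEof`. With a longer payload it would instead read payload bytes as header fields, which is the kind of misinterpretation that can surface as `InvalidData`.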

## Proposed and Implemented Fix

The core of the fix was to refactor the `decode` function so that it accepts an already-parsed `Headers` object and expects the provided `reader` to be positioned at the start of the compressed payload.

### Changes in `src/huffman/decoder.rs`

*   **Modified `decode` function signature:**
    ```rust
    pub fn decode<R: Read, W: Write>(
        header: Headers, // Now accepts an already parsed Headers object
        reader: &mut R,
        decrypt_password: Option<&str>,
        writer: &mut W,
    ) -> Result<DecodeInfo, Box<dyn std::error::Error>>
    ```
*   **Removed internal header parsing:** The line `let header = Headers::from_reader(reader)?;` was removed from within the `decode` function.
*   **Corrected payload reader initialization:** For unencrypted data, the `payload_reader` is now initialized using `reader.take(header.compressed_size)`, which limits it to the `compressed_size` bytes of payload that follow the header and prevents it from reading past them.
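
As a rough illustration of the bounded-reader idea, the sketch below limits a decoder to exactly the number of payload bytes named in an already-parsed header. `EntryHeader` and `decode_payload` are hypothetical names, the "decoding" is a plain copy, and `by_ref()` is used here only so the caller keeps ownership of the reader; none of this is taken from the crate's actual implementation.

```rust
use std::io::{self, Cursor, Read, Write};

/// Hypothetical stand-in for an already-parsed header.
struct EntryHeader {
    compressed_size: u64,
}

/// Copies exactly `compressed_size` bytes from `reader` to `writer`.
/// A real decoder would decompress here instead of copying.
fn decode_payload<R: Read, W: Write>(
    header: &EntryHeader,
    reader: &mut R,
    writer: &mut W,
) -> io::Result<u64> {
    // `take` caps how many bytes the payload reader may consume.
    let mut payload_reader = reader.by_ref().take(header.compressed_size);
    let copied = io::copy(&mut payload_reader, writer)?;
    if copied != header.compressed_size {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "payload shorter than the header claims",
        ));
    }
    Ok(copied)
}

/// Decodes a 3-byte payload and returns it together with the untouched rest.
fn take_demo() -> (Vec<u8>, Vec<u8>) {
    let mut stream = Cursor::new(b"abcde".to_vec());
    let mut first = Vec::new();
    decode_payload(&EntryHeader { compressed_size: 3 }, &mut stream, &mut first).unwrap();

    // Whatever follows the payload is still available to the caller.
    let mut rest = Vec::new();
    stream.read_to_end(&mut rest).unwrap();
    (first, rest)
}
```

Because the limit is enforced by the adaptor rather than by the decoder's own bookkeeping, bytes belonging to a following archive entry stay unread even if the decoder asks for more.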

### Changes in `src/archive.rs`

*   **Updated `extract_archive` function:**
    *   The erroneous logic of creating a `full_file_bytes` buffer by cloning the header and prepending it to compressed data was removed.
    *   The `decode` function call was updated to directly pass the `embedded_header` (already parsed) and the main `reader` (which is correctly positioned at the start of the compressed payload after the `embedded_header` was read) as arguments:
        ```rust
        let _decode_info = decode(
            embedded_header, // Pass the already parsed header
            reader,          // Pass the main reader, which is correctly positioned
            decrypt_password,
            &mut output_file,
        ).map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e.to_string()))?;
        ```
*   **Updated `extract_file` function:**
    *   Applied the same fix as `extract_archive` to its `decode` call.
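
The extraction pattern described above, parsing each embedded header exactly once and then consuming its payload from the same reader, might look like the following loop. The 1-byte length header and the `read_entries` function are assumptions made for this sketch and do not reflect the crate's archive format.

```rust
use std::io::{self, Read};

/// Reads `entry_count` entries from one continuous stream.
/// Hypothetical entry layout: a 1-byte payload length, then the payload.
fn read_entries<R: Read>(reader: &mut R, entry_count: usize) -> io::Result<Vec<Vec<u8>>> {
    let mut entries = Vec::with_capacity(entry_count);
    for _ in 0..entry_count {
        // Parse the per-entry header exactly once.
        let mut len = [0u8; 1];
        reader.read_exact(&mut len)?;

        // The reader is now positioned at the payload; consume exactly
        // that many bytes so the next iteration starts at the next header.
        let mut payload = vec![0u8; len[0] as usize];
        reader.read_exact(&mut payload)?;
        entries.push(payload);
    }
    Ok(entries)
}
```

No intermediate buffer is rebuilt from the header and the payload, so there is no second copy of the header for a decoder to stumble over.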

### Changes in `src/tests/roundtrip_test.rs`

*   **Added `use crate::headers::Headers;` import.**
*   **`roundtrip_file_compression` test:**
    *   Before calling `decode`, the test now explicitly reads the header from the `encoded_cursor` using `Headers::from_reader`.
    *   The `decode` call passes this read `header` and the `encoded_cursor` (now positioned at the payload).
    *   Renamed `encoded_output_buffer` to `compressed_output_buffer`, and `decompressed_output_buffer` to `decoded_output_buffer`, for consistency with the test's definitions.
*   **`test_encrypted_file_roundtrip` test:**
    *   Similarly, the test now explicitly reads the header before each `decode` call (for correct password, wrong password, and no password scenarios).
    *   The `decode` calls pass the respective `header` objects and the `Cursor`s.
    *   Renamed variables such as `decrypted_output_buffer_correct` to `decoded_output_buffer`.

### Changes in `src/cli.rs`

*   **`Decompress` command (single file decompression path):**
    *   Inside the `else` block for single file decompression, a new `single_file_header` is read from `input_file` using `Headers::from_reader(&mut input_file)`.
    *   This `single_file_header` is then passed as the first argument to the `decode` function, ensuring `input_file` is correctly positioned for `decode` to read the compressed data.

## Remaining Considerations / Known Limitations

*   **Checksum Verification in CLI:** The streaming `decode` function writes directly to an output file, so the CLI cannot compute a checksum of the decompressed content after decompression without re-reading the entire output file. For now, the CLI relies on the `decode_info.checksum` value returned by `decode` and assumes that `decode` performs any required checksum verification internally. A more robust way to verify streamed output in the CLI would be either to have `decode` return the decompressed data as a `Vec<u8>` (which defeats the purpose of streaming) or to add a separate checksum verification pass.
*   **Unused Imports and Variables Warnings:** Several `unused import` and `unused variable` warnings remain (`padding_bits`, `embedded_header_start_pos`, `MOD_ADLER`, `serialize_tree`). They are non-critical, but each should be resolved by either using or removing the item in question.
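
For the checksum limitation, one further option that the notes above do not mention is to wrap the output writer so that a checksum is accumulated while the data streams through it. The sketch below assumes an Adler-32 checksum, a guess based only on the `MOD_ADLER` name; `ChecksumWriter` is a hypothetical type, not part of the crate.

```rust
use std::io::{self, Write};

/// Largest prime below 2^16, the modulus used by Adler-32.
const MOD_ADLER: u32 = 65_521;

/// Hypothetical writer wrapper that computes Adler-32 over everything
/// the inner writer accepts.
struct ChecksumWriter<W: Write> {
    inner: W,
    a: u32,
    b: u32,
}

impl<W: Write> ChecksumWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, a: 1, b: 0 }
    }

    /// Checksum of all bytes written so far.
    fn checksum(&self) -> u32 {
        (self.b << 16) | self.a
    }

    fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> Write for ChecksumWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let written = self.inner.write(buf)?;
        // Only hash the bytes the inner writer actually accepted.
        for &byte in &buf[..written] {
            self.a = (self.a + u32::from(byte)) % MOD_ADLER;
            self.b = (self.b + self.a) % MOD_ADLER;
        }
        Ok(written)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

A caller could pass `&mut ChecksumWriter::new(output_file)` wherever a `&mut W` is expected and compare `checksum()` against the expected value afterwards, which keeps decompression streaming and avoids a second read of the output file.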