Crate ecmlib

A simple library to encode or decode CD-ROM sectors into smaller and more compressible streams in a lossless way. It works by removing known data like:

  • Error Correction Code: This data is used to correct the sector in case of damage.
  • Error Detection Code: This data is used to detect defects in the sector.
  • Sync: The Sync data is used by the readers to detect the start point of the sector. It is always the same.
  • Address: Every sector has its own address. This address is predictable. *1
  • Mode: Sectors can be grouped into blocks of sectors of the same mode. *2
  • Flags: The Flags are used in Mode2 XA sectors and are duplicated, so one copy can be removed.

*1: The Address can be easily determined knowing the sector number. The first sector starts at the address 00:02:00.

*2: Once removed, the Mode cannot be recovered from the rest of the sector data, so it must be provided at decoding time. The encoder can still remove it because the mode can be stored elsewhere in a more compact form. For example, if all the sectors on a 700 MB disc (360,000 sectors) use the same type, you can store the mode in a single byte and save 359,999 bytes in the encoded stream.
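One way to store the mode compactly is to run-length encode the per-sector index. The following is only a sketch of that idea; `rle_encode` and `rle_decode` are hypothetical helpers and are not part of ecmlib:

```rust
/// Run-length encode a per-sector index of mode bytes as (count, value) runs.
/// Hypothetical helper for illustration; ecmlib does not provide it.
fn rle_encode(index: &[u8]) -> Vec<(u32, u8)> {
    let mut runs: Vec<(u32, u8)> = Vec::new();
    for &value in index {
        // Extend the current run when the value repeats.
        if let Some(run) = runs.last_mut() {
            if run.1 == value {
                run.0 += 1;
                continue;
            }
        }
        // Otherwise start a new run.
        runs.push((1, value));
    }
    runs
}

/// Expand the runs back into one mode byte per sector.
fn rle_decode(runs: &[(u32, u8)]) -> Vec<u8> {
    runs.iter()
        .flat_map(|&(count, value)| std::iter::repeat(value).take(count as usize))
        .collect()
}
```

With this scheme, a disc whose 360,000 sectors all share one type collapses to a single run.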

The ECC and EDC data look essentially random and hurt the compressibility of the data. Compression improves when those fields are removed from the sector and regenerated when needed.
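To illustrate why the EDC can be regenerated, here is a minimal sketch of the CD-ROM EDC as it is commonly implemented in ECM-style tools: a reflected CRC-32 with polynomial 0xD8018001, zero initial value and no final XOR. It is an assumption that ecmlib computes it the same way internally; `edc_table` and `compute_edc` are illustrative names, not the crate's API.

```rust
/// Build the 256-entry lookup table for the CD-ROM EDC
/// (reflected CRC-32, polynomial 0xD8018001).
fn edc_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    for (i, entry) in table.iter_mut().enumerate() {
        let mut edc = i as u32;
        for _ in 0..8 {
            edc = (edc >> 1) ^ if edc & 1 != 0 { 0xD801_8001 } else { 0 };
        }
        *entry = edc;
    }
    table
}

/// Compute the EDC over `data`, starting from zero and without a final XOR.
fn compute_edc(table: &[u32; 256], data: &[u8]) -> u32 {
    data.iter().fold(0u32, |edc, &byte| {
        (edc >> 8) ^ table[((edc ^ u32::from(byte)) & 0xFF) as usize]
    })
}
```

For a MODE1 sector the checksum would be computed over the bytes that precede the EDC field (sync, address, mode and data). Since it is a pure function of those bytes, a compliant sector loses nothing when the field is dropped.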

NOTE: This library will not work with every kind of disc image. Some ISO images contain only the user data, without the extra sector information, and non-CD-ROM images tend to contain only the data as well. For example, a DVD image file is the raw data without any ECC, EDC or headers, and the same applies to other disc types such as UMD images.
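A quick sanity check before encoding can catch most unsupported images. The sketch below is a heuristic, not something ecmlib provides: `looks_like_raw_cdrom` is a hypothetical helper that checks the image size against the 2352-byte sector size and looks for the Sync pattern at the start of the first sector. Audio-only (CDDA) images carry no Sync data, so they would fail the second check.

```rust
/// Size of a raw CD-ROM sector in bytes.
const RAW_SECTOR_SIZE: u64 = 2352;

/// Sync pattern found at the start of every MODE1/MODE2 sector.
const SYNC: [u8; 12] = [
    0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00,
];

/// Heuristic: the image length is a non-zero multiple of the raw sector size
/// and the first sector begins with the Sync pattern.
fn looks_like_raw_cdrom(image_len: u64, first_sector: &[u8]) -> bool {
    image_len != 0 && image_len % RAW_SECTOR_SIZE == 0 && first_sector.starts_with(&SYNC)
}
```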

§How to use it

First we need to add the crate to the Cargo.toml file:

[dependencies]
ecmlib = "1.0.0"

With the crate imported, we can use the library as follows:

use ecmlib::{Decoder, Encoder, Optimizations, SectorType};
use std::fs::OpenOptions;
use std::io::{BufReader, BufWriter, Error, ErrorKind, Read, Result, Seek, Write};

const SECTOR_SIZE: usize = 2352;

fn main() -> Result<()> {
    env_logger::init();

    // Input file and buffer
    let input_path = "tests/data/mode2_xa1.bin";
    let input_file = OpenOptions::new().read(true).open(input_path)?;
    let input_metadata = input_file.metadata()?;
    let mut input_reader = BufReader::new(input_file);

    // Encoded files and buffers
    let encoded_path = "encoded.bin";
    let encoded_file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(encoded_path)?;
    let mut encoded_writer = BufWriter::new(&encoded_file);
    let mut encoded_reader = BufReader::new(&encoded_file);
    // Index file: stores the optimizations byte followed by one sector type per sector
    let encoded_path_idx = "encoded.bin.idx";
    let mut encoded_file_idx = OpenOptions::new()
        .write(true)
        .create(true)
        .open(encoded_path_idx)?;

    // Decoded file and buffer
    let decoded_path = "decoded.bin";
    let decoded_file = OpenOptions::new()
        .write(true)
        .create(true)
        .open(decoded_path)?;
    let mut decoded_writer = BufWriter::new(&decoded_file);

    // Other settings and variables
    let mut optimizations = Optimizations::all();
    // The first sector starts at MSF 00:02:00 -> 150
    let mut sector_number = 150;
    // Buffer used to send the data to the encoder/decoder
    let mut sector_buffer = [0u8; SECTOR_SIZE];
    // Vector to store the index. The first byte holds the optimizations.
    let mut sectors_index: Vec<u8> = vec![optimizations.bits()];

    // Initialize the encoder and the decoder
    let mut encoder = Encoder::new(optimizations);
    let mut decoder = Decoder::new();

    // Check that the size is multiple of a sector size (a correct CD-ROM)
    if input_metadata.len() % SECTOR_SIZE as u64 != 0 {
        eprintln!("The input file doesn't seem to be a CD-ROM image.");
        return Err(Error::new(ErrorKind::InvalidInput, "Incorrect Size"));
    }

    // First pass to determine the right optimizations for the image.
    // Some images are not compliant and require some optimizations to be disabled.
    // To be sure, it's useful to check this first.
    loop {
        match input_reader.read_exact(&mut sector_buffer) {
            Ok(()) => {
                // Determine the sector type
                let sector_type = encoder.detect_sector_type(&sector_buffer).unwrap();
                // Append the sector type to the index
                sectors_index.push(sector_type as u8);
                // Check the optimizations
                let correct_optimizations = encoder.check_optimizations(
                    &sector_buffer,
                    sector_number,
                    sector_type,
                    optimizations,
                );
                // Update the optimizations if the check result doesn't match the current optimizations.
                if correct_optimizations != optimizations {
                    optimizations = correct_optimizations;
                    encoder.set_optimizations(optimizations);
                    sectors_index[0] = optimizations.bits();
                }

                sector_number += 1;
            }
            Err(ref e) if e.kind() == ErrorKind::UnexpectedEof => {
                break;
            }
            Err(e) => {
                eprintln!(
                    "There was an error reading the sector {}: {}",
                    sector_number, e
                );
                return Err(Error::new(ErrorKind::InvalidInput, "Read error"));
            }
        }
    }

    println!("Checked {} sectors.", sector_number - 150);

    // Reset the reader buffer position and the sector_number
    input_reader.rewind()?;
    sector_number = 150;

    // Write the index file
    // write_all guarantees that the whole index is written (write may stop early)
    encoded_file_idx.write_all(&sectors_index)?;

    // Second pass to encode the file
    for sector_type in &sectors_index[1..] {
        let converted_sector_type = SectorType::from(*sector_type);

        match input_reader.read_exact(&mut sector_buffer) {
            Ok(()) => {
                // Optimize the sector with the detected optimizations
                let processed_sector = encoder
                    .encode_sector(
                        &sector_buffer,
                        sector_number,
                        Some(converted_sector_type),
                        true,
                    )
                    .unwrap();

                // Write the processed sector into the encoded file
                encoded_writer.write_all(&processed_sector)?;

                sector_number += 1;
            }
            Err(ref e) if e.kind() == ErrorKind::UnexpectedEof => {
                break;
            }
            Err(e) => {
                eprintln!(
                    "There was an error reading the sector {}: {}",
                    sector_number, e
                );
                return Err(Error::new(ErrorKind::InvalidInput, "Read error"));
            }
        }
    }

    println!("Encoded {} sectors.", sector_number - 150);

    // Flush the encoded file
    encoded_writer.flush()?;
    // Rewind the buffer
    encoded_writer.rewind()?;
    // Reset the sector number
    sector_number = 150;

    // Decode the encoded data into the decoded output
    for sector_type in &sectors_index[1..] {
        let converted_sector_type = SectorType::from(*sector_type);
        let encoded_size = decoder.get_encoded_size(converted_sector_type, optimizations);

        // Read the required bytes
        match encoded_reader.read_exact(&mut sector_buffer[..encoded_size]) {
            Ok(()) => {
                // Decode the sector
                let processed_sector = decoder
                    .decode_sector(
                        &sector_buffer[..encoded_size],
                        converted_sector_type,
                        sector_number,
                        optimizations,
                    )
                    .unwrap();

                // Write the processed sector into the decoded file
                decoded_writer.write_all(&processed_sector)?;

                sector_number += 1;
            }
            Err(ref e) if e.kind() == ErrorKind::UnexpectedEof => {
                break;
            }
            Err(e) => {
                eprintln!(
                    "There was an error reading the sector {}: {}",
                    sector_number, e
                );
                return Err(Error::new(ErrorKind::InvalidInput, "Read error"));
            }
        }
    }

    // Flush the decoded buffer
    decoded_writer.flush()?;
    println!(
        "Decoding finished correctly. Processed sectors: {}",
        sector_number - 150
    );


    Ok(())
}

§Important Notes

Some CD-ROMs contain non-compliant sectors as a copy-protection method; PSX games are one example. It is important to test the applicable optimizations on every sector, or you will never be able to recover the original sector. The library runs the tests that determine the optimizations (unless you force it not to) and reports the optimizations it last used through the method get_last_used_optimizations. This is useful, for example, to store the optimizations used alongside every sector in the index, or to perform a first pass that determines the optimizations applicable to the whole image. Which approach to choose depends on the balance between how many bytes you want to save and how complex you want the decoding to be ;)
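The whole-image approach can be pictured as keeping only the optimizations that are safe for every sector. The sketch below assumes the per-sector optimizations are available as raw bit flags (the usage example obtains such a byte through `bits()`), and that a set bit means "this optimization is safe for the sector". The helper `common_optimizations` is hypothetical and the meaning of each bit is not specified here.

```rust
/// Intersect the per-sector optimization flags, keeping only the bits that
/// are set for every sector. Hypothetical helper, not part of ecmlib.
fn common_optimizations(per_sector_flags: &[u8]) -> u8 {
    per_sector_flags
        .iter()
        .fold(u8::MAX, |common, &flags| common & flags)
}
```

Storing a single byte for the whole image is the most compact option, but one non-compliant sector then disables that optimization everywhere. A per-sector index costs more bytes and keeps the savings on the compliant sectors.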

§Sector types

§CDDA

A CDDA sector is just raw data that cannot be removed. This kind of sector yields a 0% space saving unless the sector is a GAP (fully zeroed), in which case the reduction is 100%.

-----------------------------------------------------
       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000h [---DATA...
...
0920h                                     ...DATA---]
-----------------------------------------------------
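Detecting a GAP as described above only requires checking that every byte is zero. A minimal sketch, where `is_gap` is an illustrative helper rather than an ecmlib function:

```rust
/// Returns true when the slice is fully zeroed (a GAP).
fn is_gap(data: &[u8]) -> bool {
    data.iter().all(|&byte| byte == 0)
}
```

The same check can be applied to the data area of the other sector types.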

§MODE1

A MODE1 sector contains:

  • Sync Data: 12 bytes
  • Address: 3 bytes
  • Mode: 1 byte
  • Data: 2048 bytes
  • EDC: 4 bytes
  • GAP: 8 bytes
  • ECC: 276 bytes

This sector can be reduced by 304 bytes (12.92%) by keeping only the data, and by 100% when the data is a GAP or the full sector is a GAP (zeroed data, EDC & ECC).

-----------------------------------------------------
       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000h 00 FF FF FF FF FF FF FF FF FF FF 00 [-MSF -] 01
0010h [---DATA...
...
0800h                                     ...DATA---]
0810h [---EDC---] 00 00 00 00 00 00 00 00 [---ECC...
...
0920h                                      ...ECC---]
-----------------------------------------------------
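The layout above maps directly to byte ranges inside the 2352-byte sector. The following sketch splits a MODE1 sector into its fields using the offsets from the diagram; `Mode1Fields` and `split_mode1` are hypothetical names used only for illustration.

```rust
/// Borrowed view over the fields of a raw MODE1 sector.
struct Mode1Fields<'a> {
    sync: &'a [u8],    // 12 bytes
    address: &'a [u8], // 3 bytes (MSF)
    mode: u8,          // 1 byte
    data: &'a [u8],    // 2048 bytes
    edc: &'a [u8],     // 4 bytes
    gap: &'a [u8],     // 8 bytes
    ecc: &'a [u8],     // 276 bytes
}

/// Split a raw MODE1 sector using the offsets shown in the diagram.
fn split_mode1(sector: &[u8; 2352]) -> Mode1Fields<'_> {
    Mode1Fields {
        sync: &sector[0x000..0x00C],
        address: &sector[0x00C..0x00F],
        mode: sector[0x00F],
        data: &sector[0x010..0x810],
        edc: &sector[0x810..0x814],
        gap: &sector[0x814..0x81C],
        ecc: &sector[0x81C..0x930],
    }
}
```

Everything except `data` adds up to the 304 bytes that can be removed.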

§MODE2

A MODE2 sector contains:

  • Sync Data: 12 bytes
  • Address: 3 bytes
  • Mode: 1 byte
  • Data: 2336 bytes

This sector can be reduced by only 16 bytes (0.68%), and GAP data can be reduced by up to 100% too. Luckily this sector type is not widely used because it is unreliable (it doesn’t contain any ECC or EDC).

-----------------------------------------------------
       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000h 00 FF FF FF FF FF FF FF FF FF FF 00 [-MSF -] 02
0010h [---DATA...
...
0920h                                     ...DATA---]
-----------------------------------------------------

§MODE2 XA1

This sector is similar to a MODE2 sector but with EDC and ECC data. The distribution is the following:

  • Sync Data: 12 bytes
  • Address: 3 bytes
  • Mode: 1 byte
  • Flags (2 copies): 8 bytes
  • Data: 2048 bytes
  • EDC: 4 bytes
  • ECC: 276 bytes

This sector can be reduced by 300 bytes (12.75%), and in the case of a GAP sector only 4 bytes are required (one copy of the flags).

-----------------------------------------------------
       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000h 00 FF FF FF FF FF FF FF FF FF FF 00 [-MSF -] 02
0010h [--FLAGS--] [--FLAGS--] [---DATA...
...
0810h             ...DATA---] [---EDC---] [---ECC...
...
0920h                                      ...ECC---]
-----------------------------------------------------
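Removing one copy of the Flags is only lossless when both copies really match, which a non-compliant sector may violate. A minimal sketch of that check, using the offsets from the diagram (`xa_flags_are_duplicated` is a hypothetical helper, not an ecmlib function):

```rust
/// Returns true when the two 4-byte copies of the XA flags
/// (offsets 0x10..0x14 and 0x14..0x18) are identical.
fn xa_flags_are_duplicated(sector: &[u8]) -> bool {
    sector.len() >= 0x18 && sector[0x10..0x14] == sector[0x14..0x18]
}
```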

§MODE2 XA2

This sector is like a MODE2 XA1 sector but without the ECC data. This leaves more room for data but is less reliable. In this case the distribution is the following:

  • Sync Data: 12 bytes
  • Address: 3 bytes
  • Mode: 1 byte
  • Flags (2 copies): 8 bytes
  • Data: 2324 bytes
  • EDC: 4 bytes

This sector can be reduced by 24 bytes (1%), and, as with XA1, a GAP sector requires only 4 bytes (one copy of the flags).

-----------------------------------------------------
       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000h 00 FF FF FF FF FF FF FF FF FF FF 00 [-MSF -] 02
0010h [--FLAGS--] [--FLAGS--] [---DATA...
...
0920h                         ...DATA---] [---EDC---]
-----------------------------------------------------
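XA1 and XA2 sectors share the same header and mode byte, so they have to be told apart through the Flags. The sketch below assumes that XA1 and XA2 correspond to CD-ROM XA Form 1 and Form 2, and that the third flags byte (the submode, at offset 0x12) carries the form in bit 5. It does not claim to be how ecmlib detects the sector type; `XaForm` and `xa_form` are illustrative names.

```rust
/// The two MODE2 XA sector forms (XA1 and XA2 in this document).
#[derive(Debug, PartialEq)]
enum XaForm {
    Form1,
    Form2,
}

/// Read the form from bit 5 of the submode byte at offset 0x12.
/// Returns None when the slice is too short to contain the flags.
fn xa_form(sector: &[u8]) -> Option<XaForm> {
    let submode = *sector.get(0x12)?;
    Some(if submode & 0x20 != 0 {
        XaForm::Form2
    } else {
        XaForm::Form1
    })
}
```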

§Address Notes

The Address is expressed in MSF (Minutes, Seconds and Frames), encoded in BCD (Binary‑Coded Decimal). The first sector starts at 00:02:00 (150 frames of pregap).

  • A minute is 60 seconds
  • A second is 75 frames

Every frame is a sector, so an 80-minute disc contains 360,000 sectors.
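The conversion from a sector number to the BCD-encoded MSF address can be sketched as follows. It assumes the sector number already includes the 150-frame pregap, as in the usage example where the counter starts at 150, and that the address stays below 100 minutes. `to_bcd` and `sector_to_msf_bcd` are illustrative helpers, not part of the crate's API.

```rust
/// Encode a value in the range 0..=99 as one BCD byte.
fn to_bcd(value: u8) -> u8 {
    ((value / 10) << 4) | (value % 10)
}

/// Convert an absolute sector number (pregap included) into the three
/// BCD bytes of the MSF address: minutes, seconds, frames.
fn sector_to_msf_bcd(sector_number: u32) -> [u8; 3] {
    let frames = (sector_number % 75) as u8;
    let seconds = ((sector_number / 75) % 60) as u8;
    let minutes = (sector_number / (75 * 60)) as u8;
    [to_bcd(minutes), to_bcd(seconds), to_bcd(frames)]
}
```

Sector 150 maps to 00:02:00, matching the address of the first sector.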

Structs§

Decoder
Decoder implementation able to recover the original sector data by regenerating the removed data.
Encoder
Encoder implementation that optimizes CD-ROM sector data to reduce its size. This “compression” is a lossless encoding.
Optimizations
These flags must be passed to the decoder to tell it which optimizations were applied to the sector. As with the SectorType, it is important to store this data together with the sector to be able to decode it later.

Enums§

SectorType
Enum to identify the different sector types. This data is required for decoding, so it must be stored with the output stream in some way (an index, a header…).