cbm-dos 0.1.3

A library to decode and encode gcr bytes (4-to-5)
Documentation
# cbm-dos


A small Rust library that implements Commodore-style GCR (Group Code Recording) 4-to-5 encoding and decoding.

It converts 8-bit bytes into a stream of 5-bit codes ("quintuples") and back, using the classic 4-bit-to-5-bit mapping. The crate exposes a minimal API centered around the `GCR` type with `encode` and `decode` operations.

- 4-byte input encodes to 5 bytes of GCR output.
- 5-byte GCR input decodes to 4 bytes of original output.
- Invalid quintuples during decoding cause `decode` to return `None`.

## Status

- Version: 0.1.3
- Rust edition: 2024

## Why GCR (4-to-5)?

Group Code Recording was used on Commodore disk formats, mapping each 4-bit nibble to a 5-bit code that satisfies constraints for magnetic media. This library focuses on that mapping only (bit packing/unpacking and lookup), not on flux-level or disk image handling.

## Features

- Simple, allocation-friendly encoding/decoding routines
- O(1) lookups via precomputed tables
- Deterministic packing to/from 40-bit (5-byte) blocks
- Fully tested round-trip for a known vector

## Installation

Add the dependency to your `Cargo.toml`:

```toml
[dependencies]
cbm-dos = "0.1.3"
```

Then import in your code:

```rust
use cbm_dos::GCR;
```

## Quick start


```rust
use cbm_dos::GCR;

fn main() {
    // Create a GCR encoder/decoder
    let gcr = GCR::new();

    // Example: encode 8 bytes (must be a multiple of 4)
    let data: Vec<u8> = vec![0x08, 0x01, 0x00, 0x01, 0x30, 0x30, 0x00, 0x00];
    let encoded = gcr.encode(&data);
    assert_eq!(encoded, vec![0x52, 0x54, 0xB5, 0x29, 0x4B, 0x9A, 0xA6, 0xA5, 0x29, 0x4A]);

    // Decode back (input length must be a multiple of 5)
    let decoder = GCR::new();
    let decoded = decoder.decode(&encoded).expect("valid GCR");
    assert_eq!(decoded, data);
}
```

## API overview


- `GCR::new() -> GCR`
  - Constructs a new instance with precomputed encode/decode lookup tables.

- `GCR::encode(&self, input: &[u8]) -> Vec<u8>`
  - Encodes the input in chunks of 4 bytes at a time.
  - For each 4-byte chunk (8 nibbles), each nibble is mapped to a 5-bit code and packed into a 40-bit big-endian value, emitted as 5 bytes.
  - If `input.len()` is not a multiple of 4, extra bytes at the end are ignored. You should pad your input if you need exact coverage.

- `GCR::decode(&self, input: &[u8]) -> Option<Vec<u8>>`
  - Decodes the input in chunks of 5 bytes at a time.
  - Each 5-byte chunk is interpreted as a 40-bit big-endian value composed of 8 quintuples; each quintuple maps back to a 4-bit nibble.
  - Returns `None` if any quintuple in any chunk is invalid.

## The mapping


This library uses the canonical 16-entry mapping from 4-bit nibbles to 5-bit GCR codes (shown here as binary):

```
(Encoded -> Decoded nibble)
01010 -> 0x0
01011 -> 0x1
10010 -> 0x2
10011 -> 0x3
01110 -> 0x4
01111 -> 0x5
10110 -> 0x6
10111 -> 0x7
01001 -> 0x8
11001 -> 0x9
11010 -> 0xA
11011 -> 0xB
01101 -> 0xC
11101 -> 0xD
11110 -> 0xE
10101 -> 0xF
```

Internally the crate precomputes two lookup tables for O(1) translation:
- `decode_mappings[32]` indexed by the 5-bit code to obtain the 4-bit nibble (invalid entries are 0xFF)
- `encode_mappings[16]` indexed by the nibble to obtain its 5-bit code

## Input size rules and padding

- Encoding operates on exact 4-byte blocks. If the input length is not a multiple of 4, the trailing bytes are ignored. If you need to process all data, pad to a multiple of 4 and carry the padding information separately.
- Decoding operates on exact 5-byte blocks. If the input length is not a multiple of 5, the trailing bytes are ignored by the chunking iterator and will not be decoded. Provide complete 5-byte blocks.

## Error handling

- `decode` returns `None` if it encounters any 5-bit value that is not a valid GCR code (i.e., it maps to 0xFF in the internal table). This typically means the input stream is corrupted or misaligned.

## Example vectors

The tests in this crate include a round-trip sanity check:

```rust
use cbm_dos::GCR;

fn main() {
    let gcr = GCR::new();
    let encoded: Vec<u8> = vec![0x52, 0x54, 0xB5, 0x29, 0x4B, 0x9A, 0xA6, 0xA5, 0x29, 0x4A];
    let decoded = gcr.decode(&encoded).unwrap();
    assert_eq!(decoded, vec![0x08, 0x01, 0x00, 0x01, 0x30, 0x30, 0x00, 0x00]);

    let gcr2 = GCR::new();
    let reencoded = gcr2.encode(&decoded);
    assert_eq!(reencoded, encoded);
}
```

## Performance and allocation

- `encode` builds the output `Vec<u8>` by pushing 5 bytes per 4 input bytes; pre-sizing is not strictly necessary, but you can reserve capacity if you know the number of blocks.
- `decode` accumulates output and uses a small temporary for nibble packing; invalid codes short-circuit with `None`.

## Safety

This crate is `no_std`-unaware by default (it uses `Vec` from the standard library). It does not use unsafe code. There are no external dependencies.

## Testing

Run the tests:

```bash
cargo test
```

## Limitations and scope

- Only handles the 4-to-5 GCR mapping and 40-bit packing as implemented here.
- Does not include disk flux decoding/encoding, sync marks, sector layout, checksums, or higher-level track/sector handling.

## License

Licensed under either of
- Apache License, Version 2.0
- MIT license
at your option.

The declared license for this crate is "MIT OR Apache-2.0" as specified in Cargo.toml. If license text files are not present in the repository, refer to the standard license texts:
- Apache-2.0: https://www.apache.org/licenses/LICENSE-2.0
- MIT: https://opensource.org/licenses/MIT