# The PlasCAD file format


This document describes the PlasCAD file format: a compact way of storing DNA sequences, features, primers, metadata,
and related information. It's a binary format divided into discrete packets, and uses the `.pcad` file extension.
Code implementing the format can be found in [pcad.rs](src/file_io/pcad.rs).

Most packet payloads are serialized with the [Bincode](https://docs.rs/bincode/latest/bincode/) library. This is convenient for this program's
purposes, but makes interoperability with other software more challenging.

Byte order is big endian.


## Components

The first two bytes of a PlasCAD file are always `0xca`, `0xfe`.

The remaining bytes are divided into adjacent packets, which may appear in any order.
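A reader can check this two-byte signature before looking for packets. The sketch below is a minimal illustration; `strip_magic` is a hypothetical helper name, not one taken from pcad.rs.

```rust
/// Signature that opens every `.pcad` file.
const MAGIC: [u8; 2] = [0xca, 0xfe];

/// Checks the file signature and returns the bytes after it, which hold
/// the packets. Returns `None` if the signature is missing.
fn strip_magic(file: &[u8]) -> Option<&[u8]> {
    file.strip_prefix(&MAGIC[..])
}
```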


## Packet structure

- **Byte 0**: Always `0x11`
- **Bytes 1-4**: A 32-bit unsigned integer of payload size, in bytes.
- **Byte 5**: An 8-bit unsigned integer that indicates the packet's type. (See the packet type sections below for this mapping.)
- **Bytes 6-end**: The payload, whose length in bytes is given by bytes 1-4. How it is encoded depends on the packet type.
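The packet layout above can be read and written as in this minimal, standard-library-only sketch. The names `RawPacket`, `parse_packets`, and `write_packet` are hypothetical and not taken from pcad.rs. It assumes packets sit back to back with no padding, and reads the payload size as big endian, following the byte-order note above.

```rust
/// Every packet header starts with this byte.
const PACKET_START: u8 = 0x11;
/// Header length: start byte (1) + payload size (4) + packet type (1).
const HEADER_LEN: usize = 6;

/// One packet, borrowing its payload from the input buffer.
#[derive(Debug, PartialEq)]
struct RawPacket<'a> {
    packet_type: u8,
    payload: &'a [u8],
}

/// Splits the bytes after the file signature into packets.
/// Returns an error if a header is malformed or a payload is truncated.
fn parse_packets<'a>(mut bytes: &'a [u8]) -> Result<Vec<RawPacket<'a>>, String> {
    let mut packets = Vec::new();
    while !bytes.is_empty() {
        if bytes.len() < HEADER_LEN {
            return Err("truncated packet header".to_string());
        }
        if bytes[0] != PACKET_START {
            return Err(format!("expected 0x11, found {:#04x}", bytes[0]));
        }
        // Bytes 1-4: payload size, big endian.
        let size = u32::from_be_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]) as usize;
        // Byte 5: packet type.
        let packet_type = bytes[5];
        let end = HEADER_LEN + size;
        if bytes.len() < end {
            return Err("truncated packet payload".to_string());
        }
        packets.push(RawPacket { packet_type, payload: &bytes[HEADER_LEN..end] });
        // The next packet starts immediately after this payload.
        bytes = &bytes[end..];
    }
    Ok(packets)
}

/// Builds one packet: start byte, big-endian payload size, type byte, payload.
fn write_packet(packet_type: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.push(PACKET_START);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.push(packet_type);
    out.extend_from_slice(payload);
    out
}
```

Because the parser only looks at the header, it handles every packet type the same way; interpreting each payload is left to type-specific code.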

## Packet types


### Sequence: 0

Contains a DNA sequence.

- **Bytes 0-3**: A 32-bit unsigned integer of sequence length, in nucleotides.
- **Remaining bytes**: The nucleotides, two bits each (four per byte). This packet is always a whole number of bytes; bits in the final byte that
would extend past the sequence length are ignored.

Bit assignments for nucleotides are as follows:

- **T**: `0b00`
- **C**: `0b01`
- **A**: `0b10`
- **G**: `0b11`

This is the same nucleotide mapping as the [.2bit format](http://genome.ucsc.edu/FAQ/FAQformat.html#format7).
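The packing order follows the example in the next subsection: the first nucleotide of each group of four occupies the two most significant bits. This minimal sketch encodes a sequence packet payload (without the 6-byte packet header). `nt_to_bits` and `encode_sequence` are hypothetical names, and rejecting characters other than uppercase T, C, A, and G is an assumption, since two bits leave no room for ambiguity codes such as `N`.

```rust
/// 2-bit code for a nucleotide, per the table above.
/// Returns `None` for anything other than T, C, A, G.
fn nt_to_bits(nt: u8) -> Option<u8> {
    match nt {
        b'T' => Some(0b00),
        b'C' => Some(0b01),
        b'A' => Some(0b10),
        b'G' => Some(0b11),
        _ => None,
    }
}

/// Encodes a sequence payload: a big-endian u32 length, then nucleotides
/// packed 4 per byte with the first nucleotide in the high bits.
fn encode_sequence(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(4 + (seq.len() + 3) / 4);
    out.extend_from_slice(&(seq.len() as u32).to_be_bytes());
    for chunk in seq.chunks(4) {
        let mut byte = 0u8;
        for (i, &nt) in chunk.iter().enumerate() {
            // Position 0 goes in bits 7-6, position 3 in bits 1-0.
            byte |= nt_to_bits(nt)? << (6 - 2 * i);
        }
        // A short final chunk leaves its low bits as 0 fill.
        out.push(byte);
    }
    Some(out)
}
```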

#### An example

Consider the sequence `CTGATTTCTG`. It serializes as follows, using 7 bytes in total:

- **Bytes 0-3**: `[0, 0, 0, 10]`, indicating a sequence length of 10 nucleotides.

Three more bytes encode the sequence, since each byte holds 4 nucleotides:
- **Byte 4**: `CTGA` `0b01_00_11_10`
- **Byte 5**: `TTTC` `0b00_00_00_01`
- **Byte 6**: `TG` `0b00_11_00_00`

In the final byte, note the zero fill on the right; a reader does not decode these bits as `T`s, because the
sequence length says only two nucleotides remain.
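Decoding reverses the process and uses the length prefix to discard the fill bits. A minimal sketch, with a hypothetical `decode_sequence` function that returns `None` when the payload is shorter than its stated length:

```rust
/// Decodes a sequence payload back into nucleotides.
/// Bits past the stated length are ignored.
fn decode_sequence(payload: &[u8]) -> Option<Vec<u8>> {
    // Indexed by 2-bit code: 0b00 = T, 0b01 = C, 0b10 = A, 0b11 = G.
    const NTS: [u8; 4] = [b'T', b'C', b'A', b'G'];
    if payload.len() < 4 {
        return None;
    }
    // Bytes 0-3: sequence length, big endian.
    let len = u32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]) as usize;
    let packed = &payload[4..];
    // Each packed byte holds at most 4 nucleotides.
    if packed.len() * 4 < len {
        return None;
    }
    let seq: Vec<u8> = (0..len)
        .map(|i| {
            // Nucleotide i sits in byte i / 4, high bits first.
            let shift = 6 - 2 * (i % 4);
            NTS[((packed[i / 4] >> shift) & 0b11) as usize]
        })
        .collect();
    Some(seq)
}
```

Applied to the 7 bytes above, it returns `CTGATTTCTG`; the two trailing `0b00` pairs in byte 6 are never read.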


## Packets, with their associated integer type


### Features: 1

A bincode serialization of a `Vec<Feature>`

### Primers: 2

A bincode serialization of a `Vec<Primer>`

### Metadata: 3

A bincode serialization of a `Metadata`

### IonConcentrations: 6

A bincode serialization of an `IonConcentrations`

### Portions: 7

A bincode serialization of a `Portions`

### PathLoaded: 10

A bincode serialization of an `Option<PathBuf>`

### Topology: 11

A bincode serialization of a `Topology`
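When dispatching on byte 5 of the packet header, the integer types listed in this document can be mapped to an enum. This is a sketch only: the name `PacketType` and its methods are hypothetical, and this document does not say how PlasCAD handles codes it does not list (such as 4, 5, 8, and 9), so the sketch simply returns `None` for them. It does not decode the Bincode payloads themselves.

```rust
/// Packet type codes listed in this document (byte 5 of each packet header).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum PacketType {
    Sequence = 0,
    Features = 1,
    Primers = 2,
    Metadata = 3,
    IonConcentrations = 6,
    Portions = 7,
    PathLoaded = 10,
    Topology = 11,
}

impl PacketType {
    /// Maps a type byte to a listed packet type; unlisted codes give `None`.
    fn from_byte(b: u8) -> Option<PacketType> {
        match b {
            0 => Some(PacketType::Sequence),
            1 => Some(PacketType::Features),
            2 => Some(PacketType::Primers),
            3 => Some(PacketType::Metadata),
            6 => Some(PacketType::IonConcentrations),
            7 => Some(PacketType::Portions),
            10 => Some(PacketType::PathLoaded),
            11 => Some(PacketType::Topology),
            _ => None,
        }
    }

    /// Returns the type byte to write into the packet header.
    fn to_byte(self) -> u8 {
        self as u8
    }
}
```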