Crate dryice

High-throughput transient container for read-like genomic records.

dryice is a block-oriented temporary storage format optimized for workflows where sequencing records need to move to disk and back quickly, especially external sorting, partitioning, and other out-of-core genomics pipelines.

The crate is parser-agnostic: any type implementing SeqRecordLike can be written into a dryice file, and records are read back as borrowed slices with no per-record allocation. Sequence, quality, and name encodings are selected via trait-based codec type parameters, and users can implement their own codecs.
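
To make "borrowed slices with no per-record allocation" concrete, here is a minimal sketch of a block that keeps each field in one contiguous buffer and hands records back as slices into it. `SketchBlock` and its methods are hypothetical names chosen for illustration; this is not dryice's on-disk layout or its API.

```rust
/// Hypothetical in-memory block: each field lives in one contiguous buffer,
/// and `ends` records where every record's slice stops.
#[derive(Default)]
struct SketchBlock {
    names: Vec<u8>,
    sequences: Vec<u8>,
    qualities: Vec<u8>,
    // Cumulative end offsets per record: (name_end, seq_end, qual_end).
    ends: Vec<(usize, usize, usize)>,
}

impl SketchBlock {
    /// Append one record by copying its fields into the shared buffers.
    fn push(&mut self, name: &[u8], sequence: &[u8], quality: &[u8]) {
        self.names.extend_from_slice(name);
        self.sequences.extend_from_slice(sequence);
        self.qualities.extend_from_slice(quality);
        self.ends
            .push((self.names.len(), self.sequences.len(), self.qualities.len()));
    }

    /// Borrow record `i` as three slices into the block; nothing is allocated.
    fn get(&self, i: usize) -> Option<(&[u8], &[u8], &[u8])> {
        let &(n_end, s_end, q_end) = self.ends.get(i)?;
        // A record starts where the previous one ended.
        let (n_start, s_start, q_start) = if i == 0 {
            (0, 0, 0)
        } else {
            self.ends[i - 1]
        };
        Some((
            &self.names[n_start..n_end],
            &self.sequences[s_start..s_end],
            &self.qualities[q_start..q_end],
        ))
    }
}
```

The cost of a read in this sketch is three slice operations, which is the property the paragraph above describes; the real format may organize its blocks differently.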

§Writing records (default codecs)

use dryice::{DryIceWriter, SeqRecord, SeqRecordLike};

let mut buf = Vec::new();
let mut writer = DryIceWriter::builder()
    .inner(&mut buf)
    .build();

let record = SeqRecord::new(
    b"read1".to_vec(),
    b"ACGTACGT".to_vec(),
    b"!!!!!!!!".to_vec(),
)?;
writer.write_record(&record)?;
writer.finish()?;

§Writing with compact codecs

use dryice::{DryIceWriter, SeqRecord};

let mut buf = Vec::new();
let mut writer = DryIceWriter::builder()
    .inner(&mut buf)
    .two_bit_exact()
    .binned_quality()
    .split_names()
    .target_block_records(4096)
    .build();

let record = SeqRecord::new(
    b"instrument:run:flowcell 1:N:0:ATCACG".to_vec(),
    b"ACGTACGT".to_vec(),
    b"!!!!!!!!".to_vec(),
)?;
writer.write_record(&record)?;
writer.finish()?;
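
For intuition about what an exact 2-bit encoding with an ambiguity sideband involves, the following is a standalone sketch. The function names, the bit order, and the `(position, byte)` exception list are assumptions made for illustration; the source does not describe how `TwoBitExactCodec` lays out its data.

```rust
/// Pack bases 2 bits each (A=0, C=1, G=2, T=3). Any other byte is stored as
/// A in the packed stream and remembered as a (position, byte) exception.
fn pack_two_bit(sequence: &[u8]) -> (Vec<u8>, Vec<(usize, u8)>) {
    let mut packed = vec![0u8; (sequence.len() + 3) / 4];
    let mut exceptions = Vec::new();
    for (i, &base) in sequence.iter().enumerate() {
        let code = match base {
            b'A' => 0u8,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            other => {
                exceptions.push((i, other));
                0
            }
        };
        // Four bases per byte, lowest bits first.
        packed[i / 4] |= code << ((i % 4) * 2);
    }
    (packed, exceptions)
}

/// Unpack `len` bases, then patch the exceptions back in to restore the
/// original bytes exactly.
fn unpack_two_bit(packed: &[u8], exceptions: &[(usize, u8)], len: usize) -> Vec<u8> {
    let mut out: Vec<u8> = (0..len)
        .map(|i| b"ACGT"[((packed[i / 4] >> ((i % 4) * 2)) & 0b11) as usize])
        .collect();
    for &(pos, byte) in exceptions {
        out[pos] = byte;
    }
    out
}
```

When ambiguous bases are rare, the exception list stays short, so the encoded size approaches a quarter of the raw sequence while the round trip remains lossless.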

§Writing with record keys

use dryice::{Bytes8Key, DryIceWriter, SeqRecord};

let mut buf = Vec::new();
let mut writer = DryIceWriter::builder()
    .inner(&mut buf)
    .bytes8_key()
    .build();

let record = SeqRecord::new(
    b"read1".to_vec(),
    b"ACGTACGT".to_vec(),
    b"!!!!!!!!".to_vec(),
)?;
let key = Bytes8Key(*b"sortkey!");
writer.write_record_with_key(&record, &key)?;
writer.finish()?;
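
The source does not state how keys are compared. Assuming fixed-width keys are ordered byte by byte, a numeric sort key should be written big-endian so that byte order and numeric order agree. The helpers below (`be_key`, `sort_by_key_bytes`) are hypothetical and use plain `[u8; 8]` arrays rather than `Bytes8Key`.

```rust
/// Build an 8-byte key whose byte-wise order matches the numeric order of `value`.
fn be_key(value: u64) -> [u8; 8] {
    value.to_be_bytes()
}

/// Sort payloads by their 8-byte keys using plain byte-wise comparison.
fn sort_by_key_bytes<T>(mut keyed: Vec<([u8; 8], T)>) -> Vec<T> {
    keyed.sort_by(|a, b| a.0.cmp(&b.0));
    keyed.into_iter().map(|(_, payload)| payload).collect()
}
```

With little-endian bytes the same comparison would put 256 before 255, which is why the sketch uses `to_be_bytes`.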

§Writing key-only files with empty payload

use dryice::{Bytes16Key, DryIceWriter};

let mut buf = Vec::new();
let mut writer = DryIceWriter::builder()
    .inner(&mut buf)
    .bytes16_key()
    .empty_payload()
    .build();

writer.write_key_only(&Bytes16Key(*b"0000000000000001"))?;
writer.write_key_only(&Bytes16Key(*b"0000000000000002"))?;
writer.finish()?;

§Writing minimizer keys with the builder conveniences

use dryice::{DefaultMinimizer64, DryIceWriter, SeqRecord};

let mut buf = Vec::new();
let mut writer = DryIceWriter::builder()
    .inner(&mut buf)
    .minimizers_with_sequences()
    .build();

let record = SeqRecord::new(
    b"read1".to_vec(),
    b"ACGTGCTCAGAGACTCAGAGGATTACAGTTTACGTGCTCAGAGACTCAGAGGA".to_vec(),
    vec![b'!'; 53],
)?;

if let Some(key) = DefaultMinimizer64::try_from_sequence(record.sequence())? {
    writer.write_record_with_key(&record, &key)?;
}

writer.finish()?;
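
The `if let Some(key)` guard above suggests that some sequences yield no minimizer. As a rough illustration of why, here is a self-contained sketch that picks the lexicographically smallest k-mer of a read and returns `None` when no unambiguous k-mer exists. The ordering, the choice of `k`, and the function names are assumptions; `DefaultMinimizer64` may use a different k-mer length, a hash-based ordering, or canonical k-mers.

```rust
/// Encode one base as 2 bits; `None` for anything outside A/C/G/T.
fn base_code(base: u8) -> Option<u64> {
    match base {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Return the smallest 2-bit packed k-mer in `sequence`, or `None` if the
/// sequence has no run of `k` unambiguous bases.
fn lexicographic_minimizer(sequence: &[u8], k: usize) -> Option<u64> {
    assert!(k >= 1 && k <= 32, "k must fit in a 2-bit packed u64");
    let mask = if k == 32 { u64::MAX } else { (1u64 << (2 * k)) - 1 };
    let mut kmer = 0u64;
    let mut valid = 0usize; // consecutive unambiguous bases seen so far
    let mut best: Option<u64> = None;
    for &base in sequence {
        match base_code(base) {
            Some(code) => {
                // Slide the window: shift in the new base, drop the oldest.
                kmer = ((kmer << 2) | code) & mask;
                valid += 1;
            }
            None => {
                // An ambiguous base invalidates every k-mer that spans it.
                kmer = 0;
                valid = 0;
            }
        }
        if valid >= k {
            best = Some(best.map_or(kmer, |b| b.min(kmer)));
        }
    }
    best
}
```

Reads that share a minimizer tend to share sequence content, which is what makes such keys useful for partitioning records before further processing.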

§Reading records (zero-copy)

use dryice::{DryIceReader, DryIceWriter, SeqRecord, SeqRecordLike};

let mut buf = Vec::new();
let mut writer = DryIceWriter::builder().inner(&mut buf).build();
let record = SeqRecord::new(
    b"r1".to_vec(), b"ACGT".to_vec(), b"!!!!".to_vec()
)?;
writer.write_record(&record)?;
writer.finish()?;

let mut reader = DryIceReader::new(buf.as_slice())?;
while reader.next_record()? {
    let _name = reader.name();
    let _seq = reader.sequence();
    let _qual = reader.quality();
}

§Reading keys directly

use dryice::{
    Bytes16Key, DryIceReader, DryIceWriter, OmittedNameCodec, OmittedQualityCodec,
    OmittedSequenceCodec,
};

let mut buf = Vec::new();
let mut writer = DryIceWriter::builder()
    .inner(&mut buf)
    .bytes16_key()
    .empty_payload()
    .build();
writer.write_key_only(&Bytes16Key(*b"0000000000000001"))?;
writer.finish()?;

let mut reader = DryIceReader::builder()
    .inner(buf.as_slice())
    .sequence_codec::<OmittedSequenceCodec>()
    .quality_codec::<OmittedQualityCodec>()
    .name_codec::<OmittedNameCodec>()
    .record_key::<Bytes16Key>()
    .build()?;

while let Some(key) = reader.next_key()? {
    let _ = key;
}
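
One plausible consumer of a key stream like this is the merge phase of an external sort. The sketch below merges several already-sorted runs of 16-byte keys with a binary heap; it works on in-memory vectors of `[u8; 16]` and does not use the dryice reader, so `merge_sorted_runs` is an illustrative helper rather than part of the crate.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge already-sorted runs of 16-byte keys into one sorted sequence.
fn merge_sorted_runs(runs: Vec<Vec<[u8; 16]>>) -> Vec<[u8; 16]> {
    let mut iters: Vec<_> = runs.into_iter().map(|run| run.into_iter()).collect();
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first key of each non-empty run.
    for (run_idx, it) in iters.iter_mut().enumerate() {
        if let Some(key) = it.next() {
            heap.push(Reverse((key, run_idx)));
        }
    }
    let mut merged = Vec::new();
    // Repeatedly pop the smallest key and refill from the run it came from.
    while let Some(Reverse((key, run_idx))) = heap.pop() {
        merged.push(key);
        if let Some(next) = iters[run_idx].next() {
            heap.push(Reverse((next, run_idx)));
        }
    }
    merged
}
```

In a real pipeline each run would be a separate file read incrementally, so only one key per run needs to be resident at a time.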

§Reading records (convenience iterator)

use dryice::{DryIceReader, DryIceWriter, SeqRecord};

let mut buf = Vec::new();
let mut writer = DryIceWriter::builder().inner(&mut buf).build();
let record = SeqRecord::new(
    b"r1".to_vec(), b"ACGT".to_vec(), b"!!!!".to_vec()
)?;
writer.write_record(&record)?;
writer.finish()?;

let reader = DryIceReader::new(buf.as_slice())?;
for record in reader.into_records() {
    let record = record?;
    println!("{}", record);
}

§Zero-copy reader-to-writer piping

use dryice::{DryIceReader, DryIceWriter, SeqRecord, SeqRecordLike};

let mut buf1 = Vec::new();
let mut writer1 = DryIceWriter::builder().inner(&mut buf1).build();
let record = SeqRecord::new(
    b"r1".to_vec(), b"ACGT".to_vec(), b"!!!!".to_vec()
)?;
writer1.write_record(&record)?;
writer1.finish()?;

let mut buf2 = Vec::new();
let mut reader = DryIceReader::new(buf1.as_slice())?;
let mut writer2 = DryIceWriter::builder().inner(&mut buf2).build();
while reader.next_record()? {
    writer2.write_record(&reader)?;
}
writer2.finish()?;

§Temporary file lifecycle

For filesystem-backed intermediate data, prefer letting dryice create and own the temporary file. TempDryIceFile composes with the normal stream-oriented reader and writer APIs, but removes the backing file by default when the guard is cleaned up or dropped.

use std::io::{Seek, SeekFrom};

use dryice::{DryIceReader, DryIceWriter, SeqRecord, TempDryIceFile};

let temp = TempDryIceFile::new()?;

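// Write through a handle opened on the temp file; the value of `finish()`
// becomes `file`, so the same handle can be reused for reading.
let mut file = {
    let file = temp.open()?;
    let mut writer = DryIceWriter::builder().inner(file).build();
    let record = SeqRecord::new(b"r1".to_vec(), b"ACGT".to_vec(), b"!!!!".to_vec())?;
    writer.write_record(&record)?;
    writer.finish()?
};

// Rewind to the start before reading back what was just written.
file.seek(SeekFrom::Start(0))?;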
let mut reader = DryIceReader::new(file)?;
while reader.next_record()? {
    // use the current record
}

temp.cleanup()?;
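
The remove-on-drop behavior described in this section is a standard RAII pattern. The following is a minimal standard-library sketch of such a guard; `TempGuard` is a hypothetical type written for illustration and says nothing about how `TempDryIceFile` is implemented.

```rust
use std::fs::{self, File};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard: owns a path and removes the file when dropped.
struct TempGuard {
    path: PathBuf,
}

impl TempGuard {
    /// Create an empty file with the given name under the system temp directory.
    fn create(file_name: &str) -> io::Result<Self> {
        let path = std::env::temp_dir().join(file_name);
        File::create(&path)?;
        Ok(Self { path })
    }

    /// Path of the backing file, valid for as long as the guard is alive.
    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempGuard {
    fn drop(&mut self) {
        // Drop cannot return an error, so a failed removal is ignored here.
        let _ = fs::remove_file(&self.path);
    }
}
```

Because `Drop` has to swallow errors, an explicit cleanup call is the only way to observe a failed removal, which is one reason to offer both paths.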

§Reading with non-default codecs

use dryice::{
    BinnedQualityCodec, DryIceReader, DryIceWriter, SeqRecord,
    SeqRecordLike, SplitNameCodec, TwoBitExactCodec,
};

let mut buf = Vec::new();
let mut writer = DryIceWriter::builder()
    .inner(&mut buf)
    .two_bit_exact()
    .binned_quality()
    .split_names()
    .build();
let record = SeqRecord::new(
    b"instrument:run 1:N:0".to_vec(),
    b"ACGT".to_vec(),
    b"!!!!".to_vec(),
)?;
writer.write_record(&record)?;
writer.finish()?;

let mut reader = DryIceReader::with_codecs::<
    TwoBitExactCodec,
    BinnedQualityCodec,
    SplitNameCodec,
>(buf.as_slice())?;
while reader.next_record()? {
    let _seq = reader.sequence();
}
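
`SplitNameCodec` is documented as splitting on the first space and storing both parts with a length prefix for exact reconstruction. A standalone sketch of that idea follows. The byte layout (a flag byte, a little-endian `u32` identifier length, then the two parts) and the function names are assumptions made for illustration and are not the codec's actual format.

```rust
/// Encode as: [has_space: u8][id_len: u32 LE][identifier][description].
/// The flag distinguishes "no space" from "space followed by nothing".
fn encode_split_name(name: &[u8], out: &mut Vec<u8>) {
    let (id, desc, has_space) = match name.iter().position(|&b| b == b' ') {
        Some(i) => (&name[..i], &name[i + 1..], 1u8),
        None => (name, &name[name.len()..], 0u8),
    };
    out.push(has_space);
    out.extend_from_slice(&(id.len() as u32).to_le_bytes());
    out.extend_from_slice(id);
    out.extend_from_slice(desc);
}

/// Rebuild the original bytes; returns `None` if the input is truncated.
fn decode_split_name(encoded: &[u8]) -> Option<Vec<u8>> {
    let (&has_space, rest) = encoded.split_first()?;
    if rest.len() < 4 {
        return None;
    }
    let id_len = u32::from_le_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
    let body = &rest[4..];
    let id = body.get(..id_len)?;
    let desc = &body[id_len..];
    let mut name = id.to_vec();
    if has_space == 1 {
        name.push(b' ');
    }
    name.extend_from_slice(desc);
    Some(name)
}
```

Keeping identifiers and descriptions apart is mainly useful when a later stage compresses or compares them separately; the sketch only shows that the split can be undone byte for byte.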

§Custom codec implementation

use dryice::{DryIceError, SequenceCodec};

struct UppercaseCodec;

impl SequenceCodec for UppercaseCodec {
    const TYPE_TAG: [u8; 16] = *b"demo:seq:upper!!";
    const LOSSY: bool = true;

    fn encode_into(sequence: &[u8], output: &mut Vec<u8>) -> Result<(), DryIceError> {
        output.extend(sequence.iter().map(u8::to_ascii_uppercase));
        Ok(())
    }

    fn decode_into(
        encoded: &[u8],
        _original_len: usize,
        output: &mut Vec<u8>,
    ) -> Result<(), DryIceError> {
        output.extend_from_slice(encoded);
        Ok(())
    }
}
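
To see what `LOSSY = true` means for a codec like the one above, it can help to check whether an encode/decode round trip returns the input unchanged. The sketch below uses a local stand-in trait, `SketchSequenceCodec`, that copies only the method shapes from the example and swaps `DryIceError` for `String` so that it compiles without the crate; the stand-in and the `round_trips` helper are hypothetical.

```rust
/// Local stand-in mirroring the method shapes of the example above.
trait SketchSequenceCodec {
    const LOSSY: bool;
    fn encode_into(sequence: &[u8], output: &mut Vec<u8>) -> Result<(), String>;
    fn decode_into(
        encoded: &[u8],
        original_len: usize,
        output: &mut Vec<u8>,
    ) -> Result<(), String>;
}

struct Uppercase;

impl SketchSequenceCodec for Uppercase {
    const LOSSY: bool = true;

    fn encode_into(sequence: &[u8], output: &mut Vec<u8>) -> Result<(), String> {
        output.extend(sequence.iter().map(u8::to_ascii_uppercase));
        Ok(())
    }

    fn decode_into(
        encoded: &[u8],
        _original_len: usize,
        output: &mut Vec<u8>,
    ) -> Result<(), String> {
        output.extend_from_slice(encoded);
        Ok(())
    }
}

/// Encode then decode, and report whether the input came back unchanged.
fn round_trips<C: SketchSequenceCodec>(sequence: &[u8]) -> Result<bool, String> {
    let mut encoded = Vec::new();
    C::encode_into(sequence, &mut encoded)?;
    let mut decoded = Vec::new();
    C::decode_into(&encoded, sequence.len(), &mut decoded)?;
    Ok(decoded == sequence)
}
```

Uppercase input survives the round trip while soft-masked (lowercase) input does not, which is the information loss the flag declares.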

Re-exports§

pub use config::BlockLayoutOptions;
pub use config::BlockSizePolicy;
pub use config::DryIceWriterOptions;
pub use key::Bytes8Key;
pub use key::Bytes16Key;
pub use key::KmerKey;
pub use key::Minimizer64;
pub use key::NoRecordKey;
pub use key::PrefixKmer64;
pub use key::RecordKey;
pub use temp::TempDryIceFile;

Modules§

config
Writer and reader configuration types.
fields
Field markers and selection-expression scaffolding for selected reads.
key
Record-key types and traits.
temp
Owned temporary files for filesystem-backed dryice workflows.

Structs§

BinnedQualityCodec
Illumina-style 8-level quality score binning.
DryIceReader
Reads sequencing records from a dryice file.
DryIceRecords
Iterator over records in a dryice file, yielding owned SeqRecord values.
DryIceWriter
Writes sequencing records into the dryice block-oriented format.
EmptyRecord
A zero-payload read-like record with empty name, sequence, and quality.
OmittedNameCodec
Omits names entirely. Encodes to an empty section and decodes to OmittedName.
OmittedQualityCodec
An omitted quality codec that produces and expects empty quality sections.
OmittedSequenceCodec
Omitted sequence storage.
RawAsciiCodec
Raw ASCII sequence storage. No transformation — fastest possible encode and decode, largest on-disk footprint.
RawNameCodec
Raw name storage. No transformation.
RawQualityCodec
Raw quality score storage. No transformation.
SelectedDryIceReader
Reader type returned when a field selection is specified on the builder.
SelectedRecord
Borrowed current-record view returned by a selected reader.
SeqRecord
An owned, row-wise sequencing record.
SplitNameCodec
Split name codec. Splits on the first space into identifier and description, storing both with a length prefix for exact reconstruction.
TwoBitExactCodec
Exact 2-bit sequence encoding with sparse ambiguity sideband.
TwoBitLossyNCodec
Lossy 2-bit sequence encoding that collapses all ambiguous bases to N.
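
As background for the "Illumina-style 8-level quality score binning" entry above, here is a sketch using the bin boundaries and representative values commonly cited from Illumina's documentation. Whether `BinnedQualityCodec` uses exactly this table is an assumption, as is the treatment of scores below 2 and the Phred+33 ASCII offset; the function names are hypothetical.

```rust
/// Map a Phred score to the representative value of its bin.
fn bin_phred(q: u8) -> u8 {
    match q {
        0..=1 => 0, // treated here as the "no call" level (assumption)
        2..=9 => 6,
        10..=19 => 15,
        20..=24 => 22,
        25..=29 => 27,
        30..=34 => 33,
        35..=39 => 37,
        _ => 40,
    }
}

/// Apply the binning to a Phred+33 ASCII quality string.
fn bin_quality_ascii(quality: &[u8]) -> Vec<u8> {
    quality
        .iter()
        .map(|&c| bin_phred(c.saturating_sub(33)) + 33)
        .collect()
}
```

Collapsing scores to eight distinct values is lossy, but it makes quality strings far more repetitive, which is what allows a compact encoding.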

Enums§

DryIceError
Top-level error type for all dryice operations.

Constants§

EMPTY_RECORD
Shared empty record value for key-only writes and tests.

Traits§

NameCodec
A name encoding strategy for dryice blocks.
QualityCodec
A quality score encoding strategy for dryice blocks.
SeqRecordExt
Extension trait providing convenience methods for any SeqRecordLike implementor.
SeqRecordLike
A read-like sequencing record with name, sequence, and quality fields.
SequenceCodec
A sequence encoding strategy for dryice blocks.

Type Aliases§

DefaultMinimizer64
DefaultPrefixKmer64