
The twobit crate provides an efficient 2bit file reader, implemented in pure Rust.

Brief overview

This crate is inspired by py2bit and aims to offer similar functionality with no C dependencies, no external crate dependencies, and high performance. It follows version 0 of the 2bit specification.

The primary type in this crate is TwoBitFile, a wrapper around a generic IO reader that provides access to 2bit reading routines. Resulting nucleotide sequences are returned as String, but can also be handled as bytes, since they are guaranteed to be pure ASCII.
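Because the returned sequences are pure ASCII, each byte corresponds to exactly one base, so byte-level processing is safe. The following is a minimal sketch of that idea using only the standard library; the helper names `gc_content` and `reverse_complement` are hypothetical and are not part of this crate.

```rust
/// Fraction of G/C bases in a nucleotide sequence, computed over raw bytes.
/// Since the sequence is pure ASCII, every byte is exactly one base.
fn gc_content(seq: &str) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let gc = seq
        .bytes()
        .filter(|b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
        .count();
    gc as f64 / seq.len() as f64
}

/// Reverse complement computed directly on the byte buffer.
/// Assumes ASCII input (as the crate guarantees for its sequences), so the
/// conversion back to `String` cannot fail.
fn reverse_complement(seq: String) -> String {
    let mut bytes = seq.into_bytes();
    bytes.reverse();
    for b in bytes.iter_mut() {
        *b = match *b {
            b'A' => b'T',
            b'T' => b'A',
            b'C' => b'G',
            b'G' => b'C',
            b'a' => b't',
            b't' => b'a',
            b'c' => b'g',
            b'g' => b'c',
            other => other, // N (and anything else) is left untouched
        };
    }
    String::from_utf8(bytes).expect("ASCII input stays valid UTF-8")
}
```

Either helper could be applied to the String returned by read_sequence without any re-encoding step.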

Possible errors are described by the Error type. Most methods return their values wrapped in Result, since reading can fail with IO or format errors.
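To illustrate how IO and format errors can arise side by side, here is a rough sketch that validates the 16-byte 2bit header (signature 0x1A412743, version, sequence count, reserved word). The names `HeaderError`, `HeaderResult` and `read_header` are hypothetical and exist only for this example; they are not this crate's actual Error, Result, or parsing code.

```rust
use std::fmt;
use std::io::{self, Read};

/// Hypothetical error type for this sketch (not the crate's `Error`).
#[derive(Debug)]
enum HeaderError {
    Io(io::Error),           // the underlying reader failed or ended early
    BadSignature(u32),       // format error: not a 2bit file
    UnsupportedVersion(u32), // format error: only version 0 is handled
}

impl fmt::Display for HeaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HeaderError::Io(e) => write!(f, "IO error: {e}"),
            HeaderError::BadSignature(s) => write!(f, "bad signature: {s:#010x}"),
            HeaderError::UnsupportedVersion(v) => write!(f, "unsupported version: {v}"),
        }
    }
}

impl std::error::Error for HeaderError {}

impl From<io::Error> for HeaderError {
    fn from(e: io::Error) -> Self {
        HeaderError::Io(e)
    }
}

/// Hypothetical alias, analogous in spirit to a crate-level `Result`.
type HeaderResult<T> = Result<T, HeaderError>;

const SIGNATURE: u32 = 0x1A41_2743;

/// Reads the header and returns `(sequence_count, is_big_endian)`.
fn read_header<R: Read>(mut reader: R) -> HeaderResult<(u32, bool)> {
    let mut buf = [0u8; 16];
    reader.read_exact(&mut buf)?; // `?` converts io::Error into HeaderError::Io
    let raw = |i: usize| [buf[4 * i], buf[4 * i + 1], buf[4 * i + 2], buf[4 * i + 3]];
    // The signature doubles as a byte-order marker.
    let big_endian = match u32::from_le_bytes(raw(0)) {
        SIGNATURE => false,
        other if other.swap_bytes() == SIGNATURE => true,
        other => return Err(HeaderError::BadSignature(other)),
    };
    let word = |i: usize| {
        if big_endian {
            u32::from_be_bytes(raw(i))
        } else {
            u32::from_le_bytes(raw(i))
        }
    };
    let version = word(1);
    if version != 0 {
        return Err(HeaderError::UnsupportedVersion(version));
    }
    Ok((word(2), big_endian))
}
```

A truncated input surfaces as the IO variant, while a wrong signature or version surfaces as a format variant, which is the distinction the paragraph above describes.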

Examples

use twobit::TwoBitFile;

let mut tb = TwoBitFile::open("assets/foo.2bit")?;
assert_eq!(tb.chrom_names(), &["chr1", "chr2"]);
assert_eq!(tb.chrom_sizes(), &[150, 100]);
let expected_seq = "NNACGTACGTACGTAGCTAGCTGATC";
assert_eq!(tb.read_sequence("chr1", 48..74)?, expected_seq);

All sequence-related methods expect a range argument; pass .. (the unbounded range) to query the entire sequence:

assert_eq!(tb.read_sequence("chr1", ..)?.len(), 150);
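One common way to accept both bounded and unbounded ranges in a single method is to be generic over the standard library's RangeBounds trait. The sketch below shows how such an argument could be resolved into concrete start and end positions. The function `resolve_range` is hypothetical, and clamping out-of-bounds values to the sequence length is an assumption of this sketch rather than documented behaviour of the crate.

```rust
use std::ops::{Bound, Range, RangeBounds};

/// Resolves any range type (`a..b`, `a..`, `..b`, `..=b`, `..`) into a
/// concrete half-open range, clamped to a sequence of length `len`.
fn resolve_range(range: impl RangeBounds<usize>, len: usize) -> Range<usize> {
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s.saturating_add(1),
        Bound::Unbounded => 0, // no lower bound: start of the sequence
    };
    let end = match range.end_bound() {
        Bound::Included(&e) => e.saturating_add(1),
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len, // no upper bound: end of the sequence
    };
    // Clamp so that start <= end <= len always holds (sketch assumption).
    let end = end.min(len);
    start.min(end)..end
}
```

With this helper, `resolve_range(.., 150)` yields `0..150`, matching the full-sequence query shown above.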

Files can be fully cached in memory in order to provide fast random access and avoid any IO operations when decoding:

let mut tb_mem = TwoBitFile::open_and_read("assets/foo.2bit")?;
let expected_seq = tb.read_sequence("chr1", ..)?;
assert_eq!(tb_mem.read_sequence("chr1", ..)?, expected_seq);
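The general idea behind in-memory caching can be sketched with the standard library alone: a reader that is generic over Read + Seek works equally well with a file on disk or with a Cursor over a byte buffer. The type `BlockReader` and its methods below are hypothetical and only illustrate the pattern; they do not describe how open_and_read() is actually implemented.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical reader wrapper, generic over any `Read + Seek` source.
struct BlockReader<R: Read + Seek> {
    inner: R,
}

impl<R: Read + Seek> BlockReader<R> {
    fn new(inner: R) -> Self {
        Self { inner }
    }

    /// Reads exactly `len` bytes starting at byte `offset`.
    fn read_at(&mut self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        self.inner.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        self.inner.read_exact(&mut buf)?;
        Ok(buf)
    }

    /// Drains the whole source into memory and returns a reader backed by
    /// that buffer, so later reads involve no further IO on the source.
    fn into_memory(mut self) -> io::Result<BlockReader<Cursor<Vec<u8>>>> {
        self.inner.seek(SeekFrom::Start(0))?;
        let mut data = Vec::new();
        self.inner.read_to_end(&mut data)?;
        Ok(BlockReader::new(Cursor::new(data)))
    }
}
```

The same `read_at` code path serves both variants; only the type parameter changes, for example `std::fs::File` for on-disk access and `Cursor<Vec<u8>>` for the cached copy.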

2bit files offer two types of masks: N masks (aka hard masks) for unknown or arbitrary nucleotides, and soft masks for lower-case nucleotides (e.g. “t” instead of “T”).

Hard masks are always enabled; soft masks are disabled by default, but can be enabled manually:

let mut tb_soft = tb.enable_softmask(true);
let expected_seq = "NNACGTACGTACGTagctagctGATC";
assert_eq!(tb_soft.read_sequence("chr1", 48..74)?, expected_seq);
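To make the two mask types concrete, here is a minimal sketch of decoding packed 2bit data and then applying both masks. It relies on the 2bit packing convention (T = 00, C = 01, A = 10, G = 11, four bases per byte, first base in the most significant bits). The function `decode` and its parameters are hypothetical, and the handling of overlapping masks (N blocks are filled first, so a soft-masked N becomes "n") is a choice made for this sketch, not a statement about this crate's behaviour.

```rust
use std::ops::Range;

/// 2bit packing: T = 00, C = 01, A = 10, G = 11.
const BASES: [u8; 4] = [b'T', b'C', b'A', b'G'];

/// Decodes `len` bases from `packed`, then applies hard (N) and soft masks.
/// `packed` must hold at least `len` bases, i.e. ceil(len / 4) bytes.
fn decode(
    packed: &[u8],
    len: usize,
    n_blocks: &[Range<usize>],
    soft_blocks: &[Range<usize>],
    softmask: bool,
) -> String {
    // Unpack: base i lives in byte i / 4, highest-order pair first.
    let mut seq: Vec<u8> = (0..len)
        .map(|i| {
            let shift = 6 - 2 * (i % 4);
            BASES[((packed[i / 4] >> shift) & 0b11) as usize]
        })
        .collect();
    // Hard masks are always applied: unknown bases become 'N'.
    for block in n_blocks {
        let end = block.end.min(len);
        let start = block.start.min(end);
        seq[start..end].fill(b'N');
    }
    // Soft masks are optional: masked bases become lower-case.
    if softmask {
        for block in soft_blocks {
            let end = block.end.min(len);
            let start = block.start.min(end);
            seq[start..end].make_ascii_lowercase();
        }
    }
    String::from_utf8(seq).expect("decoded bases are ASCII")
}
```

For instance, the bytes `[0x9C, 0x90]` unpack to "ACGTAC" when six bases are requested and no masks are given.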

Modules

Generate 2bit files from other file formats

Type-safe representations of nucleotide sequences

Structs

Number or percentage of bases of each type in a sequence.

Information on a particular chromosome or sequence in a 2bit file.

2bit file reader, a wrapper around Read + Seek.

Summary information about a 2bit file.

Enums

An error that may occur while reading and parsing a 2bit file.

Type Definitions

A type alias for TwoBitFile with arbitrary boxed reader.

A type alias for Result<T, twobit::Error>.

A type alias for TwoBitFile<_> returned by open_and_read().

A type alias for TwoBitFile<_> returned by open().