# twobit
Efficient 2bit file reader, implemented in pure Rust.

[CI](https://github.com/jbethune/rust-twobit/actions?query=branch%3Amaster)
[crates.io](https://crates.io/crates/twobit)
[docs.rs](https://docs.rs/twobit)
[License: MIT](https://opensource.org/licenses/MIT)

The [2bit file format](http://genome.ucsc.edu/FAQ/FAQformat.html#format7) is
used to store genomic sequences on disk. Because it keeps an index of sequence
offsets and packs bases at a fixed two bits each, it allows fast access to
specific parts of a genome without scanning the whole file.

This crate is inspired by [py2bit](https://github.com/deeptools/py2bit) and aims to
offer similar functionality with no C dependencies, no external crate dependencies,
and high performance. It follows
[version 0 of the 2bit specification](http://genome.ucsc.edu/FAQ/FAQformat.html#format7).

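
Much of the format's compactness comes from its packing scheme: the UCSC spec stores each base in two bits (T = 00, C = 01, A = 10, G = 11), four bases per byte, with the first base in the most significant bits; runs of N and lower-case bases are stored separately as blocks. The sketch below decodes packed bytes under exactly those rules. `decode_byte` and `decode_packed` are illustrative helpers, not part of the twobit API.

```rust
/// Decode the four bases packed into one byte of 2bit sequence data.
/// Encoding per the UCSC spec: T=00, C=01, A=10, G=11,
/// with the first base in the two most significant bits.
fn decode_byte(byte: u8) -> [char; 4] {
    const BASES: [char; 4] = ['T', 'C', 'A', 'G'];
    let mut out = ['T'; 4];
    for i in 0..4 {
        // Base i sits at bit offset 6, 4, 2, 0 (from most to least significant).
        let shift = 6 - 2 * i;
        out[i] = BASES[((byte >> shift) & 0b11) as usize];
    }
    out
}

/// Decode the first `len` bases from a packed buffer (4 bases per byte).
/// Padding bases in the last byte are dropped by `take(len)`.
fn decode_packed(packed: &[u8], len: usize) -> String {
    packed
        .iter()
        .flat_map(|&b| decode_byte(b))
        .take(len)
        .collect()
}

fn main() {
    // 0b10_01_11_00 encodes A, C, G, T
    let seq = decode_packed(&[0b1001_1100, 0b1001_1100], 6);
    println!("{seq}");
}
```

Because every base has the same width, the byte holding base `i` of a sequence is simply `i / 4` bytes into its packed data, which is what makes region queries cheap.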
## Examples
```rust
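use twobit::TwoBitFile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut tb = TwoBitFile::open("assets/foo.2bit")?;
    // Names and lengths of all sequences stored in the file
    assert_eq!(tb.chrom_names(), &["chr1", "chr2"]);
    assert_eq!(tb.chrom_sizes(), &[150, 100]);
    // Bases 48..74 of chr1 (0-based, end-exclusive like any Rust range)
    let expected_seq = "NNACGTACGTACGTAGCTAGCTGATC";
    assert_eq!(tb.read_sequence("chr1", 48..74)?, expected_seq);
    Ok(())
}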
```
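
The sequence names and sizes come from the file itself. As a rough illustration of where names live, the sketch below reads the header and index laid out in the UCSC spec: a 16-byte header (signature `0x1A412743`, version, sequence count, reserved) followed by one entry per sequence (a 1-byte name length, the name, and a 4-byte offset to the sequence record). It assumes a little-endian, version-0 file already loaded into memory; `read_index` is a hypothetical helper, not twobit's implementation, and it does not read sequence sizes, which are stored in each sequence record.

```rust
/// Signature at the start of every 2bit file (read here as little-endian).
const SIGNATURE: u32 = 0x1A41_2743;

/// Hypothetical helper: return `(name, record_offset)` for every sequence
/// listed in the index of a little-endian, version-0 2bit file.
fn read_index(buf: &[u8]) -> Result<Vec<(String, u32)>, String> {
    // Read a little-endian u32 at byte position `pos`.
    let u32_at = |pos: usize| -> Result<u32, String> {
        let b = buf.get(pos..pos + 4).ok_or("truncated file")?;
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    };
    if u32_at(0)? != SIGNATURE {
        return Err("bad signature (big-endian files are not handled here)".to_string());
    }
    if u32_at(4)? != 0 {
        return Err("only version 0 is handled".to_string());
    }
    let count = u32_at(8)? as usize;
    // Bytes 12..16 are reserved; the index starts right after them.
    let mut pos = 16;
    let mut entries = Vec::with_capacity(count);
    for _ in 0..count {
        let name_len = *buf.get(pos).ok_or("truncated index")? as usize;
        let name_bytes = buf
            .get(pos + 1..pos + 1 + name_len)
            .ok_or("truncated index")?;
        let name = String::from_utf8(name_bytes.to_vec()).map_err(|e| e.to_string())?;
        let offset = u32_at(pos + 1 + name_len)?;
        entries.push((name, offset));
        pos += 1 + name_len + 4;
    }
    Ok(entries)
}

fn main() -> Result<(), String> {
    // Build a tiny header + index in memory (offsets are made up for the demo).
    let mut buf = Vec::new();
    for word in [SIGNATURE, 0, 2, 0] {
        buf.extend_from_slice(&word.to_le_bytes());
    }
    for (name, offset) in [("chr1", 100u32), ("chr2", 200u32)] {
        buf.push(name.len() as u8);
        buf.extend_from_slice(name.as_bytes());
        buf.extend_from_slice(&offset.to_le_bytes());
    }
    for (name, offset) in read_index(&buf)? {
        println!("{name} at byte {offset}");
    }
    Ok(())
}
```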
All sequence-related methods expect a range argument; pass `..` (the unbounded range)
to query the entire sequence:
```rust
// `tb` is the TwoBitFile opened in the first example; chr1 is 150 bases long
assert_eq!(tb.read_sequence("chr1", ..)?.len(), 150);
```
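
Accepting "any range" is typically done in Rust with the standard `std::ops::RangeBounds` trait, which `..`, `a..`, `..b`, `a..b` and `a..=b` all implement. The sketch below shows how such an argument can be normalized into a concrete `start..end` for a sequence of known length. `resolve_range` is a hypothetical helper; how twobit itself handles out-of-bounds ranges may differ from the `None` returned here.

```rust
use std::ops::{Bound, Range, RangeBounds};

/// Hypothetical helper: convert any range into a half-open `start..end`
/// within a sequence of length `len`, or `None` if it does not fit.
fn resolve_range<R: RangeBounds<usize>>(range: R, len: usize) -> Option<Range<usize>> {
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s.checked_add(1)?,
        Bound::Unbounded => 0, // `..` and `..b` start at the first base
    };
    let end = match range.end_bound() {
        Bound::Included(&e) => e.checked_add(1)?,
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len, // `..` and `a..` run to the last base
    };
    // Reject reversed ranges and ranges that run past the end.
    if start <= end && end <= len {
        Some(start..end)
    } else {
        None
    }
}

fn main() {
    println!("{:?}", resolve_range(.., 150)); // Some(0..150)
    println!("{:?}", resolve_range(48..74, 150)); // Some(48..74)
    println!("{:?}", resolve_range(140..=160, 150)); // None
}
```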
Files can be fully cached in memory with `open_and_read`, which provides fast random
access and avoids any I/O when decoding:
```rust
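// Load the whole file into memory up front
let mut tb_mem = TwoBitFile::open_and_read("assets/foo.2bit")?;
// The file-backed `tb` from the first example yields the same sequence
let expected_seq = tb.read_sequence("chr1", ..)?;
assert_eq!(tb_mem.read_sequence("chr1", ..)?, expected_seq);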
```
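
One common way to support both file-backed and in-memory reading with a single decoder (an assumption about the general design, not a description of twobit's internals) is to write the decoding code against `std::io::Read + Seek`. A `std::fs::File` and a `std::io::Cursor<Vec<u8>>` over the whole file then become interchangeable. `read_at` is a hypothetical helper:

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Read `len` bytes starting at `offset` from any seekable source.
/// Works unchanged for a `File` or for an in-memory `Cursor<Vec<u8>>`.
fn read_at<R: Read + Seek>(reader: &mut R, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
    reader.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    // Fails with `UnexpectedEof` if the source is too short.
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    // A real program might use `std::fs::read(path)?` to load the bytes;
    // a literal buffer keeps this sketch self-contained.
    let mut in_memory = Cursor::new(b"headerACGT".to_vec());
    let chunk = read_at(&mut in_memory, 6, 4)?;
    assert_eq!(chunk, b"ACGT");
    Ok(())
}
```

With the in-memory variant, every seek and read is a memory copy rather than a system call, which is the trade-off behind caching: faster random access in exchange for holding the whole file in RAM.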
2bit files offer two types of masks: N masks (also known as hard masks) for unknown or
arbitrary nucleotides, and soft masks for lower-case nucleotides (e.g. "t" instead of "T").
Hard masks are *always enabled*; soft masks are *disabled by default*, but can be enabled
with `enable_softmask`:
```rust
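// Obtain a reader from `tb` that reports soft-masked bases in lower case
let mut tb_soft = tb.enable_softmask(true);
// Positions 62..70 of chr1 are soft-masked and now come back in lower case
let expected_seq = "NNACGTACGTACGTagctagctGATC";
assert_eq!(tb_soft.read_sequence("chr1", 48..74)?, expected_seq);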
```
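
To illustrate how the two mask types can be layered onto decoded bases, here is a sketch. The `(start, size)` block layout mirrors the `nBlockStarts`/`nBlockSizes` and `maskBlockStarts`/`maskBlockSizes` arrays in the UCSC spec; `Block` and `apply_masks` are hypothetical names, not part of the twobit API, and the handling of overlapping blocks is a choice made for this sketch.

```rust
/// A run of masked bases: `start` is a 0-based offset, `size` its length.
struct Block {
    start: usize,
    size: usize,
}

/// Apply hard (N) masks always, and soft (lower-case) masks only when
/// `softmask` is true. Input bases are assumed to be ASCII upper case.
fn apply_masks(seq: &mut [u8], n_blocks: &[Block], soft_blocks: &[Block], softmask: bool) {
    for b in n_blocks {
        // Clamp the block to the sequence so bad input cannot panic.
        let end = (b.start + b.size).min(seq.len());
        let start = b.start.min(end);
        seq[start..end].fill(b'N');
    }
    if softmask {
        // Applied after the N masks, so an N inside a soft block becomes 'n'.
        for b in soft_blocks {
            let end = (b.start + b.size).min(seq.len());
            let start = b.start.min(end);
            seq[start..end].make_ascii_lowercase();
        }
    }
}

fn main() {
    let mut seq = b"ACGTACGTAGCT".to_vec();
    let n_blocks = [Block { start: 0, size: 2 }];
    let soft_blocks = [Block { start: 8, size: 4 }];
    apply_masks(&mut seq, &n_blocks, &soft_blocks, true);
    println!("{}", String::from_utf8(seq).unwrap()); // NNGTACGTagct
}
```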