binseq/lib.rs
#![doc = include_str!("../README.md")]
//!
//! # BINSEQ
//!
//! The `binseq` library provides efficient APIs for working with the [BINSEQ](https://www.biorxiv.org/content/10.1101/2025.04.08.647863v2) file format family.
//!
//! It offers methods to read and write BINSEQ files, providing:
//!
//! - Compact multi-bit encoding and decoding of nucleotide sequences through [`bitnuc`](https://docs.rs/bitnuc/latest/bitnuc/)
//! - Support for both single and paired-end sequences
//! - Abstract [`BinseqRecord`] trait for representing records from all variants
//! - Abstract [`BinseqReader`] enum for processing records from all variants
//! - Abstract [`BinseqWriter`] enum for writing records to all variants
//! - Parallel processing capabilities for arbitrary tasks through the [`ParallelProcessor`] trait
//! - Configurable [`Policy`] for handling invalid nucleotides in BQ and VBQ (CBQ natively supports `N` nucleotides)
//!
//! ## Recent Additions (v0.9.0)
//!
//! ### New variant: CBQ
//! **[`cbq`]** is a new variant of BINSEQ that solves many of the pain points around VBQ.
//! The CBQ format is a columnar, block-based format that offers improved compression and faster processing speeds compared to VBQ.
//! It natively supports `N` nucleotides and avoids the need for additional 4-bit encoding.
//!
//! ### Improved interface for writing records
//! **[`BinseqWriter`]** provides a unified interface for writing records generically to BINSEQ files.
//! It makes use of the new [`SequencingRecord`], which provides a cleaner builder API for writing records to BINSEQ files.
//!
//! ## Recent VBQ Format Changes (v0.7.0+)
//!
//! The VBQ format has undergone significant improvements:
//!
//! - **Embedded Index**: VBQ files now contain their index data embedded at the end of the file,
//!   improving portability.
//! - **Headers Support**: Optional sequence identifiers/headers can be stored with each record.
//! - **Extended Capacity**: u64 indexing supports files with more than 4 billion records.
//! - **Multi-bit Encoding**: Support for both 2-bit and 4-bit nucleotide encodings.
//!
//! Legacy VBQ files are automatically migrated to the new format when accessed.
//!
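//! # Example: 2-bit Nucleotide Packing (Illustrative)
//!
//! The 2-bit encoding mentioned above can be pictured with a minimal, dependency-free sketch.
//! It is not the `bitnuc` implementation: the helper names `encode_2bit` and `decode_2bit` are
//! hypothetical, and the `A=0, C=1, G=2, T=3` mapping and low-bits-first packing are assumptions
//! made purely for illustration.
//!
//! ```rust
//! /// Pack an ASCII nucleotide sequence into 2 bits per base (hypothetical helper).
//! /// Returns `None` if a base outside `ACGT` is seen or the sequence exceeds 32 bases.
//! fn encode_2bit(seq: &[u8]) -> Option<u64> {
//!     if seq.len() > 32 {
//!         return None; // a u64 holds at most 32 two-bit bases
//!     }
//!     let mut packed = 0u64;
//!     for (i, &base) in seq.iter().enumerate() {
//!         // Assumed mapping: A=0b00, C=0b01, G=0b10, T=0b11
//!         let bits: u64 = match base {
//!             b'A' | b'a' => 0,
//!             b'C' | b'c' => 1,
//!             b'G' | b'g' => 2,
//!             b'T' | b't' => 3,
//!             _ => return None, // e.g. `N` cannot be represented in 2 bits
//!         };
//!         packed |= bits << (2 * i); // base `i` occupies bits 2i..2i+2
//!     }
//!     Some(packed)
//! }
//!
//! /// Unpack up to 32 bases from a value produced by `encode_2bit`.
//! fn decode_2bit(packed: u64, len: usize) -> Vec<u8> {
//!     (0..len.min(32))
//!         .map(|i| match (packed >> (2 * i)) & 0b11 {
//!             0 => b'A',
//!             1 => b'C',
//!             2 => b'G',
//!             _ => b'T',
//!         })
//!         .collect()
//! }
//! ```
//!
//! Under these assumptions, `encode_2bit(b"ACGT")` yields `Some(0b11_10_01_00)`, while a sequence
//! containing `N` returns `None`. That second case is the situation a [`Policy`] has to resolve
//! for BQ and VBQ.
//!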
//! # Example: Memory-mapped Access
//!
//! ```
//! use binseq::Result;
//! use binseq::prelude::*;
//!
//! #[derive(Clone, Default)]
//! pub struct Processor {
//!     // Define fields here
//! }
//!
//! impl ParallelProcessor for Processor {
//!     fn process_record<B: BinseqRecord>(&mut self, record: B) -> Result<()> {
//!         // Implement per-record logic here
//!         Ok(())
//!     }
//!
//!     fn on_batch_complete(&mut self) -> Result<()> {
//!         // Implement per-batch logic here
//!         Ok(())
//!     }
//! }
//!
//! fn main() -> Result<()> {
//!     // provide an input path (*.bq or *.vbq)
//!     let path = "./data/subset.bq";
//!
//!     // open a reader
//!     let reader = BinseqReader::new(path)?;
//!
//!     // initialize a processor
//!     let processor = Processor::default();
//!
//!     // process the records in parallel with 8 threads
//!     reader.process_parallel(processor, 8)?;
//!     Ok(())
//! }
//! ```

#![allow(clippy::module_inception)]

/// BQ - fixed length records, no quality scores
pub mod bq;

/// Error definitions
pub mod error;

/// Parallel processing
mod parallel;

/// Invalid nucleotide policy
mod policy;

/// Record types and traits shared between BINSEQ variants
mod record;

/// VBQ - Variable length records, optional quality scores, compressed blocks
pub mod vbq;

/// CBQ - Columnar variable length records, optional quality scores and headers
pub mod cbq;

/// Prelude - Commonly used types and traits
pub mod prelude;

/// Write operations generic over the BINSEQ variant
pub mod write;

/// Utilities for working with BINSEQ files
pub mod utils;

pub use error::{Error, IntoBinseqError, Result};
pub use parallel::{BinseqReader, ParallelProcessor, ParallelReader};
pub use policy::{Policy, RNG_SEED};
pub use record::{BinseqRecord, SequencingRecord, SequencingRecordBuilder};
pub use write::{BinseqWriter, BinseqWriterBuilder};

/// Re-export `bitnuc::BitSize`
pub use bitnuc::BitSize;

/// Default quality score for BINSEQ readers without quality scores
pub(crate) const DEFAULT_QUALITY_SCORE: u8 = b'?';