//! # VBINSEQ
//!
//! VBINSEQ is a high-performance binary file format for nucleotide sequences.
//!
//! It is a variant of the BINSEQ file format with support for _variable-length records_ and _quality scores_.
//!
//! ## Overview
//!
//! VBINSEQ provides a block-based file format for efficient storage and retrieval of nucleotide sequences.
//! Key features include:
//!
//! * **Block-based architecture** - Data is stored in fixed-size record blocks that can be processed independently
//! * **Variable-length records** - Unlike BINSEQ's fixed-length records, records can hold sequences of any length
//! * **Quality scores** - Optional per-nucleotide quality scores
//! * **Paired sequences** - Support for paired-end sequencing data
//! * **Parallel compression** - Optional ZSTD compression with support for parallel processing
//! * **Random access** - Efficient random access to record blocks
//!
//! ## Usage
//!
//! The two primary interfaces are:
//!
//! * `VBinseqWriter` - For writing nucleotide sequences to a VBINSEQ file
//! * `MmapReader` - For memory-mapped reading of VBINSEQ files
//!
//! ### Writing to a VBINSEQ file
//!
//! ```rust
//! use std::fs::File;
//! use std::io::BufWriter;
//! use vbinseq::{VBinseqHeader, VBinseqWriterBuilder, MmapReader};
//!
//! // Path to the output file
//! let path_name = "some_example.vbq";
//!
//! // Create a header with quality scores and compression enabled
//! let header = VBinseqHeader::new(true, true, false);
//!
//! // Open a file for writing
//! let handle = File::create(path_name).map(BufWriter::new).unwrap();
//!
//! // Create a writer with the specified header
//! let mut writer = VBinseqWriterBuilder::default()
//!     .header(header)
//!     .build(handle)
//!     .unwrap();
//!
//! // Write a nucleotide sequence with quality scores
//! let sequence = b"ACGTACGT";
//! let quality = b"!!!?!?!!";
//! writer.write_nucleotides_quality(0, sequence, quality).unwrap();
//! writer.finish().unwrap();
//!
//! // Open a file for memory-mapped reading
//! let mut reader = MmapReader::new(path_name).unwrap();
//! let mut block = reader.new_block();
//!
//! // Process blocks one at a time
//! let mut seq_buffer = Vec::new();
//! while reader.read_block_into(&mut block).unwrap() {
//!     for record in block.iter() {
//!         // Decode the sequence
//!         record.decode_s(&mut seq_buffer).unwrap();
//!         println!("Sequence {}: {}", record.index(), std::str::from_utf8(&seq_buffer).unwrap());
//!
//!         // Validate the sequence and quality scores
//!         assert_eq!(seq_buffer, sequence);
//!         assert_eq!(record.squal(), quality);
//!
//!         seq_buffer.clear(); // Clear the buffer for the next sequence
//!     }
//! }
//!
//! // Delete the temporary file (for testing purposes)
//! std::fs::remove_file(path_name).unwrap();
//! ```
//!
//! ## File Format Structure
//!
//! The VBINSEQ file format consists of:
//!
//! 1. A file header (32 bytes) containing format information
//! 2. A series of record blocks, each containing:
//!    - Block header (32 bytes)
//!    - Block data (variable size, containing records)
//!    - Block padding (to maintain a fixed virtual block size)
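//!
//! To illustrate why a fixed block size makes block-level random access cheap, the sketch below
//! computes the byte offset of a block directly from its index. `block_offset` is a hypothetical
//! helper, not part of the `vbinseq` API. It assumes that the first block starts immediately after
//! the 32-byte file header and that every block occupies exactly a 32-byte block header plus
//! `virtual_block_size` bytes of record data and padding on disk. That assumption may not hold
//! for compressed files, where a block's on-disk size can differ from the virtual block size.
//!
//! ```rust
//! // Hypothetical helper (not part of the vbinseq API). Assumes each block is stored as a
//! // 32-byte block header followed by exactly `virtual_block_size` bytes of data plus padding,
//! // with the first block placed directly after the 32-byte file header.
//! const FILE_HEADER_SIZE: u64 = 32;
//! const BLOCK_HEADER_SIZE: u64 = 32;
//!
//! /// Byte offset at which block `index` (i.e. its block header) begins.
//! fn block_offset(index: u64, virtual_block_size: u64) -> u64 {
//!     FILE_HEADER_SIZE + index * (BLOCK_HEADER_SIZE + virtual_block_size)
//! }
//! ```
//!
//! For example, with an assumed virtual block size of 128 KiB (131072 bytes), block 2 would
//! start at byte 32 + 2 * (32 + 131072) = 262240, so a reader could seek there without
//! scanning the blocks before it.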
//!
//! Each record consists of a preamble holding its metadata, followed by the encoded sequence data and, when enabled, its quality scores.
//!
//! See the README.md for detailed format specifications.
pub use ;
pub use ;
pub use ;
pub use ParallelProcessor;
pub use Policy;
pub use ;
pub use ;