// Clippy can misattribute large_stack_arrays to byte_offset 0 when monomorphizing
// generic code in test builds. Allow it crate-wide in test config since the actual
// allocations are heap-based (`vec![]`), not stack arrays.
#![cfg_attr(test, allow(clippy::large_stack_arrays))]
//! # fgumi - Fulcrum Genomics UMI Tools Library
//!
//! This library provides core functionality for working with Unique Molecular Identifiers (UMIs)
//! in sequencing data, including grouping, consensus calling, and quality filtering.
//!
//! ## Overview
//!
//! The fgumi library is organized into several key modules:
//!
//! ### Core Functionality
//!
//! - **[`umi`]** - UMI assignment strategies (identity, edit-distance, adjacency, paired)
//! - **[`consensus`]** - Consensus calling algorithms (simplex, duplex, vanilla)
//! - **[`sam`]** - SAM/BAM file utilities and alignment tag manipulation
//!
//! ### Utilities
//!
//! - **[`bam_io`]** - BAM file I/O helpers for reading and writing
//! - **[`validation`]** - Input validation utilities for parameters and files
//! - **[`progress`]** - Progress tracking and logging
//! - **[`logging`]** - Enhanced logging utilities with formatting
//! - **[`metrics`]** - Structured metrics types and file writing utilities
//! - **[`rejection`]** - Rejection reason tracking and statistics
//!
//! ### Specialized Modules
//!
//! - **[`clipper`]** - Read clipping for overlapping pairs
//! - **[`template`]** - Template-based read grouping
//! - **[`reference`][mod@reference]** - Reference genome handling
//!
//! ## Quick Start
//!
//! ### Reading and Writing BAM Files
//!
//! ```no_run
//! use fgumi_lib::bam_io::{create_bam_reader, create_bam_writer};
//!
//! # fn main() -> anyhow::Result<()> {
//! // Open input BAM and get header (path, threads)
//! let (mut reader, header) = create_bam_reader("input.bam", 1)?;
//!
//! // Create output BAM writer (path, header, threads, compression_level)
//! let mut writer = create_bam_writer("output.bam", &header, 1, 6)?;
//! # Ok(())
//! # }
//! ```
//!
//! ### Validating Input Files
//!
//! ```no_run
//! use fgumi_lib::validation::validate_file_exists;
//!
//! # fn main() -> anyhow::Result<()> {
//! // Validate input files exist with clear error messages
//! validate_file_exists("input.bam", "Input BAM")?;
//! validate_file_exists("reference.fa", "Reference FASTA")?;
//! # Ok(())
//! # }
//! ```
//!
//! ### Progress Tracking
//!
//! ```no_run
//! use fgumi_lib::progress::ProgressTracker;
//!
//! # fn main() -> anyhow::Result<()> {
//! let tracker = ProgressTracker::new("Processing records")
//!     .with_interval(100);
//!
//! for _ in 0..1000 {
//!     // Process one record...
//!     tracker.log_if_needed(1); // Track incremental progress
//! }
//! tracker.log_final(); // Log final count if not exactly on interval
//! # Ok(())
//! # }
//! ```
//!
//! ### UMI Assignment
//!
//! ```
//! use fgumi_lib::umi::{IdentityUmiAssigner, UmiAssigner};
//!
//! let assigner = IdentityUmiAssigner::default();
//! let umis = vec!["ACGTACGT".to_string(), "ACGTACGT".to_string(), "TGCATGCA".to_string()];
//! let assignments = assigner.assign(&umis);
//! // Identity assignment gives each distinct UMI its own molecule ID,
//! // so the three UMIs map to two IDs (ACGTACGT and TGCATGCA)
//! assert_eq!(assignments.iter().collect::<std::collections::HashSet<_>>().len(), 2);
//! ```
//!
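//! Identity assignment only merges exact matches. To illustrate the idea behind the
//! edit-distance strategy listed in the overview, the following is a minimal,
//! std-only sketch. It is not the `fgumi_lib` API: `mismatches` and
//! `assign_by_edits` are hypothetical helpers, and the sketch assumes equal-length
//! UMIs compared by mismatch count, grouped by single linkage (any two UMIs within
//! `max_edits` mismatches end up with the same molecule ID).
//!
//! ```rust
//! /// Hypothetical helper: number of mismatching positions between two UMIs,
//! /// or `None` if their lengths differ.
//! fn mismatches(a: &str, b: &str) -> Option<usize> {
//!     if a.len() != b.len() {
//!         return None;
//!     }
//!     Some(a.bytes().zip(b.bytes()).filter(|(x, y)| x != y).count())
//! }
//!
//! /// Hypothetical helper: links every pair of UMIs within `max_edits`
//! /// mismatches (union-find) and returns one molecule ID per input UMI,
//! /// numbered 0, 1, 2, ... in order of first appearance.
//! fn assign_by_edits(umis: &[&str], max_edits: usize) -> Vec<usize> {
//!     // Each UMI starts in its own group.
//!     let mut parent: Vec<usize> = (0..umis.len()).collect();
//!
//!     // Follow parent links to the group root, halving the path as we go.
//!     fn find(parent: &mut [usize], mut i: usize) -> usize {
//!         while parent[i] != i {
//!             let grandparent = parent[parent[i]];
//!             parent[i] = grandparent;
//!             i = grandparent;
//!         }
//!         i
//!     }
//!
//!     // Merge the groups of every pair within the mismatch threshold.
//!     for i in 0..umis.len() {
//!         for j in (i + 1)..umis.len() {
//!             if mismatches(umis[i], umis[j]).map_or(false, |d| d <= max_edits) {
//!                 let ri = find(&mut parent, i);
//!                 let rj = find(&mut parent, j);
//!                 if ri != rj {
//!                     parent[rj] = ri;
//!                 }
//!             }
//!         }
//!     }
//!
//!     // Relabel group roots as dense molecule IDs.
//!     let mut labels = std::collections::HashMap::new();
//!     (0..umis.len())
//!         .map(|i| {
//!             let root = find(&mut parent, i);
//!             let next = labels.len();
//!             *labels.entry(root).or_insert(next)
//!         })
//!         .collect()
//! }
//! ```
//!
//! For example, `assign_by_edits(&["ACGTACGT", "ACGTACGA", "TGCATGCA"], 1)` returns
//! `[0, 0, 1]`: the first two UMIs differ at a single position and share an ID.
//! Single linkage keeps the sketch short, but it can chain UMIs that are far apart
//! through intermediates (`AAAA`, `AAAT`, `AATT` all land in one group at
//! `max_edits = 1`), so a real assigner may need a stricter rule.
//!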
//! ## Feature Highlights
//!
//! - **Type-safe BAM I/O** - Headers always paired with readers
//! - **Consistent validation** - Standardized error messages
//! - **Progress tracking** - Uniform logging across tools
//! - **Module organization** - Related functionality grouped logically
//! - **Comprehensive testing** - Unit and integration tests, including edge cases
//!
//! ## Architecture
//!
//! The library follows these design principles:
//!
//! - **Separation of concerns** - Modules have clear, focused responsibilities
//! - **Backward compatibility** - Re-exports maintain existing APIs
//! - **Testability** - Comprehensive unit and integration tests
//! - **Documentation** - All public items documented with examples
//!
//! ## Contributing
//!
//! When adding new functionality:
//!
//! 1. Add it to the appropriate module group (`sam`, `umi`, `consensus`, etc.)
//! 2. Include comprehensive documentation and examples
//! 3. Add unit tests covering edge cases
//! 4. Maintain backward compatibility via re-exports
//!
//! ## See Also
//!
//! - [fgbio](https://github.com/fulcrumgenomics/fgbio) - Scala implementation
//! - [noodles](https://github.com/zaeleus/noodles) - Rust bioinformatics I/O
pub use reader as bgzf_reader;
pub use writer as bgzf_writer;
pub use bitenc;
pub use clipper;
pub use dna;
pub use phred;
pub use rejection;
// Re-export rejection tracking types for convenient access
pub use RejectionReason;
// Re-export commonly used SAM items for backward compatibility
pub use alignment_tags;
// Re-export UMI items for backward compatibility
pub use assigner;
// Re-export consensus items for backward compatibility
pub use caller as consensus_caller;
pub use duplex_caller as duplex_consensus_caller;
pub use filter as consensus_filter;
pub use overlapping as overlapping_consensus;
pub use simple_umi as simple_umi_consensus;
pub use tags as consensus_tags;
pub use vanilla_caller as vanilla_consensus_caller;