Crate fgumi_lib

§fgumi - Fulcrum Genomics UMI Tools Library

This library provides core functionality for working with Unique Molecular Identifiers (UMIs) in sequencing data, including grouping, consensus calling, and quality filtering.

§Overview

The fgumi library is organized into several key modules:

§Core Functionality

  • umi - UMI assignment strategies (identity, edit-distance, adjacency, paired)
  • consensus - Consensus calling algorithms (simplex, duplex, vanilla)
  • sam - SAM/BAM file utilities and alignment tag manipulation
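To make the edit-distance strategy listed above more concrete, here is a minimal, standard-library-only sketch of a mismatch check between two UMIs. It assumes the distance is a simple mismatch count over equal-length sequences; the metric the umi module actually uses may differ. The names `mismatches` and `within_distance` are hypothetical and are not part of the fgumi API.

```rust
/// Counts mismatching positions between two equal-length UMIs.
/// Returns `None` when the lengths differ.
/// Hypothetical helper, not part of fgumi_lib.
fn mismatches(a: &str, b: &str) -> Option<usize> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.bytes().zip(b.bytes()).filter(|(x, y)| x != y).count())
}

/// Returns true when both UMIs have the same length and differ in at most
/// `max_mismatches` positions. Hypothetical helper, not part of fgumi_lib.
fn within_distance(a: &str, b: &str, max_mismatches: usize) -> bool {
    matches!(mismatches(a, b), Some(d) if d <= max_mismatches)
}
```

With a threshold of 1, a strategy built on such a check would treat `ACGTACGT` and `ACGTACGA` as candidates for the same molecule, while `ACGTACGT` and `TGCATGCA` would stay separate.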

§Utilities

  • bam_io - BAM file I/O helpers for reading and writing
  • validation - Input validation utilities for parameters and files
  • progress - Progress tracking and logging
  • logging - Enhanced logging utilities with formatting
  • metrics - Structured metrics types and file writing utilities
  • rejection - Rejection reason tracking and statistics

§Specialized Modules

  • clipper - Read clipping for overlapping pairs
  • template - Template-based read grouping
  • reference - Reference genome handling

§Quick Start

§Reading and Writing BAM Files

use fgumi_lib::bam_io::{create_bam_reader, create_bam_writer};

// Open input BAM and get header (path, threads)
let (mut reader, header) = create_bam_reader("input.bam", 1)?;

// Create output BAM writer (path, header, threads, compression_level)
let mut writer = create_bam_writer("output.bam", &header, 1, 6)?;

§Validating Input Files

use fgumi_lib::validation::validate_file_exists;

// Validate input files exist with clear error messages
validate_file_exists("input.bam", "Input BAM")?;
validate_file_exists("reference.fa", "Reference FASTA")?;
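For readers who want to see the general shape of such a check, the following is a rough standard-library sketch. The function name `check_file_exists`, the `String` error type, and the message wording are all assumptions made for illustration; the crate's `validate_file_exists` may use a different error type and message.

```rust
use std::path::Path;

/// Hypothetical stand-in for a file-existence check; not the crate's implementation.
/// `description` is a human-readable label that is included in the error message.
fn check_file_exists(path: &str, description: &str) -> Result<(), String> {
    if Path::new(path).is_file() {
        Ok(())
    } else {
        Err(format!("{description} does not exist: {path}"))
    }
}
```

Passing a label such as "Input BAM" alongside the path lets the error name the role of the missing file, not just its location.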

§Progress Tracking

use fgumi_lib::progress::ProgressTracker;

let tracker = ProgressTracker::new("Processing records")
    .with_interval(100);

for _i in 0..1000 {
    // Process one record...
    tracker.log_if_needed(1);  // Track incremental progress
}
tracker.log_final();  // Log final count if not exactly on interval
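The interval behavior in the example above can be sketched with a plain counter. This is an illustration only: `IntervalCounter` and its methods are hypothetical names, and the sketch uses `&mut self`, whereas the example above calls the tracker through a non-mutable binding, which suggests the crate's type uses interior mutability.

```rust
/// Hypothetical interval-based counter; names do not mirror the crate's internals.
struct IntervalCounter {
    interval: u64,
    count: u64,
}

impl IntervalCounter {
    /// Creates a counter that reports every `interval` records (minimum 1).
    fn new(interval: u64) -> Self {
        Self { interval: interval.max(1), count: 0 }
    }

    /// Adds `n` records and returns true if the total crossed an interval boundary,
    /// which is the point at which a progress message would be logged.
    fn add(&mut self, n: u64) -> bool {
        let before = self.count / self.interval;
        self.count += n;
        self.count / self.interval > before
    }

    /// Returns true when a final message is still needed,
    /// i.e. the total does not sit exactly on an interval boundary.
    fn needs_final(&self) -> bool {
        self.count % self.interval != 0
    }
}
```

Under these assumptions, adding 250 records one at a time with an interval of 100 triggers two progress messages and leaves a final message pending for the remaining 50.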

§UMI Assignment

use fgumi_lib::umi::{IdentityUmiAssigner, UmiAssigner};

let assigner = IdentityUmiAssigner::default();
let umis = vec!["ACGTACGT".to_string(), "ACGTACGT".to_string(), "TGCATGCA".to_string()];
let assignments = assigner.assign(&umis);
// Identity assignment gives each distinct UMI sequence its own molecule ID,
// so the three input UMIs yield two molecule IDs (ACGTACGT and TGCATGCA).
assert_eq!(assignments.iter().collect::<std::collections::HashSet<_>>().len(), 2);
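The idea behind identity assignment can also be shown without the crate. The sketch below is a hypothetical `assign_identity` function that hands out numeric IDs in order of first appearance; the ID type and ordering used by `IdentityUmiAssigner` are not specified here and may differ.

```rust
use std::collections::HashMap;

/// Assigns the same numeric ID to identical UMI strings, numbering IDs
/// in order of first appearance.
/// Hypothetical sketch; the crate's `IdentityUmiAssigner` may return a different ID type.
fn assign_identity(umis: &[String]) -> Vec<usize> {
    let mut ids: HashMap<&str, usize> = HashMap::new();
    umis.iter()
        .map(|umi| {
            // The next unused ID equals the number of distinct UMIs seen so far.
            let next = ids.len();
            *ids.entry(umi.as_str()).or_insert(next)
        })
        .collect()
}
```

For the three UMIs used above, this sketch produces two distinct IDs, with both copies of `ACGTACGT` sharing the first one.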

§Feature Highlights

  • Type-safe BAM I/O - Headers always paired with readers
  • Consistent validation - Standardized error messages
  • Progress tracking - Uniform logging across tools
  • Module organization - Related functionality grouped logically
  • Comprehensive testing - Unit and integration tests across modules

§Architecture

The library follows these design principles:

  • Separation of concerns - Modules have clear, focused responsibilities
  • Backward compatibility - Re-exports maintain existing APIs
  • Testability - Comprehensive unit and integration tests
  • Documentation - All public items documented with examples

§Contributing

When adding new functionality:

  1. Add it to the appropriate module group (sam, umi, consensus, etc.)
  2. Include comprehensive documentation and examples
  3. Add unit tests covering edge cases
  4. Maintain backward compatibility via re-exports

§See Also

  • fgbio - Scala implementation
  • noodles - Rust bioinformatics I/O

Modules§

alignment_tags
Alignment tag regeneration (NM, UQ, MD) after base masking.
assigner
UMI assignment strategies.
bam_io
BAM file I/O utilities.
batched_sam_reader
Adaptive buffered SAM reader that grows based on observed batch sizes.
bgzf_reader
Raw BGZF block reading and decompression.
bgzf_writer
BGZF compression utilities for BAM output.
bitenc
A 2-bit DNA encoding for fast UMI comparison.
clipper
Read clipping utilities for BAM/SAM records.
consensus
Consensus calling and filtering for UMI-based molecular consensus reads.
consensus_caller
Consensus calling infrastructure.
consensus_filter
Consensus read filtering logic.
consensus_tags
Consensus-related SAM tags for reads generated by consensus calling tools.
dna
DNA sequence utilities.
duplex_consensus_caller
Duplex consensus calling.
errors
Custom error types for fgumi operations.
fastq
FASTQ file parsing and read structure handling.
grouper
Grouper implementations for the 9-step pipeline.
header
Utilities for adding @PG (program) records to SAM headers.
logging
Enhanced logging utilities for formatted output.
metrics
Metrics collection and reporting for fgumi operations.
mi_group
Molecular Identifier (MI) group utilities for streaming BAM processing.
overlapping_consensus
Overlapping bases consensus caller for paired-end reads.
phred
Phred score utilities and probability calculations.
progress
Progress tracking utilities.
read_info
Data structures for tracking read position information.
reference
Reference genome FASTA reading with all sequences loaded into memory.
rejection
Rejection reason tracking for reads and templates.
reorder_buffer
Reordering buffer for out-of-order batch completion.
sam
SAM/BAM file utilities and header manipulation.
simple_umi_consensus
Simple UMI consensus calling for metrics collection.
sort
High-performance BAM sorting module.
tag_reversal
Per-base tag reversal for negative-strand reads.
template
Template data structure for grouping reads by query name.
umi
UMI (Unique Molecular Identifier) utilities.
unified_pipeline
Unified thread pool pipeline for --threads N mode.
validation
Input validation utilities.
vanilla_consensus_caller
Vanilla UMI consensus calling implementation.
variant_review
Support for reviewing consensus variants
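
The bitenc entry above describes a 2-bit DNA encoding for fast UMI comparison. As a rough illustration of why such an encoding is fast, the sketch below packs A/C/G/T into a `u64` and counts mismatches with XOR and a popcount. The bit layout, the 32-base limit, and the names `encode_2bit` and `mismatches_2bit` are assumptions made for this example and do not describe the module's actual implementation.

```rust
/// Packs an A/C/G/T string of up to 32 bases into a u64, 2 bits per base.
/// Returns `None` for any other character or for longer inputs.
/// Hypothetical helper, not part of fgumi_lib.
fn encode_2bit(seq: &str) -> Option<u64> {
    if seq.len() > 32 {
        return None;
    }
    seq.bytes().try_fold(0u64, |acc, b| {
        let code = match b {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        Some((acc << 2) | code)
    })
}

/// Counts mismatching bases between two encodings of equal-length sequences.
/// Hypothetical helper, not part of fgumi_lib.
fn mismatches_2bit(a: u64, b: u64) -> u32 {
    let diff = a ^ b;
    // Fold each 2-bit pair onto its low bit: the bit is set if either bit differs.
    let per_base = (diff | (diff >> 1)) & 0x5555_5555_5555_5555;
    per_base.count_ones()
}
```

Once two UMIs are encoded, comparing them costs a handful of integer operations regardless of UMI length (within the 32-base limit assumed here), instead of a byte-by-byte loop.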

Enums§

RejectionReason
Reasons why a read or template was rejected during processing.