§ref-solver
A library for identifying human reference genomes from BAM/SAM/CRAM headers.
When working with alignment files from external sources, it’s often unclear exactly
which reference genome was used. A reference may be labeled “GRCh38” or “hg19”, but each
label covers dozens of variants that differ in contig naming conventions (for example chr1
vs. 1), in which contigs are included, and in sequence versions.
ref-solver solves this by matching the sequence dictionary from your alignment file
against a catalog of known human reference genomes.
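In a SAM/BAM/CRAM header, the sequence dictionary is the list of @SQ records. Each record carries a contig name (SN), a length (LN) and, optionally, an MD5 checksum of the sequence (M5). The std-only sketch below extracts those three fields so the inputs to matching are concrete. The `SqEntry` type and `parse_sq_lines` function are hypothetical illustrations, not ref-solver's parser (the crate exposes `parsing::sam::parse_header_text` for that), and lowercasing M5 values is an assumption made here so checksums compare consistently.

```rust
/// One @SQ record from a SAM header (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct SqEntry {
    pub name: String,        // SN: reference sequence name
    pub length: u64,         // LN: reference sequence length
    pub md5: Option<String>, // M5: checksum, lowercased in this sketch
}

/// Extracts the sequence dictionary from SAM header text.
/// Non-@SQ lines are skipped; an @SQ line without a valid SN or LN is an error.
pub fn parse_sq_lines(header: &str) -> Result<Vec<SqEntry>, String> {
    let mut dict = Vec::new();
    for (i, line) in header.lines().enumerate() {
        let mut fields = line.split('\t');
        if fields.next() != Some("@SQ") {
            continue;
        }
        let (mut name, mut length, mut md5) = (None, None, None);
        for field in fields {
            // Each field is TAG:VALUE; split only at the first colon.
            match field.split_once(':') {
                Some(("SN", v)) => name = Some(v.to_string()),
                Some(("LN", v)) => {
                    let n = v
                        .parse::<u64>()
                        .map_err(|e| format!("line {}: bad LN {:?}: {}", i + 1, v, e))?;
                    length = Some(n);
                }
                Some(("M5", v)) => md5 = Some(v.to_ascii_lowercase()),
                _ => {} // other tags (AS, UR, SP, ...) are ignored here
            }
        }
        match (name, length) {
            (Some(name), Some(length)) => dict.push(SqEntry { name, length, md5 }),
            _ => return Err(format!("line {}: @SQ without SN or LN", i + 1)),
        }
    }
    Ok(dict)
}
```

LN is parsed as a plain decimal integer, so a value written with digit separators (such as 248_956_422) is rejected by this sketch rather than silently misread.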
§Features
- MD5-based matching: Uses sequence checksums for exact identification
- Fuzzy matching: Falls back to name+length matching when MD5s are missing
- Rename detection: Identifies when files differ only in contig naming
- Order detection: Detects when contigs appear in a different order than in the reference
- Conflict detection: Identifies problematic differences
- Actionable suggestions: Provides commands to fix issues
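The matching tiers listed above can be illustrated with a small, std-only sketch: look each query contig up by MD5 first, fall back to name plus length when a checksum is missing, report a rename when the checksum matches under a different name, and flag reordering when matched reference positions are not increasing. This is not ref-solver's implementation. The `Entry` and `ContigMatch` types, the `classify` and `is_reordered` functions, and the rule that a name+length hit with two differing MD5s counts as a conflict are all assumptions for illustration; MD5 strings are assumed to be lowercase hex.

```rust
use std::collections::HashMap;

/// Minimal contig record (hypothetical; not ref-solver's `Contig` type).
#[derive(Debug, Clone)]
pub struct Entry {
    pub name: String,
    pub length: u64,
    pub md5: Option<String>,
}

/// Outcome for one query contig; the usize is an index into the reference.
#[derive(Debug, PartialEq)]
pub enum ContigMatch {
    Exact(usize),      // same MD5, same name
    Renamed(usize),    // same MD5, different name (e.g. "1" vs "chr1")
    NameLength(usize), // MD5 missing on at least one side; name and length agree
    Conflict(usize),   // name and length agree, but both MD5s exist and differ
    Unmatched,
}

pub fn classify(query: &[Entry], reference: &[Entry]) -> Vec<ContigMatch> {
    // Index the reference once by MD5 and by (name, length).
    let by_md5: HashMap<&str, usize> = reference
        .iter()
        .enumerate()
        .filter_map(|(i, c)| c.md5.as_deref().map(|m| (m, i)))
        .collect();
    let by_name_len: HashMap<(&str, u64), usize> = reference
        .iter()
        .enumerate()
        .map(|(i, c)| ((c.name.as_str(), c.length), i))
        .collect();

    query
        .iter()
        .map(|q| {
            // Tier 1: a checksum hit identifies the sequence whatever its name.
            if let Some(&i) = q.md5.as_deref().and_then(|m| by_md5.get(m)) {
                return if reference[i].name == q.name {
                    ContigMatch::Exact(i)
                } else {
                    ContigMatch::Renamed(i)
                };
            }
            // Tier 2: fall back to name + length.
            match by_name_len.get(&(q.name.as_str(), q.length)) {
                // Both checksums known, yet tier 1 failed: the sequences differ.
                Some(&i) if q.md5.is_some() && reference[i].md5.is_some() => {
                    ContigMatch::Conflict(i)
                }
                Some(&i) => ContigMatch::NameLength(i),
                None => ContigMatch::Unmatched,
            }
        })
        .collect()
}

/// True when matched contigs appear in a different order than in the reference.
pub fn is_reordered(matches: &[ContigMatch]) -> bool {
    let idx: Vec<usize> = matches
        .iter()
        .filter_map(|m| match m {
            ContigMatch::Exact(i)
            | ContigMatch::Renamed(i)
            | ContigMatch::NameLength(i)
            | ContigMatch::Conflict(i) => Some(*i),
            ContigMatch::Unmatched => None,
        })
        .collect();
    idx.windows(2).any(|w| w[0] > w[1])
}
```

Building both indexes up front keeps each query lookup to an average constant-time hash probe. A real engine would still have to turn these per-contig outcomes into an overall score across many catalog references, which this sketch does not attempt.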
§Example
```rust
use ref_solver::{ReferenceCatalog, MatchingEngine, MatchingConfig, QueryHeader};
use ref_solver::parsing::sam::parse_header_text;

// Load the embedded catalog of known references
let catalog = ReferenceCatalog::load_embedded().unwrap();

// Parse a SAM header. LN must be a plain integer (248956422 bp is GRCh38 chr1).
let header_text = "@SQ\tSN:chr1\tLN:248956422\tM5:6aef897c3d6ff0c78aff06ac189178dd\n";
let query = parse_header_text(header_text).unwrap();

// Find matching references
let engine = MatchingEngine::new(&catalog, MatchingConfig::default());
let matches = engine.find_matches(&query, 5);

// Print each candidate with its composite score as a percentage
for m in matches {
    println!("{}: {:.1}%", m.reference.display_name, m.score.composite * 100.0);
}
```
§Modules
- catalog: Reference catalog storage and indexing
- core: Core data types for contigs, references, and headers
- matching: Matching engine and scoring algorithms
- parsing: Parsers for SAM/BAM/CRAM, dict, and TSV files
- cli: Command-line interface implementation
- web: Web server for browser-based identification
§Re-exports
- pub use catalog::store::ReferenceCatalog;
- pub use core::contig::Contig;
- pub use core::header::QueryHeader;
- pub use core::reference::KnownReference;
- pub use matching::engine::MatchResult;
- pub use matching::engine::MatchingConfig;
- pub use matching::engine::MatchingEngine;
- pub use core::types::*;
§Modules
- catalog: Reference genome catalog storage and indexing.
- cli: Command-line interface for ref-solver.
- core: Core data types for reference genome identification.
- matching: Reference genome matching engine and scoring algorithms.
- parsing: Parsers for extracting sequence dictionaries from various file formats.
- utils
- web: Web server for browser-based reference identification.