Reference genome matching engine and scoring algorithms.
This module provides the core matching functionality:
- engine::MatchingEngine: Main entry point for finding reference matches
- scoring::MatchScore: Detailed similarity scores between a query and reference
- diagnosis::MatchDiagnosis: Detailed analysis of differences and suggestions
§Matching Algorithm
The matching process uses multiple strategies:
- Signature matching: Exact match via sorted MD5 hash signature
- MD5-based scoring: Jaccard similarity of MD5 checksum sets
- Name+length fallback: When MD5s are missing, uses contig names and lengths
- Order analysis: Detects if contigs are reordered vs. reference
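The first two strategies can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: the function names `sorted_signature` and `md5_jaccard` are hypothetical, the signature here is a plain joined string rather than a hash, and returning 0.0 for two empty sets is a convention chosen for the sketch.

```rust
use std::collections::HashSet;

/// Order-independent signature: sort the per-contig MD5 strings and join them.
/// Two headers with the same set of sequences produce the same signature
/// regardless of contig order. (Assumption: the real engine may hash this
/// string further; that step is omitted here.)
fn sorted_signature(md5s: &[&str]) -> String {
    let mut sorted: Vec<&str> = md5s.to_vec();
    sorted.sort_unstable();
    sorted.join(",")
}

/// Jaccard similarity |A ∩ B| / |A ∪ B| of two sets of MD5 checksum strings.
/// Returns 0.0 when both sets are empty (a convention for this sketch).
fn md5_jaccard(query: &HashSet<String>, reference: &HashSet<String>) -> f64 {
    let union = query.union(reference).count();
    if union == 0 {
        return 0.0;
    }
    let intersection = query.intersection(reference).count();
    intersection as f64 / union as f64
}
```

An exact signature match can short-circuit the search, while the Jaccard value gives a graded score when only some sequences agree. For example, sets {a, b, c} and {b, c, d} share two of four distinct checksums, so the similarity is 0.5.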
§Scoring
The composite score combines multiple factors:
- MD5 Jaccard: Set similarity of sequence checksums
- Name+Length Jaccard: Set similarity of (name, length) pairs
- Query coverage: Fraction of query contigs matched
- Order score: Fraction of contigs in correct relative order
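One way to read the order score and the composite is sketched below. Everything named here is an assumption for illustration: the documentation does not state how the order score is computed or how the factors are weighted. The sketch measures order as the longest increasing subsequence of reference positions divided by the number of matched contigs, and combines the four factors as a normalized weighted sum using a hypothetical `Weights` struct.

```rust
/// Length of the longest strictly increasing subsequence, O(n log n).
fn lis_len(positions: &[usize]) -> usize {
    // tails[k] holds the smallest possible last element of an
    // increasing subsequence of length k + 1 seen so far.
    let mut tails: Vec<usize> = Vec::new();
    for &p in positions {
        let i = tails.partition_point(|&t| t < p);
        if i == tails.len() {
            tails.push(p);
        } else {
            tails[i] = p;
        }
    }
    tails.len()
}

/// Fraction of matched contigs that keep the reference's relative order.
/// `ref_positions[i]` is the reference index of the i-th matched query contig.
fn order_score(ref_positions: &[usize]) -> f64 {
    if ref_positions.is_empty() {
        return 0.0;
    }
    lis_len(ref_positions) as f64 / ref_positions.len() as f64
}

/// Hypothetical weights; the crate's actual weighting is not documented here.
struct Weights {
    md5: f64,
    name_length: f64,
    coverage: f64,
    order: f64,
}

/// Normalized weighted sum of the four factors, each expected in [0, 1].
fn composite(md5: f64, name_length: f64, coverage: f64, order: f64, w: &Weights) -> f64 {
    let total = w.md5 + w.name_length + w.coverage + w.order;
    if total == 0.0 {
        return 0.0;
    }
    (w.md5 * md5 + w.name_length * name_length + w.coverage * coverage + w.order * order) / total
}
```

With reference positions `[0, 2, 1, 3]`, three of the four contigs can be kept in increasing order, so this sketch yields an order score of 0.75. Normalizing by the weight total keeps the composite in [0, 1], which matches the percentage formatting used in the example.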
§Example
use ref_solver::{ReferenceCatalog, MatchingEngine, MatchingConfig, QueryHeader};
use ref_solver::parsing::sam::parse_header_text;
let catalog = ReferenceCatalog::load_embedded().unwrap();
let query = parse_header_text("@SQ\tSN:chr1\tLN:248956422\n").unwrap();
let engine = MatchingEngine::new(&catalog, MatchingConfig::default());
let matches = engine.find_matches(&query, 5);
for m in &matches {
    println!("{}: {:?} ({:.1}%)",
        m.reference.display_name,
        m.diagnosis.match_type,
        m.score.composite * 100.0
    );
}
Re-exports§
pub use diagnosis::Suggestion;
Modules§
- diagnosis
- engine
- hierarchical_engine - Hierarchical matching engine for the new catalog structure.
- scoring