§seqwish - A variation graph inducer
Seqwish builds variation graphs from pairwise sequence alignments. It transforms a collection of sequences and their all-to-all alignments into a graph representation that captures the variation between the sequences.
§Overview
The algorithm proceeds in several stages:
- Sequence Indexing - Load and index input sequences
- Alignment Processing - Parse and index PAF alignments
- Transitive Closure - Compute equivalence classes of aligned positions
- Node Compaction - Merge non-bifurcating regions into single nodes
- Link Derivation - Extract edges between nodes
- GFA Emission - Output the variation graph in GFA format
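The transitive closure stage can be pictured with a toy union-find over sequence positions. This is only a minimal sketch of the idea, not the crate's implementation: `DisjointSet` and `closure_classes` are hypothetical names, positions are plain indices into a concatenated sequence vector, and each match is a single aligned pair of positions. The crate's own disk-backed, parallel machinery may work quite differently.

```rust
/// Toy disjoint-set (union-find) over sequence positions.
/// Hypothetical illustration only; not seqwish's actual data structure.
pub struct DisjointSet {
    parent: Vec<usize>,
}

impl DisjointSet {
    /// Start with every position in its own class.
    pub fn new(n: usize) -> Self {
        DisjointSet { parent: (0..n).collect() }
    }

    /// Find the representative of `x`, halving the path along the way.
    pub fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            let grandparent = self.parent[self.parent[x]];
            self.parent[x] = grandparent;
            x = grandparent;
        }
        x
    }

    /// Merge the classes containing `a` and `b`.
    /// The smaller root wins, so each class is named by its minimum position.
    pub fn union(&mut self, a: usize, b: usize) {
        let ra = self.find(a);
        let rb = self.find(b);
        if ra != rb {
            let (lo, hi) = if ra < rb { (ra, rb) } else { (rb, ra) };
            self.parent[hi] = lo;
        }
    }
}

/// Given `n` positions and pairwise matches between them, return the
/// class representative of every position. Positions that share a
/// representative would end up in the same graph base.
pub fn closure_classes(n: usize, matches: &[(usize, usize)]) -> Vec<usize> {
    let mut ds = DisjointSet::new(n);
    for &(a, b) in matches {
        ds.union(a, b);
    }
    (0..n).map(|i| ds.find(i)).collect()
}
```

With six positions and matches (0, 3), (3, 5) and (1, 4), positions 0, 3 and 5 collapse into one class even though 0 and 5 were never aligned directly, which is the transitive step.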
§Example
use seqwish::seqindex::SeqIndex;
use std::sync::{Arc, Mutex};
// Build a sequence index
let mut seqidx = SeqIndex::new();
seqidx.build_index("sequences.fa").unwrap();
§Command-line Usage
seqwish -s sequences.fa -p alignments.paf -g output.gfa
§Features
- Memory-safe parallel processing
- Disk-backed data structures for scalability
- Produces GFA v1.0 format output
- Compatible with standard pangenome tools
Modules§
- alignments
- cigar
- compact
- dna
- dset64
- dset64_asm
- dset64_unsafe
- gfa
- intervaltree - Generic interval tree abstraction
- links
- mmap
- paf
- pos
- seqindex
- sxs
- tempfile
- time
- transclosure
- utils
- version
Structs§
- AlnIITreeHandle - Opaque handle to Alignment IITree (uses Mutex for writing)
- CigarHandle - Opaque handle to CIGAR vector
- IITreeHandle - Opaque handle to IITree (for node/path iitrees that use RwLock)
- PafRowHandle - Opaque handle to a parsed PAF row
- SeqIndexHandle - Opaque handle to SeqIndex
- SxsHandle - Opaque handle to a parsed SXS alignment
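The opaque-handle pattern these structs describe can be sketched as follows. Everything here is hypothetical (`ToyCigar`, `parse_cigar` and the `toy_cigar_*` functions are made-up stand-ins, not the crate's types or signatures); the sketch only shows the general shape: a constructor that boxes a Rust value and hands out a raw pointer, accessors that borrow through that pointer, and a matching free function that reclaims the box.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical stand-in for a parsed CIGAR: (length, operation) pairs.
pub struct ToyCigar {
    ops: Vec<(u64, u8)>,
}

/// Parse a string such as "10M2I3D"; returns None on malformed input.
/// Operation letters are not validated in this sketch.
fn parse_cigar(s: &str) -> Option<Vec<(u64, u8)>> {
    let mut ops = Vec::new();
    let mut len: Option<u64> = None;
    for b in s.bytes() {
        if b.is_ascii_digit() {
            let digit = u64::from(b - b'0');
            len = Some(len.unwrap_or(0).checked_mul(10)?.checked_add(digit)?);
        } else {
            // An operation must be preceded by a length.
            ops.push((len.take()?, b));
        }
    }
    // Trailing digits without an operation are an error.
    if len.is_some() { None } else { Some(ops) }
}

/// Returns a heap-allocated handle, or NULL on error.
/// A real export would also carry a no_mangle attribute.
pub extern "C" fn toy_cigar_from_string(s: *const c_char) -> *mut ToyCigar {
    if s.is_null() {
        return std::ptr::null_mut();
    }
    // Safety: the caller promises `s` is a valid NUL-terminated string.
    let text = unsafe { CStr::from_ptr(s) };
    match text.to_str().ok().and_then(parse_cigar) {
        Some(ops) => Box::into_raw(Box::new(ToyCigar { ops })),
        None => std::ptr::null_mut(),
    }
}

/// Number of operations behind the handle; 0 for a NULL handle.
pub extern "C" fn toy_cigar_length(h: *const ToyCigar) -> usize {
    if h.is_null() {
        return 0;
    }
    // Safety: a non-NULL handle must come from toy_cigar_from_string.
    let cigar = unsafe { &*h };
    cigar.ops.len()
}

/// Reclaims the handle; each handle must be freed exactly once.
pub extern "C" fn toy_cigar_free(h: *mut ToyCigar) {
    if !h.is_null() {
        // Safety: rebuilds the Box created in toy_cigar_from_string.
        unsafe { drop(Box::from_raw(h)) }
    }
}
```

The caller never sees the struct layout, only the pointer, which is why every handle-returning function needs a documented owner for the eventual free.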
Functions§
- cigar_free - Free CIGAR handle
- cigar_from_string - Parse CIGAR string and return handle to CIGAR vector. Returns NULL on error. Must be freed with cigar_free.
- cigar_get_op - Get operation at index. Returns false if index out of bounds.
- cigar_length - Get number of operations in CIGAR
- cigar_to_string - Convert CIGAR vector to string. Returns C string that must be freed with temp_file_free_string.
- compact_compact_nodes - Compact nodes by marking boundaries in the graph
- dna_complement - Get complement of a single DNA base
- dna_reverse_complement - Reverse complement a DNA sequence (allocates new string that must be freed)
- dna_reverse_complement_in_place - Reverse complement a DNA sequence in place
- file_exists - Check if a file exists
- handy_parameter - Parse a number with optional suffix (k, m, g)
- keep_sparse - Determine if a match should be kept based on sparsification factor
- match_hash - Hash function for match parameters
- mmap_close_rust - Close a memory-mapped file
- mmap_open_rust - Open a file and memory-map it. Returns the file size on success, 0 on error. The buffer pointer and file descriptor are written to the provided pointers.
- paf_row_alignment_block_length
- paf_row_cigar
- paf_row_free - Free a PAF row handle
- paf_row_mapping_quality
- paf_row_num_matches
- paf_row_parse - Parse a PAF row from a C string line. Returns NULL if parsing fails.
- paf_row_query_end
- paf_row_query_sequence_length
- paf_row_query_sequence_name
- paf_row_query_start
- paf_row_query_target_same_strand
- paf_row_target_end
- paf_row_target_sequence_length
- paf_row_target_sequence_name
- paf_row_target_start
- parse_paf_spec - Parse PAF spec string, calling callback for each (filename, weight) pair. Callback signature: void callback(void* user_data, const char* filename, uint64_t weight)
- pos_decr_pos - Decrement position
- pos_decr_pos_by - Decrement position by N
- pos_incr_pos - Increment position
- pos_incr_pos_by - Increment position by N
- pos_is_rev - Check if position is reverse
- pos_make_pos_t - Create a position from offset and orientation
- pos_offset - Extract offset from position
- pos_rev_pos_t - Reverse position orientation
- pos_to_string_c - Convert position to string (returns C string that must be freed)
- seqwish_rust_add - Simple test function to verify FFI is working
- seqwish_rust_version - Returns the version string of the Rust component
- sxs_cigar
- sxs_free - Free an SXS handle
- sxs_is_good
- sxs_is_reverse
- sxs_mapping_quality
- sxs_new - Create a new empty SXS alignment
- sxs_num_matches
- sxs_parse_lines - Parse SXS alignment from array of C strings (lines). Returns NULL if parsing fails.
- sxs_query_end
- sxs_query_sequence_name
- sxs_query_start
- sxs_target_end
- sxs_target_sequence_name
- sxs_target_start
- temp_file_create - Create a temporary file. Returns a C string that must be freed with temp_file_free_string. Returns NULL on error.
- temp_file_free_string - Free a string returned by temp_file functions
- temp_file_get_dir - Get temp directory. Returns a C string that must be freed with temp_file_free_string.
- temp_file_remove - Remove a temporary file
- temp_file_set_dir - Set temp directory
- temp_file_set_keep_temp - Set whether to keep temp files
- time_since_epoch_ms - Get milliseconds since Unix epoch
- transclosure_compute - Compute transitive closures for variation graph construction
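As an illustration of what the dna_* helpers listed above compute, here is a safe-Rust sketch of base complement and reverse complement. The function names and signatures are hypothetical and do not mirror the crate's FFI entry points. Mapping every byte outside A, C, G, T (either case) to N is an assumption made for this sketch; the crate may treat ambiguity codes differently.

```rust
/// Complement of a single base. Bytes outside ACGT/acgt map to b'N'
/// (an assumption of this sketch, not documented crate behaviour).
pub fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        b'a' => b't',
        b'c' => b'g',
        b'g' => b'c',
        b't' => b'a',
        _ => b'N',
    }
}

/// Reverse complement in place: reverse the buffer, then complement
/// every base. No allocation is needed.
pub fn reverse_complement_in_place(seq: &mut [u8]) {
    seq.reverse();
    for base in seq.iter_mut() {
        *base = complement(*base);
    }
}

/// Reverse complement into a newly allocated String.
pub fn reverse_complement(seq: &str) -> String {
    seq.bytes().rev().map(|b| complement(b) as char).collect()
}
```

The in-place variant suits buffers the caller already owns, while the allocating variant corresponds to the case where a new string is returned and must later be released by the caller.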