What is SigAlign?
SigAlign is a library for biological sequence alignment: the process of matching two sequences to identify similarity. Alignment is a crucial step in analyzing sequence data in bioinformatics and computational biology. If you are new to sequence alignment, the overview on Wikipedia is a helpful starting point.
SigAlign is a non-heuristic algorithm that outputs alignments satisfying two cutoffs:
- Minimum Length
- Maximum Penalty per Length
In SigAlign, the penalty is calculated based on a gap-affine scheme, which imposes different penalties on mismatches, gap openings, and gap extensions.
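
As a rough illustration of how the two cutoffs interact with a gap-affine penalty, the Python sketch below scores an alignment from its edit counts and then checks both cutoffs. The function names are hypothetical and not part of SigAlign's API. The sketch also assumes that a gap of length n costs gap_open + n * gap_extend and that "length" means the number of alignment columns; check the API documentation or the paper for the exact definitions.

```python
def gap_affine_penalty(mismatches, gap_lengths,
                       mismatch_penalty, gap_open_penalty, gap_extend_penalty):
    """Total penalty of an alignment under a gap-affine scheme.

    Assumption: a gap of length n costs gap_open + n * gap_extend.
    """
    gap_cost = sum(gap_open_penalty + n * gap_extend_penalty for n in gap_lengths)
    return mismatches * mismatch_penalty + gap_cost


def satisfies_cutoffs(penalty, length, min_length, max_penalty_per_length):
    """True when the alignment meets both cutoffs (minimum length, maximum penalty per length)."""
    return length >= min_length and penalty / length <= max_penalty_per_length


# One mismatch plus one gap of length 2, with penalties 4 / 6 / 2.
penalty = gap_affine_penalty(1, [2], 4, 6, 2)
print(penalty)                                   # 14
print(satisfies_cutoffs(penalty, 100, 50, 0.2))  # True:  14 / 100 = 0.14
print(satisfies_cutoffs(penalty, 60, 50, 0.2))   # False: 14 / 60 is above 0.2
```

Under these assumptions, the same set of edits can pass or fail depending on how long the alignment is, which is why the second cutoff is expressed per unit of length rather than as an absolute penalty.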
Core Purpose
SigAlign is designed to be:
- ⚡️ Fast to collect highly similar alignments
- 💡 Easy to customize and explain results
- 🧱 Small and flexible to be a basic building block for other tools
SigAlign is not intended to:
- Align ultra-long reads
- Search for low similarity alignments
Quick Start Examples
For Rust developers
- SigAlign is a Rust library, so the Rust crate offers the most complete feature set.
- Registered on crates.io: https://crates.io/crates/sigalign/
- API documentation: https://docs.rs/sigalign/
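// NOTE: type paths (`ReferenceBuilder`, `algorithms::Local`) and argument values are
// assumptions; the penalties and cutoffs mirror the TypeScript example in this document.
// Verify against https://docs.rs/sigalign/ for your crate version.
use sigalign::{
    Aligner,
    algorithms::Local,
    ReferenceBuilder,
};

// (1) Build `Reference`
let fasta =
br#">target_1
ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA
>target_2
TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC"#;
let reference = ReferenceBuilder::new()
    .set_uppercase(true) // Ignore case
    .ignore_base(b'N') // 'N' is never matched
    .add_fasta(&fasta[..]).unwrap() // Add sequences from FASTA
    .add_target("target_3", b"AAAAAAAAAAA") // Add sequence manually (illustrative label and sequence)
    .build().unwrap();

// (2) Initialize `Aligner`
let algorithm = Local::new(
    4,   // Mismatch penalty
    6,   // Gap-open penalty
    2,   // Gap-extend penalty
    50,  // Minimum length
    0.2, // Maximum penalty per length
).unwrap();
let mut aligner = Aligner::new(algorithm);

// (3) Align query to reference
let query = b"CAAACTCACAATTGTATTTCTTTGCCAGCTGGGCATATACTTTTTCCGCCCCCTCATTTAACTTCTTGGA";
let result = aligner.align(query, &reference);
println!("{:#?}", result);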
For Python developers
- SigAlign's Python binding is available on PyPI: https://pypi.org/project/sigalign/
- Use pip to install the package: pip install sigalign
# (1) Construct `Reference`
=
# (2) Initialize `Aligner`
=
# (3) Execute Alignment
=
=
# (4) Display Results
For Web developers
- SigAlign offers a WebAssembly (WASM) build, which makes web-based applications possible. It is not yet available through package managers such as npm, but web support is planned.
- An example WASM implementation can be found in the example directory. Below is a TypeScript example that uses SigAlign through this WASM wrapper:
import init, { Reference, Aligner, type AlignmentResult } from '../wasm/sigalign_demo_wasm';

async function run() {
    await init();

    // (1) Construct `Reference`
    const fasta: string = `>target_1
ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA
>target_2
TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC`;
    const reference: Reference = await Reference.build(fasta);

    // (2) Initialize `Aligner`
    const aligner: Aligner = new Aligner(
        4,   // Mismatch penalty
        6,   // Gap-open penalty
        2,   // Gap-extend penalty
        50,  // Minimum aligned length
        0.2, // Maximum penalty per length
    );

    // (3) Execute Alignment
    const query: string = "CAAACTCACAATTGTATTTCTTTGCCAGCTGGGCATATACTTTTTCCGCCCCCTCATTTAACTTCTTGGA";
    const result: AlignmentResult = await aligner.alignment(query, reference);

    // (4) Parse and Display Results
    const parsedJsonObj = JSON.parse(result.to_json());
    console.log(parsedJsonObj);
}

run();
- To see a web-based implementation of SigAlign in action, visit the SigAlign tour page, which uses the WASM wrapper shown above.
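
To get a feel for what the parameters in the TypeScript example (penalties 4, 6, 2; minimum length 50; maximum penalty per length 0.2) allow, the Python sketch below works out the penalty budget at a given alignment length. The helper names are hypothetical and not part of any SigAlign binding, and the gap calculation assumes a gap of length n costs gap_open + n * gap_extend.

```python
import math


def penalty_budget(length, max_penalty_per_length):
    """Largest integer total penalty allowed at this alignment length."""
    # The small tolerance guards against floating-point rounding just below an integer.
    return math.floor(length * max_penalty_per_length + 1e-9)


def max_mismatches(length, mismatch_penalty, max_penalty_per_length):
    """Most mismatches that fit in the budget when the alignment has no gaps."""
    return penalty_budget(length, max_penalty_per_length) // mismatch_penalty


def max_single_gap_length(length, gap_open_penalty, gap_extend_penalty,
                          max_penalty_per_length):
    """Longest single gap that fits in the budget when there are no mismatches.

    Assumption: a gap of length n costs gap_open + n * gap_extend.
    """
    remaining = penalty_budget(length, max_penalty_per_length) - gap_open_penalty
    return max(remaining // gap_extend_penalty, 0)


print(penalty_budget(50, 0.2))               # 10
print(max_mismatches(50, 4, 0.2))            # 2
print(max_mismatches(70, 4, 0.2))            # 3
print(max_single_gap_length(50, 6, 2, 0.2))  # 2
```

Under these assumptions, an alignment at the minimum length of 50 tolerates at most two mismatches or a single gap of length 2, which illustrates why these settings target highly similar alignments.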
License
SigAlign is released under the MIT License.
Citation
Bahk, K., & Sung, J. (2024). SigAlign: an alignment algorithm guided by explicit similarity criteria. Nucleic Acids Research, gkae607. https://doi.org/10.1093/nar/gkae607