/*!
# SigAlign
SigAlign is a library for gap-affine sequence alignment tasks guided by explicit similarity cutoffs.
## Quick Start
```rust
use sigalign::{
    Aligner,
    algorithms::Local,
    ReferenceBuilder,
};
// (1) Build `Reference`
let fasta =
br#">target_1
ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA
>target_2
TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC"#;
let reference = ReferenceBuilder::new()
    .set_uppercase(true) // Ignore case
    .ignore_base(b'N') // 'N' is never matched
    .add_fasta(&fasta[..]).unwrap() // Add sequences from FASTA
    .add_target(
        "target_3",
        b"AAAAAAAAAAA",
    ) // Add sequence manually
    .build().unwrap();
// (2) Initialize `Aligner`
let algorithm = Local::new(
    4,   // Mismatch penalty
    6,   // Gap-open penalty
    2,   // Gap-extend penalty
    50,  // Minimum length
    0.2, // Maximum penalty per length
).unwrap();
let mut aligner = Aligner::new(algorithm);
// (3) Align query to reference
let query = b"CAAACTCACAATTGTATTTCTTTGCCAGCTGGGCATATACTTTTTCCGCCCCCTCATTTAACTTCTTGGA";
let result = aligner.align(query, &reference);
println!("{:#?}", result);
```
## Core Structures
- `Reference`: A **database** for multiple target sequences.
    - **Generated** from `ReferenceBuilder`.
    - **Purpose**: Combines multiple sequences into one struct and indexes them to facilitate alignment.
    - Can remain **immutable** during alignment.
- `Aligner`: An **executor** for alignment tasks.
    - **Generated** from `Algorithm`.
    - **Purpose**: Manages the workspace for alignment tasks.
    - Must be **mutable** during alignment.
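The immutable/mutable split above suggests a usage pattern: one read-only database shared by several executors, each owning its own workspace. The sketch below illustrates only that borrow pattern with hypothetical stand-in types (`Database`, `Worker`, `hits_per_query`); none of these names belong to SigAlign, the exact-substring search is a placeholder for real alignment, and whether `Reference` and `Aligner` themselves can be used across threads is an assumption not stated here.

```rust
use std::thread;

// Hypothetical stand-in for a read-only sequence database (NOT `Reference`).
struct Database {
    targets: Vec<Vec<u8>>,
}

// Hypothetical stand-in for an executor with a reusable workspace (NOT `Aligner`).
struct Worker {
    workspace: Vec<usize>,
}

impl Worker {
    fn new() -> Self {
        Worker { workspace: Vec::new() }
    }

    // `&mut self`: the workspace is overwritten on every call.
    // `&Database`: the database is only read, so it can be shared.
    fn count_exact_hits(&mut self, query: &[u8], db: &Database) -> usize {
        self.workspace.clear();
        if query.is_empty() {
            return 0;
        }
        for (index, target) in db.targets.iter().enumerate() {
            // Placeholder search: does the target contain the query verbatim?
            if target.windows(query.len()).any(|window| window == query) {
                self.workspace.push(index);
            }
        }
        self.workspace.len()
    }
}

// One shared database, one worker (and one workspace) per query thread.
fn hits_per_query(db: &Database, queries: &[&[u8]]) -> Vec<usize> {
    thread::scope(|scope| {
        let handles: Vec<_> = queries
            .iter()
            .map(|&query| {
                scope.spawn(move || {
                    let mut worker = Worker::new();
                    worker.count_exact_hits(query, db)
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .collect::<Vec<usize>>()
    })
}
```

Keeping all scratch state in the worker is what lets the database stay behind a shared `&` borrow for the whole run.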
## Parameters: Definition of alignment results
- Penalties
    - Mismatch penalty (`u32`)
    - Gap-open penalty (`u32`)
    - Gap-extend penalty (`u32`)
- Cutoffs
    - Minimum alignment length (MinL) (`u32`)
    - Maximum penalty per alignment length (MaxP) (`f32`)
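As a rough illustration of how these five parameters interact, the sketch below scores an alignment with a gap-affine penalty and tests it against both cutoffs. The type and function names are hypothetical and are not part of this crate. It also assumes the common convention that a gap of length `k` costs `gap_open + k * gap_extend`; the exact convention used internally is not stated here.

```rust
// Hypothetical container for the three penalties listed above.
struct Penalties {
    mismatch: u32,
    gap_open: u32,
    gap_extend: u32,
}

// Hypothetical container for the two cutoffs (MinL and MaxP).
struct Cutoffs {
    min_length: u32,
    max_penalty_per_length: f32,
}

// Total penalty for `mismatches` substitutions plus one entry per gap in
// `gap_lengths`. Assumed convention: each gap pays the open penalty once
// and the extend penalty once per gapped position.
fn gap_affine_penalty(p: &Penalties, mismatches: u32, gap_lengths: &[u32]) -> u32 {
    let gap_cost: u32 = gap_lengths
        .iter()
        .map(|&len| p.gap_open + len * p.gap_extend)
        .sum();
    mismatches * p.mismatch + gap_cost
}

// An alignment is kept only if it is long enough (MinL) and its penalty
// per unit of alignment length does not exceed MaxP. Multiplying instead
// of dividing avoids a division by zero for a zero-length alignment.
fn passes_cutoffs(c: &Cutoffs, length: u32, penalty: u32) -> bool {
    length >= c.min_length
        && (penalty as f32) <= c.max_penalty_per_length * (length as f32)
}
```

With the Quick Start values (4, 6, 2, MinL 50, MaxP 0.2), one mismatch plus one single-base gap costs 4 + 6 + 2 = 12 under this convention. Over an alignment of length 100 that is 0.12 per position, which passes; the same penalty over length 40 fails the minimum-length cutoff.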
## Inputs and Outputs
- Inputs
    - Query: `&[u8]` (byte slice)
    - Reference: a shared reference (`&`) to `Reference`
- Outputs
    - `QueryAlignment`: A vector of `TargetAlignment`, one for each target sequence.
        - `TargetAlignment`: A vector of `Alignment`, one for each alignment.
            - Index: Index of the target sequence in `Reference`.
            - Alignment: Alignment results.
                - Penalty score
                - Length of alignment
                - Alignment position
                - Operations (Match, Substitution, Insertion, Deletion)
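To make the operation list more concrete, here is a minimal sketch that collapses a sequence of the four operation kinds into a compact run-length string. The `Operation` enum below is a hypothetical mirror written for this example, not the crate's own type, and the one-letter codes (`M`, `S`, `I`, `D`) are an illustrative choice rather than a format defined by this crate.

```rust
// Hypothetical mirror of the four operation kinds named above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Operation {
    Match,
    Substitution,
    Insertion,
    Deletion,
}

// Collapse consecutive identical operations into (operation, count) runs.
fn run_length_encode(ops: &[Operation]) -> Vec<(Operation, u32)> {
    let mut runs: Vec<(Operation, u32)> = Vec::new();
    for &op in ops {
        if let Some(last) = runs.last_mut() {
            if last.0 == op {
                last.1 += 1;
                continue;
            }
        }
        runs.push((op, 1));
    }
    runs
}

// Render the runs as text, e.g. two matches then one substitution -> "2M1S".
fn to_compact_string(ops: &[Operation]) -> String {
    run_length_encode(ops)
        .iter()
        .map(|&(op, count)| {
            let code = match op {
                Operation::Match => 'M',
                Operation::Substitution => 'S',
                Operation::Insertion => 'I',
                Operation::Deletion => 'D',
            };
            format!("{}{}", count, code)
        })
        .collect()
}
```

A run-length form like this keeps long stretches of matches short when printing or logging results.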
*/
pub use ;
pub use ;