/*!
Provides the `Reference` struct, a database for multiple target sequences.
## Features
- The `Reference` struct operates as a central repository for multiple target sequences, primarily for use by the `Aligner` to perform alignments.
- During alignment, `Reference` remains immutable. It manages sequences through the `SequenceBuffer` defined in `SequenceStorage`.
- The search range of the `Reference` can be restricted to a subset of targets.
## Architecture
The `Reference` encapsulates two types, one implementing the `SequenceStorage` trait and one implementing the `PatternIndex` trait. To construct a `Reference`, specify concrete types for both. SigAlign offers default implementations in the `sequence_storage` and `pattern_index` modules, and custom implementations are also supported.
Once a `Reference` has been created, its `SequenceStorage` and `PatternIndex` are not meant to be manipulated directly; the `Aligner` drives them during alignment.
### Trait `SequenceStorage`
`SequenceStorage` is responsible for returning a target sequence when given its target index. SigAlign is agnostic to how sequences are stored and retrieved: they could be held in memory, stored in a file, or kept at a remote location accessible over a network.
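The contract described above can be illustrated with a minimal, standalone sketch. The trait `ToySequenceStorage`, the type `ToyInMemoryStorage`, and the helper `total_length` are hypothetical names introduced only for this illustration; they are simplified stand-ins and not SigAlign's actual `SequenceStorage` definition.

```rust
/// Hypothetical, simplified stand-in for a storage trait
/// (an assumption for illustration, not SigAlign's real trait).
trait ToySequenceStorage {
    /// Number of targets held by the storage.
    fn num_targets(&self) -> u32;
    /// Writes the sequence of `target_index` into `buffer`, reusing its allocation.
    /// Returns `false` and leaves `buffer` untouched if the index is out of range.
    fn fill_buffer(&self, target_index: u32, buffer: &mut Vec<u8>) -> bool;
}

/// Keeps every target in memory. A file-backed or network-backed type
/// could implement the same trait without changing the callers.
struct ToyInMemoryStorage {
    sequences: Vec<Vec<u8>>,
}

impl ToySequenceStorage for ToyInMemoryStorage {
    fn num_targets(&self) -> u32 {
        self.sequences.len() as u32
    }

    fn fill_buffer(&self, target_index: u32, buffer: &mut Vec<u8>) -> bool {
        match self.sequences.get(target_index as usize) {
            Some(sequence) => {
                buffer.clear();
                buffer.extend_from_slice(sequence);
                true
            }
            None => false,
        }
    }
}

/// A caller that depends only on the trait, not on where the bytes live.
fn total_length<S: ToySequenceStorage>(storage: &S) -> usize {
    let mut buffer = Vec::new();
    let mut total = 0;
    for target_index in 0..storage.num_targets() {
        if storage.fill_buffer(target_index, &mut buffer) {
            total += buffer.len();
        }
    }
    total
}
```

Passing a reusable buffer instead of returning a fresh `Vec<u8>` per call is one possible design choice here: it lets a caller fetch many targets in a loop without reallocating each time.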
### Trait `PatternIndex`
`PatternIndex` accepts pattern bytes and returns the indices of the targets exactly matching the pattern. The performance of `PatternIndex` significantly influences overall performance, and it can vary widely based on implementation details. Thus, `PatternIndex` should be defined differently according to use cases, considering factors like the maximum number of character types that can be indexed, the length of the targets, and the characteristics of the input query.
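To make the idea of exact pattern lookup concrete, here is a deliberately naive, standalone sketch. It is not how SigAlign's bundled indices work: the functions `concatenate` and `locate` are hypothetical, the linear scan stands in for a real index structure, and the boundary layout (target `i` occupies `boundaries[i]..boundaries[i + 1]` of the concatenated sequence) is an assumption made for this example.

```rust
/// Concatenates the targets and records their boundaries.
/// Assumed layout: target `i` occupies `boundaries[i]..boundaries[i + 1]`.
fn concatenate(targets: &[&[u8]]) -> (Vec<u8>, Vec<u32>) {
    let mut concatenated = Vec::new();
    let mut boundaries = vec![0u32];
    for target in targets {
        concatenated.extend_from_slice(target);
        boundaries.push(concatenated.len() as u32);
    }
    (concatenated, boundaries)
}

/// Naive exact matching by linear scan (a stand-in for a real index).
/// Returns `(target_index, offset_in_target)` for every match that lies
/// entirely inside a single target.
fn locate(concatenated: &[u8], boundaries: &[u32], pattern: &[u8]) -> Vec<(u32, u32)> {
    let mut hits = Vec::new();
    if pattern.is_empty() || pattern.len() > concatenated.len() {
        return hits;
    }
    for (position, window) in concatenated.windows(pattern.len()).enumerate() {
        if window != pattern {
            continue;
        }
        let start = position as u32;
        let end = start + pattern.len() as u32;
        // Binary search for the last boundary that is <= start.
        let target_index = boundaries.partition_point(|&b| b <= start) - 1;
        // Discard matches that would span two adjacent targets.
        if end <= boundaries[target_index + 1] {
            hits.push((target_index as u32, start - boundaries[target_index]));
        }
    }
    hits
}
```

A scan like this costs time proportional to the total length of all targets for every query, which is why the choice of index structure matters so much for overall performance.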
## Usage
### (1) Constructing `Reference` with `SequenceStorage`
```rust
use sigalign::reference::{
    Reference,
    sequence_storage::in_memory::InMemoryStorage,
    pattern_index::lfi::{Lfi32B2V64, LfiOption},
};

// (1) Define the SequenceStorage
let mut sequence_storage = InMemoryStorage::new();
sequence_storage.add_target(
    "target_1",
    b"AAAA...AAA",
);
sequence_storage.add_target(
    "target_2",
    b"CCCC...CCC",
);

// (2) Set options for PatternIndex
let pattern_index_option = LfiOption::new(2, 4, true);

// (3) Construct Reference
let reference = Reference::<Lfi32B2V64, InMemoryStorage>::new(
    sequence_storage,
    pattern_index_option,
).unwrap();
```
### (2) Performing Alignment
#### Use `Aligner`
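```rust
// Assumes `aligner` and `reference` have already been constructed.
// Align a single query
let result = aligner.align_query(
    &reference,
    b"AA...CC",
);
// Align every record in a FASTA file
let result = aligner.align_fasta_file(
    &reference,
    "FASTA_FILE_PATH",
);
```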
#### Directly manipulate `SequenceBuffer`
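```rust
// Assumes `aligner` and `reference` have already been constructed.
// The same buffer is reused for every query.
let mut sequence_buffer = reference.get_sequence_buffer();
for query in [b"AA...CC", b"GG...TT"] {
    aligner.alignment(
        &reference,
        &mut sequence_buffer,
        query,
    );
}
```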
### (3) Additional Features
#### Adjusting search range
```rust
let mut reference = reference;
// Perform alignment only on targets with index 0 and 1
reference.set_search_range(vec![0, 1]).unwrap();
```
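The `unwrap()` above indicates that setting a search range can fail. As a rough illustration of the kind of checks such a call might perform, here is a standalone sketch. The function `normalize_search_range` is hypothetical, and the rules it applies (reject an empty range, reject out-of-range indices, sort and de-duplicate) are assumptions for this example rather than SigAlign's documented behavior.

```rust
/// Hypothetical helper: normalizes a requested search range.
/// Returns the sorted, de-duplicated target indices, or an error message
/// if the range is empty or names a target that does not exist.
fn normalize_search_range(
    mut range: Vec<u32>,
    num_targets: u32,
) -> Result<Vec<u32>, String> {
    if range.is_empty() {
        return Err("search range is empty".to_string());
    }
    range.sort_unstable();
    range.dedup();
    // After sorting, the last element is the largest requested index.
    let largest = *range.last().unwrap();
    if largest >= num_targets {
        return Err(format!(
            "target index {} is out of range (number of targets: {})",
            largest, num_targets,
        ));
    }
    Ok(range)
}
```

Under these assumptions, `normalize_search_range(vec![1, 0, 1], 2)` yields `Ok(vec![0, 1])`, while an empty vector or an index of 2 with only two targets yields an error.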
#### Saving and Loading `Reference`
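```rust
use sigalign::reference::extensions::Serialize;

// Assumption for this example: an in-memory byte buffer is the destination.
let mut buffer: Vec<u8> = Vec::new();
// Save
reference.save_to(&mut buffer).unwrap();
// Load
let reference = Reference::<Lfi32B2V64, InMemoryStorage>::load_from(&buffer[..]).unwrap();
```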
*/
// Internal components
mod pattern_index;
mod sequence_storage;
// Implementations
mod pattern_locate; // Implements the `BufferedPatternLocater` trait.
mod debug;
// Extensions for additional features
pub mod extensions;
pub use pattern_index::PatternIndex;
pub use sequence_storage::SequenceStorage;
pub use crate::core::{PatternLocation, SequenceBuffer};
/// A database for multiple target sequences.
#[derive(Debug)]
pub struct Reference<I, S> where
    I: PatternIndex,
    S: SequenceStorage,
{
    target_boundaries: Vec<u32>,
    pattern_index: I,
    sequence_storage: S,
}
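impl<I, S> Reference<I, S> where
    I: PatternIndex,
    S: SequenceStorage,
{
    /// Builds a `Reference` from a sequence storage and the options for the pattern index.
    ///
    /// The targets in `sequence_storage` are concatenated, and the pattern index is built
    /// over the concatenated sequence. Returns `I::BuildError` if the index cannot be built.
    pub fn new(
        sequence_storage: S,
        pattern_index_option: I::Option,
    ) -> Result<Self, I::BuildError> {
        let (concatenated_sequence, target_boundaries) =
            sequence_storage.get_concatenated_sequence_with_boundaries_of_targets();
        let pattern_index = I::new(concatenated_sequence, pattern_index_option)?;
        Ok(Self {
            target_boundaries,
            pattern_index,
            sequence_storage,
        })
    }
    /// Returns a shared reference to the underlying sequence storage.
    pub fn get_sequence_storage(&self) -> &S {
        &self.sequence_storage
    }
    /// Returns a shared reference to the underlying pattern index.
    pub fn get_pattern_index(&self) -> &I {
        &self.pattern_index
    }
}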