/*!
Provides the `Reference` struct, a database of multiple target sequences.

## Features

- `Reference` is a central repository of target sequences, used primarily by the `Aligner` to perform alignments.
- `Reference` is immutable during alignment. Sequences are read through the `SequenceBuffer` type defined by the `SequenceStorage` implementation.
- The search range of a `Reference` can be restricted to a subset of its targets.

## Architecture

A `Reference` wraps one type implementing the `SequenceStorage` trait and one implementing the `PatternIndex` trait. Both types must be specified when a `Reference` is constructed. SigAlign provides default implementations in the `sequence_storage` and `pattern_index` modules, and custom implementations are also supported.

After a `Reference` is created, direct access to its `SequenceStorage` and `PatternIndex` is limited to shared references, and the `Aligner` manages how they are used.

### Trait `SequenceStorage`

`SequenceStorage` is responsible for returning a target sequence when given its target index. SigAlign is agnostic to how sequences are stored and retrieved: they may be held in memory, stored in a file, or located on a remote machine accessed over a network.
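
As a rough illustration of index-based retrieval, the sketch below uses a hypothetical `TargetSource` trait and `VecSource` type (neither is part of SigAlign) to fetch targets by index and to build a concatenated sequence with cumulative boundaries, similar in spirit to what `Reference` requests from its storage at construction. The boundary layout (a leading `0` followed by one end offset per target) is an assumption made for this example only.

```rust
/// Hypothetical stand-in for a storage backend; not SigAlign's actual trait.
trait TargetSource {
    /// Number of targets held by the source.
    fn num_targets(&self) -> u32;
    /// Replaces the contents of `buffer` with the target at `target_index`.
    /// Panics if `target_index` is out of range.
    fn fill(&self, target_index: u32, buffer: &mut Vec<u8>);
}

/// Hypothetical in-memory source backed by a vector of byte sequences.
struct VecSource {
    targets: Vec<Vec<u8>>,
}

impl TargetSource for VecSource {
    fn num_targets(&self) -> u32 {
        self.targets.len() as u32
    }
    fn fill(&self, target_index: u32, buffer: &mut Vec<u8>) {
        buffer.clear();
        buffer.extend_from_slice(&self.targets[target_index as usize]);
    }
}

/// Concatenates every target and records cumulative boundaries, so that
/// target `i` spans `boundaries[i]..boundaries[i + 1]` (assumed layout).
fn concatenate_with_boundaries<S: TargetSource>(source: &S) -> (Vec<u8>, Vec<u32>) {
    let mut concatenated = Vec::new();
    let mut boundaries = vec![0u32];
    // One buffer is reused for every fetch.
    let mut buffer = Vec::new();
    for target_index in 0..source.num_targets() {
        source.fill(target_index, &mut buffer);
        concatenated.extend_from_slice(&buffer);
        boundaries.push(concatenated.len() as u32);
    }
    (concatenated, boundaries)
}
```

Because callers only see `fill`, the same loop would work unchanged if the hypothetical source read from a file or a network service instead of memory.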

### Trait `PatternIndex`

`PatternIndex` accepts pattern bytes and returns the indices of the targets that exactly match the pattern. Its performance largely determines overall alignment performance and varies widely with implementation details. A `PatternIndex` should therefore be chosen to fit the use case, considering factors such as the maximum number of distinct characters that can be indexed, the length of the targets, and the characteristics of the input queries.
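
To make the contract concrete, here is a minimal sketch of an exact-match lookup implemented as a naive linear scan. The `NaiveIndex` type, its fields, and the `locate` method are hypothetical names for illustration and do not reflect SigAlign's `PatternIndex` API or its default implementations. The sketch assumes cumulative boundaries where target `i` spans `boundaries[i]..boundaries[i + 1]`.

```rust
/// Hypothetical naive index: a linear scan over the concatenated sequence.
struct NaiveIndex {
    /// All targets joined end to end.
    concatenated: Vec<u8>,
    /// Cumulative boundaries; target `i` spans `boundaries[i]..boundaries[i + 1]`.
    boundaries: Vec<u32>,
}

impl NaiveIndex {
    /// Returns `(target_index, position_in_target)` for every exact
    /// occurrence of `pattern` that lies entirely within a single target.
    fn locate(&self, pattern: &[u8]) -> Vec<(u32, u32)> {
        let mut hits = Vec::new();
        // `windows(0)` would panic, so an empty pattern yields no hits.
        if pattern.is_empty() {
            return hits;
        }
        for (target_index, bounds) in self.boundaries.windows(2).enumerate() {
            let (start, end) = (bounds[0] as usize, bounds[1] as usize);
            // Scanning each target separately keeps matches from
            // spanning the boundary between two adjacent targets.
            let target = &self.concatenated[start..end];
            for (position, candidate) in target.windows(pattern.len()).enumerate() {
                if candidate == pattern {
                    hits.push((target_index as u32, position as u32));
                }
            }
        }
        hits
    }
}
```

Each lookup here costs time proportional to the total sequence length times the pattern length, which is why a dedicated index structure matters for large references.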

## Usage

### (1) Constructing `Reference` with `SequenceStorage`

```rust
use sigalign::reference::{
    Reference,
    sequence_storage::in_memory::InMemoryStorage,
    pattern_index::lfi::{Lfi32B2V64, LfiOption},
};

// (1) Define the SequenceStorage
let mut sequence_storage = InMemoryStorage::new();
sequence_storage.add_target(
    "target_1",
    b"AAAA...AAA",
);
sequence_storage.add_target(
    "target_2",
    b"CCCC...CCC",
);

// (2) Set options for PatternIndex
let pattern_index_option = LfiOption::new(2, 4, true);

// (3) Construct Reference
let reference = Reference::<Lfi32B2V64, InMemoryStorage>::new(
    sequence_storage,
    pattern_index_option,
).unwrap();
```

### (2) Performing Alignment

#### Use `Aligner`

```rust
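// `aligner` is assumed to be an `Aligner` constructed beforehand.
// Align a single query sequence
let result = aligner.align_query(
    &reference,
    b"AA...CC",
);

// Align the sequences in a FASTA file
let result = aligner.align_fasta_file(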
    &reference,
    "FASTA_FILE_PATH",
);
```

#### Directly manipulate `SequenceBuffer`

```rust
// Obtain a sequence buffer once and reuse it for every query
let mut sequence_buffer = reference.get_sequence_buffer();
for query in [b"AA...CC", b"GG...TT"] {
    aligner.alignment(
        &reference,
        &mut sequence_buffer,
        query,
    );
}
```

### (3) Additional Features

#### Adjusting search range

```rust
let mut reference = reference;
// Perform alignment only on targets with index 0 and 1
reference.set_search_range(vec![0, 1]).unwrap();
```

#### Saving and Loading `Reference`

```rust
use sigalign::reference::extensions::Serialize;
// Save into an in-memory byte buffer
let mut buffer: Vec<u8> = Vec::new();
reference.save_to(&mut buffer).unwrap();
// Load
let reference = Reference::<Lfi32B2V64, InMemoryStorage>::load_from(&buffer[..]).unwrap();
```
*/

// Internal components
mod pattern_index;
mod sequence_storage;
// Implementations
mod pattern_locate; // Implements the `BufferedPatternLocater` trait.
mod debug;
// Extensions for additional features
pub mod extensions;

pub use pattern_index::PatternIndex;
pub use sequence_storage::SequenceStorage;
pub use crate::core::{PatternLocation, SequenceBuffer};

/// A database for multiple target sequences.
#[derive(Debug)]
pub struct Reference<I, S> where
    I: PatternIndex,
    S: SequenceStorage,
{
    target_boundaries: Vec<u32>,
    pattern_index: I,
    sequence_storage: S,
}

impl<I, S> Reference<I, S> where
    I: PatternIndex,
    S: SequenceStorage,
{
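    /// Builds a `Reference` from a sequence storage and the options for the pattern index.
    ///
    /// The storage supplies the concatenated sequence of all targets together with
    /// the target boundaries, and the pattern index is then built over that sequence.
    /// Returns `I::BuildError` if the pattern index cannot be built.
    pub fn new(
        sequence_storage: S,
        pattern_index_option: I::Option,
    ) -> Result<Self, I::BuildError> {
        let (concatenated_sequence, target_boundaries) =
            sequence_storage.get_concatenated_sequence_with_boundaries_of_targets();
        let pattern_index = I::new(concatenated_sequence, pattern_index_option)?;

        Ok(Self {
            target_boundaries,
            pattern_index,
            sequence_storage,
        })
    }

    /// Returns a shared reference to the underlying sequence storage.
    pub fn get_sequence_storage(&self) -> &S {
        &self.sequence_storage
    }

    /// Returns a shared reference to the underlying pattern index.
    pub fn get_pattern_index(&self) -> &I {
        &self.pattern_index
    }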
}