edlib_rs 0.1.0

A rust interface to the C++ edlib library
# edlib_rs

This crate provides a Rust interface to the Edlib C++ library by Martin Šošić. See [Martinsos-edlib](https://github.com/Martinsos/edlib)

The reference paper is :

Martin Šošić, Mile Šikić; Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance. Bioinformatics 2017 [btw753. doi] <https://doi.org/10.1093/bioinformatics/btw753>

The crate offers 2 interfaces to edlib.  
The first, accessed via module bindings, is direcly the interface generated by the bindgen crate.  
The second, accessed via module edlibrs, provides a more idiomatic Rust interface. It comes at the cost of cloning information stored in pointers startLocations and endLocations in C **struct EdlibAlignResult** to get a Rust **struct EdlibAlignResultRs** with **Option<Vec\<u8\>>** fields instead of pointers. The cigar string representation is also cloned when computed.  
As a consequence memory management is fully transferred to Rust.  
Structures and functions have the same name as in edlib with just "Rs" appended to original names.

## Example

For the edlibrs interface we have for example:

in normal mode:

```rust
    use edlib_rs::edlibrs::*;
    ...
    let query = "ACCTCTG";
    let target = "ACTCTGAAA";
    let align_res = edlibAlignRs(query.as_bytes(), target.as_bytes(), &EdlibAlignConfigRs::default());
    assert_eq!(align_res.status, EDLIB_STATUS_OK);
    assert_eq!(align_res.editDistance, 4);
```

in the infix mode :

```rust
    use edlib_rs::edlibrs::*;
    ...
    let query = "ACCTCTG";
    let target = "TTTTTTTTTTTTTTTTTTTTTACTCTGAAA";
    //
    let mut config = EdlibAlignConfigRs::default();
    config.mode = EdlibAlignModeRs::EDLIB_MODE_HW;
    let align_res = edlibAlignRs(query.as_bytes(), target.as_bytes(), &config);
    assert_eq!(align_res.editDistance, 1);
```

## Installation

The crate relies on the C++ edlib library being installed and compiled as described in edlib documentation.  
Before running cargo build (or cargo install) the environment variable EDLIB_DIR must be set to where the original C++ edlib directory was cloned. This is necessary for the build.rs step of Cargo to access the edlib library includes.
Also libstdc++ must be in your path.  
The crate enables a logger to monitor the call to the C-interface which is by default set in Cargo.toml to *info* for release mode and *trace* for debug mode, but can changed by setting the variable RUST_LOG (see env_logger doc).

## Tests

Some tests in module edlib.rs can serve as basic examples. Please note that cargo test must be run with variable EDLIB_DIR set.
In directory examples there is also a small version of the edlib edaligner module (see apps/aligner in edlib installation dir) which runs on
Fasta files containing only one sequence as contained in the **edlib** directory *test_data*. Contrary to the edlib version the module given a query and a target sequence runs the 3 modes (normal/NW, prefix/SHW and infix/HW) in one pass.

With *RUST_LOG=info ./target/release/examples/edaligner --dirdata  "$edlibpath/test_data/Enterobacteria_Phage_1" --tf "Enterobacteria_phage_1.fasta" --qf "mutated_90_perc.fasta"*

we get the following timing in release mode for Enterobacteria_phage_1.fasta as target sequence  and  mutated_90_perc.fasta as query sequence.

| mode    | edlibrs time(s) | edlib time(s) |  distance |
|  :---:  |     :---:       |  :------:     |  :----:   |
|  NW     |     0.106       |  0.106        |  9506     |
|  SHW    |     0.184       |  0.191        |  9502     |
|  HW     |     0.682       |  0.695        |  9502     |

We get the following timing in release mode for Enterobacteria_phage_1.fasta as target sequence  and  mutated_60_perc.fasta as query sequence.

| mode    | edlibrs time(s) | edlib time(s) |  distance |
|  :---:  |     :---:       |  :------:     |  :----:   |
|  NW     |     0.398       |  0.398        |  39829    |
|  SHW    |     0.670       |  0.684        |  39828    |
|  HW     |     1.182       |  1.206        |  39828    |

Except for infinitesimal variations of cpu time measurement we see we have the same computation times.

## License

Licensed under either of

* Apache License, Version 2.0, [LICENSE-APACHE]LICENSE-APACHE or <http://www.apache.org/licenses/LICENSE-2.0>
* MIT license [LICENSE-MIT]LICENSE-MIT or <http://opensource.org/licenses/MIT>

at your option.

This software was written on my own while working at [CEA](http://www.cea.fr/), [CEA-LIST](http://www-list.cea.fr/en/)