edlib_rs 0.1.0

A Rust interface to the C++ edlib library

edlib_rs

This crate provides a Rust interface to the Edlib C++ library by Martin Šošić. See Martinsos-edlib

The reference paper is:

Martin Šošić, Mile Šikić; Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics 2017, btw753. doi: https://doi.org/10.1093/bioinformatics/btw753

The crate offers two interfaces to edlib.
The first, accessed via module bindings, is directly the interface generated by the bindgen crate.
The second, accessed via module edlibrs, provides a more idiomatic Rust interface. It comes at the cost of cloning the data behind the pointers startLocations and endLocations of the C struct EdlibAlignResult, which yields a Rust struct EdlibAlignResultRs with Option<Vec<u8>> fields instead of pointers. The cigar string representation is also cloned when computed.
As a consequence, memory management is fully transferred to Rust.
Structures and functions have the same names as in edlib, with "Rs" appended.
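
The cloning step described above can be pictured with a minimal sketch. The helper copy_c_array is hypothetical and is not part of edlib_rs; it only shows one way to turn a pointer plus a length, as found in a C result struct, into an owned Option<Vec<T>>, with a null pointer mapped to None.

```rust
use std::slice;

/// Hypothetical helper (not taken from edlib_rs): copies `len` elements from a
/// C-owned array into a Rust-owned Vec, or returns None for a null pointer.
///
/// # Safety
/// `ptr` must be null, or point to at least `len` initialized, readable values.
pub unsafe fn copy_c_array<T: Clone>(ptr: *const T, len: usize) -> Option<Vec<T>> {
    if ptr.is_null() {
        // No data was produced on the C side.
        return None;
    }
    // Borrow the C memory as a slice, then clone it into Rust-owned storage.
    // After this call the C buffer can be freed without affecting the Vec.
    Some(unsafe { slice::from_raw_parts(ptr, len) }.to_vec())
}
```

Once the data has been copied this way, the C-side result can be released immediately and the Rust struct carries no raw pointers, which is the trade-off the edlibrs interface makes.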

Example

With the edlibrs interface, in normal (NW) mode:

    use edlib_rs::edlibrs::*;
    ...
    let query = "ACCTCTG";
    let target = "ACTCTGAAA";
    let align_res = edlibAlignRs(query.as_bytes(), target.as_bytes(), &EdlibAlignConfigRs::default());
    assert_eq!(align_res.status, EDLIB_STATUS_OK);
    assert_eq!(align_res.editDistance, 4);

In infix (HW) mode:

    use edlib_rs::edlibrs::*;
    ...
    let query = "ACCTCTG";
    let target = "TTTTTTTTTTTTTTTTTTTTTACTCTGAAA";
    //
    let mut config = EdlibAlignConfigRs::default();
    config.mode = EdlibAlignModeRs::EDLIB_MODE_HW;
    let align_res = edlibAlignRs(query.as_bytes(), target.as_bytes(), &config);
    assert_eq!(align_res.editDistance, 1);
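
To see where the distances 4 and 1 in the two examples come from, here is a naive dynamic-programming reference. It is a sketch for illustration only: it is not how edlib computes the result (edlib uses a much faster bit-vector algorithm), and the names Mode and edit_distance are hypothetical, not part of edlib_rs.

```rust
/// Alignment modes, named after the edlib modes they imitate (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    /// Global alignment (NW): both sequences are aligned end to end.
    Global,
    /// Infix alignment (HW): target symbols before and after the query are free.
    Infix,
}

/// Naive O(|query| * |target|) edit distance, for illustration only.
pub fn edit_distance(query: &[u8], target: &[u8], mode: Mode) -> usize {
    // prev[j] is the cost of aligning the query prefix processed so far
    // with an alignment that ends at target position j.
    let mut prev: Vec<usize> = match mode {
        // Skipping j target symbols at the start costs j insertions.
        Mode::Global => (0..=target.len()).collect(),
        // In infix mode the alignment may start anywhere in the target for free.
        Mode::Infix => vec![0; target.len() + 1],
    };
    for (i, &q) in query.iter().enumerate() {
        let mut cur = Vec::with_capacity(target.len() + 1);
        // Aligning i + 1 query symbols against an empty target prefix.
        cur.push(i + 1);
        for (j, &t) in target.iter().enumerate() {
            let substitution = prev[j] + usize::from(q != t);
            let deletion = prev[j + 1] + 1; // query symbol left unmatched
            let insertion = cur[j] + 1; // target symbol left unmatched
            cur.push(substitution.min(deletion).min(insertion));
        }
        prev = cur;
    }
    match mode {
        // The whole target must be consumed.
        Mode::Global => prev[target.len()],
        // The alignment may end anywhere in the target for free.
        Mode::Infix => prev.into_iter().min().unwrap(),
    }
}
```

In global mode, aligning "ACCTCTG" to "ACTCTGAAA" needs one deletion (the extra C) and three insertions (the trailing AAA), hence 4. In infix mode the leading T run and the trailing AAA of the target are free, so only the deletion remains, hence 1.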

Installation

The crate relies on the C++ edlib library being installed and compiled as described in the edlib documentation.
Before running cargo build (or cargo install), the environment variable EDLIB_DIR must be set to the directory where the original C++ edlib repository was cloned. This is necessary for the build.rs step of Cargo to access the edlib library includes. libstdc++ must also be in your path.
The crate enables a logger to monitor calls to the C interface. Its level is set by default in Cargo.toml to info for release mode and trace for debug mode, but it can be changed by setting the variable RUST_LOG (see the env_logger doc).
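
As an illustration of the EDLIB_DIR requirement, the following sketch shows how a build script could resolve the include directory from that variable. It is not the crate's actual build.rs: the helper edlib_include_dir is hypothetical, and the edlib/include subdirectory is an assumption about the layout of an edlib checkout.

```rust
use std::path::PathBuf;

/// Hypothetical helper, not taken from the crate's build.rs: maps the value of
/// EDLIB_DIR to the directory expected to contain the edlib header.
pub fn edlib_include_dir(edlib_dir: Option<&str>) -> Result<PathBuf, String> {
    match edlib_dir {
        Some(dir) if !dir.is_empty() => {
            // Assumption: headers live under edlib/include in the cloned repository.
            Ok(PathBuf::from(dir).join("edlib").join("include"))
        }
        // Unset or empty variable: fail early with an explicit message.
        _ => Err("EDLIB_DIR must point to the cloned edlib directory".to_string()),
    }
}
```

A build script could call it as edlib_include_dir(std::env::var("EDLIB_DIR").ok().as_deref()) and stop with the error message when the variable is missing, which makes the cause of a failed build explicit.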

Tests

Some tests in module edlib.rs can serve as basic examples. Note that cargo test must be run with the variable EDLIB_DIR set. The examples directory also contains a small version of the edlib edaligner module (see apps/aligner in the edlib installation directory). It runs on Fasta files containing a single sequence, such as those in the edlib test_data directory. Unlike the edlib version, given a query and a target sequence it runs the 3 modes (normal/NW, prefix/SHW and infix/HW) in one pass.

With the command

    RUST_LOG=info ./target/release/examples/edaligner --dirdata "$edlibpath/test_data/Enterobacteria_Phage_1" --tf "Enterobacteria_phage_1.fasta" --qf "mutated_90_perc.fasta"

we get the following timings in release mode, with Enterobacteria_phage_1.fasta as target sequence and mutated_90_perc.fasta as query sequence:

    mode   edlibrs time (s)   edlib time (s)   distance
    NW     0.106              0.106            9506
    SHW    0.184              0.191            9502
    HW     0.682              0.695            9502

With Enterobacteria_phage_1.fasta as target sequence and mutated_60_perc.fasta as query sequence, the timings in release mode are:

    mode   edlibrs time (s)   edlib time (s)   distance
    NW     0.398              0.398            39829
    SHW    0.670              0.684            39828
    HW     1.182              1.206            39828

Apart from small fluctuations in CPU time measurement, the edlibrs interface and the original edlib have the same computation times.

License

Licensed under either of

at your option.

This software was written on my own while working at CEA, CEA-LIST