RustSASA
⚡ Ludicrously fast Rust crate for protein solvent accessible surface area (SASA) calculations: 63x faster than Biopython, 5x faster than FreeSASA. Pure Rust with Python bindings and a CLI. Implements the Shrake-Rupley algorithm [1].
Features:
- 🦀 Written in Pure Rust.
- ⚡️ Ludicrously fast. 63X faster than Biopython, 14X faster than mdakit_sasa, and 5X faster than Freesasa.
- 🧪 Full test coverage.
- 🐍 Python support.
- 🤖 Command line interface.
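For readers unfamiliar with the Shrake-Rupley algorithm [1], here is a minimal, dependency-free sketch of the idea. Each atom's radius is expanded by the probe radius, test points are spread over the expanded sphere, and the exposed fraction of points is scaled by the sphere's area. This is an illustration only, not RustSASA's implementation: the `Sphere` type, the function names, and the golden-spiral point lattice are assumptions made for this example, and the brute-force neighbour loop makes no attempt at the optimizations a fast implementation would need.

```rust
use std::f64::consts::PI;

/// Hypothetical stand-in for an atom: center coordinates and radius, in angstroms.
#[derive(Clone, Copy, Debug)]
pub struct Sphere {
    pub center: [f64; 3],
    pub radius: f64,
}

/// Spread `n` points quasi-uniformly over the unit sphere (golden-spiral lattice).
pub fn unit_sphere_points(n: usize) -> Vec<[f64; 3]> {
    let golden_angle = PI * (3.0 - 5.0_f64.sqrt());
    (0..n)
        .map(|i| {
            // z steps evenly from just below +1 down to just above -1.
            let z = 1.0 - (2.0 * i as f64 + 1.0) / n as f64;
            let r = (1.0 - z * z).sqrt();
            let theta = golden_angle * i as f64;
            [r * theta.cos(), r * theta.sin(), z]
        })
        .collect()
}

/// Per-atom SASA in square angstroms, brute force: O(atoms^2 * points).
/// `n_points` must be greater than zero.
pub fn shrake_rupley(atoms: &[Sphere], probe_radius: f64, n_points: usize) -> Vec<f64> {
    let lattice = unit_sphere_points(n_points);
    atoms
        .iter()
        .enumerate()
        .map(|(i, a)| {
            // Radius of the surface traced by the probe center around atom i.
            let ra = a.radius + probe_radius;
            let exposed = lattice
                .iter()
                .filter(|p| {
                    // Test point on the expanded surface of atom i.
                    let q = [
                        a.center[0] + ra * p[0],
                        a.center[1] + ra * p[1],
                        a.center[2] + ra * p[2],
                    ];
                    // Accessible if it lies inside no other expanded sphere.
                    atoms.iter().enumerate().all(|(j, b)| {
                        if i == j {
                            return true;
                        }
                        let rb = b.radius + probe_radius;
                        let d2: f64 = (0..3).map(|k| (q[k] - b.center[k]).powi(2)).sum();
                        d2 >= rb * rb
                    })
                })
                .count();
            // Exposed fraction of points times the area of the expanded sphere.
            4.0 * PI * ra * ra * exposed as f64 / n_points as f64
        })
        .collect()
}
```

Calling `shrake_rupley(&atoms, 1.4, 100)` returns one area per input sphere. An isolated sphere gets the full area of its expanded surface, and a sphere buried entirely inside a larger one gets zero. The probe radius of 1.4 and the point count of 100 are illustrative values, not RustSASA defaults.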
Installation
Rust 🦀
cargo add rust-sasa
Python 🐍
pip install rust-sasa-python
MDAnalysis package
pip install mdsasa-bolt
Command-line interface 🤖
1. Install cargo-binstall
curl -L --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/cargo-bins/cargo-binstall/main/install-from-binstall-release.sh | bash
2. Install rust-sasa
cargo binstall rust-sasa
Quick start
Using in Rust 🦀
use StrictnessLevel;
use ;
let = open.unwrap;
let result = new.process;
Full documentation can be found here.
Using in Python 🐍
You can now use RustSASA from Python to speed up your scripts! Take a look at rust-sasa-python!
# Simple calculation - use convenience function
=
See full docs here.
Using CLI 🤖
Processing single file
rust-sasa path_to_pdb_file.pdb output.json # Also supports .xml, .pdb, and .cif!
Processing an entire directory
rust-sasa input_directory/ output_directory/ --format json # Also supports .xml, .pdb, and .cif!
Using with MDAnalysis
RustSASA can be used with MDAnalysis to calculate SASA for a protein in a trajectory. RustSASA is 17x faster than mdakit_sasa.
# Load your trajectory
=
# Create SASA analysis
=
# Run the analysis
# Access results
See the mdsasa-bolt package for more information.
Benchmarking
Results:
- RustSasa: 5.237 s ± 0.049 s
- Freesasa: 28.042 s ± 2.269 s
- Biopython: 368.025 s ± 51.156 s
Methodology:
We computed residue-level SASA values for the entire AlphaFold E. coli proteome structure database using RustSASA, Freesasa, and Biopython. Benchmarks were run with Hyperfine using the options --warmup 3 --runs 3. All three methods ran across 8 cores on an Apple M3 MacBook with 24 GB of unified memory. The RustSASA CLI was used to take advantage of profile-guided optimization. GNU Parallel was used to run Freesasa and Biopython in parallel.
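As a quick sanity check on the numbers in Results, the sketch below computes speedup factors as the ratio of mean wall-clock times. The constants are copied from the results list; the names are made up for this example. On these particular means the ratios come out to about 70x over Biopython and about 5.4x over Freesasa, so the headline figures quoted elsewhere in this README may derive from a different run.

```rust
/// Mean wall-clock times in seconds, copied from the benchmark results.
pub const RUSTSASA_S: f64 = 5.237;
pub const FREESASA_S: f64 = 28.042;
pub const BIOPYTHON_S: f64 = 368.025;

/// Speedup of a tool over a baseline: how many times longer the baseline takes.
pub fn speedup(baseline_s: f64, tool_s: f64) -> f64 {
    baseline_s / tool_s
}

/// Speedups of RustSASA over (Freesasa, Biopython) on the reported means.
pub fn reported_speedups() -> (f64, f64) {
    (
        speedup(FREESASA_S, RUSTSASA_S),
        speedup(BIOPYTHON_S, RUSTSASA_S),
    )
}
```

These ratios ignore the reported standard deviations, which are sizeable for Biopython (± 51.156 s over three runs), so the larger factor should be read as approximate.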
Validation against Freesasa
Other
License
MIT
Version 0.6.0 (Latest update)
- RustSASA now excludes hydrogens by default and uses ProtOr radii for improved accuracy. Hydrogens can be included by passing --include-hydrogens (or the API equivalent). If you include hydrogen atoms, you should also provide a custom atomic radii file designed to work with the hydrogens you are including. See README and documentation for more information.
- Improved error handling and reduced code duplication.
Building from source
First, make sure you have the Rust compiler installed. See https://rust-lang.org/tools/install/ for installation instructions.
To build RustSASA from source, start by initializing the git submodules with the following command:
git submodule update --init
Then build the binary with:
cargo build --release
Contributing
Contributions are welcome! Please feel free to submit pull requests and open issues. As this is an actively developed library, I encourage sharing your thoughts, ideas, suggestions, and feedback.
Hydrogen handling
By default, RustSASA strips hydrogen atoms and uses the ProtOr radii config. To include hydrogens, pass the CLI argument --include-hydrogens; in that case you should also provide your own atom radii config designed to work with hydrogens. Custom radii configs are supplied with --radii-file, which accepts a Freesasa-style .config file (see configs here).
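To make the default behaviour concrete, here is a minimal sketch of what "stripping hydrogens" amounts to: atoms whose element symbol is H are dropped before any area is computed, unless the caller opts in. The `AtomRecord` type and both functions are hypothetical names for this example and are not part of the RustSASA API. The sketch also assumes that the element symbol is available on each atom and that deuterium (D) needs no special treatment.

```rust
/// Hypothetical minimal atom record: atom name and element symbol.
#[derive(Debug, Clone, PartialEq)]
pub struct AtomRecord {
    pub name: String,
    pub element: String,
}

/// True if the element symbol is hydrogen (case-insensitive, whitespace ignored).
pub fn is_hydrogen(atom: &AtomRecord) -> bool {
    atom.element.trim().eq_ignore_ascii_case("H")
}

/// Return the atoms that would take part in the calculation.
/// With `include_hydrogens == false`, hydrogens are filtered out.
pub fn select_atoms(atoms: &[AtomRecord], include_hydrogens: bool) -> Vec<AtomRecord> {
    atoms
        .iter()
        .filter(|a| include_hydrogens || !is_hydrogen(a))
        .cloned()
        .collect()
}
```

Radii tables built for heavy atoms only generally have no sensible entry for explicit hydrogens, which is why including them goes hand in hand with supplying a radii file that covers them.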
How to cite
If you use the RustSASA library in your publication, please cite it. To cite this repository, scroll to the top of this page and click the "Cite this repository" button in the right-hand GitHub sidebar. This will give you a citation in your desired format (e.g., BibTeX or APA).
Citations:
1: Shrake A, Rupley JA. Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J Mol Biol. 1973 Sep 15;79(2):351-71. doi: 10.1016/0022-2836(73)90011-9. PMID: 4760134.