qsar
qsar is a lightweight Rust library for computing common molecular descriptors and integrating descriptor data with Linfa for basic QSAR modeling. The initial release focuses on an exact molecular weight calculator implemented in pure Rust (no native Open Babel / RDKit dependencies), convenient ndarray conversion utilities, and a minimal Linfa example.
This project is developed and maintained by Saw Simeon, author of multiple highly cited works in QSAR, cheminformatics, and structure-based virtual screening.
Selected Publications
- Inactive-enriched machine-learning models exploiting patent data improve structure-based virtual screening for PDL1 dimerizers
  P Gomez-Sacristan, S Simeon, VK Tran-Nguyen, S Patil, PJ Ballester
  Journal of Advanced Research 67, 185–196 (2025)
- A practical guide to machine-learning scoring for structure-based virtual screening
  VK Tran-Nguyen, M Junaid, S Simeon, PJ Ballester
  Nature Protocols 18 (11), 3460–3511 (2023), 68 citations
- Structure-based virtual screening for PDL1 dimerizers: evaluating generic scoring functions
  VK Tran-Nguyen, S Simeon, M Junaid, PJ Ballester
  Current Research in Structural Biology 4, 206–210 (2022)
- Characterizing the relationship between the chemical structures of drugs and their activities on primary cultures of pediatric solid tumors
  S Simeon, G Ghislat, PJ Ballester
  Current Medicinal Chemistry 28 (38), 7830–7839 (2021)
- Towards reproducible computational drug discovery
  N Schaduangrat, S Lampa, S Simeon, MP Gleeson, O Spjuth et al.
  Journal of Cheminformatics 12 (1), 9 (2020), 207 citations
The full publication list is available on Google Scholar (Saw Simeon).
Goals
- Provide a small, easy-to-publish crate for common QSAR descriptor tasks.
- Keep the core pure Rust and dependency-light so it is easy to use across platforms.
- Provide clear extension points for optional, feature-gated integrations with native chemoinformatics toolkits.
Features
- Exact molecular weight calculation from simple SMILES or InChI strings.
- Conversion helpers to prepare descriptor matrices for Linfa.
- Minimal CSV loader helper for descriptor datasets.
- Example demonstrating training and predicting with linfa_linear::LinearRegression.
Quickstart
Add to your Cargo.toml:
[dependencies]
qsar = "0.0.1"
Compute molecular weight:
use molecular_weight;
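The signature of the crate's own molecular_weight function is not reproduced here. The following sketch only illustrates the underlying arithmetic: it computes an average molecular weight from a molecular formula by summing count × atomic weight over the elements, and it pulls the formula layer out of a standard InChI string. The names parse_formula, molecular_weight_from_formula and inchi_formula, along with the small weight table, are hypothetical and are not the qsar API. The sketch assumes single-component formulas and ignores charges and the InChI proton layer. It uses rounded standard (average) atomic weights. A monoisotopic exact mass would use each element's most abundant isotope instead, and this sketch does not settle which of the two qsar reports.

```rust
use std::collections::BTreeMap;

/// Rounded standard atomic weights (g/mol) for a few common elements.
fn atomic_weight(symbol: &str) -> Option<f64> {
    let weight = match symbol {
        "H" => 1.008,
        "C" => 12.011,
        "N" => 14.007,
        "O" => 15.999,
        "F" => 18.998,
        "P" => 30.974,
        "S" => 32.06,
        "Cl" => 35.45,
        "Br" => 79.904,
        "I" => 126.904,
        _ => return None,
    };
    Some(weight)
}

/// Parses a single-component formula such as "C2H6O" into element counts.
fn parse_formula(formula: &str) -> Result<BTreeMap<String, u32>, String> {
    let chars: Vec<char> = formula.chars().collect();
    let mut counts = BTreeMap::new();
    let mut i = 0;
    while i < chars.len() {
        if !chars[i].is_ascii_uppercase() {
            return Err(format!("unexpected character {:?} at position {i}", chars[i]));
        }
        // Element symbol: one uppercase letter, optionally followed by one lowercase letter.
        let mut symbol = chars[i].to_string();
        i += 1;
        if i < chars.len() && chars[i].is_ascii_lowercase() {
            symbol.push(chars[i]);
            i += 1;
        }
        // Optional count; a missing count means 1.
        let start = i;
        while i < chars.len() && chars[i].is_ascii_digit() {
            i += 1;
        }
        let digits: String = chars[start..i].iter().collect();
        let n = if digits.is_empty() {
            1
        } else {
            digits.parse::<u32>().map_err(|e| e.to_string())?
        };
        *counts.entry(symbol).or_insert(0) += n;
    }
    Ok(counts)
}

/// Average molecular weight: sum over elements of count × atomic weight.
fn molecular_weight_from_formula(formula: &str) -> Result<f64, String> {
    let mut total = 0.0;
    for (symbol, count) in parse_formula(formula)? {
        let w = atomic_weight(&symbol).ok_or_else(|| format!("unknown element {symbol}"))?;
        total += w * f64::from(count);
    }
    Ok(total)
}

/// Returns the formula layer of a standard InChI ("InChI=1S/<formula>/...").
fn inchi_formula(inchi: &str) -> Option<&str> {
    inchi.strip_prefix("InChI=")?.split('/').nth(1)
}
```

With these weights, molecular_weight_from_formula("C2H6O") (ethanol) evaluates to about 46.069 g/mol: 2 × 12.011 + 6 × 1.008 + 15.999. Combining it with inchi_formula gives the same value for the ethanol InChI string.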
Convert descriptor vectors and train a linear model:
use ;
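The following sketch shows what such a conversion step involves, assuming the descriptors arrive as one Vec<f64> per molecule. It checks that every row has the same length, then flattens the rows in row-major order and pairs the result with the (n_samples, n_features) shape. That pair is the input that ndarray's Array2::from_shape_vec expects. The name to_row_major is hypothetical, and the code uses only the standard library.

```rust
/// Flattens per-molecule descriptor rows into a row-major buffer plus its
/// (n_samples, n_features) shape, rejecting ragged input.
fn to_row_major(rows: &[Vec<f64>]) -> Result<((usize, usize), Vec<f64>), String> {
    let n_features = rows.first().map_or(0, Vec::len);
    let mut flat = Vec::with_capacity(rows.len() * n_features);
    for (i, row) in rows.iter().enumerate() {
        if row.len() != n_features {
            return Err(format!(
                "row {i} has {} descriptors, expected {n_features}",
                row.len()
            ));
        }
        flat.extend_from_slice(row);
    }
    // With the ndarray crate (not used here), the pair could become a matrix via
    // ndarray::Array2::from_shape_vec(shape, flat), which validates the shape again.
    Ok(((rows.len(), n_features), flat))
}
```

From there, assuming the ndarray and linfa crates are available, the matrix and a target array could be wrapped with linfa::Dataset::new(x, y). The dataset could then be fitted with linfa_linear::LinearRegression::new().fit(&dataset), provided linfa::traits::Fit is in scope.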
Loading descriptors from CSV
use read_csv_descriptors;
let = read_csv_descriptors?;
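The following is a minimal, standard-library-only sketch of such a loader. It assumes a simple layout: a header row naming the columns, numeric values only, and the target (activity) in the last column. This layout may not match what read_csv_descriptors actually expects. DescriptorTable and parse_descriptor_csv are hypothetical names. Fields are split on plain commas, so quoted fields are not handled.

```rust
/// Hypothetical parsed form of a descriptor CSV: descriptor column names,
/// one feature row per molecule, and one target value per molecule.
#[derive(Debug)]
struct DescriptorTable {
    names: Vec<String>,
    features: Vec<Vec<f64>>,
    targets: Vec<f64>,
}

/// Assumed layout: header row, numeric fields only, target in the last column.
fn parse_descriptor_csv(text: &str) -> Result<DescriptorTable, String> {
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header: Vec<String> = lines
        .next()
        .ok_or("empty CSV")?
        .split(',')
        .map(|s| s.trim().to_string())
        .collect();
    if header.len() < 2 {
        return Err("need at least one descriptor column and a target column".into());
    }
    let (mut features, mut targets) = (Vec::new(), Vec::new());
    for (row, line) in lines.enumerate() {
        let values = line
            .split(',')
            .map(|s| s.trim().parse::<f64>())
            .collect::<Result<Vec<f64>, _>>()
            .map_err(|e| format!("data row {}: {e}", row + 1))?;
        if values.len() != header.len() {
            return Err(format!(
                "data row {}: expected {} fields, found {}",
                row + 1,
                header.len(),
                values.len()
            ));
        }
        // Every field but the last is a descriptor; the last field is the target.
        let (x, y) = values.split_at(values.len() - 1);
        features.push(x.to_vec());
        targets.push(y[0]);
    }
    let names = header[..header.len() - 1].to_vec();
    Ok(DescriptorTable { names, features, targets })
}
```

To load a file, the contents returned by std::fs::read_to_string(path) can be passed to parse_descriptor_csv. The resulting feature rows are then ready for conversion into an ndarray matrix.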
Limitations and roadmap
- The bundled SMILES parser supports a useful subset of the syntax: linear molecules, double (=) and triple (#) bonds, and bracketed atoms with explicit H counts. It does not yet fully support rings, branches, or stereochemistry; this is deliberate, to keep the first release small and portable.
- Future work:
  - Add an optional Cargo feature to enable Open Babel / RDKit FFI bindings for full parsing.
  - Add more descriptors (logP, TPSA, rotatable bonds, fingerprinting).
  - Add cross-validation and model selection helpers (Linfa pipelines).
  - Improve error messages and parsing robustness.
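To make the supported subset concrete, here is a sketch of a linear-only SMILES reader in the spirit of the limitations above. It is an illustration, not the bundled parser. It accepts organic-subset atoms (B, C, N, O, P, S, F, Cl, Br, I), the bond symbols -, = and #, and bracket atoms with an explicit H count such as [OH2]. It rejects rings, branches, aromatic atoms, charges and stereochemistry. Implicit hydrogens on organic-subset atoms follow the usual SMILES rule: take the smallest default valence that is at least the sum of bond orders, then subtract that sum. The output is a set of element counts (a molecular formula), which an atomic weight table can turn into a molecular weight. The names linear_smiles_counts, parse_bracket and default_valences are hypothetical.

```rust
use std::collections::BTreeMap;

/// Default valences for the SMILES organic subset (used for implicit hydrogens).
fn default_valences(symbol: &str) -> &'static [u32] {
    match symbol {
        "B" => &[3],
        "C" => &[4],
        "N" | "P" => &[3, 5],
        "O" => &[2],
        "S" => &[2, 4, 6],
        "F" | "Cl" | "Br" | "I" => &[1],
        _ => &[],
    }
}

/// Parses the inside of a bracket atom such as "OH2" or "CH4"; charges,
/// isotopes and chirality are rejected in this sketch.
fn parse_bracket(content: &[char]) -> Result<(String, Option<u32>), String> {
    let mut symbol = match content.first() {
        Some(c) if c.is_ascii_uppercase() => c.to_string(),
        _ => return Err("bracket atom must start with an element symbol".into()),
    };
    let mut j = 1;
    if matches!(content.get(1), Some(c) if c.is_ascii_lowercase()) {
        symbol.push(content[1]);
        j = 2;
    }
    let mut h = 0;
    if content.get(j) == Some(&'H') {
        let digits: String = content[j + 1..].iter().take_while(|c| c.is_ascii_digit()).collect();
        h = if digits.is_empty() { 1 } else { digits.parse::<u32>().map_err(|e| e.to_string())? };
        j += 1 + digits.len();
    }
    if j != content.len() {
        let text: String = content.iter().collect();
        return Err(format!("unsupported bracket atom [{text}]"));
    }
    Ok((symbol, Some(h))) // bracket atoms never receive implicit hydrogens
}

/// Element counts, including hydrogens, for a linear SMILES string.
fn linear_smiles_counts(smiles: &str) -> Result<BTreeMap<String, u32>, String> {
    let chars: Vec<char> = smiles.chars().collect();
    let mut atoms: Vec<(String, Option<u32>)> = Vec::new(); // (symbol, explicit H)
    let mut bonds: Vec<u32> = Vec::new(); // bonds[k] joins atoms[k] and atoms[k + 1]
    let mut pending: Option<u32> = None;
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let atom = match c {
            '-' | '=' | '#' => {
                if atoms.is_empty() || pending.is_some() {
                    return Err(format!("misplaced bond symbol at position {i}"));
                }
                pending = Some(if c == '-' { 1 } else if c == '=' { 2 } else { 3 });
                i += 1;
                continue;
            }
            '[' => {
                let len = chars[i..].iter().position(|&ch| ch == ']').ok_or("unclosed '['")?;
                let parsed = parse_bracket(&chars[i + 1..i + len])?;
                i += len + 1;
                parsed
            }
            'C' if chars.get(i + 1) == Some(&'l') => { i += 2; ("Cl".to_string(), None) }
            'B' if chars.get(i + 1) == Some(&'r') => { i += 2; ("Br".to_string(), None) }
            'B' | 'C' | 'N' | 'O' | 'P' | 'S' | 'F' | 'I' => { i += 1; (c.to_string(), None) }
            // Digits (rings), parentheses (branches), lowercase (aromatic), '@', '/', ...
            _ => return Err(format!("unsupported character {c:?} at position {i}")),
        };
        if !atoms.is_empty() {
            bonds.push(pending.take().unwrap_or(1)); // no bond symbol means a single bond
        }
        atoms.push(atom);
    }
    if pending.is_some() {
        return Err("SMILES ends with a bond symbol".into());
    }
    let mut counts = BTreeMap::new();
    for (k, (symbol, explicit_h)) in atoms.iter().enumerate() {
        let h = explicit_h.unwrap_or_else(|| {
            let prev = if k > 0 { bonds[k - 1] } else { 0 };
            let used = prev + bonds.get(k).copied().unwrap_or(0);
            default_valences(symbol).iter().find(|&&v| v >= used).map_or(0, |&v| v - used)
        });
        *counts.entry(symbol.clone()).or_insert(0) += 1;
        if h > 0 {
            *counts.entry("H".to_string()).or_insert(0) += h;
        }
    }
    Ok(counts)
}
```

With this sketch, "CCO" yields two C, six H and one O (ethanol), and "C#N" yields one H (hydrogen cyanide). The inputs "C1CC1" (a ring) and "CC(C)C" (a branch) return errors.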
Contributing
Contributions are welcome. Please open issues and pull requests on the repository. The project follows standard Rust contribution practices: format with cargo fmt, lint with cargo clippy, and run tests with cargo test.
Recommended local validation
- cargo fmt
- cargo clippy --all-targets -- -D warnings
- cargo test
- cargo doc --no-deps --open
License
This project is dual-licensed under MIT OR Apache-2.0. See the LICENSE file for details.