qsar

qsar is a lightweight Rust library for computing common molecular descriptors and integrating descriptor data with Linfa for basic QSAR modeling. The initial release focuses on an exact molecular weight calculator implemented in pure Rust (no native Open Babel / RDKit dependencies), convenient ndarray conversion utilities, and a minimal Linfa example.

This project is developed and maintained by Saw Simeon, author of multiple highly cited works in QSAR, cheminformatics, and structure-based virtual screening.

Selected Publications

Inactive-enriched machine-learning models exploiting patent data improve structure-based virtual screening for PDL1 dimerizers
P Gomez-Sacristan, S Simeon, VK Tran-Nguyen, S Patil, PJ Ballester
Journal of Advanced Research 67, 185–196 (2025)

A practical guide to machine-learning scoring for structure-based virtual screening
VK Tran-Nguyen, M Junaid, S Simeon, PJ Ballester
Nature Protocols 18 (11), 3460–3511 (2023) – 68 citations

Structure-based virtual screening for PDL1 dimerizers: evaluating generic scoring functions
VK Tran-Nguyen, S Simeon, M Junaid, PJ Ballester
Current Research in Structural Biology 4, 206–210 (2022)

Characterizing the relationship between the chemical structures of drugs and their activities on primary cultures of pediatric solid tumors
S Simeon, G Ghislat, PJ Ballester
Current Medicinal Chemistry 28 (38), 7830–7839 (2021)

Towards reproducible computational drug discovery
N Schaduangrat, S Lampa, S Simeon, MP Gleeson, O Spjuth et al.
Journal of Cheminformatics 12 (1), 9 (2020) – 207 citations

Full list available on Google Scholar → Saw Simeon

Goals

Provide a small, easy-to-publish crate for common QSAR descriptor tasks.
Keep the core pure Rust and dependency-light so it is easy to use across platforms.
Provide clear extension points for optional, feature-gated integrations with native chemoinformatics toolkits.

Features

Exact molecular weight calculation from simple SMILES or InChI strings.
Conversion helpers to prepare descriptor matrices for Linfa.
Minimal CSV loader helper for descriptor datasets.
Example demonstrating training and predicting with linfa_linear::LinearRegression.

Quickstart

Add to your Cargo.toml:

[dependencies]
qsar = "0.0.1"

Compute molecular weight:

use qsar::descriptors::molecular_weight;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mw = molecular_weight("CCO")?; // ethanol
    println!("Ethanol exact molecular weight: {}", mw);
    Ok(())
}
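
As a sanity check on the printed value, the mass of ethanol (C2H6O) can be worked out by hand. The sketch below is independent of the crate and uses only the standard library. The element table and the helper names `masses` and `formula_masses` are illustrative assumptions, not part of the qsar API. This README does not state whether "exact molecular weight" means the average molecular weight or the monoisotopic mass, so the sketch computes both from rounded reference values.

```rust
// Atom counts for ethanol, C2H6O (SMILES "CCO").
const ETHANOL: [(&str, u32); 3] = [("C", 2), ("H", 6), ("O", 1)];

/// Returns (standard atomic weight, monoisotopic mass) for an element,
/// rounded to a few decimals. Only the elements needed here are listed.
fn masses(element: &str) -> Option<(f64, f64)> {
    match element {
        "C" => Some((12.011, 12.000000)),
        "H" => Some((1.008, 1.007825)),
        "O" => Some((15.999, 15.994915)),
        _ => None,
    }
}

/// Sums both kinds of mass over a formula given as (element, count) pairs.
fn formula_masses(formula: &[(&str, u32)]) -> Option<(f64, f64)> {
    let mut average = 0.0;
    let mut monoisotopic = 0.0;
    for &(element, count) in formula {
        let (avg, mono) = masses(element)?;
        average += avg * f64::from(count);
        monoisotopic += mono * f64::from(count);
    }
    Some((average, monoisotopic))
}

fn main() {
    let (average, monoisotopic) = formula_masses(&ETHANOL).expect("known elements");
    println!("average: {average:.3}, monoisotopic: {monoisotopic:.4}");
}
```

With these inputs the average weight comes out at about 46.069 and the monoisotopic mass at about 46.042, so a value near one of the two is what to expect from `molecular_weight("CCO")`.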

Convert descriptor vectors and train a linear model:

use qsar::models::{to_ndarrays, train_and_predict_example};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Example: run the built-in Linfa example
    let pred = train_and_predict_example()?;
    println!("Prediction for sample [5.0, 6.0]: {}", pred[0]);

    // Example: convert your own data
    let descriptors = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let targets = vec![3.0, 7.0];
    let (x, y) = to_ndarrays(descriptors, targets)?;
    println!("Features shape: {:?}", x.dim());

    Ok(())
}
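
To make the conversion step more concrete, here is a rough standard-library sketch of the checks a helper like `to_ndarrays` plausibly has to perform before building a matrix: the number of rows must match the number of targets, and every row must have the same length. The function `flatten_row_major` is hypothetical and is not the crate's implementation. It returns a row-major buffer plus a shape, which is the kind of input `ndarray::Array2::from_shape_vec` accepts.

```rust
/// Hypothetical helper: validate nested rows and flatten them row-major.
/// Returns the flat buffer and its (rows, columns) shape.
fn flatten_row_major(
    rows: &[Vec<f64>],
    targets: &[f64],
) -> Result<(Vec<f64>, (usize, usize)), String> {
    // Every sample needs exactly one target value.
    if rows.len() != targets.len() {
        return Err(format!("{} rows but {} targets", rows.len(), targets.len()));
    }
    // The first row fixes the expected number of descriptors per sample.
    let n_cols = rows.first().map_or(0, Vec::len);
    let mut flat = Vec::with_capacity(rows.len() * n_cols);
    for (i, row) in rows.iter().enumerate() {
        if row.len() != n_cols {
            return Err(format!("row {i} has {} values, expected {n_cols}", row.len()));
        }
        flat.extend_from_slice(row);
    }
    Ok((flat, (rows.len(), n_cols)))
}

fn main() {
    let descriptors = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let targets = vec![3.0, 7.0];
    match flatten_row_major(&descriptors, &targets) {
        Ok((flat, shape)) => println!("shape {shape:?}, data {flat:?}"),
        Err(e) => eprintln!("invalid input: {e}"),
    }
}
```

Rejecting ragged rows up front keeps shape errors close to the data that caused them, instead of surfacing later inside model fitting.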

Loading descriptors from CSV

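use qsar::data_io::read_csv_descriptors;

// The `?` operator requires an enclosing function that returns a Result, as in the examples above.
// Arguments: CSV path, feature column names, target column name.
let (descriptors, targets) = read_csv_descriptors("data/my_dataset.csv", &["mol_wt", "logp"], "pIC50")?;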
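
The call above selects feature and target columns by header name. The standard-library sketch below illustrates that idea on an in-memory CSV string. It is an assumption about the general approach, not the crate's loader: `select_columns` is a hypothetical function, the return types are guesses, quoted fields are not handled, and the sample values are made up for illustration.

```rust
/// Hypothetical helper: pick feature and target columns by header name
/// from simple comma-separated text (no quoting or escaping support).
fn select_columns(
    csv: &str,
    features: &[&str],
    target: &str,
) -> Result<(Vec<Vec<f64>>, Vec<f64>), String> {
    let mut lines = csv.lines().filter(|l| !l.trim().is_empty());
    let header: Vec<&str> = lines
        .next()
        .ok_or("empty input")?
        .split(',')
        .map(str::trim)
        .collect();

    // Map a column name to its position in the header row.
    let index_of = |name: &str| {
        header
            .iter()
            .position(|h| *h == name)
            .ok_or_else(|| format!("missing column '{name}'"))
    };
    let feature_idx: Vec<usize> = features
        .iter()
        .map(|&f| index_of(f))
        .collect::<Result<_, _>>()?;
    let target_idx = index_of(target)?;

    let mut x = Vec::new();
    let mut y = Vec::new();
    for (row_no, line) in lines.enumerate() {
        let cells: Vec<&str> = line.split(',').map(str::trim).collect();
        // Parse one cell of the current row as f64, reporting the data row on failure.
        let parse = |i: usize| -> Result<f64, String> {
            cells
                .get(i)
                .ok_or_else(|| format!("row {}: too few columns", row_no + 1))?
                .parse::<f64>()
                .map_err(|e| format!("row {}: {e}", row_no + 1))
        };
        x.push(feature_idx.iter().map(|&i| parse(i)).collect::<Result<Vec<_>, _>>()?);
        y.push(parse(target_idx)?);
    }
    Ok((x, y))
}

fn main() {
    // Made-up sample rows, for illustration only.
    let csv = "id,mol_wt,logp,pIC50\n1,46.07,-0.31,4.2\n2,78.11,2.13,5.0\n";
    match select_columns(csv, &["mol_wt", "logp"], "pIC50") {
        Ok((x, y)) => println!("features {x:?}, targets {y:?}"),
        Err(e) => eprintln!("could not read data: {e}"),
    }
}
```

Columns not named in the call (here `id`) are simply ignored, so a dataset can carry identifiers or extra descriptors without affecting the model input.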

Limitations and roadmap

The bundled SMILES parser supports a useful subset: linear molecules, double (=) and triple (#) bonds, and bracketed atoms with explicit hydrogen counts. It does not yet fully support rings, branches, or stereochemistry. This is deliberate, to keep the first release small and portable.

Future work:
- Add an optional Cargo feature to enable Open Babel / RDKit FFI bindings for full parsing.
- Add more descriptors (logP, TPSA, rotatable bonds, fingerprinting).
- Add cross-validation and model selection helpers (Linfa pipelines).
- Improve error messages and parsing robustness.
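
To show what parsing a "linear" SMILES subset involves, here is a small standalone sketch. It is not the parser shipped in qsar. The function `parse_linear_smiles` is hypothetical, handles only unbranched chains of C, N and O with single, double and triple bonds, and infers implicit hydrogens from the usual valences (4, 3, 2). Bracketed atoms, rings, branches and stereochemistry are all rejected as unsupported characters.

```rust
/// Hypothetical sketch: parse an unbranched chain of C, N, O atoms and
/// return each atom's element symbol with its implicit hydrogen count.
fn parse_linear_smiles(smiles: &str) -> Result<Vec<(char, u32)>, String> {
    // Each entry holds (element, total bond order used by chain bonds so far).
    let mut atoms: Vec<(char, u32)> = Vec::new();
    // Order of the bond that will connect the next atom to the previous one.
    let mut pending_bond: u32 = 1;

    for ch in smiles.chars() {
        match ch {
            '-' => pending_bond = 1,
            '=' => pending_bond = 2,
            '#' => pending_bond = 3,
            'C' | 'N' | 'O' => {
                if let Some(prev) = atoms.last_mut() {
                    // The bond counts against both of its end atoms.
                    prev.1 += pending_bond;
                    atoms.push((ch, pending_bond));
                } else {
                    atoms.push((ch, 0));
                }
                pending_bond = 1;
            }
            other => return Err(format!("unsupported character '{other}'")),
        }
    }

    // Implicit hydrogens fill whatever valence the chain bonds left over.
    atoms
        .into_iter()
        .map(|(element, used)| {
            let valence: u32 = match element {
                'C' => 4,
                'N' => 3,
                _ => 2, // 'O'
            };
            valence
                .checked_sub(used)
                .map(|h| (element, h))
                .ok_or_else(|| format!("valence exceeded on '{element}'"))
        })
        .collect()
}

fn main() {
    for smiles in ["CCO", "C#N", "C1CC1"] {
        match parse_linear_smiles(smiles) {
            Ok(atoms) => println!("{smiles}: {atoms:?}"),
            Err(e) => println!("{smiles}: error: {e}"),
        }
    }
}
```

For "CCO" the sketch assigns 3, 2 and 1 hydrogens to the three heavy atoms, giving C2H6O. The ring-closure digit in "C1CC1" is reported as an error, which mirrors the kind of input the current limitations rule out.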

Contributing

Contributions are welcome. Please open issues and pull requests on the repository. The project follows standard Rust contribution practices: format with cargo fmt, lint with cargo clippy, and run tests with cargo test.

Recommended local validation

cargo fmt
cargo clippy --all-targets -- -D warnings
cargo test
cargo doc --no-deps --open

License

This project is dual-licensed under MIT OR Apache-2.0. See the LICENSE file for details.
