cosmolkit 0.2.7

Rust-native cheminformatics and structural biology toolkit for molecules, SMILES, SDF, molecular graphs, conformers, and AI-ready workflows
Documentation

COSMolKit Rust

cosmolkit is the Rust facade crate for COSMolKit. It re-exports the molecular model, chemistry operations, file I/O, fingerprints, drawing, batch helpers, and protein structure APIs from cosmolkit-core.

Installation

cargo add cosmolkit

Quick Start

use cosmolkit::{Molecule, SmilesWriteParams};

fn main() -> Result<(), Box<dyn std::error::Error>> {
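    // Parse ethanol from SMILES, then derive a new value with 2D coordinates.
    let mol = Molecule::from_smiles("CCO")?;
    let mol = mol.with_2d_coordinates()?;

    // Write SMILES with default writer parameters and render an SVG drawing.
    let smiles = mol.to_smiles_with_params(&SmilesWriteParams::default())?;
    let svg = mol.to_svg(300, 300)?;

    // Print the SMILES string and the length of the SVG output.
    println!("{smiles}");
    println!("{}", svg.len());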
    Ok(())
}

Molecule Operations

Regular (value-style) Molecule operations return new values and leave the receiver unchanged:

let mol = Molecule::from_smiles("CCO")?;
let with_h = mol.with_hydrogens()?;
assert_ne!(mol.num_atoms(), with_h.num_atoms());

In-place operations are explicit and always end with _:

let mut mol = Molecule::from_smiles("CCO")?;
mol.add_hydrogens_()?;
mol.sanitize_()?;

On public Molecule methods, the trailing underscore is reserved for in-place mutation and has no other meaning. In-place operations are designed to avoid the operation system's working-copy clone when the molecule's blocks are uniquely owned. If an in-place operation returns an error, the receiver is not guaranteed to equal its pre-call value; use the non-mutating operation when the original value must survive a failure.
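The ownership point can be illustrated with a minimal sketch that uses only the standard library. ToyMolecule, its methods, and the single shared atom block are hypothetical stand-ins, not cosmolkit types; the sketch only assumes that shared blocks behave like reference-counted, copy-on-write storage.

```rust
use std::sync::Arc;

/// Toy stand-in for a molecule whose atom list lives in a shared block.
/// (Hypothetical type; not cosmolkit's `Molecule`.)
#[derive(Clone, Debug, PartialEq)]
pub struct ToyMolecule {
    atoms: Arc<Vec<String>>,
}

impl ToyMolecule {
    pub fn new(atoms: &[&str]) -> Self {
        Self {
            atoms: Arc::new(atoms.iter().map(|a| a.to_string()).collect()),
        }
    }

    pub fn num_atoms(&self) -> usize {
        self.atoms.len()
    }

    /// Value-style: the receiver is untouched and a new value is returned.
    pub fn with_hydrogens(&self, count: usize) -> ToyMolecule {
        // Cloning the handle only bumps the reference count.
        let mut copy = self.clone();
        // The block is now shared, so the in-place call copies it first.
        copy.add_hydrogens_(count);
        copy
    }

    /// In-place: mutates the receiver. `Arc::make_mut` copies the block
    /// only if another handle still shares it.
    pub fn add_hydrogens_(&mut self, count: usize) {
        let atoms = Arc::make_mut(&mut self.atoms);
        atoms.extend(std::iter::repeat("H".to_string()).take(count));
    }

    /// True when both handles point at the same underlying block.
    pub fn shares_block_with(&self, other: &ToyMolecule) -> bool {
        Arc::ptr_eq(&self.atoms, &other.atoms)
    }

    /// Address of the shared block, used to observe whether it was copied.
    pub fn block_ptr(&self) -> *const Vec<String> {
        Arc::as_ptr(&self.atoms)
    }
}
```

In this sketch, calling add_hydrogens_ on a uniquely owned value keeps block_ptr unchanged, while calling it on a value that shares its block with a clone allocates a fresh block and leaves the clone untouched.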

Protein Structures

use cosmolkit::Protein;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let protein = Protein::from_pdb("1crn.pdb")?;
    let summary = protein.selection_summary();

    println!("chains: {}", summary.chains);
    println!("residues: {}", summary.residues);
    println!("atoms: {}", summary.atoms);
    Ok(())
}

Batch Workflows

use cosmolkit::{BatchErrorMode, MoleculeBatch};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let smiles = vec![
        "CCO".to_string(),
        "c1ccccc1".to_string(),
        "CC(=O)O".to_string(),
    ];

    let batch = MoleculeBatch::from_smiles_list(&smiles)
        .with_parallel_jobs(Some(8))
        .with_2d_coordinates(BatchErrorMode::Strict)?;

    let out = batch.to_smiles_list(BatchErrorMode::Strict)?;
    println!("{out:?}");
    Ok(())
}
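As a rough illustration of what a strict error mode means for a batch, the sketch below stops at the first failing item and reports its index. Everything here is hypothetical: ToyErrorMode, toy_parse, and toy_batch are invented for the example, and the KeepGoing variant is only a contrast case, not a documented cosmolkit mode.

```rust
/// Hypothetical error mode for this sketch. Only a strict mode appears in
/// the example above; `KeepGoing` is an invented contrast case.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ToyErrorMode {
    Strict,
    KeepGoing,
}

/// Toy per-item operation: accept any non-empty string and return its length.
pub fn toy_parse(input: &str) -> Result<usize, String> {
    if input.is_empty() {
        Err("empty input".to_string())
    } else {
        Ok(input.len())
    }
}

/// Apply `toy_parse` to every item in order.
/// Strict mode returns the first error together with the item index;
/// KeepGoing mode records `None` for failed items and continues.
pub fn toy_batch(
    inputs: &[&str],
    mode: ToyErrorMode,
) -> Result<Vec<Option<usize>>, (usize, String)> {
    let mut out = Vec::with_capacity(inputs.len());
    for (index, input) in inputs.iter().enumerate() {
        match toy_parse(input) {
            Ok(value) => out.push(Some(value)),
            Err(message) => match mode {
                ToyErrorMode::Strict => return Err((index, message)),
                ToyErrorMode::KeepGoing => out.push(None),
            },
        }
    }
    Ok(out)
}
```

With the inputs "CCO", "", "CC(=O)O", the strict call in this sketch returns an error for index 1, while the lenient call returns a result list with a None in the second slot.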

Conformer Generation And Force Field Applications

Native conformer generation uses RDKit-aligned distance-geometry parameters. The default value-style molecule operation uses ETKDGv3 and returns a new molecule value. Multi-conformer generation supports deterministic seeded runs, RMS pruning, and sequential seed expansion:

use cosmolkit::{EmbedParameters, Molecule};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let molecule = Molecule::from_smiles("CC(=O)NC")?.with_hydrogens()?;

    let embedded = molecule.with_3d_conformer()?;
    println!("{}", embedded.conformers_3d().len());

    let mut params = EmbedParameters::etkdg();
    params.random_seed = 123;
    params.num_threads = 1;
    params.prune_rms_thresh = 0.5;

    let pruned = molecule.with_3d_conformers_with_params(5, params)?;
    println!("{}", pruned.conformers_3d().len());
    Ok(())
}
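The idea behind RMS pruning can be sketched without the library. The code below is a simplified greedy filter, assuming identical atom ordering and no structural alignment before the RMSD is computed; rmsd and prune_by_rms are hypothetical helper names, and the actual pruning in cosmolkit may differ in which atoms it considers and how it aligns conformers.

```rust
/// One conformer as a list of 3D atom positions.
pub type Coords = Vec<[f64; 3]>;

/// Root-mean-square deviation between two conformers with identical atom
/// ordering. No alignment is performed (a simplification for this sketch).
pub fn rmsd(a: &[[f64; 3]], b: &[[f64; 3]]) -> f64 {
    assert_eq!(a.len(), b.len(), "conformers must have the same atom count");
    if a.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = a
        .iter()
        .zip(b.iter())
        .map(|(p, q)| (0..3).map(|k| (p[k] - q[k]).powi(2)).sum::<f64>())
        .sum();
    (sum_sq / a.len() as f64).sqrt()
}

/// Greedy pruning: keep a conformer only if its RMSD to every conformer
/// kept so far is at least `threshold`. Returns the indices that survive.
pub fn prune_by_rms(conformers: &[Coords], threshold: f64) -> Vec<usize> {
    let mut kept: Vec<usize> = Vec::new();
    for (i, candidate) in conformers.iter().enumerate() {
        let is_distinct = kept
            .iter()
            .all(|&j| rmsd(&conformers[j], candidate) >= threshold);
        if is_distinct {
            kept.push(i);
        }
    }
    kept
}
```

Under this scheme the number of conformers that survive can be smaller than the number requested, which is why the example above prints the conformer count after pruning.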

Force-field APIs operate on molecules that already have 3D conformers and return new molecule values, leaving the input coordinates unchanged:

use cosmolkit::{
    Molecule, mmff_has_all_molecule_params, mmff_optimize_molecule,
    uff_has_all_molecule_params, uff_optimize_molecule,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let molecule = Molecule::from_smiles("CCO")?.with_hydrogens()?.sanitize()?;

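    // Attach one hand-written 3D conformer: nine positions, one per atom of
    // ethanol with explicit hydrogens (3 heavy atoms + 6 hydrogens).
    let mut builder = molecule.to_builder();
    builder.add_3d_conformer(vec![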
        [0.000, 0.000, 0.000],
        [1.540, 0.000, 0.000],
        [2.100, 1.200, 0.000],
        [-0.600, 0.900, 0.000],
        [-0.600, -0.900, 0.000],
        [0.000, 0.000, 1.000],
        [1.900, -0.900, 0.000],
        [1.700, 0.000, 1.000],
        [2.900, 1.200, 0.000],
    ])?;
    let molecule = builder.build()?;

    // Run UFF only when it has parameters for every atom in the molecule.
    if uff_has_all_molecule_params(&molecule)? {
        let result = uff_optimize_molecule(&molecule, 200, 10.0, -1, true)?;
        println!("UFF energy: {:.6}", result.energy);
    }

    // Apply the same guard before optimizing with MMFF94.
    if mmff_has_all_molecule_params(&molecule)? {
        let result = mmff_optimize_molecule(&molecule, "MMFF94", 200, 100.0, -1, true)?;
        println!("MMFF94 needs_more: {}", result.needs_more);
    }

    Ok(())
}
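To make the shape of such a result concrete, here is a toy minimizer for a single harmonic bond between two atoms. It is a sketch under stated assumptions: ToyOptResult, toy_energy, and toy_optimize are hypothetical, the energy and needs_more fields only echo the names used above, and needs_more is assumed to mean that the iteration limit was reached before convergence. The input coordinates are borrowed and copied, so the caller's values stay unchanged.

```rust
/// Result of the toy minimization. Field names echo the example above,
/// but this is a hypothetical type, not cosmolkit's result struct.
#[derive(Debug, Clone)]
pub struct ToyOptResult {
    pub coords: [[f64; 3]; 2],
    pub energy: f64,
    /// True when the iteration limit was hit before convergence.
    pub needs_more: bool,
}

const K: f64 = 1.0; // spring constant, arbitrary units
const D0: f64 = 1.5; // equilibrium bond length, arbitrary units
const STEP: f64 = 0.1; // fixed gradient-descent step size
const GRAD_TOL: f64 = 1e-6; // convergence threshold on |dE/dd|

fn distance(c: &[[f64; 3]; 2]) -> f64 {
    (0..3).map(|k| (c[1][k] - c[0][k]).powi(2)).sum::<f64>().sqrt()
}

/// Harmonic bond energy: E = 0.5 * K * (d - D0)^2.
pub fn toy_energy(c: &[[f64; 3]; 2]) -> f64 {
    0.5 * K * (distance(c) - D0).powi(2)
}

/// Fixed-step gradient descent on the two atom positions.
/// Assumes the two atoms never coincide (distance > 0).
pub fn toy_optimize(input: &[[f64; 3]; 2], max_iters: usize) -> ToyOptResult {
    let mut coords = *input; // work on a copy; the input stays unchanged
    for _ in 0..max_iters {
        let d = distance(&coords);
        let slope = K * (d - D0); // dE/dd
        if slope.abs() < GRAD_TOL {
            break;
        }
        for k in 0..3 {
            // Unit bond vector component; the gradient on atom 1 is
            // slope * unit, and the gradient on atom 0 is its negative.
            let unit = (coords[1][k] - coords[0][k]) / d;
            coords[1][k] -= STEP * slope * unit;
            coords[0][k] += STEP * slope * unit;
        }
    }
    let final_slope = K * (distance(&coords) - D0);
    ToyOptResult {
        coords,
        energy: toy_energy(&coords),
        needs_more: final_slope.abs() >= GRAD_TOL,
    }
}
```

In this sketch a generous iteration limit drives the bond to its equilibrium length and clears needs_more, while a very small limit lowers the energy but leaves needs_more set, signalling that another call with more iterations would be worthwhile.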

Examples

cargo run -p cosmolkit-core --example smiles_minimal_roundtrip
cargo run -p cosmolkit-core --example draw_svg
cargo run -p cosmolkit-core --example draw_png
cargo run -p cosmolkit-core --example sdf_to_smiles
cargo run -p cosmolkit --example protein_from_pdb
cargo run -p cosmolkit --example read_xyz
cargo run -p cosmolkit --example conformer_generation
cargo run -p cosmolkit --example forcefield_optimization

Development

Core validation should use operation-contract checks:

cargo check -p cosmolkit-core --features op-contracts-strict
cargo test -p cosmolkit-core --features op-contracts-strict
cargo check -p cosmolkit-py
cargo fmt --all

Python binding development:

uv sync --group dev
.venv/bin/maturin develop --manifest-path python/Cargo.toml
.venv/bin/pytest

The facade crate should stay thin. Public Rust APIs should be exposed through cosmolkit or clearly scoped public modules, while molecule mutation continues to go through registered operations in the core.