opensmiles 0.1.3

A SMILES parser following the OpenSMILES specification

opensmiles

A fast, correct SMILES parser for Rust, following the OpenSMILES specification.

Features

  • Full OpenSMILES compliance — all 118 elements, organic subset, bracket atoms, rings, branches, stereochemistry
  • Canonical SMILES output via Display (round-trip)
  • Detailed parse errors with character position
  • Optional parallel batch parsing with Rayon
  • Optional Hückel's rule aromaticity validation (4n+2 π-electron check)
  • Zero unsafe code, no C dependencies

Installation

[dependencies]
opensmiles = "0.1"

With optional features:

[dependencies]
opensmiles = { version = "0.1", features = ["parallel", "huckel-validation"] }

Usage

Basic parsing

use opensmiles::parse;

let mol = parse("CCO").unwrap();            // ethanol
let mol = parse("c1ccccc1").unwrap();       // benzene
let mol = parse("[C@H](F)(Cl)Br").unwrap(); // chiral center

// Access atoms
for node in mol.nodes() {
    println!(
        "{} — aromatic: {}, H: {}, charge: {}",
        node.atom().element(),
        node.aromatic(),
        node.hydrogens(),
        node.atom().charge(),
    );
}

// Access bonds
for bond in mol.bonds() {
    // Separate the two atom indices so the output is unambiguous.
    println!("{} - {} ({:?})", bond.source(), bond.target(), bond.kind());
}
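
For organic-subset atoms such as those in CCO, the hydrogen count is not written in the string; the OpenSMILES specification derives it from the element's normal valences. The sketch below illustrates that rule for non-aromatic atoms using only the standard library. The function names are hypothetical and this is not the crate's implementation; whether node.hydrogens() reports exactly this value is an assumption.

```rust
/// Normal valences of the organic-subset elements, as listed in the
/// OpenSMILES specification. Returns an empty slice for anything else.
fn normal_valences(symbol: &str) -> &'static [u32] {
    match symbol {
        "B" => &[3],
        "C" => &[4],
        "N" | "P" => &[3, 5],
        "O" => &[2],
        "S" => &[2, 4, 6],
        "F" | "Cl" | "Br" | "I" => &[1],
        _ => &[],
    }
}

/// Implicit hydrogen count for an unbracketed, non-aromatic atom whose
/// explicit bond orders sum to `bond_order_sum`.
/// Returns `None` if `symbol` is not in the organic subset.
fn implicit_hydrogens(symbol: &str, bond_order_sum: u32) -> Option<u32> {
    let valences = normal_valences(symbol);
    if valences.is_empty() {
        return None;
    }
    // Pick the smallest normal valence that accommodates the explicit bonds;
    // if the bonds already exceed every normal valence, no hydrogens are added.
    let hydrogens = valences
        .iter()
        .copied()
        .find(|&valence| valence >= bond_order_sum)
        .map_or(0, |valence| valence - bond_order_sum);
    Some(hydrogens)
}
```

For ethanol (CCO), the terminal carbon has one explicit bond and gets three hydrogens, the middle carbon has two bonds and gets two, and the oxygen has one bond and gets one.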

Round-trip to canonical SMILES

Molecule implements Display, which serializes back to a canonical SMILES string:

use opensmiles::parse;

let mol = parse("OCC").unwrap();
println!("{}", mol); // OCC

Error handling

use opensmiles::{parse, ParserError};

match parse("C(C") {
    Ok(mol) => println!("parsed: {}", mol),
    Err(ParserError::UnclosedParenthesis) => eprintln!("missing closing )"),
    Err(e) => eprintln!("parse error: {}", e),
}
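
To illustrate how an unclosed-branch error can carry a character position, here is a minimal standalone sketch of a parenthesis check. The BranchError type and check_branches function are hypothetical and exist only for this example; they are not the crate's ParserError or its internal logic.

```rust
/// Hypothetical error type for this sketch; not the crate's `ParserError`.
#[derive(Debug, PartialEq)]
enum BranchError {
    /// A `(` at this byte offset was never closed.
    Unclosed { position: usize },
    /// A `)` at this byte offset has no matching `(`.
    Unexpected { position: usize },
}

/// Checks that branch parentheses are balanced and reports where they are not.
fn check_branches(smiles: &str) -> Result<(), BranchError> {
    // Stack of byte offsets of currently open `(`.
    let mut open_positions = Vec::new();
    for (position, ch) in smiles.char_indices() {
        match ch {
            '(' => open_positions.push(position),
            ')' => {
                if open_positions.pop().is_none() {
                    return Err(BranchError::Unexpected { position });
                }
            }
            _ => {}
        }
    }
    // Anything left on the stack was never closed; report the innermost one.
    match open_positions.pop() {
        Some(position) => Err(BranchError::Unclosed { position }),
        None => Ok(()),
    }
}
```

Keeping a stack of offsets rather than a plain depth counter is what makes it possible to point at the offending character.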

Parallel batch parsing

Enable the parallel feature for multi-threaded parsing of large datasets:

use opensmiles::parse_batch;

let dataset = vec!["CCO", "c1ccccc1", "CC(=O)O", /* ... */];
let results = parse_batch(&dataset); // Vec<Result<Molecule, ParserError>>

Benchmark results (4-core CPU):

Batch size   Sequential   Parallel   Speedup
100          76 µs        169 µs     0.45×
1 000        877 µs       396 µs     2.2×
10 000       8.6 ms       2.2 ms     3.9×

For batches smaller than ~500 molecules, sequential is faster due to thread overhead.
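
The speedup column is the sequential time divided by the parallel time, and the ~500-molecule guideline suggests choosing a strategy by batch size. The sketch below shows both ideas with the standard library only. The Strategy enum, the function names, and the threshold parameter are assumptions made for illustration and are not part of the crate.

```rust
use std::time::Duration;

/// Speedup of the parallel run relative to the sequential one.
/// Values below 1.0 mean the parallel run was slower.
fn speedup(sequential: Duration, parallel: Duration) -> f64 {
    sequential.as_secs_f64() / parallel.as_secs_f64()
}

/// Hypothetical strategy selector for this sketch.
#[derive(Debug, PartialEq)]
enum Strategy {
    Sequential,
    Parallel,
}

/// Picks sequential parsing for batches shorter than `threshold`,
/// where thread overhead is expected to outweigh the gain.
fn choose_strategy(batch_len: usize, threshold: usize) -> Strategy {
    if batch_len < threshold {
        Strategy::Sequential
    } else {
        Strategy::Parallel
    }
}
```

The right threshold depends on the machine and on molecule size, so it is worth measuring on your own data before hard-coding a value.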

See the full benchmark dashboard and sequential vs parallel comparison.

Aromaticity validation (Hückel's rule)

Enable the huckel-validation feature to have parse() reject chemically invalid aromatic rings:

use opensmiles::parse;

// With the feature enabled, parse() returns Err(MoleculeError::HuckelViolation)
// for aromatic rings that don't satisfy 4n+2 π electrons.
// Rings that satisfy the rule parse as usual:
let mol = parse("c1ccccc1").unwrap(); // benzene: 6 π-electrons ✓

The validation API is also available explicitly, without the feature flag:

use opensmiles::{parse, ast::aromaticity::validate_aromaticity};

let mol = parse("c1ccccc1").unwrap();
let checks = validate_aromaticity(&mol);
assert!(checks[0].is_valid);
assert_eq!(checks[0].pi_electrons, Some(6));
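
The 4n+2 test itself is plain arithmetic once the π-electron count of a ring is known. A minimal standalone sketch follows; huckel_n is a hypothetical helper written for this example, not a function of the crate, and it leaves out how π electrons are counted per atom.

```rust
/// Returns `Some(n)` if `pi_electrons == 4n + 2` for some integer n >= 0,
/// and `None` if the count violates Hückel's rule.
fn huckel_n(pi_electrons: u32) -> Option<u32> {
    if pi_electrons >= 2 && (pi_electrons - 2) % 4 == 0 {
        Some((pi_electrons - 2) / 4)
    } else {
        None
    }
}
```

Benzene's 6 π electrons give n = 1, while a ring with 4 or 8 π electrons fails the check.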

Supported SMILES features

  • All 118 elements in bracket atoms
  • Organic subset (B C N O P S F Cl Br I) with implicit H
  • Wildcard *
  • Isotopes [13C]
  • Formal charges [NH4+], [Fe-3]
  • Explicit hydrogen count [CH3]
  • Atom class [C:1]
  • Single, double, triple, quadruple bonds
  • Aromatic bonds :
  • Directional bonds / \ (E/Z stereochemistry)
  • Branches () with arbitrary nesting
  • Ring closures 0–9 and %10–%99
  • Disconnected structures .
  • Tetrahedral chirality @ @@
  • Extended chirality @TH, @AL, @SP, @TB, @OH
  • Kekulé aromatic forms
  • Aromatic bracket symbols [se], [as]
  • Whitespace terminator
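
Ring closures come in two lexical forms: a single digit, or % followed by exactly two digits. The following standalone sketch shows how such a label could be recognized at the start of a string. parse_ring_label is a hypothetical function for illustration and does not reflect the crate's internals.

```rust
/// Parses a ring-closure label at the start of `input`.
/// Returns the label value and the number of bytes consumed,
/// or `None` if `input` does not start with a ring-closure label.
fn parse_ring_label(input: &str) -> Option<(u32, usize)> {
    match input.as_bytes() {
        // Single-digit form: 0 through 9.
        [digit, ..] if digit.is_ascii_digit() => Some((u32::from(*digit - b'0'), 1)),
        // Two-digit form: '%' followed by exactly two digits.
        [b'%', tens, ones, ..] if tens.is_ascii_digit() && ones.is_ascii_digit() => {
            let label = u32::from(*tens - b'0') * 10 + u32::from(*ones - b'0');
            Some((label, 3))
        }
        _ => None,
    }
}
```

A full parser would then pair the two atoms that carry the same label, opening the ring at the first occurrence and closing it at the second.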

Feature flags

Flag                Default   Description
parallel            off       Multi-threaded batch parsing via Rayon
huckel-validation   off       Reject aromatic rings violating Hückel's 4n+2 rule in parse()

Part of the bigsmiles-rs ecosystem

opensmiles is the SMILES foundation of bigsmiles-rs. The bigsmiles crate extends it with support for polymer notation.

License

MIT — see LICENSE.