opensmiles
A fast, correct SMILES parser for Rust, following the OpenSMILES specification.
Features
- Full OpenSMILES compliance — all 118 elements, organic subset, bracket atoms, rings, branches, stereochemistry
- Canonical SMILES output via `Display` (round-trip)
- Detailed parse errors with character position
- Optional parallel batch parsing with Rayon
- Optional Hückel's rule aromaticity validation (4n+2 π-electron check)
- Zero unsafe code, no C dependencies
Installation
[dependencies]
opensmiles = "0.1"
With optional features:
[dependencies]
opensmiles = { version = "0.1", features = ["parallel", "huckel-validation"] }
Usage
Basic parsing
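use opensmiles::parse;
let mol = parse("CCO").unwrap(); // ethanol
let mol = parse("c1ccccc1").unwrap(); // benzene
let mol = parse("N[C@@H](C)C(=O)O").unwrap(); // chiral center (L-alanine, illustrative input)
// Access atoms
for node in mol.nodes() { /* inspect each atom */ }
// Access bonds
for bond in mol.bonds() { /* inspect each bond */ }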
Round-trip to canonical SMILES
Molecule implements Display, which serializes back to a canonical SMILES string:
use opensmiles::parse;
let mol = parse("OCC").unwrap(); // illustrative input
println!("{}", mol); // OCC
Error handling
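use opensmiles::parse;
match parse("C(C") { // illustrative invalid input: unclosed branch
    Ok(mol) => println!("parsed: {}", mol),
    Err(err) => eprintln!("parse failed: {:?}", err),
}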
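What an error "with character position" can carry is easiest to see on a toy example. The sketch below is standalone and does not use the crate: `PositionedError` and `check_parens` are hypothetical names chosen for illustration, and the crate's own error types may be shaped differently.

```rust
use std::fmt;

/// Hypothetical error type: a message plus the 0-based character index.
#[derive(Debug, PartialEq)]
struct PositionedError {
    message: &'static str,
    position: usize,
}

impl fmt::Display for PositionedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} at position {}", self.message, self.position)
    }
}

/// Toy check (not a SMILES parser): verifies that branch parentheses balance
/// and reports the position of the first stray ')' or, failing that,
/// of the innermost '(' that was never closed.
fn check_parens(input: &str) -> Result<(), PositionedError> {
    let mut open_positions = Vec::new();
    for (i, c) in input.chars().enumerate() {
        match c {
            '(' => open_positions.push(i),
            ')' => {
                if open_positions.pop().is_none() {
                    return Err(PositionedError { message: "unmatched ')'", position: i });
                }
            }
            _ => {}
        }
    }
    match open_positions.pop() {
        Some(i) => Err(PositionedError { message: "unclosed '('", position: i }),
        None => Ok(()),
    }
}
```

Calling `check_parens("CC(C")` yields an error pointing at index 2, which is enough for a caller to underline the offending character in the input.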
Parallel batch parsing
Enable the parallel feature for multi-threaded parsing of large datasets:
use opensmiles::parse_batch;
let dataset = vec!["CCO", "c1ccccc1", "CC(=O)O"]; // illustrative inputs
let results = parse_batch(&dataset); // Vec<Result<Molecule, ParserError>>
Benchmark results (4-core CPU):
| Batch size | Sequential | Parallel | Speedup |
|---|---|---|---|
| 100 | 76 µs | 169 µs | 0.45× |
| 1 000 | 877 µs | 396 µs | 2.2× |
| 10 000 | 8.6 ms | 2.2 ms | 3.9× |
For batches smaller than ~500 molecules, sequential is faster due to thread overhead.
See the full benchmark dashboard and sequential vs parallel comparison.
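Because of that crossover, a caller may want to pick the strategy from the batch size. The following is a minimal sketch of the idea using only the standard library (scoped threads rather than Rayon). `PARALLEL_THRESHOLD`, `process_one`, and `process_batch` are hypothetical names, and `process_one` is a stand-in for real parsing work, not the crate's API.

```rust
use std::thread;

/// Hypothetical cutoff, taken from the ~500-molecule crossover quoted above.
const PARALLEL_THRESHOLD: usize = 500;

/// Stand-in for real per-item work; here it just counts characters.
fn process_one(smiles: &str) -> usize {
    smiles.chars().count()
}

/// Processes small batches on the calling thread and splits large ones
/// across up to `workers` scoped threads, preserving input order.
fn process_batch(inputs: &[&str], workers: usize) -> Vec<usize> {
    if inputs.len() < PARALLEL_THRESHOLD || workers < 2 {
        return inputs.iter().map(|s| process_one(s)).collect();
    }
    // Ceiling division so every item lands in exactly one chunk.
    let chunk_len = (inputs.len() + workers - 1) / workers;
    thread::scope(|scope| {
        let handles: Vec<_> = inputs
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|s| process_one(s)).collect::<Vec<usize>>())
            })
            .collect();
        // Joining in spawn order keeps results aligned with the inputs.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect::<Vec<usize>>()
    })
}
```

The right threshold depends on the machine and on the molecules, so it is worth measuring on your own data rather than hard-coding 500.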
Aromaticity validation (Hückel's rule)
Enable the huckel-validation feature to have parse() reject chemically invalid aromatic rings:
// With the feature enabled, this returns Err(MoleculeError::HuckelViolation)
// for rings that don't satisfy 4n+2 π electrons.
let mol = parse("c1ccccc1").unwrap(); // benzene: 6 π-electrons ✓
The validation API is also available explicitly, without the feature flag:
use opensmiles::{parse, validate_aromaticity};
let mol = parse("c1ccccc1").unwrap(); // benzene (illustrative input)
let checks = validate_aromaticity(&mol);
assert!;
assert_eq!;
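The rule itself is simple arithmetic: a ring passes when its π-electron count equals 4n+2 for some integer n ≥ 0. The standalone sketch below shows only that check. `ring_pi_electrons` and `satisfies_huckel` are hypothetical helpers, not the crate's API, and the per-atom contributions are supplied by hand instead of being derived from a parsed molecule.

```rust
/// Sums the pi electrons contributed by each ring atom.
/// The caller supplies the contributions; deriving them from atom types
/// and charges is outside the scope of this sketch.
fn ring_pi_electrons(contributions: &[u32]) -> u32 {
    contributions.iter().sum()
}

/// Returns true when `pi_electrons` equals 4n + 2 for some integer n >= 0.
fn satisfies_huckel(pi_electrons: u32) -> bool {
    pi_electrons >= 2 && (pi_electrons - 2) % 4 == 0
}
```

For example, benzene's six carbons contribute one electron each (6 = 4·1 + 2, passes), furan's four carbons contribute one each and its oxygen two (6, passes), while cyclobutadiene has 4 and fails.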
Supported SMILES features
| Feature | Status |
|---|---|
| All 118 elements in bracket atoms | ✅ |
| Organic subset (B C N O P S F Cl Br I) with implicit H | ✅ |
| Wildcard `*` | ✅ |
| Isotopes `[13C]` | ✅ |
| Formal charges `[NH4+]`, `[Fe-3]` | ✅ |
| Explicit hydrogen count `[CH3]` | ✅ |
| Atom class `[C:1]` | ✅ |
| Single, double, triple, quadruple bonds | ✅ |
| Aromatic bonds `:` | ✅ |
| Directional bonds `/` `\` (E/Z stereochemistry) | ✅ |
| Branches `()` with arbitrary nesting | ✅ |
| Ring closures `0`–`9` and `%10`–`%99` | ✅ |
| Disconnected structures `.` | ✅ |
| Tetrahedral chirality `@` `@@` | ✅ |
| Extended chirality `@TH`, `@AL`, `@SP`, `@TB`, `@OH` | ✅ |
| Kekulé aromatic forms | ✅ |
| Aromatic bracket symbols `[se]`, `[as]` | ✅ |
| Whitespace terminator | ✅ |
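To make one row of the table concrete, here is a minimal sketch of how the two ring-closure forms can be read: a single digit, or `%` followed by exactly two digits. `read_ring_closure` is a hypothetical helper written for this README, not part of the crate's API, and the crate's parser may be structured differently.

```rust
/// Reads a ring-closure label at the start of `input`.
/// Returns the ring number and the number of bytes consumed, or `None`
/// if `input` does not begin with a ring-closure label.
fn read_ring_closure(input: &str) -> Option<(u8, usize)> {
    let bytes = input.as_bytes();
    match *bytes.first()? {
        // A single digit is a ring number on its own.
        d @ b'0'..=b'9' => Some((d - b'0', 1)),
        // '%' must be followed by exactly two digits.
        b'%' => match (*bytes.get(1)?, *bytes.get(2)?) {
            (tens @ b'0'..=b'9', units @ b'0'..=b'9') => {
                Some(((tens - b'0') * 10 + (units - b'0'), 3))
            }
            _ => None,
        },
        _ => None,
    }
}
```

A parser would call this after each atom, pairing the first occurrence of a number with its second to form the ring bond.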
Feature flags
| Flag | Default | Description |
|---|---|---|
| `parallel` | off | Multi-threaded batch parsing via Rayon |
| `huckel-validation` | off | Reject aromatic rings violating Hückel's 4n+2 rule in `parse()` |
Part of the bigsmiles-rs ecosystem
opensmiles is the SMILES foundation of bigsmiles-rs.
The bigsmiles crate extends it with support for polymer notation.
References
- OpenSMILES Specification
- SMILES Formal Grammar (LL(1))
- Weininger, D. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
License
MIT — see LICENSE.