chem-name-resolver
A pure-Rust library for resolving IUPAC chemical names to SMILES strings and molecular graphs. The Rust equivalent of Java's OPSIN, with WebAssembly support.
Why chem-name-resolver?
Converting an IUPAC name like "2,4-pentanedione" to its SMILES representation "CC(=O)CC(=O)C" sounds simple, but every existing solution comes with a significant trade-off:
| | OPSIN | RDKit | OpenBabel | CDK | Indigo | PubChem API | PubChemPy | STOUT v2 | ChemCore | chem-name-resolver |
|---|---|---|---|---|---|---|---|---|---|---|
| Language | Java | Python/C++ | C++ | Java | C++ | REST | Python | Python/ML | Rust | Rust |
| WASM | ✗ | △ | △ | ✗ | ✓ | ✗ | ✗ | ✗ | △ | ✓ |
| Offline | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | △ | ✓ | ✓ |
| CJK names | ✗ | ✗ | ✗ | ✗ | ✗ | △ | ✗ | ✗ | ✗ | ✓ |
| IUPAC Parser | ✓ (best) | ✗ | ✗ | ✗ | ✗ | Lookup | ✗ | ✓ (neural) | ✗ | ✓ |
| License | MIT | BSD-3 | GPL-2 | LGPL-2.1 | Apache-2 | Public domain | BSD | MIT | MIT | MIT/Apache-2 |
| Notes | JVM required | ~50 MB; C++ toolchain; rdkit-js WASM is subset | C++ FFI; copyleft; WASM experimental | JVM; IUPAC parsing delegates to OPSIN | Official WASM (npm); structure ops only | Network-dependent; 67M+ compounds | Thin REST wrapper | GPU recommended; non-deterministic; model ~GB | Dormant since 2020; incomplete SMILES | Pure Rust; no native deps |
△ = partial / experimental
This library fills the gap: a pure-Rust, WASM-compatible, offline IUPAC→SMILES engine with CJK support. It enables:
- Browser-side chemistry: ship a WASM module and resolve names client-side with zero server round-trips
- Rust-native tooling: integrate into CLI tools, database indexers (e.g. Cheminee), or Axum services without pulling in a JVM or C++ build
- Japanese workflows: normalize and resolve katakana chemical names in the same pipeline, without a separate preprocessing step (a Chinese/kanji name dictionary is still on the roadmap)
- Lightweight embedding: the `release` profile produces a small binary (`opt-level = "s"`, LTO enabled) suitable for edge deployments
Features
- Pure Rust: no C/C++ dependencies (no RDKit, no Boost)
- WASM-compatible: compiles to `wasm32-unknown-unknown`
- CJK support: resolves Japanese katakana names (メタン, エタノール, …)
- Zero-copy normalization: returns `Cow::Borrowed` when the input needs no changes
- JSON serialization: `ResolveResult` implements `serde::Serialize`
Quick Start
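```rust
use chem_name_resolver::resolve;

fn main() {
    // Systematic IUPAC name
    let r = resolve("2,4-pentanedione").unwrap();
    assert_eq!(r.smiles, "CC(=O)CC(=O)C");
    assert!(r.molecular_formula.is_some());

    // Trivial name
    let r = resolve("acetone").unwrap();
    assert_eq!(r.smiles, "CC(=O)C");

    // Japanese katakana
    let r = resolve("エタノール").unwrap();
    assert_eq!(r.smiles, "CCO");

    // n- prefix
    let r = resolve("n-butane").unwrap();
    assert_eq!(r.smiles, "CCCC");

    // JSON output (requires serde_json)
    let json = serde_json::to_string(&r).unwrap();
    println!("{json}");
}
```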
Coverage
Normalizer
| Input | Output |
|---|---|
| Fullwidth ASCII (２－) | Halfwidth ASCII (2-) |
| Katakana prolonged sound mark (ー) | Hyphen (-) |
| Greek letters (α, β, γ) | ASCII names (alpha, beta, gamma) |
| Consecutive whitespace | Single space |
| n- prefix | Stripped (n-butane → butane) |
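The normalization rules in this table, combined with the zero-copy `Cow` behavior listed under Features, can be sketched as below. This is a minimal illustration under assumptions, not the crate's code: `normalize`, `Repl`, and `replacement` are hypothetical names, and the sketch covers only the fullwidth, Greek-letter, whitespace, and `n-` rules. It deliberately omits the prolonged-sound-mark rule, because blindly turning every ー into a hyphen would corrupt names such as エタノール; that rule needs positional context.

```rust
use std::borrow::Cow;

/// Replacement for a single character: either one char or a fixed string.
enum Repl {
    Char(char),
    Str(&'static str),
}

/// Per-character rules (a hypothetical subset of the normalizer table).
fn replacement(c: char) -> Option<Repl> {
    match c {
        // Fullwidth ASCII block U+FF01..=U+FF5E maps onto U+0021..=U+007E.
        '\u{FF01}'..='\u{FF5E}' => char::from_u32(c as u32 - 0xFEE0).map(Repl::Char),
        'α' => Some(Repl::Str("alpha")),
        'β' => Some(Repl::Str("beta")),
        'γ' => Some(Repl::Str("gamma")),
        _ => None,
    }
}

/// True if the input already satisfies every rule and can be returned as-is.
fn is_normalized(s: &str) -> bool {
    if s.starts_with("n-") {
        return false;
    }
    let mut prev_ws = false;
    for c in s.chars() {
        if c.is_whitespace() {
            // A non-ASCII-space whitespace char, or a second one in a row, must be rewritten.
            if prev_ws || c != ' ' {
                return false;
            }
            prev_ws = true;
        } else {
            if replacement(c).is_some() {
                return false;
            }
            prev_ws = false;
        }
    }
    true
}

/// Hypothetical normalizer: borrows when nothing changes, allocates only when it must.
pub fn normalize(input: &str) -> Cow<'_, str> {
    if is_normalized(input) {
        return Cow::Borrowed(input);
    }
    let mut out = String::with_capacity(input.len());
    let mut prev_ws = false;
    for c in input.chars() {
        if c.is_whitespace() {
            // Collapse any whitespace run (including U+3000) into one ASCII space.
            if !prev_ws {
                out.push(' ');
            }
            prev_ws = true;
            continue;
        }
        prev_ws = false;
        match replacement(c) {
            Some(Repl::Char(h)) => out.push(h),
            Some(Repl::Str(t)) => out.push_str(t),
            None => out.push(c),
        }
    }
    // Strip a leading "n-" after mapping, so a fullwidth "ｎ－" prefix is caught too.
    if out.starts_with("n-") {
        out.replace_range(..2, "");
    }
    Cow::Owned(out)
}
```

Checking for "already normalized" before allocating is what makes the common case (plain ASCII IUPAC names) free of heap allocation.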
Dictionary
| Type | Examples |
|---|---|
| Trivial → IUPAC | acetone, acetic acid, glycerol, formaldehyde, propionic/butyric/valeric acid, … |
| Trivial → SMILES | water, benzene, toluene, ether, chloroform, aspirin, glucose, caffeine |
| iso/sec/tert aliases | isopropanol, isobutane, tert-butanol, neopentane, sec-butanol, … |
| Branched alkanes | isopentane, isohexane (+ IUPAC systematic aliases) |
| Lab abbreviations | MeOH, EtOH, DCM, DMSO, DMF, THF, MeCN (+ full names) |
| Halomethanes | chloromethane, bromomethane, iodomethane, dibromomethane, … |
| Common reagents | ethyl acetate, methyl acetate, MEK (+ full names) |
| Amines | methylamine, dimethylamine, trimethylamine, aniline, triethylamine, … |
| Phenols / aromatics | phenol, anisole, styrene, o/m/p-xylene, mesitylene, … |
| Cyclic compounds | cyclohexane, cyclohexanol, cyclohexanone, cyclopentane, cyclopropane, … |
| Nitro compounds | nitromethane, nitroethane, nitrobenzene |
| Katakana → IUPAC | メタン–デカン, エタノール, アセトン, ベンゼン, … |
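A dictionary stage like this can be approximated with a hash map whose values point either to an IUPAC name (handed on to the parser) or straight to a SMILES string (returned directly, as the `DirectSmiles` source suggests). The sketch below is hypothetical: `Entry`, `build_dictionary`, `lookup`, the case-insensitive fallback, and the few sample entries are illustrative assumptions, not the crate's actual tables or storage format.

```rust
use std::collections::HashMap;

/// What a dictionary hit resolves to (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Entry {
    /// Trivial name -> IUPAC name, which would then go through the parser.
    Iupac(&'static str),
    /// Trivial name -> SMILES, returned without parsing.
    Smiles(&'static str),
}

/// A handful of illustrative entries drawn from the categories in the table.
pub fn build_dictionary() -> HashMap<&'static str, Entry> {
    let mut d = HashMap::new();
    d.insert("acetone", Entry::Iupac("propan-2-one"));
    d.insert("アセトン", Entry::Iupac("propan-2-one"));
    d.insert("エタノール", Entry::Iupac("ethanol"));
    d.insert("etoh", Entry::Iupac("ethanol"));
    d.insert("benzene", Entry::Smiles("c1ccccc1"));
    d.insert("water", Entry::Smiles("O"));
    d
}

/// Exact match first, then a lowercase retry so abbreviations like "EtOH" hit "etoh".
pub fn lookup<'a>(dict: &'a HashMap<&'static str, Entry>, name: &str) -> Option<&'a Entry> {
    dict.get(name)
        .or_else(|| dict.get(name.to_lowercase().as_str()))
}
```

Keeping IUPAC-valued entries separate from SMILES-valued ones lets most trivial names reuse the parser, so formula and weight can still be computed for them.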
IUPAC Parser
Chain stems: methane–decane (C1–C10), undecane–icosane/eicosane (C11–C20)
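Stem recognition boils down to a prefix table with longest-match semantics, because short stems are prefixes of longer ones ("hex" vs "hexadec", "pent" vs "pentadec"). The following is a minimal sketch under that assumption; `STEMS`, `split_stem`, and `alkane_smiles` are hypothetical names, not the crate's API.

```rust
/// Hypothetical stem table: chain stem -> number of carbons (C1..C20).
/// Both "icos" (current IUPAC spelling) and "eicos" (older spelling) map to C20.
const STEMS: [(&str, usize); 21] = [
    ("meth", 1), ("eth", 2), ("prop", 3), ("but", 4), ("pent", 5),
    ("hex", 6), ("hept", 7), ("oct", 8), ("non", 9), ("dec", 10),
    ("undec", 11), ("dodec", 12), ("tridec", 13), ("tetradec", 14),
    ("pentadec", 15), ("hexadec", 16), ("heptadec", 17), ("octadec", 18),
    ("nonadec", 19), ("icos", 20), ("eicos", 20),
];

/// Split a name into (carbon count, remainder) using the longest matching stem,
/// so "hexadecane" matches "hexadec" rather than "hex".
pub fn split_stem(name: &str) -> Option<(usize, &str)> {
    STEMS
        .iter()
        .filter(|(stem, _)| name.starts_with(*stem))
        .max_by_key(|(stem, _)| stem.len())
        .map(|(stem, n)| (*n, &name[stem.len()..]))
}

/// Unbranched alkane: stem + "ane" -> a linear chain of carbons.
pub fn alkane_smiles(name: &str) -> Option<String> {
    match split_stem(name)? {
        (n, "ane") => Some("C".repeat(n)),
        _ => None,
    }
}
```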
Suffixes:
| Suffix | Functional group | Example |
|---|---|---|
| -ane | alkane | ethane → CC |
| -ene | alkene | hex-1-ene → C=CCCCC |
| -yne | alkyne | but-2-yne → CC#CC |
| -ol / -diol | alcohol | propan-2-ol → CC(C)O |
| -one / -dione | ketone | propan-2-one → CC(=O)C |
| -al | aldehyde | pentanal → CCCCC=O |
| -oic acid / -dioic acid | carboxylic acid | ethanoic acid → CC(=O)O |
| -amine | amine | ethanamine → CCN |
| -amide | amide | ethanamide → CC(=O)N |
| -thiol | thiol | ethanethiol → CCS |
| -nitrile | nitrile | propanenitrile → CCC#N |
Multiplier prefixes di-, tri-, tetra- are supported for all suffixes.
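Handling a multiplied suffix comes down to two steps: check that the locant list matches the multiplier (two locants for "di", three for "tri", ...), then decorate the corresponding chain carbons. The sketch below does this for ketones and reproduces pentane-2,4-dione → CC(=O)CC(=O)C from the introduction. `parse_locants` and `chain_with_ketones` are hypothetical helpers, and writing each carbonyl as a `(=O)` branch is one valid SMILES choice, not necessarily the crate's output order.

```rust
/// Parse a comma-separated locant list and check its length against a multiplier
/// prefix ("" = 1, "di" = 2, "tri" = 3, "tetra" = 4).
pub fn parse_locants(locants: &str, multiplier: &str) -> Option<Vec<usize>> {
    let expected = match multiplier {
        "" => 1,
        "di" => 2,
        "tri" => 3,
        "tetra" => 4,
        _ => return None,
    };
    let parsed: Vec<usize> = locants
        .split(',')
        .map(|s| s.trim().parse().ok())
        .collect::<Option<Vec<usize>>>()?;
    (parsed.len() == expected).then_some(parsed)
}

/// Linear chain of `n` carbons with a ketone "(=O)" branch on each locant.
/// Ketone carbons must be interior (2..=n-1); terminal C=O would be an aldehyde.
pub fn chain_with_ketones(n: usize, locants: &[usize]) -> Option<String> {
    if locants.iter().any(|&l| l < 2 || l >= n) {
        return None;
    }
    let mut smiles = String::new();
    for i in 1..=n {
        smiles.push('C');
        if locants.contains(&i) {
            smiles.push_str("(=O)");
        }
    }
    Some(smiles)
}
```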
Substituents:
| Substituent | Atom/group | Example |
|---|---|---|
| chloro-, bromo-, fluoro-, iodo- | halogens | 2-chlorobutane → CC(CC)Cl |
| methyl-, ethyl-, propyl-, butyl-, pentyl-, hexyl- | n-alkyl chains | 3-methylpentane → CCC(C)CC |
| hydroxy- | –OH | n/a |
| oxo- | =O | n/a |
| amino- | –NH₂ | 2-aminobutane → CC(CC)N |
| mercapto- | –SH | 3-mercaptopentane → CCC(CC)S |
| cyano- | –C≡N | 2-cyanopentane → CC(C#N)CCC |
| acetyl- | –C(=O)CH₃ | 3-acetylheptane → CCC(C(=O)C)CCCC |
| formyl- | –CHO | 3-formylpentane → CCC(C=O)CC |
Multiplier prefixes di-, tri-, tetra- are supported (e.g. 2,3-dichlorobutane → CC(C(C)Cl)Cl).
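Substituents can be attached as parenthesized SMILES branches on the main chain, with each prefix mapped to a fragment written from its attachment atom outward. This hypothetical sketch (`substituted_chain` and `fragment` are illustrative names) always places branches in parentheses, so for some inputs it emits a different but equivalent string than the table: it gives CC(Cl)CC for 2-chlorobutane where the table shows CC(CC)Cl. Producing one canonical string is what the Canonical SMILES roadmap item addresses.

```rust
use std::collections::BTreeMap;

/// Attach substituent fragments as branches on a linear chain of `n` carbons.
/// `subs` lists (1-based locant, SMILES fragment) pairs; locants may repeat.
pub fn substituted_chain(n: usize, subs: &[(usize, &str)]) -> Option<String> {
    let mut by_locant: BTreeMap<usize, Vec<&str>> = BTreeMap::new();
    for &(loc, frag) in subs {
        if loc == 0 || loc > n {
            return None; // locant outside the chain
        }
        by_locant.entry(loc).or_default().push(frag);
    }
    let mut smiles = String::new();
    for i in 1..=n {
        smiles.push('C');
        if let Some(frags) = by_locant.get(&i) {
            for f in frags {
                smiles.push('(');
                smiles.push_str(f);
                smiles.push(')');
            }
        }
    }
    Some(smiles)
}

/// Hypothetical prefix -> fragment map for several substituents in the table.
pub fn fragment(prefix: &str) -> Option<&'static str> {
    Some(match prefix {
        "chloro" => "Cl",
        "bromo" => "Br",
        "fluoro" => "F",
        "iodo" => "I",
        "methyl" => "C",
        "ethyl" => "CC",
        "hydroxy" => "O",
        "oxo" => "=O",
        "amino" => "N",
        "mercapto" => "S",
        "cyano" => "C#N",
        "acetyl" => "C(=O)C",
        "formyl" => "C=O",
        _ => return None,
    })
}
```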
Output
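`resolve` yields a `ResolveResult` with the fields `smiles`, `canonical_name`, `source`, `molecular_formula`, and `molecular_weight` (see the JSON example under WASM Usage). `molecular_formula` and `molecular_weight` are `None` when the name is resolved via `DirectSmiles` (e.g. benzene).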
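Given per-element atom counts (including implicit hydrogens), the formula and weight fields follow mechanically. The sketch below assumes Hill notation (C, then H, then the other elements alphabetically) and conventional standard atomic weights; both the notation and the weight table are assumptions about how the crate computes these fields, and `hill_formula`, `molecular_weight`, and `atomic_weight` are hypothetical names. With ethanol's counts it reproduces the `C2H6O` and `46.069` values from the JSON example.

```rust
use std::collections::BTreeMap;

/// Conventional standard atomic weights for a few elements (assumed values).
fn atomic_weight(symbol: &str) -> Option<f64> {
    Some(match symbol {
        "H" => 1.008,
        "C" => 12.011,
        "N" => 14.007,
        "O" => 15.999,
        "F" => 18.998,
        "S" => 32.06,
        "Cl" => 35.45,
        "Br" => 79.904,
        "I" => 126.904,
        _ => return None,
    })
}

/// Hill notation: with carbon present, C then H then the rest alphabetically;
/// without carbon, every element alphabetically.
pub fn hill_formula(counts: &BTreeMap<&str, u32>) -> String {
    let has_carbon = counts.get("C").copied().unwrap_or(0) > 0;
    let mut order: Vec<&str> = Vec::new();
    if has_carbon {
        order.push("C");
        order.push("H");
    }
    // BTreeMap keys iterate in sorted order, which is alphabetical for element symbols.
    for &sym in counts.keys() {
        if !(has_carbon && (sym == "C" || sym == "H")) {
            order.push(sym);
        }
    }
    let mut out = String::new();
    for sym in order {
        let n = counts.get(sym).copied().unwrap_or(0);
        if n == 0 {
            continue; // e.g. "H" pushed above but absent
        }
        out.push_str(sym);
        if n > 1 {
            out.push_str(&n.to_string());
        }
    }
    out
}

/// Sum of count x atomic weight; None if any element is missing from the table.
pub fn molecular_weight(counts: &BTreeMap<&str, u32>) -> Option<f64> {
    counts
        .iter()
        .map(|(sym, &n)| atomic_weight(sym).map(|w| w * f64::from(n)))
        .sum()
}
```

For ethanol: 2 × 12.011 + 6 × 1.008 + 15.999 = 46.069.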
Installation
```toml
[dependencies]
chem-name-resolver = "0.1"
# for JSON output
serde_json = "1"
```
Building & Testing
```sh
# run all 75 tests
cargo test
# verify WASM build
cargo build --target wasm32-unknown-unknown
# benchmarks
cargo bench
```
WASM Usage
```js
import init from './chem_name_resolver.js';
await init();
console.log; // "CC(=O)C"
console.log; // "alpha-d-glucose"
// Full result as JSON string
const json = ;
// '{"smiles":"CCO","canonical_name":"ethanol","source":"Dictionary","molecular_formula":"C2H6O","molecular_weight":46.069}'
```
CLI Usage
```sh
# {
#   "smiles": "CCO",
#   "canonical_name": "ethanol",
#   "source": "Dictionary",
#   "molecular_formula": "C2H6O",
#   "molecular_weight": 46.069
# }
# CC(=O)C
```
Known Limitations
- Cyclic and aromatic compounds are not parsed (dictionary lookup only)
- Stereochemistry (R/S, E/Z) is not supported
Roadmap
- Branched alkyl substituents (isopropyl, tert-butyl, …)
- `cyclo-` prefix (cyclic compounds)
- CLI binary (`chem resolve "ethanol"`)
- Chinese/kanji chemical name dictionary
- Canonical SMILES (subtree-signature DFS ordering)
- Python bindings (PyO3 / Maturin)
License
MIT OR Apache-2.0