chematic
A pure-Rust cheminformatics library targeting RDKit feature parity — with zero C/C++ dependencies.
Why does zero C/C++ matter? RDKit.js, Indigo WASM, and OpenBabel all ship C++ code compiled via Emscripten. That means 30–50 MB WASM binaries, complex build toolchains, and platform-specific build failures. chematic compiles to a ~550 KB WASM bundle with a single `wasm-pack build`: no `cmake`, no `clang`, no `-sys` crates, and no `build.rs` C compilation anywhere in the dependency tree.
Live Demo
https://kent-tokyo.github.io/chematic/ — Interactive descriptor calculator, drug-likeness rules, fingerprint similarity, 3D viewer, and reaction schemes running entirely in your browser via WebAssembly.
Design Goals
Pure Rust, zero C/C++ FFI — guaranteed
No rdkit-sys, no openbabel-sys, no cc build dependencies, no bindgen. Every
algorithm — from SSSR ring perception to ECFP fingerprints to force-field minimization —
is implemented in 100% safe Rust. The entire dependency tree is verified FFI-free.
WASM-compatible and lightweight
All crates compile to wasm32-unknown-unknown without modification. The npm package
@kent-tokyo/chematic is ~550 KB versus 30–50 MB for C++ FFI alternatives.
No cmake, no emcc, no Emscripten toolchain required.
80+ WebAssembly API endpoints
The WASM layer exposes more than 80 functions covering descriptors, fingerprints, scaffold analysis, stereoisomer enumeration, 3D geometry, diversity selection, and more. All are callable from JavaScript/TypeScript, with full TypeScript type definitions.
Domain-specific algorithms
Rather than wrapping a generic graph library, chematic implements chemistry-specific algorithms directly: Kekulization, Hückel aromaticity, CIP stereochemistry, SSSR ring perception, Gasteiger charges, and MaxMin/Butina diversity picking.
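MaxMin picking, one of the algorithms listed above, is compact enough to sketch. The version below is a generic greedy MaxMin over a caller-supplied distance (for fingerprints, typically 1 - Tanimoto). The function name `maxmin_pick`, the explicit seed, and the lowest-index tie-breaking are assumptions for illustration, not chematic's implementation.

```rust
/// Greedy MaxMin diversity picking (illustrative sketch, not chematic's code).
/// `dist(i, j)` is any symmetric distance between items i and j,
/// for example 1 - Tanimoto similarity between two fingerprints.
fn maxmin_pick<F>(n: usize, k: usize, seed: usize, dist: F) -> Vec<usize>
where
    F: Fn(usize, usize) -> f64,
{
    if n == 0 || k == 0 {
        return Vec::new();
    }
    assert!(seed < n, "seed index out of range");
    let k = k.min(n);
    let mut picked = vec![seed];
    // min_dist[i]: distance from item i to its nearest already-picked item.
    let mut min_dist: Vec<f64> = (0..n).map(|i| dist(i, seed)).collect();
    while picked.len() < k {
        // Choose the unpicked item farthest from everything picked so far.
        // Strict '>' keeps the lowest index on ties, so results are deterministic.
        let mut best: Option<usize> = None;
        for i in (0..n).filter(|i| !picked.contains(i)) {
            if best.map_or(true, |b| min_dist[i] > min_dist[b]) {
                best = Some(i);
            }
        }
        let next = best.expect("k <= n, so an unpicked item exists");
        picked.push(next);
        // Refresh nearest-picked distances with the newly picked item.
        for i in 0..n {
            min_dist[i] = min_dist[i].min(dist(i, next));
        }
    }
    picked
}

fn main() {
    // Five points on a line; distance is the absolute difference.
    let xs = [0.0_f64, 1.0, 2.0, 9.0, 10.0];
    let picks = maxmin_pick(xs.len(), 3, 0, |i, j| (xs[i] - xs[j]).abs());
    println!("{:?}", picks);
}
```

Fixing the seed and the tie-break rule keeps the selection reproducible, which matches the determinism goal stated for the library.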
Reproducible and deterministic
Fingerprints use FNV-1a hashing with a fixed invariant ordering, so the same SMILES input always produces the same bits. No RNG, no platform-specific behavior.
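For reference, here is standard 64-bit FNV-1a (published offset basis and prime) plus a hypothetical `fold_to_bits` helper that maps integer identifiers onto a fixed-length bit vector. Whether chematic uses the 32- or 64-bit variant, and how it serializes atom invariants before hashing, are not stated here; treat both as assumptions.

```rust
/// 64-bit FNV-1a hash with the standard offset basis and prime.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= b as u64; // FNV-1a: xor the byte first...
        hash = hash.wrapping_mul(PRIME); // ...then multiply, modulo 2^64
    }
    hash
}

/// Hypothetical helper: fold identifiers into an n-bit fingerprint.
/// Identifiers are hashed as little-endian bytes; collisions simply share a bit.
fn fold_to_bits(ids: &[u32], n_bits: usize) -> Vec<bool> {
    assert!(n_bits > 0, "fingerprint length must be positive");
    let mut bits = vec![false; n_bits];
    for id in ids {
        let h = fnv1a_64(&id.to_le_bytes());
        bits[(h % n_bits as u64) as usize] = true;
    }
    bits
}

fn main() {
    let fp = fold_to_bits(&[42, 1337, 2024], 2048);
    let on = fp.iter().filter(|&&b| b).count();
    println!("{on} bits set");
}
```

Because the hash has no seed and no platform-dependent state, the same identifiers always set the same bits.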
Current Status
All phases complete. 933 tests, all passing. Zero C/C++ dependencies.
| Crate | Description | Tests |
|---|---|---|
| chematic-core | Atom, Bond, Molecule, Element, kekulization (no deps); mutable add/remove_atom/bond, fragments(), is_connected(), formula_with_isotopes, validate_valence | 48 |
| chematic-smiles | OpenSMILES parser, writer, canonical SMILES | 57 |
| chematic-perception | SSSR, Hückel aromaticity, apply_aromaticity, aromatize/kekulize_inplace, assign_stereo_from_2d | 18 |
| chematic-mol | MOL/SDF V2000+V3000 (R/W), CML (R/W), CDXML (R); SdfRecord with coords+props; MDL RXN R/W | 61 |
| chematic-depict | 2D SVG depiction (CPK colors, highlighting, grid), DepictData, suggest_bond_direction, reaction SVG | 39 |
| chematic-chem | 40+ descriptors incl. xlogp3, BRICS (BricsConfig), QED, standardization, CIP, IFG, expand_abbreviation | 226 |
| chematic-fp | ECFP2/4/6, FCFP4/6, MACCS 166-bit, TopoPF, AtomPair, Torsion; Tanimoto/Dice | 50 |
| chematic-smarts | SMARTS parser, VF2 (MatchConfig max_matches), MCS (AtomCompare/BondCompare/ring-awareness) | 84 |
| chematic-3d | 3D coordinate generation, force-field minimization, shape descriptors, ConformerEnsemble, PDB/XYZ | 68 |
| chematic-rxn | Reaction SMILES/SMIRKS; run_reactants with product valence validation | 28 |
| chematic-wasm | 100+ WASM exports; npm: @kent-tokyo/chematic | 162 |
| chematic-iupac | Local IUPAC name generation (pure Rust, no network): alkanes, cycloalkanes, alcohols, amines, halides | 8 |
| chematic | Umbrella crate with feature flags (all sub-crates, incl. iupac) | 1 |
```sh
cargo test --workspace  # 933 tests, all passing
```
Quick Start
Using the umbrella crate
```toml
# Cargo.toml
[dependencies]
chematic = { git = "https://github.com/kent-tokyo/chematic", features = ["smiles", "fp"] }
```
use ;
use ecfp4;
Using individual crates
# Cargo.toml
[dependencies]
= { = "https://github.com/kent-tokyo/chematic" }
= { = "https://github.com/kent-tokyo/chematic" }
= { = "https://github.com/kent-tokyo/chematic" }
use ;
use ;
use ;
SMARTS substructure search
use parse;
use ;
let mol = parse.unwrap; // aspirin
let query = parse_smarts.unwrap; // carboxylic / ester C
let matches = find_matches;
println!; // 2
Molecular descriptors
use parse;
use ;
let aspirin = parse.unwrap;
println!; // ~180.16
println!; // ~63.6
println!; // ~1.2
println!; // ~0.111
println!; // drug-likeness score
println!; // true
BRICS fragmentation
use parse;
use brics_fragments;
let aspirin = parse.unwrap;
let frags = brics_fragments;
println!; // ≥ 2
Fingerprints
use parse;
use ;
let aspirin = parse.unwrap;
let caffeine = parse.unwrap;
let sim_ecfp4 = ecfp4.tanimoto;
let sim_atompair = atom_pair_fp.tanimoto;
let sim_torsion = torsion_fp.tanimoto;
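The `.tanimoto` calls above compare fingerprints as bit sets. Below is a minimal sketch of Tanimoto and Dice over bits packed into `u64` words; the packed representation and the 0.0 score for two empty fingerprints are choices made for this sketch, not necessarily chematic's.

```rust
/// Tanimoto similarity |A ∩ B| / |A ∪ B| on fingerprints packed into u64 words.
fn tanimoto(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len(), "fingerprints must have equal length");
    let (mut both, mut either) = (0u32, 0u32);
    for (x, y) in a.iter().zip(b) {
        both += (x & y).count_ones(); // bits set in both
        either += (x | y).count_ones(); // bits set in either
    }
    // Convention chosen for this sketch: two empty fingerprints score 0.0.
    if either == 0 { 0.0 } else { both as f64 / either as f64 }
}

/// Dice similarity 2|A ∩ B| / (|A| + |B|).
fn dice(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len(), "fingerprints must have equal length");
    let both: u32 = a.iter().zip(b).map(|(x, y)| (x & y).count_ones()).sum();
    let total: u32 = a.iter().chain(b).map(|x| x.count_ones()).sum();
    if total == 0 { 0.0 } else { 2.0 * both as f64 / total as f64 }
}

fn main() {
    let a = [0b1011_u64];
    let b = [0b0011_u64];
    println!("Tanimoto = {:.3}, Dice = {:.3}", tanimoto(&a, &b), dice(&a, &b));
}
```

Dice weights shared bits more heavily than Tanimoto, so for the same pair it is never lower.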
2D depiction
use parse;
use depict_svg;
let caffeine = parse.unwrap;
let svg = depict_svg;
write.unwrap;
Highlighted depiction
use HashSet;
use parse;
use depict_svg_highlighted;
let mol = parse.unwrap; // pyridine
let n_idx = mol.atoms.find
.map.unwrap;
let svg = depict_svg_highlighted;
JavaScript / TypeScript (WebAssembly)
~550 KB, zero C/C++ dependencies. Drop-in for browser or Node.js. Compare with RDKit.js at ~30 MB built via Emscripten.
import init from '@kent-tokyo/chematic';
await init();
// ── Parsing & descriptors ─────────────────────────────────────────
const mol = ; // aspirin
console.log; // ~180.16
console.log; // drug-likeness [0,1]
console.log; // synthetic accessibility [1,10]
console.log; // true
// All descriptors at once (JSON object)
const desc = JSON.;
console.log;
// ── Molecule processing ───────────────────────────────────────────
const salt = ;
const clean = ; // remove Na+
const neutral = ; // neutralize [O-]
const tautomer = ;
const scaffold = ;
// ── Fingerprints & similarity ─────────────────────────────────────
const caffeine = ;
console.log; // ECFP4 Tanimoto
console.log; // ECFP6 Tanimoto
console.log; // MACCS Tanimoto
// ── Scaffold / fragmentation / MCS ───────────────────────────────
const frags = JSON.;
const mcs = ;
// ── Stereochemistry ───────────────────────────────────────────────
const isomers = JSON.;
// ["[C@@H](F)(Cl)Br","[C@H](F)(Cl)Br"]
// ── 3D geometry ───────────────────────────────────────────────────
const pdb = ;
const shape = JSON.;
console.log;
// ── Diversity selection ───────────────────────────────────────────
const library = '["CC","c1ccccc1","CCO","CCCC","c1ccncc1"]';
const picks = JSON.;
const clusters = JSON.;
// ── SDF round-trip with properties ───────────────────────────────
const records = JSON.;
// records[0].smiles, records[0].name, records[0].properties.MW
const sdf = ;
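SD properties such as the `MW` field read back above live in data items after each record's molfile block: a header line beginning with `>` that names the field in angle brackets, one or more value lines, then a blank line, with `$$$$` closing the record. The sketch below extracts those items using only the standard library; `sd_data_items` is a hypothetical name, and it skips edge cases a full parser such as chematic-mol must handle.

```rust
use std::collections::BTreeMap;

/// Extract SD data items from one SDF record (illustrative sketch).
/// Multi-line values are joined with '\n'.
fn sd_data_items(record: &str) -> BTreeMap<String, String> {
    let mut items = BTreeMap::new();
    let mut in_data = false; // data items only follow the "M  END" line
    let mut current: Option<(String, Vec<&str>)> = None;
    for line in record.lines() {
        if line.starts_with("$$$$") {
            break; // end of this record
        }
        if !in_data {
            in_data = line.starts_with("M  END");
            continue;
        }
        if line.starts_with('>') {
            // A new header: flush any item that lacked its blank terminator.
            if let Some((name, vals)) = current.take() {
                items.insert(name, vals.join("\n"));
            }
            // The field name is the text between the first '<' and the next '>'.
            if let Some(s) = line.find('<') {
                if let Some(len) = line[s + 1..].find('>') {
                    current = Some((line[s + 1..s + 1 + len].to_string(), Vec::new()));
                }
            }
        } else if line.trim().is_empty() {
            // A blank line terminates the current item's value.
            if let Some((name, vals)) = current.take() {
                items.insert(name, vals.join("\n"));
            }
        } else if let Some((_, vals)) = current.as_mut() {
            vals.push(line);
        }
    }
    if let Some((name, vals)) = current.take() {
        items.insert(name, vals.join("\n"));
    }
    items
}

fn main() {
    let rec = "aspirin\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n> <MW>\n180.16\n\n$$$$\n";
    for (k, v) in sd_data_items(rec) {
        println!("{k} = {v}");
    }
}
```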
Comparison with Other Cheminformatics Libraries
| Feature | chematic | RDKit (rdkit-sys) | OpenBabel FFI | RDKit.js (WASM) |
|---|---|---|---|---|
| C/C++ dependencies | None — pure Rust | Extensive C++ | Extensive C++ | C++ via Emscripten |
| WASM binary size | ~550 KB | N/A (no WASM) | N/A (no WASM) | ~30 MB |
| Build requirement | cargo build only | cmake + clang | cmake + clang | Emscripten SDK |
| WASM target support | Full (native) | No | No | Yes (Emscripten) |
| Unsafe Rust | None | Extensive | Extensive | N/A |
| OpenSMILES parser | Full | Full | Full | Full |
| SMILES writer / canonical | Yes | Yes | Yes | Yes |
| Kekulization | Yes | Yes | Yes | Yes |
| Ring perception (SSSR) | Yes | Yes | Yes | Yes |
| SDF/MOL V2000+V3000 + SD fields | Yes | Yes | Yes | Yes |
| 2D depiction (SVG, CPK colors) | Yes | Yes | Yes | Yes |
| ECFP/FCFP fingerprints (2/4/6) | All variants + bitvec | Yes | Yes | Yes |
| AtomPair / Torsion / MACCS FP | Yes | Yes | Yes | Yes |
| Molecular descriptors | 40+ (MW/LogP/…/SA) | ~30 | ~20 | ~30 |
| BRICS fragmentation | Yes (bonds + SMILES) | Yes | No | Yes |
| Murcko scaffold | Yes | Yes | No | Yes |
| Tautomer normalisation | Yes | Yes | No | Yes |
| MCS | Yes | Yes | No | Yes |
| Stereoisomer enumeration | Yes | Yes | No | Yes |
| CIP stereo (R/S, E/Z) detail | Yes (per-atom JSON) | Yes | Yes | Yes |
| 3D coordinate generation | Yes (DG + minimization) | Yes (ETKDG) | Yes | Yes |
| 3D shape descriptors (PMI/NPR/…) | Yes | Yes | No | Yes |
| PDB / XYZ file formats | Yes | Yes | Yes | Yes |
| MaxMin / Butina diversity picking | Yes | Yes | No | No |
| Reaction SMILES/SMIRKS | Yes | Yes | Yes | Yes |
| InChI / InChIKey | No (C lib required) | Yes | Yes | Yes |
| Maintenance (2026) | Active | Active | Minimal | Active |
Notes:
- chematic WASM binary size measured after `wasm-opt` optimization; the RDKit.js figure is for the official WASM build.
- "None" for C/C++ means verified: no `*-sys` crates, no `cc` build dependencies, and no `build.rs` C compilation anywhere in the dependency tree.
Roadmap
Phase 1 — Foundation (complete)
Core types, OpenSMILES parse/write, Kekulization, canonical SMILES.
Phase 2 — Molecular Perception (complete)
SSSR, Hückel aromaticity, SDF/MOL V2000+V3000, 2D SVG depiction.
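A quick consistency check for SSSR perception: the number of rings in the SSSR equals the cyclomatic number, bonds - atoms + connected components. The sketch below counts components with union-find; it only counts rings and does not find them, and `ring_count` is a hypothetical helper rather than a chematic function.

```rust
/// Cyclomatic number E - V + C, which equals the number of rings in the SSSR.
fn ring_count(n_atoms: usize, bonds: &[(usize, usize)]) -> usize {
    // Union-find over atom indices to count connected components.
    fn find(parent: &mut [usize], mut x: usize) -> usize {
        while parent[x] != x {
            let grandparent = parent[parent[x]];
            parent[x] = grandparent; // path halving
            x = grandparent;
        }
        x
    }
    let mut parent: Vec<usize> = (0..n_atoms).collect();
    let mut components = n_atoms;
    for &(a, b) in bonds {
        let ra = find(&mut parent, a);
        let rb = find(&mut parent, b);
        if ra != rb {
            parent[ra] = rb; // merge two components
            components -= 1;
        }
    }
    // E + C >= V holds for any graph, so this subtraction cannot underflow.
    bonds.len() + components - n_atoms
}

fn main() {
    let benzene: Vec<(usize, usize)> = (0..6).map(|i| (i, (i + 1) % 6)).collect();
    println!("benzene rings: {}", ring_count(6, &benzene));
}
```

Naphthalene (10 atoms, 11 bonds, 1 component) gives 2, matching its two fused six-membered rings.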
Phase 3 — Chemical Intelligence (complete)
Descriptors (MW, LogP, TPSA, Fsp3, Lipinski), QED, BRICS fragmentation, ECFP4/6 fingerprints, SMARTS+VF2 (recursive SMARTS, valence, hybridization), molecular standardization, Murcko scaffold, CIP R/S and E/Z.
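Lipinski's Rule of Five flags a compound when MW > 500, logP > 5, H-bond donors > 5, or H-bond acceptors > 10. The sketch below counts violations from precomputed descriptor values. The `Ro5Inputs` struct, the function names, and the "at most one violation" pass criterion are assumptions; tools differ on whether a single violation is tolerated, and chematic's own rule may differ.

```rust
/// Precomputed descriptor values for one molecule (hypothetical struct).
struct Ro5Inputs {
    mol_weight: f64,
    logp: f64,
    h_donors: u32,
    h_acceptors: u32,
}

/// Count Rule-of-Five violations: MW > 500, logP > 5, HBD > 5, HBA > 10.
fn ro5_violations(m: &Ro5Inputs) -> usize {
    [m.mol_weight > 500.0, m.logp > 5.0, m.h_donors > 5, m.h_acceptors > 10]
        .iter()
        .filter(|&&broken| broken)
        .count()
}

/// Pass criterion assumed here: at most one violation.
fn passes_ro5(m: &Ro5Inputs) -> bool {
    ro5_violations(m) <= 1
}

fn main() {
    // Approximate aspirin values: 1 donor (COOH), 4 N+O acceptors.
    let aspirin = Ro5Inputs { mol_weight: 180.16, logp: 1.2, h_donors: 1, h_acceptors: 4 };
    println!("violations = {}, passes = {}", ro5_violations(&aspirin), passes_ro5(&aspirin));
}
```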
Phase 4 — Similarity and Search (complete)
MACCS 166-bit keys, topological path FP, AtomPair FP, Topological Torsion FP, MCS, tautomer normalization.
Phase 5 — 3D Chemistry (complete)
Rule-based 3D coordinate generation, PDB/XYZ formats, UFF-like minimization.
Phase 6 — RDKit Parity (complete)
Reaction SMILES/SMIRKS ✓, umbrella crate with feature flags ✓,
WASM npm package @kent-tokyo/chematic ✓, CPK coloring + highlighted depiction ✓,
ChEMBL 37 full-set validation (2,897,819 molecules, 100.000%) ✓.
Phase 7 — Extended Descriptors and Diversity (v0.1.14–v0.1.15, complete)
EState indices (Hall & Kier 1991), path fingerprint (DFS path FP, 2048-bit), SDF/MOL WASM bindings, functional group identification (Ertl 2017 IFG), Gasteiger-Marsili PEOE partial charges, VSA descriptors (SlogP_VSA × 12, SMR_VSA × 10, PEOE_VSA × 14), SA score (complexity-based), MaxMin diversity picking, Butina clustering.
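Butina clustering groups items around centroids chosen by neighbour count. In the sketch below, neighbour lists are built once for a distance cutoff, items are visited in order of decreasing neighbour count, and each still-unassigned item becomes a centroid that claims its unassigned neighbours. Counts are not updated as items are assigned (some implementations re-sort after each cluster). The function name, the distance-based cutoff, and index tie-breaking are assumptions, not chematic's implementation.

```rust
/// Butina clustering sketch over a symmetric distance function.
/// Items within `cutoff` of each other are neighbours.
/// Returns clusters with the centroid first.
fn butina<F>(n: usize, cutoff: f64, dist: F) -> Vec<Vec<usize>>
where
    F: Fn(usize, usize) -> f64,
{
    // Neighbour lists, excluding the item itself.
    let neighbours: Vec<Vec<usize>> = (0..n)
        .map(|i| (0..n).filter(|&j| j != i && dist(i, j) <= cutoff).collect())
        .collect();
    // Most neighbours first; sort_by_key is stable, so ties keep index order.
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by_key(|&i| std::cmp::Reverse(neighbours[i].len()));

    let mut assigned = vec![false; n];
    let mut clusters = Vec::new();
    for &c in &order {
        if assigned[c] {
            continue; // already claimed by an earlier centroid
        }
        assigned[c] = true;
        let mut cluster = vec![c];
        for &j in &neighbours[c] {
            if !assigned[j] {
                assigned[j] = true;
                cluster.push(j);
            }
        }
        clusters.push(cluster);
    }
    clusters
}

fn main() {
    let xs = [0.0_f64, 0.1, 0.2, 5.0, 5.1, 10.0];
    let clusters = butina(xs.len(), 0.5, |i, j| (xs[i] - xs[j]).abs());
    println!("{:?}", clusters);
}
```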
Phase 8 — WASM Expansion + Mutable API (v0.1.20–v0.1.22, complete)
100+ WASM exports, CML/CDXML, Mutable Molecule API (with_atom_* / with_bond_*),
DepictData, MMP, R-group decomposition, ConformerEnsemble, SDF/V3000 write,
MCS ring-awareness constraints.
Phase 15 — Mutable API, 2D Stereo, Reaction SVG, RXN format (v0.1.29–32, complete)
Mutable Molecule (add/remove_atom/bond, set_charge/element, fragments, is_connected),
MoleculeBuilder::from_molecule, assign_stereo_from_2d (wedge→R/S), aromatize/kekulize_inplace,
depict_reaction_svg, SdfRecord with coords+properties, MDL RXN V2000 R/W,
expand_abbreviation (30 symbols), formula_with_isotopes.
Phase 14 — XLogP3, IUPAC naming, MCS/BRICS/SMARTS config (v0.1.28, complete)
xlogp3() (Cheng 2007 atom types), new chematic-iupac crate (pure Rust, offline IUPAC naming),
BricsConfig { min_fragment_size }, MatchConfig { max_matches },
McsConfig { atom_compare: AtomCompare, bond_compare: BondCompare } for scaffold hopping.
Phase 13 — MolMetadata builder API (v0.1.27, complete)
MolMetadata::default().with_name("aspirin").with_comment("...") — fluent builder for MOL/SDF metadata.
Phase 12 — atom_color_rgb (v0.1.26, complete)
atom_color_rgb(atomic_number: u8) -> [u8; 3] — CPK color as RGB byte triple, no hex parsing needed.
Phase 11 — Bond Direction Suggestion (v0.1.25, complete)
suggest_bond_direction(mol, atom, layout) -> f64 (radians): chemistry-aware new-bond placement using sp2/sp3 angle offsets + maximum-separation selection. BOND_LEN constant now exported.
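The maximum-separation part of this idea can be sketched on its own: sort the existing bond angles and bisect the largest angular gap, including the gap that wraps past 2π. `largest_gap_direction` is a hypothetical helper; it omits the sp2/sp3 angle offsets that `suggest_bond_direction` applies.

```rust
use std::f64::consts::{PI, TAU};

/// Suggest an angle (radians, in [0, 2π)) for a new bond from an atom whose
/// existing bonds point at `existing` angles, by bisecting the largest gap.
fn largest_gap_direction(existing: &[f64]) -> f64 {
    if existing.is_empty() {
        return 0.0; // no constraints: any direction works
    }
    // Normalise to [0, 2π) and sort.
    let mut a: Vec<f64> = existing.iter().map(|x| x.rem_euclid(TAU)).collect();
    a.sort_by(|x, y| x.total_cmp(y));
    // Start with the wrap-around gap from the last angle back to the first.
    let mut best_start = a[a.len() - 1];
    let mut best_gap = a[0] + TAU - a[a.len() - 1];
    for w in a.windows(2) {
        let gap = w[1] - w[0];
        if gap > best_gap {
            best_gap = gap;
            best_start = w[0];
        }
    }
    (best_start + best_gap / 2.0).rem_euclid(TAU)
}

fn main() {
    // One existing bond pointing along +x: the new bond points the opposite way.
    println!("{:.4}", largest_gap_direction(&[0.0]));
    println!("{:.4}", largest_gap_direction(&[0.0, PI / 2.0]));
}
```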
Phase 10 — Valence Validation API (v0.1.24, complete)
validate_valence(mol) -> Vec<ValenceError> public API (chematic-core + chematic-perception re-export),
run_reactants now silently filters product sets containing over-valenced atoms.
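A valence check of this kind reduces to summing bond orders per atom and comparing the total against an allowed maximum. The sketch below uses a small hypothetical table of default valences for neutral C, N, O, H, and halogens, integer (Kekulé) bond orders, and no charges or hypervalent states. `validate_valence` in chematic-core is not shown here and may work differently.

```rust
/// Default maximum valences for a few neutral organic elements (illustrative).
fn max_valence(symbol: &str) -> Option<u32> {
    match symbol {
        "H" | "F" | "Cl" | "Br" | "I" => Some(1),
        "O" => Some(2),
        "N" => Some(3),
        "C" => Some(4),
        _ => None, // unknown element: not checked in this sketch
    }
}

/// Indices of atoms whose summed bond orders exceed the allowed valence.
/// `bonds` are (atom_a, atom_b, order) with integer orders.
fn over_valenced(symbols: &[&str], bonds: &[(usize, usize, u32)]) -> Vec<usize> {
    let mut used = vec![0u32; symbols.len()];
    for &(a, b, order) in bonds {
        used[a] += order;
        used[b] += order;
    }
    (0..symbols.len())
        .filter(|&i| max_valence(symbols[i]).map_or(false, |max| used[i] > max))
        .collect()
}

fn main() {
    // A pentavalent carbon: CH5.
    let symbols = ["C", "H", "H", "H", "H", "H"];
    let bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1), (0, 5, 1)];
    println!("over-valenced atoms: {:?}", over_valenced(&symbols, &bonds));
}
```

A reaction runner could apply the same test to each candidate product and drop any product set that returns a non-empty list.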
Phase 9 — Element Radius API + Aromaticity Application (v0.1.23, complete)
Element::vdw_radius() / covalent_radius() (Bondi/Alvarez tables, all 118 elements),
Molecule::implicit_hydrogen_count() / total_formula() (Hill formula with implicit H),
apply_aromaticity() (convert kekulized molecules to aromatic representation),
with_atom_aromatic() / with_bond_order() immutable update API,
minimize_uff() alias for UFF force-field minimization.
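The Hill system used by `total_formula()` writes carbon first, hydrogen second, and all other elements alphabetically. When there is no carbon, every element, hydrogen included, is listed alphabetically. The sketch below formats a formula from element counts; `hill_formula` and its `BTreeMap` input are assumptions for illustration.

```rust
use std::collections::BTreeMap;

/// Hill-system formula from element counts (illustrative sketch).
/// BTreeMap keeps symbols in byte order, which is alphabetical for
/// capitalised element symbols ("Br" < "C" < "Cl" < "H" < "N").
fn hill_formula(counts: &BTreeMap<&str, u32>) -> String {
    let term = |sym: &str, n: u32| if n == 1 { sym.to_string() } else { format!("{sym}{n}") };
    let mut out = String::new();
    let has_carbon = counts.get("C").copied().unwrap_or(0) > 0;
    if has_carbon {
        // With carbon present: C first, then H.
        for sym in ["C", "H"] {
            if let Some(&n) = counts.get(sym) {
                if n > 0 {
                    out.push_str(&term(sym, n));
                }
            }
        }
    }
    for (&sym, &n) in counts {
        if n == 0 || (has_carbon && (sym == "C" || sym == "H")) {
            continue; // skip empty entries and C/H already written
        }
        out.push_str(&term(sym, n));
    }
    out
}

fn main() {
    let aspirin = BTreeMap::from([("C", 9u32), ("H", 8), ("O", 4)]);
    println!("{}", hill_formula(&aspirin));
}
```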
See tasks/todo.md for the detailed per-task breakdown.
Repository Structure
```
chematic/
├── Cargo.toml workspace root
├── CHANGELOG.md version history
├── crates/
│ ├── chematic-core/ Atom, Bond, Molecule, Element, kekulization
│ ├── chematic-smiles/ OpenSMILES parser, writer, canonical SMILES
│ ├── chematic-perception/ SSSR ring perception, Hückel aromaticity
│ ├── chematic-mol/ MOL/SDF V2000+V3000 parser and writer
│ ├── chematic-depict/ 2D SVG depiction engine (CPK colors, highlighting)
│ ├── chematic-chem/ Descriptors, BRICS, QED, standardization, scaffold
│ ├── chematic-fp/ ECFP4/6, MACCS, path, AtomPair, Torsion FP
│ ├── chematic-smarts/ SMARTS parser + VF2 subgraph isomorphism, MCS
│ ├── chematic-3d/ 3D coordinate generation, PDB/XYZ formats
│ ├── chematic-rxn/ Reaction SMILES parser and writer
│ └── chematic/ Umbrella crate with feature flags
└── tasks/
├── todo.md full roadmap checklist (Japanese)
└── lessons.md development lessons learned
```
Development Commands
License
Licensed under either of Apache License 2.0 or MIT License, at your option.