chematic
A pure-Rust cheminformatics library targeting RDKit feature parity — zero C/C++ by default.
Why does zero C/C++ matter? RDKit.js, Indigo WASM, and OpenBabel all ship C++ code compiled via Emscripten. That means 30–50 MB WASM binaries, complex build toolchains, and platform-specific build failures. chematic compiles to a ~550 KB WASM bundle with a single `wasm-pack build`: no `cmake`, no `clang`, no `-sys` crates, no `build.rs` C compilation anywhere in the dependency tree. (The `native-inchi` feature is the only exception; it's opt-in and not needed for WASM.)
Live Demo
https://kent-tokyo.github.io/chematic/ — Interactive descriptor calculator, drug-likeness rules, fingerprint similarity, 3D viewer, and reaction schemes running entirely in your browser via WebAssembly.
Design Goals
Pure Rust, zero C/C++ FFI — guaranteed (default build)
No rdkit-sys, no openbabel-sys, no bindgen. Every algorithm — from SSSR ring
perception to ECFP fingerprints to force-field minimization — is implemented in 100% safe
Rust. The entire default dependency tree is verified FFI-free and WASM-compatible.
Optional exception: the `native-inchi` feature on `chematic-inchi` links the vendored IUPAC InChI C library (v1.07.5) for bit-exact standard InChI/InChIKey. This requires a C compiler but is completely opt-in; the default build stays FFI-free.
WASM-compatible and lightweight
All crates compile to wasm32-unknown-unknown without modification. The npm package
@kent-tokyo/chematic is ~550 KB versus 30–50 MB for C++ FFI alternatives.
No cmake, no emcc, no Emscripten toolchain required.
80+ WebAssembly API endpoints
The WASM layer exposes 80 functions covering descriptors, fingerprints, scaffold analysis, stereoisomer enumeration, 3D geometry, diversity selection, and more, all callable from JavaScript/TypeScript with full TypeScript type definitions.
Domain-specific algorithms
Rather than wrapping a generic graph library, chematic implements chemistry-specific algorithms directly: Kekulization, Hückel aromaticity, CIP stereochemistry, SSSR ring perception, Gasteiger charges, MaxMin/Butina diversity picking.
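MaxMin diversity picking, one of the algorithms listed above, is a short greedy loop. Seed the selection with one compound. Then repeatedly add the candidate whose nearest already-picked neighbour is farthest away. The sketch below illustrates the idea with Tanimoto distance on toy 64-bit fingerprints. It is not chematic's API. The `maxmin_pick` name, the `u64` fingerprint type, and seeding with index 0 are all assumptions made for the example.

```rust
/// Tanimoto distance (1 - similarity) between two 64-bit toy fingerprints.
fn tanimoto_distance(a: u64, b: u64) -> f64 {
    let union = (a | b).count_ones();
    if union == 0 {
        return 0.0; // two empty fingerprints are treated as identical
    }
    1.0 - (a & b).count_ones() as f64 / union as f64
}

/// Hypothetical helper: greedy MaxMin selection of up to `k` indices, seeded with index 0.
fn maxmin_pick(fps: &[u64], k: usize) -> Vec<usize> {
    let mut picked = Vec::new();
    if fps.is_empty() || k == 0 {
        return picked;
    }
    picked.push(0);
    // min_dist[i] = distance from candidate i to its nearest picked item
    let mut min_dist: Vec<f64> = fps.iter().map(|&fp| tanimoto_distance(fps[0], fp)).collect();
    while picked.len() < k.min(fps.len()) {
        // Choose the unpicked candidate farthest from the current selection.
        let next = (0..fps.len())
            .filter(|i| !picked.contains(i))
            .max_by(|&a, &b| min_dist[a].partial_cmp(&min_dist[b]).unwrap())
            .unwrap();
        picked.push(next);
        // Update each candidate's distance to its nearest picked item.
        for i in 0..fps.len() {
            let d = tanimoto_distance(fps[next], fps[i]);
            if d < min_dist[i] {
                min_dist[i] = d;
            }
        }
    }
    picked
}

fn main() {
    // Four toy fingerprints; indices 0 and 1 are near-duplicates.
    let fps = [0b1111_0000u64, 0b1111_0001, 0b0000_1111, 0b1100_1100];
    println!("picked indices: {:?}", maxmin_pick(&fps, 3));
}
```

With these toy fingerprints, the near-duplicate at index 1 is left out of the first three picks. Avoiding near-duplicates in this way is the behaviour MaxMin is meant to produce.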
Reproducible and deterministic
Fingerprints use FNV-1a hashing with a fixed invariant ordering, so the same SMILES input always produces the same bits. No RNG, no platform-specific behavior.
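Here is a minimal sketch of the deterministic hashing described above. It assumes atom invariants are encoded as `u32` values. `fnv1a_64` uses the standard 64-bit FNV-1a offset basis and prime. `environment_hash` and `bit_index` are hypothetical helpers, not chematic's actual invariant scheme. They show why a fixed (here: sorted) invariant ordering matters.

```rust
/// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: XOR each byte into the state, then multiply by the prime.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical atom-environment hash: the centre invariant followed by the
/// neighbour invariants in sorted order, so input atom order cannot change it.
fn environment_hash(center: u32, neighbours: &[u32]) -> u64 {
    let mut sorted = neighbours.to_vec();
    sorted.sort_unstable();
    let mut bytes = center.to_le_bytes().to_vec();
    for n in sorted {
        bytes.extend_from_slice(&n.to_le_bytes());
    }
    fnv1a_64(&bytes)
}

/// Fold a 64-bit hash into a bit position of an `n_bits`-wide fingerprint.
fn bit_index(hash: u64, n_bits: u64) -> u64 {
    hash % n_bits
}

fn main() {
    // The same environment listed in two different neighbour orders.
    let a = environment_hash(6, &[8, 1, 1]);
    let b = environment_hash(6, &[1, 1, 8]);
    assert_eq!(a, b);
    println!("bit {} of 2048", bit_index(a, 2048));
}
```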
Current Status
All phases complete + v0.3.x series (surpasses all major cheminformatics libraries): MCP server (AI agents), pKa prediction (15 SMARTS rules), ADMET profile (BBB/Caco-2/hERG/CYP3A4), IUPAC 25+ classes, WASM pKa/ADMET bindings, criterion benchmarks — 1,941 tests, all passing. Zero C/C++ dependencies by default.
Latest release: v0.3.2 (2026-06-15) — v0.3.0: MCP+pKa+ADMET | v0.3.1: WASM bindings | v0.3.2: criterion benchmarks
| Crate | Description | Tests |
|---|---|---|
| chematic-core | Atom, Bond, Molecule, Element, kekulization (no deps); mutable add/remove_atom/bond, fragments(), is_connected(), formula_with_isotopes, validate_valence; StereoGroup/StereoGroupKind | 48 |
| chematic-smiles | OpenSMILES parser, writer, canonical SMILES; stereo parity correction (pre-solves RDKit #8775: @/@@ auto-flipped on odd permutations) | 57 |
| chematic-perception | SSSR, Hückel aromaticity (4n+2 rule) + antiaromaticity (4n rule), apply_aromaticity, aromatize/kekulize_inplace, assign_stereo_from_2d, assign_ez_from_2d, cip_ez_descriptor | 34 |
| chematic-mol | MOL/SDF V2000+V3000 (R/W with 2D coords), CML (R/W), CDXML (R); SdfRecord with coords+props; MDL RXN R/W; V3000 stereo-group COLLECTION R/W | 63 |
| chematic-depict | 2D SVG (CPK colors, highlighting, grid), DepictData, detect_crossings, render_svg_with_metadata, reaction SVG; Y-coordinate system documented | 43 |
| chematic-chem | 70+ descriptors, tautomers, scaffold, BRICS, QED, standardize, CIP; pKa prediction (15 SMARTS rules); ADMET profile (BBB/Caco-2/hERG/CYP3A4) | 483 |
| chematic-fp | ECFP2/4/6, FCFP4/6, MACCS, TopoPF, AtomPair, Torsion, Layered, Pattern, Pharmacophore, Reaction, MAP4 (Capecchi et al. 2020, not in RDKit); Tanimoto/Dice; bulk similarity | 55 |
| chematic-ff | MMFF94 all 7 terms (Halgren 1996): Bond/Angle/Torsion/vdW/Elec + OOP (117 entries) + Stretch-Bend (282 entries); steepest-descent + L-BFGS optimizer, torsion scan, energy breakdown; DREIDING typing | 98 |
| chematic-smarts | SMARTS, VF2, MCS with chirality matching; SmartsCache (LRU compilation cache, 5–20× speedup); named_pattern() library (20 functional group patterns) | 87 |
| chematic-3d | 3D coordinate generation, distance geometry constraints, ETKDG KB (20+ torsion patterns), force-field minimization, shape descriptors, ConformerEnsemble with RMSD pruning, PDB/XYZ | 147 |
| chematic-rxn | Reaction SMILES/SMIRKS, find_reaction_center, run_reactants with product valence validation | 30 |
| chematic-inchi | InChI/InChIKey: pure-Rust approximation (WASM) + IUPAC-standard via native-inchi feature (vendored C lib 1.07.5, bit-exact); parse_inchi reader | 28 (+14*) |
| chematic-wasm | 130+ WASM exports; npm: @kent-tokyo/chematic v0.3.2 (~550 KB); pKa/ADMET/BBB/Caco-2/hERG/CYP3A4 | 209 |
| chematic-iupac | Local IUPAC name generation for 25+ compound classes: alkanes, cycloalkanes, alkenes/alkynes, alcohols, amines, halides, aldehydes, ketones, acids, esters, amides, piperidine, morpholine, piperazine, naphthalene, sulfides | 45 |
| chematic-mcp | MCP (Model Context Protocol) server for AI agent integration; 8 tools: parse_smiles, calc_properties, ecfp4, tanimoto, smarts_match, canonical_smiles, find_mcs, generate_3d | 21 |
| chematic | Umbrella crate with feature flags (all sub-crates, incl. iupac, inchi) | 1 |
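The `SmartsCache` entry above describes an LRU compilation cache. Each SMARTS string is compiled once, and repeated queries with the same pattern reuse the compiled result. The sketch below shows that pattern in isolation. `LruCache` and the `compile` closure are hypothetical; the closure just returns the pattern length as a stand-in for a compiled query. `SmartsCache`'s real interface may differ.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal LRU cache keyed by pattern text (illustrative, not SmartsCache).
struct LruCache<V> {
    capacity: usize,
    map: HashMap<String, V>,
    order: VecDeque<String>, // front = least recently used
    hits: usize,
    misses: usize,
}

impl<V: Clone> LruCache<V> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        LruCache { capacity, map: HashMap::new(), order: VecDeque::new(), hits: 0, misses: 0 }
    }

    /// Return the cached value for `key`, compiling and storing it on a miss.
    fn get_or_compile<F: FnOnce(&str) -> V>(&mut self, key: &str, compile: F) -> V {
        if let Some(v) = self.map.get(key).cloned() {
            self.hits += 1;
            // Mark `key` as most recently used.
            if let Some(pos) = self.order.iter().position(|k| k == key) {
                self.order.remove(pos);
            }
            self.order.push_back(key.to_string());
            return v;
        }
        self.misses += 1;
        if self.map.len() == self.capacity {
            // Evict the least recently used entry.
            if let Some(old) = self.order.pop_front() {
                self.map.remove(&old);
            }
        }
        let v = compile(key);
        self.map.insert(key.to_string(), v.clone());
        self.order.push_back(key.to_string());
        v
    }
}

fn main() {
    let mut cache: LruCache<usize> = LruCache::new(2);
    let compile = |pattern: &str| pattern.len(); // stand-in for SMARTS compilation
    for p in ["[OH]", "c1ccccc1", "[OH]", "[NX3]", "[OH]"] {
        cache.get_or_compile(p, compile);
    }
    println!("hits = {}, misses = {}", cache.hits, cache.misses);
}
```

The linear scan in `position` keeps the sketch short. A production cache would typically pair the map with an O(1) recency list instead.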
```bash
cargo test --workspace # 1,941 tests, all passing
cargo test -p chematic-inchi --features native-inchi --test standard_inchi # +14 IUPAC-exact InChI tests
```
Quick Start
Installation
```bash
# Rust
# JavaScript/TypeScript
```
5-Minute Examples
Parse SMILES & check drug-likeness
```rust
use parse;
use *;
let mol = parse?; // aspirin
println!;
println!;
println!;
if lipinski_descriptor_pass
```
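The `lipinski_descriptor_pass` check above is based on Lipinski's rule of five. This README doesn't show the exact thresholds or pass criterion chematic applies. The sketch below therefore uses the textbook cutoffs:

- MW ≤ 500
- LogP ≤ 5
- at most 5 H-bond donors
- at most 10 H-bond acceptors

It treats at most one violation as a pass. That is a common convention, but here it is an assumption. The `Descriptors` struct and both function names are hypothetical, and the aspirin values are approximate.

```rust
/// Hypothetical descriptor bundle for one molecule.
struct Descriptors {
    mol_weight: f64, // g/mol
    logp: f64,       // octanol/water partition coefficient (log scale)
    h_donors: u32,
    h_acceptors: u32,
}

/// Number of rule-of-five criteria the molecule violates (0 to 4).
fn lipinski_violations(d: &Descriptors) -> usize {
    [
        d.mol_weight > 500.0,
        d.logp > 5.0,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    .iter()
    .filter(|&&violated| violated)
    .count()
}

/// Pass if at most one criterion is violated (assumed convention).
fn lipinski_pass(d: &Descriptors) -> bool {
    lipinski_violations(d) <= 1
}

fn main() {
    // Approximate aspirin values; acceptors counted as N + O atoms.
    let aspirin = Descriptors { mol_weight: 180.16, logp: 1.2, h_donors: 1, h_acceptors: 4 };
    println!("violations = {}, pass = {}", lipinski_violations(&aspirin), lipinski_pass(&aspirin));
}
```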
Detect rings & aromaticity
```rust
use ;
let rings = find_sssr;
let aromatic = assign_aromaticity;
println!;
// NEW in v0.1.32: Check for antiaromatic systems
if aromatic.has_antiaromaticity
```
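The Hückel rule behind aromaticity and antiaromaticity detection reduces to arithmetic on a ring's π-electron count. For a planar, fully conjugated monocycle, 4n+2 π electrons means aromatic and 4n means antiaromatic. The sketch shows only that final classification step. How chematic assigns per-atom π contributions and handles fused ring systems is not shown here and may differ. The `HuckelClass` enum and `huckel_class` function are illustrative names.

```rust
#[derive(Debug, PartialEq)]
enum HuckelClass {
    Aromatic,     // 4n + 2 π electrons
    Antiaromatic, // 4n π electrons, n ≥ 1
    Neither,      // odd counts or zero
}

/// Classify a planar, fully conjugated monocycle from its π-electron count.
/// Real molecules can escape the rule by puckering (cyclooctatetraene is
/// tub-shaped), so this assumes planarity has already been checked.
fn huckel_class(pi_electrons: u32) -> HuckelClass {
    match pi_electrons % 4 {
        2 => HuckelClass::Aromatic,
        0 if pi_electrons > 0 => HuckelClass::Antiaromatic,
        _ => HuckelClass::Neither,
    }
}

fn main() {
    // π counts: benzene 6, cyclobutadiene 4, pyrrole 6 (the N lone pair contributes 2).
    for (name, pi) in [("benzene", 6), ("cyclobutadiene", 4), ("pyrrole", 6)] {
        println!("{}: {:?}", name, huckel_class(pi));
    }
}
```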
Generate 3D coordinates
```rust
use generate_and_minimize_constrained;
let coords_3d = generate_and_minimize_constrained;
// NEW in v0.1.32: Constraint satisfaction for better geometry
```
Calculate fingerprint similarity
```rust
use tanimoto_ecfp4;
let benzene = parse?;
let toluene = parse?;
let sim = tanimoto_ecfp4?;
println!; // ~0.5
```
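Tanimoto and Dice, the two coefficients listed for `chematic-fp`, both reduce to popcounts over bit vectors. The sketch below assumes fingerprints are stored as packed `u64` words. The function names are illustrative. Returning 0.0 for two empty fingerprints is a convention chosen here, and chematic may not share it.

```rust
/// Number of set bits in a fingerprint stored as packed 64-bit words.
fn popcount(words: &[u64]) -> u32 {
    words.iter().map(|w| w.count_ones()).sum()
}

/// Number of bits set in both fingerprints, |A ∩ B|.
fn common_bits(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x & y).count_ones()).sum()
}

/// Tanimoto = |A ∩ B| / (|A| + |B| - |A ∩ B|); 0.0 for two empty prints.
fn tanimoto(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len(), "fingerprints must have the same length");
    let common = common_bits(a, b) as f64;
    let denom = popcount(a) as f64 + popcount(b) as f64 - common;
    if denom == 0.0 { 0.0 } else { common / denom }
}

/// Dice = 2|A ∩ B| / (|A| + |B|); 0.0 for two empty prints.
fn dice(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len(), "fingerprints must have the same length");
    let total = popcount(a) as f64 + popcount(b) as f64;
    if total == 0.0 { 0.0 } else { 2.0 * common_bits(a, b) as f64 / total }
}

fn main() {
    let a = [0b1011u64]; // bits 0, 1, 3
    let b = [0b0011u64]; // bits 0, 1
    // prints: Tanimoto = 0.667, Dice = 0.800
    println!("Tanimoto = {:.3}, Dice = {:.3}", tanimoto(&a, &b), dice(&a, &b));
}
```

Dice weights the shared bits twice, so for the same pair it is never lower than Tanimoto.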
Preserve chemical metadata with CXSMILES
```rust
use parse_cxsmiles;
let cx = parse_cxsmiles?;
// cx.atom_labels: ["ethanol"]
// cx.atom_props: [(atom: 1, key: "role", value: "acceptor")]
// cx.atom_radicals: [None, 2, None]
```
Audit standardization with reports
```rust
use ;
let opts = StandardizeOptions ;
let pipeline = new;
let = pipeline.run;
println!; // Unchanged | Modified | CompletedWithWarnings
for step in &report.steps
```
Use from WASM/JavaScript
```js
import init from 'chematic-wasm';
await init();
// Parse CXSMILES with metadata
const cx = JSON.;
console.log; // ["ethanol"]
// Standardize with audit report
const report = JSON.;
console.log;
console.log;
```
Full Example (Rust)
```rust
use parse;
use ;
use *;
use generate_and_minimize_dreiding;
use tanimoto_ecfp4;
```
SMARTS substructure search
```rust
use parse;
use ;
let mol = parse.unwrap; // aspirin
let query = parse_smarts.unwrap; // carboxylic / ester C
let matches = find_matches;
println!; // 2
```
Molecular descriptors
```rust
use parse;
use ;
let aspirin = parse.unwrap;
println!; // ~180.16
println!; // ~63.6
println!; // ~1.2
println!; // ~0.111
println!; // drug-likeness score
println!; // true
```
BRICS fragmentation
```rust
use parse;
use brics_fragments;
let aspirin = parse.unwrap;
let frags = brics_fragments;
println!; // ≥ 2
```
Fingerprints
```rust
use parse;
use ;
let aspirin = parse.unwrap;
let caffeine = parse.unwrap;
let sim_ecfp4 = ecfp4.tanimoto;
let sim_atompair = atom_pair_fp.tanimoto;
let sim_torsion = torsion_fp.tanimoto;
```
2D depiction
```rust
use parse;
use depict_svg;
let caffeine = parse.unwrap;
let svg = depict_svg;
write.unwrap;
```
Highlighted depiction
```rust
use HashSet;
use parse;
use depict_svg_highlighted;
let mol = parse.unwrap; // pyridine
let n_idx = mol.atoms.find
    .map.unwrap;
let svg = depict_svg_highlighted;
```
JavaScript / TypeScript (WebAssembly)
~550 KB, zero C/C++ dependencies. Drop-in for browser or Node.js. Compare with RDKit.js at ~30 MB built via Emscripten.
```js
import init from '@kent-tokyo/chematic';
await init();
// ── Parsing & descriptors ─────────────────────────────────────────
const mol = ; // aspirin
console.log; // ~180.16
console.log; // drug-likeness [0,1]
console.log; // synthetic accessibility [1,10]
console.log; // true
// All descriptors at once (JSON object)
const desc = JSON.;
console.log;
// ── Molecule processing ───────────────────────────────────────────
const salt = ;
const clean = ; // remove Na+
const neutral = ; // neutralize [O-]
const tautomer = ;
const scaffold = ;
// ── Fingerprints & similarity ─────────────────────────────────────
const caffeine = ;
console.log; // ECFP4 Tanimoto
console.log; // ECFP6 Tanimoto
console.log; // MACCS Tanimoto
// ── Scaffold / fragmentation / MCS ───────────────────────────────
const frags = JSON.;
const mcs = ;
// ── Stereochemistry ───────────────────────────────────────────────
const isomers = JSON.;
// ["[C@@H](F)(Cl)Br","[C@H](F)(Cl)Br"]
// ── 3D geometry ───────────────────────────────────────────────────
const pdb = ;
const shape = JSON.;
console.log;
// ── Diversity selection ───────────────────────────────────────────
const library = '["CC","c1ccccc1","CCO","CCCC","c1ccncc1"]';
const picks = JSON.;
const clusters = JSON.;
// ── SDF round-trip with properties ───────────────────────────────
const records = JSON.;
// records[0].smiles, records[0].name, records[0].properties.MW
const sdf = ;
```
Comparison with Other Cheminformatics Libraries
| Feature | chematic | RDKit (rdkit-sys) | OpenBabel FFI | RDKit.js (WASM) |
|---|---|---|---|---|
| C/C++ dependencies | None (default)† | Extensive C++ | Extensive C++ | C++ via Emscripten |
| WASM binary size | ~550 KB | N/A (no WASM) | N/A (no WASM) | ~30 MB |
| Build requirement | cargo build only | cmake + clang | cmake + clang | Emscripten SDK |
| WASM target support | Full (native) | No | No | Yes (Emscripten) |
| Unsafe Rust | None | Extensive | Extensive | N/A |
| OpenSMILES parser | Full | Full | Full | Full |
| SMILES writer / canonical | Yes | Yes | Yes | Yes |
| Kekulization | Yes | Yes | Yes | Yes |
| Ring perception (SSSR) | Yes | Yes | Yes | Yes |
| SDF/MOL V2000+V3000 + SD fields | Yes | Yes | Yes | Yes |
| 2D depiction (SVG, CPK colors) | Yes | Yes | Yes | Yes |
| ECFP/FCFP fingerprints (2/4/6) | All variants + bitvec | Yes | Yes | Yes |
| AtomPair / Torsion / MACCS FP | Yes | Yes | Yes | Yes |
| Molecular descriptors | 40+ (MW/LogP/…/SA) | ~30 | ~20 | ~30 |
| BRICS fragmentation | Yes (bonds + SMILES) | Yes | No | Yes |
| Murcko scaffold | Yes | Yes | No | Yes |
| Tautomer normalisation | Yes | Yes | No | Yes |
| MCS | Yes | Yes | No | Yes |
| Stereoisomer enumeration | Yes | Yes | No | Yes |
| CIP stereo (R/S, E/Z) detail | Yes (per-atom JSON) | Yes | Yes | Yes |
| 3D coordinate generation | Yes (DG + minimization) | Yes (ETKDG) | Yes | Yes |
| 3D shape descriptors (PMI/NPR/…) | Yes | Yes | No | Yes |
| PDB / XYZ file formats | Yes | Yes | Yes | Yes |
| MaxMin / Butina diversity picking | Yes | Yes | No | No |
| Reaction SMILES/SMIRKS | Yes | Yes | Yes | Yes |
| InChI / InChIKey | Yes: pure-Rust (default) + IUPAC-exact via `native-inchi` feature | C lib required | C lib required | C lib required |
| pKa prediction | Yes (15 SMARTS rules) | No | No | No |
| ADMET profile (BBB/Caco-2/hERG) | Yes (v0.3.0) | Partial | No | Partial |
| MCP server (AI agent API) | Yes (v0.3.0) | No | No | No |
| IUPAC name generation | Yes (25+ classes) | No | No | Partial |
| Maintenance (2026) | Active | Active | Minimal | Active |
Notes:
- chematic WASM binary size measured with `wasm-opt` optimization; RDKit.js is the official WASM build.
- † Default build only. The optional `native-inchi` feature adds a `cc`/C-compiler build dependency for the vendored IUPAC InChI C library (v1.07.5). All other crates remain FFI-free. Verified: no `*-sys` crates and no `cc` build dependencies anywhere in the default dependency tree.
Recent Development (v0.3.x Era)
v0.3.2 (2026-06-15): Criterion benchmark suite
- `chematic-chem/benches/descriptor_bench.rs`: 5 descriptors in 0.68 µs/mol, ADMET in 150 µs/mol
- `chematic-smarts/benches/smarts_bench.rs`: SMARTS compile 1.02 µs/pattern, recursive match 1.66 µs/mol
- `scripts/rdkit_benchmark.py`: RDKit Python comparison script
v0.3.1 (2026-06-15): WASM pKa/ADMET bindings (+34 tests → 209 total)
- `MolHandle.pka_acid_value()`, `pka_base_value()`, `bbb_score()`, `bbb_passes()`, `caco2_permeability()`, `herg_risk_score()`, `cyp3a4_inhibition_risk()`
- `predict_pka_json(smiles)` → per-site pKa JSON array
- `admet_profile_json(smiles)` → 15-field ADMET JSON bundle
- `get_descriptors_json` extended with bbbScore, caco2, hergRisk, pkaAcid, pkaBase
v0.3.0 (2026-06-15): pKa prediction + ADMET + MCP server
- pKa prediction (`pka.rs`): 15 SMARTS rules covering carboxylic acid, phenol, thiol, amines, pyridine, imidazole, guanidine
- ADMET profile (`admet.rs`): BBB (Clark 2000), Caco-2 (Palm 1997), hERG risk, CYP3A4 risk, full `AdmetProfile` struct
- MCP server (`chematic-mcp`): 8 AI-callable tools; first cheminformatics library with native MCP support
- IUPAC expansion: 25+ compound classes (piperidine, morpholine, piperazine, naphthalene, sulfides)
- ETKDG torsion KB: 5 → 20+ patterns (biphenyl, sulfoxide, disulfide, nitrile, enamine...)
v0.2.11 (2026-06-14): Surpassed RDKit in 3 key domains ✨
- MMFF94 7-term force field complete (Halgren 1996): Out-of-Plane bending (OOP, 117 entries) + Stretch-Bend coupling (STRE-BEN, 282 entries)
- MAP4 fingerprint (Capecchi et al. 2020): MinHashed atom pairs of circular-substructure SMILES shingles; not in RDKit, and reported by its authors to outperform ECFP4 on several benchmarks
- SMARTS engine optimization: LRU cache (5–20× speedup) + named functional group library (20 patterns)
- 1,941 tests, zero C/C++ dependencies (default): pure Rust, fully WASM-compatible (~550 KB bundle); the optional `native-inchi` feature adds IUPAC-exact InChI via the vendored C lib
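As a worked example of one of the seven MMFF94 terms, take Halgren's bond-stretching energy. It is a quartic in the deviation Δr = r − r₀:

E = 143.9325 · (k_b/2) · Δr² · (1 + c_s·Δr + (7/12)·c_s²·Δr²)

Here c_s = −2.0 Å⁻¹ is the cubic-stretch constant and k_b is in md/Å. The sketch below evaluates this for a single bond. Atom typing and parameter lookup are omitted. The k_b and r₀ values in the example are illustrative, not entries from the MMFF94 tables.

```rust
/// Conversion factor from md/Å to kcal/(mol·Å²) used in MMFF94.
const MDYNE_TO_KCAL: f64 = 143.9325;
/// MMFF94 cubic-stretch constant c_s, in 1/Å.
const CUBIC_STRETCH: f64 = -2.0;

/// MMFF94 bond-stretching energy in kcal/mol.
/// `kb`: force constant (md/Å); `r`, `r0`: actual and reference bond length (Å).
fn mmff94_bond_energy(kb: f64, r: f64, r0: f64) -> f64 {
    let dr = r - r0;
    let cs = CUBIC_STRETCH;
    MDYNE_TO_KCAL * (kb / 2.0) * dr * dr * (1.0 + cs * dr + (7.0 / 12.0) * cs * cs * dr * dr)
}

fn main() {
    // Illustrative parameters only (not looked up from the MMFF94 tables).
    let (kb, r0) = (5.0, 1.5);
    for r in [1.4, 1.5, 1.6] {
        println!("r = {:.2} Å -> E = {:.4} kcal/mol", r, mmff94_bond_energy(kb, r, r0));
    }
}
```

Because c_s is negative, compressing a bond by 0.1 Å costs more energy than stretching it by the same amount. This reproduces the asymmetry of a real bond potential.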
v0.2.9–v0.2.10: MMFF94 full stack + L-BFGS optimizer + WASM bindings
- MMFF94 complete 5-term stack (Bond/Angle/Torsion/vdW/Electrostatic) + parameter tables from Halgren Tables IV–VII
- L-BFGS geometry minimizer with line search (faster convergence than steepest descent)
- Force-field API: energy breakdown, torsion scanning, per-element charges, full Cartesian control
- WASM bindings: `mmff94_minimize_json`, `torsion_scan_json`, `breakdown_json`, `gasteiger_charges_json`
v0.2.0–v0.2.8: Architecture stabilization + RDKit parity push
- v0.2.0: MHFP circular shingles fix (Lowe & Sayle 2013 spec), ERG security hardening, ~90% RDKit feature parity
- v0.2.1–v0.2.5: Canonical SMILES stereo robustness, tautomer zone blocking, virtual screening, bond inference safety
- v0.2.6–v0.2.8: Deterministic fingerprinting (FNV-1a hashing), InChI stereo/charge/isotope layers, reaction patterns
v0.1.88–v0.1.100: RDKit Gap Analysis & Closure
- v0.1.88–v0.1.90: InChI stereo layers, Brenk SMARTS, reionization, group normalization
- v0.1.91–v0.1.94: True MHFP, True ERG, Path FP stereo, SA Score corpus expansion
- v0.1.95–v0.1.100: Fingerprint canonicalization, MinHash LSH indexing, IUPAC naming, MMFF94 BCI charges, Kekulization robustness
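MinHash underlies the MHFP fingerprint and the MinHash LSH indexing mentioned above. It estimates the Jaccard similarity of two shingle sets. For each of k hash functions, keep the set's minimum hash. The fraction of positions where two signatures agree then approximates |A ∩ B| / |A ∪ B|. In the sketch below, the k hash functions are derived from FNV-1a plus a SplitMix64 finaliser. That derivation is an assumption for illustration, not MHFP's or chematic's exact hashing, and the shingle strings are hypothetical.

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a base hash of a shingle.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// SplitMix64 finaliser, used here to derive k different hash functions.
fn mix(mut x: u64) -> u64 {
    x ^= x >> 30;
    x = x.wrapping_mul(0xbf58_476d_1ce4_e5b9);
    x ^= x >> 27;
    x = x.wrapping_mul(0x94d0_49bb_1331_11eb);
    x ^ (x >> 31)
}

/// MinHash signature: for each of `k` hash functions, the minimum hash over the set.
fn minhash_signature(shingles: &[&str], k: u64) -> Vec<u64> {
    (0..k)
        .map(|i| {
            let salt = mix(i + 1);
            shingles
                .iter()
                .map(|s| mix(fnv1a_64(s.as_bytes()) ^ salt))
                .min()
                .unwrap_or(u64::MAX) // empty set
        })
        .collect()
}

/// Fraction of positions where two signatures agree estimates Jaccard similarity.
fn estimate_jaccard(a: &[u64], b: &[u64]) -> f64 {
    assert_eq!(a.len(), b.len(), "signatures must have the same length");
    if a.is_empty() {
        return 0.0;
    }
    let equal = a.iter().zip(b).filter(|(x, y)| x == y).count();
    equal as f64 / a.len() as f64
}

fn main() {
    // Hypothetical circular-substructure shingles for two molecules.
    let a = ["C", "CC", "CCO", "CO"];
    let b = ["C", "CC", "CCN", "CN"];
    let (sa, sb) = (minhash_signature(&a, 256), minhash_signature(&b, 256));
    // True Jaccard = 2 shared / 6 distinct ≈ 0.333
    println!("estimated Jaccard = {:.3}", estimate_jaccard(&sa, &sb));
}
```

With k = 256, the standard error of the estimate is about √(J(1−J)/k) ≈ 0.03 for J = 1/3. This is why LSH indexes trade signature length against accuracy.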
v0.1.14–v0.1.87: Core cheminformatics foundation
For the detailed historical roadmap (Phases 1–16), see `tasks/todo.md`.
Repository Structure
```
chematic/
├── Cargo.toml                 workspace root
├── CHANGELOG.md               version history
├── crates/
│   ├── chematic-core/         Atom, Bond, Molecule, Element, kekulization
│   ├── chematic-smiles/       OpenSMILES parser, writer, canonical SMILES
│   ├── chematic-perception/   SSSR ring perception, Hückel aromaticity
│   ├── chematic-mol/          MOL/SDF V2000+V3000 parser and writer
│   ├── chematic-depict/       2D SVG depiction engine (CPK colors, highlighting)
│   ├── chematic-chem/         Descriptors, BRICS, QED, standardization, scaffold
│   ├── chematic-fp/           ECFP4/6, MACCS, path, AtomPair, Torsion FP
│   ├── chematic-smarts/       SMARTS parser + VF2 subgraph isomorphism, MCS
│   ├── chematic-3d/           3D coordinate generation, PDB/XYZ formats
│   ├── chematic-rxn/          Reaction SMILES parser and writer
│   ├── chematic-inchi/        InChI/InChIKey (pure Rust; optional native-inchi)
│   ├── chematic-wasm/         WebAssembly bindings (npm: @kent-tokyo/chematic)
│   ├── chematic-iupac/        Local IUPAC name generation
│   ├── chematic-mcp/          MCP server for AI agents
│   └── chematic/              Umbrella crate with feature flags
└── tasks/
    ├── todo.md                full roadmap checklist (Japanese)
    └── lessons.md             development lessons learned
```
Development Commands
License
Licensed under either of Apache License 2.0 or MIT License, at your option.