chematic
A cheminformatics library for Python, Rust, and the browser.
Cheminformatics that's fast by default, safe by design.
Pure Rust · Zero C/C++ · Python · WebAssembly · Live Demo
| | chematic | RDKit (Python) | RDKit.js (WASM) |
|---|---|---|---|
| Get started | pip install chematic | conda / cmake required | no Python bindings |
| Browser bundle | 504 KB | not available | ~30 MB (60× larger) |
| Batch fingerprints | 3.6 µs/mol (5–14× faster) | 20–50 µs/mol | n/a |
| Memory safety | compiler-enforced (Rust) | C++ | C++ |
| Build from source | cargo build only | cmake + clang + Boost | Emscripten SDK |
All numbers are reproducible — see benchmark details.
WASM sizes: chematic 504 KB · RDKit.js ~30 MB · Indigo WASM ~40 MB
Feature maturity at a glance:
| Feature | Status |
|---|---|
| SMILES / SMARTS / fingerprints / descriptors | Stable |
| 3D conformer generation (DG + MMFF94) | Experimental |
| pKa / ADMET | Rule-based screening (not for clinical use) |
| IUPAC name generation | Partial (25+ classes) |
| Pure-Rust InChI | Approximate (enable native-inchi feature for exact) |
What you get
$ python -c "import chematic; print(chematic.from_smiles('CC(=O)Oc1ccccc1C(=O)O').describe())"
Molecular weight 180.2 Da, formula C9H8O4.
LogP 1.31 (mildly lipophilic), TPSA 63.6 Ų.
HBD 1, HBA 3, 3 rotatable bond(s), 1 aromatic ring(s).
Drug-likeness: no Lipinski rule-of-5 violations. likely orally bioavailable (passes Veber criteria).
QED 0.56 (0 = non-drug-like, 1 = ideal).
Structural alerts: Brenk alert.
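The drug-likeness line in this output boils down to two threshold checks. As a rough sketch, the plain-Python snippet here applies the commonly published Lipinski rule-of-5 and Veber cutoffs to the aspirin values printed by describe(). It does not call chematic, and the function names are illustrative, not part of the library.

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Count Lipinski rule-of-5 violations for one molecule."""
    checks = [mw > 500, logp > 5, hbd > 5, hba > 10]
    return sum(checks)


def passes_veber(rotatable_bonds, tpsa):
    """Veber criteria: at most 10 rotatable bonds and TPSA of at most 140 A^2."""
    return rotatable_bonds <= 10 and tpsa <= 140.0


# Aspirin values copied from the describe() output.
aspirin = {"mw": 180.2, "logp": 1.31, "hbd": 1, "hba": 3,
           "rotatable_bonds": 3, "tpsa": 63.6}

violations = lipinski_violations(aspirin["mw"], aspirin["logp"],
                                 aspirin["hbd"], aspirin["hba"])
veber_ok = passes_veber(aspirin["rotatable_bonds"], aspirin["tpsa"])
print(violations, veber_ok)  # 0 True
```

Both checks are coarse filters, which is why the summary says "likely" and does not make a firm claim.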
One pip install. No RDKit, no conda, no C compiler. Works in Python, Rust, the browser, and AI agents.
# HTML report: self-contained, opens in any browser and renders in Jupyter
# mols is a list of parsed molecules
report = chematic.report(mols, output="report.html")
# or: display(report) in Jupyter
# Side-by-side comparison
=
Common Use Cases
| Scenario | How chematic helps |
|---|---|
| HTML report | chematic.report(mols, output="report.html") — self-contained compound grid, no server needed |
| Drug screening | 190+ descriptors, ADMET, PAINS/Brenk, QED — batch over thousands of compounds |
| Molecule search | ECFP4/MACCS fingerprints, Tanimoto, LSH approximate nearest-neighbour |
| AI agent / MCP | Built-in MCP server — Claude Desktop can call chemistry tools directly |
| Browser app | 504 KB WASM bundle, zero backend required, React/Vue/Svelte ready |
| Jupyter notebook | mol renders SVG inline; descriptors_df() returns a pandas DataFrame |
| Batch analysis | Rayon-parallel descriptor/fingerprint/3D pipelines; SDF/CSV in, CSV out |
| Rust server | Pure-Rust crates with no C/C++ toolchain; Axum/Actix compatible |
Full worked examples → Use cases
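Molecule search in the table above rests on the Tanimoto coefficient between fingerprints. The following is a minimal, library-independent sketch that treats a fingerprint as a set of on-bit indices. The function name and the bit values are made up for illustration and are not chematic's API.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


fp_query = {1, 5, 9, 42, 77}   # illustrative bit indices, not real ECFP4 output
fp_hit = {1, 5, 9, 100}
score = tanimoto(fp_query, fp_hit)
print(round(score, 2))  # 0.5
```

Three shared bits out of six distinct bits give 0.5; identical fingerprints give 1.0 and disjoint ones give 0.0.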
When to use chematic
Use chematic if:
- You want chemistry in the browser (WASM, 504 KB, no server required)
- You need a pure Rust stack with no C++ toolchain dependencies
- You deploy to environments where pip install rdkit is impractical (Cloudflare Workers, Lambda, embedded)
- You build AI agents and want native MCP tool integration
- You process molecules in batch at high throughput (ECFP4: 5–14× faster than RDKit)
- You want pip install chematic to just work anywhere, with no compiler needed
Use RDKit if:
- You need maximum ecosystem compatibility and 20+ years of production validation
- You need publication-quality 3D structures with ML-assisted torsion corrections (RDKit's ETKDGv3)
- You need bit-exact standard InChI without enabling the native-inchi feature
- You depend on community plugins written against the RDKit Python API
Quick Start
Installation
# Python (no C/C++ compiler required)
pip install chematic
# Rust
cargo add chematic
# JavaScript/TypeScript
npm install @kent-tokyo/chematic
Python
import chematic
mol = chematic.from_smiles('CC(=O)Oc1ccccc1C(=O)O')  # aspirin
# In Jupyter, type `mol` in a cell — 2D structure renders automatically
# Access 190+ descriptors as properties
# 180.16 1.31 63.6
# True True
# Substructure search
# True
# → [[1, 2, 3], [7, 8, 9]]
# Natural-language summary (one paragraph)
# Structured Markdown report — paste into LLM, Jupyter, or save as .md
# → # Molecular Review\n## Structure\n## Physical Properties\n## Drug-likeness\n## ADMET...
# Structural diff between two molecules
=
= # {"summary": "+C7, -O2. ΔLogP +2.75 ...", "delta_mw": 66.1, ...}
# Batch processing — parallel, numpy-ready
= # (3, 2048) uint8
# One-liner DataFrame
=
For Rust and JavaScript/TypeScript examples, see the documentation.
Diagnostics
# chematic v0.4.25
# Python 3.12.x | darwin arm64
#
# Descriptor accuracy (benchmark 2026-06, v0.4.25 vs RDKit 2026.03.3):
# MW / HBA / HBD / ARC 100% (4,999-mol ChEMBL subset)
# TPSA 100% within ±0.1 Ų
# LogP (Crippen) 100%* (max Δ = 1.1×10⁻¹³)
# Num stereocenters 99.98% (legacy) / 98.7% (new CIP FindPotentialStereo)
# ...
For AI / LLM Developers
chematic ships a native MCP (Model Context Protocol) server — the first cheminformatics library with built-in AI agent integration.
// Claude Desktop (~/.config/claude/claude_desktop_config.json)
15 chemistry tools are callable from any MCP-compatible agent:
| Tool | What it does |
|---|---|
| name_to_smiles | Resolve "aspirin", "caffeine", … to SMILES via PubChem |
| calc_properties | MW, LogP, TPSA, HBA/HBD, QED, SA Score, pKa, ADMET |
| smarts_match | Substructure search |
| pains_check / brenk_check | Flag assay interference or reactive groups |
| generate_3d | 3D coordinates (ETKDG + MMFF94) |
| find_mcs | Maximum common substructure |
| + 8 more | parse_smiles, ecfp4, tanimoto, canonical_smiles, admet_profile, boiled_egg, sa_score, lipinski_check |
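For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a message with the standard library only. The tool name calc_properties comes from the table above, but the argument key "smiles" is an assumption, not taken from chematic's published tool schema.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# The "smiles" key is assumed for illustration; check the server's tool
# listing for the real input schema.
request = make_tool_call(1, "calc_properties",
                         {"smiles": "CC(=O)Oc1ccccc1C(=O)O"})
line = json.dumps(request)  # stdio transport: one JSON message per line
print(line)
```

In practice an MCP client library or Claude Desktop builds these messages for you; the sketch only shows what travels over the wire.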
Why Pure Rust?
Fast
Rust's zero-cost abstractions and ownership model eliminate overhead at the source.
chematic's ECFP4 fingerprint batch pipeline runs at 3.6 µs/mol — 5–14× faster
than RDKit's Python API on the same hardware. No GIL, no interpreter overhead, no
FFI call overhead hidden inside a -sys crate.
Safe
The entire default dependency tree contains ~6 unsafe blocks across 15,000+ lines
of Rust. No C++ heap corruptions. No segfaults from malformed SMILES input. No
platform-specific build failures from -sys crates. The compiler enforces memory
safety at every call site.
The native-inchi feature is the single opt-in exception: it vendors the IUPAC InChI C library (v1.07.5) for bit-exact standard InChI. All other crates stay FFI-free.
Anywhere
Pure Rust compiles to wasm32-unknown-unknown natively — no Emscripten, no cmake,
no clang. The npm package @kent-tokyo/chematic is 504 KB gzip — 60× smaller
than RDKit.js. One codebase runs on Linux, macOS, Windows, and in every browser.
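The "60× smaller" figure can be checked with one line of arithmetic. This sketch assumes decimal units (1 MB = 1,000 KB) and takes the approximate 30 MB RDKit.js size quoted in this README at face value.

```python
chematic_kb = 504        # gzip size quoted in this README
rdkitjs_kb = 30 * 1000   # "~30 MB", taken as 30,000 KB (decimal units assumed)

ratio = rdkitjs_kb / chematic_kb
print(round(ratio, 1))  # 59.5
```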
Benchmarks & Validation
| Metric | Result | Corpus |
|---|---|---|
| ECFP4 throughput | 3.6 µs/mol (5–14× vs RDKit) | 4,999-mol ChEMBL subset |
| HBA / HBD / aromatic ring count | 100% RDKit agreement | 4,999-mol ChEMBL subset |
| TPSA | 100% RDKit agreement within ±0.1 Ų | 4,999-mol ChEMBL subset |
| LogP (Crippen) | 100% RDKit agreement* | 4,999-mol ChEMBL subset |
| Num stereocenters | 99.98% vs legacy†; 98.7% vs new CIP | 4,999-mol ChEMBL subset |
| WASM bundle | 504 KB gzip | — |
*LogP max Δ = 1.1×10⁻¹³ across 4,999 molecules — within float64 rounding error.
†Stereocenters: 99.98% vs legacy CalcNumAtomStereoCenters (1 molecule where chematic matches FindPotentialStereo=4 and legacy under-counts at 2); 98.7% vs new-CIP FindPotentialStereo (67 cage/bridgehead molecules where both chematic and legacy correctly return fewer than the new oracle). chematic is calibrated between both extremes.
All numbers are reproducible with the scripts in this repo.
Full history → benchmarks/ · Methodology → validation/
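The LogP footnote attributes the 1.1×10⁻¹³ difference to float64 rounding. Crippen LogP is a sum of per-atom contributions, and floating-point sums can differ slightly depending on the order of addition. The sketch below shows the effect with made-up contribution values; it is not the Crippen parameter table and not chematic code.

```python
import math

# Made-up per-atom contributions; real Crippen parameters are not reproduced here.
contributions = [0.1441, -0.2035, 0.1551, 0.08129, -0.3339, 0.2940, 0.1230] * 7

forward = sum(contributions)
backward = sum(reversed(contributions))
exact = math.fsum(contributions)  # correctly rounded reference sum

delta = abs(forward - backward)
print(delta < 1e-12, abs(forward - exact) < 1e-12)  # True True
```

Two correct implementations that add the same terms in a different order can therefore disagree in the last few bits, which is far below any chemically meaningful precision.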
Comparison with Other Cheminformatics Libraries
| Feature | chematic | RDKit (rdkit-sys) | OpenBabel FFI | RDKit.js (WASM) |
|---|---|---|---|---|
| C/C++ dependencies | None (default)† | Extensive C++ | Extensive C++ | C++ via Emscripten |
| WASM binary size | ~500 KB (504 KB gzip) | N/A (no WASM) | N/A (no WASM) | ~30 MB |
| Build requirement | cargo build only | cmake + clang | cmake + clang | Emscripten SDK |
| WASM target support | Full (native) | No | No | Yes (Emscripten) |
| Python bindings | Yes (pip install chematic, PyO3) | Yes (rdkit-sys) | Yes | No |
| Unsafe Rust | None | Extensive | Extensive | N/A |
| Feature | chematic | RDKit (rdkit-sys) | OpenBabel FFI | RDKit.js (WASM) |
|---|---|---|---|---|
| OpenSMILES parser | Full | Full | Full | Full |
| SMILES writer / canonical | Yes | Yes | Yes | Yes |
| Kekulization | 4-pass (incl. Edmonds' blossom) | Yes | Yes | Yes |
| Ring perception (SSSR) | Yes + iterative augmentation | Yes | Yes | Yes |
| SDF/MOL V2000+V3000 + SD fields | Yes | Yes | Yes | Yes |
| Tripos MOL2 format | Yes (parser + writer) | Yes | Yes | No |
| 2D depiction (SVG, CPK colors, PDF, EPS) | Yes | Yes | Yes | Yes |
| ECFP/FCFP fingerprints (2/4/6) | All variants + bitvec | Yes | Yes | Yes |
| AtomPair / Torsion / MACCS FP | Yes | Yes | Yes | Yes |
| MAP4 fingerprint | Yes (Minervini 2020) | No (external pkg) | No | No |
| Molecular descriptors | 190+ descriptor values (71 functions; MQN×42, BCUT2D, autocorr2d return multi-value arrays) | ~30 | ~20 | ~30 |
| Topological descriptors | Yes (Petitjean, Hosoya Z, ECI, Moran, Geary) | Partial | Partial | No |
| BRICS / RECAP fragmentation | Yes | Yes | No | Yes |
| Murcko scaffold | Yes | Yes | No | Yes |
| Tautomer normalisation | Yes | Yes | No | Yes |
| MCS | Yes | Yes | No | Yes |
| Stereoisomer enumeration | Yes | Yes | No | Yes |
| CIP stereo (R/S, E/Z) detail | Yes (per-atom JSON) | Yes | Yes | Yes |
| Allene cumulated stereo (C=C=C) | Yes (@/@@, round-trip stable) | Yes | Partial | No |
| 3D coordinate generation | Yes (DG + MMFF94/DREIDING + L-BFGS) | Yes (ETKDG) | Yes | Yes |
| 3D shape descriptors (PMI/NPR/USR/…) | Yes | Yes | No | Yes |
| 3D GETAWAY descriptors (HATS-matrix) | Yes (19-dim; whim_getaway_combined 29-dim) | Yes | No | No |
| MMFF94 force field (all 7 energy terms) | Yes | Yes | Yes | No |
| UFF force field (metals, organometallics) | Yes | No | Yes | No |
| AutoDock PDBQT format (parse + write) | Yes (docking pipeline ready) | Via Python API | Yes | No |
| SDF with partial charges | Yes (write_sdf_with_charges) | Yes | Yes | No |
| MaxMin / Butina diversity picking | Yes | Yes | No | No |
| Reaction SMILES/SMIRKS | Yes | Yes | Yes | Yes |
| InChI / InChIKey | Yes: pure-Rust + IUPAC-exact via native-inchi | C lib required | C lib required | C lib required |
| pKa prediction | Yes (15 SMARTS rules) | No | No | No |
| ADMET profile (BBB/Caco-2/hERG/CYP3A4) | Yes + BOILED-Egg | Partial | No | Partial |
| MCP server (AI agent API) | Yes — 15 tools incl. Name→SMILES | No | No | No |
| IUPAC name generation | Yes (25+ classes) | No | No | Partial |
| Name → SMILES (PubChem proxy) | Yes (name_to_smiles MCP tool) | No | No | No |
| Maintenance (2026) | Active | Active | Minimal | Active |
† Default build only. The optional native-inchi feature adds a C-compiler dependency for the vendored IUPAC InChI C library (v1.07.5). All other crates remain FFI-free.
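The MaxMin diversity picking row refers to a greedy selection scheme: starting from a seed, repeatedly add the item whose nearest already-picked neighbour is farthest away. A minimal sketch over set-based fingerprints follows, using Tanimoto distance. The function names and data are illustrative and do not reflect chematic's actual picker API.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for two sets of on-bit indices."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def maxmin_pick(fingerprints, n_picks, first=0):
    """Greedy MaxMin: repeatedly add the item farthest from the picked set."""
    picked = [first]
    # min_dist[i] = distance from item i to its nearest already-picked item
    min_dist = [tanimoto_distance(fp, fingerprints[first]) for fp in fingerprints]
    while len(picked) < min(n_picks, len(fingerprints)):
        remaining = [i for i in range(len(fingerprints)) if i not in picked]
        candidate = max(remaining, key=lambda i: min_dist[i])
        picked.append(candidate)
        for i, fp in enumerate(fingerprints):
            d = tanimoto_distance(fp, fingerprints[candidate])
            min_dist[i] = min(min_dist[i], d)
    return picked


fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
print(maxmin_pick(fps, 2))  # [0, 2]
```

The second pick is the fingerprint with no bits in common with the seed, which is the behaviour one wants when selecting a structurally diverse subset.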
JavaScript / TypeScript (WebAssembly)
504 KB gzip — 60× smaller than RDKit.js. No Emscripten, no cmake. Drop-in for browser or Node.js.
import init from '@kent-tokyo/chematic';
await init();
const mol = ; // aspirin
console.log;
// All descriptors as a JSON object
const desc = JSON.;
// Fingerprint similarity
const caffeine = ;
console.log; // 0.26
// 3D coordinates, stereoisomers, diversity picking
const pdb = ;
const isomers = JSON.;
const picks = JSON.;
130+ exported functions cover descriptors, fingerprints, 3D geometry, reactions, diversity picking, and SDF round-trips. See the full WASM API reference for all exports.
Crate Reference
| Crate | Description | Tests |
|---|---|---|
| chematic-core | Atom, Bond, Molecule, Element, kekulization (no deps); mutable add/remove_atom/bond, fragments(), is_connected(), formula_with_isotopes, validate_valence; StereoGroup/StereoGroupKind | 69 |
| chematic-smiles | OpenSMILES parser, writer, canonical SMILES; stereo parity correction (pre-solves RDKit #8775: @/@@ auto-flipped on odd permutations); allene cumulated double bond stereo (C=C=C @/@@, round-trip stable) | 48 |
| chematic-perception | SSSR, Hückel aromaticity + antiaromaticity (4n+2 rule), apply_aromaticity, aromatize/kekulize_inplace, assign_stereo_from_2d, assign_ez_from_2d, cip_ez_descriptor; zero-order/dative bonds excluded from ring perception | 34 |
| chematic-mol | MOL/SDF V2000+V3000 (R/W with 2D coords, +partial charge writing), CML (R/W), CDXML (R); SdfRecord with coords+props; MDL RXN R/W; V3000 stereo-group COLLECTION R/W; AutoDock PDBQT (parse + write); ChemicalJSON (parse_cjson/write_cjson, Avogadro/MolSSI format) | 31 |
| chematic-depict | 2D SVG (CPK colors, highlighting, grid), DepictData, detect_crossings, render_svg_with_metadata, reaction SVG; PDF output (depict_pdf/depict_pdf_opts via svg2pdf); EPS output (depict_eps/depict_eps_opts, pure Rust); tiny_skia PNG is optional png feature (default on, disabled for WASM) | 28 |
| chematic-chem | 190+ descriptors, tautomers, scaffold, BRICS, QED, standardize, CIP; pKa prediction (15 SMARTS rules); ADMET profile (BBB/Caco-2/hERG/CYP3A4); HBA 100% RDKit agreement (4,999 / 4,999 mol benchmark); TPSA 100% ±0.1 Ų / LogP 100%* / HBD 100% / stereocenters 99.98% (legacy) / 98.7% (new CIP) vs RDKit (4,999-mol ChEMBL); topological descriptors (petitjean_index, graph_diameter, graph_radius, graph_eccentricities, eccentric_connectivity_index, hosoya_index, moran_autocorr, geary_autocorr); schultz_mti, gutman_mti, vabc (Bondi radii vdW volume), gravitational_index; clean_stereo_groups() in standardize | 211 |
| chematic-fp | ECFP2/4/6, FCFP4/6, MACCS, TopoPF, AtomPair, Torsion, Layered, Pattern, Pharmacophore, Reaction, MAP4 (Minervini 2020, not in RDKit); Tanimoto/Dice; bulk similarity | 87 |
| chematic-ff | MMFF94 all 7 terms (Halgren 1996): Bond/Angle/Torsion/vdW/Elec + OOP (117 entries) + Stretch-Bend (282 entries); steepest-descent + L-BFGS optimizer, torsion scan, energy breakdown; DREIDING typing; UFF (metals/organometallics: Zn, Fe, Cu, …) | 51 |
| chematic-smarts | SMARTS, VF2, MCS with chirality matching; SmartsCache (LRU compilation cache, 5–20×); named_pattern() library (20 functional group patterns); atom map :N in SMARTS ([O;D1;H0:3], stored as metadata, not a match criterion); [kN] ring-size primitive; VF2 early-exit when query > target atom count; find_matches_with_rings (shares SSSR across multi-pattern batches) | 142 |
| chematic-3d | 3D coordinate generation, distance geometry constraints, ETKDG KB (40 torsion patterns, adaptive noise), force-field minimization, shape descriptors, ConformerEnsemble with RMSD pruning, PDB/XYZ; GETAWAY HATS-matrix (full 19-dim implementation); whim_getaway_combined() now 29-dim | 45 |
| chematic-rxn | Reaction SMILES/SMIRKS, run_reactants/run_reactants_strict; retro_disconnect() with 60 retro-SMIRKS templates (AmideBond/Ester/Ether/CNBond/CCBond/CSBond) + SA Score ranking; parity-aware @/@@ SMIRKS stereo filtering; E/Z double-bond stereo filtering in run_reactants (ez_stereo_outward, smirks_ez_stereo_ok) | 25 |
| chematic-inchi | InChI/InChIKey: pure-Rust approximation (WASM) + IUPAC-standard via native-inchi feature (vendored C lib 1.07.5, bit-exact); parse_inchi reader | 28 (+16*) |
| chematic-wasm | 130+ WASM exports; npm: @kent-tokyo/chematic v0.4.18 (~500 KB, 504 KB gzip); pKa/ADMET/BBB/Caco-2/hERG/CYP3A4; smiles_to_pdbqt, minimize_uff_json | 209 |
| chematic-iupac | Local IUPAC name generation for 25+ compound classes: alkanes, cycloalkanes, alkenes/alkynes, alcohols, amines, halides, aldehydes, ketones, acids, esters, amides, piperidine, morpholine, piperazine, naphthalene, sulfides | 45 |
| chematic-mcp | MCP (Model Context Protocol) server for AI agent integration; 15 tools: parse_smiles, calc_properties, ecfp4, tanimoto, smarts_match, canonical_smiles, find_mcs, generate_3d, pains_check, brenk_check, sa_score, admet_profile, boiled_egg, lipinski_check, name_to_smiles | 28 |
| chematic-py | PyO3 Python bindings (pip install chematic); 300+ API endpoints: from_smiles(), Mol.descriptors(), Mol.minimize_dreiding(), from_cxsmiles(), from_rxn_file()/to_rxn_file(), parse_sdf_with_coords(), Mol.ring_families(), tanimoto_matrix(), iter_sdf(), SimilarityIndex; mol.to_pdf()/mol.to_eps() (depict); from_cjson()/mol.to_cjson() (ChemicalJSON); mol.schultz_mti, mol.gutman_mti, mol.vabc, mol.gravitational_index; bulk.substructure_match(smarts, mols) (parallel VF2 on pre-parsed Mol objects); mol.describe() (LLM/MCP-ready natural-language summary); mol.diff(other) (element + descriptor diff); Sprint 18–27 coverage | 300+ |
| chematic-ewald | PME Ewald summation, B-spline interpolation (cubic, phase-corrected) | 12 |
| chematic | Umbrella crate with feature flags (all sub-crates, incl. iupac, inchi) | 1 |
cargo test --workspace --lib --quiet # 211 tests, all passing
cargo test -p chematic-inchi --features native-inchi --test standard_inchi # +16 IUPAC-exact InChI tests
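The SmartsCache entry in the crate table describes an LRU cache in front of SMARTS compilation. The idea can be sketched in a few lines with Python's functools.lru_cache. The compile function here is a stand-in that returns a dummy value, not chematic's parser, and the cache size of 128 is an arbitrary choice.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def compile_pattern(smarts):
    """Stand-in for an expensive SMARTS compile step (returns a dummy tuple)."""
    return tuple(smarts)  # placeholder for a parsed query graph


# A screening loop typically reuses a small set of patterns many times.
for pattern in ["[OH]", "c1ccccc1", "[OH]", "[OH]", "c1ccccc1"]:
    compile_pattern(pattern)

info = compile_pattern.cache_info()
print(info.misses, info.hits)  # 2 3
```

Only the first use of each distinct pattern pays the compile cost; the three repeats are served from the cache.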
Recent Development (v0.4.x Era)
v0.4.19 (2026-06-23): PDF/EPS output, ChemicalJSON, new descriptors, WASM −38.5%
- chematic-depict: depict_pdf() / depict_eps() for PDF and EPS output; pure Rust, no external tools
- chematic-mol: ChemicalJSON via parse_cjson() / write_cjson() for Avogadro2 / MolSSI interop
- chematic-chem: 4 new descriptors: schultz_mti(), gutman_mti(), vabc() (Bondi vdW volume), gravitational_index()
- chematic-3d: Spectrophores 3D fingerprints (pharmacophore shell encoding)
- chematic-py: mol.to_pdf(), mol.to_eps(), mol.to_cjson(), from_cjson(); bulk.substructure_match(smarts, mols) parallel VF2; estate_all() and ring_bundle in bulk
- WASM bundle: 819 → 504 KB gzip (−38.5%): tiny_skia made optional, inline SHA-256, opt-level="z" lto=true codegen-units=1
v0.4.18 (2026-06-23): Python API expansion + benchmark docs
- chematic-py: Jupyter auto-display (writing mol in a cell renders the 2D structure via _repr_svg_()); mol.has_substructure(smarts), mol.find_matches(smarts); from_smiles_list(), descriptors_df()
- chematic-chem: chi_all() computes all 10 Hall-Kier connectivity indices in a single pass; cns_mpo_from_parts(); pains_passes_and_matches() / brenk_passes_and_matches() combine pass/match in one scan
- Docs: benchmark page added (ECFP4 5–14× vs RDKit, 100% descriptor accuracy on 4,999-mol ChEMBL corpus)
v0.4.16–v0.4.17 (2026-06-22–23): SSSR sharing performance sprint
- chematic-smarts: find_matches_with_rings() shares a pre-computed RingSet across all patterns in a batch
- chematic-chem: Crippen 117 SSSR → 1 per logp_crippen call; PAINS ~480 → 1; QED 113 → 1; pKa 42 → 1; new logp_and_mr(), logd_from_logp(), pka_both() to avoid redundant passes
- chematic-fp: MHFP incremental BFS, 3N → N BFS operations per molecule at radius=2
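For readers unfamiliar with logD, the textbook relation between logP, pKa, and pH for a monoprotic compound is sketched below. It assumes only the neutral species partitions into the organic phase. Whether logd_from_logp() uses exactly this form is not stated in this README, so the function names and the pKa value here are assumptions for illustration.

```python
import math


def logd_monoprotic_acid(logp, pka, ph=7.4):
    """logD = logP - log10(1 + 10**(pH - pKa)) for a monoprotic acid."""
    return logp - math.log10(1.0 + 10.0 ** (ph - pka))


def logd_monoprotic_base(logp, pka, ph=7.4):
    """logD = logP - log10(1 + 10**(pKa - pH)) for a monoprotic base."""
    return logp - math.log10(1.0 + 10.0 ** (pka - ph))


# logP 1.31 is the aspirin value quoted in this README; pKa 3.5 is an
# illustrative value for a carboxylic acid, not a chematic prediction.
print(round(logd_monoprotic_acid(1.31, 3.5), 2))  # -2.59
```

At pH 7.4 the acid is almost fully ionised, so the distribution coefficient drops by nearly four log units relative to logP.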
v0.4.15 (2026-06-21): TPSA calibration + E/Z stereo in reactions
- chematic-chem: TPSA ±0.1 Ų calibration sprint: HBA 100%, HBD 100%, aromatic ring count 100% on 4,999-mol ChEMBL subset; TPSA 86.7% → 93.3% (4,999-mol), 100% on 175-mol drug-like set
- chematic-rxn: E/Z double-bond stereo filtering in run_reactants; SMIRKS / and \ geometry matching via smirks_ez_stereo_ok() / ez_stereo_outward()
v0.4.14 (2026-06-21): Topological descriptors + stereo correctness
- chematic-chem: 8 topological descriptors: petitjean_index(), graph_eccentricities(), graph_diameter(), graph_radius(), eccentric_connectivity_index(), hosoya_index(), moran_autocorr(), geary_autocorr()
- chematic-3d: GETAWAY HATS-matrix (19-dim); whim_getaway_combined() now 29-dim
- chematic-smiles: allene cumulated stereo C=C=C @/@@, round-trip stable
- chematic-smarts: [kN] ring-size primitive; VF2 early-exit when query > target atom count
- chematic-rxn: parity-aware SMIRKS chirality matching; product bracket cleanup ([O:1] → O)
- chematic-perception: zero-order/dative bonds excluded from SSSR; count_aromatic_rings() handles Kekulé input
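Several of the topological descriptors listed for v0.4.14 derive from graph eccentricities. As a rough illustration, the sketch below computes eccentricities by breadth-first search on a heavy-atom adjacency list and derives diameter, radius, and the Petitjean index in its usual (D − R) / R form. It is independent of chematic, whose normalisation may differ, and it assumes a connected graph with at least two atoms.

```python
from collections import deque


def eccentricities(adjacency):
    """Eccentricity of each vertex: greatest shortest-path distance to any other."""
    result = {}
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        result[start] = max(dist.values())
    return result


# Heavy-atom graph of n-butane: C0-C1-C2-C3
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ecc = eccentricities(butane)
diameter, radius = max(ecc.values()), min(ecc.values())
petitjean = (diameter - radius) / radius
print(ecc, diameter, radius, petitjean)  # {0: 3, 1: 2, 2: 2, 3: 3} 3 2 0.5
```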
v0.4.13 (2026-06-21): Template retrosynthesis + descriptor fixes
- chematic-rxn: retro_disconnect() with 60 retro-SMIRKS templates (AmideBond / Ester / Ether / CNBond / CCBond / CSBond) and SA Score ranking; Python mol.retro_disconnect(reaction_class=...)
- chematic-3d: ETKDG torsion KB 28 → 40 patterns; adaptive bond-flexibility noise scaling
- chematic-chem: hbd_count() now includes S-H (thiol); TPSA nitro-N / aromatic oxide bridge / Kekulé-N corrections
v0.4.9–v0.4.12 (2026-06-19–21): AutoDock, UFF, SMARTS atom-map, ring augmentation
- chematic-mol: AutoDock PDBQT parse/write; write_sdf_with_charges
- chematic-ff: UFF force field for metals/organometallics (Zn, Fe, Cu, …)
- chematic-smarts: atom map :N in SMARTS ([O;D1;H0:3], stored as metadata)
- chematic-perception: iterative augmented_ring_set for fused polycyclic aromatic ring counting (222/222 bench5k fixes)
- MCP: 15th tool name_to_smiles via PubChem REST proxy
v0.4.5–v0.4.7 (2026-06-19): Kekulization blossom + BOILED-Egg + InChI E/Z
- Edmonds' blossom algorithm for non-bipartite aromatic graphs (128→2 failures)
- InChI /b E/Z layer, 6 new MCP tools, BOILED-Egg descriptor + Python/WASM bindings
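Kekulization can be framed as a perfect-matching problem: every aromatic atom that needs a double bond must be covered by exactly one chosen bond. Edmonds' blossom algorithm solves this in polynomial time for general graphs. The sketch below is deliberately not blossom; it is a small backtracking search that only illustrates the matching formulation on toy rings, with names invented for this example.

```python
def perfect_matching(atoms, bonds):
    """Backtracking search for a set of bonds covering every atom exactly once."""
    neighbours = {a: set() for a in atoms}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    def solve(unmatched, chosen):
        if not unmatched:
            return chosen
        a = min(unmatched)  # always extend from the lowest unmatched atom
        for b in sorted(neighbours[a] & unmatched):
            found = solve(unmatched - {a, b}, chosen + [(a, b)])
            if found is not None:
                return found
        return None  # no double-bond assignment exists

    return solve(frozenset(atoms), [])


benzene = [(i, (i + 1) % 6) for i in range(6)]
print(perfect_matching(range(6), benzene))  # [(0, 1), (2, 3), (4, 5)]
```

The returned pairs are the bonds to mark as double. An odd ring of five such atoms has no perfect matching, so the function returns None there.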
v0.4.0–v0.4.4 (2026-06-17–18): PyO3 Python bindings + native-inchi
- chematic-py: PyO3/maturin bindings: from_smiles(), Mol.aromatic_ring_count, Mol.descriptors()
- native-inchi feature: IUPAC-exact InChI via vendored C lib v1.07.5
- HBA rewrite: 99.98% agreement with RDKit (4,999-mol ChEMBL benchmark)
Full changelog: CHANGELOG.md
Built with chematic
Using chematic in a project? Share it in Discussions or open a PR to add it here.
Reliability by Feature
Not all features have the same validation depth. This table tells you what to trust.
| Feature | Status | Validation |
|---|---|---|
| SMILES parse / write | Stable | 4,999-mol ChEMBL comparison; OpenSMILES corpus |
| MW / HBA / HBD | Stable | 100% RDKit agreement on 4,999 mol |
| TPSA | Stable | 100% on 175-mol drug-like set; 99.7% on 4,999-mol ChEMBL subset (±0.1 Ų) |
| LogP (Crippen) | Stable | 100% on 4,999-mol corpus (±0.01); ~99% on 175-mol drug-like set (±0.3) |
| ECFP4 / MACCS fingerprints | Stable | RDKit comparison + benchmark |
| Tanimoto similarity | Stable | RDKit comparison |
| SDF / MOL V2000/V3000 I/O | Stable | round-trip tests |
| Substructure search (SMARTS / VF2) | Stable | internal test suite |
| PAINS / Brenk filters | Stable | rule-based; matches public SMARTS databases |
| 2D SVG depiction | Stable | visual spot-checks; not publication-quality |
| 3D conformer (DG + MMFF94) | Experimental | reasonable geometry; not equivalent to RDKit ETKDGv3 quality |
| pKa prediction | Rule-based screening | 15 SMARTS rules; early triage only, not clinical |
| ADMET (BBB / Caco-2 / hERG / CYP3A4) | Rule-based screening | empirical models; directional, not validated on clinical endpoints |
| IUPAC name generation | Partial | common compound classes; complex structures may fail |
| Pure-Rust InChI | Approximate | enable native-inchi feature for bit-exact IUPAC InChI |
Full benchmark methodology → validation/ · History → benchmarks/
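The percentages in the validation column are agreement rates within a tolerance. A minimal sketch of that calculation follows; the TPSA values are invented for illustration and are not benchmark data, and the function name is hypothetical.

```python
def agreement_rate(values, reference, tolerance):
    """Fraction of paired values whose absolute difference is within tolerance."""
    if len(values) != len(reference):
        raise ValueError("value lists must have the same length")
    hits = sum(abs(v - r) <= tolerance for v, r in zip(values, reference))
    return hits / len(values)


# Invented TPSA values for illustration only, not benchmark data.
ours = [63.6, 58.44, 20.23, 40.46]
theirs = [63.6, 58.44, 20.23, 43.7]
print(agreement_rate(ours, theirs, tolerance=0.1))  # 0.75
```

The choice of tolerance matters: the same pair of libraries can score differently at ±0.01 and ±0.3, which is why the table states the tolerance next to each figure.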
Known Limitations
- Aromaticity model: chematic applies Hückel 4n+2 per SSSR ring independently; RDKit uses fused-ring electron delocalization. Visible differences in N-heterocycles (pyridone, quinolone, indolizine). Current benchmark on 4,999-mol ChEMBL subset: HBA/HBD/aromatic ring count 100%; TPSA 99.7% (±0.1 Ų); LogP 100% (±0.01).
- TPSA edge cases: remaining 0.3% discrepancy (16 of 4,999 molecules) concentrated in exotic phosphazene ring-N calibration and cyclic sulfurimide/S=N=P chemistry — not relevant for drug-like molecules.
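The per-ring Hückel test mentioned in the first limitation can be sketched as a simple electron count. This is a textbook classification for a single planar, fully conjugated ring, with a hypothetical function name. It does not model fused-ring delocalisation, which is exactly where the differences from RDKit arise.

```python
def huckel_class(pi_electrons):
    """Classify a planar, fully conjugated monocycle by its pi-electron count."""
    if pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0:
        return "aromatic"        # 4n + 2
    if pi_electrons >= 4 and pi_electrons % 4 == 0:
        return "antiaromatic"    # 4n
    return "non-aromatic"


examples = [("benzene", 6), ("cyclobutadiene", 4),
            ("pyrrole", 6), ("cyclopentadienyl cation", 4)]
for name, electrons in examples:
    print(name, huckel_class(electrons))
```

Applying this test to each SSSR ring in isolation is simple and fast, but a ring that only becomes aromatic through electrons shared across a fused system can be classified differently by a delocalised model.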
Repository Structure
chematic/
├── Cargo.toml workspace root (v0.4.23)
├── CHANGELOG.md
├── crates/
│ ├── chematic-core/ Atom, Bond, Molecule, Element, kekulization (4-pass + blossom)
│ ├── chematic-smiles/ OpenSMILES parser/writer, canonical SMILES
│ ├── chematic-perception/ SSSR, 2-pass Hückel aromaticity, CIP stereo
│ ├── chematic-smarts/ SMARTS parser, VF2 subgraph isomorphism, MCS, LRU cache
│ ├── chematic-chem/ 190+ descriptors, pKa, ADMET, BOILED-Egg, QED, SA Score,
│ │ PAINS/Brenk filters, scaffold, standardization, BRICS/RECAP
│ ├── chematic-fp/ ECFP/FCFP, MACCS, MAP4, AtomPair, Torsion, MHFP, ERG
│ ├── chematic-ff/ MMFF94 full stack (7 terms), DREIDING, L-BFGS minimizer
│ ├── chematic-3d/ ETKDG, MD, SASA, USR shape screen, WHIM, GETAWAY, XYZ/PDB I/O
│ ├── chematic-depict/ 2D SVG rendering, grid layout, CPK colors, highlighting
│ ├── chematic-rxn/ Reaction SMILES/SMIRKS, RunReactants, RECAP/BRICS
│ ├── chematic-mol/ SDF/MOL V2000+V3000, CML, CDXML parser/writer
│ ├── chematic-inchi/ InChI/InChIKey (pure-Rust approx + IUPAC-exact via native-inchi)
│ ├── chematic-iupac/ IUPAC name generation (25+ compound classes)
│ ├── chematic-mcp/ MCP server — 15 AI-callable tools (JSON-RPC 2.0 over stdio)
│ ├── chematic-wasm/ 130+ WASM exports → npm @kent-tokyo/chematic
│ ├── chematic-py/ PyO3 Python bindings → pip install chematic
│ ├── chematic-ewald/ PME Ewald summation, B-spline interpolation
│ └── chematic/ Umbrella crate with feature flags
├── demo/ Interactive WASM playground (→ /playground/ on GitHub Pages)
│ ├── index.html
│ └── pkg/ Pre-built WASM bundle (rebuilt on each release)
└── docs/ MkDocs documentation site source
├── cookbook.md
├── getting_started/
└── api/
Development Commands
Citation
If you use chematic in academic or research work, please cite:
License
Licensed under either of Apache License 2.0 or MIT License, at your option.
If chematic saves you time, a GitHub star helps others discover it.