chematic-smarts 0.1.3

SMARTS parser, VF2 subgraph isomorphism, and MCS for chematic, a pure-Rust RDKit alternative

chematic


A pure-Rust cheminformatics library targeting RDKit feature parity, with no C/C++ FFI.


Design Goals

Pure Rust, zero C/C++ FFI: no rdkit-sys, no openbabel bindings. Every algorithm is implemented in safe Rust.

WASM-compatible and lightweight: core crates compile to wasm32-unknown-unknown without modification. Binary size is in the hundreds of KB, versus tens of MB for C++ FFI wrappers.

Domain-specific algorithms: rather than wrapping a generic graph library, chematic implements Kekulization, Hückel aromaticity, CIP stereochemistry and SSSR ring perception directly.
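One useful fact about SSSR perception: the number of rings is fixed before any ring is found. It equals the cyclomatic number E - V + C (bonds minus atoms plus connected components). The minimal sketch below computes only this count with a union-find. `ring_count` is a hypothetical helper. It does not locate the rings themselves, which is the job of an algorithm such as Balducci-Pearlman.

```rust
/// SSSR size = cyclomatic number E - V + C for a molecular graph with
/// `num_atoms` atoms and the given bonds (hypothetical helper).
fn ring_count(num_atoms: usize, bonds: &[(usize, usize)]) -> usize {
    // Union-find root lookup with path halving.
    fn find(parent: &mut [usize], mut x: usize) -> usize {
        while parent[x] != x {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        x
    }
    let mut parent: Vec<usize> = (0..num_atoms).collect();
    let mut components = num_atoms; // every atom starts as its own component
    for &(a, b) in bonds {
        let ra = find(&mut parent, a);
        let rb = find(&mut parent, b);
        if ra != rb {
            parent[ra] = rb; // this bond merges two components
            components -= 1;
        }
    }
    // E + C - V; adding first keeps the unsigned arithmetic from underflowing.
    bonds.len() + components - num_atoms
}

fn main() {
    let benzene: [(usize, usize); 6] = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)];
    // Naphthalene: two six-membered rings fused along the 4-5 bond.
    let naphthalene: [(usize, usize); 11] = [
        (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
        (4, 6), (6, 7), (7, 8), (8, 9), (9, 5),
    ];
    println!("benzene: {}", ring_count(6, &benzene)); // 1
    println!("naphthalene: {}", ring_count(10, &naphthalene)); // 2
}
```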

Reproducible and deterministic: fingerprints use FNV-1a hashing with a fixed invariant ordering, so the same SMILES input always produces the same bits. No RNG, no platform-specific behavior.
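FNV-1a is a small, fully specified hash (32-bit offset basis 2166136261, prime 16777619). Its output depends only on the input bytes, which is what makes fixed-order fingerprint hashing reproducible across platforms. The sketch below applies it in a Morgan/ECFP-style loop. The atomic-number invariant, the 2048-bit width, and the names `fnv1a`, `hash_ids` and `circular_fp` are assumptions for illustration, not chematic-fp's actual invariants or code. Real ECFP also folds in bond information and removes duplicate environments.

```rust
// FNV-1a 32-bit parameters (standard offset basis and prime).
const FNV_OFFSET: u32 = 2_166_136_261;
const FNV_PRIME: u32 = 16_777_619;
const NBITS: usize = 2048; // hypothetical fingerprint width

/// 32-bit FNV-1a: XOR each byte in, then multiply by the prime (wrapping).
fn fnv1a(bytes: &[u8]) -> u32 {
    let mut hash = FNV_OFFSET;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hash a sequence of u32 identifiers in a fixed (little-endian) byte order.
fn hash_ids(ids: &[u32]) -> u32 {
    let bytes: Vec<u8> = ids.iter().flat_map(|id| id.to_le_bytes()).collect();
    fnv1a(&bytes)
}

/// Morgan/ECFP-style sketch: each iteration hashes an atom's identifier with
/// its neighbours' identifiers, sorted so atom numbering does not matter.
/// Every identifier sets one bit; radius = 2 corresponds to ECFP4.
fn circular_fp(invariants: &[u32], adjacency: &[Vec<usize>], radius: usize) -> Vec<bool> {
    let mut bits = vec![false; NBITS];
    let mut ids: Vec<u32> = invariants.iter().map(|&v| hash_ids(&[v])).collect();
    for &id in &ids {
        bits[id as usize % NBITS] = true;
    }
    for _ in 0..radius {
        ids = (0..ids.len())
            .map(|i| {
                let mut nbrs: Vec<u32> = adjacency[i].iter().map(|&j| ids[j]).collect();
                nbrs.sort_unstable();
                let mut seq = vec![ids[i]];
                seq.extend(nbrs);
                hash_ids(&seq)
            })
            .collect();
        for &id in &ids {
            bits[id as usize % NBITS] = true;
        }
    }
    bits
}

fn main() {
    // Ethanol heavy atoms C-C-O, using atomic number as the (assumed) invariant.
    let invariants: [u32; 3] = [6, 6, 8];
    let adjacency = vec![vec![1], vec![0, 2], vec![1]];
    let a = circular_fp(&invariants, &adjacency, 2);
    let b = circular_fp(&invariants, &adjacency, 2);
    assert_eq!(a, b); // same input, same bits
    println!("bits set: {}", a.iter().filter(|&&x| x).count());
}
```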


Current Status

Phases 1–4 are complete, including Phase 4 (MACCS, topological path fingerprints, MCS, tautomer normalization). Phase 5 (3D coordinate generation + file I/O) is done apart from UFF force field minimization, and Phase 6 is in progress. 332 tests, all passing.

Crate                Tests  Description
chematic-core        30     Atom, Bond, Molecule, Element, kekulization (no deps)
chematic-smiles      50     OpenSMILES parser, writer, canonical SMILES
chematic-perception  14     SSSR (Balducci-Pearlman), Hückel aromaticity
chematic-mol         36     MOL/SDF V2000+V3000 parser and writer
chematic-depict      14     2D SVG depiction (ring+chain templates)
chematic-chem        67     Descriptors, standardization (salt strip, charge), Murcko scaffold, CIP
chematic-fp          31     ECFP4/ECFP6, MACCS 166-bit keys, topological path FP, Tanimoto/Dice
chematic-smarts      46     SMARTS parser, VF2 subgraph isomorphism, MCS
chematic-3d          15     3D coordinate generation, PDB/XYZ file formats
chematic-rxn         15     Reaction SMILES parser and writer
chematic             1      Umbrella crate with feature flags (all sub-crates)
cargo test --workspace   # 332 tests, all passing

Quick Start

Using the umbrella crate

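# Cargo.toml
[dependencies]
chematic = { git = "https://github.com/kent-tokyo/chematic", features = ["smiles", "fp"] }
# or, from the registry:
# chematic = { version = "0.1.0", features = ["smiles", "fp"] }
// src/main.rs: using the umbrella crate
use chematic::smiles::{parse, canonical_smiles};
use chematic::fp::tanimoto_ecfp4;

fn main() {
    let ethanol = parse("CCO").unwrap();
    let methanol = parse("CO").unwrap();
    println!("{}", canonical_smiles(&ethanol));
    println!("{:.3}", tanimoto_ecfp4(&ethanol, &methanol));
}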

Using individual crates

# Cargo.toml
[dependencies]
chematic-smiles     = { git = "https://github.com/kent-tokyo/chematic" }
chematic-perception = { git = "https://github.com/kent-tokyo/chematic" }
chematic-fp         = { git = "https://github.com/kent-tokyo/chematic" }
use chematic_smiles::{parse, canonical_smiles};
use chematic_perception::{find_sssr, assign_aromaticity};
use chematic_fp::{ecfp4, tanimoto_ecfp4};

fn main() {
    let benzene = parse("c1ccccc1").unwrap();
    let toluene = parse("Cc1ccccc1").unwrap();

    // Ring and aromaticity perception
    let rings = find_sssr(&benzene);
    println!("rings: {}", rings.ring_count()); // 1
    let arom = assign_aromaticity(&benzene);
    println!("aromatic atoms: {}", arom.aromatic_atom_count()); // 6

    // Fingerprint similarity
    let sim = tanimoto_ecfp4(&benzene, &toluene);
    println!("Tanimoto(benzene, toluene): {sim:.3}"); // ~0.5

    // Canonical SMILES
    println!("{}", canonical_smiles(&benzene)); // c1ccccc1
}
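`tanimoto_ecfp4` presumably computes the Tanimoto coefficient on two ECFP4 bit vectors. On packed bit vectors, both Tanimoto, |A ∩ B| / |A ∪ B|, and Dice, 2|A ∩ B| / (|A| + |B|), reduce to popcounts. This standalone sketch uses hypothetical `tanimoto` and `dice` functions over `u64` words rather than the chematic-fp API. It returns 0.0 for two empty fingerprints by convention.

```rust
/// Tanimoto (Jaccard) similarity: |A ∩ B| / |A ∪ B| over packed bit words.
fn tanimoto(a: &[u64], b: &[u64]) -> f64 {
    let (mut inter, mut union) = (0u32, 0u32);
    for (x, y) in a.iter().zip(b) {
        inter += (x & y).count_ones();
        union += (x | y).count_ones();
    }
    if union == 0 { 0.0 } else { inter as f64 / union as f64 }
}

/// Dice similarity: 2|A ∩ B| / (|A| + |B|).
fn dice(a: &[u64], b: &[u64]) -> f64 {
    let inter: u32 = a.iter().zip(b).map(|(x, y)| (x & y).count_ones()).sum();
    let total: u32 = a.iter().chain(b).map(|x| x.count_ones()).sum();
    if total == 0 { 0.0 } else { 2.0 * inter as f64 / total as f64 }
}

fn main() {
    // Two hypothetical 128-bit fingerprints packed into u64 words.
    let a = [0b1111_0000u64, 0b1];
    let b = [0b1010_0000u64, 0b1];
    println!("Tanimoto = {:.3}", tanimoto(&a, &b)); // 0.600
    println!("Dice     = {:.3}", dice(&a, &b));     // 0.750
}
```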

SMARTS substructure search

use chematic_smiles::parse;
use chematic_smarts::{parse_smarts, find_matches};

fn main() {
    let mol = parse("CC(=O)Oc1ccccc1C(=O)O").unwrap(); // aspirin
    let query = parse_smarts("C=O").unwrap();
    let matches = find_matches(&query, &mol);
    println!("C=O groups: {}", matches.len()); // 2 (ester and carboxylic acid)
}
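VF2 grows a partial atom mapping one query atom at a time and backtracks when a candidate is infeasible. The sketch below shows that extend-and-backtrack core on a hand-built heavy-atom graph of aspirin and reproduces the count of 2. It is deliberately simpler than VF2: it has no look-ahead pruning, and it uses exact symbol matching (lowercase = aromatic) in place of real SMARTS atom primitives. `Graph` and `count_matches` are hypothetical names, not the chematic_smarts API. It counts every mapping, so a symmetric query such as C-C is counted once per automorphism.

```rust
/// Toy molecular graph: atom symbols (lowercase = aromatic) and bonds.
struct Graph {
    atoms: Vec<&'static str>,
    bonds: Vec<(usize, usize, u8)>, // (a, b, order); order 4 = aromatic
}

impl Graph {
    /// Order of the bond between `a` and `b`, if any.
    fn bond(&self, a: usize, b: usize) -> Option<u8> {
        self.bonds
            .iter()
            .find(|&&(x, y, _)| (x == a && y == b) || (x == b && y == a))
            .map(|&(_, _, o)| o)
    }
}

/// Count embeddings of `query` in `target` by plain backtracking.
fn count_matches(query: &Graph, target: &Graph) -> usize {
    fn extend(q: &Graph, t: &Graph, map: &mut Vec<usize>) -> usize {
        let k = map.len(); // next query atom to map
        if k == q.atoms.len() {
            return 1; // complete mapping found
        }
        let mut count = 0;
        for cand in 0..t.atoms.len() {
            if t.atoms[cand] != q.atoms[k] || map.contains(&cand) {
                continue;
            }
            // Every query bond to an already-mapped atom must exist in the
            // target with the same order (non-induced substructure semantics).
            let ok = (0..k).all(|i| match q.bond(i, k) {
                Some(order) => t.bond(map[i], cand) == Some(order),
                None => true,
            });
            if ok {
                map.push(cand);
                count += extend(q, t, map);
                map.pop(); // backtrack
            }
        }
        count
    }
    extend(query, target, &mut Vec::new())
}

fn main() {
    // Aspirin heavy atoms, CC(=O)Oc1ccccc1C(=O)O.
    let aspirin = Graph {
        atoms: vec!["C", "C", "O", "O", "c", "c", "c", "c", "c", "c", "C", "O", "O"],
        bonds: vec![
            (0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1),
            (4, 5, 4), (5, 6, 4), (6, 7, 4), (7, 8, 4), (8, 9, 4), (9, 4, 4),
            (9, 10, 1), (10, 11, 2), (10, 12, 1),
        ],
    };
    // Query standing in for SMARTS "C=O": aliphatic carbon double-bonded to O.
    let carbonyl = Graph { atoms: vec!["C", "O"], bonds: vec![(0, 1, 2)] };
    println!("C=O groups: {}", count_matches(&carbonyl, &aspirin)); // 2
}
```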

Molecular descriptors

use chematic_smiles::parse;
use chematic_chem::{molecular_weight, tpsa, lipinski_passes};

fn main() {
    let aspirin = parse("CC(=O)Oc1ccccc1C(=O)O").unwrap();
    println!("MW:       {:.2}", molecular_weight(&aspirin)); // ~180.16
    println!("TPSA:     {:.2}", tpsa(&aspirin));             // ~63.6
    println!("Lipinski: {}", lipinski_passes(&aspirin));     // true
}
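Molecular weight is a sum of average atomic masses over the molecular formula. Lipinski's rule of five flags four conditions: MW > 500, logP > 5, more than 5 H-bond donors, or more than 10 acceptors. The sketch below recomputes aspirin's ~180.16 from its formula C9H8O4 with standard atomic weights (C 12.011, H 1.008, O 15.999). It then counts rule-of-five violations from supplied values. The source does not state whether `lipinski_passes` requires zero violations or tolerates one. `molecular_weight` here is a hypothetical formula-based helper, not the chematic_chem function of the same name, and the logP passed in `main` is a placeholder.

```rust
/// Average molecular weight from (element, count) pairs.
fn molecular_weight(formula: &[(&str, u32)]) -> f64 {
    formula
        .iter()
        .map(|&(el, n)| {
            let mass = match el {
                "H" => 1.008,
                "C" => 12.011,
                "N" => 14.007,
                "O" => 15.999,
                _ => panic!("no mass for {el} in this sketch"),
            };
            mass * n as f64
        })
        .sum()
}

/// Number of Lipinski rule-of-five violations.
fn lipinski_violations(mw: f64, logp: f64, hbd: u32, hba: u32) -> u32 {
    [mw > 500.0, logp > 5.0, hbd > 5, hba > 10]
        .iter()
        .filter(|&&violated| violated)
        .count() as u32
}

fn main() {
    // Aspirin, C9H8O4.
    let mw = molecular_weight(&[("C", 9), ("H", 8), ("O", 4)]);
    println!("MW: {mw:.2}"); // 180.16
    // Donors/acceptors use the simple OH+NH / N+O counts (1 OH, 4 O);
    // the logP of 1.0 is a placeholder, not a computed value.
    let violations = lipinski_violations(mw, 1.0, 1, 4);
    println!("rule-of-five violations: {violations}"); // 0
}
```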

2D depiction

use chematic_smiles::parse;
use chematic_depict::depict_svg;

fn main() {
    let caffeine = parse("Cn1cnc2c1c(=O)n(c(=O)n2C)C").unwrap();
    let svg = depict_svg(&caffeine);
    std::fs::write("caffeine.svg", svg).expect("failed to write caffeine.svg");
}
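chematic-depict is described as using ring and chain templates. A common ring template places an n-membered ring on a regular polygon whose circumradius is R = s / (2 sin(π/n)) for bond length s. This sketch lays benzene out that way and emits bare SVG `<line>` elements. The function names, the 30-unit bond length and the styling are assumptions. The sketch skips double-bond rendering and atom labels, and it is not chematic-depict's renderer.

```rust
use std::f64::consts::PI;
use std::fmt::Write;

/// Regular-polygon ring template: n atoms, every bond of length `bond_len`.
fn ring_template(n: usize, bond_len: f64) -> Vec<(f64, f64)> {
    // Circumradius of a regular n-gon with side s: R = s / (2 sin(pi/n)).
    let r = bond_len / (2.0 * (PI / n as f64).sin());
    (0..n)
        .map(|i| {
            let theta = 2.0 * PI * i as f64 / n as f64;
            (r * theta.cos(), r * theta.sin())
        })
        .collect()
}

/// Emit bonds as plain SVG <line> elements, shifted by `offset` so the
/// drawing sits inside a (2 * offset)-sized square canvas.
fn to_svg(coords: &[(f64, f64)], bonds: &[(usize, usize)], offset: f64) -> String {
    let size = 2.0 * offset;
    let mut svg = format!(r#"<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">"#);
    for &(a, b) in bonds {
        let ((x1, y1), (x2, y2)) = (coords[a], coords[b]);
        write!(
            svg,
            r#"<line x1="{:.1}" y1="{:.1}" x2="{:.1}" y2="{:.1}" stroke="black"/>"#,
            x1 + offset, y1 + offset, x2 + offset, y2 + offset
        )
        .unwrap();
    }
    svg.push_str("</svg>");
    svg
}

fn main() {
    // Benzene drawn as a hexagon with 30-unit bonds (single lines only).
    let coords = ring_template(6, 30.0);
    let bonds: Vec<(usize, usize)> = (0..6).map(|i| (i, (i + 1) % 6)).collect();
    println!("{}", to_svg(&coords, &bonds, 50.0));
}
```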

Comparison with Other Cheminformatics Libraries

Feature                       chematic                RDKit (rdkit-sys)  OpenBabel FFI   chemcore / purr
Language                      Pure Rust               Rust + C++ FFI     Rust + C++ FFI  Pure Rust
WASM target                   Yes                     No                 No              Partial
Binary size (core)            ~500 KB                 ~50 MB             ~20 MB          ~200 KB
OpenSMILES parser             Full                    Full               Full            Partial
SMILES writer                 Yes                     Yes                Yes             No
Canonical SMILES              Yes                     Yes                Yes             No
Kekulization                  Yes                     Yes                Yes             No
Aromaticity perception        Yes (Hückel)            Yes                Yes             Partial
Ring perception (SSSR)        Yes                     Yes                Yes             No
SDF/MOL V2000                 Yes                     Yes                Yes             No
SDF/MOL V3000                 Yes                     Yes                Yes             No
2D depiction (SVG)            Yes                     Yes                Yes             No
ECFP fingerprints             Yes (ECFP4/6)           Yes                Yes             No
SMARTS / substructure search  Yes (VF2)               Yes                Yes             No
Molecular descriptors         Yes (MW/LogP/TPSA/...)  Yes                Yes             No
3D coordinate generation      Yes (rule-based)        Yes (ETKDG)        Yes             No
PDB/XYZ file formats          Yes                     Yes                Yes             No
CIP stereochemistry (R/S)     Yes (R/S, E/Z)          Yes                Yes             No
MACCS fingerprints            Yes (166-bit keys)      Yes                Yes             No
Force field minimization      Yes (rule-based)        Yes (UFF/MMFF)     Yes             No
Reaction SMILES/SMIRKS        Yes                     Yes                Yes             No
Unsafe Rust                   None                    Extensive          Extensive       None
Maintenance (2026)            Active                  Active             Minimal         Archived

Notes:

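  • The "chematic" column mixes implemented features with the final planned state; for example, UFF force field minimization is still listed as remaining under Phase 5 of the roadmap.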
  • Binary sizes are approximate and depend on enabled features.
  • chemcore and purr are archived; chematic supersedes their scope.

Roadmap

Phase 1 — Foundation (complete)

Core types, OpenSMILES parse/write, Kekulization, canonical SMILES. 80 tests.
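Kekulization can be framed as a perfect-matching problem. Every aromatic atom that must carry a double bond gets exactly one, chosen among its aromatic bonds. The backtracking sketch below illustrates that framing. It is an assumption about the general approach, not chematic-core's code, and production implementations typically use a proper matching algorithm for large fused systems. `kekulize` and its `needs_double` flags are hypothetical. Deciding which atoms need a double bond (for example, excluding a pyrrole-type [nH]) is left to the caller.

```rust
/// Try to place exactly one double bond on every atom flagged in
/// `needs_double`, using only the given (aromatic) bonds.
/// Returns a per-bond flag (true = double) or None if impossible.
fn kekulize(bonds: &[(usize, usize)], needs_double: &[bool]) -> Option<Vec<bool>> {
    fn solve(from: usize, bonds: &[(usize, usize)], needs: &[bool],
             matched: &mut [bool], double: &mut [bool]) -> bool {
        // First atom (at or after `from`) that still lacks a double bond.
        let Some(a) = (from..needs.len()).find(|&i| needs[i] && !matched[i]) else {
            return true; // every flagged atom is satisfied
        };
        for (bi, &(x, y)) in bonds.iter().enumerate() {
            let other = if x == a { y } else if y == a { x } else { continue };
            if needs[other] && !matched[other] {
                // Tentatively make bond `bi` double, then recurse.
                matched[a] = true;
                matched[other] = true;
                double[bi] = true;
                if solve(a + 1, bonds, needs, matched, double) {
                    return true;
                }
                // Backtrack.
                matched[a] = false;
                matched[other] = false;
                double[bi] = false;
            }
        }
        false // atom `a` cannot be paired with any free neighbour
    }
    let mut matched = vec![false; needs_double.len()];
    let mut double = vec![false; bonds.len()];
    solve(0, bonds, needs_double, &mut matched, &mut double).then_some(double)
}

fn main() {
    // Benzene: six aromatic carbons, each needing one double bond.
    let ring: Vec<(usize, usize)> = (0..6).map(|i| (i, (i + 1) % 6)).collect();
    match kekulize(&ring, &[true; 6]) {
        Some(double) => println!("double bonds: {double:?}"), // [true, false, true, false, true, false]
        None => println!("no Kekulé structure"),
    }
}
```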

Phase 2 — Molecular Perception (complete)

SSSR, Hückel aromaticity, SDF/MOL V2000+V3000, 2D SVG depiction. 63 tests.
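At its core, Hückel's rule is an arithmetic test on the π-electron count: 4n + 2 for some n ≥ 0. The sketch assumes each ring atom's π contribution is already known (1 for an sp2 carbon in a C=C, 2 for a pyrrole-type NH) and only checks the count. Real perception, including chematic-perception's, must also decide planarity, conjugation and fused-ring handling, which are omitted here. `is_huckel` is a hypothetical name.

```rust
/// Hückel's rule: a planar, fully conjugated monocycle with 4n + 2
/// pi electrons (n = 0, 1, 2, ...) is aromatic.
fn is_huckel(pi_electrons: u32) -> bool {
    pi_electrons >= 2 && (pi_electrons - 2) % 4 == 0
}

fn main() {
    // Per-atom pi-electron contributions, assumed to be known already:
    // 1 per sp2 carbon in a C=C, 2 for a pyrrole-type N-H lone pair.
    let rings = [
        ("benzene", vec![1u32; 6]),
        ("pyrrole", vec![2u32, 1, 1, 1, 1]),
        ("cyclobutadiene", vec![1u32; 4]),
    ];
    for (name, contributions) in rings {
        let total: u32 = contributions.iter().sum();
        println!("{name}: {total} pi electrons, aromatic = {}", is_huckel(total));
    }
}
```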

Phase 3 — Chemical Intelligence (complete)

Descriptors (MW, LogP, TPSA, Lipinski), ECFP4/6 fingerprints, SMARTS+VF2, molecular standardization (salt stripping, charge neutralization), Murcko scaffold, CIP R/S and E/Z stereochemistry assignment.
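One common salt-stripping heuristic keeps only the largest connected fragment of a multi-component structure such as sodium acetate, CC(=O)[O-].[Na+]. The source does not say whether chematic-chem uses this rule, a counter-ion list, or both. The sketch below illustrates only the connectivity step, using a hypothetical `largest_fragment` that measures fragment size by atom count. Charge neutralization is not covered.

```rust
/// Atom indices of the largest connected fragment (ties go to the fragment
/// containing the lowest-numbered atom), returned in ascending order.
fn largest_fragment(n_atoms: usize, bonds: &[(usize, usize)]) -> Vec<usize> {
    let mut adj = vec![Vec::new(); n_atoms];
    for &(a, b) in bonds {
        adj[a].push(b);
        adj[b].push(a);
    }
    let mut seen = vec![false; n_atoms];
    let mut best: Vec<usize> = Vec::new();
    for start in 0..n_atoms {
        if seen[start] {
            continue;
        }
        // Iterative depth-first search collects one fragment.
        let mut stack = vec![start];
        let mut frag = Vec::new();
        seen[start] = true;
        while let Some(a) = stack.pop() {
            frag.push(a);
            for &b in &adj[a] {
                if !seen[b] {
                    seen[b] = true;
                    stack.push(b);
                }
            }
        }
        if frag.len() > best.len() {
            best = frag;
        }
    }
    best.sort_unstable();
    best
}

fn main() {
    // Sodium acetate: atoms 0-3 are acetate (C, C, O, O-), atom 4 is Na+.
    let kept = largest_fragment(5, &[(0, 1), (1, 2), (1, 3)]);
    println!("kept atoms: {kept:?}"); // [0, 1, 2, 3]
}
```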

Phase 4 — Similarity and Search (complete)

MACCS 166-bit structural keys ✓, topological path fingerprints ✓, MCS ✓, tautomer normalization ✓.
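A Daylight-style topological path fingerprint enumerates linear atom paths up to a fixed length and hashes each path into a bit. The sketch below keys paths by atom symbols only (no bond orders or ring flags) and uses the smaller of the two traversal directions, so a path and its reverse hit the same bit. FNV-1a keeps bit positions deterministic. The path length, bit count and function names are assumptions, not chematic-fp's parameters or code.

```rust
const FNV_OFFSET: u32 = 2_166_136_261;
const FNV_PRIME: u32 = 16_777_619;

/// 32-bit FNV-1a.
fn fnv1a(bytes: &[u8]) -> u32 {
    bytes.iter().fold(FNV_OFFSET, |h, &b| (h ^ u32::from(b)).wrapping_mul(FNV_PRIME))
}

/// Depth-first walk that records every simple path of 1..=max_bonds bonds.
fn walk(atoms: &[&str], adj: &[Vec<usize>], path: &mut Vec<usize>, max_bonds: usize, bits: &mut [bool]) {
    if path.len() > 1 {
        // Direction-independent key: the smaller of forward and reverse symbols.
        let fwd: Vec<&str> = path.iter().map(|&i| atoms[i]).collect();
        let rev: Vec<&str> = fwd.iter().rev().copied().collect();
        let key = fwd.min(rev).join("-");
        let n = bits.len();
        bits[fnv1a(key.as_bytes()) as usize % n] = true;
    }
    if path.len() > max_bonds {
        return; // path already has max_bonds bonds
    }
    let last = *path.last().unwrap();
    for &next in &adj[last] {
        if !path.contains(&next) {
            path.push(next);
            walk(atoms, adj, path, max_bonds, bits);
            path.pop();
        }
    }
}

fn path_fingerprint(atoms: &[&str], adj: &[Vec<usize>], max_bonds: usize, nbits: usize) -> Vec<bool> {
    let mut bits = vec![false; nbits];
    for start in 0..atoms.len() {
        walk(atoms, adj, &mut vec![start], max_bonds, &mut bits);
    }
    bits
}

fn main() {
    // Ethanol numbered two ways (C-C-O and O-C-C) gives the same fingerprint.
    let adj = vec![vec![1], vec![0, 2], vec![1]];
    let a = path_fingerprint(&["C", "C", "O"], &adj, 2, 1024);
    let b = path_fingerprint(&["O", "C", "C"], &adj, 2, 1024);
    println!("identical: {}", a == b);
}
```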

Phase 5 — 3D Chemistry (partially complete)

Rule-based 3D coordinate generation, PDB/XYZ formats. Remaining: UFF force field minimization.
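XYZ is the simpler of the two 3D formats listed. It has an atom count, a free-text comment line, then one element symbol and three Cartesian coordinates (in ångström) per line. The writer below is a standalone sketch with a hypothetical `to_xyz` name and column widths chosen for readability, not chematic-3d's writer. The water geometry in `main` is illustrative (O-H about 0.957 Å, H-O-H about 104.5°) rather than generated.

```rust
use std::fmt::Write;

/// Format atoms as an XYZ file: atom count, comment line, then one
/// "symbol x y z" line per atom.
fn to_xyz(comment: &str, atoms: &[(&str, [f64; 3])]) -> String {
    let mut out = format!("{}\n{}\n", atoms.len(), comment);
    for (sym, [x, y, z]) in atoms {
        writeln!(out, "{sym:<2} {x:>12.6} {y:>12.6} {z:>12.6}").unwrap();
    }
    out
}

fn main() {
    // Illustrative water coordinates in ångström, not an optimized geometry.
    let water = [
        ("O", [0.0, 0.0, 0.0]),
        ("H", [0.9572, 0.0, 0.0]),
        ("H", [-0.2400, 0.9267, 0.0]),
    ];
    print!("{}", to_xyz("water", &water));
}
```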

Phase 6 — RDKit Parity (partially complete)

Reaction SMILES/SMIRKS (chematic-rxn) ✓, umbrella crate with feature flags (chematic) ✓. Remaining: WASM package (npm: chematic), ChEMBL-scale validation.
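Reaction SMILES separates roles with '>' (reactants>agents>products, with agents optional as in A>>B) and molecules within a role with '.'. The sketch shows only that top-level split. Parsing each molecule and handling SMIRKS atom maps are out of scope, and `split_reaction` is a hypothetical name, not the chematic-rxn API.

```rust
/// Molecules within one role are separated by '.'; an empty role yields [].
fn molecules(role: &str) -> Vec<&str> {
    role.split('.').filter(|m| !m.is_empty()).collect()
}

/// Split "reactants>agents>products" into its three roles.
/// Returns None unless there are exactly two '>' separators.
fn split_reaction(rxn: &str) -> Option<[Vec<&str>; 3]> {
    let parts: Vec<&str> = rxn.split('>').collect();
    if parts.len() != 3 {
        return None;
    }
    Some([molecules(parts[0]), molecules(parts[1]), molecules(parts[2])])
}

fn main() {
    // Acid-catalysed esterification, with the catalyst as agent.
    if let Some([reactants, agents, products]) = split_reaction("CC(=O)O.OCC>[H+]>CC(=O)OCC.O") {
        println!("reactants: {reactants:?}");
        println!("agents:    {agents:?}");
        println!("products:  {products:?}");
    }
}
```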

See tasks/todo.md for the detailed per-task breakdown.


Repository Structure

chematic/
├── Cargo.toml               workspace root
├── CHANGELOG.md             version history
├── crates/
│   ├── chematic-core/       Atom, Bond, Molecule, Element, kekulization
│   ├── chematic-smiles/     OpenSMILES parser, writer, canonical SMILES
│   ├── chematic-perception/ SSSR ring perception, Hückel aromaticity
│   ├── chematic-mol/        MOL/SDF V2000+V3000 parser and writer
│   ├── chematic-depict/     2D SVG depiction engine
│   ├── chematic-chem/       Molecular descriptors, standardization, scaffold
│   ├── chematic-fp/         ECFP4/6, MACCS keys, path fingerprints, Tanimoto/Dice
│   ├── chematic-smarts/     SMARTS parser + VF2 subgraph isomorphism, MCS
│   ├── chematic-3d/         3D coordinate generation, PDB/XYZ formats
│   ├── chematic-rxn/        Reaction SMILES parser and writer
│   └── chematic/            Umbrella crate with feature flags
└── tasks/
    ├── todo.md              full roadmap checklist (Japanese)
    └── lessons.md           development lessons learned

Development Commands

cargo build --workspace      # build all crates
cargo test --workspace       # run all tests (332+)
cargo check --workspace      # type-check without generating binaries
cargo clippy --workspace     # lints

License

Licensed under either of Apache License 2.0 or MIT License, at your option.