bigsmiles
A BigSMILES parser for polymer and macromolecule notation in Rust.
What is BigSMILES?
BigSMILES is a line notation for polymers and macromolecules.
It extends SMILES with stochastic objects {...} that describe repeat units and end groups.
{[$]CC[$]} → polyethylene
{[$]CC[$],[$]CC(C)[$]} → ethylene-propylene copolymer
CC{[$]CC[$]}CC → α,ω-dimethyl polyethylene
{[>][<]CC(C)[>][<]} → head-to-tail polypropylene (directional descriptors)
{[$]CC[$];[$]CCO[$]} → polyethylene with hydroxyl end group
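The examples above share one layout: repeat units are separated by `,` and end groups follow a `;`. The sketch below splits the text of a single stochastic object along those separators. It is a simplified illustration and not this crate's parser: `split_stochastic_object` is a hypothetical helper, it does not validate SMILES or bond descriptors, and it leaves any outer terminal descriptors attached to the first and last entries.

```rust
/// Split the text of one stochastic object into (repeat units, end groups).
/// Hypothetical helper for illustration; performs no SMILES validation.
fn split_stochastic_object(text: &str) -> Option<(Vec<String>, Vec<String>)> {
    // Strip the enclosing braces; return None if either is missing.
    let inner = text.strip_prefix('{')?.strip_suffix('}')?;

    // Everything before the first ';' lists repeat units, the rest end groups.
    let mut parts = inner.splitn(2, ';');
    let repeat_part = parts.next().unwrap_or("");
    let end_part = parts.next().unwrap_or("");

    // Entries within each part are comma-separated.
    let split_list = |s: &str| -> Vec<String> {
        s.split(',')
            .filter(|unit| !unit.is_empty())
            .map(str::to_string)
            .collect()
    };

    Some((split_list(repeat_part), split_list(end_part)))
}

fn main() {
    let (units, ends) = split_stochastic_object("{[$]CC[$],[$]CC(C)[$]}").unwrap();
    println!("repeat units: {:?}", units);
    println!("end groups:   {:?}", ends);
}
```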
Installation
[dependencies]
bigsmiles = "0.1"
Usage
Basic parsing
use bigsmiles::parse;
let pe = parse("{[$]CC[$]}").unwrap(); // polyethylene
let ps = parse("{[$]CC(c1ccccc1)[$]}").unwrap(); // polystyrene
let copo = parse("{[$]CC[$],[$]CC(C)[$]}").unwrap(); // copolymer
// Display produces the canonical BigSMILES string
println!("{}", pe); // {[$]CC[$]}
Inspecting the AST
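use bigsmiles::parse;
let result = parse("CC{[$]CC[$]}CC").unwrap();
for seg in &result.segments {
    // handle each segment (a surrounding SMILES fragment or a stochastic object)
}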
Bond descriptors
BigSMILES uses bond descriptors to define how repeat units connect:
| Descriptor | Meaning |
|---|---|
| [] | No-bond terminal (open end group) |
| [$] | Non-directional (connects to any [$]) |
| [<] | Head (connects to [>]) |
| [>] | Tail (connects to [<]) |
| [$1] | Indexed non-directional (connects to same index) |
| [<2] | Indexed head |
| [>2] | Indexed tail |

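
The pairing rules in the table can be sketched as a small compatibility check. This is an illustrative model, not the crate's API: the `Descriptor` enum, `parse_descriptor`, and `can_connect` are hypothetical names, and the sketch assumes that an indexed descriptor only pairs with a descriptor carrying the same index (so `[$]` and `[$1]` are treated as different).

```rust
/// Hypothetical model of a bond descriptor; the optional value is the index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Descriptor {
    NoBond,                      // []
    NonDirectional(Option<u32>), // [$] or [$1]
    Head(Option<u32>),           // [<] or [<2]
    Tail(Option<u32>),           // [>] or [>2]
}

/// Parse a descriptor such as "[$]", "[<2]" or "[]". Returns None on bad input.
fn parse_descriptor(text: &str) -> Option<Descriptor> {
    let inner = text.strip_prefix('[')?.strip_suffix(']')?;
    if inner.is_empty() {
        return Some(Descriptor::NoBond);
    }
    let mut chars = inner.chars();
    let kind = chars.next()?;
    let rest = chars.as_str();
    // The index, if present, must consist of ASCII digits only.
    let index = if rest.is_empty() {
        None
    } else if rest.chars().all(|c| c.is_ascii_digit()) {
        Some(rest.parse::<u32>().ok()?)
    } else {
        return None;
    };
    match kind {
        '$' => Some(Descriptor::NonDirectional(index)),
        '<' => Some(Descriptor::Head(index)),
        '>' => Some(Descriptor::Tail(index)),
        _ => None,
    }
}

/// True if two descriptors may bond under the rules in the table above.
fn can_connect(a: Descriptor, b: Descriptor) -> bool {
    use Descriptor::*;
    match (a, b) {
        // Non-directional pairs with non-directional of the same index.
        (NonDirectional(i), NonDirectional(j)) => i == j,
        // Head pairs with tail (either order) of the same index.
        (Head(i), Tail(j)) | (Tail(i), Head(j)) => i == j,
        // No-bond terminals and all other combinations never connect.
        _ => false,
    }
}

fn main() {
    let head = parse_descriptor("[<]").unwrap();
    let tail = parse_descriptor("[>]").unwrap();
    println!("{}", can_connect(head, tail)); // prints true
    println!("{}", can_connect(head, head)); // prints false
}
```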
Connection atoms
Each stochastic fragment records which atom index bonds to each descriptor:
- left_atom: always 0 (the first written atom)
- right_atom: the last atom on the main chain (depth 0), not counting branch atoms
For CC(C) (polypropylene): left_atom = 0 (C0), right_atom = 1 (C1, backbone).
The methyl branch C2 is not the connection atom.
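The rule above can be sketched as a single pass that tracks parenthesis depth. This is a simplified illustration rather than the crate's implementation: `connection_atoms` is a hypothetical function, it expects a bare SMILES fragment with the bond descriptors already removed (a `[$]` would otherwise be counted as a bracket atom), and it recognizes only organic-subset symbols, bracket atoms, and branches.

```rust
/// Return (left_atom, right_atom) for a bare SMILES fragment,
/// or None if the fragment has no atom on the main chain.
/// Hypothetical sketch; atoms are numbered in the order they are written.
fn connection_atoms(fragment: &str) -> Option<(usize, usize)> {
    let mut depth = 0usize;      // current branch nesting level
    let mut atom_count = 0usize; // index the next atom will receive
    let mut right_atom = None;   // last atom seen at depth 0
    let mut chars = fragment.chars().peekable();

    while let Some(c) = chars.next() {
        let is_atom = match c {
            '(' => { depth += 1; false }
            ')' => { depth = depth.saturating_sub(1); false }
            '[' => {
                // Bracket atom: consume everything up to the closing ']'.
                for inner in chars.by_ref() {
                    if inner == ']' { break; }
                }
                true
            }
            'A'..='Z' => {
                // Two-letter organic-subset symbols: Cl and Br.
                if (c == 'C' && chars.peek() == Some(&'l'))
                    || (c == 'B' && chars.peek() == Some(&'r'))
                {
                    chars.next();
                }
                true
            }
            // Aromatic organic-subset atoms.
            'b' | 'c' | 'n' | 'o' | 'p' | 's' => true,
            // Bond symbols, ring-closure digits, and anything else.
            _ => false,
        };
        if is_atom {
            if depth == 0 {
                right_atom = Some(atom_count);
            }
            atom_count += 1;
        }
    }
    // The left atom is always the first written atom, index 0.
    right_atom.map(|r| (0, r))
}

fn main() {
    // Polypropylene repeat unit: the methyl branch (atom 2) is skipped.
    println!("{:?}", connection_atoms("CC(C)")); // prints Some((0, 1))
}
```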
Error handling
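use bigsmiles::parse;
match parse("{[$]CC") { // unclosed stochastic object
    Ok(polymer) => println!("{}", polymer),
    Err(e) => eprintln!("parse error: {:?}", e),
}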
Supported BigSMILES features
| Feature | Status |
|---|---|
| Stochastic objects {...} | ✅ |
| Non-directional descriptors [$] | ✅ |
| Directional descriptors [<] [>] | ✅ |
| No-bond descriptors [] | ✅ |
| Indexed descriptors [$1], [<2], [>2] | ✅ |
| Multiple repeat units (copolymers) , | ✅ |
| End groups ; | ✅ |
| Outer terminals {[>]...[<]} | ✅ |
| Surrounding SMILES CC{...}CC | ✅ |
| Full OpenSMILES inside stochastic objects | ✅ |
| Connection atom tracking (left_atom, right_atom) | ✅ |
| Faithful round-trip display (topology-preserving) | ✅ |

Relationship to opensmiles
bigsmiles depends on opensmiles (also part of this workspace)
for parsing the SMILES fragments inside stochastic objects. The opensmiles crate is re-exported
as bigsmiles::opensmiles for convenience.
References
- Lin, T.-S. et al. "BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules." ACS Central Science 2019, 5, 1523–1531. https://pubs.acs.org/doi/10.1021/acscentsci.9b00476
- BigSMILES Documentation
- OpenSMILES Specification
License
MIT — see LICENSE.