zinchi
A compact binary representation for InChI Keys.
This crate provides a space-efficient binary encoding for International Chemical Identifier (InChI) keys, reducing their size from the standard 27-byte ASCII representation to either 9 or 14 bytes. The implementation is based on the work by John Mayfield (NextMove Software): Data Compression of InChI Keys and 2D Coordinates.
Note: This is a personal project created for fun and to explore Rust. While it implements a real compression algorithm, it's primarily a learning exercise rather than a production-critical library.
Installation
Add this to your Cargo.toml:
[dependencies]
zinchi = "0.1"
Usage
Parsing and Displaying InChI Keys
use zinchi::InChIKey;
// Parse an InChI key from a string
let key: InChIKey = "ZZJLMZYUGLJBSO-UHFFFAOYSA-N".parse().expect("valid InChI key");
// Convert back to string
println!("{}", key);
// Access individual components
println!;
println!;
println!;
Binary Packing and Unpacking
use zinchi::InChIKey;
let key: InChIKey = "ZZJLMZYUGLJBSO-UHFFFAOYSA-N".parse()?;
// Pack to binary (9 or 14 bytes)
let packed = key.packed_bytes();
println!;
// Unpack from binary
let unpacked = InChIKey::unpack_from(&packed)?;
assert_eq!(key, unpacked);
Working with Buffers
use zinchi::InChIKey;
let key: InChIKey = "ZZJLMZYUGLJBSO-UHFFFAOYSA-N".parse()?;
// Pack into an existing buffer (14 bytes is the largest packed size)
let mut buffer = [0u8; 14];
let size = key.pack_into(&mut buffer);
// Use only the relevant bytes
let packed_data = &buffer[..size];
InChI Key Format
An InChI key has the format: AAAAAAAAAAAAAA-BBBBBBBBFV-P
- First block (14 chars): Encodes core molecular constitution (65 bits → 9 bytes)
- Second block (8 chars): Encodes stereochemistry and isotopes (37 bits → 5 bytes)
- Flag (1 char): S for standard, N for non-standard
- Version (1 char): Currently always A
- Protonation (1 char): N for neutral, or A-M for protonated states
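As a rough illustration of the layout above, the following sketch splits a key string into its five parts using only the standard library. It is not the crate's parser: KeyParts and split_inchi_key are hypothetical names, and the sketch only checks length, hyphen positions, and that every other character is an uppercase ASCII letter. It does not validate the allowed values of the flag, version, or protonation characters.

```rust
/// The five parts of an InChI key, borrowed from the input string.
/// Hypothetical type for illustration; not part of the zinchi API.
#[derive(Debug, PartialEq)]
struct KeyParts<'a> {
    first_block: &'a str,  // 14 characters
    second_block: &'a str, // 8 characters
    flag: char,            // e.g. 'S' or 'N'
    version: char,         // e.g. 'A'
    protonation: char,     // e.g. 'N'
}

/// Splits `AAAAAAAAAAAAAA-BBBBBBBBFV-P` into its parts.
/// Returns `None` if the overall shape does not match.
fn split_inchi_key(key: &str) -> Option<KeyParts<'_>> {
    let b = key.as_bytes();
    // 14 + '-' + 8 + flag + version + '-' + protonation = 27 bytes
    if b.len() != 27 || b[14] != b'-' || b[25] != b'-' {
        return None;
    }
    let is_upper = |s: &[u8]| s.iter().all(u8::is_ascii_uppercase);
    if !is_upper(&b[..14]) || !is_upper(&b[15..25]) || !b[26].is_ascii_uppercase() {
        return None;
    }
    Some(KeyParts {
        first_block: &key[..14],
        second_block: &key[15..23],
        flag: b[23] as char,
        version: b[24] as char,
        protonation: b[26] as char,
    })
}
```

For the key used throughout this README, the sketch yields ZZJLMZYUGLJBSO as the first block, UHFFFAOY as the second block, and S, A, N as flag, version, and protonation.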
Binary Encoding
Standard InChI keys with the common second block UHFFFAOYSA (empty stereochemistry hash) are packed into just 9 bytes. All other InChI keys require 14 bytes.
This is a size reduction of about 48% (14 bytes) to 67% (9 bytes) compared to the 27-byte ASCII representation.
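The size rule and the resulting savings can be sketched with a little arithmetic. The helper names packed_len and reduction_percent are made up for this example and are not part of the crate; packed_len assumes its input is an already validated 27-character key and only looks at the ten characters after the first hyphen.

```rust
/// Length of the standard ASCII form of an InChI key.
const ASCII_LEN: usize = 27;

/// Packed size in bytes under the rule described above: 9 when the
/// second block, flag, and version read "UHFFFAOYSA", otherwise 14.
/// Hypothetical helper; assumes `key` is a well-formed InChI key.
fn packed_len(key: &str) -> usize {
    if key.get(15..25) == Some("UHFFFAOYSA") {
        9
    } else {
        14
    }
}

/// Percentage saved relative to the 27-byte ASCII form.
fn reduction_percent(packed: usize) -> f64 {
    100.0 * (ASCII_LEN - packed) as f64 / ASCII_LEN as f64
}
```

With these definitions, a 14-byte encoding saves 13 of 27 bytes and a 9-byte encoding saves 18 of 27 bytes.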
Encoding Details
The first block (14 characters) is decoded into four 14-bit triples and one 9-bit pair, then packed into 9 bytes. The second block (8 characters) is decoded into two 14-bit triples and one 9-bit pair, then packed into 5 bytes. Additional metadata (standard flag, version, protonation) is encoded into spare bits.
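To make the bit budget concrete, here is a minimal sketch of the packing step for the first block: four 14-bit values plus one 9-bit value come to 65 bits, which fit in 9 bytes (72 bits) with 7 bits to spare. The function names, the most-significant-first bit order, and the placement of the spare bits at the end of the last byte are all assumptions of this example, so its output is not guaranteed to match the crate's actual byte layout. Mapping letter triples and pairs to their numeric values is a separate step that is not shown.

```rust
/// Packs four 14-bit values and one 9-bit value (65 bits) into 9 bytes.
/// Bit order is an assumption of this sketch: fields are written most
/// significant first, leaving the low 7 bits of the last byte spare.
fn pack_first_block(triples: [u16; 4], pair: u16) -> Option<[u8; 9]> {
    // Reject values that do not fit their field width.
    if triples.iter().any(|&t| t >= 1 << 14) || pair >= 1 << 9 {
        return None;
    }
    let mut acc: u128 = 0;
    for &t in &triples {
        acc = (acc << 14) | u128::from(t);
    }
    acc = (acc << 9) | u128::from(pair);
    acc <<= 7; // 72 - 65 = 7 spare bits, left as zero here
    let mut out = [0u8; 9];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = (acc >> (8 * (8 - i))) as u8;
    }
    Some(out)
}

/// Inverse of `pack_first_block`; also returns the 7 spare bits.
fn unpack_first_block(bytes: [u8; 9]) -> ([u16; 4], u16, u8) {
    let mut acc: u128 = 0;
    for &b in &bytes {
        acc = (acc << 8) | u128::from(b);
    }
    let spare = (acc & 0x7f) as u8;
    acc >>= 7;
    let pair = (acc & 0x1ff) as u16;
    acc >>= 9;
    let mut triples = [0u16; 4];
    for t in triples.iter_mut().rev() {
        *t = (acc & 0x3fff) as u16;
        acc >>= 14;
    }
    (triples, pair, spare)
}
```

The second block follows the same pattern with two 14-bit values and one 9-bit value, 37 bits in 5 bytes, which leaves 3 spare bits.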
Serde Support
When the serde feature is enabled, InChIKey implements Serialize and Deserialize:
use zinchi::InChIKey;
let key: InChIKey = "ZZJLMZYUGLJBSO-UHFFFAOYSA-N".parse()?;
// JSON serialization (human-readable)
let json = serde_json::to_string(&key)?;
assert_eq!(json, "\"ZZJLMZYUGLJBSO-UHFFFAOYSA-N\"");
// Binary serialization with bincode (compact)
let bytes = bincode::serde::encode_to_vec(&key, bincode::config::standard())?;
// Uses the 9 or 14 byte packed representation
The serialization format automatically adapts:
- Human-readable formats (JSON, YAML, etc.): Serializes as an InChIKey string
- Binary formats (bincode, etc.): Serializes using the compact 9 or 14 byte representation
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- John Mayfield and NextMove Software for the original compression algorithm
- The InChI Trust for the InChI specification