Crate zinchi

A compact binary representation for InChI Keys.

This crate provides a space-efficient binary encoding for International Chemical Identifier (InChI) keys, reducing their size from the standard 27-byte ASCII representation to either 9 or 14 bytes. The implementation is based on the work by John Mayfield (NextMove Software), "Data Compression of InChI Keys and 2D Coordinates".

§InChI Key Format

An InChI key has the format: AAAAAAAAAAAAAA-BBBBBBBBFV-P

  • First block (14 chars): Encodes the core molecular constitution
  • Second block (8 chars): Encodes whichever advanced structural features apply (stereochemistry, isotopic substitution, exact position of mobile hydrogens, metal ligation data)
  • Flag (1 char): ‘S’ for standard, ‘N’ for non-standard
  • Version (1 char): Currently always ‘A’
  • Protonation (1 char): ‘N’ for neutral, or ‘A’-‘M’ for protonated states
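
The layout above can be sketched as a small splitter that uses only the standard library. This is an illustrative sketch, not part of the zinchi API: the `KeyParts` type and the `split_inchi_key` function are hypothetical names, and the validation assumes every non-hyphen position holds an uppercase ASCII letter.

```rust
/// The five fields of a textual InChI key.
/// Hypothetical helper type for illustration; not provided by zinchi.
#[derive(Debug, PartialEq, Eq)]
struct KeyParts<'a> {
    first_block: &'a str,  // 14 chars
    second_block: &'a str, // 8 chars
    flag: char,            // 'S' or 'N'
    version: char,         // currently 'A'
    protonation: char,
}

/// Splits a key of the form AAAAAAAAAAAAAA-BBBBBBBBFV-P into its fields.
/// Returns `None` if the input does not have that shape.
fn split_inchi_key(key: &str) -> Option<KeyParts<'_>> {
    let bytes = key.as_bytes();
    // 14 + 1 + 10 + 1 + 1 = 27 bytes, with hyphens at indices 14 and 25.
    if bytes.len() != 27 || bytes[14] != b'-' || bytes[25] != b'-' {
        return None;
    }
    // Assumption: every other position is an uppercase ASCII letter.
    let letters_ok = bytes
        .iter()
        .enumerate()
        .all(|(i, b)| i == 14 || i == 25 || b.is_ascii_uppercase());
    if !letters_ok {
        return None;
    }
    // The input is pure ASCII at this point, so byte slicing is safe.
    Some(KeyParts {
        first_block: &key[0..14],
        second_block: &key[15..23],
        flag: bytes[23] as char,
        version: bytes[24] as char,
        protonation: bytes[26] as char,
    })
}
```

For the key ZZJLMZYUGLJBSO-UHFFFAOYSA-N, this yields the first block ZZJLMZYUGLJBSO, the second block UHFFFAOY, flag ‘S’, version ‘A’, and protonation ‘N’.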

§Binary Encoding

Standard InChI keys whose middle segment is the common UHFFFAOYSA (second block UHFFFAOY, flag ‘S’, version ‘A’) can be packed into just 9 bytes. All other InChI keys require 14 bytes.
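
The size rule can be sketched as a predicate over the textual key. This is a minimal sketch under stated assumptions: `expected_packed_len` and `COMMON_MIDDLE` are hypothetical names, not part of zinchi, and the function checks only the segment lengths, not the character set. It does not perform the packing itself.

```rust
/// Middle segment (8-char second block + flag + version) that, per the rule
/// above, qualifies a key for the short form.
const COMMON_MIDDLE: &str = "UHFFFAOYSA";

/// Expected packed size in bytes for a textual InChI key.
/// Hypothetical helper for illustration; returns `None` if the input is not
/// shaped like AAAAAAAAAAAAAA-BBBBBBBBFV-P.
fn expected_packed_len(key: &str) -> Option<usize> {
    let mut parts = key.split('-');
    let (first, middle, last) = (parts.next()?, parts.next()?, parts.next()?);
    // Exactly three hyphen-separated segments of 14, 10, and 1 characters.
    if parts.next().is_some() || first.len() != 14 || middle.len() != 10 || last.len() != 1 {
        return None;
    }
    // Standard flag, version 'A', and the common second block: short form.
    Some(if middle == COMMON_MIDDLE { 9 } else { 14 })
}
```

Under this rule, a key with a non-standard flag or a different second block falls into the 14-byte case, which is still roughly half of the 27-byte ASCII form.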

§Optional Features

  • serde: Enable serialization/deserialization support. When enabled, InChIKey serializes as a string in human-readable formats (JSON, YAML) and uses the compact binary representation in binary formats (bincode, MessagePack).

§Example

use zinchi::InChIKey;

// Parse the 27-character textual form.
let key: InChIKey = "ZZJLMZYUGLJBSO-UHFFFAOYSA-N".parse().unwrap();
let packed = key.packed_bytes(); // 9 or 14 bytes
// Unpacking restores a key equal to the original.
let unpacked = InChIKey::unpack_from(&packed).unwrap();
assert_eq!(key, unpacked);

Structs§

InChIKey
A compact binary representation of an InChI key.

Enums§

InChIKeyParseError
Error type for InChI key parsing operations.