dreid-typer 0.2.0

A pure Rust library for DREIDING atom typing and molecular topology perception.
Documentation
# Phase 1: Chemical Perception

The Chemical Perception phase is the first and most critical stage in the `dreid-typer` pipeline. Its purpose is to transform a simple, user-provided `MolecularGraph`—which only describes raw connectivity—into a chemically-aware `ProcessingGraph`. This is achieved by systematically inferring a rich set of properties for each atom, effectively teaching the library the "chemistry" of the molecule.

This phase employs a powerful hybrid paradigm, combining **general chemical algorithms** with a targeted **expert template system**. The entire process is orchestrated by the `processor::perceive` function.

## The Perception Pipeline

The perception process is a carefully ordered sequence of steps. Each step builds upon the information generated by the previous ones, culminating in a fully annotated `ProcessingGraph`. The order of these operations is crucial for ensuring correctness.

The diagram below details the flow of information within this phase, showing which properties are determined at each step.

```mermaid
graph TD
    subgraph Input
        A["<b>MolecularGraph</b> (Connectivity)"]
    end

    subgraph "Perception Pipeline (<b>processor::perceive</b>)"
        B["<b>Step 1: Electron Counting</b><br><i>perceive_electron_counts</i>"]
        B -- "Computes..." --> B_Out("
            - <b>valence_electrons</b>
            - <b>bonding_electrons</b>
            - <b>lone_pairs</b>
        ")

        C["<b>Step 2: Expert Template Matching</b><br><i>apply_functional_group_templates</i>"]
        C -- "Identifies special functional groups<br>(amides, carboxylates, etc.) and sets..." --> C_Out("
            - <b>hybridization</b> (forced)
            - <b>steric_number</b> (forced)
            - <b>is_aromatic</b> (for some templates)
            - <b>perception_source</b> = <b>Template</b>
        ")

        D["<b>Step 3: Ring System Perception</b><br><i>perceive_rings & apply_ring_annotations</i>"]
        D -- "Finds all smallest rings and sets..." --> D_Out("
            - <b>is_in_ring</b>
            - <b>smallest_ring_size</b>
        ")

        E["<b>Step 4: Generic Properties Perception</b><br><i>perceive_generic_properties</i>"]
        subgraph " "
            direction LR
            E_Aro["<b>4a. Aromaticity</b><br> (Hückel's Rule)"]
            E_Hyb["<b>4b. Hybridization</b><br> (VSEPR Theory)"]
        end

        E_Aro -- "Sets <b>is_aromatic</b> flag,<br>which overrides VSEPR" --> E_Hyb
        E_Hyb -- "Determines remaining properties..." --> E_Out("
            - <b>hybridization</b> (generic)
            - <b>steric_number</b> (generic)
            - <b>perception_source</b> = <b>Generic</b>
        ")

    end

    subgraph Output
        F["<b>ProcessingGraph</b> (Chemically-Aware)"]
    end

    A --> B --> C --> D --> E --> F
```

### Step 1: Electron Counting

- **Function:** `processor::perception::perceive_electron_counts`
- **Purpose:** To establish the fundamental electronic configuration of each atom based on valence bond theory. This is the bedrock upon which all subsequent chemical judgments are made.
- **Algorithm:**
  1. **Valence Electrons:** The number of valence electrons is determined based on the atom's element and its column in the periodic table.
  2. **Bonding Electrons:** The number of electrons participating in bonds is calculated by summing the contributions from each bond connected to the atom (e.g., 1 for single, 2 for double). Aromatic bonds are treated as contributing 1 electron in this count, consistent with their delocalized nature.
  3. **Lone Pairs:** The number of lone pairs is calculated using the formula:
     `lone_pairs = (valence_electrons - bonding_electrons - formal_charge) / 2`
     The result is floored to the nearest integer, assuming any remaining single electron is not a pair.

At the end of this step, each `AtomView` has its core electronic properties populated.

### Step 2: Expert Template Matching (The "Expert System")

- **Function:** `processor::templates::apply_functional_group_templates`
- **Purpose:** To correctly identify specific functional groups whose electronic structures are exceptions to simple, local chemical rules. For example, the nitrogen in an amide is planar (`sp2`) despite having a steric number that might suggest `sp3`.
- **Design Rationale (Why this is crucial):** A purely generic algorithm based on local connectivity (like VSEPR theory) cannot capture the non-local electronic effects of resonance that define many important functional groups. This "expert system" encodes this essential chemical knowledge.
- **Algorithm:**
  1. The system iterates through a predefined list of functional group templates (e.g., carboxylate, guanidinium, nitro group).
  2. For each template, it attempts to find a matching subgraph in the molecule. The matching is based on element types, bond orders, and formal charges.
  3. When a match is found, the system applies predefined actions to the matched atoms, directly setting their `hybridization`, `steric_number`, and sometimes `is_aromatic` status.
  4. Crucially, it also flags these atoms with `perception_source = Some(PerceptionSource::Template)`. This "locks" their properties, preventing them from being overridden by the generic algorithms in later steps.

### Step 3: Ring System Perception

- **Function:** `processor::perception::perceive_rings` and `apply_ring_annotations`
- **Purpose:** To identify all cyclic structures within the molecule, as ring membership is a key determinant for aromaticity and certain atom types.
- **Algorithm:**
  The library employs a path-based search algorithm to find the Smallest Set of Smallest Rings (SSSR). This algorithm is efficient and sufficient for the vast majority of molecules encountered in chemistry and biology.
- **Output:** The `is_in_ring` and `smallest_ring_size` fields of the relevant `AtomView`s are populated.

### Step 4: Generic Properties Perception

This final step handles all atoms that were not explicitly typed by the expert template system.

- **Function:** `processor::perception::perceive_generic_properties`

#### 4a. Aromaticity Perception

- **Purpose:** To identify aromatic systems based on Hückel's rule.
- **Algorithm:**
  1. The system iterates through each ring found in Step 3.
  2. For each ring, it calculates the total number of π-electrons. An atom's contribution is determined by its local environment:
     - An atom with a π-bond within the ring contributes **1** π-electron.
     - A heteroatom with a lone pair and no exocyclic π-bonds (like the nitrogen in pyrrole or oxygen in furan) contributes **2** π-electrons.
  3. If the total number of π-electrons is `4n + 2` (where `n` is a non-negative integer) and all atoms in the ring are planar-compatible, the ring is considered aromatic.
  4. The `is_aromatic` flag is set to `true` for all atoms in the identified aromatic ring.

#### 4b. Generic Hybridization Perception

- **Purpose:** To assign a hybridization state to all remaining atoms based on Valence Shell Electron Pair Repulsion (VSEPR) theory.
- **Algorithm:**
  1. The system iterates through all atoms where `perception_source` is not `Template`.
  2. **Aromatic Override:** If an atom's `is_aromatic` flag is `true`, its hybridization is immediately set to `Resonant`.
  3. **VSEPR Calculation:** Otherwise, the `steric_number` is calculated as `degree + lone_pairs`.
  4. **Mapping:** The hybridization is assigned based on the `steric_number`:
     - `steric_number = 4` -> `Hybridization::SP3`
     - `steric_number = 3` -> `Hybridization::SP2`
     - `steric_number = 2` -> `Hybridization::SP`
     - Certain elements like halogens or alkali metals are assigned `Hybridization::None`.
  5. The `perception_source` for these atoms is set to `Some(PerceptionSource::Generic)`.

At the conclusion of this phase, every atom in the `ProcessingGraph` has a fully defined set of chemical properties, making it ready for the Typing Engine.