mrrc 0.7.6

A Rust library for reading, writing, and manipulating MARC bibliographic records in ISO 2709 binary format
Documentation
# Project Layout

This document describes how the MRRC codebase is organized and how its
components relate to each other. For implementation details (GIL release
strategy, parser internals, concurrency model), see
[Architecture](architecture.md).

## How the Pieces Fit Together

MRRC is a **Rust library with Python bindings**. The core parsing,
serialization, and record manipulation logic lives in pure Rust. A
separate crate wraps that logic in Python-callable functions and classes
using [PyO3](https://pyo3.rs/), and [maturin](https://www.maturin.rs/)
packages everything into a Python wheel that users install with `pip`.

```
┌─────────────────────────────────────────────────────┐
│  Python user code                                   │
│    import mrrc                                      │
│    reader = mrrc.MARCReader("file.mrc")             │
├─────────────────────────────────────────────────────┤
│  mrrc/  (Python package)                            │
│    __init__.py  — re-exports, pymarc-compat wrappers│
│    _mrrc.pyi    — type stubs for IDE support        │
│    formats/     — pure-Python format helpers         │
├─────────────────────────────────────────────────────┤
│  src-python/  (PyO3 bindings crate, "mrrc-python")  │
│    Compiled to _mrrc.cpython-*.so by maturin        │
│    Wraps Rust types as Python classes                │
│    Manages GIL release during parsing               │
├─────────────────────────────────────────────────────┤
│  src/  (core Rust library crate, "mrrc")            │
│    Pure Rust, no Python dependency                  │
│    Parsing, serialization, queries, encoding        │
└─────────────────────────────────────────────────────┘
```

When a user calls `mrrc.MARCReader("file.mrc")`, the call flows:

1. `mrrc/__init__.py` imports from the compiled `_mrrc` extension module
2. `_mrrc` is the shared library built from `src-python/`, linked against
   the core `mrrc` crate
3. The core crate does the actual parsing work

## Directory Structure

### `src/` — Core Rust Library

The `mrrc` crate. Pure Rust with no Python dependency. This is where
parsing, serialization, encoding, and query logic lives.

Key modules:

| Module | Purpose |
|--------|---------|
| `reader.rs` | ISO 2709 binary record reader |
| `writer.rs` | ISO 2709 binary record writer |
| `record.rs` | Record, Field, Subfield data structures |
| `leader.rs` | MARC leader (24-byte header) parsing |
| `encoding.rs` | MARC-8 ↔ UTF-8 character conversion |
| `field_query.rs` | Query DSL for searching fields |
| `json.rs`, `xml.rs`, `marcjson.rs` | Serialization formats |
| `dublin_core.rs`, `mods.rs` | Metadata crosswalk formats |
| `bibframe/` | BIBFRAME RDF conversion |
| `csv.rs` | CSV export |
| `boundary_scanner.rs` | Fast record boundary detection |
| `rayon_parser_pool.rs` | Parallel parsing with Rayon |
| `producer_consumer_pipeline.rs` | Threaded producer-consumer reader |

This crate is also usable as a standalone Rust library (`cargo add mrrc`),
independent of Python.

### `src-python/` — PyO3 Bindings

The `mrrc-python` crate. Depends on the core `mrrc` crate (via
`mrrc = { path = ".." }` in its `Cargo.toml`) and on PyO3 for the
Python ↔ Rust bridge.

This crate:

- Wraps Rust types as Python classes (`PyRecord`, `PyField`, `PyMARCReader`, etc.)
- Implements the three-phase GIL release pattern (see [Architecture]architecture.md)
- Handles type detection for `MARCReader` inputs (file paths, bytes, file objects)
- Compiles to a `cdylib` shared library (`_mrrc.cpython-*.so`)

Key modules:

| Module | Purpose |
|--------|---------|
| `lib.rs` | PyO3 module definition, exports all Python-visible types |
| `unified_reader.rs` | `MARCReader` — dispatches to the right backend |
| `backend.rs` | `ReaderBackend` enum (RustFile, Cursor, PythonFile) |
| `wrappers.rs` | `Record`, `Field`, `Leader` Python wrappers |
| `writers.rs` | `MARCWriter` Python wrapper |
| `query.rs` | Query DSL Python interface |
| `formats.rs` | Format conversion function exports |
| `bibframe.rs` | BIBFRAME Python wrappers |

### `mrrc/` — Python Package

The installable Python package. Uses maturin's
[mixed Rust/Python layout](https://www.maturin.rs/project_layout#mixed-rustpython-project):
pure Python code lives here alongside the compiled extension.

| File | Purpose |
|------|---------|
| `__init__.py` | Re-exports from `_mrrc`, adds pymarc-compatible wrappers (e.g. `Record` subclass with kwargs) |
| `_mrrc.pyi` | Type stubs so IDEs and type checkers understand the Rust extension |
| `py.typed` | PEP 561 marker — tells type checkers this package ships inline types |
| `formats/` | Pure-Python format helper classes |
| `rayon_parser_pool.py` | Python-side helpers for Rayon parallel parsing |

### `tests/` — Test Suites

```
tests/
├── *.rs              # 17 Rust integration test files (bibframe, mods, field query, etc.)
├── common/           # Shared Rust test utilities
├── data/             # MARC test fixtures (.mrc files, BIBFRAME baselines, MODS samples)
│   └── fixtures/     # Large benchmark fixtures (1k, 5k, 10k records)
└── python/           # 27 Python test files (pytest)
```

Rust unit tests live inline in `src/` (`#[cfg(test)]` modules).
Rust integration tests live in `tests/*.rs`.
Python tests live in `tests/python/` and are run with pytest.

### Other Directories

| Directory | Purpose |
|-----------|---------|
| `benches/` | Rust benchmarks (Criterion/Codspeed) |
| `examples/` | Example code in both Rust and Python |
| `scripts/` | Profiling, benchmarking, and fixture generation scripts |
| `docs/` | Documentation (mkdocs) |
| `.cargo/` | Cargo config and `check.sh` (local CI script) |
| `.githooks/` | Optional git hooks (pre-push runs check.sh) |
| `.github/workflows/` | CI workflows (lint, test, build, benchmark, etc.) |

## Build System

### How maturin builds the package

[maturin](https://www.maturin.rs/) is the build backend (declared in
`pyproject.toml`). When you run `maturin develop` or `pip install`:

1. maturin reads `pyproject.toml` to find the manifest path
   (`src-python/Cargo.toml`) and the Python package name (`mrrc._mrrc`)
2. Cargo compiles `src-python/` as a `cdylib`, linking against the core
   `mrrc` crate from `src/`
3. The resulting shared library is placed at `mrrc/_mrrc.cpython-*.so`
4. maturin bundles this `.so` with the pure Python files in `mrrc/` into
   a wheel

### Cargo workspace

The root `Cargo.toml` defines a workspace with two members:

- `.` (root) — the `mrrc` core library crate
- `src-python` — the `mrrc-python` PyO3 bindings crate

This means `cargo test`, `cargo clippy`, etc. operate on both crates.
The bindings crate depends on the core crate, so changes to `src/` are
picked up automatically when rebuilding the Python extension.

### Development build commands

```bash
# Rebuild the Python extension after Rust changes (debug, fast)
uv run maturin develop

# Rebuild with optimizations (for benchmarking)
uv run maturin develop --release

# Run all local checks (fmt, clippy, docs, audit, build, tests, ruff)
.cargo/check.sh

# Quick checks (skip docs, audit, maturin build)
.cargo/check.sh --quick
```

## Configuration Files

| File | Purpose |
|------|---------|
| `Cargo.toml` | Rust workspace + core crate config |
| `src-python/Cargo.toml` | PyO3 bindings crate config |
| `pyproject.toml` | Python package metadata, maturin settings, pytest/mypy/ruff config |
| `rustfmt.toml` | Rust formatting rules |
| `clippy.toml` | Clippy lint thresholds |
| `codecov.yml` | Code coverage settings |
| `mkdocs.yml` | Documentation site config |

## Common Development Workflows

**Adding a new Rust feature exposed to Python:**

1. Implement the feature in `src/` (core crate)
2. Write Rust unit tests inline and/or integration tests in `tests/*.rs`
3. Add PyO3 wrapper in `src-python/src/`
4. Export from `src-python/src/lib.rs`
5. Re-export from `mrrc/__init__.py`
6. Add type stub to `mrrc/_mrrc.pyi`
7. Write Python tests in `tests/python/`
8. Run `.cargo/check.sh`

**Adding a pure-Python feature (no Rust changes):**

1. Add to `mrrc/__init__.py` or a new file in `mrrc/`
2. Write Python tests in `tests/python/`
3. Run `.cargo/check.sh --quick`

**Changing only Rust internals (no API change):**

1. Edit files in `src/`
2. Run `cargo test` for fast feedback
3. Run `.cargo/check.sh` before pushing