bibtex-parser
BibTeX parsing for Rust and Python applications.
bibtex-parser provides a Rust parser and library API, plus a Python package
built with PyO3 and maturin. It supports strict parsing, explicit tolerant
recovery, source locations, raw-text preservation, semantic helpers, editing
primitives, streaming, multi-file corpus parsing, and configurable
serialization.
Install
Rust:
[dependencies]
bibtex-parser = "0.2"
Python:
The Python distribution and import name are both citerra:
Core Concepts
- Library is the compact Rust bibliography collection API for application code that wants entries, fields, strings, comments, preambles, validation, transforms, and writing.
- ParsedDocument is the Rust model for tooling that needs diagnostics, source-order blocks, raw source text, partial entries, failed blocks, and source locations.
- citerra.Document is the Python document model. It exposes source-order blocks, diagnostics, raw text, editing operations, and writing through Python classes and functions.
- Parser configures strict versus tolerant parsing, source capture, raw-text preservation, expanded values, streaming, and multi-source parsing.
Strict parsing is the default. Tolerant parsing is explicit.
Rust Quick Start
use ;
Rust Tolerant Parsing And Diagnostics
Use tolerant mode when a corpus may contain malformed entries but valid entries before or after those regions should still be returned.
use ;
For diagnostics and source-preserving output, parse into ParsedDocument:
use ;
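To make the recovery idea concrete, here is a minimal, library-independent sketch in plain Python. It is not the crate's implementation and does not use its API: `scan_blocks` is a hypothetical helper that balances braces and, as a simplifying assumption, treats an `@` at the start of a line as the point where scanning can resume after a malformed block.

```python
def scan_blocks(source):
    """Toy tolerant scan: return (good_blocks, failed_blocks) in source order."""
    good, failed = [], []
    pos = 0
    while True:
        start = source.find("@", pos)
        if start == -1:
            break
        depth = 0
        end = None               # index just past the closing brace, if found
        resume = len(source)     # where scanning continues after a failure
        for j in range(start + 1, len(source)):
            ch = source[j]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    end = j + 1
                    break
            elif ch == "@" and source[j - 1] == "\n":
                # Assumption: a new block begins here, so the current one is malformed.
                resume = j
                break
        if end is not None:
            good.append(source[start:end])
            pos = end
        else:
            failed.append(source[start:resume].strip())
            pos = resume
    return good, failed


sample = (
    "@article{ok1, title = {First}}\n"
    "@book{broken, title = {Unclosed\n"
    "@misc{ok2, note = {Second {nested}}}\n"
)
good, failed = scan_blocks(sample)
```

In this sketch the two well-formed entries around the broken `@book` block are still returned, and the broken text is kept separately so that a caller could report it. A real parser would also need to handle quoted values, comments, and `@` characters inside field values.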
Rust Query, Edit, And Write
use ;
For simple serialization:
let bibtex = library.to_bibtex?;
library.write_file?;
Rust Semantic Helpers
use ;
Helpers include name parsing, date extraction, DOI normalization, field normalization, resource classification, duplicate-key detection, and validation.
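As an illustration of what DOI normalization usually involves, the sketch below strips common resolver and `doi:` prefixes and lowercases the result. It is written in plain Python and is only an assumption about typical behaviour: `normalize_doi` is a hypothetical name, and the exact rules applied by the library's helper are not described here.

```python
import re

# Prefixes commonly found in front of a bare DOI (assumed set, not exhaustive).
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw):
    """Return a bare, lowercased DOI from a URL, a 'doi:' form, or a bare DOI."""
    value = raw.strip()
    value = _DOI_PREFIX.sub("", value)
    # DOIs are case-insensitive, so lowercasing gives a stable comparison key.
    return value.lower()


examples = [
    " https://doi.org/10.1000/XYZ123 ",
    "doi:10.1000/xyz123",
    "10.1000/xyz123",
]
normalized = [normalize_doi(e) for e in examples]
```

All three spellings collapse to the same string, which is what makes a normalized DOI useful as a deduplication key.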
Rust Streaming And Multi-File Parsing
Streaming lets callers process source-order events without building a full intermediate collection:
use ;
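The streaming idea can be sketched without the library. The plain-Python generator below yields one event per entry header as lines arrive, so nothing is accumulated unless the caller chooses to. The `iter_entry_heads` name and the header regular expression are illustrative assumptions, not part of the crate or the Python package.

```python
import io
import re

# Matches an entry header such as "@article{key," at the start of a line.
ENTRY_START = re.compile(r"^\s*@(\w+)\s*\{\s*([^,\s]+)\s*,")


def iter_entry_heads(lines):
    """Lazily yield (entry_type, key) for each entry header line seen."""
    for line in lines:
        match = ENTRY_START.match(line)
        if match:
            yield match.group(1).lower(), match.group(2)


stream = io.StringIO(
    "@Article{a1,\n"
    "  title = {x}\n"
    "}\n"
    '@string{foo = "bar"}\n'
    "@book{b2,\n"
    "}\n"
)
heads = list(iter_entry_heads(stream))
```

Because the generator consumes any iterable of lines, the same function works on an open file handle without reading the whole file into memory first.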
Multi-source parsing keeps source identity and can detect duplicate keys across files:
use ;
Enable the parallel feature for Rayon-backed parsing of multiple files from
disk with Parser::parse_files.
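Cross-file duplicate detection comes down to remembering which source each key came from. The sketch below shows that bookkeeping in plain Python. `find_duplicate_keys` is a hypothetical helper, and comparing keys case-insensitively is an assumption made for this example rather than documented library behaviour.

```python
from collections import defaultdict


def find_duplicate_keys(sources):
    """sources maps a source name to its citation keys.

    Returns a dict of folded key -> list of (source name, original key)
    for every key that appears more than once.
    """
    seen = defaultdict(list)
    for name, keys in sources.items():
        for key in keys:
            # Assumption: keys that differ only in case count as duplicates.
            seen[key.casefold()].append((name, key))
    return {key: hits for key, hits in seen.items() if len(hits) > 1}


duplicates = find_duplicate_keys(
    {
        "a.bib": ["knuth1984", "lamport1994"],
        "b.bib": ["Knuth1984", "doe2020"],
    }
)
```

Keeping the source name next to each hit is what lets a report say which files collide, not only that a collision exists.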
Python Quick Start
=
=
assert is not None
assert ==
assert ==
assert . == 2026
load, loads, dump, and dumps are available for file-like workflows:
=
Python Diagnostics, Source, And Raw Text
=
=
=
Python Mutation And Writing
=
=
=
Use preserve_raw=True for low-churn source-preserving writes. Use
preserve_raw=False when normalized formatting is desired.
Python Plain Records
Some application code wants ordinary dictionaries for filtering, indexing, or bulk transforms. The Python package provides explicit helpers for that shape without changing the native document model:
=
=
=
=
Python Helpers
assert ==
assert ==
=
assert . ==
=
assert ==
Feature Flags
[dependencies]
bibtex-parser = { version = "0.2", features = ["parallel", "latex_to_unicode"] }
- parallel: enables Rayon-backed Parser::parse_files for multi-file workloads.
- latex_to_unicode: enables LaTeX accent-to-Unicode conversion helpers.
- python-extension: builds the PyO3 extension module used by the Python package.
Semantics
- Library::parse and Parser::parse are strict by default.
- Parser::tolerant() recovers valid blocks after malformed input and records failures and diagnostics.
- Library expands string definitions and month constants for field access.
- ParsedDocument can retain raw entry, field, value, comment, preamble, string, and failed-block text when preserve_raw() is enabled.
- Source columns are 1-based Unicode scalar columns. Byte spans are also exposed for exact source slicing.
- Writer defaults preserve source order. Sorting, alignment, trailing commas, and normalized output are explicit choices.
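The difference between a Unicode scalar column and a byte span only shows up with non-ASCII text. The plain-Python sketch below computes both for a substring. `locate` is a hypothetical helper, and the 0-based, half-open byte span is an assumption made for the example. The library's own span convention is not stated here.

```python
def locate(source, needle):
    """Return (line, column, byte_start, byte_end) for the first match.

    line and column are 1-based and counted in code points.
    The byte span is 0-based and half-open over the UTF-8 encoding.
    """
    index = source.index(needle)                    # code point offset
    line_start = source.rfind("\n", 0, index) + 1   # offset of the line's first char
    line = source.count("\n", 0, index) + 1
    column = index - line_start + 1
    byte_start = len(source[:index].encode("utf-8"))
    byte_end = byte_start + len(needle.encode("utf-8"))
    return line, column, byte_start, byte_end


source = "author = {Łoś, Jerzy}"
position = locate(source, "Jerzy")
raw = source.encode("utf-8")
sliced = raw[position[2]:position[3]]
```

Here `Ł` and `ś` each take two bytes in UTF-8, so the byte offset runs ahead of the column. Slicing the encoded bytes with the span recovers the exact text, which is why byte spans suit source slicing while columns suit editor-style diagnostics.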
Local Development
This repository uses a Guix manifest for local development:
Common gates:
Python wheel smoke test without installing into the user environment:
Release Process
The release workflow builds the Rust crate, Python source distribution, and
ABI3 wheels for Linux, macOS, and Windows. Tagging vX.Y.Z runs release
validation, creates a GitHub release, publishes to crates.io, and publishes to
PyPI when the required repository environments and secrets are configured.
See RELEASE_CHECKLIST.md for the exact release gates and setup requirements.
Benchmarks And Comparisons
The repository includes Criterion benchmarks for parser throughput, tolerant
parsing, source-preserving parsing, streaming, writing, corpus parsing, common
library operations, and memory-oriented workloads. The tables below are one run
on tests/fixtures/tugboat.bib:
- 2,701,551 bytes, 73,993 lines, 3,644 entries.
- AMD Ryzen 5 5600G, 6 cores / 12 threads.
- Rust 1.93.0, Python 3.11.14.
- Measured on 2026-05-13. Throughput is input-size normalized.
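For readers who want to check the figures, input-size-normalized throughput is the fixture size divided by the median time. The sketch below redoes that arithmetic in Python using numbers from this section. The function names are illustrative, and because the published times are rounded, the results agree with the tables only approximately.

```python
FIXTURE_BYTES = 2_701_551  # size of tests/fixtures/tugboat.bib


def throughput_mib_per_s(size_bytes, seconds):
    """Input-size-normalized throughput in MiB/s."""
    return size_bytes / seconds / (1024 * 1024)


def relative_time(seconds, baseline_seconds):
    """How many times slower a run is than the baseline."""
    return seconds / baseline_seconds


# bibtex-parser strict Library: 5.234 ms median
strict_mib = throughput_mib_per_s(FIXTURE_BYTES, 5.234e-3)

# bibtexparser 2.0.0b9 parse (0.372 s) against citerra structured parse (0.058 s)
relative = relative_time(0.372, 0.058)
```

With these inputs `strict_mib` comes out at roughly 492 MiB/s and `relative` at roughly 6.4, in line with the corresponding table rows.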
Reproduce the Rust parser table:
| Rust parser / mode | Version | Median time | Throughput | Notes |
|---|---|---|---|---|
| serde_bibtex ignore | 0.7.1 | 2.411 ms | 1.04 GiB/s | Parses and discards data |
| serde_bibtex selected struct | 0.7.1 | 4.143 ms | 621.9 MiB/s | Deserializes selected fields |
| bibtex-parser strict Library | 0.2.1 | 5.234 ms | 492.3 MiB/s | Entries, fields, strings, comments, preambles |
| serde_bibtex borrowed entries | 0.7.1 | 6.872 ms | 374.9 MiB/s | Borrowed parsed entries |
| bibtex-parser tolerant Library | 0.2.1 | 7.015 ms | 367.3 MiB/s | Recovery and failed-block tracking |
| biblatex raw bibliography | 0.11.0 | 10.839 ms | 237.7 MiB/s | Raw BibLaTeX bibliography |
| serde_bibtex owned entries | 0.7.1 | 13.469 ms | 191.3 MiB/s | Owned entries with month macros |
| bibtex-parser streaming events | 0.2.1 | 21.943 ms | 117.4 MiB/s | Source-order callback events |
| bibtex-parser source-preserving document | 0.2.1 | 31.862 ms | 80.9 MiB/s | Raw text, source locations, diagnostics model |
| nom-bibtex | 0.6.0 | 34.334 ms | 75.0 MiB/s | Parsed bibliography |
Reproduce the Rust writing table:
| Rust writer mode | Version | Median time | Throughput |
|---|---|---|---|
| Raw-preserving document write | 0.2.1 | 1.816 ms | 1.39 GiB/s |
| Normalized Library write | 0.2.1 | 5.325 ms | 483.8 MiB/s |
The Python comparison used the local citerra wheel plus bibtexparser 1.4.4,
bibtexparser 2.0.0b9, and pybtex 0.26.1. The comparison script uses
whichever optional packages are installed in the active environment:
| Python parser / mode | Version | Median parse time | Throughput | Relative time |
|---|---|---|---|---|
| citerra structured parse | 0.2.1 | 0.058 s | 44.3 MiB/s | 1.0x |
| citerra source-preserving parse | 0.2.1 | 0.065 s | 39.9 MiB/s | 1.1x |
| bibtexparser parse | 2.0.0b9 | 0.372 s | 6.9 MiB/s | 6.4x |
| pybtex parse | 0.26.1 | 0.859 s | 3.0 MiB/s | 14.8x |
| bibtexparser parse | 1.4.4 | 10.483 s | 0.2 MiB/s | 180.1x |

| Python writer / mode | Version | Median write time | Throughput | Relative time |
|---|---|---|---|---|
| citerra raw-preserving write | 0.2.1 | 0.003 s | 953.2 MiB/s | 1.0x |
| citerra normalized write | 0.2.1 | 0.014 s | 181.3 MiB/s | 5.3x |
| bibtexparser write | 1.4.4 | 0.106 s | 24.3 MiB/s | 39.2x |
| bibtexparser write | 2.0.0b9 | 0.493 s | 5.2 MiB/s | 182.2x |
| pybtex write | 0.26.1 | 3.942 s | 0.7 MiB/s | 1458.5x |
License
Licensed under either of Apache-2.0 or MIT, at your option.