# bibtex-parser

BibTeX parser for Rust.
bibtex-parser parses BibTeX into structured Rust types. It supports strict
parsing by default, opt-in tolerant recovery, source locations, raw-text
retention, semantic helpers, editing primitives, streaming, multi-file corpus
parsing, and configurable serialization.
## Performance Snapshot

Measured on `tests/fixtures/tugboat.bib`: 2,701,551 bytes, 73,993 lines, and
3,644 entries. Hardware was an AMD Ryzen 5 5600G, 6 cores / 12 threads. Measured
on 2026-05-13 with Rust 1.93.0; throughput is input-size normalized.
The first table compares parser modes that return reusable bibliography data. The second table keeps narrower throughput baselines separate because those rows intentionally do less work than a full document parse.
| Rust parser / mode | Version | Median time | Throughput | Output retained |
|---|---|---|---|---|
| bibtex-parser strict `Library` | 0.2.2 | 5.234 ms | 492.3 MiB/s | Entries, fields, strings, comments, preambles |
| serde_bibtex borrowed entries | 0.7.1 | 6.872 ms | 374.9 MiB/s | Borrowed entries |
| bibtex-parser tolerant `Library` | 0.2.2 | 7.015 ms | 367.3 MiB/s | Recovery and failed-block tracking |
| biblatex raw bibliography | 0.11.0 | 10.839 ms | 237.7 MiB/s | Raw BibLaTeX bibliography |
| serde_bibtex owned entries | 0.7.1 | 13.469 ms | 191.3 MiB/s | Owned entries with month macros |
| bibtex-parser streaming events | 0.2.2 | 21.943 ms | 117.4 MiB/s | Source-order callback events |
| bibtex-parser source-preserving document | 0.2.2 | 31.862 ms | 80.9 MiB/s | Raw text, source locations, diagnostics model |
| nom-bibtex | 0.6.0 | 34.334 ms | 75.0 MiB/s | Parsed bibliography |
| Narrow baseline | Version | Median time | Throughput | Scope |
|---|---|---|---|---|
| serde_bibtex ignore | 0.7.1 | 2.411 ms | 1.04 GiB/s | Parse and discard |
| serde_bibtex selected struct | 0.7.1 | 4.143 ms | 621.9 MiB/s | Deserialize selected fields |
| Rust writer mode | Version | Median time | Throughput |
|---|---|---|---|
| Raw-preserving document write | 0.2.2 | 1.816 ms | 1.39 GiB/s |
| Normalized `Library` write | 0.2.2 | 5.325 ms | 483.8 MiB/s |
Reproduction commands are listed in Reproducing Benchmarks.
## Install

```toml
[dependencies]
bibtex-parser = "0.2"
```
Enable optional functionality as needed:
```toml
[dependencies]
bibtex-parser = { version = "0.2", features = ["parallel", "latex_to_unicode"] }
```
- `parallel`: Rayon-backed parsing for multiple files.
- `latex_to_unicode`: LaTeX accent-to-Unicode conversion helpers.
- `python-extension`: PyO3 extension module used by the `citerra` package.
## Core Types

- `Library` is the compact bibliography collection for application code that needs entries, fields, strings, comments, preambles, validation, transforms, and writing.
- `ParsedDocument` is the source-preserving document for tooling that needs diagnostics, source-order blocks, raw source text, partial entries, failed blocks, and source locations.
- `Parser` configures strict versus tolerant parsing, source capture, raw-text preservation, expanded values, streaming, and multi-source parsing.
Strict parsing is the default. Tolerant parsing is explicit.
## Parse
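A minimal strict-parse sketch. `Library::parse` is named later in this README; the `entries()` iterator and `key()` accessor shown here are assumptions about the API surface, not confirmed signatures:

```rust
use bibtex_parser::Library;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let source = r#"
        @article{knuth1984,
            author = {Donald E. Knuth},
            title  = {Literate Programming},
            year   = {1984},
        }
    "#;

    // Strict parsing: any malformed block is an error.
    let library = Library::parse(source)?;

    // `entries()` and `key()` are assumed accessors, shown for illustration.
    for entry in library.entries() {
        println!("{}", entry.key());
    }
    Ok(())
}
```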
## Tolerant Parsing

Use tolerant mode when a corpus may contain malformed entries and the valid entries before or after those regions should still be returned.
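A tolerant-mode sketch built around `Parser::tolerant()`, which the Semantics section names; the `parse` return shape and the `failures()` accessor are assumptions:

```rust
use bibtex_parser::Parser;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A malformed article followed by a valid book entry.
    let source = "@article{broken, title = \n@book{ok2020, title = {Valid}, year = {2020}}";

    // Tolerant parsing recovers valid blocks after malformed input.
    let library = Parser::tolerant().parse(source)?;

    // Hypothetical accessor: tolerant mode records failed blocks.
    for failure in library.failures() {
        eprintln!("skipped block: {failure:?}");
    }
    Ok(())
}
```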
## Diagnostics And Source Locations

Parse into `ParsedDocument` when callers need source-order blocks,
diagnostics, locations, partial entries, or raw text.
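A source-preserving sketch; `preserve_raw()` is documented in the Semantics section, while `parse_document`, `blocks()`, and `diagnostics()` are assumed names used only to illustrate the flow:

```rust
use bibtex_parser::Parser;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let source = "@misc{note, howpublished = {Online}}";

    // Capture raw text and source locations alongside the parse.
    let document = Parser::new()
        .preserve_raw()
        .parse_document(source)?;

    // Hypothetical accessors: blocks in source order, plus diagnostics.
    for block in document.blocks() {
        println!("{block:?}");
    }
    for diagnostic in document.diagnostics() {
        eprintln!("{diagnostic:?}");
    }
    Ok(())
}
```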
## Query, Edit, And Write
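A query-and-edit sketch; `to_bibtex` and `write_file` appear later in this section, while `get_mut` and `set_field` are hypothetical editing primitives shown purely to illustrate the shape of the flow:

```rust
use bibtex_parser::Library;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut library = Library::parse("@article{a1, title = {Old}, year = {2020}}")?;

    // Hypothetical editing primitives: look up by key, update a field.
    if let Some(entry) = library.get_mut("a1") {
        entry.set_field("title", "New");
    }

    let bibtex = library.to_bibtex()?;
    println!("{bibtex}");
    Ok(())
}
```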
For simple serialization:

```rust
// `to_bibtex` and `write_file` are the documented calls;
// the output path here is illustrative.
let bibtex = library.to_bibtex()?;
library.write_file("bibliography.bib")?;
```
## Semantic Helpers
Helpers include name parsing, date extraction, DOI normalization, field normalization, resource classification, duplicate-key detection, and validation.
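The helper names are not listed here, so this sketch uses hypothetical methods (`parse_names`, `normalize_doi`) purely to illustrate the kind of API the list above describes:

```rust
use bibtex_parser::Library;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let library = Library::parse(
        "@article{a1, author = {Last, First and Other, Person}, doi = {10.1000/XYZ}}",
    )?;

    for entry in library.entries() {
        // Hypothetical helpers: split author names, normalize the DOI.
        let names = entry.parse_names("author");
        let doi = entry.normalize_doi();
        println!("{names:?} {doi:?}");
    }
    Ok(())
}
```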
## Streaming And Multi-File Parsing
Streaming lets callers process source-order events without building a full intermediate collection:
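Streaming is described as source-order callback events; the `parse_streaming` method and the debug-printed event value below are hypothetical names illustrating the callback shape:

```rust
use bibtex_parser::Parser;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let source = "@string{jn = {Journal}}\n@article{a1, journal = jn}";

    // Hypothetical streaming API: events arrive in source order,
    // so no full intermediate collection is built.
    Parser::new().parse_streaming(source, |event| {
        println!("{event:?}");
    })?;
    Ok(())
}
```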
Multi-source parsing keeps source identity and can detect duplicate keys across files:
Use `Parser::parse_files` with the `parallel` feature for Rayon-backed parsing
of multiple files from disk.
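`Parser::parse_files` is named above; the return type and the duplicate-key accessor in this sketch are assumptions:

```rust
use bibtex_parser::Parser;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // With the `parallel` feature enabled, files are parsed with Rayon.
    let corpus = Parser::new().parse_files(&["a.bib", "b.bib"])?;

    // Hypothetical accessor: duplicate keys detected across sources.
    for dup in corpus.duplicate_keys() {
        eprintln!("duplicate key: {dup:?}");
    }
    Ok(())
}
```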
## Semantics

- `Library::parse` and `Parser::parse` are strict by default.
- `Parser::tolerant()` recovers valid blocks after malformed input and records failures and diagnostics.
- `Library` expands string definitions and month constants for field access.
- `ParsedDocument` can retain raw entry, field, value, comment, preamble, string, and failed-block text when `preserve_raw()` is enabled.
- Source columns are 1-based Unicode scalar columns. Byte spans are also exposed for exact source slicing.
- Writer defaults preserve source order. Sorting, alignment, trailing commas, and normalized output are explicit choices.
## Reproducing Benchmarks
The repository includes Criterion benchmarks for parser throughput, tolerant parsing, source-preserving parsing, streaming, writing, corpus parsing, common library operations, and memory-oriented workloads.
Run the parser table:
Run the writing table:
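Assuming the Criterion benches follow Cargo's standard bench layout, the invocations take this shape; the bench target names below are assumptions, not confirmed names from this repository:

```shell
# Parser throughput table (bench name is an assumption)
cargo bench --bench parser

# Writer table (bench name is an assumption)
cargo bench --bench writer
```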
Python benchmark notes are in PYTHON.md.
## Local Development
This repository uses a Guix manifest for local development:
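Entering the environment typically looks like this, assuming the manifest lives at the repository root as `manifest.scm`:

```shell
# Spawn a shell with the pinned toolchain from the Guix manifest
guix shell -m manifest.scm
```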
Common gates:
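Typical gates for a Rust crate look like the following; the exact set used by this repository is an assumption:

```shell
cargo fmt --check
cargo clippy --all-targets -- -D warnings
cargo test
```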
Build and test the Python wheel:
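Since the extension is a PyO3 module, a conventional wheel build uses maturin; the exact invocation in this repository is an assumption:

```shell
# Build a release wheel for the PyO3 extension (assumed maturin setup)
maturin build --release
```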
## Python Package

The Python package is `citerra`. See PYTHON.md.
## Release Process

The release workflow builds the Rust crate, Python source distribution, and
ABI3 wheels for Linux, macOS, and Windows. Tagging `vX.Y.Z` runs release
validation, creates a GitHub release, publishes to crates.io, and publishes to
PyPI when the required repository environments and secrets are configured.
See RELEASE_CHECKLIST.md for the exact release gates and setup requirements.
## License

Licensed under either of Apache-2.0 or MIT, at your option.