# citerra
[](https://github.com/b-vitamins/bibtex-parser/actions/workflows/ci.yml)
[](https://pypi.org/project/citerra/)
[](https://pypi.org/project/citerra/)
[](https://github.com/b-vitamins/bibtex-parser/blob/master/LICENSE)
BibTeX parser for Python.
`citerra` parses, validates, edits, and writes BibTeX documents. It supports
strict parsing by default, opt-in tolerant recovery, diagnostics with source
locations, raw-text retention, source-preserving writes, name/date/identifier
helpers, and plain-record projection for application code.
The package is distributed as ABI3 wheels for Python 3.8 and newer.
## Performance Snapshot
Measured on `tests/fixtures/tugboat.bib`: 2,701,551 bytes, 73,993 lines, and
3,644 entries. Hardware was AMD Ryzen 5 5600G, 6 cores / 12 threads. Measured
on 2026-05-13 with Python `3.11.14`; throughput is input-size normalized.
The comparison used `citerra` 0.2.2, `bibtexparser` 1.4.4,
`bibtexparser` 2.0.0b9, and `pybtex` 0.26.1.
`citerra` structured parse disables source capture and raw preservation for the
closest parser-output comparison. `citerra` source-preserving parse includes
raw source text, source locations, diagnostics, and source-order blocks.
| `citerra` structured parse | 0.2.2 | 0.058 s | 44.3 MiB/s | 1.0x |
| `citerra` source-preserving parse | 0.2.2 | 0.065 s | 39.9 MiB/s | 1.1x |
| `bibtexparser` parse | 2.0.0b9 | 0.372 s | 6.9 MiB/s | 6.4x |
| `pybtex` parse | 0.26.1 | 0.859 s | 3.0 MiB/s | 14.8x |
| `bibtexparser` parse | 1.4.4 | 10.483 s | 0.2 MiB/s | 180.1x |
| `citerra` raw-preserving write | 0.2.2 | 0.003 s | 953.2 MiB/s | 1.0x |
| `citerra` normalized write | 0.2.2 | 0.014 s | 181.3 MiB/s | 5.3x |
| `bibtexparser` write | 1.4.4 | 0.106 s | 24.3 MiB/s | 39.2x |
| `bibtexparser` write | 2.0.0b9 | 0.493 s | 5.2 MiB/s | 182.2x |
| `pybtex` write | 0.26.1 | 3.942 s | 0.7 MiB/s | 1458.5x |
Reproduction commands are listed in [Reproducing Benchmarks](#reproducing-benchmarks).
## Install
```sh
pip install citerra
```
The distribution name and import name are both `citerra`:
```python
import citerra
```
## Parse
```python
import citerra
document = citerra.parse(
'@article{paper, author = "Jane Doe", title = "Example Paper", year = 2026}',
expand_values=True,
)
entry = document.entry("paper")
assert entry is not None
assert entry.entry_type == "article"
assert entry.get("title") == "Example Paper"
assert entry.date_parts().year == 2026
```
File helpers are available:
```python
from pathlib import Path
import citerra
document = citerra.parse_path("references.bib", tolerant=True)
Path("normalized.bib").write_text(citerra.dumps(document), encoding="utf-8")
```
File-like helpers are also available:
```python
with open("references.bib", encoding="utf-8") as handle:
document = citerra.load(handle, tolerant=True)
text = citerra.dumps(document)
```
## Document Model
- `Document` contains entries, comments, preambles, string definitions,
source-order blocks, diagnostics, and validation helpers.
- `Entry` exposes the citation key, entry type, fields, source text, semantic
helpers, and field mutation methods.
- `Field` exposes the original field name, parsed value, optional raw source
text, and optional source location.
- `Value` represents string literals, numbers, variables, and concatenations.
- `Diagnostic` reports parse or validation problems with stable codes and
source locations when available.
## Tolerant Parsing And Diagnostics
```python
document = citerra.parse(
text,
tolerant=True,
capture_source=True,
preserve_raw=True,
source="refs/main.bib",
)
if document.status != "ok":
for diagnostic in document.diagnostics:
span = diagnostic.source
if span is None:
print(diagnostic.code, diagnostic.message)
else:
print(diagnostic.code, span.line, span.column, diagnostic.message)
```
## Raw Text And Source-Preserving Writes
```python
document = citerra.parse(
text,
tolerant=True,
capture_source=True,
preserve_raw=True,
)
entry = document.entry("paper")
if entry is not None:
print(entry.raw)
print(entry.field("title").raw_value)
```
Use `WriterConfig(preserve_raw=True)` for low-churn output that reuses retained
source text where possible. Use `WriterConfig(preserve_raw=False)` for
normalized structured output.
```python
document.rename_key("paper", "paper-v2")
document.set_field("paper-v2", "note", "accepted")
document.remove_export_fields(["abstract", "keywords"])
config = citerra.WriterConfig(
preserve_raw=True,
trailing_comma=True,
)
output = document.write(config)
```
## Plain Records
Some application code wants ordinary dictionaries for filtering, indexing, or
bulk transforms. `citerra` provides explicit helpers for that shape without
changing the document model:
```python
document = citerra.parse_path("references.bib")
records = citerra.document_to_dicts(document)
selected = [record for record in records if record.get("year") == "2026"]
text = citerra.write_entries(
selected,
field_order=["author", "title", "journal", "year", "doi"],
sort_by=["ID"],
trailing_comma=True,
)
```
Plain records use `ENTRYTYPE` and `ID` keys for the entry type and citation key.
## Helpers
```python
assert citerra.normalize_doi("https://doi.org/10.1000/XYZ.") == "10.1000/xyz"
assert citerra.latex_to_unicode("Jos\\'e") == "José"
names = citerra.parse_names("Jane Doe and {Research Group}")
assert names[1].literal == "Research Group"
date = citerra.parse_date("2026-05-13")
assert (date.year, date.month, date.day) == (2026, 5, 13)
```
## Reproducing Benchmarks
The comparison script uses whichever optional packages are installed in the
active environment:
```sh
python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib
python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib --write
```
## Implementation
`citerra` is implemented as a native extension. Wheels include the parser
engine, so ordinary Python installs do not require a Rust toolchain.
## Local Build
Use the project manifest for local development:
```sh
guix shell -m manifest.scm -- maturin build --release --out target/wheels
```
For local tests without installing into the user environment, unpack the built
wheel into a temporary import directory and run pytest with that directory on
`PYTHONPATH`:
```sh
rm -rf target/python-test
python3 - <<'PY'
from pathlib import Path
from zipfile import ZipFile
wheel = sorted(Path("target/wheels").glob("citerra-*.whl"))[-1]
target = Path("target/python-test")
target.mkdir(parents=True, exist_ok=True)
with ZipFile(wheel) as archive:
archive.extractall(target)
PY
guix shell -m manifest.scm -- env PYTHONPATH=target/python-test python3 -m pytest tests/python
```
## License
Licensed under either of Apache-2.0 or MIT, at your option.