marc-rs
Rust library for reading and writing bibliographic records in MARC21, UNIMARC, and MARC XML formats.
What the library does
.mrc / .xml file
│
▼
RawRecord ← zero-copy view over raw ISO2709 bytes
│ JSON dictionary (marc21.json / unimarc.json)
▼
Record ← structured semantic model, serializable to JSON
The format (MARC21 or UNIMARC) and the character encoding are auto-detected on read.
The reverse conversion (Record → binary) is also supported.
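The first stage of the diagram can be illustrated with a minimal sketch of a borrowed view over ISO2709 bytes. This is not the crate's `RawRecord` implementation: `RawField`, `ascii_number`, and `fields` are hypothetical names, and the sketch assumes the usual ISO2709 layout (24-byte leader, base address of data at leader positions 12 to 16, 12-byte directory entries made of a 3-byte tag, a 4-digit length, and a 5-digit start offset).

```rust
/// Hypothetical borrowed view of one field; not the crate's actual type.
#[derive(Debug, PartialEq)]
struct RawField<'a> {
    tag: &'a str,
    data: &'a [u8],
}

/// Parse an ASCII decimal number stored in a fixed-width byte range.
fn ascii_number(bytes: &[u8]) -> Option<usize> {
    std::str::from_utf8(bytes).ok()?.parse().ok()
}

/// Walk the directory and return fields that borrow from `record`
/// (no field data is copied). Returns `None` on malformed input.
fn fields(record: &[u8]) -> Option<Vec<RawField<'_>>> {
    const LEADER_LEN: usize = 24;
    const ENTRY_LEN: usize = 12;
    const FIELD_TERMINATOR: u8 = 0x1E;

    // Leader positions 12..17 hold the base address of the data area.
    let base = ascii_number(record.get(12..17)?)?;
    // The directory sits between the leader and its own terminator byte.
    let directory = record.get(LEADER_LEN..base.checked_sub(1)?)?;

    let mut out = Vec::new();
    for entry in directory.chunks_exact(ENTRY_LEN) {
        let tag = std::str::from_utf8(&entry[0..3]).ok()?;
        let len = ascii_number(&entry[3..7])?;
        let start = ascii_number(&entry[7..12])?;
        let begin = base.checked_add(start)?;
        let end = begin.checked_add(len)?;
        let mut data = record.get(begin..end)?;
        // Drop the trailing field terminator if present.
        if data.last() == Some(&FIELD_TERMINATOR) {
            data = &data[..data.len() - 1];
        }
        out.push(RawField { tag, data });
    }
    Some(out)
}
```

Because each `RawField` only holds slices into the input buffer, the view stays valid for as long as the original bytes do, which is the property the diagram calls zero-copy.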
Features
- Read and write binary MARC21, binary UNIMARC, MARC XML
- Auto-detection of format and encoding
- High-level `Record` model organized by blocks (0XX–9XX)
- Helpers on `Record`: `title_main()`, `authors()`, `isbn()`, `media_type()`, etc.
- Serde support: native JSON serialization/deserialization
- Encodings: UTF-8, MARC-8, ISO-5426, ISO-8859-*
Installation
[dependencies]
marc-rs = "0.0.5"
Usage
Read records from a file
use MarcReader;
let reader = from_file?;
let records = reader.into_records?;
for record in &records
Read from bytes
use ;
// Auto-detect format + encoding
let records = parse_records?;
// Force encoding if the record is incorrectly declared
let records = from_bytes?
.with_encoding
.into_records?;
Convert a Record to binary MARC21
use ;
let format = Marc21;
let raw = format.to_raw?;
write?;
JSON serialization
let json = to_string_pretty?;
let record: Record = from_str?;
How dictionaries work
The translation between raw MARC fields and the Record model is driven by two JSON files:
resources/marc21.json and resources/unimarc.json. These files are compiled into the binary via include_str!.
Dictionary structure
leader
List of positions within the 24 bytes of the ISO2709 leader. Each entry extracts one or more bytes and translates them to a field of the model.
For example, byte 6 of the leader becomes record.leader.record_type.
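What a leader entry does can be sketched in a few lines of Rust. This is only an illustration: `RecordKind` and `record_kind` are hypothetical names, the crate's own `RecordType` may have different variants, and the byte values shown are a subset of the standard leader position 6 codes.

```rust
/// Hypothetical subset of record types, for illustration only.
#[derive(Debug, PartialEq)]
enum RecordKind {
    Text,
    NotatedMusic,
    Cartographic,
    ProjectedMedium,
    SoundRecording,
    Other(u8),
}

/// Read leader byte 6 and translate it, as a `leader` dictionary entry would.
/// Returns `None` when the input is shorter than the 24-byte leader.
fn record_kind(leader: &[u8]) -> Option<RecordKind> {
    if leader.len() < 24 {
        return None;
    }
    Some(match leader[6] {
        b'a' => RecordKind::Text,
        b'c' => RecordKind::NotatedMusic,
        b'e' => RecordKind::Cartographic,
        b'g' => RecordKind::ProjectedMedium,
        b'i' | b'j' => RecordKind::SoundRecording,
        // Keep unknown codes instead of failing, so nothing is lost.
        other => RecordKind::Other(other),
    })
}
```

In the library itself this mapping lives in the JSON dictionary rather than in a `match`, so the table can change without touching Rust code.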
encoding_indicator
Indicates where to read the character encoding:
- MARC21: byte 9 of the leader (`"a"` = UTF-8, `" "` (blank) = MARC-8)
- UNIMARC: subfield `$a` of field 100, positions 26–28 (`"50"` = UTF-8, `"01"` = ISO-5426)
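The two lookups above can be sketched as follows. The `Encoding` enum and both function names are hypothetical, not the crate's API. The sketch also assumes that the UNIMARC code is the two-byte value starting at position 26 of 100$a, and that a blank in leader byte 9 means MARC-8, as in the MARC21 leader definition.

```rust
/// Hypothetical encoding tag; the crate's own type may differ.
#[derive(Debug, PartialEq)]
enum Encoding {
    Utf8,
    Marc8,
    Iso5426,
    Unknown,
}

/// MARC21: the encoding is declared in byte 9 of the leader.
fn marc21_encoding(leader: &[u8]) -> Encoding {
    match leader.get(9).copied() {
        Some(b'a') => Encoding::Utf8,
        Some(b' ') => Encoding::Marc8,
        _ => Encoding::Unknown,
    }
}

/// UNIMARC: the encoding is declared inside subfield 100$a.
/// Assumption: the code is the two bytes starting at position 26.
fn unimarc_encoding(field_100_a: &[u8]) -> Encoding {
    match field_100_a.get(26..28) {
        Some(b"50") => Encoding::Utf8,
        Some(b"01") => Encoding::Iso5426,
        _ => Encoding::Unknown,
    }
}
```

Returning `Unknown` rather than guessing leaves room for the caller to force an encoding when a record is incorrectly declared.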
rules
Named reusable translation tables. Example: the "languages" table is shared by all language fields to translate "fre" → "french", "eng" → "english", etc.
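A shared translation table of this kind could look like the sketch below. The function names are hypothetical, the table holds only the two entries quoted above, and returning `None` for an unknown code is an assumption about behavior that the dictionary format may handle differently.

```rust
use std::collections::HashMap;

/// Build a tiny "languages" table with the two entries quoted in the text.
fn languages_table() -> HashMap<&'static str, &'static str> {
    HashMap::from([("fre", "french"), ("eng", "english")])
}

/// Translate a raw code through a named table.
/// Unknown codes yield `None` (an assumption made for this sketch).
fn translate(table: &HashMap<&'static str, &'static str>, code: &str) -> Option<&'static str> {
    table.get(code).copied()
}
```

Because the table is built once and passed by reference, every language field can reuse it, which is the point of naming rules in the dictionary.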
blocks
List of field blocks (0XX, 1XX, 2XX…). Each block contains fields (FieldDef) with their subfields (SubfieldBinding).
A subfield binding maps a MARC subfield code to a dotted path in the Record model.
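To make the idea of a dotted path concrete, here is a minimal sketch that stores and reads values by path in a generic tree. It does not reflect how the crate resolves paths against its typed `Record` struct; `Node`, `set`, and `get` are hypothetical, and the path used in the usage note is invented for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical generic tree standing in for the typed Record model.
#[derive(Debug, Default, PartialEq)]
struct Node {
    value: Option<String>,
    children: BTreeMap<String, Node>,
}

impl Node {
    /// Store `value` at a dotted path, creating intermediate nodes as needed.
    fn set(&mut self, path: &str, value: &str) {
        let mut node = self;
        for segment in path.split('.') {
            node = node.children.entry(segment.to_string()).or_default();
        }
        node.value = Some(value.to_string());
    }

    /// Read the value stored at a dotted path, if any.
    fn get(&self, path: &str) -> Option<&str> {
        let mut node = self;
        for segment in path.split('.') {
            node = node.children.get(segment)?;
        }
        node.value.as_deref()
    }
}
```

With this sketch, a binding for a title subfield would amount to a call such as `record.set("description.title", "Les Misérables")`, where the path comes from the dictionary and the value from the raw field.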
For fields with fixed-length subfields (such as UNIMARC 100$a), a "slice" extracts a specific range of positions.
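Slicing a fixed-length subfield can be sketched as a bounds-checked substring. The function name and its `(start, len)` parameters are hypothetical; how the dictionary actually expresses a slice is not shown here.

```rust
/// Extract `len` characters starting at `start` from a fixed-length value.
/// Returns `None` if the range is out of bounds or not on a char boundary.
fn slice_positions(value: &str, start: usize, len: usize) -> Option<&str> {
    let end = start.checked_add(len)?;
    value.get(start..end)
}
```

For a UNIMARC 100$a value, `slice_positions(value, 22, 3)` would return the three-character language of cataloguing code, and `slice_positions(value, 0, 8)` the date the record was entered on file.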
Format auto-detection
On read, the engine checks whether the record contains field 200 (UNIMARC title) without field 245 (MARC21 title). If so → UNIMARC; otherwise → MARC21.
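The rule above fits in a few lines. This sketch works on a plain list of field tags; `Format` and `detect_format` are hypothetical names, not the crate's API.

```rust
/// Hypothetical format tag for this sketch.
#[derive(Debug, PartialEq)]
enum Format {
    Marc21,
    Unimarc,
}

/// UNIMARC if field 200 is present and field 245 is absent; MARC21 otherwise.
fn detect_format<'a, I>(tags: I) -> Format
where
    I: IntoIterator<Item = &'a str>,
{
    let (mut has_200, mut has_245) = (false, false);
    for tag in tags {
        match tag {
            "200" => has_200 = true,
            "245" => has_245 = true,
            _ => {}
        }
    }
    if has_200 && !has_245 {
        Format::Unimarc
    } else {
        Format::Marc21
    }
}
```

Note that MARC21 is the default: a record with neither field, or with both, is treated as MARC21.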
Adding new fields
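To map a field that is not yet supported, add an entry to the appropriate block of the JSON file; no Rust code needs to change. Because the dictionaries are embedded via include_str!, the crate must be rebuilt for the change to take effect.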
Record model
Record
├── leader (RecordType, BibliographicLevel, RecordStatus…)
├── identification (ISBN, ISSN, LCCN, control numbers…)
├── coded (languages, country, target audience, dates…)
├── description (title, edition, publication, physical description…)
├── notes (general notes, summary, table of contents…)
├── links (links to other records)
├── associated_titles
├── indexing (subjects, classifications, uncontrolled terms)
├── responsibility (main and added entries)
├── international (cataloging sources, locations, electronic access)
└── local (specimens)
Available helpers on Record:
| Method | Return |
|---|---|
| `media_type()` | `&RecordType` (text, video, sound…) |
| `authors()` | `Iterator<Item = &Agent>` |
| `titles()` | `Vec<&Title>` |
| `title_main()` | `Option<&str>` |
| `isbn()` | `&[Isbn]` |
| `isbn_string()` | `Option<String>` |
| `languages()` | `&[Language]` |
| `lang_primary()` | `Option<&Language>` |
| `lang_original()` | `Option<&Language>` |
| `audience()` | `Option<&TargetAudience>` |
| `subject_main()` | `Option<&str>` |
| `keywords()` | `&[String]` |
| `publication_date()` | `Option<&str>` |
| `abstract_text()` | `Option<&str>` |
| `specimens()` | `&[Specimen]` |
Command-line tool
# Display fields in human-readable mode (auto-detection)
# JSON output
# XML output
# Convert to MARC21 UTF-8
# Convert to UNIMARC ISO-5426
# Force input encoding
Supported encodings
| Identifier | Description |
|---|---|
| `utf8` | Unicode UTF-8 |
| `marc8` | MARC-8 (fallback to Windows-1252) |
| `iso5426` | ISO-5426 (extended bibliographic Latin) |
| `iso8859_2` to `iso8859_5` | Latin-2, Latin-3, Cyrillic… |
License
MIT OR Apache-2.0