marc-rs

Rust library for reading and writing bibliographic records in MARC21, UNIMARC, and MARC XML formats.

What the library does

.mrc / .xml file
      │
      ▼
 RawRecord          ← zero-copy view over raw ISO2709 bytes
      │  JSON dictionary (marc21.json / unimarc.json)
      ▼
   Record           ← structured semantic model, serializable to JSON

The format (MARC21 or UNIMARC) and the character encoding are auto-detected on read. The reverse conversion (Record → binary) is also supported.
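
To make the RawRecord stage more concrete, here is a minimal sketch of a zero-copy walk over one ISO 2709 record. It relies only on the standard layout (24-byte leader, base address of data at leader positions 12 to 16, 12-byte directory entries, 0x1E field terminator). The names `RawView`, `ascii_number` and `fields` are hypothetical and chosen for illustration; they are not the marc-rs API.

```rust
/// Hypothetical zero-copy view: it borrows the record bytes and copies nothing.
pub struct RawView<'a> {
    bytes: &'a [u8],
}

/// ISO 2709 field terminator.
const FIELD_TERMINATOR: u8 = 0x1E;

/// Parse a run of ASCII digits, as used in the leader and the directory.
fn ascii_number(digits: &[u8]) -> Option<usize> {
    std::str::from_utf8(digits).ok()?.parse().ok()
}

impl<'a> RawView<'a> {
    /// Accept any buffer long enough to hold a 24-byte leader.
    pub fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() < 24 {
            return None;
        }
        Some(RawView { bytes })
    }

    /// Return (tag, field data) pairs, each borrowed from the input buffer.
    pub fn fields(&self) -> Vec<(&'a str, &'a [u8])> {
        let bytes: &'a [u8] = self.bytes;
        let mut out = Vec::new();
        // Leader positions 12-16 hold the base address of the data area.
        let Some(base) = ascii_number(&bytes[12..17]) else { return out; };
        // The directory sits between the leader and its own terminator byte.
        let Some(directory) = bytes.get(24..base.saturating_sub(1)) else { return out; };
        for entry in directory.chunks_exact(12) {
            // Entry layout: tag (3 bytes), field length (4), start offset (5).
            let tag = std::str::from_utf8(&entry[0..3]).ok();
            let len = ascii_number(&entry[3..7]);
            let start = ascii_number(&entry[7..12]);
            if let (Some(tag), Some(len), Some(start)) = (tag, len, start) {
                let begin = base + start;
                if let Some(data) = bytes.get(begin..begin + len) {
                    // Drop the trailing field terminator if present.
                    let data = data.strip_suffix(&[FIELD_TERMINATOR]).unwrap_or(data);
                    out.push((tag, data));
                }
            }
        }
        out
    }
}
```

Returning slices tied to the input lifetime is what keeps this stage allocation-light; decoding to strings can then be deferred until the encoding is known.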

Features

Read and write binary MARC21, binary UNIMARC, MARC XML
Auto-detection of format and encoding
High-level Record model organized by blocks (0XX–9XX)
Helpers on Record: title_main(), authors(), isbn(), media_type(), etc.
Serde support: native JSON serialization/deserialization
Encodings: UTF-8, MARC-8, ISO-5426, ISO-8859-*

Installation

[dependencies]
marc-rs = "0.0.5"

Usage

Read records from a file

use marc_rs::MarcReader;

let reader = MarcReader::from_file("records.mrc".as_ref())?;
let records = reader.into_records()?;

for record in &records {
    println!("{:?}", record.title_main());
    println!("{:?}", record.media_type());   // RecordType: LanguageMaterial, Video, Sound…
    println!("{:?}", record.authors().collect::<Vec<_>>());
}

Read from bytes

use marc_rs::{parse_records, MarcReader, Encoding};

// Auto-detect format + encoding
let records = parse_records(&data)?;

// Force the encoding when the record declares it incorrectly
let records = MarcReader::from_bytes(data)?
    .with_encoding(Encoding::Iso5426)
    .into_records()?;

Convert a Record to binary MARC21

use marc_rs::{MarcFormat, Encoding};

let format = MarcFormat::Marc21(Encoding::Utf8);
let raw = format.to_raw(&record)?;
std::fs::write("out.mrc", raw.as_bytes())?;

JSON serialization

let json = serde_json::to_string_pretty(&record)?;
let record: marc_rs::Record = serde_json::from_str(&json)?;

How dictionaries work

The translation between raw MARC fields and the Record model is driven by two JSON files: resources/marc21.json and resources/unimarc.json. These files are compiled into the binary via include_str!.

Dictionary structure

{
  "name": "marc21",
  "leader": [ ... ],
  "encoding_indicator": { ... },
  "rules": { ... },
  "blocks": [ ... ]
}

`leader`

List of positions within the 24 bytes of the ISO2709 leader. Each entry extracts one or more bytes and translates them to a field of the model.

{ "position": 6, "target": "record_type", "rules": [
    { "raw": "a", "value": "languageMaterial" },
    { "raw": "g", "value": "projectedMedium" }
]}

→ Byte 6 of the leader becomes record.leader.record_type.
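
The lookup this entry describes can be sketched in a few lines. This is only an illustration of the idea under assumed names (`LeaderRule`, `RECORD_TYPE_RULES`, `translate_leader_byte` are hypothetical); the actual engine reads its rules from the JSON dictionary rather than from constants.

```rust
/// One translation rule: a raw leader byte and the model value it maps to.
struct LeaderRule {
    raw: u8,
    value: &'static str,
}

/// The two rules shown above for leader position 6.
const RECORD_TYPE_RULES: &[LeaderRule] = &[
    LeaderRule { raw: b'a', value: "languageMaterial" },
    LeaderRule { raw: b'g', value: "projectedMedium" },
];

/// Read the byte at `position` and translate it with the given rules.
/// Returns None if the leader is too short or no rule matches.
fn translate_leader_byte(
    leader: &[u8],
    position: usize,
    rules: &[LeaderRule],
) -> Option<&'static str> {
    let byte = *leader.get(position)?;
    rules.iter().find(|rule| rule.raw == byte).map(|rule| rule.value)
}
```

With the leader `00044nam a2200037   4500`, position 6 holds `a`, so the sketch yields `languageMaterial`.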

`encoding_indicator`

Indicates where to read the character encoding:

MARC21: byte 9 of the leader ("a" = UTF-8, " " (blank) = MARC-8)
UNIMARC: subfield $a of field 100, positions 26–28 ("50" = UTF-8, "01" = ISO-5426)
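
A minimal sketch of both checks follows. The enum and function names are hypothetical, and the UNIMARC part assumes the two-character code starts at offset 26 of 100$a; in MARC21, a blank at leader position 9 denotes MARC-8.

```rust
/// Hypothetical result type for this sketch (not the crate's Encoding enum).
#[derive(Debug, PartialEq)]
enum DetectedEncoding {
    Utf8,
    Marc8,
    Iso5426,
    Unknown,
}

/// MARC21: leader byte 9 is "a" for UTF-8 and blank for MARC-8.
fn marc21_encoding(leader: &[u8]) -> DetectedEncoding {
    match leader.get(9).copied() {
        Some(b'a') => DetectedEncoding::Utf8,
        Some(b' ') => DetectedEncoding::Marc8,
        _ => DetectedEncoding::Unknown,
    }
}

/// UNIMARC: read the two-character code at offset 26 of field 100, subfield $a.
/// Assumption: the code occupies offsets 26 and 27.
fn unimarc_encoding(subfield_100a: &str) -> DetectedEncoding {
    match subfield_100a.get(26..28) {
        Some("50") => DetectedEncoding::Utf8,
        Some("01") => DetectedEncoding::Iso5426,
        _ => DetectedEncoding::Unknown,
    }
}
```

Returning an explicit `Unknown` rather than guessing leaves room for a caller to fall back to a forced encoding.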

`rules`

Named reusable translation tables. Example: the "languages" table is shared by all language fields to translate "fre" → "french", "eng" → "english", etc.
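
Conceptually, a named table is a map from raw codes to model values, and several fields can point at the same table by name. The sketch below hard-codes the "languages" example; `RuleTable`, `build_rules` and `translate` are hypothetical names, and the real tables come from the JSON dictionary.

```rust
use std::collections::HashMap;

/// One named table: raw MARC code -> model value.
type RuleTable = HashMap<&'static str, &'static str>;

/// Build the registry of named tables (only "languages" in this sketch).
fn build_rules() -> HashMap<&'static str, RuleTable> {
    let mut languages = RuleTable::new();
    languages.insert("fre", "french");
    languages.insert("eng", "english");

    let mut rules = HashMap::new();
    rules.insert("languages", languages);
    rules
}

/// Translate `raw` with the table called `table`.
/// Returns None when the table or the code is unknown.
fn translate(
    rules: &HashMap<&'static str, RuleTable>,
    table: &str,
    raw: &str,
) -> Option<&'static str> {
    rules.get(table)?.get(raw).copied()
}
```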

`blocks`

List of field blocks (0XX, 1XX, 2XX…). Each block contains fields (FieldDef) with their subfields (SubfieldBinding).

A subfield binding maps a MARC subfield code to a dotted path in the Record model:

{
  "tag": "245",
  "subfields": [
    { "code": "a", "target": "description.title.main" },
    { "code": "b", "target": "description.title.subtitle" },
    { "code": "c", "target": "description.title.responsibility" }
  ]
}
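
One way to picture how a dotted path is applied: split the target on ".", walk or create one level per segment, and store the subfield value at the leaf. The sketch below uses a generic tree (`Node`) instead of the typed Record model, so every name in it is hypothetical.

```rust
use std::collections::BTreeMap;

/// Generic tree standing in for the typed Record model.
#[derive(Debug, Default, PartialEq)]
struct Node {
    value: Option<String>,
    children: BTreeMap<String, Node>,
}

impl Node {
    /// Store `value` at the dotted `path`, creating intermediate levels.
    fn set(&mut self, path: &str, value: &str) {
        let mut node = self;
        for segment in path.split('.') {
            node = node.children.entry(segment.to_string()).or_default();
        }
        node.value = Some(value.to_string());
    }

    /// Read the value stored at the dotted `path`, if any.
    fn get(&self, path: &str) -> Option<&str> {
        let mut node = self;
        for segment in path.split('.') {
            node = node.children.get(segment)?;
        }
        node.value.as_deref()
    }
}

/// Apply (code, target) bindings to the (code, value) subfields of one field.
/// Subfields without a binding are ignored.
fn apply_bindings(bindings: &[(char, &str)], subfields: &[(char, &str)]) -> Node {
    let mut root = Node::default();
    for (code, value) in subfields {
        if let Some((_, target)) = bindings.iter().find(|(c, _)| c == code) {
            root.set(target, value);
        }
    }
    root
}
```

For example, binding `a` to `description.title.main` and feeding the subfield `('a', "Dune")` makes `get("description.title.main")` return `Some("Dune")`.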

For fields with fixed-length subfields (such as UNIMARC 100$a), a "slice" allows extracting a specific position:

{ "code": "a", "target": "coded.date_entered_on_file", "slice": { "offset": 0, "length": 8 } }
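
The effect of a "slice" can be sketched as a bounds-checked substring. `Slice` and `apply_slice` are hypothetical names used only for this illustration.

```rust
/// Mirror of the "slice" object: a start offset and a length.
struct Slice {
    offset: usize,
    length: usize,
}

/// Extract the slice from a fixed-length subfield value.
/// Returns None if the value is too short or the range is invalid.
fn apply_slice<'a>(value: &'a str, slice: &Slice) -> Option<&'a str> {
    let end = slice.offset.checked_add(slice.length)?;
    value.get(slice.offset..end)
}
```

Applied to a 100$a value that begins with `20240131`, the slice above (offset 0, length 8) returns exactly those eight characters.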

Format auto-detection

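On read, the engine checks whether the record contains field 200 (the UNIMARC title) but no field 245 (the MARC21 title). If so, the record is treated as UNIMARC; otherwise it is treated as MARC21, which is therefore the default when both fields or neither are present.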
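
The heuristic is small enough to sketch directly. `DetectedFormat` and `detect_format` are hypothetical names; the function takes the list of tags found in the record.

```rust
/// Hypothetical result type for this sketch (not the crate's MarcFormat).
#[derive(Debug, PartialEq)]
enum DetectedFormat {
    Marc21,
    Unimarc,
}

/// UNIMARC if tag 200 is present and tag 245 is absent; MARC21 otherwise.
fn detect_format<'a>(tags: impl IntoIterator<Item = &'a str>) -> DetectedFormat {
    let mut has_200 = false;
    let mut has_245 = false;
    for tag in tags {
        match tag {
            "200" => has_200 = true,
            "245" => has_245 = true,
            _ => {}
        }
    }
    if has_200 && !has_245 {
        DetectedFormat::Unimarc
    } else {
        DetectedFormat::Marc21
    }
}
```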

Adding new fields

To map a field that is not yet supported, add an entry to the appropriate block of the JSON file. No Rust code needs to change, although the crate must be rebuilt because the dictionaries are embedded at compile time.

`Record` model

Record
├── leader          (RecordType, BibliographicLevel, RecordStatus…)
├── identification  (ISBN, ISSN, LCCN, control numbers…)
├── coded           (languages, country, target audience, dates…)
├── description     (title, edition, publication, physical description…)
├── notes           (general notes, summary, table of contents…)
├── links           (links to other records)
├── associated_titles
├── indexing        (subjects, classifications, uncontrolled terms)
├── responsibility  (main and added entries)
├── international   (cataloging sources, locations, electronic access)
└── local           (specimens)

Available helpers on Record:

Method	Return
`media_type()`	`&RecordType` (text, video, sound…)
`authors()`	`Iterator<Item = &Agent>`
`titles()`	`Vec<&Title>`
`title_main()`	`Option<&str>`
`isbn()`	`&[Isbn]`
`isbn_string()`	`Option<String>`
`languages()`	`&[Language]`
`lang_primary()`	`Option<&Language>`
`lang_original()`	`Option<&Language>`
`audience()`	`Option<&TargetAudience>`
`subject_main()`	`Option<&str>`
`keywords()`	`&[String]`
`publication_date()`	`Option<&str>`
`abstract_text()`	`Option<&str>`
`specimens()`	`&[Specimen]`

Command-line tool

# Display fields in human-readable mode (auto-detection)
cargo run --bin marc-rs -- records.mrc

# JSON output
cargo run --bin marc-rs -- records.mrc json

# XML output
cargo run --bin marc-rs -- records.mrc xml

# Convert to MARC21 UTF-8
cargo run --bin marc-rs -- records.mrc marc21-utf8 > out.mrc

# Convert to UNIMARC ISO-5426
cargo run --bin marc-rs -- records.mrc unimarc-iso5426 > out.mrc

# Force input encoding
cargo run --bin marc-rs -- --encoding utf8 records.mrc fields

Supported encodings

Identifier	Description
`utf8`	Unicode UTF-8
`marc8`	MARC-8 (fallback to Windows-1252)
`iso5426`	ISO-5426 (extended bibliographic Latin)
`iso8859_2` to `iso8859_5`	Latin-2, Latin-3, Latin-4, Cyrillic

License

MIT OR Apache-2.0

marc-rs 0.0.6