# ged_io
**A fast, full-featured GEDCOM parser and writer for Rust**
[crates.io](https://crates.io/crates/ged_io) | [docs.rs](https://docs.rs/ged_io) | [MIT License](https://opensource.org/licenses/MIT)
---
## What is ged_io?
`ged_io` is a Rust library for reading and writing [GEDCOM](https://en.wikipedia.org/wiki/GEDCOM) files, the de facto standard for exchanging genealogical data between family tree software.
Whether you're building a genealogy application, migrating data between platforms, or analyzing family history datasets, `ged_io` provides a robust, type-safe API to work with GEDCOM data.
### Key Features
| Feature | Description |
|---|---|
| **Dual Format Support** | Full support for both GEDCOM 5.5.1 and GEDCOM 7.0 specifications |
| **Read & Write** | Parse GEDCOM files into Rust structs, modify them, and write back |
| **Streaming Parser** | Memory-efficient iterator-based parsing for large files |
| **GEDZIP Support** | Read/write `.gdz` archives bundling GEDCOM data with media files |
| **Multiple Encodings** | UTF-8, UTF-16, ISO-8859-1, ISO-8859-15 (Latin-9), ANSEL |
| **JSON Export** | Optional serde integration for JSON serialization |
| **Type Safe** | Strongly-typed Rust structs for all GEDCOM record types |
| **Compatible** | Lenient parsing rules for compatibility with most real-world GEDCOM files |
---
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
ged_io = "0.11"
```
### Optional Features
```toml
# JSON serialization support
ged_io = { version = "0.11", features = ["json"] }
# GEDZIP archive support (.gdz files)
ged_io = { version = "0.11", features = ["gedzip"] }
# Enable all features
ged_io = { version = "0.11", features = ["json", "gedzip"] }
```
---
## Quick Start
### Parse a GEDCOM File
```rust
use ged_io::GedcomBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let content = std::fs::read_to_string("family.ged")?;
    let data = GedcomBuilder::new().build_from_str(&content)?;

    println!("GEDCOM version: {:?}", data.gedcom_version());
    println!("Individuals: {}", data.individuals.len());
    println!("Families: {}", data.families.len());

    for person in &data.individuals {
        if let Some(name) = person.full_name() {
            println!(" - {}", name);
        }
    }

    Ok(())
}
```
### Write a GEDCOM File
```rust
use ged_io::{GedcomBuilder, GedcomWriter};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse existing file
    let content = std::fs::read_to_string("input.ged")?;
    let data = GedcomBuilder::new().build_from_str(&content)?;

    // Write to new file
    let writer = GedcomWriter::new();
    let output = writer.write_to_string(&data)?;
    std::fs::write("output.ged", output)?;

    Ok(())
}
```
---
## Use Cases
### 1. Family Tree Application Backend
Build genealogy software with full GEDCOM import/export:
```rust
use ged_io::{GedcomBuilder, GedcomWriter};

// Import from any genealogy software
let data = GedcomBuilder::new()
    .validate_references(true) // Ensure data integrity
    .build_from_str(&content)?;

// Access family relationships
for family in &data.families {
    let parents = data.get_parents(family);
    let children = data.get_children(family);
    // Build your family tree UI...
}

// Export back to GEDCOM
let writer = GedcomWriter::new();
std::fs::write("export.ged", writer.write_to_string(&data)?)?;
```
### 2. Data Migration Between Platforms
Convert GEDCOM files between formats or migrate to JSON:
```rust
use ged_io::GedcomBuilder;

// Read GEDCOM 5.5.1 file
let data = GedcomBuilder::new().build_from_str(&old_content)?;

// Check version and migrate
if data.is_gedcom_5() {
    println!("Migrating from GEDCOM 5.5.1...");
}

// Export as JSON (requires "json" feature)
#[cfg(feature = "json")]
{
    let json = serde_json::to_string_pretty(&data)?;
    std::fs::write("family.json", json)?;
}
```
### 3. Genealogy Data Analysis
Analyze family history datasets:
```rust
use ged_io::GedcomBuilder;

let data = GedcomBuilder::new().build_from_str(&content)?;

// Find all people with a specific surname
let smiths = data.search_individuals_by_name("Smith");
println!("Found {} Smiths", smiths.len());

// Analyze source citations
let citation_stats = data.count_source_citations();
println!("Total citations: {}", citation_stats.total);

// Find birth/death statistics
for person in &data.individuals {
    if let (Some(birth), Some(death)) = (person.birth_date(), person.death_date()) {
        println!(
            "{}: {} - {}",
            person.full_name().unwrap_or_default(),
            birth,
            death
        );
    }
}
```
### 4. GEDZIP Archive Processing
Work with GEDCOM 7.0 bundled archives:
```rust
use ged_io::GedcomBuilder;
use ged_io::gedzip::{GedzipReader, write_gedzip_with_media};
use std::collections::HashMap;

// Read GEDZIP with embedded photos
let bytes = std::fs::read("family.gdz")?;
let data = GedcomBuilder::new().build_from_gedzip(&bytes)?;

// Extract media files
let cursor = std::io::Cursor::new(&bytes);
let mut reader = GedzipReader::new(cursor)?;
for filename in reader.media_files() {
    let media_bytes = reader.read_media_file(filename)?;
    std::fs::write(format!("extracted/{}", filename), media_bytes)?;
}

// Create new GEDZIP with media
let mut media = HashMap::new();
media.insert("photos/grandpa.jpg".to_string(), std::fs::read("grandpa.jpg")?);
let archive = write_gedzip_with_media(&data, &media)?;
std::fs::write("new_family.gdz", archive)?;
```
### 5. Streaming Large Files
Process large GEDCOM files without loading everything into memory:
```rust
use ged_io::{GedcomStreamParser, GedcomRecord, GedcomData};
use std::fs::File;
use std::io::BufReader;

// Stream records one at a time
let file = File::open("huge_family.ged")?;
let reader = BufReader::new(file);
let parser = GedcomStreamParser::new(reader)?;
for result in parser {
    match result? {
        GedcomRecord::Individual(indi) => {
            println!("Found: {}", indi.full_name().unwrap_or_default());
        }
        GedcomRecord::Family(fam) => {
            println!("Family: {}", fam.xref.as_deref().unwrap_or("?"));
        }
        _ => {} // Handle other record types
    }
}

// Or collect into GedcomData when needed
let file = File::open("huge_family.ged")?;
let reader = BufReader::new(file);
let parser = GedcomStreamParser::new(reader)?;
let data: GedcomData = parser
    .collect::<Result<Vec<_>, _>>()?
    .into_iter()
    .collect();
```
Note: The streaming parser requires UTF-8 input. For files with other encodings,
read and convert to UTF-8 first, or use `GedcomBuilder::build_from_str()` which
handles encoding detection automatically.
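
For ISO-8859-1 input, the "convert to UTF-8 first" step can be sketched with the standard library alone, because every ISO-8859-1 byte maps to the Unicode code point with the same value. `latin1_to_utf8` is a hypothetical helper written for this example and is not part of `ged_io`; other encodings such as ANSEL need a proper decoding table.

```rust
/// Decodes ISO-8859-1 (Latin-1) bytes into a UTF-8 `String`.
/// Hypothetical helper for illustration; not a `ged_io` function.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    // Each Latin-1 byte 0x00..=0xFF is the code point U+0000..=U+00FF.
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

The resulting `String` could then be wrapped in a `std::io::Cursor` or passed on as `&str`, depending on which entry point you use.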
---
## API Overview
### Core Types
| Type | Description |
|---|---|
| `GedcomData` | The root container holding all parsed records |
| `Individual` | A person record (INDI) |
| `Family` | A family unit record (FAM) |
| `Source` | A source record (SOUR) |
| `Repository` | A repository record (REPO) |
| `Multimedia` | A multimedia object record (OBJE) |
| `SharedNote` | A shared note record (SNOTE) - GEDCOM 7.0 |
### Builder Configuration
```rust
let data = GedcomBuilder::new()
    .strict_mode(false)              // Lenient parsing (default)
    .validate_references(true)       // Check cross-reference integrity
    .ignore_unknown_tags(false)      // Report unknown tags
    .max_file_size(Some(50_000_000)) // 50 MB limit
    .build_from_str(&content)?;
```
Lenient parsing policy (default):
- Accepts common real-world quirks (UTF-8 BOM, CRLF line endings, trailing newline at EOF).
- Allows missing `HEAD` and/or `TRLR` records (the parser stops cleanly at EOF).
- Keeps writing strict: the writer always emits valid GEDCOM output (including `0 TRLR` without a final newline and using `CONT`/`CONC` for multiline text).

| Option | Default | Description |
|---|---|---|
| `strict_mode` | `false` | Fail on non-standard tags |
| `validate_references` | `false` | Validate all cross-references exist |
| `ignore_unknown_tags` | `false` | Silently skip unknown tags |
| `max_file_size` | `None` | Maximum file size in bytes |
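
The input quirks that lenient mode accepts (UTF-8 BOM, CRLF line endings, a trailing newline) can all be normalized before line-by-line parsing. The sketch below shows one way to do that with the standard library. `normalize_input` is a hypothetical helper for illustration and does not reflect how `ged_io` implements lenient mode internally.

```rust
/// Strips a leading UTF-8 BOM and returns the non-empty lines of the input.
/// Hypothetical helper for illustration; not a `ged_io` function.
fn normalize_input(input: &str) -> Vec<&str> {
    // A UTF-8 BOM decodes to the single character U+FEFF.
    let input = input.strip_prefix('\u{FEFF}').unwrap_or(input);
    input
        .lines() // splits on `\n` and on `\r\n`; a trailing newline adds no empty line
        .filter(|line| !line.is_empty())
        .collect()
}
```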
### Convenience Methods
```rust
// Find records by cross-reference ID
let person = data.find_individual("@I1@");
let family = data.find_family("@F1@");
let source = data.find_source("@S1@");
// Navigate relationships
let families = data.get_families_as_spouse("@I1@");
let parents = data.get_parents(family);
let children = data.get_children(family);
let spouse = data.get_spouse("@I1@", family);
// Search
let matches = data.search_individuals_by_name("Smith");
// Statistics
let total = data.total_records();
let is_empty = data.is_empty();
```
### Indexed Lookups (O(1) Performance)
For large files with frequent lookups:
```rust
use ged_io::indexed::IndexedGedcomData;
let indexed = IndexedGedcomData::from(data);
// O(1) lookups instead of O(n) linear search
let person = indexed.find_individual("@I1@");
let family = indexed.find_family("@F1@");
```
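
The idea behind an indexed lookup can be sketched with a plain `HashMap` from cross-reference ID to a position in a vector. The `Person` and `PersonIndex` types below are hypothetical stand-ins invented for this example; the actual internals of `IndexedGedcomData` may differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed individual record.
struct Person {
    xref: String,
    name: String,
}

/// Maps each xref (e.g. `@I1@`) to the record's position in `people`.
struct PersonIndex {
    people: Vec<Person>,
    by_xref: HashMap<String, usize>,
}

impl PersonIndex {
    /// Builds the index in a single O(n) pass over the records.
    fn new(people: Vec<Person>) -> Self {
        let by_xref = people
            .iter()
            .enumerate()
            .map(|(i, p)| (p.xref.clone(), i))
            .collect();
        Self { people, by_xref }
    }

    /// Average-case O(1) lookup by xref.
    fn find(&self, xref: &str) -> Option<&Person> {
        self.by_xref.get(xref).map(|&i| &self.people[i])
    }
}
```

The trade-off is the usual one: building the map costs one pass and some extra memory, which pays off only when the same data is queried many times.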
---
## Supported GEDCOM Tags
The library provides full support for the GEDCOM 5.5.1 and 7.0 specifications. In the tables below, ✅ means the tag is supported for that version and `-` means the tag is not part of that version.
### Records (Level 0)
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| HEAD | Header with file metadata | ✅ | ✅ |
| INDI | Individual person record | ✅ | ✅ |
| FAM | Family group record | ✅ | ✅ |
| SOUR | Source record | ✅ | ✅ |
| REPO | Repository record | ✅ | ✅ |
| OBJE | Multimedia object record | ✅ | ✅ |
| SUBM | Submitter record | ✅ | ✅ |
| SUBN | Submission record | ✅ | - |
| SNOTE | Shared note record | - | ✅ |
| TRLR | Trailer (end of file) | ✅ | ✅ |
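
Each record in the table above starts on a level-0 line, and every GEDCOM line follows the shape `level [xref] tag [value]`. The sketch below splits one such line into its parts. It illustrates the file format only; `GedLine` and `parse_line` are hypothetical names invented for this example and are not part of the `ged_io` API.

```rust
/// One GEDCOM line: `level [@xref@] TAG [value]`.
/// Hypothetical type for illustration; not part of `ged_io`.
#[derive(Debug, PartialEq)]
struct GedLine<'a> {
    level: u8,
    xref: Option<&'a str>,
    tag: &'a str,
    value: Option<&'a str>,
}

/// Splits a single line into level, optional xref, tag, and optional value.
/// Returns `None` if the level or the tag is missing.
fn parse_line(line: &str) -> Option<GedLine<'_>> {
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let (level, rest) = line.split_once(' ')?;
    let level: u8 = level.parse().ok()?;

    // An xref such as `@I1@` may sit between the level and the tag.
    let (xref, rest) = if rest.starts_with('@') {
        let (x, r) = rest.split_once(' ')?;
        (Some(x), r)
    } else {
        (None, rest)
    };

    // Everything after the tag is the line value.
    let (tag, value) = match rest.split_once(' ') {
        Some((t, v)) => (t, Some(v)),
        None => (rest, None),
    };
    if tag.is_empty() {
        return None;
    }
    Some(GedLine { level, xref, tag, value })
}
```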
### Individual Events
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| ADOP | Adoption | ✅ | ✅ |
| BAPM | Baptism | ✅ | ✅ |
| BARM | Bar Mitzvah | ✅ | ✅ |
| BASM | Bas Mitzvah | ✅ | ✅ |
| BIRT | Birth | ✅ | ✅ |
| BLES | Blessing | ✅ | ✅ |
| BURI | Burial/Depositing remains | ✅ | ✅ |
| CENS | Census | ✅ | ✅ |
| CHR | Christening | ✅ | ✅ |
| CHRA | Adult christening | ✅ | ✅ |
| CONF | Confirmation | ✅ | ✅ |
| CREM | Cremation | ✅ | ✅ |
| DEAT | Death | ✅ | ✅ |
| EMIG | Emigration | ✅ | ✅ |
| EVEN | Generic event | ✅ | ✅ |
| FCOM | First communion | ✅ | ✅ |
| GRAD | Graduation | ✅ | ✅ |
| IMMI | Immigration | ✅ | ✅ |
| NATU | Naturalization | ✅ | ✅ |
| ORDN | Ordination | ✅ | ✅ |
| PROB | Probate | ✅ | ✅ |
| RETI | Retirement | ✅ | ✅ |
| WILL | Will | ✅ | ✅ |
### Family Events
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| ANUL | Annulment | ✅ | ✅ |
| CENS | Census | ✅ | ✅ |
| DIV | Divorce | ✅ | ✅ |
| DIVF | Divorce filed | ✅ | ✅ |
| ENGA | Engagement | ✅ | ✅ |
| EVEN | Generic event | ✅ | ✅ |
| MARB | Marriage bann | ✅ | ✅ |
| MARC | Marriage contract | ✅ | ✅ |
| MARL | Marriage license | ✅ | ✅ |
| MARR | Marriage | ✅ | ✅ |
| MARS | Marriage settlement | ✅ | ✅ |
| RESI | Residence | ✅ | ✅ |
| SEP | Separation | - | ✅ |
### Individual Attributes
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| CAST | Caste name | ✅ | ✅ |
| DSCR | Physical description | ✅ | ✅ |
| EDUC | Education | ✅ | ✅ |
| FACT | Generic fact | ✅ | ✅ |
| IDNO | National ID number | ✅ | ✅ |
| NATI | Nationality | ✅ | ✅ |
| NCHI | Number of children | ✅ | ✅ |
| NMR | Number of marriages | ✅ | ✅ |
| OCCU | Occupation | ✅ | ✅ |
| PROP | Property/Possessions | ✅ | ✅ |
| RELI | Religion | ✅ | ✅ |
| RESI | Residence | ✅ | ✅ |
| SSN | Social Security Number | ✅ | ✅ |
| TITL | Title/Nobility | ✅ | ✅ |
### Name Structure
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| NAME | Personal name | ✅ | ✅ |
| GIVN | Given name | ✅ | ✅ |
| NICK | Nickname | ✅ | ✅ |
| NPFX | Name prefix | ✅ | ✅ |
| NSFX | Name suffix | ✅ | ✅ |
| SPFX | Surname prefix | ✅ | ✅ |
| SURN | Surname | ✅ | ✅ |
| TYPE | Name type | ✅ | ✅ |
| FONE | Phonetic variation | ✅ | ✅ |
| ROMN | Romanized variation | ✅ | ✅ |
| TRAN | Translation | - | ✅ |
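
In a `NAME` value the surname is conventionally delimited by slashes, as in `John /Smith/`. The sketch below pulls the surname out of such a value. `surname` is a hypothetical helper for illustration, not a `ged_io` function, and it assumes both slashes are present.

```rust
/// Returns the text between the first pair of slashes in a NAME value.
/// Hypothetical helper for illustration; not a `ged_io` function.
fn surname(value: &str) -> Option<&str> {
    let (_, rest) = value.split_once('/')?;
    let (surname, _) = rest.split_once('/')?;
    Some(surname.trim())
}
```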
### Family Links
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| CHIL | Child | ✅ | ✅ |
| FAMC | Family as child | ✅ | ✅ |
| FAMS | Family as spouse | ✅ | ✅ |
| HUSB | Husband/Partner | ✅ | ✅ |
| WIFE | Wife/Partner | ✅ | ✅ |
| PEDI | Pedigree linkage type | ✅ | ✅ |
| STAT | Status | ✅ | ✅ |
| ALIA | Alias/Alternate ID | ✅ | ✅ |
| ASSO | Association | ✅ | ✅ |
### Source & Citation
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| SOUR | Source citation | ✅ | ✅ |
| ABBR | Abbreviation | ✅ | ✅ |
| AUTH | Author | ✅ | ✅ |
| CALN | Call number | ✅ | ✅ |
| DATA | Data | ✅ | ✅ |
| PAGE | Page/Location | ✅ | ✅ |
| PUBL | Publication info | ✅ | ✅ |
| QUAY | Quality assessment | ✅ | ✅ |
| REPO | Repository reference | ✅ | ✅ |
| ROLE | Role in event | ✅ | ✅ |
| TEXT | Text from source | ✅ | ✅ |
| TITL | Title | ✅ | ✅ |
### Date & Time
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| DATE | Date | ✅ | ✅ |
| TIME | Time | ✅ | ✅ |
| CHAN | Change date | ✅ | ✅ |
| CREA | Creation date | - | ✅ |
| SDATE | Sort date | - | ✅ |
| PHRASE | Free-text phrase | - | ✅ |
### Place & Address
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| PLAC | Place | ✅ | ✅ |
| ADDR | Address | ✅ | ✅ |
| ADR1 | Address line 1 | ✅ | ✅ |
| ADR2 | Address line 2 | ✅ | ✅ |
| ADR3 | Address line 3 | ✅ | ✅ |
| CITY | City | ✅ | ✅ |
| STAE | State/Province | ✅ | ✅ |
| POST | Postal code | ✅ | ✅ |
| CTRY | Country | ✅ | ✅ |
| MAP | Map coordinates | ✅ | ✅ |
| LATI | Latitude | ✅ | ✅ |
| LONG | Longitude | ✅ | ✅ |
| FONE | Phonetic variation | ✅ | ✅ |
| ROMN | Romanized variation | ✅ | ✅ |
| FORM | Place hierarchy format | ✅ | ✅ |
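
`LATI` and `LONG` values carry a hemisphere letter followed by decimal degrees, for example `N18.150944` or `W168.150944`. The sketch below converts such a value to signed decimal degrees. `parse_coordinate` is a hypothetical helper for illustration and is not part of the `ged_io` API.

```rust
/// Converts a GEDCOM coordinate such as `N18.150944` or `W168.150944`
/// into signed decimal degrees (south and west are negative).
/// Hypothetical helper for illustration; not a `ged_io` function.
fn parse_coordinate(value: &str) -> Option<f64> {
    let mut chars = value.chars();
    let sign = match chars.next()? {
        'N' | 'E' => 1.0,
        'S' | 'W' => -1.0,
        _ => return None,
    };
    // The remainder of the string is the unsigned number of degrees.
    chars.as_str().parse::<f64>().ok().map(|deg| sign * deg)
}
```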
### Multimedia
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| OBJE | Multimedia link | ✅ | ✅ |
| FILE | File reference | ✅ | ✅ |
| FORM | Media format/type | ✅ | ✅ |
| TITL | Title | ✅ | ✅ |
| MEDI | Medium type | ✅ | ✅ |
| BLOB | Binary data (5.5.1 only) | ✅ | - |
| CROP | Image crop region | - | ✅ |
| TOP | Crop top | - | ✅ |
| LEFT | Crop left | - | ✅ |
| HEIGHT | Crop height | - | ✅ |
| WIDTH | Crop width | - | ✅ |
### Notes
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| NOTE | Note | ✅ | ✅ |
| SNOTE | Shared note reference | - | ✅ |
| CONT | Line continuation | ✅ | ✅ |
| CONC | Line concatenation | ✅ | - |
| MIME | MIME type | - | ✅ |
| LANG | Language | ✅ | ✅ |
| TRAN | Translation | - | ✅ |
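
Long or multiline text is split across `CONT` (start a new line) and `CONC` (continue the same line) sub-records. The sketch below reassembles such text from `(tag, value)` pairs. It illustrates the convention under simplified assumptions; `join_text` is a hypothetical name and this is not `ged_io`'s implementation.

```rust
/// Rebuilds text from a first value followed by CONT/CONC continuation lines.
/// Hypothetical helper for illustration; not a `ged_io` function.
fn join_text(first: &str, continuations: &[(&str, &str)]) -> String {
    let mut text = String::from(first);
    for &(tag, value) in continuations {
        match tag {
            // CONT starts a new line of text.
            "CONT" => {
                text.push('\n');
                text.push_str(value);
            }
            // CONC appends to the current line with no separator.
            "CONC" => text.push_str(value),
            // Other sub-tags are ignored in this sketch.
            _ => {}
        }
    }
    text
}
```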
### Header & Metadata
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| GEDC | GEDCOM info | ✅ | ✅ |
| VERS | Version | ✅ | ✅ |
| FORM | Format | ✅ | ✅ |
| CHAR | Character encoding | ✅ | - |
| DEST | Destination | ✅ | ✅ |
| COPR | Copyright | ✅ | ✅ |
| CORP | Corporation | ✅ | ✅ |
| SCHMA | Extension schema | - | ✅ |
| TAG | Tag definition | - | ✅ |
### Contact Information
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| PHON | Phone | ✅ | ✅ |
| EMAIL | Email | ✅ | ✅ |
| FAX | Fax | ✅ | ✅ |
| WWW | Website | ✅ | ✅ |
### Identifiers
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| REFN | User reference number | ✅ | ✅ |
| RIN | Record ID number | ✅ | ✅ |
| AFN | Ancestral File Number | ✅ | ✅ |
| UID | Unique identifier | ✅ | ✅ |
| EXID | External identifier | - | ✅ |
### Event Details
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| TYPE | Event/Fact type | ✅ | ✅ |
| AGE | Age at event | ✅ | ✅ |
| AGNC | Agency | ✅ | ✅ |
| CAUS | Cause | ✅ | ✅ |
| RESN | Restriction notice | ✅ | ✅ |
| NO | Non-event assertion | - | ✅ |
### LDS Ordinances
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| BAPL | Baptism, LDS | ✅ | ✅ |
| CONL | Confirmation, LDS | ✅ | ✅ |
| ENDL | Endowment, LDS | ✅ | ✅ |
| SLGC | Sealing, child to parents | ✅ | ✅ |
| SLGS | Sealing, spouse | ✅ | ✅ |
| INIL | Initiatory, LDS | - | ✅ |
| TEMP | Temple | ✅ | ✅ |
| STAT | Ordinance status | ✅ | ✅ |
### Submitter & Submission
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| SUBM | Submitter reference | ✅ | ✅ |
| ANCI | Ancestor interest | ✅ | ✅ |
| DESI | Descendant interest | ✅ | ✅ |
### Other
| Tag | Description | 5.5.1 | 7.0 |
|---|---|---|---|
| SEX | Sex/Gender | ✅ | ✅ |
### Date Formats
All standard GEDCOM date formats are preserved:
- **Exact**: `15 MAR 1950`
- **Range**: `BET 1900 AND 1910`, `BEF 1900`, `AFT 1900`
- **Period**: `FROM 1900 TO 1910`
- **Approximate**: `ABT 1900`, `CAL 1900`, `EST 1900`
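
The four families above can be told apart by the leading keyword of the date value. The sketch below shows one way to classify a preserved date string. `DateKind` and `classify_date` are hypothetical names invented for this example, not part of `ged_io`, and the sketch does not validate the rest of the date.

```rust
/// Broad category of a GEDCOM date value.
/// Hypothetical enum for illustration; not part of `ged_io`.
#[derive(Debug, PartialEq)]
enum DateKind {
    Exact,
    Range,
    Period,
    Approximate,
}

/// Classifies a date string by its leading keyword.
fn classify_date(value: &str) -> DateKind {
    let keyword = value.split_whitespace().next().unwrap_or("");
    match keyword {
        "BET" | "BEF" | "AFT" => DateKind::Range,
        "FROM" | "TO" => DateKind::Period,
        "ABT" | "CAL" | "EST" => DateKind::Approximate,
        // Anything without a recognized keyword is treated as a plain date.
        _ => DateKind::Exact,
    }
}
```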
### Calendars
- Gregorian (`@#DGREGORIAN@`)
- Julian (`@#DJULIAN@`)
- Hebrew (`@#DHEBREW@`)
- French Republican (`@#DFRENCH R@`)
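
In the escape form shown above, the calendar marker prefixes the date value, as in `@#DJULIAN@ 15 MAR 1650`. The sketch below separates the calendar name from the date text. `split_calendar` is a hypothetical helper for illustration and is not part of the `ged_io` API.

```rust
/// Splits an optional calendar escape (e.g. `@#DJULIAN@`) from the date text.
/// Returns the calendar name without the `@#D` and `@` delimiters.
/// Hypothetical helper for illustration; not a `ged_io` function.
fn split_calendar(value: &str) -> (Option<&str>, &str) {
    if let Some(rest) = value.strip_prefix("@#D") {
        if let Some((calendar, date)) = rest.split_once('@') {
            return (Some(calendar), date.trim_start());
        }
    }
    // No escape present: the whole value is the date.
    (None, value)
}
```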
### Character Encodings
- UTF-8 (with/without BOM)
- UTF-16 LE/BE
- ANSEL (Z39.47, legacy GEDCOM 5.x encoding)
- ISO-8859-1 (Latin-1)
- ISO-8859-15 (Latin-9)
- ASCII
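
For the Unicode encodings in this list, a byte-order mark at the start of the file is one signal a reader can use. The sketch below checks for the three common BOMs. `BomEncoding` and `detect_bom` are hypothetical names for illustration; `ged_io`'s own detection logic may work differently and also has to handle files without a BOM.

```rust
/// Encoding indicated by a byte-order mark.
/// Hypothetical enum for illustration; not part of `ged_io`.
#[derive(Debug, PartialEq)]
enum BomEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Returns the encoding indicated by a leading BOM, if any.
fn detect_bom(bytes: &[u8]) -> Option<BomEncoding> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(BomEncoding::Utf8)
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some(BomEncoding::Utf16Le)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some(BomEncoding::Utf16Be)
    } else {
        None
    }
}
```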
---
## Command Line Tool
A CLI tool is included for quick GEDCOM inspection:
```bash
# Install
cargo install ged_io
# Help
ged_io --help
```
Help output:
```
ged_io - GEDCOM inspection tool
USAGE:
ged_io <file.ged>
ged_io --individual <XREF> <file.ged>
ged_io --individual-lastname <LASTNAME> <file.ged>
ged_io --individual-firstname <FIRSTNAME> <file.ged>
OPTIONS:
-h, --help Print this help
--individual <XREF> Display a single individual (e.g. @I1@)
--individual-lastname <LASTNAME> Filter individuals by last name (case-insensitive)
--individual-firstname <FIRSTNAME> Filter individuals by first name (case-insensitive)
NOTES:
If both --individual-lastname and --individual-firstname are set,
individuals matching BOTH filters are listed.
```
Example with one file:
```bash
# Analyze a file
ged_io family.ged
```
Example output for `tests/fixtures/sample.ged`:
```
----------------------
submissions: 0
submitters: 1
individuals: 3
families: 2
repositories: 1
sources (records): 1
source citations: 1
multimedia: 0
shared_notes: 0
----------------------
on individuals: 0
on events: 1
on attributes: 0
on families: 0
on names: 0
on other: 0
----------------------
```
---
## Building from Source
```bash
# Clone the repository
git clone https://github.com/ge3224/ged_io.git
cd ged_io
# Build
cargo build --release
# Run tests
cargo test --all-features
# Run benchmarks
cargo bench
# Check code quality
cargo clippy --all-targets --all-features -- -D warnings
```
---
## Performance
Criterion benchmarks (`cargo bench`) on this repo's fixtures:
| Fixture | API | Time |
|---|---|---|
| `tests/fixtures/simple.ged` | `GedcomBuilder::build_from_str` | ~10.58 µs |
| `tests/fixtures/sample.ged` | `GedcomBuilder::build_from_str` | ~22.27 µs |
| `tests/fixtures/washington.ged` | `GedcomBuilder::build_from_str` | ~2.93 ms |
Notes:
- The "original" API (`Gedcom::new`) is faster in parsing-only benches (~8.30 µs / ~18.06 µs / ~2.74 ms for the three fixtures above, in the same order) because it does less validation/configuration work.
- Round-tripping (parse + write) is benchmarked separately in `benches/memory.rs`.
- Numbers vary by CPU, Rust version, and enabled features.
---
## Documentation
- [API Documentation](https://docs.rs/ged_io) - Full API reference
- [MIGRATION.md](MIGRATION.md) - GEDCOM 5.5.1 to 7.0 migration guide
- [ROADMAP.md](ROADMAP.md) - Project roadmap and planned features
- GEDCOM specifications (bundled in this repo):
  - [GEDCOM 7.0 Specification (PDF)](docs/FamilySearchGEDCOMv7.pdf)
  - [GEDCOM 5.5.1 Specification (PDF)](docs/ged551.pdf)
---
## Contributing
Contributions are welcome! Areas where help is appreciated:
- Bug reports and feature requests
- Additional test cases and edge cases
- Documentation improvements
- Performance optimizations
Please feel free to open issues or submit pull requests.
---
## License
This project is licensed under the [MIT License](LICENSE).
---
## Acknowledgments
Originally forked from [`pirtleshell/rust-gedcom`](https://github.com/pirtleshell/rust-gedcom).
GEDCOM is a specification maintained by [FamilySearch](https://www.familysearch.org/).