unimorph-core 0.1.2

Core library for UniMorph morphological data
unimorph-rs

A Rust toolkit for working with UniMorph morphological data.

What is UniMorph?

UniMorph provides morphological paradigm data for 169+ languages in a unified annotation format. Each entry is a triple of lemma, inflected form, and morphological features:

lemma       form        features
parlare     parlo       V;IND;PRS;1;SG
parlare     parlato     V.PTCP;PST
essere      sono        V;IND;PRS;1;SG
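
To make the triple format concrete, here is a minimal sketch of parsing one line into its three parts. It assumes the raw data files are tab-separated with semicolon-delimited feature tags; the `Entry` struct and `parse_line` function are hypothetical names for illustration, not the types exported by unimorph-core.

```rust
/// Illustrative stand-in for one UniMorph entry (not the crate's own type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub lemma: String,
    pub form: String,
    /// Feature tags in file order, e.g. ["V", "IND", "PRS", "1", "SG"].
    pub features: Vec<String>,
}

/// Parse one tab-separated line of the form `lemma<TAB>form<TAB>features`.
/// Returns `None` unless the line has exactly three non-empty fields.
pub fn parse_line(line: &str) -> Option<Entry> {
    // Drop a trailing newline or carriage return, if any.
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let mut fields = line.split('\t');
    let lemma = fields.next()?;
    let form = fields.next()?;
    let features = fields.next()?;
    // Reject extra columns and empty fields.
    if fields.next().is_some() || lemma.is_empty() || form.is_empty() || features.is_empty() {
        return None;
    }
    Some(Entry {
        lemma: lemma.to_string(),
        form: form.to_string(),
        // Tags are separated by ';'. A dotted tag such as "V.PTCP" stays whole.
        features: features.split(';').map(str::to_string).collect(),
    })
}
```

In this sketch a dotted tag such as `V.PTCP` is kept as a single tag rather than being split into `V` and `PTCP`.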

Installation

Homebrew (macOS/Linux)

brew tap joshrotenberg/brew
brew install unimorph

Cargo

cargo install unimorph-cli

Docker

docker pull ghcr.io/joshrotenberg/unimorph-rs:latest

# Run with persistent data cache
docker run -v ~/.cache/unimorph:/data ghcr.io/joshrotenberg/unimorph-rs download ita
docker run -v ~/.cache/unimorph:/data ghcr.io/joshrotenberg/unimorph-rs inflect ita parlare

From source

git clone https://github.com/joshrotenberg/unimorph-rs
cd unimorph-rs
cargo install --path crates/unimorph-cli

Quick Start

# Download Italian dataset
unimorph download ita

# Look up all forms of a verb
unimorph inflect ita parlare

# Analyze a surface form (reverse lookup)
unimorph analyze ita parlo

# Search with filters
unimorph search ita --lemma "parl*" --contains V,IND

# Dataset statistics
unimorph stats ita

# Export to JSON Lines
unimorph export ita -f jsonl -o italian.jsonl
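
The search filters above can be pictured as two simple predicates. The sketch below is only one plausible reading of them: it assumes `--contains V,IND` means that every listed tag must be present in an entry's features, and that a pattern like `parl*` is a prefix match. The function names are hypothetical and the CLI's real matching rules may differ.

```rust
/// True if every tag in `required` (comma-separated, as on the command line)
/// appears as a whole tag in `features` (semicolon-separated, as in the data).
pub fn contains_all(features: &str, required: &str) -> bool {
    let tags: Vec<&str> = features.split(';').collect();
    required
        .split(',')
        .filter(|t| !t.is_empty())
        .all(|t| tags.contains(&t))
}

/// Match a lemma against a pattern. Only a single trailing `*` is handled,
/// as a prefix wildcard; any other pattern must match exactly.
pub fn lemma_matches(lemma: &str, pattern: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => lemma.starts_with(prefix),
        None => lemma == pattern,
    }
}
```

Under these assumptions, `parlare` with features `V;IND;PRS;1;SG` passes both filters, while `V.PTCP;PST` fails the tag filter because `V.PTCP` is compared as a whole tag.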

Library Usage

use unimorph_core::{Store, Repository, LangCode};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Download dataset if needed
    let repo = Repository::new()?;
    let lang: LangCode = "ita".parse()?;
    repo.ensure_dataset(&lang).await?;

    // Query the data
    let store = repo.store()?;
    
    // Get all forms of a lemma
    for entry in store.inflect(&lang, "parlare")? {
        println!("{} -> {} [{}]", entry.lemma, entry.form, entry.features);
    }

    // Reverse lookup: find lemmas for a surface form
    for entry in store.analyze(&lang, "parlo")? {
        println!("{} <- {} [{}]", entry.form, entry.lemma, entry.features);
    }

    Ok(())
}
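
The two lookups in the example are mirror images: `inflect` goes from a lemma to its forms, and `analyze` goes from a surface form back to candidate lemmas. As a conceptual sketch only, the same idea can be shown with two in-memory hash maps. `MemoryIndex` and this `Entry` are hypothetical stand-ins; the crate itself is described as using a SQLite store, and its actual implementation is not shown here.

```rust
use std::collections::HashMap;

/// Illustrative entry type (not the crate's own).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub lemma: String,
    pub form: String,
    pub features: String,
}

/// Hypothetical in-memory index with forward and reverse lookup.
#[derive(Default)]
pub struct MemoryIndex {
    entries: Vec<Entry>,
    by_lemma: HashMap<String, Vec<usize>>, // lemma -> positions in `entries`
    by_form: HashMap<String, Vec<usize>>,  // form  -> positions in `entries`
}

impl MemoryIndex {
    /// Store one entry and register it under both its lemma and its form.
    pub fn insert(&mut self, lemma: &str, form: &str, features: &str) {
        let id = self.entries.len();
        self.entries.push(Entry {
            lemma: lemma.to_string(),
            form: form.to_string(),
            features: features.to_string(),
        });
        self.by_lemma.entry(lemma.to_string()).or_default().push(id);
        self.by_form.entry(form.to_string()).or_default().push(id);
    }

    /// Forward lookup: all entries whose lemma equals `lemma`.
    pub fn inflect(&self, lemma: &str) -> Vec<&Entry> {
        self.lookup(&self.by_lemma, lemma)
    }

    /// Reverse lookup: all entries whose surface form equals `form`.
    pub fn analyze(&self, form: &str) -> Vec<&Entry> {
        self.lookup(&self.by_form, form)
    }

    fn lookup(&self, index: &HashMap<String, Vec<usize>>, key: &str) -> Vec<&Entry> {
        index
            .get(key)
            .map(|ids| ids.iter().map(|&i| &self.entries[i]).collect::<Vec<_>>())
            .unwrap_or_default()
    }
}
```

Both lookups return a list because neither mapping is one-to-one: a lemma has many forms, and a single surface form can belong to more than one lemma or feature bundle.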

Documentation

Full documentation is available at joshrotenberg.github.io/unimorph-rs.

Project Structure

unimorph-rs/
├── crates/
│   ├── unimorph-core/   # Core library: types, SQLite store, repository
│   └── unimorph-cli/    # Command-line interface
└── docs/                # mdBook documentation

License

Apache-2.0