unimorph-core 0.1.1

Core library for working with UniMorph morphological data.

This crate provides types, storage, and query capabilities for UniMorph datasets. It is designed for high-performance lookups (conjugation/declension queries) while also supporting bulk export for ML pipelines.
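To make the data model concrete: UniMorph datasets are distributed as tab-separated text with three columns (lemma, inflected form, and a semicolon-separated feature bundle). The sketch below parses one such line using only the standard library. The `Entry` struct and `parse_line` function are hypothetical names chosen for illustration and are not part of the `unimorph_core` API; the crate's own types may be shaped differently.

```rust
/// Hypothetical record type for illustration only; not the crate's entry type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub lemma: String,
    pub form: String,
    pub features: Vec<String>,
}

/// Parse one UniMorph-style line: `lemma<TAB>form<TAB>FEAT;FEAT;...`.
/// Returns `None` for blank lines or lines with fewer than three columns.
pub fn parse_line(line: &str) -> Option<Entry> {
    let mut cols = line.split('\t');
    let lemma = cols.next()?.trim();
    let form = cols.next()?.trim();
    let features = cols.next()?.trim();
    if lemma.is_empty() || form.is_empty() || features.is_empty() {
        return None;
    }
    Some(Entry {
        lemma: lemma.to_string(),
        form: form.to_string(),
        // Each tag in the bundle is separated by a semicolon.
        features: features.split(';').map(str::to_string).collect(),
    })
}
```

For instance, `parse_line("parlare\tparlo\tV;IND;PRS;1;SG")` yields an entry with lemma `parlare`, form `parlo`, and five feature tags.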

Quick Start

use unimorph_core::{Repository, Store};

#[tokio::main]
async fn main() -> unimorph_core::Result<()> {
    // Initialize repository (handles downloads and caching)
    let repo = Repository::new()?;

    // Ensure Italian dataset is available
    repo.ensure("ita").await?;

    // Open the store for queries
    let store = repo.store()?;

    // Look up all forms of "parlare"
    for entry in store.inflect("ita", "parlare")? {
        println!("{} -> {} ({})", entry.lemma, entry.form, entry.features);
    }

    // Reverse lookup: what lemmas produce "parlo"?
    for entry in store.analyze("ita", "parlo")? {
        println!("{} <- {} ({})", entry.form, entry.lemma, entry.features);
    }
    Ok(())
}

Architecture

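  • SQLite backend: All data is stored in a single file at ~/.cache/unimorph/datasets.db
  • Pre-computed stats: Aggregate statistics are computed and cached at import time, so reading them does not rescan the data
  • Iterator-based queries: Results stream from SQLite row by row instead of being collected up front, so large datasets do not exhaust memory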
  • Parquet export: For users who need DataFrame integration
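
The design points above can be illustrated with a small in-memory stand-in. This is only a sketch of the ideas (statistics maintained during import, lookups returned as lazy iterators), not the crate's implementation, which is backed by SQLite. `MemStore`, `Row`, and `Stats` are hypothetical names; the `inflect` and `analyze` methods merely mirror the names used in the Quick Start and do not reproduce their real signatures.

```rust
use std::collections::BTreeMap;

/// Hypothetical row type; not the crate's own entry type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub lemma: String,
    pub form: String,
    pub features: String,
}

/// Aggregates that are updated once, while rows are imported.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Stats {
    pub entries: usize,
    pub lemmas: usize,
}

/// In-memory stand-in for the store (assumption: the real backend is SQLite).
#[derive(Debug, Default)]
pub struct MemStore {
    by_lemma: BTreeMap<String, Vec<Row>>,
    stats: Stats,
}

impl MemStore {
    /// Import rows and update the cached statistics in the same pass.
    pub fn import<I: IntoIterator<Item = Row>>(&mut self, rows: I) {
        for row in rows {
            let bucket = self.by_lemma.entry(row.lemma.clone()).or_default();
            if bucket.is_empty() {
                // First row seen for this lemma.
                self.stats.lemmas += 1;
            }
            bucket.push(row);
            self.stats.entries += 1;
        }
    }

    /// Reading the stats is a field access; nothing is recounted.
    pub fn stats(&self) -> &Stats {
        &self.stats
    }

    /// Forward lookup: lazily yield every stored row for `lemma`.
    pub fn inflect<'a>(&'a self, lemma: &str) -> impl Iterator<Item = &'a Row> + 'a {
        self.by_lemma.get(lemma).into_iter().flatten()
    }

    /// Reverse lookup: lazily yield every row whose surface form equals `form`.
    /// This scans all rows; a real store would use an index on the form column.
    pub fn analyze<'a>(&'a self, form: &'a str) -> impl Iterator<Item = &'a Row> + 'a {
        self.by_lemma
            .values()
            .flatten()
            .filter(move |row| row.form == form)
    }
}
```

Because both lookups return iterators, a caller can stop early (for example with `.take(10)`) without the remaining rows ever being materialized, which is the property the iterator-based design relies on.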