Crate unimorph_core


Core library for working with UniMorph morphological data.

This crate provides types, storage, and query capabilities for UniMorph datasets. It is designed for high-performance lookups (conjugation/declension queries) while also supporting bulk export for ML pipelines.

§Quick Start

use unimorph_core::{Repository, Store};

#[tokio::main]
async fn main() -> unimorph_core::Result<()> {
    // Initialize repository (handles downloads and caching)
    let repo = Repository::new()?;

    // Ensure Italian dataset is available
    repo.ensure("ita").await?;

    // Open the store for queries
    let store = repo.store()?;

    // Look up all forms of "parlare"
    for entry in store.inflect("ita", "parlare")? {
        println!("{} -> {} ({})", entry.lemma, entry.form, entry.features);
    }

    // Reverse lookup: what lemmas produce "parlo"?
    for entry in store.analyze("ita", "parlo")? {
        println!("{} <- {} ({})", entry.form, entry.lemma, entry.features);
    }
    Ok(())
}
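
To make the forward (`inflect`) and reverse (`analyze`) lookups above concrete, here is a minimal, std-only sketch of the same idea over in-memory data. It assumes the usual UniMorph layout of tab-separated `lemma`, `form`, and semicolon-separated features. `ToyIndex` and `Triple` are hypothetical names invented for this illustration and are not part of `unimorph_core`, which stores its data in SQLite rather than in hash maps.

```rust
use std::collections::HashMap;

/// One UniMorph triple: lemma, inflected form, semicolon-separated features.
#[derive(Debug, Clone, PartialEq)]
struct Triple {
    lemma: String,
    form: String,
    features: String,
}

/// Hypothetical in-memory index (illustration only, not the crate's storage).
#[derive(Default)]
struct ToyIndex {
    rows: Vec<Triple>,
    by_lemma: HashMap<String, Vec<usize>>,
    by_form: HashMap<String, Vec<usize>>,
}

impl ToyIndex {
    /// Parse `lemma<TAB>form<TAB>features` lines; lines without exactly
    /// three columns are skipped.
    fn from_tsv(tsv: &str) -> Self {
        let mut index = ToyIndex::default();
        for line in tsv.lines() {
            let cols: Vec<&str> = line.split('\t').collect();
            if cols.len() != 3 {
                continue;
            }
            let id = index.rows.len();
            index.by_lemma.entry(cols[0].to_string()).or_default().push(id);
            index.by_form.entry(cols[1].to_string()).or_default().push(id);
            index.rows.push(Triple {
                lemma: cols[0].into(),
                form: cols[1].into(),
                features: cols[2].into(),
            });
        }
        index
    }

    /// Forward lookup: all rows recorded for a lemma.
    fn inflect(&self, lemma: &str) -> Vec<&Triple> {
        self.lookup(&self.by_lemma, lemma)
    }

    /// Reverse lookup: all rows whose surface form matches.
    fn analyze(&self, form: &str) -> Vec<&Triple> {
        self.lookup(&self.by_form, form)
    }

    fn lookup(&self, map: &HashMap<String, Vec<usize>>, key: &str) -> Vec<&Triple> {
        map.get(key)
            .map(|ids| ids.iter().map(|&i| &self.rows[i]).collect())
            .unwrap_or_default()
    }
}

fn main() {
    // Sample rows written for this sketch, not taken from a real dataset file.
    let data = "parlare\tparlo\tV;IND;PRS;1;SG\n\
                parlare\tparli\tV;IND;PRS;2;SG\n\
                mangiare\tmangio\tV;IND;PRS;1;SG\n";
    let index = ToyIndex::from_tsv(data);

    for t in index.inflect("parlare") {
        println!("{} -> {} ({})", t.lemma, t.form, t.features);
    }
    for t in index.analyze("parlo") {
        println!("{} <- {} ({})", t.form, t.lemma, t.features);
    }
}
```

Both lookups share one row table and differ only in which key they index, which is roughly what a pair of database indexes on the lemma and form columns would provide.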

§Architecture

  • SQLite backend: All data stored in a single file at ~/.cache/unimorph/datasets.db
  • Pre-computed stats: Aggregate statistics cached at import time
  • Iterator-based queries: Results stream from SQLite row by row, so large result sets are never loaded into memory all at once
  • Parquet export: For users who need DataFrame integration
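
Two of the points above, pre-computed stats and iterator-based results, can be sketched without SQLite. The example below is an illustration under stated assumptions, not the crate's implementation: `Row`, `Stats`, `stream_rows`, and `import_stats` are hypothetical names, and a `BufRead` over TSV text stands in for the database. It shows rows being parsed lazily, one at a time, and aggregates being computed in a single pass at import time.

```rust
use std::collections::HashSet;
use std::io::{BufRead, Cursor};

/// Hypothetical row type for this sketch (not `unimorph_core::Entry`).
#[derive(Debug, PartialEq)]
struct Row {
    lemma: String,
    form: String,
    features: String,
}

/// Hypothetical aggregate counts computed once, at import time.
#[derive(Debug, Default, PartialEq)]
struct Stats {
    entries: usize,
    lemmas: usize,
    forms: usize,
    feature_bundles: usize,
}

/// Lazily parse rows from a buffered reader. Nothing is collected up front;
/// each row is produced only when the caller asks for the next item.
/// Lines with fewer than three tab-separated columns are skipped.
fn stream_rows<R: BufRead>(reader: R) -> impl Iterator<Item = Row> {
    reader.lines().filter_map(|line| {
        let line = line.ok()?;
        let mut cols = line.split('\t');
        Some(Row {
            lemma: cols.next()?.to_string(),
            form: cols.next()?.to_string(),
            features: cols.next()?.to_string(),
        })
    })
}

/// Single pass over the stream to compute aggregates that can then be cached.
fn import_stats(rows: impl Iterator<Item = Row>) -> Stats {
    let mut lemmas = HashSet::new();
    let mut forms = HashSet::new();
    let mut bundles = HashSet::new();
    let mut entries = 0;
    for row in rows {
        entries += 1;
        lemmas.insert(row.lemma);
        forms.insert(row.form);
        bundles.insert(row.features);
    }
    Stats {
        entries,
        lemmas: lemmas.len(),
        forms: forms.len(),
        feature_bundles: bundles.len(),
    }
}

fn main() {
    // Sample rows written for this sketch.
    let tsv = "parlare\tparlo\tV;IND;PRS;1;SG\n\
               parlare\tparli\tV;IND;PRS;2;SG\n\
               mangiare\tmangio\tV;IND;PRS;1;SG\n";

    // "Import": one pass to compute the aggregates.
    let stats = import_stats(stream_rows(Cursor::new(tsv.as_bytes())));
    println!("{:?}", stats);

    // "Query": rows are pulled on demand; `take(1)` stops reading early.
    for row in stream_rows(Cursor::new(tsv.as_bytes()))
        .filter(|r| r.lemma == "parlare")
        .take(1)
    {
        println!("{} -> {} ({})", row.lemma, row.form, row.features);
    }
}
```

Because the consumer drives the iterator, memory use depends on how many rows the caller holds at once rather than on the size of the dataset; the distinct-value sets used for the stats are the only state that grows during import.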

Re-exports§

pub use error::Error;
pub use error::Result;
pub use query::QueryBuilder;
pub use repository::DownloadPhase;
pub use repository::DownloadProgress;
pub use repository::Repository;
pub use store::Store;
pub use types::CompressionFormat;
pub use types::DatasetStats;
pub use types::Entry;
pub use types::FeatureBundle;
pub use types::LangCode;
pub use types::MalformedEntry;
pub use types::ParseReport;

Modules§

error
Error types for unimorph-core.
export
Export functionality for UniMorph data.
query
Fluent query builder for UniMorph data.
repository
Repository for downloading and caching UniMorph datasets.
store
SQLite-based storage for UniMorph data.
types
Core types for UniMorph data.