Core library for working with UniMorph morphological data.
This crate provides types, storage, and query capabilities for UniMorph datasets. It is designed for high-performance lookups (conjugation/declension queries) while also supporting bulk export for ML pipelines.
§Quick Start
use unimorph_core::{Repository, Store};

#[tokio::main]
async fn main() -> unimorph_core::Result<()> {
    // Initialize repository (handles downloads and caching)
    let repo = Repository::new()?;

    // Ensure Italian dataset is available
    repo.ensure("ita").await?;

    // Open the store for queries
    let store = repo.store()?;

    // Look up all forms of "parlare"
    for entry in store.inflect("ita", "parlare")? {
        println!("{} -> {} ({})", entry.lemma, entry.form, entry.features);
    }

    // Reverse lookup: what lemmas produce "parlo"?
    for entry in store.analyze("ita", "parlo")? {
        println!("{} <- {} ({})", entry.form, entry.lemma, entry.features);
    }

    Ok(())
}
§Architecture
- SQLite backend: All data stored in a single file at ~/.cache/unimorph/datasets.db
- Pre-computed stats: Aggregate statistics cached at import time
- Iterator-based queries: Results stream from SQLite rather than being collected up front, so large datasets do not exhaust memory
- Parquet export: For users who need DataFrame integration
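To make the iterator-based query idea concrete, here is a minimal, standard-library-only sketch of forward and reverse lookups over UniMorph-style rows (tab-separated lemma, form, and semicolon-delimited features). It is not this crate's implementation: `ToyEntry`, `ToyStore`, `parse_line`, and `sample_store` are hypothetical names, the data lives in a `Vec` rather than SQLite, and the two sample rows are made up for illustration. Only the field names `lemma`, `form`, and `features` are taken from the Quick Start.

```rust
/// Hypothetical stand-in for the crate's `Entry`; only the field names
/// (`lemma`, `form`, `features`) come from the Quick Start example.
#[derive(Debug, Clone, PartialEq)]
struct ToyEntry {
    lemma: String,
    form: String,
    features: String,
}

/// Parse one UniMorph-style TSV row: lemma<TAB>form<TAB>features.
/// Returns `None` for rows that do not have exactly three non-empty columns.
fn parse_line(line: &str) -> Option<ToyEntry> {
    let mut cols = line.split('\t');
    let lemma = cols.next()?.trim();
    let form = cols.next()?.trim();
    let features = cols.next()?.trim();
    if lemma.is_empty() || form.is_empty() || features.is_empty() || cols.next().is_some() {
        return None;
    }
    Some(ToyEntry {
        lemma: lemma.to_string(),
        form: form.to_string(),
        features: features.to_string(),
    })
}

/// Hypothetical in-memory store (assumption: the real `Store` is backed by SQLite).
struct ToyStore {
    entries: Vec<ToyEntry>,
}

impl ToyStore {
    /// Build a store from TSV text, silently skipping malformed rows.
    fn from_tsv(data: &str) -> Self {
        ToyStore {
            entries: data.lines().filter_map(parse_line).collect(),
        }
    }

    /// Forward lookup: lazily yield every entry whose lemma matches.
    fn inflect<'a>(&'a self, lemma: &'a str) -> impl Iterator<Item = &'a ToyEntry> + 'a {
        self.entries.iter().filter(move |e| e.lemma == lemma)
    }

    /// Reverse lookup: lazily yield every entry whose surface form matches.
    fn analyze<'a>(&'a self, form: &'a str) -> impl Iterator<Item = &'a ToyEntry> + 'a {
        self.entries.iter().filter(move |e| e.form == form)
    }
}

/// Illustrative sample data: two valid rows and one malformed row.
fn sample_store() -> ToyStore {
    ToyStore::from_tsv(
        "parlare\tparlo\tV;IND;PRS;1;SG\nparlare\tparli\tV;IND;PRS;2;SG\nnot a valid row\n",
    )
}

fn main() {
    let store = sample_store();
    for e in store.inflect("parlare") {
        println!("{} -> {} ({})", e.lemma, e.form, e.features);
    }
    for e in store.analyze("parlo") {
        println!("{} <- {} ({})", e.form, e.lemma, e.features);
    }
}
```

Because both lookups return iterators, a caller can stop after the first match or chain further adapters without first materializing every result, which is the property the architecture list attributes to the SQLite-backed queries.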
Re-exports§
pub use error::Error;
pub use error::Result;
pub use query::QueryBuilder;
pub use repository::DownloadPhase;
pub use repository::DownloadProgress;
pub use repository::Repository;
pub use store::Store;
pub use types::CompressionFormat;
pub use types::DatasetStats;
pub use types::Entry;
pub use types::FeatureBundle;
pub use types::LangCode;
pub use types::MalformedEntry;
pub use types::ParseReport;