udpipe-rs
Rust bindings for UDPipe — a trainable pipeline for tokenization, tagging, lemmatization, and dependency parsing using Universal Dependencies.
Features
- Full parsing pipeline: Tokenization, POS tagging, lemmatization, and dependency parsing
- Universal Dependencies: Output follows the UD annotation scheme
- Model download utility: Easy download of pre-trained models for 65+ languages (optional)
- Thread-safe: Models can be shared across threads
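Sharing one loaded model across threads usually means wrapping it in an `Arc`. The sketch below illustrates only that pattern. `StandInModel` and `count_in_parallel` are hypothetical placeholders, not part of this crate, and the sketch assumes the real model type can be shared by reference across threads, as the feature list states.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for a loaded model (hypothetical; NOT the crate's `Model` type).
struct StandInModel;

impl StandInModel {
    /// Placeholder for parsing: counts whitespace-separated tokens.
    fn token_count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Spawns one worker per text; every worker holds a clone of the same `Arc`.
fn count_in_parallel(model: Arc<StandInModel>, texts: Vec<String>) -> Vec<usize> {
    let handles: Vec<_> = texts
        .into_iter()
        .map(|text| {
            let model = Arc::clone(&model);
            thread::spawn(move || model.token_count(&text))
        })
        .collect();

    // Join in spawn order so results line up with the input texts.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let model = Arc::new(StandInModel);
    let texts = vec!["The quick brown fox".to_string(), "jumps over".to_string()];
    println!("{:?}", count_in_parallel(model, texts));
}
```

Loading the model once and cloning the `Arc` avoids re-reading the model file in every thread.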
Installation
Add to your Cargo.toml:
[dependencies]
udpipe-rs = "0.1"
Or install via command line:
cargo add udpipe-rs
Usage
Download and load a model
use ;
Output:
1 The DET the 2 <- det
2 quick ADJ quick 5 <- amod
3 brown ADJ brown 5 <- amod
4 fox NOUN fox 5 <- nsubj
5 jumps VERB jump 0 <- root
6 over ADP over 9 <- case
7 the DET the 9 <- det
8 lazy ADJ lazy 9 <- amod
9 dog NOUN dog 5 <- obl
10 . PUNCT . 5 <- punct
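Each output line above appears to list the word's id, form, universal POS tag, lemma, head index, and dependency relation. As a minimal sketch of how such a line could be produced, the code below uses a local `Word` struct that mirrors a few of the fields documented in the API Reference. The struct and `format_word` are illustrative assumptions, not the crate's own definitions.

```rust
/// Local mirror of some documented `Word` fields (illustrative only).
struct Word {
    id: i32,
    form: String,
    lemma: String,
    upostag: String,
    head: i32,
    deprel: String,
}

/// Formats one word as: id form UPOS lemma head <- deprel
fn format_word(w: &Word) -> String {
    format!(
        "{} {} {} {} {} <- {}",
        w.id, w.form, w.upostag, w.lemma, w.head, w.deprel
    )
}

fn main() {
    let w = Word {
        id: 5,
        form: "jumps".to_string(),
        lemma: "jump".to_string(),
        upostag: "VERB".to_string(),
        head: 0,
        deprel: "root".to_string(),
    };
    println!("{}", format_word(&w));
}
```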
Available languages
Pre-trained models are available for 65+ languages. Use udpipe_rs::AVAILABLE_MODELS to see the full list:
// Some examples:
// "english-ewt", "english-gum", "english-lines", "english-partut"
// "german-gsd", "german-hdt"
// "french-gsd", "french-sequoia", "french-spoken"
// "spanish-ancora", "spanish-gsd"
// "dutch-alpino", "dutch-lassysmall"
// "chinese-gsd", "japanese-gsd", "korean-gsd"
// ... and many more
for lang in AVAILABLE_MODELS {
    println!("{}", lang);
}
Working with morphological features
use udpipe_rs::Model;
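Morphological features follow the Universal Dependencies convention of `Key=Value` pairs joined by `|`, for example `"Mood=Imp|VerbForm=Fin"`. The sketch below shows one way such a string could be queried. The free functions `get_feature` and `has_feature` are stand-alone re-implementations written for illustration. They are not the crate's methods, whose exact signatures may differ.

```rust
/// Returns the value for `key` in a feature string like "Mood=Imp|VerbForm=Fin".
fn get_feature<'a>(feats: &'a str, key: &str) -> Option<&'a str> {
    feats
        .split('|')
        // Entries without '=' (such as the empty marker "_") are skipped.
        .filter_map(|pair| pair.split_once('='))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}

/// True if the feature string contains exactly `key=value`.
fn has_feature(feats: &str, key: &str, value: &str) -> bool {
    get_feature(feats, key) == Some(value)
}

fn main() {
    let feats = "Mood=Imp|VerbForm=Fin";
    println!("{:?}", get_feature(feats, "Mood"));
    println!("{}", has_feature(feats, "VerbForm", "Fin"));
}
```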
Working with sentence structure
use udpipe_rs::Model;
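Sentence structure can be recovered from the `id`, `head`, and `sentence_id` fields: the root of a sentence is the word whose `head` is 0, and a word's dependents are the words in the same sentence whose `head` equals its `id`. The sketch below works on a local `Word` struct carrying only those fields. The struct and the helpers `word`, `root_of`, and `dependents_of` are hypothetical and exist only for this example.

```rust
/// Local mirror of the documented fields needed for tree navigation.
struct Word {
    id: i32,          // 1-based index within the sentence
    form: String,
    head: i32,        // 0 means this word is the sentence root
    sentence_id: i32, // 0-based sentence index
}

/// Convenience constructor for the example data.
fn word(id: i32, form: &str, head: i32, sentence_id: i32) -> Word {
    Word { id, form: form.to_string(), head, sentence_id }
}

/// Finds the root word of the given sentence, if any.
fn root_of(words: &[Word], sentence_id: i32) -> Option<&Word> {
    words
        .iter()
        .find(|w| w.sentence_id == sentence_id && w.head == 0)
}

/// Collects the direct dependents of `head` within the same sentence.
fn dependents_of<'a>(words: &'a [Word], head: &Word) -> Vec<&'a Word> {
    words
        .iter()
        .filter(|w| w.sentence_id == head.sentence_id && w.head == head.id)
        .collect()
}

fn main() {
    // A subset of the example sentence shown earlier.
    let words = vec![
        word(4, "fox", 5, 0),
        word(5, "jumps", 0, 0),
        word(9, "dog", 5, 0),
        word(10, ".", 5, 0),
    ];
    if let Some(root) = root_of(&words, 0) {
        let deps = dependents_of(&words, root);
        let forms: Vec<&str> = deps.iter().map(|w| w.form.as_str()).collect();
        println!("root = {}, dependents = {:?}", root.form, forms);
    }
}
```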
Download from custom URL
If you need to download from a different source:
use udpipe_rs::download_model_from_url;
download_model_from_url.expect;
API Reference
Word struct
Each parsed word contains:
| Field | Type | Description |
|---|---|---|
| form | String | The surface form (actual text) |
| lemma | String | The lemma (dictionary form) |
| upostag | String | Universal POS tag (NOUN, VERB, ADJ, etc.) |
| xpostag | String | Language-specific POS tag |
| feats | String | Morphological features (e.g., "Mood=Imp\|VerbForm=Fin") |
| deprel | String | Dependency relation (root, nsubj, obj, etc.) |
| misc | String | Miscellaneous annotations (e.g., "SpaceAfter=No") |
| id | i32 | 1-based index of this word within its sentence |
| head | i32 | Index of head word (0 = root of sentence) |
| sentence_id | i32 | 0-based index of the sentence this word belongs to |
Helper methods on Word
- has_feature(key, value): Check if a morphological feature is present
- get_feature(key): Get the value of a morphological feature
- is_verb(): Returns true for VERB or AUX tags
- is_noun(): Returns true for NOUN or PROPN tags
- is_adjective(): Returns true for ADJ tag
- is_punct(): Returns true for PUNCT tag
- is_root(): Returns true if this word is the sentence root
- space_after(): Returns true if there's a space after this word (default)
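To make the tag predicates and `space_after()` concrete, here is a minimal sketch of equivalent logic written as free functions over a local `Word` struct. It assumes that a missing space is signalled by `SpaceAfter=No` in `misc`, as in the field table above. None of this is the crate's actual implementation, and `detokenize` and `word` are hypothetical helpers added for the example.

```rust
/// Local mirror of the documented fields used by these helpers.
struct Word {
    form: String,
    upostag: String,
    misc: String,
}

/// Convenience constructor for the example data.
fn word(form: &str, upostag: &str, misc: &str) -> Word {
    Word {
        form: form.to_string(),
        upostag: upostag.to_string(),
        misc: misc.to_string(),
    }
}

/// True for VERB or AUX tags.
fn is_verb(w: &Word) -> bool {
    matches!(w.upostag.as_str(), "VERB" | "AUX")
}

/// True for the PUNCT tag.
fn is_punct(w: &Word) -> bool {
    w.upostag == "PUNCT"
}

/// True unless `misc` carries "SpaceAfter=No" (a space is the default).
fn space_after(w: &Word) -> bool {
    !w.misc.split('|').any(|m| m == "SpaceAfter=No")
}

/// Rebuilds text from words, honouring the spacing information.
fn detokenize(words: &[Word]) -> String {
    let mut out = String::new();
    for (i, w) in words.iter().enumerate() {
        out.push_str(&w.form);
        // No trailing space after the final word.
        if space_after(w) && i + 1 < words.len() {
            out.push(' ');
        }
    }
    out
}

fn main() {
    let words = vec![
        word("lazy", "ADJ", "_"),
        word("dog", "NOUN", "SpaceAfter=No"),
        word(".", "PUNCT", "_"),
    ];
    println!("{}", detokenize(&words));
}
```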
Examples
# Download a model
# Parse text
Models
Pre-trained models for 100+ treebanks are available from the LINDAT/CLARIAH-CZ repository. The download_model function fetches from this repository automatically.
Build requirements
- C++ compiler with C++11 support
The build script automatically downloads the UDPipe source code and compiles it as a static library. No external tools are required.
License
This crate is dual-licensed under MIT OR Apache-2.0.
UDPipe itself is licensed under the Mozilla Public License 2.0.