A library for parsing the CoNLL-U format.
Basic Usage
Parse a sentence in CoNLL-U format and iterate over the Token elements it contains.
The example is taken from the CoNLL-U format description.
use rs_conllu::{parse_sentence, TokenID};

// CoNLL-U fields are separated by tab characters, written here as `\t`.
let s = "# sent_id = 1
# text = They buy and sell books.
1\tThey\tthey\tPRON\tPRP\tCase=Nom|Number=Plur\t2\tnsubj\t2:nsubj|4:nsubj\t_
2\tbuy\tbuy\tVERB\tVBP\tNumber=Plur|Person=3|Tense=Pres\t0\troot\t0:root\t_
3\tand\tand\tCCONJ\tCC\t_\t4\tcc\t4:cc\t_
4\tsell\tsell\tVERB\tVBP\tNumber=Plur|Person=3|Tense=Pres\t2\tconj\t0:root|2:conj\t_
5\tbooks\tbook\tNOUN\tNNS\tNumber=Plur\t2\tobj\t2:obj|4:obj\tSpaceAfter=No
6\t.\t.\tPUNCT\t.\t_\t2\tpunct\t2:punct\t_
";

let sentence = parse_sentence(s).unwrap();
let mut token_iter = sentence.into_iter();

// The first token carries a single-word ID of 1...
assert_eq!(token_iter.next().unwrap().id, TokenID::Single(1));
// ...and the second token's surface form is "buy".
assert_eq!(token_iter.next().unwrap().form, "buy".to_owned());
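Each token line in the example consists of ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The ID field is either a single integer (`2`), a multiword-token range (`3-4`), or an empty-node decimal (`4.1`), and FEATS holds `Key=Value` pairs joined by `|`, or `_` when empty. The standalone sketch below illustrates that split for ID, FORM and FEATS only. The types `Id` and `Line` and the functions `parse_id` and `split_token_line` are hypothetical illustrations, not rs_conllu's actual `Token` or `TokenID` definitions.

```rust
use std::collections::HashMap;

/// Hypothetical ID type mirroring the three shapes CoNLL-U allows
/// (not rs_conllu's own `TokenID`).
#[derive(Debug, PartialEq)]
enum Id {
    Single(usize),       // "4"
    Range(usize, usize), // "3-4", a multiword token
    Empty(usize, usize), // "4.1", an empty node
}

/// Parse the ID column; returns None if it is not a valid ID.
fn parse_id(s: &str) -> Option<Id> {
    if let Some((a, b)) = s.split_once('-') {
        return Some(Id::Range(a.parse().ok()?, b.parse().ok()?));
    }
    if let Some((a, b)) = s.split_once('.') {
        return Some(Id::Empty(a.parse().ok()?, b.parse().ok()?));
    }
    s.parse().ok().map(Id::Single)
}

/// Hypothetical subset of a token: ID, FORM and the FEATS map.
#[derive(Debug)]
struct Line {
    id: Id,
    form: String,
    feats: HashMap<String, String>,
}

fn split_token_line(line: &str) -> Option<Line> {
    // A token line must have exactly ten tab-separated fields.
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() != 10 {
        return None;
    }
    // FEATS (column 6) is "_" when empty, otherwise "Key=Value|Key=Value".
    let feats = if fields[5] == "_" {
        HashMap::new()
    } else {
        fields[5]
            .split('|')
            .filter_map(|kv| kv.split_once('='))
            .map(|(k, v)| (k.to_owned(), v.to_owned()))
            .collect()
    };
    Some(Line {
        id: parse_id(fields[0])?,
        form: fields[1].to_owned(),
        feats,
    })
}

fn main() {
    let line = "2\tbuy\tbuy\tVERB\tVBP\tNumber=Plur|Person=3|Tense=Pres\t0\troot\t0:root\t_";
    let tok = split_token_line(line).expect("valid token line");
    assert_eq!(tok.id, Id::Single(2));
    assert_eq!(tok.form, "buy");
    assert_eq!(tok.feats.get("Person").map(String::as_str), Some("3"));
    println!("{:?}", tok);
}
```

Returning `Option` keeps the sketch short; a real parser would more likely report which field failed and why.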
Re-exports
pub use crate::parsers::parse_file;
pub use crate::parsers::parse_sentence;
pub use crate::parsers::parse_token;
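Above the sentence level, a CoNLL-U file is a sequence of sentences: each sentence is a block of `#` comment lines followed by token lines, and sentences are separated by a blank line. The sketch below shows only that grouping step, which is presumably the kind of work a file-level entry point such as `parse_file` has to do; `split_sentences` is a hypothetical helper, not rs_conllu's code, and it does not validate the token lines themselves.

```rust
/// Hypothetical helper: group CoNLL-U text into sentences.
/// Each sentence is returned as (comment lines, token lines).
fn split_sentences(text: &str) -> Vec<(Vec<&str>, Vec<&str>)> {
    let mut sentences = Vec::new();
    let mut comments = Vec::new();
    let mut tokens = Vec::new();
    for line in text.lines() {
        if line.trim().is_empty() {
            // A blank line closes the current sentence; repeated blanks are ignored.
            if !comments.is_empty() || !tokens.is_empty() {
                sentences.push((std::mem::take(&mut comments), std::mem::take(&mut tokens)));
            }
        } else if line.starts_with('#') {
            comments.push(line);
        } else {
            tokens.push(line);
        }
    }
    // Tolerate a final sentence that lacks a trailing blank line.
    if !comments.is_empty() || !tokens.is_empty() {
        sentences.push((comments, tokens));
    }
    sentences
}

fn main() {
    let text = "# sent_id = 1\n1\tHi\thi\tINTJ\tUH\t_\t0\troot\t0:root\t_\n\n\
                # sent_id = 2\n1\tBye\tbye\tINTJ\tUH\t_\t0\troot\t0:root\t_\n";
    let sentences = split_sentences(text);
    assert_eq!(sentences.len(), 2);
    assert_eq!(sentences[1].0, vec!["# sent_id = 2"]);
    println!("{} sentences", sentences.len());
}
```

Each token line in a group could then be handed to a per-token parser.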
Modules
Structs
Enums
- The set of Universal POS tags according to UD version 2.
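In UD version 2 the UPOS column uses a closed set of 17 tags: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB and X. A self-contained sketch of such an enum with string parsing follows. The name `Upos` and the `String` error type are assumptions for illustration; the crate's own enum (whose name is not shown here) may be defined differently.

```rust
use std::str::FromStr;

/// Hypothetical enum of the 17 UD v2 universal POS tags
/// (not rs_conllu's own definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Upos {
    Adj, Adp, Adv, Aux, Cconj, Det, Intj, Noun, Num,
    Part, Pron, Propn, Punct, Sconj, Sym, Verb, X,
}

impl FromStr for Upos {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        use Upos::*;
        // Tags appear in upper case in the UPOS column; matching is case-sensitive.
        Ok(match s {
            "ADJ" => Adj,
            "ADP" => Adp,
            "ADV" => Adv,
            "AUX" => Aux,
            "CCONJ" => Cconj,
            "DET" => Det,
            "INTJ" => Intj,
            "NOUN" => Noun,
            "NUM" => Num,
            "PART" => Part,
            "PRON" => Pron,
            "PROPN" => Propn,
            "PUNCT" => Punct,
            "SCONJ" => Sconj,
            "SYM" => Sym,
            "VERB" => Verb,
            "X" => X,
            _ => return Err(format!("unknown UPOS tag: {}", s)),
        })
    }
}

fn main() {
    // UPOS values from the example sentence.
    let tags: Result<Vec<Upos>, _> = ["PRON", "VERB", "CCONJ", "NOUN", "PUNCT"]
        .iter()
        .map(|t| t.parse::<Upos>())
        .collect();
    assert_eq!(
        tags.unwrap(),
        vec![Upos::Pron, Upos::Verb, Upos::Cconj, Upos::Noun, Upos::Punct]
    );
    // Language-specific XPOS tags such as "NNS" are not UPOS tags.
    assert!("NNS".parse::<Upos>().is_err());
}
```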