Crate parse_wiktionary_de
Parse dictionary pages from the German language edition of Wiktionary into structured data.
For general information about Parse Wiktionary, see the readme file.
Examples
This example prints all usage examples found in an article, together with the language and the first part of speech of the entry.
let title = "Ausstellungswohnen";
let wiki_text = concat!(
    "==Ausstellungswohnen ({{Sprache|Deutsch}})==\n",
    "==={{Wortart|Substantiv|Deutsch}}===\n",
    "{{Beispiele}}\n",
    r#":Das Haus sei geeignet bloß zum „Ausstellungswohnen", vermuteten Kritiker, "#,
    "die es meist nur in schwarz-weißen Zeitschriftenfotos betreten hatten."
);
// Parse the raw wiki text into nodes, then turn the nodes into a structured article.
let configuration = parse_wiktionary_de::create_configuration();
let parsed_wiki_text = configuration.parse(wiki_text);
let parsed_article = parse_wiktionary_de::parse(title, wiki_text, &parsed_wiki_text.nodes);
for language_entry in parsed_article.language_entries {
    for pos_entry in language_entry.pos_entries {
        for example in pos_entry.examples {
            println!(
                "The word '{title}' of language {language:?} and part of speech {pos:?} has the example: {example}",
                title = title,
                language = language_entry.language,
                pos = pos_entry.pos,
                // Keep only plain text nodes; every other node kind contributes nothing.
                example = &example.example.iter().map(|node| match node {
                    parse_wiktionary_de::Flowing::Text { value } => value,
                    _ => ""
                }).collect::<String>()
            );
        }
    }
}
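The closure passed to `map` in the example keeps text nodes and drops everything else. The following is a minimal, self-contained sketch of that flattening step. It uses a stand-in enum instead of the crate's own `Flowing` type, so it runs without the crate. The names `Node`, `Link` and `flatten_text` are hypothetical and exist only for illustration.

```rust
/// Stand-in for a flowing-text node. This is NOT the crate's `Flowing` type;
/// the `Link` variant is a hypothetical example of a non-text node.
#[allow(dead_code)]
#[derive(Debug)]
enum Node<'a> {
    Text { value: &'a str },
    Link { target: &'a str },
}

/// Concatenates the text nodes in order and skips every other node kind.
fn flatten_text(nodes: &[Node]) -> String {
    nodes
        .iter()
        .map(|node| match node {
            Node::Text { value } => *value,
            _ => "",
        })
        .collect()
}

fn main() {
    let nodes = [
        Node::Text { value: "Das Haus " },
        Node::Link { target: "Ausstellung" },
        Node::Text { value: "steht leer." },
    ];
    // The link node is dropped, so only the two text fragments remain.
    println!("{}", flatten_text(&nodes));
}
```

Dropping non-text nodes silently keeps the example short, but it also loses link labels and formatting. A consumer that needs the full sentence would have to handle the other node kinds explicitly.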
Limitations
Parameters of overview templates are transferred to the output with minimal validation and processing. Due to the wide variety of overview templates that take parameters in highly complicated and inconsistent formats, fully validating and parsing these parameters is not feasible.
The translations in the template Vorlage:Ü-Tabelle in the section Übersetzungen are not parsed. The format of translations is highly complicated, so it is better not to parse them at all than to produce inconsistent results. Because translation tables that contain empty translations are common, the output does not even indicate whether an entry has translations.
The templates Ähnlichkeiten 1 and Ähnlichkeiten 2 are not parsed, because it is unclear what purpose they serve and what format their parameters must have.
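Because overview template parameters reach the output with minimal validation, callers may want to check them before use. The following is a rough sketch of such a check on the consumer side. It assumes that parameters are available as name/value string pairs. The `Parameter` struct and the `lookup` function are hypothetical and do not reflect the crate's actual output types, and the parameter names are illustrative.

```rust
/// Hypothetical stand-in for one overview template parameter.
/// The crate's real representation may differ.
struct Parameter<'a> {
    name: &'a str,
    value: &'a str,
}

/// Returns the trimmed value of the first parameter called `name`,
/// or `None` if that parameter is missing or its value is blank.
fn lookup<'a>(parameters: &[Parameter<'a>], name: &str) -> Option<&'a str> {
    parameters
        .iter()
        .find(|parameter| parameter.name.trim() == name)
        .map(|parameter| parameter.value.trim())
        .filter(|value| !value.is_empty())
}

fn main() {
    let parameters = [
        Parameter { name: "Genus", value: " n " },
        Parameter { name: "Nominativ Plural", value: "" },
    ];
    // Surrounding whitespace is trimmed from the value.
    assert_eq!(lookup(&parameters, "Genus"), Some("n"));
    // A blank value is treated like a missing parameter.
    assert_eq!(lookup(&parameters, "Nominativ Plural"), None);
    assert_eq!(lookup(&parameters, "Dativ Singular"), None);
}
```

Treating blank values as missing is a design choice for this sketch. Some consumers may prefer to distinguish "present but empty" from "absent".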