Parse dictionary pages from the German language edition of Wiktionary into structured data.
For general information about Parse Wiktionary, see the readme file.
§Examples
This example prints all usage examples found in an article, together with the language and the first part of speech of the entry.
let title = "Ausstellungswohnen";
let wiki_text = concat!(
    "==Ausstellungswohnen ({{Sprache|Deutsch}})==\n",
    "==={{Wortart|Substantiv|Deutsch}}===\n",
    "{{Beispiele}}\n",
    r#":Das Haus sei geeignet bloß zum „Ausstellungswohnen", vermuteten Kritiker, "#,
    "die es meist nur in schwarz-weißen Zeitschriftenfotos betreten hatten."
);
// Parse the wiki text into nodes, then turn the nodes into a structured article.
let configuration = parse_wiktionary_de::create_configuration();
let parsed_wiki_text = configuration.parse(wiki_text);
let parsed_article = parse_wiktionary_de::parse(title, wiki_text, &parsed_wiki_text.nodes);
// Walk every language entry, every part of speech in it, and every usage example.
for language_entry in parsed_article.language_entries {
    for pos_entry in language_entry.pos_entries {
        for example in pos_entry.examples {
            println!(
                "The word '{title}' of language {language:?} and part of speech {pos:?} has the example: {example}",
                title = title,
                language = language_entry.language,
                pos = pos_entry.pos,
                // Keep only plain text elements; other kinds of elements contribute nothing.
                example = &example.example.iter().map(|node| match node {
                    parse_wiktionary_de::Flowing::Text { value } => value,
                    _ => ""
                }).collect::<String>()
            );
        }
    }
}
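The text-extraction step inside the loop can be pulled out into a small helper. The following is only a sketch under stated assumptions: it is self-contained and uses a stand-in enum, `Node`, instead of the crate's `Flowing`. Only the `Text { value }` variant is taken from the example above; the `Other` variant, the `&str` field type, and the function name `plain_text` are hypothetical and chosen for illustration.

```rust
// Stand-in for `parse_wiktionary_de::Flowing` (assumption: the real enum
// has more variants and its `value` field may have a different type).
enum Node<'a> {
    Text { value: &'a str },
    // Hypothetical placeholder for every non-text kind of element.
    #[allow(dead_code)]
    Other,
}

// Concatenates the text elements of a sequence and skips everything else.
fn plain_text(nodes: &[Node<'_>]) -> String {
    nodes
        .iter()
        .filter_map(|node| match node {
            Node::Text { value } => Some(*value),
            Node::Other => None,
        })
        .collect()
}
```

Called on a text element `"Das Haus "`, a non-text element, and a text element `"sei geeignet."`, this helper returns `Das Haus sei geeignet.`. Using `filter_map` rather than mapping non-text elements to an empty string makes it explicit that those elements are dropped.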
§Limitations
Parameters of overview templates are transferred to the output with minimal validation and processing. Due to the wide variety of overview templates that take parameters in highly complicated and inconsistent formats, fully validating and parsing these parameters is not feasible.
The translations in the template Vorlage:Ü-Tabelle in the section Übersetzungen are not parsed. The format of translations is complicated enough that parsing them would produce inconsistent results, so no attempt is made. Because translation tables often contain empty translations, the output does not even indicate whether an entry has translations.
The templates Ähnlichkeiten 1 and Ähnlichkeiten 2 are not parsed, because it is unclear what purpose they serve and what format their parameters must have.
Structs§
- Example
- Usage example.
- LanguageEntry
- Dictionary entry for a single language.
- Output
- Output of parsing a page.
- Overview
- Information from the overview template in the POS entry.
- PosEntry
- The entry for a part of speech within the entry for a language.
- Warning
- Warning from the parser telling that something is not well-formed.
Enums§
- Flowing
- An element in a sequence that allows different kinds of elements.
- Language
- Identifier for a language.
- Pos
- Part of speech.
- WarningMessage
- Identifier for a kind of warning from the parser.
Functions§
- create_configuration
- Allocates and returns a configuration for Parse Wiki Text suitable for parsing the German language edition of Wiktionary.
- parse
- Parses an article from the German language edition of Wiktionary into structured data.