Parse dictionary pages from the German language edition of Wiktionary into structured data.

For general information about Parse Wiktionary, see the readme file.

Examples

This example prints all usage examples found in an article, together with the language and the first part of speech of the entry.

let title = "Ausstellungswohnen";
let wiki_text = concat!(
    "==Ausstellungswohnen ({{Sprache|Deutsch}})==\n",
    "==={{Wortart|Substantiv|Deutsch}}===\n",
    "{{Beispiele}}\n",
    r#":Das Haus sei geeignet bloß zum „Ausstellungswohnen", vermuteten Kritiker, "#,
    "die es meist nur in schwarz-weißen Zeitschriftenfotos betreten hatten."
);
let configuration = parse_wiktionary_de::create_configuration();
let parsed_wiki_text = configuration.parse(wiki_text);
let parsed_article = parse_wiktionary_de::parse(title, wiki_text, &parsed_wiki_text.nodes);
for language_entry in parsed_article.language_entries {
    for pos_entry in language_entry.pos_entries {
        for example in pos_entry.examples {
            println!(
                "The word '{title}' of language {language:?} and part of speech {pos:?} has the example: {example}",
                title = title,
                language = language_entry.language,
                pos = pos_entry.pos,
                example = &example.example.iter().map(|node| match node {
                    parse_wiktionary_de::Flowing::Text { value } => value,
                    _ => ""
                }).collect::<String>()
            );
        }
    }
}
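The text extraction step in the example above can be tried in isolation. The following is a minimal sketch that does not depend on the crate: the `Flowing` enum here is a hypothetical stand-in with only a `Text` variant and a placeholder `Other` variant, and the `Cow<str>` field type and the helper name `plain_text` are assumptions made for illustration. The real `parse_wiktionary_de::Flowing` has more variants and may differ in detail.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for `parse_wiktionary_de::Flowing`.
// The field type `Cow<str>` is an assumption, not taken from the crate.
#[derive(Debug)]
enum Flowing<'a> {
    Text { value: Cow<'a, str> },
    // Placeholder for every non-text element (links, templates and so on).
    Other,
}

/// Concatenates the text elements of a sequence and skips everything else.
fn plain_text(nodes: &[Flowing<'_>]) -> String {
    nodes
        .iter()
        .filter_map(|node| match node {
            // Borrow the inner string slice regardless of Cow ownership.
            Flowing::Text { value } => Some(&**value),
            _ => None,
        })
        .collect()
}

fn main() {
    let nodes = vec![
        Flowing::Text { value: Cow::Borrowed("Das Haus ") },
        Flowing::Other,
        Flowing::Text { value: Cow::Borrowed("sei geeignet") },
    ];
    // prints "Das Haus sei geeignet"
    println!("{}", plain_text(&nodes));
}
```

Using `filter_map` instead of mapping non-text elements to an empty string makes it explicit that those elements are dropped, which may or may not be what a consumer wants for links and other inline markup.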

Limitations

Parameters of overview templates are transferred to the output with minimal validation and processing. Due to the wide variety of overview templates that take parameters in highly complicated and inconsistent formats, fully validating and parsing these parameters is not feasible.

The translations in the template Vorlage:Ü-Tabelle in the section Übersetzungen are not parsed. The format of translations is highly complicated, so skipping them entirely is preferable to producing inconsistent results. Because translation tables commonly contain empty translations, the output does not even indicate whether an entry has translations.

The templates Ähnlichkeiten 1 and Ähnlichkeiten 2 are not parsed, because both their purpose and the expected format of their parameters are unclear.

Structs

Usage example.
Dictionary entry for a single language.
Output of parsing a page.
Information from the overview template in the POS entry.
The entry for a part of speech within the entry for a language.
Warning from the parser telling that something is not well-formed.

Enums

An element in a sequence that allows different kinds of elements.
Identifier for a language.
Part of speech.
Identifier for a kind of warning from the parser.

Functions

create_configuration: Allocates and returns a configuration for Parse Wiki Text suitable for parsing the German language edition of Wiktionary.
parse: Parses an article from the German language edition of Wiktionary into structured data.