Crate wikidump


This crate processes MediaWiki dump (backup) files in XML format and lets you extract whatever data you need from them.

Example

use wikidump::{config, Parser};

// Build a parser configured for the English Wikipedia.
let parser = Parser::new().use_config(config::wikipedia::english());
let site = parser
    .parse_file("tests/enwiki-articles-partial.xml")
    .expect("Could not parse wikipedia dump file.");

// Site-level metadata is read from the dump.
assert_eq!(site.name, "Wikipedia");
assert_eq!(site.url, "https://en.wikipedia.org/wiki/Main_Page");
assert!(!site.pages.is_empty());

// Print every page title, followed by the text of each of its revisions.
for page in site.pages {
    println!("\nTitle: {}", page.title);

    for revision in page.revisions {
        println!("\t{}", revision.text);
    }
}

Modules

  • Wiki text parsing configurations for MediaWiki sites and languages.

Structs

  • Represents a wiki page.
  • Represents a specific revision of a page: the version of the page at a specific time, with text contents created by some contributor.
  • A parser that can process uncompressed MediaWiki XML dumps (backups).
  • Represents a MediaWiki website, such as Wikipedia.
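
The structs above nest: a site holds pages, and each page holds revisions. The following is a minimal, self-contained sketch of that shape and of one way to pull data out of it. The types `Site`, `Page`, and `Revision` are hypothetical stand-ins defined locally, not the crate's own definitions. They model only the fields that appear in the example (`name`, `url`, `pages`, `title`, `revisions`, `text`), and the helper `last_revision_lengths` is likewise invented for illustration.

```rust
// Hypothetical stand-ins for the crate's types. Only the fields used in the
// crate example are modelled; the real structs may carry more data.
struct Revision {
    text: String,
}

struct Page {
    title: String,
    revisions: Vec<Revision>,
}

struct Site {
    name: String,
    url: String,
    pages: Vec<Page>,
}

/// For every page with at least one revision, returns the page title and the
/// byte length of the text of the last revision listed for that page.
/// Pages without revisions are skipped.
fn last_revision_lengths(site: &Site) -> Vec<(String, usize)> {
    site.pages
        .iter()
        .filter_map(|page| {
            page.revisions
                .last()
                .map(|rev| (page.title.clone(), rev.text.len()))
        })
        .collect()
}

fn main() {
    // Hand-built sample data standing in for a parsed dump.
    let site = Site {
        name: "Example wiki".to_string(),
        url: "https://example.org/wiki/Main_Page".to_string(),
        pages: vec![
            Page {
                title: "Rust".to_string(),
                revisions: vec![
                    Revision { text: "draft".to_string() },
                    Revision { text: "a longer text".to_string() },
                ],
            },
            Page {
                title: "Empty".to_string(),
                revisions: vec![],
            },
        ],
    };

    println!("{} ({})", site.name, site.url);
    for (title, len) in last_revision_lengths(&site) {
        println!("{title}: {len} bytes");
    }
}
```

With a real dump, the same traversal would run over the value returned by `parse_file`, assuming its fields match the ones used in the example.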