pub struct Parser { /* private fields */ }
A parser which can process uncompressed Mediawiki XML dumps (backups).
Implementations
impl Parser
pub fn process_text(self, value: bool) -> Self
Sets whether the parser should process wiki text or leave it as-is. For best results, use a wiki configuration that matches the website you are parsing. Other configurations may still work, but the results can be unexpected.
Wiki text parsing is enabled by default.
See use_config and config.
Example
use wikidump::{Parser, config};
let parser = Parser::new()
.use_config(config::wikipedia::english())
.process_text(false); // Disable wiki text parsing
pub fn exclude_pages(self, value: bool) -> Self
Sets whether the parser should ignore pages in namespaces that are not articles, such as Talk, Special, or User. If enabled, any page that is not an article is skipped by the parser.
Excluding pages in these namespaces is enabled by default.
Example
use wikidump::{Parser, config};
let parser = Parser::new()
.use_config(config::wikipedia::english())
.exclude_pages(false); // Disable page exclusion
pub fn remove_newlines(self, value: bool) -> Self
Sets whether the parser should remove newlines from the parsed text or keep them as normal newline characters. This setting only takes effect when wiki text processing is enabled.
Removing newlines is turned off by default.
Example
use wikidump::{Parser, config};
let parser = Parser::new()
.use_config(config::wikipedia::english())
.remove_newlines(true) // Enable newline removal
.process_text(true);
pub fn use_config(self, config_source: ConfigurationSource<'_>) -> Self
Sets the wiki text parser configuration options. For best results when processing wiki text, use the configuration that matches the website and language you are processing.
See config.
Example
use wikidump::{Parser, config};
let parser = Parser::new()
.use_config(config::wikipedia::english());
pub fn parse_file<P>(&self, dump: P) -> Result<Site, Box<dyn Error + 'static>>
Returns all of the parsed data contained in a particular wiki dump file. This includes the name of the website, a list of pages, their respective contents, and other properties.
Example
use wikidump::Parser;
let parser = Parser::new();
let site = parser.parse_file("tests/enwiki-articles-partial.xml");
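A hedged sketch of consuming the returned `Site` (this assumes `Site` exposes `name` and `pages`, each `Page` a `title` and `revisions`, and each `PageRevision` a `text` field, per this crate's data model; verify the field names against the `Site` and `Page` docs before relying on them):

```rust
use wikidump::Parser;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parser = Parser::new();
    // The `?` propagates any I/O or XML error out of main.
    let site = parser.parse_file("tests/enwiki-articles-partial.xml")?;

    // `Site` carries the wiki's name plus every parsed page.
    println!("Parsed dump of {} ({} pages)", site.name, site.pages.len());

    for page in &site.pages {
        // Each page holds one or more revisions; the text of the
        // last revision is usually the one you want.
        if let Some(revision) = page.revisions.last() {
            println!("{}: {} bytes of text", page.title, revision.text.len());
        }
    }

    Ok(())
}
```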
pub fn parse_str(&self, text: &str) -> Result<Site, Box<dyn Error + 'static>>
Returns all of the parsed data contained in a wiki dump provided as a string. This includes the name of the website, a list of pages, their respective contents, and other properties.
Example
use wikidump::Parser;
use std::fs;
let parser = Parser::new();
let contents = fs::read_to_string("tests/enwiki-articles-partial.xml").unwrap();
let site = parser.parse_str(contents.as_str());