Struct wikidump::Parser

pub struct Parser { /* fields omitted */ }

A parser which can process uncompressed Mediawiki XML dumps (backups).

Methods

impl Parser

pub fn new<'c>() -> Parser

Construct a new parser with the default settings.

pub fn process_text(self, value: bool) -> Self

Sets whether the parser should process wiki text or leave it as-is. For best results, use a wiki config that matches the website you are parsing. Parsing may still work with a mismatched config, but the results may be unexpected.

Wiki text parsing is enabled by default.

See use_config and config.

Example

use wikidump::{Parser, config};

let parser = Parser::new()
    .use_config(config::wikipedia::english())
    .process_text(false); // Disable wiki text parsing

pub fn exclude_pages(self, value: bool) -> Self

Sets whether the parser should ignore pages in namespaces that are not articles, such as Talk, Special, or User. If enabled, any page that is not an article is skipped by the parser.

Excluding pages in these namespaces is enabled by default.

Example

use wikidump::{Parser, config};

let parser = Parser::new()
    .use_config(config::wikipedia::english())
    .exclude_pages(false); // Disable page exclusion
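
As a rough illustration of what excluding non-article pages means, the sketch below filters a list of pages by MediaWiki namespace number (0 is the article namespace, 1 is Talk, 2 is User, -1 is Special). `PageStub` and `filter_pages` are hypothetical names made up for this sketch. They are not part of the wikidump API, and the crate's own filtering may work differently.

```rust
// Hypothetical stand-in for a parsed page; not a type from the wikidump crate.
#[derive(Debug, Clone, PartialEq)]
struct PageStub {
    title: String,
    // MediaWiki namespace number: 0 = article, 1 = Talk, 2 = User, -1 = Special.
    namespace: i32,
}

// When `exclude` is true, keep only pages in the article namespace (0).
// When `exclude` is false, keep every page.
fn filter_pages(pages: Vec<PageStub>, exclude: bool) -> Vec<PageStub> {
    pages
        .into_iter()
        .filter(|page| !exclude || page.namespace == 0)
        .collect()
}

fn main() {
    let pages = vec![
        PageStub { title: "Rust".to_string(), namespace: 0 },
        PageStub { title: "Talk:Rust".to_string(), namespace: 1 },
        PageStub { title: "User:Example".to_string(), namespace: 2 },
    ];

    let kept = filter_pages(pages.clone(), true);
    println!("{} of {} pages kept", kept.len(), pages.len());
    for page in &kept {
        println!("kept: {}", page.title);
    }
}
```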

pub fn remove_newlines(self, value: bool) -> Self

Sets whether the parser should remove newlines from the wiki text or turn them into normal newline characters. This only has an effect when wiki text processing is enabled.

Newline removal is disabled by default.

Example

use wikidump::{Parser, config};

let parser = Parser::new()
    .use_config(config::wikipedia::english())
    .remove_newlines(true) // Enable newline removal
    .process_text(true);
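
The setters above take `self` by value and return `Self`, which is what allows them to be chained. The sketch below shows that pattern with the documented defaults (text processing on, page exclusion on, newline removal off). `ParserSketch` and `removes_newlines` are hypothetical names used only for illustration. This is not the crate's implementation.

```rust
// Hypothetical stand-in that mirrors the documented setters and defaults.
#[derive(Debug, Clone, PartialEq)]
struct ParserSketch {
    process_text: bool,
    exclude_pages: bool,
    remove_newlines: bool,
}

impl ParserSketch {
    // Defaults as documented: text processing on, page exclusion on,
    // newline removal off.
    fn new() -> Self {
        ParserSketch {
            process_text: true,
            exclude_pages: true,
            remove_newlines: false,
        }
    }

    // Each setter consumes the value and hands it back, so calls chain.
    fn process_text(mut self, value: bool) -> Self {
        self.process_text = value;
        self
    }

    fn exclude_pages(mut self, value: bool) -> Self {
        self.exclude_pages = value;
        self
    }

    fn remove_newlines(mut self, value: bool) -> Self {
        self.remove_newlines = value;
        self
    }

    // Documented rule: newline removal only matters while wiki text
    // processing is enabled.
    fn removes_newlines(&self) -> bool {
        self.process_text && self.remove_newlines
    }
}

fn main() {
    let parser = ParserSketch::new()
        .exclude_pages(false)
        .remove_newlines(true)
        .process_text(false);
    println!("{:?}", parser);
    println!("newlines removed: {}", parser.removes_newlines());
}
```

Because each setter moves the value, the chain has to be assigned or used in one expression. Calling a setter on a binding without reassigning the result would consume the binding.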

pub fn use_config(self, config_source: ConfigurationSource) -> Self

Sets the wiki text parser configuration options. For best results when processing wiki text, use the configuration that matches the website and language of the dump you are processing.

See config.

Example

use wikidump::{Parser, config};

let parser = Parser::new()
    .use_config(config::wikipedia::english());

pub fn parse_file<P>(&self, dump: P) -> Result<Site, Box<dyn Error + 'static>> where
    P: AsRef<Path>,

Returns all of the parsed data contained in a particular wiki dump file. This includes the name of the website, a list of pages, their respective contents, and other properties.

Example

use wikidump::Parser;

let parser = Parser::new();
// parse_file returns a Result; handle or unwrap it to get the Site.
let site = parser.parse_file("tests/enwiki-articles-partial.xml");
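
The signature of `parse_file` has two properties worth illustrating: it accepts anything that implements `AsRef<Path>`, and it returns a `Result` with a boxed error. The sketch below uses a hypothetical helper, `read_dump`, with the same shape. It only reads the file with the standard library and does not parse anything, so treat it as an illustration of the calling pattern and not of the crate's behavior.

```rust
use std::error::Error;
use std::fs;
use std::path::{Path, PathBuf};

// Hypothetical helper with the same shape as `parse_file`: generic over
// `AsRef<Path>` and returning a boxed error. It only reads the file.
fn read_dump<P: AsRef<Path>>(dump: P) -> Result<String, Box<dyn Error>> {
    // `?` converts the io::Error into Box<dyn Error> on failure.
    let contents = fs::read_to_string(dump.as_ref())?;
    Ok(contents)
}

fn main() {
    // A &str and a PathBuf are both accepted through AsRef<Path>.
    let from_str = read_dump("tests/enwiki-articles-partial.xml");
    let from_path = read_dump(PathBuf::from("tests/enwiki-articles-partial.xml"));

    // Handle the error case explicitly instead of unwrapping.
    match from_str {
        Ok(text) => println!("read {} bytes", text.len()),
        Err(err) => eprintln!("could not read dump: {}", err),
    }
    println!("second call succeeded: {}", from_path.is_ok());
}
```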

pub fn parse_str(&self, text: &str) -> Result<Site, Box<dyn Error + 'static>>

Returns all of the parsed data contained in the given wiki dump text. This includes the name of the website, a list of pages, their respective contents, and other properties.

Example

use wikidump::Parser;
use std::fs;

let parser = Parser::new();
let contents = fs::read_to_string("tests/enwiki-articles-partial.xml").unwrap();
// parse_str returns a Result; handle or unwrap it to get the Site.
let site = parser.parse_str(contents.as_str());

Auto Trait Implementations

impl Send for Parser

impl Unpin for Parser

impl Sync for Parser

impl UnwindSafe for Parser

impl RefUnwindSafe for Parser
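
Since `Parser` is `Send` and `Sync` and its parse methods take `&self`, one parser can in principle be shared by reference across threads. The sketch below shows that sharing pattern with scoped threads from the standard library (Rust 1.63 or later). `ParserSketch`, `count_pages`, and `count_all` are hypothetical stand-ins, and counting `<page>` tags is only a placeholder for real parsing work.

```rust
use std::thread;

// Hypothetical stand-in for a parser whose methods take `&self`.
struct ParserSketch;

impl ParserSketch {
    // Counts "<page>" occurrences; a placeholder for real parsing work.
    fn count_pages(&self, text: &str) -> usize {
        text.matches("<page>").count()
    }
}

// Run one shared parser over several inputs, one scoped thread per input.
// Results come back in the same order as the inputs.
fn count_all(parser: &ParserSketch, inputs: &[&str]) -> Vec<usize> {
    thread::scope(|scope| {
        let handles: Vec<_> = inputs
            .iter()
            .map(|&text| scope.spawn(move || parser.count_pages(text)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .collect()
    })
}

fn main() {
    let parser = ParserSketch;
    let inputs = ["<page></page><page></page>", "<mediawiki/>"];
    let counts = count_all(&parser, &inputs);
    println!("{:?}", counts);
}
```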

Blanket Implementations

impl<T, U> Into<U> for T where
    U: From<T>,

impl<T> From<T> for T

impl<T, U> TryFrom<U> for T where
    U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

impl<T> BorrowMut<T> for T where
    T: ?Sized

impl<T> Borrow<T> for T where
    T: ?Sized

impl<T> Any for T where
    T: 'static + ?Sized