Struct wikidump::Parser

pub struct Parser { /* private fields */ }

A parser which can process uncompressed Mediawiki XML dumps (backups).

Implementations

impl Parser

pub fn new<'c>() -> Parser

Construct a new parser with the default settings.

pub fn process_text(self, value: bool) -> Self

Sets whether the parser should process wiki text or leave it as-is. For best results, use a wiki config which matches the website you are parsing. Other configs may still work, but the results may be unexpected.

Wiki text parsing is enabled by default.

See use_config and config.

Example
use wikidump::{Parser, config};

let parser = Parser::new()
    .use_config(config::wikipedia::english())
    .process_text(false); // Disable wiki text parsing

pub fn exclude_pages(self, value: bool) -> Self

Sets whether the parser should ignore pages in namespaces that are not articles, such as Talk, Special, or User. If enabled, then any page which is not an article will be skipped by the parser.

Excluding pages in these namespaces is enabled by default.

Example
use wikidump::{Parser, config};

let parser = Parser::new()
    .use_config(config::wikipedia::english())
    .exclude_pages(false); // Disable page exclusion
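
The effect of this setting can be pictured as a filter over namespace IDs. The following is a minimal sketch, not the crate's implementation: `SketchPage` and `filter_pages` are hypothetical names, and it assumes the decision is made on the MediaWiki namespace ID, where 0 is the main (article) namespace, 1 is Talk, and 2 is User. The documentation above does not say how the crate actually detects non-article pages.

```rust
/// A hypothetical, simplified page record; not the crate's own page type.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchPage {
    pub title: String,
    /// MediaWiki namespace ID: 0 is the main (article) namespace.
    pub namespace: i32,
}

/// Keeps every page when `exclude_pages` is false; otherwise keeps only
/// pages in the main namespace (assumed here to mean "articles").
pub fn filter_pages(pages: Vec<SketchPage>, exclude_pages: bool) -> Vec<SketchPage> {
    pages
        .into_iter()
        .filter(|page| !exclude_pages || page.namespace == 0)
        .collect()
}
```

With exclusion enabled, a Talk or User page in the input would be dropped; with it disabled, the input passes through unchanged.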

pub fn remove_newlines(self, value: bool) -> Self

Sets whether the parser should remove newlines from processed wiki text or turn them into normal newline characters. This only has an effect if wiki text processing is enabled.

Newline removal is disabled by default.

Example
use wikidump::{Parser, config};

let parser = Parser::new()
    .use_config(config::wikipedia::english())
    .remove_newlines(true) // Enable newline removal
    .process_text(true);
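
The setters documented so far all take `self` by value and return `Self`, which is why they chain. As a rough illustration of that pattern and of the documented defaults, here is a self-contained sketch. `SketchParser` and `newline_removal_active` are hypothetical stand-ins, and the private fields of the real `Parser` are not known from this page.

```rust
/// Hypothetical stand-in for `wikidump::Parser`, used only to illustrate
/// the consuming-builder pattern and the documented defaults.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchParser {
    process_text: bool,
    exclude_pages: bool,
    remove_newlines: bool,
}

impl SketchParser {
    /// Defaults as documented: wiki text processing on,
    /// page exclusion on, newline removal off.
    pub fn new() -> Self {
        SketchParser {
            process_text: true,
            exclude_pages: true,
            remove_newlines: false,
        }
    }

    /// Each setter consumes `self` and returns it, so calls can be chained.
    pub fn process_text(mut self, value: bool) -> Self {
        self.process_text = value;
        self
    }

    pub fn exclude_pages(mut self, value: bool) -> Self {
        self.exclude_pages = value;
        self
    }

    pub fn remove_newlines(mut self, value: bool) -> Self {
        self.remove_newlines = value;
        self
    }

    /// Newline removal only matters while wiki text processing is enabled.
    pub fn newline_removal_active(&self) -> bool {
        self.process_text && self.remove_newlines
    }
}
```

Because every setter returns the parser, the order of calls does not matter, and a setting left untouched keeps its default.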

pub fn use_config(self, config_source: ConfigurationSource<'_>) -> Self

Sets the wiki text parser configuration options. For best results when processing wiki text, use the configuration that matches the website and language you are processing.

See config.

Example
use wikidump::{Parser, config};

let parser = Parser::new()
    .use_config(config::wikipedia::english());

pub fn parse_file<P>(&self, dump: P) -> Result<Site, Box<dyn Error + 'static>>
where P: AsRef<Path>,

Returns all of the parsed data contained in a particular wiki dump file. This includes the name of the website, a list of pages, their respective contents, and other properties.

Example
use wikidump::Parser;

let parser = Parser::new();
let site = parser.parse_file("tests/enwiki-articles-partial.xml");
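
Since `parse_file` is generic over `P: AsRef<Path>` and returns a boxed error, the call site has some flexibility that the example above does not show. The sketch below uses only the standard library. `dump_size` and `describe` are hypothetical helpers that share the shape of `parse_file`'s signature but merely read file metadata; they are not part of wikidump.

```rust
use std::error::Error;
use std::fs;
use std::path::Path;

/// Hypothetical helper with the same generic bound and error type as
/// `parse_file`. It only reports the file size; it does not parse anything.
pub fn dump_size<P: AsRef<Path>>(dump: P) -> Result<u64, Box<dyn Error + 'static>> {
    // `as_ref()` yields a `&Path` regardless of what the caller passed in.
    // The `?` converts the `io::Error` into `Box<dyn Error>`.
    let metadata = fs::metadata(dump.as_ref())?;
    Ok(metadata.len())
}

/// Callers can match on the `Result` instead of unwrapping it.
pub fn describe(path: &str) -> String {
    match dump_size(path) {
        Ok(bytes) => format!("{}: {} bytes", path, bytes),
        Err(err) => format!("{}: failed ({})", path, err),
    }
}
```

The same bound means a `&str`, a `String`, or a `PathBuf` are all acceptable arguments, and a missing file surfaces as an `Err` rather than a panic.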

pub fn parse_str(&self, text: &str) -> Result<Site, Box<dyn Error + 'static>>

Returns all of the parsed data contained in the text of a wiki dump, supplied as a string. This includes the name of the website, a list of pages, their respective contents, and other properties.

Example
use wikidump::Parser;
use std::fs;

let parser = Parser::new();
let contents = fs::read_to_string("tests/enwiki-articles-partial.xml").unwrap();
let site = parser.parse_str(contents.as_str());
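
The example above unwraps the file read. As one possible alternative, the read and the parse can share a single error path. This is a sketch under stated assumptions: `parse_from_path` is a hypothetical helper, and the parse step is passed in as a closure so the snippet stays free of any crate dependency.

```rust
use std::error::Error;
use std::fs;
use std::path::Path;

/// Hypothetical helper: reads a file into memory, then hands the text to
/// `parse`. Both I/O and parse failures come back as `Box<dyn Error>`.
pub fn parse_from_path<P, T, F>(path: P, parse: F) -> Result<T, Box<dyn Error + 'static>>
where
    P: AsRef<Path>,
    F: FnOnce(&str) -> Result<T, Box<dyn Error + 'static>>,
{
    // `?` converts the `io::Error` into the boxed error type.
    let contents = fs::read_to_string(path)?;
    parse(contents.as_str())
}
```

Given the `parse_str` signature shown above, a closure such as `|text| parser.parse_str(text)` should fit the `parse` parameter, although that pairing has not been checked against the crate here.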

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.