Struct rammer::BagOfWords

pub struct BagOfWords { /* fields omitted */ }

A BagOfWords, also referred to as a bow, is a frequency map of words. Read more about the bag-of-words model here: BagOfWords Wikipedia. BagOfWords works with Unicode words, defined as the text between UAX #29 word boundaries. BagOfWords is serializable with any of the serde serialization crates, such as serde_json.

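use rammer::BagOfWords;
// Train one bag from a single file and another from a folder of files.
let singly_trained_bow = BagOfWords::from_file("test_resources/test_data/unicode_and_ascii.txt").expect("File not found");
let big_bow = BagOfWords::from_folder("data/train/ham").expect("Folder not found");
// Merge the two bags; counts for words present in both are summed.
let com_bow = singly_trained_bow.combine(big_bow);
// BagOfWords implements serde's Serialize, so serde_json can encode it.
let json = serde_json::to_string(&com_bow).expect("Serialization failed");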

Implementations

Returns a new BagOfWords with an empty frequency map.

let empty_bow = BagOfWords::new();

Creates a BagOfWords from a text file. The file should already be known to be ham or spam; its contents become the basis of a new HSModel's ham or spam BagOfWords.

let spam_bow = BagOfWords::from_file("test_resources/test_data/unicode_and_ascii.txt").unwrap();

Creates a BagOfWords from a folder containing either spam or ham training text files.

let spam_bow = BagOfWords::from_folder("data/train/spam").expect("Folder not found");
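rammer's internals are not shown on this page, so the following is only a sketch, in plain std Rust, of what loading a file or a folder into a word-frequency map could look like. It assumes words are uppercased (as the {HELLO: 1, ...} examples on this page suggest) and, as a simplification, splits on non-alphanumeric characters rather than applying UAX #29 word boundaries. The names count_words, counts_from_file and counts_from_folder are hypothetical, and skipping subdirectories is an assumption.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical stand-in for BagOfWords: uppercase word -> count.
type Counts = HashMap<String, u32>;

/// Adds the words of `text` to `counts`.
/// Simplification: splits on non-alphanumerics, not UAX #29 boundaries.
fn count_words(text: &str, counts: &mut Counts) {
    for word in text.split(|c: char| !c.is_alphanumeric()).filter(|w| !w.is_empty()) {
        *counts.entry(word.to_uppercase()).or_insert(0) += 1;
    }
}

/// Analogue of from_file: read one file and count its words.
fn counts_from_file(path: &Path) -> io::Result<Counts> {
    let mut counts = Counts::new();
    count_words(&fs::read_to_string(path)?, &mut counts);
    Ok(counts)
}

/// Analogue of from_folder: count words across every regular file in `dir`.
/// Assumption: the walk is non-recursive and subdirectories are skipped.
fn counts_from_folder(dir: &Path) -> io::Result<Counts> {
    let mut counts = Counts::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            count_words(&fs::read_to_string(&path)?, &mut counts);
        }
    }
    Ok(counts)
}
```

In this sketch, fs::read_to_string fails on files that are not valid UTF-8, so a single such file aborts the whole folder; a production loader might prefer to skip or report those files instead.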

Combines two BagOfWords into a new BagOfWords. The counts of words found in both bags are added together. This operation is commutative and associative, so you can use it to grow your training BagOfWords incrementally, in any order.

let ham_bow_1 = BagOfWords::from("Hello there world"); // Creates: {HELLO: 1, THERE: 1, WORLD: 1}
let ham_bow_2 = BagOfWords::from("howdy there guy"); // Creates: {HOWDY: 1, THERE: 1, GUY: 1}
let com_bow = ham_bow_1.combine(ham_bow_2); // Combines to: {HELLO: 1, THERE: 2, HOWDY: 1, ...}
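A minimal sketch of additive combining over a plain HashMap<String, u32>, standing in for BagOfWords; the Counts alias and the combine and counts helpers are hypothetical, not rammer's API. Each shared word's counts are merged with +, which is commutative and associative, so the merge order does not change the result.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for BagOfWords: word -> count.
type Counts = HashMap<String, u32>;

/// Additive merge: counts for words present in both maps are summed.
/// Takes both maps by value, mirroring `ham_bow_1.combine(ham_bow_2)`.
fn combine(mut left: Counts, right: Counts) -> Counts {
    for (word, count) in right {
        *left.entry(word).or_insert(0) += count;
    }
    left
}

/// Builds a map from (word, count) pairs, for the example below.
fn counts(pairs: &[(&str, u32)]) -> Counts {
    pairs.iter().map(|&(w, c)| (w.to_string(), c)).collect()
}
```

Using the bags from the example above, combine(a, b) and combine(b, a) both yield THERE: 2 alongside the five distinct words.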

Returns the sum of all word counts in a BagOfWords. Used internally for frequency calculations.

let ham_bow = BagOfWords::from("hello there how are you");
ham_bow.total_word_count(); // returns 5, the sum of all counts

Calculates the frequency of a word in the BagOfWords as count_of_word / total_word_count. Returns None if the string slice passed in contains more than one word.

let ham_bow = BagOfWords::from("hello there how are you");
ham_bow.word_frequency("hello"); // returns Some(0.2)
ham_bow.word_frequency("hello there"); // returns None
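A sketch of how total_word_count and word_frequency relate, written against a hypothetical HashMap<String, u32> rather than rammer's own type. Tokenisation is simplified to splitting on non-alphanumerics, and two behaviours are assumptions not documented on this page: an absent word yields Some(0.0), and an empty bag yields None.

```rust
use std::collections::HashMap;

/// Sum of all counts: the analogue of total_word_count.
fn total_word_count(counts: &HashMap<String, u32>) -> u32 {
    counts.values().sum()
}

/// count_of_word / total_word_count for a single word.
/// Returns None if `word` contains more than one word (or none at all).
fn word_frequency(counts: &HashMap<String, u32>, word: &str) -> Option<f64> {
    // Simplified tokenisation (not UAX #29): split on non-alphanumerics.
    let mut words = word.split(|c: char| !c.is_alphanumeric()).filter(|w| !w.is_empty());
    let only = words.next()?;
    if words.next().is_some() {
        return None; // more than one word was passed in
    }
    let total = total_word_count(counts);
    if total == 0 {
        return None; // assumption: no frequency is defined for an empty bag
    }
    // Assumption: a word that is not in the bag has frequency 0.
    let count = counts.get(&only.to_uppercase()).copied().unwrap_or(0);
    Some(count as f64 / total as f64)
}
```

With the five single-count words from "hello there how are you", the total is 5 and "hello" has frequency 1 / 5 = 0.2.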

Trait Implementations

Returns a copy of the value.

Performs copy-assignment from source.

Formats the value using the given formatter.

Deserialize this value from the given Serde deserializer.

Converts a &str to a BagOfWords. To create BagOfWords models for training, consider using from_file or from_folder instead.

let bow = BagOfWords::from("hello world WOrLD"); // creates {HELLO: 1, WORLD: 2}

Performs the conversion.
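To illustrate the conversion, here is a sketch of a From<&str> implementation for a hypothetical SketchBow type; it is not rammer's code. It uppercases words to match the {HELLO: 1, WORLD: 2} output above, but splits on non-alphanumeric characters instead of UAX #29 boundaries, so results can differ from rammer on text such as "don't", which UAX #29 keeps as a single word.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for rammer's BagOfWords: uppercase word -> count.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SketchBow {
    pub counts: HashMap<String, u32>,
}

impl From<&str> for SketchBow {
    fn from(text: &str) -> Self {
        let mut counts = HashMap::new();
        // Simplification: split on any non-alphanumeric character.
        for word in text.split(|c: char| !c.is_alphanumeric()).filter(|w| !w.is_empty()) {
            // Uppercase so "world" and "WOrLD" share one entry.
            *counts.entry(word.to_uppercase()).or_insert(0) += 1;
        }
        SketchBow { counts }
    }
}
```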

Use .collect() over an iterator of BagOfWords to combine them additively, as with combine.

let bow: BagOfWords = vec![
    BagOfWords::from("hi"),
    BagOfWords::new(),
    BagOfWords::from("Big sale!")]
    .into_iter().collect();

Creates a value from an iterator.
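Collecting an iterator of bags amounts to folding with combine, starting from an empty bag, which is the identity for combine. A sketch using a hypothetical SketchBow type (not rammer's implementation):

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for BagOfWords: uppercase word -> count.
#[derive(Debug, Default, PartialEq)]
pub struct SketchBow {
    pub counts: HashMap<String, u32>,
}

impl SketchBow {
    /// Additive merge, the analogue of `combine`.
    pub fn combine(mut self, other: SketchBow) -> SketchBow {
        for (word, count) in other.counts {
            *self.counts.entry(word).or_insert(0) += count;
        }
        self
    }
}

impl std::iter::FromIterator<SketchBow> for SketchBow {
    fn from_iter<I: IntoIterator<Item = SketchBow>>(iter: I) -> Self {
        // Start from an empty bag and merge each incoming bag into it.
        iter.into_iter().fold(SketchBow::default(), SketchBow::combine)
    }
}
```

Because the empty bag contributes nothing, an iterator that yields no bags simply collects to an empty bag.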

Use .collect() over a parallel iterator of BagOfWords to combine them additively, as with combine. Bring the rayon prelude into scope to make .into_par_iter() available.

use rayon::prelude::*;
let bow: BagOfWords = vec![
    BagOfWords::from("hi"),
    BagOfWords::new(),
    BagOfWords::from("Big sale!")]
    .into_par_iter().collect();

Creates an instance of the collection from the parallel iterator par_iter.
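A parallel collect like this depends on combine being associative and commutative: partial bags can be built independently and merged in any grouping. The sketch below shows the same idea with std::thread::scope (Rust 1.63 or later) in place of rayon; parallel_combine and the Counts alias are hypothetical names, and the fixed chunking is a simplification of rayon's work-stealing scheduler.

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical stand-in for BagOfWords: word -> count.
type Counts = HashMap<String, u32>;

/// Additive merge of two count maps, the analogue of `combine`.
fn combine(mut left: Counts, right: Counts) -> Counts {
    for (word, count) in right {
        *left.entry(word).or_insert(0) += count;
    }
    left
}

/// Splits `bags` into at most `workers` chunks, folds each chunk on its own
/// scoped thread, then folds the partial results together.
fn parallel_combine(bags: &[Counts], workers: usize) -> Counts {
    let workers = workers.max(1);
    // Ceiling division; at least 1 because chunks(0) would panic.
    let chunk_size = ((bags.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = bags
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().cloned().fold(Counts::new(), combine)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .fold(Counts::new(), combine)
    })
}
```

Because of associativity and commutativity, the parallel result equals a sequential fold over the same bags regardless of the worker count.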

This method tests for self and other values to be equal, and is used by ==.

This method tests for !=.

Serialize this value into the given Serde serializer.
