Struct Dataset 

Source
pub struct Dataset {
    pub data: Vec<Vec<f32>>,
    pub labels: Vec<u8>,
    pub features: Vec<Bytes>,
    pub ftype: FileType,
}

A dataset contains data for training or inference; training requires labels.

Fields§

§data: Vec<Vec<f32>>

Data used for training a model or calculating predictions

§labels: Vec<u8>

Data labels, can be empty if only used for inference

§features: Vec<Bytes>

N-gram byte features

§ftype: FileType

The type of file represented

Implementations§

Source§

impl Dataset

Source

pub fn load<P: AsRef<Path>>(path: P) -> Result<Dataset>

Load a file

§Errors

An error results if the file type can’t be determined, is incorrectly determined, or if the file isn’t a supported format.

Source

pub fn from_csv_file<P: AsRef<Path>>( path: P, data_length: usize, ) -> Result<Self>

Create a dataset struct from a CSV file

§Errors

Returns an error if:

  • The file can’t be read
  • The data contained isn’t numeric
  • Feature data is missing
  • The expected amount of data isn’t encountered
Source

pub fn from_csv_file_assume_data_length<P: AsRef<Path>>(path: P) -> Result<Self>

Create a dataset struct from a CSV file, inferring the data length from the number of columns

§Errors

Returns an error if:

  • The file can’t be read
  • The data contained isn’t a float
  • Feature data is missing
  • The number of columns can’t be determined
Source

pub fn from_csv_string(contents: &str, data_length: usize) -> Result<Self>

Create a dataset struct from a CSV string

§Errors

Returns an error if:

  • The data contained isn’t numeric
  • Feature data is missing
  • The expected amount of data isn’t encountered
Source

pub fn from_arff_file<P: AsRef<Path>>(path: P) -> Result<Self>

Create a dataset struct from an ARFF file

§Errors

Returns an error if:

  • The file can’t be read
  • The data contained isn’t numeric
  • Feature data is missing
  • The expected amount of data isn’t encountered
Source

pub fn from_arff_string(contents: &str) -> Result<Self>

Create a dataset struct from an ARFF string

§Errors

Returns an error if:

  • The data contained isn’t numeric
  • Feature data is missing
  • The expected amount of data isn’t encountered
Source

pub fn from_libsvm_file<P: AsRef<Path>>(path: P) -> Result<Self>

Create a dataset struct from a libsvm file

§Errors

Returns an error if:

  • The file can’t be read
  • Feature data is missing
  • The data isn’t in the expected format
  • The expected amount of data isn’t encountered
Source

pub fn from_libsvm_string(contents: &str) -> Result<Self>

Create a dataset from a libsvm string

§Errors

Returns an error if the string isn’t in the expected format or is missing features
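
As an illustration of the kind of input involved, the sketch below parses a single line in the usual libsvm layout, `<label> <index>:<value> ...`, with 1-based indices. It is not this crate's parser: the function name `parse_libsvm_line`, the explicit `width` parameter, the `String` error type, and the choice to parse the label as `u8` (to match `labels: Vec<u8>`) are all assumptions made for the example.

```rust
/// Hypothetical parser for one libsvm line: `<label> <index>:<value> ...`.
/// Indices are 1-based; any index not mentioned on the line is left at 0.0.
/// `width` (an assumption of this sketch) is the number of feature columns.
fn parse_libsvm_line(line: &str, width: usize) -> Result<(u8, Vec<f32>), String> {
    let mut parts = line.split_whitespace();

    // The first token is the label.
    let label = parts
        .next()
        .ok_or("empty line")?
        .parse::<u8>()
        .map_err(|e| format!("bad label: {e}"))?;

    // Every remaining token must be an `index:value` pair.
    let mut row = vec![0.0_f32; width];
    for pair in parts {
        let (idx, val) = pair
            .split_once(':')
            .ok_or_else(|| format!("expected index:value, got {pair}"))?;
        let idx = idx.parse::<usize>().map_err(|e| format!("bad index: {e}"))?;
        let val = val.parse::<f32>().map_err(|e| format!("bad value: {e}"))?;
        if idx == 0 || idx > width {
            return Err(format!("index {idx} out of range 1..={width}"));
        }
        row[idx - 1] = val;
    }
    Ok((label, row))
}
```

Under these assumptions, the two error cases named above map to a malformed token (unexpected format) and an empty or out-of-range line (missing feature data).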

Source

pub fn create_save_from_benign_malicious_files_and_ngrams<P: AsRef<Path>>( malicious_dir: P, benign_dir: P, ngrams_file: P, output_file: P, ) -> Result<()>

Given paths to malicious files, benign files, and n-grams (features), build a dataset and save it to the output file.

§Errors

This will fail if:

  • The directories for benign or malicious files don’t exist or are empty.
  • The n-gram feature file doesn’t exist, is empty, or doesn’t have hexadecimal-encoded features
Source

pub fn save_csv<P: AsRef<Path>>(&self, path: P) -> Result<()>

Save a dataset as a CSV

§Errors

An error will result if the file can’t be opened for writing

Source

pub fn save_arff<P: AsRef<Path>>(&self, path: P) -> Result<()>

Save a dataset as an ARFF file

§Errors

An error will result if the file can’t be opened for writing

Source

pub fn save_libsvm<P: AsRef<Path>>(&self, path: P) -> Result<()>

Save a dataset as a libsvm file

§Errors

An error will result if the file can’t be opened for writing

Source

pub fn save<P: AsRef<Path>>(&self, path: P) -> Result<()>

Save the dataset using the file extension to determine data format

§Errors

An error results if the file can’t be written or if the format can’t be determined
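
Choosing a format from a file extension could look roughly like the sketch below. The `Format` enum and `format_from_extension` function are hypothetical, and the set of recognized extensions (`csv`, `arff`, `libsvm`, `svm`) is an assumption; the crate's own `FileType` and its matching rules may differ.

```rust
use std::path::Path;

/// Hypothetical format tag for this sketch; not the crate's `FileType`.
#[derive(Debug, PartialEq)]
enum Format {
    Csv,
    Arff,
    Libsvm,
}

/// Map a path's extension to a format, ignoring ASCII case.
/// Returns `None` when there is no extension or it is not recognized,
/// which corresponds to the "format can't be determined" error case.
fn format_from_extension(path: &Path) -> Option<Format> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "csv" => Some(Format::Csv),
        "arff" => Some(Format::Arff),
        "libsvm" | "svm" => Some(Format::Libsvm), // assumed spellings
        _ => None,
    }
}
```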

Source

pub fn len(&self) -> usize

Return dataset size

Source

pub fn is_empty(&self) -> bool

Indicate if the dataset is empty

Source

pub fn validate(&self) -> bool

Ensure the dataset is valid

  • Same size data columns
  • If labels are present, the number of data rows equals the number of labels
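
The two rules above can be sketched as a stand-alone check. This is not the crate's implementation: `is_valid` is a hypothetical function, `rows` and `labels` stand in for the `data` and `labels` fields, and "same size data columns" is read here as "every row has the same number of columns".

```rust
/// Hypothetical stand-alone version of the two validation rules.
fn is_valid(rows: &[Vec<f32>], labels: &[u8]) -> bool {
    // Rule 1: every row has the same number of columns as the first row.
    let same_width = match rows.first() {
        Some(first) => rows.iter().all(|row| row.len() == first.len()),
        None => true, // no rows, so nothing can be mismatched
    };

    // Rule 2: labels are optional, but if present there is one per row.
    let labels_match = labels.is_empty() || labels.len() == rows.len();

    same_width && labels_match
}
```
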
Source

pub fn shuffle(&mut self)

Shuffle the data using roughly 10 × log10(size) iterations: 10 records get 10 iterations, and 1,000 records get 30 iterations
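
The iteration count described here can be worked through with a small helper. `shuffle_iteration_count` is a hypothetical name, and both the rounding and the handling of an empty dataset are assumptions; the documentation only says "roughly".

```rust
/// Hypothetical helper: iteration count for a dataset of `size` records,
/// following the "10 x log10(size)" rule of thumb.
fn shuffle_iteration_count(size: usize) -> u32 {
    if size == 0 {
        return 0; // log10(0) is undefined, and there is nothing to shuffle
    }
    // Round to the nearest whole iteration (an assumption of this sketch).
    (10.0 * (size as f64).log10()).round() as u32
}
```

With this sketch, 10 records give 10 × 1 = 10 iterations and 1,000 records give 10 × 3 = 30, matching the figures in the description.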

Source

pub fn shuffle_iterations(&mut self, iterations: u32)

Shuffle the data with a specified number of iterations. If labels are present, they are swapped along with their data rows.
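
Keeping labels aligned with their rows while shuffling might look like the following sketch. It assumes that one iteration is one random swap, which the documentation does not state, and it uses a small xorshift generator with an explicit `seed` in place of whatever random source the crate uses. The name `shuffle_paired` is hypothetical.

```rust
/// Hypothetical sketch: perform `iterations` random swaps on `rows`,
/// applying the same swap to `labels` when there is one label per row.
fn shuffle_paired(rows: &mut [Vec<f32>], labels: &mut [u8], iterations: u32, seed: u64) {
    let n = rows.len();
    if n < 2 {
        return; // nothing to swap
    }
    let has_labels = labels.len() == n;

    // Minimal xorshift64 generator (stand-in random source for this sketch).
    let mut state = seed.max(1); // the state must never be zero
    let mut next = || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };

    for _ in 0..iterations {
        let a = (next() % n as u64) as usize;
        let b = (next() % n as u64) as usize;
        rows.swap(a, b);
        if has_labels {
            labels.swap(a, b); // same indices, so each label follows its row
        }
    }
}
```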

Source

pub fn split(&mut self, ratio: f32) -> Self

Split the dataset, ideally into train/test datasets. The ratio indicates how much data is kept; the remainder is removed from this dataset and returned.
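
A minimal sketch of this keep-and-return behavior on a plain vector of rows follows. `split_rows` is a hypothetical function; that the leading rows are the ones kept, and that the cut point is rounded to the nearest row, are assumptions of the example rather than documented behavior.

```rust
/// Hypothetical sketch: keep roughly `ratio` of the rows in `rows`
/// and return the rest as a new vector.
fn split_rows(rows: &mut Vec<Vec<f32>>, ratio: f32) -> Vec<Vec<f32>> {
    let ratio = ratio.clamp(0.0, 1.0);
    // Number of rows to keep, rounded to the nearest whole row.
    let keep = ((rows.len() as f32) * ratio).round() as usize;
    // `split_off` leaves `[0, keep)` in place and returns `[keep, len)`.
    rows.split_off(keep.min(rows.len()))
}
```

Under these assumptions, a ratio of 0.8 on 10 rows leaves 8 rows in place for training and returns 2 rows for testing.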

Source

pub fn reduce(&mut self, model: &LogisticRegression) -> Result<Vec<usize>>

Model training not only produces a model but also determines which features are most useful for distinguishing benign from malicious. This method removes the features deemed unneeded.

§Errors

If the model would remove all features, an error is returned, since an empty dataset isn’t useful; it’s more likely that the model and dataset weren’t built from the same data collection.
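
How the model marks a feature as unneeded is not described here. Purely as an illustration, the sketch below assumes a feature is unneeded when its weight is exactly zero, and that the returned indices are those of the columns kept; both are assumptions, as are the name `drop_zero_weight_columns` and the `String` error type. It also assumes every row has at least `weights.len()` columns.

```rust
/// Hypothetical sketch: drop every column whose weight is exactly zero.
/// Returns the indices of the columns that were kept, or an error if
/// no column would remain.
fn drop_zero_weight_columns(
    rows: &mut [Vec<f32>],
    weights: &[f32],
) -> Result<Vec<usize>, String> {
    // Indices of columns with a non-zero weight.
    let kept: Vec<usize> = weights
        .iter()
        .enumerate()
        .filter(|(_, w)| **w != 0.0)
        .map(|(i, _)| i)
        .collect();

    if kept.is_empty() {
        return Err("model would remove every feature".to_string());
    }

    // Rebuild each row from the kept columns only.
    for row in rows.iter_mut() {
        *row = kept.iter().map(|&i| row[i]).collect();
    }
    Ok(kept)
}
```

In a full implementation the matching entries of the feature list would presumably be removed in the same pass, so that columns and features stay aligned.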

Source

pub fn column_iter(&self, index: usize) -> Option<ColumnIterator<'_>>

Returns an iterator over a column
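
Because `data` is stored row by row, iterating a column means taking the same position from each row. The sketch below shows one way to do that on a plain slice of rows; `column` is a hypothetical function and does not reproduce the crate's `ColumnIterator` type. Returning `None` for an out-of-range index is an assumption suggested by the `Option` in the signature.

```rust
/// Hypothetical sketch: iterate over column `index` of a row-major table.
/// Returns `None` when the table is empty or `index` is past the width of
/// the first row. Assumes all rows have the same width.
fn column<'a>(rows: &'a [Vec<f32>], index: usize) -> Option<impl Iterator<Item = f32> + 'a> {
    if index >= rows.first()?.len() {
        return None;
    }
    // Take the value at position `index` from every row, in order.
    Some(rows.iter().map(move |row| row[index]))
}
```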

Trait Implementations§

Source§

impl Clone for Dataset

Source§

fn clone(&self) -> Dataset

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for Dataset

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl<'de> Deserialize<'de> for Dataset

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
Source§

impl PartialEq for Dataset

Source§

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl Serialize for Dataset

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,