Struct askalono::TextData

pub struct TextData { /* fields omitted */ }

A structure representing compiled text/matching data.

This is the key structure used to compare two texts against one another. It handles pre-processing the text to n-grams, scoring, and optimizing the result to try to identify specific details about a match.

Examples

Basic scoring of two texts:

use askalono::TextData;

let license = TextData::from("My First License");
let sample = TextData::from(
  "copyright 20xx me irl\n\n //  my   first license"
);
assert_eq!(sample.match_score(&license), 1.0);

The above example is a perfect match, as identifiable copyright statements are stripped out during pre-processing.

Building on that, TextData is able to tell you where in the text a license is located:

let sample = TextData::from(
  "copyright 20xx me irl\n// My First License\nfn hello() {\n ..."
);
let (optimized, score) = sample.optimize_bounds(&license);
assert_eq!((1, 2), optimized.lines_view());
assert!(score > 0.99f32, "license within text matches");

Methods

impl TextData

Create a new TextData structure from a string.

The given text will be normalized, then smashed down into n-grams for matching. By default, the normalized text is stored inside the structure for future diagnostics. This is necessary for optimizing a match and for diffing against other texts. If you don't want this extra data, you can call without_text to throw it out. Generally, as a user of this library you want to keep the text data, but askalono will throw it away in its own Store, where it's not needed.
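As a rough illustration of "normalize, then break into n-grams", here is a minimal standalone sketch. It is not askalono's implementation: the helper names normalize and bigram_counts, the choice of word bigrams, and the normalization rules are all assumptions made for this example, and it does not strip copyright statements.

```rust
use std::collections::HashMap;

/// Hypothetical normalizer (not askalono's): ASCII-lowercase the text, turn
/// every non-alphanumeric character into a space, and collapse whitespace runs.
fn normalize(text: &str) -> String {
    let cleaned: String = text
        .chars()
        .map(|c| if c.is_alphanumeric() { c.to_ascii_lowercase() } else { ' ' })
        .collect();
    cleaned.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Count word bigrams (n-grams with n = 2) in already-normalized text.
fn bigram_counts(normalized: &str) -> HashMap<(String, String), u32> {
    let words: Vec<&str> = normalized.split_whitespace().collect();
    let mut counts = HashMap::new();
    for pair in words.windows(2) {
        // Each window is a pair of adjacent words.
        *counts
            .entry((pair[0].to_string(), pair[1].to_string()))
            .or_insert(0) += 1;
    }
    counts
}
```

With these helpers, normalize("//  My   First License!") yields "my first license", which bigram_counts turns into two bigrams: ("my", "first") and ("first", "license").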

without_text

Consume this TextData, returning one without normalized/processed text stored.

Unless you know you don't want the text, you probably don't want to use this. Other methods on TextData require that text is present.

lines_view

Get the bounds of the active line view.

This represents the "active" region of lines that matches are generated from. The bounds are a 0-indexed (start, end) tuple, with inclusive indices (line numbers). See optimize_bounds.

This is largely for informational purposes; other methods in TextData, such as lines and match_score, will already account for the line range. However, it's useful to call it after running optimize_bounds to find where in the input text the match was located.

lines

Get a slice of the normalized lines in this TextData.

If the text was discarded with without_text, this returns None.

match_score

Compare this TextData with another, returning a similarity score.

This is what's used during analysis to rank licenses.
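To give a feel for how a similarity score over n-grams can behave, the sketch below computes a Sørensen-Dice coefficient over word bigram counts. That this resembles the library's scoring is an assumption; the functions bigrams and dice_score are hypothetical and are not part of askalono's API.

```rust
use std::collections::HashMap;

/// Count lowercase word bigrams in a text (hypothetical helper).
fn bigrams(text: &str) -> HashMap<(String, String), u32> {
    let words: Vec<String> = text.split_whitespace().map(|w| w.to_lowercase()).collect();
    let mut counts = HashMap::new();
    for pair in words.windows(2) {
        *counts.entry((pair[0].clone(), pair[1].clone())).or_insert(0) += 1;
    }
    counts
}

/// Sørensen-Dice coefficient over bigram multisets:
/// 2 * |shared bigrams| / (|bigrams in a| + |bigrams in b|).
/// Returns 0.0 when neither text contains a bigram.
fn dice_score(a: &str, b: &str) -> f32 {
    let (ga, gb) = (bigrams(a), bigrams(b));
    let total: u32 = ga.values().sum::<u32>() + gb.values().sum::<u32>();
    if total == 0 {
        return 0.0;
    }
    // A bigram is shared as many times as it occurs in both texts.
    let shared: u32 = ga
        .iter()
        .map(|(k, &n)| n.min(*gb.get(k).unwrap_or(&0)))
        .sum();
    2.0 * shared as f32 / total as f32
}
```

Under this sketch, identical texts score 1.0, texts with no bigram in common score 0.0, and "a b c" against "a b d" scores 0.5, since one of the two bigrams on each side is shared.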

optimize_bounds

Attempt to optimize a known match to locate possible line ranges.

Returns a new TextData struct and a score. The returned struct is a clone of self, with its view set to the best match against other.

Note that this won't be 100% optimal if there are blank lines surrounding the actual match, since successive blank lines in a range will likely have the same score.

You should check the value of lines_view on the returned struct to find the line ranges.
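The idea of searching for the best-matching line range can be sketched with a brute-force scan. This is only an illustration under stated assumptions: score and best_line_range are hypothetical names, the scorer is a simple word-set Dice coefficient, and the exhaustive search is not claimed to be how askalono optimizes bounds. The returned bounds are 0-indexed and inclusive, and the sketch is not meant to reproduce the library's exact results.

```rust
use std::collections::HashSet;

/// Hypothetical scorer: Dice coefficient over the sets of lowercase words.
fn score(a: &str, b: &str) -> f32 {
    let words = |t: &str| -> HashSet<String> {
        t.split(|c: char| !c.is_alphanumeric())
            .filter(|w| !w.is_empty())
            .map(|w| w.to_lowercase())
            .collect()
    };
    let (wa, wb) = (words(a), words(b));
    if wa.is_empty() && wb.is_empty() {
        return 0.0;
    }
    2.0 * wa.intersection(&wb).count() as f32 / (wa.len() + wb.len()) as f32
}

/// Try every inclusive (start, end) line range of `sample` and keep the one
/// that scores highest against `reference`. Ties keep the earliest range found.
fn best_line_range(sample: &str, reference: &str) -> ((usize, usize), f32) {
    let lines: Vec<&str> = sample.lines().collect();
    let mut best = ((0, 0), 0.0_f32);
    for start in 0..lines.len() {
        for end in start..lines.len() {
            let candidate = lines[start..=end].join("\n");
            let s = score(&candidate, reference);
            if s > best.1 {
                best = ((start, end), s);
            }
        }
    }
    best
}
```

Because blank lines add no words, a range and the same range extended by a blank line score identically here, which mirrors the caveat above about blank lines surrounding a match.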

Trait Implementations

impl Debug for TextData

Formats the value using the given formatter.

impl<'a> From<&'a str> for TextData

Performs the conversion.

impl<'a> From<String> for TextData

Performs the conversion.

Auto Trait Implementations

impl Send for TextData

impl Sync for TextData