pub struct TextData { /* private fields */ }
A structure representing compiled text/matching data.

This is the key structure used to compare two texts against one another. It handles pre-processing the text to n-grams, scoring, and optimizing the result to try to identify specific details about a match.

Examples

Basic scoring of two texts:

use askalono::TextData;

let license = TextData::from("My First License");
let sample = TextData::from("copyright 20xx me irl\n\n //  my   first license");
assert_eq!(sample.match_score(&license), 1.0);

The above example is a perfect match, as identifiable copyright statements are stripped out during pre-processing.

Building on that, TextData is able to tell you where in the text a license is located:

// `license` is the TextData built in the previous example.
let sample = TextData::from("copyright 20xx me irl\n// My First License\nfn hello() {\n ...");
let (optimized, score) = sample.optimize_bounds(&license);
assert_eq!((1, 2), optimized.lines_view());
assert!(score > 0.99f32, "license within text matches");

Implementations

Create a new TextData structure from a string.

The given text will be normalized, then smashed down into n-grams for matching. By default, the normalized text is stored inside the structure for future diagnostics. This is necessary for optimizing a match and for diffing against other texts. If you don’t want this extra data, you can call without_text to throw it out. Generally, as a user of this library you want to keep the text data, but askalono will throw it away in its own Store as it’s not needed there.
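To make the normalize-then-n-gram step concrete, here is a minimal standalone sketch. It does not use askalono's internals: `normalize` and `bigrams` are hypothetical helpers, and the rules they apply (lowercasing, stripping leading comment markers, collapsing whitespace, blanking lines that start with "copyright", word bigrams) are assumptions chosen to mirror the first example, not the crate's actual pre-processing.

```rust
use std::collections::HashMap;

/// Hypothetical normalizer (assumed rules, not askalono's): lowercases,
/// strips leading comment markers, collapses whitespace, and blanks out
/// lines that start with "copyright". Line positions are preserved.
fn normalize(text: &str) -> Vec<String> {
    text.lines()
        .map(|line| {
            let cleaned = line
                .trim()
                .trim_start_matches(|c: char| c == '/' || c == '*' || c == '#')
                .split_whitespace()
                .collect::<Vec<_>>()
                .join(" ")
                .to_lowercase();
            if cleaned.starts_with("copyright") {
                String::new()
            } else {
                cleaned
            }
        })
        .collect()
}

/// Counts word bigrams across all normalized lines.
fn bigrams(lines: &[String]) -> HashMap<(String, String), usize> {
    let words: Vec<&str> = lines.iter().flat_map(|l| l.split_whitespace()).collect();
    let mut counts = HashMap::new();
    for pair in words.windows(2) {
        *counts
            .entry((pair[0].to_string(), pair[1].to_string()))
            .or_insert(0) += 1;
    }
    counts
}
```

Under these assumed rules, the sample text and the license text from the first example reduce to the same bigram counts, which is why a perfect score is plausible there.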

Consume this TextData, returning one without normalized/processed text stored.

Unless you know you don’t want the text, you probably don’t want to use this. Other methods on TextData require that text is present.

Get the bounds of the active line view.

This represents the “active” region of lines that matches are generated from. The bounds are a 0-indexed (start, end) tuple, with inclusive start and exclusive end indices. See optimize_bounds.

This is largely for informational purposes; other methods on TextData, such as lines and match_score, will already account for the line range. However, it’s useful to call it after running optimize_bounds to find where in the input text the match is located.
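The half-open convention maps directly onto Rust's range slicing. The sketch below uses a hypothetical free function, `lines_in_view`, on a plain slice of lines; it is an illustration of the indexing rule, not part of the askalono API.

```rust
/// Returns the lines selected by a 0-indexed, half-open `(start, end)` view:
/// `start` is included, `end` is not. Hypothetical helper for illustration.
fn lines_in_view(lines: &[String], view: (usize, usize)) -> &[String] {
    let (start, end) = view;
    &lines[start..end]
}
```

With this rule, a view of (1, 2) selects exactly one line, the second line of the text, which matches the (1, 2) result in the example at the top of this page.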

Clone this TextData, creating a copy with the given view.

This will re-generate match data for the given view. It’s used in optimize_bounds to shrink/expand the view of the text to discover bounds.

Other methods on TextData respect this boundary, so it’s not needed outside this struct.

“Erase” the current lines in view and restore the view to its original bounds.

For example, consider a file with two licenses in it. One was identified (and located) with optimize_bounds. Now you want to find the other: white-out the matched lines, and re-run the overall search to find a new high score.
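The white-out idea can be sketched on a small stand-in type. `Lines` below is hypothetical and much simpler than TextData: it assumes the "original bounds" are the whole text and it does not regenerate any match data.

```rust
/// Hypothetical stand-in for a text with an active line view.
struct Lines {
    lines: Vec<String>,
    /// Half-open view: start inclusive, end exclusive.
    view: (usize, usize),
}

impl Lines {
    /// Starts with a view covering every line.
    fn new(lines: Vec<String>) -> Self {
        let view = (0, lines.len());
        Lines { lines, view }
    }

    /// Blanks every line inside the current view, then resets the view
    /// to cover the whole text again (assumed to be the original bounds).
    fn white_out(mut self) -> Self {
        let (start, end) = self.view;
        for line in &mut self.lines[start..end] {
            line.clear();
        }
        self.view = (0, self.lines.len());
        self
    }
}
```

After a white-out, the previously matched lines are empty, so a second search over the full view can only score against the remaining text.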

Get a slice of the normalized lines in this TextData.

Compare this TextData with another, returning a similarity score.

This is what’s used during analysis to rank licenses.
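This page does not spell out the scoring formula, so the following is only a sketch of one common way to turn n-gram counts into a similarity score: the Sørensen-Dice coefficient over word bigrams. The function names are hypothetical, and the inputs are assumed to be already normalized.

```rust
use std::collections::HashMap;

/// Counts word bigrams in an already-normalized text.
fn bigram_counts(text: &str) -> HashMap<(String, String), usize> {
    let words: Vec<&str> = text.split_whitespace().collect();
    let mut counts = HashMap::new();
    for pair in words.windows(2) {
        *counts
            .entry((pair[0].to_string(), pair[1].to_string()))
            .or_insert(0) += 1;
    }
    counts
}

/// Dice coefficient: twice the shared bigram count divided by the total
/// bigram count of both texts. Returns 0.0 when neither text has bigrams.
fn dice_score(a: &str, b: &str) -> f32 {
    let (ca, cb) = (bigram_counts(a), bigram_counts(b));
    let total = ca.values().sum::<usize>() + cb.values().sum::<usize>();
    if total == 0 {
        return 0.0;
    }
    let shared: usize = ca
        .iter()
        .map(|(gram, n)| (*n).min(*cb.get(gram).unwrap_or(&0)))
        .sum();
    2.0 * shared as f32 / total as f32
}
```

A score of this kind lies between 0.0 and 1.0, so candidate licenses can be ranked by sorting on it.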

Attempt to optimize a known match to locate possible line ranges.

Returns a new TextData struct and a score. The returned struct is a clone of self, with its view set to the best match against other.

This will respect any views set on the TextData (an optimized result won’t go outside the original view).

Note that this won’t be 100% optimal if there are blank lines surrounding the actual match, since successive blank lines in a range will likely have the same score.

You should check the value of lines_view on the returned struct to find the line ranges.
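As a rough illustration of bounds optimization, the sketch below tries every half-open window inside a starting view and keeps the best-scoring one. This exhaustive search is a swapped-in technique chosen for clarity, not the strategy askalono uses, and `word_score` is a hypothetical stand-in scorer (Dice over distinct words) rather than the crate's match_score.

```rust
use std::collections::HashSet;

/// Stand-in similarity: Dice coefficient over the sets of distinct words.
fn word_score(lines: &[&str], target: &str) -> f32 {
    let a: HashSet<&str> = lines
        .iter()
        .copied()
        .flat_map(|l| l.split_whitespace())
        .collect();
    let b: HashSet<&str> = target.split_whitespace().collect();
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    2.0 * a.intersection(&b).count() as f32 / (a.len() + b.len()) as f32
}

/// Tries every half-open window inside `view` and returns the best one with
/// its score. The result never leaves `view`. On ties the earliest window
/// found is kept, so blank lines next to a match may be included.
fn best_view(lines: &[&str], view: (usize, usize), target: &str) -> ((usize, usize), f32) {
    let mut best = (view, word_score(&lines[view.0..view.1], target));
    for start in view.0..view.1 {
        for end in (start + 1)..=view.1 {
            let score = word_score(&lines[start..end], target);
            if score > best.1 {
                best = ((start, end), score);
            }
        }
    }
    best
}
```

Given the lines `["", "my first license", "fn hello() {", "..."]` and a full view of (0, 4), this sketch returns (0, 2) rather than (1, 2): the leading blank line adds nothing to the score, so both windows tie. That is the same kind of ambiguity the note about blank lines describes.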

Trait Implementations

Returns a copy of the value.

Performs copy-assignment from source.

Formats the value using the given formatter.

Deserialize this value from the given Serde deserializer.

Converts to this type from the input type.

Converts to this type from the input type.

Serialize this value into the given Serde serializer.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self.

Immutably borrows from an owned value.

Mutably borrows from an owned value.

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The alignment of pointer.

The type for initializers.

Initializes a with the given initializer.

Dereferences the given pointer.

Mutably dereferences the given pointer.

Drops the object pointed to by the given pointer.

The resulting type after obtaining ownership.

Creates owned data from borrowed data, usually by cloning.

Uses borrowed data to replace owned data, usually by cloning.

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.