Struct askalono::TextData
pub struct TextData { /* fields omitted */ }
A structure representing compiled text/matching data.
This is the key structure used to compare two texts against one another. It handles pre-processing the text to n-grams, scoring, and optimizing the result to try to identify specific details about a match.
Examples
Basic scoring of two texts:
    use askalono::TextData;

    let license = TextData::from("My First License");
    let sample = TextData::from(
        "copyright 20xx me irl\n\n // my first license"
    );
    assert_eq!(sample.match_score(&license), 1.0);
The above example is a perfect match, as identifiable copyright statements are stripped out during pre-processing.
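To make the idea of n-gram scoring more concrete, here is a minimal, self-contained sketch. It is not askalono's implementation: the helper names (`normalize_words`, `word_bigrams`, `dice_score`), the use of word bigrams, the Sørensen-Dice coefficient as the metric, and the very crude "drop lines starting with copyright" rule are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Multiset of word bigrams (hypothetical representation for this sketch).
type Grams = HashMap<(String, String), usize>;

/// Hypothetical normalizer: lowercase the text, drop lines that start with
/// "copyright", and keep only alphanumeric words.
fn normalize_words(text: &str) -> Vec<String> {
    text.lines()
        .map(|line| line.to_lowercase())
        .filter(|line| !line.trim_start().starts_with("copyright"))
        .flat_map(|line| {
            line.split(|c: char| !c.is_alphanumeric())
                .filter(|w| !w.is_empty())
                .map(str::to_owned)
                .collect::<Vec<_>>()
        })
        .collect()
}

/// Count every pair of adjacent words.
fn word_bigrams(words: &[String]) -> Grams {
    let mut counts = Grams::new();
    for pair in words.windows(2) {
        *counts.entry((pair[0].clone(), pair[1].clone())).or_insert(0) += 1;
    }
    counts
}

/// Sørensen-Dice coefficient over the two bigram multisets, in [0.0, 1.0].
fn dice_score(a: &str, b: &str) -> f32 {
    let ga = word_bigrams(&normalize_words(a));
    let gb = word_bigrams(&normalize_words(b));
    let total: usize = ga.values().sum::<usize>() + gb.values().sum::<usize>();
    if total == 0 {
        return 0.0;
    }
    // A bigram is shared as many times as it appears in both texts.
    let shared: usize = ga
        .iter()
        .map(|(gram, &n)| n.min(gb.get(gram).copied().unwrap_or(0)))
        .sum();
    2.0 * shared as f32 / total as f32
}
```

In this sketch, the two texts from the example above score 1.0 because the copyright line and the comment marker are removed before any bigrams are built, leaving identical word sequences.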
Building on that, TextData is able to tell you where in the text a license is located:
    let sample = TextData::from(
        "copyright 20xx me irl\n// My First License\nfn hello() {\n ..."
    );
    let (optimized, score) = sample.optimize_bounds(&license);
    assert_eq!((1, 2), optimized.lines_view());
    assert!(score > 0.99f32, "license within text matches");
Methods
impl TextData
pub fn new(text: &str) -> TextData
Create a new TextData structure from a string.
The given text will be normalized, then broken down into n-grams for matching. By default, the normalized text is stored inside the structure for future diagnostics. This is necessary for optimizing a match and for diffing against other texts. If you don't want this extra data, you can call without_text to throw it out. Generally, as a user of this library you want to keep the text data, but askalono will throw it away in its own Store as it's not needed.
pub fn without_text(self) -> Self
Consume this TextData, returning one without normalized/processed text stored.
Unless you know you don't want the text, you probably don't want to use this. Other methods on TextData require that text is present.
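The trade-off described here can be sketched with a small stand-in type. Everything below is hypothetical (`CompiledText`, its fields, and the word count that stands in for real n-gram data); it only illustrates the pattern of keeping the normalized text as optional data that a consuming method can drop.

```rust
/// Hypothetical stand-in for a compiled text: match data plus optional text.
#[derive(Debug, Clone)]
struct CompiledText {
    /// A word count stands in for the real n-gram data in this sketch.
    word_count: usize,
    /// Normalized lines, kept for diagnostics unless explicitly discarded.
    lines: Option<Vec<String>>,
}

impl CompiledText {
    /// Normalize the text (here: trim and lowercase) and keep it by default.
    fn new(text: &str) -> Self {
        let lines: Vec<String> = text.lines().map(|l| l.trim().to_lowercase()).collect();
        let word_count = lines.iter().map(|l| l.split_whitespace().count()).sum();
        CompiledText { word_count, lines: Some(lines) }
    }

    /// Consume `self` and return a copy that keeps only the match data.
    fn without_text(self) -> Self {
        CompiledText { lines: None, ..self }
    }

    /// Borrow the normalized lines, or `None` if they were discarded.
    fn lines(&self) -> Option<&[String]> {
        self.lines.as_deref()
    }
}
```

Taking `self` by value means the text-carrying value cannot be used after the call, so any method that needs the text has to handle the `None` case explicitly.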
pub fn lines_view(&self) -> (usize, usize)
Get the bounds of the active line view.
This represents the "active" region of lines that matches are generated from. The bounds are a 0-indexed (start, end) tuple, with inclusive indices (line numbers). See optimize_bounds.
This is largely for informational purposes; other methods in TextData, such as lines and match_score, already account for the line range. However, it's useful to call it after running optimize_bounds to discover where in the input text the match was located.
pub fn lines(&self) -> Option<&[String]>
Get a slice of the normalized lines in this TextData.
If the text was discarded with without_text, this returns None.
pub fn match_score(&self, other: &TextData) -> f32
Compare this TextData with another, returning a similarity score.
This is what's used during analysis to rank licenses.
pub fn optimize_bounds(&self, other: &TextData) -> (Self, f32)
Attempt to optimize a known match to locate possible line ranges.
Returns a new TextData struct and a score. The returned struct is a clone of self, with its view set to the best match against other.
Note that this won't be 100% optimal if there are blank lines surrounding the actual match, since successive blank lines in a range will likely have the same score.
You should check the value of lines_view on the returned struct to find the line ranges.
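As a rough illustration of what "optimizing bounds" means, the sketch below brute-forces every line range of a sample and keeps the one that scores best against a reference. This is not the search strategy askalono uses; the function names (`bigram_set`, `set_dice`, `best_bounds`), the set-based Dice score, and the half-open `(start, end)` range are assumptions of this example.

```rust
use std::collections::HashSet;

/// Lowercased word bigrams of the given lines, as a set.
fn bigram_set(lines: &[&str]) -> HashSet<(String, String)> {
    let words: Vec<String> = lines
        .iter()
        .flat_map(|l| l.split(|c: char| !c.is_alphanumeric()))
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect();
    words.windows(2).map(|p| (p[0].clone(), p[1].clone())).collect()
}

/// Sørensen-Dice coefficient over two bigram sets.
fn set_dice(a: &HashSet<(String, String)>, b: &HashSet<(String, String)>) -> f32 {
    if a.is_empty() && b.is_empty() {
        return 0.0;
    }
    let shared = a.intersection(b).count();
    2.0 * shared as f32 / (a.len() + b.len()) as f32
}

/// Try every half-open line range `(start, end)` of `sample` and return the
/// best-scoring one together with its score against `reference`.
fn best_bounds(sample: &[&str], reference: &[&str]) -> ((usize, usize), f32) {
    let target = bigram_set(reference);
    // Start from the full view so there is always a valid answer.
    let mut best = ((0, sample.len()), set_dice(&bigram_set(sample), &target));
    for start in 0..sample.len() {
        for end in start + 1..=sample.len() {
            let score = set_dice(&bigram_set(&sample[start..end]), &target);
            // Strict comparison: ties keep the range that was found first.
            if score > best.1 {
                best = ((start, end), score);
            }
        }
    }
    best
}
```

Lines without words contribute no bigrams in this sketch, so extending a range across blank lines leaves the score unchanged. That is the same kind of tie the note about blank lines describes.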
Trait Implementations
impl Debug for TextData
fn fmt(&self, __arg_0: &mut Formatter) -> Result
Formats the value using the given formatter.