pub struct Query<'a> {
pub text: String,
pub tokens: Vec<TokenId>,
pub line_by_pos: Vec<usize>,
pub unknowns_by_pos: HashMap<Option<i32>, usize>,
pub stopwords_by_pos: HashMap<Option<i32>, usize>,
pub shorts_and_digits_pos: HashSet<usize>,
pub high_matchables: BitSet,
pub low_matchables: BitSet,
pub is_binary: bool,
pub spdx_lines: Vec<(String, usize, usize)>,
pub index: &'a LicenseIndex,
/* private fields */
}
Query holds:
- Known token IDs (tokens existing in the index dictionary)
- Token positions and their corresponding line numbers (line_by_pos)
- Unknown tokens (tokens not in dictionary) tracked per position
- Stopwords tracked per position
- Positions with short/digit-only tokens
- High and low matchable token positions (for tracking what’s been matched)
Based on Python Query class at: reference/scancode-toolkit/src/licensedcode/query.py (lines 155-295)
Fields
text: String
The original input text.
Corresponds to Python: self.query_string (line 215)
tokens: Vec<TokenId>
Token IDs for known tokens (tokens found in the index dictionary).
Corresponds to Python: self.tokens = [] (line 228)
line_by_pos: Vec<usize>
Mapping from token position to line number (1-based).
Each token position in self.tokens maps to the line number where it appears.
This is used for match position reporting.
Corresponds to Python: self.line_by_pos = [] (line 231)
unknowns_by_pos: HashMap<Option<i32>, usize>
Mapping from token position to count of unknown tokens after that position.
Unknown tokens are those not found in the dictionary. We track them by
counting how many unknown tokens appear after each known position.
Unknown tokens before the first known token are tracked at position -1
(using the key None in Rust).
Corresponds to Python: self.unknowns_by_pos = {} (line 236)
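The counting scheme can be sketched in isolation. This is an illustrative, standalone sketch, not the crate's implementation: the function name, the `dictionary` parameter, and the tokenization-by-word are all assumptions. Unknowns are bucketed under the position of the most recent known token, with the key `None` standing in for Python's -1 for unknowns that precede the first known token.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the unknowns_by_pos layout. `dictionary` stands
// in for the index dictionary mapping token text to a token ID.
fn track_unknowns(
    words: &[&str],
    dictionary: &HashMap<&str, u32>,
) -> HashMap<Option<i32>, usize> {
    let mut unknowns_by_pos: HashMap<Option<i32>, usize> = HashMap::new();
    // None until the first known token is seen, so leading unknowns
    // are keyed under None (the Rust stand-in for Python's -1).
    let mut last_known: Option<i32> = None;
    let mut next_pos: i32 = 0;
    for word in words {
        if dictionary.contains_key(word) {
            // A known token claims the next position in `tokens`.
            last_known = Some(next_pos);
            next_pos += 1;
        } else {
            // An unknown token increments the count after the last
            // known position.
            *unknowns_by_pos.entry(last_known).or_insert(0) += 1;
        }
    }
    unknowns_by_pos
}
```

For input like `["zzz", "license", "qqq", "www", "mit"]` with `license` and `mit` known, the leading `zzz` is counted under `None` and the two unknowns between the known tokens are counted under `Some(0)`.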
stopwords_by_pos: HashMap<Option<i32>, usize>
Mapping from token position to count of stopwords after that position.
Similar to unknowns_by_pos, but for stopwords.
Corresponds to Python: self.stopwords_by_pos = {} (line 244)
shorts_and_digits_pos: HashSet<usize>
Set of positions with single-character or digit-only tokens.
These tokens have special handling in matching.
Corresponds to Python: self.shorts_and_digits_pos = set() (line 249)
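As a rough standalone sketch (the function name is illustrative, not the crate's API), a position lands in this set when its token is one character long or consists only of digits:

```rust
use std::collections::HashSet;

// Sketch: collect positions whose token is a single character or
// digit-only, mirroring the shorts_and_digits_pos field.
fn shorts_and_digits(words: &[&str]) -> HashSet<usize> {
    words
        .iter()
        .enumerate()
        .filter(|(_, w)| {
            !w.is_empty()
                && (w.chars().count() == 1
                    || w.chars().all(|c| c.is_ascii_digit()))
        })
        .map(|(pos, _)| pos)
        .collect()
}
```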
high_matchables: BitSet
High-value matchable token positions (legalese tokens).
These are tokens with ID < len_legalese.
Corresponds to Python: self.high_matchables (line 293)
low_matchables: BitSet
Low-value matchable token positions (non-legalese tokens).
These are tokens with ID >= len_legalese.
Corresponds to Python: self.low_matchables (line 294)
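The high/low partition described above can be sketched with plain `Vec<bool>` standing in for `BitSet` (names and the standalone function are illustrative, not the crate's API):

```rust
// Sketch: split token positions into high (ID < len_legalese) and
// low (ID >= len_legalese) matchable sets.
fn split_matchables(tokens: &[u32], len_legalese: u32) -> (Vec<bool>, Vec<bool>) {
    let mut high = vec![false; tokens.len()];
    let mut low = vec![false; tokens.len()];
    for (pos, &tid) in tokens.iter().enumerate() {
        if tid < len_legalese {
            high[pos] = true;
        } else {
            low[pos] = true;
        }
    }
    (high, low)
}
```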
is_binary: bool
True if the query is detected as binary content.
Corresponds to Python: self.is_binary = False (line 225)
spdx_lines: Vec<(String, usize, usize)>
SPDX-License-Identifier lines found during tokenization.
Each tuple is (spdx_text, start_token_pos, end_token_pos). Used for creating LicenseMatches with correct token positions.
Corresponds to Python: self.spdx_lines = [] (line 507)
index: &'a LicenseIndex
Reference to the license index for dictionary access and metadata.
Implementations
impl<'a> Query<'a>

pub fn from_extracted_text(text: &str, index: &'a LicenseIndex, binary_derived: bool) -> Result<Self, Error>
Build a Query by tokenizing already-extracted text against the given index; binary_derived seeds the is_binary flag.
pub fn query_runs(&self) -> Vec<QueryRun<'_>>
Iterate over query runs.
Corresponds to Python: query.query_runs property iteration
pub fn line_for_pos(&self, pos: usize) -> Option<usize>
Return the line number for the token at pos, or None if pos is out of range.
pub fn whole_query_run(&self) -> QueryRun<'a>
Get a query run covering the entire query.
Corresponds to Python: whole_query_run() method (lines 306-317)
pub fn subtract(&mut self, span: &PositionSpan)
Subtract matched span positions from matchables.
This removes the positions from both high and low matchables.
Arguments
- span: The span of positions to subtract
Corresponds to Python: subtract() method (lines 328-334)
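A minimal standalone sketch of this subtraction, with boolean slices standing in for `BitSet` and an inclusive (start, end) pair standing in for `PositionSpan` (all names are illustrative, not the crate's API):

```rust
// Sketch of subtract(): clear every position in an inclusive span from
// both matchable sets so matched tokens are not rematched.
fn subtract_span(high: &mut [bool], low: &mut [bool], start: usize, end: usize) {
    if high.is_empty() {
        return;
    }
    // Clamp the span end to the last valid position.
    let last = end.min(high.len() - 1);
    for pos in start..=last {
        high[pos] = false;
        low[pos] = false;
    }
}
```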
pub fn matched_text(&self, start_line: usize, end_line: usize) -> String
Extract matched text for a given line range.
Returns the text from the original input between start_line and end_line (both inclusive, 1-indexed).
Arguments
- start_line: Starting line number (1-indexed)
- end_line: Ending line number (1-indexed)
Returns
The matched text, or an empty string if the lines are out of range.
Corresponds to Python: matched_text() method in match.py (lines 757-795)
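The line-range extraction can be sketched as a free function over the raw text (illustrative only; the crate's method operates on the stored query text):

```rust
// Sketch: return the text between 1-indexed, inclusive start and end
// lines, or an empty string when the range is out of bounds.
fn matched_text(text: &str, start_line: usize, end_line: usize) -> String {
    if start_line == 0 || end_line < start_line {
        return String::new();
    }
    let lines: Vec<&str> = text.lines().collect();
    if start_line > lines.len() {
        return String::new();
    }
    // Clamp the end line to the available lines.
    let end = end_line.min(lines.len());
    lines[start_line - 1..end].join("\n")
}
```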
Auto Trait Implementations
impl<'a> Freeze for Query<'a>
impl<'a> RefUnwindSafe for Query<'a>
impl<'a> Send for Query<'a>
impl<'a> Sync for Query<'a>
impl<'a> Unpin for Query<'a>
impl<'a> UnsafeUnpin for Query<'a>
impl<'a> UnwindSafe for Query<'a>