pub struct Query<'a> {
pub text: String,
pub tokens: Vec<TokenId>,
pub line_by_pos: Vec<usize>,
pub unknowns_by_pos: HashMap<Option<i32>, usize>,
pub stopwords_by_pos: HashMap<Option<i32>, usize>,
pub shorts_and_digits_pos: HashSet<usize>,
pub high_matchables: BitSet,
pub low_matchables: BitSet,
pub is_binary: bool,
pub spdx_lines: Vec<(String, usize, usize)>,
pub index: &'a LicenseIndex,
/* private fields */
}
Query holds:
- Known token IDs (tokens existing in the index dictionary)
- Token positions and their corresponding line numbers (line_by_pos)
- Unknown tokens (tokens not in dictionary) tracked per position
- Stopwords tracked per position
- Positions with short/digit-only tokens
- High and low matchable token positions (for tracking what’s been matched)
Based on Python Query class at: reference/scancode-toolkit/src/licensedcode/query.py (lines 155-295)
Fields
text: String
The original input text.
Corresponds to Python: self.query_string (line 215)
tokens: Vec<TokenId>
Token IDs for known tokens (tokens found in the index dictionary).
Corresponds to Python: self.tokens = [] (line 228)
line_by_pos: Vec<usize>
Mapping from token position to line number (1-based).
Each token position in self.tokens maps to the line number where it appears.
This is used for match position reporting.
Corresponds to Python: self.line_by_pos = [] (line 231)
unknowns_by_pos: HashMap<Option<i32>, usize>
Mapping from token position to count of unknown tokens after that position.
Unknown tokens are those not found in the dictionary. We track them by
counting how many unknown tokens appear after each known position.
Unknown tokens before the first known token are tracked at position -1
(using the key None in Rust).
Corresponds to Python: self.unknowns_by_pos = {} (line 236)
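The counting scheme can be sketched in isolation. This is an illustrative, standalone sketch, not the crate's implementation: the function name, the `dictionary` parameter, and the tokenization-by-word are all assumptions. Unknowns are bucketed under the position of the most recent known token, with the key `None` standing in for Python's -1 for unknowns that precede the first known token.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the unknowns_by_pos layout. `dictionary` stands
// in for the index dictionary mapping token text to a token ID.
fn track_unknowns(
    words: &[&str],
    dictionary: &HashMap<&str, u32>,
) -> HashMap<Option<i32>, usize> {
    let mut unknowns_by_pos: HashMap<Option<i32>, usize> = HashMap::new();
    // None until the first known token is seen, so leading unknowns
    // are keyed under None (the Rust stand-in for Python's -1).
    let mut last_known: Option<i32> = None;
    let mut next_pos: i32 = 0;
    for word in words {
        if dictionary.contains_key(word) {
            // A known token claims the next position in `tokens`.
            last_known = Some(next_pos);
            next_pos += 1;
        } else {
            // An unknown token increments the count after the last
            // known position.
            *unknowns_by_pos.entry(last_known).or_insert(0) += 1;
        }
    }
    unknowns_by_pos
}
```

For input like `["zzz", "license", "qqq", "www", "mit"]` with `license` and `mit` known, the leading `zzz` is counted under `None` and the two unknowns between the known tokens are counted under `Some(0)`.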
stopwords_by_pos: HashMap<Option<i32>, usize>
Mapping from token position to count of stopwords after that position.
Similar to unknowns_by_pos, but for stopwords.
Corresponds to Python: self.stopwords_by_pos = {} (line 244)
shorts_and_digits_pos: HashSet<usize>
Set of positions with single-character or digit-only tokens.
These tokens have special handling in matching.
Corresponds to Python: self.shorts_and_digits_pos = set() (line 249)
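As a rough standalone sketch (the function name is illustrative, not the crate's API), a position lands in this set when its token is one character long or consists only of digits:

```rust
use std::collections::HashSet;

// Sketch: collect positions whose token is a single character or
// digit-only, mirroring the shorts_and_digits_pos field.
fn shorts_and_digits(words: &[&str]) -> HashSet<usize> {
    words
        .iter()
        .enumerate()
        .filter(|(_, w)| {
            !w.is_empty()
                && (w.chars().count() == 1
                    || w.chars().all(|c| c.is_ascii_digit()))
        })
        .map(|(pos, _)| pos)
        .collect()
}
```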
high_matchables: BitSet
High-value matchable token positions (legalese tokens).
These are tokens with ID < len_legalese.
Corresponds to Python: self.high_matchables (line 293)
low_matchables: BitSet
Low-value matchable token positions (non-legalese tokens).
These are tokens with ID >= len_legalese.
Corresponds to Python: self.low_matchables (line 294)
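The high/low partition described above can be sketched with plain `Vec<bool>` standing in for `BitSet` (names and the standalone function are illustrative, not the crate's API):

```rust
// Sketch: split token positions into high (ID < len_legalese) and
// low (ID >= len_legalese) matchable sets.
fn split_matchables(tokens: &[u32], len_legalese: u32) -> (Vec<bool>, Vec<bool>) {
    let mut high = vec![false; tokens.len()];
    let mut low = vec![false; tokens.len()];
    for (pos, &tid) in tokens.iter().enumerate() {
        if tid < len_legalese {
            high[pos] = true;
        } else {
            low[pos] = true;
        }
    }
    (high, low)
}
```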
is_binary: bool
True if the query is detected as binary content.
Corresponds to Python: self.is_binary = False (line 225)
spdx_lines: Vec<(String, usize, usize)>
SPDX-License-Identifier lines found during tokenization.
Each tuple is (spdx_text, start_token_pos, end_token_pos). Used for creating LicenseMatches with correct token positions.
Corresponds to Python: self.spdx_lines = [] (line 507)
index: &'a LicenseIndex
Reference to the license index for dictionary access and metadata.
Implementations
impl<'a> Query<'a>

pub fn from_extracted_text(text: &str, index: &'a LicenseIndex, binary_derived: bool) -> Result<Self, Error>
Build a Query by tokenizing already-extracted text against the given index; binary_derived seeds the is_binary flag.
pub fn query_runs(&self) -> Vec<QueryRun<'_>>
Iterate over query runs.
Corresponds to Python: query.query_runs property iteration
pub fn line_for_pos(&self, pos: usize) -> Option<usize>
Return the line number for the token at pos, or None if pos is out of range.
pub fn whole_query_run(&self) -> QueryRun<'a>
Get a query run covering the entire query.
Corresponds to Python: whole_query_run() method (lines 306-317)
pub fn subtract(&mut self, span: &PositionSpan)
Subtract matched span positions from matchables.
This removes the positions from both high and low matchables.
Arguments
- span: The span of positions to subtract
Corresponds to Python: subtract() method (lines 328-334)
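A minimal standalone sketch of this subtraction, with boolean slices standing in for `BitSet` and an inclusive (start, end) pair standing in for `PositionSpan` (all names are illustrative, not the crate's API):

```rust
// Sketch of subtract(): clear every position in an inclusive span from
// both matchable sets so matched tokens are not rematched.
fn subtract_span(high: &mut [bool], low: &mut [bool], start: usize, end: usize) {
    if high.is_empty() {
        return;
    }
    // Clamp the span end to the last valid position.
    let last = end.min(high.len() - 1);
    for pos in start..=last {
        high[pos] = false;
        low[pos] = false;
    }
}
```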
pub fn matched_text(&self, start_line: usize, end_line: usize) -> String
Extract matched text for a given line range.
Returns the text from the original input between start_line and end_line (both inclusive, 1-indexed).
Arguments
- start_line: Starting line number (1-indexed)
- end_line: Ending line number (1-indexed)
Returns
The matched text, or an empty string if the lines are out of range.
Corresponds to Python: matched_text() method in match.py (lines 757-795)
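The line-range extraction can be sketched as a free function over the raw text (illustrative only; the crate's method operates on the stored query text):

```rust
// Sketch: return the text between 1-indexed, inclusive start and end
// lines, or an empty string when the range is out of bounds.
fn matched_text(text: &str, start_line: usize, end_line: usize) -> String {
    if start_line == 0 || end_line < start_line {
        return String::new();
    }
    let lines: Vec<&str> = text.lines().collect();
    if start_line > lines.len() {
        return String::new();
    }
    // Clamp the end line to the available lines.
    let end = end_line.min(lines.len());
    lines[start_line - 1..end].join("\n")
}
```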
Auto Trait Implementations
impl<'a> Freeze for Query<'a>
impl<'a> RefUnwindSafe for Query<'a>
impl<'a> Send for Query<'a>
impl<'a> Sync for Query<'a>
impl<'a> Unpin for Query<'a>
impl<'a> UnsafeUnpin for Query<'a>
impl<'a> UnwindSafe for Query<'a>