Query

Struct Query 

pub struct Query<'a> {
    pub text: String,
    pub tokens: Vec<TokenId>,
    pub line_by_pos: Vec<usize>,
    pub unknowns_by_pos: HashMap<Option<i32>, usize>,
    pub stopwords_by_pos: HashMap<Option<i32>, usize>,
    pub shorts_and_digits_pos: HashSet<usize>,
    pub high_matchables: BitSet,
    pub low_matchables: BitSet,
    pub is_binary: bool,
    pub spdx_lines: Vec<(String, usize, usize)>,
    pub index: &'a LicenseIndex,
    /* private fields */
}

Query holds:

  • Known token IDs (tokens existing in the index dictionary)
  • Token positions and their corresponding line numbers (line_by_pos)
  • Unknown tokens (tokens not in dictionary) tracked per position
  • Stopwords tracked per position
  • Positions with short/digit-only tokens
  • High and low matchable token positions (for tracking what’s been matched)

Based on Python Query class at: reference/scancode-toolkit/src/licensedcode/query.py (lines 155-295)

Fields§

§text: String

The original input text.

Corresponds to Python: self.query_string (line 215)

§tokens: Vec<TokenId>

Token IDs for known tokens (tokens found in the index dictionary)

Corresponds to Python: self.tokens = [] (line 228)

§line_by_pos: Vec<usize>

Mapping from token position to line number (1-based)

Each token position in self.tokens maps to the line number where it appears. This is used for match position reporting.

Corresponds to Python: self.line_by_pos = [] (line 231)
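A minimal sketch of how such a position-to-line table could be built. It assumes, for illustration only, that every whitespace-separated word is a known token; the real tokenizer and dictionary lookup are not shown on this page, and `build_line_by_pos` is a hypothetical helper name.

```rust
/// Build a position -> line number table (1-based lines).
/// Assumption: every whitespace-separated word counts as a known token.
fn build_line_by_pos(text: &str) -> Vec<usize> {
    let mut line_by_pos = Vec::new();
    for (idx, line) in text.lines().enumerate() {
        for _token in line.split_whitespace() {
            // One entry per token; idx is 0-based, line numbers are 1-based.
            line_by_pos.push(idx + 1);
        }
    }
    line_by_pos
}
```

With this layout, the entry at index `i` is the line of the token at position `i`, so blank lines simply contribute no entries.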

§unknowns_by_pos: HashMap<Option<i32>, usize>

Mapping from token position to count of unknown tokens after that position

Unknown tokens are those not found in the dictionary. They are tracked by counting how many appear after each known position. Unknown tokens that precede the first known token are stored under the key None, which stands in for the position -1 used by the Python implementation.

Corresponds to Python: self.unknowns_by_pos = {} (line 236)
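The counting scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: `count_unknowns` is a hypothetical helper, and the dictionary is modeled as a plain set of words rather than the LicenseIndex dictionary.

```rust
use std::collections::{HashMap, HashSet};

/// Split words into known tokens and a per-position count of unknown tokens.
/// The key `None` covers unknowns seen before the first known token
/// (the Python implementation uses position -1 for this case).
fn count_unknowns(
    words: &[&str],
    dictionary: &HashSet<&str>, // hypothetical stand-in for the index dictionary
) -> (Vec<String>, HashMap<Option<i32>, usize>) {
    let mut known = Vec::new();
    let mut unknowns_by_pos: HashMap<Option<i32>, usize> = HashMap::new();
    let mut last_known: Option<i32> = None;

    for word in words {
        if dictionary.contains(word) {
            known.push(word.to_string());
            last_known = Some(known.len() as i32 - 1);
        } else {
            // Attribute the unknown token to the most recent known position.
            *unknowns_by_pos.entry(last_known).or_insert(0) += 1;
        }
    }
    (known, unknowns_by_pos)
}
```

Positions with no trailing unknown tokens get no entry at all, which keeps the map sparse.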

§stopwords_by_pos: HashMap<Option<i32>, usize>

Mapping from token position to count of stopwords after that position

Similar to unknowns_by_pos, but counts stopwords instead of unknown tokens.

Corresponds to Python: self.stopwords_by_pos = {} (line 244)

§shorts_and_digits_pos: HashSet<usize>

Set of positions with single-character or digit-only tokens

These tokens have special handling in matching.

Corresponds to Python: self.shorts_and_digits_pos = set() (line 249)
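One way to read "single-character or digit-only" is sketched below. The exact rule used by the crate is not shown on this page, so the predicate (including the choice of ASCII digits) is an assumption, and both function names are hypothetical.

```rust
use std::collections::HashSet;

/// Assumed rule: a token is "short" if it has exactly one character,
/// and "digit-only" if it is non-empty and made of ASCII digits.
fn is_short_or_digits(token: &str) -> bool {
    let single_char = token.chars().count() == 1;
    let digits_only = !token.is_empty() && token.chars().all(|c| c.is_ascii_digit());
    single_char || digits_only
}

/// Collect the positions of all tokens that satisfy the rule above.
fn shorts_and_digits_pos(tokens: &[&str]) -> HashSet<usize> {
    let mut positions = HashSet::new();
    for (pos, tok) in tokens.iter().enumerate() {
        if is_short_or_digits(tok) {
            positions.insert(pos);
        }
    }
    positions
}
```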

§high_matchables: BitSet

High-value matchable token positions (legalese tokens)

These are tokens with ID < len_legalese.

Corresponds to Python: self.high_matchables (line 293)

§low_matchables: BitSet

Low-value matchable token positions (non-legalese tokens)

These are tokens with ID >= len_legalese.

Corresponds to Python: self.low_matchables (line 294)
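The split between high_matchables and low_matchables can be sketched as a single pass over the token IDs. This sketch uses a standard-library BTreeSet in place of BitSet to stay dependency-free, and it assumes token IDs are u16 values; the actual TokenId type is not shown here, and `split_matchables` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Partition token positions by token ID:
/// IDs below `len_legalese` are high-value, all others are low-value.
fn split_matchables(
    tokens: &[u16], // assumed representation of TokenId
    len_legalese: u16,
) -> (BTreeSet<usize>, BTreeSet<usize>) {
    let mut high = BTreeSet::new();
    let mut low = BTreeSet::new();
    for (pos, &token_id) in tokens.iter().enumerate() {
        if token_id < len_legalese {
            high.insert(pos);
        } else {
            low.insert(pos);
        }
    }
    (high, low)
}
```

Every position lands in exactly one of the two sets, so together they cover the whole token sequence.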

§is_binary: bool

True if the query is detected as binary content

Corresponds to Python: self.is_binary = False (line 225)

§spdx_lines: Vec<(String, usize, usize)>

SPDX-License-Identifier lines found during tokenization.

Each tuple is (spdx_text, start_token_pos, end_token_pos). Used for creating LicenseMatches with correct token positions.

Corresponds to Python: self.spdx_lines = [] (line 507)

§index: &'a LicenseIndex

Reference to the license index for dictionary access and metadata

Implementations§

Source§

impl<'a> Query<'a>

pub fn from_extracted_text( text: &str, index: &'a LicenseIndex, binary_derived: bool, ) -> Result<Self, Error>

pub fn query_runs(&self) -> Vec<QueryRun<'_>>

Return the query runs of this query.

Corresponds to Python: query.query_runs property iteration

pub fn line_for_pos(&self, pos: usize) -> Option<usize>

Get the line number for a token position.

§Arguments
  • pos - The token position
§Returns

The line number (1-based)
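A lookup with this signature could be a bounds-checked index into line_by_pos. The sketch below assumes that None means the position is past the end of the token sequence; the page does not state when None is returned, so treat that as a guess.

```rust
/// Look up the 1-based line number for a token position.
/// Assumption: an out-of-range position yields `None`.
fn line_for_pos(line_by_pos: &[usize], pos: usize) -> Option<usize> {
    line_by_pos.get(pos).copied()
}
```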

pub fn is_empty(&self) -> bool

Check if the query is empty (no known tokens).

pub fn whole_query_run(&self) -> QueryRun<'a>

Get a query run covering the entire query.

Corresponds to Python: whole_query_run() method (lines 306-317)

pub fn subtract(&mut self, span: &PositionSpan)

Subtract matched span positions from matchables.

This removes the positions from both high and low matchables.

§Arguments
  • span - The span of positions to subtract

Corresponds to Python: subtract() method (lines 328-334)
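The effect of subtract can be sketched with two sets of positions. In this illustration, BTreeSet stands in for BitSet and a slice of positions stands in for PositionSpan, whose API is not shown on this page; the `Matchables` struct is hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical holder for the two matchable sets.
struct Matchables {
    high: BTreeSet<usize>,
    low: BTreeSet<usize>,
}

impl Matchables {
    /// Remove every position in `span` from both sets.
    /// Positions that are in neither set are ignored.
    fn subtract(&mut self, span: &[usize]) {
        for pos in span {
            self.high.remove(pos);
            self.low.remove(pos);
        }
    }
}
```

Removing from both sets without checking membership first keeps the operation idempotent: subtracting the same span twice leaves the same result.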

pub fn matched_text(&self, start_line: usize, end_line: usize) -> String

Extract matched text for a given line range.

Returns the text from the original input between start_line and end_line (both inclusive, 1-indexed).

§Arguments
  • start_line - Starting line number (1-indexed)
  • end_line - Ending line number (1-indexed)
§Returns

The matched text, or empty string if lines are out of range

Corresponds to Python: matched_text() method in match.py (lines 757-795)
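An inclusive, 1-indexed line range can be extracted roughly as shown below. This free function is a sketch, not the method's implementation: it takes the text as a parameter, joins lines with "\n", and returns whatever lines exist when end_line runs past the end of the text. How the real method treats partially out-of-range input is not documented here.

```rust
/// Return lines `start_line..=end_line` (1-indexed) of `text`, joined by '\n'.
/// Returns an empty string for a zero or inverted range, or when
/// `start_line` is past the last line.
fn matched_text(text: &str, start_line: usize, end_line: usize) -> String {
    if start_line == 0 || end_line < start_line {
        return String::new();
    }
    text.lines()
        .skip(start_line - 1) // convert the 1-indexed start to a 0-indexed offset
        .take(end_line - start_line + 1) // inclusive range length
        .collect::<Vec<_>>()
        .join("\n")
}
```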

Trait Implementations§

Source§

impl<'a> Debug for Query<'a>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

§

impl<'a> Freeze for Query<'a>

§

impl<'a> RefUnwindSafe for Query<'a>

§

impl<'a> Send for Query<'a>

§

impl<'a> Sync for Query<'a>

§

impl<'a> Unpin for Query<'a>

§

impl<'a> UnsafeUnpin for Query<'a>

§

impl<'a> UnwindSafe for Query<'a>

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T, U> ExactFrom<T> for U
where U: TryFrom<T>,

Source§

fn exact_from(value: T) -> U

Source§

impl<T, U> ExactInto<U> for T
where U: ExactFrom<T>,

Source§

fn exact_into(self) -> U

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

impl<T, U> OverflowingInto<U> for T
where U: OverflowingFrom<T>,

Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T, U> RoundingInto<U> for T
where U: RoundingFrom<T>,

Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> SaturatingInto<U> for T
where U: SaturatingFrom<T>,

Source§

impl<T> ToDebugString for T
where T: Debug,

Source§

fn to_debug_string(&self) -> String

Returns the String produced by T's Debug implementation.

§Examples
use malachite_base::strings::ToDebugString;

assert_eq!([1, 2, 3].to_debug_string(), "[1, 2, 3]");
assert_eq!(
    [vec![2, 3], vec![], vec![4]].to_debug_string(),
    "[[2, 3], [], [4]]"
);
assert_eq!(Some(5).to_debug_string(), "Some(5)");
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T, U> WrappingInto<U> for T
where U: WrappingFrom<T>,

Source§

fn wrapping_into(self) -> U