pub struct TokenDictionary { /* private fields */ }
Token dictionary mapping token strings to unique integer IDs.
Token IDs are assigned as follows:
- IDs 0 to len_legalese-1: Reserved for legalese tokens (high-value words)
- IDs len_legalese and above: Assigned to other tokens as encountered
The len_legalese delimiter allows the matching engine to distinguish
between high-value (legalese) tokens and regular tokens.
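The ID layout above can be sketched with a hypothetical `SketchDictionary`. Legalese words take IDs `0..len_legalese`, every other token gets the next free ID the first time it is seen, and a single comparison against `len_legalese` tells the two classes apart. The struct, its fields and the `u16` ID type are assumptions for illustration, not the crate's private fields.

```rust
use std::collections::HashMap;

// Assumption: IDs fit in u16, mirroring the `(&str, u16)` pairs taken by
// `new_with_legalese`. The real `TokenId` type is not shown on this page.
type TokenId = u16;

/// Hypothetical stand-in for `TokenDictionary`; fields are illustrative only.
struct SketchDictionary {
    ids: HashMap<String, TokenId>,
    len_legalese: usize,
    next_id: TokenId,
}

impl SketchDictionary {
    /// Legalese words receive IDs 0..len_legalese-1, in slice order.
    fn with_legalese(words: &[&str]) -> Self {
        let mut ids = HashMap::new();
        for (i, w) in words.iter().enumerate() {
            ids.insert(w.to_string(), i as TokenId);
        }
        SketchDictionary { ids, len_legalese: words.len(), next_id: words.len() as TokenId }
    }

    /// Other tokens get the next free ID the first time they are seen;
    /// a repeated token keeps the ID it was first given.
    fn intern(&mut self, token: &str) -> TokenId {
        if let Some(&id) = self.ids.get(token) {
            return id;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.ids.insert(token.to_string(), id);
        id
    }

    /// The len_legalese delimiter: IDs below it are high-value tokens.
    fn is_legalese(&self, id: TokenId) -> bool {
        (id as usize) < self.len_legalese
    }
}

fn main() {
    let mut dict = SketchDictionary::with_legalese(&["license", "copyright", "warranty"]);
    let the = dict.intern("the"); // first regular token -> 3
    let software = dict.intern("software"); // -> 4
    let again = dict.intern("the"); // already known -> 3
    let license = dict.intern("license"); // legalese -> 0
    println!("{the} {software} {again} {}", dict.is_legalese(license));
    // prints "3 4 3 true"
}
```

Keeping legalese IDs in a contiguous low range means the classification needs no extra lookup table: one integer comparison is enough.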
Based on the Python ScanCode Toolkit implementation in reference/scancode-toolkit/src/licensedcode/index.py.
Implementations
impl TokenDictionary
pub fn new_with_legalese(legalese_entries: &[(&str, u16)]) -> Self
Create a new token dictionary initialized with legalese tokens.
This follows the Python ScanCode Toolkit pattern, in which the dictionary starts with predefined legalese words that receive low IDs (marking them as high-value).
Arguments
legalese_entries - Slice of (word, token_id) pairs for legalese words
Returns
A new TokenDictionary instance with legalese tokens pre-populated
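A minimal sketch of what the constructor is documented to do, assuming the caller-supplied IDs fall in `0..legalese_entries.len()` (the layout described at the top of this page). `SketchDictionary`, its fields, and the panic on an out-of-range ID are illustrative assumptions; how the real constructor validates its input is not documented here.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in; not the crate's private fields.
struct SketchDictionary {
    ids: HashMap<String, u16>,
    len_legalese: usize,
}

impl SketchDictionary {
    /// Pre-populate the dictionary from caller-supplied (word, id) pairs.
    fn new_with_legalese(legalese_entries: &[(&str, u16)]) -> Self {
        let len_legalese = legalese_entries.len();
        let mut ids = HashMap::with_capacity(len_legalese);
        for &(word, id) in legalese_entries {
            // Assumption: legalese IDs must lie in 0..len_legalese.
            assert!((id as usize) < len_legalese, "legalese id {id} out of range");
            ids.insert(word.to_string(), id);
        }
        SketchDictionary { ids, len_legalese }
    }

    fn get_token_id(&self, token: &str) -> Option<u16> {
        self.ids.get(token).copied()
    }
}

fn main() {
    // IDs come from the caller, not from slice order.
    let dict = SketchDictionary::new_with_legalese(&[("warranty", 1), ("license", 0)]);
    assert_eq!(dict.len_legalese, 2);
    assert_eq!(dict.get_token_id("license"), Some(0));
    assert_eq!(dict.get_token_id("the"), None);
}
```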
pub fn intern(&mut self, token: &str) -> KnownToken
pub fn lookup(&self, token: &str) -> Option<KnownToken>
pub fn classify_query_token(&self, token: &str) -> QueryToken
pub fn token_kind(&self, token_id: TokenId) -> TokenKind
pub fn is_digit_only_token(&self, token_id: TokenId) -> bool
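These methods are listed without descriptions. Judging only from their receivers (`&mut self` for `intern`, `&self` for `lookup`) and from the ID layout described above, one plausible reading is sketched below: `intern` adds unknown tokens, `lookup` never does, and `token_kind` compares an ID against the legalese delimiter. `KnownToken`, `QueryToken` and `TokenKind` are not documented on this page, so the two-variant `Kind` enum and plain `u16` IDs are hypothetical simplifications.

```rust
use std::collections::HashMap;

/// Hypothetical simplification of `TokenKind`; the real variants are unknown.
#[derive(Debug, PartialEq)]
enum Kind {
    Legalese,
    Regular,
}

#[derive(Default)]
struct SketchDictionary {
    ids: HashMap<String, u16>,
    len_legalese: usize,
    next_id: u16,
}

impl SketchDictionary {
    /// Mutating: returns the existing ID or assigns a fresh one.
    fn intern(&mut self, token: &str) -> u16 {
        if let Some(&id) = self.ids.get(token) {
            return id;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.ids.insert(token.to_owned(), id);
        id
    }

    /// Read-only: unknown tokens yield None and are not added.
    fn lookup(&self, token: &str) -> Option<u16> {
        self.ids.get(token).copied()
    }

    /// Classify an ID using the len_legalese delimiter.
    fn token_kind(&self, id: u16) -> Kind {
        if (id as usize) < self.len_legalese {
            Kind::Legalese
        } else {
            Kind::Regular
        }
    }
}

fn main() {
    let mut dict = SketchDictionary {
        ids: HashMap::from([("license".to_owned(), 0)]),
        len_legalese: 1,
        next_id: 1,
    };
    assert_eq!(dict.lookup("software"), None); // lookup does not insert
    let id = dict.intern("software"); // intern does
    assert_eq!(dict.lookup("software"), Some(id));
    assert_eq!(dict.token_kind(0), Kind::Legalese);
    assert_eq!(dict.token_kind(id), Kind::Regular);
}
```

Splitting the read-only path from the mutating one lets query-time code hold only a shared reference while indexing code grows the vocabulary.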
pub fn get_token_id(&self, token: &str) -> Option<TokenId>
pub fn get(&self, token: &str) -> Option<TokenId>
Get the token ID (alias of get_token_id, kept for backward compatibility).
pub const fn legalese_count(&self) -> usize
Get the number of legalese tokens.
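A backward-compatible alias is commonly kept as a thin delegating method so the two names can never disagree. The sketch below shows that pattern on a hypothetical stand-in type; only the method signatures come from this page, while the field names and the `#[inline]` hint are assumptions.

```rust
use std::collections::HashMap;

type TokenId = u16; // assumption: the real TokenId definition is not shown

struct SketchDictionary {
    ids: HashMap<String, TokenId>,
    len_legalese: usize,
}

impl SketchDictionary {
    pub fn get_token_id(&self, token: &str) -> Option<TokenId> {
        self.ids.get(token).copied()
    }

    /// Alias kept for backward compatibility; delegating keeps both names in sync.
    #[inline]
    pub fn get(&self, token: &str) -> Option<TokenId> {
        self.get_token_id(token)
    }

    /// `const fn`, as in the documented signature: reading a field is allowed in const context.
    pub const fn legalese_count(&self) -> usize {
        self.len_legalese
    }
}

fn main() {
    let dict = SketchDictionary {
        ids: HashMap::from([("license".to_string(), 0), ("the".to_string(), 1)]),
        len_legalese: 1,
    };
    assert_eq!(dict.get("license"), dict.get_token_id("license"));
    assert_eq!(dict.get("missing"), None);
    assert_eq!(dict.legalese_count(), 1);
}
```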
Trait Implementations
impl Clone for TokenDictionary
fn clone(&self) -> TokenDictionary
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for TokenDictionary
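Because `TokenDictionary` implements `Clone`, a clone can serve as a scratch copy: interning new tokens into it leaves the original untouched. The sketch assumes a derived `Clone` on a hypothetical `HashMap`-backed stand-in; whether the real impl is derived is not stated on this page.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in; deriving Clone deep-copies the HashMap.
#[derive(Clone, Debug)]
struct SketchDictionary {
    ids: HashMap<String, u16>,
    next_id: u16,
}

impl SketchDictionary {
    fn intern(&mut self, token: &str) -> u16 {
        if let Some(&id) = self.ids.get(token) {
            return id;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.ids.insert(token.to_owned(), id);
        id
    }
}

fn main() {
    let base = SketchDictionary { ids: HashMap::from([("license".to_owned(), 0)]), next_id: 1 };
    let mut scratch = base.clone();
    scratch.intern("query-only-token");
    assert_eq!(scratch.ids.len(), 2);
    assert_eq!(base.ids.len(), 1); // the original is unaffected
    println!("{base:?}");
}
```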
Auto Trait Implementations
impl Freeze for TokenDictionary
impl RefUnwindSafe for TokenDictionary
impl Send for TokenDictionary
impl Sync for TokenDictionary
impl Unpin for TokenDictionary
impl UnsafeUnpin for TokenDictionary
impl UnwindSafe for TokenDictionary
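Since `TokenDictionary` is `Send` and `Sync`, a fully built dictionary can be wrapped in `Arc` and queried from several threads at once through its `&self` methods. The sketch uses a hypothetical stand-in with the same auto traits; its `lookup` is a simplified, assumed counterpart of the documented method, and `assert_send_sync` is an illustrative helper, not part of the crate.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in; a HashMap<String, u16> field makes it Send + Sync.
struct SketchDictionary {
    ids: HashMap<String, u16>,
}

impl SketchDictionary {
    fn lookup(&self, token: &str) -> Option<u16> {
        self.ids.get(token).copied()
    }
}

/// Compiles only if T can be shared across threads.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<SketchDictionary>();
    let dict = Arc::new(SketchDictionary {
        ids: HashMap::from([("license".to_owned(), 0), ("copyright".to_owned(), 1)]),
    });
    let tokens: [&'static str; 3] = ["license", "copyright", "the"];
    let handles: Vec<_> = tokens
        .iter()
        .map(|&tok| {
            let dict = Arc::clone(&dict); // each thread gets its own handle
            thread::spawn(move || dict.lookup(tok))
        })
        .collect();
    let results: Vec<Option<u16>> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    assert_eq!(results, vec![Some(0), Some(1), None]);
}
```

No lock is needed here because the threads only read; interning from several threads would require a `Mutex` or `RwLock` around the dictionary.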
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.