Struct Token 

Source
pub struct Token<'a> {
    pub surface: Cow<'a, str>,
    pub byte_start: usize,
    pub byte_end: usize,
    pub position: usize,
    pub position_length: usize,
    pub word_id: WordId,
    pub dictionary: &'a Dictionary,
    pub user_dictionary: Option<&'a UserDictionary>,
    pub details: Option<Vec<Cow<'a, str>>>,
}

Fields§

§surface: Cow<'a, str>

The text content of the token, which is a copy-on-write string slice. This allows for efficient handling of both owned and borrowed string data.
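As a rough illustration of why a copy-on-write string suits a token surface, the sketch below uses only the standard library. The `normalize` helper is hypothetical and not part of this crate; it borrows the input when nothing changes and allocates only when it must.

```rust
use std::borrow::Cow;

/// Hypothetical helper: lowercases ASCII text, allocating only when needed.
fn normalize(surface: &str) -> Cow<'_, str> {
    if surface.bytes().any(|b| b.is_ascii_uppercase()) {
        // A change is required, so build and own a new String.
        Cow::Owned(surface.to_ascii_lowercase())
    } else {
        // Nothing to change, so keep pointing at the caller's data.
        Cow::Borrowed(surface)
    }
}

fn main() {
    let unchanged = normalize("tokyo");
    let changed = normalize("Tokyo");
    println!("{} borrowed: {}", unchanged, matches!(unchanged, Cow::Borrowed(_)));
    println!("{} borrowed: {}", changed, matches!(changed, Cow::Borrowed(_)));
}
```

A token whose surface is an unmodified slice of the input can stay in the borrowed form, while a filter that rewrites the text can switch to the owned form without changing the field type.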

§byte_start: usize

The starting byte position of the token in the original text. This indicates where the token begins in the input string.

§byte_end: usize

The ending byte position of the token in the original text. This indicates the position immediately after the last byte of the token.
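Because byte_start and byte_end are byte offsets rather than character indices, they can be used to slice the original text directly. The following minimal sketch uses a hypothetical helper, `slice_by_bytes`, built on the standard `str::get`, which returns None instead of panicking when a range is out of bounds or splits a UTF-8 character.

```rust
/// Hypothetical helper: returns the part of `text` covered by the half-open
/// byte range, or `None` if the range is invalid for this string.
fn slice_by_bytes(text: &str, byte_start: usize, byte_end: usize) -> Option<&str> {
    text.get(byte_start..byte_end)
}

fn main() {
    let text = "東京都に住む";
    // Each character in this string occupies 3 bytes in UTF-8.
    println!("{:?}", slice_by_bytes(text, 0, 9));
    println!("{:?}", slice_by_bytes(text, 9, 12));
    // Offsets 1..3 fall inside the first character, so no slice is returned.
    println!("{:?}", slice_by_bytes(text, 1, 3));
}
```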

§position: usize

The position of the token in the sequence of tokens (usually an index). Unlike byte_start, this is not a byte offset into the input text; it is useful for ordering tokens and for position-aware processing.

§position_length: usize

The number of positions the token spans in the token sequence. Token::new sets this to 1 by default.

§word_id: WordId

The identifier of the word's entry in the dictionary. It is used to look up the token's details in either the system dictionary or the user dictionary.

§dictionary: &'a Dictionary

A reference to the dictionary used for tokenization.

The dictionary contains the data necessary for the tokenization process, including word entries and their associated metadata. This reference allows the tokenizer to access and utilize the dictionary during the tokenization of input text.

§user_dictionary: Option<&'a UserDictionary>

An optional reference to a user-defined dictionary.

This dictionary can be used to add custom words or override existing words in the default dictionary. If None, only the default dictionary referenced by dictionary is used.

§details: Option<Vec<Cow<'a, str>>>

An optional vector containing detailed information about the token. Each element in the vector is a Cow (Copy-On-Write) type, which allows for efficient handling of both owned and borrowed string data.

§Note

This field is None until the details have been populated, for example by a call to details(), which caches the fetched values here.

Implementations§

Source§

impl<'a> Token<'a>

Source

pub fn new( surface: Cow<'a, str>, start: usize, end: usize, position: usize, word_id: WordId, dictionary: &'a Dictionary, user_dictionary: Option<&'a UserDictionary>, ) -> Self

Creates a new Token instance with the provided parameters.

§Arguments
  • surface - A Cow<'a, str> holding the text of the token. This can be either a borrowed or an owned string.
  • start - The byte position where the token starts in the original text.
  • end - The byte position immediately after the last byte of the token in the original text.
  • position - The position of the token in the sequence of tokens (usually an index).
  • word_id - The WordId associated with the token, identifying the token in the dictionary.
  • dictionary - A reference to the Dictionary that contains information about the token.
  • user_dictionary - An optional reference to a UserDictionary, which may provide additional user-defined tokens.
§Returns

Returns a new Token instance initialized with the provided values.

§Details
  • The token’s text can be a borrowed reference or an owned string, thanks to the use of Cow<'a, str>.
  • byte_start and byte_end are used to define the token’s byte offset within the original text.
  • position marks the token’s place in the overall tokenized sequence.
  • position_length is set to 1 by default.
  • word_id is used to identify the token in the dictionary, and the dictionaries (both dictionary and user_dictionary) provide additional details about the token.
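The constructor behaviour described above can be sketched with a simplified stand-in type. `MiniToken` is hypothetical: it omits word_id and both dictionary references so that the example compiles without the crate, and it only mirrors the documented points that the offsets are stored as given and that position_length starts at 1.

```rust
use std::borrow::Cow;

/// Simplified stand-in for the documented struct; dictionary fields are omitted.
struct MiniToken<'a> {
    surface: Cow<'a, str>,
    byte_start: usize,
    byte_end: usize,
    position: usize,
    position_length: usize,
}

impl<'a> MiniToken<'a> {
    fn new(surface: Cow<'a, str>, start: usize, end: usize, position: usize) -> Self {
        MiniToken {
            surface,
            byte_start: start,
            byte_end: end,
            position,
            // Mirrors the documented default.
            position_length: 1,
        }
    }
}

fn main() {
    let text = "東京都に住む";
    // Borrow the first three characters (9 bytes) as the first token.
    let token = MiniToken::new(Cow::Borrowed(&text[0..9]), 0, 9, 0);
    println!(
        "{} bytes {}..{} position {} (+{})",
        token.surface, token.byte_start, token.byte_end, token.position, token.position_length
    );
}
```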
Source

pub fn details(&mut self) -> Vec<&str>

Retrieves the details of the token, either from the dictionary or the user-defined dictionary.

§Returns

Returns a Vec<&str> containing the token’s details. These details are typically part-of-speech information or other metadata about the token.

§Process
  1. Check if details are already set:
    • If self.details is None, the method will attempt to fetch the details from either the system dictionary or the user dictionary.
    • If the word_id is unknown, a default value UNK is returned.
  2. Fetch details from dictionaries:
    • If the word_id corresponds to a system dictionary entry, details are fetched from self.dictionary.
    • If the word_id corresponds to a user-defined dictionary, details are fetched from self.user_dictionary.
  3. Store details:
    • The fetched details are stored in self.details as Some(Vec<Cow<str>>) to avoid recalculating them in subsequent calls.
  4. Return details as &str:
    • The Cow<str> values stored in self.details are converted to &str and returned.
§Notes
  • The first time this method is called, it fetches the details from the dictionary (or user dictionary), but on subsequent calls, it returns the cached details in self.details.
  • If the token is unknown and no details can be retrieved, a default value (UNK) is used.
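The fetch-once-then-cache process above can be sketched without the crate. Everything here is an assumption made for illustration: `MockDictionary`, `LazyToken`, the made-up detail strings, and the use of `None` as the word ID of an unknown word. The real lookup goes through Dictionary and UserDictionary.

```rust
use std::borrow::Cow;
use std::cell::Cell;

/// Hypothetical stand-in for a dictionary; it counts how often it is queried.
struct MockDictionary {
    lookups: Cell<usize>,
}

impl MockDictionary {
    fn new() -> Self {
        MockDictionary { lookups: Cell::new(0) }
    }

    /// Returns made-up details for a known word, or "UNK" for an unknown one.
    fn word_details(&self, word_id: Option<u32>) -> Vec<String> {
        self.lookups.set(self.lookups.get() + 1);
        match word_id {
            Some(_) => vec!["名詞".to_string(), "固有名詞".to_string()],
            None => vec!["UNK".to_string()],
        }
    }
}

/// Simplified token: `None` as `word_id` models an unknown word (an assumption).
struct LazyToken<'a> {
    word_id: Option<u32>,
    dictionary: &'a MockDictionary,
    details: Option<Vec<Cow<'a, str>>>,
}

impl<'a> LazyToken<'a> {
    fn new(word_id: Option<u32>, dictionary: &'a MockDictionary) -> Self {
        LazyToken { word_id, dictionary, details: None }
    }

    fn details(&mut self) -> Vec<&str> {
        // Steps 1 to 3: fetch and cache only when nothing is cached yet.
        if self.details.is_none() {
            let fetched = self.dictionary.word_details(self.word_id);
            self.details = Some(fetched.into_iter().map(Cow::Owned).collect());
        }
        // Step 4: hand out plain &str views of the cached values.
        match &self.details {
            Some(cached) => cached.iter().map(|c| &**c).collect(),
            None => Vec::new(),
        }
    }
}

fn main() {
    let dictionary = MockDictionary::new();
    let mut token = LazyToken::new(Some(42), &dictionary);
    println!("{:?}", token.details());
    println!("{:?}", token.details());
    println!("dictionary lookups: {}", dictionary.lookups.get());
}
```

The cache is the reason the method takes &mut self: the first call writes to the details field, and later calls only read from it.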
Source

pub fn get_detail(&mut self, index: usize) -> Option<&str>

Retrieves the token’s detail at the specified index, if available.

§Arguments
  • index - The index of the detail to retrieve.
§Returns

Returns an Option<&str> that contains the detail at the specified index. If the index is out of bounds or no details are available, None is returned.

§Details
  • This method first ensures that the token’s details are populated by calling self.details().
  • If details are available and the provided index is valid, the detail at the specified index is returned as Some(&str).
  • If the index is out of range, None is returned.
Source

pub fn set_detail(&mut self, index: usize, detail: Cow<'a, str>)

Sets the token’s detail at the specified index with the provided value.

§Arguments
  • index - The index of the detail to set. This specifies which detail to update.
  • detail - A Cow<'a, str> representing the new detail value to set. It can either be a borrowed or owned string.
§Details
  • If the token’s details have already been populated (self.details is Some), this method updates the detail at the specified index.
  • If the provided index is valid (within bounds of the details vector), the detail at that index is replaced by the new detail value.
  • If the details have not been set (self.details is None), this method does nothing.
  • This method performs no explicit bounds check, so the caller is responsible for passing an index that lies within the details vector.
§Notes
  • The Cow<'a, str> type allows flexibility, as it can handle either borrowed or owned strings.
  • This method does not initialize the details if they are not already set. To ensure the details are set, details() can be called prior to calling this method.
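Since set_detail leaves bounds checking to the caller, one defensive option is to guard the update. The sketch below is not the crate's implementation: `set_detail` and `get_detail` here are hypothetical free functions over the same Option<Vec<Cow<str>>> shape as the details field. They show the two documented no-op cases (details not yet populated, index out of range) handled without panicking.

```rust
use std::borrow::Cow;

/// Hypothetical bounds-checked update. Returns true when a value was replaced,
/// false when the details are unset or the index is out of range.
fn set_detail<'a>(
    details: &mut Option<Vec<Cow<'a, str>>>,
    index: usize,
    detail: Cow<'a, str>,
) -> bool {
    match details.as_mut().and_then(|d| d.get_mut(index)) {
        Some(slot) => {
            *slot = detail;
            true
        }
        None => false,
    }
}

/// Hypothetical read access with the same None semantics as the documented getter.
fn get_detail<'d>(details: &'d Option<Vec<Cow<'_, str>>>, index: usize) -> Option<&'d str> {
    details.as_ref()?.get(index).map(|c| &**c)
}

fn main() {
    let mut details: Option<Vec<Cow<str>>> = None;
    // Nothing is populated yet, so the update is skipped.
    println!("{}", set_detail(&mut details, 0, Cow::Borrowed("名詞")));

    details = Some(vec![Cow::Borrowed("名詞"), Cow::Borrowed("一般")]);
    println!("{}", set_detail(&mut details, 1, Cow::Owned("固有名詞".to_string())));
    println!("{:?}", get_detail(&details, 1));
    println!("{:?}", get_detail(&details, 5));
}
```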
Source

pub fn get(&mut self, field_name: &str) -> Option<&str>

Retrieves the token’s detail by field name.

§Arguments
  • field_name - The name of the field to retrieve.
§Returns

Returns an Option<&str> containing the value of the specified field. If the field name is not found or the schema is not available, None is returned.

§Example
let base_form = token.get("base_form");
let pos = token.get("major_pos");
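A lookup by field name can be thought of as a name-to-index mapping over the details vector. The sketch below assumes a hypothetical `Schema` type holding an ordered list of field names; only the names "major_pos" and "base_form" come from the example above, and the ordering and detail values are invented. How the crate actually resolves field names may differ.

```rust
/// Hypothetical schema: the position of a name is its index in the details.
struct Schema {
    fields: Vec<&'static str>,
}

impl Schema {
    fn index_of(&self, field_name: &str) -> Option<usize> {
        self.fields.iter().position(|f| *f == field_name)
    }
}

/// Resolves a field name to its detail value, or None if either step fails.
fn get<'d>(schema: &Schema, details: &'d [String], field_name: &str) -> Option<&'d str> {
    let index = schema.index_of(field_name)?;
    details.get(index).map(String::as_str)
}

fn main() {
    // Field order and values are made up for this example.
    let schema = Schema { fields: vec!["major_pos", "base_form"] };
    let details = vec!["名詞".to_string(), "東京".to_string()];
    println!("{:?}", get(&schema, &details, "base_form"));
    println!("{:?}", get(&schema, &details, "reading"));
}
```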
Source

pub fn as_value(&mut self) -> Value

Returns all token fields as a JSON Value.

§Returns

Returns a serde_json::Value containing all available fields and their values. Numeric fields (byte_start, byte_end, word_id) are represented as numbers, while text fields remain as strings.

§Example
let value = token.as_value();
println!("Surface: {}", value["surface"]);
println!("Byte start: {}", value["byte_start"]); // This is a number
println!("Word ID: {}", value["word_id"]); // This is a number

Trait Implementations§

Source§

impl<'a> Clone for Token<'a>

Source§

fn clone(&self) -> Token<'a>

Returns a duplicate of the value.
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

Auto Trait Implementations§

§

impl<'a> Freeze for Token<'a>

§

impl<'a> RefUnwindSafe for Token<'a>

§

impl<'a> Send for Token<'a>

§

impl<'a> Sync for Token<'a>

§

impl<'a> Unpin for Token<'a>

§

impl<'a> UnwindSafe for Token<'a>

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> ArchivePointee for T

Source§

type ArchivedMetadata = ()

The archived version of the pointer metadata for this type.
Source§

fn pointer_metadata( _: &<T as ArchivePointee>::ArchivedMetadata, ) -> <T as Pointee>::Metadata

Converts some archived metadata to the pointer metadata for itself.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> LayoutRaw for T

Source§

fn layout_raw(_: <T as Pointee>::Metadata) -> Result<Layout, LayoutError>

Returns the layout of the type.
Source§

impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
where T: SharedNiching<N1, N2>, N1: Niching<T>, N2: Niching<T>,

Source§

unsafe fn is_niched(niched: *const NichedOption<T, N1>) -> bool

Returns whether the given value has been niched.
Source§

fn resolve_niched(out: Place<NichedOption<T, N1>>)

Writes data to out indicating that a T is niched.
Source§

impl<T> Pointee for T

Source§

type Metadata = ()

The metadata type for pointers and references to this type.
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.