pub struct Token<'a> {
    pub surface: Cow<'a, str>,
    pub byte_start: usize,
    pub byte_end: usize,
    pub position: usize,
    pub position_length: usize,
    pub word_id: WordId,
    pub dictionary: &'a Dictionary,
    pub user_dictionary: Option<&'a UserDictionary>,
    pub details: Option<Vec<Cow<'a, str>>>,
}
Fields§
surface: Cow<'a, str>
The text content of the token, held as a copy-on-write string so that both owned and borrowed string data are handled efficiently.
byte_start: usize
The byte position in the original text where the token begins.
byte_end: usize
The byte position in the original text immediately after the last byte of the token.
position: usize
The position of the token in the sequence of tokens (usually an index). Byte offsets into the input are given by byte_start and byte_end.
position_length: usize
The length of the token's position. Token::new sets this to 1 by default.
word_id: WordId
The identifier of the word, which identifies the token's entry in the dictionary.
dictionary: &'a Dictionary
A reference to the dictionary used for tokenization.
The dictionary contains the data necessary for the tokenization process, including word entries and their associated metadata. This reference allows the tokenizer to access and utilize the dictionary during the tokenization of input text.
user_dictionary: Option<&'a UserDictionary>
An optional reference to a user-defined dictionary.
This dictionary can be used to add custom words or override existing words in the default dictionary. If None, the default dictionary is used.
details: Option<Vec<Cow<'a, str>>>
An optional vector containing detailed information about the token.
Each element in the vector is a Cow (copy-on-write) string, which allows for efficient handling of both owned and borrowed string data.
§Note
This field is optional and may be None if no detailed information is available.
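The relationship between surface, byte_start and byte_end can be sketched with a small stand-in type. MiniToken, borrowed_token and slice_of below are hypothetical names used only for illustration; they are not part of this crate, and the real Token additionally needs dictionary references that are omitted here.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for the offset-related fields of Token; not the real type.
struct MiniToken<'a> {
    surface: Cow<'a, str>,
    byte_start: usize,
    byte_end: usize,
}

// Builds a token whose surface borrows directly from `text`.
fn borrowed_token(text: &str, byte_start: usize, byte_end: usize) -> MiniToken<'_> {
    MiniToken {
        surface: Cow::Borrowed(&text[byte_start..byte_end]),
        byte_start,
        byte_end,
    }
}

// Recovers the token's slice of the original text from its byte offsets.
fn slice_of<'t>(text: &'t str, token: &MiniToken<'_>) -> &'t str {
    &text[token.byte_start..token.byte_end]
}

fn main() {
    let text = "東京都に住む";
    // "東京都" is three characters of three UTF-8 bytes each, so it covers bytes 0..9.
    let token = borrowed_token(text, 0, 9);
    assert_eq!(token.surface, "東京都");
    assert_eq!(slice_of(text, &token), "東京都");

    // An owned surface fits the same field type.
    let owned = MiniToken {
        surface: Cow::Owned(String::from("東京都")),
        byte_start: 0,
        byte_end: 9,
    };
    assert_eq!(owned.surface, token.surface);
}
```

Because the offsets count bytes rather than characters, byte_end - byte_start is 9 here even though the surface has three characters.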
Implementations§
impl<'a> Token<'a>
pub fn new(
    surface: Cow<'a, str>,
    start: usize,
    end: usize,
    position: usize,
    word_id: WordId,
    dictionary: &'a Dictionary,
    user_dictionary: Option<&'a UserDictionary>,
) -> Self
Creates a new Token instance with the provided parameters.
§Arguments
- surface - A Cow<'a, str> representing the text of the token. This can be either a borrowed or an owned string.
- start - The byte position where the token starts in the original text.
- end - The byte position where the token ends in the original text.
- position - The position of the token in the sequence of tokens (usually an index).
- word_id - The WordId associated with the token, identifying the token in the dictionary.
- dictionary - A reference to the Dictionary that contains information about the token.
- user_dictionary - An optional reference to a UserDictionary, which may provide additional user-defined tokens.
§Returns
Returns a new Token instance initialized with the provided values.
§Details
- The token's surface can be a borrowed reference or an owned string, thanks to the use of Cow<'a, str>.
- byte_start and byte_end define the token's byte offsets within the original text.
- position marks the token's place in the overall tokenized sequence.
- position_length is set to 1 by default.
- word_id identifies the token in the dictionary, and the dictionaries (both dictionary and user_dictionary) provide additional details about the token.
pub fn details(&mut self) -> Vec<&str>
Retrieves the details of the token, either from the dictionary or the user-defined dictionary.
§Returns
Returns a Vec<&str> containing the token’s details. These details are typically part-of-speech information or other metadata about the token.
§Process
- Check if details are already set:
  - If self.details is None, the method will attempt to fetch the details from either the system dictionary or the user dictionary.
  - If the word_id is unknown, a default value UNK is returned.
- Fetch details from dictionaries:
  - If the word_id corresponds to a system dictionary entry, details are fetched from self.dictionary.
  - If the word_id corresponds to a user-defined dictionary, details are fetched from self.user_dictionary.
- Store details:
  - The fetched details are stored in self.details as Some(Vec<Cow<str>>) to avoid recalculating them in subsequent calls.
- Return details as &str:
  - The Cow<str> values stored in self.details are converted to &str and returned.
§Notes
- The first time this method is called, it fetches the details from the dictionary (or user dictionary); on subsequent calls, it returns the cached details in self.details.
- If the token is unknown and no details can be retrieved, a default value (UNK) is used.
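The fetch-once-then-cache behaviour described above can be sketched without the real dictionaries. In this minimal illustration, CachedToken, lookup and the lookups counter are hypothetical, an Option<u32> stands in for WordId (None models an unknown word), and the detail strings are made-up values.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for a dictionary lookup; the real lookup is not shown on this page.
fn lookup(word_id: Option<u32>) -> Vec<Cow<'static, str>> {
    match word_id {
        // Made-up details for any known word.
        Some(_) => vec![Cow::Borrowed("noun"), Cow::Borrowed("common")],
        // Unknown words fall back to the default value.
        None => vec![Cow::Borrowed("UNK")],
    }
}

struct CachedToken {
    word_id: Option<u32>,
    details: Option<Vec<Cow<'static, str>>>,
    lookups: usize, // counts how often the lookup ran, for demonstration only
}

impl CachedToken {
    fn new(word_id: Option<u32>) -> Self {
        CachedToken { word_id, details: None, lookups: 0 }
    }

    // Fetches the details on the first call, stores them, and returns them as &str.
    fn details(&mut self) -> Vec<&str> {
        if self.details.is_none() {
            self.lookups += 1;
            self.details = Some(lookup(self.word_id));
        }
        self.details
            .as_ref()
            .map(|cached| cached.iter().map(|c| c.as_ref()).collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut token = CachedToken::new(Some(7));
    assert_eq!(token.details(), vec!["noun", "common"]);
    assert_eq!(token.details(), vec!["noun", "common"]);
    // The second call was served from the cache.
    assert_eq!(token.lookups, 1);
}
```

Caching in an Option field is what makes the method take &mut self: the first call has to write to self.details.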
pub fn get_detail(&mut self, index: usize) -> Option<&str>
Retrieves the token’s detail at the specified index, if available.
§Arguments
- index - The index of the detail to retrieve.
§Returns
Returns an Option<&str> that contains the detail at the specified index.
If the index is out of bounds or no details are available, None is returned.
§Details
- This method first ensures that the token's details are populated by calling self.details().
- If details are available and the provided index is valid, the detail at the specified index is returned as Some(&str).
- If the index is out of range, None is returned.
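A minimal sketch of this populate-then-index behaviour follows. DetailToken is a hypothetical stand-in, and its details() fills the cache with fixed, made-up values instead of consulting a dictionary.

```rust
use std::borrow::Cow;

// Hypothetical stand-in holding only the cached details.
struct DetailToken<'a> {
    details: Option<Vec<Cow<'a, str>>>,
}

impl<'a> DetailToken<'a> {
    // Stand-in for Token::details: fills the cache with fixed values on first use.
    fn details(&mut self) -> Vec<&str> {
        let cached = self
            .details
            .get_or_insert_with(|| vec![Cow::Borrowed("noun"), Cow::Borrowed("common")]);
        cached.iter().map(|c| c.as_ref()).collect()
    }

    // Populates the details first, then returns None for an out-of-range index.
    fn get_detail(&mut self, index: usize) -> Option<&str> {
        self.details().get(index).copied()
    }
}

fn main() {
    let mut token = DetailToken { details: None };
    assert_eq!(token.get_detail(0), Some("noun"));
    assert_eq!(token.get_detail(2), None);
}
```

Using slice get rather than indexing is what turns an out-of-range index into None instead of a panic.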
pub fn set_detail(&mut self, index: usize, detail: Cow<'a, str>)
Sets the token’s detail at the specified index with the provided value.
§Arguments
- index - The index of the detail to set. This specifies which detail to update.
- detail - A Cow<'a, str> representing the new detail value to set. It can be either a borrowed or an owned string.
§Details
- If the token's details have already been populated (self.details is Some), this method updates the detail at the specified index.
- If the provided index is valid (within bounds of the details vector), the detail at that index is replaced by the new detail value.
- If the details have not been set (self.details is None), this method does nothing.
- This method does not handle index out-of-bounds errors explicitly, so it assumes that the index provided is valid.
§Notes
- The Cow<'a, str> type allows flexibility, as it can handle either borrowed or owned strings.
- This method does not initialize the details if they are not already set. To ensure the details are set, details() can be called prior to calling this method.
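The "update only if already populated" rule can be illustrated with a stand-in type. EditableToken is hypothetical, and the bounds check through get_mut is a choice made for this sketch only: the text above says the real method does not handle out-of-range indices explicitly.

```rust
use std::borrow::Cow;

// Hypothetical stand-in holding only the cached details.
struct EditableToken<'a> {
    details: Option<Vec<Cow<'a, str>>>,
}

impl<'a> EditableToken<'a> {
    // Replaces an existing slot; does nothing while the details are unset.
    // Assumption of this sketch: an out-of-range index is silently ignored.
    fn set_detail(&mut self, index: usize, detail: Cow<'a, str>) {
        if let Some(details) = self.details.as_mut() {
            if let Some(slot) = details.get_mut(index) {
                *slot = detail;
            }
        }
    }
}

fn main() {
    // With no details populated, the call has no effect.
    let mut unset = EditableToken { details: None };
    unset.set_detail(0, Cow::Borrowed("verb"));
    assert!(unset.details.is_none());

    // With details populated, the slot is replaced; owned strings work too.
    let mut token = EditableToken {
        details: Some(vec![Cow::Borrowed("noun"), Cow::Borrowed("common")]),
    };
    token.set_detail(0, Cow::Owned(String::from("verb")));
    assert_eq!(token.details.as_ref().unwrap()[0], "verb");
}
```

With the real type, calling details() first guarantees that self.details is Some before set_detail runs.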
pub fn get(&mut self, field_name: &str) -> Option<&str>
Retrieves the token’s detail by field name.
§Arguments
- field_name - The name of the field to retrieve.
§Returns
Returns an Option<&str> containing the value of the specified field.
If the field name is not found or the schema is not available, None is returned.
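One way to picture a name-based lookup is as a schema that maps field names to detail indices. This is an assumption made for illustration: this page does not show how the schema is represented, and get_by_name is a hypothetical free function, not part of the crate.

```rust
// Hypothetical: resolves a field name to an index through an optional schema,
// then reads the detail stored at that index.
fn get_by_name<'d>(
    schema: Option<&[&str]>,
    details: &[&'d str],
    field_name: &str,
) -> Option<&'d str> {
    // No schema, or a name the schema does not list, yields None.
    let index = schema?.iter().position(|name| *name == field_name)?;
    details.get(index).copied()
}

fn main() {
    // Made-up schema and values; only the field names appear in the example on this page.
    let schema = ["major_pos", "base_form"];
    let details = ["noun", "run"];
    assert_eq!(get_by_name(Some(&schema[..]), &details, "base_form"), Some("run"));
    assert_eq!(get_by_name(Some(&schema[..]), &details, "reading"), None);
    assert_eq!(get_by_name(None, &details, "base_form"), None);
}
```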
§Example
let base_form = token.get("base_form");
let pos = token.get("major_pos");
pub fn as_value(&mut self) -> Value
Returns all token fields as a JSON Value.
§Returns
Returns a serde_json::Value containing all available fields and their values.
Numeric fields (byte_start, byte_end, word_id) are represented as numbers,
while text fields remain as strings.
§Example
let value = token.as_value();
println!("Surface: {}", value["surface"]);
println!("Byte start: {}", value["byte_start"]); // This is a number
println!("Word ID: {}", value["word_id"]); // This is a number
Trait Implementations§
Auto Trait Implementations§
impl<'a> Freeze for Token<'a>
impl<'a> RefUnwindSafe for Token<'a>
impl<'a> Send for Token<'a>
impl<'a> Sync for Token<'a>
impl<'a> Unpin for Token<'a>
impl<'a> UnwindSafe for Token<'a>
Blanket Implementations§
impl<T> ArchivePointee for T
type ArchivedMetadata = ()
fn pointer_metadata(
    _: &<T as ArchivePointee>::ArchivedMetadata,
) -> <T as Pointee>::Metadata
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> LayoutRaw for T
fn layout_raw(_: <T as Pointee>::Metadata) -> Result<Layout, LayoutError>
impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
unsafe fn is_niched(niched: *const NichedOption<T, N1>) -> bool
fn resolve_niched(out: Place<NichedOption<T, N1>>)
out indicating that a T is niched.