pub trait Tokenizer {
    // Required methods
    fn encode(&self, input: &str) -> Result<Vec<String>>;
    fn decode(&self, tokens: Vec<String>) -> Result<String>;
}
Defines the necessary functions for a tokenizer.
This trait provides the core functionality needed to convert strings to sequences of tokens and vice versa. It is essential for text processing tasks such as natural language processing, where text needs to be broken down into manageable pieces or reconstructed from tokenized forms.
Required Methods
fn encode(&self, input: &str) -> Result<Vec<String>>
Encodes a given string into a sequence of tokens.
This function takes a reference to a string and returns a vector of token strings resulting from the tokenization process.
Arguments
input - A reference to the string to be tokenized.
Returns
A Result containing either the vector of tokens if successful or an error if the tokenization fails.
fn decode(&self, tokens: Vec<String>) -> Result<String>
Decodes a given sequence of tokens back into a single string.
This function takes a vector of token strings and reconstructs the original string.
Arguments
tokens - A vector of strings representing the tokens to be decoded.
Returns
A Result containing either the reconstructed string if successful or an error if the decoding fails.
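As an illustration of the two required methods, here is a minimal sketch of an implementor. It makes several assumptions: the documentation does not show which Result alias the crate uses, so a boxed-error alias stands in for it; the trait is redeclared locally so the example compiles on its own; and WhitespaceTokenizer, along with its choice to reject empty input, is hypothetical and not part of the documented crate.

```rust
use std::error::Error;

// Assumption: stand-in for the crate's own `Result` alias, which the docs do not show.
type Result<T> = std::result::Result<T, Box<dyn Error>>;

// Trait signature as documented, redeclared so the sketch is self-contained.
pub trait Tokenizer {
    fn encode(&self, input: &str) -> Result<Vec<String>>;
    fn decode(&self, tokens: Vec<String>) -> Result<String>;
}

// Hypothetical implementor: splits on whitespace and joins with single spaces.
pub struct WhitespaceTokenizer;

impl Tokenizer for WhitespaceTokenizer {
    fn encode(&self, input: &str) -> Result<Vec<String>> {
        let tokens: Vec<String> = input.split_whitespace().map(str::to_owned).collect();
        // Illustrative error policy: treat input without any tokens as a failure.
        if tokens.is_empty() {
            return Err("input contains no tokens".into());
        }
        Ok(tokens)
    }

    fn decode(&self, tokens: Vec<String>) -> Result<String> {
        // Illustrative error policy: an empty sequence cannot be decoded.
        if tokens.is_empty() {
            return Err("cannot decode an empty token sequence".into());
        }
        Ok(tokens.join(" "))
    }
}
```

With this particular sketch, decode only recovers the input up to whitespace normalization: runs of spaces collapse to a single space. Whether a real implementation guarantees an exact round trip depends on the tokenization scheme.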
Dyn Compatibility
This trait is dyn compatible.
In older versions of Rust, dyn compatibility was called "object safety".
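Dyn compatibility means the trait can be used as a trait object, such as &dyn Tokenizer or Box<dyn Tokenizer>, so the concrete tokenizer can be chosen at runtime. The sketch below shows what that might look like. The Result alias, the local redeclaration of the trait, CharTokenizer, and round_trip are all assumptions made for illustration and are not part of the documented crate.

```rust
use std::error::Error;

// Assumption: stand-in for the crate's own `Result` alias, which the docs do not show.
type Result<T> = std::result::Result<T, Box<dyn Error>>;

// Trait signature as documented, redeclared so the sketch is self-contained.
pub trait Tokenizer {
    fn encode(&self, input: &str) -> Result<Vec<String>>;
    fn decode(&self, tokens: Vec<String>) -> Result<String>;
}

// Hypothetical implementor: one token per Unicode scalar value.
pub struct CharTokenizer;

impl Tokenizer for CharTokenizer {
    fn encode(&self, input: &str) -> Result<Vec<String>> {
        Ok(input.chars().map(|c| c.to_string()).collect())
    }

    fn decode(&self, tokens: Vec<String>) -> Result<String> {
        // Concatenate the tokens without any separator.
        Ok(tokens.concat())
    }
}

// Hypothetical helper: works with any tokenizer through dynamic dispatch.
pub fn round_trip(tokenizer: &dyn Tokenizer, text: &str) -> Result<String> {
    let tokens = tokenizer.encode(text)?;
    tokenizer.decode(tokens)
}
```

A caller could store the implementor as Box<dyn Tokenizer> and pass boxed.as_ref() to round_trip, which keeps the calling code independent of the concrete tokenizer type.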