Struct Tokenizer 

pub struct Tokenizer { /* private fields */ }

High-level tokenizer wrapper for DNN usage.

Provides a simple API to encode and decode tokens for LLMs. Models are loaded via Tokenizer::load().

The snippet below shows the C++ form of this API:

using namespace cv::dnn;
Tokenizer tok = Tokenizer::load("/path/to/model/");
std::vector<int> ids = tok.encode("hello world");
std::string text = tok.decode(ids);

Implementations

impl Tokenizer

pub fn default() -> Result<Tokenizer>

Constructs a tokenizer with the default method, BPE. With BPE you normally call Tokenizer::load() to initialize the model data.

pub fn load(model_config: &str) -> Result<Tokenizer>

Load a tokenizer from a model directory.

Expects the directory to contain:

  • config.json, whose model_type field is “gpt2” or “gpt4”.
  • tokenizer.json produced by the corresponding model family.

The argument is a path prefix; this function concatenates file names directly (e.g. model_dir + “config.json”), so model_dir must end with an appropriate path separator.
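
Because the prefix is concatenated with the file names as described above, a caller may want to normalize the directory string before passing it on. The following is a minimal sketch using only the Rust standard library; `with_trailing_separator` is a hypothetical helper and not part of the opencv crate, and it assumes the path-prefix behaviour stated in this section.

```rust
use std::path::MAIN_SEPARATOR;

/// Hypothetical helper (not part of the opencv crate): returns `dir` with a
/// trailing path separator, so that appending "config.json" gives a valid path.
/// An empty string is returned unchanged, so it still means "current directory".
fn with_trailing_separator(dir: &str) -> String {
    if dir.is_empty() || dir.ends_with('/') || dir.ends_with(MAIN_SEPARATOR) {
        dir.to_string()
    } else {
        format!("{}{}", dir, MAIN_SEPARATOR)
    }
}

fn main() {
    let prefix = with_trailing_separator("/path/to/model");
    // Mirrors the direct concatenation this section describes.
    let config_path = format!("{}config.json", prefix);
    println!("{}", config_path);
}
```

The normalized string is what would then be handed to Tokenizer::load(), if the prefix semantics above apply to the build in use.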

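Parameters
  • model_config: Path to config.json for the model.
Returns

A Tokenizer ready for use. The underlying C++ call throws cv::Exception if files are missing or model_type is unsupported; in this binding that failure is reported through the Err variant of the returned Result.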
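
Since loading fails when the expected files are missing, it can help to check for them up front and report which one is absent. This is a sketch under assumptions, using only the standard library: `missing_model_files` is a hypothetical helper, not an opencv API, and the two file names are the ones listed in this section.

```rust
use std::path::Path;

/// Hypothetical pre-flight check (not part of the opencv crate): returns the
/// names of the expected files that are not present in `model_dir`.
fn missing_model_files(model_dir: &Path) -> Vec<&'static str> {
    ["config.json", "tokenizer.json"]
        .iter()
        .copied()
        .filter(|name| !model_dir.join(name).is_file())
        .collect()
}

fn main() {
    // Placeholder directory, matching the path used in the snippet above.
    let dir = Path::new("/path/to/model/");
    let missing = missing_model_files(dir);
    if missing.is_empty() {
        println!("all tokenizer files present");
    } else {
        eprintln!("missing in {}: {:?}", dir.display(), missing);
    }
}
```

This check does not validate model_type; an unsupported value would still be rejected by the loader itself.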

Trait Implementations

impl Boxed for Tokenizer

unsafe fn from_raw(ptr: <Tokenizer as OpenCVFromExtern>::ExternReceive) -> Self

Wrap the specified raw pointer.

fn into_raw(self) -> <Tokenizer as OpenCVTypeExternContainer>::ExternSendMut

Return the underlying raw pointer while consuming this wrapper.

fn as_raw(&self) -> <Tokenizer as OpenCVTypeExternContainer>::ExternSend

Return the underlying raw pointer.

fn as_raw_mut(&mut self) -> <Tokenizer as OpenCVTypeExternContainer>::ExternSendMut

Return the underlying mutable raw pointer.

impl Clone for Tokenizer

fn clone(&self) -> Self

Returns a duplicate of the value.

1.0.0 (const: unstable)

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Tokenizer

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Drop for Tokenizer

fn drop(&mut self)

Executes the destructor for this type.

fn pin_drop(self: Pin<&mut Self>)

🔬This is a nightly-only experimental API. (pin_ergonomics)
Execute the destructor for this type, but different to Drop::drop, it requires self to be pinned.

impl Send for Tokenizer

impl TokenizerTrait for Tokenizer

fn as_raw_mut_Tokenizer(&mut self) -> *mut c_void

fn encode(&mut self, text: &str) -> Result<Vector<i32>>

Encode UTF-8 text to token ids (special tokens currently disabled).

fn decode(&mut self, tokens: &Vector<i32>) -> Result<String>

impl TokenizerTraitConst for Tokenizer

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<Mat> ModifyInplace for Mat
where Mat: Boxed,

unsafe fn modify_inplace<Res>(&mut self, f: impl FnOnce(&Mat, &mut Mat) -> Res) -> Res

Helper function to call OpenCV functions that allow in-place modification of a Mat or another similar object. When you pass a mutable reference to the Mat to this function, your closure is called with a read reference and a write reference to the same Mat. This is unsafe in the general case because it gives non-exclusive mutable access to the internal data, but it can be useful for some performance-sensitive operations. One example of an OpenCV function that allows such in-place modification is imgproc::threshold.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.