pub struct Tokenizer { /* private fields */ }
High-level tokenizer wrapper for DNN usage.
Provides a simple API to encode and decode tokens for LLMs. Models are loaded via Tokenizer::load().
using namespace cv::dnn;
Tokenizer tok = Tokenizer::load("/path/to/model/");
std::vector<int> ids = tok.encode("hello world");
std::string text = tok.decode(ids);
Implementations§
impl Tokenizer
pub fn default() -> Result<Tokenizer>
Construct a tokenizer with the default method (BPE). For the BPE method you normally call Tokenizer::load() to initialize the model data.
pub fn load(model_config: &str) -> Result<Tokenizer>
Load a tokenizer from a model directory.
Expects the directory to contain:
- config.json with field model_type with value “gpt2” or “gpt4”.
- tokenizer.json produced by the corresponding model family.
The argument is a path prefix; this function concatenates file
names directly (e.g. model_dir + “config.json”), so model_dir must
end with an appropriate path separator.
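Because the file names are concatenated directly onto the prefix, a missing trailing separator would yield a path such as `modelsconfig.json`. The following is a minimal sketch, using only the standard library, of how a caller might normalize the prefix and check for the two expected files beforehand. Both `with_trailing_separator` and `missing_model_files` are hypothetical helpers for illustration; they are not part of the opencv crate.

```rust
use std::path::{Path, MAIN_SEPARATOR};

/// File names that the documentation above says must exist in the model directory.
const REQUIRED_FILES: [&str; 2] = ["config.json", "tokenizer.json"];

/// Hypothetical helper (not part of the opencv crate): append a path
/// separator unless the prefix is empty or already ends with one.
fn with_trailing_separator(dir: &str) -> String {
    let mut prefix = dir.to_string();
    let has_separator = prefix.ends_with('/') || prefix.ends_with(MAIN_SEPARATOR);
    if !prefix.is_empty() && !has_separator {
        prefix.push(MAIN_SEPARATOR);
    }
    prefix
}

/// Hypothetical helper: mimic the described `prefix + file name`
/// concatenation and return the resulting paths that are not existing files.
fn missing_model_files(prefix: &str) -> Vec<String> {
    REQUIRED_FILES
        .iter()
        .map(|name| format!("{prefix}{name}"))
        .filter(|path| !Path::new(path).is_file())
        .collect()
}
```

A caller could pass the normalized prefix to `Tokenizer::load` once `missing_model_files` returns an empty list. This pre-check is only a convenience for clearer error messages and does not replace handling the `Result` that `load` returns.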
§Parameters
- model_config: Path to config.json for model.
§Returns
A Tokenizer ready for use. Returns an error (a cv::Exception thrown by the underlying C++ API) if files are missing or model_type is unsupported.
Trait Implementations§
impl Boxed for Tokenizer
unsafe fn from_raw(ptr: <Tokenizer as OpenCVFromExtern>::ExternReceive) -> Self
Wrap the specified raw pointer.
fn into_raw(self) -> <Tokenizer as OpenCVTypeExternContainer>::ExternSendMut
Return the underlying raw pointer while consuming this wrapper.
fn as_raw(&self) -> <Tokenizer as OpenCVTypeExternContainer>::ExternSend
Return the underlying raw pointer.
fn as_raw_mut(&mut self) -> <Tokenizer as OpenCVTypeExternContainer>::ExternSendMut
Return the underlying mutable raw pointer.
impl Send for Tokenizer
impl TokenizerTrait for Tokenizer
impl TokenizerTraitConst for Tokenizer
fn as_raw_Tokenizer(&self) -> *const c_void
Auto Trait Implementations§
impl !Sync for Tokenizer
impl Freeze for Tokenizer
impl RefUnwindSafe for Tokenizer
impl Unpin for Tokenizer
impl UnsafeUnpin for Tokenizer
impl UnwindSafe for Tokenizer
Blanket Implementations§
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T where
T: Clone,
impl<Mat> ModifyInplace for Mat where
Mat: Boxed,
unsafe fn modify_inplace<Res>(&mut self, f: impl FnOnce(&Mat, &mut Mat) -> Res) -> Res
Helper function to call OpenCV functions that allow in-place modification of a
Mat or another similar object. When you pass
a mutable reference to the Mat to this function, your closure is called with a read reference and a write reference
to the same Mat. This is unsafe in the general case because it gives non-exclusive mutable access to the internal data,
but it can be useful for some performance-sensitive operations. One example of an OpenCV function that allows such in-place
modification is imgproc::threshold.