pub struct TokenChunker { /* private fields */ }
§Chunkers
Module containing all of the chunking methods, which prepare text before it is embedded and stored.
§TokenChunker
This struct performs fixed-size chunking based on the number of tokens in each chunk. It is built around specific embedding models: the tokenizer used to count tokens is the one that corresponds to the embedding model in use.
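Fixed-size chunking with overlap can be pictured as a sliding window over the token sequence. The sketch below is only an illustration, not TokenChunker's actual implementation: the helper `sliding_windows` is hypothetical, it works on pre-computed token IDs instead of calling a real tokenizer, and keeping a shorter final window is an assumption about how the tail is handled.

```rust
/// Hypothetical helper: splits `tokens` into windows of at most `chunk_size`
/// tokens, where consecutive windows share `chunk_overlap` tokens.
fn sliding_windows(tokens: &[u32], chunk_size: usize, chunk_overlap: usize) -> Vec<Vec<u32>> {
    assert!(chunk_size > 0 && chunk_overlap < chunk_size);
    // Each new window starts `stride` tokens after the previous one.
    let stride = chunk_size - chunk_overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        chunks.push(tokens[start..end].to_vec());
        // Stop once a window reaches the end of the input.
        if end == tokens.len() {
            break;
        }
        start += stride;
    }
    chunks
}

fn main() {
    // Stand-in token IDs; a real chunker would obtain these from the model's tokenizer.
    let tokens: [u32; 5] = [10, 11, 12, 13, 14];
    // prints [[10, 11], [11, 12], [12, 13], [13, 14]]
    println!("{:?}", sliding_windows(&tokens, 2, 1));
}
```

With chunk_size = 2 and chunk_overlap = 1, as in the Examples section, each window advances by a single token.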
§Examples
use rag_toolchain::chunkers::*;
use rag_toolchain::common::*;
use std::num::NonZeroUsize;

fn generate_chunks() {
    let raw_text: &str = "This is a test string";
    // Each chunk is at most 2 tokens long, and consecutive chunks share 1 token.
    let chunk_size: NonZeroUsize = NonZeroUsize::new(2).unwrap();
    let chunk_overlap: usize = 1;
    // The embedding model determines which tokenizer is used to count tokens.
    const EMBEDDING_MODEL: OpenAIEmbeddingModel = OpenAIEmbeddingModel::TextEmbedding3Small;
    let chunker: TokenChunker = TokenChunker::try_new(
        chunk_size,
        chunk_overlap,
        EMBEDDING_MODEL,
    )
    .unwrap();
    // `generate_chunks` is provided by the `Chunker` trait implemented for TokenChunker.
    let chunks: Chunks = chunker.generate_chunks(raw_text).unwrap();
}
Implementations§
impl TokenChunker
pub fn try_new(
    chunk_size: NonZeroUsize,
    chunk_overlap: usize,
    embedding_model: impl EmbeddingModel,
) -> Result<Self, TokenChunkingError>
§TokenChunker::try_new
§Arguments
- chunk_size: NonZeroUsize - The size of each chunk, in tokens
- chunk_overlap: usize - The number of tokens shared by consecutive chunks
- embedding_model: impl EmbeddingModel - The embedding model to use; it determines which tokenizer is used
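The embedding_model parameter accepts any type implementing EmbeddingModel, and that choice selects the tokenizer. The sketch below imitates this pattern with a hypothetical trait `EmbeddingModelInfo`, a hypothetical `ExampleModel`, and a hypothetical `chunker_config` function; none of these are rag_toolchain's API, and the tokenizer name returned is a placeholder.

```rust
// Hypothetical sketch: `EmbeddingModelInfo`, `ExampleModel` and `chunker_config`
// are illustrative names, not rag_toolchain's API.
trait EmbeddingModelInfo {
    /// Name of the tokenizer whose token counts match this model.
    fn tokenizer_name(&self) -> &'static str;
}

struct ExampleModel;

impl EmbeddingModelInfo for ExampleModel {
    fn tokenizer_name(&self) -> &'static str {
        // Placeholder value chosen for illustration.
        "cl100k_base"
    }
}

/// Accepts any model via `impl Trait` and records which tokenizer it implies.
fn chunker_config(
    chunk_size: usize,
    chunk_overlap: usize,
    model: impl EmbeddingModelInfo,
) -> (usize, usize, &'static str) {
    (chunk_size, chunk_overlap, model.tokenizer_name())
}

fn main() {
    let (size, overlap, tokenizer) = chunker_config(2, 1, ExampleModel);
    // prints size=2 overlap=1 tokenizer=cl100k_base
    println!("size={size} overlap={overlap} tokenizer={tokenizer}");
}
```

Taking `impl Trait` by value keeps the call site simple: callers pass a model value directly, as the Examples section does with a constant.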
§Errors
- ChunkingError::InvalidChunkSize - Chunk size must be smaller than the maximum number of tokens
- ChunkingError::ChunkOverlapTooLarge - Chunk overlap must be smaller than chunk size
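The two documented error conditions reduce to simple comparisons. The sketch below uses a hypothetical `ChunkConfigError` enum and `validate` function rather than the crate's own error types, takes the model's token limit as a plain parameter with a placeholder value, reads "smaller than" literally as a strict inequality, and checks chunk size before overlap; the real check order and boundary handling are assumptions.

```rust
use std::fmt;
use std::num::NonZeroUsize;

// Hypothetical error type mirroring the two documented failure cases;
// the crate's real error type and variant payloads may differ.
#[derive(Debug, PartialEq)]
enum ChunkConfigError {
    InvalidChunkSize { chunk_size: usize, max_tokens: usize },
    ChunkOverlapTooLarge { chunk_overlap: usize, chunk_size: usize },
}

impl fmt::Display for ChunkConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidChunkSize { chunk_size, max_tokens } => write!(
                f,
                "chunk size {chunk_size} must be smaller than the limit of {max_tokens} tokens"
            ),
            Self::ChunkOverlapTooLarge { chunk_overlap, chunk_size } => write!(
                f,
                "chunk overlap {chunk_overlap} must be smaller than chunk size {chunk_size}"
            ),
        }
    }
}

/// Applies both documented constraints, using strict "smaller than" comparisons.
fn validate(
    chunk_size: NonZeroUsize,
    chunk_overlap: usize,
    max_tokens: usize,
) -> Result<(), ChunkConfigError> {
    let size = chunk_size.get();
    if size >= max_tokens {
        return Err(ChunkConfigError::InvalidChunkSize { chunk_size: size, max_tokens });
    }
    if chunk_overlap >= size {
        return Err(ChunkConfigError::ChunkOverlapTooLarge { chunk_overlap, chunk_size: size });
    }
    Ok(())
}

fn main() {
    let size = NonZeroUsize::new(2).unwrap();
    // 512 is a placeholder token limit, not a real model's value.
    match validate(size, 1, 512) {
        Ok(()) => println!("valid configuration"),
        Err(e) => println!("invalid configuration: {e}"),
    }
}
```

Matching on the Result like this, instead of calling unwrap as the Examples section does, lets a caller report a bad configuration rather than panic.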
§Returns
- TokenChunker - The token chunker, returned inside Ok when the arguments are valid
Trait Implementations§
impl Chunker for TokenChunker
Auto Trait Implementations§
impl Freeze for TokenChunker
impl !RefUnwindSafe for TokenChunker
impl !Send for TokenChunker
impl !Sync for TokenChunker
impl Unpin for TokenChunker
impl !UnwindSafe for TokenChunker
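Because TokenChunker is neither Send nor Sync, a chunker cannot be moved into another thread or shared between threads. One workable pattern is to construct a chunker inside each worker thread. The sketch below uses a hypothetical `LocalChunker` stand-in (made !Send and !Sync by holding an `Rc`) and counts whitespace-separated words as stand-in tokens; it is not rag_toolchain code.

```rust
use std::rc::Rc;
use std::thread;

// Hypothetical stand-in for a type that, like TokenChunker, is !Send and !Sync.
// Holding an Rc is enough to opt out of both auto traits.
struct LocalChunker {
    chunk_size: usize,
    _not_send: Rc<()>,
}

impl LocalChunker {
    fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk size must be non-zero");
        LocalChunker { chunk_size, _not_send: Rc::new(()) }
    }

    /// Counts fixed-size chunks, treating whitespace-separated words as tokens.
    fn count_chunks(&self, text: &str) -> usize {
        let words = text.split_whitespace().count();
        (words + self.chunk_size - 1) / self.chunk_size
    }
}

/// Spawns one thread per document. Each thread builds its own chunker, because
/// a !Send value cannot be moved into the closure passed to `thread::spawn`.
fn count_in_parallel(docs: Vec<String>, chunk_size: usize) -> Vec<usize> {
    let handles: Vec<_> = docs
        .into_iter()
        .map(|doc| {
            thread::spawn(move || {
                let chunker = LocalChunker::new(chunk_size);
                chunker.count_chunks(&doc)
            })
        })
        .collect();
    // Joining in spawn order keeps results aligned with the input documents.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let docs = vec!["one two three four five".to_string(), "a b".to_string()];
    // prints [3, 1]
    println!("{:?}", count_in_parallel(docs, 2));
}
```

Only Send data (the document text and the chunk size) crosses the thread boundary; the chunker itself never leaves the thread that created it.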
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.