Struct TokenChunker

pub struct TokenChunker { /* private fields */ }

§Chunkers

This module contains the chunking methods used to prepare text before it is embedded and stored.

§TokenChunker

This struct performs fixed-size chunking based on the number of tokens in each chunk. Chunking is built around specific embedding models: the tokenizer is selected to match the embedding model in use.
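To make the idea of fixed-size chunking with overlap concrete, here is a minimal, self-contained sketch. It is not the crate's implementation: `window_tokens` is a hypothetical helper, and whitespace-separated words stand in for the model-specific tokens a real tokenizer would produce. How the final, possibly shorter, window is handled is also an assumption.

```rust
/// Hypothetical helper: splits `tokens` into windows of `chunk_size` tokens,
/// where consecutive windows share `chunk_overlap` tokens.
fn window_tokens<'a>(
    tokens: &[&'a str],
    chunk_size: usize,
    chunk_overlap: usize,
) -> Vec<Vec<&'a str>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    assert!(chunk_overlap < chunk_size, "overlap must be smaller than chunk size");

    // Each new window starts this many tokens after the previous one.
    let step = chunk_size - chunk_overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < tokens.len() {
        let end = usize::min(start + chunk_size, tokens.len());
        chunks.push(tokens[start..end].to_vec());
        // Stop once a window reaches the end of the input.
        if end == tokens.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    // Assumption: words stand in for tokens purely for illustration.
    let tokens: Vec<&str> = "This is a test string".split_whitespace().collect();
    for chunk in window_tokens(&tokens, 2, 1) {
        println!("{}", chunk.join(" "));
    }
}
```

With a chunk size of 2 and an overlap of 1, each window in this sketch repeats the last token of the previous one.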

§Examples

use rag_toolchain::chunkers::*;
use rag_toolchain::common::*;
use std::num::NonZeroUsize;

fn generate_chunks() {
    let raw_text: &str = "This is a test string";
    // Number of tokens shared between consecutive chunks.
    let chunk_overlap: usize = 1;
    // Number of tokens in each chunk.
    let chunk_size: NonZeroUsize = NonZeroUsize::new(2).unwrap();

    // The embedding model determines which tokenizer is used.
    const EMBEDDING_MODEL: OpenAIEmbeddingModel = OpenAIEmbeddingModel::TextEmbedding3Small;

    let chunker: TokenChunker = TokenChunker::try_new(
        chunk_size,
        chunk_overlap,
        EMBEDDING_MODEL,
    )
    .unwrap();

    let chunks: Chunks = chunker.generate_chunks(raw_text).unwrap();
}

Implementations§

impl TokenChunker

pub fn try_new(
    chunk_size: NonZeroUsize,
    chunk_overlap: usize,
    embedding_model: impl EmbeddingModel,
) -> Result<Self, TokenChunkingError>

§TokenChunker::try_new
§Arguments
  • chunk_size: NonZeroUsize - The size in tokens of each chunk
  • chunk_overlap: usize - The number of tokens that overlap between each chunk
  • embedding_model: impl EmbeddingModel - The embedding model to use; it determines which tokenizer is used
§Errors
  • [ChunkingError::InvalidChunkSize] - Chunk size must be smaller than the maximum number of tokens
  • [ChunkingError::ChunkOverlapTooLarge] - Chunk overlap must be smaller than chunk size
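
The two conditions above can be sketched as a standalone check. This is an illustration under stated assumptions, not the crate's code: `SketchError`, `validate`, and the `max_tokens` value of 8192 are hypothetical stand-ins, and the real limit depends on the embedding model.

```rust
use std::num::NonZeroUsize;

/// Hypothetical error type mirroring the two documented failure cases.
#[derive(Debug, PartialEq)]
enum SketchError {
    InvalidChunkSize,
    ChunkOverlapTooLarge,
}

/// Hypothetical validation mirroring the documented rules:
/// chunk size must be smaller than the maximum number of tokens,
/// and chunk overlap must be smaller than chunk size.
fn validate(
    chunk_size: NonZeroUsize,
    chunk_overlap: usize,
    max_tokens: usize,
) -> Result<(), SketchError> {
    if chunk_size.get() >= max_tokens {
        return Err(SketchError::InvalidChunkSize);
    }
    if chunk_overlap >= chunk_size.get() {
        return Err(SketchError::ChunkOverlapTooLarge);
    }
    Ok(())
}

fn main() {
    // Assumption: 8192 is a placeholder limit, not a value taken from the crate.
    let max_tokens = 8192;
    let chunk_size = NonZeroUsize::new(2).unwrap();

    println!("{:?}", validate(chunk_size, 1, max_tokens)); // Ok(())
    println!("{:?}", validate(chunk_size, 2, max_tokens)); // Err(ChunkOverlapTooLarge)
}
```

Taking `NonZeroUsize` for the chunk size rules out a zero-sized chunk at the type level, so only the upper bound and the overlap need checking at runtime.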
§Returns

TokenChunker - The newly constructed chunker

Trait Implementations§

impl Chunker for TokenChunker

fn generate_chunks(&self, raw_text: &str) -> Result<Chunks, Self::ErrorType>

§TokenChunker::generate_chunks

Generates chunks from raw text.

§Arguments
  • raw_text: &str - The raw text to generate chunks from
§Errors
  • [ChunkingError::TokenizationError] - Unable to tokenize text
§Returns

Chunks - The generated chunks

type ErrorType = TokenChunkingError

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> ErasedDestructor for T
where T: 'static,