Struct text_splitter::ChunkCapacity

pub struct ChunkCapacity { /* private fields */ }

Describes the valid chunk size(s) that can be generated.

The desired size is the target size for the chunk. In most cases, this also serves as the maximum size of the chunk. A chunk may still be returned that is smaller than the desired size, because adding the next piece of text would have pushed it past the desired capacity.

The max size is the largest chunk size that can be generated. Setting it higher than desired means a chunk should land as close to desired as possible, but may grow larger if that lets it stay at a higher semantic level.

The splitter will consume text until the chunk reaches, at most, a size somewhere between desired and max (if they differ), but never above max.

If you need to ensure a fixed size, set desired and max to the same value, for example when you are trying to maximize the context window for an embedding.

If you are loosely targeting a size but have some extra room, for example in a RAG use case where you roughly want a certain part of a document, you can set max to your absolute maximum, and the splitter can stay at a higher semantic level when determining the chunk.
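
The relationship between desired and max can be pictured with a small standalone model. The sketch below does not use the crate: `Capacity` and `take_pieces` are hypothetical names, and the greedy loop is an assumed simplification that ignores semantic levels entirely. It only illustrates why a fixed capacity can return a chunk below desired, while extra headroom lets a chunk overshoot desired without passing max.

```rust
/// Hypothetical stand-in for the two sizes described above (not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Capacity {
    desired: usize,
    max: usize,
}

/// Greedily take whole pieces until the running size reaches `desired`,
/// refusing any piece that would push the total above `max`.
/// Returns (number of pieces taken, resulting chunk size).
fn take_pieces(cap: Capacity, piece_sizes: &[usize]) -> (usize, usize) {
    let mut total = 0;
    let mut taken = 0;
    for &size in piece_sizes {
        if total >= cap.desired || total + size > cap.max {
            break;
        }
        total += size;
        taken += 1;
    }
    (taken, total)
}

fn main() {
    let pieces = [40, 40, 40];

    // Fixed size: desired == max, so the chunk stops short of desired
    // because a third piece would exceed the limit.
    let fixed = Capacity { desired: 100, max: 100 };
    assert_eq!(take_pieces(fixed, &pieces), (2, 80));

    // Extra headroom: the chunk may pass desired, but never max.
    let loose = Capacity { desired: 100, max: 150 };
    assert_eq!(take_pieces(loose, &pieces), (3, 120));
}
```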

Implementations§

impl ChunkCapacity

pub fn new(size: usize) -> Self

Create a new ChunkCapacity with the same desired and max size.

pub fn desired(&self) -> usize

The desired size is the target size for the chunk. In most cases, this also serves as the maximum size of the chunk. A chunk may still be returned that is smaller than the desired size, because adding the next piece of text would have pushed it past the desired capacity.

pub fn max(&self) -> usize

The max size is the largest chunk size that can be generated. Setting it higher than desired means a chunk should land as close to desired as possible, but may grow larger if that lets it stay at a higher semantic level.

pub fn with_max(self, max: usize) -> Result<Self, ChunkCapacityError>

If you need to ensure a fixed size, set desired and max to the same value, for example when you are trying to maximize the context window for an embedding.

If you are loosely targeting a size but have some extra room, for example in a RAG use case where you roughly want a certain part of a document, you can set max to your absolute maximum, and the splitter can stay at a higher semantic level when determining the chunk.

§Errors

If the max size is less than the desired size, an error is returned.
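
The error condition can be sketched with a standalone model. This is not the crate's code: `Capacity` and `MaxLessThanDesired` are hypothetical stand-ins, and the only behavior taken from the documentation above is that a max below desired is rejected.

```rust
/// Hypothetical stand-in for the capacity type (not the crate's implementation).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Capacity {
    desired: usize,
    max: usize,
}

/// Hypothetical error returned when max is below desired.
#[derive(Debug, PartialEq)]
struct MaxLessThanDesired;

impl Capacity {
    /// Same desired and max size.
    fn new(size: usize) -> Self {
        Self { desired: size, max: size }
    }

    /// Builder-style setter that rejects a max smaller than desired.
    fn with_max(mut self, max: usize) -> Result<Self, MaxLessThanDesired> {
        if max < self.desired {
            return Err(MaxLessThanDesired);
        }
        self.max = max;
        Ok(self)
    }
}

fn main() {
    // Loose target of 500 with room to grow up to 1000.
    assert_eq!(
        Capacity::new(500).with_max(1000),
        Ok(Capacity { desired: 500, max: 1000 })
    );
    // A max below desired is an error.
    assert_eq!(Capacity::new(500).with_max(100), Err(MaxLessThanDesired));
}
```

Returning a `Result` from the builder step means an inconsistent pair is caught when the capacity is constructed rather than later during splitting.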

pub fn fits(&self, chunk_size: usize) -> Ordering

Checks whether a given chunk size fits within the capacity:

  • Ordering::Less indicates more could be added
  • Ordering::Equal indicates the chunk is within the capacity range
  • Ordering::Greater indicates the chunk is larger than the capacity
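
The three outcomes listed above can be sketched as a comparison against an inclusive range. The `Capacity` struct here is a hypothetical stand-in, and treating the capacity range as `desired..=max` is an assumption read from the bullet list, not a statement about the crate's source.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the capacity type (not the crate's implementation).
struct Capacity {
    desired: usize,
    max: usize,
}

impl Capacity {
    /// Assumed semantics: below `desired` is Less, above `max` is Greater,
    /// anything in `desired..=max` is Equal.
    fn fits(&self, chunk_size: usize) -> Ordering {
        if chunk_size < self.desired {
            Ordering::Less
        } else if chunk_size > self.max {
            Ordering::Greater
        } else {
            Ordering::Equal
        }
    }
}

fn main() {
    let cap = Capacity { desired: 100, max: 150 };
    assert_eq!(cap.fits(80), Ordering::Less); // more could be added
    assert_eq!(cap.fits(120), Ordering::Equal); // within the capacity range
    assert_eq!(cap.fits(200), Ordering::Greater); // larger than the capacity
}
```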

Trait Implementations§

impl Clone for ChunkCapacity

fn clone(&self) -> ChunkCapacity

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for ChunkCapacity

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl From<Range<usize>> for ChunkCapacity

fn from(range: Range<usize>) -> Self

Converts to this type from the input type.

impl From<RangeFrom<usize>> for ChunkCapacity

fn from(range: RangeFrom<usize>) -> Self

Converts to this type from the input type.

impl From<RangeFull> for ChunkCapacity

fn from(_: RangeFull) -> Self

Converts to this type from the input type.

impl From<RangeInclusive<usize>> for ChunkCapacity

fn from(range: RangeInclusive<usize>) -> Self

Converts to this type from the input type.

impl From<RangeTo<usize>> for ChunkCapacity

fn from(range: RangeTo<usize>) -> Self

Converts to this type from the input type.

impl From<RangeToInclusive<usize>> for ChunkCapacity

fn from(range: RangeToInclusive<usize>) -> Self

Converts to this type from the input type.

impl From<usize> for ChunkCapacity

fn from(size: usize) -> Self

Converts to this type from the input type.
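
The conversions above let a plain size or a range stand in for a capacity. As a rough illustration only, the sketch below implements two of them on a hypothetical `Capacity` stand-in. The mapping shown (a bare `usize` gives equal desired and max; an inclusive range gives start as desired and end as max) is an assumption, since this page does not spell out how each range type is interpreted, and the other range types are deliberately left out.

```rust
use std::ops::RangeInclusive;

/// Hypothetical stand-in for the capacity type (not the crate's implementation).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Capacity {
    desired: usize,
    max: usize,
}

impl From<usize> for Capacity {
    /// Assumed mapping: a single size is both desired and max.
    fn from(size: usize) -> Self {
        Self { desired: size, max: size }
    }
}

impl From<RangeInclusive<usize>> for Capacity {
    /// Assumed mapping: range start is desired, inclusive end is max.
    fn from(range: RangeInclusive<usize>) -> Self {
        Self { desired: *range.start(), max: *range.end() }
    }
}

fn main() {
    let fixed = Capacity::from(512_usize);
    assert_eq!(fixed, Capacity { desired: 512, max: 512 });

    let ranged = Capacity::from(200_usize..=400);
    assert_eq!(ranged, Capacity { desired: 200, max: 400 });
}
```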

impl PartialEq for ChunkCapacity

fn eq(&self, other: &ChunkCapacity) -> bool

This method tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Copy for ChunkCapacity

impl StructuralPartialEq for ChunkCapacity

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V