Struct Slab 

pub struct Slab {
    pub text: String,
    pub start: usize,
    pub end: usize,
    pub char_start: Option<usize>,
    pub char_end: Option<usize>,
    pub index: usize,
}
A chunk of text with its position in the original document.

The name “slab” evokes a physical slice of material such as concrete, wood, or stone. Each slab is a self-contained piece that can be embedded, indexed, and retrieved independently.

§Offsets

Primary offsets (start/end) are byte offsets into the original text, matching Rust’s string slicing semantics:

use code_chunker::Slab;

let text = "Hello, world!";
let slab = Slab::new("world", 7, 12, 0);

// The offsets let you recover the original position
assert_eq!(&text[slab.start..slab.end], "world");

Character offsets (char_start/char_end) are populated automatically when using Chunker::chunk. They count Unicode scalar values (chars), which is useful for NLP systems that index by character position. They are None when a slab comes from Chunker::chunk_bytes or is built directly with Slab::new.
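As a rough illustration of how the two kinds of offsets relate, the sketch below converts a byte range into a character range by counting Unicode scalar values. The helper byte_span_to_char_span is hypothetical and not part of code_chunker; the crate's own computation may differ.

```rust
/// Hypothetical helper (not part of code_chunker): map a byte range in
/// `text` to the matching range of Unicode scalar values.
/// Returns None if the range is reversed, out of bounds, or splits a char.
pub fn byte_span_to_char_span(text: &str, start: usize, end: usize) -> Option<(usize, usize)> {
    if start > end || !text.is_char_boundary(start) || !text.is_char_boundary(end) {
        return None;
    }
    // Characters before the chunk give the starting character offset.
    let char_start = text[..start].chars().count();
    // Characters inside the chunk give its character length.
    let char_end = char_start + text[start..end].chars().count();
    Some((char_start, char_end))
}
```

For ASCII-only text the two ranges coincide. They diverge as soon as the document contains multi-byte characters: in "héllo wörld", the bytes 7..13 correspond to the characters 6..11.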

§Overlap Handling

When chunks overlap, adjacent slabs share some text. The index field identifies each slab’s position in the sequence:

Original: "The quick brown fox"
Slab 0:   "The quick b"     [0..11]
Slab 1:   "k brown fox"     [8..19]  <- overlaps with slab 0
Overlap:  "k b"             [8..11]
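The shared region of two adjacent slabs can be derived from their byte spans alone. The function below is a minimal sketch, not a code_chunker API: overlap is a hypothetical name, and it assumes both ranges refer to the same original document.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of code_chunker): the byte range shared
/// by two spans, or None when they do not overlap.
pub fn overlap(prev: &Range<usize>, next: &Range<usize>) -> Option<Range<usize>> {
    let start = prev.start.max(next.start);
    let end = prev.end.min(next.end);
    // Touching ranges such as 0..5 and 5..10 share no bytes.
    if start < end {
        Some(start..end)
    } else {
        None
    }
}
```

With the spans 0..11 and 8..19 over "The quick brown fox", this yields 8..11, which slices to "k b" in the original text.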

Fields§

§text: String

The chunk text.

§start: usize

Byte offset where this chunk starts in the original document.

§end: usize

Byte offset where this chunk ends (exclusive) in the original document.

§char_start: Option<usize>

Character offset where this chunk starts (Unicode scalar values). None until with_char_offsets or compute_char_offsets is called.

§char_end: Option<usize>

Character offset where this chunk ends (exclusive, Unicode scalar values). None until character offsets are set, as for char_start.

§index: usize

Zero-based index of this chunk in the sequence.

Implementations§

impl Slab

pub fn new(text: impl Into<String>, start: usize, end: usize, index: usize) -> Self

Create a new slab (byte offsets only; char offsets unset).

pub fn with_char_offsets(self, char_start: usize, char_end: usize) -> Self

Set character offsets on this slab.
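The two constructors chain in builder style. The sketch below uses a local stand-in type, SlabSketch, that mirrors the documented fields and signatures so the example runs without the crate. The method bodies are assumptions, and the real Slab may behave differently.

```rust
/// Local stand-in mirroring the documented fields of `Slab`.
/// This is NOT the crate's type; the method bodies below are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SlabSketch {
    pub text: String,
    pub start: usize,
    pub end: usize,
    pub char_start: Option<usize>,
    pub char_end: Option<usize>,
    pub index: usize,
}

impl SlabSketch {
    /// Byte offsets only; character offsets start out unset.
    pub fn new(text: impl Into<String>, start: usize, end: usize, index: usize) -> Self {
        SlabSketch { text: text.into(), start, end, char_start: None, char_end: None, index }
    }

    /// Consumes the slab and returns it with both character offsets set.
    pub fn with_char_offsets(mut self, char_start: usize, char_end: usize) -> Self {
        self.char_start = Some(char_start);
        self.char_end = Some(char_end);
        self
    }
}

/// In "héllo wörld", "wörld" occupies bytes 7..13 and characters 6..11.
pub fn example_slab() -> SlabSketch {
    SlabSketch::new("wörld", 7, 13, 1).with_char_offsets(6, 11)
}
```

Because with_char_offsets takes self by value, it fits at the end of a constructor chain without a mutable binding.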

pub fn len(&self) -> usize

The length of this chunk in bytes.

pub fn char_len(&self) -> usize

The length of this chunk in characters (Unicode scalar values).

pub fn is_empty(&self) -> bool

Whether this chunk is empty.
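Byte length and character length differ whenever the text contains multi-byte characters. The sketch below shows the distinction on a plain &str. It assumes that char_len counts the scalar values of the slab's text, which this page does not confirm; lengths is a hypothetical helper.

```rust
/// Hypothetical helper: (byte length, character length) of a chunk's text.
/// Assumes character length means the number of Unicode scalar values.
pub fn lengths(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}
```

For example, "wörld" is 6 bytes but 5 characters, and "日本語" is 9 bytes but 3 characters. Size limits expressed in bytes and in characters are therefore not interchangeable.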

pub fn span(&self) -> Range<usize>

The byte span of this chunk in the original document.

pub fn char_span(&self) -> Option<Range<usize>>

The character span, if computed.
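The two spans are consumed differently: a byte span slices a &str directly, while a character span has to walk the string's chars. The helpers below are a hypothetical sketch of how a caller might use the ranges returned by span and char_span. They are not part of code_chunker.

```rust
use std::ops::Range;

/// Slice the original document by a byte span.
/// Returns None if the span is out of bounds or splits a character.
pub fn slice_by_bytes(doc: &str, span: Range<usize>) -> Option<&str> {
    doc.get(span)
}

/// Extract text by a character span (Unicode scalar values).
/// This walks the string from the start, so it is O(n) in the document length.
pub fn slice_by_chars(doc: &str, span: Range<usize>) -> String {
    doc.chars().skip(span.start).take(span.len()).collect()
}
```

Both calls recover the same text when the spans describe the same chunk: bytes 7..13 and characters 6..11 of "héllo wörld" each yield "wörld".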

Trait Implementations§

impl Clone for Slab

fn clone(&self) -> Slab

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Slab

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for Slab

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for Slab

fn eq(&self, other: &Slab) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Eq for Slab

impl StructuralPartialEq for Slab

Auto Trait Implementations§

impl Freeze for Slab

impl RefUnwindSafe for Slab

impl Send for Slab

impl Sync for Slab

impl Unpin for Slab

impl UnsafeUnpin for Slab

impl UnwindSafe for Slab

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

This is a nightly-only experimental API (clone_to_uninit). Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.