Struct Chunk 

pub struct Chunk {
    pub id: Uuid,
    pub doc_id: Uuid,
    pub text: String,
    pub byte_offset: u64,
    pub byte_length: u64,
    pub sequence: u32,
    pub text_hash: [u8; 32],
}

A chunk of text extracted from a document.

Documents are split into overlapping chunks for embedding. Each chunk tracks its position within the source document.

Per CP-011: Offsets are byte-based (not character-based), so a chunk can be sliced accurately back out of the original document content.
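
To illustrate why byte-based offsets matter, here is a minimal sketch that slices a chunk back out of a source string using `byte_offset` and `byte_length`. The helper `slice_chunk` is hypothetical and not part of this crate. Because `text` is canonicalized, the slice of the raw source may not be identical to the stored `text`.

```rust
/// Hypothetical helper (not part of this crate): slice a chunk back out of
/// the source document using byte offsets. Returns None if the range is out
/// of bounds or does not fall on UTF-8 character boundaries.
fn slice_chunk(doc: &str, byte_offset: u64, byte_length: u64) -> Option<&str> {
    let start = usize::try_from(byte_offset).ok()?;
    let len = usize::try_from(byte_length).ok()?;
    let end = start.checked_add(len)?;
    // `str::get` checks bounds and character boundaries instead of panicking.
    doc.get(start..end)
}

fn main() {
    // "é" and "ö" each occupy 2 bytes in UTF-8, so byte and character
    // positions diverge after the first accented letter.
    let doc = "héllo wörld";

    // "wörld" starts at byte 7 and is 6 bytes long (5 characters).
    assert_eq!(slice_chunk(doc, 7, 6), Some("wörld"));

    // Byte 2 is the second byte of "é", so this range is rejected.
    assert_eq!(slice_chunk(doc, 2, 3), None);

    println!("{:?}", slice_chunk(doc, 7, 6));
}
```

Using the character index 6 for "wörld" would land on the space before it, which is the kind of drift byte offsets avoid.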

Per CP-001: The chunk ID is STABLE. It is computed as hash(doc_id + sequence) only, so re-chunking with different parameters produces the same IDs.
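
The stability property can be sketched as follows. This is an illustration only: the crate uses BLAKE3 truncated to 16 bytes and a `Uuid`, whereas this sketch substitutes a 64-bit FNV-1a hash and a raw `[u8; 16]` so it needs no external crates. The function `stable_chunk_id`, the byte order of `sequence`, and the concatenation layout are assumptions, not the crate's actual derivation.

```rust
/// Stand-in hash (64-bit FNV-1a). NOT what the crate uses; the real ID is
/// derived with BLAKE3. Chosen here only to keep the sketch dependency-free.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical ID derivation: hash the document ID bytes followed by the
/// sequence number. The chunk text is deliberately not an input.
fn stable_chunk_id(doc_id: [u8; 16], sequence: u32) -> u64 {
    let mut input = Vec::with_capacity(20);
    input.extend_from_slice(&doc_id);
    input.extend_from_slice(&sequence.to_le_bytes()); // byte order is an assumption
    fnv1a(&input)
}

fn main() {
    let doc_id = [7u8; 16];

    // Sequence 0 as produced by two chunking runs with different window sizes.
    let run_a = (stable_chunk_id(doc_id, 0), "The quick brown");
    let run_b = (stable_chunk_id(doc_id, 0), "The quick brown fox jumps");

    // The text differs, but the ID does not, because text is not hashed into it.
    assert_ne!(run_a.1, run_b.1);
    assert_eq!(run_a.0, run_b.0);

    // A different sequence number in the same document yields a different ID.
    assert_ne!(stable_chunk_id(doc_id, 0), stable_chunk_id(doc_id, 1));
}
```

Since the ID says nothing about content, a consumer that needs to detect changed text would compare `text_hash` rather than `id`.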

Fields§

§id: Uuid

Unique identifier for this chunk (BLAKE3-16 of doc_id + sequence) - STABLE

§doc_id: Uuid

Parent document ID

§text: String

The actual text content (canonicalized)

§byte_offset: u64

Byte offset within the source document (u64 for large files)

§byte_length: u64

Length of this chunk in bytes (u64 for large files)

§sequence: u32

Sequence number within the document (0-indexed)

§text_hash: [u8; 32]

Hash of the canonicalized text content for verification

Implementations§

impl Chunk

pub fn new(doc_id: Uuid, text: String, byte_offset: u64, sequence: u32) -> Self

Create a new chunk with automatic ID generation.

Per CP-001: The chunk ID is STABLE and does NOT include the text, so re-chunking with different parameters produces the same IDs. Content is verified via the text_hash field.

pub fn text_hash_hex(&self) -> String

Get the text hash as a hex string
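
As a rough illustration of what a hex rendering of the 32-byte `text_hash` looks like, here is a standard-library-only sketch. The free function `to_hex` is hypothetical, and lowercase output is an assumption; the crate's actual method may differ in both respects.

```rust
use std::fmt::Write;

/// Hypothetical helper: encode a 32-byte hash as a 64-character
/// lowercase hex string (two hex digits per byte).
fn to_hex(hash: &[u8; 32]) -> String {
    let mut out = String::with_capacity(64);
    for byte in hash {
        // Writing into a String cannot fail, so unwrap is safe here.
        write!(out, "{:02x}", byte).unwrap();
    }
    out
}

fn main() {
    let mut hash = [0u8; 32];
    hash[0] = 0xab;
    hash[31] = 0x01;

    let hex = to_hex(&hash);
    assert_eq!(hex.len(), 64);
    assert!(hex.starts_with("ab00"));
    assert!(hex.ends_with("0001"));
    println!("{hex}");
}
```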

pub fn approx_tokens(&self) -> usize

Approximate token count (rough estimate: 4 chars per token)
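
The 4-characters-per-token heuristic can be sketched as below. Whether the crate counts Unicode characters or bytes, and how it rounds, is not stated in this documentation; this sketch assumes characters and integer division.

```rust
/// Rough token estimate: one token per 4 characters, rounded down.
/// Counting `chars()` rather than bytes is an assumption of this sketch.
fn approx_tokens(text: &str) -> usize {
    text.chars().count() / 4
}

fn main() {
    assert_eq!(approx_tokens("abcdefgh"), 2); // 8 characters
    assert_eq!(approx_tokens("abc"), 0);      // fewer than 4 characters
    assert_eq!(approx_tokens("héllo wö"), 2); // 8 characters, 10 bytes
    println!("{}", approx_tokens("Documents are split into overlapping chunks."));
}
```

The estimate is only useful for budgeting, for example checking that a batch of chunks is likely to fit an embedding model's input limit; an exact count requires the model's tokenizer.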

Trait Implementations§

impl Clone for Chunk

fn clone(&self) -> Chunk

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Chunk

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'de> Deserialize<'de> for Chunk

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl PartialEq for Chunk

fn eq(&self, other: &Chunk) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Serialize for Chunk

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

impl Eq for Chunk

impl StructuralPartialEq for Chunk

Auto Trait Implementations§

impl Freeze for Chunk

impl RefUnwindSafe for Chunk

impl Send for Chunk

impl Sync for Chunk

impl Unpin for Chunk

impl UnsafeUnpin for Chunk

impl UnwindSafe for Chunk

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,