pub struct Chunk {
pub id: Uuid,
pub doc_id: Uuid,
pub text: String,
pub byte_offset: u64,
pub byte_length: u64,
pub sequence: u32,
pub text_hash: [u8; 32],
}
A chunk of text extracted from a document
Documents are split into overlapping chunks for embedding. Each chunk tracks its position within the source document.
Per CP-011: Uses byte-based offsets (not character-based) for accurate slicing back to original document content.
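Byte offsets allow a chunk's text to be sliced straight back out of the source document, because Rust's `str` indexing is byte-based. A minimal sketch (`slice_chunk` is a hypothetical helper, not part of this crate):

```rust
// Recover a chunk's text from the source document using byte offsets.
// `&doc[start..end]` indexes by BYTES, which is exactly what
// byte_offset/byte_length store.
fn slice_chunk(doc: &str, byte_offset: usize, byte_length: usize) -> &str {
    &doc[byte_offset..byte_offset + byte_length]
}

fn main() {
    // 'é' is 2 bytes in UTF-8, so a character-based offset would point at
    // the wrong bytes here.
    let doc = "héllo world";
    // "world" starts at byte 7: h(1) + é(2) + l(1) + l(1) + o(1) + space(1).
    assert_eq!(slice_chunk(doc, 7, 5), "world");
    println!("{}", slice_chunk(doc, 7, 5));
}
```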
Per CP-001: the chunk ID is STABLE; ID = hash(doc_id + sequence) only. This ensures that re-chunking with different parameters produces the same IDs.
Fields
id: Uuid - Unique identifier for this chunk (BLAKE3-16 of doc_id + sequence); STABLE
doc_id: Uuid - Parent document ID
text: String - The actual text content (canonicalized)
byte_offset: u64 - Byte offset within the source document (u64 for large files)
byte_length: u64 - Length of this chunk in bytes (u64 for large files)
sequence: u32 - Sequence number within the document (0-indexed)
text_hash: [u8; 32] - Hash of the canonicalized text content, for verification
Implementations
impl Chunk
pub fn new(doc_id: Uuid, text: String, byte_offset: u64, sequence: u32) -> Self
Create a new chunk with automatic ID generation.
Per CP-001: the chunk ID is STABLE; it does NOT include the text. This ensures that re-chunking with different parameters produces the same IDs. Content is instead verified via the text_hash field.
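The stability property can be sketched as follows. This is an illustration only: `chunk_id` and its use of std's `DefaultHasher` are stand-ins for the crate's actual BLAKE3-16 derivation, and `DefaultHasher` is not stable across Rust releases, so this only demonstrates determinism within a run:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the real ID derivation: hash ONLY doc_id and sequence.
fn chunk_id(doc_id: u128, sequence: u32) -> u64 {
    let mut h = DefaultHasher::new();
    doc_id.hash(&mut h);
    sequence.hash(&mut h);
    h.finish()
}

fn main() {
    let doc_id = 0xDEAD_BEEFu128;
    // The chunk text is deliberately NOT an input: re-chunking the same
    // document with different parameters yields the same (doc_id, sequence)
    // pairs for the leading chunks, and therefore the same IDs.
    let before = chunk_id(doc_id, 0); // chunk text was, say, 512 bytes
    let after = chunk_id(doc_id, 0); // re-chunked at 1024 bytes, same ID
    assert_eq!(before, after);
}
```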
pub fn text_hash_hex(&self) -> String
Get the text hash as a hex string
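A plausible sketch of what such an accessor does, assuming lowercase hex with two digits per byte (the helper name `to_hex` is hypothetical):

```rust
// Hex-encode a byte slice: each byte becomes two lowercase hex digits.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    assert_eq!(to_hex(&[0xde, 0xad, 0xbe, 0xef]), "deadbeef");
    // A 32-byte text_hash becomes a 64-character string.
    assert_eq!(to_hex(&[0u8; 32]).len(), 64);
}
```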
pub fn approx_tokens(&self) -> usize
Approximate token count (rough estimate: 4 chars per token)
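A minimal sketch of the heuristic, assuming integer division over the byte length (the real method may count characters or round differently):

```rust
// "4 chars per token" estimate: integer division, so short texts round down.
fn approx_tokens(text: &str) -> usize {
    text.len() / 4
}

fn main() {
    assert_eq!(approx_tokens("hello world!"), 3); // 12 bytes / 4
}
```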