pub struct Chunk {
    pub id: Option<i64>,
    pub buffer_id: i64,
    pub content: String,
    pub byte_range: Range<usize>,
    pub index: usize,
    pub metadata: ChunkMetadata,
}
Represents a chunk of text from a buffer.
Chunks are created by chunking strategies and contain a portion of buffer content along with metadata about their position and origin.
§Examples
use rlm_rs::core::Chunk;

let chunk = Chunk::new(
    1,
    "Hello, world!".to_string(),
    0..13,
    0,
);
assert_eq!(chunk.size(), 13);
Fields§
id: Option<i64>
Unique identifier (assigned by storage layer).
buffer_id: i64
ID of the buffer this chunk belongs to.
content: String
Chunk content.
byte_range: Range<usize>
Byte range in the original buffer.
index: usize
Sequential index within the buffer (0-based).
metadata: ChunkMetadata
Chunk metadata.
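To illustrate how byte_range and index relate to the original buffer, here is a minimal sketch of a fixed-size chunking strategy. It does not use the crate's own types or strategies: SketchChunk and chunk_fixed are hypothetical names introduced only for this example, and the real strategies may split text differently.

```rust
use std::ops::Range;

/// Hypothetical stand-in for `Chunk`, reduced to the positional fields.
#[derive(Debug)]
struct SketchChunk {
    content: String,
    byte_range: Range<usize>,
    index: usize,
}

/// Splits `buffer` into pieces of at most `max_bytes` bytes, never cutting
/// a UTF-8 character in half. (Assumed behavior, for illustration only.)
fn chunk_fixed(buffer: &str, max_bytes: usize) -> Vec<SketchChunk> {
    // A UTF-8 character is at most 4 bytes, so this guarantees progress.
    assert!(max_bytes >= 4, "max_bytes must be at least 4");

    let mut chunks = Vec::new();
    let mut start = 0;
    while start < buffer.len() {
        let mut end = (start + max_bytes).min(buffer.len());
        // Step back until `end` falls on a character boundary.
        while !buffer.is_char_boundary(end) {
            end -= 1;
        }
        chunks.push(SketchChunk {
            content: buffer[start..end].to_string(),
            byte_range: start..end,
            index: chunks.len(), // 0-based position within the buffer
        });
        start = end;
    }
    chunks
}
```

With this sketch, chunking "Hello, world!" with max_bytes = 5 yields three chunks covering bytes 0..5, 5..10 and 10..13, with indices 0, 1 and 2.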
Implementations§
impl Chunk
pub fn new(buffer_id: i64, content: String, byte_range: Range<usize>, index: usize) -> Self
Creates a new chunk.
§Arguments
- buffer_id - ID of the parent buffer.
- content - Chunk content.
- byte_range - Byte range in the original buffer.
- index - Sequential index within the buffer.
§Examples
use rlm_rs::core::Chunk;
let chunk = Chunk::new(1, "content".to_string(), 0..7, 0);
assert_eq!(chunk.buffer_id, 1);
assert_eq!(chunk.index, 0);
pub fn with_strategy(buffer_id: i64, content: String, byte_range: Range<usize>, index: usize, strategy: &str) -> Self
Creates a chunk with a specific strategy name.
§Arguments
- buffer_id - ID of the parent buffer.
- content - Chunk content.
- byte_range - Byte range in the original buffer.
- index - Sequential index within the buffer.
- strategy - Name of the chunking strategy.
pub const fn range_size(&self) -> usize
Returns the byte range size.
pub const fn set_token_count(&mut self, count: usize)
Sets the token count estimate.
pub const fn estimate_tokens(&self) -> usize
Estimates token count using a simple heuristic.
Uses the approximation of ~4 characters per token for ASCII text.
For a more accurate estimate, use Self::estimate_tokens_accurate.
§Accuracy
This simple method is typically accurate within 20-30% for English text and code. It tends to undercount for text with many short words and overcount for text with long technical terms.
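The ~4 characters per token approximation can be sketched as a single division. This is only an illustration of the heuristic, not the crate's implementation: the function name estimate_tokens_simple is hypothetical, and both the use of byte length and rounding up are assumptions.

```rust
/// Hypothetical sketch of the "~4 characters per token" rule.
/// Uses the byte length (equal to the character count for ASCII text)
/// and rounds up so that any non-empty input counts as at least one token.
const fn estimate_tokens_simple(content: &str) -> usize {
    (content.len() + 3) / 4
}
```

Under these assumptions, the 13-byte string "Hello, world!" is estimated at 4 tokens and an empty string at 0.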
pub fn estimate_tokens_accurate(&self) -> usize
Estimates token count with improved accuracy.
Uses a more sophisticated heuristic that accounts for:
- Word boundaries (whitespace-separated tokens)
- Punctuation and operators (often separate tokens)
- Non-ASCII characters (typically 1-2 chars per token)
§Accuracy
This method is typically accurate within 10-15% for mixed content.
For production use requiring exact counts, consider integrating
a proper tokenizer like tiktoken-rs.
§Performance
This method iterates over the content string, so it’s O(n) where
n is the content length. For very large chunks, the simple
Self::estimate_tokens method may be preferred.
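A single-pass heuristic along the lines described above might look like the following sketch. The function names and all weights (one token per 6 ASCII word characters, one token per punctuation mark, one token per 2 non-ASCII characters) are assumptions chosen for illustration. They are not taken from the crate, whose actual weighting may differ.

```rust
/// Tokens attributed to a run of ASCII word characters.
/// Assumption: roughly one token per 6 characters, rounded up (0 for an empty run).
fn word_tokens(len: usize) -> usize {
    (len + 5) / 6
}

/// Hypothetical O(n) estimate that treats words, punctuation and
/// non-ASCII characters separately.
fn estimate_tokens_heuristic(content: &str) -> usize {
    let mut tokens = 0;
    let mut word_len = 0; // length of the current ASCII word run
    let mut non_ascii = 0; // count of non-ASCII, non-whitespace characters

    for ch in content.chars() {
        if ch.is_ascii_alphanumeric() || ch == '_' {
            word_len += 1;
            continue;
        }
        // Any other character ends the current word.
        tokens += word_tokens(word_len);
        word_len = 0;

        if ch.is_whitespace() {
            // Whitespace only separates words; it adds no tokens here.
        } else if ch.is_ascii_punctuation() {
            tokens += 1; // punctuation and operators count as one token each
        } else {
            non_ascii += 1;
        }
    }
    tokens += word_tokens(word_len); // flush the trailing word

    // Assumption: about 2 non-ASCII characters per token, rounded up.
    tokens + (non_ascii + 1) / 2
}
```

With these weights, "Hello, world!" is counted as two words plus two punctuation marks, giving 4 tokens.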
pub const fn set_line_range(&mut self, start_line: usize, end_line: usize)
Sets the line range in the original buffer.
pub const fn set_has_overlap(&mut self, has_overlap: bool)
Marks this chunk as having overlap with the previous chunk.
pub fn compute_hash(&mut self)
Computes and sets the content hash.
pub fn preview(&self, max_len: usize) -> &str
Returns a preview of the chunk content (first N characters).
§Arguments
- max_len - Maximum number of characters to include.
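Because the preview is returned as a &str, any truncation has to land on a UTF-8 character boundary. One way to take the first N characters safely is sketched below. The function name preview_chars is hypothetical, and whether the crate counts characters or bytes internally is an assumption based on the description above.

```rust
/// Hypothetical sketch: returns the first `max_chars` characters of `content`
/// as a borrowed slice, or the whole string if it is shorter.
fn preview_chars(content: &str, max_chars: usize) -> &str {
    // `char_indices` yields the byte offset of each character, so the offset
    // of character number `max_chars` is a valid place to cut.
    match content.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &content[..byte_idx],
        None => content,
    }
}
```

Slicing by a raw byte count instead (for example &content[..max_len]) would panic if the cut fell inside a multi-byte character.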
pub const fn overlaps_with(&self, other_range: &Range<usize>) -> bool
Checks if this chunk’s byte range overlaps with another range.
pub fn contains_offset(&self, offset: usize) -> bool
Checks if this chunk’s byte range contains a specific byte offset.
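The two range checks above can be sketched on plain Range<usize> values. Rust ranges written as a..b are half-open (the end is excluded), so adjacent chunks such as 0..10 and 10..20 do not overlap. The function names below are hypothetical, and how the crate treats empty ranges is not stated here.

```rust
use std::ops::Range;

/// Two half-open ranges overlap when each one starts before the other ends.
fn ranges_overlap(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}

/// An offset is contained when `start <= offset < end`.
fn range_contains(range: &Range<usize>, offset: usize) -> bool {
    range.contains(&offset)
}
```

For a chunk covering 0..10, offset 9 is contained while offset 10 is not, since it belongs to the next chunk.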
Trait Implementations§
impl<'de> Deserialize<'de> for Chunk
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
impl Eq for Chunk
impl StructuralPartialEq for Chunk
Auto Trait Implementations§
impl Freeze for Chunk
impl RefUnwindSafe for Chunk
impl Send for Chunk
impl Sync for Chunk
impl Unpin for Chunk
impl UnwindSafe for Chunk
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compares self to key and returns true if they are equal.
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.