Crate cognee_chunking

Text chunking for Cognee, ported from the Python chunking hierarchy.

Splits text through a word → sentence → paragraph hierarchy into token-bounded chunks. Chunking is zero-copy where possible: chunks borrow &str slices of the input, located by byte-offset tracking.
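
The borrowing idea can be illustrated with a minimal sketch. It is not this crate's implementation or API: `word_spans` and `chunk_by_word_budget` are hypothetical names, and a plain whitespace word count stands in for the configurable token counter. The sketch only shows how byte offsets let each chunk be a slice of the original input rather than a new allocation.

```rust
/// Hypothetical helper (not part of cognee_chunking): returns the
/// (start, end) byte offsets of every whitespace-separated word in `text`.
fn word_spans(text: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in text.char_indices() {
        if c.is_whitespace() {
            // A word ends at the first whitespace character after it.
            if let Some(s) = start.take() {
                spans.push((s, i));
            }
        } else if start.is_none() {
            // First non-whitespace character opens a new word.
            start = Some(i);
        }
    }
    // Close a word that runs to the end of the input.
    if let Some(s) = start {
        spans.push((s, text.len()));
    }
    spans
}

/// Hypothetical helper (not part of cognee_chunking): groups words into
/// chunks of at most `max_words` words. Each chunk is a `&str` borrowed
/// from `text`, so no text is copied.
fn chunk_by_word_budget(text: &str, max_words: usize) -> Vec<&str> {
    assert!(max_words > 0, "max_words must be at least 1");
    let spans = word_spans(text);
    spans
        .chunks(max_words)
        // Slice from the start of the first word to the end of the last one.
        .map(|group| &text[group[0].0..group[group.len() - 1].1])
        .collect()
}
```

Under these assumptions, `chunk_by_word_budget("one two three four five", 2)` yields the three slices `"one two"`, `"three four"`, and `"five"`, and the first slice points at the same memory as the input. Offsets come from `char_indices`, so they always fall on UTF-8 character boundaries and slicing cannot panic on multi-byte text.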

Re-exports

pub use chunk_by_row::chunk_by_row;
pub use cognify_pipeline::ExtractTextChunksPipeline;
pub use config::TokenCounterKind;
pub use cut_type::CutType;
pub use error::ChunkingError;
pub use text_chunker::NAMESPACE_OID;
pub use text_chunker::chunk_text;
pub use token_counter::TokenCounter;
pub use token_counter::WordCounter;

Modules

chunk_by_paragraph
Paragraph-level text chunker.
chunk_by_row
Row-based chunking for CSV and DLT data.
chunk_by_sentence
Sentence-level text chunker.
chunk_by_word
Word-level text chunker.
cognify_pipeline
Extract text chunks pipeline.
config
Chunking configuration — tokenizer selection via environment variables.
cut_type
error
text_chunker
Top-level text chunker producing DocumentChunk output.
token_counter