Crate cognee_chunking

Text chunking for Cognee, ported from the Python chunking hierarchy.

Splits text through a word → sentence → paragraph hierarchy into token-bounded chunks. Chunking is zero-copy where possible: chunks borrow &str slices of the input, located by byte-offset tracking.
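
The zero-copy idea can be illustrated with a minimal, standalone sketch. It is not this crate's implementation: the function name `chunk_by_word_budget` and the `max_words` parameter are hypothetical, and a plain word count stands in for a real token budget. The sketch only shows how byte offsets let each chunk borrow from the input instead of copying it.

```rust
/// Hypothetical helper (not part of cognee_chunking): groups whitespace-separated
/// words into chunks of at most `max_words` words. Every chunk is a `&str`
/// borrowed from `text`, so no text is copied.
fn chunk_by_word_budget(text: &str, max_words: usize) -> Vec<&str> {
    assert!(max_words > 0, "max_words must be positive");
    let mut chunks = Vec::new();
    let mut start: Option<usize> = None; // byte offset where the current chunk begins
    let mut end = 0; // byte offset just past the last word of the current chunk
    let mut count = 0; // words in the current chunk

    for word in text.split_whitespace() {
        // `word` is a sub-slice of `text`, so its byte offset is the pointer distance.
        let offset = word.as_ptr() as usize - text.as_ptr() as usize;
        if count == max_words {
            // Budget reached: emit the current chunk and start a new one.
            if let Some(s) = start.take() {
                chunks.push(&text[s..end]);
            }
            count = 0;
        }
        if start.is_none() {
            start = Some(offset);
        }
        end = offset + word.len();
        count += 1;
    }
    // Emit the trailing partial chunk, if any.
    if let Some(s) = start {
        chunks.push(&text[s..end]);
    }
    chunks
}

fn main() {
    let text = "one two three  four five";
    for chunk in chunk_by_word_budget(text, 2) {
        println!("{chunk:?}");
    }
}
```

Because each chunk is a slice of the original string, interior whitespace is preserved as-is and the chunks cannot outlive the input text.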

text_chunker / cognify_pipeline: the chunking entry points. cognify_pipeline is written as a plain code span rather than an intra-doc link because the module is compiled out on wasm32, where the link would be unresolved in a --target wasm32 doc build.
token_counter: the token_counter::TokenCounter trait and its WordCounter, HuggingFaceTokenCounter, and TikTokenCounter implementations, selected by configuration (TokenCounterKind::from_env).
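
The pattern of selecting a counter implementation from configuration can be sketched as follows. This is an illustration under stated assumptions, not the crate's API: the trait `CountTokens`, the types `WhitespaceCounter` and `CharCounter`, the enum `CounterKind`, its `from_setting` constructor, and the environment variable `SKETCH_TOKEN_COUNTER` are all hypothetical stand-ins. The real TokenCounter trait and TokenCounterKind::from_env may have different signatures and read different variables.

```rust
/// Hypothetical trait standing in for a token-counting abstraction.
trait CountTokens {
    fn count(&self, text: &str) -> usize;
}

/// Counts whitespace-separated words.
struct WhitespaceCounter;

impl CountTokens for WhitespaceCounter {
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Counts Unicode scalar values; a crude stand-in for a subword tokenizer.
struct CharCounter;

impl CountTokens for CharCounter {
    fn count(&self, text: &str) -> usize {
        text.chars().count()
    }
}

/// Hypothetical selector for which counter to build.
#[derive(Debug, PartialEq)]
enum CounterKind {
    Word,
    Char,
}

impl CounterKind {
    /// Maps an optional setting string to a kind, defaulting to `Word`.
    fn from_setting(value: Option<&str>) -> Self {
        match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
            Some("char") => CounterKind::Char,
            _ => CounterKind::Word,
        }
    }

    /// Builds the counter as a trait object so callers stay implementation-agnostic.
    fn build(&self) -> Box<dyn CountTokens> {
        match self {
            CounterKind::Word => Box::new(WhitespaceCounter),
            CounterKind::Char => Box::new(CharCounter),
        }
    }
}

fn main() {
    // SKETCH_TOKEN_COUNTER is a made-up variable name used only for this sketch.
    let setting = std::env::var("SKETCH_TOKEN_COUNTER").ok();
    let kind = CounterKind::from_setting(setting.as_deref());
    let counter = kind.build();
    println!("{:?}: {} tokens", kind, counter.count("zero-copy text chunking"));
}
```

Taking the setting as an `Option<&str>` keeps the parsing logic testable without touching the process environment; only `main` reads the variable.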

Re-exports§

pub use chunk_by_row::chunk_by_row;
pub use cognify_pipeline::ExtractTextChunksPipeline;
pub use config::TokenCounterKind;
pub use cut_type::CutType;
pub use error::ChunkingError;
pub use text_chunker::NAMESPACE_OID;
pub use text_chunker::chunk_text;
pub use token_counter::TokenCounter;
pub use token_counter::WordCounter;

Modules§

chunk_by_paragraph: Paragraph-level text chunker.
chunk_by_row: Row-based chunking for CSV and DLT data.
chunk_by_sentence: Sentence-level text chunker.
chunk_by_word: Word-level text chunker.
cognify_pipeline: Extract text chunks pipeline.
config: Chunking configuration — tokenizer selection via environment variables.
cut_type
error
text_chunker: Top-level text chunker producing DocumentChunk output.
token_counter
