Crate cognee_chunking

Text chunking for Cognee, ported from the Python chunking hierarchy.

Splits text through a word → sentence → paragraph hierarchy into token-bounded chunks. Chunking is zero-copy where possible: chunks borrow &str slices of the input, located by byte-offset tracking.
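To illustrate the zero-copy idea, here is a minimal sketch that packs whitespace-separated words into chunks that borrow from the input by byte offset. It covers only the word level, not the sentence and paragraph levels, and it is not the crate's implementation: `Chunk`, `word_spans`, and `chunk_words` are hypothetical names introduced for this example, and "one word = one token" is an assumption.

```rust
/// A chunk that borrows from the source text instead of copying it.
/// (Hypothetical type for this sketch; not the crate's chunk type.)
#[derive(Debug, PartialEq)]
pub struct Chunk<'a> {
    /// Borrowed slice of the original input.
    pub text: &'a str,
    /// Byte offset of the first byte of the chunk in the input.
    pub start: usize,
    /// Byte offset one past the last byte of the chunk.
    pub end: usize,
    /// Number of tokens (here: words) in the chunk.
    pub tokens: usize,
}

/// Returns the (start, end) byte offsets of every whitespace-separated word.
fn word_spans(text: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in text.char_indices() {
        if c.is_whitespace() {
            // A word just ended at byte offset `i`.
            if let Some(s) = start.take() {
                spans.push((s, i));
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    // Close a word that runs to the end of the input.
    if let Some(s) = start {
        spans.push((s, text.len()));
    }
    spans
}

/// Greedily packs words into chunks of at most `max_tokens` words each.
/// Every chunk is a sub-slice of `text`; no string data is copied.
pub fn chunk_words(text: &str, max_tokens: usize) -> Vec<Chunk<'_>> {
    assert!(max_tokens > 0, "max_tokens must be positive");
    let spans = word_spans(text);
    let chunks: Vec<Chunk<'_>> = spans
        .chunks(max_tokens)
        .map(|group| {
            let start = group[0].0;
            let end = group[group.len() - 1].1;
            Chunk { text: &text[start..end], start, end, tokens: group.len() }
        })
        .collect();
    chunks
}
```

Because each chunk keeps its `start` and `end` offsets, a higher level (sentences, paragraphs) could merge adjacent chunks by re-slicing the input between the outermost offsets rather than concatenating strings.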

  • text_chunker / cognify_pipeline — the chunking entry points. cognify_pipeline is written as a plain code span rather than an intra-doc link because the module is compiled out on wasm32, where the link would fail to resolve in a --target wasm32 doc build.
  • token_counter — the token_counter::TokenCounter trait and its WordCounter / HuggingFaceTokenCounter / TikTokenCounter impls, selected by config (TokenCounterKind::from_env)
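The pattern described in the last item, a counting trait with several implementations chosen from configuration, can be sketched as follows. This is an illustration under stated assumptions, not the crate's API: `CountTokens`, `WhitespaceCounter`, `CharEstimateCounter`, `CounterKind`, and the `SKETCH_TOKEN_COUNTER` variable are all hypothetical, and the real `TokenCounter` trait and `TokenCounterKind::from_env` may have different signatures and read different variables.

```rust
use std::env;

/// Hypothetical stand-in for a token-counting trait.
pub trait CountTokens {
    fn count(&self, text: &str) -> usize;
}

/// Counts whitespace-separated words.
pub struct WhitespaceCounter;

impl CountTokens for WhitespaceCounter {
    fn count(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Rough estimate: one token per `chars_per_token` characters, rounded up.
pub struct CharEstimateCounter {
    pub chars_per_token: usize,
}

impl CountTokens for CharEstimateCounter {
    fn count(&self, text: &str) -> usize {
        let per = self.chars_per_token.max(1); // guard against division by zero
        (text.chars().count() + per - 1) / per
    }
}

/// Which counter to build (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CounterKind {
    Whitespace,
    CharEstimate,
}

impl CounterKind {
    /// Parses a configuration value; unknown or missing values fall back
    /// to the whitespace counter.
    pub fn parse(value: Option<&str>) -> Self {
        match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
            Some("char_estimate") => CounterKind::CharEstimate,
            _ => CounterKind::Whitespace,
        }
    }

    /// Reads the hypothetical SKETCH_TOKEN_COUNTER environment variable.
    pub fn from_env() -> Self {
        Self::parse(env::var("SKETCH_TOKEN_COUNTER").ok().as_deref())
    }

    /// Builds the selected counter behind a trait object.
    pub fn build(self) -> Box<dyn CountTokens> {
        match self {
            CounterKind::Whitespace => Box::new(WhitespaceCounter),
            CounterKind::CharEstimate => Box::new(CharEstimateCounter { chars_per_token: 4 }),
        }
    }
}
```

Keeping the parsing in `parse` and the environment lookup in `from_env` lets the selection logic be tested without mutating process-wide environment state.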

Re-exports

pub use chunk_by_row::chunk_by_row;
pub use cognify_pipeline::ExtractTextChunksPipeline;
pub use config::TokenCounterKind;
pub use cut_type::CutType;
pub use error::ChunkingError;
pub use text_chunker::NAMESPACE_OID;
pub use text_chunker::chunk_text;
pub use token_counter::TokenCounter;
pub use token_counter::WordCounter;

Modules

chunk_by_paragraph
Paragraph-level text chunker.
chunk_by_row
Row-based chunking for CSV and DLT data.
chunk_by_sentence
Sentence-level text chunker.
chunk_by_word
Word-level text chunker.
cognify_pipeline
Extract text chunks pipeline.
config
Chunking configuration — tokenizer selection via environment variables.
cut_type
error
text_chunker
Top-level text chunker producing DocumentChunk output.
token_counter