Text chunking for Cognee, ported from the Python chunking hierarchy.
Splits text through a word → sentence → paragraph hierarchy into
token-bounded chunks. Zero-copy where possible (chunks borrow &str slices
via byte-offset tracking).
- text_chunker / cognify_pipeline: the chunking entry points
- token_counter: the token_counter::TokenCounter trait and its WordCounter / HuggingFaceTokenCounter / TikTokenCounter impls, selected by config (TokenCounterKind::from_env)
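
The byte-offset idea described above can be illustrated with a small, self-contained sketch. It is not the crate's implementation: `BorrowedChunk` and `chunk_words` are hypothetical names, and the token budget is approximated by counting whitespace-delimited words. It skips the sentence and paragraph levels and shows only how chunks can borrow `&str` slices from the source without copying.

```rust
/// Hypothetical chunk type: borrows from the source text instead of owning a copy.
#[derive(Debug, PartialEq)]
struct BorrowedChunk<'a> {
    /// Slice of the original text covered by this chunk.
    text: &'a str,
    /// Byte offset of `text` within the original source.
    start: usize,
    /// Number of whitespace-delimited words in the chunk.
    token_count: usize,
}

/// Greedily packs words into chunks of at most `max_tokens` words.
/// Assumption: one word counts as one token.
fn chunk_words(source: &str, max_tokens: usize) -> Vec<BorrowedChunk<'_>> {
    assert!(max_tokens > 0, "max_tokens must be positive");
    let mut chunks = Vec::new();
    let mut chunk_start: Option<usize> = None;
    let mut chunk_end = 0;
    let mut count = 0;

    for word in source.split_whitespace() {
        // Each word is a sub-slice of `source`, so its byte offset can be
        // recovered from the difference between the two start addresses.
        let offset = word.as_ptr() as usize - source.as_ptr() as usize;

        // The current chunk is full: emit it before starting a new one.
        if count == max_tokens {
            if let Some(start) = chunk_start.take() {
                chunks.push(BorrowedChunk {
                    text: &source[start..chunk_end],
                    start,
                    token_count: count,
                });
            }
            count = 0;
        }

        if chunk_start.is_none() {
            chunk_start = Some(offset);
        }
        chunk_end = offset + word.len();
        count += 1;
    }

    // Emit the trailing, possibly partial, chunk.
    if let Some(start) = chunk_start {
        chunks.push(BorrowedChunk {
            text: &source[start..chunk_end],
            start,
            token_count: count,
        });
    }
    chunks
}
```

With a budget of two words, the input "one two three  four five" yields three chunks: "one two", "three  four" (the interior whitespace is kept because the chunk is a slice of the source), and "five". No chunk allocates a new string.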
Re-exports
pub use chunk_by_row::chunk_by_row;
pub use cognify_pipeline::ExtractTextChunksPipeline;
pub use config::TokenCounterKind;
pub use cut_type::CutType;
pub use error::ChunkingError;
pub use text_chunker::NAMESPACE_OID;
pub use text_chunker::chunk_text;
pub use token_counter::TokenCounter;
pub use token_counter::WordCounter;
Modules
- chunk_by_paragraph - Paragraph-level text chunker.
- chunk_by_row - Row-based chunking for CSV and DLT data.
- chunk_by_sentence - Sentence-level text chunker.
- chunk_by_word - Word-level text chunker.
- cognify_pipeline - Extract text chunks pipeline.
- config - Chunking configuration: tokenizer selection via environment variables.
- cut_type
- error
- text_chunker - Top-level text chunker producing DocumentChunk output.
- token_counter