character_text_splitter 0.1.3

A Rust library for splitting text into chunks with overlap, designed for handling large amounts of text efficiently. The implementation mirrors LangChain's CharacterTextSplitter.
Repository: sarfraaz-talat/text-splitter-rust
Owner: sarfraaz-talat

TextSplitter - Rust

TextSplitter is a Rust library for splitting text into chunks of a specified size, with optional overlap between consecutive chunks.

This library replicates the functionality of the CharacterTextSplitter found in the Python LangChain library, giving Rust users access to the same text-splitting capabilities. It is designed to be straightforward to use and has zero dependencies, making it a lightweight addition to any project.

Features

  • Split text into chunks of a specific size
  • Option to overlap chunks
  • Option to provide custom delimiter for splitting
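
To make the three features above concrete, here is a minimal, std-only sketch of separator-based splitting with overlap, written in the spirit of LangChain's merge step. It is not the crate's source code: the function name `split_with_overlap`, its parameters, and the choice to measure length in Unicode scalar values are all assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Joins the pieces currently in the window with the separator.
fn join_window(window: &VecDeque<&str>, separator: &str) -> String {
    window.iter().copied().collect::<Vec<_>>().join(separator)
}

/// Hypothetical helper (not part of the crate's API): splits `text` on
/// `separator`, then greedily merges the pieces into chunks of at most
/// `chunk_size` characters, carrying up to `chunk_overlap` characters of
/// trailing pieces into the next chunk.
/// A single piece longer than `chunk_size` is emitted as an oversized chunk.
fn split_with_overlap(
    text: &str,
    separator: &str,
    chunk_size: usize,
    chunk_overlap: usize,
) -> Vec<String> {
    let sep_len = separator.chars().count();
    let mut chunks = Vec::new();
    let mut window: VecDeque<&str> = VecDeque::new();
    let mut total = 0usize; // length of the window when joined by the separator

    for piece in text.split(separator).filter(|p| !p.is_empty()) {
        let piece_len = piece.chars().count();
        // Length the window would have if `piece` were appended to it.
        let would_be =
            |total: usize, n: usize| total + piece_len + if n > 0 { sep_len } else { 0 };

        if would_be(total, window.len()) > chunk_size && !window.is_empty() {
            // The window is full: emit it as a chunk.
            chunks.push(join_window(&window, separator));
            // Drop pieces from the front until what remains fits within the
            // overlap budget and leaves room for the new piece.
            while total > chunk_overlap
                || (would_be(total, window.len()) > chunk_size && total > 0)
            {
                match window.pop_front() {
                    Some(first) => {
                        let sep = if window.is_empty() { 0 } else { sep_len };
                        total -= first.chars().count() + sep;
                    }
                    None => break,
                }
            }
        }

        if !window.is_empty() {
            total += sep_len;
        }
        total += piece_len;
        window.push_back(piece);
    }

    // Emit whatever is left in the window as the final chunk.
    if !window.is_empty() {
        chunks.push(join_window(&window, separator));
    }
    chunks
}
```

With this sketch, `split_with_overlap("aa bb cc dd", " ", 5, 2)` yields `["aa bb", "bb cc", "cc dd"]`: each chunk repeats the last piece of the previous one because that piece fits within the 2-character overlap budget. With an overlap of 0 the same input yields `["aa bb", "cc dd"]`.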

Installation

Add the following to your Cargo.toml file:

[dependencies]
character_text_splitter = "0.1.3"

Usage

Import the library and use the CharacterTextSplitter struct to split your text.

use character_text_splitter::CharacterTextSplitter;

fn main() {
    let text = "your text here...";

    // Create a splitter with the default settings.
    let splitter = CharacterTextSplitter::new();
    let chunks = splitter.split_text(text);

    // Print each chunk on its own line.
    for chunk in chunks {
        println!("{}", chunk);
    }
}

You can also set the chunk size, the chunk overlap, and the separator when constructing the splitter:

    let splitter = CharacterTextSplitter::new()
        .with_chunk_size(300)
        .with_chunk_overlap(50)
        .with_separator(". ");

The default chunk_size is 200, the default chunk_overlap is 40, and the default separator is \n\n (a blank line).
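
The chained `with_*` calls and the defaults described above follow the common Rust builder pattern. The sketch below shows one way such a configuration could be modelled. `SplitterConfig`, its fields, and their types are hypothetical names chosen for illustration; only the method names and the default values (200, 40, `\n\n`) come from this README, and the crate's actual internals may differ.

```rust
/// Hypothetical configuration type illustrating the builder pattern.
/// This is not the crate's `CharacterTextSplitter`.
#[derive(Debug, Clone, PartialEq)]
struct SplitterConfig {
    chunk_size: usize,
    chunk_overlap: usize,
    separator: String,
}

impl Default for SplitterConfig {
    /// Uses the default values stated in the README.
    fn default() -> Self {
        Self {
            chunk_size: 200,
            chunk_overlap: 40,
            separator: "\n\n".to_string(),
        }
    }
}

impl SplitterConfig {
    /// Starts from the defaults; each `with_*` method overrides one field.
    fn new() -> Self {
        Self::default()
    }

    fn with_chunk_size(mut self, chunk_size: usize) -> Self {
        self.chunk_size = chunk_size;
        self
    }

    fn with_chunk_overlap(mut self, chunk_overlap: usize) -> Self {
        self.chunk_overlap = chunk_overlap;
        self
    }

    fn with_separator(mut self, separator: &str) -> Self {
        self.separator = separator.to_string();
        self
    }
}
```

Because each `with_*` method takes `self` by value and returns it, any setting that is not overridden keeps its default, so `SplitterConfig::new().with_chunk_size(300)` still has an overlap of 40 and a separator of `\n\n`.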

License

This project is licensed under the MIT License - see the LICENSE file for details.