overlap-chunk
A Rust library for splitting text into chunks of a specified size, with an adjustable overlap percentage between consecutive chunks.
Features
Current Features
- Basic functionality to split text into chunks of specified size
- Option to adjust the overlap percentage between chunks
- Command-line interface for easy text processing
Future Features
- Chunking that respects word and sentence boundaries
- Support for multilingual text
- Support for streaming input
Library Usage
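// Assumes both items are exported at the root of the overlap_chunk crate.
use overlap_chunk::ChunkOptions;
use overlap_chunk::chunk_text;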
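The signatures of `chunk_text` and `ChunkOptions` are not shown here, so the following is only a standalone sketch of the overlap technique itself, not the crate's implementation. The function name `chunk_with_overlap`, its parameters, and the choice to count characters rather than bytes are all assumptions made for illustration.

```rust
/// Hypothetical helper, not part of the overlap-chunk API.
/// Splits `text` into chunks of at most `size` characters, where each chunk
/// repeats roughly `overlap_percent` percent of the previous chunk.
fn chunk_with_overlap(text: &str, size: usize, overlap_percent: usize) -> Vec<String> {
    assert!(size > 0, "chunk size must be positive");
    assert!(overlap_percent <= 90, "overlap must be between 0 and 90");

    // Assumption: sizes are counted in characters, so multi-byte UTF-8
    // sequences are never split in the middle.
    let chars: Vec<char> = text.chars().collect();

    // Number of characters shared by neighbouring chunks, rounded down.
    let overlap = size * overlap_percent / 100;
    // Advance by the non-overlapping part; at least 1 to guarantee progress.
    let step = (size - overlap).max(1);

    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // The last chunk reached the end of the text.
        }
        start += step;
    }
    chunks
}
```

With a chunk size of 4 and a 50% overlap, each chunk in this sketch starts 2 characters after the previous one, so neighbouring chunks share 2 characters.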
Command Line Usage
The library includes a command-line interface for processing text files:
Usage: overlap-chunk [OPTIONS] [FILE]
If no file is specified, input is read from standard input.
Options:
  -h, --help               Display this help message
  -s, --size SIZE          Specify chunk size (default: 100)
  -o, --overlap PERCENT    Specify overlap percentage between 0 and 90 (default: 0)
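As a rough illustration of how these options and their defaults fit together, the sketch below parses the documented flags by hand. It is not the tool's actual argument parser: `CliOptions` and `parse_args` are hypothetical names, `--help` handling is left out, and treating any other argument as the input file is an assumption.

```rust
/// Hypothetical container for parsed options; not the crate's own type.
#[derive(Debug, PartialEq)]
struct CliOptions {
    size: usize,
    overlap: usize,
    file: Option<String>,
}

/// Parses arguments (without the program name), starting from the
/// documented defaults: size 100, overlap 0, input from standard input.
fn parse_args(args: &[&str]) -> Result<CliOptions, String> {
    let mut opts = CliOptions { size: 100, overlap: 0, file: None };
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        match arg {
            "-s" | "--size" => {
                let value = iter.next().ok_or("missing value for --size")?;
                opts.size = value
                    .parse()
                    .map_err(|_| format!("invalid size: {}", value))?;
            }
            "-o" | "--overlap" => {
                let value = iter.next().ok_or("missing value for --overlap")?;
                let percent: usize = value
                    .parse()
                    .map_err(|_| format!("invalid overlap: {}", value))?;
                // The documented range for the overlap is 0 to 90.
                if percent > 90 {
                    return Err(format!("overlap must be between 0 and 90, got {}", percent));
                }
                opts.overlap = percent;
            }
            // Assumption: any other argument names the input file.
            other => opts.file = Some(other.to_string()),
        }
    }
    Ok(opts)
}
```

When no file argument is present, `file` stays `None`, which corresponds to reading from standard input.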
Examples
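Process a file with default settings:
overlap-chunk FILE
Process a file with custom chunk size and overlap:
overlap-chunk -s SIZE -o PERCENT FILE
Process standard input by piping another command's output into overlap-chunk without a FILE argument:
COMMAND | overlap-chunk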
License
MIT License