llm-text 0.1.1


llm-text

A Rust library for processing text for LLM consumption. Provides utilities for cleaning, splitting, and extracting content from HTML and other sources.

Features

  • Text Cleaning: Clean and normalize text with configurable newline handling
  • HTML Processing: Extract clean text from HTML content
  • Link Extraction: Extract and analyze links from text
  • Text Splitting: Split large texts into manageable chunks
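
To make "manageable chunks" concrete, here is a minimal standard-library sketch of word-boundary splitting. It does not use the crate and is not its implementation: the function name `split_into_chunks` and the `max_chars` parameter are hypothetical, chosen only for illustration.

```rust
/// Hypothetical helper (not part of llm-text): split `text` into chunks of at
/// most `max_chars` characters, breaking only at whitespace.
/// Runs of whitespace are normalized to single spaces, and a single word
/// longer than `max_chars` is emitted as its own oversized chunk.
fn split_into_chunks(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0; // length of `current` in characters, not bytes

    for word in text.split_whitespace() {
        let word_len = word.chars().count();
        // Close the current chunk if this word (plus a space) would overflow it.
        if !current.is_empty() && current_len + 1 + word_len > max_chars {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if !current.is_empty() {
            current.push(' ');
            current_len += 1;
        }
        current.push_str(word);
        current_len += word_len;
    }
    // Flush whatever is left in the last chunk.
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Counting characters rather than bytes keeps the limit meaningful for non-ASCII text. A splitter aimed at LLM context windows would more likely count tokens, which this sketch does not attempt.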

Installation

Add this to your Cargo.toml:

[dependencies]
llm-text = "0.1.1"

Usage

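use llm_text::text::TextCleaner;

// Input containing a run of three Windows-style line breaks.
let dirty_text = "Hello\r\n\r\n\r\nWorld!";
// Configure the cleaner to collapse consecutive newlines into a single one.
let cleaner = TextCleaner::new().reduce_newlines_to_single_newline();
let clean = cleaner.clean(dirty_text);
// The run of line breaks is reduced to one.
assert_eq!(clean, "Hello\r\nWorld!");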
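
For readers who want to see the transformation spelled out, the following standard-library sketch reproduces the behavior in the assertion above. It is an assumption about what the reduction does, not the crate's actual code, and the function name `collapse_newlines` is hypothetical. In particular, keeping the first line break of each run as written ("\r\n" or "\n") is a choice made here to match the example output.

```rust
/// Hypothetical helper (not part of llm-text): collapse each run of
/// consecutive line breaks into a single line break. The first break of a
/// run is kept exactly as written, so "\r\n" stays "\r\n".
fn collapse_newlines(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut in_run = false; // true while inside a run of line breaks
    let mut chars = text.chars().peekable();

    while let Some(c) = chars.next() {
        let is_break = match c {
            '\r' if chars.peek() == Some(&'\n') => {
                chars.next(); // consume the '\n' of the "\r\n" pair
                if !in_run {
                    out.push_str("\r\n");
                }
                true
            }
            '\n' => {
                if !in_run {
                    out.push('\n');
                }
                true
            }
            _ => {
                out.push(c); // ordinary character, copied through unchanged
                false
            }
        };
        in_run = is_break;
    }
    out
}
```

A lone '\r' that is not followed by '\n' is treated as an ordinary character in this sketch. A real cleaner might instead normalize all line endings to "\n" first.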