List of all items
Structs
- chunking::Chunk
- chunking::ChunkerConfig
- chunking::ChunkerResult
- chunking::TextChunker
- clean_text::TextCleaner
- splitting::TextSplit
- splitting::TextSplitter
- test_text::Chunking
- test_text::ChunkingTestCase
- test_text::ChunkingTestCases
- test_text::Splitting
- test_text::Text
- test_text::chunking_small
- test_text::chunking_tiny
- test_text::graphemes_unicode
- test_text::html_short
- test_text::joining
- test_text::long
- test_text::many_subjects
- test_text::medium
- test_text::one_subject_many_topics
- test_text::one_subject_one_topic
- test_text::really_long
- test_text::sentences_rule_1
- test_text::sentences_rule_2
- test_text::sentences_rule_3
- test_text::sentences_rule_4
- test_text::sentences_unicode
- test_text::single_eol
- test_text::small
- test_text::smollest
- test_text::tiny
- test_text::two_plus_eol
- test_text::words_unicode
Enums
Traits
Functions
- chunking::chunk_text
- clean_html::clean_html
- clean_text::normalize_whitespace
- clean_text::reduce_to_single_whitespace
- clean_text::strip_unwanted_chars
- extract::extract_urls
- local_content::load_content_path
- splitting::rule_based::split_text_into_indices
- splitting::rule_based::split_text_into_sentences
Statics
- clean_text::CITATIONS_REGEX
- clean_text::END_OF_LINE_REGEX
- clean_text::END_OF_LINE_SEQUENCES
- clean_text::END_OF_PARAGRAPH_REGEX
- clean_text::END_OF_PARAGRAPH_SEQUENCES
- clean_text::SINGLE_NEWLINE_REGEX
- clean_text::SINGLE_SPACE_REGEX
- clean_text::TWO_PLUS_NEWLINE_REGEX
- clean_text::UNWANTED_CHARS_REGEX
- clean_text::WHITE_SPACE_REGEX
- clean_text::WHITE_SPACE_SEQUENCES
- splitting::SINGLE_NEWLINE_REGEX
- splitting::TWO_PLUS_NEWLINE_REGEX
- splitting::rule_based::HANDLE_FLOATS_WITHOUT_LEADING_ZERO
- splitting::rule_based::PAREN_REPAIR
- splitting::rule_based::QUOTE_REPAIR_REGEXES
- splitting::rule_based::QUOTE_TRANSFORMATIONS
- splitting::rule_based::REMOVE_ABBREVIATIONS
- splitting::rule_based::REMOVE_COMPOSITE_ABBREVIATIONS
- splitting::rule_based::REMOVE_FLOATING_POINT_NUMBERS
- splitting::rule_based::REMOVE_INITIALS
- splitting::rule_based::REMOVE_SENTENCE_ENDERS_BEFORE_PARENS
- splitting::rule_based::REMOVE_SUSPENSION_POINTS
- splitting::rule_based::UNSTICK_SENTENCES
- test_text::CHUNK_TESTS
- test_text::SPLIT_TESTS
- test_text::TEXT