Module nlp

Natural language processing utilities.

§Advanced NLP Module

This module provides advanced natural language processing capabilities:

  • Multilingual support with automatic language detection
  • Semantic chunking algorithms
  • Custom NER training pipeline
  • Rule-based syntax analysis (POS tags, dependency relations, noun phrases)

§Features

§Multilingual Support

  • Automatic language detection using n-gram analysis
  • Support for 10+ languages, including English, Spanish, French, German, Chinese, Japanese, Korean, Arabic, Russian, and Portuguese
  • Language-specific text normalization and tokenization
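
As a rough illustration of n-gram language detection, the sketch below compares character trigram counts of the input against small reference samples and returns the best-matching label. It does not use this module's `LanguageDetector`; the helpers `trigram_profile`, `overlap`, and `detect` are hypothetical names chosen for the example, and a real detector would rely on much larger reference profiles.

```rust
use std::collections::HashMap;

/// Count character trigrams in the lowercased text.
/// Hypothetical helper for illustration, not part of this crate's API.
fn trigram_profile(text: &str) -> HashMap<String, usize> {
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    let mut profile = HashMap::new();
    for window in chars.windows(3) {
        let gram: String = window.iter().collect();
        *profile.entry(gram).or_insert(0) += 1;
    }
    profile
}

/// Overlap score: for every trigram in `a`, add the smaller of its
/// counts in `a` and `b` (zero when `b` does not contain it).
fn overlap(a: &HashMap<String, usize>, b: &HashMap<String, usize>) -> usize {
    a.iter()
        .map(|(gram, &count)| count.min(*b.get(gram).unwrap_or(&0)))
        .sum()
}

/// Return the label whose reference sample shares the most trigrams
/// with `text`, or `None` when nothing overlaps at all.
fn detect<'a>(text: &str, samples: &[(&'a str, &str)]) -> Option<&'a str> {
    let input = trigram_profile(text);
    samples
        .iter()
        .map(|(label, sample)| (*label, overlap(&input, &trigram_profile(sample))))
        .filter(|(_, score)| *score > 0)
        .max_by_key(|(_, score)| *score)
        .map(|(label, _)| label)
}
```

Calling `detect("the cat is on the table", &samples)` with one English and one Spanish reference sentence selects the English label, because only the English sample shares trigrams such as `the` and `cat` with the input.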

§Semantic Chunking

  • Multiple chunking strategies (sentence, paragraph, topic, semantic, hybrid)
  • Intelligent boundary detection
  • Coherence scoring
  • Configurable chunk sizes and overlap
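
To make the idea of configurable chunk sizes and overlap concrete, here is a minimal sketch of the sentence strategy only. It splits on terminal punctuation and groups sentences into overlapping windows. The functions `split_sentences` and `chunk_sentences` are assumptions made for this example and are not the signatures of `SemanticChunker` or `ChunkingConfig`; coherence scoring and topic detection are left out.

```rust
/// Split text into sentences at '.', '!' or '?'.
/// A naive boundary rule used only for this illustration.
fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if matches!(ch, '.' | '!' | '?') {
            let trimmed = current.trim();
            if !trimmed.is_empty() {
                sentences.push(trimmed.to_string());
            }
            current.clear();
        }
    }
    // Keep any trailing text that lacks terminal punctuation.
    let trimmed = current.trim();
    if !trimmed.is_empty() {
        sentences.push(trimmed.to_string());
    }
    sentences
}

/// Group sentences into chunks of at most `max_sentences`, repeating
/// the last `overlap` sentences of each chunk at the start of the next.
fn chunk_sentences(
    sentences: &[String],
    max_sentences: usize,
    overlap: usize,
) -> Vec<Vec<String>> {
    assert!(max_sentences > 0 && overlap < max_sentences);
    let step = max_sentences - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < sentences.len() {
        let end = (start + max_sentences).min(sentences.len());
        chunks.push(sentences[start..end].to_vec());
        if end == sentences.len() {
            break; // the final sentence has been emitted
        }
        start += step;
    }
    chunks
}
```

With four sentences, `max_sentences = 2`, and `overlap = 1`, this yields three chunks, each sharing one sentence with its neighbor. Requiring `overlap < max_sentences` guarantees that the window always advances.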

§Custom NER

  • Pattern-based entity extraction
  • Dictionary/gazetteer matching
  • Rule-based extraction with priorities
  • Training dataset management
  • Active learning support
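
The following sketch shows one way dictionary/gazetteer matching with rule priorities could work: every term match becomes a candidate, and overlapping candidates are resolved in favor of the higher-priority rule. The types `Entity` and `GazetteerRule` and the function `extract` are hypothetical stand-ins for this example; they are not the definitions of `ExtractedEntity`, `ExtractionRule`, or `CustomNER` in this module.

```rust
/// A matched entity with byte offsets into the input text.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Entity {
    text: String,
    label: String,
    start: usize,
    end: usize,
    priority: u8,
}

/// A gazetteer rule: a label, a priority, and the terms it matches.
/// Hypothetical type for illustration only.
struct GazetteerRule {
    label: &'static str,
    priority: u8,
    terms: &'static [&'static str],
}

/// Collect all exact term matches, then keep non-overlapping ones,
/// preferring higher priority, then longer matches, then earlier ones.
fn extract(text: &str, rules: &[GazetteerRule]) -> Vec<Entity> {
    let mut candidates = Vec::new();
    for rule in rules {
        for term in rule.terms.iter() {
            for (start, matched) in text.match_indices(*term) {
                candidates.push(Entity {
                    text: matched.to_string(),
                    label: rule.label.to_string(),
                    start,
                    end: start + matched.len(),
                    priority: rule.priority,
                });
            }
        }
    }
    candidates.sort_by(|a, b| {
        b.priority
            .cmp(&a.priority)
            .then((b.end - b.start).cmp(&(a.end - a.start)))
            .then(a.start.cmp(&b.start))
    });
    let mut accepted: Vec<Entity> = Vec::new();
    for cand in candidates {
        // Two spans overlap when each starts before the other ends.
        let overlaps = accepted
            .iter()
            .any(|e| cand.start < e.end && e.start < cand.end);
        if !overlaps {
            accepted.push(cand);
        }
    }
    accepted.sort_by_key(|e| e.start); // return in document order
    accepted
}
```

Given an `ORG` rule with priority 2 containing "New York Times" and a `LOC` rule with priority 1 containing "New York" and "Paris", the text "The New York Times opened an office in Paris." produces one `ORG` entity and one `LOC` entity; the overlapping "New York" candidate is dropped because the higher-priority rule already claimed that span.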

Re-exports§

pub use multilingual::Language;
pub use multilingual::LanguageDetector;
pub use multilingual::DetectionResult;
pub use multilingual::MultilingualProcessor;
pub use multilingual::ProcessedText;
pub use semantic_chunking::ChunkingStrategy;
pub use semantic_chunking::ChunkingConfig;
pub use semantic_chunking::SemanticChunk;
pub use semantic_chunking::SemanticChunker;
pub use semantic_chunking::ChunkingStats;
pub use custom_ner::EntityType;
pub use custom_ner::ExtractionRule;
pub use custom_ner::RuleType;
pub use custom_ner::CustomNER;
pub use custom_ner::ExtractedEntity;
pub use custom_ner::TrainingDataset;
pub use custom_ner::AnnotatedExample;
pub use custom_ner::DatasetStatistics;
pub use syntax_analyzer::POSTag;
pub use syntax_analyzer::DependencyRelation;
pub use syntax_analyzer::Token;
pub use syntax_analyzer::Dependency;
pub use syntax_analyzer::NounPhrase;
pub use syntax_analyzer::SyntaxAnalyzer;
pub use syntax_analyzer::SyntaxAnalyzerConfig;

Modules§

  • custom_ner: Custom NER Training Pipeline
  • multilingual: Multilingual Support
  • semantic_chunking: Semantic Chunking
  • syntax_analyzer: Rule-based Syntax Analysis