Natural language processing utilities
§Advanced NLP Module
This module provides advanced natural language processing capabilities:
- Multilingual support with automatic language detection
- Semantic chunking algorithms
- Custom NER training pipeline
§Features
§Multilingual Support
- Automatic language detection using n-gram analysis
- Support for 10+ languages, including English, Spanish, French, German, Chinese, Japanese, Korean, Arabic, Russian, and Portuguese
- Language-specific text normalization and tokenization
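As a rough illustration of n-gram based language detection, the sketch below compares character trigram profiles using cosine similarity. It is not this module's implementation and does not use `LanguageDetector`; the function names `trigram_profile`, `cosine`, and `detect`, and the idea of passing reference samples as `(label, text)` pairs, are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Counts overlapping character trigrams in the lowercased text.
/// (Hypothetical helper, not part of this module's API.)
fn trigram_profile(text: &str) -> HashMap<String, f64> {
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    let mut counts: HashMap<String, f64> = HashMap::new();
    for window in chars.windows(3) {
        let gram: String = window.iter().collect();
        *counts.entry(gram).or_insert(0.0) += 1.0;
    }
    counts
}

/// Cosine similarity between two sparse trigram profiles.
fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a
        .iter()
        .filter_map(|(gram, x)| b.get(gram).map(|y| x * y))
        .sum();
    let norm = |m: &HashMap<String, f64>| m.values().map(|v| v * v).sum::<f64>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 {
        0.0
    } else {
        dot / denom
    }
}

/// Returns the label of the reference sample whose trigram profile is most
/// similar to `text`, together with the similarity score. Returns `None`
/// when no sample shares any trigram with the input.
fn detect<'a>(text: &str, samples: &[(&'a str, &str)]) -> Option<(&'a str, f64)> {
    let target = trigram_profile(text);
    samples
        .iter()
        .map(|(label, sample)| (*label, cosine(&target, &trigram_profile(sample))))
        .filter(|(_, score)| *score > 0.0)
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

Calling `detect("the dog and the fox", &samples)` with a short English and a short Spanish reference sample picks the English label, because only the English sample shares trigrams such as `the` and `and` with the input. A real detector would need far larger reference corpora per language and some handling for short or mixed-language input.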
§Semantic Chunking
- Multiple chunking strategies (sentence, paragraph, topic, semantic, hybrid)
- Intelligent boundary detection
- Coherence scoring
- Configurable chunk sizes and overlap
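To make "configurable chunk sizes and overlap" concrete, here is a minimal sketch of the sentence strategy only: split on sentence-ending punctuation, then group sentences into windows that repeat the last few sentences of the previous chunk. It does not use `SemanticChunker` or `ChunkingConfig`; `split_sentences`, `chunk_sentences`, and their parameters are hypothetical, and the boundary rule is deliberately naive (it would mis-split abbreviations and ellipses).

```rust
/// Splits text into sentences at '.', '!' or '?'.
/// (Naive boundary rule, assumed for illustration only.)
fn split_sentences(text: &str) -> Vec<&str> {
    let mut sentences = Vec::new();
    let mut start = 0;
    for (i, c) in text.char_indices() {
        if matches!(c, '.' | '!' | '?') {
            let end = i + c.len_utf8();
            let sentence = text[start..end].trim();
            if !sentence.is_empty() {
                sentences.push(sentence);
            }
            start = end;
        }
    }
    // Keep any trailing text that lacks closing punctuation.
    let tail = text[start..].trim();
    if !tail.is_empty() {
        sentences.push(tail);
    }
    sentences
}

/// Groups sentences into chunks of at most `max_sentences`, where each chunk
/// after the first starts with the last `overlap` sentences of the previous one.
fn chunk_sentences(sentences: &[&str], max_sentences: usize, overlap: usize) -> Vec<String> {
    assert!(max_sentences > 0 && overlap < max_sentences);
    let step = max_sentences - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < sentences.len() {
        let end = (start + max_sentences).min(sentences.len());
        chunks.push(sentences[start..end].join(" "));
        if end == sentences.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

With five sentences, `max_sentences = 3`, and `overlap = 1`, this yields two chunks that share the third sentence. The overlap keeps some context on both sides of a boundary, at the cost of storing that sentence twice.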
§Custom NER
- Pattern-based entity extraction
- Dictionary/gazetteer matching
- Rule-based extraction with priorities
- Training dataset management
- Active learning support
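The interaction between gazetteer matching and rule priorities can be sketched as follows: collect every dictionary hit, then resolve overlapping spans in favor of the higher-priority rule. This is an assumption about how such a scheme might work, not the behavior of `CustomNER` or `ExtractionRule`; the types `GazetteerEntry` and `Span` and the function `extract` are hypothetical, and matching here is exact and case-sensitive.

```rust
/// A dictionary phrase with the label it assigns and a priority.
/// (Hypothetical type, not part of this module's API.)
struct GazetteerEntry {
    phrase: &'static str,
    label: &'static str,
    priority: u32,
}

/// A labeled byte range in the input text.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    start: usize,
    end: usize,
    label: String,
    priority: u32,
}

/// Finds all gazetteer hits, then keeps a non-overlapping subset, preferring
/// higher priority, then longer matches, then earlier positions.
fn extract(text: &str, entries: &[GazetteerEntry]) -> Vec<Span> {
    let mut candidates = Vec::new();
    for entry in entries {
        if entry.phrase.is_empty() {
            continue; // An empty phrase would match at every position.
        }
        for (start, matched) in text.match_indices(entry.phrase) {
            candidates.push(Span {
                start,
                end: start + matched.len(),
                label: entry.label.to_string(),
                priority: entry.priority,
            });
        }
    }
    candidates.sort_by(|a, b| {
        b.priority
            .cmp(&a.priority)
            .then((b.end - b.start).cmp(&(a.end - a.start)))
            .then(a.start.cmp(&b.start))
    });
    // Greedily accept candidates that do not overlap anything already kept.
    let mut kept: Vec<Span> = Vec::new();
    for c in candidates {
        if kept.iter().all(|k| c.end <= k.start || c.start >= k.end) {
            kept.push(c);
        }
    }
    kept.sort_by_key(|s| s.start);
    kept
}
```

For the text `New York is in New York State`, with `New York State` at priority 2, `New York` at priority 1, and `York` at priority 0, the result is a `New York` span at bytes 0..8 and a `New York State` span at bytes 15..29. The lower-priority hits inside those ranges are dropped.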
Re-exports§
pub use multilingual::Language;
pub use multilingual::LanguageDetector;
pub use multilingual::DetectionResult;
pub use multilingual::MultilingualProcessor;
pub use multilingual::ProcessedText;
pub use semantic_chunking::ChunkingStrategy;
pub use semantic_chunking::ChunkingConfig;
pub use semantic_chunking::SemanticChunk;
pub use semantic_chunking::SemanticChunker;
pub use semantic_chunking::ChunkingStats;
pub use custom_ner::EntityType;
pub use custom_ner::ExtractionRule;
pub use custom_ner::RuleType;
pub use custom_ner::CustomNER;
pub use custom_ner::ExtractedEntity;
pub use custom_ner::TrainingDataset;
pub use custom_ner::AnnotatedExample;
pub use custom_ner::DatasetStatistics;
pub use syntax_analyzer::POSTag;
pub use syntax_analyzer::DependencyRelation;
pub use syntax_analyzer::Token;
pub use syntax_analyzer::Dependency;
pub use syntax_analyzer::NounPhrase;
pub use syntax_analyzer::SyntaxAnalyzer;
pub use syntax_analyzer::SyntaxAnalyzerConfig;
Modules§
- custom_ner - Custom NER Training Pipeline
- multilingual - Multilingual Support
- semantic_chunking - Semantic Chunking
- syntax_analyzer - Rule-based Syntax Analysis