🦀 vn-nlp
Vietnamese NLP library in pure Rust — tokenization, normalization, segmentation.
Zero-copy, `no_std` compatible (with `alloc`), zero-cost abstractions.
Quick Start
Add to your Cargo.toml:
[dependencies]
vn-nlp = "0.1"
Tokenize
use tokenize;
let tokens = tokenize.unwrap;
assert_eq!;
assert_eq!;
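
To illustrate what zero-copy tokenization can look like, here is a minimal standalone sketch. It is not the crate's implementation: `tokenize_sketch` is a hypothetical name, and the rule it applies (runs of alphanumeric characters become tokens, each punctuation mark becomes its own token) is an assumption. It also assumes NFC-normalized input, since decomposed combining marks would split a syllable.

```rust
/// Hypothetical sketch: splits `text` into syllable and punctuation tokens.
/// Every token is a slice borrowed from `text`, so no string data is copied.
fn tokenize_sketch(text: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    // Byte offset where the current alphanumeric run began, if any.
    let mut start: Option<usize> = None;

    for (i, c) in text.char_indices() {
        if c.is_alphanumeric() {
            if start.is_none() {
                start = Some(i);
            }
        } else {
            // A non-alphanumeric character ends the current run.
            if let Some(s) = start.take() {
                tokens.push(&text[s..i]);
            }
            // Punctuation is kept as a one-character token; whitespace is dropped.
            if !c.is_whitespace() {
                tokens.push(&text[i..i + c.len_utf8()]);
            }
        }
    }
    // Flush a run that reaches the end of the input.
    if let Some(s) = start {
        tokens.push(&text[s..]);
    }
    tokens
}

fn main() {
    // "Xin chào, Việt Nam!" written with escapes to guarantee NFC input.
    let tokens = tokenize_sketch("Xin ch\u{00E0}o, Vi\u{1EC7}t Nam!");
    println!("{:?}", tokens);
}
```

Returning `Vec<&str>` ties the tokens' lifetime to the input string, which is the usual trade-off for avoiding allocations per token.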
Normalize
use normalize;
let clean = strip_diacritics;
assert_eq!;
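
As a rough picture of what stripping diacritics involves, the sketch below maps precomposed Vietnamese letters to their ASCII base letters using only the standard library. It is an illustration under assumptions, not the crate's code: `strip_diacritics_sketch` and `strip_char` are hypothetical names, and the sketch only handles NFC (precomposed) input, leaving decomposed combining marks untouched.

```rust
/// Hypothetical helper: maps one precomposed Vietnamese letter to its
/// ASCII base letter and returns every other character unchanged.
fn strip_char(c: char) -> char {
    // Each Vietnamese letter lowercases to exactly one char.
    let lower = c.to_lowercase().next().unwrap_or(c);
    let base = match lower {
        // à á â ã, ă, and the a-forms in Latin Extended Additional
        '\u{00E0}'..='\u{00E3}' | '\u{0103}' | '\u{1EA0}'..='\u{1EB7}' => 'a',
        // è é ê and the e-forms
        '\u{00E8}'..='\u{00EA}' | '\u{1EB8}'..='\u{1EC7}' => 'e',
        // ì í, ĩ and the i-forms
        '\u{00EC}' | '\u{00ED}' | '\u{0129}' | '\u{1EC8}'..='\u{1ECB}' => 'i',
        // ò ó ô õ, ơ and the o-forms
        '\u{00F2}'..='\u{00F5}' | '\u{01A1}' | '\u{1ECC}'..='\u{1EE3}' => 'o',
        // ù ú, ũ, ư and the u-forms
        '\u{00F9}' | '\u{00FA}' | '\u{0169}' | '\u{01B0}' | '\u{1EE4}'..='\u{1EF1}' => 'u',
        // ý and the y-forms
        '\u{00FD}' | '\u{1EF2}'..='\u{1EF9}' => 'y',
        // đ
        '\u{0111}' => 'd',
        _ => return c,
    };
    // Restore the original case on the ASCII base letter.
    if c.is_uppercase() {
        base.to_ascii_uppercase()
    } else {
        base
    }
}

/// Hypothetical sketch of diacritic stripping over a whole string.
fn strip_diacritics_sketch(text: &str) -> String {
    text.chars().map(strip_char).collect()
}

fn main() {
    // "Tiếng Việt" written with escapes to guarantee NFC input.
    let clean = strip_diacritics_sketch("Ti\u{1EBF}ng Vi\u{1EC7}t");
    println!("{}", clean); // prints "Tieng Viet"
}
```

Unlike the tokenizer, this operation has to allocate a new `String`, because the output bytes differ from the input.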
Sentence Segmentation
use segment;
let sentences = segment.unwrap;
assert_eq!;
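
For intuition, sentence segmentation can be approximated with a simple punctuation rule. The sketch below is a naive stand-in, not the crate's algorithm: `segment_sketch` is a hypothetical name, and the rule (split after `.`, `!` or `?` when followed by whitespace or end of input) is an assumption that ignores abbreviations and quotes.

```rust
/// Hypothetical sketch: splits `text` into sentences after `.`, `!` or `?`
/// when the mark is followed by whitespace or the end of the input.
/// The returned slices borrow from `text`.
fn segment_sketch(text: &str) -> Vec<&str> {
    let mut sentences = Vec::new();
    let mut start = 0; // byte offset where the current sentence begins
    let mut chars = text.char_indices().peekable();

    while let Some((i, c)) = chars.next() {
        if matches!(c, '.' | '!' | '?') {
            // Requiring whitespace after the mark keeps "3.5" in one piece.
            let at_boundary = match chars.peek() {
                None => true,
                Some(&(_, next)) => next.is_whitespace(),
            };
            if at_boundary {
                let end = i + c.len_utf8();
                let sentence = text[start..end].trim();
                if !sentence.is_empty() {
                    sentences.push(sentence);
                }
                start = end;
            }
        }
    }
    // Keep trailing text that has no closing punctuation.
    let rest = text[start..].trim();
    if !rest.is_empty() {
        sentences.push(rest);
    }
    sentences
}

fn main() {
    for sentence in segment_sketch("Xin chào. Bạn khỏe không? Giá 3.5 triệu!") {
        println!("{}", sentence);
    }
}
```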
Feature Flags
| Feature | Default | Description |
|---|---|---|
| `tokenize` | ✅ | Word tokenization |
| `normalize` | ✅ | Unicode normalization & diacritics |
| `segment` | ✅ | Sentence segmentation |
| `dictionary` | ❌ | Dictionary-based word segmentation |
# Tokenizer only
vn-nlp = { version = "0.1", default-features = false, features = ["tokenize"] }
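
Dictionary-based word segmentation is commonly done with greedy longest matching over syllables. The sketch below shows that general technique under assumptions; it does not describe how the `dictionary` feature is implemented. The name `segment_words_sketch`, the `max_len` parameter, and the space-joined lowercase dictionary entries are all hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical sketch of forward maximum matching: at each position, take
/// the longest run of syllables (up to `max_len`) found in `dict`, and fall
/// back to a single syllable. Multi-syllable words are joined with '_'.
fn segment_words_sketch(
    syllables: &[&str],
    dict: &HashSet<&str>,
    max_len: usize,
) -> Vec<String> {
    let mut words = Vec::new();
    let mut i = 0;

    while i < syllables.len() {
        let longest = max_len.min(syllables.len() - i);
        let mut matched = 1; // single-syllable fallback
        // Try the longest candidate first, then shrink.
        for len in (2..=longest).rev() {
            let candidate = syllables[i..i + len].join(" ").to_lowercase();
            if dict.contains(candidate.as_str()) {
                matched = len;
                break;
            }
        }
        words.push(syllables[i..i + matched].join("_"));
        i += matched;
    }
    words
}

fn main() {
    // Assumed dictionary format: lowercase entries, syllables joined by a space.
    let dict: HashSet<&str> = ["học sinh", "việt nam"].into_iter().collect();
    let syllables = ["Học", "sinh", "Việt", "Nam", "chăm", "chỉ"];
    println!("{:?}", segment_words_sketch(&syllables, &dict, 3));
}
```

Greedy matching is fast but resolves ambiguous sequences by position alone, which is one reason such a feature might be opt-in.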
Documentation
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
at your option.