🦀 vn-nlp

Vietnamese NLP library in pure Rust — tokenization, normalization, segmentation.
Zero-copy, no_std compatible (with alloc), zero-cost abstractions.

Quick Start

Add this to your `Cargo.toml`:

```toml
[dependencies]
vn-nlp = "0.1"
```

Tokenize

```rust
use vn_nlp::tokenize;

// Split the input into word and punctuation tokens.
let tokens = tokenize("Xin chào Việt Nam!").unwrap();
assert_eq!(tokens[0].text, "Xin");
assert_eq!(tokens[1].text, "chào");
```
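
The crate's tokenizer internals are not documented here. As a rough illustration of what zero-copy tokenization means, the sketch below uses only the standard library: each hypothetical `Token` borrows a `&str` slice of the input instead of allocating a new `String`. It assumes a simple rule (runs of letters, digits and combining marks form a word; every other non-space character is its own token), which is not necessarily how `vn_nlp::tokenize` behaves.

```rust
/// Hypothetical token type (not vn-nlp's): borrows from the input, so no copying.
#[derive(Debug, PartialEq)]
pub struct Token<'a> {
    pub text: &'a str,
    pub start: usize, // byte offset of the token in the input
}

/// Sketch of a zero-copy tokenizer; the rules are assumptions for illustration.
pub fn simple_tokenize(input: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut word_start: Option<usize> = None;

    for (i, c) in input.char_indices() {
        // Combining marks (U+0300..=U+036F) stay inside words so NFD text is not split.
        let is_word_char = c.is_alphanumeric() || ('\u{0300}'..='\u{036F}').contains(&c);
        if is_word_char {
            if word_start.is_none() {
                word_start = Some(i);
            }
            continue;
        }
        // A non-word character ends the current word, if any.
        if let Some(s) = word_start.take() {
            tokens.push(Token { text: &input[s..i], start: s });
        }
        // Punctuation becomes its own single-character token; whitespace is dropped.
        if !c.is_whitespace() {
            let end = i + c.len_utf8();
            tokens.push(Token { text: &input[i..end], start: i });
        }
    }
    // Flush a word that runs to the end of the input.
    if let Some(s) = word_start {
        tokens.push(Token { text: &input[s..], start: s });
    }
    tokens
}
```

For `"Xin chào Việt Nam!"` this yields `Xin`, `chào`, `Việt`, `Nam` and `!`, each pointing into the original string.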

Normalize

```rust
use vn_nlp::normalize;

// Remove tone marks and vowel diacritics, leaving plain ASCII letters.
let clean = normalize::strip_diacritics("Tiếng Việt");
assert_eq!(clean, "Tieng Viet");
```
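
A minimal sketch of diacritic stripping, assuming nothing about how `vn-nlp` implements it. The hypothetical `strip_vietnamese_diacritics` maps precomposed Vietnamese letters (including `đ`/`Đ`) to their ASCII base through a lookup table, and drops combining marks in U+0300..=U+036F so that decomposed (NFD) input gives the same result. It covers only Vietnamese letters; general Unicode normalization would need a dedicated crate such as `unicode-normalization`.

```rust
/// Hypothetical lookup table: an ASCII base letter and every lowercase
/// precomposed Vietnamese letter that reduces to it.
const VARIANTS: [(char, &str); 7] = [
    ('a', "àáảãạăằắẳẵặâầấẩẫậ"),
    ('e', "èéẻẽẹêềếểễệ"),
    ('i', "ìíỉĩị"),
    ('o', "òóỏõọôồốổỗộơờớởỡợ"),
    ('u', "ùúủũụưừứửữự"),
    ('y', "ỳýỷỹỵ"),
    ('d', "đ"),
];

/// Map one precomposed Vietnamese letter to its ASCII base, preserving case.
fn base_letter(c: char) -> Option<char> {
    let lower = c.to_lowercase().next()?;
    let base = VARIANTS
        .iter()
        .find(|(_, variants)| variants.contains(lower))
        .map(|&(base, _)| base)?;
    Some(if c.is_uppercase() { base.to_ascii_uppercase() } else { base })
}

/// Strip Vietnamese diacritics from NFC, NFD or mixed input (sketch).
pub fn strip_vietnamese_diacritics(input: &str) -> String {
    input
        .chars()
        // Drop combining marks left over from decomposed text.
        .filter(|c| !('\u{0300}'..='\u{036F}').contains(c))
        // Replace precomposed letters; leave everything else untouched.
        .map(|c| base_letter(c).unwrap_or(c))
        .collect()
}
```

The table approach keeps the function dependency-free, which fits a `no_std` + `alloc` setting, at the cost of handling only the Vietnamese alphabet.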

Sentence Segmentation

```rust
use vn_nlp::segment;

// Split a paragraph into sentences.
let sentences = segment("Hôm nay trời đẹp. Tôi đi chơi.").unwrap();
assert_eq!(sentences.len(), 2);
```
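
The segmentation rules behind `vn_nlp::segment` are not described here. One simple rule-based approach, shown purely as an assumption, ends a sentence after `.`, `!`, `?` or `…` when the next character is whitespace or the end of the text, and returns trimmed slices of the input. The hypothetical `split_sentences` keeps decimals such as `3.5` intact, but it would wrongly split after an abbreviation like `TP.` followed by a space; a production segmenter needs an abbreviation list or a trained model.

```rust
/// Rule-based sentence splitter (a sketch, not vn-nlp's algorithm).
/// Returns slices borrowed from `text`, so no sentence is copied.
pub fn split_sentences(text: &str) -> Vec<&str> {
    let mut sentences = Vec::new();
    let mut start = 0;
    let mut chars = text.char_indices().peekable();

    while let Some((i, c)) = chars.next() {
        if !matches!(c, '.' | '!' | '?' | '…') {
            continue;
        }
        // Only treat the mark as a boundary if whitespace or end-of-text follows.
        let at_boundary = match chars.peek() {
            None => true,
            Some(&(_, next)) => next.is_whitespace(),
        };
        if at_boundary {
            let end = i + c.len_utf8();
            let sentence = text[start..end].trim();
            if !sentence.is_empty() {
                sentences.push(sentence);
            }
            start = end;
        }
    }
    // Keep any trailing text that lacks final punctuation.
    let rest = text[start..].trim();
    if !rest.is_empty() {
        sentences.push(rest);
    }
    sentences
}
```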

Feature Flags

| Feature | Default | Description |
|---|---|---|
| `tokenize` | ✅ | Word tokenization |
| `normalize` | ✅ | Unicode normalization & diacritics |
| `segment` | ✅ | Sentence segmentation |
| `dictionary` | ❌ | Dictionary-based word segmentation |

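To use only the tokenizer, disable the default features and enable `tokenize` alone:

```toml
[dependencies]
# Only use the tokenizer
vn-nlp = { version = "0.1", default-features = false, features = ["tokenize"] }
```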
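
The `dictionary` feature is described only as dictionary-based word segmentation, and its algorithm is not documented here. Vietnamese separates syllables, not words, with spaces, so a word like "Việt Nam" spans two space-separated units and cannot be recovered by whitespace splitting alone. A common dictionary technique is greedy forward maximum matching; the sketch below shows that technique under stated assumptions (the hypothetical `max_match` function, a lowercase dictionary whose entries join syllables with one space, and input already stripped of punctuation). It is not the crate's implementation.

```rust
use std::collections::HashSet;

/// Greedy forward maximum matching over syllables (a sketch, not vn-nlp's API).
/// `dict` holds lowercase multi-syllable words, syllables separated by one space.
pub fn max_match(text: &str, dict: &HashSet<&str>, max_syllables: usize) -> Vec<String> {
    let syllables: Vec<&str> = text.split_whitespace().collect();
    let mut words = Vec::new();
    let mut i = 0;

    while i < syllables.len() {
        // Try the longest candidate first, shrinking down to two syllables.
        let longest = max_syllables.min(syllables.len() - i);
        let mut matched = 1;
        for n in (2..=longest).rev() {
            let candidate = syllables[i..i + n].join(" ");
            if dict.contains(candidate.to_lowercase().as_str()) {
                matched = n;
                break;
            }
        }
        // With no dictionary hit, the single syllable is emitted as its own word.
        words.push(syllables[i..i + matched].join(" "));
        i += matched;
    }
    words
}
```

Greedy matching is fast and needs only a word list, but it commits to the longest match at each position, so ambiguous syllable sequences can be segmented incorrectly.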

Documentation

License

Licensed under either of:

Apache License, Version 2.0 (LICENSE-APACHE)
MIT License (LICENSE-MIT)

at your option.

vn-nlp 0.1.3