budoux-phf-rs
A Rust implementation of BudouX, the machine learning-based line break organizer tool.
Features
- Zero runtime dictionary loading: Uses PHF (Perfect Hash Functions) to embed dictionaries as compile-time lookup tables
- Fast and efficient: PHF provides O(1) lookup with minimal memory overhead
- No external dependencies at runtime: All data is baked into the binary
- Multiple language support: Japanese (ja), Simplified Chinese (zh-hans), Traditional Chinese (zh-hant), Thai (th)
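To make the idea behind the first two bullets concrete, here is a minimal, std-only sketch of a hand-built perfect hash table. It is not how this crate or the `phf` crate generates its tables. The names `TABLE`, `slot`, and `lookup` and the toy hash (first byte plus last byte, modulo the table size) are assumptions chosen for illustration, because that hash happens to be collision-free for these four keys.

```rust
// Illustrative only: a tiny perfect hash table written by hand.
// Each key hashes to a unique slot, so a lookup is one index plus one comparison.
const TABLE: [(&str, &str); 4] = [
    ("th", "Thai"),                     // ('t' + 'h') % 4 == 0
    ("zh-hans", "Simplified Chinese"),  // ('z' + 's') % 4 == 1
    ("zh-hant", "Traditional Chinese"), // ('z' + 't') % 4 == 2
    ("ja", "Japanese"),                 // ('j' + 'a') % 4 == 3
];

/// Toy hash: sum of the first and last byte, reduced to a table slot.
/// Returns `None` for the empty string, which has no first or last byte.
fn slot(key: &str) -> Option<usize> {
    let bytes = key.as_bytes();
    let first = *bytes.first()? as usize;
    let last = *bytes.last()? as usize;
    Some((first + last) % TABLE.len())
}

/// Looks up `key` without scanning: compute the slot, then confirm the stored key matches.
fn lookup(key: &str) -> Option<&'static str> {
    let (stored_key, value) = TABLE[slot(key)?];
    if stored_key == key {
        Some(value)
    } else {
        None // an unknown key landed on an occupied slot
    }
}

fn main() {
    println!("{:?}", lookup("ja"));
    println!("{:?}", lookup("ko"));
}
```

The table lives in the binary as a constant, so nothing is parsed or allocated at startup. The final key comparison is what keeps unknown keys from producing false hits.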
Installation
Add this to your Cargo.toml:
[dependencies]
budoux-phf-rs = "0.1"
Usage
Basic Usage
use Parser;
Other Languages
use Parser;
Custom Model
use ;
// You can use `codegen` to convert a JSON model file into a model.
const MY_MODEL: Model = Model ;
static UW1: ScoureMap = Map ;
static UW2: ScoureMap = Map ;
...
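For intuition about what the score maps in a model are for, the sketch below shows a heavily simplified version of BudouX-style scoring. It assumes that a break is placed wherever the summed scores around a candidate position are positive. Everything in it (`ToyModel`, `toy_model`, `split`, the `before` and `after` maps, the weights) is hypothetical and uses `std::collections::HashMap` rather than this crate's types. A real model uses many more feature maps.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a model: two unigram score maps and a base score.
struct ToyModel {
    /// Score contributed by the character just before a candidate boundary.
    before: HashMap<char, i32>,
    /// Score contributed by the character just after a candidate boundary.
    after: HashMap<char, i32>,
    /// Negative bias, so a boundary needs positive evidence to be chosen.
    base_score: i32,
}

/// Builds a toy model that favors breaking after a comma.
fn toy_model() -> ToyModel {
    ToyModel {
        before: HashMap::from([(',', 10)]),
        after: HashMap::new(),
        base_score: -5,
    }
}

/// Splits `text` into chunks, breaking wherever the summed score is positive.
fn split(model: &ToyModel, text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    let mut current = String::new();
    for (i, &c) in chars.iter().enumerate() {
        if i > 0 {
            let score = model.base_score
                + model.before.get(&chars[i - 1]).copied().unwrap_or(0)
                + model.after.get(&c).copied().unwrap_or(0);
            if score > 0 {
                // Close the current chunk and start a new one at this boundary.
                chunks.push(std::mem::take(&mut current));
            }
        }
        current.push(c);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let model = toy_model();
    println!("{:?}", split(&model, "one,two,three"));
}
```

Because every lookup is a plain map query on a short key, swapping the `HashMap` for a compile-time table does not change the scoring logic.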
Feature Flags
By default, all language models are included. You can select specific languages to reduce binary size:
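[dependencies]
# Include only Japanese
budoux-phf-rs = { version = "0.1", default-features = false, features = ["ja"] }
# Include Japanese and Simplified Chinese
budoux-phf-rs = { version = "0.1", default-features = false, features = ["ja", "zh_hans"] }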
Available features:
| Feature | Language | Description |
|---|---|---|
| `ja` | Japanese | Japanese model |
| `zh_hans` | Simplified Chinese | Simplified Chinese model |
| `zh_hant` | Traditional Chinese | Traditional Chinese model |
| `th` | Thai | Thai model |
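As a rough sketch of how Cargo features like these usually gate code, the example below reports which language features a build was compiled with. The `enabled_languages` function is hypothetical and not part of this crate's API, and the crate's own gating may be structured differently (for example with `#[cfg(feature = "...")]` on whole modules).

```rust
/// Hypothetical helper: lists the language features enabled at compile time.
/// `cfg!(feature = "...")` expands to a constant `true` or `false`.
// Plain `rustc` does not know these feature names, so the cfg lint is silenced here.
#[allow(unexpected_cfgs)]
fn enabled_languages() -> Vec<&'static str> {
    let mut langs = Vec::new();
    if cfg!(feature = "ja") {
        langs.push("ja");
    }
    if cfg!(feature = "zh_hans") {
        langs.push("zh_hans");
    }
    if cfg!(feature = "zh_hant") {
        langs.push("zh_hant");
    }
    if cfg!(feature = "th") {
        langs.push("th");
    }
    langs
}

fn main() {
    println!("enabled languages: {:?}", enabled_languages());
}
```

Data referenced only from a disabled branch can be left out of the compiled binary, which is how selecting fewer languages reduces binary size.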
Build model
$ cargo run -p codegen <path/to/budoux/budoux/models> lib/src/
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.