budouy 0.2.1

Rust port of BudouX with optional HTML processing and CLI

budouy


Rust port of BudouX with optional HTML processing, WebAssembly support, and a small CLI.

Try the live demo

Features

  • std (default): builds against the Rust standard library.
  • alloc: no_std-compatible build using alloc and hashbrown.
  • vendored-models: bundles default Japanese, Simplified Chinese, Traditional Chinese, and Thai models.
  • html: enables HTML processing utilities based on kuchikikiki (requires std).
  • cli: enables the budouy CLI (requires std, implies vendored-models).
  • wasm: enables WebAssembly bindings via wasm-bindgen (implies alloc and vendored-models).

Note: std and alloc are mutually exclusive.

Usage

Library

Custom model:

use std::collections::HashMap;
use budouy::{Model, Parser};
use budouy::model::FeatureKey;

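// A model maps each feature key to a table of feature value -> score.
let mut model: Model = HashMap::new();
// A large UW4 score for "a" favors a break right before each "a".
model.insert(FeatureKey::UW4, HashMap::from([("a".to_string(), 10_000)]));

let parser = Parser::new(model);
let chunks = parser.parse("abcdeabcd");
// No break is possible before the first character, so only the second "a" splits.
assert_eq!(chunks, vec!["abcde", "abcd"]);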
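
To see why this one-entry model splits the input where it does, here is a minimal std-only sketch of the scoring rule used by upstream BudouX, restricted to the UW4 feature. It assumes budouy follows the same rule: every candidate boundary starts at minus half the sum of all model scores, and a break is emitted when the total is positive. The helper `split_with_uw4` is hypothetical and is not part of budouy's API.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of budouy): scores each candidate boundary
/// using only UW4, the character immediately after the boundary.
fn split_with_uw4(weights: &HashMap<char, i32>, text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    if chars.is_empty() {
        return Vec::new();
    }
    // Assumption taken from upstream BudouX: the base score is minus half
    // the sum of every score in the model.
    let base_score = -0.5 * weights.values().map(|&w| f64::from(w)).sum::<f64>();

    let mut chunks = vec![chars[0].to_string()];
    for &c in &chars[1..] {
        let score = base_score + f64::from(weights.get(&c).copied().unwrap_or(0));
        if score > 0.0 {
            // Positive score: start a new chunk at this character.
            chunks.push(c.to_string());
        } else {
            // Otherwise extend the current chunk.
            chunks.last_mut().expect("chunks is never empty").push(c);
        }
    }
    chunks
}

fn main() {
    let weights = HashMap::from([('a', 10_000)]);
    let chunks = split_with_uw4(&weights, "abcdeabcd");
    println!("{:?}", chunks);
}
```

With a single score of 10,000 the base score is -5,000, so only positions followed by "a" reach a positive total. The real parser combines many more features (unigrams, bigrams, and trigrams around the boundary) in the same additive way.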

Default model (requires vendored-models):

use budouy::model::load_default_japanese_parser;

let parser = load_default_japanese_parser();
let chunks = parser.parse("今日は良い天気です");
println!("{:?}", chunks);

HTML processing (requires html + vendored-models):

use budouy::HTMLProcessingParser;
use budouy::model::load_default_japanese_parser;

let parser = load_default_japanese_parser();
let html_parser = HTMLProcessingParser::new(parser, None);
let input = "今日は<strong>良い</strong>天気です";
let output = html_parser.translate_html_string(input);
println!("{}", output);
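
HTML input is harder than plain text because a chunk boundary can fall inside or right next to a tag. The std-only sketch below illustrates the general idea: boundaries are computed on the visible text, then mapped back onto the markup. It is an illustration only, not budouy's implementation, which works on a parsed DOM via kuchikikiki. The helpers `boundaries_from_chunks` and `insert_separators` are hypothetical, the tag scanner is deliberately naive (no comments, entities, or `>` inside attributes), and the exact output of `translate_html_string` may differ. The chunk list is hard-coded to the segmentation shown in the JavaScript example in this README.

```rust
/// Hypothetical helper: converts chunks into character offsets where a
/// break is allowed (one offset between each pair of adjacent chunks).
fn boundaries_from_chunks(chunks: &[&str]) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut pos = 0;
    for chunk in chunks.iter().take(chunks.len().saturating_sub(1)) {
        pos += chunk.chars().count();
        offsets.push(pos);
    }
    offsets
}

/// Hypothetical helper: inserts `sep` before the visible character at each
/// boundary offset, copying everything between '<' and '>' unchanged.
fn insert_separators(html: &str, boundaries: &[usize], sep: &str) -> String {
    let mut out = String::new();
    let mut text_pos = 0; // visible characters emitted so far
    let mut in_tag = false;
    for c in html.chars() {
        if in_tag {
            out.push(c);
            if c == '>' {
                in_tag = false;
            }
            continue;
        }
        if c == '<' {
            in_tag = true;
            out.push(c);
            continue;
        }
        if text_pos > 0 && boundaries.contains(&text_pos) {
            out.push_str(sep);
        }
        out.push(c);
        text_pos += 1;
    }
    out
}

fn main() {
    // Segmentation of "今日は良い天気です" as shown in this README.
    let chunks = ["今日は", "良い", "天気です"];
    let boundaries = boundaries_from_chunks(&chunks);
    let html = "今日は<strong>良い</strong>天気です";
    // "<wbr>" is used here so the break points are visible when printed.
    println!("{}", insert_separators(html, &boundaries, "<wbr>"));
}
```

Because the separator is emitted just before the next visible character, a boundary that coincides with an opening tag ends up inside that tag rather than in front of it. A DOM-based implementation can make that placement decision explicitly.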

WebAssembly

Build for web (requires wasm-pack):

wasm-pack build --target web --no-default-features --features wasm

Use from JavaScript:

import init, { BudouY } from './pkg/budouy.js';

await init();

const parser = BudouY.japanese();
const chunks = parser.parse("今日は良い天気です");
console.log(chunks); // ["今日は", "良い", "天気です"]

// Other languages
const zhHans = BudouY.simplifiedChinese();
const zhHant = BudouY.traditionalChinese();
const thai = BudouY.thai();

CLI

Build and run the CLI (requires the cli feature):

cargo run --features cli -- parse --lang ja "今日は良い天気です"

Use a custom model JSON:

cargo run --features cli -- parse --model ./model.json "今日は良い天気です"

Read from stdin:

echo "今日は良い天気です" | cargo run --features cli -- parse --lang ja

no_std

This crate supports no_std with alloc. Disable default features and enable alloc:

budouy = { version = "0.2", default-features = false, features = ["alloc"] }

std and alloc are mutually exclusive. The html and cli features require std.

Models

Vendored models in src/models/*.json are derived from the original BudouX project (Google) and are licensed under Apache-2.0. See LICENSE for details. This project is not affiliated with Google.

License

Apache-2.0. See LICENSE.