ferrous-opencc 0.1.2

A pure Rust implementation of Open Chinese Convert (OpenCC), for fast and reliable conversion between Traditional and Simplified Chinese.

Ferrous-opencc

A pure Rust implementation of the OpenCC project, dedicated to providing high-performance and reliable conversion between Traditional and Simplified Chinese.

Chinese README (中文)

Features

  • High-Performance: Uses finite-state transducers (FSTs) for dictionary lookups, which is significantly faster than HashMap-based implementations.
  • Pure Rust: No C++ dependencies. Implemented entirely in Rust.
  • Extensible: Supports loading custom OpenCC configuration files and dictionaries.
  • Comprehensive Tooling: Includes a command-line tool to compile text dictionaries into an efficient .ocb binary format.
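
To make "dictionary lookup" concrete, here is a minimal sketch of the kind of lookup being accelerated. It is not this crate's code: it uses a plain HashMap (the baseline mentioned above, not an FST) and assumes a greedy longest-match strategy. The function name convert_longest_match and the two-entry dictionary are hypothetical, chosen only for illustration.

```rust
use std::collections::HashMap;

/// Convert `text` by replacing, at each position, the longest dictionary key
/// that matches there. Characters with no matching key pass through unchanged.
/// NOTE: illustrative baseline only; assumed strategy, not the crate's internals.
fn convert_longest_match(dict: &HashMap<&str, &str>, text: &str) -> String {
    // The longest key (in characters) bounds how far each lookup has to probe.
    let max_len = dict.keys().map(|k| k.chars().count()).max().unwrap_or(0);
    let chars: Vec<char> = text.chars().collect();
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < chars.len() {
        let mut matched = false;
        // Try the longest candidate first, then shrink one character at a time.
        let upper = max_len.min(chars.len() - i);
        for len in (1..=upper).rev() {
            let candidate: String = chars[i..i + len].iter().collect();
            if let Some(replacement) = dict.get(candidate.as_str()) {
                out.push_str(replacement);
                i += len;
                matched = true;
                break;
            }
        }
        if !matched {
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    // Hypothetical dictionary entries, for illustration only.
    let dict = HashMap::from([("转换", "轉換"), ("开", "開")]);
    println!("{}", convert_longest_match(&dict, "开放中文转换"));
}
```

Each position may trigger several hash lookups and string allocations in this baseline. A prefix-oriented structure such as an FST can find the longest match in a single pass over the input, which is the motivation for the design choice named above.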

Quick Start

Add ferrous-opencc to your Cargo.toml:

[dependencies]
ferrous-opencc = "0.1.2"

Directory Structure

This library loads dictionaries and configuration files from the local filesystem. You can use the complete set of dictionary files I've prepared, or compile your own. Place the dictionary files in the assets/dictionaries/ folder and the configuration file (for example, s2t.json) in assets/, as in the layout below.

your-project/
├── assets/
│   ├── dictionaries/
│   │   ├── STPhrases.txt
│   │   ├── STCharacters.txt
│   │   ├── TPhrases.txt
│   │   └── ... (other .txt dictionary files)
│   └── s2t.json
└── src/
    └── main.rs

You can obtain these dictionary and configuration files from the official OpenCC repository.
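
Because the files are read from disk at runtime, it can help to check the layout before constructing a converter. The following is a minimal sketch using only the standard library. The helper name missing_assets is hypothetical and not part of ferrous-opencc, and the file list is copied from the tree above, so adjust it to match whatever your configuration file references.

```rust
use std::path::{Path, PathBuf};

/// Return the expected asset paths under `root` that do not exist on disk.
/// NOTE: hypothetical helper; the file list mirrors the example tree only.
fn missing_assets(root: &Path) -> Vec<PathBuf> {
    let expected = [
        "assets/s2t.json",
        "assets/dictionaries/STPhrases.txt",
        "assets/dictionaries/STCharacters.txt",
    ];
    expected
        .iter()
        .map(|rel| root.join(rel))
        .filter(|path| !path.exists())
        .collect()
}

fn main() {
    let missing = missing_assets(Path::new("."));
    if missing.is_empty() {
        println!("all expected assets found");
    } else {
        for path in &missing {
            eprintln!("missing asset: {}", path.display());
        }
    }
}
```

Run this from the project root. Any path it reports should be fetched or compiled before the converter is created.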

Example

A basic example of converting Simplified Chinese to Traditional Chinese.

use ferrous_opencc::{OpenCC, Result};

fn main() -> Result<()> {
    // Create an OpenCC instance with a specific configuration file.
    let opencc = OpenCC::new("assets/s2t.json")?;

    // Convert text.
    let text = "“开放中文转换”是完全由 Rust 实现的。";
    let converted = opencc.convert(text);

    println!("{}", converted);
    // Expected output: 「開放中文轉換」是完全由 Rust 實現的。

    assert_eq!(converted, "「開放中文轉換」是完全由 Rust 實現的。");
    Ok(())
}

Command-Line Tool

This library provides a dictionary compilation tool. You can install it by enabling the compiler-tools feature.

cargo install ferrous-opencc --features compiler-tools

Then, you can compile text dictionaries into the binary .ocb format:

opencc-dict-compiler /path/to/STCharacters.txt

This will generate an STCharacters.ocb file in the same directory. The library will automatically use these .ocb files as a cache to speed up initial loading.
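
The naming convention above (same directory, .ocb extension) can be sketched as follows. This is only an illustration of the convention, assuming a simple "use the compiled file if it exists" rule. The helper names ocb_path_for and preferred_dictionary are hypothetical, and the library's actual cache logic may differ.

```rust
use std::path::{Path, PathBuf};

/// Path of the compiled dictionary expected next to a text dictionary.
fn ocb_path_for(txt: &Path) -> PathBuf {
    txt.with_extension("ocb")
}

/// Pick the compiled file when it exists, otherwise fall back to the text source.
/// NOTE: assumed selection rule, shown for illustration only.
fn preferred_dictionary(txt: &Path) -> PathBuf {
    let ocb = ocb_path_for(txt);
    if ocb.is_file() {
        ocb
    } else {
        txt.to_path_buf()
    }
}

fn main() {
    let txt = Path::new("assets/dictionaries/STCharacters.txt");
    println!("{}", preferred_dictionary(txt).display());
}
```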

License

This project is licensed under the Apache-2.0 license.