mecab-sys 0.1.0

FFI bindings and safe wrappers for MeCab
Homepage: naughie/lita-tokenizers
Owner: naughie

mecab-sys

Rust FFI bindings for MeCab.

mecab-sys provides raw, unsafe FFI bindings to the MeCab (C++) library. This crate is designed to be completely self-contained: it bundles the MeCab source code, compiles it directly using the cc crate, and links it statically. You do not need to install MeCab on your system to use this crate.

Only 1-best prediction is currently supported. To train a model or to obtain N-best solutions, you still need to use MeCab itself.

The bundled MeCab is configured for UTF-8 only.
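Because only UTF-8 is supported, text from legacy Japanese encodings (Shift_JIS, EUC-JP) has to be converted before analysis. The following is a minimal sketch, using only the standard library, of checking raw bytes before handing them to the tokenizer. The helper names `require_utf8` and `utf8_lossy` are hypothetical and not part of mecab-sys; actual transcoding from another encoding would need a separate crate.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical helper: strictly validate raw bytes as UTF-8.
/// Returns an error instead of passing mis-encoded text to the tokenizer.
pub fn require_utf8(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Hypothetical helper: accept any bytes, replacing each invalid
/// sequence with U+FFFD (the Unicode replacement character).
pub fn utf8_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

The strict variant is usually the safer choice for analysis pipelines, since replacement characters silently change the text being tokenized.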


Prerequisites

This crate does not bundle the MeCab model files. To actually use MeCab for text analysis, you must download a pre-trained model or train one yourself.
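Since the model location is usually only known at run time, the argument string passed to `Model::from_cli_arg` (see Usage) may need to be built dynamically rather than written as a `c"..."` literal. The sketch below shows one way to do that with the standard library. The function name `mecab_cli_arg` is hypothetical, and it is an assumption, inferred from the literal in the Usage example, that `from_cli_arg` accepts a `&CStr`.

```rust
use std::ffi::{CString, NulError};
use std::path::Path;

/// Hypothetical helper: build a MeCab command-line style argument string
/// from a dictionary directory (-d) and a resource file (-r).
///
/// Limitations of this sketch:
/// - `Path::display` is lossy for paths that are not valid Unicode.
/// - No quoting is attempted, so paths containing whitespace are not handled.
pub fn mecab_cli_arg(dict_dir: &Path, rc_file: &Path) -> Result<CString, NulError> {
    let arg = format!("-d {} -r {}", dict_dir.display(), rc_file.display());
    // Fails if the string contains an interior NUL byte.
    CString::new(arg)
}
```

Assuming `from_cli_arg` takes a `&CStr`, the result could then be passed as `arg.as_c_str()`.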

You will also need a standard C++ compiler (such as g++ or clang++) installed on your system so that the cc crate can build the bundled C++ source. To configure compiler behavior, see the cc documentation.

Installation

Add mecab-sys to your Cargo.toml:

[dependencies]
mecab-sys = "0.1.0"

Usage

Basic usage for Japanese morphological analysis:

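use mecab_sys::Model;

fn main() {
    // The argument uses MeCab's command-line syntax:
    // -d is the dictionary directory, -r is the resource file.
    // `c"..."` is a C string literal (requires Rust 1.77 or later).
    let model = Model::from_cli_arg(c"-d /path/to/your/dict -r /path/to/dictrc").unwrap();

    // A tagger and a lattice are both created from the same model.
    let tagger = model.new_tagger().unwrap();
    let mut lattice = model.new_lattice().unwrap();

    // Attach the input sentence, then run 1-best analysis on the lattice.
    let mut lattice = lattice.set_sentence("すもももももももものうち");
    tagger.parse(&mut lattice).unwrap();

    // Walk the resulting nodes, starting from the BOS (beginning-of-sentence) node.
    for node in lattice.bos_node() {
        let surface = node.surface();
        let feat = node.feature();

        println!("{surface}: {feat}");
    }
}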
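The feature string printed above is typically a comma-separated list whose layout depends on the dictionary (for example, IPADIC puts the part of speech first). As a rough sketch, assuming `node.feature()` yields something usable as a `&str`, the fields could be split as follows. The helper names are hypothetical and not part of mecab-sys, and the naive split does not handle dictionaries that quote fields containing commas.

```rust
/// Hypothetical helper: split a MeCab feature string into its
/// comma-separated fields. Quoted fields are NOT handled.
pub fn feature_fields(feature: &str) -> Vec<&str> {
    feature.split(',').collect()
}

/// Hypothetical helper: return the field at `index`, treating the
/// "*" placeholder (and an out-of-range index) as absent.
pub fn feature_field(feature: &str, index: usize) -> Option<&str> {
    feature.split(',').nth(index).filter(|f| *f != "*")
}
```

With an IPADIC-style feature such as `名詞,一般,*,*,*,*,すもも,スモモ,スモモ`, `feature_field(feat, 0)` gives the part of speech and `feature_field(feat, 2)` gives `None` because that slot is unset.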