Crate lindera_wasm

§lindera-wasm

WebAssembly bindings for Lindera, a morphological analysis library.

This crate provides WASM bindings that enable Japanese, Korean, and Chinese text tokenization in web browsers and Node.js environments.

§Features

  • Multiple dictionaries: IPADIC, UniDic (Japanese), ko-dic (Korean), CC-CEDICT (Chinese)
  • Flexible tokenization modes: Normal and decompose modes
  • Character filters: Unicode normalization and more
  • Token filters: Lowercase, compound word handling, number normalization
  • Custom user dictionaries: Support for user-defined dictionaries
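To make the character-filter idea concrete, here is a minimal sketch of what Unicode normalization does to text before it is tokenized. It uses JavaScript's built-in String.prototype.normalize rather than lindera-wasm's own filter API, which this page does not show. The helper name normalizeForTokenization and the choice of the NFKC form are assumptions made for illustration.

```javascript
// Illustration only: built-in Unicode normalization, not lindera-wasm's filter API.
// normalizeForTokenization is a hypothetical helper name; NFKC is an assumed form.
function normalizeForTokenization(text) {
    // NFKC folds compatibility characters (full-width Latin, half-width kana)
    // into their canonical composed forms.
    return text.normalize("NFKC");
}

const fullWidthLatin = "ｌｉｎｄｅｒａ"; // full-width Latin letters
const halfWidthKana = "ｶﾞ";             // half-width KA followed by a voiced sound mark

console.log(normalizeForTokenization(fullWidthLatin)); // "lindera"
console.log(normalizeForTokenization(halfWidthKana));  // "ガ"
```

Normalizing first means visually equivalent spellings reach the dictionary lookup as the same character sequence.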

§Usage

§Web (Browser)

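import __wbg_init, { TokenizerBuilder } from 'lindera-wasm-web-ipadic';

// The WASM module must finish initializing before any bindings are used.
__wbg_init().then(() => {
    // Configure the builder: choose the dictionary and the tokenization mode.
    const builder = new TokenizerBuilder();
    builder.set_dictionary("embedded://ipadic");
    builder.set_mode("normal");

    // Build the tokenizer and tokenize a sample Japanese string.
    const tokenizer = builder.build();
    const tokens = tokenizer.tokenize("関西国際空港");
    console.log(tokens);
});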

§Node.js

const { TokenizerBuilder } = require('lindera-wasm-nodejs-ipadic');

// Configure the builder: choose the dictionary and the tokenization mode.
const builder = new TokenizerBuilder();
builder.set_dictionary("embedded://ipadic");
builder.set_mode("normal");

// Build the tokenizer and tokenize a sample Japanese string.
const tokenizer = builder.build();
const tokens = tokenizer.tokenize("関西国際空港");
console.log(tokens);
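The browser and Node.js examples share the same builder steps, so they can be factored into one function. The sketch below is one possible way to do that, not part of lindera-wasm: createTokenizer and FakeTokenizerBuilder are hypothetical names. The helper calls only the methods shown above (set_dictionary, set_mode, build), and the fake builder exists solely so the sketch runs without the WASM package installed.

```javascript
// Hypothetical helper, not part of lindera-wasm.
// BuilderClass is assumed to behave like the TokenizerBuilder shown above.
function createTokenizer(BuilderClass, dictionaryUri, mode) {
    const builder = new BuilderClass();
    builder.set_dictionary(dictionaryUri);
    builder.set_mode(mode);
    return builder.build();
}

// Stand-in builder so this sketch runs on its own; it is NOT the real class.
class FakeTokenizerBuilder {
    set_dictionary(uri) { this.uri = uri; }
    set_mode(mode) { this.mode = mode; }
    build() {
        const { uri, mode } = this;
        // The fake tokenizer splits text into characters; real output differs.
        return { uri, mode, tokenize: (text) => Array.from(text) };
    }
}

const demoTokenizer = createTokenizer(FakeTokenizerBuilder, "embedded://ipadic", "normal");
console.log(demoTokenizer.tokenize("空港"));
```

In real code, pass the TokenizerBuilder imported from the package in place of FakeTokenizerBuilder. In the browser, call the helper only after the initialization promise has resolved.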

Re-exports§

pub use crate::dictionary::JsDictionary as Dictionary;
pub use crate::dictionary::JsUserDictionary as UserDictionary;
pub use crate::error::JsLinderaError as LinderaError;
pub use crate::metadata::JsCompressionAlgorithm as CompressionAlgorithm;
pub use crate::metadata::JsMetadata as Metadata;
pub use crate::mode::JsMode as Mode;
pub use crate::mode::JsPenalty as Penalty;
pub use crate::schema::JsFieldDefinition as FieldDefinition;
pub use crate::schema::JsFieldType as FieldType;
pub use crate::schema::JsSchema as Schema;
pub use crate::segmenter::JsSegmenter as Segmenter;
pub use crate::token::JsToken as Token;
pub use crate::tokenizer::Tokenizer;
pub use crate::tokenizer::TokenizerBuilder;

Modules§

character_filter
dictionary
error
metadata
mode
schema
segmenter
token
token_filter
tokenizer

Functions§

get_version
Returns the version of the lindera-wasm library. Kept as a backward-compatible alias for version().
py_build_dictionary
py_build_user_dictionary
py_load_dictionary
py_load_user_dictionary
version
Returns the version of the lindera-wasm package.