Crate lindera_sqlite

§lindera-sqlite

A SQLite FTS5 (Full-Text Search 5) tokenizer extension that provides support for Chinese, Japanese, and Korean (CJK) text analysis using the Lindera morphological analyzer.

§Features

CJK Language Support: Tokenizes Chinese, Japanese, and Korean text using Lindera
Multiple Dictionaries: Supports various embedded dictionaries (IPADIC, UniDic, ko-dic, CC-CEDICT)
Configurable: Uses YAML configuration for character filters and token filters
SQLite Integration: Plugs into SQLite’s FTS5 full-text search as a custom tokenizer

§Usage

§Building the Extension

cargo build --release --features=embedded-cjk

§Setting Up Configuration

Set the LINDERA_CONFIG_PATH environment variable to point to your Lindera configuration file:

export LINDERA_CONFIG_PATH=./resources/lindera.yml
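If the variable is unset or empty, the tokenizer has no configuration to load, so it can help to validate the value before loading the extension. The following is a minimal sketch of such a check. The helper `resolve_config_path` is hypothetical and is not part of lindera-sqlite; how the crate itself reads the variable is not shown here.

```rust
use std::path::PathBuf;

/// Hypothetical helper (not part of lindera-sqlite): turns the raw value of
/// LINDERA_CONFIG_PATH into a path, rejecting unset or blank values.
fn resolve_config_path(raw: Option<String>) -> Result<PathBuf, String> {
    match raw {
        // The variable was never exported.
        None => Err("LINDERA_CONFIG_PATH is not set".to_string()),
        // The variable exists but contains only whitespace.
        Some(value) if value.trim().is_empty() => {
            Err("LINDERA_CONFIG_PATH is empty".to_string())
        }
        // Anything else is treated as a path; existence is not checked here.
        Some(value) => Ok(PathBuf::from(value)),
    }
}
```

A caller would typically pass `std::env::var("LINDERA_CONFIG_PATH").ok()` as the argument and report the error string if the result is `Err`.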

§Loading in SQLite

.load ./target/release/liblindera_sqlite lindera_fts5_tokenizer_init

§Creating an FTS5 Table

CREATE VIRTUAL TABLE example USING fts5(content, tokenize='lindera_tokenizer');

§Searching

INSERT INTO example(content) VALUES ('日本語の全文検索');
SELECT * FROM example WHERE content MATCH '検索';

§Architecture

This library provides a C ABI interface for SQLite to use Lindera as a custom FTS5 tokenizer. The main components are:

load_tokenizer: Initializes a Lindera tokenizer with configuration
lindera_fts5_tokenize: C-compatible entry point for tokenization (called by SQLite)
Internal tokenization logic that converts text to tokens and calls back to SQLite
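
The callback flow described above can be sketched in plain Rust. FTS5 expects each token to be reported together with the byte offsets of its start and end in the input, which matters for CJK text because each character occupies several bytes in UTF-8. This sketch makes several assumptions: a whitespace splitter stands in for Lindera's morphological analysis, a closure stands in for SQLite's token callback, and `tokenize_with_offsets` is a hypothetical name rather than the crate's actual function.

```rust
/// Local stand-in for SQLite's success code (SQLITE_OK is 0).
const OK: i32 = 0;

/// Hypothetical sketch: splits `text` on whitespace and reports each token,
/// with its start and end byte offsets, to `emit`. Tokenization stops early
/// if the callback returns a non-OK status, and that status is returned.
fn tokenize_with_offsets<F>(text: &str, mut emit: F) -> i32
where
    F: FnMut(&str, usize, usize) -> i32,
{
    // Byte offset where the current token began, if we are inside one.
    let mut start: Option<usize> = None;

    // char_indices yields byte offsets, not character counts.
    for (idx, ch) in text.char_indices() {
        if ch.is_whitespace() {
            if let Some(s) = start.take() {
                let rc = emit(&text[s..idx], s, idx);
                if rc != OK {
                    return rc;
                }
            }
        } else if start.is_none() {
            start = Some(idx);
        }
    }

    // Flush the final token when the text does not end in whitespace.
    if let Some(s) = start {
        return emit(&text[s..], s, text.len());
    }
    OK
}
```

Passing a closure that pushes `(token, start, end)` into a vector is enough to inspect the offsets. A real tokenizer would forward the same three values to the callback pointer that SQLite supplies.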

Structs§

Fts5Tokenizer: Wrapper for Lindera tokenizer used in FTS5.
TokenCallback: Convenience wrapper around SQLite’s token callback.

Constants§

SQLITE_INTERNAL: SQLite internal error status code.
SQLITE_MISUSE: SQLite misuse error status code.
SQLITE_OK: SQLite success status code.

Functions§

ffi_panic_boundary: Runs an operation behind a panic boundary suitable for the SQLite FFI.
lindera_fts5_tokenize: C-compatible FTS5 tokenization function.
load_tokenizer: Loads and initializes a Lindera tokenizer.
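
A panic must not unwind out of an `extern "C"` function into SQLite, so an FFI entry point usually converts panics into a status code. The sketch below shows one way such a boundary could look, built on `std::panic::catch_unwind`. It is an assumption-laden illustration, not the crate's implementation: `panic_boundary_sketch` is a hypothetical name, the signature of the real `ffi_panic_boundary` may differ, and the constants are local copies of SQLite's numeric values (0 for SQLITE_OK, 2 for SQLITE_INTERNAL).

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Local copies of SQLite's status codes, declared here so the sketch is
// self-contained. The crate exports its own constants with these names.
const SQLITE_OK: i32 = 0;
const SQLITE_INTERNAL: i32 = 2;

/// Hypothetical sketch of a panic boundary: runs `op` and maps every outcome
/// to an SQLite-style status code so that nothing unwinds past this function.
fn panic_boundary_sketch<F>(op: F) -> i32
where
    F: FnOnce() -> Result<(), i32>,
{
    // AssertUnwindSafe is used because the closure's captured state is
    // discarded after a panic in this sketch.
    match catch_unwind(AssertUnwindSafe(op)) {
        Ok(Ok(())) => SQLITE_OK,    // operation succeeded
        Ok(Err(code)) => code,      // operation reported its own error code
        Err(_) => SQLITE_INTERNAL,  // operation panicked
    }
}
```

An entry point written this way would wrap its whole body in the closure and return the resulting integer to SQLite.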

Type Aliases§

TokenFunction: Token callback function type.
