# OpenCC Plugins

`plugins/` contains segmentation plugins that are built and distributed
separately from the OpenCC core library.

The current plugin layout is:

- `plugins/<name>/src/`: plugin implementation and exported entry point
- `plugins/<name>/include/`: plugin-private headers
- `plugins/<name>/data/config/`: plugin-backed JSON configs
- `plugins/<name>/data/<resource-dir>/`: plugin resource files
- `plugins/<name>/tests/`: integration tests for built plugin artifacts

## Design Rules

- The OpenCC core keeps built-in algorithms only. `mmseg` remains built in.
- Any non-built-in `segmentation.type` is resolved through the plugin host.
- A single JSON config must stay platform-neutral. Config files must not embed
  `.so`, `.dylib`, `.dll`, or platform-specific install paths.
- Plugin resources belong to the plugin package, but resource names stay inside
  the normal OpenCC data layout.
- Plugin tests should validate real built artifacts instead of directly testing
  private implementation classes.
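
The platform-neutrality rule can be checked mechanically. The following is a minimal sketch, not part of OpenCC: the function names are hypothetical, and it assumes that "platform-specific" means a string value ending in a shared-library suffix or written as an absolute path.

```python
import json
import ntpath
import posixpath

# Suffixes that the design rules say a config must not embed.
FORBIDDEN_SUFFIXES = (".so", ".dylib", ".dll")


def iter_strings(node):
    """Yield every string value found anywhere in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def find_platform_specific_values(config_text):
    """Return string values that look like library names or absolute paths.

    Assumption: this heuristic is illustrative only; OpenCC itself may
    enforce the rule differently or not at all.
    """
    config = json.loads(config_text)
    flagged = []
    for value in iter_strings(config):
        is_library = value.lower().endswith(FORBIDDEN_SUFFIXES)
        is_absolute = posixpath.isabs(value) or ntpath.isabs(value)
        if is_library or is_absolute:
            flagged.append(value)
    return flagged
```

A config that only uses resource names such as `jieba_dict/jieba.dict.utf8` yields an empty list, which makes the check easy to run in CI over every file in `plugins/<name>/data/config/`.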

## Naming And Installation

Runtime naming follows the segmentation type:

- Linux: `libopencc-<type>.so`
- macOS: `libopencc-<type>.dylib`
- Windows: `opencc-<type>.dll`

Windows loaders also accept the MSYS/MinGW runtime name
`msys-opencc-<type>.dll` when that is the emitted DLL filename.

On Windows, plugins must be built with a toolchain and runtime that are
ABI-compatible with the host OpenCC binary. Mixing an MSVC-built host with
MinGW-built plugins, or the reverse, is unsupported.

For the `jieba` plugin, that means:

- Linux: `libopencc-jieba.so`
- macOS: `libopencc-jieba.dylib`
- Windows: `opencc-jieba.dll`

MSYS/MinGW builds may emit `msys-opencc-jieba.dll`, which is also accepted by
the loader.

CMake installs plugin binaries into the platform plugin directory and installs
plugin configs/resources into the OpenCC data directory.

Within a single plugin search directory, keep only one DLL for a given
segmentation type. On Windows this applies to both `opencc-<type>.dll` and
`msys-opencc-<type>.dll`. Multiple matching DLL names for the same type in one
search directory are treated as an error.
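
The naming and uniqueness rules in this section can be summarized in a few lines. This is an illustrative sketch only: `plugin_filenames` and `select_plugin` are hypothetical helpers, not OpenCC functions, and the platform keys `linux`, `macos`, and `windows` are assumptions chosen for the example.

```python
def plugin_filenames(seg_type, platform):
    """Return the accepted plugin file names for a segmentation type."""
    if platform == "linux":
        return [f"libopencc-{seg_type}.so"]
    if platform == "macos":
        return [f"libopencc-{seg_type}.dylib"]
    if platform == "windows":
        # The MSYS/MinGW runtime name is accepted alongside the plain name.
        return [f"opencc-{seg_type}.dll", f"msys-opencc-{seg_type}.dll"]
    raise ValueError(f"unknown platform: {platform!r}")


def select_plugin(dir_listing, seg_type, platform):
    """Pick the plugin file from one search directory's listing.

    Returns None when no candidate is present and raises RuntimeError when
    more than one accepted name for the same type is present.
    """
    candidates = plugin_filenames(seg_type, platform)
    present = [name for name in candidates if name in dir_listing]
    if len(present) > 1:
        raise RuntimeError(
            f"multiple plugins for type {seg_type!r}: {', '.join(present)}"
        )
    return present[0] if present else None
```

The sketch compares names exactly. A real loader on Windows would likely need to account for case-insensitive file names, which is left out here.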

## Resource Resolution

Plugin JSON uses resource names rather than platform paths. Example:

```json
{
  "segmentation": {
    "type": "jieba",
    "resources": {
      "dict_path": "jieba_dict/jieba.dict.utf8",
      "model_path": "jieba_dict/hmm_model.utf8",
      "user_dict_path": "jieba_dict/user.dict.utf8"
    }
  }
}
```

The core passes these values to the plugin host. The plugin is responsible for
resolving them at runtime. Relative resource paths are expected to resolve
within the existing OpenCC data layout rather than a plugin-specific ad hoc
directory tree.
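
One plausible way to resolve such names is to try each OpenCC data directory in order and reject names that would escape the data layout. The sketch below is an assumption about how a plugin might do this, not the actual `jieba` plugin logic; `resolve_resource`, the `data_dirs` argument, and the injectable `exists` check are all hypothetical.

```python
import ntpath
import os
import posixpath


def resolve_resource(name, data_dirs, exists=os.path.isfile):
    """Return the first existing path for a relative resource name, or None.

    `name` uses forward slashes, as in the JSON config. `data_dirs` is an
    ordered list of candidate OpenCC data directories (an assumption of
    this sketch). `exists` is injectable so the lookup can be tested
    without touching the file system.
    """
    normalized = posixpath.normpath(name)
    escapes = normalized == ".." or normalized.startswith("../")
    if posixpath.isabs(name) or ntpath.isabs(name) or escapes:
        raise ValueError(f"resource name must be relative to the data layout: {name!r}")
    for data_dir in data_dirs:
        candidate = os.path.join(data_dir, *normalized.split("/"))
        if exists(candidate):
            return candidate
    return None
```

Keeping the search-path list outside the config is what lets the same JSON file work unchanged across platforms and install prefixes.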

## Segmentation ABI

The current segmentation plugin ABI entry point is:

- `opencc_get_segmentation_plugin_v2()`

Segmentation results are returned as a sequence of segment lengths measured in
Unicode code points, not as copied token strings. The ABI contract is:

- input text is passed to the plugin as null-terminated UTF-8
- the plugin returns `segment_count` plus `codepoint_lengths`
- each element is the number of Unicode code points in the next segment
- lengths must be positive and must cover the full input, in order
- the host reconstructs segment boundaries from the original UTF-8 input

This keeps the ABI simpler and avoids allocating one string per token across
the plugin boundary.
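
To make the contract concrete, here is a small sketch of the host-side reconstruction. It is written in Python for brevity and is not the OpenCC implementation; the function name is hypothetical, and the input is passed as plain UTF-8 bytes without the null terminator.

```python
def split_by_codepoint_lengths(utf8_text, codepoint_lengths):
    """Rebuild segments from the original UTF-8 input and per-segment lengths.

    Each length counts Unicode code points. Lengths must be positive and
    must cover the whole input, in order.
    """
    text = utf8_text.decode("utf-8")  # a Python str is indexed by code point
    segments = []
    offset = 0
    for length in codepoint_lengths:
        if length <= 0:
            raise ValueError("segment lengths must be positive")
        end = offset + length
        if end > len(text):
            raise ValueError("segment lengths exceed the input")
        segments.append(text[offset:end])
        offset = end
    if offset != len(text):
        raise ValueError("segment lengths do not cover the full input")
    return segments
```

For the input `我爱机器学习` and lengths `[1, 1, 4]`, the function returns `['我', '爱', '机器学习']`. The plugin never hands back string data; only the integer array crosses the boundary.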

### Customizing Jieba Dictionaries

When using the `jieba` plugin, you can add custom terminology to the segmenter
by supplying your own `user.dict.utf8` or by editing the installed one.

Custom dictionaries must be encoded in UTF-8. Each line has the form
`[Word] [Frequency] [Part-of-Speech]`, with fields separated by spaces. The
frequency and part-of-speech tag are optional.

Example:

```text
云计算 5 n
机器学习 8 n
区块链 10 nz
```
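
Before installing a custom dictionary, it can help to validate each line against that format. The following is a rough sketch of such a check, assuming a numeric second field is the frequency and any remaining field is the part-of-speech tag. It is a hypothetical helper and does not reproduce the plugin's own parser.

```python
def parse_user_dict_line(line):
    """Parse one dictionary line into (word, frequency, tag).

    Returns None for blank lines. Frequency and tag are None when omitted.
    """
    fields = line.split()
    if not fields:
        return None
    word = fields[0]
    rest = fields[1:]
    frequency = None
    tag = None
    # Treat a leading all-digit field as the optional frequency.
    if rest and rest[0].isascii() and rest[0].isdigit():
        frequency = int(rest[0])
        rest = rest[1:]
    # Whatever follows is taken as the optional part-of-speech tag.
    if rest:
        tag = rest[0]
    return word, frequency, tag
```

For example, `parse_user_dict_line("云计算 5 n")` returns `("云计算", 5, "n")`, while a line containing only `区块链` returns `("区块链", None, None)`.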

## Testing

Each plugin should prefer integration tests that exercise:

- the built `opencc` command or `libopencc`
- the built plugin shared library
- real plugin JSON configs
- real installed or runfiles-based resource files

Current `jieba` targets:

- CMake plugin target: `opencc_jieba`
- CMake integration test: `JiebaPluginIntegrationTest`
- Bazel plugin target: `//plugins/jieba:opencc-jieba`
- Bazel integration test: `//plugins/jieba:jieba_plugin_integration_test`

## Adding A New Plugin

1. Create `plugins/<name>/` with `src`, `include`, `data`, and `tests`
   subdirectories.
2. Export `opencc_get_segmentation_plugin_v2()`.
3. Name the output binary using the `opencc-<type>` convention.
4. Keep JSON configs platform-neutral and resource-oriented.
5. Add both CMake and Bazel build rules.
6. Add an integration test that loads the built plugin through the real host.

## Packaging for Distro Maintainers

To fit downstream Linux distribution packaging practices (e.g., Debian `apt`,
Arch `pacman`), OpenCC plugins support decoupled compilation. This lets
maintainers build and distribute the core `opencc` package separately from
heavier third-party plugins such as `opencc-jieba`.

### 1. Build And Install Core OpenCC

Compile the main tree normally, but disable the optional `jieba` plugin:

```bash
mkdir build_core && cd build_core
cmake .. -DBUILD_OPENCC_JIEBA_PLUGIN=OFF -DCMAKE_INSTALL_PREFIX=/usr
make && make install
```

### 2. Build The Plugin Standalone

Plugins can detect standalone builds automatically. Build from the plugin
directory and point `OpenCC_DIR` at the installed OpenCC CMake package:

```bash
cd plugins/jieba
mkdir build_plugin && cd build_plugin
cmake .. -DOpenCC_DIR=/usr/lib/cmake/opencc -DCMAKE_INSTALL_PREFIX=/usr
make && make install
```

Standalone default installation paths are intended to align with the core
OpenCC layout:

- Windows: `bin/plugins`
- Linux/macOS: `lib/opencc/plugins`