inputx-pinyin-data-bigrams 1.4.0

Embedded bigram FSAs (inter-token + intra-token) for the inputx-pinyin engine: one byte slice each, zero deps, loaded via `include_bytes!`. Optional dependency of inputx-pinyin (default-on via the `bigrams` feature).
# inputx-pinyin-data-bigrams

Embedded bigram FSAs for the
[`inputx-pinyin`](https://crates.io/crates/inputx-pinyin) engine.

```toml
[dependencies]
inputx-pinyin-data-bigrams = "1.4"
```

Pure data crate: two `pub const` byte slices loaded with `include_bytes!`,
no dependencies, and `#![no_std]`-clean. It was split out of
`inputx-pinyin` in v1.4.7 sub-phase B (Strategy C: three
`inputx-pinyin-data-*` crates plus the facade umbrella crate) so the
facade stays light to publish, and consumers who only need
exact-syllable lookup can opt out by disabling the facade's `bigrams`
feature.

## What's in the box

- **`EMBEDDED_BIGRAMS`** (~4.5 MB): inter-token word bigram FSA.
  Keys are `<prev_word>\0<next_word>`, where both ends are distinct
  jieba tokens adjacent in the source corpus. It is the sole input to
  the facade's `PinyinDict::bigram_boost` and next-word prediction
  paths.
- **`EMBEDDED_BIGRAMS_INTRA`** (~1.5 MB): intra-token character
  bigram FSA. Keys are `<a>\0<b>` for adjacent characters *inside* one
  jieba token (e.g. `(你, 好)` captured from `你好`). It helps Viterbi
  composition prefer known phrases and is never used for next-word
  prediction.
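The two key layouts can be sketched in plain Rust. This is a minimal illustration, not the crate's build pipeline: it assumes the corpus is already segmented into jieba tokens (hard-coded here) and that each key is the raw UTF-8 of both sides joined by a single `\0`, as described above. The helper names `bigram_key`, `inter_token_keys` and `intra_token_keys` are hypothetical; counting repeated keys and building the FSA are left out.

```rust
/// Join two UTF-8 strings with the single `\0` separator used in both FSAs' keys.
fn bigram_key(a: &str, b: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(a.len() + 1 + b.len());
    key.extend_from_slice(a.as_bytes());
    key.push(0);
    key.extend_from_slice(b.as_bytes());
    key
}

/// Hypothetical helper: one inter-token key per pair of adjacent tokens.
fn inter_token_keys(tokens: &[&str]) -> Vec<Vec<u8>> {
    tokens.windows(2).map(|pair| bigram_key(pair[0], pair[1])).collect()
}

/// Hypothetical helper: one intra-token key per pair of adjacent characters
/// inside a single token. Single-character tokens contribute no keys.
fn intra_token_keys(tokens: &[&str]) -> Vec<Vec<u8>> {
    let mut keys = Vec::new();
    for token in tokens {
        let chars: Vec<char> = token.chars().collect();
        for pair in chars.windows(2) {
            keys.push(bigram_key(&pair[0].to_string(), &pair[1].to_string()));
        }
    }
    keys
}

fn main() {
    // Assume jieba segmented "你好吗" as ["你好", "吗"].
    let tokens = ["你好", "吗"];
    for key in inter_token_keys(&tokens) {
        println!("inter: {:?}", String::from_utf8_lossy(&key));
    }
    for key in intra_token_keys(&tokens) {
        println!("intra: {:?}", String::from_utf8_lossy(&key));
    }
}
```

Note how the same pair of characters can land in different tables: `你好` as one token yields only the intra-token key `你\0好`, while the token boundary before `吗` yields the inter-token key `你好\0吗`.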

Both FSAs use the
[`inputx_fsa::Fsa`](https://crates.io/crates/inputx-fsa) binary
format.
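How an intra-token table can help Viterbi composition prefer known phrases is easiest to see on a toy lattice. The sketch below illustrates the general technique under stated assumptions and is not the facade's actual scorer: the `Candidate` type, the made-up unigram log-scores, the flat `boost` added for a known pair, and the `HashSet` standing in for lookups against `EMBEDDED_BIGRAMS_INTRA` are all hypothetical.

```rust
use std::collections::HashSet;

/// A candidate character for one syllable with a made-up unigram log-score.
struct Candidate {
    ch: char,
    score: f64,
}

/// Hypothetical decoder: choose the highest-scoring character sequence through
/// `lattice` (one non-empty candidate list per syllable), adding `boost` whenever
/// two consecutive characters form a pair in `intra`.
fn viterbi(lattice: &[Vec<Candidate>], intra: &HashSet<(char, char)>, boost: f64) -> String {
    if lattice.is_empty() {
        return String::new();
    }
    // best[i][j]: best total score of any path ending at candidate j of position i.
    // back[i][j]: which candidate at position i - 1 that best path came from.
    let mut best: Vec<Vec<f64>> = vec![lattice[0].iter().map(|c| c.score).collect()];
    let mut back: Vec<Vec<usize>> = vec![vec![0; lattice[0].len()]];
    for i in 1..lattice.len() {
        let mut row = Vec::with_capacity(lattice[i].len());
        let mut ptr = Vec::with_capacity(lattice[i].len());
        for cur in &lattice[i] {
            // Try every predecessor and keep the best (index, total) pair.
            let (arg, total) = lattice[i - 1]
                .iter()
                .enumerate()
                .map(|(k, prev)| {
                    let bonus = if intra.contains(&(prev.ch, cur.ch)) { boost } else { 0.0 };
                    (k, best[i - 1][k] + bonus + cur.score)
                })
                .fold((0, f64::NEG_INFINITY), |acc, x| if x.1 > acc.1 { x } else { acc });
            row.push(total);
            ptr.push(arg);
        }
        best.push(row);
        back.push(ptr);
    }
    // Start from the best final candidate and follow back-pointers.
    let last = lattice.len() - 1;
    let mut j = (0..lattice[last].len())
        .fold(0, |a, b| if best[last][b] > best[last][a] { b } else { a });
    let mut out = Vec::with_capacity(lattice.len());
    for i in (0..lattice.len()).rev() {
        out.push(lattice[i][j].ch);
        j = back[i][j];
    }
    out.iter().rev().collect()
}

fn main() {
    // Made-up unigram scores for the syllables "ni hao".
    let lattice = vec![
        vec![Candidate { ch: '你', score: -1.0 }, Candidate { ch: '泥', score: -0.8 }],
        vec![Candidate { ch: '好', score: -1.0 }, Candidate { ch: '号', score: -0.9 }],
    ];
    let intra = HashSet::from([('你', '好')]);
    println!("without boost: {}", viterbi(&lattice, &intra, 0.0));
    println!("with boost:    {}", viterbi(&lattice, &intra, 1.0));
}
```

With `boost = 0.0` the higher made-up unigram scores win and the decoder returns `泥号`; with `boost = 1.0` the known pair `(你, 好)` tips the result to `你好`. A real scorer might scale the bonus by the stored count rather than use a constant.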

## Usage

Most consumers use this crate indirectly: `inputx-pinyin`'s default-on
`bigrams` feature pulls it in and wires it through
`PinyinDict::bigram_boost`. Direct use is for custom runtimes:

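```rust
use inputx_fsa::Fsa;
use inputx_pinyin_data_bigrams::{EMBEDDED_BIGRAMS, EMBEDDED_BIGRAMS_INTRA};

fn main() {
    // Inter-token word bigrams: keys are `<prev_word>\0<next_word>` in UTF-8.
    let bigrams = Fsa::new(EMBEDDED_BIGRAMS).expect("valid FSA");
    if let Some(count) = bigrams.get("你好\0吗".as_bytes()) {
        println!("(你好, 吗) bigram count = {count}");
    }

    // Intra-token char bigrams: keys are `<a>\0<b>` for adjacent chars in one token.
    let intra = Fsa::new(EMBEDDED_BIGRAMS_INTRA).expect("valid FSA");
    if let Some(count) = intra.get("你\0好".as_bytes()) {
        println!("(你, 好) intra-token count = {count}");
    }
}
```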
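Because every inter-token key begins with `<prev_word>\0`, next-word prediction in a custom runtime can be framed as a prefix scan followed by a sort on the stored values. This sketch assumes the values are integer counts and uses a `BTreeMap<Vec<u8>, u64>` as a stand-in for the FSA, so it relies only on std and makes no claim about `inputx-fsa`'s iteration API; `predict_next` and the sample counts are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical: return up to `k` (next_word, count) pairs that follow `prev`,
/// highest count first. `table` stands in for the EMBEDDED_BIGRAMS FSA.
fn predict_next(table: &BTreeMap<Vec<u8>, u64>, prev: &str, k: usize) -> Vec<(String, u64)> {
    // Every key for `prev` shares the prefix `<prev>\0`.
    let mut prefix = prev.as_bytes().to_vec();
    prefix.push(0);

    let mut hits: Vec<(String, u64)> = table
        .range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(&prefix))
        .filter_map(|(key, &count)| {
            // Strip the prefix; the remainder is the next word in UTF-8.
            let next = std::str::from_utf8(&key[prefix.len()..]).ok()?;
            Some((next.to_owned(), count))
        })
        .collect();

    // Highest count first; ties broken by the word for a stable order.
    hits.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    hits.truncate(k);
    hits
}

fn main() {
    // Made-up counts, for illustration only.
    let mut table = BTreeMap::new();
    for (prev, next, count) in [("你好", "吗", 40u64), ("你好", "世界", 25), ("你", "们", 90)] {
        let mut key = prev.as_bytes().to_vec();
        key.push(0);
        key.extend_from_slice(next.as_bytes());
        table.insert(key, count);
    }
    for (word, count) in predict_next(&table, "你好", 5) {
        println!("{word}: {count}");
    }
}
```

Sorted byte order keeps all keys sharing a prefix contiguous, so the scan can stop at the first key that no longer matches. The trailing `\0` in the prefix is what keeps a lookup for `你` from also matching keys that start with `你好`.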

## API stability

- **`EMBEDDED_BIGRAMS` / `EMBEDDED_BIGRAMS_INTRA`**: the module path
  is stable for the 1.x line. The underlying bytes are rebuilt with
  each release as the upstream corpus and weight pipeline is refreshed.
- **No public API beyond the two consts**, by design.

## License

Dual-licensed under MIT OR Apache-2.0. Bigram counts are derived from
permissively licensed corpora (Leipzig Corpora / SUBTLEX-CH-WF);
see [`inputx-pinyin`](https://crates.io/crates/inputx-pinyin) for
the attribution chain.