inputx_pinyin_data_bigrams/lib.rs
//! `inputx-pinyin-data-bigrams`: embedded bigram FSAs for the
//! [`inputx-pinyin`](https://crates.io/crates/inputx-pinyin) engine.
//!
//! Pure data crate: two `pub const` byte slices embedded via
//! `include_bytes!`, zero dependencies, `#![no_std]`-clean. Split out of
//! `inputx-pinyin` in v1.4.7 sub-phase B (Strategy C) so that the facade
//! stays light to publish and consumers can opt out via the facade's
//! `bigrams` feature.
//!
//! Ships two FSAs:
//!
//! - [`EMBEDDED_BIGRAMS`]: inter-token word bigrams
//!   (`<prev_word>\0<next_word>`, where both words are distinct jieba
//!   tokens adjacent in the source corpus). Sole input to next-word
//!   prediction in the facade.
//! - [`EMBEDDED_BIGRAMS_INTRA`]: intra-token char bigrams
//!   (`<a>\0<b>` for adjacent chars *inside* one jieba token).
//!   Helps Viterbi composition prefer known phrases; never used for
//!   next-word prediction.
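/*!
# Example: composing a bigram key

The `<prev_word>\0<next_word>` layout described above can be sketched as follows. This is an illustration only: `bigram_key` and `split_bigram_key` are hypothetical helpers, not part of this crate or of `inputx-fsa`, and the sketch assumes that both halves are UTF-8 encoded and joined by a single NUL byte. How the resulting bytes are looked up in the FSA is left to the facade and is not shown here.

```rust
// Hypothetical helper: joins two tokens into `<prev>\0<next>`.
// Assumption: tokens are UTF-8 and the separator is one NUL byte.
fn bigram_key(prev: &str, next: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(prev.len() + 1 + next.len());
    key.extend_from_slice(prev.as_bytes());
    key.push(0); // the `\0` separator between the two halves
    key.extend_from_slice(next.as_bytes());
    key
}

// Hypothetical inverse: splits a key at its first NUL byte.
// Returns `None` when the key contains no separator.
fn split_bigram_key(key: &[u8]) -> Option<(&[u8], &[u8])> {
    let sep = key.iter().position(|&b| b == 0)?;
    Some((&key[..sep], &key[sep + 1..]))
}
```

Under these assumptions, `bigram_key("你好", "世界")` is 13 bytes long (6 + 1 + 6), and splitting it returns the two original UTF-8 halves. Multi-byte UTF-8 sequences never contain a zero byte, so the first NUL in a key is always the separator as long as neither token contains a literal NUL character.
*/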

#![no_std]

/// Inter-token word bigram FSA, in the
/// [`inputx_fsa::Fsa`](https://docs.rs/inputx-fsa) binary format.
pub const EMBEDDED_BIGRAMS: &[u8] =
    include_bytes!("../data/bigrams.fsa");

/// Intra-token char bigram FSA, in the
/// [`inputx_fsa::Fsa`](https://docs.rs/inputx-fsa) binary format.
pub const EMBEDDED_BIGRAMS_INTRA: &[u8] =
    include_bytes!("../data/bigrams_intra.fsa");