//! Translation pipeline.
//!
//! Translates an arbitrary natural-language fragment from one language to
//! another by running the full
//! `source → formalize → semantic meta language → deformalize → target`
//! flow on top of Wikipedia, Wikidata and Wiktionary. There is no
//! pre-extracted translation table built into the binary: every answer is
//! the result of an actual API round-trip (live or replayed from the
//! seeded raw-response cache).
//!
//! 1. **Formalize** — fetch the source-edition Wiktionary page and the
//!    Wikidata Lexeme / Q-item that backs the surface so the surface
//!    collapses to a language-neutral [`MeaningId`].
//!
//! 2. **Deformalize** — render that [`MeaningId`] back into the target
//!    language by joining on Wikidata `P5137` ("item for this sense") and
//!    by parsing translation tables (`{{trans-top}}`, `{{перев-блок}}`,
//!    `=== Translations ===` / `=== Перевод ===`) on either the source-
//!    or target-edition Wiktionary page.
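/*!
The two stages above can be sketched with an in-memory table standing in for
the Wiktionary and Wikidata round-trips. This is a minimal illustration, not
the crate's implementation: `SketchMeaningId`, `SketchLexicon` and the
`Q_WATER` identifier used in the note below are hypothetical names invented
for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `MeaningId`: an opaque language-neutral id.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SketchMeaningId(pub String);

/// Toy lexicon that replaces the API round-trips (assumption, not crate code).
#[derive(Default)]
pub struct SketchLexicon {
    // (language, lowercased surface) -> meaning
    senses: HashMap<(String, String), SketchMeaningId>,
    // (meaning, language) -> surface
    labels: HashMap<(SketchMeaningId, String), String>,
}

impl SketchLexicon {
    /// Registers `surface` in `lang` as one rendering of `meaning`.
    pub fn insert(&mut self, lang: &str, surface: &str, meaning: &str) {
        let id = SketchMeaningId(meaning.to_string());
        self.senses
            .insert((lang.to_string(), surface.to_lowercase()), id.clone());
        self.labels.insert((id, lang.to_string()), surface.to_string());
    }

    /// Stage 1: collapse a surface form to a language-neutral meaning.
    pub fn formalize(&self, lang: &str, surface: &str) -> Option<SketchMeaningId> {
        self.senses
            .get(&(lang.to_string(), surface.to_lowercase()))
            .cloned()
    }

    /// Stage 2: render the meaning back into the target language.
    pub fn deformalize(&self, meaning: &SketchMeaningId, lang: &str) -> Option<&str> {
        self.labels
            .get(&(meaning.clone(), lang.to_string()))
            .map(String::as_str)
    }

    /// Runs both stages and returns the surface together with the meaning id.
    pub fn translate(
        &self,
        surface: &str,
        source: &str,
        target: &str,
    ) -> Option<(String, SketchMeaningId)> {
        let meaning = self.formalize(source, surface)?;
        let rendered = self.deformalize(&meaning, target)?.to_string();
        Some((rendered, meaning))
    }
}
```

With `("en", "water")` and `("ru", "вода")` registered under the id `Q_WATER`,
`translate("Water", "en", "ru")` yields `вода` together with that id, and an
unknown surface yields `None`.
*/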
//!
//! Every successful HTTP response is preserved verbatim under
//! [`cache::DEFAULT_CACHE_DIR`] keyed by **semantic identity** of the
//! resource (Wikidata Q-id, Wiktionary `(lang, page)`, SPARQL query hash,
//! …) so a single fetch can feed translation, fact lookup, attribute
//! formalization or any other formalization path. The ~128 most
//! frequent Wikidata entities and ~128 most frequent properties — plus
//! the Wiktionary pages they point at — are committed to the repository
//! under [`cache::SEED_CACHE_DIR`] so unit tests, the browser worker and
//! a clean CI checkout can all run the full pipeline offline without
//! hitting the network. Live fetches are gated on
//! `FORMAL_AI_LIVE_API=1`.
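/*!
As a rough sketch of what keying by semantic identity could look like, the
example below derives a cache path from what a resource *is* rather than from
the URL that fetched it, and checks the live-fetch gate. `SketchCacheKey`, the
directory names, the file extensions and `live_api_enabled` are assumptions
made for illustration; the real layout lives in [`cache`].

```rust
use std::path::PathBuf;

/// Hypothetical cache key; the variants mirror the resources listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SketchCacheKey {
    WikidataEntity { qid: String },
    WiktionaryPage { lang: String, page: String },
    SparqlQuery { query_hash: u64 },
}

impl SketchCacheKey {
    /// Relative path under the cache root (layout is an assumption).
    /// Page titles are used as-is here; a real cache would need to escape them.
    pub fn relative_path(&self) -> PathBuf {
        match self {
            Self::WikidataEntity { qid } => {
                PathBuf::from("wikidata").join(format!("{qid}.json"))
            }
            Self::WiktionaryPage { lang, page } => PathBuf::from("wiktionary")
                .join(lang)
                .join(format!("{page}.wikitext")),
            Self::SparqlQuery { query_hash } => {
                PathBuf::from("sparql").join(format!("{query_hash:016x}.json"))
            }
        }
    }
}

/// Live fetches are allowed only when the variable is exactly "1".
/// The value is passed in so the check can be exercised without touching the
/// process environment.
pub fn live_api_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}
```

A caller would typically pass
`std::env::var("FORMAL_AI_LIVE_API").ok().as_deref()` to `live_api_enabled`.
Because the key carries no endpoint or query-string detail, any consumer that
needs the same entity or page resolves to the same cached file.
*/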
//!
//! ## Module layout
//!
//! - [`http`] — `curl`-backed HTTP client; mirrors [`crate::telegram_runtime`]
//!   so we don't pull a TLS crate into the core.
//! - [`cache`] — semantic-identity file cache for raw API responses, with
//!   support for replaying responses from a committed `.lino` seed bundle.
//! - [`meaning`] — [`MeaningId`], the semantic meta-language identity.
//! - [`wiktionary`] — Wiktionary client + wikitext parser.
//! - [`wikidata`] — Wikidata SPARQL + Lexeme / entity / property client.
//! - [`formatting`] — typography mirror (case + terminal punctuation).
//! - [`pipeline`] — orchestration (`TranslationPipeline::translate`).
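/*!
To make the "typography mirror" idea from the [`formatting`] entry concrete,
here is a small sketch that copies initial capitalization and terminal
punctuation from the source fragment onto a translated surface. The function
name `mirror_formatting_sketch` and its exact rules are assumptions for
illustration; they are not the signature or behavior of
`match_source_formatting`.

```rust
/// Hypothetical helper: mirrors case and terminal punctuation of `source`
/// onto `translated`. An empty translation is returned unchanged.
pub fn mirror_formatting_sketch(source: &str, translated: &str) -> String {
    if translated.is_empty() {
        return String::new();
    }
    let trimmed = source.trim();
    let mut out = String::with_capacity(translated.len() + 1);

    // Mirror case: capitalize the first character if the source starts uppercase.
    let starts_upper = trimmed.chars().next().map_or(false, char::is_uppercase);
    let mut chars = translated.chars();
    if let Some(first) = chars.next() {
        if starts_upper {
            out.extend(first.to_uppercase());
        } else {
            out.push(first);
        }
        out.push_str(chars.as_str());
    }

    // Mirror terminal punctuation unless the translation already ends with it.
    if let Some(last) = trimmed.chars().last() {
        if matches!(last, '.' | '!' | '?') && !out.ends_with(last) {
            out.push(last);
        }
    }
    out
}
```

For example, a source of `Hello!` with a translated surface of `привет`
produces `Привет!`, while a lowercase, unpunctuated source leaves the
translation untouched.
*/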
//!
//! ## Default wiring
//!
//! Most callers want a process-wide translator that consults the seeded
//! raw-response cache first and falls through to live HTTP only when
//! `FORMAL_AI_LIVE_API=1`. Use [`translate_via_default_pipeline`] for that.
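/*!
A process-wide default of this kind is commonly built on
`std::sync::OnceLock`, which initializes a value once and then hands out
shared references. The sketch below shows that pattern only; `SketchClient`
and `default_client` are hypothetical names, and the real client carries the
cache and HTTP state rather than a single flag.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the crate's cached HTTP client.
pub struct SketchClient {
    /// Whether live network fetches are permitted for this process.
    pub live: bool,
}

/// Returns the process-wide client, constructing it on first use.
pub fn default_client() -> &'static SketchClient {
    static CLIENT: OnceLock<SketchClient> = OnceLock::new();
    CLIENT.get_or_init(|| SketchClient {
        // The gate is read once; later changes to the variable are ignored.
        live: std::env::var("FORMAL_AI_LIVE_API")
            .map(|value| value == "1")
            .unwrap_or(false),
    })
}
```

Every call to `default_client()` returns the same instance, so the
environment gate is evaluated a single time per process.
*/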
use std::sync::OnceLock;
pub use CachedHttpClient;
pub use match_source_formatting;
pub use ;
pub use ;
pub use ;
pub use extract_unquoted_translation_surface;
/// Process-wide cached HTTP client used by the default pipeline.
///
/// The client reads from the committed seed cache and the gitignored
/// local accelerator under `data/` first and falls through to the live
/// network only when `FORMAL_AI_LIVE_API=1` is set. This keeps unit
/// tests offline by default; integration runs that refresh the cache
/// opt in explicitly.
/// Translate `surface` from `source` to `target` using the default pipeline.
///
/// Uses the process-wide cached translator and returns the primary
/// candidate surface form along with the meaning id, so callers can
/// both render the answer and embed the meaning id in their trace.
///
/// Errors propagate as [`HttpError`]; the caller decides whether to render
/// a placeholder or surface the error to the user.