The transbot crate lets you build a translation robot that translates documents (HTML, EPUB, MarkDown, and plain text are currently supported) by interacting with an LLM (Large Language Model).
Resuming is supported. Call TransBot::set_resuming_support to enable it.
To save the intermediate state for later resuming when the job is interrupted by Ctrl+C,
capture the system signal yourself and call TransBot::set_interrupted, so that the library
can save the intermediate state and quit the current job.
Note the following:
- The interruption check is not performed in the middle of file IO or an interaction with the LLM, but only between such actions.
- Files named <dest_path>.temp[.x] hold the intermediate state; no resuming is performed if they are removed.
- TransConfig::syntax_strategy (and also TransConfig::text_chunk_size in the ‘bytransbot’ case) must stay consistent across runs for resuming to work.
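The interruption behavior described above (a flag set from outside, checked only between units of work) can be illustrated with a minimal, std-only sketch. Everything in it is a hypothetical stand-in: `Job`, `interrupt_handle`, and `run` are not part of the transbot API, and the closure stands in for an LLM call. In a real program the flag would be set from a signal handler, which then forwards the notification via TransBot::set_interrupted.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a translation job (not the transbot API).
struct Job {
    interrupted: Arc<AtomicBool>,
    chunks: Vec<String>,
    /// Chunks translated so far; this plays the role of the saved state.
    translated: Vec<String>,
}

impl Job {
    fn new(chunks: Vec<String>) -> Self {
        Job {
            interrupted: Arc::new(AtomicBool::new(false)),
            chunks,
            translated: Vec::new(),
        }
    }

    /// Handle that a signal handler could use to request a stop.
    fn interrupt_handle(&self) -> Arc<AtomicBool> {
        Arc::clone(&self.interrupted)
    }

    /// Translates the remaining chunks. Returns `true` when all chunks are
    /// done and `false` when the job stopped early because of the flag.
    fn run(&mut self, mut translate: impl FnMut(&str) -> String) -> bool {
        while self.translated.len() < self.chunks.len() {
            // The flag is checked only between chunks, never inside one.
            if self.interrupted.load(Ordering::SeqCst) {
                return false; // a real job would persist its state here
            }
            let next = &self.chunks[self.translated.len()];
            let out = translate(next);
            self.translated.push(out);
        }
        true
    }
}

fn main() {
    let mut job = Job::new(vec!["a".into(), "b".into(), "c".into()]);
    let flag = job.interrupt_handle();

    // Simulate Ctrl+C arriving while the second chunk is being translated.
    let mut calls = 0;
    let finished = job.run(|text| {
        calls += 1;
        if calls == 2 {
            flag.store(true, Ordering::SeqCst);
        }
        text.to_uppercase()
    });
    assert!(!finished);
    // The second chunk still completed; the stop happened before the third.
    assert_eq!(job.translated.len(), 2);

    // Resume: clear the flag and continue from the saved progress.
    flag.store(false, Ordering::SeqCst);
    assert!(job.run(|text| text.to_uppercase()));
    assert_eq!(job.translated.len(), 3);
}
```

Because the check sits between chunks, a chunk that is already in flight always completes, which is why the stop may not take effect immediately after the signal.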
For all supported formats except EPUB (but including the HTML inside an EPUB), you can use the ‘whole_doc_to_llm’ option to have transbot send the whole document to the LLM for translation without parsing or splitting it.
The syntax strategy applies only to HTML and MarkDown, and the ‘stripped’ strategy is not yet supported for MarkDown.
Below is an example of how to use the library crate.
use anyhow::Error;
use transbot::{LlmConfig, LlmProvider, PromptHint, SyntaxStrategy, TransBot, TransConfig};

fn main() -> Result<(), Error> {
    let llm_config = LlmConfig::new("translategemma:4b", LlmProvider::OLLAMA { full_url: None });

    let mut prompt_hint = PromptHint::new();
    prompt_hint.set_topic("Rust programming").set_extra_prompt(
        "Follow below term translation: \n\
         trait: 特型",
    );

    let mut trans_config = TransConfig::new();
    trans_config
        .set_dest_lang("Chinese")
        .set_html_elem_selector("p,h1,h2,h3,li,code[class=\"c\"]")
        .set_syntax_strategy(SyntaxStrategy::MaintainedByTransBot)
        .set_prompt_hint(prompt_hint)
        .set_clean_cjk_ascii_spacing(true)
        .set_print_translating_text(true);

    let transbot = TransBot::new(&llm_config, &trans_config)?;
    transbot.translate_html_file("example.html", None)
}
Structs
- LlmConfig: The configuration for LLM interactions.
- PromptHint: The prompt hint.
- TransBot: The translation robot.
- TransConfig: The configuration for translation.
Enums
- DocFormat
- LlmApiStyle: The API style of the LLM, which defines the message structure used when interacting with the LLM. Most LLM providers offer an OpenAI-compatible API (although its full service URL differs slightly from the one for the native API). Refer to the API documentation of your LLM provider if needed.
- LlmProvider: The LLM provider. For Ollama providers, an optional full service URL may be provided; ‘http://localhost:11434/api/chat’ is used if it is omitted. For custom providers, the API style and the full service URL must be provided.
- SyntaxStrategy: The strategy for maintaining the syntax defined by sub-elements of the selected elements in the document. None of the options is perfect. Which one is suitable depends on how well the LLM preserves HTML tags, how many LLM tokens you want to spend, and whether losing the syntax is acceptable. For example, given the text ‘See <a href="a_long_link">the blog</a> for details.’ in a paragraph to translate, the behavior of each variant is explained in its documentation.
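The default-URL rule stated for LlmProvider in the list above can be sketched as follows. This is only an illustration under assumptions: `Provider` and `service_url` are hypothetical names that mirror the described shape (an optional URL for Ollama, a mandatory one for custom providers), they are not the crate's own types, and the API style field of custom providers is left out for brevity.

```rust
/// Hypothetical mirror of the provider choice (not the transbot enum).
enum Provider {
    /// Ollama with an optional full service URL.
    Ollama { full_url: Option<String> },
    /// Custom provider, where the full service URL is mandatory.
    Custom { full_url: String },
}

/// Default endpoint used for Ollama when no URL is given.
const DEFAULT_OLLAMA_URL: &str = "http://localhost:11434/api/chat";

/// Resolves the URL a request would be sent to.
fn service_url(provider: &Provider) -> &str {
    match provider {
        // Fall back to the default endpoint when the URL is omitted.
        Provider::Ollama { full_url } => full_url.as_deref().unwrap_or(DEFAULT_OLLAMA_URL),
        Provider::Custom { full_url } => full_url.as_str(),
    }
}

fn main() {
    let local = Provider::Ollama { full_url: None };
    let remote = Provider::Ollama {
        full_url: Some("http://10.0.0.5:11434/api/chat".to_string()),
    };
    let custom = Provider::Custom {
        full_url: "https://llm.example.com/v1/chat/completions".to_string(),
    };
    println!("{}", service_url(&local));
    println!("{}", service_url(&remote));
    println!("{}", service_url(&custom));
}
```

Modeling the optional URL as an `Option` keeps the common local setup short (`full_url: None` in the crate example) while still allowing a remote Ollama host.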