Crate transbot


With the transbot crate you can build a translation robot instance that translates documents (HTML, EPUB, MarkDown, and plain text are currently supported) by interacting with an LLM (Large Language Model).

Resuming is supported. Call TransBot::set_resuming_support to enable it. To save the intermediate state for later resuming when a job is interrupted with Ctrl+C, capture the system signal and call TransBot::set_interrupted so that the library can save the intermediate state and quit the current job. Note the following:
The interruption check is not performed in the middle of file I/O or an interaction with the LLM, only between such actions.
Files named <dest_path>.temp[.x] are used to save the intermediate state; no resuming is performed if they are removed.
TransConfig::syntax_strategy (and also TransConfig::text_chunk_size in the ‘bytransbot’ case) must stay the same between runs for resuming to work.
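The note about where the interruption check happens can be illustrated with a small, self-contained sketch. It uses only the standard library and is not the crate's implementation: the Job type, its fields, and the run method are hypothetical names chosen for this illustration. In a real program the flag would be set from a signal handler, and with transbot you would call TransBot::set_interrupted instead (its exact signature is not shown in this page).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a translation job; NOT the transbot API.
struct Job {
    chunks: Vec<String>,
    /// Chunks translated so far: the state a library could persist for resuming.
    done: Vec<String>,
    interrupted: Arc<AtomicBool>,
}

impl Job {
    fn new(chunks: Vec<String>, interrupted: Arc<AtomicBool>) -> Self {
        Job { chunks, done: Vec::new(), interrupted }
    }

    /// Processes the remaining chunks. Returns true when everything is done,
    /// false when the job stopped early because the flag was set.
    fn run(&mut self, translate: impl Fn(&str) -> String) -> bool {
        while self.done.len() < self.chunks.len() {
            // The flag is checked only between steps, never in the middle of one.
            if self.interrupted.load(Ordering::SeqCst) {
                return false; // a real job would save `done` to disk here
            }
            let next = &self.chunks[self.done.len()];
            let out = translate(next);
            self.done.push(out);
        }
        true
    }
}

fn main() {
    let flag = Arc::new(AtomicBool::new(false));
    let chunks = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    let mut job = Job::new(chunks, flag.clone());

    // Simulate Ctrl+C arriving while the second chunk is being processed.
    let signal = flag.clone();
    let finished = job.run(|s| {
        if s == "b" {
            signal.store(true, Ordering::SeqCst);
        }
        s.to_uppercase()
    });
    // The step in progress still completes; the job stops before the third chunk.
    assert!(!finished);
    assert_eq!(job.done, vec!["A".to_string(), "B".to_string()]);

    // Resuming: clear the flag and continue from the saved progress.
    flag.store(false, Ordering::SeqCst);
    assert!(job.run(|s| s.to_uppercase()));
    println!("translated chunks: {:?}", job.done);
}
```

Checking a shared flag only between steps keeps each file write or LLM request atomic from the job's point of view, at the cost of a delay between the signal and the actual stop.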

For all supported formats except EPUB (but including HTML inside EPUB), you can use the ‘whole_doc_to_llm’ option to tell transbot to send the whole document to the LLM for translation without transbot parsing or splitting it.

The syntax strategy makes sense only for HTML/MarkDown, and the ‘stripped’ strategy is not yet supported for MarkDown.

Below is an example of how to use the library crate.

use anyhow::Error;
use transbot::{LlmConfig, LlmProvider, PromptHint, SyntaxStrategy, TransBot, TransConfig};

fn main() -> Result<(), Error> {
    // Use a model served by Ollama; with `full_url: None` the default service URL is used.
    let llm_config = LlmConfig::new("translategemma:4b", LlmProvider::OLLAMA { full_url: None });
    // Prompt hints: the document topic and term translations to follow.
    let mut prompt_hint = PromptHint::new();
    prompt_hint.set_topic("Rust programming").set_extra_prompt(
        "Follow below term translation: \n\
        trait: 特型",
    );
    // Translate into Chinese, selecting the elements to translate with a CSS selector.
    let mut trans_config = TransConfig::new();
    trans_config
        .set_dest_lang("Chinese")
        .set_html_elem_selector("p,h1,h2,h3,li,code[class=\"c\"]")
        .set_syntax_strategy(SyntaxStrategy::MaintainedByTransBot)
        .set_prompt_hint(prompt_hint)
        .set_clean_cjk_ascii_spacing(true)
        .set_print_translating_text(true);
    // Build the translation robot and translate the file.
    let transbot = TransBot::new(&llm_config, &trans_config)?;
    transbot.translate_html_file("example.html", None)
}

Structs§

LlmConfig
The configuration for LLM interactions.
PromptHint
The prompt hint.
TransBot
The translation robot.
TransConfig
The configuration for translation.

Enums§

DocFormat
LlmApiStyle
The API style of the LLM, which defines the message structure used when interacting with the LLM. Most LLM providers offer an OpenAI-compatible API (although its full service URL differs slightly from the one for the native API). Refer to the API documentation of your LLM provider if needed.
LlmProvider
The LLM provider. For Ollama providers, an optional full service URL may be provided; ‘http://localhost:11434/api/chat’ is used if it is omitted. For custom providers, the API style and the full service URL must be provided.
SyntaxStrategy
The strategy for maintaining the syntax defined by sub-elements of the selected elements in the document. None of the options is perfect. Which one is suitable depends on how well the LLM maintains HTML tags, how many LLM tokens you want to spend, and whether losing the syntax is acceptable.
For example, given the text ‘See <a href="a_long_link">the blog</a> for details.’ in a paragraph to translate, the behavior of each variant is explained in that variant's documentation.
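To make the trade-off concrete, here is a minimal sketch of what losing the syntax could look like for the example sentence. It assumes that a ‘stripped’ strategy simply removes sub-element tags before the text is sent to the LLM; that is an assumption for illustration, the strip_tags function is hypothetical, and the crate's actual behavior is the one described in each variant's documentation.

```rust
/// Naive tag stripper, for illustration only; NOT part of the transbot API.
/// It drops everything between '<' and the next '>', so it would mishandle
/// attribute values that contain '>'.
fn strip_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for ch in html.chars() {
        match ch {
            '<' => in_tag = true,
            '>' if in_tag => in_tag = false,
            _ if !in_tag => out.push(ch),
            _ => {} // character inside a tag: skip it
        }
    }
    out
}

fn main() {
    let original = "See <a href=\"a_long_link\">the blog</a> for details.";
    let stripped = strip_tags(original);
    // The link target is lost, but the text sent for translation is shorter.
    println!("{stripped}");
    println!("{} bytes -> {} bytes", original.len(), stripped.len());
}
```

Under this assumption the stripped text is cheaper to translate and gives the LLM no tags to damage, but the link cannot be restored in the output, which is the "losing the syntax" cost mentioned above.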