This crate provides Rust bindings for OpenNMT/CTranslate2.
This crate provides the following:
- Rust bindings for the Translator and Generator provided by CTranslate2, specifically translator::Translator and generator::Generator.
- More user-friendly versions of these, Translator and Generator, which incorporate tokenizers for easier handling.
§Tokenizers
Both translator::Translator and generator::Generator work with sequences of tokens. To handle human-readable strings, a tokenizer is necessary. The Translator and Generator utilize Hugging Face and SentencePiece tokenizers to convert between strings and token sequences. The auto::Tokenizer automatically determines which tokenizer to use and constructs it appropriately.
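To make the division of labour concrete, the sketch below shows a tokenizer wrapped around a token-level model so that callers can pass strings in and get strings back. It is self-contained and does not use this crate: the trait SketchTokenizer, the WhitespaceTokenizer type, and the reverse_model function are hypothetical names invented for illustration, and the crate's own tokenizer trait may differ in names and signatures.

```rust
// Hypothetical trait for illustration only; it is NOT the trait defined by
// this crate, whose method names and signatures may differ.
trait SketchTokenizer {
    /// Convert a human-readable string into a sequence of tokens.
    fn encode(&self, input: &str) -> Vec<String>;
    /// Convert a sequence of tokens back into a human-readable string.
    fn decode(&self, tokens: &[String]) -> String;
}

/// Toy tokenizer that splits on whitespace (assumption: real tokenizers
/// use subword vocabularies instead).
struct WhitespaceTokenizer;

impl SketchTokenizer for WhitespaceTokenizer {
    fn encode(&self, input: &str) -> Vec<String> {
        input.split_whitespace().map(str::to_owned).collect()
    }

    fn decode(&self, tokens: &[String]) -> String {
        tokens.join(" ")
    }
}

/// Stand-in for a token-level model: it only reverses the token order.
fn reverse_model(tokens: Vec<String>) -> Vec<String> {
    tokens.into_iter().rev().collect()
}

/// String in, string out: encode, run the token-level model, decode.
fn run_pipeline<T: SketchTokenizer>(tokenizer: &T, input: &str) -> String {
    let tokens = tokenizer.encode(input);
    let output = reverse_model(tokens);
    tokenizer.decode(&output)
}
```

Calling run_pipeline(&WhitespaceTokenizer, "a b c") encodes the input into three tokens, passes them through the stand-in model, and decodes the result into a single string.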
§Example:
§auto::Tokenizer
Here is an example of using auto::Tokenizer to build a Translator and translate a string:
use ct2rs::config::Config;
use ct2rs::Translator;

// Translator::new creates a translator instance with auto::Tokenizer.
let t = Translator::new("/path/to/model", &Config::default())?;
let res = t.translate_batch(
    &vec!["Hallo World!"],
    &Default::default(),
    None,
)?;
for r in res {
    println!("{:?}", r);
}
§tokenizers::Tokenizer
The following example translates English to German and Japanese using the tokenizer provided by Hugging Face’s tokenizers crate.
use ct2rs::{TranslationOptions, Translator};
use ct2rs::config::Config;
use ct2rs::tokenizers::Tokenizer;

let path = "/path/to/model";
let t = Translator::with_tokenizer(&path, Tokenizer::new(&path)?, &Config::default())?;
let res = t.translate_batch_with_target_prefix(
    &vec![
        "Hello world!",
        "This library provides Rust bindings for CTranslate2.",
    ],
    &vec![vec!["deu_Latn"], vec!["jpn_Jpan"]],
    &TranslationOptions {
        return_scores: true,
        ..Default::default()
    },
    None
)?;
for r in res {
    println!("{}, (score: {:?})", r.0, r.1);
}
§sentencepiece::Tokenizer
The following example generates text using the tokenizer provided by the sentencepiece crate.
use ct2rs::config::Config;
use ct2rs::{Generator, GenerationOptions};
use ct2rs::sentencepiece::Tokenizer;

let path = "/path/to/model";
let g = Generator::with_tokenizer(&path, Tokenizer::new(&path)?, &Config::default())?;
let res = g.generate_batch(
    &vec!["prompt"],
    &GenerationOptions::default(),
    None,
)?;
for r in res {
    println!("{:?}", r.0);
}
§Supported Models
The ct2rs crate has been tested and confirmed to work with the following models:
- BART
- BLOOM
- FALCON
- Marian-MT
- MPT
- NLLB
- GPT-2
- GPT-J
- OPT
- T5
Please see the respective examples for each model.
§Stream API
This crate also offers a streaming API that utilizes callback closures. Please refer to the example code for more information.
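As a rough illustration of what a callback-based streaming interface looks like in general, the following self-contained sketch hands each produced token to a caller-supplied closure as soon as it is available. It does not use this crate: Step, generate_streaming, and the convention that the closure returns true to stop early are all assumptions made for this example, and the crate's actual streaming API may be shaped differently.

```rust
/// Hypothetical per-step payload passed to the callback; not a type from
/// this crate.
#[derive(Debug, Clone, PartialEq)]
struct Step {
    /// Zero-based position of this token in the output.
    index: usize,
    /// The token produced at this step.
    token: String,
    /// Whether this is the final token of the sequence.
    is_last: bool,
}

/// Stand-in generator: emits the given tokens one at a time and invokes
/// `callback` for each. Assumption for this sketch: the callback returns
/// `true` to request early termination. Returns the number of steps emitted.
fn generate_streaming<F>(tokens: &[&str], mut callback: F) -> usize
where
    F: FnMut(Step) -> bool,
{
    let mut emitted = 0;
    for (index, token) in tokens.iter().enumerate() {
        let step = Step {
            index,
            token: token.to_string(),
            is_last: index + 1 == tokens.len(),
        };
        emitted += 1;
        // Stop as soon as the caller asks for it.
        if callback(step) {
            break;
        }
    }
    emitted
}
```

Because the closure is FnMut, it can accumulate state, for example pushing each token into a Vec or printing it incrementally, without the generator needing to know how the output is consumed.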
Re-exports§
pub use crate::config::set_log_level;
pub use crate::config::set_random_seed;
pub use crate::generator::GenerationOptions;
pub use crate::translator::TranslationOptions;
Modules§
- This module provides a tokenizer that automatically determines the appropriate tokenizer.
- This module provides a tokenizer based on the Byte Pair Encoding (BPE) model.
- Configs and associated enums.
- This module provides Rust bindings for the ctranslate2::Generator.
- A module for utilizing the tokenizer based on the sentencepiece crate.
- A module for utilizing the tokenizer provided by Hugging Face’s tokenizers crate.
- This module provides a Rust binding to the ctranslate2::Translator.
Structs§
- The result for a single generation step.
- A text generator with a tokenizer.
- A text translator with a tokenizer.
Traits§
- Defines the necessary functions for a tokenizer.