Crate ct2rs

This crate provides Rust bindings for OpenNMT/CTranslate2.

This crate provides the following:

§Tokenizers

Both translator::Translator and generator::Generator operate on sequences of tokens, so a tokenizer is needed to work with human-readable strings. Translator and Generator use Hugging Face and SentencePiece tokenizers to convert between strings and token sequences. auto::Tokenizer determines which of these tokenizers a model needs and constructs it.
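As a rough illustration of what "automatically determines which tokenizer to use" can mean, the sketch below picks a tokenizer family by looking at the files in a model directory. It uses only the standard library. The `TokenizerKind` enum, the `detect_tokenizer` function, and the file names `tokenizer.json` and `spiece.model` are assumptions made for this example; they are not taken from ct2rs and may not match what auto::Tokenizer actually inspects.

```rust
use std::path::Path;

/// Tokenizer family that a model directory appears to ship.
/// Hypothetical type for illustration; not part of ct2rs.
#[derive(Debug, PartialEq, Eq)]
pub enum TokenizerKind {
    HuggingFace,
    SentencePiece,
    Unknown,
}

/// Guesses the tokenizer family from the files present in `model_dir`.
/// The file names checked here are assumptions, not the crate's real rules.
pub fn detect_tokenizer<P: AsRef<Path>>(model_dir: P) -> TokenizerKind {
    let dir = model_dir.as_ref();
    if dir.join("tokenizer.json").is_file() {
        // Assumed marker file for a Hugging Face tokenizer.
        TokenizerKind::HuggingFace
    } else if dir.join("spiece.model").is_file() {
        // Assumed marker file for a SentencePiece model.
        TokenizerKind::SentencePiece
    } else {
        TokenizerKind::Unknown
    }
}
```

Calling `detect_tokenizer("/path/to/model")` returns one of the three variants; a caller would then construct the matching tokenizer or report an error for `Unknown`.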

§Example:

§auto::Tokenizer

Here is an example of using auto::Tokenizer to build a Translator and translate a string:

use ct2rs::config::Config;
use ct2rs::Translator;

// Translator::new creates a translator instance with auto::Tokenizer.
let t = Translator::new("/path/to/model", &Config::default())?;
let res = t.translate_batch(
    &vec!["Hello world!"],
    &Default::default(),
    None,
)?;
for r in res {
    println!("{:?}", r);
}

§tokenizers::Tokenizer

The following example translates English to German and Japanese using the tokenizer provided by Hugging Face's tokenizers crate.

use ct2rs::{TranslationOptions, Translator};
use ct2rs::config::Config;
use ct2rs::tokenizers::Tokenizer;

let path = "/path/to/model";
let t = Translator::with_tokenizer(&path, Tokenizer::new(&path)?, &Config::default())?;
let res = t.translate_batch_with_target_prefix(
    &vec![
        "Hello world!",
        "This library provides Rust bindings for CTranslate2.",
    ],
    &vec![vec!["deu_Latn"], vec!["jpn_Jpan"]],
    &TranslationOptions {
        return_scores: true,
        ..Default::default()
    },
    None
)?;
for r in res {
    println!("{}, (score: {:?})", r.0, r.1);
}

§sentencepiece::Tokenizer

The following example generates text using the tokenizer provided by the sentencepiece crate.

use ct2rs::config::Config;
use ct2rs::{Generator, GenerationOptions};
use ct2rs::sentencepiece::Tokenizer;

let path = "/path/to/model";
let g = Generator::with_tokenizer(&path, Tokenizer::new(&path)?, &Config::default())?;
let res = g.generate_batch(
    &vec!["prompt"],
    &GenerationOptions::default(),
    None,
)?;
for r in res {
    println!("{:?}", r.0);
}

§Supported Models

The ct2rs crate has been tested and confirmed to work with the following models:

  • BART
  • BLOOM
  • Falcon
  • Marian-MT
  • MPT
  • NLLB
  • GPT-2
  • GPT-J
  • OPT
  • T5

Please see the respective examples for each model.

§Stream API

This crate also offers a streaming API that utilizes callback closures. Please refer to the example code for more information.
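The general shape of a callback-driven streaming interface can be sketched with the standard library alone. The function below is a hypothetical stand-in, not the ct2rs streaming API: `stream_tokens`, its parameters, and the convention that the closure returns `true` to stop early are all assumptions chosen for illustration.

```rust
/// Passes each token to `callback` as soon as it is "produced" and returns
/// the text emitted so far. The callback returns `true` to request an early stop.
/// Hypothetical helper for illustration; not the ct2rs signature.
pub fn stream_tokens<F>(tokens: &[&str], mut callback: F) -> String
where
    F: FnMut(&str) -> bool,
{
    let mut out = String::new();
    for &tok in tokens {
        // Append the token first so the stopping token is part of the result.
        out.push_str(tok);
        if callback(tok) {
            break;
        }
    }
    out
}
```

A caller might pass `|t| { print!("{t}"); false }` to print tokens as they arrive, or a closure that returns `true` once an end marker is seen to cut generation short.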

Traits§

  • Defines the necessary functions for a tokenizer.
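To make the idea of "the necessary functions for a tokenizer" concrete, here is a minimal sketch of an encode/decode interface with a whitespace-based implementation. `ToyTokenizer` and `WhitespaceTokenizer` are hypothetical names introduced for this example, and the method signatures are assumptions; the actual ct2rs trait may differ, for instance in its error handling.

```rust
/// Hypothetical tokenizer interface for illustration; names and signatures
/// are assumptions and may differ from the trait defined in ct2rs.
pub trait ToyTokenizer {
    /// Splits a string into a sequence of tokens.
    fn encode(&self, input: &str) -> Vec<String>;
    /// Joins a sequence of tokens back into a string.
    fn decode(&self, tokens: &[String]) -> String;
}

/// Toy implementation that treats whitespace-separated words as tokens.
pub struct WhitespaceTokenizer;

impl ToyTokenizer for WhitespaceTokenizer {
    fn encode(&self, input: &str) -> Vec<String> {
        input.split_whitespace().map(str::to_owned).collect()
    }

    fn decode(&self, tokens: &[String]) -> String {
        // Runs of whitespace in the original input are not preserved.
        tokens.join(" ")
    }
}
```

Real tokenizers operate on subword units rather than whole words, but the round trip has the same structure: strings go in through `encode`, token sequences go through the model, and `decode` turns the result back into text.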