Crate auto_encoder

§Auto Encoder

auto_encoder is a Rust library that detects text encodings, identifies known binary file formats, and encodes content for specific languages.

§Features

  • Automatic Encoding Detection: Detects text encoding based on locale or content.
  • Binary Format Detection: Checks if a given file is a known binary format by inspecting its initial bytes.
  • HTML Language Detection: Extracts and detects the language of an HTML document from its content.

§Usage

Here are a few quick examples to get you started:

§Encoding Detection

Automatically detect the encoding for a given locale:

use auto_encoder::encoding_for_locale;

let encoding = encoding_for_locale("ja-jp").unwrap();
println!("Encoding for Japanese locale: {:?}", encoding);
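The idea of a locale lookup can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: the function name legacy_encoding_label, the normalization step, and the small table of labels are all assumptions made for the example. It returns an Option so that an unknown locale can be handled without the panic that unwrap() would cause.

```rust
/// Hypothetical helper (not part of auto_encoder): maps a locale string to a
/// commonly used legacy encoding label. The table is illustrative only.
fn legacy_encoding_label(locale: &str) -> Option<&'static str> {
    // Normalize "JA_JP", " ja-jp " and similar spellings to "ja-jp".
    let normalized = locale.trim().to_ascii_lowercase().replace('_', "-");
    let label = match normalized.as_str() {
        "ja" | "ja-jp" => "shift_jis",
        "ko" | "ko-kr" => "euc-kr",
        "ru" | "ru-ru" => "windows-1251",
        "zh-cn" => "gbk",
        "zh-tw" => "big5",
        // Unknown locales are reported as None rather than guessed.
        _ => return None,
    };
    Some(label)
}
```

A caller can then branch on the result, for example with if let Some(label) = legacy_encoding_label("ja-jp"), and fall back to a default when the locale is not in the table.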

Encode the bytes of HTML content using the encoding associated with a language:

use auto_encoder::encode_bytes_from_language;

let html_content = b"\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF\xE3\x80\x81\xE4\xB8\x96\xE7\x95\x8C\xEF\xBC\x81";
let encoded = encode_bytes_from_language(html_content, "ja");
println!("Encoded content: {}", encoded);
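The sample bytes above are the UTF-8 encoding of "こんにちは、世界！". When the encoding of incoming bytes is unknown, it can help to check whether they are already valid UTF-8 before choosing a language-specific encoding. The sketch below uses only the standard library; describe_bytes is a hypothetical helper for illustration and says nothing about how auto_encoder treats such input.

```rust
/// Hypothetical helper (not part of auto_encoder): reports whether `bytes`
/// form valid UTF-8 and, if not, how many leading bytes were valid.
fn describe_bytes(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(text) => format!("valid UTF-8: {}", text),
        Err(err) => format!("not UTF-8 (valid up to byte {})", err.valid_up_to()),
    }
}
```

Bytes that fail this check, such as the Shift_JIS sequence 0x82 0xB1, are candidates for decoding with a legacy encoding chosen from the language or locale.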

§Binary Format Detection

Check if a given file content is a known binary format:

use auto_encoder::is_binary_file;

let file_content = &[0xFF, 0xD8, 0xFF]; // JPEG file signature
let is_binary = is_binary_file(file_content);
println!("Is the file a known binary format? {}", is_binary);
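Checking initial bytes against known signatures ("magic numbers") can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the crate's code: the SIGNATURES table and the sniff_format function are hypothetical, and the set of formats that is_binary_file actually recognizes is not listed here.

```rust
/// Hypothetical signature table: (format name, leading bytes).
const SIGNATURES: &[(&str, &[u8])] = &[
    ("jpeg", &[0xFF, 0xD8, 0xFF]),
    ("png", &[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]),
    ("gif", b"GIF8"),
    ("pdf", b"%PDF-"),
    ("zip", &[b'P', b'K', 0x03, 0x04]),
];

/// Returns the name of the first signature that `content` starts with.
fn sniff_format(content: &[u8]) -> Option<&'static str> {
    for (name, magic) in SIGNATURES {
        if content.starts_with(magic) {
            return Some(*name);
        }
    }
    None
}
```

A boolean check in the spirit of the example above is then sniff_format(content).is_some(). Because only a prefix is compared, the caller needs to read just the first few bytes of a file rather than its whole content.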

§HTML Language Detection

Detect the language attribute from an HTML document:

use auto_encoder::detect_language;

let html_content = br#"<html lang="en"><head><title>Test</title></head><body></body></html>"#;
let language = detect_language(html_content).unwrap();
println!("Language detected: {}", language);
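Reading the lang attribute from the opening html tag can be sketched with the standard library. This is a simplified illustration and an assumption about one possible approach: html_lang is a hypothetical function, it only handles a quoted attribute preceded by a single space, and detect_language may use different or additional signals.

```rust
/// Hypothetical helper (not part of auto_encoder): returns the quoted value
/// of the `lang` attribute on the first `<html ...>` tag, if present.
fn html_lang(html: &[u8]) -> Option<String> {
    let text = String::from_utf8_lossy(html);
    // ASCII lowercasing keeps byte offsets identical to `text`.
    let lower = text.to_ascii_lowercase();

    // Locate the opening tag, from "<html" up to (not including) '>'.
    let tag_start = lower.find("<html")?;
    let tag_end = tag_start + lower[tag_start..].find('>')?;
    let tag = &text[tag_start..tag_end];
    let lower_tag = &lower[tag_start..tag_end];

    // Find the attribute case-insensitively, then read the original casing.
    let attr = lower_tag.find(" lang=")? + " lang=".len();
    let rest = &tag[attr..];
    let quote = rest.chars().next()?;
    if quote != '"' && quote != '\'' {
        return None; // Unquoted values are out of scope for this sketch.
    }
    let value = &rest[1..];
    let end = value.find(quote)?;
    Some(value[..end].to_string())
}
```

Returning an Option lets the caller handle documents without a lang attribute explicitly instead of calling unwrap().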

Statics§

Functions§

  • Detect the language of an HTML resource. This does nothing without the “encoding” flag enabled.
  • Get the content with proper encoding. Pass in a proper encoding label like SHIFT_JIS.
  • Get the content with proper encoding from a language. Pass in a proper language like “ja”. This does nothing without the “encoding” flag.
  • Get the encoding for the locale, if found.
  • Checks if the file is a known binary format using its initial bytes.