compact-enc-det 0.1.0

Rust bindings for the Compact Encoding Detection (CED) library: detect character encodings in text

compact-enc-det

High-level Rust bindings for Compact Encoding Detection (CED), a library for detecting character encodings in text.

Features

  • Safe, ergonomic Rust API
  • Type-safe enums for encodings, languages, and corpus types
  • Support for 75+ character encodings
  • Reliability indicator for each detection result
  • Optional hints for improved detection accuracy

Installation

[dependencies]
compact-enc-det = "0.1"

Usage

use compact_enc_det::{detect_encoding, DetectHints, Encoding};

fn main() {
    let text = "Hello, world! 你好世界";
    let detection = detect_encoding(text.as_bytes(), DetectHints::default());

    println!("Detected: {:?}", detection.encoding);
    println!("MIME name: {}", detection.mime_name);
    println!("Reliable: {}", detection.is_reliable);

    assert_eq!(detection.encoding, Encoding::UTF8);
}
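
The example above prints `is_reliable` but does not act on it. One possible way to use that flag is to fall back to a default charset when the detector is unsure. The following is a minimal sketch, not part of the crate: `Detection` here is a local stand-in declared only so the snippet compiles on its own, its field types are assumptions based on how the fields are printed above, and `charset_or_fallback` is a hypothetical helper.

```rust
/// Local stand-in for the crate's detection result, so this sketch is
/// self-contained. Field names follow the usage example; the field types
/// (String, bool) are assumptions, not the crate's confirmed definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct Detection {
    pub mime_name: String,
    pub is_reliable: bool,
}

/// Hypothetical helper: returns the detected MIME name when the result is
/// marked reliable, and the caller-supplied fallback otherwise.
pub fn charset_or_fallback<'a>(detection: &'a Detection, fallback: &'a str) -> &'a str {
    if detection.is_reliable {
        detection.mime_name.as_str()
    } else {
        fallback
    }
}
```

With the real crate, the same check would presumably be applied to the value returned by `detect_encoding`. The fallback charset is an application decision, for example a site-wide default.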

Detection with Hints

use compact_enc_det::{detect_encoding, DetectHints, Language, TextCorpusType};

let hints = DetectHints {
    url_hint: "https://example.jp/page.html",
    language_hint: Some(Language::JAPANESE),
    corpus_type: TextCorpusType::WEB_CORPUS,
    ..Default::default()
};

// `japanese_text` is the byte slice to classify (not defined in this snippet).
let detection = detect_encoding(japanese_text, hints);

Documentation

For complete documentation, see docs.rs/compact-enc-det.

License

MIT License; see the LICENSE file for details.

The underlying C++ library is licensed under the Apache License 2.0.