zantetsu 0.1.3

Zantetsu

Ultra-fast, intelligent library for anime metadata extraction and normalization.

Features

  • Heuristic Parsing: Regex-based parsing for fast, reliable extraction (production-ready)
  • Neural CRF: DistilBERT + CRF model for accurate sequence labeling (early stage)
  • Character CNN: CNN + BiLSTM + CRF for robust character-level parsing (in development)
  • Semantic Search: HNSW vector index for title matching
  • Quality Scoring: Configurable quality profiles for release validation
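
To give a feel for what heuristic extraction involves, here is a minimal sketch that pulls the group, title, episode, and bracketed tags out of names shaped like `[Group] Title - 01 [tag][tag].ext`. It is not Zantetsu's implementation: it uses plain string splitting from the standard library as a stand-in for regex rules, and `ParsedName` and `parse_release_name` are hypothetical names invented for this example.

```rust
/// Fields pulled from a release filename. This struct is illustrative only
/// and is not Zantetsu's actual result type.
#[derive(Debug, Default, PartialEq)]
pub struct ParsedName {
    pub group: Option<String>,
    pub title: Option<String>,
    pub episode: Option<u32>,
    pub tags: Vec<String>,
}

/// Parse names shaped like `[Group] Title - 01 [tag][tag].ext`.
pub fn parse_release_name(input: &str) -> ParsedName {
    let mut parsed = ParsedName::default();
    let mut rest = input.trim();

    // A leading `[...]` block is treated as the release group.
    if let Some(after_open) = rest.strip_prefix('[') {
        if let Some(close) = after_open.find(']') {
            parsed.group = Some(after_open[..close].trim().to_string());
            rest = after_open[close + 1..].trim_start();
        }
    }

    // Everything before the next `[` is expected to be "Title - Episode".
    let (head, tail) = match rest.find('[') {
        Some(idx) => rest.split_at(idx),
        None => (rest, ""),
    };

    // Split on the last " - " so that titles containing dashes stay intact.
    match head.trim().rsplit_once(" - ") {
        Some((title, ep)) if ep.trim().parse::<u32>().is_ok() => {
            parsed.title = Some(title.trim().to_string());
            parsed.episode = ep.trim().parse().ok();
        }
        _ => {
            // No numeric episode found: keep the whole head as the title.
            let title = head.trim();
            if !title.is_empty() {
                parsed.title = Some(title.to_string());
            }
        }
    }

    // Collect every remaining `[...]` block as a free-form tag.
    let mut remaining = tail;
    while let Some(open) = remaining.find('[') {
        let after = &remaining[open + 1..];
        match after.find(']') {
            Some(close) => {
                parsed.tags.push(after[..close].trim().to_string());
                remaining = &after[close + 1..];
            }
            None => break,
        }
    }
    parsed
}
```

Calling `parse_release_name` on the filename used in the Quick Start yields the group `SubsPlease`, the title `Cowboy Bebop`, episode 1, and the tags `1080p` and `HEVC`. A real heuristic parser would also need to classify those tags (resolution, codec, and so on) and cope with many more naming schemes.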

Quick Start

use zantetsu::{EpisodeSpec, Zantetsu};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Construct the engine, then parse a release filename.
    let engine = Zantetsu::new()?;
    let result = engine.parse("[SubsPlease] Cowboy Bebop - 01 [1080p][HEVC].mkv")?;

    // Extracted fields are optional; check the ones this name should yield.
    assert_eq!(result.title.as_deref(), Some("Cowboy Bebop"));
    assert_eq!(result.group.as_deref(), Some("SubsPlease"));
    assert_eq!(result.episode, Some(EpisodeSpec::Single(1)));
    assert!(result.resolution.is_some());
    Ok(())
}

Architecture

Zantetsu combines multiple parsing strategies:

  1. Heuristic Parser — Fast regex-based parsing (production-ready, 92.38% accuracy)
  2. Neural CRF — DistilBERT + CRF with Viterbi decoding (early stage)
  3. Character CNN — Lightweight CNN + BiLSTM + CRF with RAD augmentations (in development)
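
Both neural parsers end in a CRF layer, and the list above names Viterbi decoding as the way the best label sequence is recovered. The following is a generic textbook sketch of Viterbi decoding for a linear-chain CRF, not code taken from Zantetsu. The function name `viterbi` and the score layout (`emissions[t][k]`, `transitions[j][k]`, added as log-scores) are assumptions made for this example.

```rust
/// Return the highest-scoring label sequence for a linear-chain CRF.
///
/// `emissions[t][k]` is the score of label `k` at position `t`.
/// `transitions[j][k]` is the score of moving from label `j` to label `k`.
/// Scores are summed along a path, i.e. they are treated as log-potentials.
pub fn viterbi(emissions: &[Vec<f32>], transitions: &[Vec<f32>]) -> Vec<usize> {
    if emissions.is_empty() || emissions[0].is_empty() {
        return Vec::new();
    }
    let num_labels = emissions[0].len();

    // score[k] = best score of any path that ends in label k at the current step.
    let mut score: Vec<f32> = emissions[0].clone();
    // backptr[t - 1][k] = predecessor label on the best path ending in k at step t.
    let mut backptr: Vec<Vec<usize>> = Vec::with_capacity(emissions.len() - 1);

    for emission in &emissions[1..] {
        let mut next = vec![f32::NEG_INFINITY; num_labels];
        let mut ptr = vec![0usize; num_labels];
        for k in 0..num_labels {
            for j in 0..num_labels {
                let candidate = score[j] + transitions[j][k] + emission[k];
                if candidate > next[k] {
                    next[k] = candidate;
                    ptr[k] = j;
                }
            }
        }
        score = next;
        backptr.push(ptr);
    }

    // Pick the best final label, then follow the back-pointers to the start.
    let mut best = 0;
    for k in 1..num_labels {
        if score[k] > score[best] {
            best = k;
        }
    }
    let mut path = vec![best];
    for ptr in backptr.iter().rev() {
        best = ptr[best];
        path.push(best);
    }
    path.reverse();
    path
}
```

The point of decoding jointly rather than picking the best label per token is that transition scores can veto implausible sequences: with a strong penalty for switching labels, a single token that weakly prefers another label is overruled by its neighbours.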

The engine automatically selects the best parser based on availability and confidence.
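
One way to picture selection by availability and confidence is sketched below. This is an assumption about the general shape of such logic, not Zantetsu's actual code: the `Backend` trait, the `Candidate` struct, `FixedBackend`, and `select_best` are all hypothetical names, and the rule shown (skip unavailable parsers, keep the most confident result) is only one plausible policy.

```rust
/// A parse attempt together with the parser's self-reported confidence.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub parser: &'static str,
    pub title: String,
    pub confidence: f32,
}

/// Hypothetical common interface for the parsing strategies.
pub trait Backend {
    fn name(&self) -> &'static str;
    /// For example, false when model weights have not been loaded.
    fn is_available(&self) -> bool;
    /// Returns the extracted title and a confidence in the range 0.0 to 1.0.
    fn parse(&self, input: &str) -> Option<(String, f32)>;
}

/// Run every available backend and keep the most confident result.
pub fn select_best(backends: &[&dyn Backend], input: &str) -> Option<Candidate> {
    backends
        .iter()
        .filter(|b| b.is_available())
        .filter_map(|b| {
            b.parse(input).map(|(title, confidence)| Candidate {
                parser: b.name(),
                title,
                confidence,
            })
        })
        .max_by(|a, b| a.confidence.total_cmp(&b.confidence))
}

/// Stand-in backend that echoes the trimmed input with a fixed confidence,
/// used here only to exercise the selection logic.
pub struct FixedBackend {
    pub label: &'static str,
    pub available: bool,
    pub confidence: f32,
}

impl Backend for FixedBackend {
    fn name(&self) -> &'static str {
        self.label
    }
    fn is_available(&self) -> bool {
        self.available
    }
    fn parse(&self, input: &str) -> Option<(String, f32)> {
        Some((input.trim().to_string(), self.confidence))
    }
}
```

With three `FixedBackend` values where the most confident one is marked unavailable, `select_best` skips it and returns the result of the next most confident backend. An engine could equally try the cheap heuristic parser first and fall back to a model only when confidence is low; the source does not say which policy is used.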

Crates

  • zantetsu-core — Parsing engine (heuristic + neural + character CNN)
  • zantetsu-vecdb — Semantic vector search with HNSW
  • zantetsu-trainer — Model training and RLAIF workflows
  • zantetsu-ffi — Multi-language bindings (TypeScript, Python, C/C++)