Zantetsu
Ultra-fast, intelligent library for anime metadata extraction and normalization.
Features
- Heuristic Parsing: Regex-based parsing for fast, reliable extraction
- Neural CRF: DistilBERT + CRF model for accurate sequence labeling
- Character CNN: CNN + BiLSTM + CRF for robust character-level parsing (in development)
- Semantic Search: HNSW vector index for title matching
- Quality Scoring: Configurable quality profiles for release validation
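To make the heuristic idea concrete, here is a minimal sketch of rule-based extraction from a release name. It is not Zantetsu's implementation or API: `ParsedRelease` and `parse_release` are hypothetical names, the input shape `[Group] Title - NN (RES).ext` is an assumed convention, and plain `std` string operations stand in for the regex rules so the example has no dependencies.

```rust
/// Hypothetical result type for illustration only; not Zantetsu's actual API.
#[derive(Debug, Default, PartialEq)]
pub struct ParsedRelease {
    pub group: Option<String>,
    pub title: Option<String>,
    pub episode: Option<u32>,
    pub resolution: Option<String>,
}

/// Parses names shaped like "[Group] Title - 05 (1080p).mkv" (assumed format).
/// Fields that cannot be recognized are left as `None`.
pub fn parse_release(name: &str) -> ParsedRelease {
    let mut out = ParsedRelease::default();
    let mut rest = name.trim();

    // Rule 1: a leading "[Group]" tag names the release group.
    if let Some(stripped) = rest.strip_prefix('[') {
        if let Some(end) = stripped.find(']') {
            out.group = Some(stripped[..end].trim().to_string());
            rest = stripped[end + 1..].trim_start();
        }
    }

    // Rule 2: the first token of the form "<digits>p" (e.g. "1080p") is the resolution.
    out.resolution = rest
        .split(|c: char| !c.is_ascii_alphanumeric())
        .find(|tok| {
            tok.len() >= 4
                && tok.ends_with('p')
                && tok[..tok.len() - 1].chars().all(|c| c.is_ascii_digit())
        })
        .map(str::to_string);

    // Rule 3: split on the last " - "; leading digits after it are the episode,
    // and everything before it is the title.
    if let Some(idx) = rest.rfind(" - ") {
        let digits: String = rest[idx + 3..]
            .chars()
            .take_while(|c| c.is_ascii_digit())
            .collect();
        if let Ok(ep) = digits.parse::<u32>() {
            out.episode = Some(ep);
            out.title = Some(rest[..idx].trim().to_string());
        }
    }

    out
}
```

Calling `parse_release("[Group] Some Title - 05 (1080p).mkv")` on this made-up file name fills all four fields. A name without the ` - NN` pattern leaves `title` and `episode` as `None`, which is the kind of case where a learned sequence labeler could do better than fixed rules.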
Quick Start
use ;
let engine = new?;
let result = engine.parse?;
assert_eq!;
assert_eq!;
assert_eq!;
assert!;
# Ok::
Architecture
Zantetsu combines multiple parsing strategies:
- Heuristic Parser — Fast regex-based parsing (production-ready, 92.38% accuracy)
- Neural CRF — DistilBERT + CRF with Viterbi decoding (early stage)
- Character CNN — Lightweight CNN + BiLSTM + CRF with RAD augmentations (in development)
The engine automatically selects the best parser based on availability and confidence.
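One way to picture selection by availability and confidence is sketched below. This is an assumption about how such a selector could look, not the engine's actual code: the `Parser` trait, `Candidate` struct, `Stub` type, and `select_best` function are all hypothetical, and confidence is assumed to be a score between 0.0 and 1.0.

```rust
/// Hypothetical output of one parsing strategy.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub parser: &'static str,
    pub title: String,
    /// Assumed to lie in 0.0..=1.0.
    pub confidence: f32,
}

/// Hypothetical strategy interface; not Zantetsu's real trait.
pub trait Parser {
    /// Whether the strategy can run at all (e.g. model weights are loaded).
    fn is_available(&self) -> bool;
    /// Returns `None` when the strategy cannot parse the input.
    fn parse(&self, input: &str) -> Option<Candidate>;
}

/// Runs every available parser and keeps the most confident result.
pub fn select_best(parsers: &[Box<dyn Parser>], input: &str) -> Option<Candidate> {
    parsers
        .iter()
        .filter(|p| p.is_available())
        .filter_map(|p| p.parse(input))
        .max_by(|a, b| a.confidence.total_cmp(&b.confidence))
}

/// Stub strategy with a fixed confidence, used only to exercise the selector.
pub struct Stub {
    pub name: &'static str,
    pub available: bool,
    pub confidence: f32,
}

impl Parser for Stub {
    fn is_available(&self) -> bool {
        self.available
    }

    fn parse(&self, input: &str) -> Option<Candidate> {
        Some(Candidate {
            parser: self.name,
            title: input.trim().to_string(),
            confidence: self.confidence,
        })
    }
}
```

With three stubs, where the highest-confidence one is marked unavailable, `select_best` skips it and returns the best remaining candidate. A real engine might instead try parsers in a fixed order and stop at the first result above a threshold; the source does not say which policy Zantetsu uses.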
Crates
- zantetsu-core: Parsing engine (heuristic + neural + character CNN)
- zantetsu-vecdb: Semantic vector search with HNSW
- zantetsu-trainer: Model training and RLAIF workflows
- zantetsu-ffi: Multi-language bindings (TypeScript, Python, C/C++)