# jmdict-fast
> **Blazing-fast Japanese dictionary engine**
> **Note:** This crate uses [bunpo](https://github.com/theGlenn/jmdict-fst/tree/main/bunpo) for Japanese conjugation handling. Both crates are part of the same monorepo but are published separately to crates.io.
---
## Features
- **Compile-time indexed data**: FST + binary blob for maximum efficiency
- **Instant lookups**: O(log n) exact matching across all writing systems
- **Multimodal search**: Kanji, kana, and romaji support
- **Ergonomic Rust API**: Usable as a library or binary
- **Tiny binary**: Zero runtime parsing, no allocations during lookup
- **Memory-mapped**: Zero-copy access to all dictionary data
---
## Performance at a Glance
| Metric | Value |
| --- | --- |
| **Index Size** | ~888KB (FSTs) |
| **Data Size** | 16MB binary blob |
| **Entries** | 22,569 |
| **Unique Keys** | 24,342 |
| **Lookup Speed** | O(log n), instant |
| **Memory Usage** | Memory-mapped, zero allocations |
---
## Quick Start
### Building the Dictionary
```bash
cargo build
```
This creates:
- `OUT_DIR/kanji.fst`: Kanji lookup index
- `OUT_DIR/kana.fst`: Kana lookup index
- `OUT_DIR/romaji.fst`: Romaji lookup index
- `OUT_DIR/entries.bin`: Binary blob with all entries
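
Because the dictionary is loaded from a directory containing these four files, it can help to confirm they are all present before loading. The following is a minimal sketch using only the standard library; `ARTIFACTS` and `missing_artifacts` are hypothetical names for illustration and are not part of the crate.

```rust
use std::path::{Path, PathBuf};

/// File names listed above as build outputs.
pub const ARTIFACTS: [&str; 4] = ["kanji.fst", "kana.fst", "romaji.fst", "entries.bin"];

/// Returns the expected artifact paths that do not exist under `base_dir`.
/// (Hypothetical helper, not part of jmdict-fast.)
pub fn missing_artifacts(base_dir: &Path) -> Vec<PathBuf> {
    ARTIFACTS
        .iter()
        .map(|name| base_dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}
```

An empty result means the directory looks complete; otherwise the returned paths can be reported in an error message before attempting to load.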
### Using the Library
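#### Prefix Search
```rust
use jmdict_fast::Dict;

fn main() -> anyhow::Result<()> {
    // Load the dictionary data produced by the build step.
    let dict = Dict::load_default()?;

    // Prefix search: returns entries whose key starts with "こんに".
    let results = dict.lookup_partial("こんに");
    for entry in &results {
        println!("Found: {:?}", entry.kanji);
        println!("Reading: {:?}", entry.kana);
        // `first()` avoids a panic if an entry has no senses.
        if let Some(sense) = entry.sense.first() {
            println!("Meanings: {:?}", sense.gloss);
        }
    }
    Ok(())
}
```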
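#### Exact Search
```rust
use jmdict_fast::Dict;

fn main() -> anyhow::Result<()> {
    // Load the dictionary data produced by the build step.
    let dict = Dict::load_default()?;

    // Exact search: the key must match the whole term.
    let results = dict.lookup_exact("こんにちは");
    for entry in &results {
        println!("Found: {:?}", entry.kanji);
        println!("Reading: {:?}", entry.kana);
        // `first()` avoids a panic if an entry has no senses.
        if let Some(sense) = entry.sense.first() {
            println!("Meanings: {:?}", sense.gloss);
        }
    }
    Ok(())
}
```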
---
## Data Structure
```
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│    kanji.fst     │    │     kana.fst     │    │    romaji.fst    │
│     (243KB)      │    │     (257KB)      │    │     (388KB)      │
│                  │    │                  │    │                  │
│ 漢字 → Entry ID  │    │ かな → Entry ID  │    │ romaji → Entry ID│
└──────────────────┘    └──────────────────┘    └──────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                        ┌──────────────────┐
                        │   entries.bin    │
                        │      (16MB)      │
                        │                  │
                        │   Offset Table   │
                        │  + JSON Entries  │
                        └──────────────────┘
```
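
The "Offset Table + JSON Entries" idea can be illustrated with a small standard-library sketch. The actual on-disk format of `entries.bin` is not specified in this README, so the layout below (a `u32` count, then `u32` offsets, then the payload) and the names `build_blob` and `read_entry` are assumptions made purely for illustration.

```rust
/// Hypothetical layout (an assumption, not the crate's real format):
/// [count: u32 LE][count + 1 offsets: u32 LE each][payload bytes]
/// Entry `i` occupies payload[offset[i]..offset[i + 1]].
pub fn build_blob(entries: &[&str]) -> Vec<u8> {
    let mut blob = Vec::new();
    blob.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    // Write count + 1 offsets so every entry has an explicit end.
    let mut offset = 0u32;
    for entry in entries {
        blob.extend_from_slice(&offset.to_le_bytes());
        offset += entry.len() as u32;
    }
    blob.extend_from_slice(&offset.to_le_bytes());
    // Append the serialized entries back to back.
    for entry in entries {
        blob.extend_from_slice(entry.as_bytes());
    }
    blob
}

/// Reads a little-endian u32 at `pos`, or None if out of bounds.
fn read_u32(blob: &[u8], pos: usize) -> Option<u32> {
    let b = blob.get(pos..pos + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the serialized entry with the given ID as a borrowed slice,
/// touching only the offset table and that entry's bytes.
pub fn read_entry(blob: &[u8], id: u32) -> Option<&str> {
    let count = read_u32(blob, 0)?;
    if id >= count {
        return None;
    }
    let table = 4; // the offset table starts right after the count
    let start = read_u32(blob, table + 4 * id as usize)? as usize;
    let end = read_u32(blob, table + 4 * (id as usize + 1))? as usize;
    let payload = table + 4 * (count as usize + 1);
    let bytes = blob.get(payload + start..payload + end)?;
    std::str::from_utf8(bytes).ok()
}
```

With a layout like this, resolving an entry ID costs two offset reads and one slice, which is why the same bytes can be served directly from a memory-mapped file.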
---
## API Reference
- `Dict::load<P: AsRef<Path>>(base_dir: P) -> Result<Self>`: Loads the dictionary from the specified directory.
- `dict.lookup_exact(term: &str) -> Vec<Entry>`: Performs an exact lookup across all writing systems.
**Entry Structure:**
```rust
pub struct Entry {
    pub id: String,             // JMdict entry ID
    pub kanji: Vec<KanjiEntry>, // Kanji forms
    pub kana: Vec<KanaEntry>,   // Kana readings
    pub sense: Vec<SenseEntry>, // Meanings and metadata
}

pub struct KanjiEntry {
    pub common: bool,      // Is this a common kanji?
    pub text: String,      // The kanji text
    pub tags: Vec<String>, // JMdict tags
}

pub struct KanaEntry {
    pub common: bool,                  // Is this a common reading?
    pub text: String,                  // The kana text
    pub tags: Vec<String>,             // JMdict tags
    pub applies_to_kanji: Vec<String>, // Which kanji this applies to
}

pub struct SenseEntry {
    pub part_of_speech: Vec<String>,   // Grammatical information
    pub applies_to_kanji: Vec<String>, // Which kanji this sense applies to
    pub applies_to_kana: Vec<String>,  // Which kana this sense applies to
    pub gloss: Vec<GlossEntry>,        // English translations
    // ... other JMdict fields
}
```
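
As a rough illustration of working with these fields, the sketch below picks a display headword and finds the readings that apply to a given kanji form. It uses trimmed local copies of the structs so that it compiles on its own; `headword` and `readings_for` are hypothetical helpers, and treating `"*"` in `applies_to_kanji` as "applies to every kanji form" is an assumption borrowed from the jmdict-simplified JSON convention that may not hold for this crate.

```rust
// Local mirrors of the documented structs, trimmed to the fields used here.
pub struct KanjiEntry {
    pub common: bool,
    pub text: String,
}

pub struct KanaEntry {
    pub text: String,
    pub applies_to_kanji: Vec<String>,
}

pub struct Entry {
    pub kanji: Vec<KanjiEntry>,
    pub kana: Vec<KanaEntry>,
}

/// Preferred display form: the first common kanji form, else the first
/// kanji form, else the first kana reading. (Hypothetical helper.)
pub fn headword(entry: &Entry) -> Option<&str> {
    entry
        .kanji
        .iter()
        .find(|k| k.common)
        .or_else(|| entry.kanji.first())
        .map(|k| k.text.as_str())
        .or_else(|| entry.kana.first().map(|k| k.text.as_str()))
}

/// Readings that apply to `kanji_text`.
/// Assumption: "*" means the reading applies to every kanji form.
pub fn readings_for<'a>(entry: &'a Entry, kanji_text: &str) -> Vec<&'a str> {
    entry
        .kana
        .iter()
        .filter(|k| {
            k.applies_to_kanji
                .iter()
                .any(|a| a.as_str() == "*" || a.as_str() == kanji_text)
        })
        .map(|k| k.text.as_str())
        .collect()
}
```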
---
## Development
### Caching System
The build script implements a robust caching system to avoid re-downloading the large JMdict dataset. See [CACHING.md](./CACHING.md) and [CACHE_QUICK_REFERENCE.md](./CACHE_QUICK_REFERENCE.md) for details.
---
## How It Works
1. **Build Phase:** The `build` tool processes the JMdict JSON and creates FST indexes and a binary blob for instant retrieval.
2. **Runtime Phase:** The library provides memory-mapped loading, FST-based lookups, and efficient entry retrieval.
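
The two phases above can be sketched with the standard library alone. This is a simplified illustration rather than the crate's implementation: a `BTreeMap` stands in for the FST indexes (it keeps keys ordered but does not compress them the way an FST does), and the names `Index`, `exact_ids`, and `prefix_ids` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Stand-in for an FST index: an ordered map from key to entry IDs.
pub struct Index {
    keys: BTreeMap<String, Vec<u32>>,
}

impl Index {
    /// "Build phase": collect (key, entry ID) pairs into an ordered index.
    pub fn build(pairs: &[(&str, u32)]) -> Self {
        let mut keys: BTreeMap<String, Vec<u32>> = BTreeMap::new();
        for (key, id) in pairs {
            keys.entry(key.to_string()).or_default().push(*id);
        }
        Index { keys }
    }

    /// "Runtime phase", exact match: IDs stored under exactly `term`.
    pub fn exact_ids(&self, term: &str) -> Vec<u32> {
        self.keys.get(term).cloned().unwrap_or_default()
    }

    /// "Runtime phase", prefix match: because keys are ordered, all keys
    /// sharing `prefix` are contiguous, starting at the first key >= prefix.
    pub fn prefix_ids(&self, prefix: &str) -> Vec<u32> {
        self.keys
            .range(prefix.to_string()..)
            .take_while(|(key, _)| key.starts_with(prefix))
            .flat_map(|(_, ids)| ids.iter().copied())
            .collect()
    }
}
```

In this sketch the returned IDs would then be resolved against the entry blob; only the matching entries need to be read.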
---
## Real Benchmark Results
**Criterion (lookup_word.rs), MacBook, Rust 1.70+**
```
lookup_exact 猫 (jmdict-fast)
time: [4.06 µs]
lookup_word 猫 (jmdict)
time: [511.96 µs]
```
- **jmdict-fast** is roughly 126x faster than a traditional filter-based approach for exact lookups (511.96 µs vs. 4.06 µs).
- Both methods are stable, but jmdict-fast is highly optimized for speed and memory.
---
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request
---
## License
MIT License. See [LICENSE](LICENSE) for details.
---
## Acknowledgments
- **JMdict**: The source dictionary data; see the [EDRDG DICTIONARY LICENCE STATEMENT](https://www.edrdg.org/edrdg/licence.html)
- **FST crate**: Fast finite state transducer implementation
- [10ten Japanese Reader](https://github.com/birchill/10ten-ja-reader): For their deinflector implementation
- **Rust ecosystem**: For making this possible
---
**Built with ❤️ and Rust**