The JMdict file is a comprehensive multilingual dictionary of the Japanese language. The original JMdict file, included in this repository (and hence, in releases of this crate) comes as XML. Instead of stuffing the XML in the binary directly, this crate parses the XML at compile-time and generates an optimized representation that is compiled into the binary. The crate’s API affords type-safe access to this embedded database.
§WARNING: Licensing on database files
The database files compiled into the crate are licensed from the Electronic Dictionary Research and Development Group under Creative Commons licenses. Applications linking this crate directly or indirectly must display appropriate copyright notices to users. Please refer to the EDRDG’s license statement for details.
§Basic usage
The database is accessed through the entries() function which provides an iterator over all database entries compiled into the application. While traversing the database and its entries, you will find that, whenever you expect a list of something, you will get an iterator instead. These iterators provide an abstraction between you as the user of the library, and the physical representation of the database as embedded in the binary.
The following example looks up the reading for お母さん in the database:
let kanji_form = "お母さん";
let entry = jmdict::entries().find(|e| {
e.kanji_elements().any(|k| k.text == kanji_form)
}).unwrap();
let reading_form = entry.reading_elements().next().unwrap().text;
assert_eq!(reading_form, "おかあさん");
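The iterator-based design described in this section can be illustrated with a small self-contained sketch. It does not use this crate and does not show its real internals: the names READING_TEXTS, Reading, Readings and Entry here are hypothetical stand-ins. The idea it tries to convey is that an iterator which only holds indices into static data is cheap to copy, and that it hides how the data is physically laid out.

```rust
// Hypothetical stand-ins for illustration; NOT the jmdict crate's internals.

/// Flat table standing in for string data embedded in the binary.
static READING_TEXTS: [&str; 3] = ["おかあさん", "はは", "ちち"];

/// A reading as handed out to the caller.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Reading {
    text: &'static str,
}

/// Cheaply copyable iterator: it only stores two indices into the table.
#[derive(Clone, Copy, Debug)]
struct Readings {
    pos: usize,
    end: usize,
}

impl Iterator for Readings {
    type Item = Reading;

    fn next(&mut self) -> Option<Reading> {
        if self.pos >= self.end {
            return None;
        }
        let item = Reading { text: READING_TEXTS[self.pos] };
        self.pos += 1;
        Some(item)
    }
}

/// An entry only records which part of the table belongs to it.
#[derive(Clone, Copy, Debug)]
struct Entry {
    first_reading: usize,
    reading_count: usize,
}

impl Entry {
    /// Returns an iterator instead of a list, hiding the table layout.
    fn readings(&self) -> Readings {
        Readings {
            pos: self.first_reading,
            end: self.first_reading + self.reading_count,
        }
    }
}

fn main() {
    let entry = Entry { first_reading: 0, reading_count: 2 };
    let iter = entry.readings();
    let copy = iter; // the iterator is Copy, so `iter` stays usable
    assert_eq!(iter.count(), 2);
    assert_eq!(copy.map(|r| r.text).collect::<Vec<_>>(), ["おかあさん", "はは"]);
}
```

Because the caller only ever sees an Iterator, the layout of the underlying table could change without affecting code written against it.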
§Cargo features
§Common configurations
- The default feature includes the most common words (about 30000 entries) and only their English translations.
- The full feature includes everything in the JMdict.
§Entry selection
- The scope-uncommon feature includes uncommon words and glosses.
- The scope-archaic feature includes glosses with the “archaic” label. If disabled, the PartOfSpeech enum will not include variants that are only relevant for archaic vocabulary, such as obsolete conjugation patterns. (The AllPartOfSpeech enum always contains all variants.)
§Target languages
At least one target language must be selected. Selecting a target language will include all available translations in that language. Entries that do not have any translation in any of the selected languages will be skipped.
- translations-eng: English (included in default)
- translations-dut: Dutch
- translations-fre: French
- translations-ger: German
- translations-hun: Hungarian
- translations-rus: Russian
- translations-slv: Slovenian
- translations-spa: Spanish
- translations-swe: Swedish
The GlossLanguage enum will only contain variants corresponding to the enabled target languages. For example, in the default configuration, GlossLanguage::English will be the only variant. (The AllGlossLanguage enum always contains all variants.)
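The relationship between a feature-restricted enum and its All counterpart can be sketched as follows. This is a self-contained illustration of the TryFrom pattern only: AllLang, Lang and DisabledVariant are hypothetical names standing in for the crate's real types, and the set of variants is chosen to resemble the default configuration.

```rust
use std::convert::TryFrom;

// Hypothetical stand-ins; the crate's real enums and error type differ.

/// Plays the role of an "All" enum: every variant, regardless of features.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AllLang {
    English,
    German,
    Dutch,
}

/// Plays the role of the restricted enum when only English is enabled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Lang {
    English,
}

/// Error returned when a variant is not available in the restricted enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DisabledVariant;

impl TryFrom<AllLang> for Lang {
    type Error = DisabledVariant;

    fn try_from(value: AllLang) -> Result<Self, Self::Error> {
        match value {
            AllLang::English => Ok(Lang::English),
            // Variants that were compiled out cannot be represented.
            AllLang::German | AllLang::Dutch => Err(DisabledVariant),
        }
    }
}

impl From<Lang> for AllLang {
    /// The opposite direction always succeeds.
    fn from(value: Lang) -> Self {
        match value {
            Lang::English => AllLang::English,
        }
    }
}

fn main() {
    for lang in [AllLang::English, AllLang::German, AllLang::Dutch] {
        match Lang::try_from(lang) {
            Ok(enabled) => println!("{:?} is enabled as {:?}", lang, enabled),
            Err(DisabledVariant) => println!("{:?} is not enabled", lang),
        }
    }
}
```

Code that has to work with any feature combination can hold the All variant and convert at the point where it talks to the database, handling the error case explicitly.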
§Crippled builds: db-minimal
When the db-minimal feature is enabled, only a severely reduced portion of the JMdict will be parsed (to be exact, only chunks 000, 100 and 999). This is completely useless for actual usage, but allows for quick edit-compile-test cycles while working on this crate’s code.
§Crippled builds: db-empty
When the db-empty feature is enabled, downloading and parsing of the JMdict contents is disabled entirely. The crate is compiled as usual, but entries() will return an empty iterator. This is useful for documentation builds such as those on docs.rs, where --all-features is given.
Structs§
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- Error type for all enum conversions of the form impl TryFrom<AllFoo> for Foo.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- An entry in the JMdict dictionary.
- A particular translation or explanation for a Japanese word or phrase in a different language.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- A representation of a dictionary entry using kanji or other non-kana scripts.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- Relative priority of a ReadingElement or KanjiElement.
- A representation of a dictionary entry using only kana.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- The translational equivalent of a Japanese word or phrase.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
Enums§
- The language of a particular Gloss. This enum contains all possible variants, including those that have been disabled by compile-time flags in enum GlossLanguage.
- Where a word can appear in a sentence for a particular Sense of the word. This enum contains all possible variants, including those that have been disabled by compile-time flags in enum PartOfSpeech.
- Dialect of Japanese in which a certain vocabulary occurs.
- The language of a particular Gloss.
- Type of gloss.
- Information regarding a certain KanjiElement.
- Where a word can appear in a sentence for a particular Sense of the word.
- PriorityInCorpus appears in struct Priority. It describes how often a dictionary entry appears in a certain corpus of text.
- Information regarding a certain ReadingElement.
- Information regarding a certain Sense.
- Field of study where a certain Sense originates.
Traits§
- Common methods provided by all enums in this crate.
Functions§
- Returns an iterator over all entries in the database.