The JMdict file is a comprehensive multilingual dictionary of the Japanese language. The original JMdict file, included in this repository (and hence, in releases of this crate) comes as XML. Instead of stuffing the XML in the binary directly, this crate parses the XML at compile-time and generates an optimized representation that is compiled into the binary. The crate’s API affords type-safe access to this embedded database.
§WARNING: Licensing on database files
The database files compiled into the crate are licensed from the Electronic Dictionary Research and Development Group under Creative Commons licenses. Applications linking this crate directly or indirectly must display appropriate copyright notices to users. Please refer to the EDRDG’s license statement for details.
§Basic usage
The database is accessed through the entries() function which provides an iterator over all database entries compiled into the application. While traversing the database and its entries, you will find that, whenever you expect a list of something, you will get an iterator instead. These iterators provide an abstraction between you as the user of the library, and the physical representation of the database as embedded in the binary.
The following example looks up the reading for お母さん in the database:
let kanji_form = "お母さん";
let entry = jmdict::entries().find(|e| {
    e.kanji_elements().any(|k| k.text == kanji_form)
}).unwrap();
let reading_form = entry.reading_elements().next().unwrap().text;
assert_eq!(reading_form, "おかあさん");
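To make the iterator abstraction described above more concrete, here is a minimal, self-contained sketch of a cheaply copyable iterator over data stored in a static table. The names WORDS, Words and words() are hypothetical and chosen for illustration only; the storage layout is an assumption and does not reflect how this crate actually encodes its database.

```rust
// Hypothetical stand-in for an embedded database: a static table of strings.
// This is an illustration only, not the crate's actual storage layout.
static WORDS: [&str; 3] = ["お母さん", "お父さん", "猫"];

/// A cheaply copyable iterator over the static table.
/// Copying it only copies a slice reference and a position.
#[derive(Clone, Copy, Debug)]
pub struct Words {
    data: &'static [&'static str],
    pos: usize,
}

impl Iterator for Words {
    type Item = &'static str;

    fn next(&mut self) -> Option<Self::Item> {
        // Fetch the item at the current position, if there is one.
        let item = self.data.get(self.pos).copied();
        if item.is_some() {
            self.pos += 1;
        }
        item
    }
}

/// Entry point of the sketch, loosely analogous to an `entries()`-style function.
pub fn words() -> Words {
    Words { data: &WORDS, pos: 0 }
}
```

Because the sketched iterator is Copy, a caller can keep one copy as a bookmark and advance another without re-reading anything; callers never see the underlying table directly.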
§Cargo features
§Common configurations
- The default feature includes the most common words (about 30000 entries) and only their English translations.
- The full feature includes everything in the JMdict.
§Entry selection
- The scope-uncommon feature includes uncommon words and glosses.
- The scope-archaic feature includes glosses with the “archaic” label. If disabled, the PartOfSpeech enum will not include variants that are only relevant for archaic vocabulary, such as obsolete conjugation patterns. (The AllPartOfSpeech enum always contains all variants.)
§Target languages
At least one target language must be selected. Selecting a target language will include all available translations in that language. Entries that do not have any translation in any of the selected languages will be skipped.
- translations-eng: English (included in default)
- translations-dut: Dutch
- translations-fre: French
- translations-ger: German
- translations-hun: Hungarian
- translations-rus: Russian
- translations-slv: Slovenian
- translations-spa: Spanish
- translations-swe: Swedish
The GlossLanguage enum will only contain variants corresponding to the enabled target languages. For example, in the default configuration, GlossLanguage::English will be the only variant. (The AllGlossLanguage enum always contains all variants.)
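The relationship between a feature-restricted enum and its "All" counterpart can be sketched as follows. This is a hedged, self-contained illustration: the enums AllLanguage and Language and the error type Disabled are hypothetical stand-ins written for this example, and the sketch simply pretends that only English is enabled. It is not the crate's generated code.

```rust
use std::convert::TryFrom;

/// Hypothetical "all variants" enum, independent of any feature flags.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AllLanguage {
    English,
    German,
    Swedish,
}

/// Hypothetical restricted enum; this sketch assumes only English is enabled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Language {
    English,
}

/// Hypothetical error returned when a variant is not available in this build.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Disabled;

impl TryFrom<AllLanguage> for Language {
    type Error = Disabled;

    fn try_from(value: AllLanguage) -> Result<Self, Self::Error> {
        match value {
            AllLanguage::English => Ok(Language::English),
            // Every other variant is treated as disabled in this sketch.
            _ => Err(Disabled),
        }
    }
}

impl From<Language> for AllLanguage {
    // Widening to the full enum can never fail.
    fn from(value: Language) -> Self {
        match value {
            Language::English => AllLanguage::English,
        }
    }
}
```

Under these assumptions, code that handles user input or serialized data can work with the full enum and convert to the restricted one only at the point where it queries the database, treating a failed conversion as "not available in this build".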
§Crippled builds: db-minimal
When the db-minimal feature is enabled, only a severely reduced portion of the JMdict will be parsed (to be exact, only chunks 000, 100 and 999). This is completely useless for actual usage, but allows for quick edit-compile-test cycles while working on this crate’s code.
§Crippled builds: db-empty
When the db-empty feature is enabled, downloading and parsing of the JMdict contents is disabled entirely. The crate is compiled as usual, but entries() will return an empty iterator.
This is useful for documentation builds such as those for docs.rs, where --all-features is given.
Structs§
- Dialects: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- DisabledVariant: Error type for all enum conversions of the form impl TryFrom<AllFoo> for Foo.
- Entries: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- Entry: An entry in the JMdict dictionary.
- Gloss: A particular translation or explanation for a Japanese word or phrase in a different language.
- Glosses: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- KanjiElement: A representation of a dictionary entry using kanji or other non-kana scripts.
- KanjiElements: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- KanjiInfos: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- LoanwordSource: A source word in another language from which a particular Sense of an Entry has been borrowed.
- LoanwordSources: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- PartsOfSpeech: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- Priority: Relative priority of a ReadingElement or KanjiElement.
- ReadingElement: A representation of a dictionary entry using only kana.
- ReadingElements: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- ReadingInfos: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- Sense: The translational equivalent of a Japanese word or phrase.
- SenseInfos: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- SenseTopics: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- Senses: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
- Strings: An iterator providing fast access to objects in the database. Instances of this iterator can be copied cheaply.
Enums§
- AllGlossLanguage: The language of a particular Gloss. This enum contains all possible variants, including those that have been disabled by compile-time flags in enum GlossLanguage.
- AllPartOfSpeech: Where a word can appear in a sentence for a particular Sense of the word. This enum contains all possible variants, including those that have been disabled by compile-time flags in enum PartOfSpeech.
- Dialect: Dialect of Japanese in which a certain vocabulary occurs.
- GlossLanguage: The language of a particular Gloss.
- GlossType: Type of gloss.
- KanjiInfo: Information regarding a certain KanjiElement.
- PartOfSpeech: Where a word can appear in a sentence for a particular Sense of the word.
- PriorityInCorpus: PriorityInCorpus appears in struct Priority. It describes how often a dictionary entry appears in a certain corpus of text.
- ReadingInfo: Information regarding a certain ReadingElement.
- SenseInfo: Information regarding a certain Sense.
- SenseTopic: Field of study where a certain Sense originates.
Traits§
- Enum: Common methods provided by all enums in this crate.
Functions§
- entries: Returns an iterator over all entries in the database.