rust-readmdict
A port of https://github.com/ffreemt/readmdict
A Rust implementation for reading MDict dictionary files (`.mdx` dictionaries and their `.mdd` resource files).
Usage
Basic Usage
To open an MDX file and display basic information:
Output:
```
Successfully opened MDX file: example_resources/webster.mdx
Number of entries: 109353
```
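The basic information comes from the start of the file. In the Python original, an MDX file begins with a 4-byte big-endian length, followed by that many bytes of UTF-16 header text and then a 4-byte little-endian Adler-32 checksum of those bytes. The key section starts immediately after the checksum. The sketch below shows only that framing step. `HeaderFrame` and `read_header_frame` are hypothetical names and are not this crate's API.

```rust
use std::io::{self, Read};

/// Raw header pieces from the start of an MDX/MDD file (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct HeaderFrame {
    /// Header text bytes (UTF-16LE in real files, not decoded here).
    pub header_bytes: Vec<u8>,
    /// Adler-32 checksum stored after the header.
    pub adler32: u32,
    /// Byte offset where the key section begins, assuming the reader started at 0.
    pub key_block_offset: u64,
}

/// Read the length-prefixed header and its trailing checksum.
pub fn read_header_frame<R: Read>(reader: &mut R) -> io::Result<HeaderFrame> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    // The header length is stored big-endian.
    let header_len = u32::from_be_bytes(len_buf) as usize;

    let mut header_bytes = vec![0u8; header_len];
    reader.read_exact(&mut header_bytes)?;

    let mut sum_buf = [0u8; 4];
    reader.read_exact(&mut sum_buf)?;
    // Unlike the length, the checksum is stored little-endian.
    let adler32 = u32::from_le_bytes(sum_buf);

    Ok(HeaderFrame {
        header_bytes,
        adler32,
        // 4 bytes of length + header + 4 bytes of checksum.
        key_block_offset: (4 + header_len + 4) as u64,
    })
}
```

If the file is truncated, `read_exact` fails and an error is returned. A real reader would also verify the checksum and decode the UTF-16 text.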
List Keys
To list the first 10 keys from the dictionary:
Output:
```
Successfully opened MDX file: example_resources/webster.mdx
Number of entries: 109353
Keys:
1: 12 a.m
2: 12 midnight
3: 12 p.m.
4: 20/20
5: 20/20 hindsight
6: 20 hindsight
7: .22
8: .22s
9: 24-7
10: 24/7
... and 109343 more
```
List Keys Since a Word
To list keys that are alphabetically equal to or greater than a specific word:
Output:
```
Successfully opened MDX file: example_resources/webster.mdx
Number of entries: 109353
Keys since 'apple':
1: apple
2: apple cheeked
3: apple of someone's eye
4: apple pie
5: apple pies
6: apple polisher
7: apple polishers
8: apple-cheeked
9: apples
10: applesauce
... and 99401 more
```
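The stored key order in the first listing is not plain byte order. For example, `.22` appears after `20 hindsight`. Because of this, a linear filter is safer than a binary search when selecting keys "since" a word. The sketch below is one possible approach and not necessarily what the CLI does. It assumes keys are compared as raw bytes, and `keys_since` is a hypothetical helper.

```rust
/// Return keys that compare >= `word` as raw bytes, keeping the
/// dictionary's stored order.
pub fn keys_since<'a>(keys: &'a [Vec<u8>], word: &[u8]) -> Vec<&'a [u8]> {
    keys.iter()
        .map(|k| k.as_slice())
        .filter(|k| *k >= word)
        .collect()
}
```

The "... and N more" line then reports the number of matches minus the number printed.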
Look up a word and show its content
Look up the definition of a word (for example "apple") in an MDX file, or look up a resource file in an MDD file.
Example output:
```
Successfully opened MDX file: example_resources/webster.mdx
Number of entries: 109353
Looking up 'apple':
Definition:
<div class="entry">...[HTML content with definition]...</div>
```
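In the Python original, each key stores the offset of its record within the concatenated, decompressed record blocks. A record therefore ends where the next key's record begins, or at the end of the data. The following sketch looks up a word under two simplifying assumptions: all record data has already been decompressed into one buffer, and the first matching key wins. `lookup` is a hypothetical helper, not the crate's API.

```rust
/// Find the record for `word`, given `(offset, key)` pairs in file order and
/// the concatenated, already-decompressed record data.
pub fn lookup<'a>(keys: &[(u64, Vec<u8>)], records: &'a [u8], word: &[u8]) -> Option<&'a [u8]> {
    let i = keys.iter().position(|(_, k)| k.as_slice() == word)?;
    let start = keys[i].0 as usize;
    // A record ends where the next key's record starts, or at the end of the data.
    let end = keys.get(i + 1).map_or(records.len(), |(off, _)| *off as usize);
    let record = records.get(start..end)?;
    // Records carry a trailing NUL terminator; trim trailing NUL bytes.
    let trimmed_len = record.iter().rposition(|&b| b != 0).map_or(0, |p| p + 1);
    Some(&record[..trimmed_len])
}
```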
Features
- Read MDX dictionary files
- Extract header information and metadata
- Parse and list dictionary keys
- List keys alphabetically from a specific starting word
- Look up words and display their content from MDX files
- Look up resources and display their content from MDD files
- Support for compressed key blocks (zlib)
- Handle different MDX versions (1.x and 2.x)
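Header metadata and version handling are linked. The Python code extracts `name="value"` attributes from the header tag with the regex `(\w+)="(.*?)"`. It then reads `GeneratedByEngineVersion`: version 2.0 and later use 8-byte numbers, and 1.x uses 4-byte numbers. The sketch below replaces the regex with a hand-written scanner so that it needs only `std`, and it skips entity unescaping. `parse_header` and `number_width` are illustrative names; the table below lists the crate's own `parse_header` signature.

```rust
use std::collections::HashMap;

/// Extract `name="value"` attributes from a header such as
/// `<Dictionary GeneratedByEngineVersion="2.0" Encoding="UTF-8" ...>`.
pub fn parse_header(header: &str) -> HashMap<String, String> {
    let mut attrs = HashMap::new();
    let mut rest = header;
    while let Some(eq) = rest.find("=\"") {
        let before = &rest[..eq];
        // The name is the run of ASCII word characters (`\w`) just before `="`.
        let name_start = before
            .char_indices()
            .rev()
            .find(|&(_, c)| !(c.is_ascii_alphanumeric() || c == '_'))
            .map_or(0, |(i, c)| i + c.len_utf8());
        let name = &before[name_start..];
        let value_start = eq + 2;
        // The value ends at the next double quote (non-greedy, like `.*?`).
        let Some(len) = rest[value_start..].find('"') else { break };
        if !name.is_empty() {
            attrs.insert(name.to_string(), rest[value_start..value_start + len].to_string());
        }
        rest = &rest[value_start + len + 1..];
    }
    attrs
}

/// Width in bytes of the numbers after the header: 8 for engine version
/// 2.0 and later, 4 for 1.x.
pub fn number_width(attrs: &HashMap<String, String>) -> Option<usize> {
    let version: f64 = attrs.get("GeneratedByEngineVersion")?.trim().parse().ok()?;
    Some(if version >= 2.0 { 8 } else { 4 })
}
```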
Building
Implementation Details
This is a Rust port of the Python readmdict library. The implementation follows a simplified file structure that closely mirrors the original Python codebase.
File Structure Mapping
| Python File | Rust File | Purpose |
|---|---|---|
| `readmdict/__main__.py` | `src/main.rs` | CLI entry point and argument parsing |
| `readmdict/readmdict.py` | `src/readmdict.rs` | Core library with all classes (`MDict`, `MDX`, `MDD`) |
| `readmdict/pureSalsa20.py` | Use `salsa20` crate | Salsa20 encryption (external crate) |
| `readmdict/ripemd128.py` | Use `ripemd` crate | RIPEMD128 hashing (external crate) |
| N/A | `src/lib.rs` | Library entry point (re-exports from `readmdict.rs`) |
Core Classes
| Python Class/Function | Rust Equivalent | Location |
|---|---|---|
| `MDict` (base class) | `struct MDict` | `src/readmdict.rs` |
| `MDX` (inherits `MDict`) | `struct Mdx` | `src/readmdict.rs` |
| `MDD` (inherits `MDict`) | `struct Mdd` | `src/readmdict.rs` |
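Rust has no struct inheritance, so `MDX`/`MDD` inheriting from `MDict` has to be modelled another way. One option, shown here as a sketch, is to embed the shared `MDict` and implement `Deref` so that shared methods such as `len` and `keys` stay callable on `Mdx` and `Mdd`. The fields are hypothetical, and this does not claim to be how the crate is actually laid out.

```rust
use std::collections::HashMap;
use std::ops::Deref;

/// State shared by every MDict file (hypothetical fields).
pub struct MDict {
    pub header: HashMap<String, String>,
    pub keys: Vec<(u64, Vec<u8>)>,
}

impl MDict {
    pub fn len(&self) -> usize {
        self.keys.len()
    }
    pub fn keys(&self) -> impl Iterator<Item = &[u8]> {
        self.keys.iter().map(|(_, k)| k.as_slice())
    }
}

/// `.mdx` reader: embeds the shared `MDict` instead of inheriting from it.
pub struct Mdx {
    base: MDict,
    pub substyle: bool,
}

/// `.mdd` reader: same base, different record handling.
pub struct Mdd {
    base: MDict,
}

// `Deref` forwards method calls such as `mdx.len()` to the embedded `MDict`.
impl Deref for Mdx {
    type Target = MDict;
    fn deref(&self) -> &MDict {
        &self.base
    }
}

impl Deref for Mdd {
    type Target = MDict;
    fn deref(&self) -> &MDict {
        &self.base
    }
}
```

Using `Deref` to emulate inheritance is debated in the Rust community. Alternatives are a trait with default methods or explicit delegating methods on each struct.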
Method-to-Method Mapping
Utility Functions:
| Python Function | Rust Function | Location |
|---|---|---|
| `_unescape_entities(text)` | `unescape_entities(text: &[u8]) -> Vec<u8>` | `src/readmdict.rs` |
| `_fast_decrypt(data, key)` | `fast_decrypt(data: &[u8], key: &[u8]) -> Vec<u8>` | `src/readmdict.rs` |
| `_mdx_decrypt(comp_block)` | `mdx_decrypt(comp_block: &[u8]) -> Result<Vec<u8>>` | `src/readmdict.rs` |
| `_salsa_decrypt(ciphertext, key)` | `salsa_decrypt(ciphertext: &[u8], key: &[u8]) -> Result<Vec<u8>>` | `src/readmdict.rs` |
| `_decrypt_regcode_by_deviceid(regcode, deviceid)` | `decrypt_regcode_by_deviceid(regcode: &[u8], deviceid: &[u8]) -> Result<Vec<u8>>` | `src/readmdict.rs` |
| `_decrypt_regcode_by_email(regcode, email)` | `decrypt_regcode_by_email(regcode: &[u8], email: &[u8]) -> Result<Vec<u8>>` | `src/readmdict.rs` |
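Two of these utilities port directly without external crates. The sketch below follows the Python logic under the signatures listed in the table; `replace_all` is a hypothetical helper, because `std` has no `replace` for byte slices. `mdx_decrypt` is omitted because it needs RIPEMD-128. In the Python code it derives the key as `ripemd128(comp_block[4:8] + pack('<L', 0x3695))`, keeps the first 8 bytes, and passes the rest through `_fast_decrypt`.

```rust
/// Replace the four entities handled by the Python `_unescape_entities`.
/// `&amp;` goes last, so `&amp;lt;` becomes `&lt;` rather than `<`.
pub fn unescape_entities(text: &[u8]) -> Vec<u8> {
    let mut out = text.to_vec();
    for (from, to) in [
        (&b"&lt;"[..], &b"<"[..]),
        (&b"&gt;"[..], &b">"[..]),
        (&b"&quot;"[..], &b"\""[..]),
        (&b"&amp;"[..], &b"&"[..]),
    ] {
        out = replace_all(&out, from, to);
    }
    out
}

/// Replace every non-overlapping occurrence of `from` with `to`.
fn replace_all(haystack: &[u8], from: &[u8], to: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(haystack.len());
    let mut i = 0;
    while i < haystack.len() {
        if haystack[i..].starts_with(from) {
            out.extend_from_slice(to);
            i += from.len();
        } else {
            out.push(haystack[i]);
            i += 1;
        }
    }
    out
}

/// Port of `_fast_decrypt`: swap each byte's nibbles, then XOR with the
/// previous ciphertext byte (seeded with 0x36), the low byte of the index,
/// and the repeating key. Panics if `key` is empty, as Python would fail too.
pub fn fast_decrypt(data: &[u8], key: &[u8]) -> Vec<u8> {
    let mut previous: u8 = 0x36;
    data.iter()
        .enumerate()
        .map(|(i, &b)| {
            let t = b.rotate_left(4) ^ previous ^ (i as u8) ^ key[i % key.len()];
            previous = b;
            t
        })
        .collect()
}
```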
MDict Class Methods:
| Python Method | Rust Method | Purpose |
|---|---|---|
| `__init__(fname, encoding, passcode)` | `new(fname: &str, encoding: Option<String>, passcode: Option<Passcode>) -> Result<Self>` | Constructor |
| `__len__()` | `len(&self) -> usize` | Get number of entries |
| `__iter__()` | `keys(&self) -> impl Iterator<Item = &[u8]>` | Iterator over keys |
| `keys()` | `keys(&self) -> impl Iterator<Item = &[u8]>` | Get dictionary keys |
| `_read_number(f)` | `read_number<R: Read>(&self, reader: &mut R) -> Result<u64>` | Read number from file |
| `_parse_header(header)` | `parse_header(header: &[u8]) -> Result<HashMap<String, String>>` | Parse header attributes |
| `_decode_key_block_info(data)` | `decode_key_block_info(&self, data: &[u8]) -> Result<Vec<(u64, u64)>>` | Decode key block info |
| `_decode_key_block(data, info)` | `decode_key_block(&self, data: &[u8], info: &[(u64, u64)]) -> Result<Vec<(u64, Vec<u8>)>>` | Decode key block |
| `_split_key_block(data)` | `split_key_block(&self, data: &[u8]) -> Result<Vec<(u64, Vec<u8>)>>` | Split key block into entries |
| `_read_header()` | `read_header(&mut self) -> Result<HashMap<String, String>>` | Read and parse file header |
| `_read_keys()` | `read_keys(&mut self) -> Result<Vec<(u64, Vec<u8>)>>` | Read key blocks |
| `_read_keys_brutal()` | `read_keys_brutal(&mut self) -> Result<Vec<(u64, Vec<u8>)>>` | Fallback key reading method |
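In the Python code, a decompressed key block is a sequence of entries. Each entry is a big-endian record offset (4 or 8 bytes, depending on the version) followed by key text ending in a NUL terminator. For UTF-16 dictionaries the terminator is a two-byte NUL unit. This sketch passes the width and encoding as parameters instead of reading them from `&self`, and it keeps keys as raw bytes; the Python code also decodes them to UTF-8 and strips them. These signatures are illustrative and differ from the table.

```rust
use std::io::{self, Read};

/// Read one big-endian number of `width` bytes (8 for version >= 2.0, else 4).
pub fn read_number<R: Read>(reader: &mut R, width: usize) -> io::Result<u64> {
    assert!(width == 4 || width == 8, "number width must be 4 or 8");
    let mut buf = [0u8; 8];
    // Right-align the bytes so a 4-byte value fills the low half of the u64.
    reader.read_exact(&mut buf[8 - width..])?;
    Ok(u64::from_be_bytes(buf))
}

/// Split a decompressed key block into `(record_offset, key_bytes)` pairs.
pub fn split_key_block(block: &[u8], width: usize, utf16: bool) -> io::Result<Vec<(u64, Vec<u8>)>> {
    // Terminator size: one NUL byte, or a two-byte NUL code unit for UTF-16.
    let unit = if utf16 { 2 } else { 1 };
    let mut entries = Vec::new();
    let mut rest = block;
    while !rest.is_empty() {
        // `&[u8]` implements `Read`, so reading also advances `rest`.
        let offset = read_number(&mut rest, width)?;
        // Scan one code unit at a time for the terminator.
        let end = (0..rest.len())
            .step_by(unit)
            .find(|&i| i + unit <= rest.len() && rest[i..i + unit].iter().all(|&b| b == 0))
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "unterminated key"))?;
        entries.push((offset, rest[..end].to_vec()));
        rest = &rest[end + unit..];
    }
    Ok(entries)
}
```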
MDX Class Methods:
| Python Method | Rust Method | Purpose |
|---|---|---|
| `__init__(fname, encoding, substyle, passcode)` | `new(fname: &str, encoding: Option<String>, substyle: bool, passcode: Option<Passcode>) -> Result<Self>` | Constructor |
| `items()` | `items(&self) -> impl Iterator<Item = Result<(Vec<u8>, Vec<u8>)>>` | Iterator over key-value pairs |
| `_substitute_stylesheet(txt)` | `substitute_stylesheet(&self, txt: &str) -> String` | Apply stylesheet substitution |
| `_decode_record_block()` | `decode_record_block(&self) -> impl Iterator<Item = Result<(Vec<u8>, Vec<u8>)>>` | Decode record blocks |
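Stylesheet substitution replaces `` `N` `` markers in a record with the begin and end tags of stylesheet entry `N`. The text up to the next marker is wrapped in those tags. In the Python original, the table is built from the header's `StyleSheet` attribute in groups of three lines: number, begin tag and end tag. A span ending in a newline has its trailing whitespace trimmed and `\r\n` appended after the end tag. The sketch below takes the table as a parameter rather than `&self`. It also leaves unknown numbers unstyled, whereas the Python code would raise a `KeyError`. `find_markers` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Byte ranges of each `` `digits` `` marker, scanned left to right.
fn find_markers(txt: &str) -> Vec<(usize, usize)> {
    let bytes = txt.as_bytes();
    let mut markers = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'`' {
            let digits = bytes[i + 1..].iter().take_while(|b| b.is_ascii_digit()).count();
            let close = i + 1 + digits;
            if digits > 0 && close < bytes.len() && bytes[close] == b'`' {
                markers.push((i, close + 1));
                i = close + 1;
                continue;
            }
        }
        i += 1;
    }
    markers
}

pub fn substitute_stylesheet(txt: &str, stylesheet: &HashMap<String, (String, String)>) -> String {
    let markers = find_markers(txt);
    // Text before the first marker is copied unchanged.
    let first = markers.first().map_or(txt.len(), |m| m.0);
    let mut out = String::from(&txt[..first]);
    for (j, &(start, end)) in markers.iter().enumerate() {
        // The styled span runs to the next marker or to the end of the text.
        let next = markers.get(j + 1).map_or(txt.len(), |m| m.0);
        let span = &txt[end..next];
        let number = &txt[start + 1..end - 1];
        let (begin, finish) = match stylesheet.get(number) {
            Some((b, e)) => (b.as_str(), e.as_str()),
            None => ("", ""),
        };
        out.push_str(begin);
        if span.ends_with('\n') {
            out.push_str(span.trim_end());
            out.push_str(finish);
            out.push_str("\r\n");
        } else {
            out.push_str(span);
            out.push_str(finish);
        }
    }
    out
}
```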
MDD Class Methods:
| Python Method | Rust Method | Purpose |
|---|---|---|
| `__init__(fname, passcode)` | `new(fname: &str, passcode: Option<Passcode>) -> Result<Self>` | Constructor |
| `items()` | `items(&self) -> impl Iterator<Item = Result<(Vec<u8>, Vec<u8>)>>` | Iterator over filename-content pairs |
| `_decode_record_block()` | `decode_record_block(&self) -> impl Iterator<Item = Result<(Vec<u8>, Vec<u8>)>>` | Decode record blocks |
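Both `decode_record_block` methods start by unpacking each compressed block. In the Python code, a block has a 4-byte type tag (`00 00 00 00` for stored, `01 00 00 00` for LZO, `02 00 00 00` for zlib), then a big-endian Adler-32 checksum of the decompressed data, then the payload. The sketch below handles only the stored type and writes Adler-32 out by hand, following RFC 1950. The plan itself uses the `adler` crate for checksums and `flate2` for zlib. `decode_block` is a hypothetical name.

```rust
/// Adler-32 checksum (RFC 1950): two running sums modulo 65521.
pub fn adler32(data: &[u8]) -> u32 {
    const MOD: u32 = 65_521;
    let (mut a, mut b) = (1u32, 0u32);
    for &byte in data {
        a = (a + byte as u32) % MOD;
        b = (b + a) % MOD;
    }
    (b << 16) | a
}

/// Unpack one block: 4-byte type tag, 4-byte big-endian Adler-32, payload.
/// Only stored (type 0) blocks are handled in this sketch.
pub fn decode_block(block: &[u8]) -> Result<Vec<u8>, String> {
    if block.len() < 8 {
        return Err("block shorter than its 8-byte header".into());
    }
    // The tag bytes 02 00 00 00 read as little-endian give 2.
    let kind = u32::from_le_bytes(block[0..4].try_into().unwrap());
    let checksum = u32::from_be_bytes(block[4..8].try_into().unwrap());
    let data = match kind {
        0 => block[8..].to_vec(),
        1 => return Err("LZO blocks are not handled in this sketch".into()),
        2 => return Err("zlib blocks are not handled in this sketch".into()),
        other => return Err(format!("unknown block type {other}")),
    };
    // The checksum covers the decompressed data.
    if adler32(&data) != checksum {
        return Err("Adler-32 checksum mismatch".into());
    }
    Ok(data)
}
```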
Implementation Checklist
1. Create basic project structure (`src/lib.rs`, `src/main.rs`)
2. Implement core readmdict module (`src/readmdict.rs`) containing:
   - 2.1. Utility functions (`unescape_entities`, etc.)
   - 2.2. Crypto functions (`fast_decrypt`, `mdx_decrypt`, `salsa_decrypt`, etc.)
   - 2.3. Base `MDict` struct with all methods
   - 2.4. `Mdx` struct inheriting from `MDict`
   - 2.5. `Mdd` struct inheriting from `MDict`
3. Implement CLI interface (`src/main.rs`) matching `__main__.py`
4. Update `src/lib.rs` to re-export from `readmdict.rs`
5. Add error handling and comprehensive tests
6. Add documentation and usage examples
7. Performance optimization and benchmarking
Detailed Structure Plan
`src/readmdict.rs` (single file containing everything from `readmdict.py`):

```rust
// Imports and dependencies
use HashMap;
use File;
use ;
use Path;
use ;
use ZlibDecoder;
use Regex;
use Encoding;
use ;
use ;
use Sha256;
use adler32;
// Error types
pub type Result<T> = Result;
// Utility functions (direct ports from Python)
// Passcode struct
// Base MDict struct (equivalent to Python MDict class)
// MDX struct (equivalent to Python MDX class)
// MDD struct (equivalent to Python MDD class)
```
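The plan lists `thiserror` for error types and a crate-wide `Result<T>` alias. The sketch below is a `std`-only illustration of what such an error type could look like, with the `Display`, `Error` and `From` impls written out by hand where `thiserror` would derive them. `MdictError`, its variants and `read_u32_be` are hypothetical.

```rust
use std::fmt;
use std::io;

/// Hypothetical crate error type.
#[derive(Debug)]
pub enum MdictError {
    Io(io::Error),
    ChecksumMismatch { expected: u32, actual: u32 },
    UnsupportedBlockType(u32),
}

impl fmt::Display for MdictError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MdictError::Io(e) => write!(f, "I/O error: {e}"),
            MdictError::ChecksumMismatch { expected, actual } => {
                write!(f, "checksum mismatch: expected {expected:#010x}, got {actual:#010x}")
            }
            MdictError::UnsupportedBlockType(t) => write!(f, "unsupported block type {t}"),
        }
    }
}

impl std::error::Error for MdictError {}

// Lets `?` turn an `io::Error` into an `MdictError` automatically.
impl From<io::Error> for MdictError {
    fn from(e: io::Error) -> Self {
        MdictError::Io(e)
    }
}

/// Crate-wide alias using the hypothetical error type.
pub type Result<T> = std::result::Result<T, MdictError>;

/// Example caller: a short read surfaces as `MdictError::Io`.
pub fn read_u32_be(mut r: impl io::Read) -> Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_be_bytes(buf))
}
```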
`src/lib.rs` (simple re-export):

```rust
pub use *;
```
`src/main.rs` (direct port of `__main__.py`):

```rust
use Parser;
use *;
use Path;
use fs;
use Write;
```
Implementation Considerations
Key Differences from Python:
- Error Handling: Rust uses `Result<T, E>` instead of exceptions
- Memory Management: No garbage collection; explicit ownership
- String Handling: Distinction between `String`, `&str`, and `Vec<u8>`
- Iterator Patterns: Rust iterators are lazy and zero-cost
- File I/O: More explicit error handling required
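The string-handling difference matters as soon as the header is read. The file provides raw bytes (`&[u8]`), and a `String` exists only after an explicit decode that can fail. The sketch below decodes a UTF-16LE header into a `String`. It drops a trailing two-byte NUL terminator, which the Python code removes unconditionally with `header_bytes[:-2]`. `decode_utf16le_header` is a hypothetical name.

```rust
use std::string::FromUtf16Error;

/// Decode UTF-16LE header bytes into a `String`, dropping a trailing
/// two-byte NUL terminator if present.
pub fn decode_utf16le_header(bytes: &[u8]) -> Result<String, FromUtf16Error> {
    let body = bytes.strip_suffix(&[0u8, 0][..]).unwrap_or(bytes);
    // Pair up bytes into little-endian 16-bit code units (an odd trailing byte is ignored).
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    // Fails on invalid UTF-16, such as an unpaired surrogate.
    String::from_utf16(&units)
}
```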
External Crate Dependencies:
- `clap`: Command-line argument parsing (replaces `argparse`)
- `flate2`: Zlib decompression (replaces Python's `zlib`)
- `salsa20`: Salsa20 encryption (replaces `pureSalsa20.py`)
- `ripemd`: RIPEMD128 hashing (replaces `ripemd128.py`)
- `encoding_rs`: Text encoding support
- `regex`: Regular expressions for parsing
- `byteorder`: Binary data reading
- `thiserror`: Error type derivation
- `hex`: Hexadecimal encoding/decoding
- `adler`: Adler32 checksums
Performance Optimizations:
- Zero-copy where possible: Use `&[u8]` slices instead of `Vec<u8>` when data doesn't need to be owned
- Streaming iterators: Process records on demand instead of loading everything into memory
- Efficient string handling: Use `Cow<str>` for strings that might not need allocation
- Memory mapping: Consider using `memmap2` for large files
- Parallel processing: Use `rayon` for CPU-intensive operations like decompression
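To illustrate the `Cow<str>` point: `String::from_utf8_lossy` borrows valid UTF-8 without copying and allocates only when it must insert U+FFFD replacement characters. Since most keys in a UTF-8 dictionary are valid, displaying them this way is typically allocation-free. `key_as_text` is a hypothetical helper.

```rust
use std::borrow::Cow;

/// Turn key bytes into display text: borrowed when already valid UTF-8,
/// owned (with U+FFFD replacements) only when the bytes are invalid.
pub fn key_as_text(key: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(key)
}
```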
Testing Strategy:
- Unit tests: Test each utility function and method individually
- Integration tests: Test with real MDX/MDD files
- Property-based tests: Use `proptest` for edge cases
- Benchmark tests: Compare performance with the Python implementation
- Compatibility tests: Ensure output matches the Python version exactly