djvu-rs
Pure-Rust DjVu decoder. MIT licensed. Written from the DjVu v3 public specification.
Features
- IFF container parser — zero-copy, borrowing slices from input
- JB2 bilevel image decoder — adaptive arithmetic coding (ZP coder) with symbol dictionary
- IW44 wavelet image decoder — planar YCbCr storage, multiple refinement chunks
- BZZ decompressor — ZP arithmetic coding + MTF + BWT (DIRM, NAVM, ANTz chunks)
- Text layer extraction — TXTz/TXTa chunk parsing with zone hierarchy (page/column/region/paragraph/line/word/character)
- Annotation parsing — ANTz/ANTa chunk parsing (hyperlinks, map areas, background color)
- Bookmarks — NAVM table-of-contents parsing
- Multi-page documents — DJVM bundle format with DIRM directory chunk
- Page rendering — composite foreground + background into RGBA output
- PDF export — selectable text, lossless IW44/JB2 embedding, bookmarks, hyperlinks
- TIFF export — multi-page color and bilevel modes (feature flag tiff)
- hOCR / ALTO XML export — text layer as hOCR or ALTO XML for OCR toolchains and archives
- Serde support — Serialize/Deserialize on all public data types (feature flag serde)
- WebAssembly (WASM) — wasm-bindgen bindings for use in browsers and Node.js (feature flag wasm)
- image-rs integration — image::ImageDecoder impl for use with the image crate (feature flag image)
- Async render — tokio::task::spawn_blocking wrapper (feature flag async)
- no_std compatible — IFF/BZZ/JB2/IW44/ZP modules work with alloc only
Quick start
use ;
let data = read?;
let doc = parse?;
println!;
let page = doc.page?;
println!;
let opts = RenderOptions ;
let pixmap = render_pixmap?;
// pixmap.data — RGBA bytes (width × height × 4), row-major
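The comment above describes the output as RGBA bytes in row-major order. As a rough illustration of what that layout implies, here is a standalone sketch that reads one pixel from such a buffer. The helper name pixel_at and its signature are hypothetical and not part of djvu-rs; the only assumption taken from the text is the width × height × 4 row-major layout.

```rust
/// Hypothetical helper (not part of djvu-rs): returns the RGBA value at (x, y)
/// in a row-major buffer holding `width * height * 4` bytes.
/// Returns `None` if the coordinates are out of range or the buffer is too short.
fn pixel_at(data: &[u8], width: usize, height: usize, x: usize, y: usize) -> Option<[u8; 4]> {
    if x >= width || y >= height {
        return None;
    }
    // Row `y` starts after `y * width` pixels; each pixel occupies 4 bytes.
    let offset = (y * width + x) * 4;
    let px = data.get(offset..offset + 4)?;
    Some([px[0], px[1], px[2], px[3]])
}
```

With a 3 × 2 buffer, the pixel at x = 2, y = 1 starts at byte offset (1 × 3 + 2) × 4 = 20.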
Text extraction
use DjVuDocument;
let data = read?;
let doc = parse?;
let page = doc.page?;
if let Some = page.text?
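The feature list describes the text layer as a zone hierarchy (page/column/region/paragraph/line/word/character). To make that shape concrete, the following is a minimal standalone sketch of such a tree and a depth-first walk that collects the text of every zone at one level. SketchZone, ZoneKind, and collect_text are illustrative names only; the crate's own TextZone type may be structured differently.

```rust
/// Zone levels named in the feature list, outermost first.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZoneKind {
    Page,
    Column,
    Region,
    Paragraph,
    Line,
    Word,
    Character,
}

/// Hypothetical stand-in for a text zone; not the crate's `TextZone`.
#[derive(Debug)]
struct SketchZone {
    kind: ZoneKind,
    /// Text covered by this zone (only filled on the zones we query in this sketch).
    text: String,
    children: Vec<SketchZone>,
}

/// Collects the text of every zone of the requested kind, depth first.
/// Does not descend below a matching zone.
fn collect_text<'a>(zone: &'a SketchZone, kind: ZoneKind, out: &mut Vec<&'a str>) {
    if zone.kind == kind {
        out.push(zone.text.as_str());
        return;
    }
    for child in &zone.children {
        collect_text(child, kind, out);
    }
}
```

Calling collect_text with ZoneKind::Word on a page whose single line holds the words "hello" and "world" fills the output vector with those two strings in reading order.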
PDF export
use ;
let data = read?;
let doc = parse?;
let pdf_bytes = djvu_to_pdf?;
write?;
TIFF export
Requires the tiff feature flag: djvu-rs = { version = "…", features = ["tiff"] }.
use ;
let data = read?;
let doc = parse?;
let tiff_bytes = djvu_to_tiff?;
write?;
Async render
Requires the async feature flag: djvu-rs = { version = "…", features = ["async"] }.
use ;
let data = read?;
let doc = parse?;
let page = doc.page?;
let opts = RenderOptions ;
let pixmap = render_pixmap_async.await?;
Low-level IFF access
use parse_form;
let data = read?;
let form = parse_form?;
println!;
for chunk in &form.chunks
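For readers unfamiliar with the container, this standalone sketch shows the general idea behind a zero-copy IFF walk: each chunk has a 4-byte ID and a 4-byte big-endian length, followed by the payload and one pad byte when the length is odd. It does not use the djvu-rs API; RawChunk and split_chunks are hypothetical names, and the sketch handles only a flat run of sibling chunks rather than nested FORM containers.

```rust
/// A chunk view that borrows its payload from the input buffer (no copying).
#[derive(Debug, PartialEq)]
struct RawChunk<'a> {
    id: [u8; 4],
    data: &'a [u8],
}

/// Splits a flat run of sibling IFF chunks into borrowed views.
/// Returns `None` if a header or payload is truncated.
fn split_chunks(mut input: &[u8]) -> Option<Vec<RawChunk<'_>>> {
    let mut chunks = Vec::new();
    while !input.is_empty() {
        // Header: 4-byte chunk ID followed by a 4-byte big-endian payload length.
        if input.len() < 8 {
            return None;
        }
        let id = [input[0], input[1], input[2], input[3]];
        let len = u32::from_be_bytes([input[4], input[5], input[6], input[7]]) as usize;
        let end = 8usize.checked_add(len)?;
        let data = input.get(8..end)?;
        chunks.push(RawChunk { id, data });
        // Chunks start on even offsets: skip one pad byte after an odd-length payload.
        // A missing pad byte at the very end of the input is tolerated.
        let next = end + (len & 1);
        input = input.get(next..).unwrap_or(&[]);
    }
    Some(chunks)
}
```

Because RawChunk only stores a slice into the original buffer, the input must outlive the returned chunks, which is the trade-off a borrowing parser makes in exchange for avoiding allocations per chunk.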
CLI
The djvu binary is included when the std feature is enabled (the default).
# Install
# Document info
# Render page 1 to PNG at 200 DPI
# Render all pages to a PDF
# Export all pages to CBZ
# Extract text from page 2
# Extract text from all pages
hOCR and ALTO XML export
use ;
let data = read?;
let doc = parse?;
// hOCR — compatible with Tesseract, ABBYY, and most OCR toolchains
let hocr = to_hocr?;
write?;
// ALTO XML — used by libraries and archives (DFG, Europeana, etc.)
let alto = to_alto?;
write?;
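To give a sense of what the hOCR output contains, here is a standalone sketch that formats a single word as an hOCR ocrx_word span. It is not the crate's implementation: WordBox, escape_html, and hocr_word are hypothetical names, and the sketch assumes word boxes are stored with a bottom-left origin, so the y axis is flipped to match hOCR's top-left origin.

```rust
/// Escapes the characters that are unsafe inside HTML text content.
fn escape_html(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Hypothetical word box; (x, y) is the bottom-left corner, measured from the
/// bottom-left of the page (an assumption of this sketch).
struct WordBox<'a> {
    x: u32,
    y: u32,
    w: u32,
    h: u32,
    text: &'a str,
}

/// Formats one word as an hOCR span. The hOCR bbox is "x0 y0 x1 y1"
/// with the origin at the top-left corner of the page.
fn hocr_word(word: &WordBox, page_height: u32) -> String {
    let x0 = word.x;
    let x1 = word.x + word.w;
    // Flip the y axis: the box's bottom edge becomes the larger y value.
    let y1 = page_height.saturating_sub(word.y);
    let y0 = y1.saturating_sub(word.h);
    format!(
        "<span class='ocrx_word' title='bbox {} {} {} {}'>{}</span>",
        x0,
        y0,
        x1,
        y1,
        escape_html(word.text)
    )
}
```

For a word at x = 10, y = 20 with width 30 and height 15 on a page 100 pixels tall, the resulting bbox is 10 65 40 80.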
Serde support
Requires the serde feature flag: djvu-rs = { version = "…", features = ["serde"] }.
All public data types (DjVuBookmark, TextZone, MapArea, PageInfo, etc.) implement
Serialize and Deserialize.
use DjVuDocument;
let data = read?;
let doc = parse?;
let info = doc.page?.info?;
let json = to_string_pretty?;
println!;
image-rs integration
Requires the image feature flag: djvu-rs = { version = "…", features = ["image"] }.
use ;
use DynamicImage;
let data = read?;
let doc = parse?;
let page = doc.page?;
let decoder = new?;
let img = from_decoder?;
img.save?;
WebAssembly
Build with wasm-pack:
Then use in JavaScript/TypeScript:
import init from './pkg/djvu_rs.js';
await ;
const doc = ;
console.log;
const page = doc.;
const pixels = page.; // Uint8ClampedArray, RGBA
const img = ;
ctx.;
See examples/wasm/ for a complete drag-and-drop demo.
Feature flags
| Flag | Default | Description |
|---|---|---|
| std | enabled | DjVuDocument, file I/O, rendering, PDF export, CLI |
| tiff | disabled | TIFF export via the tiff crate |
| async | disabled | Async render API via tokio::task::spawn_blocking |
| parallel | disabled | Parallel multi-page render via rayon (render_pages_parallel) |
| jpeg | disabled | Standalone JPEG decode without full std (JPEG is included in std by default) |
| mmap | disabled | Memory-mapped file I/O via memmap2 (DjVuDocument::from_mmap) |
| serde | disabled | Serialize + Deserialize for all public data types |
| image | disabled | image::ImageDecoder impl via DjVuDecoder; integrates with the image crate |
| wasm | disabled | WebAssembly bindings via wasm-bindgen (WasmDocument, WasmPage) |
Without std, the crate still provides IFF parsing, BZZ decompression, JB2/IW44 decoding,
and text/annotation parsing: all codec primitives that work on byte slices.
Performance
Measured on Apple M1 Max (Rust 1.92, release profile, --features parallel).
Compared to DjVuLibre 3.5.29 C library (render-only, in-process):
| Page type | djvu-rs | libdjvulibre | Ratio |
|---|---|---|---|
| Color IW44, 300 dpi (849×1100 px) | 3.2 ms | 37 ms | ~12× faster |
| Bilevel JB2, 300 dpi (849×1100 px) | 3.2 ms | 37 ms | ~12× faster |
| Bilevel JB2, 600 dpi, sparse (2649×4530 px) | 10.5 ms | 12.2 ms | ~1.2× faster |
| Bilevel JB2, 600 dpi, dense (2649×4530 px) | 36.2 ms | 13.8 ms | ~0.38× (libdjvulibre wins) |
Document open + parse: djvu-rs ≈ 1.9 ms vs libdjvulibre ≈ 24–60 ms — 10–30× faster.
Dense 600 dpi pages remain bottlenecked on the sequential ZP arithmetic decoder. All other page types match or beat the C library.
See BENCHMARKS_RESULTS.md for full details and methodology.
Minimum supported Rust version (MSRV)
Rust 1.88 (edition 2024 — let-chains stabilized in 1.88)
Roadmap
See GitHub milestones for the full roadmap and progress tracking.
License
MIT. See LICENSE.
Specification
Written from the public DjVu v3 specification:
- https://www.sndjvu.org/spec.html
- https://djvu.sourceforge.net/spec/DjVu3Spec.djvu (the spec is itself a DjVu file)
No code is derived from GPL-licensed DjVuLibre or any other GPL source. All algorithms are independent implementations written from the spec.