djvu-rs
Pure-Rust DjVu codec — decode and encode DjVu documents. MIT licensed, no GPL dependencies. Written from the DjVu v3 public specification.
Features
- IFF container parser: zero-copy, borrowing slices from input
- JB2 bilevel image decoder: adaptive arithmetic coding (ZP coder) with symbol dictionary
- JB2 bilevel image encoder: encode any Bitmap into a valid Sjbz chunk payload
- IW44 wavelet image decoder: planar YCbCr storage, multiple refinement chunks
- IW44 wavelet image encoder: encode color (Pixmap) or grayscale (GrayPixmap) into BG44/FG44 chunk payloads
- G4/MMR bilevel image decoder: ITU-T T.6 Group 4 fax decoder (Smmr chunks)
- BZZ decompressor: ZP arithmetic coding + MTF + BWT (DIRM, NAVM, ANTz chunks)
- Text layer extraction: TXTz/TXTa chunk parsing with zone hierarchy (page/column/region/paragraph/line/word/character)
- Annotation parsing: ANTz/ANTa chunk parsing (hyperlinks, map areas, background color)
- Annotation encoding: serialize Annotation + MapArea slices into ANTa or ANTz chunk payloads
- Bookmarks: NAVM table-of-contents parsing
- Bookmark encoding: serialize DjVuBookmark trees into NAVM chunk payloads
- Multi-page documents: DJVM bundle format with DIRM directory chunk; indirect DJVM creation and loading from directory
- Page rendering: composite foreground + background into RGBA output
- PDF export: selectable text, lossless IW44/JB2 embedding, bookmarks, hyperlinks
- TIFF export: multi-page color and bilevel modes (feature flag tiff)
- hOCR / ALTO XML export: text layer as hOCR or ALTO XML for OCR toolchains and archives
- Serde support: Serialize/Deserialize on all public data types (feature flag serde)
- EPUB 3 export: page images + invisible text overlay + bookmarks as navigation (feature flag epub)
- WebAssembly (WASM): wasm-bindgen bindings for use in browsers and Node.js (feature flag wasm)
- image-rs integration: image::ImageDecoder impl for use with the image crate (feature flag image)
- Async render and lazy loading: async render wrappers plus true per-page lazy loading over AsyncRead + AsyncSeek (feature flag async)
- Workspace codec crates: standalone djvu-iff, djvu-bzz, djvu-bitmap, djvu-jb2, djvu-pixmap, djvu-iw44, and djvu-zp crates for focused consumers
- Fuzzing integration: libFuzzer targets and in-tree OSS-Fuzz project files
- no_std compatible: IFF/BZZ/JB2/IW44/ZP codec modules work with alloc only
Quick start
use ;
let data = read?;
let doc = parse?;
println!;
let page = doc.page?;
println!;
let target_dpi = 150u32;
let opts = RenderOptions ;
let pixmap = render_pixmap?;
// pixmap.data — RGBA bytes (width × height × 4), row-major
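The output layout noted above (RGBA, row-major, width × height × 4 bytes) can be illustrated without the crate's own types. The following is a minimal sketch: `scaled_size` and `pixel_at` are hypothetical helper names, and rounding to the nearest pixel is an assumption, not necessarily what the renderer does.

```rust
/// Output size when a page stored at `native_dpi` is rendered at `target_dpi`.
/// Rounds to the nearest pixel and never returns a zero dimension (assumption).
fn scaled_size(width: u32, height: u32, native_dpi: u32, target_dpi: u32) -> (u32, u32) {
    if native_dpi == 0 {
        // No usable resolution information: keep the native size.
        return (width, height);
    }
    let scale = |v: u32| -> u32 {
        let scaled = (v as u64 * target_dpi as u64 + native_dpi as u64 / 2) / native_dpi as u64;
        scaled.max(1) as u32
    };
    (scale(width), scale(height))
}

/// Returns the [R, G, B, A] bytes at (x, y) in a row-major RGBA buffer,
/// or None when the coordinates fall outside the buffer.
fn pixel_at(data: &[u8], width: u32, x: u32, y: u32) -> Option<[u8; 4]> {
    if x >= width {
        return None;
    }
    let offset = (y as usize * width as usize + x as usize) * 4;
    let px = data.get(offset..offset + 4)?;
    Some([px[0], px[1], px[2], px[3]])
}
```

For a 2550 × 3300 page stored at 300 dpi, a 150 dpi target halves both dimensions under this rounding rule.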
Text extraction
use DjVuDocument;
let data = read?;
let doc = parse?;
let page = doc.page?;
if let Some = page.text?
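To make the zone hierarchy (page/column/region/paragraph/line/word/character) concrete, here is a small stand-alone sketch of walking such a tree to recover lines of text. The `Zone` and `ZoneKind` types below are hypothetical stand-ins, not the crate's TextZone; in particular, storing the text directly on word zones is a simplifying assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZoneKind { Page, Column, Region, Paragraph, Line, Word, Character }

/// Hypothetical zone node: a kind, optional text (used on words here), and children.
struct Zone {
    kind: ZoneKind,
    text: String,
    children: Vec<Zone>,
}

fn word(text: &str) -> Zone {
    Zone { kind: ZoneKind::Word, text: text.to_string(), children: Vec::new() }
}

fn node(kind: ZoneKind, children: Vec<Zone>) -> Zone {
    Zone { kind, text: String::new(), children }
}

/// Depth-first collection of every word under `zone`, in document order.
fn collect_words(zone: &Zone, out: &mut Vec<String>) {
    if zone.kind == ZoneKind::Word {
        out.push(zone.text.clone());
        return;
    }
    for child in &zone.children {
        collect_words(child, out);
    }
}

/// Emits one string per Line zone, with its words joined by single spaces.
fn collect_lines(zone: &Zone, out: &mut Vec<String>) {
    if zone.kind == ZoneKind::Line {
        let mut words = Vec::new();
        collect_words(zone, &mut words);
        out.push(words.join(" "));
        return;
    }
    for child in &zone.children {
        collect_lines(child, out);
    }
}
```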
PDF export
use ;
let data = read?;
let doc = parse?;
let pdf_bytes = djvu_to_pdf?;
write?;
TIFF export
Requires the tiff feature flag: djvu-rs = { version = "…", features = ["tiff"] }.
use ;
let data = read?;
let doc = parse?;
let tiff_bytes = djvu_to_tiff?;
write?;
Async render
Requires the async feature flag: djvu-rs = { version = "…", features = ["async"] }.
use ;
let data = read?;
let doc = parse?;
let page = doc.page?;
let target_dpi = 150u32;
let opts = RenderOptions ;
let pixmap = render_pixmap_async.await?;
Lazy async loading
Requires the async feature flag. Unlike load_document_async, the lazy
loader keeps a seekable async reader and fetches page/component byte ranges
only when page_async(i) is called. Parsed pages are cached as Arc<DjVuPage>.
use from_async_reader_lazy;
let file = open.await?;
let doc = from_async_reader_lazy.await?;
println!;
let page = doc.page_async.await?;
println!;
Supported shapes: single-page FORM:DJVU and bundled FORM:DJVM, including
shared DJVI dictionaries referenced via INCL. For browser-local !Send
readers on wasm32, use from_async_reader_lazy_local.
See examples/async_lazy_first_page.rs
for a native first-page latency probe and
examples/wasm/range_lazy.md for the HTTP
Range: bytes=start-end integration shape.
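As a rough illustration of the Range: bytes=start-end shape, the sketch below turns an (offset, length) component range into a header value. `range_header` is a hypothetical helper, not part of the crate. The one detail worth showing is that HTTP byte ranges are inclusive at both ends.

```rust
/// Formats an HTTP Range header value for `len` bytes starting at `offset`.
/// HTTP byte ranges are inclusive, so the last byte is `offset + len - 1`.
/// Returns None for an empty range or when the end would overflow u64.
fn range_header(offset: u64, len: u64) -> Option<String> {
    if len == 0 {
        return None;
    }
    let end = offset.checked_add(len - 1)?;
    Some(format!("bytes={}-{}", offset, end))
}
```

A reader backed by such requests would issue one of these per page or component range that the lazy loader asks for.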
Low-level IFF access
use parse_form;
let data = read?;
let form = parse_form?;
println!;
for chunk in &form.chunks
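For readers unfamiliar with the container, the layout being parsed can be sketched in plain Rust: a DjVu file starts with the AT&T magic, followed by a FORM chunk whose payload begins with a four-byte secondary ID (such as DJVU) and then holds child chunks. Each chunk is a four-byte ID, a big-endian u32 length, the payload, and a pad byte when the length is odd. `Chunk`, `parse_chunks`, and `walk_form` below are illustrative names, not the crate's API, and nested FORMs are not descended into.

```rust
/// A chunk borrowed from the input buffer (no payload copy).
struct Chunk<'a> {
    id: [u8; 4],
    data: &'a [u8],
}

/// Walks a flat sequence of IFF chunks.
fn parse_chunks(mut buf: &[u8]) -> Result<Vec<Chunk<'_>>, &'static str> {
    let mut chunks = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 8 {
            return Err("truncated chunk header");
        }
        let id = [buf[0], buf[1], buf[2], buf[3]];
        let len = u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]) as usize;
        let end = 8usize.checked_add(len).ok_or("chunk length overflow")?;
        let data = buf.get(8..end).ok_or("truncated chunk payload")?;
        chunks.push(Chunk { id, data });
        // Chunks are aligned to even offsets; the final pad byte may be absent.
        let next = end + (len & 1);
        buf = buf.get(next..).unwrap_or(&[]);
    }
    Ok(chunks)
}

/// Returns the FORM secondary ID and its direct child chunks.
fn walk_form(file: &[u8]) -> Result<([u8; 4], Vec<Chunk<'_>>), &'static str> {
    if !file.starts_with(b"AT&T") {
        return Err("missing AT&T magic");
    }
    let outer = parse_chunks(&file[4..])?;
    let form = outer.first().ok_or("no top-level chunk")?;
    if &form.id != b"FORM" {
        return Err("top-level chunk is not FORM");
    }
    let body: &[u8] = form.data;
    if body.len() < 4 {
        return Err("FORM payload too short");
    }
    let kind = [body[0], body[1], body[2], body[3]];
    let children = parse_chunks(&body[4..])?;
    Ok((kind, children))
}
```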
Encoding
JB2 bilevel image encoder
use ;
let mut bm = new;
// ... fill bitmap pixels ...
let sjbz_payload = encode_jb2;
// Wrap in a Sjbz IFF chunk and embed in a DjVu FORM:DJVU.
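The wrapping step mentioned in the last comment can be sketched as follows. This is an illustration of the IFF framing only, with hypothetical helper names (`write_chunk`, `wrap_form_djvu`); it is not the crate's writer. A complete page also needs an INFO chunk ahead of Sjbz, which the caller would pass in the chunk list.

```rust
/// Appends one IFF chunk: 4-byte ID, big-endian u32 length, payload,
/// and a zero pad byte when the payload length is odd.
fn write_chunk(out: &mut Vec<u8>, id: &[u8; 4], payload: &[u8]) {
    out.extend_from_slice(id);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    if payload.len() % 2 == 1 {
        out.push(0);
    }
}

/// Builds `AT&T` + `FORM:DJVU` around the given (id, payload) pairs.
fn wrap_form_djvu(chunks: &[(&[u8; 4], &[u8])]) -> Vec<u8> {
    // The FORM payload starts with the secondary ID, then the child chunks.
    let mut body = b"DJVU".to_vec();
    for (id, payload) in chunks {
        write_chunk(&mut body, id, payload);
    }
    let mut file = b"AT&T".to_vec();
    write_chunk(&mut file, b"FORM", &body);
    file
}
```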
IW44 wavelet encoder
use ;
let pixmap: Pixmap = /* ... your RGBA/YCbCr image ... */;
let chunks: = encode_iw44_color;
// Each Vec<u8> is a BG44 chunk payload; wrap each in a BG44 IFF tag.
Grayscale:
use ;
let gray: GrayPixmap = /* ... */;
let chunks: = encode_iw44_gray;
Iw44EncodeOptions fields (all have sensible defaults):
| Field | Default | Description |
|---|---|---|
| slices_per_chunk | 10 | Slices packed into each BG44/FG44 chunk |
| total_slices | 100 | Total refinement slices to encode |
| chroma_delay | 0 | Y slices before Cb/Cr encoding begins |
| chroma_half | true | Encode chroma at half resolution |
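To show how these fields interact, the sketch below mirrors them in a stand-in struct and derives a chunk plan. Field names and defaults are taken from the table; the field types, the `EncodePlan` name, rounding chroma dimensions up, and the exact point at which chroma starts are assumptions for illustration, not the encoder's confirmed behavior.

```rust
/// Stand-in for the options in the table above (types are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct EncodePlan {
    slices_per_chunk: usize,
    total_slices: usize,
    chroma_delay: usize,
    chroma_half: bool,
}

impl Default for EncodePlan {
    fn default() -> Self {
        Self { slices_per_chunk: 10, total_slices: 100, chroma_delay: 0, chroma_half: true }
    }
}

/// Number of slices carried by each emitted chunk, in order.
fn chunk_slice_counts(plan: &EncodePlan) -> Vec<usize> {
    if plan.slices_per_chunk == 0 {
        return Vec::new();
    }
    let mut remaining = plan.total_slices;
    let mut counts = Vec::new();
    while remaining > 0 {
        let n = remaining.min(plan.slices_per_chunk);
        counts.push(n);
        remaining -= n;
    }
    counts
}

/// Whether Cb/Cr data is emitted for the slice at `slice_index` (0-based).
fn chroma_active(plan: &EncodePlan, slice_index: usize) -> bool {
    slice_index >= plan.chroma_delay
}

/// Chroma plane size, rounding odd dimensions up when halved (assumption).
fn chroma_size(plan: &EncodePlan, width: u32, height: u32) -> (u32, u32) {
    if plan.chroma_half {
        (width / 2 + width % 2, height / 2 + height % 2)
    } else {
        (width, height)
    }
}
```

With the defaults this yields ten chunks of ten slices each, which matches the one-payload-per-chunk shape of the encoder output described above.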
Bookmark encoder
use ;
let bookmarks = vec!;
let navm_payload = encode_navm;
Annotation encoder
use ;
let ann = default;
let areas: = vec!;
let anta_payload = encode_annotations; // uncompressed ANTa
let antz_payload = encode_annotations_bzz; // BZZ-compressed ANTz
Indirect multi-page documents
Create an indirect DJVM index file that references per-page .djvu files:
use create_indirect;
let index = create_indirect?;
write?;
// Distribute book.djvu alongside the individual page files.
Load an indirect document by resolving component files from a directory:
use DjVuDocument;
let index = read?;
let doc = parse_from_dir?;
println!;
CLI
The djvu binary is enabled by the cli feature.
# Install
# Document info
# Render page 1 to PNG at 200 DPI
# Render all pages to a PDF
# Export all pages to CBZ
# Extract text from page 2
# Extract text from all pages
# Encode a PNG image into a single-page DjVu (bilevel JB2, lossless)
# Encode a PNG image into a layered lossy DjVu (JB2 mask + IW44 background)
# Encode a directory of PNGs into a bundled DJVM with shared Djbz
For single PNG input, --quality lossless luminance-thresholds the image into a
JB2 mask and writes INFO + Sjbz; --quality quality uses the layered encoder
(INFO + Sjbz + BG44...) for color input. --quality archival is still
reserved for a future FGbz/profitability model. Directory input currently uses
the lossless multi-page JB2 path only.
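The luminance-thresholding step for the lossless path can be pictured with a short sketch. The Rec. 601 luma weights and the caller-supplied cutoff are assumptions chosen for illustration; the CLI's actual coefficients and threshold are not stated here.

```rust
/// Integer Rec. 601 luma approximation, rounded to nearest (assumed weights).
fn luma(r: u8, g: u8, b: u8) -> u8 {
    ((299 * r as u32 + 587 * g as u32 + 114 * b as u32 + 500) / 1000) as u8
}

/// Converts RGBA bytes to a bilevel mask: true marks a black (ink) pixel.
/// Alpha is ignored; trailing bytes that do not form a full pixel are dropped.
fn threshold_rgba(rgba: &[u8], threshold: u8) -> Vec<bool> {
    rgba.chunks_exact(4)
        .map(|p| luma(p[0], p[1], p[2]) < threshold)
        .collect()
}
```

The resulting mask is the kind of input a bilevel encoder consumes once it has been packed into a bitmap.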
hOCR and ALTO XML export
use ;
let data = read?;
let doc = parse?;
// hOCR — compatible with Tesseract, ABBYY, and most OCR toolchains
let hocr = to_hocr?;
write?;
// ALTO XML — used by libraries and archives (DFG, Europeana, etc.)
let alto = to_alto?;
write?;
Serde support
Requires the serde feature flag: djvu-rs = { version = "…", features = ["serde"] }.
All public data types (DjVuBookmark, TextZone, MapArea, PageInfo, etc.) implement
Serialize and Deserialize.
use DjVuDocument;
let data = read?;
let doc = parse?;
let json = to_string_pretty?;
println!;
image-rs integration
Requires the image feature flag: djvu-rs = { version = "…", features = ["image"] }.
use ;
use DynamicImage;
let data = read?;
let doc = parse?;
let page = doc.page?;
let decoder = new?.with_size;
let img = from_decoder?;
img.save?;
EPUB export
Requires the epub feature flag: djvu-rs = { version = "…", features = ["epub"] }.
use ;
let data = read?;
let doc = parse?;
let epub_bytes = djvu_to_epub?;
write?;
CLI:
WebAssembly
Build with wasm-pack:
Then use in JavaScript/TypeScript:
import init from './pkg/djvu_rs.js';
await ;
const doc = ;
console.log;
const page = doc.;
const pixels = page.; // Uint8ClampedArray, RGBA
const img = ;
ctx.;
See examples/wasm/ for a complete drag-and-drop demo.
Feature flags
| Flag | Default | Description |
|---|---|---|
| std | enabled | DjVuDocument, file I/O, rendering, PDF export |
| cli | disabled | Build the djvu command-line binary |
| tiff | disabled | TIFF export via the tiff crate |
| async | disabled | Async render API and lazy AsyncRead + AsyncSeek document loading |
| parallel | disabled | Parallel multi-page render via rayon (render_pages_parallel) |
| jpeg | disabled | Standalone JPEG decode without full std (JPEG is included in std by default) |
| mmap | disabled | Memory-mapped file I/O via memmap2 (DjVuDocument::from_mmap) |
| serde | disabled | Serialize + Deserialize for all public data types |
| image | disabled | image::ImageDecoder impl via DjVuDecoder; integrates with the image crate |
| epub | disabled | EPUB 3 export via djvu_to_epub: page images, text overlay, bookmarks as nav |
| wasm | disabled | WebAssembly bindings via wasm-bindgen (WasmDocument, WasmPage) |
Without std, the crate provides IFF parsing, BZZ decompression, JB2/IW44 decoding, and
text/annotation parsing: the codec primitives that work on byte slices.
Performance
Latest full Criterion run: macOS arm64, Rust 1.92, release profile
(2026-05-04).
| Benchmark | Time |
|---|---|
| render_page/dpi/72 | 216 µs |
| render_page/dpi/144 | 895 µs |
| render_page/dpi/300 | 3.42 ms |
| render_colorbook (150 dpi, warm) | 7.29 ms |
| render_colorbook_cold | 17.9 ms |
| render_corpus_color (native 600 dpi) | 71.2 ms |
| render_corpus_bilevel (native 600 dpi) | 70.4 ms |
| jb2_decode | 134 µs |
| iw44_decode_first_chunk | 599 µs |
| iw44_decode_corpus_color | 671 µs |
| parse_multipage_520p | 2.27 ms |
| render_large_doc_first_page | 12.8 ms |
Comparison with DjVuLibre
The benchmark workflow still runs a DjVuLibre comparison via
scripts/bench_djvulibre.sh and formats it with
scripts/djvulibre_compare.py.
Current local matrix:
| Scenario | djvu-rs | DjVuLibre | Ratio |
|---|---|---|---|
| Small color IW44, 72 dpi | 217 µs | 122 µs | DjVuLibre 1.8x faster |
| Large color IW44, 150 dpi | 7.29 ms | 6.00 ms | DjVuLibre 1.2x faster |
| Native color corpus, 300 dpi | 73.28 ms | 36.47 ms | DjVuLibre 2.0x faster |
| Native bilevel JB2 corpus, 300 dpi | 72.33 ms | 35.26 ms | DjVuLibre 2.1x faster |
The same workflow also records ddjvu CLI timings for these files
(30.20-81.00 ms locally), including process startup and PPM output.
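The Ratio column appears to be the slower time divided by the faster one, rounded to one decimal place. That reading is an inference from the numbers, not something the comparison scripts are quoted as doing. A small sketch with a hypothetical `speed_ratio` helper reproduces the four entries:

```rust
/// Describes which side is faster and by what factor, to one decimal place.
/// Both timings must be in the same unit.
fn speed_ratio(djvu_rs: f64, djvulibre: f64) -> String {
    if djvu_rs >= djvulibre {
        format!("DjVuLibre {:.1}x faster", djvu_rs / djvulibre)
    } else {
        format!("djvu-rs {:.1}x faster", djvulibre / djvu_rs)
    }
}
```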
See BENCHMARKS_RESULTS.md for the full Criterion run, methodology, and the full DjVuLibre comparison. Historical multi-platform results are kept in BENCHMARKS.md; compare those carefully because some benchmark definitions and output sizes have changed over time.
Recent targeted experiments are recorded in PERF_EXPERIMENTS.md, including:
- #233 lazy async loading: a 100 MiB padded 520-page DJVM reached first pixel in 491.469 ms while reading only 28,578 bytes at simulated 12.5 MiB/s throughput.
- #189 x86-64-v3 AVX2 validation: existing AVX2 decode paths showed iw44_decode_corpus_color -18.88% and iw44_decode_first_chunk -4.85% on GitHub-hosted x86_64, with one sub4 partial-decode regression recorded for follow-up.
- #258 shared-Djbz clustering: Hamming shared clustering was rejected as default; byte-exact shared-Djbz remains the measured safe path.
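To put the #233 figures in proportion, the arithmetic below computes pure transfer time at the stated 12.5 MiB/s. It covers only bytes divided by throughput; whatever else contributed to the measured 491.469 ms (for example, simulated per-request latency) is not modeled, and `transfer_ms` is a hypothetical helper.

```rust
const MIB: f64 = 1024.0 * 1024.0;

/// Milliseconds needed to move `bytes` at `mib_per_s` MiB/s, ignoring latency.
fn transfer_ms(bytes: u64, mib_per_s: f64) -> f64 {
    bytes as f64 / (mib_per_s * MIB) * 1000.0
}
```

At that rate the 28,578 bytes actually read take roughly 2.2 ms to transfer, whereas fetching the whole 100 MiB file first would take 8 s.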
Minimum supported Rust version (MSRV)
Rust 1.88 (edition 2024 — let-chains stabilized in 1.88)
Roadmap
See GitHub milestones for the full roadmap and progress tracking.
License
MIT. See LICENSE.
Specification
Written from the public DjVu v3 specification:
- https://www.sndjvu.org/spec.html
- https://djvu.sourceforge.net/spec/DjVu3Spec.djvu (the spec is itself a DjVu file)
No code derived from GPL-licensed DjVuLibre or any other GPL source. All algorithms are independent implementations from the spec.