epub-parser
A Rust library for extracting metadata, table of contents, text, cover, and images from EPUB files
Features
- ✅ Parse EPUB container and locate OPF file
- ✅ Extract Dublin Core metadata (title, author, publisher, language, identifier, date, rights)
- ✅ Parse NCX table of contents with hierarchical structure
- ✅ Extract text from HTML/XHTML content files
- ✅ Extract cover image from EPUB
- ✅ Extract all images from EPUB
- ✅ Follow reading order from OPF spine
- ✅ Clean text extraction (strips HTML, handles line breaks)
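To illustrate what "clean text extraction" involves, here is a minimal standard-library sketch that drops tags and turns `<br>` and closing block elements into line breaks. The function name `html_to_text` and the list of block tags are assumptions made for this example; the library itself parses content with quick-xml, and its actual implementation may differ. The sketch does not decode entities such as `&amp;`.

```rust
/// Hypothetical helper for illustration only; not part of epub-parser's API.
/// Strips tags and inserts a newline for <br> and for closing block elements.
fn html_to_text(html: &str) -> String {
    let mut out = String::new();
    let mut tag = String::new();
    let mut in_tag = false;

    for c in html.chars() {
        match c {
            '<' => {
                // Start collecting the tag's contents.
                in_tag = true;
                tag.clear();
            }
            '>' if in_tag => {
                in_tag = false;
                // Tag name without a leading '/' and without attributes.
                let name = tag
                    .trim_start_matches('/')
                    .split(|ch: char| ch.is_whitespace() || ch == '/')
                    .next()
                    .unwrap_or("")
                    .to_ascii_lowercase();
                let closes_block = tag.starts_with('/')
                    && matches!(name.as_str(), "p" | "div" | "li" | "h1" | "h2" | "h3");
                if name == "br" || closes_block {
                    out.push('\n');
                }
            }
            // Characters inside a tag belong to the tag, not the text.
            _ if in_tag => tag.push(c),
            _ => out.push(c),
        }
    }
    out.trim().to_string()
}
```

Calling `html_to_text("<p>Hello <b>world</b></p>")` yields `Hello world`. A character-level scanner like this is only suitable for well-formed XHTML; a real parser handles comments, CDATA, and entities.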
Dependencies
- zip - for extracting the EPUB (which is a ZIP archive)
- quick-xml - for parsing XML (OPF, NCX) and HTML content
Usage
// Crate path assumed from the project name
use epub_parser::Epub;
use std::path::Path;
// Parse from file path
// "book.epub" is a placeholder path
let epub = Epub::parse(Path::new("book.epub"))?;
// Or parse from in-memory buffer
let buffer = std::fs::read("book.epub")?;
let epub_from_buffer = Epub::parse_from_buffer(&buffer)?;
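// Access metadata (field names here are assumed; check the crate docs)
println!("Title: {:?}", epub.metadata.title);
println!("Author: {:?}", epub.metadata.author);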
// Access images; the first is the cover
for image in &epub.images { /* use image */ }
// Access table of contents
for entry in &epub.toc { /* use entry */ }
// Access page content
for page in &epub.pages { /* use page */ }
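
Because the table of contents is hierarchical, walking it usually means recursing into child entries. The sketch below shows one way to do that. The `TocEntry` struct, its `label` and `children` fields, and the `render_toc` function are hypothetical stand-ins defined for this example; the library's actual entry type may use different names.

```rust
/// Hypothetical shape of a TOC entry; the real type in epub-parser may differ.
struct TocEntry {
    label: String,
    children: Vec<TocEntry>,
}

/// Collects entries depth-first, indenting each label by two spaces per level.
fn render_toc(entries: &[TocEntry], depth: usize, out: &mut Vec<String>) {
    for entry in entries {
        out.push(format!("{}{}", "  ".repeat(depth), entry.label));
        // Children are rendered directly under their parent, one level deeper.
        render_toc(&entry.children, depth + 1, out);
    }
}
```

Starting the walk with `render_toc(&toc, 0, &mut lines)` produces one line per entry, with nesting visible from the indentation.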