Crate rbook

A fast, format-agnostic, ergonomic ebook library with a current focus on EPUB.

The primary goal of rbook is to provide an easy-to-use, high-level API for handling ebooks. The library is also designed with future formats in mind (CBZ, FB2, MOBI, etc.): core traits defined within the ebook and reader modules allow all formats to share the same “base” API.

Traits such as Ebook allow formats to be handled generically. For example, the following function retrieves the data of a cover image regardless of the concrete format (e.g., Epub):

use rbook::prelude::*; // Brings `Ebook` and the related traits into scope

// Here `ebook` may be of any supported format.
fn cover_image_bytes<E: Ebook>(ebook: &E) -> Option<Vec<u8>> {
    // 1 - An ebook may not have a `cover_image` entry, hence the try operator (`?`).
    // 2 - `read_bytes` returns a `Result`; `ok()` converts the result into `Option`.
    ebook.manifest().cover_image()?.read_bytes().ok()
}
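The same pattern can be tried without rbook installed. The following is a minimal sketch under stated assumptions: `MiniEbook`, `Entry`, `FakeEpub`, and `FakeComic` are hypothetical stand-ins invented for illustration and are not part of rbook's API. It shows how a single generic function can serve several formats, and how `?` on an `Option` combines with `ok()` on a `Result`.

```rust
use std::io;

// Hypothetical stand-ins for illustration only; these are NOT rbook's types.
struct Entry {
    data: Option<Vec<u8>>,
}

impl Entry {
    /// Fallible read, mimicking a resource that may be missing from the archive.
    fn read_bytes(&self) -> io::Result<Vec<u8>> {
        self.data
            .clone()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "missing resource"))
    }
}

/// Hypothetical shared "base" trait that every format implements.
trait MiniEbook {
    fn cover_entry(&self) -> Option<&Entry>;
}

struct FakeEpub {
    cover: Option<Entry>,
}

struct FakeComic {
    pages: Vec<Entry>,
}

impl MiniEbook for FakeEpub {
    fn cover_entry(&self) -> Option<&Entry> {
        self.cover.as_ref()
    }
}

impl MiniEbook for FakeComic {
    // Assumption for this sketch: a comic treats its first page as the cover.
    fn cover_entry(&self) -> Option<&Entry> {
        self.pages.first()
    }
}

/// Works for any format implementing the shared trait.
fn cover_bytes<E: MiniEbook>(ebook: &E) -> Option<Vec<u8>> {
    // `?` returns `None` early when there is no cover entry;
    // `ok()` turns a failed read into `None` as well.
    ebook.cover_entry()?.read_bytes().ok()
}
```

Calling `cover_bytes` on either stand-in type yields `Some(bytes)` only when a cover entry exists and its read succeeds; both failure cases collapse into `None`.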

§Features

Here is a non-exhaustive list of the features rbook provides:

| Feature | Overview |
|---|---|
| EPUB 2 and 3 | Read-only (for now) view of EPUB 2 and 3 formats. |
| Streaming Reader | Random‐access or sequential iteration over readable content. |
| Detailed Types | Abstractions built on expressive traits and types. |
| Metadata | Typed access to titles, creators, publishers, languages, tags, roles, attributes, and more. |
| Manifest | Lookup and traverse contained resources such as readable content (XHTML) and images. |
| Spine | Chronological reading order and preferred page direction. |
| Table of Contents (ToC) | Navigation points, including the EPUB 2 guide and EPUB 3 landmarks. |
| Resources | Retrieve bytes or UTF-8 strings for any manifest resource. |

§Default crate features

These are toggleable rbook features that are enabled by default and can be configured in a project’s Cargo.toml file:

| Feature | Description |
|---|---|
| prelude | Convenience prelude only including common traits. |
| threadsafe | Enables constraint and support for Send + Sync. |

Default features can be disabled and toggled selectively. For example, omitting the prelude while retaining the threadsafe feature:

[dependencies]
rbook = { version = "0.6.6", default-features = false, features = ["threadsafe"] }

§Opening an Ebook

rbook supports several methods to open an ebook (Epub):

  • A directory containing the contents of an unzipped ebook:
    let epub = Epub::open("/ebooks/unzipped_epub_dir");
  • A file path:
    let epub = Epub::open("/ebooks/zipped.epub");
  • Or any implementation of Read + Seek (and Send + Sync if the threadsafe feature is enabled):
    let cursor = std::io::Cursor::new(bytes_vec);
    let epub = Epub::read(cursor, EpubSettings::default());
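To illustrate what accepting "any implementation of Read + Seek" means in practice, here is a minimal sketch using only the standard library. The helper `looks_like_zip` is hypothetical and not part of rbook; it checks whether a source begins with the ZIP local-file-header signature (`PK\x03\x04`), which is only a rough heuristic for a zipped EPUB.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper (not part of rbook): returns `true` if the source
/// starts with the ZIP local-file-header signature `PK\x03\x04`.
fn looks_like_zip<R: Read + Seek>(mut source: R) -> io::Result<bool> {
    // Rewind first, since the caller may have already read from the source.
    source.seek(SeekFrom::Start(0))?;

    let mut magic = [0u8; 4];
    match source.read_exact(&mut magic) {
        Ok(()) => Ok(&magic == b"PK\x03\x04"),
        // Fewer than four bytes available: not a ZIP archive.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```

Because the function is generic over `Read + Seek`, it accepts a `std::fs::File` as well as an in-memory `std::io::Cursor<Vec<u8>>`, the same kinds of sources shown in the list above.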

Aside from how the contents of an ebook are stored, settings can also be provided to control parser behavior, such as strictness:

// Import traits
use rbook::Ebook;
// use rbook::prelude::*; // or the prelude for convenient trait imports

use rbook::epub::{Epub, EpubSettings};

let epub = Epub::open_with(
    "tests/ebooks/example_epub",
    EpubSettings::builder().strict(false), // Disable strict checks (`true` by default)
).unwrap();

§Reading an Ebook

Reading the contents of an ebook is handled by a Reader, which traverses end-user-readable resources in canonical order:

use rbook::reader::{Reader, ReaderContent};

// Create a reader instance
let mut reader = epub.reader();

// Print the readable content
while let Some(Ok(data)) = reader.read_next() {
    assert_eq!("application/xhtml+xml", data.manifest_entry().media_type());
    println!("{}", data.content());
}

As with an ebook, a reader can receive settings to control behavior, such as linearity:

use rbook::epub::reader::{EpubReaderSettings, LinearBehavior};

let mut reader = epub.reader_with(
    // Make a reader omit non-linear content
    EpubReaderSettings::builder().linear_behavior(LinearBehavior::LinearOnly)
);

§Resource retrieval from an Ebook

All files within an ebook, such as text, images, and video, are accessible programmatically.

The simplest way to access and retrieve resources from an ebook is through the Manifest, specifically through its entries via ManifestEntry::read_str and ManifestEntry::read_bytes:

let manifest_entry = epub.manifest().cover_image().unwrap();
let cover_image_bytes = manifest_entry.read_bytes()?;

// process bytes //

For finer-grained control, the Ebook trait provides two methods that accept a Resource as an argument:

let manifest_entry = epub.manifest().cover_image().unwrap();

let bytes_a = epub.read_resource_bytes(manifest_entry.resource())?;
let bytes_b = epub.read_resource_bytes("/EPUB/img/cover.webm")?;

assert_eq!(bytes_a, bytes_b);

All resource retrieval methods are fallible, and attempts to access malformed or missing resources will return an EbookError::Archive error.

§See Also

  • Epub documentation of read_resource_* methods for normalization details.

§The prelude

rbook provides a prelude consisting only of traits for convenience. It removes the need to import each trait manually and helps keep imports lean:

// Without the prelude (Verbose; manually importing each trait):
/*1*/ use rbook::Ebook;
/*2*/ use rbook::ebook::manifest::ManifestEntry;
/*3*/ use rbook::ebook::spine::{Spine, SpineEntry};

// With the prelude, lines 1, 2, and 3 can be consolidated into `use rbook::prelude::*;`

// Retrieve the manifest entry associated with a spine entry:
let epub = rbook::Epub::open("tests/ebooks/example_epub")?;
let spine_entry = epub.spine().by_order(2).unwrap();
let manifest_entry_a = spine_entry.manifest_entry().unwrap();

assert_eq!("c1", spine_entry.idref());
assert_eq!("c1", manifest_entry_a.id());

Whether a library should provide a prelude is subjective, and a prelude may not be desirable in every project. As such, it is set as a default crate feature that can be disabled inside a project’s Cargo.toml file. For example, omitting the prelude while retaining the threadsafe feature:

[dependencies]
rbook = { version = "0.6.6", default-features = false, features = ["threadsafe"] }

§Examples

§Accessing Metadata: Retrieving the main title

// Retrieve the main title (all titles retrievable via `titles()`)
let title = epub.metadata().title().unwrap();
assert_eq!("Example EPUB", title.value());
assert_eq!(TitleKind::Main, title.kind());

// Retrieve the first alternate script of a title
let alternate_script = title.alternate_scripts().next().unwrap();
assert_eq!("サンプルEPUB", alternate_script.value());
assert_eq!("ja", alternate_script.language().scheme().code());
assert_eq!(LanguageKind::Bcp47, alternate_script.language().kind());

§Accessing Metadata: Retrieving the first creator

// Retrieve the first creator
let creator = epub.metadata().creators().next().unwrap();
assert_eq!("John Doe", creator.value());
assert_eq!(Some("Doe, John"), creator.file_as());
assert_eq!(0, creator.order());

// Retrieve the main role of a creator (all roles retrievable via `roles()`)
let role = creator.main_role().unwrap();
assert_eq!("aut", role.code());
assert_eq!(Some("marc:relators"), role.source());

// Retrieve the first alternate script of a creator
let alternate_script = creator.alternate_scripts().next().unwrap();
assert_eq!("山田太郎", alternate_script.value());
assert_eq!("ja", alternate_script.language().scheme().code());
assert_eq!(LanguageKind::Bcp47, alternate_script.language().kind());

§Extracting images from the Manifest

use std::fs::{self, File};
use std::path::Path;
use std::io::Write;

// Create an output directory for the extracted images
let dir = Path::new("extracted_images");
fs::create_dir(&dir).unwrap();

for image in epub.manifest().images() {
    // Retrieve the raw image bytes
    let bytes = image.read_bytes().unwrap();

    // Extract the filename from the href and write to disk
    let filename = image.href().name().decode(); // Decode as EPUB hrefs may be URL-encoded
    let mut file = File::create(dir.join(&*filename)).unwrap();
    file.write_all(&bytes).unwrap();
}
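The `decode()` call above matters because an href such as `cover%20image.png` refers to a file named `cover image.png`. As a rough illustration of what percent-decoding involves, here is a minimal standard-library sketch. The function `percent_decode` is hypothetical and is not rbook's implementation, which may handle edge cases differently.

```rust
/// Converts one ASCII hex digit into its numeric value.
fn hex_value(byte: u8) -> Option<u8> {
    match byte {
        b'0'..=b'9' => Some(byte - b'0'),
        b'a'..=b'f' => Some(byte - b'a' + 10),
        b'A'..=b'F' => Some(byte - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical helper (not rbook's implementation): decodes `%XX` escapes.
/// Malformed escapes are kept verbatim; invalid UTF-8 is replaced lossily.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;

    while i < bytes.len() {
        // An escape needs a '%' followed by two hex digits.
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_value(bytes[i + 1]), hex_value(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }

    String::from_utf8_lossy(&out).into_owned()
}
```

Since a decoded name can contain path separators (for example, from `%2F`), code that writes decoded names to disk may want to reject or sanitize such names first.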

§Accessing EpubManifest fallbacks

// Fallbacks
let webm_cover = epub.manifest().cover_image().unwrap();
let kind = webm_cover.resource_kind();
assert_eq!(("image", "webm"), (kind.maintype(), kind.subtype()));

// If the app does not support `webm`, use the fallback
let avif_cover = webm_cover.fallback().unwrap();
assert_eq!("image/avif", avif_cover.media_type());

// If the app does not support `avif`, use the next fallback
let png_cover = avif_cover.fallback().unwrap();
assert_eq!("image/png", png_cover.media_type());

// No more fallbacks
assert_eq!(None, png_cover.fallback());
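In an application, the manual steps above would likely become a loop that walks the chain until it reaches a media type the app can render. The following is a minimal sketch of that idea, assuming a hypothetical `Entry` struct and `first_supported` function; neither is part of rbook, and the struct only mimics the `media_type` and `fallback` relationship shown above.

```rust
// Hypothetical stand-in for a manifest entry; this is NOT rbook's type.
struct Entry<'a> {
    media_type: &'a str,
    fallback: Option<&'a Entry<'a>>,
}

/// Follows the fallback chain from `start` and returns the first entry
/// whose media type appears in `supported`, or `None` if the chain ends.
fn first_supported<'a>(start: &'a Entry<'a>, supported: &[&str]) -> Option<&'a Entry<'a>> {
    let mut current = Some(start);

    while let Some(entry) = current {
        if supported.contains(&entry.media_type) {
            return Some(entry);
        }
        // Not supported: move on to the next fallback, if any.
        current = entry.fallback;
    }

    None
}
```

With a `webm` to `avif` to `png` chain like the one above, an app that supports only `image/png` and `image/jpeg` would end up with the `png` entry, while an app that supports none of the three would get `None`.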

Re-exports§

pub use self::ebook::Ebook;
pub use self::epub::Epub;
pub use crate::ebook::epub;

Modules§

ebook
Core format-agnostic Ebook module and implementations.
prelude
The rbook prelude for convenient imports of the core ebook and reader traits.
reader
Sequential + random‐access Ebook Reader module.