# ebook
A comprehensive Rust tool for reading, writing, and operating on various ebook formats. Available as a **CLI**, **MCP server** (via [rmcp](https://crates.io/crates/rmcp), the Rust Model Context Protocol SDK), and a **Rust library**.
## Why this project (and the MCP server) exists
Most long-form knowledge still lives in **ebook containers**, not in clean Markdown or HTML on disk. EPUB, Kindle (MOBI/AZW/KF8), FB2, CBZ, and PDF package text, structure, fonts, images, and metadata in ways that general-purpose file tools and LLM context windows do not understand out of the box. Assistants and automation therefore hit a wall: they cannot reliably **open, navigate, validate, convert, or summarize** those files without a dedicated format layer.
This crate exists to be that layer: one API (and one MCP surface) over many ebook formats so tools and agents can treat books like first-class data—whether you run it as `ebook …` in a shell, embed it in Rust, or attach the MCP server to a client so the model can call **`read_ebook`**, **`convert_ebook`**, **`validate_ebook`**, and friends on real paths.
### What you get
- **Format detection** - Identify EPUB, MOBI, AZW, PDF, CBZ, FB2, TXT, and more from structure and extension
- **Metadata and TOC** - Titles, authors, chapters, and navigation where the format supports it
- **Content and assets** - Text plus image extraction where applicable
- **Conversion and repair** - Pipeline between supported formats and basic healing of damaged files
- **Agent-ready MCP** - Standard protocol and tool schemas so clients do not reimplement ZIP/XML/PDF/MOBI stacks
This crate ties these capabilities together for CLI use, library use, and MCP-hosted assistants.
## Supported Formats
- **EPUB** (2.0 & 3.0) - Electronic Publication format
- **MOBI** - Mobipocket format
- **AZW** - Kindle format with DRM detection
- **AZW3 (KF8)** - Kindle Format 8
- **FB2** - FictionBook 2.0
- **CBZ** - Comic Book Archive with ComicInfo.xml support
- **TXT** - Plain text files with encoding detection
- **PDF** - Portable Document Format
## Features
### Core Operations
- ✅ Read ebook metadata, content, and table of contents
- ✅ Write/create ebooks in all supported formats
- ✅ Extract images from ebooks (EPUB, CBZ, PDF)
- ✅ Validate ebook file structure and integrity
- ✅ Repair corrupted ebook files
- ✅ Convert between formats (TXT ↔ EPUB, TXT ↔ PDF, TXT ↔ MOBI, EPUB → PDF, etc.)
### Advanced Features
- ✅ **Image optimization** - Resize and compress images in EPUB/CBZ files
- ✅ **Streaming support** - Handle large files efficiently (10MB+ TXT, 50MB+ EPUB)
- ✅ **Progress indicators** - Visual feedback for long operations
- ✅ **Encoding detection** - Automatic character encoding detection for TXT files
- ✅ **Format auto-detection** - Detects the format from the file extension
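The README doesn't say how TXT encoding detection works internally. Below is a rough, std-only sketch of the usual first pass: check for a byte-order mark, accept input that is valid UTF-8, and otherwise fall back to a single-byte code page. `Encoding` and `guess_encoding` are hypothetical names, not the crate's API. A production detector would also make a statistical guess between legacy encodings.

```rust
/// Hypothetical result type for this sketch; not part of the `ebook` crate.
#[derive(Debug, PartialEq)]
enum Encoding {
    Utf8,
    Utf16Le,
    Utf16Be,
    /// No BOM and not valid UTF-8: assume a legacy single-byte code page.
    SingleByteFallback,
}

/// Guess the encoding of a TXT file from its raw bytes.
fn guess_encoding(bytes: &[u8]) -> Encoding {
    // 1. A byte-order mark is the strongest signal.
    //    (A UTF-32 LE BOM also starts with FF FE; this sketch ignores UTF-32.)
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        return Encoding::Utf8;
    }
    if bytes.starts_with(&[0xFF, 0xFE]) {
        return Encoding::Utf16Le;
    }
    if bytes.starts_with(&[0xFE, 0xFF]) {
        return Encoding::Utf16Be;
    }
    // 2. Without a BOM, input that decodes cleanly as UTF-8 is treated as UTF-8.
    if std::str::from_utf8(bytes).is_ok() {
        return Encoding::Utf8;
    }
    // 3. Otherwise fall back to a single-byte encoding such as Windows-1252.
    Encoding::SingleByteFallback
}

fn main() {
    println!("{:?}", guess_encoding("Café".as_bytes())); // Utf8
    println!("{:?}", guess_encoding(&[0xFF, 0xFE, b'A', 0x00])); // Utf16Le
    println!("{:?}", guess_encoding(&[b'C', b'a', b'f', 0xE9])); // SingleByteFallback
}
```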
### Integration
- ✅ **MCP Server** - AI assistant integration via Model Context Protocol
- ✅ **Library API** - Use as a Rust library in your projects
- ✅ **CLI** - Full-featured command-line interface
## Installation
### From source
```bash
git clone https://github.com/yingkitw/ebook.git
cd ebook
cargo build --release
```
The binary will be available at `target/release/ebook`, relative to the repository root.
### As a library
Add to your `Cargo.toml`:
```toml
[dependencies]
ebook = "0.1.2"
```
## Usage
### CLI Examples
#### Read an ebook
```bash
# Display full content
ebook read book.epub
# Show metadata only (title, author, etc.)
ebook read book.epub --metadata
# Show table of contents
ebook read book.epub --toc
# Extract images to a directory
ebook read book.epub --extract-images ./images
# Other formats are read the same way (auto-detected by extension)
ebook read comic.cbz
ebook read novel.mobi
ebook read document.pdf
```
#### Write/Create an ebook
```bash
# Create from a text file
ebook write output.txt --format txt --title "My Book" --author "John Doe" --content input.txt
# Create an EPUB with all metadata
ebook write output.epub --format epub \
  --title "My Novel" \
  --author "Jane Smith" \
  --publisher "My Press" \
  --isbn "978-0-1234567-8-9" \
  --content story.txt
# Create a PDF
ebook write output.pdf --format pdf --title "Document" --content text.txt
# Create a CBZ comic archive
ebook write comic.cbz --format cbz --title "Super Comic" --content pages/
```
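When a CBZ is built from a directory (`--content pages/`), page order comes from the image file names. This README does not state whether `ebook` sorts them naturally. The hypothetical `natural_cmp` below assumes a natural sort is wanted and shows why plain lexicographic sorting is a trap: it puts `page10.jpg` before `page2.jpg`.

```rust
use std::cmp::Ordering;
use std::iter::Peekable;
use std::str::Chars;

/// Consume a run of ASCII digits from the iterator and return it as a string.
fn take_digits(it: &mut Peekable<Chars<'_>>) -> String {
    let mut digits = String::new();
    while let Some(&c) = it.peek() {
        if !c.is_ascii_digit() {
            break;
        }
        digits.push(c);
        it.next();
    }
    digits
}

/// "Natural" comparison: digit runs compare by numeric value,
/// everything else compares character by character.
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (mut ai, mut bi) = (a.chars().peekable(), b.chars().peekable());
    loop {
        match (ai.peek().copied(), bi.peek().copied()) {
            (None, None) => return Ordering::Equal,
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
            (Some(x), Some(y)) if x.is_ascii_digit() && y.is_ascii_digit() => {
                let (na, nb) = (take_digits(&mut ai), take_digits(&mut bi));
                // Ignore leading zeros; then a longer number is larger, and equal
                // lengths compare digit by digit (no integer overflow possible).
                let (ta, tb) = (na.trim_start_matches('0'), nb.trim_start_matches('0'));
                let ord = ta.len().cmp(&tb.len()).then_with(|| ta.cmp(tb));
                if ord != Ordering::Equal {
                    return ord;
                }
            }
            (Some(x), Some(y)) => {
                if x != y {
                    return x.cmp(&y);
                }
                ai.next();
                bi.next();
            }
        }
    }
}

fn main() {
    let mut pages = vec!["page10.jpg", "page2.jpg", "page1.jpg"];
    pages.sort(); // lexicographic
    println!("{:?}", pages); // ["page1.jpg", "page10.jpg", "page2.jpg"]
    pages.sort_by(|a, b| natural_cmp(a, b));
    println!("{:?}", pages); // ["page1.jpg", "page2.jpg", "page10.jpg"]
}
```

Zero-padded names such as `page002.jpg` sort correctly under either ordering, so padding page numbers is the safest choice regardless of what the tool does.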
#### Get ebook information
```bash
# Quick info display
ebook info book.epub
# Output example:
# Format: EPUB
# Title: The Great Book
# Author: John Doe
# Size: 1.2 MB
# Valid: Yes
```
#### Validate an ebook
```bash
# Validate file structure
ebook validate book.epub
# Show detailed validation results
ebook validate --verbose book.epub
```
#### Repair an ebook
```bash
# Repair in place (creates backup)
ebook repair book.epub
# Repair and save to new file
ebook repair book.epub --output book_fixed.epub
```
#### Convert between formats
```bash
# TXT to EPUB (for e-readers)
ebook convert novel.txt novel.epub
# EPUB to PDF (for printing/sharing)
ebook convert book.epub book.pdf
# MOBI to TXT (extract text)
ebook convert kindle.mobi article.txt
# FB2 to EPUB
ebook convert book.fb2 book.epub
```
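Converting TXT to EPUB means turning plain text into XHTML content documents. This README does not document the crate's exact paragraph rules. The hypothetical `txt_to_xhtml_body` below sketches one common convention: blank lines separate paragraphs, line breaks inside a paragraph become spaces, and `&`, `<`, `>`, and `"` are escaped so the output stays well-formed XML.

```rust
/// Escape the characters that are special in XHTML text content.
fn escape_xml(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}

/// Turn plain text into `<p>` elements: blank lines end a paragraph,
/// and the lines within one paragraph are joined with single spaces.
fn txt_to_xhtml_body(text: &str) -> String {
    let mut paragraphs: Vec<String> = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    // `lines()` also strips a trailing '\r', so CRLF files work too.
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() {
            if !current.is_empty() {
                paragraphs.push(current.join(" "));
                current.clear();
            }
        } else {
            current.push(line);
        }
    }
    if !current.is_empty() {
        paragraphs.push(current.join(" "));
    }
    paragraphs
        .iter()
        .map(|p| format!("<p>{}</p>", escape_xml(p)))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let txt = "Chapter 1\n\nIt was a dark & stormy night.\nThe end.\n";
    println!("{}", txt_to_xhtml_body(txt));
    // <p>Chapter 1</p>
    // <p>It was a dark &amp; stormy night. The end.</p>
}
```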
#### Optimize images in ebooks
```bash
# Optimize all images in an EPUB (reduces file size)
ebook optimize book.epub
# Custom dimensions and quality
ebook optimize comic.cbz --max-width 1200 --max-height 1600 --quality 80
# Optimize without resizing (compression only)
ebook optimize photo-album.epub --no-resize --quality 75
```
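The flags above set the bounds, but the README does not say how `--max-width` and `--max-height` combine. The sketch below assumes a common rule rather than one confirmed from the crate's source: scale uniformly so the image fits both limits, and never upscale. `fit_within` is a hypothetical helper, and the `--quality` compression step is not modeled.

```rust
/// Compute output dimensions that fit inside `max_w` x `max_h` while keeping
/// the aspect ratio. Images already within bounds are returned unchanged.
fn fit_within(w: u32, h: u32, max_w: u32, max_h: u32) -> (u32, u32) {
    if w <= max_w && h <= max_h {
        return (w, h);
    }
    // Widen to u64 so the cross-multiplications cannot overflow.
    let (w, h, max_w, max_h) = (w as u64, h as u64, max_w as u64, max_h as u64);
    // Compare w/h with max_w/max_h without floating point:
    // w/h >= max_w/max_h  <=>  w*max_h >= h*max_w  (width is the binding limit).
    let (nw, nh) = if w * max_h >= h * max_w {
        (max_w, (h * max_w + w / 2) / w) // round to nearest
    } else {
        ((w * max_h + h / 2) / h, max_h)
    };
    // Never collapse a very thin image to zero pixels.
    (nw.max(1) as u32, nh.max(1) as u32)
}

fn main() {
    // e.g. `--max-width 1200 --max-height 1600` applied to a 2400x3200 scan
    println!("{:?}", fit_within(2400, 3200, 1200, 1600)); // (1200, 1600)
    println!("{:?}", fit_within(3000, 1000, 1200, 1600)); // (1200, 400)
}
```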
### MCP server (Model Context Protocol)
The MCP server uses **rmcp** on **stdio** (newline-delimited JSON-RPC), matching what mainstream MCP clients expect. It exposes the same ebook operations as tools with JSON Schema arguments generated from Rust types—no hand-maintained protocol loop.
#### Starting the server
```bash
ebook mcp
```
Clients must complete the normal MCP handshake: send **`initialize`** with a valid **`params`** object, then send **`notifications/initialized`** after receiving the **`initialize`** result, before **`tools/list`** or **`tools/call`**. Hosted clients (Claude Desktop, Cursor, etc.) do this automatically.
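If you write your own client, the handshake described above comes down to three newline-delimited JSON-RPC messages. This sketch only builds and prints them. The `protocolVersion` value `2024-11-05` and the `clientInfo` name are assumptions (use whatever version your client negotiates), and `handshake_messages` is a hypothetical helper.

```rust
/// Build the JSON-RPC messages an MCP client sends to open a session,
/// in the order the protocol requires. Each string is one line on stdin.
fn handshake_messages() -> Vec<String> {
    vec![
        // 1. Request (has an `id`): negotiate protocol version and capabilities.
        r#"{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"sketch-client","version":"0.0.1"}}}"#.to_string(),
        // 2. Notification (no `id`): sent after the initialize result arrives.
        r#"{"jsonrpc":"2.0","method":"notifications/initialized"}"#.to_string(),
        // 3. Ordinary requests such as tools/list are allowed from here on.
        r#"{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}"#.to_string(),
    ]
}

fn main() {
    // One JSON object per line, as the stdio transport expects.
    for msg in handshake_messages() {
        println!("{msg}");
    }
}
```

A real client must read the `initialize` response from the server's stdout before sending the second message. Printing all three at once only illustrates their shape and order.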
#### Available MCP tools
| Tool | Description |
|------|-------------|
| `read_ebook` | Read content, metadata, and table of contents |
| `write_ebook` | Create new ebooks in any supported format |
| `extract_images` | Extract images from ebooks |
| `validate_ebook` | Validate ebook file structure |
| `get_ebook_info` | Get detailed ebook information |
| `convert_ebook` | Convert between formats |
| `optimize_images` | Optimize images in EPUB/CBZ files |
#### Quick Setup for Claude Desktop
Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (the config file location on macOS):
```json
{
  "mcpServers": {
    "ebook": {
      "command": "/path/to/ebook/target/release/ebook",
      "args": ["mcp"]
    }
  }
}
```
#### Example AI workflows
**Summarize a book:**
```
User: Read the ebook at ~/Documents/book.epub and summarize chapter 1
Claude: [Uses read_ebook tool, analyzes content, provides summary]
```
**Convert a document:**
```
User: Convert ~/Downloads/novel.txt to EPUB format
Claude: [Uses convert_ebook tool, creates novel.epub]
```
**Extract images:**
```
User: Extract all images from the comic book at ~/comics/issue1.cbz
Claude: [Uses extract_images tool, returns images with metadata]
```
See [docs/MCP.md](docs/MCP.md) for tool parameters and examples.
## Library usage
Use the **`ebook`** crate as a Rust library for formats, conversion, and MCP hosting.
### Embed the MCP server (rmcp)
You can run the same tool surface from your own binary using **`EbookMcp`** and rmcp’s **`ServiceExt`** (see the [rmcp crate](https://docs.rs/rmcp) for transports other than stdio):
```rust
use ebook::mcp::EbookMcp;
use rmcp::{ServiceExt, transport::stdio};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Serve the ebook tools over stdin/stdout
    let service = EbookMcp::new().serve(stdio()).await?;
    // Block until the MCP session ends
    service.waiting().await?;
    Ok(())
}
```
`McpServer` in `ebook::mcp` is the thin wrapper used by the `ebook mcp` CLI subcommand.
### Basic example (formats API)
```rust
use ebook::formats::TxtHandler;
use ebook::traits::{EbookReader, EbookWriter};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read a text file
    let mut handler = TxtHandler::new();
    handler.read_from_file("book.txt".as_ref())?;
    // Get content
    let content = handler.get_content()?;
    println!("{}", content);
    // Get metadata
    let metadata = handler.get_metadata()?;
    println!("Title: {:?}", metadata.title);
    Ok(())
}
```
### Working with different formats
```rust
use ebook::formats::{EpubHandler, MobiHandler, PdfHandler};
use ebook::traits::EbookReader;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read EPUB and print its table of contents
    let mut epub = EpubHandler::new();
    epub.read_from_file("book.epub".as_ref())?;
    let toc = epub.get_toc()?;
    println!("Table of Contents: {:?}", toc);
    // Read MOBI metadata
    let mut mobi = MobiHandler::new();
    mobi.read_from_file("kindle.mobi".as_ref())?;
    let metadata = mobi.get_metadata()?;
    println!("Title: {:?}", metadata.title);
    // Read PDF text content
    let mut pdf = PdfHandler::new();
    pdf.read_from_file("document.pdf".as_ref())?;
    let content = pdf.get_content()?;
    println!("{}", content);
    Ok(())
}
```
### Format detection
```rust
use ebook::utils::detect_format;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let format = detect_format("book.epub".as_ref())?;
    assert_eq!(format, "epub");
    Ok(())
}
```
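This README does not show the internals of `detect_format`. The std-only sketch below illustrates the "structure and extension" idea using well-known signatures:

- PDF's `%PDF-` header.
- The EPUB container rule that the first ZIP entry is an uncompressed `mimetype` file containing `application/epub+zip`.
- The `BOOKMOBI` type/creator at offset 60 of a Palm database, shared by MOBI and Kindle AZW/AZW3.
- FB2's `<FictionBook` root element.

`sniff_format` and `sniff_file` are hypothetical names, and the strings the real `detect_format` returns may differ.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

const EPUB_MIME: &[u8] = b"application/epub+zip";
const FB2_ROOT: &[u8] = b"<FictionBook";

/// Read a little-endian u16 at `at`, or `None` if the buffer is too short.
fn read_u16_le(bytes: &[u8], at: usize) -> Option<u16> {
    let b = bytes.get(at..at + 2)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

/// Guess a format from the first bytes of a file, consulting the extension
/// only where the bytes alone are ambiguous.
fn sniff_format(path: &Path, head: &[u8]) -> Option<&'static str> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    let ext = ext.as_deref();
    if head.starts_with(b"%PDF-") {
        return Some("pdf");
    }
    if head.starts_with(b"PK\x03\x04") {
        // ZIP local file header: name length at offset 26, extra-field length
        // at 28, file name from offset 30, file data right after the extra field.
        if let (Some(name_len), Some(extra_len)) = (read_u16_le(head, 26), read_u16_le(head, 28)) {
            let name_end = 30 + name_len as usize;
            let data_start = name_end + extra_len as usize;
            if head.get(30..name_end) == Some(b"mimetype".as_slice())
                && head.get(data_start..data_start + EPUB_MIME.len()) == Some(EPUB_MIME)
            {
                return Some("epub");
            }
        }
        // Any other ZIP: a CBZ is just a ZIP of images, so trust the extension.
        return match ext {
            Some("cbz") => Some("cbz"),
            Some("epub") => Some("epub"),
            _ => None,
        };
    }
    // Palm database: type + creator "BOOKMOBI" at offset 60, shared by
    // MOBI and Kindle AZW/AZW3, so the extension chooses the label.
    if head.get(60..68) == Some(b"BOOKMOBI".as_slice()) {
        return Some(match ext {
            Some("azw") => "azw",
            Some("azw3") => "azw3",
            _ => "mobi",
        });
    }
    if head.windows(FB2_ROOT.len()).any(|w| w == FB2_ROOT) {
        return Some("fb2");
    }
    match ext {
        Some("txt") => Some("txt"),
        _ => None,
    }
}

/// Read up to 1 KiB from the start of a file and sniff it.
fn sniff_file(path: &Path) -> io::Result<Option<&'static str>> {
    let mut head = Vec::with_capacity(1024);
    File::open(path)?.take(1024).read_to_end(&mut head)?;
    Ok(sniff_format(path, &head))
}

fn main() {
    // Usage: pass one or more file paths as arguments.
    for arg in std::env::args().skip(1) {
        match sniff_file(Path::new(&arg)) {
            Ok(format) => println!("{arg}: {format:?}"),
            Err(e) => eprintln!("{arg}: {e}"),
        }
    }
}
```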
### Conversion
```rust
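use ebook::Converter;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Convert input.txt to EPUB; the third argument names the target format
    Converter::convert(
        "input.txt".as_ref(),
        "output.epub".as_ref(),
        "epub",
    )?;
    Ok(())
}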
```
See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed library documentation.
## Architecture
The project follows a trait-based architecture to provide a consistent API across all formats:
### Core Traits
- **`EbookReader`** - Read operations: content, metadata, table of contents, images
- **`EbookWriter`** - Write operations: create ebooks with content and metadata
- **`EbookOperator`** - Advanced operations: convert, validate, repair
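The real trait definitions live in `ebook::traits`, and this README does not reproduce their exact signatures. The sketch below is a simplified, hypothetical std-only version of the pattern: one trait per capability, one handler per format, and generic code written against the traits. It reads from bytes instead of files. `read_from_bytes`, `to_bytes`, `set_content`, `set_metadata`, `reader_for`, and `title_of` are invented for illustration, while `get_content` and `get_metadata` mirror the calls in the library examples above. `EbookOperator` would follow the same pattern.

```rust
/// Simplified, hypothetical metadata record.
#[derive(Debug, Default, Clone)]
struct Metadata {
    title: Option<String>,
}

/// Read-side capability: every format handler implements this.
trait EbookReader {
    fn read_from_bytes(&mut self, data: &[u8]) -> Result<(), String>;
    fn get_content(&self) -> Result<String, String>;
    fn get_metadata(&self) -> Result<Metadata, String>;
}

/// Write-side capability: only formats that support creation implement this.
trait EbookWriter {
    fn set_metadata(&mut self, metadata: Metadata);
    fn set_content(&mut self, content: &str);
    fn to_bytes(&self) -> Result<Vec<u8>, String>;
}

/// Toy TXT handler: the first non-empty line doubles as the title.
#[derive(Default)]
struct TxtHandler {
    content: String,
    metadata: Metadata,
}

impl EbookReader for TxtHandler {
    fn read_from_bytes(&mut self, data: &[u8]) -> Result<(), String> {
        self.content = String::from_utf8(data.to_vec()).map_err(|e| e.to_string())?;
        self.metadata.title = self
            .content
            .lines()
            .map(str::trim)
            .find(|line| !line.is_empty())
            .map(String::from);
        Ok(())
    }
    fn get_content(&self) -> Result<String, String> {
        Ok(self.content.clone())
    }
    fn get_metadata(&self) -> Result<Metadata, String> {
        Ok(self.metadata.clone())
    }
}

impl EbookWriter for TxtHandler {
    fn set_metadata(&mut self, metadata: Metadata) {
        self.metadata = metadata;
    }
    fn set_content(&mut self, content: &str) {
        self.content = content.to_string();
    }
    fn to_bytes(&self) -> Result<Vec<u8>, String> {
        // Plain text has no metadata section, so the title becomes the first line.
        let title = self.metadata.title.as_deref().unwrap_or("");
        Ok(format!("{title}\n\n{}", self.content).into_bytes())
    }
}

/// Pick a reader by format name; a full registry would have one arm per handler.
fn reader_for(format: &str) -> Option<Box<dyn EbookReader>> {
    match format {
        "txt" => Some(Box::new(TxtHandler::default())),
        _ => None,
    }
}

/// Generic code depends only on the trait, never on a concrete format.
fn title_of(reader: &dyn EbookReader) -> String {
    reader
        .get_metadata()
        .ok()
        .and_then(|m| m.title)
        .unwrap_or_else(|| "(untitled)".to_string())
}

fn main() -> Result<(), String> {
    let mut writer = TxtHandler::default();
    writer.set_metadata(Metadata { title: Some("My Book".to_string()) });
    writer.set_content("Once upon a time.");
    let bytes = writer.to_bytes()?;
    let mut reader = reader_for("txt").ok_or("unsupported format")?;
    reader.read_from_bytes(&bytes)?;
    println!("{}", title_of(&*reader)); // My Book
    println!("{:?}", reader.get_content()?);
    Ok(())
}
```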
### Format Handlers
Each format has a dedicated handler implementing all applicable traits:
| `EpubHandler` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| `MobiHandler` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| `AzwHandler` | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ |
| `Fb2Handler` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| `CbzHandler` | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| `TxtHandler` | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
| `PdfHandler` | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
### Key Features
- **Streaming** - Large files are processed in chunks (10MB+ TXT, 50MB+ EPUB)
- **Progress bars** - Visual feedback for long-running operations
- **Error recovery** - Helpful error messages with suggestions
- **Thread-safe** - Safe for concurrent use
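The README states the streaming thresholds but not the mechanism. Below is a minimal std-only sketch that assumes MiB units and a fixed 64 KiB buffer; neither is documented crate behavior. `should_stream` applies the thresholds quoted above. `count_newlines_chunked` shows the chunked pattern: it reads a bounded buffer repeatedly, so memory use stays constant regardless of file size. Both names are hypothetical.

```rust
use std::io::{self, Read};

// Assumed interpretation of "10MB+ TXT, 50MB+ EPUB" as binary megabytes.
const TXT_STREAM_THRESHOLD: u64 = 10 * 1024 * 1024;
const EPUB_STREAM_THRESHOLD: u64 = 50 * 1024 * 1024;

/// Decide whether a file is large enough to process incrementally.
/// Formats without a stated threshold are read in one go in this sketch.
fn should_stream(format: &str, size_bytes: u64) -> bool {
    match format {
        "txt" => size_bytes >= TXT_STREAM_THRESHOLD,
        "epub" => size_bytes >= EPUB_STREAM_THRESHOLD,
        _ => false,
    }
}

/// Count '\n' bytes while holding at most one 64 KiB chunk in memory.
fn count_newlines_chunked<R: Read>(mut source: R) -> io::Result<u64> {
    let mut buf = [0u8; 64 * 1024];
    let mut newlines = 0u64;
    loop {
        let n = match source.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue, // retry
            Err(e) => return Err(e),
        };
        newlines += buf[..n].iter().filter(|&&b| b == b'\n').count() as u64;
    }
    Ok(newlines)
}

fn main() -> io::Result<()> {
    let big = "line\n".repeat(100_000); // ~0.5 MB stand-in for a large TXT file
    println!("stream? {}", should_stream("txt", big.len() as u64)); // stream? false
    println!("lines: {}", count_newlines_chunked(big.as_bytes())?); // lines: 100000
    Ok(())
}
```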
## Project Status
**Version:** 0.1.1
**License:** Apache-2.0
**Test Status:** ✅ All 103 tests passing
**Supported Platforms:** macOS, Linux, Windows (Rust-supported platforms)
**Recent Updates:**
- MCP server implemented with **rmcp** (stdio, spec handshake, schema-derived tools); library exposes **`EbookMcp`** for embedding
- AZW format support with DRM detection
- Image optimization for EPUB/CBZ files
- EPUB 3.0 support (nav.xhtml, semantic markup, version switching)
- Streaming for large file handling (10MB+ TXT, 50MB+ EPUB thresholds)
- Comprehensive format conversion with CLI and MCP integration
- Progress indicators for long operations
- 103 tests covering all format handlers, the CLI, MCP integration, conversion, streaming, and image optimization
**Planned Features:**
- DJVU and CHM format support
- OCR for scanned PDFs
- Enhanced metadata editing
- Web service API
- Batch processing
See [TODO.md](TODO.md) for complete roadmap and known issues.
## Documentation
- [ARCHITECTURE.md](ARCHITECTURE.md) - Detailed architecture documentation
- [SPEC.md](SPEC.md) - Original specification document
- [docs/MCP.md](docs/MCP.md) - MCP server integration guide
- [TODO.md](TODO.md) - Development roadmap and known issues
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
Licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
## Development
### Build
```bash
# Debug build
cargo build
# Release build (optimized)
cargo build --release
```
### Run tests
```bash
# Run all tests
cargo test
# Run only tests whose names contain the given string
cargo test test_epub_read
# Run with output
cargo test -- --nocapture
# Cap parallelism at 4 threads (tests already run in parallel by default)
cargo test -- --test-threads=4
```
**Test Coverage:** 103 tests covering:
- Format handlers (EPUB, MOBI, AZW, FB2, CBZ, TXT, PDF)
- CLI integration tests
- MCP integration tests
- Conversion tests
- Streaming tests
- Image optimization tests
- EPUB 3.0 features
- Error handling
### Run benchmarks
```bash
# Performance benchmarks (requires criterion)
cargo bench
```
Benchmarks available for:
- EPUB read/write performance
- CBZ read/write performance
- Image optimization performance
### Example files
```bash
# Run with example file
cargo run -- read examples/sample.txt
# Create an EPUB
cargo run -- write output.epub --format epub --title "Test" --content examples/sample.txt
```
### Enable logging
```bash
# Info level
RUST_LOG=info cargo run -- read book.epub
# Debug level (verbose)
RUST_LOG=debug cargo run -- read book.epub
# Trace level (very verbose)
RUST_LOG=trace cargo run -- read book.epub
```