ebook 0.1.2

A CLI tool for reading, writing, and operating on various ebook formats
# ebook

A comprehensive Rust tool for reading, writing, and operating on various ebook formats. Available as a **CLI**, **MCP server** (via [rmcp](https://crates.io/crates/rmcp), the Rust Model Context Protocol SDK), and a **Rust library**.

## Why this project (and the MCP server) exists

Most long-form knowledge still lives in **ebook containers**, not in clean Markdown or HTML on disk. EPUB, Kindle (MOBI/AZW/KF8), FB2, CBZ, and PDF package text, structure, fonts, images, and metadata in ways that general-purpose file tools and LLM context windows do not understand out of the box. Assistants and automation therefore hit a wall: they cannot reliably **open, navigate, validate, convert, or summarize** those files without a dedicated format layer.

This crate exists to be that layer: one API (and one MCP surface) over many ebook formats so tools and agents can treat books like first-class data—whether you run it as `ebook …` in a shell, embed it in Rust, or attach the MCP server to a client so the model can call **`read_ebook`**, **`convert_ebook`**, **`validate_ebook`**, and friends on real paths.

### What you get

- **Format detection** - Identify EPUB, MOBI, AZW, PDF, CBZ, FB2, TXT, and more from structure and extension
- **Metadata and TOC** - Titles, authors, chapters, and navigation where the format supports it
- **Content and assets** - Text plus image extraction where applicable
- **Conversion and repair** - Pipeline between supported formats and basic healing of damaged files
- **Agent-ready MCP** - Standard protocol and tool schemas so clients do not reimplement ZIP/XML/PDF/MOBI stacks

This crate ties these capabilities together for CLI use, library use, and MCP-hosted assistants.
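To make "detection from structure and extension" concrete, here is a minimal sketch using only the standard library. It is not the crate's `detect_format` implementation; `sniff_format` and `contains` are hypothetical helpers, and the checks rely on well-known container signatures (the `%PDF-` marker, the `BOOKMOBI` tag at offset 60 of a Palm database, the ZIP local-header magic, and the EPUB `mimetype` entry).

```rust
/// Hypothetical helper (not the crate's API): guess a format from leading
/// bytes first, then fall back to the file extension.
fn sniff_format(bytes: &[u8], extension: Option<&str>) -> &'static str {
    // PDF files begin with the ASCII marker "%PDF-".
    if bytes.starts_with(b"%PDF-") {
        return "pdf";
    }
    // MOBI files are Palm databases; the type/creator tag sits at offset 60.
    if bytes.len() >= 68 && &bytes[60..68] == b"BOOKMOBI" {
        return "mobi";
    }
    // EPUB and CBZ are both ZIP archives (local file header "PK\x03\x04").
    if bytes.starts_with(b"PK\x03\x04") {
        // An EPUB stores an uncompressed "mimetype" entry first, so the
        // media type string appears near the start of the file.
        let head = &bytes[..bytes.len().min(128)];
        if contains(head, b"application/epub+zip") {
            return "epub";
        }
        return match extension {
            Some(e) if e.eq_ignore_ascii_case("cbz") => "cbz",
            _ => "zip",
        };
    }
    // FB2 is XML with a <FictionBook> root element.
    let head = &bytes[..bytes.len().min(512)];
    if contains(head, b"<FictionBook") {
        return "fb2";
    }
    // Nothing structural matched: trust the extension for plain text only.
    match extension {
        Some(e) if e.eq_ignore_ascii_case("txt") => "txt",
        _ => "unknown",
    }
}

/// Naive byte-substring search; `needle` must be non-empty.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|w| w == needle)
}

fn main() {
    println!("{}", sniff_format(b"%PDF-1.7\n", None)); // prints "pdf"
    println!("{}", sniff_format(b"plain words", Some("txt"))); // prints "txt"
}
```

Checking content before the extension means a misnamed file (for example an EPUB saved as `.zip`) is still recognized.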

## Supported Formats

- **EPUB** (2.0 & 3.0) - Electronic Publication format
- **MOBI** - Mobipocket format
- **AZW** - Kindle format with DRM detection
- **AZW3 (KF8)** - Kindle Format 8
- **FB2** - FictionBook 2.0
- **CBZ** - Comic Book Archive with ComicInfo.xml support
- **TXT** - Plain text files with encoding detection
- **PDF** - Portable Document Format

## Features

### Core Operations
- ✅ Read ebook metadata, content, and table of contents
- ✅ Write/create ebooks in all supported formats
- ✅ Extract images from ebooks (EPUB, CBZ, PDF)
- ✅ Validate ebook file structure and integrity
- ✅ Repair corrupted ebook files
- ✅ Convert between formats (TXT ↔ EPUB, TXT ↔ PDF, TXT ↔ MOBI, EPUB → PDF, etc.)

### Advanced Features
- **Image optimization** - Resize and compress images in EPUB/CBZ files
- **Streaming support** - Handle large files efficiently (10MB+ TXT, 50MB+ EPUB)
- **Progress indicators** - Visual feedback for long operations
- **Encoding detection** - Automatic character encoding detection for TXT files
- **Format auto-detection** - Works based on file extension
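
As an illustration of what encoding detection for TXT files can involve, the sketch below checks for a byte-order mark and then falls back to UTF-8 validation. This is an assumption about one reasonable approach, not a description of the crate's detector; `GuessedEncoding` and `guess_encoding` are hypothetical names.

```rust
/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
enum GuessedEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
    Unknown,
}

/// Guess an encoding from a byte-order mark, then from UTF-8 validity.
/// Limitation: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
fn guess_encoding(bytes: &[u8]) -> GuessedEncoding {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        GuessedEncoding::Utf8 // UTF-8 BOM
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        GuessedEncoding::Utf16Le
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        GuessedEncoding::Utf16Be
    } else if std::str::from_utf8(bytes).is_ok() {
        GuessedEncoding::Utf8 // no BOM, but the bytes are valid UTF-8
    } else {
        // Legacy single-byte encodings need statistical detection,
        // which is out of scope for this sketch.
        GuessedEncoding::Unknown
    }
}

fn main() {
    println!("{:?}", guess_encoding("héllo".as_bytes())); // prints "Utf8"
    println!("{:?}", guess_encoding(&[0xFF, 0xFE, b'h', 0x00])); // prints "Utf16Le"
}
```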

### Integration
- **MCP Server** - AI assistant integration via Model Context Protocol
- **Library API** - Use as a Rust library in your projects
- **CLI** - Full-featured command-line interface

## Installation

### From source

```bash
git clone https://github.com/yingkitw/ebook.git
cd ebook
cargo build --release
```

The binary will be available at `target/release/ebook` (repository root).

### As a library

Add to your `Cargo.toml`:

```toml
[dependencies]
ebook = "0.1.2"
```

## Usage

### CLI Examples

#### Read an ebook

```bash
# Display full content
ebook read book.epub

# Show metadata only (title, author, etc.)
ebook read book.epub --metadata

# Show table of contents
ebook read book.epub --toc

# Extract images to a directory
ebook read book.epub --extract-images ./images

# Read specific format (auto-detected by extension)
ebook read comic.cbz
ebook read novel.mobi
ebook read document.pdf
```

#### Write/Create an ebook

```bash
# Create from a text file
ebook write output.txt --format txt --title "My Book" --author "John Doe" --content input.txt

# Create an EPUB with all metadata
ebook write output.epub --format epub \
  --title "My Novel" \
  --author "Jane Smith" \
  --publisher "My Press" \
  --isbn "978-0-1234567-8-9" \
  --content story.txt

# Create a PDF
ebook write output.pdf --format pdf --title "Document" --content text.txt

# Create a CBZ comic archive
ebook write comic.cbz --format cbz --title "Super Comic" --content pages/
```

#### Get ebook information

```bash
# Quick info display
ebook info book.epub

# Output example:
# Format: EPUB
# Title: The Great Book
# Author: John Doe
# Size: 1.2 MB
# Valid: Yes
```

#### Validate an ebook

```bash
# Validate file structure
ebook validate book.epub

# Returns detailed validation results
ebook validate --verbose book.epub
```

#### Repair an ebook

```bash
# Repair in place (creates backup)
ebook repair book.epub

# Repair and save to new file
ebook repair book.epub --output book_fixed.epub
```

#### Convert between formats

```bash
# TXT to EPUB (for e-readers)
ebook convert novel.txt novel.epub

# EPUB to PDF (for printing/sharing)
ebook convert book.epub book.pdf

# MOBI to TXT (extract text)
ebook convert kindle.mobi article.txt

# FB2 to EPUB
ebook convert book.fb2 book.epub
```

#### Optimize images in ebooks

```bash
# Optimize all images in an EPUB (reduces file size)
ebook optimize book.epub

# Custom dimensions and quality
ebook optimize comic.cbz --max-width 1200 --max-height 1600 --quality 80

# Optimize without resizing (compression only)
ebook optimize photo-album.epub --no-resize --quality 75
```
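
The `--max-width` and `--max-height` options bound the output size. The sketch below shows one common way to compute target dimensions under such bounds: scale down to fit while preserving aspect ratio, and never upscale. Whether the optimizer behaves exactly this way is an assumption; `fit_within` is a hypothetical helper.

```rust
/// Hypothetical helper: shrink (width, height) to fit inside the bounds
/// while keeping the aspect ratio. Images already within bounds are unchanged.
fn fit_within(width: u32, height: u32, max_width: u32, max_height: u32) -> (u32, u32) {
    if width == 0 || height == 0 {
        return (width, height); // nothing to scale
    }
    if width <= max_width && height <= max_height {
        return (width, height); // never upscale
    }
    // Use u64 so the cross-multiplications cannot overflow.
    let (w, h) = (width as u64, height as u64);
    let (mw, mh) = (max_width as u64, max_height as u64);
    // Compare max_width/width against max_height/height without floats.
    if mw * h <= mh * w {
        // Width is the limiting side.
        (max_width, (h * mw / w).max(1) as u32)
    } else {
        // Height is the limiting side.
        ((w * mh / h).max(1) as u32, max_height)
    }
}

fn main() {
    // A 2400x3200 page bounded by 1200x1600 is halved on both sides.
    println!("{:?}", fit_within(2400, 3200, 1200, 1600)); // prints "(1200, 1600)"
}
```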

### MCP server (Model Context Protocol)

The MCP server uses **rmcp** on **stdio** (newline-delimited JSON-RPC), matching what mainstream MCP clients expect. It exposes the same ebook operations as tools with JSON Schema arguments generated from Rust types—no hand-maintained protocol loop.

#### Starting the server

```bash
ebook mcp
```

Clients must complete the normal MCP handshake: send **`initialize`** with a valid **`params`** object, then send **`notifications/initialized`** after receiving the **`initialize`** result, before **`tools/list`** or **`tools/call`**. Hosted clients (Claude Desktop, Cursor, etc.) do this automatically.
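
For anyone driving the server by hand, the message order looks roughly like the sketch below. It only builds the three client-side JSON-RPC lines; the protocol version string and the `clientInfo` values are illustrative assumptions, and `handshake_messages` is a hypothetical helper rather than part of this crate or rmcp.

```rust
/// Hypothetical helper: the client-to-server messages for a minimal session,
/// one JSON object per line (newline-delimited JSON-RPC over stdio).
fn handshake_messages() -> Vec<String> {
    vec![
        // 1. Request: initialize. The client must wait for the result
        //    before sending anything else.
        r#"{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"example-client","version":"0.0.0"}}}"#.to_string(),
        // 2. Notification (no "id"): sent after the initialize result arrives.
        r#"{"jsonrpc":"2.0","method":"notifications/initialized"}"#.to_string(),
        // 3. Only now are tool requests valid.
        r#"{"jsonrpc":"2.0","id":2,"method":"tools/list"}"#.to_string(),
    ]
}

fn main() {
    // Print the messages in order; in a real client each line is written to
    // the server's stdin, with responses read back from its stdout.
    for message in handshake_messages() {
        println!("{message}");
    }
}
```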

#### Available MCP tools

| Tool | Description |
|------|-------------|
| `read_ebook` | Read content, metadata, and table of contents |
| `write_ebook` | Create new ebooks in any supported format |
| `extract_images` | Extract images from ebooks |
| `validate_ebook` | Validate ebook file structure |
| `get_ebook_info` | Get detailed ebook information |
| `convert_ebook` | Convert between formats |
| `optimize_images` | Optimize images in EPUB/CBZ files |

#### Quick Setup for Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "ebook": {
      "command": "/path/to/ebook/target/release/ebook",
      "args": ["mcp"]
    }
  }
}
```

#### Example AI workflows

**Summarize a book:**
```
User: Read the ebook at ~/Documents/book.epub and summarize chapter 1
Claude: [Uses read_ebook tool, analyzes content, provides summary]
```

**Convert a document:**
```
User: Convert ~/Downloads/novel.txt to EPUB format
Claude: [Uses convert_ebook tool, creates novel.epub]
```

**Extract images:**
```
User: Extract all images from the comic book at ~/comics/issue1.cbz
Claude: [Uses extract_images tool, returns images with metadata]
```

See [docs/MCP.md](docs/MCP.md) for tool parameters and examples.

## Library usage

Use the **`ebook`** crate as a Rust library for formats, conversion, and MCP hosting.

### Embed the MCP server (rmcp)

You can run the same tool surface from your own binary using **`EbookMcp`** and rmcp’s **`ServiceExt`** (see the [rmcp crate](https://docs.rs/rmcp) for transports other than stdio):

```rust
use ebook::mcp::EbookMcp;
use rmcp::{ServiceExt, transport::stdio};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let service = EbookMcp::new().serve(stdio()).await?;
    service.waiting().await?;
    Ok(())
}
```

`McpServer` in `ebook::mcp` is the thin wrapper used by the `ebook mcp` CLI subcommand.

### Basic example (formats API)

```rust
use ebook::formats::TxtHandler;
use ebook::traits::{EbookReader, EbookWriter};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read a text file
    let mut handler = TxtHandler::new();
    handler.read_from_file("book.txt".as_ref())?;

    // Get content
    let content = handler.get_content()?;
    println!("{}", content);

    // Get metadata
    let metadata = handler.get_metadata()?;
    println!("Title: {:?}", metadata.title);

    Ok(())
}
```

### Working with different formats

```rust
use ebook::formats::{EpubHandler, MobiHandler, PdfHandler};
use ebook::traits::EbookReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read EPUB and print its table of contents
    let mut epub = EpubHandler::new();
    epub.read_from_file("book.epub".as_ref())?;
    let toc = epub.get_toc()?;
    println!("Table of Contents: {:?}", toc);

    // Read MOBI and print its title
    let mut mobi = MobiHandler::new();
    mobi.read_from_file("kindle.mobi".as_ref())?;
    let metadata = mobi.get_metadata()?;
    println!("Title: {:?}", metadata.title);

    // Read PDF and print its text content
    let mut pdf = PdfHandler::new();
    pdf.read_from_file("document.pdf".as_ref())?;
    let content = pdf.get_content()?;
    println!("{}", content);

    Ok(())
}
```

### Format detection

```rust
use ebook::utils::detect_format;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let format = detect_format("book.epub".as_ref())?;
    assert_eq!(format, "epub");
    Ok(())
}
```

### Conversion

```rust
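use ebook::Converter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Arguments: input path, output path, target format
    Converter::convert(
        "input.txt".as_ref(),
        "output.epub".as_ref(),
        "epub",
    )?;
    Ok(())
}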
```

See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed library documentation.

## Architecture

The project follows a trait-based architecture for consistent API across all formats:

### Core Traits

- **`EbookReader`** - Read operations: content, metadata, table of contents, images
- **`EbookWriter`** - Write operations: create ebooks with content and metadata
- **`EbookOperator`** - Advanced operations: convert, validate, repair
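
The value of a trait-based design is that format-agnostic code depends only on the trait, not on any one handler. The sketch below illustrates that idea with deliberately hypothetical names (`ReaderSketch`, `PlainTextSketch`, `describe`); the crate's real traits and method signatures may differ.

```rust
/// Hypothetical trait for illustration only; not the crate's `EbookReader`.
trait ReaderSketch {
    fn load(&mut self, raw: &str);
    fn content(&self) -> String;
    fn title(&self) -> Option<String>;
}

/// A toy plain-text handler implementing the sketch trait.
#[derive(Default)]
struct PlainTextSketch {
    text: String,
}

impl ReaderSketch for PlainTextSketch {
    fn load(&mut self, raw: &str) {
        self.text = raw.to_string();
    }

    fn content(&self) -> String {
        self.text.clone()
    }

    // Assumption for this sketch: the first non-empty line is the title.
    fn title(&self) -> Option<String> {
        self.text
            .lines()
            .map(str::trim)
            .find(|line| !line.is_empty())
            .map(String::from)
    }
}

/// Format-agnostic code: works with any handler that implements the trait.
fn describe(reader: &dyn ReaderSketch) -> String {
    let title = reader.title().unwrap_or_else(|| "untitled".to_string());
    format!("{} ({} bytes)", title, reader.content().len())
}

fn main() {
    let mut handler = PlainTextSketch::default();
    handler.load("\nMoby Dick\nCall me Ishmael.\n");
    println!("{}", describe(&handler));
}
```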

### Format Handlers

Each format has a dedicated handler implementing all applicable traits:

| Handler | Read | Write | Metadata | TOC | Convert | Images |
|---------|------|-------|----------|-----|---------|--------|
| `EpubHandler` |||||||
| `MobiHandler` |||||||
| `AzwHandler` |||||||
| `Fb2Handler` |||||||
| `CbzHandler` |||||||
| `TxtHandler` |||||||
| `PdfHandler` |||||||

### Key Features

- **Streaming** - Large files are processed in chunks (10MB+ TXT, 50MB+ EPUB)
- **Progress bars** - Visual feedback for long-running operations
- **Error recovery** - Helpful error messages with suggestions
- **Thread-safe** - Safe for concurrent use
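
Chunked processing keeps memory use flat regardless of file size. The following is a minimal sketch of the idea using only the standard library; the threshold constants mirror the figures quoted above (interpreting MB as MiB is an assumption), and `should_stream` and `count_lines_chunked` are hypothetical helpers, not the crate's API.

```rust
use std::io::{self, Read};

// Thresholds taken from the README text; MiB interpretation is an assumption.
const TXT_STREAM_THRESHOLD: u64 = 10 * 1024 * 1024;
const EPUB_STREAM_THRESHOLD: u64 = 50 * 1024 * 1024;

/// Decide whether a file of this type and size should be streamed.
fn should_stream(extension: &str, size_bytes: u64) -> bool {
    match extension.to_ascii_lowercase().as_str() {
        "txt" => size_bytes >= TXT_STREAM_THRESHOLD,
        "epub" => size_bytes >= EPUB_STREAM_THRESHOLD,
        _ => false,
    }
}

/// Count newline bytes while holding only one chunk in memory at a time.
fn count_lines_chunked<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<u64> {
    let mut buf = vec![0u8; chunk_size.max(1)];
    let mut lines = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        lines += buf[..n].iter().filter(|&&b| b == b'\n').count() as u64;
    }
    Ok(lines)
}

fn main() -> io::Result<()> {
    // A byte slice implements `Read`, so it can stand in for a file here.
    let lines = count_lines_chunked("a\nb\nc\n".as_bytes(), 2)?;
    println!("{lines}"); // prints "3"
    println!("{}", should_stream("txt", 11 * 1024 * 1024)); // prints "true"
    Ok(())
}
```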

## Project Status

**Version:** 0.1.2

**License:** Apache-2.0

**Test Status:** ✅ All 103 tests passing

**Supported Platforms:** macOS, Linux, Windows (Rust-supported platforms)

**Recent Updates:**
- MCP server implemented with **rmcp** (stdio, spec handshake, schema-derived tools); library exposes **`EbookMcp`** for embedding
- AZW format support with DRM detection
- Image optimization for EPUB/CBZ files
- EPUB 3.0 support (nav.xhtml, semantic markup, version switching)
- Streaming for large file handling (10MB+ TXT, 50MB+ EPUB thresholds)
- Comprehensive format conversion with CLI and MCP integration
- Progress indicators for long operations
- 103 comprehensive tests with full coverage

**Planned Features:**
- DJVU and CHM format support
- OCR for scanned PDFs
- Enhanced metadata editing
- Web service API
- Batch processing

See [TODO.md](TODO.md) for complete roadmap and known issues.

## Documentation

- [ARCHITECTURE.md](ARCHITECTURE.md) - Detailed architecture documentation
- [SPEC.md](SPEC.md) - Original specification document
- [docs/MCP.md](docs/MCP.md) - MCP server integration guide
- [TODO.md](TODO.md) - Development roadmap and known issues

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

Licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Development

### Build

```bash
# Debug build
cargo build

# Release build (optimized)
cargo build --release
```

### Run tests

```bash
# Run all tests
cargo test

# Run specific test
cargo test test_epub_read

# Run with output
cargo test -- --nocapture

# Limit the number of test threads (tests already run in parallel by default)
cargo test -- --test-threads=4
```

**Test Coverage:** 103 tests covering:
- Format handlers (EPUB, MOBI, AZW, FB2, CBZ, TXT, PDF)
- CLI integration tests
- MCP integration tests
- Conversion tests
- Streaming tests
- Image optimization tests
- EPUB 3.0 features
- Error handling

### Run benchmarks

```bash
# Performance benchmarks (requires criterion)
cargo bench
```

Benchmarks available for:
- EPUB read/write performance
- CBZ read/write performance
- Image optimization performance

### Example files

```bash
# Run with example file
cargo run -- read examples/sample.txt

# Create an EPUB
cargo run -- write output.epub --format epub --title "Test" --content examples/sample.txt
```

### Enable logging

```bash
# Info level
RUST_LOG=info cargo run -- read book.epub

# Debug level (verbose)
RUST_LOG=debug cargo run -- read book.epub

# Trace level (very verbose)
RUST_LOG=trace cargo run -- read book.epub
```