ebook 0.1.2

A CLI tool for reading, writing, and operating on various ebook formats
# Architecture

## Overview

The **ebook** crate follows a modular, trait-based architecture that separates concerns and enables easy extension for new ebook formats.

## Core Components

### 1. Traits (`src/traits.rs`)

The foundation of the architecture consists of three main traits:

#### `EbookReader`
Provides read operations for ebook formats:
- `read_from_file()` - Load ebook from file
- `get_metadata()` - Extract metadata
- `get_content()` - Extract text content
- `get_toc()` - Get table of contents
- `extract_images()` - Extract embedded images

#### `EbookWriter`
Provides write operations for ebook formats:
- `set_metadata()` - Set ebook metadata
- `set_content()` - Set text content
- `add_chapter()` - Add a chapter
- `add_image()` - Add an image
- `write_to_file()` - Save ebook to file

#### `EbookOperator`
Advanced operations (combines Reader + Writer):
- `convert_to()` - Convert to another format
- `validate()` - Validate ebook structure
- `repair()` - Repair corrupted ebooks
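
The three traits might be declared along the lines of the std-only sketch below. The signatures, the `EbookResult` alias, the two-field `Metadata`, the default `validate()` body, and the `InMemoryHandler` type are all assumptions for illustration, not the actual contents of `src/traits.rs`, and only a subset of the listed methods is shown. Supertraits are one natural way to express that `EbookOperator` combines reader and writer.

```rust
use std::error::Error;

// Simplified stand-in for the crate's Metadata (assumed fields only).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Metadata {
    pub title: Option<String>,
    pub author: Option<String>,
}

// Assumed result alias; the real crate has its own error type.
pub type EbookResult<T> = Result<T, Box<dyn Error>>;

pub trait EbookReader {
    fn get_metadata(&self) -> EbookResult<Metadata>;
    fn get_content(&self) -> EbookResult<String>;
}

pub trait EbookWriter {
    fn set_metadata(&mut self, metadata: Metadata) -> EbookResult<()>;
    fn set_content(&mut self, content: &str) -> EbookResult<()>;
}

// Supertraits: anything implementing EbookOperator must also read and write.
pub trait EbookOperator: EbookReader + EbookWriter {
    // Assumed default rule: valid if there is a title and non-empty content.
    fn validate(&self) -> EbookResult<bool> {
        let has_title = self.get_metadata()?.title.is_some();
        Ok(has_title && !self.get_content()?.is_empty())
    }
}

// Hypothetical in-memory handler used only to exercise the traits.
#[derive(Default)]
pub struct InMemoryHandler {
    metadata: Metadata,
    content: String,
}

impl EbookReader for InMemoryHandler {
    fn get_metadata(&self) -> EbookResult<Metadata> {
        Ok(self.metadata.clone())
    }
    fn get_content(&self) -> EbookResult<String> {
        Ok(self.content.clone())
    }
}

impl EbookWriter for InMemoryHandler {
    fn set_metadata(&mut self, metadata: Metadata) -> EbookResult<()> {
        self.metadata = metadata;
        Ok(())
    }
    fn set_content(&mut self, content: &str) -> EbookResult<()> {
        self.content = content.to_string();
        Ok(())
    }
}

// No extra methods needed: validate() comes from the default body.
impl EbookOperator for InMemoryHandler {}
```

Putting shared logic in default methods on `EbookOperator` means each handler only has to supply the format-specific reader and writer pieces.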

### 2. Format Handlers (`src/formats/`)

Each ebook format has a dedicated handler that implements the core traits:

- **`EpubHandler`** - EPUB (ZIP-based, XML metadata)
- **`MobiHandler`** - MOBI/KF8 (binary format)
- **`Fb2Handler`** - FictionBook 2.0 (XML-based)
- **`CbzHandler`** - Comic Book Archive (ZIP with images)
- **`TxtHandler`** - Plain text files
- **`PdfHandler`** - PDF documents

### 3. Metadata (`src/metadata.rs`)

A unified metadata structure shared by all formats:
```rust
use std::collections::HashMap;

pub struct Metadata {
    pub title: Option<String>,
    pub author: Option<String>,
    pub publisher: Option<String>,
    pub description: Option<String>,
    pub language: Option<String>,
    pub isbn: Option<String>,
    pub publication_date: Option<String>,
    pub cover_image: Option<Vec<u8>>,
    pub format: Option<String>,
    pub custom_fields: HashMap<String, String>,
}
```
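
Because every field is optional, a handler only fills in what its format actually provides, and anything format-specific can go into `custom_fields`. The sketch below assumes the struct derives `Default` (not stated above), which makes struct-update syntax convenient; the `label()` helper and the `calibre:series` key are hypothetical examples.

```rust
use std::collections::HashMap;

// Same fields as the documented struct; the derives are an assumption.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Metadata {
    pub title: Option<String>,
    pub author: Option<String>,
    pub publisher: Option<String>,
    pub description: Option<String>,
    pub language: Option<String>,
    pub isbn: Option<String>,
    pub publication_date: Option<String>,
    pub cover_image: Option<Vec<u8>>,
    pub format: Option<String>,
    pub custom_fields: HashMap<String, String>,
}

impl Metadata {
    /// Hypothetical helper: a one-line label that degrades gracefully
    /// when a format did not provide a title or author.
    pub fn label(&self) -> String {
        let title = self.title.as_deref().unwrap_or("Untitled");
        match self.author.as_deref() {
            Some(author) => format!("{title} by {author}"),
            None => title.to_string(),
        }
    }
}
```

For example, an EPUB handler could populate `title`, `author`, and `format`, leave `isbn` as `None`, and stash an extra key such as `calibre:series` in `custom_fields`.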

### 4. Error Handling (`src/error.rs`)

Centralized error variants, derived with `thiserror`:
- `Io` - File I/O errors
- `Zip` - ZIP archive errors
- `Xml` - XML parsing errors
- `Pdf` - PDF-specific errors
- `UnsupportedFormat` - Unknown format
- `InvalidMetadata` - Metadata validation errors
- `Parse` - General parsing errors
- `Encoding` - Character encoding errors
- `NotFound` - Resource not found
- `InvalidStructure` - Invalid file structure
- `NotSupported` - Unsupported operation
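
To show what the `thiserror` derive buys, the std-only sketch below hand-writes roughly what `#[derive(Error)]` with `#[error("...")]` messages and a `#[from]` conversion would generate. The type name `EbookError`, the message texts, and the three-variant subset are assumptions; the real enum in `src/error.rs` has all the variants listed above.

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical name and payloads; only three of the documented variants.
#[derive(Debug)]
pub enum EbookError {
    Io(io::Error),
    UnsupportedFormat(String),
    NotFound(String),
}

// What #[error("...")] attributes would generate.
impl fmt::Display for EbookError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EbookError::Io(e) => write!(f, "I/O error: {e}"),
            EbookError::UnsupportedFormat(ext) => write!(f, "unsupported format: {ext}"),
            EbookError::NotFound(what) => write!(f, "not found: {what}"),
        }
    }
}

// The wrapped io::Error is exposed as the underlying cause.
impl Error for EbookError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            EbookError::Io(e) => Some(e),
            _ => None,
        }
    }
}

// What #[from] would generate: lets `?` convert io::Error automatically.
impl From<io::Error> for EbookError {
    fn from(e: io::Error) -> Self {
        EbookError::Io(e)
    }
}

/// Example caller: `?` turns an io::Error into EbookError::Io.
pub fn read_bytes(path: &str) -> Result<Vec<u8>, EbookError> {
    Ok(std::fs::read(path)?)
}
```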

### 5. Utilities (`src/utils.rs`)

Helper functions:
- `detect_format()` - Auto-detect ebook format from file extension
- `sanitize_filename()` - Clean filenames for safe file operations
- `guess_mime_type()` - Determine MIME type from filename
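
A minimal sketch of the three helpers, assuming string paths, lowercase format names, and a `String` error; the crate's real signatures, error type, and the exact sets of characters and MIME types it handles are not specified here.

```rust
use std::path::Path;

/// Lowercased file extension, if any.
fn extension(path: &str) -> Option<String> {
    Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
}

/// Extension-based detection (case-insensitive). Format names are assumptions.
pub fn detect_format(path: &str) -> Result<String, String> {
    let ext = extension(path).ok_or_else(|| format!("no file extension: {path}"))?;
    match ext.as_str() {
        known @ ("epub" | "mobi" | "fb2" | "cbz" | "txt" | "pdf") => Ok(known.to_string()),
        other => Err(format!("unsupported format: {other}")),
    }
}

/// Replaces characters that are invalid on common filesystems
/// (and control characters) with '_', then trims surrounding whitespace.
pub fn sanitize_filename(name: &str) -> String {
    name.chars()
        .map(|c| match c {
            '/' | '\\' | ':' | '*' | '?' | '"' | '<' | '>' | '|' => '_',
            c if c.is_control() => '_',
            c => c,
        })
        .collect::<String>()
        .trim()
        .to_string()
}

/// MIME type from the filename's extension, with a generic fallback.
pub fn guess_mime_type(filename: &str) -> &'static str {
    match extension(filename).as_deref() {
        Some("jpg" | "jpeg") => "image/jpeg",
        Some("png") => "image/png",
        Some("gif") => "image/gif",
        Some("xhtml") => "application/xhtml+xml",
        Some("css") => "text/css",
        Some("epub") => "application/epub+zip",
        Some("pdf") => "application/pdf",
        _ => "application/octet-stream",
    }
}
```

Detection by extension alone is cheap but trusts the filename; a file named `book.epub` that is not a ZIP archive will only fail later, when the handler parses it.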

## Design Principles

### 1. Separation of Concerns
Each format handler is independent and self-contained. Changes to one format don't affect others.

### 2. Trait-Based Design
All handlers implement common traits, enabling:
- Polymorphic usage
- Easy testing with mock implementations
- Consistent API across formats
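
Polymorphic usage means code can be written once against the trait and applied to any handler. In the sketch below, `TxtStub`, `Fb2Stub`, `word_count()`, and the simplified `get_content()` signature are hypothetical stand-ins, not the crate's handlers; the point is that a `Vec<Box<dyn EbookReader>>` can mix formats behind one interface.

```rust
// Simplified reader trait (assumed signature).
pub trait EbookReader {
    fn get_content(&self) -> Result<String, String>;
}

// Two toy handlers with different internal representations.
pub struct TxtStub {
    pub text: String,
}

pub struct Fb2Stub {
    pub paragraphs: Vec<String>,
}

impl EbookReader for TxtStub {
    fn get_content(&self) -> Result<String, String> {
        Ok(self.text.clone())
    }
}

impl EbookReader for Fb2Stub {
    fn get_content(&self) -> Result<String, String> {
        Ok(self.paragraphs.join("\n"))
    }
}

/// Works with any handler through dynamic dispatch.
pub fn word_count(reader: &dyn EbookReader) -> Result<usize, String> {
    Ok(reader.get_content()?.split_whitespace().count())
}

/// Sums over a mixed collection; stops at the first error.
pub fn total_words(books: &[Box<dyn EbookReader>]) -> Result<usize, String> {
    books.iter().map(|b| word_count(b.as_ref())).sum()
}
```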

### 3. DRY (Don't Repeat Yourself)
Common functionality is extracted to utilities and shared structures.

### 4. KISS (Keep It Simple, Stupid)
Each component has a single, well-defined responsibility.

### 5. Test-Friendly
Traits enable easy mocking and testing. Each handler can be tested independently.
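
As an illustration of mocking, a test can hand a canned reader to code that depends only on the trait, including a reader that fails on purpose. `MockReader`, `preview()`, and the simplified `get_content()` signature below are hypothetical.

```rust
// Simplified reader trait (assumed signature).
pub trait EbookReader {
    fn get_content(&self) -> Result<String, String>;
}

/// Hypothetical code under test: the first `n` characters of the content,
/// or a placeholder if the content cannot be read.
pub fn preview(reader: &impl EbookReader, n: usize) -> String {
    match reader.get_content() {
        Ok(text) => text.chars().take(n).collect(),
        Err(e) => format!("<unreadable: {e}>"),
    }
}

/// Mock reader: returns a canned result, no file I/O involved.
pub struct MockReader {
    pub result: Result<String, String>,
}

impl EbookReader for MockReader {
    fn get_content(&self) -> Result<String, String> {
        self.result.clone()
    }
}
```

The error path is exercised without having to craft a corrupted file on disk.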

## Data Flow

### Reading an Ebook
```
File → Handler.read_from_file() → Parse Format → Extract Metadata/Content → Return Data
```

### Writing an Ebook
```
Metadata + Content → Handler.set_*() → Format Data → Handler.write_to_file() → File
```

### Converting
```
Source File → Reader.read_from_file() → Extract Data → Writer.set_*() → Writer.write_to_file() → Target File
```
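
The conversion pipeline above can be sketched as a generic function over any reader/writer pair. The trait signatures, the `EbookResult` alias, and the toy `StubSource`/`TxtWriter` types (which writes `title`, a blank line, then the content) are assumptions for illustration, not the crate's `convert_to()` implementation.

```rust
use std::path::Path;

#[derive(Debug, Clone, Default, PartialEq)]
pub struct Metadata {
    pub title: Option<String>,
    pub author: Option<String>,
}

// Assumed alias: errors are plain strings in this sketch.
pub type EbookResult<T> = Result<T, String>;

pub trait EbookReader {
    fn get_metadata(&self) -> EbookResult<Metadata>;
    fn get_content(&self) -> EbookResult<String>;
}

pub trait EbookWriter {
    fn set_metadata(&mut self, metadata: Metadata) -> EbookResult<()>;
    fn set_content(&mut self, content: &str) -> EbookResult<()>;
    fn write_to_file(&self, path: &Path) -> EbookResult<()>;
}

/// Extract from the source, set_*() on the target, then write_to_file().
pub fn convert<R: EbookReader, W: EbookWriter>(src: &R, dst: &mut W, out: &Path) -> EbookResult<()> {
    dst.set_metadata(src.get_metadata()?)?;
    dst.set_content(&src.get_content()?)?;
    dst.write_to_file(out)
}

// Toy source: already-parsed data standing in for a real format handler.
pub struct StubSource {
    pub metadata: Metadata,
    pub content: String,
}

impl EbookReader for StubSource {
    fn get_metadata(&self) -> EbookResult<Metadata> {
        Ok(self.metadata.clone())
    }
    fn get_content(&self) -> EbookResult<String> {
        Ok(self.content.clone())
    }
}

// Toy target: writes "<title>\n\n<content>" as plain text.
#[derive(Default)]
pub struct TxtWriter {
    metadata: Metadata,
    content: String,
}

impl EbookWriter for TxtWriter {
    fn set_metadata(&mut self, metadata: Metadata) -> EbookResult<()> {
        self.metadata = metadata;
        Ok(())
    }
    fn set_content(&mut self, content: &str) -> EbookResult<()> {
        self.content = content.to_string();
        Ok(())
    }
    fn write_to_file(&self, path: &Path) -> EbookResult<()> {
        let title = self.metadata.title.as_deref().unwrap_or("Untitled");
        std::fs::write(path, format!("{title}\n\n{}", self.content)).map_err(|e| e.to_string())
    }
}
```

Because `convert` is generic, each source/target pair is resolved at compile time; a `&dyn` version would work equally well if the formats are only known at runtime.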

## Extension Points

### Adding a New Format

1. Create a new handler module in `src/formats/` (for example `src/formats/newformat.rs`, matching the module name used in step 3):
```rust
use crate::metadata::Metadata;

pub struct NewFormatHandler {
    metadata: Metadata,
    content: String,
}
```

2. Implement the core traits:
```rust
// Method bodies elided; see `src/traits.rs` for the required signatures.
impl EbookReader for NewFormatHandler { /* ... */ }
impl EbookWriter for NewFormatHandler { /* ... */ }
impl EbookOperator for NewFormatHandler { /* ... */ }
```

3. Add to `src/formats/mod.rs`:
```rust
pub mod newformat;
pub use newformat::NewFormatHandler;
```

4. Update `detect_format()` in `src/utils.rs` so the new extension is recognized:
```rust
// New arm in the extension `match` inside detect_format()
"newext" => Ok("newformat".to_string()),
```

5. Add CLI support in `src/main.rs`

6. Write tests in `tests/newformat_tests.rs`
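
Steps 1, 2, and 4 can be combined into one self-contained sketch. The `.newext` format itself (an optional `Title: ...` first line followed by body text), the constructor-style `read_from_file` signature, and the cut-down `detect_format()` are hypothetical; only the reader side is shown.

```rust
use std::path::Path;

// Simplified stand-in for the crate's Metadata (assumption).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Metadata {
    pub title: Option<String>,
}

pub trait EbookReader {
    // Assumed constructor-style signature; the real trait may differ.
    fn read_from_file(path: &Path) -> Result<Self, String>
    where
        Self: Sized;
    fn get_metadata(&self) -> Result<Metadata, String>;
    fn get_content(&self) -> Result<String, String>;
}

// Step 1: the handler struct from the documentation.
#[derive(Debug, Default)]
pub struct NewFormatHandler {
    metadata: Metadata,
    content: String,
}

impl NewFormatHandler {
    /// Parses the hypothetical format: optional "Title: ..." first line, then body.
    pub fn parse(text: &str) -> Self {
        let (first, rest) = text.split_once('\n').unwrap_or((text, ""));
        match first.strip_prefix("Title: ") {
            Some(title) => NewFormatHandler {
                metadata: Metadata { title: Some(title.trim().to_string()) },
                content: rest.to_string(),
            },
            // No header line: treat the whole file as content.
            None => NewFormatHandler {
                metadata: Metadata::default(),
                content: text.to_string(),
            },
        }
    }
}

// Step 2: implement the (simplified) reader trait.
impl EbookReader for NewFormatHandler {
    fn read_from_file(path: &Path) -> Result<Self, String> {
        let text = std::fs::read_to_string(path).map_err(|e| e.to_string())?;
        Ok(Self::parse(&text))
    }
    fn get_metadata(&self) -> Result<Metadata, String> {
        Ok(self.metadata.clone())
    }
    fn get_content(&self) -> Result<String, String> {
        Ok(self.content.clone())
    }
}

// Step 4: the new extension arm, inside a cut-down detect_format().
pub fn detect_format(path: &str) -> Result<String, String> {
    match Path::new(path).extension().and_then(|e| e.to_str()) {
        Some("newext") => Ok("newformat".to_string()),
        Some(other) => Err(format!("unsupported extension: {other}")),
        None => Err("missing file extension".to_string()),
    }
}
```

Keeping parsing in a plain `parse(&str)` function makes step 6 easy: most tests can run on in-memory strings, with only a few touching the filesystem.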

## Dependencies

### Core
- `clap` - CLI parsing
- `thiserror` - Error type derivation
- `serde` / `serde_json` - Serialization
- `schemars` - JSON Schema for MCP tool arguments
- `tokio` - Async runtime (CLI `mcp` subcommand, rmcp stdio)
- `rmcp` - Model Context Protocol server (stdio transport)
- `log` / `env_logger` - Optional logging (`RUST_LOG`)

### Format-Specific
- `zip` - EPUB, CBZ
- `quick-xml` / `xml-rs` - EPUB, FB2, ComicInfo.xml
- `lopdf` - PDF
- `encoding_rs` - Text encoding detection
- `image` - Image optimization (EPUB/CBZ)
- `base64` - MCP tool image payloads
- `uuid` - EPUB identifiers

### Testing / Benchmarks
- `tempfile` - Temporary test files
- `predicates` - CLI test predicates
- `assert_cmd` - Integration tests invoking the binary
- `criterion` - Benchmarks

## Performance Considerations

- **Lazy Loading**: Content is only parsed when requested
- **Streaming**: Large files can be processed in chunks
- **Memory Efficiency**: Avoid loading entire files into memory when possible
- **Caching**: Parsed metadata is cached in handler structs
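
Lazy loading and caching can be combined with `std::cell::OnceCell` (stable since Rust 1.70): metadata is parsed on the first `get_metadata()` call and the stored value is returned afterwards. The `LazyHandler` type, its `Title: ` header parsing, and the `parse_count` counter are hypothetical, added only to make the behavior observable; the crate may cache with plain `Option` fields instead.

```rust
use std::cell::{Cell, OnceCell};

#[derive(Debug, Clone, PartialEq)]
pub struct Metadata {
    pub title: Option<String>,
}

/// Hypothetical handler that keeps the raw text and parses metadata on demand.
pub struct LazyHandler {
    raw: String,
    metadata: OnceCell<Metadata>,
    parse_count: Cell<u32>, // instrumentation for this sketch only
}

impl LazyHandler {
    pub fn new(raw: String) -> Self {
        Self { raw, metadata: OnceCell::new(), parse_count: Cell::new(0) }
    }

    /// Parses on the first call; later calls return the cached value.
    pub fn get_metadata(&self) -> &Metadata {
        self.metadata.get_or_init(|| {
            self.parse_count.set(self.parse_count.get() + 1);
            let title = self
                .raw
                .lines()
                .next()
                .and_then(|line| line.strip_prefix("Title: "))
                .map(str::to_string);
            Metadata { title }
        })
    }

    /// How many times parsing actually ran.
    pub fn parse_count(&self) -> u32 {
        self.parse_count.get()
    }
}
```

`OnceCell` lets `get_metadata()` take `&self`, matching a read-only reader API, while still filling the cache on first use; it is not `Sync`, so a handler shared across threads would need `std::sync::OnceLock` instead.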

## Future Enhancements

- Async I/O support
- Streaming conversion for large files
- Plugin system for custom formats
- Advanced PDF text extraction
- Full EPUB 3.0 support
- Metadata editing without rewriting the entire file