carbonpdf 0.2.0

Production-ready HTML to PDF conversion using Headless Chrome
# CarbonPDF Architecture

## Design Philosophy

CarbonPDF is designed with the following principles:

1. **Separation of Concerns** - Clear boundaries between API, rendering, and Chrome interaction
2. **Extensibility** - Trait-based design allows alternative rendering backends
3. **Safety** - No unsafe code, comprehensive error handling, resource cleanup
4. **Ergonomics** - Builder pattern API with sensible defaults
5. **Performance** - Async-first design, efficient resource management

## System Architecture

### Layer 1: Public API

The public API is the only surface users interact with:

- **PdfBuilder** - Fluent builder for configuration
- **InputSource** - Type-safe input variants (HTML/File/URL)
- **PdfConfig** - Complete PDF generation configuration
- **ChromeConfig** - Browser-specific settings

Design decisions:

- Builder pattern for discoverability and flexibility
- Strong typing prevents invalid configurations
- Validation happens when the configuration is built, not partway through rendering
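A minimal sketch of the builder-with-validation pattern described above. `PdfBuilder` and `PdfConfig` here are hypothetical, heavily simplified stand-ins: the field names, the inch units, and the defaults (borrowed from Chrome's `printToPDF` defaults of 8.5 × 11 in with roughly 0.4 in margins) are assumptions, not CarbonPDF's actual API. The key point is that invalid values are rejected once, in `build()`, so a renderer only ever receives a checked configuration.

```rust
/// Hypothetical, simplified document configuration (not CarbonPDF's real struct).
#[derive(Debug, Clone, PartialEq)]
pub struct PdfConfig {
    pub paper_width_in: f64,
    pub paper_height_in: f64,
    pub margin_in: f64,
    pub landscape: bool,
}

/// Hypothetical fluent builder with sensible defaults (US Letter, 0.4 in margins).
#[derive(Debug, Clone)]
pub struct PdfBuilder {
    config: PdfConfig,
}

impl Default for PdfBuilder {
    fn default() -> Self {
        Self::new()
    }
}

impl PdfBuilder {
    pub fn new() -> Self {
        Self {
            config: PdfConfig {
                paper_width_in: 8.5,
                paper_height_in: 11.0,
                margin_in: 0.4,
                landscape: false,
            },
        }
    }

    pub fn paper_size(mut self, width_in: f64, height_in: f64) -> Self {
        self.config.paper_width_in = width_in;
        self.config.paper_height_in = height_in;
        self
    }

    pub fn margin(mut self, margin_in: f64) -> Self {
        self.config.margin_in = margin_in;
        self
    }

    pub fn landscape(mut self, landscape: bool) -> Self {
        self.config.landscape = landscape;
        self
    }

    /// Validate once, before any rendering starts.
    pub fn build(self) -> Result<PdfConfig, String> {
        let c = &self.config;
        if c.paper_width_in <= 0.0 || c.paper_height_in <= 0.0 {
            return Err("paper dimensions must be positive".to_string());
        }
        if c.margin_in < 0.0 {
            return Err("margins must not be negative".to_string());
        }
        // Margins on opposite sides must still leave a printable area.
        if 2.0 * c.margin_in >= c.paper_width_in.min(c.paper_height_in) {
            return Err("margins leave no printable area".to_string());
        }
        Ok(self.config)
    }
}
```

Usage: `PdfBuilder::new().margin(0.5).landscape(true).build()` yields a `PdfConfig`, while `.margin(-1.0).build()` fails before Chrome is ever involved.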

### Layer 2: Renderer Abstraction

The `PdfRenderer` trait decouples the API from implementation:

```rust
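use async_trait::async_trait;

// `InputSource`, `PdfConfig`, and `Result` are CarbonPDF's own types.
#[async_trait]
pub trait PdfRenderer: Send + Sync {
    /// Render the input with the given document configuration and return the PDF bytes.
    async fn render(&self, input: InputSource, config: PdfConfig) -> Result<Vec<u8>>;
    /// Identifies the backend (e.g. for logging).
    fn name(&self) -> &str;
}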
```

**Why?**

- Allows future backends (Playwright, wkhtmltopdf, etc.)
- Simplifies testing with mock implementations
- Clear contracts and boundaries
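To show how the trait decouples callers from the backend, here is a sketch with a mock renderer. To stay dependency-free it uses a *synchronous* stand-in for the trait (the real one is async via `#[async_trait]`), and `InputSource`, `PdfConfig`, `MockRenderer`, and `make_invoice` are simplified or hypothetical. Application code depends only on `&dyn PdfRenderer`, so a test can substitute a mock that never launches Chrome.

```rust
/// Simplified stand-ins for CarbonPDF's types (hypothetical).
#[derive(Debug, Clone)]
pub enum InputSource {
    Html(String),
    Url(String),
}

#[derive(Debug, Clone, Default)]
pub struct PdfConfig {
    pub landscape: bool,
}

/// Synchronous model of the `PdfRenderer` contract.
pub trait PdfRenderer: Send + Sync {
    fn render(&self, input: InputSource, config: PdfConfig) -> Result<Vec<u8>, String>;
    fn name(&self) -> &str;
}

/// Test double: returns a minimal PDF header instead of launching Chrome.
pub struct MockRenderer;

impl PdfRenderer for MockRenderer {
    fn render(&self, input: InputSource, _config: PdfConfig) -> Result<Vec<u8>, String> {
        match input {
            InputSource::Html(html) if html.is_empty() => Err("empty HTML".to_string()),
            _ => Ok(b"%PDF-1.7\n".to_vec()),
        }
    }

    fn name(&self) -> &str {
        "mock"
    }
}

/// Application code sees only the trait, so any backend can be swapped in.
pub fn make_invoice(renderer: &dyn PdfRenderer, body: &str) -> Result<Vec<u8>, String> {
    let html = format!("<html><body>{body}</body></html>");
    renderer.render(InputSource::Html(html), PdfConfig::default())
}
```

In production the same `make_invoice` would receive a Chrome-backed renderer; nothing in the caller changes.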

### Layer 3: Chrome Backend

The Chrome backend consists of:

1. **ChromeRenderer** - Implements PdfRenderer
2. **Process Management** - Browser lifecycle
3. **Protocol Layer** - CDP communication

Key responsibilities:

- Launch and manage Chrome process
- Translate PdfConfig to CDP parameters
- Handle errors and timeouts
- Resource cleanup
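One of these responsibilities, translating `PdfConfig` into CDP parameters, is mostly unit conversion: the DevTools `Page.printToPDF` command takes paper size and margins in inches. The sketch below assumes a hypothetical millimetre-based `PdfConfig` and a hand-written `CdpPrintParams` struct mirroring a subset of the CDP field names; it is not chromiumoxide's generated parameter type.

```rust
/// Hypothetical document settings expressed in millimetres.
#[derive(Debug, Clone)]
pub struct PdfConfig {
    pub paper_width_mm: f64,
    pub paper_height_mm: f64,
    pub margin_mm: f64, // same margin on all four sides, for brevity
    pub landscape: bool,
    pub print_background: bool,
}

/// Subset of the `Page.printToPDF` parameters; CDP expects inches.
#[derive(Debug, Clone, PartialEq)]
pub struct CdpPrintParams {
    pub paper_width: f64,
    pub paper_height: f64,
    pub margin_top: f64,
    pub margin_bottom: f64,
    pub margin_left: f64,
    pub margin_right: f64,
    pub landscape: bool,
    pub print_background: bool,
}

const MM_PER_INCH: f64 = 25.4;

fn mm_to_in(mm: f64) -> f64 {
    mm / MM_PER_INCH
}

/// Convert renderer-agnostic settings into Chrome-specific print parameters.
pub fn to_cdp_params(config: &PdfConfig) -> CdpPrintParams {
    let margin = mm_to_in(config.margin_mm);
    CdpPrintParams {
        paper_width: mm_to_in(config.paper_width_mm),
        paper_height: mm_to_in(config.paper_height_mm),
        margin_top: margin,
        margin_bottom: margin,
        margin_left: margin,
        margin_right: margin,
        landscape: config.landscape,
        print_background: config.print_background,
    }
}
```

For example, A4 (210 × 297 mm) becomes roughly 8.27 × 11.69 in, and a 25.4 mm margin becomes exactly 1 in.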

## Chrome Lifecycle Management

### Initialization

```text
┌─────────────────┐
│ ChromeRenderer  │
│    ::new()      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Launch Chrome   │
│ with flags      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Connect to CDP  │
│ via WebSocket   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Return renderer │
└─────────────────┘
```

### Per-Request Flow

1. Create new browser page (tab)
2. Load content (`set_content` or `goto`)
3. Wait for network idle
4. Call `Page.printToPDF`
5. Receive PDF data
6. Close page (cleanup)

**Why create new pages per request?**

- Isolation between requests
- Parallel processing capability
- Simpler error recovery
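The per-request flow can be modeled without a browser to show why error recovery stays simple. In this sketch `FakeBrowser`, `Page`, and `load_and_print` are hypothetical stand-ins, not chromiumoxide types; the point is that step 6 runs on both the success and the error path, so a failed render never leaks a tab.

```rust
/// Hypothetical stand-in for a browser; tracks how many tabs are open.
pub struct FakeBrowser {
    pub open_pages: usize,
}

/// Hypothetical stand-in for one tab.
pub struct Page;

impl FakeBrowser {
    /// Step 1: create a fresh page for this request.
    fn new_page(&mut self) -> Page {
        self.open_pages += 1;
        Page
    }

    /// Step 6: close the page; taking it by value prevents reuse afterwards.
    fn close_page(&mut self, _page: Page) {
        self.open_pages -= 1;
    }
}

/// Steps 2-5: load content, wait for network idle, print, receive bytes (simulated).
fn load_and_print(_page: &Page, html: &str) -> Result<Vec<u8>, String> {
    if html.trim().is_empty() {
        return Err("nothing to render".to_string());
    }
    Ok(b"%PDF-1.7\n".to_vec())
}

pub fn render_request(browser: &mut FakeBrowser, html: &str) -> Result<Vec<u8>, String> {
    let page = browser.new_page();
    // Capture the outcome instead of using `?`, so an error cannot skip cleanup.
    let result = load_and_print(&page, html);
    browser.close_page(page);
    result
}
```

Because each request owns its page, one request failing leaves the browser and every other page untouched.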

### Shutdown

The browser is closed when `ChromeRenderer` is dropped. `chromiumoxide` handles:

- Closing all pages
- Terminating Chrome process
- Cleaning up temporary files

## Error Handling Strategy

CarbonPDF uses typed errors with context:

```rust
pub enum Error {
    ChromeProcess(String),  // Launch/crash
    Protocol(String),        // CDP communication
    InvalidConfig(String),   // User configuration
    InputSource(String),     // File/URL issues
    Network(reqwest::Error), // HTTP errors
    Timeout(u64),            // Timeout exceeded
    // ...
}
```

**Error handling principles:**

- Fail fast with clear messages
- Provide actionable guidance
- Never panic in library code
- Clean up resources on error
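The "actionable guidance" principle mostly lives in the `Display` implementation. Below is a dependency-free sketch over a subset of the variants (the `Network(reqwest::Error)` variant is omitted to avoid the `reqwest` dependency). The message wording, and the assumption that `Timeout` carries milliseconds, are illustrative rather than CarbonPDF's actual text.

```rust
use std::fmt;

/// Dependency-free subset of the error enum.
#[derive(Debug)]
pub enum Error {
    ChromeProcess(String),
    Protocol(String),
    InvalidConfig(String),
    InputSource(String),
    Timeout(u64),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::ChromeProcess(msg) => write!(
                f,
                "Chrome process error: {msg} (check that Chrome or Chromium is installed)"
            ),
            Error::Protocol(msg) => write!(f, "DevTools protocol error: {msg}"),
            Error::InvalidConfig(msg) => write!(f, "invalid configuration: {msg}"),
            Error::InputSource(msg) => write!(f, "could not read input: {msg}"),
            // Assumption: the timeout value is in milliseconds.
            Error::Timeout(ms) => write!(
                f,
                "operation timed out after {ms} ms; consider raising the timeout"
            ),
        }
    }
}

impl std::error::Error for Error {}

/// Crate-wide alias, matching the `Result<Vec<u8>>` used by `PdfRenderer`.
pub type Result<T> = std::result::Result<T, Error>;
```

Implementing `std::error::Error` lets callers propagate these errors with `?` into `Box<dyn std::error::Error>` or similar error types.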

## Configuration Design

### Two-tier configuration

1. **PdfConfig** - Document properties (size, margins, etc.)
   - Portable across renderers
   - Validated before rendering

2. **ChromeConfig** - Browser-specific settings
   - Separate from document config
   - Chrome-specific optimizations

This separation ensures:

- Clear ownership of concerns
- Easy to add new renderers
- Testable without Chrome
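The Chrome tier can be sketched as a struct that only the Chrome backend consumes, turning settings into command-line switches, while `PdfConfig` never mentions Chrome at all. `ChromeConfig`'s fields and defaults here (including the 1280×800 window) are hypothetical; `--headless`, `--no-sandbox`, and `--window-size=W,H` are standard Chrome switches.

```rust
/// Hypothetical browser-only settings; none of these affect the document itself.
#[derive(Debug, Clone)]
pub struct ChromeConfig {
    pub headless: bool,
    pub sandbox: bool,
    pub window_size: (u32, u32),
}

impl Default for ChromeConfig {
    fn default() -> Self {
        Self {
            headless: true,
            sandbox: true, // keep the sandbox unless the environment is trusted
            window_size: (1280, 800),
        }
    }
}

impl ChromeConfig {
    /// Translate settings into Chrome command-line switches.
    pub fn launch_args(&self) -> Vec<String> {
        let mut args = Vec::new();
        if self.headless {
            args.push("--headless".to_string());
        }
        if !self.sandbox {
            args.push("--no-sandbox".to_string());
        }
        let (w, h) = self.window_size;
        args.push(format!("--window-size={w},{h}"));
        args
    }
}
```

Because this translation is a pure function, it is testable without launching Chrome, and a future backend would simply bring its own config type.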

## Thread Safety

CarbonPDF is designed to be thread-safe:

- `PdfRenderer` requires `Send + Sync`
- `ChromeRenderer` uses `Arc<Browser>`
- No shared mutable state
- Futures returned by the async API are `Send`, so they can be moved into `tokio::spawn`

Safe usage pattern:

```rust
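use lazy_static::lazy_static;

lazy_static! {
    // Constructed once, on first access (construction elided)
    static ref RENDERER: ChromeRenderer = /* ... */;
}

// Safe to call from multiple threads/tasks because ChromeRenderer is Send + Sync
tokio::spawn(async {
    RENDERER.render(/* ... */).await
});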
```
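An alternative to a global is to share one renderer through an `Arc`, cloning the pointer into each worker. The sketch uses std threads and a synchronous stand-in trait to stay dependency-free; `StubRenderer` and `render_all` are hypothetical. With Tokio, the same `Arc::clone` would go into each `tokio::spawn`.

```rust
use std::sync::Arc;
use std::thread;

/// Synchronous stand-in for `PdfRenderer`; `Send + Sync` is what makes sharing legal.
pub trait PdfRenderer: Send + Sync {
    fn render(&self, html: &str) -> Result<Vec<u8>, String>;
}

/// Hypothetical renderer with no mutable state, so shared `&self` access is safe.
pub struct StubRenderer;

impl PdfRenderer for StubRenderer {
    fn render(&self, html: &str) -> Result<Vec<u8>, String> {
        Ok(format!("%PDF-1.7 ({} bytes of HTML)", html.len()).into_bytes())
    }
}

/// Render several documents in parallel through one shared renderer.
pub fn render_all(
    renderer: Arc<dyn PdfRenderer>,
    docs: Vec<String>,
) -> Vec<Result<Vec<u8>, String>> {
    let handles: Vec<_> = docs
        .into_iter()
        .map(|html| {
            // Cheap pointer clone: every thread uses the same renderer (and browser).
            let r = Arc::clone(&renderer);
            thread::spawn(move || r.render(&html))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("render thread panicked"))
        .collect()
}
```

If `PdfRenderer` did not require `Send + Sync`, the `thread::spawn` call would be rejected at compile time, which is the guarantee the trait bound provides.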

## Resource Management

### Memory

- PDFs are returned as `Vec<u8>` (owned)
- Large PDFs may require streaming (future enhancement)
- Browser pages closed immediately after use

### Process

- Chrome launched once per renderer instance
- Pages (tabs) created per request
- Automatic cleanup on Drop

### Files

- Temporary files handled by chromiumoxide
- No file artifacts left behind
- Input files read asynchronously

## Testing Strategy

### Unit Tests

- Configuration validation
- Error construction
- Input source resolution

### Integration Tests

- End-to-end PDF generation
- Various configurations
- Error scenarios

### Test Isolation

- Tests don't require Chrome (where possible)
- Mock implementations of PdfRenderer
- Fixtures for deterministic tests
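Input source resolution is a good example of logic that can be tested without Chrome if it is kept separate from the browser calls. This sketch assumes a hypothetical `resolve` function and `LoadAction` enum; mapping files to "read, then `set_content`" and rejecting non-HTTP(S) URLs are assumptions for illustration.

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical mirror of the three input variants.
#[derive(Debug, Clone)]
pub enum InputSource {
    Html(String),
    File(PathBuf),
    Url(String),
}

/// What the Chrome backend would do with the input.
#[derive(Debug, PartialEq)]
pub enum LoadAction {
    SetContent(String), // load an HTML string into the page
    Goto(String),       // navigate the page to a URL
}

/// Resolve an input without touching Chrome, so it can be unit-tested.
pub fn resolve(input: InputSource) -> Result<LoadAction, String> {
    match input {
        InputSource::Html(html) => Ok(LoadAction::SetContent(html)),
        // Assumption: files are read up front and rendered like inline HTML.
        InputSource::File(path) => fs::read_to_string(&path)
            .map(LoadAction::SetContent)
            .map_err(|e| format!("cannot read {}: {e}", path.display())),
        InputSource::Url(url) if url.starts_with("http://") || url.starts_with("https://") => {
            Ok(LoadAction::Goto(url))
        }
        InputSource::Url(url) => Err(format!("unsupported URL scheme: {url}")),
    }
}
```

One design caveat: reading a file into `set_content` loses its location, so relative image or CSS paths will not resolve; navigating to a `file://` URL instead would preserve them.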

## Trade-offs and Limitations

### Current Design

**Pros:**

- Simple, predictable API
- Full Chrome rendering capabilities
- Type-safe configuration
- Clear error messages

**Cons:**

- Chrome dependency (large binary)
- Process overhead per renderer
- No streaming output (yet)
- Single backend currently

### Known Limitations

1. **Chrome Required** - Chrome or Chromium must be installed; it is not an embeddable library
   - Mitigation: Clear error messages, auto-download feature (roadmap)

2. **Process Overhead** - Chrome is heavy
   - Mitigation: Reuse renderer instances, connection pooling (roadmap)

3. **No Streaming** - Full PDF in memory
   - Mitigation: Reasonable for most use cases, streaming planned

4. **Synchronous Drop** - Cleanup runs in `Drop`, which cannot be `async`
   - Mitigation: Synchronous cleanup works in practice; async drop would be ideal but is not available in stable Rust
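A common workaround for the lack of async drop is to pair an explicit shutdown method with a `Drop` fallback. This sketch uses a hypothetical `BrowserHandle` that records events in a log instead of managing a real process; it illustrates the pattern and is not how CarbonPDF or chromiumoxide implement shutdown.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical browser handle; `log` records cleanup for demonstration.
pub struct BrowserHandle {
    closed: bool,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl BrowserHandle {
    pub fn launch(log: Rc<RefCell<Vec<&'static str>>>) -> Self {
        log.borrow_mut().push("launch");
        Self { closed: false, log }
    }

    /// Preferred path: explicit, graceful shutdown.
    /// In async code this would be an `async fn close(self)` that can await.
    pub fn close(mut self) {
        self.log.borrow_mut().push("graceful close");
        self.closed = true;
        // `self` is dropped here; Drop sees `closed == true` and does nothing more.
    }
}

impl Drop for BrowserHandle {
    /// Fallback: Drop cannot await, so it can only do best-effort synchronous
    /// cleanup (for example, killing the child process).
    fn drop(&mut self) {
        if !self.closed {
            self.log.borrow_mut().push("forced kill in drop");
        }
    }
}
```

Callers that care about graceful shutdown call `close()`; everyone else still gets cleanup when the handle goes out of scope.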

## Future Enhancements

### v0.2

- Connection pooling for high-volume scenarios
- Streaming PDF output
- Automatic Chrome download

### v0.3

- Playwright backend support
- PDF/A compliance mode
- Batch processing utilities

## Security Considerations

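1. **No user code execution** - Input is rendered by Chrome and never compiled or executed as Rust code (page JavaScript, if any, runs inside the Chrome process, not in your application)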
2. **Sandbox mode** - Chrome runs sandboxed by default (disable only in trusted environments)
3. **Input validation** - All config validated before processing
4. **No unsafe code** - Pure safe Rust (except in dependencies)
5. **Resource limits** - Timeouts prevent infinite hangs
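The timeout idea can be sketched with the standard library: run the work elsewhere and stop waiting after a deadline. In async code the equivalent is wrapping the render future in `tokio::time::timeout`. `with_timeout` and `RenderError` are hypothetical, and the millisecond unit is an assumption.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum RenderError {
    Timeout(u64), // milliseconds (assumed unit)
    Failed(String),
}

/// Run `job` on a worker thread and stop waiting after `timeout_ms`.
pub fn with_timeout<F>(timeout_ms: u64, job: F) -> Result<Vec<u8>, RenderError>
where
    F: FnOnce() -> Result<Vec<u8>, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore the send error.
        let _ = tx.send(job());
    });
    match rx.recv_timeout(Duration::from_millis(timeout_ms)) {
        Ok(result) => result.map_err(RenderError::Failed),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(RenderError::Timeout(timeout_ms)),
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Err(RenderError::Failed("render worker panicked".to_string()))
        }
    }
}
```

Note that a thread cannot be cancelled, so the worker keeps running after a timeout; dropping a future under `tokio::time::timeout` does cancel it, and the page should still be closed afterwards.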

## Performance Characteristics

- **Browser startup**: ~1-2 seconds (one-time cost)
- **Page creation**: ~50-100ms per page
- **PDF generation**: Depends on content complexity
- **Memory**: ~100MB base + content size

**Optimization recommendations:**

1. Reuse ChromeRenderer instances
2. Keep HTML simple, inline critical CSS
3. Set appropriate timeouts
4. Consider connection pooling for high-volume workloads (roadmap)
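The first recommendation follows from the startup figures. The sketch below is a rough cost model using the midpoints of the ranges above (about 1.5 s browser startup and 75 ms page creation). These are assumed illustrative values, not measurements, and PDF generation time itself is excluded.

```rust
/// Assumed midpoints of the figures above, in milliseconds.
const STARTUP_MS: u64 = 1_500;
const PAGE_MS: u64 = 75;

/// Overhead when a new ChromeRenderer (and browser) is created for every request.
pub fn fresh_renderer_per_request_ms(requests: u64) -> u64 {
    requests * (STARTUP_MS + PAGE_MS)
}

/// Overhead when one ChromeRenderer is reused and only pages are created per request.
pub fn reused_renderer_ms(requests: u64) -> u64 {
    STARTUP_MS + requests * PAGE_MS
}
```

For 100 requests this gives 157,500 ms versus 9,000 ms of overhead (roughly 157.5 s versus 9 s), so reusing the renderer amortizes the one-time startup cost.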