libpgdump 2.1.0

A Rust library for reading and writing PostgreSQL dump files
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project

libpgdump is a Rust library for reading and writing PostgreSQL dump files. It supports all three pg_dump archive formats: custom (`-Fc`), directory (`-Fd`), and tar (`-Ft`). Four compression settings are supported: none, gzip, lz4, and zstd. The tar format does not support compression.

## Build Commands

```
cargo build
cargo test
cargo test <test_name>    # run a single test
cargo clippy              # lint
cargo fmt                 # format
cargo doc --no-deps       # build docs
just check                # fmt-check + lint + test
just bootstrap            # generate test fixtures (requires Docker)
```

## Architecture

The public API is `Dump::load(path)` / `Dump::save(path)` in `src/dump.rs`. `load()` auto-detects the format: a path that is a directory is read as directory format; for a regular file, the magic bytes decide (`PGDMP` = custom, `ustar` = tar).
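The detection rule can be sketched as follows. This is a minimal illustration, not the code in `src/dump.rs`: `DetectedFormat`, `detect_from_bytes`, and `detect` are hypothetical names, and checking `ustar` at byte offset 257 (where the POSIX tar header stores its magic) is an assumption about how the crate performs the check.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical stand-in for the crate's `Format` enum.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum DetectedFormat {
    Custom,
    Directory,
    Tar,
}

/// Offset of the `ustar` magic inside the first 512-byte tar header block.
const TAR_MAGIC_OFFSET: usize = 257;

/// Classify a file by its leading bytes; `None` if neither magic matches.
pub fn detect_from_bytes(head: &[u8]) -> Option<DetectedFormat> {
    // Custom-format archives start with the literal bytes `PGDMP`.
    if head.starts_with(b"PGDMP") {
        return Some(DetectedFormat::Custom);
    }
    // Tar archives carry `ustar` inside the first header block.
    match head.get(TAR_MAGIC_OFFSET..TAR_MAGIC_OFFSET + 5) {
        Some(magic) if magic == &b"ustar"[..] => Some(DetectedFormat::Tar),
        _ => None,
    }
}

/// Directories are directory-format dumps; regular files are sniffed.
pub fn detect(path: &Path) -> io::Result<Option<DetectedFormat>> {
    if path.is_dir() {
        return Ok(Some(DetectedFormat::Directory));
    }
    // One tar header block (512 bytes) is enough for both checks.
    let mut head = Vec::with_capacity(512);
    File::open(path)?.take(512).read_to_end(&mut head)?;
    Ok(detect_from_bytes(&head))
}
```

Keeping the byte classification separate from the file I/O makes the rule testable without fixtures.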

### Format readers/writers

Each format has its own `read_archive` / `write_archive` in `src/format/`:
- `custom.rs` — Custom format (Fc), binary stream with TOC + data blocks
- `directory.rs` — Directory format (Fd), `toc.dat` file + per-entry `.dat` files
- `tar.rs` — Tar format (Ft), standard tar archive with `toc.dat` + data files

All three share the `ArchiveData` intermediate struct (defined in `custom.rs`) that `Dump` converts to/from.

### Core modules

- `src/io/primitives.rs` — Low-level read/write for pg_dump's custom integer, string, and offset encodings
- `src/compress/` — Compression layer (none, gzip, lz4, zstd) with `decompressor`/`compressor` factory functions
- `src/entry.rs` — TOC entry model
- `src/header.rs` — Archive header model
- `src/types.rs` — Core enums: `ObjectType` (50+ pg_dump object types with `section()`, `priority()`, `as_str()`), `Section`, `Format`, `CompressionAlgorithm`, etc.
- `src/sort.rs` — Weighted topological sort of TOC entries, matching pg_dump's `pg_dump_sort.c`
- `src/constants.rs` — Archive magic bytes (`PGDMP`)
- `src/version.rs` — Archive version handling and PG version mapping
- `src/error.rs` — Error types using `thiserror`
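
To illustrate the idea behind `src/sort.rs`, here is a generic priority-aware topological sort (Kahn's algorithm with a min-heap). It is a simplified sketch, not a port of `pg_dump_sort.c` and not the crate's implementation: `Item`, `sort_items`, and the "lower priority value sorts earlier, ties broken by id" rule are assumptions made for the example.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Hypothetical, simplified TOC entry: only what ordering needs.
/// Ids are assumed to be unique.
pub struct Item {
    pub id: i32,
    pub priority: u32,  // lower values are emitted earlier
    pub deps: Vec<i32>, // ids that must appear before this item
}

/// Returns ids in dependency order, or `None` if the graph has a cycle.
pub fn sort_items(items: &[Item]) -> Option<Vec<i32>> {
    let by_id: HashMap<i32, &Item> = items.iter().map(|i| (i.id, i)).collect();
    let mut pending: HashMap<i32, usize> = HashMap::new();
    let mut dependents: HashMap<i32, Vec<i32>> = HashMap::new();

    for item in items {
        let mut count = 0;
        for dep in &item.deps {
            // Dependencies on ids outside the list are ignored.
            if by_id.contains_key(dep) {
                count += 1;
                dependents.entry(*dep).or_default().push(item.id);
            }
        }
        pending.insert(item.id, count);
    }

    // Min-heap of items whose dependencies are all satisfied.
    let mut ready: BinaryHeap<Reverse<(u32, i32)>> = items
        .iter()
        .filter(|i| pending[&i.id] == 0)
        .map(|i| Reverse((i.priority, i.id)))
        .collect();

    let mut order = Vec::with_capacity(items.len());
    while let Some(Reverse((_, id))) = ready.pop() {
        order.push(id);
        if let Some(children) = dependents.get(&id) {
            for child in children {
                let left = pending.get_mut(child).expect("child was registered");
                *left -= 1;
                if *left == 0 {
                    ready.push(Reverse((by_id[child].priority, *child)));
                }
            }
        }
    }

    // Items left unplaced mean a dependency cycle.
    if order.len() == items.len() {
        Some(order)
    } else {
        None
    }
}
```

In this sketch the weight only decides among entries that are already unblocked, so a dependency always wins over priority.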

## Testing

- Unit tests are inline in each module
- Integration tests in `tests/read_dump.rs` and `tests/round_trip.rs`; shared helpers in `tests/common/mod.rs`
- Fixture-based tests require dump files in `build/data/` (generated via `just bootstrap`)
- Tests gracefully skip when fixtures are not present
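
The skip-when-missing pattern might look like the sketch below. The `fixture` helper and the `dump.custom` file name are hypothetical; the real helpers live in `tests/common/mod.rs` and may differ.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve a fixture under `base`, or `None` if absent.
pub fn fixture(base: &Path, name: &str) -> Option<PathBuf> {
    let path = base.join(name);
    path.exists().then_some(path)
}

#[test]
fn reads_custom_dump() {
    // Return early (the test passes) when fixtures were never generated.
    let Some(path) = fixture(Path::new("build/data"), "dump.custom") else {
        eprintln!("skipping: fixture missing; run `just bootstrap`");
        return;
    };
    // A real test would load the dump here; this sketch only checks the path.
    assert!(path.exists());
}
```

Note that a skipped test still reports as passed, so a green run without fixtures only shows that the unit tests succeeded.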

## Key Design Decisions

- Integer encoding: 1 sign byte + N magnitude bytes (LSB first). NOT standard little-endian.
- Strings: length-prefixed (pg_dump int) + UTF-8 bytes. Length -1 = NULL.
- Version-aware parsing: fields present/absent based on archive version (1.12.0–1.16.0).
- Object types are the `ObjectType` enum (not strings). `Entry.desc` is `ObjectType`, with `section()` and `priority()` methods. Unknown types round-trip via `ObjectType::Other(String)`.
- TOC entries are sorted on save using weighted topological sort matching pg_dump's algorithm.
- Custom format writes use atomic rename (write to `.tmp`, rename on success) to avoid partial files.
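
The integer and string encodings above can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/io/primitives.rs`: the function names are hypothetical, `int_size` stands for the N magnitude bytes (assumed to be at most 8 here), and a sign byte of `1` for negative values and `0` otherwise is an assumption based on pg_dump's own writer.

```rust
use std::io::{self, Read, Write};

/// Write one sign byte, then `int_size` magnitude bytes, least significant first.
pub fn write_int<W: Write>(w: &mut W, value: i64, int_size: usize) -> io::Result<()> {
    let sign: u8 = if value < 0 { 1 } else { 0 };
    let mut magnitude = value.unsigned_abs();
    w.write_all(&[sign])?;
    for _ in 0..int_size {
        w.write_all(&[(magnitude & 0xFF) as u8])?;
        magnitude >>= 8;
    }
    Ok(())
}

/// Read the sign byte and `int_size` magnitude bytes back into an `i64`.
pub fn read_int<R: Read>(r: &mut R, int_size: usize) -> io::Result<i64> {
    if int_size > 8 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "int_size > 8"));
    }
    let mut byte = [0u8; 1];
    r.read_exact(&mut byte)?;
    let negative = byte[0] != 0;
    let mut magnitude: u64 = 0;
    for shift in 0..int_size {
        r.read_exact(&mut byte)?;
        magnitude |= u64::from(byte[0]) << (8 * shift);
    }
    let value = magnitude as i64;
    Ok(if negative { value.wrapping_neg() } else { value })
}

/// `None` is written as length -1; otherwise the length, then UTF-8 bytes.
pub fn write_string<W: Write>(w: &mut W, s: Option<&str>, int_size: usize) -> io::Result<()> {
    match s {
        None => write_int(w, -1, int_size),
        Some(text) => {
            write_int(w, text.len() as i64, int_size)?;
            w.write_all(text.as_bytes())
        }
    }
}

/// A negative length means NULL and is returned as `None`.
pub fn read_string<R: Read>(r: &mut R, int_size: usize) -> io::Result<Option<String>> {
    let len = read_int(r, int_size)?;
    if len < 0 {
        return Ok(None);
    }
    let mut buf = vec![0u8; len as usize];
    r.read_exact(&mut buf)?;
    String::from_utf8(buf)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

With `int_size = 4`, the value -258 is written as the five bytes `01 02 01 00 00`: the sign byte, then the magnitude 0x0102 with its low byte first.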