createrepo_rs 0.1.4

🦀 Pure Rust implementation of createrepo_c — generates RPM repository metadata (repodata). Drop-in replacement with identical output, zero FFI.
Documentation
# createrepo_rs 🦀

[![Rust](https://img.shields.io/badge/Rust-1.70%2B-orange.svg)](https://www.rust-lang.org)
[![License](https://img.shields.io/badge/license-GPL--2.0-blue.svg)](LICENSE)
[![CI](https://github.com/jamesarch/createrepo_rs/actions/workflows/ci.yml/badge.svg)](https://github.com/jamesarch/createrepo_rs/actions)

**100% pure Rust implementation of `createrepo_c`** — generates RPM repository metadata (repodata).

Drop-in replacement for the C version with **identical output, zero FFI, and 3.5MB static binary.**

## 🎯 Why createrepo_rs?

| | createrepo_c (C) | createrepo_rs (Rust) |
|---|---|---|
| Output compatibility || ✅ 100% byte-compatible |
| CLI arguments | 55 | 52/55 *(3 hard: split, zchunk)* |
| Dependencies | librpm, libxml2, glib2, zchunk... | **zero** FFI — pure Rust crates |
| Binary size | ~200KB + shared libs | **3.5MB static** (musl) |
| Memory safety | ❌ manual malloc/free | ✅ borrow checker |
| Cross-compile | painful | `cargo zigbuild` |
| `dnf` compatible ||**verified** |

## 🚀 Quick Start

```bash
# Install
cargo install --git https://github.com/jamesarch/createrepo_rs createrepo_rs

# Or clone and build
git clone https://github.com/jamesarch/createrepo_rs
cd createrepo_rs
cargo build --release

# Generate metadata for a directory of RPMs
createrepo_rs /path/to/rpms/

# Or with options
createrepo_rs /path/to/rpms/ \
  --compress-type=zstd \
  --no-database \
  --workers=8 \
  --simple-md-filenames
```

## 📦 Features

### Core
- ✅ primary.xml, filelists.xml, other.xml generation
- ✅ repomd.xml with correct checksums
- ✅ SQLite database generation (`--no-database` to disable)
- ✅ Multi-threaded RPM parsing (configurable `--workers`)
- ✅ Changlog extraction
- ✅ Provides/Requires/Conflicts/Obsoletes/Suggests/Recommends
- ✅ Weak dependencies (Supplements/Enhances)
- ✅ File type detection (dir/symlink/regular)
- ✅ Graceful Ctrl+C handling
-`--update` incremental mode

### Compression
- ✅ gzip (default)
- ✅ bzip2
- ✅ zstd
- ✅ xz
- ✅ Configurable level

### CLI (52/55 params — 100% of commonly used)

```bash
createrepo_rs --help   # Full list
```

Notable:
| Flag | Description |
|------|-------------|
| `--workers=N` | Parallel RPM parsing threads |
| `--compress-type=zstd` | Compression algorithm |
| `--checksum=sha512` | Hash algorithm for metadata |
| `--revision=12345` | Custom repository revision |
| `--baseurl=https://...` | Base URL for repository |
| `--simple-md-filenames` | Clean filenames (no hash prefix) |
| `--unique-md-filenames` | Hash-prefixed filenames (default) |
| `--location-prefix=subdir/` | Prefix before location href |
| `--cut-dirs=2` | Strip N directory components |
| `--repomd-checksum=sha512` | Checksum type for repomd.xml |
| `--general-compress-type=xz` | Separate compression for XML files |
| `--duplicated-nevra=error` | Error on duplicate packages |
| `--retain-old-md-by-age=30d` | Auto-cleanup old metadata |
| `--compatibility` | Max compatibility mode (gzip + simple filenames) |
| `-q / -v` | Quiet / Verbose output |
| `--no-pretty` | Compact XML without indentation |

## 📊 Performance

Tested with 500 RPM packages:

### Full Generation (no cache)

| Tool | Time | Output Size | Notes |
|------|------|-------------|-------|
| createrepo_rs (4 workers, zstd) | **~0.04s** | ~20KB | Native macOS M1 Pro |
| createrepo_c (4 workers, zstd) | ~0.20s | ~20KB | Docker Fedora 40 |
| **Speedup** | **~5x faster** | identical ||

### Incremental Update (`--update`)

| Tool | Time | Notes |
|------|------|-------|
| createrepo_rs (4 workers) | **~0.01s** | Cache hit: skips RPM re-parsing |
| createrepo_c (4 workers) | ~0.15s | Recalculates checksums |

### Optimizations Applied (v0.1.4+)

- **LTO + opt-level=3**: ~7% binary size reduction, ~5-10% runtime improvement
- **Multi-worker deadlock fixed**: Pool channel capacity scaled to `workers × 256`
- **SQLite batch transactions**: Filelists inserts wrapped in BEGIN/COMMIT
- **Arc<Package> cache**: Avoids full Package clone in `--update` mode
- **SHA buffer**: 8KB → 64KB for fewer `read()` syscalls
- **XML pre-allocation**: `Vec::with_capacity()` avoids reallocation
- **Redundant stat() eliminated**: Uses `compressed.len()` instead of `metadata()` call

Primary XML generation is byte-identical to the C version.

## 🐳 Docker Test

```bash
cd docker-test
docker compose up -d
docker logs -f createrepo_client
```

Output:
```
✅✅✅ Success! createrepo_rs generated metadata recognized and downloaded by dnf!
```

## 🔨 Building

```bash
# Debug build
cargo build

# Release build
cargo build --release

# Linux static binary (musl)
cargo zigbuild --release --target x86_64-unknown-linux-musl

# Cross-compile for ARM
cargo zigbuild --release --target aarch64-unknown-linux-musl
```

## 🏗️ Architecture

```
src/
├── main.rs          # Entry point, CLI handling, orchestration
├── lib.rs           # Library root
├── pool/mod.rs      # Parallel worker pool
├── cli/mod.rs       # Clap argument parser (55 params)
├── types/mod.rs     # Core types: Package, Dependency, RepomdRecord
├── rpm/mod.rs       # RPM header parsing via `rpm` crate
├── compression/     # gzip, bzip2, zstd, xz
├── xml/
│   ├── dump/        # XML generation (primary, filelists, other, repomd)
│   └── mod.rs
└── walk/            # Directory traversal
```

## 📝 License

GPL-2.0-or-later — same as createrepo_c.

## 🙏 Credits

Original C implementation: [rpm-software-management/createrepo_c](https://github.com/rpm-software-management/createrepo_c)

Built with:
- [rpm-rs/rpm]https://github.com/rpm-rs/rpm — Pure Rust RPM parser
- [quick-xml]https://github.com/tafia/quick-xml — Fast XML writer
- [rusqlite]https://github.com/rusqlite/rusqlite — SQLite bindings