pbfhogg

Rust library for reading, writing, and transforming OpenStreetMap PBF files. Designed for planet-scale operations on normal hardware.

Read 59 million elements in 0.31s (parallel) or 1.3s (pipelined, preserving file order). Write them back with pipelined compression in 6.3s. All encoding and decoding uses hand-rolled wire-format protobuf: no external protobuf dependencies and no per-element allocation.
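
The wire format underneath the PBF container is ordinary protobuf, whose integers are base-128 varints. A minimal, self-contained sketch of that encoding for illustration (this is not pbfhogg's internal code):

```rust
/// Encode a u64 as a protobuf base-128 varint (little-endian groups of 7 bits).
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        // High bit set: more bytes follow.
        out.push(byte | 0x80);
    }
}

/// Decode a varint, returning the value and the number of bytes consumed.
fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated or overlong input
}

fn main() {
    let mut buf = Vec::new();
    encode_varint(300, &mut buf);
    assert_eq!(buf, [0xac, 0x02]); // the canonical protobuf example
    assert_eq!(decode_varint(&buf), Some((300, 2)));
    println!("ok");
}
```

Hand-rolling this layer is what allows decoding without a generated-code protobuf crate or per-element allocation.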

Developed on Linux, untested elsewhere. Optional features for O_DIRECT and io_uring are Linux-only.

For the CLI toolkit (pbfhogg-cli), see the CLI crate.

Usage

[dependencies]
pbfhogg = "0.3"

Library users who only need read/write can disable the commands feature to skip serde_json and s2 dependencies:

[dependencies]
pbfhogg = { version = "0.3", default-features = false }

For reverse geocoding queries (memory-mapped index reader), enable just the geocode-reader feature:

[dependencies]
pbfhogg = { version = "0.3", default-features = false, features = ["geocode-reader"] }

Reading

use pbfhogg::{ElementReader, Element};

let reader = ElementReader::from_path("input.osm.pbf")?;

// Check if the PBF declares sorted elements
if reader.header().is_sorted() {
    println!("PBF is sorted by type then ID");
}

reader.for_each(|element| {
    if let Element::Way(way) = element {
        // process way
    }
})?;
# Ok::<(), std::io::Error>(())

Parallel aggregation

use pbfhogg::{ElementReader, Element};

let reader = ElementReader::from_path("input.osm.pbf")?;
let ways = reader.par_map_reduce(
    |element| match element {
        Element::Way(_) => 1,
        _ => 0,
    },
    || 0_u64,
    |a, b| a + b,
)?;
println!("Number of ways: {ways}");
# Ok::<(), std::io::Error>(())

Writing

use pbfhogg::write::block_builder::{HeaderBuilder, BlockBuilder};
use pbfhogg::write::writer::{PbfWriter, Compression};

let header_bytes = HeaderBuilder::new()
    .bbox(9.0, 54.0, 13.0, 58.0)
    .sorted()
    .build()?;
let mut writer = PbfWriter::to_path("output.osm.pbf".as_ref(), Compression::default(), &header_bytes)?;

let mut bb = BlockBuilder::new();
bb.add_node(1, 556_761_000, 125_683_000, [("name", "Copenhagen")], None);
if let Some(bytes) = bb.take()? {
    writer.write_primitive_block(bytes)?;
}
writer.flush()?;
# Ok::<(), std::io::Error>(())
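
The example above passes latitude and longitude as fixed-point integers in units of 1e-7 degrees, the usual OSM convention (556_761_000 corresponds to 55.6761° N). A small conversion helper, assuming that unit; deg_to_fixed is illustrative and not part of the pbfhogg API:

```rust
/// Convert degrees to 1e-7-degree fixed-point units (hypothetical helper,
/// not part of pbfhogg).
fn deg_to_fixed(deg: f64) -> i64 {
    (deg * 1e7).round() as i64
}

/// Inverse conversion, back to degrees.
fn fixed_to_deg(fixed: i64) -> f64 {
    fixed as f64 * 1e-7
}

fn main() {
    // Copenhagen, matching the writing example above.
    assert_eq!(deg_to_fixed(55.6761), 556_761_000);
    assert_eq!(deg_to_fixed(12.5683), 125_683_000);
    assert!((fixed_to_deg(556_761_000) - 55.6761).abs() < 1e-9);
    println!("ok");
}
```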

In-memory writing

For tests or small PBFs, use PbfWriter::new with any Write impl:

use pbfhogg::write::block_builder::{BlockBuilder, HeaderBuilder};
use pbfhogg::write::writer::{PbfWriter, Compression};

let header_bytes = HeaderBuilder::new().sorted().build()?;
let mut buf = std::io::Cursor::new(Vec::new());
let mut writer = PbfWriter::new(&mut buf, Compression::default());
writer.write_header(&header_bytes)?;

let mut bb = BlockBuilder::new();
// ... add elements, write blocks synchronously ...
writer.flush()?;
# Ok::<(), std::io::Error>(())

Read modes

Method                     Order       Use case
for_each                   File order  Sequential processing, order-dependent consumers
for_each_pipelined         File order  Fastest ordered read: parallel decompression overlapping I/O
for_each_block_pipelined   File order  Consumers that need owned PrimitiveBlocks for parallel per-block processing
into_blocks_pipelined      File order  Iterator interface: early exit, zipping two files
par_map_reduce             Arbitrary   Aggregation (counts, statistics) where order doesn't matter

for_each_pipelined is the production hot path. It uses a 3-stage pipeline (I/O thread → rayon decode pool → reorder buffer) to overlap reading, decompression, and element processing while preserving file order.

for_each_block_pipelined and into_blocks_pipelined deliver owned PrimitiveBlocks that can be sent to other threads, enabling overlapped I/O + decode + consumer parallelism. into_blocks_pipelined requires R: 'static (ElementReader<FileReader> satisfies this).
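
The reorder buffer in that pipeline is the standard trick for recovering input order from out-of-order parallel results: each block carries its sequence index, and completed blocks are buffered until the next expected index arrives. A std-only sketch of that stage (schematic only; pbfhogg's internal reorder stage is not public API):

```rust
use std::collections::BTreeMap;

/// Re-emit (index, value) pairs in index order, buffering early arrivals.
fn reorder<T>(results: impl IntoIterator<Item = (usize, T)>) -> Vec<T> {
    let mut pending = BTreeMap::new();
    let mut next = 0;
    let mut out = Vec::new();
    for (idx, value) in results {
        pending.insert(idx, value);
        // Drain every block that is now contiguous with the output.
        while let Some(value) = pending.remove(&next) {
            out.push(value);
            next += 1;
        }
    }
    out
}

fn main() {
    // Blocks finish decoding out of order...
    let finished = vec![(2, "c"), (0, "a"), (3, "d"), (1, "b")];
    // ...but are delivered in file order.
    assert_eq!(reorder(finished), vec!["a", "b", "c", "d"]);
    println!("ok");
}
```

Because decode results are held only until their turn comes, the buffer stays small as long as decode latencies are roughly uniform.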

HeaderBuilder::from_header(&existing_header) copies bbox and replication metadata from an existing PBF header - useful for transforms that preserve metadata.

Features

Feature             Description
commands (default)  Enables check_refs, extract, the geocode index builder, and their deps (serde_json, s2)
geocode-reader      Enables geocode_index::Reader for reverse geocoding queries (depends on s2). Included by commands.
linux-direct-io     O_DIRECT read/write paths; bypasses the page cache for planet-scale I/O
linux-io-uring      io_uring writer with registered buffers; ~20% faster writes above ~4 GB

Compression

PbfWriter supports three compression modes via Compression:

Mode                      Description
Compression::Zlib(level)  Standard PBF compression (default level 6), compatible with all tools
Compression::Zstd(level)  Better ratio and faster decompression than zlib
Compression::None         No compression; fastest writes, ideal for erofs or intermediate files

Zlib uses zlib-rs (pure Rust, no C compiler needed). With pipelined writes (to_path), compression is dispatched to rayon and all modes converge to the decode + serialization floor.

BlobHeader extensions

PbfWriter automatically embeds additional metadata in BlobHeader fields that standard PBF readers silently skip (per protobuf wire format rules for unknown fields):

  • Indexdata (field 2): element type, ID range, and spatial bounding box per blob. Enables O(1) blob classification for merge, sort, and spatial filtering without decompression.
  • Tagdata (field 4): set of unique tag key strings per blob. Enables skipping decompression of blobs that provably lack required tag keys.
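
The "silently skip" behavior relies on protobuf's self-describing wire format: every field tag carries a wire type, so a reader can step over fields it does not recognize without knowing the schema. A minimal sketch of that rule (schematic, not pbfhogg's parser):

```rust
/// Skip one unknown protobuf field given its wire type, returning the number
/// of payload bytes consumed, or None if the input is malformed.
fn skip_field(wire_type: u8, buf: &[u8]) -> Option<usize> {
    match wire_type {
        0 => {
            // Varint: consume bytes until the continuation bit clears.
            Some(buf.iter().position(|b| b & 0x80 == 0)? + 1)
        }
        1 => Some(8), // 64-bit fixed
        2 => {
            // Length-delimited: a varint length, then that many bytes.
            let end = buf.iter().position(|b| b & 0x80 == 0)? + 1;
            let mut len = 0usize;
            for (i, &b) in buf[..end].iter().enumerate() {
                len |= ((b & 0x7f) as usize) << (7 * i);
            }
            (buf.len() >= end + len).then(|| end + len)
        }
        5 => Some(4), // 32-bit fixed
        _ => None,    // groups and invalid wire types not handled here
    }
}

fn main() {
    // A length-delimited field of 3 bytes: length varint 0x03 + payload.
    assert_eq!(skip_field(2, &[0x03, b'a', b'b', b'c']), Some(4));
    // A two-byte varint payload (e.g. 300 = 0xac 0x02).
    assert_eq!(skip_field(0, &[0xac, 0x02]), Some(2));
    println!("ok");
}
```

This is why pbfhogg can add Indexdata and Tagdata to BlobHeader without breaking readers that predate those fields.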

Correctness

See CORRECTNESS.md for parser/encoder edge cases and data representation limits accepted by design, and DEVIATIONS.md for intentional behavioral differences from osmium.

Acknowledgements

pbfhogg started as a fork of osmpbf by Thomas Brüggemann, which provided the foundation for PBF reading in Rust.

License

Licensed under either of

at your option.