# pbfhogg
Rust library for reading, writing, and transforming OpenStreetMap PBF files. Designed for planet-scale operations on normal hardware.
Read 59 million elements in 0.31s (parallel) or 1.3s (pipelined, preserving file order). Write them back with pipelined compression in 6.3s. All encoding and decoding is hand-rolled wire-format protobuf - no external protobuf dependencies, no per-element allocation.
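Since the codec is hand-rolled, the core wire-format primitive is the protobuf base-128 varint. As a self-contained illustration of that primitive (a sketch, not pbfhogg's actual internals):

```rust
// Minimal protobuf base-128 varint codec. Illustrative sketch of the
// wire-format layer, not pbfhogg's actual implementation.
fn encode_varint(mut v: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte); // final byte: high bit clear
            return;
        }
        out.push(byte | 0x80); // high bit set: more bytes follow
    }
}

/// Returns the decoded value and the number of bytes consumed.
fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut v = 0u64;
    for (i, &b) in buf.iter().enumerate().take(10) {
        v |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((v, i + 1));
        }
    }
    None // truncated or overlong input
}

fn main() {
    let mut buf = Vec::new();
    encode_varint(300, &mut buf);
    assert_eq!(buf, [0xac, 0x02]);
    assert_eq!(decode_varint(&buf), Some((300, 2)));
    println!("ok");
}
```

Decoding this way needs no schema-generated structs, which is what makes zero-per-element-allocation reads possible.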
Developed on Linux, untested elsewhere. Optional features for O_DIRECT and io_uring are Linux-only.
For the CLI toolkit (pbfhogg-cli), see the CLI crate.
## Usage

```toml
[dependencies]
pbfhogg = "0.3"
```

Library users who only need read/write can disable the `commands` feature to skip the `serde_json` and `s2` dependencies:

```toml
[dependencies]
pbfhogg = { version = "0.3", default-features = false }
```

For reverse geocoding queries (memory-mapped index reader), enable just the `geocode-reader` feature:

```toml
[dependencies]
pbfhogg = { version = "0.3", default-features = false, features = ["geocode-reader"] }
```
## Reading

```rust
use pbfhogg::ElementReader;

let reader = ElementReader::from_path("planet.osm.pbf")?;

// Check if the PBF declares sorted elements
if reader.header.is_sorted {
    println!("input is sorted");
}

reader.for_each(|element| {
    // process each element in file order
})?;
# Ok::<(), Box<dyn std::error::Error>>(())
```
### Parallel aggregation

```rust
use pbfhogg::{Element, ElementReader};

let reader = ElementReader::from_path("planet.osm.pbf")?;
let ways = reader.par_map_reduce(
    |element| matches!(element, Element::Way(_)) as u64, // map
    || 0u64,                                             // identity
    |a, b| a + b,                                        // reduce
)?;
println!("{ways} ways");
# Ok::<(), Box<dyn std::error::Error>>(())
```
## Writing

```rust
use pbfhogg::{BlockBuilder, HeaderBuilder, PbfWriter};

let header_bytes = HeaderBuilder::new()
    .bbox(/* min_lon, min_lat, max_lon, max_lat */)
    .sorted(true)
    .build()?;

let mut writer = PbfWriter::to_path("out.osm.pbf", &header_bytes)?;

let mut bb = BlockBuilder::new();
bb.add_node(/* node */);
if let Some(block) = bb.take()? {
    writer.write_block(block)?;
}
writer.flush()?;
# Ok::<(), Box<dyn std::error::Error>>(())
```
### In-memory writing

For tests or small PBFs, use `PbfWriter::new` with any `Write` impl:

```rust
use pbfhogg::{BlockBuilder, HeaderBuilder, PbfWriter};

let header_bytes = HeaderBuilder::new().sorted(true).build()?;
let mut buf = Vec::new();
let mut writer = PbfWriter::new(&mut buf);
writer.write_header(&header_bytes)?;

let mut bb = BlockBuilder::new();
// ... add elements, write blocks synchronously ...
writer.flush()?;
# Ok::<(), Box<dyn std::error::Error>>(())
```
## Read modes

| Method | Order | Use case |
|---|---|---|
| `for_each` | File order | Sequential processing, order-dependent consumers |
| `for_each_pipelined` | File order | Fastest ordered read: parallel decompression overlapping I/O |
| `for_each_block_pipelined` | File order | Consumers that need owned `PrimitiveBlock`s for parallel per-block processing |
| `into_blocks_pipelined` | File order | Iterator interface: early exit, zipping two files |
| `par_map_reduce` | Arbitrary | Aggregation (counts, statistics) where order doesn't matter |
`for_each_pipelined` is the production hot path. It uses a 3-stage pipeline (I/O thread → rayon decode pool → reorder buffer) to overlap reading, decompression, and element processing while preserving file order.
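The reorder-buffer idea behind the ordered pipelined readers can be sketched with plain threads and channels. This is a hypothetical toy, not pbfhogg's implementation (which uses one I/O thread plus a rayon pool, not a thread per block):

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Toy ordered pipeline: each "block" is decoded on its own worker thread,
// results arrive in arbitrary order over a channel, and a reorder buffer
// re-emits them in input order.
fn reorder_pipeline(blocks: Vec<String>) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for (i, block) in blocks.into_iter().enumerate() {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            let decoded = block.to_uppercase(); // stand-in for decompress + decode
            tx.send((i, decoded)).unwrap();
        }));
    }
    drop(tx); // workers hold their own senders; close the original

    // Reorder buffer: park out-of-order results until the next expected
    // index arrives, then emit the ready run in order.
    let mut pending = HashMap::new();
    let mut next = 0usize;
    let mut out = Vec::new();
    for (i, decoded) in rx {
        pending.insert(i, decoded);
        while let Some(d) = pending.remove(&next) {
            out.push(d);
            next += 1;
        }
    }
    for h in handles {
        h.join().unwrap();
    }
    out
}

fn main() {
    let blocks = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    assert_eq!(reorder_pipeline(blocks), ["A", "B", "C"]);
    println!("ordered");
}
```

The reorder buffer is what lets decode run fully parallel while the consumer still observes strict file order.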
`for_each_block_pipelined` and `into_blocks_pipelined` deliver owned `PrimitiveBlock`s that can be sent to other threads, enabling overlapped I/O + decode + consumer parallelism. `into_blocks_pipelined` requires `R: 'static` (`ElementReader<FileReader>` satisfies this).
`HeaderBuilder::from_header(&existing_header)` copies bbox and replication metadata from an existing PBF header - useful for transforms that preserve metadata.
## Features

| Feature | Description |
|---|---|
| `commands` (default) | Enables `check_refs`, `extract`, the geocode index builder, and their deps (`serde_json`, `s2`) |
| `geocode-reader` | Enables `geocode_index::Reader` for reverse geocoding queries (depends on `s2`). Included by `commands`. |
| `linux-direct-io` | `O_DIRECT` read/write paths: bypasses the page cache for planet-scale I/O |
| `linux-io-uring` | `io_uring` writer with registered buffers: 20% faster writes above ~4 GB |
## Compression

`PbfWriter` supports three compression modes via `Compression`:

| Mode | Description |
|---|---|
| `Compression::Zlib(level)` | Standard PBF compression (default level 6), compatible with all tools |
| `Compression::Zstd(level)` | Better ratio and faster decompression than zlib |
| `Compression::None` | No compression: fastest writes, ideal for erofs or intermediate files |

Zlib uses `zlib-rs` (pure Rust, no C compiler needed). With pipelined writes (`to_path`), compression is dispatched to rayon and all modes converge to the decode + serialization floor.
## BlobHeader extensions

`PbfWriter` automatically embeds additional metadata in `BlobHeader` fields that standard PBF readers silently skip (per protobuf wire-format rules for unknown fields):
- Indexdata (field 2): element type, ID range, and spatial bounding box per blob. Enables O(1) blob classification for merge, sort, and spatial filtering without decompression.
- Tagdata (field 4): set of unique tag key strings per blob. Enables skipping decompression of blobs that provably lack required tag keys.
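The reason standard readers tolerate these extra fields falls out of the wire format itself: every field tag carries a wire type, so a decoder can step over any field it does not recognize. A minimal sketch of that skipping logic (not a BlobHeader parser; the sample bytes below are only illustrative):

```rust
// Read one base-128 varint starting at `pos`, advancing `pos`.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut v = 0u64;
    for i in 0..10 {
        let b = *buf.get(*pos)?;
        *pos += 1;
        v |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some(v);
        }
    }
    None
}

// Skip one protobuf field at `pos` using only its wire type,
// returning the field number, or None on malformed input.
fn skip_field(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let tag = read_varint(buf, pos)?;
    let (field, wire_type) = (tag >> 3, tag & 7);
    match wire_type {
        0 => {
            read_varint(buf, pos)?; // varint payload
        }
        1 => *pos += 8, // fixed64
        2 => {
            let len = read_varint(buf, pos)? as usize; // length-delimited
            *pos += len;
        }
        5 => *pos += 4,      // fixed32
        _ => return None,    // groups: unsupported here
    }
    (*pos <= buf.len()).then_some(field)
}

fn main() {
    // field 4, wire type 2 (length-delimited), length 3, then 3 payload bytes
    let buf = [0x22, 0x03, b'a', b'b', b'c'];
    let mut pos = 0;
    assert_eq!(skip_field(&buf, &mut pos), Some(4));
    assert_eq!(pos, buf.len());
}
```

Because unknown length-delimited fields are skipped wholesale, the extra index and tag metadata adds no decode cost for readers that do not use it.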
## Correctness
See CORRECTNESS.md for parser/encoder edge cases and data representation limits accepted by design, and DEVIATIONS.md for intentional behavioral differences from osmium.
## Acknowledgements
pbfhogg started as a fork of osmpbf by Thomas Brüggemann, which provided the foundation for PBF reading in Rust.
## License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.