hibp_bin_fetch/lib.rs
//! Downloads the Have I Been Pwned password hash database and converts it to a
//! compact 6-byte binary format for use with [hibp-verifier](https://crates.io/crates/hibp-verifier).
//!
//! **This is not a general-purpose HIBP downloader.** It produces a custom binary
//! format (sha1t48) specifically designed for fast password breach checking with
//! `hibp-verifier`. If you need the original HIBP data format, use the official
//! [Pwned Passwords downloader](https://haveibeenpwned.com/Passwords).
//!
//! # Binary Format
//!
//! This tool produces 1,048,576 binary files (one per 5-character hex prefix), each
//! containing sorted 6-byte records. The first 2.5 bytes of each SHA1 hash are encoded
//! in the filename, so we store only bytes 2-7 (the 6-byte suffix) in each record.
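//!
//! The split between file name and record can be sketched as follows. This is
//! only an illustration, not this crate's API: `split_digest` is a hypothetical
//! helper, and it assumes that "bytes 2-7" is zero-based and inclusive and that
//! file names use uppercase hex, as in `00000.bin` and `FFFFF.bin`.
//!
//! ```rust
//! /// Hypothetical helper: maps a raw 20-byte SHA-1 digest to the name of its
//! /// prefix file and the 6-byte record stored inside that file.
//! fn split_digest(digest: &[u8; 20]) -> (String, [u8; 6]) {
//!     // The first five hex characters are bytes 0 and 1 plus the high nibble of byte 2.
//!     let file_name = format!(
//!         "{:02X}{:02X}{:X}.bin",
//!         digest[0],
//!         digest[1],
//!         digest[2] >> 4
//!     );
//!     // Assumption: the record is bytes 2 through 7 of the digest (zero-based, inclusive).
//!     let mut record = [0u8; 6];
//!     record.copy_from_slice(&digest[2..8]);
//!     (file_name, record)
//! }
//! ```
//!
//! Under these assumptions, a digest beginning `5B AA 61 E4 C9 B9 3F 3F` maps to
//! the file `5BAA6.bin` and the record `61 E4 C9 B9 3F 3F`.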
//!
//! Each prefix file (e.g., `00000.bin`, `FFFFF.bin`) contains:
//!
//! - Fixed 6-byte records (bytes 2-7 of the SHA1 hash)
//! - Sorted in ascending order
//! - Direct indexing: record N is at byte offset N * 6
//!
//! This enables O(log n) binary search with no parsing overhead, which is exactly
//! what `hibp-verifier` uses for sub-microsecond lookups.
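//!
//! A minimal sketch of such a lookup over the contents of one prefix file is
//! shown below. It is not the `hibp-verifier` implementation: `contains_record`
//! and `RECORD_LEN` are hypothetical names, and the code assumes that "ascending
//! order" means lexicographic byte order of the 6-byte records.
//!
//! ```rust
//! use std::cmp::Ordering;
//!
//! /// Size of one record in a prefix file.
//! const RECORD_LEN: usize = 6;
//!
//! /// Hypothetical helper: binary search for `target` in the raw bytes of a
//! /// prefix file. Record N is read directly from byte offset N * 6.
//! fn contains_record(data: &[u8], target: &[u8; RECORD_LEN]) -> bool {
//!     let mut lo = 0;
//!     let mut hi = data.len() / RECORD_LEN; // number of whole records
//!     while lo < hi {
//!         let mid = lo + (hi - lo) / 2;
//!         let record = &data[mid * RECORD_LEN..(mid + 1) * RECORD_LEN];
//!         // Byte slices compare lexicographically (assumed sort order).
//!         match record.cmp(&target[..]) {
//!             Ordering::Less => lo = mid + 1,
//!             Ordering::Greater => hi = mid,
//!             Ordering::Equal => return true,
//!         }
//!     }
//!     false
//! }
//! ```
//!
//! Because every record has the same length, the search needs no index or
//! delimiter scan; it only compares 6-byte slices of the file contents.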
//!
//! # Why This Format?
//!
//! The HIBP dataset contains approximately 900 million SHA1 password hashes. Storing
//! the full 20-byte hash for each entry requires significant space. By truncating to
//! 48 bits (6 bytes), we reduce storage from 77 GB to 13 GB while maintaining an
//! acceptably low collision probability.
//!
//! With ~900 million entries, the expected number of collisions is less than 1. For
//! password breach checking, a false positive (incorrectly marking a password as
//! breached) is harmless: it only causes the user to choose a different password.
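//!
//! One way to sanity-check the collision estimate is the birthday approximation:
//! n entries drawn uniformly from 2^b values give about n(n - 1) / 2^(b + 1)
//! colliding pairs. The sketch below is illustrative only. `expected_colliding_pairs`
//! is a hypothetical helper, and the choice of b = 64 is an assumption: it counts
//! the 20 bits encoded in the file name plus the 44 record bits not already
//! covered by the prefix, reading "bytes 2-7" as zero-based.
//!
//! ```rust
//! /// Hypothetical helper: birthday approximation for the expected number of
//! /// colliding pairs among `entries` uniformly random values of `bits` bits.
//! fn expected_colliding_pairs(entries: f64, bits: u32) -> f64 {
//!     // Number of pairs, n(n - 1) / 2, times the per-pair collision chance 2^-bits.
//!     entries * (entries - 1.0) / 2.0 / 2f64.powi(bits as i32)
//! }
//! ```
//!
//! Under that assumption, `expected_colliding_pairs(9.0e8, 64)` stays well below 1.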
//!
//! # Installation
//!
//! ```sh
//! cargo install hibp-bin-fetch
//! ```
//!
//! # Usage
//!
//! Download the full dataset:
//!
//! ```sh
//! hibp-bin-fetch --output ./hibp-data
//! ```
//!
//! Then use [hibp-verifier](https://crates.io/crates/hibp-verifier) to check passwords
//! against the downloaded dataset.

pub mod conversion;
pub mod error;
pub mod worker;

pub use conversion::{hex_to_nibble, line_to_sha1t48, prefix_to_hex};
pub use error::Error;
pub use worker::{get_completed_prefixes, worker};

/// Total number of prefix files (16^5 = 1,048,576)
pub const TOTAL_PREFIXES: u32 = 0x100000;