iscrawl

Tiny crawler/bot detection from User-Agent strings.

sub-140ns per call
zero allocations
zero dependencies

Install

[dependencies]
iscrawl = "1.0"

Use

use iscrawl::is_crawler;

assert!(is_crawler("Googlebot/2.1 (+http://www.google.com/bot.html)"));
assert!(!is_crawler(
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"
));

One function. One bool. That's it.

Why fast

Stack buffer of 512 bytes, no heap.
First-byte lookup table prunes 99% of needle scans.
Single pass over the lowered input.
lto = "fat", codegen-units = 1, panic = "abort".

Benchmarked over a corpus of ~25k real User-Agents: best case ~130 ns/call on x86_64.

How it decides

Empty input counts as crawler.
Input over 512 bytes is rejected (returns false).
If any crawler keyword (bot, crawl, spider, scanner, +http, @, archive, ...) appears: crawler.
If the UA does not start with Mozilla/ or Opera/ and has no known browser engine token (gecko, webkit, chrome, firefox, msie, edge, opera, ...): crawler.
If the UA starts with Mozilla/ or Opera/ but is missing both an engine token and the (compatible; marker: crawler.
Otherwise: browser.

Heuristic, not a database. Trades a sliver of recall for speed and zero maintenance.

Accuracy

Measured against bundled fixture corpora:

corpus	size	result
crawler_user_agents.txt	2,149	95.4% detected
loadkpi_crawlers.txt	3,696	94.7% detected
crawler_user_agents_pgts.txt	156	98.1% detected
browser_user_agents.txt	19,897	<1% false positive

Run cargo test --release to verify on your machine.

Develop

cargo build --release
cargo test --release
cargo test --doc
cargo doc --no-deps --open

Format

Standard formatting across Rust + markdown/yaml.

cargo fmt --all
cargo clippy --all-targets -- -D warnings
npx --yes prettier --write --single-quote --print-width=100 --trailing-comma=es5 --end-of-line=lf "**/*.{md,yml}"

CI enforces cargo fmt --check and cargo clippy -D warnings on every push.

Publish

Pushes to main/master trigger .github/workflows/publish.yml:

Reads name + version from Cargo.toml.
Skips if that version is already on crates.io.
Tests on Ubuntu, macOS, Windows × stable, beta.
Runs cargo publish using CARGO_REGISTRY_TOKEN.
Tags the commit vX.Y.Z.

Bump version in Cargo.toml, push to main, done.

Funding

If this saved you time, buy me a coffee.

License

Apache-2.0. See LICENSE.

iscrawl 1.0.0