hyperlink
Very fast link checker for static sites.
- Supports traversing file-system paths only, no arbitrary URLs.
  - No support for the <base> tag.
  - No support for external links. It does not know how to speak HTTP.
- Does not honor robots.txt. A broken link is still broken for users even if not indexed by Google.
- Does not parse CSS files, as broken links in CSS have not been a practical concern for us. We are concerned about broken links in the page content, not the chrome around it.
- Only supports UTF-8 encoded HTML files.
- Fast. docs.sentry.io produces 1.1 GB of HTML files. All alternatives we tried were slower than hyperlink on this site. hyperlink handles this amount of data in 4 seconds on a MacBook Pro 2018.
- Pay for what you need. By default, hyperlink checks for hard 404s in internal links only. Anything beyond that is opt-in. See Options for a list of features to enable.
Installation and Usage
Download the latest binary and:
# Check a folder of HTML
./hyperlink public/

# Also validate anchors
./hyperlink public/ --check-anchors

# src/ is a folder of Markdown. Show original Markdown file paths in errors
./hyperlink public/ --sources src/
Or as a GitHub action:
- uses: untitaker/hyperlink@0.1.10
  with:
    args: public/ --sources src/
Or build from source by installing Rust and running cargo build --release.
Options
When invoked without options, hyperlink only checks for 404s of internal links. However, it can do more.
- -j/--jobs: How many threads to spawn for parsing HTML. By default, hyperlink will attempt to saturate your CPU.
- --check-anchors: Opt-in, check for validity of anchors on pages. Broken anchors are considered warnings, meaning that hyperlink will exit 2 if there are only broken anchors but no hard 404s.
- --sources: A folder of Markdown files that were the input for the HTML hyperlink has to check. This is used to provide better error messages that point at the actual file to edit. hyperlink does very simple content-based matching to figure out which Markdown files may have been involved in the creation of an HTML file.

  Why not just crawl and validate links in Markdown at this point? Answer:

  - There are countless proprietary extensions to Markdown out there for creating intra-page links that are generally not supported by link checking tools.
  - The structure of your Markdown content does not necessarily match the structure of your HTML (i.e. what the user actually sees). With this setup, hyperlink does not have to assume anything about your build pipeline.

- --github-actions: Emit GitHub Actions errors, i.e. add error messages in-line to PR diffs. This is only useful with --sources set.

  If you are using hyperlink through the GitHub action, this option is already set. It is only useful if you are downloading/building and running hyperlink yourself in CI.
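The content-based matching behind --sources can be pictured with a small shell sketch. This is only an illustration of the idea, not hyperlink's actual algorithm: the helper name find_candidate_sources, the .md extension filter, and the verbatim-substring criterion are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of hyperlink): list the Markdown files under
# a source folder that contain a given text fragment verbatim.
#   $1: text fragment taken from a rendered HTML page
#   $2: folder of Markdown sources
find_candidate_sources() {
    fragment="$1"
    sources_dir="$2"
    # grep -l prints only the names of matching files,
    # grep -F treats the fragment as a fixed string rather than a regex.
    find "$sources_dir" -type f -name '*.md' \
        -exec grep -lF -- "$fragment" {} +
}
```

For example, find_candidate_sources 'install guide' src/ would print every Markdown file under src/ that contains that phrase.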
Exit codes
- exit 1: There have been errors (hard 404s)
- exit 2: There have been only warnings (broken anchors)
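When running hyperlink yourself in CI, these exit codes can be used to fail the build on hard 404s while only reporting broken anchors. A minimal sketch, assuming that exit 0 means no problems were found (not stated above) and using a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper (not part of hyperlink): map an exit code to a label.
# Codes 1 and 2 are the documented ones; treating 0 as "ok" and any other
# value as "unknown" is an assumption.
classify_hyperlink_exit() {
    case "$1" in
        0) echo "ok" ;;
        1) echo "errors" ;;
        2) echo "warnings" ;;
        *) echo "unknown" ;;
    esac
}

# Example use in a CI script (assumes hyperlink is on PATH):
#   hyperlink public/ --check-anchors
#   result=$(classify_hyperlink_exit "$?")
#   [ "$result" = "errors" ] && exit 1
```

With this wrapper, a run that only finds broken anchors is classified as "warnings" and does not have to fail the build.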
Alternatives
- wummel/linkchecker seems to be the most feature-rich of all, but was a non-starter due to performance. This also applies to countless other link checkers we tried that are not mentioned here.
- linkcheck is faster than linkchecker but still quite slow on large sites. We tried linkcheck together with http-server on localhost, although that does not seem to be the bottleneck at all.
- htmltest is one of the fastest link checkers we've tried (after disabling most checks to ensure feature parity with hyperlink), but it is still slower than hyperlink in single-threaded mode (-j 1).
- muffet seems to have similar performance to htmltest. We tested muffet with http-server and webfsd without noticing a change in timings.
- liche seems to be closest in performance to hyperlink.
License
Licensed under the MIT license, see ./LICENSE.