archivr
A Tumblr backup tool.
Installing
Binary distributions are available from the GitHub Actions "Build" job.
You may also use cargo install, Nix, or compile the binary yourself.
With cargo, you can run
This repository also provides a Nix flake. You can run the command by using:
# or `nix shell` to add it to a shell
You can also add it to your Nix configuration that way.
Prerequisites
Register a new Tumblr application
In order to interact with the Tumblr API, archivr needs an OAuth consumer key and secret.
- In Tumblr, go to Settings > Apps
- Click on the "Register" link at the bottom
- Click the green "Register application" button
- Fill out the following fields:
  - Application name: archivr (this doesn't really matter, this is for your reference)
  - Application website: codeberg.org/ryf/archivr (again, doesn't really matter)
  - Application description: Tumblr backup tool
  - Administrative contact email: your email
  - Default callback URL: http://localhost:6263/callback
  - OAuth2 redirect URLs: http://localhost:6263/redirect
- Click "Save changes"
Usage
This will kick off a job to back up an entire blog.
CLI flags
| Flag | Short | Description |
|---|---|---|
| --consumer-key | | Tumblr OAuth consumer key |
| --consumer-secret | | Tumblr OAuth consumer secret |
| --config-file | | Path to a JSON config file with blog_name, consumer_key, and consumer_secret |
| --output-dir | -o | Output directory (defaults to ./{blog_name}) |
| --json | | Save posts as raw JSON instead of HTML |
| --template | -t | Custom Jinja template for HTML output (exclusive with --json) |
| --directories | -d | Create a subdirectory for each post |
| --save-images | | Download post images locally instead of linking to CDN |
| --before | | Only fetch posts before this date (Unix timestamp or RFC3339) |
| --after | | Only fetch posts after this date (Unix timestamp or RFC3339) |
| --resume | | Resume a previously interrupted backup |
| --quiet | -q | Suppress progress output |
| --reauth | | Force re-authentication, ignoring saved tokens |
| --cookies-file | | Path to a Netscape/Mozilla-format cookies file for dashboard access |
| --dashboard | | Use Tumblr's internal dashboard API (requires --cookies-file) |
| --headless | | Manual auth flow for environments without a browser (servers, containers) |
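The --before and --after flags accept either a Unix timestamp or an RFC3339 date. As a rough illustration of what those two forms look like for the same moment, here is a small Python sketch. The helper name `to_unix_and_rfc3339` is hypothetical and not part of archivr; it only shows how the two representations relate.

```python
from datetime import datetime, timezone


def to_unix_and_rfc3339(year: int, month: int, day: int) -> tuple[int, str]:
    """Return midnight UTC of the given day as (Unix timestamp, RFC3339 string).

    Hypothetical helper for illustration; archivr does not ship this.
    """
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    # isoformat() emits "+00:00" for UTC; RFC3339 also allows the shorter "Z".
    rfc3339 = moment.isoformat().replace("+00:00", "Z")
    return int(moment.timestamp()), rfc3339


unix_ts, rfc3339 = to_unix_and_rfc3339(2024, 1, 1)
print(unix_ts)   # 1704067200
print(rfc3339)   # 2024-01-01T00:00:00Z
```

Either printed value should be usable as the argument to --before or --after, going by the flag descriptions in the table.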
Headless / server usage
When running archivr on a remote server, in a container, or anywhere without a browser, the default OAuth flow won't work because the localhost redirect can't reach your machine. Use --headless to authenticate manually:
The flow works as follows:
- archivr prints an authorization URL
- You open that URL in a browser on any machine and authorize with Tumblr
- Tumblr redirects your browser to http://localhost:6263/redirect?code=... (the page will fail to load, but the full URL will be visible in your browser's address bar)
- You copy the URL from the address bar and paste it into the terminal
- archivr extracts the authorization code and completes authentication
The resulting token is saved to disk, so subsequent runs don't need --headless again unless the token expires and can't be refreshed.
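To make the "extracts the authorization code" step concrete, the following Python sketch pulls the `code` query parameter out of a pasted redirect URL. It illustrates the idea only; archivr's own implementation may differ, and the function name `extract_auth_code` is an assumption made for this example.

```python
from urllib.parse import parse_qs, urlparse


def extract_auth_code(pasted_url: str) -> str:
    """Return the value of the `code` query parameter from a redirect URL.

    Hypothetical helper for illustration, not archivr's actual code.
    """
    query = parse_qs(urlparse(pasted_url.strip()).query)
    codes = query.get("code")
    if not codes:
        raise ValueError("no 'code' parameter found in the pasted URL")
    # parse_qs returns a list per key; the first value is the one we want.
    return codes[0]


print(extract_auth_code("http://localhost:6263/redirect?code=abc123&state=xyz"))
# abc123
```

This is why the page failing to load does not matter: everything needed is already in the address bar.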
Job config file
You can specify all of the CLI arguments in a config file as well, passing --config-file <PATH> instead.
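As a minimal sketch, a config file could be generated with a few lines of Python. Only the three keys named in the flag table (blog_name, consumer_key, consumer_secret) are used here; the key names for other options are not documented above, so they are left out. The function name `write_job_config` and the file name in the usage note are placeholders.

```python
import json
from pathlib import Path


def write_job_config(path, blog_name, consumer_key, consumer_secret):
    """Write a JSON job config with the three documented keys and return it.

    Hypothetical helper for illustration; any JSON file with these keys
    should serve the same purpose.
    """
    config = {
        "blog_name": blog_name,
        "consumer_key": consumer_key,
        "consumer_secret": consumer_secret,
    }
    Path(path).write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, calling `write_job_config("archivr-job.json", "example-blog", "YOUR_CONSUMER_KEY", "YOUR_CONSUMER_SECRET")` produces a file that can then be passed with --config-file archivr-job.json.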
Custom templates
By default, archivr renders each post as a self-contained HTML file using a built-in template. You can override this with your own Jinja template:
Note: The --template (-t) flag is mutually exclusive with --json. When --json is set, posts are saved as raw JSON and no template rendering occurs.
Templates are rendered with minijinja, which supports standard Jinja2 syntax — {{ }} for expressions, {% %} for control flow, and {# #} for comments.
Template context
Your template receives the following variables:
| Variable | Type | Description |
|---|---|---|
| post | object | The full post object (see fields below) |
| is_reblog | bool | true if the post was reblogged from another blog |
| is_original | bool | true if the post is original content |
| newer_href | string? | Relative URL to the next-newer post (for navigation links) |
Post fields
Access these as {{ post.field_name }}:
| Field | Type | Description |
|---|---|---|
| id | int | The post ID |
| blog_name | string | Name of the blog |
| post_url | string | Full URL to the post on Tumblr |
| post_type | string | Post type (e.g. "text", "photo") |
| original_type | string | Original post type before conversion |
| timestamp | int | Unix timestamp |
| date | string | Human-readable date |
| content | list | Content blocks (see below) |
| trail | list | Reblog trail items |
| tags | list | List of tag strings |
| summary | string | Post summary text |
| note_count | int | Number of notes |
| slug | string | URL slug |
| short_url | string | Short URL |
| reblog_key | string | Reblog key |
| state | string | Post state (e.g. "published") |
| reblogged_from_name | string? | Blog name this was reblogged from |
| reblogged_from_url | string? | URL of the blog this was reblogged from |
| reblogged_root_name | string? | Original post's blog name |
| reblogged_root_url | string? | Original post's blog URL |
| liked | bool | Whether you liked the post |
| followed | bool | Whether you follow the blog |
Content blocks
Each item in post.content (and in each trail item's content) is an object with a type field. The possible types and their fields are:
text: A text block.
- text (string): The text content (may contain HTML).
- subtype (string?): Style hint: "heading1", "heading2", "quote", "indented", "chat", etc.

image: An image block.
- media (list): Each entry has url, width, height, and media_type.
- alt_text (string?): Alt text for the image.
- caption (string?): Image caption.

video: A video block.
- media (list?): Each entry has url, width, height, and media_type.
- url (string?): External video URL (when no direct media).
- provider (string?): Video provider name (e.g. "youtube").
- embed_html (string?): Embeddable HTML from the provider.
- duration (number?): Duration in seconds.

audio: An audio block.
- media (list?): Each entry has url and media_type.
- url (string?): External audio URL.
- provider (string?): Audio provider name.
- title (string?): Track title.
- artist (string?): Artist name.
- album (string?): Album name.
- embed_html (string?): Embeddable HTML.

link: A link block.
- url (string): The link URL.
- title (string?): Link title.
- description (string?): Link description.

paywall: A paywall/premium content marker.
- text (string?): Display text (defaults to "Premium content").
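To show how these block shapes fit together, here is a small Python sketch that models blocks as plain dictionaries and dispatches on the type field. It is only an illustration of the data layout described above: the function name `describe_block`, the "paragraph" fallback label, and the choice of the widest media entry are all assumptions made for this example, not archivr behavior.

```python
def describe_block(block: dict) -> str:
    """Return a one-line summary of a content block, dispatching on its type.

    Hypothetical helper; blocks are assumed to be dicts with the fields
    listed above.
    """
    kind = block.get("type")
    if kind == "text":
        # subtype is optional; "paragraph" is just a label chosen for this sketch.
        subtype = block.get("subtype") or "paragraph"
        return f"text ({subtype}): {block['text']}"
    if kind == "image":
        # media is a list of renditions; pick the widest one.
        largest = max(block["media"], key=lambda m: m["width"])
        return f"image: {largest['url']} ({largest['width']}x{largest['height']})"
    if kind == "link":
        return f"link: {block.get('title') or block['url']}"
    if kind == "paywall":
        return block.get("text") or "Premium content"
    # video and audio have mostly optional fields, so only the type is reported.
    return f"{kind} block"


print(describe_block({"type": "text", "text": "Hello", "subtype": "heading1"}))
# text (heading1): Hello
```

A custom template would do the same kind of branching with {% if block.type == ... %}, falling back to the default rendering for types it does not handle.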
Trail items
Each item in post.trail has:
| Field | Type | Description |
|---|---|---|
| content | list | Content blocks (same types as above) |
| blog | object? | Blog info with name, url, and uuid |
| post | object? | Post info with id |
| is_root_item | bool | Whether this is the root trail item |
The render_block() function
Templates have access to a built-in render_block(block) function that converts a content block into the default HTML representation. This lets you customize the overall page layout while reusing the default rendering for individual blocks:
{# Loop through content blocks, using the built-in renderer for each one #}
{% for block in post.content %}
{{ render_block(block) }}
{% endfor %}
You can also selectively override rendering for specific block types:
{% for block in post.content %}
  {% if block.type == "image" %}
    {# Custom image rendering #}
    {% for m in block.media %}
      <img src="{{ m.url }}" alt="{{ block.alt_text }}" loading="lazy">
    {% endfor %}
  {% else %}
    {{ render_block(block) }}
  {% endif %}
{% endfor %}
Example: minimal custom template
{{ post.blog_name }} - {{ post.id }}
{{ post.blog_name }}
{{ post.date }} · {{ post.note_count }} notes
{% if is_reblog %}
Reblogged from {{ post.reblogged_from_name }}
{% endif %}
{% for item in post.trail %}
{% if item.blog %}{{ item.blog.name }}:{% endif %}
{% for block in item.content %}
{{ render_block(block) }}
{% endfor %}
{% endfor %}
{% for block in post.content %}
{{ render_block(block) }}
{% endfor %}
{% for tag in post.tags %}
#{{ tag }}
{% endfor %}
Planned features
The following features are not yet implemented but are planned for future releases:
- Incremental backups (only fetch posts newer than the last run)
- Video and audio downloading (--save-video, --save-audio)
- Liked posts backup (--likes)
- Tag filtering (--include-tags)
- Notes backup (--save-notes)
- Index page generation (--index-file)
- Automatic rate limit retry with backoff