Butterfly-dl 🦋
A high-performance, memory-efficient OpenStreetMap data downloader with intelligent source routing, resilient networking, and beautiful progress display.
Features
- Optimized for Large Files: <1GB RAM usage regardless of file size (including 81GB planet.osm.pbf)
- Enhanced Progress Display: Beautiful tqdm-style progress bars with smooth Unicode blocks
- Network Resilience: Intelligent retry with exponential backoff and smart resume from interruption points
- File Safety: Comprehensive overwrite protection with prompts and CLI flags
- Smart Source Routing: HTTP with parallel downloads optimized by file size
- Semantic Error Intelligence: Advanced fuzzy matching that understands semantic intent and geographic relationships
- Dynamic Source Loading: Automatically fetches latest available regions from Geofabrik
- HTTP Protocol: Advanced HTTP with range requests and connection pooling
- Streaming Support: Direct stdout streaming for pipeline integration
- Performance Optimized: Auto-tuning connections, Direct I/O for large files
- Curl-like Interface: Simple positional arguments, stderr logging
Installation
Pre-built Binaries (Recommended)
Download the latest release for your platform:
# Linux (x86_64)
# macOS (Intel)
# Windows
# Download butterfly-dl-v2.0.0-x86_64-windows.zip from releases page
Package Managers
Debian/Ubuntu
Cargo (Rust)
Build from Source
Workspace Structure
This repository is organized as a Rust workspace containing multiple OSM tools:
- butterfly-common: Shared utilities and error handling
- tools/butterfly-dl: The main downloader tool
- Future tools: butterfly-shrink, butterfly-extract, butterfly-serve
Building Individual Tools
# Build all tools
# Build specific tool
# Test specific tool
Usage
Basic Examples
# Download planet file from HTTP (81GB)
# Download continent from HTTP
# Download country/region from HTTP
# Stream to stdout for processing
|
# Save to custom file name
# Verbose output with source info
Enhanced Features
Beautiful Progress Display
# Smooth tqdm-style progress bars with comprehensive information
# 75%|█████████████ | 450MB/600MB [01:30<00:30, 25.2MB/s]
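As a rough illustration of how a bar like the one above can be drawn, the sketch below renders the percentage, the block bar, and the byte counters. The function name `render_bar` and the fixed bar width are assumptions made for this example (the tool itself uses Indicatif), and the elapsed-time and rate fields are left out.

```rust
/// Render a tqdm-style bar. `done` and `total` are given in megabytes.
/// Hypothetical helper for illustration only, not butterfly-dl's own code.
fn render_bar(done: u64, total: u64, width: usize) -> String {
    // Fraction completed, clamped to 1.0 so the bar never overflows.
    let fraction = if total == 0 {
        0.0
    } else {
        (done as f64 / total as f64).min(1.0)
    };
    let filled = (fraction * width as f64).round() as usize;
    let percent = (fraction * 100.0).round() as u64;
    format!(
        "{:>3}%|{}{}| {}MB/{}MB",
        percent,
        "█".repeat(filled),
        " ".repeat(width - filled),
        done,
        total
    )
}
```

Calling `render_bar(450, 600, 16)` yields a bar that is three quarters full, followed by `450MB/600MB`.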
Network Resilience & Recovery
# Automatic retry with smart resume - no lost progress
# ⚠️ Network error (attempt 1): operation timed out. Retrying in 1000ms...
# ⚠️ Stream interrupted at 300MB, resuming...
# ✅ Download completed!
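The retry and resume behaviour can be sketched in a few lines. This is a minimal sketch, not the tool's actual code: the 1000ms first delay comes from the sample output, while the doubling factor, the cap, and the names `backoff_delay` and `resume_range` are assumptions. Resuming relies on the standard HTTP `Range` header.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (1-based): base, 2x base, 4x base, ...
/// capped at `max`. The doubling and the cap are assumed values.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Limit the exponent so the shift below cannot overflow a u32.
    let exponent = attempt.saturating_sub(1).min(16);
    let delay = base.saturating_mul(1u32 << exponent);
    delay.min(max)
}

/// Value for an HTTP `Range` header that requests everything from byte
/// offset `bytes_written` onward, so an interrupted download can continue.
fn resume_range(bytes_written: u64) -> String {
    format!("bytes={}-", bytes_written)
}
```

With a 1000ms base and a 30s cap, the first three retries would wait 1s, 2s, and 4s.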
File Overwrite Protection
# Interactive prompts for existing files
# ⚠️ File already exists: belgium-latest.osm.pbf
# Overwrite? [y/N]: n
# ❌ Download cancelled
# Force overwrite without prompting
# ⚠️ Overwriting existing file: belgium-latest.osm.pbf
# Never overwrite, fail if file exists
# Error: File already exists: belgium-latest.osm.pbf (use --force to overwrite)
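The three behaviours above amount to a small decision table. The sketch below models it as a pure function; the type names `OverwritePolicy` and `Decision` and the function `decide` are hypothetical and only illustrate the logic, with "no" as the default answer to match the `[y/N]` prompt.

```rust
/// How to treat an existing output file (hypothetical names).
#[derive(Debug, PartialEq, Clone, Copy)]
enum OverwritePolicy {
    Prompt, // ask the user
    Force,  // overwrite without asking
    Never,  // fail if the file exists
}

#[derive(Debug, PartialEq)]
enum Decision {
    Write,
    Cancel,
    Fail(String),
}

/// Decide what to do. `answer` is the user's reply when prompting.
fn decide(path: &str, exists: bool, policy: OverwritePolicy, answer: Option<&str>) -> Decision {
    if !exists {
        return Decision::Write;
    }
    match policy {
        OverwritePolicy::Force => Decision::Write,
        OverwritePolicy::Never => Decision::Fail(format!(
            "File already exists: {} (use --force to overwrite)",
            path
        )),
        // Anything other than an explicit yes cancels the download.
        OverwritePolicy::Prompt => match answer.map(|a| a.trim().to_ascii_lowercase()) {
            Some(a) if a == "y" || a == "yes" => Decision::Write,
            _ => Decision::Cancel,
        },
    }
}
```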
Source Resolution
| Input | Source | Description |
|---|---|---|
| planet | HTTP | Planet file from https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf |
| europe | HTTP | Continent from https://download.geofabrik.de/europe-latest.osm.pbf |
| europe/belgium | HTTP | Country from https://download.geofabrik.de/europe/belgium-latest.osm.pbf |
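The mapping in the table can be expressed as a short function. This is a sketch of the URL pattern visible in the table, not the tool's actual resolver; the name `resolve_source` is an assumption, and it performs no validation of the region name.

```rust
/// Map a source name to its download URL, following the table above.
/// Hypothetical helper: it does not check that the region exists.
fn resolve_source(source: &str) -> String {
    if source == "planet" {
        // The planet file lives on the OpenStreetMap planet server.
        "https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf".to_string()
    } else {
        // Continents and countries follow the Geofabrik path layout.
        format!("https://download.geofabrik.de/{}-latest.osm.pbf", source)
    }
}
```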
Intelligent Error Handling
butterfly-dl includes smart error correction with fuzzy matching:
# Semantic intent recognition
# Error: Source 'austrailia' not found. Did you mean 'australia-oceania'?
# Typo correction
# Error: Source 'antartica' not found. Did you mean 'antarctica'?
# Geographic accuracy
# Error: Source 'antartica/belgium' not found. Did you mean 'europe/belgium'?
# Standalone country recognition
# Error: Source 'luxembourg' not found. Did you mean 'europe/luxembourg'?
# Smart continent suggestions
# Error: Source 'plant' not found. Did you mean 'planet'?
Features:
- Semantic Intelligence: Hybrid fuzzy matching that understands semantic intent, not just character distance
- Dynamic Source Discovery: Automatically fetches available regions from Geofabrik JSON API
- Contextual Scoring: Prioritizes meaningful matches like "australia-oceania" over "austria" for "austrailia"
- Geographic Intelligence: Knows Belgium belongs to Europe, not Antarctica
- Fallback Protection: Works offline with comprehensive fallback region list
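One way to get suggestions that favour "australia-oceania" over "austria" is to compare the input against each segment of a candidate as well as the whole name. The sketch below does that with plain Levenshtein distance. It is an assumed approach for illustration, not the project's actual scoring, and the names `levenshtein`, `score`, and `suggest` are hypothetical. It does not cover the geographic correction of a wrong continent prefix.

```rust
/// Classic edit distance between two strings, computed row by row.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Best distance between the input and either the whole candidate or any of
/// its segments (split on '/' and '-').
fn score(input: &str, candidate: &str) -> usize {
    candidate
        .split(|c: char| c == '/' || c == '-')
        .chain(std::iter::once(candidate))
        .map(|part| levenshtein(input, part))
        .min()
        .unwrap_or(usize::MAX)
}

/// Closest candidate within `max_distance`, or None if nothing is close.
fn suggest<'a>(input: &str, candidates: &[&'a str], max_distance: usize) -> Option<&'a str> {
    candidates
        .iter()
        .map(|c| (score(input, c), *c))
        .filter(|(d, _)| *d <= max_distance)
        .min_by_key(|(d, _)| *d)
        .map(|(_, c)| c)
}
```

Here "austrailia" is one edit away from the segment "australia" but three edits away from "austria", so the longer name wins.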
Output Options
- No output argument: Auto-generated filename (e.g., belgium-latest.osm.pbf)
- Filename: Save to specified file
- -: Stream to stdout (logs go to stderr)
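The three output modes can be modelled as follows. This is a sketch with hypothetical names (`Output`, `output_target`). It derives the default filename from the last path segment of the source, which matches the belgium-latest.osm.pbf example; how the tool names other sources is an assumption.

```rust
/// Where the downloaded bytes go (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Output {
    Stdout,
    File(String),
}

/// Pick the output target from the optional OUTPUT argument.
fn output_target(source: &str, output: Option<&str>) -> Output {
    match output {
        // "-" means stream to stdout.
        Some("-") => Output::Stdout,
        Some(path) => Output::File(path.to_string()),
        None => {
            // Use the last path segment, e.g. "europe/belgium" -> "belgium".
            let name = source.rsplit('/').next().unwrap_or(source);
            Output::File(format!("{}-latest.osm.pbf", name))
        }
    }
}
```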
Performance Features
Memory Efficiency
- Fixed 64KB buffers: Memory usage independent of file size
- Ring buffer ordering: Small memory footprint for parallel downloads
- Direct I/O: Bypasses OS page cache for files >1GB (Unix systems)
- Streaming writes: No intermediate accumulation
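The idea behind fixed buffers and streaming writes is that each chunk is written out as soon as it is read, so memory use does not grow with file size. A minimal synchronous sketch using only the standard library is shown below. The tool itself is async (Tokio), so this illustrates the principle rather than its code, and `stream_copy` is a hypothetical name.

```rust
use std::io::{self, Read, Write};

/// Buffer size matching the 64KB figure quoted above.
const BUFFER_SIZE: usize = 64 * 1024;

/// Copy everything from `reader` to `writer` through one fixed buffer.
/// Returns the number of bytes copied.
fn stream_copy<R: Read, W: Write>(mut reader: R, mut writer: W) -> io::Result<u64> {
    let mut buffer = vec![0u8; BUFFER_SIZE];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buffer) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // A read interrupted by a signal is safe to retry.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Write the chunk immediately; nothing accumulates in memory.
        writer.write_all(&buffer[..n])?;
        total += n as u64;
    }
    writer.flush()?;
    Ok(total)
}
```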
Download Optimization
- Planet file: Single optimized HTTP stream for maximum network utilization
- Regional extracts: Auto-tuned parallel HTTP range requests (2-16 connections based on file size)
- Fallback: Graceful degradation for servers without range support
- Progress tracking: Real-time progress bars to stderr
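Parallel range requests need two pieces: a connection count and a split of the file into byte ranges. The sketch below shows one way to do both. The 2-16 bounds come from the list above, but the "one connection per 64MB" rule is an invented heuristic for illustration, and the names `connection_count` and `split_ranges` are hypothetical.

```rust
/// Pick a connection count in the 2..=16 range from the file size.
/// The per-64MB scaling is an assumed heuristic, not the tool's actual rule.
fn connection_count(file_size: u64) -> u64 {
    const MB: u64 = 1024 * 1024;
    (file_size / (64 * MB)).clamp(2, 16)
}

/// Split `file_size` bytes into `parts` inclusive (start, end) byte ranges,
/// suitable for `Range: bytes=start-end` headers. The last range absorbs
/// any remainder.
fn split_ranges(file_size: u64, parts: u64) -> Vec<(u64, u64)> {
    if file_size == 0 || parts == 0 {
        return Vec::new();
    }
    // Never create more ranges than there are bytes.
    let parts = parts.min(file_size);
    let chunk = file_size / parts;
    (0..parts)
        .map(|i| {
            let start = i * chunk;
            let end = if i == parts - 1 {
                file_size - 1
            } else {
                start + chunk - 1
            };
            (start, end)
        })
        .collect()
}
```

For a 10-byte file and three connections this gives the ranges 0-2, 3-5, and 6-9.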
Intelligent Defaults
- Connection scaling: Based on file size and CPU count
- Protocol selection: Optimal source for each data type
- Error handling: Robust retry and fallback mechanisms
Technical Architecture
Memory Usage Breakdown
Connection buffers: 16 × 64KB = 1MB
Ring buffer: 64MB (max)
HTTP client overhead: ~50MB
Runtime: ~50MB
Total: ~215MB (well under 1GB limit; the four items listed above sum to ~165MB)
Direct I/O Support
Automatically enabled for files >1GB on Unix systems:
- Bypasses OS page cache
- Reduces memory pressure
- Optimizes large sequential writes
- Falls back gracefully if not available
CLI Reference
Downloads single OpenStreetMap files efficiently:
butterfly-dl planet # Download planet file (81GB) from HTTP
butterfly-dl europe # Download Europe continent from HTTP
butterfly-dl europe/belgium # Download Belgium from HTTP
butterfly-dl europe/monaco - # Stream Monaco to stdout
Usage: butterfly-dl [OPTIONS] <SOURCE> [OUTPUT]
Arguments:
<SOURCE> Source to download: "planet" (HTTP), "europe" (continent), or "europe/belgium" (country/region)
[OUTPUT] Output file path, or "-" for stdout
Options:
--dry-run Show what would be downloaded without downloading
-v, --verbose Enable verbose logging
-h, --help Print help
-V, --version Print version
Examples
Planet Download (81GB)
# Download planet file (uses HTTP, single stream, Direct I/O)
# Stream planet to compressed archive
|
Regional Downloads
# Download all of Europe (parallel HTTP ranges)
# Download specific country
# Download to custom location
Pipeline Integration
# Stream and process immediately
|
# Compress on the fly
|
# Chain with other tools
|
Development
Building
Testing
# Run all tests
# Run with verbose output
Performance Testing
# Test with small file
# Test streaming
|
Version Management
The project uses a centralized version management system to maintain consistency across all components:
Single Source of Truth:
- VERSION file contains the current version number (e.g., 1.0.0)
- All other files automatically read from this central location
Automatic Version Propagation:
- CLI tool: Uses env!("BUTTERFLY_VERSION") from build script
- HTTP User-Agent: Dynamically includes version in requests
- Library exports: Version available via build-time environment
- C bindings: pkg-config file includes correct version
- Documentation: Version stays in sync automatically
Build Integration:
- build.rs reads VERSION file and sets environment variables
- Any change to VERSION triggers automatic rebuild
- Build system tracks version file as dependency
Updating Version:
# Update version for new release
# Rebuild automatically picks up new version
# All components now use 1.1.0
Note: Cargo.toml version must still be updated manually due to Cargo limitations.
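A build script along these lines would produce the behaviour described in this section. This is a sketch, not the project's actual build.rs. `cargo:rerun-if-changed` and `cargo:rustc-env` are standard Cargo build-script directives, while the helper name `version_directives` is an assumption.

```rust
/// Turn the contents of the VERSION file into the lines a build script
/// prints for Cargo. Hypothetical helper for illustration.
fn version_directives(version_file_contents: &str) -> Vec<String> {
    // Strip the trailing newline that text files usually end with.
    let version = version_file_contents.trim();
    vec![
        // Re-run the build script whenever VERSION changes.
        "cargo:rerun-if-changed=VERSION".to_string(),
        // Expose the version to the crate as env!("BUTTERFLY_VERSION").
        format!("cargo:rustc-env=BUTTERFLY_VERSION={}", version),
    ]
}

// In build.rs, `main` would read the file and print each directive:
//   let contents = std::fs::read_to_string("VERSION").expect("VERSION file missing");
//   for line in version_directives(&contents) {
//       println!("{}", line);
//   }
```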
Architecture
- Rust + Tokio: Async/await for concurrent downloads
- Reqwest: HTTP client with connection pooling and range requests
- Indicatif: Progress bars to stderr
- Ring buffer: Maintains chunk ordering with minimal memory
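Keeping chunks in order when parallel connections finish out of order can be sketched as a small reorder buffer. The version below uses a `BTreeMap` keyed by chunk index instead of a fixed-size ring, so it is a simplified stand-in for the ring buffer mentioned above; `ReorderBuffer` and its methods are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Holds chunks that arrived early until the chunks before them show up.
struct ReorderBuffer {
    next_index: u64,
    pending: BTreeMap<u64, Vec<u8>>,
}

impl ReorderBuffer {
    fn new() -> Self {
        ReorderBuffer {
            next_index: 0,
            pending: BTreeMap::new(),
        }
    }

    /// Accept chunk `index` and return every chunk that is now ready to be
    /// written, in order. Returns an empty Vec while a gap remains.
    fn push(&mut self, index: u64, data: Vec<u8>) -> Vec<Vec<u8>> {
        self.pending.insert(index, data);
        let mut ready = Vec::new();
        // Drain the run of consecutive chunks starting at next_index.
        while let Some(chunk) = self.pending.remove(&self.next_index) {
            ready.push(chunk);
            self.next_index += 1;
        }
        ready
    }
}
```

Memory stays bounded as long as the number of in-flight chunks is capped, which is the role the ring buffer's maximum size plays.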
Performance Benchmarks
butterfly-dl includes comprehensive benchmarking against industry-standard tools to validate performance claims:
Benchmark Suite
# Run benchmarks against curl and aria2
Sample Results
All benchmarks were run on real hardware over a live network connection.
Small Files (Monaco ~0.6MB)
Tool          Duration(s)  Speed(MB/s)  Memory   Status
----------------------------------------------------------
curl          0.459        1.32         ~10MB    ✅ Success
butterfly-dl  0.612        0.99         ~215MB   ✅ Success
aria2         0.643        0.94         ~50MB    ✅ Success
For very small files, curl's lightweight design provides startup advantages
Medium Files (Luxembourg ~43MB)
Tool          Duration(s)  Speed(MB/s)  Memory   Status
----------------------------------------------------------
butterfly-dl  3.037        14.07        ~215MB   ✅ Success
aria2         5.447        7.84         ~120MB   ✅ Success
curl          9.349        4.57         ~10MB    ✅ Success
On this file, butterfly-dl is about 3x faster than curl and 79% faster than aria2.
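The ratios quoted for the Luxembourg run follow directly from the durations in the table. The small helpers below reproduce the arithmetic; the function names are assumptions made for this example.

```rust
/// How many times faster the candidate is than the baseline.
fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}

/// The same comparison expressed as "percent faster".
fn percent_faster(baseline_secs: f64, candidate_secs: f64) -> f64 {
    (speedup(baseline_secs, candidate_secs) - 1.0) * 100.0
}
```

With the table's values, `speedup(9.349, 3.037)` is roughly 3.08 (curl vs butterfly-dl) and `percent_faster(5.447, 3.037)` is roughly 79 (aria2 vs butterfly-dl).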
Key Performance Insights
- Sweet Spot: Medium to large files (>10MB); in the 43MB benchmark butterfly-dl was 79% faster than aria2 and roughly 200% faster than curl
- Memory Consistency: Fixed ~215MB usage regardless of file size (vs aria2's scaling memory)
- Speed Scaling: 14.07 MB/s on the 43MB file vs aria2's 7.84 MB/s and curl's 4.57 MB/s
- Smart Strategy: Automatically uses single connection for small files, optimized parallel connections for larger files
- Performance Leader: On the medium-file benchmark, butterfly-dl outperformed both aria2 and curl
Benchmark Features
- Automatic Tool Detection - Only tests available tools (curl, aria2, butterfly-dl)
- Comprehensive Metrics - Duration, speed, memory usage, file integrity validation
- MD5 Verification - Ensures all tools download identical, uncorrupted files
- Clean Testing - Automatic cleanup of temporary benchmark files
- Fair Comparison - Same network conditions, same target files, same validation
Running Your Own Benchmarks
# Clone and build
# Test with any supported region
# Examples covering different file sizes
Comparison with Alternatives
| Tool | Memory Usage | Parallel Downloads | HTTP Features | Streaming | Speed (43MB file) |
|---|---|---|---|---|---|
| butterfly-dl | ~215MB | Yes (Smart) | Advanced | Yes | 14.07 MB/s |
| aria2c | ~50-500MB+ | Yes | Basic | Limited | 7.84 MB/s |
| curl | ~10MB | No | Basic | Yes | 4.57 MB/s |
| wget | ~10MB | No | Basic | No | ~4 MB/s |
License
MIT License - see LICENSE file for details.
Contributing
This project follows XP pair programming with human + AI collaboration. See CLAUDE.md for development guidelines.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Who
Butterfly Project, built by Pierre <pierre@warnier.net> for the broader OpenStreetMap community.
Butterfly-dl: The optimal tool for downloading large OpenStreetMap datasets efficiently.