Data.gov Rust Client
A comprehensive Rust client library and interactive REPL for exploring and downloading data from data.gov, the U.S. government's open data portal.
Features
- Search & Discovery: Search for datasets with advanced filtering
- Dataset Management: Get detailed information about datasets and resources
- File Downloads: Download resources with progress tracking and concurrent downloads
- Organization Browsing: Explore government agencies and their data
- Interactive REPL: Command-line interface for exploring data.gov
- Async/Await: Built on modern async Rust for high performance
- Progress Tracking: Visual progress bars for downloads
- Error Handling: Comprehensive error types and handling
Installation
As a Library
Add this to your Cargo.toml:
[dependencies]
data-gov = { path = "../data-gov" } # Will be published to crates.io
As a CLI Tool
Install the data-gov command-line tool:
# From source (in this repository)
# Once published to crates.io
After installation, the data-gov command will be available in your PATH.
Quick Start
Library Usage
use DataGovClient;
async
Interactive REPL
Run the interactive REPL for exploring data.gov:
# Basic usage (interactive mode, downloads to ~/Downloads/<dataset-name>/)
# With custom base directory (files go to ./my-downloads/<dataset-name>/)
# With API key for higher rate limits
CLI Mode
Execute commands directly without entering interactive mode:
# Search for datasets
# Show dataset details
# Download specific resource
# Download all resources from a dataset
# List government organizations
# Show client information
# Get help
Download Directory Behavior
The tool automatically organizes downloads by dataset:
- Interactive REPL mode: Downloads go to ~/Downloads/<dataset-name>/ by default
- CLI mode: Downloads go to ./<dataset-name>/ (current directory) by default
- Custom directory: When using --download-dir, files go to <custom-dir>/<dataset-name>/
This ensures that files from different datasets are kept organized and don't overwrite each other.
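The rules above can be sketched as a small pure function. This is only an illustration of the documented behavior, not the crate's actual code: the `Mode` enum, the `resolve_download_dir` name, and the `home` parameter are hypothetical and exist just for this example.

```rust
use std::path::{Path, PathBuf};

/// How the tool was started (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Repl,
    Cli,
}

/// Resolve the directory a dataset's files would be written to.
///
/// `home` stands in for the user's home directory and `custom` for the
/// value passed to `--download-dir`. Both are parameters so the function
/// does not touch the environment.
pub fn resolve_download_dir(
    mode: Mode,
    home: &Path,
    custom: Option<&Path>,
    dataset_name: &str,
) -> PathBuf {
    let base = match (custom, mode) {
        // An explicit directory wins in either mode.
        (Some(dir), _) => dir.to_path_buf(),
        // REPL default: ~/Downloads
        (None, Mode::Repl) => home.join("Downloads"),
        // CLI default: the current directory
        (None, Mode::Cli) => PathBuf::from("."),
    };
    // Every dataset gets its own subdirectory under the base.
    base.join(dataset_name)
}
```

Because the dataset name is always the last path component, two datasets that ship a file with the same name end up in different directories.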
Commands
Both interactive REPL and CLI modes support these commands:
| Command | Description | Example |
|---|---|---|
| search <query> [limit] | Search for datasets | search climate data 20 |
| show <dataset_id> | Show detailed dataset info | show consumer-complaint-database |
| download <dataset_id> [index] | Download dataset resources | download my-dataset 0 |
| list organizations | List government agencies | list orgs |
| setdir <path> | Set download directory | setdir ./downloads |
| info | Show session information | info |
| help | Show help message | help |
| quit | Exit the REPL | quit |
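To make the command grammar in the table concrete, here is a minimal parsing sketch. It is not the tool's real parser: the `Command` enum and `parse_command` function are hypothetical, and two behaviors are assumptions made for the example (a trailing integer after a multi-word `search` query is read as the limit, and `orgs` is accepted as shorthand for `organizations`, as the example column suggests).

```rust
/// One parsed command line (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Command {
    Search { query: String, limit: Option<usize> },
    Show { dataset_id: String },
    Download { dataset_id: String, index: Option<usize> },
    ListOrganizations,
    SetDir { path: String },
    Info,
    Help,
    Quit,
}

/// Parse a single line into a `Command`, or return a usage message.
pub fn parse_command(line: &str) -> Result<Command, String> {
    let mut words: Vec<&str> = line.split_whitespace().collect();
    if words.is_empty() {
        return Err("empty command".to_string());
    }
    let name = words.remove(0);
    match name {
        "search" => {
            // Assumption: a trailing integer is the limit, but only when
            // at least one query word remains in front of it.
            let mut limit = None;
            if words.len() > 1 {
                if let Ok(n) = words[words.len() - 1].parse::<usize>() {
                    limit = Some(n);
                    words.pop();
                }
            }
            if words.is_empty() {
                return Err("usage: search <query> [limit]".to_string());
            }
            Ok(Command::Search { query: words.join(" "), limit })
        }
        "show" => match words.as_slice() {
            [id] => Ok(Command::Show { dataset_id: id.to_string() }),
            _ => Err("usage: show <dataset_id>".to_string()),
        },
        "download" => match words.as_slice() {
            [id] => Ok(Command::Download { dataset_id: id.to_string(), index: None }),
            [id, idx] => idx
                .parse::<usize>()
                .map(|i| Command::Download { dataset_id: id.to_string(), index: Some(i) })
                .map_err(|_| format!("invalid index: {}", idx)),
            _ => Err("usage: download <dataset_id> [index]".to_string()),
        },
        "list" => match words.as_slice() {
            ["organizations"] | ["orgs"] => Ok(Command::ListOrganizations),
            _ => Err("usage: list organizations".to_string()),
        },
        "setdir" => match words.as_slice() {
            [path] => Ok(Command::SetDir { path: path.to_string() }),
            _ => Err("usage: setdir <path>".to_string()),
        },
        "info" => Ok(Command::Info),
        "help" => Ok(Command::Help),
        "quit" => Ok(Command::Quit),
        other => Err(format!("unknown command: {}", other)),
    }
}
```

Returning a `Result` rather than panicking lets a REPL print the usage message and keep reading input.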
Interactive REPL Examples
Scripting with Shebang
The interactive REPL can be used in shebang scripts to create automated data.gov workflows. The tool automatically ignores comment lines (starting with #) and processes commands from stdin.
Creating a Script
Create an executable script with a shebang line:
#!/usr/bin/env data-gov
# Automated climate data download
# This script searches for and downloads EPA climate data
# Search for climate datasets
# Show details of a specific dataset
# Download the first resource
# Show final info
Running Scripts
Make the script executable and run it:
Script Features
- Comments: Lines starting with # are ignored (including the shebang)
- Automation: No interactive prompts - runs commands sequentially
- REPL Mode: Uses interactive mode defaults (downloads to ~/Downloads/<dataset>/)
- Error Handling: Continues execution even if individual commands fail
- Clean Output: Same colorized output as interactive mode
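The comment-skipping and continue-on-error behavior listed above could look roughly like the following. This is a sketch under assumptions rather than the tool's implementation: `script_commands` and `run_script` are hypothetical names, and the `execute` callback stands in for whatever actually runs a command.

```rust
/// Return the commands in a script, skipping blank lines and lines that
/// start with `#` (which also covers the shebang line).
pub fn script_commands(script: &str) -> Vec<&str> {
    script
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .collect()
}

/// Run every command in order. A failing command is recorded and the
/// loop moves on, so one bad command does not abort the script.
/// Returns `(command, error message)` pairs for the failures.
pub fn run_script<F>(script: &str, mut execute: F) -> Vec<(String, String)>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut failures = Vec::new();
    for command in script_commands(script) {
        if let Err(message) = execute(command) {
            failures.push((command.to_string(), message));
        }
    }
    failures
}
```

Collecting failures instead of returning on the first error matches the documented behavior and still lets a caller report what went wrong at the end.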
Example Scripts
See the examples/ directory for sample scripts:
- climate-search.sh - Search and explore climate datasets
- download-epa-climate.sh - Download EPA climate data
- list-orgs.sh - List all government organizations
- auto-download.sh - Automated dataset download
CLI Examples
# Search for climate datasets (limit to 5 results)
# Show detailed information about a specific dataset
# Quick organization listing
# Download a dataset (creates ./consumer-complaint-database/ directory)
Configuration
Basic Configuration
use ;
let config = new
.with_download_dir
.with_api_key
.with_user_agent
.with_max_concurrent_downloads
.with_progress;
let client = with_config?;
Available Configuration Options
- Base Download Directory: Base directory for downloads (defaults to system Downloads directory in REPL mode, current directory in CLI mode)
- API Key: For higher rate limits (optional)
- User Agent: Custom user agent string
- Max Concurrent Downloads: Number of simultaneous downloads
- Progress Bars: Enable/disable download progress display
- Download Timeout: Timeout for individual downloads
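The options above map naturally onto a consuming builder. The sketch below illustrates that pattern only: `SketchConfig` is a hypothetical stand-in, not the crate's configuration type, its field names are invented for the example, and the default values are arbitrary placeholders rather than the library's real defaults. Only the `with_*` method names are taken from the configuration snippet earlier in this section.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Hypothetical configuration type illustrating the builder pattern.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchConfig {
    pub download_dir: PathBuf,
    pub api_key: Option<String>,
    pub user_agent: String,
    pub max_concurrent_downloads: usize,
    pub show_progress: bool,
    pub download_timeout: Duration,
}

impl Default for SketchConfig {
    fn default() -> Self {
        // Placeholder defaults chosen for the example only.
        Self {
            download_dir: PathBuf::from("."),
            api_key: None,
            user_agent: "sketch-client/0.0".to_string(),
            max_concurrent_downloads: 3,
            show_progress: true,
            download_timeout: Duration::from_secs(300),
        }
    }
}

impl SketchConfig {
    pub fn new() -> Self {
        Self::default()
    }

    // Each setter takes `self` by value and returns it, so calls chain.
    pub fn with_download_dir(mut self, dir: impl Into<PathBuf>) -> Self {
        self.download_dir = dir.into();
        self
    }

    pub fn with_api_key(mut self, key: impl Into<String>) -> Self {
        self.api_key = Some(key.into());
        self
    }

    pub fn with_user_agent(mut self, agent: impl Into<String>) -> Self {
        self.user_agent = agent.into();
        self
    }

    pub fn with_max_concurrent_downloads(mut self, max: usize) -> Self {
        // Zero would mean no download could ever start, so clamp to one.
        self.max_concurrent_downloads = max.max(1);
        self
    }

    pub fn with_progress(mut self, enabled: bool) -> Self {
        self.show_progress = enabled;
        self
    }
}
```

With this shape, options that are not set explicitly keep their defaults, which is why the API key can stay optional.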
API Reference
DataGovClient
The main client for interacting with data.gov:
Search Methods
- search(query, limit, offset, organization, format) - Search for datasets
- get_dataset(dataset_id) - Get detailed dataset information
- autocomplete_datasets(partial, limit) - Get dataset name suggestions
- autocomplete_organizations(partial, limit) - Get organization suggestions
Organization Methods
- list_organizations(limit) - List government organizations
Resource Methods
- get_downloadable_resources(package) - Find downloadable files in a dataset
- download_resource(resource, output_path) - Download a single resource
- download_resources(resources, output_dir) - Download multiple resources concurrently
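To illustrate what a cap on simultaneous downloads means, the sketch below runs jobs in batches of scoped standard-library threads. It is deliberately not how the crate does it: the library is async, while this example swaps in plain threads so it stays self-contained. The `run_bounded` name is hypothetical and the `job` closure stands in for a real download.

```rust
use std::thread;

/// Run `job` over `items` with at most `max_concurrent` jobs in flight.
/// Results are returned in the same order as `items`.
pub fn run_bounded<T, R, F>(items: &[T], max_concurrent: usize, job: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    // `chunks(0)` panics, so treat a limit of zero as one.
    let batch_size = max_concurrent.max(1);
    // Borrow once so each thread closure can copy the reference.
    let job = &job;
    let mut results = Vec::with_capacity(items.len());

    for batch in items.chunks(batch_size) {
        let batch_results: Vec<R> = thread::scope(|scope| {
            // Start every job in the batch before joining any of them.
            let handles: Vec<_> = batch
                .iter()
                .map(|item| scope.spawn(move || job(item)))
                .collect();
            handles
                .into_iter()
                .map(|handle| handle.join().expect("worker thread panicked"))
                .collect()
        });
        results.extend(batch_results);
    }
    results
}
```

Batching is the simplest way to enforce the limit, at the cost of each batch waiting for its slowest job. A semaphore or a bounded async stream would keep all slots busy, which is the usual choice in an async client.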
Utility Methods
- validate_download_dir() - Check if download directory is writable
- download_dir() - Get current download directory
- ckan_client() - Access underlying CKAN client
Error Handling
The library uses a comprehensive error type:
use ;
match client.search.await
Architecture
This crate is built on top of the data-gov-ckan crate, which provides low-level CKAN API access. The data-gov crate adds:
- Higher-level abstractions for common workflows
- File download capabilities with progress tracking
- Concurrent download management
- Interactive REPL for exploration
- Rich error handling and validation
- Configuration management
Examples
See the examples/ directory for more examples:
- demo.rs - Basic API usage demonstration
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Related Projects
- data-gov-ckan - Low-level CKAN API client
- CKAN - The open source data management system powering data.gov
Acknowledgments
- Built for the U.S. government's data.gov platform
- Uses the CKAN API for data access
- Inspired by the need for better programmatic access to government data
- CLI design inspired by modern tools like kubectl, gh, and aws