# s3sh - The S3 Shell

## Overview
s3sh is an interactive S3 shell for exploring S3-compatible storage with Unix-like commands. Navigate S3 buckets and prefixes like directories, and seamlessly explore archive contents (tar, tar.gz, tar.bz2, zip, parquet) without downloading entire files. Supports multiple providers including AWS S3 and Source Cooperative for accessing public geospatial data.
## Key Features
- Multi-Provider Support - Access AWS S3, Source Coop, and other S3-compatible storage services
- Unix-like Commands - Use familiar `ls`, `cd`, `cat`, and `pwd` commands to navigate S3
- Pipe Support - Stream command output to external tools like `grep`, `jq`, `less`, `head`, etc.
- Archive Navigation - `cd` directly into tar/zip/parquet files and explore their contents
- Parquet Exploration - Navigate parquet files like directories, view schemas and column data
- Efficient Streaming - Uses S3 range requests to access archive contents without full downloads
- Interactive Shell - Full command history and line editing via rustyline
## Installation

Available via crates.io:

```sh
# Basic installation (tar, zip support)
cargo install s3sh

# With parquet support (recommended for data files)
cargo install s3sh --features parquet
```

Or build from source:

```sh
# Basic build
cargo build --release

# With parquet support
cargo build --release --features parquet
```
## Usage

Launch the interactive shell:

```sh
s3sh
```
## Providers

s3sh supports multiple S3-compatible storage providers through a plugin system. Use the `--provider` flag to select a provider:
```sh
# Use AWS S3 (default)
s3sh --provider aws

# Use Source Coop for public geospatial data
s3sh --provider source-coop

# List available providers
s3sh --list-providers
```
### AWS Provider (default)
Standard AWS S3 access with full cross-region support. Requires AWS credentials.
```sh
s3sh
# or explicitly
s3sh --provider aws
```
### Source Coop Provider

Access public geospatial datasets from Source Cooperative without credentials:

```sh
s3sh --provider source-coop
```
Available Source Coop datasets include:
- cholera - Historical cholera data
- kerner-lab - Fields of the World agricultural field boundaries
- gistemp - NASA GISS Surface Temperature Analysis
- And many more public geospatial datasets
## Basic Commands
Navigate S3 like a filesystem:
```sh
# List buckets at root
ls

# Navigate into a bucket
cd my-bucket

# List objects and prefixes
ls

# Navigate through prefixes
cd data/2024/

# View file contents
cat results.json

# Show current location
pwd
```
## Archive Navigation
Explore archives without downloading:
```sh
# Navigate into an archive
cd dataset.tar.gz

# List archive contents
ls

# Navigate within the archive
cd subdir/

# View files from inside archives
cat readme.txt
```
## Parquet File Navigation
Explore parquet files as virtual directories (requires `--features parquet`):

```sh
# Navigate into a parquet file
cd data.parquet

# View the schema
cat schema

# Navigate to columns directory
cd columns

# List available columns
ls

# View column data (first 100 rows)
cat temperature

# View column statistics
cat temperature/stats
```
## Pipe Support
Pipe command output to external Unix utilities:
```sh
# Use grep to filter listings
ls | grep 2024

# Pipe file contents to jq for JSON formatting
cat config.json | jq .

# Count lines in a file
cat data.csv | wc -l

# Use head to preview large listings
ls | head -20

# Pipe to less for pagination
ls | less

# Chain multiple pipes (shell handles the pipeline)
cat data.csv | grep error | sort | uniq -c
```
The pipe implementation uses Unix file descriptor redirection to stream output directly to external commands, supporting any valid shell pipeline including multiple pipes, redirections, and command substitutions.
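As a rough sketch of that mechanism in Rust (the function and names here are illustrative, not s3sh's actual internals), the in-shell command's output is written into a spawned `sh -c` pipeline over a pipe:

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Illustrative sketch: stream a command's output into an external
// shell pipeline. `sh -c` interprets the pipeline string, so multiple
// pipes, redirections, and substitutions work as in a normal shell.
fn pipe_to_shell(data: &[u8], pipeline: &str) -> std::io::Result<String> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(pipeline)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    {
        // Write the data into the pipeline's stdin; dropping the handle
        // closes the pipe so the first command in the pipeline sees EOF.
        let mut stdin = child.stdin.take().expect("stdin was piped");
        stdin.write_all(data)?;
    }
    let out = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

fn main() -> std::io::Result<()> {
    // Roughly what `ls | grep 2024` does: generate a listing locally,
    // then hand it to the external pipeline.
    let listing = b"data-2023/\ndata-2024/\nreadme.txt\n";
    print!("{}", pipe_to_shell(listing, "grep 2024")?);
    Ok(())
}
```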
## Tab Completion
Smart completion based on context:
```sh
# Tab completes bucket names
cd my-b<TAB>

# Tab completes objects and prefixes
ls data/20<TAB>

# cd only shows directories and navigable archives
cd <TAB>

# Works inside archives too
cd dataset.tar.gz/<TAB>

# cat shows all files
cat <TAB>
```
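The `cd`/`cat` distinction can be sketched as a filter over candidate entries (the entry names and extension list below are illustrative, not s3sh's actual completion code):

```rust
// Extensions treated as navigable archives (illustrative list).
const ARCHIVE_EXTS: &[&str] = &[".tar", ".tar.gz", ".tar.bz2", ".zip", ".parquet"];

// An entry is navigable if it is a prefix ("directory") or an archive.
fn is_navigable(name: &str) -> bool {
    name.ends_with('/') || ARCHIVE_EXTS.iter().any(|ext| name.ends_with(ext))
}

// Context-aware completion: `cd` offers only navigable entries,
// while `cat` offers every entry matching the typed prefix.
fn complete<'a>(cmd: &str, partial: &str, entries: &'a [&'a str]) -> Vec<&'a str> {
    entries
        .iter()
        .filter(|e| e.starts_with(partial))
        .filter(|e| cmd != "cd" || is_navigable(e))
        .copied()
        .collect()
}

fn main() {
    let entries = ["data/", "logs.tar.gz", "notes.txt"];
    println!("{:?}", complete("cd", "", &entries));  // cd skips plain files
    println!("{:?}", complete("cat", "", &entries)); // cat offers everything
}
```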
## Supported Archive Formats
- tar (`.tar`)
- gzip-compressed tar (`.tar.gz`)
- bzip2-compressed tar (`.tar.bz2`)
- zip (`.zip`)
- parquet (`.parquet`, with the `parquet` feature)
## Authentication
### AWS Provider
Uses the standard AWS credential chain:

- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
- Shared credentials file (`~/.aws/credentials`)
- IAM instance or task roles
### Source Coop Provider
No authentication required. The Source Coop provider accesses public datasets anonymously.
## Technical Details

### Architecture
- VFS Abstraction - Unified virtual filesystem for S3 objects and archive entries
- Lazy Archive Indexing - Archives are indexed on first access and cached
- S3 Range Requests - Efficient random access to archive contents
- Async Runtime - Built on Tokio for concurrent S3 operations
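For formats that store members contiguously (such as plain tar or zip), reading a single member reduces to one ranged GET once the archive is indexed. The offset arithmetic can be sketched as follows (struct and field names are illustrative, not s3sh's actual types):

```rust
// Illustrative index entry: where one archive member's bytes live
// inside the remote object.
struct IndexEntry {
    offset: u64, // byte offset of the member's data within the object
    size: u64,   // length of the member's data in bytes
}

// Build the HTTP Range header an S3 GetObject would carry.
// HTTP byte ranges are inclusive on both ends.
fn range_header(e: &IndexEntry) -> String {
    format!("bytes={}-{}", e.offset, e.offset + e.size - 1)
}

fn main() {
    let entry = IndexEntry { offset: 512, size: 1024 };
    // Fetches only this member's bytes, not the whole archive.
    println!("{}", range_header(&entry)); // bytes=512-1535
}
```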
### Performance
- LRU Caching - Archive indexes are cached to avoid repeated S3 calls
- Streaming - Large files are streamed, not loaded into memory
- Parallel Listings - Tab completion fetches directory contents on-demand
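A minimal sketch of the index-cache idea (capacity, key, and value types are illustrative, not s3sh's actual cache):

```rust
use std::collections::{HashMap, VecDeque};

// Bounded LRU cache of archive indexes: building an index costs S3
// requests, so the most recently used indexes are kept in memory.
struct IndexCache {
    cap: usize,
    map: HashMap<String, Vec<(String, u64)>>, // archive key -> (member, offset)
    order: VecDeque<String>,                  // front = least recently used
}

impl IndexCache {
    fn new(cap: usize) -> Self {
        IndexCache { cap, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, key: &str) -> Option<&Vec<(String, u64)>> {
        if self.map.contains_key(key) {
            // Mark this key as most recently used.
            self.order.retain(|k| k.as_str() != key);
            self.order.push_back(key.to_string());
        }
        self.map.get(key)
    }

    fn put(&mut self, key: String, index: Vec<(String, u64)>) {
        if self.map.len() == self.cap && !self.map.contains_key(&key) {
            // Evict the least recently used index.
            if let Some(old) = self.order.pop_front() {
                self.map.remove(&old);
            }
        }
        self.order.retain(|k| *k != key);
        self.order.push_back(key.clone());
        self.map.insert(key, index);
    }
}

fn main() {
    let mut cache = IndexCache::new(2);
    cache.put("a.tar".to_string(), vec![("data.csv".to_string(), 512)]);
    cache.put("b.tar".to_string(), vec![]);
    cache.get("a.tar");                     // a.tar is now most recent
    cache.put("c.tar".to_string(), vec![]); // evicts b.tar
    assert!(cache.get("b.tar").is_none());
}
```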
## Development

### Running Regression Tests
The regression test suite validates performance and functionality against real S3 data and is configured through environment variables.
Run the tests:

```sh
# Run all regression tests
cargo test

# Run specific test categories
cargo test archive
```
The tests verify:
- Performance - Archive indexing completes within expected thresholds
- Functionality - Navigation (`cd`, `ls`, `cat`) works correctly in archives
- Metrics - Bytes transferred and request counts are accurately tracked
## Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
## License
MIT License - see LICENSE file for details
## Related Projects
- s3grep - Fast parallel grep for S3 buckets