§cloud-file
Simple reading of cloud files in Rust
§Highlights
- HTTP, AWS S3, Azure, Google, or local
- Sequential or random access
- Simplifies use of the powerful object_store crate, focusing on a useful subset of its features
- Access files via URLs and string-based options
- Read binary or text
- Fully async
- Used by genomics crate BedReader, which is used by other Rust and Python projects
- Also see "Nine Rules for Accessing Cloud Files from Your Rust Code: Practical Lessons from Upgrading Bed-Reader, a Bioinformatics Library" in Towards Data Science.
§Install
cargo add cloud-file
§Examples
Find the size of a cloud file.
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/toydata.5chrom.fam";
let cloud_file = CloudFile::new(url)?;
let file_size = cloud_file.read_file_size().await?;
assert_eq!(file_size, 14_361);
Find the number of lines in a cloud file.
use cloud_file::CloudFile;
use futures::StreamExt; // Enables `.next()` on streams.
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/toydata.5chrom.fam";
let cloud_file = CloudFile::new_with_options(url, [("timeout", "30s")])?;
let mut chunks = cloud_file.stream_chunks().await?;
let mut newline_count: usize = 0;
while let Some(chunk) = chunks.next().await {
    let chunk = chunk?;
    newline_count += bytecount::count(&chunk, b'\n');
}
assert_eq!(newline_count, 500);
§More examples
| Example | Demonstrates |
|---|---|
| line_count | Read a file as binary chunks. |
| nth_line | Read a file as text lines. |
| bigram_counts | Read random regions of a file, without regard to order. |
| aws_file_size | Find the size of a file on AWS. |
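To make the idea behind reading regions "without regard to order" concrete, here is a minimal standard-library sketch. It is not the crate's bigram_counts example: it works on an in-memory byte slice, and the function name and signature are hypothetical. It shows that when per-region results are simply added together, the order in which regions arrive does not change the total.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Count adjacent byte pairs (bigrams) inside each requested region of `data`.
/// Pairs that straddle two regions are not counted.
/// Panics if a region extends past the end of `data`.
fn bigram_counts(data: &[u8], regions: &[Range<usize>]) -> HashMap<(u8, u8), usize> {
    let mut counts = HashMap::new();
    for region in regions {
        // `windows(2)` yields every adjacent pair within the region.
        for pair in data[region.clone()].windows(2) {
            *counts.entry((pair[0], pair[1])).or_insert(0) += 1;
        }
    }
    counts
}
```

Calling it with the same regions in a different order, for example `&[0..2, 2..4]` and then `&[2..4, 0..2]`, yields equal maps.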
§Main Functions
| Function | Description |
|---|---|
| CloudFile::new | Use a URL string to specify a cloud file for reading. |
| CloudFile::new_with_options | Use a URL string and string options to specify a cloud file for reading. |
§URLs
| Cloud Service | Example |
|---|---|
| HTTP | https://www.gutenberg.org/cache/epub/100/pg100.txt |
| local file | file:///M:/data%20files/small.bed |
| AWS S3 | s3://bedreader/v1/toydata.5chrom.bed |
Note: For local files, use the abs_path_to_url_string function to properly encode into a URL.
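The local-file row above shows a space encoded as %20. As a rough illustration of why encoding is needed, the following standard-library sketch percent-encodes an absolute path into a file URL. It is not the implementation of abs_path_to_url_string: the function name is hypothetical, it assumes `/` separators, and it keeps only a conservative set of characters unescaped.

```rust
/// Percent-encode an absolute path (using `/` separators) into a `file://` URL string.
fn path_to_file_url(abs_path: &str) -> String {
    let mut url = String::from("file://");
    // Drive-letter paths such as `M:/data` need an extra leading slash.
    if !abs_path.starts_with('/') {
        url.push('/');
    }
    for byte in abs_path.bytes() {
        match byte {
            // Characters kept as-is: letters, digits, and a few safe punctuation marks.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9'
            | b'-' | b'.' | b'_' | b'~' | b'/' | b':' => url.push(byte as char),
            // Everything else becomes %XX with uppercase hex digits.
            other => url.push_str(&format!("%{:02X}", other)),
        }
    }
    url
}
```

In real code, prefer the crate's abs_path_to_url_string, which is the function the note above recommends.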
§Options
| Cloud Service | Details | Example |
|---|---|---|
| HTTP | ClientConfigKey | [("timeout", "30s")] |
| local file | none | |
| AWS S3 | AmazonS3ConfigKey | [("aws_region", "us-west-2"), ("aws_access_key_id", …), ("aws_secret_access_key", …)] |
| Azure | AzureConfigKey | |
| Google | GoogleConfigKey | |
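The option examples in the table are written as arrays of string pairs. One way an API can accept such pairs, whether borrowed or owned, is a generic iterator of key/value items. The sketch below illustrates that pattern with the standard library only. The function name and trait bounds are assumptions for illustration, not taken from the crate's source, so check the CloudFile::new_with_options signature before relying on them.

```rust
/// Collect key/value options into owned strings.
/// Accepts any iterable of pairs whose keys are string-like and whose values
/// can be converted into `String`.
fn collect_options<I, K, V>(options: I) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (K, V)>,
    K: AsRef<str>,
    V: Into<String>,
{
    options
        .into_iter()
        .map(|(key, value)| (key.as_ref().to_string(), value.into()))
        .collect()
}
```

With these bounds, both `[("timeout", "30s")]` and a `Vec<(String, String)>` are accepted without conversion at the call site.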
§High-Level CloudFile Methods
| Method | Retrieves |
|---|---|
| stream_chunks | File contents as a stream of Bytes |
| stream_line_chunks | File contents as a stream of Bytes, each containing one or more whole lines |
| read_all | Whole file contents as an in-memory Bytes |
| read_range | Bytes from a specified range |
| read_ranges | Vec of Bytes from specified ranges |
| read_range_and_file_size | Bytes from a specified range and the file’s size |
| read_file_size | Size of the file |
| count_lines | Number of lines in the file |
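The difference between stream_chunks and stream_line_chunks is that the latter never splits a line across two chunks. The following standard-library sketch shows one way such regrouping could work on in-memory chunks. It is an illustration only: the function name is hypothetical, it is synchronous rather than a stream, and the crate's own handling of edge cases may differ.

```rust
/// Regroup arbitrary byte chunks so every emitted chunk ends on a line boundary.
fn rechunk_on_lines(chunks: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    // Bytes received so far that do not yet end in a newline.
    let mut carry: Vec<u8> = Vec::new();
    for chunk in chunks {
        carry.extend_from_slice(chunk);
        // Emit everything up to and including the last newline seen so far.
        if let Some(pos) = carry.iter().rposition(|&b| b == b'\n') {
            let rest = carry.split_off(pos + 1);
            out.push(std::mem::replace(&mut carry, rest));
        }
    }
    // A final line without a trailing newline is still emitted.
    if !carry.is_empty() {
        out.push(carry);
    }
    out
}
```

For the input chunks `ab\ncd`, `e\nf`, and `g`, the sketch emits `ab\n`, `cde\n`, and `fg`.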
Additional methods:
| Method | Description |
|---|---|
| clone | Clone the CloudFile instance. Efficient by design. |
| set_extension | Change the CloudFile’s file extension (in place). |
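Changing an extension is handy when sibling files share a stem, such as the toydata.5chrom.bed and toydata.5chrom.fam files that both appear in this document. The sketch below shows the string manipulation involved, using the standard library only. It is not the crate's set_extension: the function name is hypothetical, and it returns a new string rather than modifying anything in place.

```rust
/// Return `path` with the text after the last `.` of its final segment replaced by `ext`.
/// If the final segment has no `.`, the extension is appended.
fn with_extension(path: &str, ext: &str) -> String {
    // Only look for the dot inside the final path segment.
    let segment_start = path.rfind('/').map_or(0, |i| i + 1);
    match path[segment_start..].rfind('.') {
        Some(dot) => format!("{}.{}", &path[..segment_start + dot], ext),
        None => format!("{}.{}", path, ext),
    }
}
```

Restricting the search to the final segment keeps a dot in a directory name, as in `v1.0/readme`, from being mistaken for the extension separator.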
§Low-Level CloudFile Methods
| Method | Description |
|---|---|
| get | Call the object_store crate’s get method. |
| get_opts | Call the object_store crate’s get_opts method. |
§Lowest-Level CloudFile Methods
You can call any method from the object_store crate. For example, here we
use head to get the metadata for a file and the last_modified time.
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bed";
let cloud_file = CloudFile::new(url)?;
let meta = cloud_file.cloud_service.head(&cloud_file.store_path).await?;
let last_modified = meta.last_modified;
println!("last_modified: {}", last_modified);
assert_eq!(meta.size, 303);
Re-exports§
pub use object_store::path::Path as StorePath;
Structs§
- CloudFile - The main struct representing the location of a file in the cloud.
- DynObjectStore - Wraps Box<dyn ObjectStore> for easier usage. An ObjectStore, from the powerful object_store crate, represents a cloud service.
Enums§
- CloudFileError - The error type for CloudFile methods.
Constants§
- EMPTY_OPTIONS - An empty set of cloud options.
Functions§
- abs_path_to_url_string - Given a local file’s absolute path, return a URL string to that file.