§cloud-file
Simple reading of cloud files in Rust
§Highlights
- HTTP, AWS S3, Azure, Google, or local
- Sequential or random access
- Simplifies use of the powerful object_store crate, focusing on a useful subset of its features
- Access files via URLs and string-based options
- Read binary or text
- Fully async
- Used by genomics crate BedReader, which is used by other Rust and Python projects
- Also see "Nine Rules for Accessing Cloud Files from Your Rust Code: Practical Lessons from Upgrading Bed-Reader, a Bioinformatics Library" in Towards Data Science.
§Install
cargo add cloud-file
§Examples
Find the size of a cloud file.
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/toydata.5chrom.fam";
let cloud_file = CloudFile::new(url)?;
let file_size = cloud_file.read_file_size().await?;
assert_eq!(file_size, 14_361);
Find the number of lines in a cloud file.
use cloud_file::CloudFile;
use futures::StreamExt; // Enables `.next()` on streams.
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/toydata.5chrom.fam";
let cloud_file = CloudFile::new_with_options(url, [("timeout", "30s")])?;
let mut chunks = cloud_file.stream_chunks().await?;
let mut newline_count: usize = 0;
while let Some(chunk) = chunks.next().await {
let chunk = chunk?;
newline_count += bytecount::count(&chunk, b'\n');
}
assert_eq!(newline_count, 500);
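The line count above works regardless of where the chunk boundaries fall, because each newline byte lands in exactly one chunk. As a minimal sketch of that idea, using only the standard library, the hypothetical helper `count_newlines` below takes in-memory byte slices as stand-ins for the streamed chunks. It is an illustration, not part of the cloud-file API.

```rust
/// Count `\n` bytes across a sequence of chunks.
/// Hypothetical helper: the slices stand in for the chunks a stream would yield.
fn count_newlines<'a, I>(chunks: I) -> usize
where
    I: IntoIterator<Item = &'a [u8]>,
{
    chunks
        .into_iter()
        // Count the newline bytes in each chunk independently...
        .map(|chunk| chunk.iter().filter(|&&b| b == b'\n').count())
        // ...and add the per-chunk counts together.
        .sum()
}
```

A line split across two chunks is still counted once, since only its terminating newline is counted.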
§More examples
Example | Demonstrates |
---|---|
line_count | Read a file as binary chunks. |
nth_line | Read a file as text lines. |
bigram_counts | Read random regions of a file, without regard to order. |
aws_file_size | Find the size of a file on AWS. |
§Main Functions
Function | Description |
---|---|
CloudFile::new | Use a URL string to specify a cloud file for reading. |
CloudFile::new_with_options | Use a URL string and string options to specify a cloud file for reading. |
§URLs
Cloud Service | Example |
---|---|
HTTP | https://www.gutenberg.org/cache/epub/100/pg100.txt |
local file | file:///M:/data%20files/small.bed |
AWS S3 | s3://bedreader/v1/toydata.5chrom.bed |
Note: For local files, use the abs_path_to_url_string function to properly encode the path into a URL.
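To show why encoding matters, here is a rough, standard-library-only sketch of turning an absolute path into a `file://` URL string. The function name `sketch_path_to_file_url` is hypothetical and this is not the implementation of abs_path_to_url_string. It assumes the path already uses `/` separators and ignores many edge cases, so prefer the crate's function in real code.

```rust
/// Percent-encode an absolute path (with `/` separators) into a `file://` URL string.
/// Illustrative only: keeps ASCII letters, digits, and `-._~/:`; encodes every other byte.
fn sketch_path_to_file_url(abs_path: &str) -> String {
    let mut url = String::from("file://");
    // Paths such as `M:/data` do not start with `/`, so add one after the scheme.
    if !abs_path.starts_with('/') {
        url.push('/');
    }
    for byte in abs_path.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' | b'/' | b':' => {
                url.push(byte as char)
            }
            // Everything else (spaces, non-ASCII bytes, ...) becomes %XX.
            other => url.push_str(&format!("%{:02X}", other)),
        }
    }
    url
}
```

With this sketch, the path `M:/data files/small.bed` becomes the local-file URL shown in the table above.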
§Options
Cloud Service | Details | Example |
---|---|---|
HTTP | ClientConfigKey | [("timeout", "30s")] |
local file | none | |
AWS S3 | AmazonS3ConfigKey | [("aws_region", "us-west-2"), ("aws_access_key_id", …), ("aws_secret_access_key", …)] |
Azure | AzureConfigKey | |
Google | GoogleConfigKey | |
§High-Level CloudFile Methods
Method | Retrieves |
---|---|
stream_chunks | File contents as a stream of Bytes |
stream_line_chunks | File contents as a stream of Bytes , each containing one or more whole lines |
read_all | Whole file contents as an in-memory Bytes |
read_range | Bytes from a specified range |
read_ranges | Vec of Bytes from specified ranges |
read_range_and_file_size | Bytes from a specified range & the file’s size |
read_file_size | Size of the file |
count_lines | Number of lines in the file |
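The difference between stream_chunks and stream_line_chunks is that the latter yields chunks that each contain one or more whole lines. The sketch below illustrates that property with the standard library only: it regroups arbitrary in-memory chunks so that each output chunk ends on a line boundary. The function `regroup_into_line_chunks` is hypothetical and is not how the crate implements stream_line_chunks. In particular, emitting a final unterminated line as its own chunk is an assumption of this sketch.

```rust
/// Regroup arbitrary byte chunks so that every emitted chunk ends on a line boundary.
/// A trailing partial line (no final `\n`) is emitted as the last chunk.
fn regroup_into_line_chunks(chunks: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    // Bytes belonging to a line that has not been completed yet.
    let mut carry: Vec<u8> = Vec::new();
    for chunk in chunks {
        carry.extend_from_slice(chunk);
        // Everything up to and including the last newline is whole lines.
        if let Some(pos) = carry.iter().rposition(|&b| b == b'\n') {
            let rest = carry.split_off(pos + 1);
            // Emit the whole lines and keep the unfinished tail for the next chunk.
            out.push(std::mem::replace(&mut carry, rest));
        }
    }
    if !carry.is_empty() {
        out.push(carry);
    }
    out
}
```

Working with whole-line chunks lets downstream code split on `\n` without tracking partial lines between chunks.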
Additional methods:
Method | Description |
---|---|
clone | Clone the CloudFile instance. Efficient by design. |
set_extension | Change the CloudFile ’s file extension (in place). |
§Low-Level CloudFile Methods
Method | Description |
---|---|
get | Call the object_store crate’s get method. |
get_opts | Call the object_store crate’s get_opts method. |
§Lowest-Level CloudFile Methods
You can call any method from the object_store crate. For example, here we use head to get a file's metadata, including its last_modified time.
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bed";
let cloud_file = CloudFile::new(url)?;
let meta = cloud_file.cloud_service.head(&cloud_file.store_path).await?;
let last_modified = meta.last_modified;
println!("last_modified: {}", last_modified);
assert_eq!(meta.size, 303);
Re-exports§
pub use object_store::path::Path as StorePath;
Structs§
- CloudFile - The main struct representing the location of a file in the cloud.
- DynObjectStore - Wraps Box<dyn ObjectStore> for easier usage. An ObjectStore, from the powerful object_store crate, represents a cloud service.
Enums§
- CloudFileError - The error type for CloudFile methods.
Constants§
- EMPTY_OPTIONS - An empty set of cloud options
Functions§
- abs_path_to_url_string - Given a local file’s absolute path, return a URL string to that file.