Crate cloud_file

§cloud-file

Simple reading of cloud files in Rust

§Highlights

§Install

cargo add cloud-file

§Examples

The snippets below use .await and ?, so they must run inside an async function that returns a Result. The surrounding runtime setup is omitted.

Find the size of a cloud file.

use cloud_file::CloudFile;

let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/toydata.5chrom.fam";
let cloud_file = CloudFile::new(url)?;
let file_size = cloud_file.read_file_size().await?;
assert_eq!(file_size, 14_361);

Find the number of lines in a cloud file.

use cloud_file::CloudFile;
use futures::StreamExt; // Enables `.next()` on streams.

let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/toydata.5chrom.fam";
let cloud_file = CloudFile::new_with_options(url, [("timeout", "30s")])?;
let mut chunks = cloud_file.stream_chunks().await?;
let mut newline_count: usize = 0;
while let Some(chunk) = chunks.next().await {
    let chunk = chunk?;
    newline_count += bytecount::count(&chunk, b'\n');
}
assert_eq!(newline_count, 500);
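
The loop above works because a newline is a single byte, so per-chunk counts simply add up no matter where the chunk boundaries fall. A minimal std-only sketch of that idea follows. It makes two assumptions for illustration: in-memory slices stand in for the downloaded chunks, and a plain iterator filter stands in for `bytecount::count`. The helper name `count_newlines` is hypothetical and not part of cloud-file.

```rust
/// Count `\n` bytes across a sequence of byte chunks.
/// Hypothetical helper for illustration; not part of cloud-file.
fn count_newlines<'a>(chunks: impl IntoIterator<Item = &'a [u8]>) -> usize {
    chunks
        .into_iter()
        .map(|chunk| chunk.iter().filter(|&&b| b == b'\n').count())
        .sum()
}

fn main() {
    let whole: &[u8] = b"a 1\nb 2\nc 3\n";
    // The same bytes, split at arbitrary points, as a stream might deliver them.
    let split: [&[u8]; 3] = [b"a 1\nb", b" 2", b"\nc 3\n"];

    // Chunk boundaries do not change the total.
    assert_eq!(count_newlines([whole]), 3);
    assert_eq!(count_newlines(split), 3);
}
```

Note that this counts newline bytes, so a final line without a trailing newline is not counted.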

§More examples

| Example | Demonstrates |
| --- | --- |
| line_count | Read a file as binary chunks. |
| nth_line | Read a file as text lines. |
| bigram_counts | Read random regions of a file, without regard to order. |
| aws_file_size | Find the size of a file on AWS. |

§Main Functions

| Function | Description |
| --- | --- |
| CloudFile::new | Use a URL string to specify a cloud file for reading. |
| CloudFile::new_with_options | Use a URL string and string options to specify a cloud file for reading. |

§URLs

| Cloud Service | Example |
| --- | --- |
| HTTP | https://www.gutenberg.org/cache/epub/100/pg100.txt |
| local file | file:///M:/data%20files/small.bed |
| AWS S3 | s3://bedreader/v1/toydata.5chrom.bed |

Note: For local files, use the abs_path_to_url_string function to encode the path correctly as a URL.
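
To see why encoding matters, compare the path `M:/data files/small.bed` with the URL in the table above: the space has to become `%20`. The sketch below illustrates that kind of percent-encoding with the standard library only. The function `file_url_sketch` is hypothetical and is not the crate's `abs_path_to_url_string`. It assumes an absolute path that already uses forward slashes, and it ignores cases a real implementation has to handle, such as backslashes and relative paths.

```rust
/// Percent-encode an absolute, forward-slash path into a `file://` URL.
/// Hypothetical illustration only; not the crate's `abs_path_to_url_string`.
fn file_url_sketch(abs_path: &str) -> String {
    let mut url = String::from("file://");
    if !abs_path.starts_with('/') {
        // Windows drive paths such as "M:/..." need a leading slash.
        url.push('/');
    }
    for byte in abs_path.bytes() {
        match byte {
            // Left as-is: unreserved characters plus '/' and ':'.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9'
            | b'-' | b'.' | b'_' | b'~' | b'/' | b':' => url.push(byte as char),
            // Everything else (spaces, non-ASCII bytes) becomes %XX.
            _ => url.push_str(&format!("%{:02X}", byte)),
        }
    }
    url
}

fn main() {
    assert_eq!(
        file_url_sketch("M:/data files/small.bed"),
        "file:///M:/data%20files/small.bed"
    );
    assert_eq!(file_url_sketch("/tmp/a b.txt"), "file:///tmp/a%20b.txt");
}
```

In real code, prefer the crate's own function, which this sketch only approximates.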

§Options

| Cloud Service | Details | Example |
| --- | --- | --- |
| HTTP | ClientConfigKey | [("timeout", "30s")] |
| local file | none | |
| AWS S3 | AmazonS3ConfigKey | [("aws_region", "us-west-2"), ("aws_access_key_id", ...), ("aws_secret_access_key", ...)] |
| Azure | AzureConfigKey | |
| Google | GoogleConfigKey | |

§High-Level CloudFile Methods

| Method | Retrieves |
| --- | --- |
| stream_chunks | File contents as a stream of Bytes |
| stream_line_chunks | File contents as a stream of Bytes, each containing one or more whole lines |
| read_all | Whole file contents as an in-memory Bytes |
| read_range | Bytes from a specified range |
| read_ranges | Vec of Bytes from specified ranges |
| read_range_and_file_size | Bytes from a specified range & the file’s size |
| read_file_size | Size of the file |
| count_lines | Number of lines in the file |

Additional methods:

| Method | Description |
| --- | --- |
| clone | Clone the CloudFile instance. Efficient by design. |
| set_extension | Change the CloudFile’s file extension (in place). |
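
The difference between stream_chunks and stream_line_chunks is where chunks may end: the former can split a line across two chunks, while the latter yields only whole lines. The std-only sketch below shows one way such regrouping could be done over in-memory slices. It is an assumption-laden illustration, not the crate's implementation, and the name `to_line_chunks` is hypothetical.

```rust
/// Regroup arbitrary byte chunks so every output chunk ends on a line boundary.
/// Hypothetical sketch; cloud-file's `stream_line_chunks` may work differently.
fn to_line_chunks<'a>(chunks: impl IntoIterator<Item = &'a [u8]>) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    // Bytes of a partial line still waiting for its '\n'.
    let mut carry: Vec<u8> = Vec::new();
    for chunk in chunks {
        carry.extend_from_slice(chunk);
        // Emit everything up to and including the last newline seen so far.
        let last_newline = carry.iter().rposition(|&b| b == b'\n');
        if let Some(pos) = last_newline {
            let rest = carry.split_off(pos + 1);
            out.push(std::mem::replace(&mut carry, rest));
        }
    }
    if !carry.is_empty() {
        // Final line without a trailing newline.
        out.push(carry);
    }
    out
}

fn main() {
    // "cdef" is split across three input chunks but comes out whole.
    let chunks: [&[u8]; 3] = [b"ab\ncd", b"e", b"f\ngh\nij"];
    let line_chunks = to_line_chunks(chunks);
    assert_eq!(
        line_chunks,
        vec![b"ab\n".to_vec(), b"cdef\ngh\n".to_vec(), b"ij".to_vec()]
    );
}
```

With whole-line chunks, text processing such as splitting on newlines can be done per chunk without tracking partial lines between chunks.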

§Low-Level CloudFile Methods

| Method | Description |
| --- | --- |
| get | Call the object_store crate’s get method. |
| get_opts | Call the object_store crate’s get_opts method. |

§Lowest-Level CloudFile Methods

You can call any method from the object_store crate through the CloudFile’s cloud_service and store_path fields. For example, here we use head to get a file’s metadata, including its size and last_modified time.

use cloud_file::CloudFile;

let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bed";
let cloud_file = CloudFile::new(url)?;
let meta = cloud_file.cloud_service.head(&cloud_file.store_path).await?;
let last_modified = meta.last_modified;
println!("last_modified: {}", last_modified);
assert_eq!(meta.size, 303);

§Re-exports

  • pub use object_store::path::Path as StorePath;
