pub struct CloudFile {
pub cloud_service: Arc<DynObjectStore>,
pub store_path: StorePath,
}
The main struct representing the location of a file in the cloud.
It is constructed with CloudFile::new. It is, by design, cheap to clone.
Internally, it stores two pieces of information: the file’s cloud service and the path to the file on that service.
§Examples
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bed";
let cloud_file = CloudFile::new(url)?;
assert_eq!(cloud_file.read_file_size().await?, 303);
Fields§
cloud_service: Arc<DynObjectStore>
A cloud service, for example, Http, AWS S3, Azure, the local file system, etc.
Under the covers, it is an Arc-wrapped DynObjectStore. The DynObjectStore, in turn, holds an ObjectStore from the powerful object_store crate.
store_path: StorePath
A path to a file on the cloud service.
Under the covers, StorePath is an alias for a Path in the object_store crate.
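The "cheap to clone" property follows from the Arc wrapping: cloning copies a pointer and bumps a reference count rather than duplicating the service. The following is a minimal std-only sketch of that idea. The `Store` trait, `LocalStore`, and `FileLocation` are hypothetical stand-ins invented for illustration; they are not types from cloud_file or object_store.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an object store; not the object_store trait.
trait Store: Send + Sync {
    fn name(&self) -> &str;
}

struct LocalStore;

impl Store for LocalStore {
    fn name(&self) -> &str {
        "local"
    }
}

// Hypothetical analogue of CloudFile: a shared service plus a path.
#[derive(Clone)]
struct FileLocation {
    service: Arc<dyn Store>,
    path: String,
}

// Build a location, clone it, and report how many handles share the service.
fn shared_handles_after_clone() -> usize {
    let original = FileLocation {
        service: Arc::new(LocalStore),
        path: "data/sample.bed".to_string(),
    };
    let copy = original.clone();
    // Both values point at the same service allocation; only the path is copied.
    assert!(Arc::ptr_eq(&original.service, &copy.service));
    assert_eq!(copy.service.name(), "local");
    assert_eq!(copy.path, original.path);
    Arc::strong_count(&original.service)
}
```

In this sketch, the clone costs one reference-count increment plus a copy of the path string, regardless of how heavy the underlying service is.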
Implementations§
impl CloudFile
pub fn from_structs(store: impl ObjectStore, store_path: StorePath) -> Self
Create a new CloudFile from an ObjectStore and an object_store::path::Path.
§Example
use cloud_file::CloudFile;
use object_store::{http::HttpBuilder, path::Path as StorePath, ClientOptions};
use std::time::Duration;
let client_options = ClientOptions::new().with_timeout(Duration::from_secs(30));
let http = HttpBuilder::new()
    .with_url("https://raw.githubusercontent.com")
    .with_client_options(client_options)
    .build()?;
let store_path = StorePath::parse("fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bed")?;
let cloud_file = CloudFile::from_structs(http, store_path);
assert_eq!(cloud_file.read_file_size().await?, 303);
pub fn new_with_options<I, K, V>(
    location: impl AsRef<str>,
    options: I,
) -> Result<CloudFile, CloudFileError>
Create a new CloudFile from a URL string and options.
§Example
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bed";
let cloud_file = CloudFile::new_with_options(url, [("timeout", "30s")])?;
assert_eq!(cloud_file.read_file_size().await?, 303);
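The call above passes options as an array of string pairs. As a rough illustration of how a generic function can accept such input, here is a std-only sketch. The helper `collect_options` and its trait bounds are assumptions made for this example; the page does not show the actual bounds on `I`, `K`, and `V`, and this is not the crate's implementation.

```rust
// Hypothetical helper: normalize any iterable of key/value pairs into owned strings.
// The trait bounds here are assumptions, not the bounds of `new_with_options`.
fn collect_options<I, K, V>(options: I) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (K, V)>,
    K: AsRef<str>,
    V: Into<String>,
{
    options
        .into_iter()
        .map(|(key, value)| (key.as_ref().to_string(), value.into()))
        .collect()
}
```

With bounds like these, both `[("timeout", "30s")]` and a `Vec<(String, String)>` are accepted without conversion at the call site.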
pub async fn count_lines(&self) -> Result<usize, CloudFileError>
Count the lines in a file stored in the cloud.
§Example
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.fam";
let cloud_file = CloudFile::new(url)?;
assert_eq!(cloud_file.count_lines().await?, 10);
pub async fn read_file_size(&self) -> Result<usize, CloudFileError>
Return the size of a file stored in the cloud.
§Example
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bed";
let cloud_file = CloudFile::new(url)?;
assert_eq!(cloud_file.read_file_size().await?, 303);
pub async fn read_range(
    &self,
    range: Range<usize>,
) -> Result<Bytes, CloudFileError>
Return the Bytes from a specified range.
§Example
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bim";
let cloud_file = CloudFile::new(url)?;
let bytes = cloud_file.read_range(0..10).await?;
assert_eq!(bytes.as_ref(), b"1\t1:1:A:C\t");
pub async fn read_ranges(
    &self,
    ranges: &[Range<usize>],
) -> Result<Vec<Bytes>, CloudFileError>
Return the Vec of Bytes from specified ranges.
§Example
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bim";
let cloud_file = CloudFile::new(url)?;
let bytes_vec = cloud_file.read_ranges(&[0..10, 1000..1010]).await?;
assert_eq!(bytes_vec.len(), 2);
assert_eq!(bytes_vec[0].as_ref(), b"1\t1:1:A:C\t");
assert_eq!(bytes_vec[1].as_ref(), b":A:C\t0.0\t4");
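The ranges in these examples are ordinary half-open `Range<usize>` values, so `0..10` covers ten bytes at offsets 0 through 9. The sketch below shows the same slicing on an in-memory buffer using only std. The function `slice_ranges` is a hypothetical local analogue, not part of cloud_file, and it says nothing about how the crate handles out-of-bounds requests.

```rust
use std::ops::Range;

// Hypothetical in-memory analogue: slice several half-open ranges out of a buffer.
// Returns None if any range is reversed or extends past the end of the data.
fn slice_ranges<'a>(data: &'a [u8], ranges: &[Range<usize>]) -> Option<Vec<&'a [u8]>> {
    ranges
        .iter()
        .map(|range| data.get(range.clone()))
        .collect()
}
```

For a buffer that begins with `b"1\t1:1:A:C\t"`, calling `slice_ranges(data, &[0..10])` returns exactly those ten bytes.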
pub async fn get_opts(
    &self,
    get_options: GetOptions,
) -> Result<GetResult, CloudFileError>
Call the object_store crate’s get_opts method.
You can, for example, in one call retrieve a range of bytes from the file and the file’s metadata. The result is a GetResult.
§Example
In one call, read the first three bytes of a genomic data file and get the size of the file. Check that the file starts with the expected file signature.
use cloud_file::CloudFile;
use object_store::{GetRange, GetOptions};
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bed";
let cloud_file = CloudFile::new(url)?;
let get_options = GetOptions {
    range: Some(GetRange::Bounded(0..3)),
    ..Default::default()
};
let get_result = cloud_file.get_opts(get_options).await?;
let size: usize = get_result.meta.size;
let bytes = get_result
    .bytes()
    .await?;
assert_eq!(bytes.len(), 3);
assert_eq!(bytes[0], 0x6c);
assert_eq!(bytes[1], 0x1b);
assert_eq!(bytes[2], 0x01);
assert_eq!(size, 303);
pub async fn read_range_and_file_size(
    &self,
    range: Range<usize>,
) -> Result<(Bytes, usize), CloudFileError>
Retrieve the Bytes from a specified range and the file’s size.
§Example
In one call, read the first three bytes of a genomic data file and get the size of the file. Check that the file starts with the expected file signature.
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bed";
let cloud_file = CloudFile::new(url)?;
let (bytes, size) = cloud_file.read_range_and_file_size(0..3).await?;
assert_eq!(bytes.len(), 3);
assert_eq!(bytes[0], 0x6c);
assert_eq!(bytes[1], 0x1b);
assert_eq!(bytes[2], 0x01);
assert_eq!(size, 303);
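The three byte-by-byte assertions in the example can be folded into a single prefix check. This is a small std-only sketch; the constant `BED_SIGNATURE` and the helper `has_bed_signature` are hypothetical names introduced here, using the three byte values from the example.

```rust
// The three leading bytes that the example checks for.
const BED_SIGNATURE: [u8; 3] = [0x6c, 0x1b, 0x01];

// Hypothetical helper: true if the buffer begins with the expected signature.
fn has_bed_signature(bytes: &[u8]) -> bool {
    bytes.starts_with(&BED_SIGNATURE)
}
```

A prefix check of this kind also returns `false` for buffers shorter than three bytes instead of panicking on an out-of-range index.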
pub async fn get(&self) -> Result<GetResult, CloudFileError>
Call the object_store crate’s get method.
The result is a GetResult which can, for example, be converted into a stream of bytes.
§Example
Do a ‘get’, turn the result into a stream, then scan all the bytes of the file for the newline character.
use cloud_file::CloudFile;
use futures_util::StreamExt; // Enables `.next()` on streams.
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/toydata.5chrom.fam";
let cloud_file = CloudFile::new_with_options(url, [("timeout", "30s")])?;
let mut stream = cloud_file.get().await?.into_stream();
let mut newline_count: usize = 0;
while let Some(bytes) = stream.next().await {
    let bytes = bytes?;
    newline_count += bytecount::count(&bytes, b'\n');
}
assert_eq!(newline_count, 500);
pub async fn read_all(&self) -> Result<Bytes, CloudFileError>
Read the whole file into an in-memory Bytes.
§Example
Read the whole file, then scan all the bytes of the file for the newline character.
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/toydata.5chrom.fam";
let cloud_file = CloudFile::new_with_options(url, [("timeout", "30s")])?;
let all = cloud_file.read_all().await?;
let newline_count = bytecount::count(&all, b'\n');
assert_eq!(newline_count, 500);
pub async fn stream_chunks(
    &self,
) -> Result<BoxStream<'static, Result<Bytes>>, CloudFileError>
Retrieve the file’s contents as a stream of Bytes.
§Example
Open the file as a stream of bytes, then scan all the bytes for the newline character.
use cloud_file::CloudFile;
use futures::StreamExt; // Enables `.next()` on streams.
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/toydata.5chrom.fam";
let cloud_file = CloudFile::new_with_options(url, [("timeout", "30s")])?;
let mut chunks = cloud_file.stream_chunks().await?;
let mut newline_count: usize = 0;
while let Some(chunk) = chunks.next().await {
    let chunk = chunk?;
    newline_count += bytecount::count(&chunk, b'\n');
}
assert_eq!(newline_count, 500);
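Counting newlines chunk by chunk gives the same total as counting over the whole file, because each newline is a single byte and so cannot be split across a chunk boundary. The following std-only sketch mirrors the loop above on in-memory slices. The function `count_newlines` is a hypothetical helper, and it uses a plain iterator filter in place of the `bytecount` crate.

```rust
// Count b'\n' bytes across a sequence of chunks without joining them.
fn count_newlines<'a, I>(chunks: I) -> usize
where
    I: IntoIterator<Item = &'a [u8]>,
{
    chunks
        .into_iter()
        .map(|chunk| chunk.iter().filter(|&&byte| byte == b'\n').count())
        .sum()
}
```

The total does not depend on where the chunk boundaries fall, so the stream can deliver chunks of any size.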
pub async fn stream_line_chunks(
    &self,
) -> Result<BoxStream<'static, Result<Bytes>>, CloudFileError>
Retrieve the file’s contents as a stream of Bytes, each containing one or more whole lines.
§Example
Return the line at zero-based index 12 of a file (the 13th line).
use cloud_file::CloudFile;
use futures::StreamExt; // Enables `.next()` on streams.
use std::str::from_utf8;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/toydata.5chrom.fam";
let goal_index = 12;
let cloud_file = CloudFile::new(url)?;
let mut line_chunks = cloud_file.stream_line_chunks().await?;
let mut index_iter = 0..;
let mut goal_line = None;
'outer_loop: while let Some(line_chunk) = line_chunks.next().await {
    let line_chunk = line_chunk?;
    let lines = from_utf8(&line_chunk)?.lines();
    for line in lines {
        let index = index_iter.next().unwrap(); // Safe because the iterator is infinite
        if index == goal_index {
            goal_line = Some(line.to_string());
            break 'outer_loop;
        }
    }
}
assert_eq!(goal_line, Some("per12 per12 0 0 2 -0.0382707".to_string()));
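To make "each containing one or more whole lines" concrete, here is one way arbitrary chunks could be regrouped on line boundaries using only std. This is a sketch under stated assumptions, not the crate's implementation. The type `LineChunker` and its methods are hypothetical, and how `stream_line_chunks` treats a final line without a trailing newline is not stated on this page.

```rust
// Hypothetical helper that regroups arbitrary byte chunks so that every
// emitted chunk ends on a line boundary.
#[derive(Default)]
struct LineChunker {
    // Bytes after the most recent newline, waiting for the rest of their line.
    pending: Vec<u8>,
}

impl LineChunker {
    // Feed one raw chunk; returns the whole lines now available, if any.
    fn push(&mut self, chunk: &[u8]) -> Option<Vec<u8>> {
        self.pending.extend_from_slice(chunk);
        // Position just past the last newline seen so far.
        let split_at = self.pending.iter().rposition(|&byte| byte == b'\n')? + 1;
        // Keep the incomplete tail for later and hand back the complete lines.
        let remainder = self.pending.split_off(split_at);
        Some(std::mem::replace(&mut self.pending, remainder))
    }

    // Call once at end of input; returns a final line that lacks a newline.
    fn finish(self) -> Option<Vec<u8>> {
        if self.pending.is_empty() {
            None
        } else {
            Some(self.pending)
        }
    }
}
```

Because every emitted chunk ends at a newline, a consumer can call `from_utf8(...)?.lines()` on each chunk independently, as the example above does, without a line being split in two.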
pub fn set_extension(&mut self, extension: &str) -> Result<(), CloudFileError>
Change the CloudFile’s extension (in place).
It removes the current extension, if any. It appends the given extension, if any.
The method is in-place rather than functional to make it consistent with std::path::PathBuf::set_extension.
§Example
use cloud_file::CloudFile;
let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/plink_sim_10s_100v_10pmiss.bed";
let mut cloud_file = CloudFile::new(url)?;
assert_eq!(cloud_file.read_file_size().await?, 303);
cloud_file.set_extension("fam")?;
assert_eq!(cloud_file.read_file_size().await?, 130);
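For comparison, this is how the standard-library method that `set_extension` is modeled on behaves. The sketch uses only `std::path::PathBuf`; the wrapper `replace_extension` is a hypothetical helper, and the paths are illustrative local strings rather than cloud locations.

```rust
use std::path::PathBuf;

// Apply std's in-place `set_extension` to a path string and return the result.
fn replace_extension(path: &str, extension: &str) -> String {
    let mut path_buf = PathBuf::from(path);
    // Mutates `path_buf` in place; returns false only if there is no file name.
    path_buf.set_extension(extension);
    path_buf.to_string_lossy().into_owned()
}
```

Passing an empty extension removes the existing one, and passing an extension to a path that has none appends it, which matches the "removes the current extension, if any" and "appends the given extension, if any" wording above.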
Trait Implementations§
Auto Trait Implementations§
impl Freeze for CloudFile
impl !RefUnwindSafe for CloudFile
impl Send for CloudFile
impl Sync for CloudFile
impl Unpin for CloudFile
impl !UnwindSafe for CloudFile
Blanket Implementations§
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where
T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.