Module opendal::docs::rfcs::rfc_1398_query_based_metadata
- Proposal Name: query_based_metadata
- Start Date: 2022-02-22
- RFC PR: datafuselabs/opendal#1398
- Tracking Issue: datafuselabs/opendal#1398
Summary
Read cached metadata based on the user's query.
Motivation
OpenDAL currently has a native metadata cache:
let _ = o.metadata().await?;
// This call doesn't need to send a request.
let _ = o.metadata().await?;
Also, OpenDAL can reuse metadata from list or scan:
let mut ds = o.scan().await?;
while let Some(de) = ds.try_next().await? {
    // This call doesn't need to send a request (if we are lucky enough).
    let _ = de.metadata().await?;
}
By reusing metadata from list or scan, we can avoid the extra stat call for each object. In our real use cases, this reduced the time to calculate the total length of a directory with 6k files from 4 minutes to 2 seconds.
However, metadata can only be cached as a whole. If a service returns more metadata in stat than in list, we can't mark the metadata as cacheable. And if services add more metadata, we could inadvertently introduce a performance degradation.
The Object Metadataer RFC intended to add ObjectMetadataer to address this issue, but it soon proved to be a failure: it's hard to design a correct API.
Users have to write code like the following:
let om = o.metadata().await?;
let _ = om.content_length().await?;
let _ = om.content_md5().await?;
In this RFC, we will add query-based metadata.
Guide-level explanation
After this RFC, o.metadata() will accept a query composed of ObjectMetadataKey values.
To query already cached metadata:
let meta = op.object("test").metadata(None).await?;
let _ = meta.content_length();
let _ = meta.content_type();
To query content length and content type:
let meta = op
    .object("test")
    .metadata({
        use ObjectMetadataKey::*;
        ContentLength | ContentType
    })
    .await?;
let _ = meta.content_length();
let _ = meta.content_type();
To query all metadata about this object:
let meta = op
    .object("test")
    .metadata({ ObjectMetadataKey::Complete })
    .await?;
let _ = meta.content_length();
let _ = meta.content_type();
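One way to picture how ContentLength | ContentType becomes a query: each key owns one bit, and | merges keys into a set. The sketch below uses only the standard library. Key and KeySet are hypothetical stand-ins for ObjectMetadataKey and its flag-set type, and the bit layout is an assumption rather than OpenDAL's actual definition.

```rust
use std::ops::BitOr;

/// Hypothetical stand-in for `ObjectMetadataKey`: one bit per metadata field.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum Key {
    ContentLength = 1 << 0,
    ContentType = 1 << 1,
    ContentMd5 = 1 << 2,
}

/// Hypothetical set of keys, stored as a bit mask.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct KeySet(u8);

impl From<Key> for KeySet {
    fn from(key: Key) -> Self {
        KeySet(key as u8)
    }
}

/// `Key | Key` builds a set, so callers can write `ContentLength | ContentType`.
impl BitOr for Key {
    type Output = KeySet;
    fn bitor(self, rhs: Key) -> KeySet {
        KeySet(self as u8 | rhs as u8)
    }
}

/// `KeySet | Key` extends an existing set with one more key.
impl BitOr<Key> for KeySet {
    type Output = KeySet;
    fn bitor(self, rhs: Key) -> KeySet {
        KeySet(self.0 | rhs as u8)
    }
}

impl KeySet {
    /// True when every key in `other` is also present in `self`.
    fn contains(self, other: impl Into<KeySet>) -> bool {
        let other = other.into();
        (self.0 & other.0) == other.0
    }
}

/// Builds the same query as the guide-level example above.
fn length_and_type() -> KeySet {
    use Key::*;
    ContentLength | ContentType
}
```

Because contains accepts anything convertible into a KeySet, the same check works for a single key and for a combined query.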
Reference-level explanation
We will store bits in ObjectMetadata to record which fields have been set. We can then compare the bits to decide whether we need to query the storage again:
pub async fn metadata(
    &mut self,
    flags: impl Into<FlagSet<ObjectMetadataKey>>,
) -> Result<Arc<ObjectMetadata>> {
    if let Some(meta) = &self.meta {
        if meta.bit().contains(flags) || meta.bit().contains(ObjectMetadataKey::Complete) {
            return Ok(meta.clone());
        }
    }
    let meta = Arc::new(self.stat().await?);
    self.meta = Some(meta.clone());
    Ok(meta)
}
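The cache decision above can be tried out with a small synchronous sketch. It is a simplified model, not OpenDAL's implementation: Meta, Object, from_list, stat_calls and the bit constants are hypothetical, stat is a local stand-in for the remote request, and it is assumed to return complete metadata.

```rust
use std::sync::Arc;

// Hypothetical bit layout; the real `ObjectMetadataKey` values may differ.
const CONTENT_LENGTH: u8 = 1 << 0;
const CONTENT_TYPE: u8 = 1 << 1;
const COMPLETE: u8 = 1 << 7;

/// Simplified metadata: `bits` records which fields have been set.
#[derive(Debug)]
struct Meta {
    bits: u8,
    content_length: Option<u64>,
    content_type: Option<String>,
}

/// Simplified object with a cached metadata slot and a counter of simulated requests.
struct Object {
    len: u64,
    meta: Option<Arc<Meta>>,
    stat_calls: u32,
}

impl Object {
    /// Builds an object the way `list` might: only the content length is known.
    fn from_list(len: u64) -> Self {
        let meta = Meta { bits: CONTENT_LENGTH, content_length: Some(len), content_type: None };
        Object { len, meta: Some(Arc::new(meta)), stat_calls: 0 }
    }

    /// Stand-in for a remote `stat` request; assumed to return every field.
    fn stat(&mut self) -> Meta {
        self.stat_calls += 1;
        Meta {
            bits: CONTENT_LENGTH | CONTENT_TYPE | COMPLETE,
            content_length: Some(self.len),
            content_type: Some("text/plain".to_string()),
        }
    }

    /// Returns cached metadata when it covers `flags`, otherwise refreshes it via `stat`.
    fn metadata(&mut self, flags: u8) -> Arc<Meta> {
        if let Some(meta) = &self.meta {
            let covers_query = (meta.bits & flags) == flags;
            let is_complete = (meta.bits & COMPLETE) != 0;
            if covers_query || is_complete {
                return Arc::clone(meta);
            }
        }
        let meta = Arc::new(self.stat());
        self.meta = Some(Arc::clone(&meta));
        meta
    }
}
```

In this model, asking a listed object for its content length is served from the cache, asking for the content type triggers one simulated stat, and every later query, including COMPLETE, is served from the refreshed cache.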
Drawbacks
Breaking changes
After this RFC, Object::metadata() will accept a query, and all existing users will need to adapt their code accordingly.
Rationale and alternatives
None
Prior art
None
Unresolved questions
None
Future possibilities
None