Summary

Add multipart support in OpenDAL.

Motivation

Multipart upload APIs are widely used in object storage services to upload large files concurrently and to resume interrupted uploads.

A successful multipart upload includes the following steps:

  • CreateMultipartUpload: Start a new multipart upload.
  • UploadPart: Upload a single part under the upload id returned by CreateMultipartUpload.
  • CompleteMultipartUpload: Complete a multipart upload to get a regular object.

To cancel a multipart upload, users need to call AbortMultipartUpload.

Apart from those APIs, most object storage services also provide list APIs to inspect the status of ongoing multipart uploads:

  • ListMultipartUploads: List current ongoing multipart uploads.
  • ListParts: List already uploaded parts.

Before CompleteMultipartUpload has been called, users can’t read already uploaded parts.

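After CompleteMultipartUpload or AbortMultipartUpload has been called, the uploaded parts no longer exist as separate parts: on completion they are assembled into the final object, and on abort they are discarded.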

Object storage services commonly allow up to 10,000 parts per upload, and each part can be up to 5 GiB. This way, users can upload a single file of up to about 48.8 TiB (10,000 × 5 GiB).
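The size limit above is plain arithmetic, and a small sketch makes the unit conversion explicit. The function names here are illustrative only and are not part of OpenDAL.

```rust
/// Bytes in one GiB and one TiB (binary units).
const GIB: u64 = 1 << 30;
const TIB: u64 = 1 << 40;

/// Maximum object size in bytes for a given part-count and part-size limit.
fn max_object_size(max_parts: u64, max_part_size: u64) -> u64 {
    max_parts * max_part_size
}

/// The commonly quoted limits: 10,000 parts of up to 5 GiB each, in TiB.
fn quoted_limit_in_tib() -> f64 {
    max_object_size(10_000, 5 * GIB) as f64 / TIB as f64
}
```

Here `quoted_limit_in_tib()` evaluates to 50,000 / 1,024 = 48.828125, which rounds to the 48.8 TiB figure.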

With multipart upload support, OpenDAL users can upload objects larger than 5 GiB.

Guide-level explanation

Users can start a multipart upload via:

let mp = op.object("path/to/file").create_multipart().await?;

Or build a Multipart from an already known upload id:

let mp = op.object("path/to/file").into_multipart("<upload_id>");

With Multipart, we can upload a new part:

let part = mp.write(part_number, content).await?;
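Before calling `write`, a caller typically has to cut the payload into numbered parts. The helper below is a minimal sketch of that step. `split_into_parts` is a hypothetical name, and numbering parts from 1 is an assumption borrowed from common object storage services, not something this RFC specifies.

```rust
/// Split `data` into `(part_number, bytes)` pairs of at most `part_size` bytes.
/// Part numbers start at 1 here; that numbering is an assumption.
fn split_into_parts(data: &[u8], part_size: usize) -> Vec<(usize, &[u8])> {
    assert!(part_size > 0, "part_size must be non-zero");
    data.chunks(part_size)
        .enumerate()
        .map(|(i, chunk)| (i + 1, chunk))
        .collect()
}
```

Each resulting pair could then be passed to `mp.write(part_number, content)`. Only the last part may be shorter than `part_size`.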

After all parts have been uploaded, we can finish this upload:

let _ = mp.complete(parts).await?;

Or we can abort the upload and discard the parts uploaded so far:

let _ = mp.abort().await?;
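To make the lifecycle behind these calls concrete, here is a minimal synchronous model of a storage service. `MemoryStore` and all of its methods are hypothetical stand-ins, not OpenDAL APIs. As a simplification, `complete` assembles every part stored for the upload instead of taking an explicit part list.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical in-memory stand-in for an object storage service.
#[derive(Default)]
struct MemoryStore {
    /// Completed, readable objects keyed by path.
    objects: HashMap<String, Vec<u8>>,
    /// Ongoing uploads keyed by upload id: (path, parts keyed by part number).
    uploads: HashMap<String, (String, BTreeMap<usize, Vec<u8>>)>,
    next_id: u64,
}

impl MemoryStore {
    /// CreateMultipartUpload: register a new upload and return its id.
    fn create_multipart(&mut self, path: &str) -> String {
        self.next_id += 1;
        let id = format!("upload-{}", self.next_id);
        self.uploads
            .insert(id.clone(), (path.to_string(), BTreeMap::new()));
        id
    }

    /// UploadPart: store one part under an existing upload id.
    fn write_part(&mut self, id: &str, part_number: usize, content: &[u8]) -> Result<(), String> {
        let (_, parts) = self
            .uploads
            .get_mut(id)
            .ok_or_else(|| format!("unknown upload id: {id}"))?;
        parts.insert(part_number, content.to_vec());
        Ok(())
    }

    /// CompleteMultipartUpload: join parts in part-number order into an object.
    fn complete(&mut self, id: &str) -> Result<(), String> {
        let (path, parts) = self
            .uploads
            .remove(id)
            .ok_or_else(|| format!("unknown upload id: {id}"))?;
        let object: Vec<u8> = parts.into_values().flatten().collect();
        self.objects.insert(path, object);
        Ok(())
    }

    /// AbortMultipartUpload: drop the upload and every part stored for it.
    fn abort(&mut self, id: &str) -> Result<(), String> {
        self.uploads
            .remove(id)
            .map(|_| ())
            .ok_or_else(|| format!("unknown upload id: {id}"))
    }

    /// Only completed objects are readable; pending parts are not.
    fn read(&self, path: &str) -> Option<&[u8]> {
        self.objects.get(path).map(|v| v.as_slice())
    }
}
```

In this model, parts may arrive in any order, the object becomes readable only after `complete`, and the upload id stops being valid after either `complete` or `abort`.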

Reference-level explanation

Accessor will add the following APIs:

pub trait Accessor: Send + Sync + Debug {
    async fn create_multipart(&self, args: &OpCreateMultipart) -> Result<String> {
        let _ = args;
        unimplemented!()
    }

    async fn write_multipart(&self, args: &OpWriteMultipart) -> Result<PartWriter> {
        let _ = args;
        unimplemented!()
    }

    async fn complete_multipart(&self, args: &OpCompleteMultipart) -> Result<()> {
        let _ = args;
        unimplemented!()
    }

    async fn abort_multipart(&self, args: &OpAbortMultipart) -> Result<()> {
        let _ = args;
        unimplemented!()
    }
}

When a PartWriter is closed, a Part is generated.
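The relationship between a writer and the part it produces can be sketched as follows. This is a synchronous simplification under stated assumptions: the fields of `Part`, the in-memory buffer, and the `close` signature are all hypothetical, and the real `PartWriter` in this proposal is obtained through an async API.

```rust
use std::io::{self, Write};

/// Hypothetical description of an uploaded part.
#[derive(Debug, PartialEq)]
struct Part {
    part_number: usize,
    size: u64,
}

/// Hypothetical writer that buffers the bytes of a single part.
struct PartWriter {
    part_number: usize,
    buf: Vec<u8>,
}

impl PartWriter {
    fn new(part_number: usize) -> Self {
        PartWriter { part_number, buf: Vec::new() }
    }

    /// Closing consumes the writer, so nothing can be written afterwards,
    /// and yields the `Part` that describes what was written.
    fn close(self) -> Part {
        Part {
            part_number: self.part_number,
            size: self.buf.len() as u64,
        }
    }
}

impl Write for PartWriter {
    fn write(&mut self, bytes: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(bytes);
        Ok(bytes.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Taking `self` by value in `close` lets the compiler reject any write after the part has been finalized.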

Operator will build APIs based on Accessor:

impl Object {
    async fn create_multipart(&self) -> Result<Multipart> {}
    fn into_multipart(&self, upload_id: &str) -> Multipart {}
}

impl Multipart {
    async fn write(&self, part_number: usize, bs: impl AsRef<[u8]>) -> Result<Part> {}
    async fn writer(&self, part_number: usize, size: u64) -> Result<impl PartWrite> {}
    async fn complete(&self, ps: &[Part]) -> Result<()> {}
    async fn abort(&self) -> Result<()> {}
}

Drawbacks

None.

Rationale and alternatives

Why not add new object modes?

It seems natural to add a new object mode like multipart.

pub enum ObjectMode {
    FILE,
    DIR,
    MULTIPART,
    Unknown,
}

However, to make this work, we would need a big API break that introduces mode in Object.

We would also need to change every API call to accept mode as an argument.

For example:

let _ = op.object("path/to/dir/").list(ObjectMode::MULTIPART);
let _ = op.object("path/to/file").stat(ObjectMode::MULTIPART);

Why not split Object into File and Dir?

We could split Object into File and Dir to avoid requiring mode in the API. However, that would also be a large API break.

Prior art

None.

Unresolved questions

None.

Future possibilities

Support list multipart uploads

We can support listing ongoing multipart uploads so that users can resume or abort them.

Support list part

We can support listing the parts already uploaded for a given upload.
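One way a part listing could help with resuming is to work out which parts are still missing. The sketch below assumes a listing call has already returned the uploaded part numbers. `missing_parts` is a hypothetical helper, and 1-based part numbering is again an assumption.

```rust
use std::collections::BTreeSet;

/// Given the part numbers already uploaded, return the numbers in
/// `1..=total_parts` that still need to be uploaded, in ascending order.
fn missing_parts(uploaded: &[usize], total_parts: usize) -> Vec<usize> {
    let done: BTreeSet<usize> = uploaded.iter().copied().collect();
    (1..=total_parts).filter(|n| !done.contains(n)).collect()
}
```

For example, with parts 1 and 3 uploaded out of 4, the remaining work is parts 2 and 4.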