Summary

Add command line interface for OpenDAL.

Motivation

Q: There are so many CLIs out there. Why do we still need a CLI for OpenDAL?

A: Because there are so many CLIs out there.

To manipulate our data stored in different cloud services, we need to install a different CLI for each of them: s3cmd, aws-cli, azcopy, gsutil, and so on (see Rationale and alternatives below).

Those CLIs provide native and seamless experiences for their own products, but they also lock us and our data in.

However, for 80% of cases, we just want to do simple jobs like cp, mv and rm. It’s tedious to figure out how to do that with each of them:

  • aws --endpoint-url http://127.0.0.1:9900/ s3 cp data s3://testbucket/data --recursive
  • azcopy copy 'C:\myDirectory' 'https://mystorageaccount.blob.core.windows.net/mycontainer' --recursive
  • gsutil cp data gs://testbucket/

Can we use them in the same way? Can we let the data flow freely?

Let’s look back at OpenDAL’s slogan:

Open Data Access Layer that connect the whole world together

This is a natural extension for OpenDAL: providing a command line interface!

Guide-level explanation

OpenDAL will provide a new CLI called oli, short for OpenDAL Command Line Interface.

Users can install this CLI via:

cargo install oli

Or via their favourite package manager:

# Archlinux
pacman -S oli
# Debian / Ubuntu
apt install oli
# Rocky Linux / Fedora
dnf install oli
# macOS
brew install oli

With oli, users can:

  • Upload files to s3: oli cp books.csv s3://bucket/books.csv
  • Download files from azblob: oli cp azblob://bucket/books.csv /tmp/books.csv
  • Move data between storage services: oli mv s3://bucket/dir azblob://bucket/dir
  • Delete all files: oli rm -rf s3://bucket
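
Under the hood, each of these commands maps onto opendal operations. As a minimal sketch, here is roughly what oli cp s3://bucket/books.csv /tmp/books.csv could translate to, assuming a recent opendal Operator API and the tokio runtime; this is illustrative only, not oli’s actual implementation:

use opendal::{services, Operator};

#[tokio::main]
async fn main() -> opendal::Result<()> {
    // Source side of the copy: an S3 bucket (credentials omitted here).
    let src = Operator::new(services::S3::default().bucket("bucket"))?.finish();
    // Destination side: the local filesystem rooted at /tmp.
    let dst = Operator::new(services::Fs::default().root("/tmp"))?.finish();

    // Read the whole object, then write it out. A real implementation
    // would stream the bytes instead of buffering them in memory.
    let data = src.read("books.csv").await?;
    dst.write("books.csv", data).await?;
    Ok(())
}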

oli also provides aliases to make manipulating cloud data feel even more natural:

  • ocp for oli cp
  • ols for oli ls
  • omv for oli mv
  • orm for oli rm
  • ostat for oli stat
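
One plausible way to implement these aliases is busybox-style dispatch on the binary name: ship one executable and hard-link the alias names to it. The sketch below is hypothetical and not necessarily how oli implements it:

use std::env;
use std::path::Path;

fn main() {
    let args: Vec<String> = env::args().collect();
    // Inspect the name this binary was invoked as, e.g. `ocp` vs `oli`.
    let name = Path::new(&args[0])
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("oli");

    let subcommand = match name {
        "ocp" => "cp",
        "ols" => "ls",
        "omv" => "mv",
        "orm" => "rm",
        "ostat" => "stat",
        // Invoked as plain `oli`: the subcommand is the first argument.
        _ => args.get(1).map(String::as_str).unwrap_or("help"),
    };
    println!("would dispatch to: oli {subcommand}");
}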

oli will provide profile management so users don’t need to provide credentials every time:

  • oli profile add my_s3 --bucket test --access-key-id=example --secret-access-key=example
  • ocp my_s3://dir /tmp/dir
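
A location such as my_s3://dir would then be split into a profile name and a path before the profile’s credentials are looked up. A tiny hypothetical helper (split_location is not part of oli’s actual surface):

fn split_location(loc: &str) -> (Option<&str>, &str) {
    // `my_s3://dir` -> (Some("my_s3"), "dir");
    // a plain local path like `/tmp/dir` has no profile part.
    match loc.split_once("://") {
        Some((profile, path)) => (Some(profile), path),
        None => (None, loc),
    }
}

fn main() {
    assert_eq!(split_location("my_s3://dir"), (Some("my_s3"), "dir"));
    assert_eq!(split_location("/tmp/dir"), (None, "/tmp/dir"));
}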

Reference-level explanation

oli will be a separate crate apart from opendal, so we will not pollute the dependencies of opendal. But oli will be released at the same time with the same version as opendal. That means oli will always use the same (latest) version of opendal.

Most operations of oli should be trivial; we will propose new RFCs for anything that requires big changes.

oli won’t keep configuration. All config will go through environment variables, for example:

  • OLI_COLOR=always
  • OLI_CONCURRENCY=16
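
Consuming such a variable is straightforward. A sketch, where the fallback value of 4 is hypothetical rather than a documented oli default:

use std::env;

fn concurrency() -> usize {
    // Fall back to a default when OLI_CONCURRENCY is unset or malformed.
    env::var("OLI_CONCURRENCY")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(4)
}

fn main() {
    println!("concurrency = {}", concurrency());
}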

Besides, oli will read profiles from the environment, like cargo does:

  • OLI_PROFILE_TEST_TYPE=s3
  • OLI_PROFILE_TEST_ENDPOINT=http://127.0.0.1:1090
  • OLI_PROFILE_TEST_BUCKET=test_bucket
  • OLI_PROFILE_TEST_ACCESS_KEY_ID=access_key_id
  • OLI_PROFILE_TEST_SECRET_ACCESS_KEY=secret_access_key

With those environment variables set, we can:

ocp path/to/dir test://test/to/dir
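
A sketch of how such a profile could be collected from the environment and turned into an opendal operator. It assumes a recent opendal that provides Operator::via_iter; the profile helper and the hard-coded s3 mapping are hypothetical, not oli’s actual loader:

use std::collections::HashMap;
use std::env;

use opendal::{Operator, Scheme};

// Collect OLI_PROFILE_<NAME>_* variables into one config map,
// e.g. OLI_PROFILE_TEST_ACCESS_KEY_ID -> ("access_key_id", ...).
fn profile(name: &str) -> HashMap<String, String> {
    let prefix = format!("OLI_PROFILE_{}_", name.to_uppercase());
    env::vars()
        .filter_map(|(k, v)| k.strip_prefix(&prefix).map(|rest| (rest.to_lowercase(), v)))
        .collect()
}

fn main() -> opendal::Result<()> {
    let mut conf = profile("test");
    // The `type` key (e.g. `s3`) selects the backend service.
    let typ = conf.remove("type").unwrap_or_default();
    let scheme = match typ.as_str() {
        "s3" => Scheme::S3,
        other => panic!("unsupported profile type: {other}"),
    };
    let op = Operator::via_iter(scheme, conf)?;
    println!("built operator for scheme: {:?}", op.info().scheme());
    Ok(())
}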

Drawbacks

None.

Rationale and alternatives

s3cmd

s3cmd is a command-line S3 client for Linux and macOS.

Usage: s3cmd [options] COMMAND [parameters]

S3cmd is a tool for managing objects in Amazon S3 storage. It allows for
making and removing "buckets" and uploading, downloading and removing
"objects" from these buckets.

Commands:
  Make bucket
      s3cmd mb s3://BUCKET
  Remove bucket
      s3cmd rb s3://BUCKET
  List objects or buckets
      s3cmd ls [s3://BUCKET[/PREFIX]]
  List all object in all buckets
      s3cmd la 
  Put file into bucket
      s3cmd put FILE [FILE...] s3://BUCKET[/PREFIX]
  Get file from bucket
      s3cmd get s3://BUCKET/OBJECT LOCAL_FILE
  Delete file from bucket
      s3cmd del s3://BUCKET/OBJECT
  Delete file from bucket (alias for del)
      s3cmd rm s3://BUCKET/OBJECT
  Restore file from Glacier storage
      s3cmd restore s3://BUCKET/OBJECT
  Synchronize a directory tree to S3 (checks files freshness using 
       size and md5 checksum, unless overridden by options, see below)
      s3cmd sync LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR
  Disk usage by buckets
      s3cmd du [s3://BUCKET[/PREFIX]]
  Get various information about Buckets or Files
      s3cmd info s3://BUCKET[/OBJECT]
  Copy object
      s3cmd cp s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
  Modify object metadata
      s3cmd modify s3://BUCKET1/OBJECT
  Move object
      s3cmd mv s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
  Modify Access control list for Bucket or Files
      s3cmd setacl s3://BUCKET[/OBJECT]
  Modify Bucket Policy
      s3cmd setpolicy FILE s3://BUCKET
  Delete Bucket Policy
      s3cmd delpolicy s3://BUCKET
  Modify Bucket CORS
      s3cmd setcors FILE s3://BUCKET
  Delete Bucket CORS
      s3cmd delcors s3://BUCKET
  Modify Bucket Requester Pays policy
      s3cmd payer s3://BUCKET
  Show multipart uploads
      s3cmd multipart s3://BUCKET [Id]
  Abort a multipart upload
      s3cmd abortmp s3://BUCKET/OBJECT Id
  List parts of a multipart upload
      s3cmd listmp s3://BUCKET/OBJECT Id
  Enable/disable bucket access logging
      s3cmd accesslog s3://BUCKET
  Sign arbitrary string using the secret key
      s3cmd sign STRING-TO-SIGN
  Sign an S3 URL to provide limited public access with expiry
      s3cmd signurl s3://BUCKET/OBJECT <expiry_epoch|+expiry_offset>
  Fix invalid file names in a bucket
      s3cmd fixbucket s3://BUCKET[/PREFIX]
  Create Website from bucket
      s3cmd ws-create s3://BUCKET
  Delete Website
      s3cmd ws-delete s3://BUCKET
  Info about Website
      s3cmd ws-info s3://BUCKET
  Set or delete expiration rule for the bucket
      s3cmd expire s3://BUCKET
  Upload a lifecycle policy for the bucket
      s3cmd setlifecycle FILE s3://BUCKET
  Get a lifecycle policy for the bucket
      s3cmd getlifecycle s3://BUCKET
  Remove a lifecycle policy for the bucket
      s3cmd dellifecycle s3://BUCKET
  List CloudFront distribution points
      s3cmd cflist 
  Display CloudFront distribution point parameters
      s3cmd cfinfo [cf://DIST_ID]
  Create CloudFront distribution point
      s3cmd cfcreate s3://BUCKET
  Delete CloudFront distribution point
      s3cmd cfdelete cf://DIST_ID
  Change CloudFront distribution point parameters
      s3cmd cfmodify cf://DIST_ID
  Display CloudFront invalidation request(s) status
      s3cmd cfinvalinfo cf://DIST_ID[/INVAL_ID]

aws-cli

aws-cli is the official CLI provided by AWS.

$ aws s3 ls s3://mybucket
        LastWriteTime            Length Name
        ------------             ------ ----
                                PRE myfolder/
2013-09-03 10:00:00           1234 myfile.txt

$ aws s3 cp myfolder s3://mybucket/myfolder --recursive
upload: myfolder/file1.txt to s3://mybucket/myfolder/file1.txt
upload: myfolder/subfolder/file1.txt to s3://mybucket/myfolder/subfolder/file1.txt

$ aws s3 sync myfolder s3://mybucket/myfolder --exclude *.tmp
upload: myfolder/newfile.txt to s3://mybucket/myfolder/newfile.txt

azcopy

azcopy is the new Azure Storage data transfer utility.

azcopy copy 'C:\myDirectory\myTextFile.txt' 'https://mystorageaccount.blob.core.windows.net/mycontainer/myTextFile.txt'

azcopy copy 'https://mystorageaccount.blob.core.windows.net/mycontainer/myTextFile.txt' 'C:\myDirectory\myTextFile.txt'

azcopy sync 'C:\myDirectory' 'https://mystorageaccount.blob.core.windows.net/mycontainer' --recursive

gsutil

gsutil is a Python application that lets you access Cloud Storage from the command line.

gsutil cp [OPTION]... src_url dst_url
gsutil cp [OPTION]... src_url... dst_url
gsutil cp [OPTION]... -I dst_url

gsutil mv [-p] src_url dst_url
gsutil mv [-p] src_url... dst_url
gsutil mv [-p] -I dst_url

gsutil rm [-f] [-r] url...
gsutil rm [-f] [-r] -I

Unresolved questions

None.

Future possibilities

None.