Expand description

Summary

Implement path normalization to enhance user experience.

Motivation

OpenDAL’s current path behavior makes users confused:

They are different bugs that reflect the exact root cause: the path is not well normalized.

On local fs, we can read the same path with different path: abc/def/../def, abc/def, abc//def, abc/./def.

There is no magic here: our stdlib does the dirty job. For example:

  • std::path::PathBuf::canonicalize: Returns the canonical, absolute form of the path with all intermediate components normalized and symbolic links resolved.
  • std::path::PathBuf::components: Produces an iterator over the Components of the path. When parsing the path, there is a small amount of normalization…

But for s3 alike storage system, there’s no such helpers: abc/def/../def, abc/def, abc//def, abc/./def refers entirely different objects. So users may confuse why I can’t get the object with this path.

So OpenDAL needs to implement path normalization to enhance the user experience.

Guide-level explanation

We will do path normalization automatically.

The following rules will be applied (so far):

  • Remove // inside path: op.object("abc/def") and op.object("abc//def") will resolve to the same object.
  • Make sure path under root: op.object("/abc") and op.object("abc") will resolve to the same object.

Other rules still need more consideration to leave them for the future.

Reference-level explanation

We will build the absolute path via {root}/{path} and replace all // into / instead.

Drawbacks

None

Rationale and alternatives

If we build an actual path via {root}/{path}, the link object may be inaccessible.

I don’t have good ideas so far. Maybe we can add a new flag to control the link behavior. For now, there’s no feature request for link support.

Let’s leave for the future to resolve.

S3 URI Clean

For s3, abc//def is different from abc/def indeed. To make it possible to access not normalized path, we can provide a new flag for the builder:

let builder = Backend::build().disable_path_normalization()

In this way, the user can control the path more precisely.

Prior art

None

Unresolved questions

None

Future possibilities

None