Module glommio::io

glommio::io provides data structures targeted towards File I/O.

File I/O in Glommio comes in two kinds: Buffered and Direct I/O.

Ideally, an application picks one of them according to its needs and does not mix the two. If you do want to mix them, avoid doing so on the same device: kernel settings such as I/O schedulers and merge settings that benefit one kind can be detrimental to the other.

If you absolutely must use both on the same device, avoid issuing both Direct and Buffered I/O to the same file: at this point you are just trying to drive Linux crazy.

Buffered I/O

Buffered I/O uses the operating system page cache. It is ideal for simpler applications that don’t want to deal with caching policies, and for which I/O is a possibly important, but definitely not crucial, part of their performance story.

Disadvantages of Buffered I/O:

  • Hard to know when resources are really used, which makes controlled processes almost impossible (the time of the write to the device is detached from the time of the file write).
  • More copies than necessary, as the data has to be copied from the device to the page cache, from the page cache to the internal file buffers, and, in abstract linear implementations like AsyncWriteExt and AsyncReadExt, from user-provided buffers to the file's internal buffers.
  • Advanced io_uring features such as non-interrupt mode, registered files, and registered buffers will not work with Buffered I/O.
  • Read amplification for small random reads, as the OS is bound by the page size (usually 4 kB), even though modern NVMe devices are perfectly capable of issuing 512-byte I/O.
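
To put a number on the read amplification point, here is a minimal sketch of the arithmetic. It assumes the page cache always fetches whole pages; the function names are hypothetical and are not part of Glommio.

```rust
/// Bytes that must be fetched to satisfy a read of `len` bytes at `offset`
/// when I/O happens in whole units of `page_size` bytes.
fn bytes_fetched(offset: u64, len: u64, page_size: u64) -> u64 {
    if len == 0 {
        return 0;
    }
    let first_page = offset / page_size;
    let last_page = (offset + len - 1) / page_size;
    (last_page - first_page + 1) * page_size
}

/// Ratio of bytes fetched to bytes requested. `len` must be non-zero.
fn read_amplification(offset: u64, len: u64, page_size: u64) -> f64 {
    bytes_fetched(offset, len, page_size) as f64 / len as f64
}

fn main() {
    // A 512-byte read served through 4 kB pages fetches a whole page.
    println!("{}", read_amplification(0, 512, 4096));
    // The same read issued at 512-byte granularity fetches only what was asked for.
    println!("{}", read_amplification(0, 512, 512));
    // A small read that straddles a page boundary fetches two pages.
    println!("{}", bytes_fetched(4000, 512, 4096));
}
```

Under these assumptions a 512-byte random read costs eight times the requested bytes, and more if it crosses a page boundary.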

The main structure to deal with Buffered I/O is the BufferedFile struct. It is targeted at random I/O. Reads from and writes to it expect a position.

Direct I/O

Direct I/O does not use the operating system page cache and always touches the device directly. That works very well for stream-based workloads (scanning a file much larger than memory, writing a buffer that will not be read from in the near future, etc.) but requires a user-provided cache for good random performance.

There are advantages to using a user-provided cache: files usually contain serialized objects, and every read has to deserialize them. A user-provided cache can hold the parsed objects rather than the raw bytes, among other benefits. Still, not all applications can or want to deal with that complexity.
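
As an illustration of caching parsed objects, the sketch below keeps deserialized records keyed by file position. Everything in it is an assumption made for the example: `Record`, `ParsedCache`, and the `read` closure (a stand-in for the actual device read) are hypothetical, and the cache is unbounded, which a real application would not want.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical record type: a little-endian u32 stored in 4 bytes.
#[derive(Debug, PartialEq)]
struct Record {
    value: u32,
}

/// Caches parsed records by file position, so a hit skips both I/O and parsing.
struct ParsedCache {
    entries: HashMap<u64, Rc<Record>>,
    parses: usize,
}

impl ParsedCache {
    fn new() -> Self {
        ParsedCache { entries: HashMap::new(), parses: 0 }
    }

    /// Returns the record at `pos`. `read` is only called on a cache miss.
    fn get<F>(&mut self, pos: u64, read: F) -> Rc<Record>
    where
        F: FnOnce(u64) -> [u8; 4],
    {
        if let Some(hit) = self.entries.get(&pos) {
            return Rc::clone(hit);
        }
        let bytes = read(pos);
        self.parses += 1;
        let record = Rc::new(Record { value: u32::from_le_bytes(bytes) });
        self.entries.insert(pos, Rc::clone(&record));
        record
    }
}

fn main() {
    let mut cache = ParsedCache::new();
    let first = cache.get(0, |_| 7u32.to_le_bytes());
    // The second lookup is served from the cache; the closure is never invoked.
    let second = cache.get(0, |_| [0u8; 4]);
    println!("{:?} {:?} parses={}", first, second, cache.parses);
}
```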

Disadvantages of Direct I/O:

  • I/O needs to be aligned. Both the buffers and the file positions need specific alignments. The DmaBuffer should hide most of that complexity, but you may still end up with heavy read amplification if you are not careful.
  • Without a user-provided cache, random performance can be bad.
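
The alignment requirement can be sketched as follows. This is a plain illustration of the arithmetic, not Glommio's implementation; `aligned_range` is a hypothetical helper, and the alignment values are examples (the real requirement depends on the device and filesystem).

```rust
/// Expands the byte range [pos, pos + len) to the smallest enclosing range
/// whose start and end are multiples of `align`. Returns (start, length).
fn aligned_range(pos: u64, len: u64, align: u64) -> (u64, u64) {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    // Round the start down and the end up to the alignment boundary.
    let start = pos & !(align - 1);
    let end = (pos + len + align - 1) & !(align - 1);
    (start, end - start)
}

fn main() {
    // Asking for 100 bytes at position 5000 with a 4096-byte alignment.
    let (start, len) = aligned_range(5000, 100, 4096);
    println!("read {} bytes at {} to obtain 100 bytes at 5000", len, start);
    // The same request with a 512-byte alignment reads far less.
    let (start, len) = aligned_range(5000, 100, 512);
    println!("read {} bytes at {} to obtain 100 bytes at 5000", len, start);
}
```

The gap between the requested length and the aligned length is the read amplification the first bullet warns about: the coarser the alignment, the more unrequested bytes each small read drags in.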

At the lowest level, there are two main structs that deal with File Direct I/O:

DmaFile is targeted at random Direct I/O. Reads from and writes to it expect a position.

DmaStreamWriter and DmaStreamReader perform sequential I/O, and their interface is a lot closer to other mainstream Rust interfaces such as those in std::fs.

However, despite being sequential, I/O for the two stream structs is parallel: DmaStreamWriter exposes a setting for write-behind, meaning that it keeps accepting writes to its internal buffers even while older writes are still in flight. In turn, DmaStreamReader exposes a setting for read-ahead, meaning that it initiates I/O ahead of time for positions you will read in the future.
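
To make the read-ahead idea concrete, here is a small model of which buffers a sequential reader would want in flight at a given position. It is a sketch of the concept only and says nothing about how DmaStreamReader is implemented; the function and its parameter names are hypothetical.

```rust
/// Start offsets of the buffers a reader at `pos` would want in flight:
/// the buffer containing `pos` plus `read_ahead` buffers after it,
/// stopping at `file_size`.
fn read_ahead_offsets(pos: u64, buffer_size: u64, read_ahead: u64, file_size: u64) -> Vec<u64> {
    assert!(buffer_size > 0, "buffer size must be non-zero");
    // Offset of the buffer that contains the current position.
    let current = pos / buffer_size * buffer_size;
    (0..=read_ahead)
        .map(|i| current + i * buffer_size)
        .take_while(|&start| start < file_size)
        .collect()
}

fn main() {
    // Reading at byte 5000 with 4 kB buffers and a read-ahead of 2 buffers.
    println!("{:?}", read_ahead_offsets(5000, 4096, 2, 1 << 20));
    // Near the end of a small file there is less left to prefetch.
    println!("{:?}", read_ahead_offsets(0, 4096, 4, 10_000));
}
```

Write-behind is the mirror image: instead of requesting buffers before they are needed, the writer hands full buffers to the device and keeps accepting data while a bounded number of them are still in flight.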

ImmutableFile

Oftentimes, due to constraints of modern storage systems, files are immutable once written. To safely capture that pattern and provide useful optimizations, Glommio exposes an ImmutableFile: it can be written to when it is created, but once sealed it is assumed not to change. This allows Glommio to provide some optimizations, like basic caching, that should make Direct I/O more palatable.

If this matches your access patterns, consider using an ImmutableFile instead of the lower-level DmaFile. ImmutableFiles can be accessed either randomly or sequentially.
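
The write-then-seal pattern can be modeled with two types, so that the compiler rules out writes after sealing. This is an in-memory sketch of the pattern only: `PreSealSink` and `Sealed` are hypothetical stand-ins and do not reproduce the ImmutableFile API.

```rust
/// Hypothetical write-side handle: accepts appends until it is sealed.
struct PreSealSink {
    data: Vec<u8>,
}

/// Hypothetical read-side handle: exposes no way to modify the contents.
struct Sealed {
    data: Vec<u8>,
}

impl PreSealSink {
    fn new() -> Self {
        PreSealSink { data: Vec::new() }
    }

    /// Appends `buf` to the contents.
    fn write(&mut self, buf: &[u8]) {
        self.data.extend_from_slice(buf);
    }

    /// Consumes the sink, so no further writes can be issued after sealing.
    fn seal(self) -> Sealed {
        Sealed { data: self.data }
    }
}

impl Sealed {
    /// Random access: returns up to `len` bytes starting at `pos`.
    fn read_at(&self, pos: usize, len: usize) -> &[u8] {
        let start = pos.min(self.data.len());
        let end = pos.saturating_add(len).min(self.data.len());
        &self.data[start..end]
    }
}

fn main() {
    let mut sink = PreSealSink::new();
    sink.write(b"hello ");
    sink.write(b"world");
    let file = sink.seal();
    // sink.write(b"!"); // would not compile: `sink` was moved by `seal`
    println!("{:?}", file.read_at(6, 5));
}
```

Because the sealed contents cannot change, anything read from them stays valid, which is what makes caching safe for this kind of file.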

Structs

An asynchronously accessed file backed by the OS page cache.

A directory representation where asynchronous operations can be issued

A buffer suitable for Direct I/O over io_uring

An asynchronously accessed Direct Memory Access (DMA) file.

Provides linear access to a DmaFile. The DmaFile is a convenient way to manage a file through Direct I/O, but its interface is conducive to random access, as a position must always be specified.

Builds a DmaStreamReader, allowing linear access to a Direct I/O DmaFile

Provides linear access to a DmaFile. The DmaFile is a convenient way to manage a file through Direct I/O, but its interface is conducive to random access, as a position must always be specified.

Builds a DmaStreamWriter, allowing linear access to a Direct I/O DmaFile

A Direct I/O enabled file abstraction that cannot be written to.

Builds a new ImmutableFile, allowing linear and random access to a Direct I/O DmaFile.

Sink portion of the ImmutableFile

Options and flags which can be used to configure how a file is opened.

A stream of ReadResult produced asynchronously.

ReadResult encapsulates a buffer, returned by read operations like get_buffer_aligned and read_at

Provides linear read access to a BufferedFile.

Builds a StreamReader, allowing linear read access to a BufferedFile

Provides linear write access to a BufferedFile.

Builds a StreamWriter, allowing linear write access to a BufferedFile

Traits

An interface to an IO vector.

Functions

Removes an existing file given its name.

Renames an existing file.

Allows asynchronous read access to the standard input