get_chunk
About
get_chunk is a library for creating file iterators and streams (asynchronous iterators) that specialize in efficient file chunking. Its main purpose is to retrieve data chunk by chunk, especially from large files.
Key Features:
- File Chunking: Divide files, including large ones, into seamless chunks with each "Next" iteration.
- Automatic Chunking: Dynamically adjusts chunk sizes for optimal performance, ensuring efficient memory usage. Large chunks are limited to 85% of available free RAM.
- Modes: Choose between automatic tuning or a manually set chunk size, given as a percentage of the file size or as a byte count.
⚠️ Important Notice:
Iterators created by get_chunk don't store the entire file in memory, especially for large datasets. Their purpose is to fetch data from files in chunks, maintaining efficiency.
Key Points:
- Limited File Retention: For a small file, creating an iterator might fetch all of its data at once, depending on the OS. However, this doesn't guarantee that the file's data persists after the iterator is created.
- Deletion Warning: Deleting a file during iterator or stream iterations will result in an error. These structures don't track the last successful position.
- No File Restoration: Attempting to restore a deleted file during iterations is not supported. These structures don't keep track of the file's original state.
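The behaviour described in this notice can be illustrated with a minimal, std-only sketch. It does not use get_chunk; ChunkIter is a hypothetical type written for illustration, and the fixed chunk size is an assumption. It shows the general idea: only one chunk is held in memory at a time, and a failed read surfaces as an error on that iteration with no attempt to recover.

```rust
use std::io::{self, Read};

/// Hypothetical illustration type; not part of get_chunk.
struct ChunkIter<R: Read> {
    reader: R,
    chunk_size: usize,
}

impl<R: Read> ChunkIter<R> {
    fn new(reader: R, chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        Self { reader, chunk_size }
    }
}

impl<R: Read> Iterator for ChunkIter<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = Vec::with_capacity(self.chunk_size);
        // `take` bounds the read, so at most one chunk is loaded per call.
        let limit = self.chunk_size as u64;
        match self.reader.by_ref().take(limit).read_to_end(&mut buf) {
            Ok(0) => None,              // end of input
            Ok(_) => Some(Ok(buf)),     // one chunk, possibly shorter at the end
            Err(e) => Some(Err(e)),     // read failure is reported, not repaired
        }
    }
}

fn main() -> io::Result<()> {
    // An in-memory cursor stands in for a file here.
    let data = io::Cursor::new(b"abcdefghij".to_vec());
    for chunk in ChunkIter::new(data, 4) {
        let chunk = chunk?;
        println!("read {} bytes", chunk.len());
    }
    Ok(())
}
```

Any reader that implements Read, such as std::fs::File, could be passed in place of the cursor.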
Iterator version
Example
use FileIter;
// Note: requires a `size_format` attribute.
use IECUnit;
Stream version
Example
// Note: requires the `size_format` and `stream` attributes.
use IECUnit;
use ;
async
How it works
The calculate_chunk function in the ChunkSize enum determines the optimal chunk size based on various parameters. Here's a breakdown of how the size is calculated:
The variables prev and now represent the previous and current read time, respectively.
prev:
Definition: prev represents the time taken to read a piece of data in the previous iteration.
now:
Definition: now represents the time taken to read the data fragment in the current iteration.
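As a rough illustration of how two such read times could be obtained, the sketch below times each read with std::time::Instant and carries the last measurement forward as prev. The helper timed_read and the use of seconds as the unit are assumptions made for this example; the source does not say how get_chunk measures time internally.

```rust
use std::io::{self, Read};
use std::time::Instant;

/// Hypothetical helper: reads up to `chunk_size` bytes and returns the data
/// together with the time the read took, in seconds.
fn timed_read<R: Read>(reader: &mut R, chunk_size: usize) -> io::Result<(Vec<u8>, f64)> {
    let mut buf = vec![0u8; chunk_size];
    let start = Instant::now();
    let n = reader.read(&mut buf)?;
    let elapsed = start.elapsed().as_secs_f64();
    buf.truncate(n); // keep only the bytes actually read
    Ok((buf, elapsed))
}

fn main() -> io::Result<()> {
    // An in-memory cursor stands in for a file here.
    let mut reader = io::Cursor::new(vec![7u8; 10]);
    let mut prev = 0.0_f64; // no previous read yet
    loop {
        let (chunk, now) = timed_read(&mut reader, 4)?;
        if chunk.is_empty() {
            break;
        }
        println!("read {} bytes: prev = {prev:.9}s, now = {now:.9}s", chunk.len());
        prev = now; // the current read time becomes the previous one
    }
    Ok(())
}
```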
- Auto Mode:
  - If the previous read time (prev) is greater than zero:
    - If the current read time (now) is also greater than zero:
      - If now is less than prev, decrease the chunk size using the decrease_chunk method.
      - If now is greater than or equal to prev, increase the chunk size using the increase_chunk method.
    - If now is zero or negative, maintain the previous chunk size (prev).
  - If the previous read time is zero or negative, use the default chunk size based on the file size and available RAM.
- Percent Mode:
  - Calculate the chunk size as a percentage of the total file size using the percentage_chunk method. The percentage is capped between 0.1% and 100%.
- Bytes Mode:
  - Calculate the chunk size based on the specified number of bytes using the bytes_chunk method. The size is capped by the file size and available RAM.
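The branching described for the three modes can be sketched as follows. This is not the crate's code: Mode, percent_of, and this calculate_chunk signature are hypothetical, and the 15% step and the 0.1% default share are assumed values chosen only to make the example runnable. The 85% RAM limit and the 0.1% to 100% percentage range come from the text. The sketch also keeps the previous chunk size separate from the previous read time, which is an assumption about how the two quantities relate.

```rust
/// Hypothetical stand-in for the three modes described above.
#[derive(Debug, Clone, Copy)]
enum Mode {
    Auto,
    Percent(f64),
    Bytes(u64),
}

/// Share of free RAM a chunk may occupy (the 85% limit stated under Key Features).
const RAM_SHARE: f64 = 0.85;
/// Assumed values for illustration only; the source does not state them.
const STEP: f64 = 0.15;
const DEFAULT_PERCENT: f64 = 0.1;

/// Percentage of the file size, with the percentage kept within 0.1..=100.
fn percent_of(file_size: f64, percent: f64, ram_limit: f64) -> f64 {
    (file_size * percent.clamp(0.1, 100.0) / 100.0).min(ram_limit)
}

fn calculate_chunk(
    mode: Mode,
    prev_time: f64,
    now_time: f64,
    prev_chunk: f64,
    file_size: f64,
    ram_available: f64,
) -> f64 {
    let ram_limit = ram_available * RAM_SHARE;
    match mode {
        // No usable previous measurement: fall back to the default size.
        Mode::Auto if prev_time <= 0.0 => percent_of(file_size, DEFAULT_PERCENT, ram_limit),
        // No usable current measurement: keep the previous chunk size.
        Mode::Auto if now_time <= 0.0 => prev_chunk,
        // now < prev: decrease, as the text describes.
        Mode::Auto if now_time < prev_time => (prev_chunk * (1.0 - STEP)).min(ram_limit),
        // now >= prev: increase, as the text describes.
        Mode::Auto => (prev_chunk * (1.0 + STEP)).min(ram_limit),
        Mode::Percent(p) => percent_of(file_size, p, ram_limit),
        Mode::Bytes(b) => (b as f64).min(file_size).min(ram_limit),
    }
}

fn main() {
    let (file_size, ram_available) = (1_000_000.0, 8_000_000.0);
    for mode in [Mode::Auto, Mode::Percent(25.0), Mode::Bytes(4096)] {
        let size = calculate_chunk(mode, 0.0, 0.0, 0.0, file_size, ram_available);
        println!("{mode:?}: {size} bytes");
    }
}
```

Using match guards keeps the Auto branches in the same order as the list above, so each condition can be checked against the text line by line.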
Key Formulas:
- Increase Chunk Size: (…).min(…).min(…)
- Decrease Chunk Size: (…).min(…).min(…)
- Default Chunk Size: (…).min(…).min(…)
- Percentage Chunk Size: (…).min(…)
- Bytes Chunk Size: (…).min(…)