pub struct SamplingDispatcher {
    pub default: CodecKind,
    pub entropy_threshold: f64,
    pub prefer_gpu: bool,
    pub gpu_min_bytes: usize,
}
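A dispatcher that inspects an input sample and picks a codec.
Decision order (earlier rules take precedence):
- Input too short (< 128 bytes) → default
- Magic bytes match an already-compressed format (gzip / zstd / png / jpeg / mp4 / zip / pdf / 7z / xz / bzip2) → Passthrough (recompressing gains nothing)
- Shannon entropy is at or above entropy_threshold (default 7.5 bits/byte) → Passthrough (high entropy = nearly random = no room to compress)
- Otherwise → default (text, logs, parquet numeric columns, and similar data that still has room to compress)
Phase 1 assumes default = CpuZstd. Later in Phase 1, integer-column detection will be added and the default branch extended to "NvcompBitcomp for numeric columns, CpuZstd otherwise".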
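The decision order above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the CodecKind enum here is a hypothetical stand-in with only two variants, the magic-byte table covers only three of the listed formats, and the function names shannon_entropy, looks_precompressed, and pick_sketch are assumptions made for this example.

```rust
/// Hypothetical stand-in for the crate's CodecKind (only the variants needed here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodecKind {
    Passthrough,
    CpuZstd,
}

/// Mirrors MIN_SAMPLE_BYTES from the documented constants.
const MIN_SAMPLE_BYTES: usize = 128;

/// Shannon entropy of the sample in bits per byte (range 0.0 to 8.0).
pub fn shannon_entropy(sample: &[u8]) -> f64 {
    if sample.is_empty() {
        return 0.0;
    }
    // Histogram of byte values.
    let mut counts = [0usize; 256];
    for &b in sample {
        counts[b as usize] += 1;
    }
    let n = sample.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Checks a few well-known magic numbers; the documented list is longer.
pub fn looks_precompressed(sample: &[u8]) -> bool {
    const MAGICS: [&[u8]; 3] = [
        &[0x1f, 0x8b],             // gzip
        &[0x28, 0xb5, 0x2f, 0xfd], // zstd frame
        &[0x89, b'P', b'N', b'G'], // png
    ];
    MAGICS.iter().any(|m| sample.starts_with(m))
}

/// Applies the rules in priority order: short input, magic bytes, entropy, default.
pub fn pick_sketch(sample: &[u8], default: CodecKind, entropy_threshold: f64) -> CodecKind {
    if sample.len() < MIN_SAMPLE_BYTES {
        return default;
    }
    if looks_precompressed(sample) {
        return CodecKind::Passthrough;
    }
    if shannon_entropy(sample) >= entropy_threshold {
        return CodecKind::Passthrough;
    }
    default
}
```

Because the length check runs first, a two-byte input that happens to equal the gzip magic still gets the default codec in this sketch.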
§v0.8 #56: GPU auto-detect at boot
Calling with_gpu_preference(true, gpu_min_bytes) promotes an object to
NvcompZstd only when s4_codec::nvcomp::is_gpu_available() returned true at
boot, the default is CpuZstd, and the object's total size >= gpu_min_bytes.
When the size hint is None (chunked transfer), or the object is smaller than
the threshold, the CPU codec is kept to avoid the GPU upload overhead.
When the nvcomp-gpu feature is off at build time, no promotion to NvcompZstd
takes place (pointing at a codec that is absent from the registry would fail
at dispatch time with UnregisteredCodec). The orchestrator guarantees this by
forcing prefer_gpu = false in main.rs.
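The promotion rule described above might look roughly like this. It is a sketch under stated assumptions: CodecKind is a hypothetical three-variant stand-in, promote_for_gpu is a name invented for this example, and prefer_gpu is assumed to already reflect both the boot-time GPU probe and the build-time feature check.

```rust
/// Hypothetical stand-in for the crate's CodecKind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodecKind {
    Passthrough,
    CpuZstd,
    NvcompZstd,
}

/// Rewrites a CpuZstd pick to NvcompZstd only when GPU use is preferred and
/// the caller supplied a total size at or above the threshold.
pub fn promote_for_gpu(
    picked: CodecKind,
    total_size: Option<u64>,
    prefer_gpu: bool,
    gpu_min_bytes: usize,
) -> CodecKind {
    match (picked, total_size) {
        // A known size that clears the threshold: promote.
        (CodecKind::CpuZstd, Some(size)) if prefer_gpu && size >= gpu_min_bytes as u64 => {
            CodecKind::NvcompZstd
        }
        // Unknown size (chunked transfer), small object, or non-CpuZstd pick: unchanged.
        _ => picked,
    }
}
```

Treating a missing size hint as "do not promote" keeps the conservative CPU path whenever the total size cannot be proven.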
Fields§
default: CodecKind
entropy_threshold: f64
prefer_gpu: bool
v0.8 #56: when set, route large CpuZstd picks through NvcompZstd.
gpu_min_bytes: usize
v0.8 #56: GPU promotion only fires when the caller can prove
total_size >= gpu_min_bytes via pick_with_size_hint. Below this
threshold the GPU upload overhead exceeds the compress time so CPU
wins; the default 1 MiB is the empirical break-even point on common
text / log payloads with PCIe 4.0 + an A10G-class GPU.
Implementations§
impl SamplingDispatcher
pub const DEFAULT_ENTROPY_THRESHOLD: f64 = 7.5
pub const MIN_SAMPLE_BYTES: usize = 128
pub const DEFAULT_GPU_MIN_BYTES: usize = 1_048_576
v0.8 #56: 1 MiB. The empirical break-even point — below this, the PCIe upload + kernel launch overhead dominates the GPU’s compress throughput advantage.
pub fn new(default: CodecKind) -> Self
pub fn with_entropy_threshold(self, t: f64) -> Self
pub fn with_gpu_preference(self, prefer_gpu: bool, gpu_min_bytes: usize) -> Self
v0.8 #56: enable GPU promotion. When prefer_gpu = true, a CpuZstd
pick on a body whose total_size >= gpu_min_bytes is rewritten to
NvcompZstd. Pass prefer_gpu = false (the default) to disable.
The threshold is in bytes; 1_048_576 (1 MiB) is the recommended
default for PCIe 4.0 hosts.
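A builder chain of this shape could be wired up at boot roughly as follows. This is a sketch, not the crate's code: DispatcherSketch mirrors the documented fields and constants, but the method bodies, the single-variant CodecKind, and the build_dispatcher helper with its gpu_available flag are assumptions for illustration.

```rust
/// Hypothetical stand-in for the crate's CodecKind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodecKind {
    CpuZstd,
}

/// Mirror of the documented struct layout; method bodies are assumed.
#[derive(Debug, Clone)]
pub struct DispatcherSketch {
    pub default: CodecKind,
    pub entropy_threshold: f64,
    pub prefer_gpu: bool,
    pub gpu_min_bytes: usize,
}

impl DispatcherSketch {
    pub const DEFAULT_ENTROPY_THRESHOLD: f64 = 7.5;
    pub const DEFAULT_GPU_MIN_BYTES: usize = 1_048_576;

    /// Starts with GPU promotion disabled, matching the documented default.
    pub fn new(default: CodecKind) -> Self {
        Self {
            default,
            entropy_threshold: Self::DEFAULT_ENTROPY_THRESHOLD,
            prefer_gpu: false,
            gpu_min_bytes: Self::DEFAULT_GPU_MIN_BYTES,
        }
    }

    pub fn with_entropy_threshold(mut self, t: f64) -> Self {
        self.entropy_threshold = t;
        self
    }

    pub fn with_gpu_preference(mut self, prefer_gpu: bool, gpu_min_bytes: usize) -> Self {
        self.prefer_gpu = prefer_gpu;
        self.gpu_min_bytes = gpu_min_bytes;
        self
    }
}

/// Hypothetical boot-time helper: prefer the GPU only when a probe found one.
pub fn build_dispatcher(gpu_available: bool) -> DispatcherSketch {
    DispatcherSketch::new(CodecKind::CpuZstd)
        .with_gpu_preference(gpu_available, DispatcherSketch::DEFAULT_GPU_MIN_BYTES)
}
```

Passing the probe result straight into with_gpu_preference means a host without a usable GPU ends up with prefer_gpu = false and never attempts promotion.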
Trait Implementations§
impl Clone for SamplingDispatcher
fn clone(&self) -> SamplingDispatcher
fn clone_from(&mut self, source: &Self)
impl CodecDispatcher for SamplingDispatcher
fn pick<'life0, 'life1, 'async_trait>(
    &'life0 self,
    sample: &'life1 [u8],
) -> Pin<Box<dyn Future<Output = CodecKind> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
    'life1: 'async_trait,
fn pick_with_size_hint<'life0, 'life1, 'async_trait>(
    &'life0 self,
    sample: &'life1 [u8],
    total_size: Option<u64>,
) -> Pin<Box<dyn Future<Output = CodecKind> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
    'life1: 'async_trait,
The default implementation delegates to pick(sample), so only dispatchers
that make use of the extra information (such as SamplingDispatcher) need to
override it. total_size = None represents a chunked transfer with no
content-length.
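The delegation pattern described here can be illustrated with a simplified trait. This sketch is synchronous to stay dependency-free (the real CodecDispatcher methods return boxed futures), and the trait DispatcherSketch, the types AlwaysCpu and SizeAware, and the two-variant CodecKind are all hypothetical names for this example.

```rust
/// Hypothetical stand-in for the crate's CodecKind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodecKind {
    CpuZstd,
    NvcompZstd,
}

/// Simplified, synchronous analogue of a dispatcher trait.
pub trait DispatcherSketch {
    fn pick(&self, sample: &[u8]) -> CodecKind;

    /// Default: ignore the size hint and delegate to pick.
    fn pick_with_size_hint(&self, sample: &[u8], _total_size: Option<u64>) -> CodecKind {
        self.pick(sample)
    }
}

/// A dispatcher that has no use for the hint and keeps the default method.
pub struct AlwaysCpu;

impl DispatcherSketch for AlwaysCpu {
    fn pick(&self, _sample: &[u8]) -> CodecKind {
        CodecKind::CpuZstd
    }
}

/// A dispatcher that overrides the hinted method to use the total size.
pub struct SizeAware {
    pub gpu_min_bytes: u64,
}

impl DispatcherSketch for SizeAware {
    fn pick(&self, _sample: &[u8]) -> CodecKind {
        CodecKind::CpuZstd
    }

    fn pick_with_size_hint(&self, sample: &[u8], total_size: Option<u64>) -> CodecKind {
        match total_size {
            Some(n) if n >= self.gpu_min_bytes => CodecKind::NvcompZstd,
            // None models a chunked transfer without content-length.
            _ => self.pick(sample),
        }
    }
}
```

With a default method on the trait, existing dispatchers keep compiling unchanged and only size-aware ones need the extra override.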