pub struct KeyBasedRetention { /* private fields */ }
The goal of key-based retention is to keep only the latest record for a given key within a topic. This is the same as Kafka's key-based retention (log compaction) and effectively makes the commit log a key-value store.
KeyBasedRetention guarantees that the key it returns is the record's own key field.
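As a minimal sketch of these semantics (not the crate's actual API — the Record type below is a hypothetical stand-in for ConsumerRecord), keeping only the latest record per key can be modeled with a HashMap:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the crate's ConsumerRecord; the real type
// also carries topic, partition, headers, and so on.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    key: String,
    offset: u64,
    value: String,
}

// Key-based retention keeps only the latest record per key, which is
// what makes the commit log behave like a key-value store.
fn retain_latest(records: Vec<Record>) -> HashMap<String, Record> {
    let mut latest = HashMap::new();
    for r in records {
        // A later offset for the same key replaces the earlier record.
        latest.insert(r.key.clone(), r);
    }
    latest
}

fn main() {
    let records = vec![
        Record { key: "a".into(), offset: 0, value: "v0".into() },
        Record { key: "b".into(), offset: 1, value: "v1".into() },
        Record { key: "a".into(), offset: 2, value: "v2".into() },
    ];
    let latest = retain_latest(records);
    assert_eq!(latest.len(), 2);
    assert_eq!(latest["a"].offset, 2);
}
```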
Implementations
impl KeyBasedRetention
pub fn new(max_compaction_keys: usize) -> Self
The max_compaction_keys parameter limits the number of distinct topic/partition/key combinations processed in a single run of the compactor.
pub fn with_qids(
    qid_from_record: fn(&ConsumerRecord) -> Option<Qid>,
    max_compaction_keys: usize,
) -> Self
Similar to new, but the qualified id of a record is determined by the supplied function. The max_compaction_keys parameter limits the number of distinct topic/partition/key combinations processed in a single run of the compactor.
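For illustration, a qid_from_record function might derive the qualified id from a record header rather than from the key field. The Qid and ConsumerRecord types below are hypothetical minimal stand-ins for the crate's types, and the "entity-id" header name is an assumption:

```rust
// Hypothetical stand-ins, just to illustrate the shape of the function.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Qid(String);

struct ConsumerRecord {
    headers: Vec<(String, Vec<u8>)>,
}

// Derive a qualified id from an assumed "entity-id" header.
// Returning None means the record has no qid for compaction purposes.
fn qid_from_record(r: &ConsumerRecord) -> Option<Qid> {
    r.headers
        .iter()
        .find(|(name, _)| name == "entity-id")
        .and_then(|(_, value)| String::from_utf8(value.clone()).ok())
        .map(Qid)
}

fn main() {
    let r = ConsumerRecord {
        headers: vec![("entity-id".into(), b"user-42".to_vec())],
    };
    assert_eq!(qid_from_record(&r), Some(Qid("user-42".into())));
    // With such a function, the strategy would then be constructed as
    // something like: KeyBasedRetention::with_qids(qid_from_record, 1000)
}
```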
Trait Implementations
impl CompactionStrategy for KeyBasedRetention
type KS = Option<(HashMap<SmolStr, u64>, fn(&ConsumerRecord) -> Option<SmolStr>)>
The type of state required to manage keys. No state may be required, e.g. when the key field from the record is used.
fn key(key_state: &mut Self::KS, r: &ConsumerRecord) -> Option<Key>
The key function computes a key for the compactor's subsequent use. In simple scenarios, this key can be the key field from the topic's record itself, which is the default assumption here. Returning None excludes the record from compaction.
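A sketch of such a key function, using hypothetical minimal stand-ins for the crate's types; the policy of excluding empty-payload records is purely illustrative:

```rust
// Hypothetical minimal stand-in for the crate's ConsumerRecord.
struct ConsumerRecord {
    key: u64,
    value: Vec<u8>,
}

type Key = u64;

// Use the record's own key field (the default assumption), but exclude
// records with an empty payload by returning None (an illustrative
// policy only — the real strategy defines its own rules).
fn key(r: &ConsumerRecord) -> Option<Key> {
    if r.value.is_empty() {
        None
    } else {
        Some(r.key)
    }
}

fn main() {
    assert_eq!(key(&ConsumerRecord { key: 7, value: vec![1] }), Some(7));
    assert_eq!(key(&ConsumerRecord { key: 7, value: vec![] }), None);
}
```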
fn init<'life0, 'async_trait>(
    &'life0 self,
) -> Pin<Box<dyn Future<Output = Self::S> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
Produce the initial state for the reducer function.
fn reduce(
    state: &mut Self::S,
    key: Key,
    record: ConsumerRecord,
) -> MaxKeysReached
The reducer function receives a mutable state reference, a key, and a consumer record, and returns a bool indicating whether it has reached its maximum number of distinct keys. If it has, compaction may occur again once the current consumption of records has finished. The goal is to avoid processing every topic/partition/key in one go: one subset can be processed, then another, and so on. This strategy helps bound memory usage.
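The behavior can be sketched with simplified stand-ins (a u64 key and a raw offset in place of ConsumerRecord, and a bool in place of MaxKeysReached):

```rust
use std::collections::HashMap;

type Key = u64;

// Sketch of reducer state: the latest offset seen per key in this run,
// plus the cap on distinct keys (max_compaction_keys).
struct State {
    latest_offsets: HashMap<Key, u64>,
    max_keys: usize,
}

// A reduce step in the spirit described above: remember the latest
// offset for each key and report whether the distinct-key cap has been
// reached, so remaining keys can be deferred to a later compaction run.
fn reduce(state: &mut State, key: Key, offset: u64) -> bool {
    state.latest_offsets.insert(key, offset);
    state.latest_offsets.len() >= state.max_keys
}

fn main() {
    let mut state = State { latest_offsets: HashMap::new(), max_keys: 2 };
    assert!(!reduce(&mut state, 1, 0)); // one distinct key: keep going
    assert!(reduce(&mut state, 2, 1)); // cap of 2 reached: defer the rest
    assert_eq!(state.latest_offsets[&1], 0);
}
```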
fn collect(state: Self::S) -> CompactionMap
The collect function is responsible for mapping the state into a map of keys and their minimum offsets. In a subsequent run, the compactor will filter out the first n keys, where n is the number of keys found on the first run. This keeps memory bounded given a large number of distinct keys. The cost of the re-run strategy is that additional compaction scans are required, resulting in more I/O.
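The re-run strategy can be sketched as follows, with (key, offset) pairs standing in for records; compaction_runs counts how many scans are needed when the distinct-key cap forces keys to be deferred:

```rust
use std::collections::{HashMap, HashSet};

// Sketch of the re-run strategy: when a run hits the distinct-key cap,
// compaction runs again, skipping keys already handled, so memory stays
// bounded at the cost of extra scans (more I/O).
fn compaction_runs(log: &[(u64, u64)], max_keys: usize) -> usize {
    let mut done: HashSet<u64> = HashSet::new();
    let mut runs = 0;
    loop {
        runs += 1;
        let mut seen: HashMap<u64, u64> = HashMap::new();
        let mut capped = false;
        for &(key, offset) in log {
            if done.contains(&key) {
                continue; // already compacted in an earlier run
            }
            if seen.len() == max_keys && !seen.contains_key(&key) {
                capped = true; // defer this key to a later run
                continue;
            }
            seen.insert(key, offset); // latest offset wins
        }
        done.extend(seen.keys());
        if !capped {
            return runs;
        }
    }
}

fn main() {
    // 4 distinct keys with a cap of 2 per run: two scans are needed.
    let log = [(1, 0), (2, 1), (3, 2), (4, 3), (1, 4)];
    assert_eq!(compaction_runs(&log, 2), 2);
}
```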
Auto Trait Implementations
impl Freeze for KeyBasedRetention
impl RefUnwindSafe for KeyBasedRetention
impl Send for KeyBasedRetention
impl Sync for KeyBasedRetention
impl Unpin for KeyBasedRetention
impl UnwindSafe for KeyBasedRetention
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.