Task-conditioned entropy compression: lines that would normally be dropped
for low entropy are kept if they contain task-relevant keywords. This is
a proxy for the Information Bottleneck: we compress away only lines that
are neither surprising (high H) nor task-relevant (mention goal concepts).
Falls back to pure entropy filtering when task_keywords is empty.
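
A minimal sketch of the keep/drop rule described above, not the actual
implementation. The names compress_lines and threshold, the default of 0.5,
the character-level symbols, the case-insensitive substring match, and the
">=" comparison are all assumptions made for illustration; only
task_keywords comes from the description.

```python
import math
from collections import Counter


def normalized_entropy(text: str) -> float:
    """Stand-in scorer: character-level Shannon entropy divided by log2(n)."""
    counts = Counter(text)
    n = len(counts)
    if n < 2:
        # Assumption: empty or single-symbol input counts as fully predictable.
        return 0.0
    total = len(text)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(n)


def compress_lines(lines, threshold=0.5, task_keywords=()):
    """Keep a line if it is surprising OR mentions a task keyword."""
    keywords = [k.lower() for k in task_keywords if k]
    kept = []
    for line in lines:
        surprising = normalized_entropy(line) >= threshold
        # With no keywords, any() is False, so this reduces to pure entropy.
        relevant = any(k in line.lower() for k in keywords)
        if surprising or relevant:
            kept.append(line)
    return kept
```

With an empty task_keywords the second condition never fires, which is the
pure-entropy fallback; with keywords present, a line that fails the entropy
test is still kept when it names a goal concept.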

Normalized Shannon entropy: H(X) / log₂(n), where n is the number of unique
symbols. Returns a value in [0, 1], where 0 = perfectly predictable and
1 = maximum entropy (all observed symbols equally frequent).
This makes thresholds comparable across different alphabet sizes.
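
The normalization can be sketched as follows. This is an illustrative
version that treats each character as a symbol and returns 0.0 when fewer
than two distinct symbols are present (log₂(1) = 0 would otherwise divide by
zero); both choices are assumptions, as are the function names.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Raw Shannon entropy H(X) in bits over the characters of text."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(text: str) -> float:
    """H(X) / log2(n), where n is the number of unique characters."""
    n = len(set(text))
    if n < 2:
        # Assumption: a single repeated symbol (or empty input) scores 0.
        return 0.0
    return shannon_entropy(text) / math.log2(n)


# Raw entropy grows with alphabet size; the normalized value does not.
print(shannon_entropy("abab"), normalized_entropy("abab"))  # 1.0 1.0
print(shannon_entropy("abcd"), normalized_entropy("abcd"))  # 2.0 1.0
```

Both strings are uniform over their own alphabets, so both normalize to 1
even though their raw entropies are 1 bit and 2 bits. A single threshold can
therefore be applied to either.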