pub fn split_store(_ctx: &mut Vec<Arc<UOp>>, x: &Arc<UOp>) -> Option<Arc<UOp>>
Splits STORE and END operations into individual kernels.
Based on split_store. Reduced from roughly 280 lines to roughly 80 by using LocalAddBufferContext.
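The signature has the shape of a rewrite rule: it receives a node and returns `Some(replacement)` when the rule applies, or `None` when the node should be left alone. The sketch below only illustrates that contract. The `Op` and `UOp` types are reduced, hypothetical stand-ins, and wrapping the matched node in a `Kernel` node is an assumption made for illustration. It is not the real implementation and does not model LocalAddBufferContext.

```rust
use std::sync::Arc;

/// Hypothetical, heavily reduced stand-in for the real operation set.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Op {
    Store,
    End,
    Kernel,
    Other,
}

/// Hypothetical stand-in for the real `UOp` node: an op plus its sources.
#[derive(Debug)]
struct UOp {
    op: Op,
    src: Vec<Arc<UOp>>,
}

/// Sketch of the rewrite-rule contract (assumed behavior, not the real body):
/// STORE and END nodes are each wrapped in their own KERNEL node,
/// every other node is reported as "no rewrite" via `None`.
/// The context is unused here, mirroring the `_ctx` name in the signature.
fn split_store_sketch(_ctx: &mut Vec<Arc<UOp>>, x: &Arc<UOp>) -> Option<Arc<UOp>> {
    match x.op {
        Op::Store | Op::End => Some(Arc::new(UOp {
            op: Op::Kernel,
            // The original node is shared, not copied.
            src: vec![Arc::clone(x)],
        })),
        _ => None,
    }
}
```

Returning `Option` lets the caller distinguish "rewritten" from "does not match" without a sentinel value, and `Arc::clone` keeps the original node shared between the old graph and the new kernel node.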