sync_async_proxy_shared is a synchronization fence for the experimental SM 9.0+ copy
functions. It orders accesses in both directions between the async proxy (i.e. TMA) and shared memory.
It should be used after initializing the barriers and before the copy operation.
PTX: fence.proxy.async.shared::cta
Experimental and subject to change.
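
The ordering described above (initialize the barrier, fence, then copy) can be sketched in plain CUDA C++. This is only an illustration under stated assumptions, not this library's implementation: the kernel name load_tile, the tile size TILE_LEN, and the host harness are hypothetical, the fence is issued through inline PTX, and cuda::memcpy_async from libcu++ is used as a stand-in for the copy, which on SM 9.0 may be lowered to a bulk async (TMA) copy. It assumes an SM 9.0 device and a build along the lines of nvcc -arch=sm_90 -std=c++17.

```cuda
#include <cuda/barrier>
#include <cstdio>
#include <utility>

using barrier = cuda::barrier<cuda::thread_scope_block>;

constexpr int TILE_LEN = 1024;  // hypothetical tile size, in ints

__global__ void load_tile(const int* src, int* dst) {
    // Bulk async copies need a 16-byte aligned shared-memory destination.
    __shared__ alignas(16) int tile[TILE_LEN];

    #pragma nv_diag_suppress static_var_with_dynamic_init
    __shared__ barrier bar;

    if (threadIdx.x == 0) {
        // 1. Initialize the barrier (a generic-proxy write to shared memory).
        init(&bar, blockDim.x);
        // 2. Fence so the async proxy observes the initialized barrier.
        asm volatile("fence.proxy.async.shared::cta;" ::: "memory");
    }
    __syncthreads();

    if (threadIdx.x == 0) {
        // 3. Only now start the copy that signals completion on the barrier.
        cuda::memcpy_async(tile, src, cuda::aligned_size_t<16>(sizeof(tile)), bar);
    }

    // Every thread arrives, then waits until the copied bytes have landed.
    barrier::arrival_token token = bar.arrive();
    bar.wait(std::move(token));

    for (int i = threadIdx.x; i < TILE_LEN; i += blockDim.x) {
        dst[i] = tile[i];
    }
}

int main() {
    int host_in[TILE_LEN], host_out[TILE_LEN];
    for (int i = 0; i < TILE_LEN; ++i) host_in[i] = i;

    int *src = nullptr, *dst = nullptr;
    cudaMalloc(&src, sizeof(host_in));
    cudaMalloc(&dst, sizeof(host_out));
    cudaMemcpy(src, host_in, sizeof(host_in), cudaMemcpyHostToDevice);

    load_tile<<<1, 128>>>(src, dst);
    cudaMemcpy(host_out, dst, sizeof(host_out), cudaMemcpyDeviceToHost);

    // Count elements that did not round-trip through shared memory intact.
    int mismatches = 0;
    for (int i = 0; i < TILE_LEN; ++i) mismatches += (host_out[i] != host_in[i]);
    std::printf("mismatches: %d\n", mismatches);

    cudaFree(src);
    cudaFree(dst);
    return mismatches != 0;
}
```

The fence sits in the same thread that initializes the barrier and ahead of the __syncthreads(), so no thread can issue the copy before the barrier state is visible to the async proxy.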
Sync_storage is the same, except that “cube address space (shared memory)” is replaced with “storage address space (input args)”. The set of collaborating invocations is still only the invocations in the same cube. Barriers alone do not guarantee that writes to a storage buffer made in one cube become visible to invocations in a different cube.