Intel RTM Extensions.
Please note this crate only works on x86_64 Intel processors, and only those built after the Broadwell generation (that is, 6th generation Skylake and later).
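Because support depends on the exact processor, it can help to check for RTM at runtime before entering any transactional code path. The following is a minimal sketch using the standard library's feature detection macro; the function name rtm_available is hypothetical and is not part of this crate.

```rust
/// Hypothetical helper (not part of this crate): reports whether the CPU
/// we are running on advertises RTM support through CPUID.
#[cfg(target_arch = "x86_64")]
pub fn rtm_available() -> bool {
    // Standard library runtime detection; the result is cached after the first call.
    is_x86_feature_detected!("rtm")
}

/// On any other architecture RTM does not exist, so always report false.
#[cfg(not(target_arch = "x86_64"))]
pub fn rtm_available() -> bool {
    false
}
```

A caller could use this to pick a lock-based or atomic fallback when the function returns false.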
RTM works very similarly to a database transaction. You can read and write memory, but you have to commit the changes. If another thread modifies the same region you are working on, one of the two RTM transactions will abort (the second one chronologically).
An RTM transaction can also be cancelled. If you hit a condition that requires rolling the transaction back instead of committing it, you can do so with the abort(x: u8) interface within this library.
Now we need to take a deep dive into RTM and its implementation. RTM works at the cache line level. This means RTM treats each region it touches as exclusive to a cache line. Each cache line in Intel CPUs is 64 bytes, so you will want to ensure that the data structures being modified WITHIN RTM transactions are a whole multiple of 64 bytes in size, that is X * 64 = size_of::<T>(), or equivalently 0 == size_of::<T>() % 64. At the same time you will want to ensure the allocation is on a 64 byte boundary (this is called alignment). This simply means &T % 64 == 0 (the raw pointer value).
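The size and alignment rules above can be checked directly in Rust. This is a minimal sketch, assuming a 64 byte cache line as stated in the text; the type CacheLine and both helper functions are hypothetical names used for illustration and are not part of this crate.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical example type. `repr(align(64))` places every value on a
/// 64 byte boundary and rounds the size up to a multiple of 64.
#[repr(align(64))]
pub struct CacheLine {
    pub counter: u64,
    pub flags: u32,
}

/// True when `T` occupies whole cache lines:
/// 0 == size_of::<T>() % 64 and the type's alignment is a multiple of 64.
pub fn fills_whole_cache_lines<T>() -> bool {
    size_of::<T>() % 64 == 0 && align_of::<T>() % 64 == 0
}

/// True when this particular value sits on a 64 byte boundary,
/// i.e. the pointer value satisfies `&T % 64 == 0`.
pub fn is_cache_line_aligned<T>(value: &T) -> bool {
    (value as *const T as usize) % 64 == 0
}
```

A type that passes both checks never shares a cache line with unrelated data, whether it lives on the stack or in a Box.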
The reason for this is false sharing. If a different thread modifies the same cache line you have declared to RTM, your modification may abort, reducing your performance.
RTM works via the MESIF protocol. These are the states a cache line can be in: M (Modified), E (Exclusive), S (Shared), I (Invalid), F (Forward). Effectively, RTM attempts to ensure that all the reads/writes you perform are on E/F (Exclusive/Forward) lines. This means you either own the only copy of the data in cache, OR another thread may read this data but not write to it.
If another thread attempts to write to a cache line during the RTM transaction, the state of your cache line changes: E -> S or F -> I. If the other thread is not executing RTM code, your transaction will abort.
RTM changes are buffered in L1 cache, so too many changes can result in very extreme performance penalties.
RTM changes act as a full instruction barrier, but they are not the same as an lfence instruction (they apply only to the local cache lines affected by an RTM transaction).
For modification of a single cache line, AtomicPtr will be faster, even in SeqCst mode. RTM transactions are typically faster for larger transactions, on the order of several cache lines (typically >300 bytes or so).
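For the single cache line case, the atomic alternative might look like the sketch below. It swaps a pointer to a cache line sized value using SeqCst ordering. The type Config and the functions new_slot and replace_config are hypothetical and not part of this crate, and the sketch assumes no other thread is still dereferencing the old pointer when it is handed back.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical payload that fits in exactly one 64 byte cache line.
#[repr(align(64))]
pub struct Config {
    pub limit: u64,
}

/// Creates a slot holding a heap allocated `Config`.
pub fn new_slot(initial: Config) -> AtomicPtr<Config> {
    AtomicPtr::new(Box::into_raw(Box::new(initial)))
}

/// Publishes `new` with a single atomic swap and returns the previous value.
/// Assumption: readers never keep the old pointer alive past this call.
pub fn replace_config(slot: &AtomicPtr<Config>, new: Config) -> Box<Config> {
    let new_ptr = Box::into_raw(Box::new(new));
    let old_ptr = slot.swap(new_ptr, Ordering::SeqCst);
    // SAFETY: every pointer stored in `slot` came from `Box::into_raw`, and
    // the swap transfers ownership of the old pointer to this caller only.
    unsafe { Box::from_raw(old_ptr) }
}
```

The whole update is one atomic instruction on one pointer, which is why it tends to beat a transaction when only a single cache line is involved.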
Raw extension bindings
Why the transaction aborted