Crate rtm

Intel RTM Extensions.

Please note this crate only works on x86_64 Intel processors, and only those built after Broadwell (that is, 6th generation and later).

#Basic Intro:

RTM works very similarly to a database transaction. You can read and write memory, but the changes only take effect once you commit them. If another thread modifies the same region you are working on, the RTM transaction that came second chronologically will abort.

An RTM transaction can also be cancelled. If you hit a condition that requires rolling back the transaction instead of committing it, you can do so via the abort(x: u8) interface in this library.
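The commit-or-roll-back behaviour described above can be pictured with a small software-only analogy. This is a sketch, not hardware RTM and not this crate's API: `run_transaction` is a hypothetical helper that works on a scratch copy and only writes it back on success, and the `u8` error stands in for the abort code passed to abort(x: u8).

```rust
/// Software-only model of "commit or roll back".
/// NOT hardware RTM and NOT this crate's API; `run_transaction` is a
/// hypothetical name used purely for illustration.
fn run_transaction<T, F>(data: &mut T, body: F) -> Result<(), u8>
where
    T: Clone,
    F: FnOnce(&mut T) -> Result<(), u8>,
{
    // All writes go to a private scratch copy first.
    let mut scratch = data.clone();

    // An `Err(code)` from the body models an explicit abort:
    // we return early and `data` is left untouched.
    body(&mut scratch)?;

    // Reaching this point models the commit: changes become visible.
    *data = scratch;
    Ok(())
}
```

A body that returns `Ok(())` publishes its writes, while a body that returns `Err(7)` leaves the original value exactly as it was, which mirrors the rollback semantics described in the text.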

#Deep Dive:

Now we need to take a deep dive into RTM and its implementation. RTM works at the cache line level. This means RTM tracks every region it touches in units of whole cache lines and expects exclusive access to each of them. Each cache line on Intel CPUs is 64 bytes, so you will want to ensure that the data structures modified WITHIN RTM transactions have a size that is a multiple of 64: X * 64 == size_of::<T>(), or equivalently size_of::<T>() % 64 == 0. At the same time you will want to ensure the allocation sits on a 64 byte boundary (this is called alignment), which simply means &T % 64 == 0 (the pointer value itself).
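One way to satisfy both layout rules in Rust is `#[repr(align(64))]`, which sets the alignment and also rounds the size up to a multiple of 64. The sketch below is illustrative only: `Padded`, `is_cache_line_friendly`, and `starts_on_cache_line` are hypothetical names, not part of this crate.

```rust
use std::mem::{align_of, size_of};

/// Cache line size assumed throughout (64 bytes, as stated in the text).
pub const CACHE_LINE: usize = 64;

/// Hypothetical payload type pinned to a cache-line boundary.
/// `repr(align(64))` also pads the size up to a multiple of 64.
#[repr(align(64))]
#[derive(Debug, Default, Clone, Copy)]
pub struct Padded {
    pub a: u64,
    pub b: u64,
    pub c: u64,
}

/// True when both the size and the alignment of `T` are multiples of 64.
pub fn is_cache_line_friendly<T>() -> bool {
    size_of::<T>() % CACHE_LINE == 0 && align_of::<T>() % CACHE_LINE == 0
}

/// True when this particular value starts on a cache-line boundary,
/// i.e. the `&T % 64 == 0` check from the text.
pub fn starts_on_cache_line<T>(value: &T) -> bool {
    (value as *const T as usize) % CACHE_LINE == 0
}
```

Because the alignment is part of the type, both stack values and `Box` allocations of `Padded` start on a 64 byte boundary without any manual allocator work.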

The reason for this is false sharing. If a different thread modifies a cache line that is part of your RTM transaction, even if it touches different bytes within that line, your transaction may abort, reducing your performance.
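To make the false-sharing condition concrete, the following sketch works purely on addresses and sizes. It assumes 64 byte cache lines as stated above; `line_index`, `lines_spanned`, and `shares_cache_line` are hypothetical helpers, not functions of this crate.

```rust
/// Cache line size assumed by the text (64 bytes).
const CACHE_LINE: usize = 64;

/// Index of the cache line that contains byte address `addr`.
fn line_index(addr: usize) -> usize {
    addr / CACHE_LINE
}

/// Number of cache lines touched by `size` bytes starting at `addr`.
fn lines_spanned(addr: usize, size: usize) -> usize {
    if size == 0 {
        return 0;
    }
    line_index(addr + size - 1) - line_index(addr) + 1
}

/// True when two byte ranges touch at least one common cache line,
/// which is the situation that can cause false sharing.
fn shares_cache_line(a_addr: usize, a_size: usize, b_addr: usize, b_size: usize) -> bool {
    if a_size == 0 || b_size == 0 {
        return false;
    }
    let (a_first, a_last) = (line_index(a_addr), line_index(a_addr + a_size - 1));
    let (b_first, b_last) = (line_index(b_addr), line_index(b_addr + b_size - 1));
    // Two inclusive index ranges overlap when each starts before the other ends.
    a_first <= b_last && b_first <= a_last
}
```

A 64 byte value at offset 32 spans two lines, while the same value at offset 0 or 64 spans exactly one. Two 8 byte values at offsets 0 and 8 never overlap in bytes but still share a line.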

RTM works via the MESIF protocol, which defines the states a cache line can be in: M (Modified), E (Exclusive), S (Shared), I (Invalid), and F (Forward). Effectively, RTM attempts to ensure that all the reads and writes you perform are on E/F (Exclusive/Forward) lines. This means you either own the only copy of the line in cache, OR other threads may read the data but not write to it.

If another thread attempts to write to a cache line during the RTM transaction, the state of your cache line will change E -> S or F -> I. If the other thread is not executing RTM code, your transaction will abort.

#Architecture Notes:

RTM changes are buffered in L1 cache, so too many changes in a single transaction can result in extreme performance penalties.

RTM changes act as a full instruction barrier, but they are not the same as an mfence, sfence, or lfence instruction: the barrier applies only to the local cache lines affected by the RTM transaction.

#Performance Notes:

For modification of a single cache line, AtomicUsize or AtomicPtr will be faster, even in SeqCst mode. RTM transactions are typically faster for larger transactions on the order of several cache lines (typically >300 bytes or so).
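For the single cache line case, the atomic alternative mentioned above might look like the following sketch. It uses only the standard library; `count_with_atomics` is a hypothetical helper name, and the example makes no performance measurement of its own.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `threads` workers that each add 1 to a shared counter
/// `per_thread` times, then returns the final value.
fn count_with_atomics(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // A single-word update needs no transaction; SeqCst is
                    // the strongest ordering, used here to match the text.
                    counter.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    counter.load(Ordering::SeqCst)
}
```

Every increment is a single atomic read-modify-write on one word, so no updates are lost and there is nothing to commit or roll back.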

Re-exports§

pub use tsx::*;

Modules§

tsx: Raw extension bindings

Enums§

Abort: Why the transaction aborted
