asm-lsp 0.10.1 - Docs.rs

sdiv
Signed divideSDIV <Wd>, <Wn>, <Wm>SDIV <Xd>, <Xn>, <Xm>+SDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>gcsstrGuarded Control Stack storeGCSSTR <Xt>, [<Xn|SP>]ldaxrh(Load-acquire exclusive register halfwordLDAXRH <Wt>, [<Xn|SP>{, #0}]ldsetab Atomic bit set on byte in memoryLDSETAB <Ws>, <Wt>, [<Xn|SP>]LDSETALB <Ws>, <Wt>, [<Xn|SP>]LDSETB <Ws>, <Wt>, [<Xn|SP>]LDSETLB <Ws>, <Wt>, [<Xn|SP>]brkas�Sets destination predicate elements up to and including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the BRKAS <Pd>.B, <Pg>/Z, <Pn>.Bcpypwtn?Memory copy, writes unprivileged, reads and writes non-temporal!CPYPWTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYMWTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYEWTN  [ <Xd>]!, [<Xs>]!, <Xn>!bfmax�Determine the maximum of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors.:BFMAX { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, <Zm>.H:BFMAX { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, <Zm>.HGBFMAX { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, { <Zm1>.H-<Zm2>.H }GBFMAX { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, { <Zm1>.H-<Zm4>.H }&BFMAX <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.HpacnbiasppcPPointer Authentication Code for return address, using key A, not a branch targetPACNBIASPPC fmlslt��This half-precision floating-point multiply-subtract long instruction widens the odd-numbered half-precision elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding half-precision elements in the source vectors. 
This instruction is unpredicated.FMLSLT <Zda>.S, <Zn>.H, <Zm>.H%FMLSLT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]bfmlalt��This BFloat16 floating-point multiply-add long instruction widens the odd-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and adds these values without intermediate rounding to the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. This instruction is unpredicated.BFMLALT <Zda>.S, <Zn>.H, <Zm>.H&BFMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]lastb�If there is an active element then extract the last active element from the final source vector register. If there are no active elements, extract the highest-numbered element. Then zero-extend and place the extracted element in the destination general-purpose register.LASTB <R><d>, <Pg>, <Zn>.<T>LASTB <V><d>, <Pg>, <Zn>.<T>smullb�Multiply the corresponding even-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%SMULLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>$SMULLB <Zd>.S, <Zn>.H, <Zm>.H[<imm>]$SMULLB <Zd>.D, <Zn>.S, <Zm>.S[<imm>]frecpx+Floating-point reciprocal exponent (scalar)FRECPX <Hd>, <Hn>FRECPX <V><d>, <V><n>!FRECPX <Zd>.<T>, <Pg>/M, <Zn>.<T>smnegl
SMNEGL -- A64Signed multiply-negate longSMNEGL <Xd>, <Wn>, <Wm>SMSUBL   <Xd>, <Wn>, <Wm>, XZReorqv�Bitwise exclusive OR of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as all zeros.EORQV <Vd>.<T>, <Pg>, <Zn>.<Tb>sm3tt1bSM3TT1B(SM3TT1B <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]movnMove wide with NOT!MOVN <Wd>, #<imm>{, LSL #<shift>}!MOVN <Xd>, #<imm>{, LSL #<shift>}shsubr�5Subtract active signed elements of the first source vector from corresponding signed elements of the second source vector, shift right one bit, and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.-SHSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>sqdmulhDSigned saturating doubling multiply returning high half (by element)*SQDMULH <V><d>, <V><n>, <Vm>.<Ts>[<index>].SQDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]SQDMULH <V><d>, <V><n>, <V><m>$SQDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FSQDMULH { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>FSQDMULH { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>USQDMULH { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }USQDMULH { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }$SQDMULH <Zd>.<T>, <Zn>.<T>, <Zm>.<T>%SQDMULH <Zd>.H, <Zn>.H, <Zm>.H[<imm>]%SQDMULH <Zd>.S, <Zn>.S, <Zm>.S[<imm>]%SQDMULH <Zd>.D, <Zn>.D, <Zm>.D[<imm>]fnmls�cMultiply the corresponding active floating-point elements of the first and second source vectors and subtract from elements of the third source (addend) vector without intermediate rounding. Destructively place the negated results in the destination and third source (addend) vector. 
Inactive elements in the destination vector register remain unmodified.+FNMLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>brkpb�}If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.$BRKPB <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bzipq2�Interleave alternating elements from high halves of the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. This instruction is unpredicated."ZIPQ2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>bfmls�UMultiply the corresponding active BFloat16 elements of the first and second source vectors and subtract from elements of the third source (addend) vector without intermediate rounding. Destructively place the results in the destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.%BFMLS <Zda>.H, <Pg>/M, <Zn>.H, <Zm>.H$BFMLS <Zda>.H, <Zn>.H, <Zm>.H[<imm>]IBFMLS   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IBFMLS   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]@BFMLS   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H@BFMLS   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HMBFMLS   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }MBFMLS   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }csinvConditional select invertCSINV <Wd>, <Wn>, <Wm>, <cond>CSINV <Xd>, <Xn>, <Xm>, <cond>cpyfpn7Memory copy forward-only, reads and writes non-temporal CPYFPN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYFMN  [ <Xd>]!, [<Xs>]!, <Xn>! 
CPYFEN  [ <Xd>]!, [<Xs>]!, <Xn>!msubMultiply-subtractMSUB <Wd>, <Wn>, <Wm>, <Wa>MSUB <Xd>, <Xn>, <Xm>, <Xa>rdffrRead the first-fault register (RDFFR <Pd>.BRDFFR <Pd>.B, <Pg>/Zsqincb�jDetermines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQINCB <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQINCB <Xdn>{, <pattern>{, MUL #<imm>}}tbxTable vector lookup extension&TBX <Vd>.<Ta>, { <Vn>.16B }, <Vm>.<Ta>2TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B }, <Vm>.<Ta>>TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B }, <Vm>.<Ta>JTBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B, <Vn+3>.16B }, <Vm>.<Ta> TBX <Zd>.<T>, <Zn>.<T>, <Zm>.<T>prfum!Prefetch memory (unscaled offset)/PRFUM  ( <prfop>|#<imm5>), [<Xn|SP>{, #<simm>}]uminvUnsigned minimum across vectorUMINV <V><d>, <Vn>.<T>UMINV <V><d>, <Pg>, <Zn>.<T>autia171615-Authenticate instruction address, using key AAUTIA171615 ld1q�Gather load of quadwords to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.)LD1Q { <Zt>.Q }, <Pg>/Z, [<Zn>.D{, <Xm>}]ELD1Q { <ZAt><HV>.Q[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #4}]blrBranch with link to registerBLR <Xn>clastb�\From the source vector register extract the last active element, and then zero-extend that element to destructively place in the destination and first source general-purpose register. 
If there are no active elements then destructively zero-extend the least significant element-size bits of the destination and first source general-purpose register.'CLASTB <R><dn>, <Pg>, <R><dn>, <Zm>.<T>'CLASTB <V><dn>, <Pg>, <V><dn>, <Zm>.<T>+CLASTB <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>cmhs(Compare unsigned higher or same (vector)CMHS  D <d>, D<n>, D<m>!CMHS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>pacib171615@Pointer Authentication Code for instruction address, using key BPACIB171615 cincCINC -- A64Conditional incrementCINC <Wd>, <Wn>, <invcond> CSINC   <Wd>, <Wn>, <Wm>, <cond>CINC <Xd>, <Xn>, <invcond> CSINC   <Xd>, <Xn>, <Xm>, <cond>syslSystem instruction with result%SYSL <Xt>, #<op1>, <Cn>, <Cm>, #<op2>cmhi Compare unsigned higher (vector)CMHI  D <d>, D<n>, D<m>!CMHI <Vd>.<T>, <Vn>.<T>, <Vm>.<T>ld3rMLoad single 3-element structure and replicate to all lanes of three registers3LD3R  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]:LD3R  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>9LD3R  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>bfmul�Multiply active BFloat16 elements of the second source vector to corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.&BFMUL <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.HBFMUL <Zd>.H, <Zn>.H, <Zm>.H#BFMUL <Zd>.H, <Zn>.H, <Zm>.H[<imm>]ldff1sw�WGather load with first-faulting behavior of signed words to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. 
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector..LDFF1SW { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]5LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]7LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]4LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]5LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]-LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]pmov�Copy a packed bitmap, where bit value 0b1 represents TRUE and bit value 0b0 represents FALSE, from a portion of the source vector register to elements of the destination SVE predicate register.PMOV <Pd>.B, <Zn>PMOV <Pd>.D, <Zn>{[<imm>]}PMOV <Pd>.H, <Zn>{[<imm>]}PMOV <Pd>.S, <Zn>{[<imm>]}PMOV <Zd>, <Pn>.BPMOV <Zd>{[<imm>]}, <Pn>.DPMOV <Zd>{[<imm>]}, <Pn>.HPMOV <Zd>{[<imm>]}, <Pn>.Ssumopa>The 8-bit integer variant works with a 32-bit element ZA tile./SUMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B/SUMOPA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hfmlalltb�This 8-bit floating-point multiply-add long-long instruction widens the third 8-bit element of each 32-bit container in the first and second source vectors to single-precision format and multiplies the corresponding elements. The intermediate products are scaled by 2 FMLALLTB <Zda>.S, <Zn>.B, <Zm>.B'FMLALLTB <Zda>.S, <Zn>.B, <Zm>.B[<imm>]subpsSubtract pointer, setting flagsSUBPS <Xd>, <Xn|SP>, <Xm|SP>tstartStart transactionTSTART <Xt>udivr� Unsigned reversed divide active elements of the second source vector by corresponding elements of the first source vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.,UDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>sqsubr�'Subtract active signed elements of the first source vector from corresponding signed elements of the second source vector and destructively place the results in the corresponding elements of the first source vector. 
Each result element is saturated to the N-bit element's signed integer range -2-SQSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>incd�Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements.'INCD <Zdn>.D{, <pattern>{, MUL #<imm>}}'INCH <Zdn>.H{, <pattern>{, MUL #<imm>}}'INCW <Zdn>.S{, <pattern>{, MUL #<imm>}}gcspopmGCSPOPM -- A64Guarded Control Stack popGCSPOPM    { <Xt>}SYSL   <Xt>, #3, C7, C7, #1fcvtnt}Convert each single-precision element of the group of two source vectors to 8-bit floating-point while scaling the value by 2"FCVTNT <Zd>.B, { <Zn1>.S-<Zn2>.S }FCVTNT <Zd>.H, <Pg>/M, <Zn>.SFCVTNT <Zd>.S, <Pg>/M, <Zn>.DfaddFloating-point add (vector)!FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FADD <Hd>, <Hn>, <Hm>FADD <Sd>, <Sn>, <Sm>FADD <Dd>, <Dn>, <Dm>*FADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>+FADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!FADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>>FADD    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zm1>.<T>-<Zm2>.<T> }8FADD    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zm1>.H-<Zm2>.H }>FADD    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zm1>.<T>-<Zm4>.<T> }8FADD    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zm1>.H-<Zm4>.H }cpyptrn>Memory copy, reads and writes unprivileged, reads non-temporal!CPYPTRN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYMTRN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYETRN  [ <Xd>]!, [<Xs>]!, <Xn>!fmad�SMultiply the corresponding active floating-point elements of the first and second source vectors and add to elements of the third (addend) vector without intermediate rounding. Destructively place the results in the destination and first source (multiplicand) vector. 
Inactive elements in the destination vector register remain unmodified.*FMAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>cadd��Add the real and imaginary components of the integral complex numbers from the first source vector to the complex numbers from the second source vector which have first been rotated by 90 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation, equivalent to multiplying the complex numbers in the second source vector by ±,CADD <Zdn>.<T>, <Zdn>.<T>, <Zm>.<T>, <const>ldursw$Load register signed word (unscaled)!LDURSW <Xt>, [<Xn|SP>{, #<simm>}]saddlpSigned add long pairwiseSADDLP <Vd>.<Ta>, <Vn>.<Tb>gmiTag mask insertGMI <Xd>, <Xn|SP>, <Xm>st2h�2Contiguous store two-halfword structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,<ST2H { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST2H { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]uqrshr�Shift right by an immediate value, the unsigned integer value in each element of the two source vectors and place the rounded results in the half-width destination elements. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2,UQRSHR <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>4UQRSHR <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>bf1cvt�Convert each 8-bit floating-point element of the source vector to BFloat16 while downscaling the value, and place the results in the corresponding 16-bit elements of the destination vectors. BF1CVT scales the values by 2"BF1CVT { <Zd1>.H-<Zd2>.H }, <Zn>.B"BF2CVT { <Zd1>.H-<Zd2>.H }, <Zn>.BBF1CVT <Zd>.H, <Zn>.BBF2CVT <Zd>.H, <Zn>.Bfcpy�Copy a floating-point immediate into each active element in the destination vector. 
Inactive elements in the destination vector register remain unmodified.FCPY <Zd>.<T>, <Pg>/M, #<const>st4q�4Contiguous store four-quadword structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,NST4Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q, <Zt4>.Q }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]JST4Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q, <Zt4>.Q }, <Pg>, [<Xn|SP>, <Xm>, LSL #4]smulhSigned multiply highSMULH <Xd>, <Xn>, <Xm>,SMULH <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>"SMULH <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ptrue�#Set elements of the destination predicate to true if the element number satisfies the named predicate constraint, or to false otherwise. If the constraint specifies more elements than are available at the current vector length then all elements of the destination predicate are set to false.PTRUE <Pd>.<T>{, <pattern>}PTRUE <PNd>.<T>ushlUnsigned shift left (register)USHL  D <d>, D<n>, D<m>!USHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>st2d�4Contiguous store two-doubleword structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,<ST2D { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST2D { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]ldtrh%Load register halfword (unprivileged) LDTRH <Wt>, [<Xn|SP>{, #<simm>}]adcAdd with carryADC <Wd>, <Wn>, <Wm>ADC <Xd>, <Xn>, <Xm>ldtrsh,Load register signed halfword (unprivileged)!LDTRSH <Wt>, [<Xn|SP>{, #<simm>}]!LDTRSH <Xt>, [<Xn|SP>{, #<simm>}]eon+Bitwise exclusive-OR NOT (shifted register))EON <Wd>, <Wn>, <Wm>{, <shift> #<amount>})EON <Xd>, <Xn>, <Xm>{, <shift> #<amount>}EON�KBitwise exclusive OR an inverted immediate with 
each 64-bit element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated."EON <Zdn>.<T>, <Zdn>.<T>, #<const>*EOR  <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)ldgmLoad tag multipleLDGM <Xt>, [<Xn|SP>]sqdecp�Counts the number of true elements in the source predicate and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.SQDECP <Xdn>, <Pm>.<T>, <Wdn>SQDECP <Xdn>, <Pm>.<T>SQDECP <Zdn>.<T>, <Pm>.<T>ldsminh;Atomic signed minimum on halfword in memory, without return4STSMINH <Ws>, [<Xn|SP>]LDSMINH  <Ws>, WZR, [<Xn|SP>]6STSMINLH <Ws>, [<Xn|SP>]LDSMINLH  <Ws>, WZR, [<Xn|SP>]smlalb�Multiply the corresponding even-numbered signed elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.&SMLALB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%SMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%SMLALB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]cpyprnMemory copy, reads non-temporal CPYPRN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYMRN  [ <Xd>]!, [<Xs>]!, <Xn>! 
CPYERN  [ <Xd>]!, [<Xs>]!, <Xn>!facge>Floating-point absolute compare greater than or equal (vector)FACGE <Hd>, <Hn>, <Hm>FACGE <V><d>, <V><n>, <V><m>"FACGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FACGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>ld1rod�Load four contiguous doublewords to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address..LD1ROD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]2LD1ROD { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]sdivr�Signed reversed divide active elements of the second source vector by corresponding elements of the first source vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.,SDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ldseth4Atomic bit set on halfword in memory, without return2STSETH <Ws>, [<Xn|SP>]LDSETH  <Ws>, WZR, [<Xn|SP>]4STSETLH <Ws>, [<Xn|SP>]LDSETLH  <Ws>, WZR, [<Xn|SP>]cselConditional selectCSEL <Wd>, <Wn>, <Wm>, <cond>CSEL <Xd>, <Xn>, <Xm>, <cond>rev
Reverse bytesREV <Wd>, <Wn>REV <Xd>, <Xn>REV <Pd>.<T>, <Pn>.<T>REV <Zd>.<T>, <Zn>.<T>fminnmqv�9Floating-point minimum number of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as the default NaN."FMINNMQV <Vd>.<T>, <Pg>, <Zn>.<Tb>bgrp��This instruction separates bits in each element of the first source vector by gathering from the bit positions indicated by non-zero bits in the corresponding mask element of the second source vector to the lowest-numbered contiguous bits of the corresponding destination element, and from positions indicated by zero bits to the highest-numbered bits of the destination element, preserving the bit order within each group. This instruction is unpredicated.!BGRP <Zd>.<T>, <Zn>.<T>, <Zm>.<T>raddhnb�6Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant rounded half of the result in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. This instruction is unpredicated.&RADDHNB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>sqshrnb�<Shift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. 
Each result element is saturated to the half-width N-bit element's signed integer range -2%SQSHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>umov8Unsigned move vector element to general-purpose registerUMOV <Wd>, <Vn>.<Ts>[<index>]UMOV <Xd>, <Vn>.D[<index>]brkBreakpoint instructionBRK  # <imm>ldsminab'Atomic signed minimum on byte in memoryLDSMINAB <Ws>, <Wt>, [<Xn|SP>]LDSMINALB <Ws>, <Wt>, [<Xn|SP>]LDSMINB <Ws>, <Wt>, [<Xn|SP>]LDSMINLB <Ws>, <Wt>, [<Xn|SP>]ldsmaxah+Atomic signed maximum on halfword in memoryLDSMAXAH <Ws>, <Wt>, [<Xn|SP>]LDSMAXALH <Ws>, <Wt>, [<Xn|SP>]LDSMAXH <Ws>, <Wt>, [<Xn|SP>]LDSMAXLH <Ws>, <Wt>, [<Xn|SP>]fcvtn9Floating-point convert to lower precision narrow (vector)FCVTN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>%FCVTN <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>&FCVTN{ 2}  <Vd>.<Ta>, <Vn>.4S, <Vm>.4S!FCVTN <Zd>.B, { <Zn1>.H-<Zn2>.H }!FCVTN <Zd>.B, { <Zn1>.S-<Zn4>.S }!FCVTN <Zd>.H, { <Zn1>.S-<Zn2>.S }ext#Extract vector from pair of vectors*EXT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<index>(EXT <Zd>.B, { <Zn1>.B, <Zn2>.B }, #<imm>$EXT <Zdn>.B, <Zdn>.B, <Zm>.B, #<imm>st1w�Contiguous store of words from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.<ST1W { <Zt1>.S-<Zt2>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]<ST1W { <Zt1>.S-<Zt4>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST1W { <Zt1>.S-<Zt2>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]8ST1W { <Zt1>.S-<Zt4>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]=ST1W { <Zt1>.S, <Zt2>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]OST1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]9ST1W { <Zt1>.S, <Zt2>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]KST1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2])ST1W { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}])ST1W { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]4ST1W { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]2ST1W { <Zt>.Q }, 
<Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0ST1W { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>, LSL #2].ST1W { <Zt>.Q }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]2ST1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #2]2ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #2]/ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]/ST1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]0ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #2](ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]CST1W { <ZAt><HV>.S[<Ws>, <offs>] }, <Pg>, [<Xn|SP>{, <Xm>, LSL #2}]sqshrunt�?Shift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2&SQSHRUNT <Zd>.<T>, <Zn>.<Tb>, #<const>sqincd�kDetermines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQINCD <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQINCD <Xdn>{, <pattern>{, MUL #<imm>}})SQINCD <Zdn>.D{, <pattern>{, MUL #<imm>}}umlalt�Multiply the corresponding odd-numbered unsigned elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. 
This instruction is unpredicated.&UMLALT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%UMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%UMLALT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]ldgLoad Allocation TagLDG <Xt>, [<Xn|SP>{, #<simm>}]sqcvt�Saturate the signed integer value in each element of the two source vectors to half the original source element width, and place the results in the half-width destination elements.!SQCVT <Zd>.H, { <Zn1>.S-<Zn2>.S })SQCVT <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }sysSystem instruction(SYS  # <op1>, <Cn>, <Cm>, #<op2>{, <Xt>}esbError synchronization barrierESB uminpUnsigned minimum pairwise"UMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UMINP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>autib-Authenticate instruction address, using key BAUTIB <Xd>, <Xn|SP>AUTIZB <Xd>
AUTIB1716 AUTIBSP AUTIBZ nors�Bitwise NOR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the #NORS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bfrint32xLFloating-point round to 32-bit integer, using current rounding mode (vector)FRINT32X <Vd>.<T>, <Vn>.<T>FRINT32X <Sd>, <Sn>FRINT32X <Dd>, <Dn>prfh�Gather prefetch of halfwords from the active memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive addresses are not prefetched from memory.&PRFH <prfop>, <Pg>, [<Zn>.S{, #<imm>}]&PRFH <prfop>, <Pg>, [<Zn>.D{, #<imm>}]/PRFH <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]+PRFH <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #1]/PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #1]/PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #1]-PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #1]psb!Profiling synchronization barrierPSB  CSYNC sha512hSHA512 hash update part 1SHA512H <Qd>, <Qn>, <Vm>.2Dfmaxnm&Floating-point maximum number (vector)#FMAXNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>#FMAXNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FMAXNM <Hd>, <Hn>, <Hm>FMAXNM <Sd>, <Sn>, <Sm>FMAXNM <Dd>, <Dn>, <Dm>EFMAXNM { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>EFMAXNM { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>TFMAXNM { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }TFMAXNM { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> },FMAXNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>-FMAXNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>saddlt�Add the corresponding odd-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%SADDLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>fmaxnmp:Floating-point maximum number of pair of elements (scalar)FMAXNMP  H <d>, <Vn>.2HFMAXNMP <V><d>, <Vn>.<T>$FMAXNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>$FMAXNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>.FMAXNMP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>sqshrn0Signed saturating shift right narrow (immediate)!SQSHRN <Vb><d>, <Va><n>, #<shift>*SQSHRN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>bsl1n�CSelects bits from the inverted first source vector where the corresponding bit in the third source vector is '1', and from the second source vector where the corresponding bit in the third source vector is '0'. The result is placed destructively in the destination and first source vector. This instruction is unpredicated.&BSL1N <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.Dsrshr'Signed rounding shift right (immediate)SRSHR  D <d>, D<n>, #<shift>"SRSHR <Vd>.<T>, <Vn>.<T>, #<shift>,SRSHR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>ldumaxh=Atomic unsigned maximum on halfword in memory, without return4STUMAXH <Ws>, [<Xn|SP>]LDUMAXH  <Ws>, WZR, [<Xn|SP>]6STUMAXLH <Ws>, [<Xn|SP>]LDUMAXLH  <Ws>, WZR, [<Xn|SP>]bfmlalb��This BFloat16 floating-point multiply-add long instruction widens the even-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and adds these values without intermediate rounding to the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. 
This instruction is unpredicated.BFMLALB <Zda>.S, <Zn>.H, <Zm>.H&BFMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]fcvtxnJFloating-point convert to lower precision narrow, rounding to odd (vector)FCVTXN  S <d>, D<n>FCVTXN{ 2}  <Vd>.<Tb>, <Vn>.2Dfmaxv$Floating-point maximum across vectorFMAXV <V><d>, <Vn>.<T>FMAXV  S <d>, <Vn>.4SFMAXV <V><d>, <Pg>, <Zn>.<T>xaflagIConvert floating-point condition flags from external format to Arm formatXAFLAG ldsmaxab'Atomic signed maximum on byte in memoryLDSMAXAB <Ws>, <Wt>, [<Xn|SP>]LDSMAXALB <Ws>, <Wt>, [<Xn|SP>]LDSMAXB <Ws>, <Wt>, [<Xn|SP>]LDSMAXLB <Ws>, <Wt>, [<Xn|SP>]ldumaxab)Atomic unsigned maximum on byte in memoryLDUMAXAB <Ws>, <Wt>, [<Xn|SP>]LDUMAXALB <Ws>, <Wt>, [<Xn|SP>]LDUMAXB <Ws>, <Wt>, [<Xn|SP>]LDUMAXLB <Ws>, <Wt>, [<Xn|SP>]ld1row�Load eight contiguous words to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address..LD1ROW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]2LD1ROW { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]stlxp)Store-release exclusive pair of registers)STLXP <Ws>, <Wt1>, <Wt2>, [<Xn|SP>{, #0}])STLXP <Ws>, <Xt1>, <Xt2>, [<Xn|SP>{, #0}]asrvArithmetic shift right variableASRV <Wd>, <Wn>, <Wm>ASRV <Xd>, <Xn>, <Xm>splice��Select a region from the first source vector and copy it to the lowest-numbered elements of the result. Then set any remaining elements of the result to a copy of the lowest-numbered elements from the second source vector. The region is selected using the first and last true elements in the vector select predicate register. The result is placed destructively in the destination and first source vector, or constructively in the destination vector./SPLICE <Zd>.<T>, <Pv>, { <Zn1>.<T>, <Zn2>.<T> }+SPLICE <Zdn>.<T>, <Pv>, <Zdn>.<T>, <Zm>.<T>	sha512su0SHA512 schedule update 0SHA512SU0 <Vd>.2D, <Vn>.2Dcnt
Count bitsCNT <Wd>, <Wn>CNT <Xd>, <Xn>CNT <Vd>.<T>, <Vn>.<T>CNT <Zd>.<T>, <Pg>/M, <Zn>.<T>incpxCounts the number of true elements in the source predicate and then uses the result to increment the scalar destination.INCP <Xdn>, <Pm>.<T>INCP <Zdn>.<T>, <Pm>.<T>ldap1JLoad-acquire RCpc one single-element structure to one lane of one register%LDAP1  { <Vt>.D }[<index>], [<Xn|SP>]ldrsb%Load register signed byte (immediate)
LDRSB <Wt>, [<Xn|SP>], #<simm>LDRSB <Xt>, [<Xn|SP>], #<simm>LDRSB <Wt>, [<Xn|SP>, #<simm>]!LDRSB <Xt>, [<Xn|SP>, #<simm>]! LDRSB <Wt>, [<Xn|SP>{, #<pimm>}] LDRSB <Xt>, [<Xn|SP>{, #<pimm>}]7LDRSB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]+LDRSB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]7LDRSB <Xt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]+LDRSB <Xt>, [<Xn|SP>, <Xm>{, LSL <amount>}]bfm
Bitfield move BFM <Wd>, <Wn>, #<immr>, #<imms> BFM <Xd>, <Xn>, #<immr>, #<imms>wfetWait for event with timeout	WFET <Xt>rax1Rotate and exclusive-ORRAX1 <Vd>.2D, <Vn>.2D, <Vm>.2DRAX1 <Zd>.D, <Zn>.D, <Zm>.DudfPermanently undefinedUDF  # <imm>gcspushmGCSPUSHM -- A64Guarded Control Stack push
GCSPUSHM <Xt>SYS   #3, C7, C7, #0, <Xt>	cpyfpwtwn>Memory copy forward-only, writes unprivileged and non-temporal#CPYFPWTWN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFMWTWN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFEWTWN  [ <Xd>]!, [<Xs>]!, <Xn>!fminv$Floating-point minimum across vectorFMINV <V><d>, <Vn>.<T>FMINV  S <d>, <Vn>.4SFMINV <V><d>, <Pg>, <Zn>.<T>setgpt)Memory set with tag setting, unprivilegedSETGPT  [ <Xd>]!, <Xn>!, <Xs>SETGMT  [ <Xd>]!, <Xn>!, <Xs>SETGET  [ <Xd>]!, <Xn>!, <Xs>umops5This instruction works with a 32-bit element ZA tile..UMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.UMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B.UMOPS <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H
retaasppcrTReturn from subroutine, with enhanced pointer authentication return using a registerRETAASPPCR <Xm>RETABSPPCR <Xm>sttrb"Store register byte (unprivileged) STTRB <Wt>, [<Xn|SP>{, #<simm>}]sm4ekeySM4 key!SM4EKEY <Vd>.4S, <Vn>.4S, <Vm>.4SSM4EKEY <Zd>.S, <Zn>.S, <Zm>.Scpypn*Memory copy, reads and writes non-temporalCPYPN  [ <Xd>]!, [<Xs>]!, <Xn>!CPYMN  [ <Xd>]!, [<Xs>]!, <Xn>!CPYEN  [ <Xd>]!, [<Xs>]!, <Xn>!uqrshl2Unsigned saturating rounding shift left (register)UQRSHL <V><d>, <V><n>, <V><m>#UQRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>-UQRSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>addspl�Add the Streaming SVE predicate register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer and place the result in the 64-bit destination general-purpose register or current stack pointer.ADDSPL <Xd|SP>, <Xn|SP>, #<imm>ldapurb*Load-acquire RCpc register byte (unscaled)"LDAPURB <Wt>, [<Xn|SP>{, #<simm>}]rcwcas6Read check write compare and swap doubleword in memoryRCWCAS <Xs>, <Xt>, [<Xn|SP>]RCWCASA <Xs>, <Xt>, [<Xn|SP>]RCWCASAL <Xs>, <Xt>, [<Xn|SP>]RCWCASL <Xs>, <Xt>, [<Xn|SP>]urshl'Unsigned rounding shift left (register)URSHL  D <d>, D<n>, D<m>"URSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>DURSHL { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>DURSHL { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>SURSHL { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }SURSHL { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> },URSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>sqrshrunASigned saturating rounded shift right unsigned narrow (immediate)#SQRSHRUN <Vb><d>, <Va><n>, #<shift>,SQRSHRUN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>.SQRSHRUN <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>6SQRSHRUN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>cpyfpwn-Memory copy forward-only, writes non-temporal!CPYFPWN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFMWN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFEWN 
 [ <Xd>]!, [<Xs>]!, <Xn>!ldsminah+Atomic signed minimum on halfword in memoryLDSMINAH <Ws>, <Wt>, [<Xn|SP>]LDSMINALH <Ws>, <Wt>, [<Xn|SP>]LDSMINH <Ws>, <Wt>, [<Xn|SP>]LDSMINLH <Ws>, <Wt>, [<Xn|SP>]movt�Move 8 bytes to a general-purpose register from the ZT0 register at the byte offset specified by the immediate index. This instruction is UNDEFINED in Non-debug state.MOVT <Xt>, ZT0[<offs>]MOVT    ZT0[ <offs>], <Xt>$MOVT    ZT0 {[<offs>, MUL VL]}, <Zt>	pacibsppc;Pointer Authentication Code for return address, using key B
PACIBSPPC 	sqrdcmlah��Multiply without saturation the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of the integral numbers in the first source vector by the corresponding complex number in the second source vector rotated by 0, 90, 180 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation.0SQRDCMLAH <Zda>.<T>, <Zn>.<T>, <Zm>.<T>, <const>1SQRDCMLAH <Zda>.H, <Zn>.H, <Zm>.H[<imm>], <const>1SQRDCMLAH <Zda>.S, <Zn>.S, <Zm>.S[<imm>], <const>stlrStore-release registerSTLR <Wt>, [<Xn|SP>{, #0}]STLR <Xt>, [<Xn|SP>{, #0}]STLR <Wt>, [<Xn|SP>, #-4]!STLR <Xt>, [<Xn|SP>, #-8]!eorBitwise exclusive-OR (vector)	 EOR <Vd>.<T>, <Vn>.<T>, <Vm>.<T>EOR <Wd|WSP>, <Wn>, #<imm>EOR <Xd|SP>, <Xn>, #<imm>)EOR <Wd>, <Wn>, <Wm>{, <shift> #<amount>})EOR <Xd>, <Xn>, <Xm>{, <shift> #<amount>}"EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B*EOR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>"EOR <Zdn>.<T>, <Zdn>.<T>, #<const>EOR <Zd>.D, <Zn>.D, <Zm>.Duqdecw�*Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.'UQDECW <Wdn>{, <pattern>{, MUL #<imm>}}'UQDECW <Xdn>{, <pattern>{, MUL #<imm>}})UQDECW <Zdn>.S{, <pattern>{, MUL #<imm>}}ursqrte(Unsigned reciprocal square root estimateURSQRTE <Vd>.<T>, <Vn>.<T>URSQRTE <Zd>.S, <Pg>/M, <Zn>.StrcitTRCIT -- A64Trace instrumentation
TRCIT <Xt>SYS   #3, C7, C2, #7, <Xt>bmopswThis instruction works with 32-bit element ZA tile. This instruction generates an outer product of the first source SVL.BMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S, <Zm>.ScpyfptrnKMemory copy forward-only, reads and writes unprivileged, reads non-temporal"CPYFPTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFMTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFETRN  [ <Xd>]!, [<Xs>]!, <Xn>!sabdlt�!Compute the absolute difference between odd-numbered signed integer values in elements of the second source vector and corresponding elements of the first source vector, and place the results in overlapping double-width elements of the destination vector. This instruction is unpredicated.%SABDLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>stgStore Allocation TagSTG <Xt|SP>, [<Xn|SP>], #<simm> STG <Xt|SP>, [<Xn|SP>, #<simm>]!!STG <Xt|SP>, [<Xn|SP>{, #<simm>}]uqshl*Unsigned saturating shift left (immediate)UQSHL <V><d>, <V><n>, #<shift>"UQSHL <Vd>.<T>, <Vn>.<T>, #<shift>UQSHL <V><d>, <V><n>, <V><m>"UQSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UQSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>,UQSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>bfmaxnm�Determine the maximum number value of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors.<BFMAXNM { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, <Zm>.H<BFMAXNM { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, <Zm>.HIBFMAXNM { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, { <Zm1>.H-<Zm2>.H }IBFMAXNM { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, { <Zm1>.H-<Zm4>.H }(BFMAXNM <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.HcnegCNEG -- A64Conditional negateCNEG <Wd>, <Wn>, <invcond> CSNEG   <Wd>, <Wn>, <Wm>, <cond>CNEG <Xd>, <Xn>, <invcond> CSNEG   <Xd>, <Xn>, <Xm>, <cond>rcwswp*Read check write swap doubleword in memoryRCWSWP <Xs>, <Xt>, [<Xn|SP>]RCWSWPA <Xs>, <Xt>, [<Xn|SP>]RCWSWPAL <Xs>, <Xt>, 
[<Xn|SP>]RCWSWPL <Xs>, <Xt>, [<Xn|SP>]ld3w�1Contiguous load three-word structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,GLD3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]CLD3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]isb#Instruction synchronization barrierISB  { <option>|#<imm>}ldraa*Load register, with pointer authentication LDRAA <Xt>, [<Xn|SP>{, #<simm>}]!LDRAA <Xt>, [<Xn|SP>{, #<simm>}]! LDRAB <Xt>, [<Xn|SP>{, #<simm>}]!LDRAB <Xt>, [<Xn|SP>{, #<simm>}]!rev64/Reverse elements in 64-bit doublewords (vector)REV64 <Vd>.<T>, <Vn>.<T>REV64 -- A64
Reverse bytesREV64 <Xd>, <Xn>REV   <Xd>, <Xn>ldnf1h��Contiguous load with non-faulting behavior of unsigned halfwords to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.6LDNF1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]6LDNF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]6LDNF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]sshrSigned shift right (immediate)SSHR  D <d>, D<n>, #<shift>!SSHR <Vd>.<T>, <Vn>.<T>, #<shift>lduminh=Atomic unsigned minimum on halfword in memory, without return4STUMINH <Ws>, [<Xn|SP>]LDUMINH  <Ws>, WZR, [<Xn|SP>]6STUMINLH <Ws>, [<Xn|SP>]LDUMINLH  <Ws>, WZR, [<Xn|SP>]frint32z;Floating-point round to 32-bit integer toward zero (vector)FRINT32Z <Vd>.<T>, <Vn>.<T>FRINT32Z <Sd>, <Sn>FRINT32Z <Dd>, <Dn>ldclr0Atomic bit clear on word or doubleword in memoryLDCLR <Ws>, <Wt>, [<Xn|SP>]LDCLRA <Ws>, <Wt>, [<Xn|SP>]LDCLRAL <Ws>, <Wt>, [<Xn|SP>]LDCLRL <Ws>, <Wt>, [<Xn|SP>]LDCLR <Xs>, <Xt>, [<Xn|SP>]LDCLRA <Xs>, <Xt>, [<Xn|SP>]LDCLRAL <Xs>, <Xt>, [<Xn|SP>]LDCLRL <Xs>, <Xt>, [<Xn|SP>]0STCLR <Ws>, [<Xn|SP>]LDCLR  <Ws>, WZR, [<Xn|SP>]2STCLRL <Ws>, [<Xn|SP>]LDCLRL  <Ws>, WZR, [<Xn|SP>]0STCLR <Xs>, [<Xn|SP>]LDCLR  <Xs>, XZR, [<Xn|SP>]2STCLRL <Xs>, [<Xn|SP>]LDCLRL  <Xs>, XZR, [<Xn|SP>]orqv�Bitwise inclusive OR of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. 
Inactive elements in the source vector are treated as all zeros.ORQV <Vd>.<T>, <Pg>, <Zn>.<Tb>rcwset7Read check write atomic bit set on doubleword in memoryRCWSET <Xs>, <Xt>, [<Xn|SP>]RCWSETA <Xs>, <Xt>, [<Xn|SP>]RCWSETAL <Xs>, <Xt>, [<Xn|SP>]RCWSETL <Xs>, <Xt>, [<Xn|SP>]rcwswpp(Read check write swap quadword in memoryRCWSWPP <Xt1>, <Xt2>, [<Xn|SP>] RCWSWPPA <Xt1>, <Xt2>, [<Xn|SP>]!RCWSWPPAL <Xt1>, <Xt2>, [<Xn|SP>] RCWSWPPL <Xt1>, <Xt2>, [<Xn|SP>]uvdot��The unsigned integer vertical dot product instruction computes the vertical dot product of the corresponding two unsigned 16-bit integer values held in the two first source vectors and two unsigned 16-bit integer values in the corresponding indexed 32-bit element of the second source vector. The widened dot product results are destructively added to the corresponding 32-bit element of the ZA single-vector groups.IUVDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IUVDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]IUVDOT   ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]ldnf1d��Contiguous load with non-faulting behavior of doublewords to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.6LDNF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]gcspopcxGCSPOPCX -- A64=Guarded Control Stack pop and compare exception return record	GCSPOPCX SYS   #0, C7, C7, #5{, <Xt>}uqshlr��Shift active unsigned elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. 
A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Each result element is saturated to the N-bit element's unsigned integer range 0 to (2-UQSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>whilele�Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, signed scalar operand is less than or equal to the second scalar operand and false thereafter up to the highest numbered element. WHILELE <Pd>.<T>, <R><n>, <R><m>#WHILELE <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILELE { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>uaddlt�Add the corresponding odd-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%UADDLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>sqcvtn�Saturate the signed integer value in each element of the group of two source vectors to half the original source element width, and place the two-way interleaved results in the half-width destination elements."SQCVTN <Zd>.H, { <Zn1>.S-<Zn2>.S }*SQCVTN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }ldapursb1Load-acquire RCpc register signed byte (unscaled)#LDAPURSB <Wt>, [<Xn|SP>{, #<simm>}]#LDAPURSB <Xt>, [<Xn|SP>{, #<simm>}]umaxvUnsigned maximum across vectorUMAXV <V><d>, <Vn>.<T>UMAXV <V><d>, <Pg>, <Zn>.<T>st1d�Contiguous store of doublewords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.<ST1D { <Zt1>.D-<Zt2>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]<ST1D { <Zt1>.D-<Zt4>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST1D { <Zt1>.D-<Zt2>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]8ST1D { <Zt1>.D-<Zt4>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]=ST1D { <Zt1>.D, <Zt2>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]OST1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, 
<Zt4>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]9ST1D { <Zt1>.D, <Zt2>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]KST1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3])ST1D { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]2ST1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]2ST1D { <Zt>.Q }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}].ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3].ST1D { <Zt>.Q }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]2ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #3]/ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]0ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #3](ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]CST1D { <ZAt><HV>.D[<Ws>, <offs>] }, <Pg>, [<Xn|SP>{, <Xm>, LSL #3}]andqv��Bitwise AND of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as all ones.ANDQV <Vd>.<T>, <Pg>, <Zn>.<Tb>ldaddabAtomic add on byte in memoryLDADDAB <Ws>, <Wt>, [<Xn|SP>]LDADDALB <Ws>, <Wt>, [<Xn|SP>]LDADDB <Ws>, <Wt>, [<Xn|SP>]LDADDLB <Ws>, <Wt>, [<Xn|SP>]ldarbLoad-acquire register byteLDARB <Wt>, [<Xn|SP>{, #0}]match��This instruction compares each active 8-bit or 16-bit character in the first source vector with all of the characters in the corresponding 128-bit segment of the second source vector. Where the first source element detects any matching characters in the second segment it places true in the corresponding element of the destination predicate, otherwise false. Inactive elements in the destination predicate register are set to zero. 
Sets the *MATCH <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>frint64xLFloating-point round to 64-bit integer, using current rounding mode (vector)FRINT64X <Vd>.<T>, <Vn>.<T>FRINT64X <Sd>, <Sn>FRINT64X <Dd>, <Dn>lduminab)Atomic unsigned minimum on byte in memoryLDUMINAB <Ws>, <Wt>, [<Xn|SP>]LDUMINALB <Ws>, <Wt>, [<Xn|SP>]LDUMINB <Ws>, <Wt>, [<Xn|SP>]LDUMINLB <Ws>, <Wt>, [<Xn|SP>]sqrshrnb�:Shift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's signed integer range -2&SQRSHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>cpyprtMemory copy, reads unprivileged CPYPRT  [ <Xd>]!, [<Xs>]!, <Xn>! CPYMRT  [ <Xd>]!, [<Xs>]!, <Xn>! CPYERT  [ <Xd>]!, [<Xs>]!, <Xn>!umlalb�Multiply the corresponding even-numbered unsigned elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.&UMLALB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%UMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%UMLALB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]usmopa>The 8-bit integer variant works with a 32-bit element ZA tile./USMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B/USMOPA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hrcwsset@Read check write software atomic bit set on doubleword in memoryRCWSSET <Xs>, <Xt>, [<Xn|SP>]RCWSSETA <Xs>, <Xt>, [<Xn|SP>]RCWSSETAL <Xs>, <Xt>, [<Xn|SP>]RCWSSETL <Xs>, <Xt>, [<Xn|SP>]rcwscas?Read check write software compare and swap doubleword in memoryRCWSCAS <Xs>, <Xt>, [<Xn|SP>]RCWSCASA <Xs>, <Xt>, [<Xn|SP>]RCWSCASAL <Xs>, <Xt>, [<Xn|SP>]RCWSCASL <Xs>, <Xt>, [<Xn|SP>]cdot�The complex integer dot product instructions delimit the source vectors into pairs of 8-bit or 16-bit signed integer complex numbers. 
Within each pair, the complex numbers in the first source vector are multiplied by the corresponding complex numbers in the second source vector and the resulting wide real or wide imaginary part of the product is accumulated into a 32-bit or 64-bit destination vector element which overlaps all four of the elements that comprise a pair of complex number values in the first source vector.-CDOT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>, <const>,CDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>], <const>,CDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>], <const>xpacd!Strip Pointer Authentication Code
XPACD <Xd>
XPACI <Xd>XPACLRI sqabs Signed saturating absolute valueSQABS <V><d>, <V><n>SQABS <Vd>.<T>, <Vn>.<T> SQABS <Zd>.<T>, <Pg>/M, <Zn>.<T>pacdb9Pointer Authentication Code for data address, using key BPACDB <Xd>, <Xn|SP>PACDZB <Xd>ubfizUBFIZ -- A64!Unsigned bitfield insert in zeros"UBFIZ <Wd>, <Wn>, #<lsb>, #<width>3UBFM   <Wd>, <Wn>, #(-<lsb>  MOD  32), #(<width>-1)"UBFIZ <Xd>, <Xn>, #<lsb>, #<width>3UBFM   <Xd>, <Xn>, #(-<lsb>  MOD  64), #(<width>-1)fsub Floating-point subtract (vector)!FSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FSUB <Hd>, <Hn>, <Hm>FSUB <Sd>, <Sn>, <Sm>FSUB <Dd>, <Dn>, <Dm>*FSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>+FSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!FSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>>FSUB    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zm1>.<T>-<Zm2>.<T> }8FSUB    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zm1>.H-<Zm2>.H }>FSUB    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zm1>.<T>-<Zm4>.<T> }8FSUB    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zm1>.H-<Zm4>.H }fvdotb��The instruction computes the fused sum-of-products of each vertical group of two 8-bit floating-point values held in the corresponding elements of the two first source vectors with the lower-numbered horizontal group of two 8-bit floating-point values in the indexed 32-bit group of the corresponding 128-bit segment of the second source vector. 
The single-precision sum-of-products are scaled by 2GFVDOTB  ZA.S[ <Wv>, <offs>, VGx4], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]fmulx-Floating-point multiply extended (by element)	!FMULX <Hd>, <Hn>, <Vm>.H[<index>](FMULX <V><d>, <V><n>, <Vm>.<Ts>[<index>])FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>],FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]FMULX <Hd>, <Hn>, <Hm>FMULX <V><d>, <V><n>, <V><m>"FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,FMULX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>pacmPointer authentication modifierPACM fcmle:Floating-point compare less than or equal to zero (vector)FCMLE <Hd>, <Hn>, #0.0FCMLE <V><d>, <V><n>, #0.0FCMLE <Vd>.<T>, <Vn>.<T>, #0.0FCMLE <Vd>.<T>, <Vn>.<T>, #0.0FCMLE (vectors)�hCompare active floating-point elements in the first source vector being less than or equal to corresponding elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.*FCMLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-FCMGE    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>sqrshl0Signed saturating rounding shift left (register)SQRSHL <V><d>, <V><n>, <V><m>#SQRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>-SQRSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>subhnt�1Subtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the most significant half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. This instruction is unpredicated.%SUBHNT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>subr�Reversed subtract active elements of the first source vector from corresponding elements of the second source vector and destructively place the results in the corresponding elements of the first source vector. 
Inactive elements in the destination vector register remain unmodified.+SUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>,SUBR <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}ldnf1w��Contiguous load with non-faulting behavior of unsigned words to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.6LDNF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]6LDNF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]sqxtnt�Saturate the signed integer value in each source element to half the original source element width, and place the results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged.SQXTNT <Zd>.<T>, <Zn>.<Tb>csetCSET -- A64Conditional setCSET <Wd>, <invcond>CSINC   <Wd>, WZR, WZR, <cond>CSET <Xd>, <invcond>CSINC   <Xd>, XZR, XZR, <cond>fscale(Floating-point adjust exponent by vector#FSCALE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>#FSCALE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>EFSCALE { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>EFSCALE { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>TFSCALE { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }TFSCALE { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }-FSCALE <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>pfalse7Set all elements in the destination predicate to false.
PFALSE <Pd>.Bldeorb5Atomic exclusive-OR on byte in memory, without return2STEORB <Ws>, [<Xn|SP>]LDEORB  <Ws>, WZR, [<Xn|SP>]4STEORLB <Ws>, [<Xn|SP>]LDEORLB  <Ws>, WZR, [<Xn|SP>]ldarLoad-acquire registerLDAR <Wt>, [<Xn|SP>{, #0}]LDAR <Xt>, [<Xn|SP>{, #0}]uxthUXTH -- A64Unsigned extend halfwordUXTH <Wd>, <Wn>UBFM   <Wd>, <Wn>, #0, #15eor3Three-way exclusive-OR+EOR3 <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B%EOR3 <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.Dumlall�~This unsigned integer multiply-add long-long instruction multiplies each unsigned 8-bit or 16-bit element in the one, two, or four first source vectors with each unsigned 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively adds these values to the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups.=UMLALL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]=UMLALL  ZA.D[ <Wv>, <offs1>:<offs4>], <Zn>.H, <Zm>.H[<index>]RUMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RUMLALL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RUMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]RUMLALL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]<UMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>], <Zn>.<Tb>, <Zm>.<Tb>TUMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>TUMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>dUMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }dUMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }setf8)Evaluation of 8-bit or 16-bit flag values
SETF8 <Wn>SETF16 <Wn>umsublUnsigned multiply-subtract longUMSUBL <Xd>, <Wn>, <Wm>, <Xa>uaddlb�Add the corresponding even-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%UADDLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>zip2�Interleave alternating elements from the lowest or highest halves of the first and second source predicates and place in elements of the destination predicate. This instruction is unpredicated.!ZIP2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!ZIP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!ZIP2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ZIP2 <Zd>.Q, <Zn>.Q, <Zm>.Q!ZIP1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ZIP1 <Zd>.Q, <Zn>.Q, <Zm>.Q!ZIP2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>addsvl�Add the Streaming SVE vector register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer, and place the result in the 64-bit destination general-purpose register or current stack pointer.ADDSVL <Xd|SP>, <Xn|SP>, #<imm>setptMemory set, unprivilegedSETPT  [ <Xd>]!, <Xn>!, <Xs>SETMT  [ <Xd>]!, <Xn>!, <Xs>SETET  [ <Xd>]!, <Xn>!, <Xs>usubwb�Subtract the even-numbered unsigned elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. 
This instruction is unpredicated.$USUBWB <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>faddp,Floating-point add pair of elements (scalar)FADDP  H <d>, <Vn>.2HFADDP <V><d>, <Vn>.<T>"FADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,FADDP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ld1roh�Load sixteen contiguous halfwords to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address..LD1ROH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]2LD1ROH { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]brkb�VSets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to zero, depending on whether merging or zeroing predication is selected. Does not set the condition flags.BRKB <Pd>.B, <Pg>/<ZM>, <Pn>.Bldff1sh�ZGather load with first-faulting behavior of signed halfwords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
.LDFF1SH { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}].LDFF1SH { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]5LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]5LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]7LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]7LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]4LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]4LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]5LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]-LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]rsubhn'Rounding subtract returning high narrow+RSUBHN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>usdotBDot product with unsigned and signed integers (vector, by element)
,USDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]%USDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>USDOT <Zda>.S, <Zn>.B, <Zm>.B$USDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]IUSDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]IUSDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]@USDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B@USDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.BMUSDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }MUSDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }fcvtasXFloating-point convert to signed integer, rounding to nearest with ties to away (vector)
FCVTAS <Hd>, <Hn>FCVTAS <V><d>, <V><n>FCVTAS <Vd>.<T>, <Vn>.<T>FCVTAS <Vd>.<T>, <Vn>.<T>FCVTAS <Wd>, <Hn>FCVTAS <Xd>, <Hn>FCVTAS <Wd>, <Sn>FCVTAS <Xd>, <Sn>FCVTAS <Wd>, <Dn>FCVTAS <Xd>, <Dn>sbSpeculation barrierSB irgInsert random tagIRG <Xd|SP>, <Xn|SP>{, <Xm>}uaddlpUnsigned add long pairwiseUADDLP <Vd>.<Ta>, <Vn>.<Tb>sqdech�kDetermines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQDECH <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQDECH <Xdn>{, <pattern>{, MUL #<imm>}})SQDECH <Zdn>.H{, <pattern>{, MUL #<imm>}}sbfmSigned bitfield move!SBFM <Wd>, <Wn>, #<immr>, #<imms>!SBFM <Xd>, <Xn>, #<immr>, #<imms>mulMultiply (vector, by element)*MUL <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>] MUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>*MUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T> MUL <Zdn>.<T>, <Zdn>.<T>, #<imm> MUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>!MUL <Zd>.H, <Zn>.H, <Zm>.H[<imm>]!MUL <Zd>.S, <Zn>.S, <Zm>.S[<imm>]!MUL <Zd>.D, <Zn>.D, <Zm>.D[<imm>]
MUL -- A64MultiplyMUL <Wd>, <Wn>, <Wm>MADD   <Wd>, <Wn>, <Wm>, WZRMUL <Xd>, <Xn>, <Xm>MADD   <Xd>, <Xn>, <Xm>, XZRtlbipTLBIP -- A64TLB invalidate pair operation TLBIP <tlbip_op>{, <Xt1>, <Xt2>}1SYSP   #<op1>, <Cn>, <Cm>, #<op2>{, <Xt1>, <Xt2>}rcwcasp4Read check write compare and swap quadword in memory1RCWCASP <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]2RCWCASPA <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]3RCWCASPAL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]2RCWCASPL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]sqinch�kDetermines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQINCH <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQINCH <Xdn>{, <pattern>{, MUL #<imm>}})SQINCH <Zdn>.H{, <pattern>{, MUL #<imm>}}stxrh!Store exclusive register halfword!STXRH <Ws>, <Wt>, [<Xn|SP>{, #0}]ldxrh Load exclusive register halfwordLDXRH <Wt>, [<Xn|SP>{, #0}]swppSwap quadword in memorySWPP <Xt1>, <Xt2>, [<Xn|SP>]SWPPA <Xt1>, <Xt2>, [<Xn|SP>]SWPPAL <Xt1>, <Xt2>, [<Xn|SP>]SWPPL <Xt1>, <Xt2>, [<Xn|SP>]fmlalt�This 8-bit floating-point multiply-add long instruction widens the odd 8-bit elements in the first and second source vectors to half-precision format and multiplies the corresponding elements. 
The intermediate products are scaled by 2FMLALT <Zda>.H, <Zn>.B, <Zm>.B%FMLALT <Zda>.H, <Zn>.B, <Zm>.B[<imm>]FMLALT <Zda>.S, <Zn>.H, <Zm>.H%FMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]cpypMemory copyCPYP  [ <Xd>]!, [<Xs>]!, <Xn>!CPYM  [ <Xd>]!, [<Xs>]!, <Xn>!CPYE  [ <Xd>]!, [<Xs>]!, <Xn>!pacnbibsppcPPointer Authentication Code for return address, using key B, not a branch targetPACNBIBSPPC smopa5This instruction works with a 32-bit element ZA tile..SMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.SMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B.SMOPA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hld1sw�:Gather load of signed words to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.,LD1SW { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]5LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]1LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]5LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]2LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]3LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]+LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]ubfmUnsigned bitfield move!UBFM <Wd>, <Wn>, #<immr>, #<imms>!UBFM <Xd>, <Xn>, #<immr>, #<imms>fcvtl8Floating-point convert to higher precision long (vector)FCVTL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>!FCVTL { <Zd1>.S-<Zd2>.S }, <Zn>.Hsmops5This instruction works with a 32-bit element ZA tile..SMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.SMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B.SMOPS <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hld2b�-Contiguous load two-byte structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,>LD2B { 
<Zt1>.B, <Zt2>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]2LD2B { <Zt1>.B, <Zt2>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]sqcadd��Add the real and imaginary components of the integral complex numbers from the first source vector to the complex numbers from the second source vector which have first been rotated by 90 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation, equivalent to multiplying the complex numbers in the second source vector by ±.SQCADD <Zdn>.<T>, <Zdn>.<T>, <Zm>.<T>, <const>sm3ss1SM3SS1)SM3SS1 <Vd>.4S, <Vn>.4S, <Vm>.4S, <Va>.4Sld23Load multiple 2-element structures to two registers'LD2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>].LD2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>-LD2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>,LD2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>],LD2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>],LD2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>],LD2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>]0LD2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], #22LD2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], <Xm>0LD2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], #42LD2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], <Xm>0LD2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], #82LD2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], <Xm>1LD2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], #162LD2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], <Xm>uabdlt�+Compute the absolute difference between the odd-numbered unsigned integer values in elements of the second source vector and corresponding elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%UABDLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>cpyfpwtnLMemory copy forward-only, writes unprivileged, reads and writes non-temporal"CPYFPWTN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFMWTN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFEWTN  [ <Xd>]!, [<Xs>]!, <Xn>!uxtb�Zero-extend the least-significant sub-element of each active element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.UXTB <Zd>.<T>, <Pg>/M, <Zn>.<T>UXTH <Zd>.<T>, <Pg>/M, <Zn>.<T>UXTW <Zd>.D, <Pg>/M, <Zn>.DUXTB -- A64Unsigned extend byteUXTB <Wd>, <Wn>UBFM   <Wd>, <Wn>, #0, #7ptrues�#Set elements of the destination predicate to true if the element number satisfies the named predicate constraint, or to false otherwise. If the constraint specifies more elements than are available at the current vector length then all elements of the destination predicate are set to false.PTRUES <Pd>.<T>{, <pattern>}ld44Load multiple 4-element structures to four registers=LD4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]DLD4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>CLD4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>>LD4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>]>LD4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>]>LD4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>]>LD4  { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>]BLD4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], #4DLD4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], <Xm>BLD4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], #8DLD4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], <Xm>CLD4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], #16DLD4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], <Xm>CLD4  { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], #32DLD4  { <Vt>.D, 
<Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], <Xm>cpyfptnHMemory copy forward-only, reads and writes unprivileged and non-temporal!CPYFPTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFMTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFETN  [ <Xd>]!, [<Xs>]!, <Xn>!sel�Read active elements from the two or four first source vectors and inactive elements from the two or four second source vectors and place in the corresponding elements of the two or four destination vectors.TSEL { <Zd1>.<T>-<Zd2>.<T> }, <PNg>, { <Zn1>.<T>-<Zn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }TSEL { <Zd1>.<T>-<Zd4>.<T> }, <PNg>, { <Zn1>.<T>-<Zn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> } SEL <Pd>.B, <Pg>, <Pn>.B, <Pm>.B&SEL <Zd>.<T>, <Pv>, <Zn>.<T>, <Zm>.<T>fmlalBFloating-point fused multiply-add long to accumulator (by element)+FMLAL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>],FMLAL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]%FMLAL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>&FMLAL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>=FMLAL   ZA.H[ <Wv>, <offs1>:<offs2>], <Zn>.B, <Zm>.B[<index>]RFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]4FMLAL   ZA.H[ <Wv>, <offs1>:<offs2>], <Zn>.B, <Zm>.BIFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.BIFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.BVFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }VFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }=FMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4FMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HIFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HIFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, 
VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }uqrshrn:Unsigned saturating rounded shift right narrow (immediate)"UQRSHRN <Vb><d>, <Va><n>, #<shift>+UQRSHRN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>-UQRSHRN <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>5UQRSHRN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>bslBitwise select BSL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>$BSL <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.Dsunpk�Unpack elements from one or two source vectors and then sign-extend them to place in elements of twice their size within the two or four destination vectors.(SUNPK { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<Tb>8SUNPK { <Zd1>.<T>-<Zd4>.<T> }, { <Zn1>.<Tb>-<Zn2>.<Tb> }swpabSwap byte in memorySWPAB <Ws>, <Wt>, [<Xn|SP>]SWPALB <Ws>, <Wt>, [<Xn|SP>]SWPB <Ws>, <Wt>, [<Xn|SP>]SWPLB <Ws>, <Wt>, [<Xn|SP>]bfcvtHFloating-point convert from single-precision to BFloat16 format (scalar)BFCVT <Hd>, <Sn>!BFCVT <Zd>.B, { <Zn1>.H-<Zn2>.H }!BFCVT <Zd>.H, { <Zn1>.S-<Zn2>.S }BFCVT <Zd>.H, <Pg>/M, <Zn>.Szero;The instruction zeroes two or four ZA single-vector groups.
!ZERO    ZA.D[ <Wv>, <offs>, VGx2]!ZERO    ZA.D[ <Wv>, <offs>, VGx4]$ZERO    ZA.D[ <Wv>, <offs1>:<offs2>]*ZERO    ZA.D[ <Wv>, <offs1>:<offs2>, VGx2]*ZERO    ZA.D[ <Wv>, <offs1>:<offs2>, VGx4]$ZERO    ZA.D[ <Wv>, <offs1>:<offs4>]*ZERO    ZA.D[ <Wv>, <offs1>:<offs4>, VGx2]*ZERO    ZA.D[ <Wv>, <offs1>:<offs4>, VGx4]ZERO { <mask> }ZERO { ZT0 }usmmlaEUnsigned and signed 8-bit integer matrix multiply-accumulate (vector)"USMMLA <Vd>.4S, <Vn>.16B, <Vm>.16BUSMMLA <Zda>.S, <Zn>.B, <Zm>.Bautdb&Authenticate data address, using key BAUTDB <Xd>, <Xn|SP>AUTDZB <Xd>lsl�GShift left by immediate each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. Inactive elements in the destination vector register remain unmodified.*LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>(LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D*LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T> LSL <Zd>.<T>, <Zn>.<T>, #<const>LSL <Zd>.<T>, <Zn>.<T>, <Zm>.DLSL (register) -- A64Logical shift left (register)LSL <Wd>, <Wn>, <Wm>LSLV   <Wd>, <Wn>, <Wm>LSL <Xd>, <Xn>, <Xm>LSLV   <Xd>, <Xn>, <Xm>LSL (immediate) -- A64Logical shift left (immediate)LSL <Wd>, <Wn>, #<shift>6UBFM   <Wd>, <Wn>, #(-<shift>  MOD  32), #(31-<shift>)LSL <Xd>, <Xn>, #<shift>6UBFM   <Xd>, <Xn>, #(-<shift>  MOD  64), #(63-<shift>)sshllt�0Shift left by immediate each odd-numbered signed element of the source vector, and place the results in the overlapping double-width elements of the destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. 
This instruction is unpredicated.$SSHLLT <Zd>.<T>, <Zn>.<Tb>, #<const>ldeorab%Atomic exclusive-OR on byte in memoryLDEORAB <Ws>, <Wt>, [<Xn|SP>]LDEORALB <Ws>, <Wt>, [<Xn|SP>]LDEORB <Ws>, <Wt>, [<Xn|SP>]LDEORLB <Ws>, <Wt>, [<Xn|SP>]tbzTest bit and branch if zeroTBZ <R><t>, #<imm>, <label>st38Store multiple 3-element structures from three registers2ST3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]9ST3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>8ST3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>5ST3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>]5ST3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>]5ST3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>]5ST3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>]9ST3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], #3;ST3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], <Xm>9ST3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], #6;ST3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], <Xm>:ST3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], #12;ST3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], <Xm>:ST3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], #24;ST3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], <Xm>bifBitwise insert if false BIF <Vd>.<T>, <Vn>.<T>, <Vm>.<T>bfmlslt��This BFloat16 floating-point multiply-subtract long instruction widens the odd-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. 
This instruction is unpredicated.BFMLSLT <Zda>.S, <Zn>.H, <Zm>.H&BFMLSLT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]smaddlSigned multiply-add longSMADDL <Xd>, <Wn>, <Wm>, <Xa>st1b�Contiguous store of bytes from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.<ST1B { <Zt1>.B-<Zt2>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]<ST1B { <Zt1>.B-<Zt4>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]0ST1B { <Zt1>.B-<Zt2>.B }, <PNg>, [<Xn|SP>, <Xm>]0ST1B { <Zt1>.B-<Zt4>.B }, <PNg>, [<Xn|SP>, <Xm>]=ST1B { <Zt1>.B, <Zt2>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]OST1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]1ST1B { <Zt1>.B, <Zt2>.B }, <PNg>, [<Xn|SP>, <Xm>]CST1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>, [<Xn|SP>, <Xm>])ST1B { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}])ST1B { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]4ST1B { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}](ST1B { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>]/ST1B { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]/ST1B { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>](ST1B { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]9ST1B { ZA0<HV>.B[<Ws>, <offs>] }, <Pg>, [<Xn|SP>{, <Xm>}]stllrbStore LORelease register byteSTLLRB <Wt>, [<Xn|SP>{, #0}]uzp1Unzip vectors (primary)!UZP1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!UZP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!UZP2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!UZP1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>UZP1 <Zd>.Q, <Zn>.Q, <Zm>.Q!UZP2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>UZP2 <Zd>.Q, <Zn>.Q, <Zm>.Qpunpkhi�Unpack elements from the lowest or highest half of the source predicate and place in elements of twice their size within the destination predicate. 
This instruction is unpredicated.PUNPKHI <Pd>.H, <Pn>.BPUNPKLO <Pd>.H, <Pn>.Bbfvdot��The instruction computes the sum-of-products of each vertical pair of BFloat16 values in the corresponding elements of the two first source vectors with the pair of BFloat16 values in the indexed 32-bit group of the corresponding 128-bit segment of the second source vector. The single-precision sum-of-products are destructively added to the corresponding single-precision elements of the two ZA single-vector groups.IBFVDOT  ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]addvAdd across vectorADDV <V><d>, <Vn>.<T>casbCompare and swap byte in memory CASB <Ws>, <Wt>, [<Xn|SP>{, #0}]!CASAB <Ws>, <Wt>, [<Xn|SP>{, #0}]"CASALB <Ws>, <Wt>, [<Xn|SP>{, #0}]!CASLB <Ws>, <Wt>, [<Xn|SP>{, #0}]uqxtnb�Saturate the unsigned integer value in each source element to half the original source element width, and place the results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.UQXTNB <Zd>.<T>, <Zn>.<Tb>histcnt�_This instruction compares each active 32 or 64-bit element of the first source vector with all active elements with an element number less than or equal to its own in the second source vector, and places the count of matching elements in the corresponding element of the destination vector. Inactive elements in the destination vector are set to zero.,HISTCNT <Zd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>sbcs"Subtract with carry, setting flagsSBCS <Wd>, <Wn>, <Wm>SBCS <Xd>, <Xn>, <Xm>smaxvSigned maximum across vectorSMAXV <V><d>, <Vn>.<T>SMAXV <V><d>, <Pg>, <Zn>.<T>ldtrsb(Load register signed byte (unprivileged)!LDTRSB <Wt>, [<Xn|SP>{, #<simm>}]!LDTRSB <Xt>, [<Xn|SP>{, #<simm>}]fmsb�ZMultiply the corresponding active floating-point elements of the first and second source vectors and subtract from elements of the third (addend) vector without intermediate rounding. 
Destructively place the results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.*FMSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>ld35Load multiple 3-element structures to three registers2LD3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]9LD3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>8LD3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>5LD3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>]5LD3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>]5LD3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>]5LD3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>]9LD3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], #3;LD3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], <Xm>9LD3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], #6;LD3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], <Xm>:LD3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], #12;LD3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], <Xm>:LD3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], #24;LD3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], <Xm>ldaddah Atomic add on halfword in memoryLDADDAH <Ws>, <Wt>, [<Xn|SP>]LDADDALH <Ws>, <Wt>, [<Xn|SP>]LDADDH <Ws>, <Wt>, [<Xn|SP>]LDADDLH <Ws>, <Wt>, [<Xn|SP>]brkns�If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise leaves the destination and second source predicate unchanged. 
Sets the &BRKNS <Pdm>.B, <Pg>/Z, <Pn>.B, <Pdm>.Bfcvtnb}Convert each single-precision element of the group of two source vectors to 8-bit floating-point while scaling the value by 2"FCVTNB <Zd>.B, { <Zn1>.S-<Zn2>.S }mrrs>Move System register to two adjacent general-purpose registers=MRRS <Xt>, <Xt+1>, (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>)fmov&Floating-point move immediate (vector)FMOV <Vd>.<T>, #<imm>FMOV <Vd>.<T>, #<imm>FMOV <Vd>.2D, #<imm>FMOV <Hd>, <Hn>FMOV <Sd>, <Sn>FMOV <Dd>, <Dn>FMOV <Wd>, <Hn>FMOV <Xd>, <Hn>FMOV <Hd>, <Wn>FMOV <Sd>, <Wn>FMOV <Wd>, <Sn>FMOV <Hd>, <Xn>FMOV <Dd>, <Xn>FMOV <Vd>.D[1], <Xn>FMOV <Xd>, <Dn>FMOV <Xd>, <Vn>.D[1]FMOV <Hd>, #<imm>FMOV <Sd>, #<imm>FMOV <Dd>, #<imm>FMOV (zero, predicated)�Move floating-point constant +0.0 to each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.FMOV <Zd>.<T>, <Pg>/M, #0.0CPY  <Zd>.<T>, <Pg>/M, #0FMOV (zero, unpredicated)�Unconditionally broadcast the floating-point constant +0.0 into each element of the destination vector. This instruction is unpredicated.FMOV <Zd>.<T>, #0.0DUP  <Zd>.<T>, #0FMOV (immediate, predicated)�Move a floating-point immediate into each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.FMOV <Zd>.<T>, <Pg>/M, #<const>#FCPY     <Zd>.<T>, <Pg>/M, #<const>FMOV (immediate, unpredicated)�Unconditionally broadcast the floating-point immediate into each element of the destination vector. 
This instruction is unpredicated.FMOV <Zd>.<T>, #<const>FDUP     <Zd>.<T>, #<const>strbStore register byte (immediate)STRB <Wt>, [<Xn|SP>], #<simm>STRB <Wt>, [<Xn|SP>, #<simm>]!STRB <Wt>, [<Xn|SP>{, #<pimm>}]6STRB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]*STRB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]umaxqv��Unsigned maximum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as zero. UMAXQV <Vd>.<T>, <Pg>, <Zn>.<Tb>cpyfprn,Memory copy forward-only, reads non-temporal!CPYFPRN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFMRN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFERN  [ <Xd>]!, [<Xs>]!, <Xn>!ld2h�1Contiguous load two-halfword structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,>LD2H { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD2H { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]uqaddUnsigned saturating addUQADD <V><d>, <V><n>, <V><m>"UQADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UQADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>-UQADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}"UQADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>msrr>Move two adjacent general-purpose registers to System register?MSRR  ( <systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>), <Xt>, <Xt+1>fclamp�|Clamp each floating-point element in the two or four destination vectors to between the floating-point minimum value in the corresponding element of the first source vector and the floating-point maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors.2FCLAMP { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<T>, <Zm>.<T>2FCLAMP { 
<Zd1>.<T>-<Zd4>.<T> }, <Zn>.<T>, <Zm>.<T>#FCLAMP <Zd>.<T>, <Zn>.<T>, <Zm>.<T>fcvtnsXFloating-point convert to signed integer, rounding to nearest with ties to even (vector)
FCVTNS <Hd>, <Hn>FCVTNS <V><d>, <V><n>FCVTNS <Vd>.<T>, <Vn>.<T>FCVTNS <Vd>.<T>, <Vn>.<T>FCVTNS <Wd>, <Hn>FCVTNS <Xd>, <Hn>FCVTNS <Wd>, <Sn>FCVTNS <Xd>, <Sn>FCVTNS <Wd>, <Dn>FCVTNS <Xd>, <Dn>sxtwSXTW -- A64Sign extend wordSXTW <Xd>, <Wn>SBFM   <Xd>, <Xn>, #0, #31cpypwt Memory copy, writes unprivileged CPYPWT  [ <Xd>]!, [<Xs>]!, <Xn>! CPYMWT  [ <Xd>]!, [<Xs>]!, <Xn>! CPYEWT  [ <Xd>]!, [<Xs>]!, <Xn>!orn!Bitwise inclusive OR NOT (vector) ORN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>)ORN <Wd>, <Wn>, <Wm>{, <shift> #<amount>})ORN <Xd>, <Xn>, <Xm>{, <shift> #<amount>}"ORN <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.BORN (immediate)�KBitwise inclusive OR an inverted immediate with each 64-bit element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated."ORN <Zdn>.<T>, <Zdn>.<T>, #<const>*ORR  <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)cpyprtrn0Memory copy, reads unprivileged and non-temporal"CPYPRTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYMRTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYERTRN  [ <Xd>]!, [<Xs>]!, <Xn>!	sqrshrunb�AShift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2'SQRSHRUNB <Zd>.<T>, <Zn>.<Tb>, #<const>sturbStore register byte (unscaled) STURB <Wt>, [<Xn|SP>{, #<simm>}]sturh"Store register halfword (unscaled) STURH <Wt>, [<Xn|SP>{, #<simm>}]fmlall�7This 8-bit floating-point multiply-add long long instruction widens all 8-bit floating-point elements in the one, two, or four first source vectors and the indexed element of the second source vector to single-precision format and multiplies the corresponding elements. 
The intermediate products are scaled by 2=FMLALL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]RFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]4FMLALL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.BIFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.BIFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.BVFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }VFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }fmsub/Floating-point fused multiply-subtract (scalar)FMSUB <Hd>, <Hn>, <Hm>, <Ha>FMSUB <Sd>, <Sn>, <Sm>, <Sa>FMSUB <Dd>, <Dn>, <Dm>, <Da>saddlSigned add long (vector)*SADDL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>rdsvl�Multiply the Streaming SVE vector register size in bytes by an immediate in the range -32 to 31 and place the result in the 64-bit destination general-purpose register.RDSVL <Xd>, #<imm>stlxrb%Store-release exclusive register byte"STLXRB <Ws>, <Wt>, [<Xn|SP>{, #0}]fcmgt,Floating-point compare greater than (vector)FCMGT <Hd>, <Hn>, <Hm>FCMGT <V><d>, <V><n>, <V><m>"FCMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FCMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FCMGT <Hd>, <Hn>, #0.0FCMGT <V><d>, <V><n>, #0.0FCMGT <Vd>.<T>, <Vn>.<T>, #0.0FCMGT <Vd>.<T>, <Vn>.<T>, #0.0histseg�*This instruction compares each 8-bit byte element of the first source vector with all of the elements in the corresponding 128-bit segment of the second source vector and places the count of matching elements in the corresponding element of the destination vector. This instruction is unpredicated.HISTSEG <Zd>.B, <Zn>.B, <Zm>.Baddva��Add each element of the source vector to the corresponding active element of each vertical slice of a ZA tile. The tile elements are predicated by a pair of governing predicates. 
An element of a vertical slice is considered active if its corresponding element in the first governing predicate is TRUE and the element corresponding to its vertical slice number in the second governing predicate is TRUE. Inactive elements in the destination tile remain unmodified.&ADDVA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S&ADDVA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.D
autiasppcr9Authenticate return address using key A, using a registerAUTIASPPCR <Xn>eretaa-Exception return, with pointer authenticationERETAA ERETAB retReturn from subroutineRET  { <Xn>}saddlbt��Add the even-numbered signed elements of the first source vector to the odd-numbered signed elements of the second source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.&SADDLBT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>tcancelCancel current transactionTCANCEL  # <imm>fmul$Floating-point multiply (by element) FMUL <Hd>, <Hn>, <Vm>.H[<index>]'FMUL <V><d>, <V><n>, <Vm>.<Ts>[<index>](FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]+FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]!FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FMUL <Hd>, <Hn>, <Hm>FMUL <Sd>, <Sn>, <Sm>FMUL <Dd>, <Dn>, <Dm>*FMUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>+FMUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!FMUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>"FMUL <Zd>.H, <Zn>.H, <Zm>.H[<imm>]"FMUL <Zd>.S, <Zn>.S, <Zm>.S[<imm>]"FMUL <Zd>.D, <Zn>.D, <Zm>.D[<imm>]cpyfptwnLMemory copy forward-only, reads and writes unprivileged, writes non-temporal"CPYFPTWN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFMTWN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFETWN  [ <Xd>]!, [<Xs>]!, <Xn>!smlal-Signed multiply-add long (vector, by element)
3SMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]*SMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>=SMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RSMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RSMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4SMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HISMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HISMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVSMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VSMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }str-Store SIMD&amp;FP register (immediate offset)!STR <Bt>, [<Xn|SP>], #<simm>STR <Ht>, [<Xn|SP>], #<simm>STR <St>, [<Xn|SP>], #<simm>STR <Dt>, [<Xn|SP>], #<simm>STR <Qt>, [<Xn|SP>], #<simm>STR <Bt>, [<Xn|SP>, #<simm>]!STR <Ht>, [<Xn|SP>, #<simm>]!STR <St>, [<Xn|SP>, #<simm>]!STR <Dt>, [<Xn|SP>, #<simm>]!STR <Qt>, [<Xn|SP>, #<simm>]!STR <Bt>, [<Xn|SP>{, #<pimm>}]STR <Ht>, [<Xn|SP>{, #<pimm>}]STR <St>, [<Xn|SP>{, #<pimm>}]STR <Dt>, [<Xn|SP>{, #<pimm>}]STR <Qt>, [<Xn|SP>{, #<pimm>}]STR <Wt>, [<Xn|SP>], #<simm>STR <Xt>, [<Xn|SP>], #<simm>STR <Wt>, [<Xn|SP>, #<simm>]!STR <Xt>, [<Xn|SP>, #<simm>]!STR <Wt>, [<Xn|SP>{, #<pimm>}]STR <Xt>, [<Xn|SP>{, #<pimm>}]%STR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]5STR <Bt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}])STR <Bt>, [<Xn|SP>, <Xm>{, LSL <amount>}]7STR <Ht>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7STR <St>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7STR <Dt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7STR <Qt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7STR <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7STR <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]%STR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]7STR     ZA[ <Wv>, <offs>], [<Xn|SP>{, #<offs>, MUL VL}]STR     ZT0, [ <Xn|SP>]rmifRotate, mask insert flagsRMIF <Xn>, #<shift>, 
#<mask>dcps2Debug change PE state to EL2DCPS2  {# <imm>}ldnt1b�Contiguous load non-temporal of bytes to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.@LDNT1B { <Zt1>.B-<Zt2>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]@LDNT1B { <Zt1>.B-<Zt4>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]4LDNT1B { <Zt1>.B-<Zt2>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]4LDNT1B { <Zt1>.B-<Zt4>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]ALDNT1B { <Zt1>.B, <Zt2>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]SLDNT1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]5LDNT1B { <Zt1>.B, <Zt2>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]GLDNT1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]+LDNT1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]+LDNT1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]6LDNT1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]*LDNT1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]ld1rqd�Load two contiguous doublewords to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address..LD1RQD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]2LD1RQD { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]stxrStore exclusive register STXR <Ws>, <Wt>, [<Xn|SP>{, #0}] STXR <Ws>, <Xt>, [<Xn|SP>{, #0}]lduminb9Atomic unsigned minimum on byte in memory, without return4STUMINB <Ws>, [<Xn|SP>]LDUMINB  <Ws>, WZR, [<Xn|SP>]6STUMINLB <Ws>, [<Xn|SP>]LDUMINLB  <Ws>, WZR, [<Xn|SP>]urhaddUnsigned rounding halving add#URHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>-URHADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>decd�Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector 
elements.'DECD <Zdn>.D{, <pattern>{, MUL #<imm>}}'DECH <Zdn>.H{, <pattern>{, MUL #<imm>}}'DECW <Zdn>.S{, <pattern>{, MUL #<imm>}}srsra6Signed rounding shift right and accumulate (immediate)SRSRA  D <d>, D<n>, #<shift>"SRSRA <Vd>.<T>, <Vn>.<T>, #<shift>#SRSRA <Zda>.<T>, <Zn>.<T>, #<const>urshr)Unsigned rounding shift right (immediate)URSHR  D <d>, D<n>, #<shift>"URSHR <Vd>.<T>, <Vn>.<T>, #<shift>,URSHR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>
autibsppcr9Authenticate return address using key B, using a registerAUTIBSPPCR <Xn>st2w�.Contiguous store two-word structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,<ST2W { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST2W { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]adcsAdd with carry, setting flagsADCS <Wd>, <Wn>, <Wm>ADCS <Xd>, <Xn>, <Xm>sqincw�kDetermines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQINCW <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQINCW <Xdn>{, <pattern>{, MUL #<imm>}})SQINCW <Zdn>.S{, <pattern>{, MUL #<imm>}}ldapursw1Load-acquire RCpc register signed word (unscaled)#LDAPURSW <Xt>, [<Xn|SP>{, #<simm>}]sqrshlr��Shift active signed elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Each result element is saturated to the N-bit element's signed integer range -2.SQRSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ctermeq�Detect termination conditions in serialized vector loops. 
Tests whether the comparison between the scalar source operands holds true and if not tests the state of the CTERMEQ <R><n>, <R><m>CTERMNE <R><n>, <R><m>bitBitwise insert if true BIT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>sbfxSBFX -- A64Signed bitfield extract!SBFX <Wd>, <Wn>, #<lsb>, #<width>-SBFM   <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)!SBFX <Xd>, <Xn>, #<lsb>, #<width>-SBFM   <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)sabal.Signed absolute difference and accumulate long*SABAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>fcsel*Floating-point conditional select (scalar)FCSEL <Hd>, <Hn>, <Hm>, <cond>FCSEL <Sd>, <Sn>, <Sm>, <cond>FCSEL <Dd>, <Dn>, <Dm>, <cond>ldff1sb�FGather load with first-faulting behavior of signed bytes to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector..LDFF1SB { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}].LDFF1SB { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]-LDFF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>}]-LDFF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>}]-LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>}]4LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]4LDFF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]-LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]uxtlUXTL, UXTL2 -- A64Unsigned extend longUXTL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>#USHLL {2}  <Vd>.<Ta>, <Vn>.<Tb>, #0fmaxp3Floating-point maximum of pair of elements (scalar)FMAXP  H <d>, <Vn>.2HFMAXP <V><d>, <Vn>.<T>"FMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,FMAXP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ldlarh Load LOAcquire register halfwordLDLARH <Wt>, [<Xn|SP>{, #0}]stgp*Store Allocation Tag and pair of registers$STGP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>%STGP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!&STGP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]fcvtxnt�?Convert active double-precision floating-point elements from the source vector 
to single-precision, rounding to Odd, and place the results in the odd-numbered 32-bit elements of the destination vector, leaving the even-numbered elements unchanged. Inactive elements in the destination vector register remain unmodified.FCVTXNT <Zd>.S, <Pg>/M, <Zn>.DaesdAES single round decryptionAESD <Vd>.16B, <Vn>.16BAESD <Zdn>.B, <Zdn>.B, <Zm>.BbfxilBFXIL -- A64&Bitfield extract and insert at low end"BFXIL <Wd>, <Wn>, #<lsb>, #<width>,BFM   <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)"BFXIL <Xd>, <Xn>, #<lsb>, #<width>,BFM   <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)ngc
NGC -- A64Negate with carryNGC <Wd>, <Wm>SBC   <Wd>, WZR, <Wm>NGC <Xd>, <Xm>SBC   <Xd>, XZR, <Xm>wfitWait for interrupt with timeout	WFIT <Xt>shaddSigned halving add"SHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SHADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>rev32)Reverse elements in 32-bit words (vector)REV32 <Vd>.<T>, <Vn>.<T>REV32 <Xd>, <Xn>addhnAdd returning high narrow*ADDHN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>fmla=Floating-point fused multiply-add to accumulator (by element) FMLA <Hd>, <Hn>, <Vm>.H[<index>]'FMLA <V><d>, <V><n>, <Vm>.<Ts>[<index>](FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]+FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]!FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>*FMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>#FMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>]#FMLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>]#FMLA <Zda>.D, <Zn>.D, <Zm>.D[<imm>]IFMLA    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IFMLA    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.S-<Zn2>.S }, <Zm>.S[<index>]IFMLA    ZA.D[ <Wv>, <offs>{, VGx2}], { <Zn1>.D-<Zn2>.D }, <Zm>.D[<index>]IFMLA    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]IFMLA    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.S-<Zn4>.S }, <Zm>.S[<index>]IFMLA    ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.D-<Zn4>.D }, <Zm>.D[<index>]HFMLA    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, <Zm>.<T>@FMLA    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HHFMLA    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, <Zm>.<T>@FMLA    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HWFMLA    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }MFMLA    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }WFMLA    ZA. 
<T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }MFMLA    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }ld4w�/Contiguous load four-word structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,PLD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]LLD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]	retaasppc]Return from subroutine, with enhanced pointer authentication return using an immediate offsetRETAASPPC <label>RETABSPPC <label>shlShift left (immediate)SHL  D <d>, D<n>, #<shift> SHL <Vd>.<T>, <Vn>.<T>, #<shift>uclamp�jClamp each unsigned element in the two or four destination vectors to between the unsigned minimum value in the corresponding element of the first source vector and the unsigned maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors.2UCLAMP { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<T>, <Zm>.<T>2UCLAMP { <Zd1>.<T>-<Zd4>.<T> }, <Zn>.<T>, <Zm>.<T>#UCLAMP <Zd>.<T>, <Zn>.<T>, <Zm>.<T>rprfmRange prefetch memory+RPRFM  ( <rprfop>|#<imm6>), <Xm>, [<Xn|SP>]prfmPrefetch memory (immediate).PRFM  ( <prfop>|#<imm5>), [<Xn|SP>{, #<pimm>}]!PRFM  ( <prfop>|#<imm5>), <label>GPRFM  ( <prfop>|#<imm5>), [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]ldumaxah-Atomic unsigned maximum on halfword in memoryLDUMAXAH <Ws>, <Wt>, [<Xn|SP>]LDUMAXALH <Ws>, <Wt>, [<Xn|SP>]LDUMAXH <Ws>, <Wt>, [<Xn|SP>]LDUMAXLH <Ws>, <Wt>, [<Xn|SP>]cmppCMPP -- A64Compare with tagCMPP <Xn|SP>, <Xm|SP>SUBPS   XZR, <Xn|SP>, <Xm|SP>rcwclrp7Read check write atomic bit clear on quadword in memoryRCWCLRP <Xt1>, <Xt2>, [<Xn|SP>] RCWCLRPA <Xt1>, <Xt2>, [<Xn|SP>]!RCWCLRPAL 
<Xt1>, <Xt2>, [<Xn|SP>] RCWCLRPL <Xt1>, <Xt2>, [<Xn|SP>]ands&Bitwise AND (immediate), setting flagsANDS <Wd>, <Wn>, #<imm>ANDS <Xd>, <Xn>, #<imm>*ANDS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}*ANDS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}#ANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bbfcvtnt�0Convert to BFloat16 from single-precision in each active floating-point element of the source vector, and place the results in the odd-numbered 16-bit elements of the destination vector, leaving the even-numbered elements unchanged. Inactive elements in the destination vector register remain unmodified.BFCVTNT <Zd>.H, <Pg>/M, <Zn>.Sldtrsw(Load register signed word (unprivileged)!LDTRSW <Xt>, [<Xn|SP>{, #<simm>}]cfp
CFP -- A64.Control flow prediction restriction by contextCFP  RCTX, <Xt>SYS   #3, C7, C3, #4, <Xt>ldclrah&Atomic bit clear on halfword in memoryLDCLRAH <Ws>, <Wt>, [<Xn|SP>]LDCLRALH <Ws>, <Wt>, [<Xn|SP>]LDCLRH <Ws>, <Wt>, [<Xn|SP>]LDCLRLH <Ws>, <Wt>, [<Xn|SP>]umullt�Multiply the corresponding odd-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%UMULLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>$UMULLT <Zd>.S, <Zn>.H, <Zm>.H[<imm>]$UMULLT <Zd>.D, <Zn>.S, <Zm>.S[<imm>]usra/Unsigned shift right and accumulate (immediate)USRA  D <d>, D<n>, #<shift>!USRA <Vd>.<T>, <Vn>.<T>, #<shift>"USRA <Zda>.<T>, <Zn>.<T>, #<const>stxrbStore exclusive register byte!STXRB <Ws>, <Wt>, [<Xn|SP>{, #0}]stgmStore Allocation Tag multipleSTGM <Xt>, [<Xn|SP>]uabal0Unsigned absolute difference and accumulate long*UABAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>uadalp)Unsigned add and accumulate long pairwiseUADALP <Vd>.<Ta>, <Vn>.<Tb>#UADALP <Zda>.<T>, <Pg>/M, <Zn>.<Tb>raddhnt�2Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant rounded half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. This instruction is unpredicated.&RADDHNT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>subhnb�5Subtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the most significant half of the result in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. 
This instruction is unpredicated.%SUBHNB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>ldrbLoad register byte (immediate)LDRB <Wt>, [<Xn|SP>], #<simm>LDRB <Wt>, [<Xn|SP>, #<simm>]!LDRB <Wt>, [<Xn|SP>{, #<pimm>}]6LDRB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]*LDRB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]umopa5This instruction works with a 32-bit element ZA tile..UMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.UMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B.UMOPA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hmvni Move inverted immediate (vector)'MVNI <Vd>.<T>, #<imm8>{, LSL #<amount>}'MVNI <Vd>.<T>, #<imm8>{, LSL #<amount>}%MVNI <Vd>.<T>, #<imm8>, MSL #<amount>ins1Insert vector element from another vector element,INS <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]INS <Vd>.<Ts>[<index>], <R><n>setgptn:Memory set with tag setting, unprivileged and non-temporalSETGPTN  [ <Xd>]!, <Xn>!, <Xs>SETGMTN  [ <Xd>]!, <Xn>!, <Xs>SETGETN  [ <Xd>]!, <Xn>!, <Xs>ssubltb�Subtract the even-numbered signed elements of the second source vector from the odd-numbered signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.&SSUBLTB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>ldaddh0Atomic add on halfword in memory, without return2STADDH <Ws>, [<Xn|SP>]LDADDH  <Ws>, WZR, [<Xn|SP>]4STADDLH <Ws>, [<Xn|SP>]LDADDLH  <Ws>, WZR, [<Xn|SP>]fcmge5Floating-point compare greater than or equal (vector)FCMGE <Hd>, <Hn>, <Hm>FCMGE <V><d>, <V><n>, <V><m>"FCMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FCMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FCMGE <Hd>, <Hn>, #0.0FCMGE <V><d>, <V><n>, #0.0FCMGE <Vd>.<T>, <Vn>.<T>, #0.0FCMGE <Vd>.<T>, <Vn>.<T>, #0.0frecpe"Floating-point reciprocal estimateFRECPE <Hd>, <Hn>FRECPE <V><d>, <V><n>FRECPE <Vd>.<T>, <Vn>.<T>FRECPE <Vd>.<T>, <Vn>.<T>FRECPE <Zd>.<T>, <Zn>.<T>subgSubtract with tag)SUBG <Xd|SP>, <Xn|SP>, #<uimm6>, #<uimm4>uabd%Unsigned absolute difference (vector)!UABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>+UABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>uaddv�Unsigned add horizontally across all lanes of a vector, and place the result in the SIMD&amp;FP scalar destination register. Narrow elements are first zero-extended to 64 bits. Inactive elements in the source vector are treated as zero.UADDV <Dd>, <Pg>, <Zn>.<T>umull+Unsigned multiply long (vector, by element)3UMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]*UMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>UMULL -- A64Unsigned multiply longUMULL <Xd>, <Wn>, <Wm>UMADDL   <Xd>, <Wn>, <Wm>, XZRfvdot�yThe instruction computes the fused sum-of-products of each vertical group of two 8-bit floating-point values held in the corresponding elements of the two first source vectors with horizontal group of two 8-bit floating-point values in the indexed 16-bit group of the corresponding 128-bit segment of the second source vector. The half-precision sum-of-products are scaled by 2IFVDOT   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]IFVDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]bcBranch consistent conditionallyBC. 
<cond>  <label>ldaxp(Load-acquire exclusive pair of registers#LDAXP <Wt1>, <Wt2>, [<Xn|SP>{, #0}]#LDAXP <Xt1>, <Xt2>, [<Xn|SP>{, #0}]sabalt�Compute the absolute difference between odd-numbered signed elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.&SABALT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>sabdSigned absolute difference!SABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>+SABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>shsubSigned halving subtract"SHSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SHSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>uqincd�*Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.'UQINCD <Wdn>{, <pattern>{, MUL #<imm>}}'UQINCD <Xdn>{, <pattern>{, MUL #<imm>}})UQINCD <Zdn>.D{, <pattern>{, MUL #<imm>}}uzp2Unzip vectors (secondary)!UZP2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>stz2gStore Allocation Tags, zeroing!STZ2G <Xt|SP>, [<Xn|SP>], #<simm>"STZ2G <Xt|SP>, [<Xn|SP>, #<simm>]!#STZ2G <Xt|SP>, [<Xn|SP>{, #<simm>}]cmla��Multiply the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of the integral numbers in the first source vector by the corresponding complex number in the second source vector rotated by 0, 90, 180 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation.+CMLA <Zda>.<T>, <Zn>.<T>, <Zm>.<T>, <const>,CMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>], <const>,CMLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>], <const>fmlsDFloating-point fused multiply-subtract from accumulator (by element) FMLS <Hd>, <Hn>, <Vm>.H[<index>]'FMLS <V><d>, <V><n>, <Vm>.<Ts>[<index>](FMLS <Vd>.<T>, <Vn>.<T>, 
<Vm>.H[<index>]+FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]!FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>*FMLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>#FMLS <Zda>.H, <Zn>.H, <Zm>.H[<imm>]#FMLS <Zda>.S, <Zn>.S, <Zm>.S[<imm>]#FMLS <Zda>.D, <Zn>.D, <Zm>.D[<imm>]IFMLS    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IFMLS    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.S-<Zn2>.S }, <Zm>.S[<index>]IFMLS    ZA.D[ <Wv>, <offs>{, VGx2}], { <Zn1>.D-<Zn2>.D }, <Zm>.D[<index>]IFMLS    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]IFMLS    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.S-<Zn4>.S }, <Zm>.S[<index>]IFMLS    ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.D-<Zn4>.D }, <Zm>.D[<index>]HFMLS    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, <Zm>.<T>@FMLS    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HHFMLS    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, <Zm>.<T>@FMLS    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HWFMLS    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }MFMLS    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }WFMLS    ZA. 
<T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }MFMLS    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }st2gStore Allocation Tags ST2G <Xt|SP>, [<Xn|SP>], #<simm>!ST2G <Xt|SP>, [<Xn|SP>, #<simm>]!"ST2G <Xt|SP>, [<Xn|SP>{, #<simm>}]st4h�4Contiguous store four-halfword structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,NST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]JST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]lsrvLogical shift right variableLSRV <Wd>, <Wn>, <Wm>LSRV <Xd>, <Xn>, <Xm>uhsubUnsigned halving subtract"UHSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UHSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>uqdecd�*Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.'UQDECD <Wdn>{, <pattern>{, MUL #<imm>}}'UQDECD <Xdn>{, <pattern>{, MUL #<imm>}})UQDECD <Zdn>.D{, <pattern>{, MUL #<imm>}}fnegFloating-point negate (vector)FNEG <Vd>.<T>, <Vn>.<T>FNEG <Vd>.<T>, <Vn>.<T>FNEG <Hd>, <Hn>FNEG <Sd>, <Sn>FNEG <Dd>, <Dn>FNEG <Zd>.<T>, <Pg>/M, <Zn>.<T>setp
Memory setSETP  [ <Xd>]!, <Xn>!, <Xs>SETM  [ <Xd>]!, <Xn>!, <Xs>SETE  [ <Xd>]!, <Xn>!, <Xs>mla0Multiply-add to accumulator (vector, by element)*MLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>] MLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>)MLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>"MLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>]"MLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>]"MLA <Zda>.D, <Zn>.D, <Zm>.D[<imm>]autia-Authenticate instruction address, using key AAUTIA <Xd>, <Xn|SP>AUTIZA <Xd>
AUTIA1716 AUTIASP AUTIAZ hvcHypervisor callHVC  # <imm>srshl%Signed rounding shift left (register)SRSHL  D <d>, D<n>, D<m>"SRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>DSRSHL { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>DSRSHL { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>SSRSHL { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }SSRSHL { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> },SRSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ldrsh)Load register signed halfword (immediate)LDRSH <Wt>, [<Xn|SP>], #<simm>LDRSH <Xt>, [<Xn|SP>], #<simm>LDRSH <Wt>, [<Xn|SP>, #<simm>]!LDRSH <Xt>, [<Xn|SP>, #<simm>]! LDRSH <Wt>, [<Xn|SP>{, #<pimm>}] LDRSH <Xt>, [<Xn|SP>{, #<pimm>}]9LDRSH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]9LDRSH <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]rcwclr9Read check write atomic bit clear on doubleword in memoryRCWCLR <Xs>, <Xt>, [<Xn|SP>]RCWCLRA <Xs>, <Xt>, [<Xn|SP>]RCWCLRAL <Xs>, <Xt>, [<Xn|SP>]RCWCLRL <Xs>, <Xt>, [<Xn|SP>]smov6Signed move vector element to general-purpose registerSMOV <Wd>, <Vn>.<Ts>[<index>]SMOV <Xd>, <Vn>.<Ts>[<index>]stlxrh)Store-release exclusive register halfword"STLXRH <Ws>, <Wt>, [<Xn|SP>{, #0}]cnot�Logically invert the boolean value in each active element of the source vector, and place the results in the corresponding elements of the destination vector. 
Inactive elements in the destination vector register remain unmodified.CNOT <Zd>.<T>, <Pg>/M, <Zn>.<T>sumlall�aThis signed by unsigned integer multiply-add long-long instruction multiplies each signed 8-bit element in the one, two, or four first source vectors with each unsigned 8-bit indexed element of the second source vector, widens each product to 32-bits and destructively adds these values to the corresponding 32-bit elements of the ZA quad-vector groups.=SUMLALL ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]RSUMLALL ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RSUMLALL ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]ISUMLALL ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.BISUMLALL ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.Bldnt1sb�,Gather load non-temporal of signed bytes to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.,LDNT1SB { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}],LDNT1SB { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]ldurbLoad register byte (unscaled) LDURB <Wt>, [<Xn|SP>{, #<simm>}]extrExtract registerEXTR <Wd>, <Wn>, <Wm>, #<lsb>EXTR <Xd>, <Xn>, <Xm>, #<lsb>ldnf1b��Contiguous load with non-faulting behavior of unsigned bytes to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. 
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.6LDNF1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]6LDNF1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]6LDNF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]6LDNF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]uzpq2�Concatenate adjacent odd-numbered elements from the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. This instruction is unpredicated."UZPQ2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>zipq1�Interleave alternating elements from low halves of the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. This instruction is unpredicated."ZIPQ1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>bfmin�Determine the minimum of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors.:BFMIN { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, <Zm>.H:BFMIN { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, <Zm>.HGBFMIN { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, { <Zm1>.H-<Zm2>.H }GBFMIN { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, { <Zm1>.H-<Zm4>.H }&BFMIN <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.Hfminqv�,Floating-point minimum of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as +Infinity. 
FMINQV <Vd>.<T>, <Pg>, <Zn>.<Tb>uqrshrnb�CShift each unsigned integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2&UQRSHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>luti2$Lookup table read with 2-bit indices	+LUTI2 <Vd>.16B, { <Vn>.16B }, <Vm>[<index>])LUTI2 <Vd>.8H, { <Vn>.8H }, <Vm>[<index>]1LUTI2 { <Zd1>.<T>-<Zd2>.<T> }, ZT0, <Zn>[<index>]2LUTI2 { <Zd1>.<T>, <Zd2>.<T> }, ZT0, <Zn>[<index>]1LUTI2 { <Zd1>.<T>-<Zd4>.<T> }, ZT0, <Zn>[<index>]HLUTI2 { <Zd1>.<T>, <Zd2>.<T>, <Zd3>.<T>, <Zd4>.<T> }, ZT0, <Zn>[<index>]"LUTI2 <Zd>.<T>, ZT0, <Zn>[<index>]'LUTI2 <Zd>.B, { <Zn>.B }, <Zm>[<index>]'LUTI2 <Zd>.H, { <Zn>.H }, <Zm>[<index>]	autibsppcBAuthenticate return address using key B, using an immediate offsetAUTIBSPPC <label>sqdmlsl>Signed saturating doubling multiply-subtract long (by element),SQDMLSL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]5SQDMLSL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]!SQDMLSL <Va><d>, <Vb><n>, <Vb><m>,SQDMLSL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>ld1rsh�Load a single signed halfword from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 2 in the range 0 to 126..LD1RSH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}].LD1RSH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]	cpyfpwtrnAMemory copy forward-only, writes unprivileged, reads non-temporal#CPYFPWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFMWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFEWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!uqrshrnt�9Shift each unsigned integer value in the source vector elements by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. 
Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2&UQRSHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>fdotG8-bit floating-point dot product to half-precision (vector, by element)+FDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.2B[<index>]$FDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>+FDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]$FDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>FDOT <Zda>.S, <Zn>.B, <Zm>.B#FDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]FDOT <Zda>.H, <Zn>.B, <Zm>.B#FDOT <Zda>.H, <Zn>.B, <Zm>.B[<imm>]FDOT <Zda>.S, <Zn>.H, <Zm>.H#FDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]IFDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]IFDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]@FDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B@FDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.BMFDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }MFDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }IFDOT    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]IFDOT    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]@FDOT    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B@FDOT    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.BMFDOT    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }MFDOT    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }IFDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IFDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]@FDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H@FDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HMFDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }MFDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }fmaxnmv+Floating-point maximum number across vectorFMAXNMV <V><d>, <Vn>.<T>FMAXNMV  S <d>, <Vn>.4SFMAXNMV <V><d>, <Pg>, 
<Zn>.<T>adrp$Form PC-relative address to 4KB pageADRP <Xd>, <label>sqshrun9Signed saturating shift right unsigned narrow (immediate)"SQSHRUN <Vb><d>, <Va><n>, #<shift>+SQSHRUN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>st3q�6Contiguous store three-quadword structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,EST3Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]AST3Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q }, <Pg>, [<Xn|SP>, <Xm>, LSL #4]stnt1b�Contiguous store non-temporal of bytes from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>STNT1B { <Zt1>.B-<Zt2>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]>STNT1B { <Zt1>.B-<Zt4>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]2STNT1B { <Zt1>.B-<Zt2>.B }, <PNg>, [<Xn|SP>, <Xm>]2STNT1B { <Zt1>.B-<Zt4>.B }, <PNg>, [<Xn|SP>, <Xm>]?STNT1B { <Zt1>.B, <Zt2>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]QSTNT1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]3STNT1B { <Zt1>.B, <Zt2>.B }, <PNg>, [<Xn|SP>, <Xm>]ESTNT1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>, [<Xn|SP>, <Xm>])STNT1B { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}])STNT1B { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]4STNT1B { <Zt>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}](STNT1B { <Zt>.B }, <Pg>, [<Xn|SP>, <Xm>]stp#Store pair of SIMD&amp;FP registers#STP <St1>, <St2>, [<Xn|SP>], #<imm>#STP <Dt1>, <Dt2>, [<Xn|SP>], #<imm>#STP <Qt1>, <Qt2>, [<Xn|SP>], #<imm>$STP <St1>, <St2>, [<Xn|SP>, #<imm>]!$STP <Dt1>, <Dt2>, [<Xn|SP>, #<imm>]!$STP <Qt1>, <Qt2>, [<Xn|SP>, #<imm>]!%STP <St1>, <St2>, [<Xn|SP>{, #<imm>}]%STP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]%STP <Qt1>, <Qt2>, [<Xn|SP>{, 
#<imm>}]#STP <Wt1>, <Wt2>, [<Xn|SP>], #<imm>#STP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>$STP <Wt1>, <Wt2>, [<Xn|SP>, #<imm>]!$STP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!%STP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]%STP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]suqadd.Signed saturating accumulate of unsigned valueSUQADD <V><d>, <V><n>SUQADD <Vd>.<T>, <Vn>.<T>-SUQADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>rcwssetp>Read check write software atomic bit set on quadword in memory RCWSSETP <Xt1>, <Xt2>, [<Xn|SP>]!RCWSSETPA <Xt1>, <Xt2>, [<Xn|SP>]"RCWSSETPAL <Xt1>, <Xt2>, [<Xn|SP>]!RCWSSETPL <Xt1>, <Xt2>, [<Xn|SP>]ldclrab"Atomic bit clear on byte in memoryLDCLRAB <Ws>, <Wt>, [<Xn|SP>]LDCLRALB <Ws>, <Wt>, [<Xn|SP>]LDCLRB <Ws>, <Wt>, [<Xn|SP>]LDCLRLB <Ws>, <Wt>, [<Xn|SP>]stzgm&Store Allocation Tag and zero multipleSTZGM <Xt>, [<Xn|SP>]uminqv�)Unsigned minimum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as the maximum unsigned integer for the element size. 
UMINQV <Vd>.<T>, <Pg>, <Zn>.<Tb>axflagBConvert floating-point condition flags from Arm to external formatAXFLAG clz Count leading zero bits (vector)CLZ <Vd>.<T>, <Vn>.<T>CLZ <Wd>, <Wn>CLZ <Xd>, <Xn>CLZ <Zd>.<T>, <Pg>/M, <Zn>.<T>clrbhbClear branch historyCLRBHB rshrn'Rounding shift right narrow (immediate))RSHRN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>ldsmin5Atomic signed minimum on word or doubleword in memoryLDSMIN <Ws>, <Wt>, [<Xn|SP>]LDSMINA <Ws>, <Wt>, [<Xn|SP>]LDSMINAL <Ws>, <Wt>, [<Xn|SP>]LDSMINL <Ws>, <Wt>, [<Xn|SP>]LDSMIN <Xs>, <Xt>, [<Xn|SP>]LDSMINA <Xs>, <Xt>, [<Xn|SP>]LDSMINAL <Xs>, <Xt>, [<Xn|SP>]LDSMINL <Xs>, <Xt>, [<Xn|SP>]2STSMIN <Ws>, [<Xn|SP>]LDSMIN  <Ws>, WZR, [<Xn|SP>]4STSMINL <Ws>, [<Xn|SP>]LDSMINL  <Ws>, WZR, [<Xn|SP>]2STSMIN <Xs>, [<Xn|SP>]LDSMIN  <Xs>, XZR, [<Xn|SP>]4STSMINL <Xs>, [<Xn|SP>]LDSMINL  <Xs>, XZR, [<Xn|SP>]scvtf5Signed fixed-point convert to floating-point (vector)SCVTF <V><d>, <V><n>, #<fbits>"SCVTF <Vd>.<T>, <Vn>.<T>, #<fbits>SCVTF <Hd>, <Hn>SCVTF <V><d>, <V><n>SCVTF <Vd>.<T>, <Vn>.<T>SCVTF <Vd>.<T>, <Vn>.<T>SCVTF <Hd>, <Wn>, #<fbits>SCVTF <Hd>, <Xn>, #<fbits>SCVTF <Sd>, <Wn>, #<fbits>SCVTF <Sd>, <Xn>, #<fbits>SCVTF <Dd>, <Wn>, #<fbits>SCVTF <Dd>, <Xn>, #<fbits>SCVTF <Hd>, <Wn>SCVTF <Sd>, <Wn>SCVTF <Dd>, <Wn>SCVTF <Hd>, <Xn>SCVTF <Sd>, <Xn>SCVTF <Dd>, <Xn>.SCVTF { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }.SCVTF { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }SCVTF <Zd>.H, <Pg>/M, <Zn>.HSCVTF <Zd>.H, <Pg>/M, <Zn>.SSCVTF <Zd>.S, <Pg>/M, <Zn>.SSCVTF <Zd>.D, <Pg>/M, <Zn>.SSCVTF <Zd>.H, <Pg>/M, <Zn>.DSCVTF <Zd>.S, <Pg>/M, <Zn>.DSCVTF <Zd>.D, <Pg>/M, <Zn>.Dumlal/Unsigned multiply-add long (vector, by element)
3UMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]*UMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>=UMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4UMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HIUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HIUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }whilewrnThis instruction checks two addresses for a conflict or overlap between address ranges of the form [addr,addr+WHILEWR <Pd>.<T>, <Xn>, <Xm>smullt�Multiply the corresponding odd-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%SMULLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>$SMULLT <Zd>.S, <Zn>.H, <Zm>.H[<imm>]$SMULLT <Zd>.D, <Zn>.S, <Zm>.S[<imm>]ldnt1sh�0Gather load non-temporal of signed halfwords to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. 
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.,LDNT1SH { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}],LDNT1SH { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]ldursb$Load register signed byte (unscaled)!LDURSB <Wt>, [<Xn|SP>{, #<simm>}]!LDURSB <Xt>, [<Xn|SP>{, #<simm>}]fcmeq%Floating-point compare equal (vector)FCMEQ <Hd>, <Hn>, <Hm>FCMEQ <V><d>, <V><n>, <V><m>"FCMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FCMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FCMEQ <Hd>, <Hn>, #0.0FCMEQ <V><d>, <V><n>, #0.0FCMEQ <Vd>.<T>, <Vn>.<T>, #0.0FCMEQ <Vd>.<T>, <Vn>.<T>, #0.0&FCMEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0&FCMGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0&FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0&FCMLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0&FCMLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0&FCMNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0*FCMEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*FCMGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*FCMNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*FCMUO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>ld1rqb�Load sixteen contiguous bytes to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address..LD1RQB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]*LD1RQB { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]ldarhLoad-acquire register halfwordLDARH <Wt>, [<Xn|SP>{, #0}]fmlslb��This half-precision floating-point multiply-subtract long instruction widens the even-numbered half-precision elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding half-precision elements in the source vectors. 
This instruction is unpredicated.FMLSLB <Zda>.S, <Zn>.H, <Zm>.H%FMLSLB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]sqsubSigned saturating subtractSQSUB <V><d>, <V><n>, <V><m>"SQSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SQSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>-SQSUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}"SQSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>sminvSigned minimum across vectorSMINV <V><d>, <Vn>.<T>SMINV <V><d>, <Pg>, <Zn>.<T>dcps1Debug change PE state to EL1DCPS1  {# <imm>}cpypt*Memory copy, reads and writes unprivilegedCPYPT  [ <Xd>]!, [<Xs>]!, <Xn>!CPYMT  [ <Xd>]!, [<Xs>]!, <Xn>!CPYET  [ <Xd>]!, [<Xs>]!, <Xn>!sqshlu1Signed saturating shift left unsigned (immediate)SQSHLU <V><d>, <V><n>, #<shift>#SQSHLU <Vd>.<T>, <Vn>.<T>, #<shift>-SQSHLU <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>umullb�Multiply the corresponding even-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%UMULLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>$UMULLB <Zd>.S, <Zn>.H, <Zm>.H[<imm>]$UMULLB <Zd>.D, <Zn>.S, <Zm>.S[<imm>]stxp!Store exclusive pair of registers(STXP <Ws>, <Wt1>, <Wt2>, [<Xn|SP>{, #0}](STXP <Ws>, <Xt1>, <Xt2>, [<Xn|SP>{, #0}]brb
BRB -- A64Branch record bufferBRB <brb_op> SYS   #1, C7, C2, #<op2>{, <Xt>}sqshrunb�CShift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2&SQSHRUNB <Zd>.<T>, <Zn>.<Tb>, #<const>whilelo�Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, unsigned scalar operand is lower than the second scalar operand and false thereafter up to the highest numbered element. WHILELO <Pd>.<T>, <R><n>, <R><m>#WHILELO <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILELO { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>prfw�Gather prefetch of words from the active memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive addresses are not prefetched from memory.&PRFW <prfop>, <Pg>, [<Zn>.S{, #<imm>}]&PRFW <prfop>, <Pg>, [<Zn>.D{, #<imm>}]/PRFW <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]+PRFW <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #2]/PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #2]/PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #2]-PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #2]bsl2n�CSelects bits from the first source vector where the corresponding bit in the third source vector is '1', and from the inverted second source vector where the corresponding bit in the third source vector is '0'. The result is placed destructively in the destination and first source vector. This instruction is unpredicated.&BSL2N <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.DsubhnSubtract returning high narrow*SUBHN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>fadda�PFloating-point add a SIMD&amp;FP scalar source and all active lanes of the vector source and place the result destructively in the SIMD&amp;FP scalar source register. 
Vector elements are processed strictly in order from low to high, with the scalar source providing the initial value. Inactive elements in the source vector are ignored.&FADDA <V><dn>, <Pg>, <V><dn>, <Zm>.<T>subpSubtract pointerSUBP <Xd>, <Xn|SP>, <Xm|SP>ldxp Load exclusive pair of registers"LDXP <Wt1>, <Wt2>, [<Xn|SP>{, #0}]"LDXP <Xt1>, <Xt2>, [<Xn|SP>{, #0}]cbzCompare and branch on zeroCBZ <Wt>, <label>CBZ <Xt>, <label>movaz�The instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size. The tile slices are zeroed after moving their contents to the destination vectors.;MOVAZ { <Zd1>.B-<Zd2>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs2>]=MOVAZ { <Zd1>.H-<Zd2>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs2>]=MOVAZ { <Zd1>.S-<Zd2>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs2>]=MOVAZ { <Zd1>.D-<Zd2>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs2>];MOVAZ { <Zd1>.B-<Zd4>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs4>]=MOVAZ { <Zd1>.H-<Zd4>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs4>]=MOVAZ { <Zd1>.S-<Zd4>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs4>]=MOVAZ { <Zd1>.D-<Zd4>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs4>]5MOVAZ { <Zd1>.D-<Zd2>.D }, ZA.D[<Wv>, <offs>{, VGx2}]5MOVAZ { <Zd1>.D-<Zd4>.D }, ZA.D[<Wv>, <offs>{, VGx4}]%MOVAZ <Zd>.B, ZA0<HV>.B[<Ws>, <offs>]'MOVAZ <Zd>.H, <ZAn><HV>.H[<Ws>, <offs>]'MOVAZ <Zd>.S, <ZAn><HV>.S[<Ws>, <offs>]'MOVAZ <Zd>.D, <ZAn><HV>.D[<Ws>, <offs>]'MOVAZ <Zd>.Q, <ZAn><HV>.Q[<Ws>, <offs>]movsMOVS (predicated)�Read active elements from the source predicate and place in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the MOVS <Pd>.B, <Pg>/Z, <Pn>.B$ANDS  <Pd>.B, <Pg>/Z, <Pn>.B, <Pn>.BMOVS (unpredicated)Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated. 
Sets the MOVS <Pd>.B, <Pn>.B$ORRS  <Pd>.B, <Pn>/Z, <Pn>.B, <Pn>.Bcmlt&Compare signed less than zero (vector)CMLT  D <d>, D<n>, #0CMLT <Vd>.<T>, <Vn>.<T>, #0maddptMultiply-add checked pointerMADDPT <Xd>, <Xn>, <Xm>, <Xa>ldapurh.Load-acquire RCpc register halfword (unscaled)"LDAPURH <Wt>, [<Xn|SP>{, #<simm>}]decb�Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination.%DECB <Xdn>{, <pattern>{, MUL #<imm>}}%DECD <Xdn>{, <pattern>{, MUL #<imm>}}%DECH <Xdn>{, <pattern>{, MUL #<imm>}}%DECW <Xdn>{, <pattern>{, MUL #<imm>}}bfmlal?BFloat16 floating-point widening multiply-add long (by element)
.BFMLAL <bt>  <Vd>.4S, <Vn>.8H, <Vm>.H[<index>]&BFMLAL <bt>  <Vd>.4S, <Vn>.8H, <Vm>.8H=BFMLAL  ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RBFMLAL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RBFMLAL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4BFMLAL  ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HIBFMLAL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HIBFMLAL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVBFMLAL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VBFMLAL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }ccmpConditional compare (immediate)"CCMP <Wn>, #<imm>, #<nzcv>, <cond>"CCMP <Xn>, #<imm>, #<nzcv>, <cond> CCMP <Wn>, <Wm>, #<nzcv>, <cond> CCMP <Xn>, <Xm>, #<nzcv>, <cond>fabs&Floating-point absolute value (vector)FABS <Vd>.<T>, <Vn>.<T>FABS <Vd>.<T>, <Vn>.<T>FABS <Hd>, <Hn>FABS <Sd>, <Sn>FABS <Dd>, <Dn>FABS <Zd>.<T>, <Pg>/M, <Zn>.<T>fminp3Floating-point minimum of pair of elements (scalar)FMINP  H <d>, <Vn>.2HFMINP <V><d>, <Vn>.<T>"FMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,FMINP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ldsmax5Atomic signed maximum on word or doubleword in memoryLDSMAX <Ws>, <Wt>, [<Xn|SP>]LDSMAXA <Ws>, <Wt>, [<Xn|SP>]LDSMAXAL <Ws>, <Wt>, [<Xn|SP>]LDSMAXL <Ws>, <Wt>, [<Xn|SP>]LDSMAX <Xs>, <Xt>, [<Xn|SP>]LDSMAXA <Xs>, <Xt>, [<Xn|SP>]LDSMAXAL <Xs>, <Xt>, [<Xn|SP>]LDSMAXL <Xs>, <Xt>, [<Xn|SP>]2STSMAX <Ws>, [<Xn|SP>]LDSMAX  <Ws>, WZR, [<Xn|SP>]4STSMAXL <Ws>, [<Xn|SP>]LDSMAXL  <Ws>, WZR, [<Xn|SP>]2STSMAX <Xs>, [<Xn|SP>]LDSMAX  <Xs>, XZR, [<Xn|SP>]4STSMAXL <Xs>, [<Xn|SP>]LDSMAXL  <Xs>, XZR, [<Xn|SP>]ldnp:Load pair of SIMD&amp;FP registers, with non-temporal hint&LDNP <St1>, <St2>, [<Xn|SP>{, #<imm>}]&LDNP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]&LDNP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]&LDNP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]&LDNP <Xt1>, <Xt2>, [<Xn|SP>{, 
#<imm>}]subptSubtract checked pointer-SUBPT <Xd|SP>, <Xn|SP>, <Xm>{, LSL #<amount>}&SUBPT <Zdn>.D, <Pg>/M, <Zdn>.D, <Zm>.DSUBPT <Zd>.D, <Zn>.D, <Zm>.Dbfdot8BFloat16 floating-point dot product (vector, by element)
,BFDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.2H[<index>]%BFDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>BFDOT <Zda>.S, <Zn>.H, <Zm>.H$BFDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]IBFDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IBFDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]@BFDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H@BFDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HMBFDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }MBFDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }fmlalltt�This 8-bit floating-point multiply-add long-long instruction widens the fourth 8-bit element of each 32-bit container in the first and second source vectors to single-precision format and multiplies the corresponding elements. The intermediate products are scaled by 2 FMLALLTT <Zda>.S, <Zn>.B, <Zm>.B'FMLALLTT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]addvl�	Add the current vector register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer, and place the result in the 64-bit destination general-purpose register or current stack pointer.ADDVL <Xd|SP>, <Xn|SP>, #<imm>fcmlt.Floating-point compare less than zero (vector)FCMLT <Hd>, <Hn>, #0.0FCMLT <V><d>, <V><n>, #0.0FCMLT <Vd>.<T>, <Vn>.<T>, #0.0FCMLT <Vd>.<T>, <Vn>.<T>, #0.0FCMLT (vectors)�\Compare active floating-point elements in the first source vector being less than corresponding elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Does not set the condition flags.*FCMLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-FCMGT    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>frsqrte.Floating-point reciprocal square root estimateFRSQRTE <Hd>, <Hn>FRSQRTE <V><d>, <V><n>FRSQRTE <Vd>.<T>, <Vn>.<T>FRSQRTE <Vd>.<T>, <Vn>.<T>FRSQRTE <Zd>.<T>, <Zn>.<T>uunpk�Unpack elements from one or two source vectors and then zero-extend them to place in elements of twice their size within the two or four destination vectors.(UUNPK { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<Tb>8UUNPK { <Zd1>.<T>-<Zd4>.<T> }, { <Zn1>.<Tb>-<Zn2>.<Tb> }sqincp�Counts the number of true elements in the source predicate and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.SQINCP <Xdn>, <Pm>.<T>, <Wdn>SQINCP <Xdn>, <Pm>.<T>SQINCP <Zdn>.<T>, <Pm>.<T>ld4b�/Contiguous load four-byte structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,PLD4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]DLD4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]ldclrh6Atomic bit clear on halfword in memory, without return2STCLRH <Ws>, [<Xn|SP>]LDCLRH  <Ws>, WZR, [<Xn|SP>]4STCLRLH <Ws>, [<Xn|SP>]LDCLRLH  <Ws>, WZR, [<Xn|SP>]dsbData synchronization barrierDSB  ( <option>|#<imm>)DSB <option>nXSldset.Atomic bit set on word or doubleword in memoryLDSET <Ws>, <Wt>, [<Xn|SP>]LDSETA <Ws>, <Wt>, [<Xn|SP>]LDSETAL <Ws>, <Wt>, [<Xn|SP>]LDSETL <Ws>, <Wt>, [<Xn|SP>]LDSET <Xs>, <Xt>, [<Xn|SP>]LDSETA <Xs>, <Xt>, [<Xn|SP>]LDSETAL <Xs>, <Xt>, [<Xn|SP>]LDSETL <Xs>, <Xt>, [<Xn|SP>]0STSET <Ws>, [<Xn|SP>]LDSET  <Ws>, WZR, [<Xn|SP>]2STSETL <Ws>, [<Xn|SP>]LDSETL  <Ws>, WZR, [<Xn|SP>]0STSET <Xs>, 
[<Xn|SP>]LDSET  <Xs>, XZR, [<Xn|SP>]2STSETL <Xs>, [<Xn|SP>]LDSETL  <Xs>, XZR, [<Xn|SP>]umlslb�Multiply the corresponding even-numbered unsigned elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector. This instruction is unpredicated.&UMLSLB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%UMLSLB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%UMLSLB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]fcmla7Floating-point complex multiply accumulate (by element)7FCMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>], #<rotate>-FCMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<rotate>4FCMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>, <const>-FCMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>], <const>-FCMLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>], <const>ldff1b�HGather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.	-LDFF1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]-LDFF1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}],LDFF1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, <Xm>}],LDFF1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>}],LDFF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>}],LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>}]3LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]3LDFF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>],LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]sabalb�!Compute the absolute difference between even-numbered signed integer values in elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.&SABALB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>hintHint instruction
HINT  # <imm>	sqdmlslbt�Multiply then double the corresponding even-numbered signed elements of the first and odd-numbered signed elements of the second source vector. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2)SQDMLSLBT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>ldsmaxh;Atomic signed maximum on halfword in memory, without return4STSMAXH <Ws>, [<Xn|SP>]LDSMAXH  <Ws>, WZR, [<Xn|SP>]6STSMAXLH <Ws>, [<Xn|SP>]LDSMAXLH  <Ws>, WZR, [<Xn|SP>]smlall�xThis signed integer multiply-add long-long instruction multiplies each signed 8-bit or 16-bit element in the one, two, or four first source vectors with each signed 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively adds these values to the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups.=SMLALL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]=SMLALL  ZA.D[ <Wv>, <offs1>:<offs4>], <Zn>.H, <Zm>.H[<index>]RSMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RSMLALL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RSMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]RSMLALL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]<SMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>], <Zn>.<Tb>, <Zm>.<Tb>TSMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>TSMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>dSMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }dSMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }stllrStore LORelease registerSTLLR <Wt>, [<Xn|SP>{, #0}]STLLR <Xt>, [<Xn|SP>{, #0}]prfd�Gather prefetch of doublewords from the active memory addresses generated by a vector base plus immediate index. 
The index is a multiple of 8 in the range 0 to 248. Inactive addresses are not prefetched from memory.&PRFD <prfop>, <Pg>, [<Zn>.S{, #<imm>}]&PRFD <prfop>, <Pg>, [<Zn>.D{, #<imm>}]/PRFD <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]+PRFD <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #3]/PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #3]/PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #3]-PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #3]facgt��Compare active absolute values of floating-point elements in the first source vector with corresponding absolute values of elements in the second source vector, and place the boolean results of the specified comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.*FACGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*FACGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>FACGT <Hd>, <Hn>, <Hm>FACGT <V><d>, <V><n>, <V><m>"FACGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FACGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>casp7Compare and swap pair of words or doublewords in memory4CASP <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{, #0}]5CASPA <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{, #0}]6CASPAL <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{, #0}]5CASPL <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{, #0}]4CASP <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{, #0}]5CASPA <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{, #0}]6CASPAL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{, #0}]5CASPL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{, #0}]usmops>The 8-bit integer variant works with a 32-bit element ZA tile./USMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B/USMOPS <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hrevb�
Reverse the order of 8-bit bytes, 16-bit halfwords or 32-bit words within each active element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.REVB <Zd>.<T>, <Pg>/M, <Zn>.<T>REVH <Zd>.<T>, <Pg>/M, <Zn>.<T>REVW <Zd>.D, <Pg>/M, <Zn>.Dsqshl(Signed saturating shift left (immediate)SQSHL <V><d>, <V><n>, #<shift>"SQSHL <Vd>.<T>, <Vn>.<T>, #<shift>SQSHL <V><d>, <V><n>, <V><m>"SQSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SQSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>,SQSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ssbbSSBB -- A64 Speculative store bypass barrierSSBB DSB   #0ldnt1h�Contiguous load non-temporal of halfwords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.@LDNT1H { <Zt1>.H-<Zt2>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]@LDNT1H { <Zt1>.H-<Zt4>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]<LDNT1H { <Zt1>.H-<Zt2>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]<LDNT1H { <Zt1>.H-<Zt4>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]ALDNT1H { <Zt1>.H, <Zt2>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]SLDNT1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]=LDNT1H { <Zt1>.H, <Zt2>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]OLDNT1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]+LDNT1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]+LDNT1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]6LDNT1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]2LDNT1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]svcSupervisor callSVC  # <imm>orv�Bitwise inclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&amp;FP scalar destination register. 
Inactive elements in the source vector are treated as zero.ORV <V><d>, <Pg>, <Zn>.<T>fdup�Unconditionally broadcast the floating-point immediate into each element of the destination vector. This instruction is unpredicated.FDUP <Zd>.<T>, #<const>gcspushxGCSPUSHX -- A642Guarded Control Stack push exception return record	GCSPUSHX SYS   #0, C7, C7, #4{, <Xt>}fnmsub7Floating-point negated fused multiply-subtract (scalar)FNMSUB <Hd>, <Hn>, <Hm>, <Ha>FNMSUB <Sd>, <Sn>, <Sm>, <Sa>FNMSUB <Dd>, <Dn>, <Dm>, <Da>stlur4Store-release SIMD&amp;FP register (unscaled offset) STLUR <Bt>, [<Xn|SP>{, #<simm>}] STLUR <Ht>, [<Xn|SP>{, #<simm>}] STLUR <St>, [<Xn|SP>{, #<simm>}] STLUR <Dt>, [<Xn|SP>{, #<simm>}] STLUR <Qt>, [<Xn|SP>{, #<simm>}] STLUR <Wt>, [<Xn|SP>{, #<simm>}] STLUR <Xt>, [<Xn|SP>{, #<simm>}]pnext�]An instruction used to construct a loop which iterates over all true elements in the vector select predicate register. If all elements in the first source predicate register are false it determines the first true element in the vector select predicate register, otherwise it determines the next true element in the vector select predicate register that follows the last true element in the first source predicate register. All elements of the destination predicate register are set to false, except the element corresponding to the determined vector select element, if any, which is set to true. 
Sets the  PNEXT <Pdn>.<T>, <Pv>, <Pdn>.<T>crc32bCRC32 checksumCRC32B <Wd>, <Wn>, <Wm>CRC32H <Wd>, <Wn>, <Wm>CRC32W <Wd>, <Wn>, <Wm>CRC32X <Wd>, <Wn>, <Xm>mls7Multiply-subtract from accumulator (vector, by element)*MLS <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>] MLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>)MLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>"MLS <Zda>.H, <Zn>.H, <Zm>.H[<imm>]"MLS <Zda>.S, <Zn>.S, <Zm>.S[<imm>]"MLS <Zda>.D, <Zn>.D, <Zm>.D[<imm>]tbnzTest bit and branch if nonzeroTBNZ <R><t>, #<imm>, <label>cpyprtn>Memory copy, reads unprivileged, reads and writes non-temporal!CPYPRTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYMRTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYERTN  [ <Xd>]!, [<Xs>]!, <Xn>!adclb�TAdd the even-numbered elements of the first source vector and the 1-bit carry from the least-significant bit of the odd-numbered elements of the second source vector to the even-numbered elements of the destination and accumulator vector. The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.#ADCLB <Zda>.<T>, <Zn>.<T>, <Zm>.<T>fmlalbM8-bit floating-point multiply-add long to half-precision (vector, by element))FMLALB <Vd>.8H, <Vn>.16B, <Vm>.B[<index>])FMLALT <Vd>.8H, <Vn>.16B, <Vm>.B[<index>]"FMLALB <Vd>.8H, <Vn>.16B, <Vm>.16B"FMLALT <Vd>.8H, <Vn>.16B, <Vm>.16BFMLALB <Zda>.H, <Zn>.B, <Zm>.B%FMLALB <Zda>.H, <Zn>.B, <Zm>.B[<imm>]FMLALB <Zda>.S, <Zn>.H, <Zm>.H%FMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]bfadd�Add active BFloat16 elements of the second source vector to corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. 
Inactive elements in the destination vector register remain unmodified.&BFADD <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.HBFADD <Zd>.H, <Zn>.H, <Zm>.H8BFADD   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zm1>.H-<Zm2>.H }8BFADD   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zm1>.H-<Zm4>.H }ldr,Load SIMD&amp;FP register (immediate offset)&LDR <Bt>, [<Xn|SP>], #<simm>LDR <Ht>, [<Xn|SP>], #<simm>LDR <St>, [<Xn|SP>], #<simm>LDR <Dt>, [<Xn|SP>], #<simm>LDR <Qt>, [<Xn|SP>], #<simm>LDR <Bt>, [<Xn|SP>, #<simm>]!LDR <Ht>, [<Xn|SP>, #<simm>]!LDR <St>, [<Xn|SP>, #<simm>]!LDR <Dt>, [<Xn|SP>, #<simm>]!LDR <Qt>, [<Xn|SP>, #<simm>]!LDR <Bt>, [<Xn|SP>{, #<pimm>}]LDR <Ht>, [<Xn|SP>{, #<pimm>}]LDR <St>, [<Xn|SP>{, #<pimm>}]LDR <Dt>, [<Xn|SP>{, #<pimm>}]LDR <Qt>, [<Xn|SP>{, #<pimm>}]LDR <Wt>, [<Xn|SP>], #<simm>LDR <Xt>, [<Xn|SP>], #<simm>LDR <Wt>, [<Xn|SP>, #<simm>]!LDR <Xt>, [<Xn|SP>, #<simm>]!LDR <Wt>, [<Xn|SP>{, #<pimm>}]LDR <Xt>, [<Xn|SP>{, #<pimm>}]LDR <St>, <label>LDR <Dt>, <label>LDR <Qt>, <label>LDR <Wt>, <label>LDR <Xt>, <label>%LDR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]5LDR <Bt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}])LDR <Bt>, [<Xn|SP>, <Xm>{, LSL <amount>}]7LDR <Ht>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7LDR <St>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7LDR <Dt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7LDR <Qt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7LDR <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7LDR <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]%LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]7LDR     ZA[ <Wv>, <offs>], [<Xn|SP>{, #<offs>, MUL VL}]LDR     ZT0, [ <Xn|SP>]frecpsFloating-point reciprocal stepFRECPS <Hd>, <Hn>, <Hm>FRECPS <V><d>, <V><n>, <V><m>#FRECPS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>#FRECPS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>#FRECPS <Zd>.<T>, <Zn>.<T>, <Zm>.<T>fmopanThe 8-bit floating-point sum of outer products and accumulate instruction works with a 16-bit element ZA tile..FMOPA <ZAda>.H, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B.FMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B.FMOPA 
<ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.FMOPA <ZAda>.H, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.FMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S, <Zm>.S.FMOPA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.D, <Zm>.DldxrLoad exclusive registerLDXR <Wt>, [<Xn|SP>{, #0}]LDXR <Xt>, [<Xn|SP>{, #0}]cpyfpwt-Memory copy forward-only, writes unprivileged!CPYFPWT  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFMWT  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFEWT  [ <Xd>]!, [<Xs>]!, <Xn>!sbclb�dSubtract the even-numbered elements of the first source vector and the inverted 1-bit carry from the least-significant bit of the odd-numbered elements of the second source vector from the even-numbered elements of the destination and accumulator vector. The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.#SBCLB <Zda>.<T>, <Zn>.<T>, <Zm>.<T>sbfizSBFIZ -- A64Signed bitfield insert in zeros"SBFIZ <Wd>, <Wn>, #<lsb>, #<width>3SBFM   <Wd>, <Wn>, #(-<lsb>  MOD  32), #(<width>-1)"SBFIZ <Xd>, <Xn>, #<lsb>, #<width>3SBFM   <Xd>, <Xn>, #(-<lsb>  MOD  64), #(<width>-1)ld1MLoad multiple single-element structures to one, two, three, or four registersLD1  { <Vt>.<T> }, [<Xn|SP>]'LD1  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]2LD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]=LD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]#LD1  { <Vt>.<T> }, [<Xn|SP>], <imm>"LD1  { <Vt>.<T> }, [<Xn|SP>], <Xm>.LD1  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>-LD1  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>9LD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>8LD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>DLD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>CLD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>#LD1  { <Vt>.B }[<index>], [<Xn|SP>]#LD1  { <Vt>.H }[<index>], [<Xn|SP>]#LD1  { <Vt>.S }[<index>], [<Xn|SP>]#LD1  { <Vt>.D }[<index>], [<Xn|SP>]'LD1  { <Vt>.B }[<index>], [<Xn|SP>], #1)LD1  { <Vt>.B }[<index>], [<Xn|SP>], <Xm>'LD1  { <Vt>.D }[<index>], [<Xn|SP>], #8)LD1  { <Vt>.D 
}[<index>], [<Xn|SP>], <Xm>'LD1  { <Vt>.H }[<index>], [<Xn|SP>], #2)LD1  { <Vt>.H }[<index>], [<Xn|SP>], <Xm>'LD1  { <Vt>.S }[<index>], [<Xn|SP>], #4)LD1  { <Vt>.S }[<index>], [<Xn|SP>], <Xm>sqxtun)Signed saturating extract unsigned narrowSQXTUN <Vb><d>, <Va><n> SQXTUN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>smull)Signed multiply long (vector, by element)3SMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]*SMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>SMULL -- A64Signed multiply longSMULL <Xd>, <Wn>, <Wm>SMADDL   <Xd>, <Wn>, <Wm>, XZRuqsubUnsigned saturating subtractUQSUB <V><d>, <V><n>, <V><m>"UQSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UQSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>-UQSUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}"UQSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>saddwSigned add wide*SADDW{ 2}  <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>fcvtnuZFloating-point convert to unsigned integer, rounding to nearest with ties to even (vector)
FCVTNU <Hd>, <Hn>FCVTNU <V><d>, <V><n>FCVTNU <Vd>.<T>, <Vn>.<T>FCVTNU <Vd>.<T>, <Vn>.<T>FCVTNU <Wd>, <Hn>FCVTNU <Xd>, <Hn>FCVTNU <Wd>, <Sn>FCVTNU <Xd>, <Sn>FCVTNU <Wd>, <Dn>FCVTNU <Xd>, <Dn>ldsetb0Atomic bit set on byte in memory, without return2STSETB <Ws>, [<Xn|SP>]LDSETB  <Ws>, WZR, [<Xn|SP>]4STSETLB <Ws>, [<Xn|SP>]LDSETLB  <Ws>, WZR, [<Xn|SP>]stlurb&Store-release register byte (unscaled)!STLURB <Wt>, [<Xn|SP>{, #<simm>}]ctzCount trailing zerosCTZ <Wd>, <Wn>CTZ <Xd>, <Xn>fmlallbt�This 8-bit floating-point multiply-add long-long instruction widens the second 8-bit element of each 32-bit container in the first and second source vectors to single-precision format and multiplies the corresponding elements. The intermediate products are scaled by 2 FMLALLBT <Zda>.S, <Zn>.B, <Zm>.B'FMLALLBT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]ssublt�Subtract the odd-numbered signed elements of the second source vector from the corresponding signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%SSUBLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>psel�{If the indexed element of the second source predicate is true, place the contents of the first source predicate register into the destination predicate register, otherwise set the destination predicate to all-false. The indexed element is determined by the sum of a general-purpose index register and an immediate, modulo the number of elements. 
Does not set the condition flags.&PSEL <Pd>, <Pn>, <Pm>.<T>[<Wv>, <imm>]ldapur8Load-acquire RCpc SIMD&amp;FP register (unscaled offset)!LDAPUR <Bt>, [<Xn|SP>{, #<simm>}]!LDAPUR <Ht>, [<Xn|SP>{, #<simm>}]!LDAPUR <St>, [<Xn|SP>{, #<simm>}]!LDAPUR <Dt>, [<Xn|SP>{, #<simm>}]!LDAPUR <Qt>, [<Xn|SP>{, #<simm>}]!LDAPUR <Wt>, [<Xn|SP>{, #<simm>}]!LDAPUR <Xt>, [<Xn|SP>{, #<simm>}]tblTable vector lookup&TBL <Vd>.<Ta>, { <Vn>.16B }, <Vm>.<Ta>2TBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B }, <Vm>.<Ta>>TBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B }, <Vm>.<Ta>JTBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B, <Vn+3>.16B }, <Vm>.<Ta>$TBL <Zd>.<T>, { <Zn>.<T> }, <Zm>.<T>0TBL <Zd>.<T>, { <Zn1>.<T>, <Zn2>.<T> }, <Zm>.<T>ldaxrb$Load-acquire exclusive register byteLDAXRB <Wt>, [<Xn|SP>{, #0}]addAdd (extended register)4ADD <Wd|WSP>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}4ADD <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}})ADD <Wd|WSP>, <Wn|WSP>, #<imm>{, <shift>}'ADD <Xd|SP>, <Xn|SP>, #<imm>{, <shift>})ADD <Wd>, <Wn>, <Wm>{, <shift> #<amount>})ADD <Xd>, <Xn>, <Xm>{, <shift> #<amount>}ADD  D <d>, D<n>, D<m> ADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>BADD { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>BADD { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>*ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>+ADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>} ADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>>ADD     ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zm1>.<T>-<Zm2>.<T> }>ADD     ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zm1>.<T>-<Zm4>.<T> }HADD     ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, <Zm>.<T>HADD     ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, <Zm>.<T>WADD     ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }WADD     ZA. 
<T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }	sha512su1SHA512 schedule update 1#SHA512SU1 <Vd>.2D, <Vn>.2D, <Vm>.2Deors�"Bitwise exclusive OR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the #EORS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bcpy�Copy a signed integer immediate to each active element in the destination vector. Inactive elements in the destination vector register are set to zero.'CPY <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>}'CPY <Zd>.<T>, <Pg>/M, #<imm>{, <shift>}CPY <Zd>.<T>, <Pg>/M, <R><n|SP>CPY <Zd>.<T>, <Pg>/M, <V><n>uqrshlr��Shift active unsigned elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Each result element is saturated to the N-bit element's unsigned integer range 0 to (2.UQRSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>uabalb�Compute the absolute difference between even-numbered unsigned elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector. 
This instruction is unpredicated.&UABALB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>sha1cSHA1 hash update (choose)SHA1C <Qd>, <Sn>, <Vm>.4Smsubpt!Multiply-subtract checked pointerMSUBPT <Xd>, <Xn>, <Xm>, <Xa>ld1rd�Load a single doubleword from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 8 in the range 0 to 504.-LD1RD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]suvdot��The signed by unsigned integer vertical dot product instruction computes the vertical dot product of the corresponding signed 8-bit elements from the four first source vectors and four unsigned 8-bit integer values in the corresponding indexed 32-bit element of the second source vector. The widened dot product result is destructively added to the corresponding 32-bit element of the ZA single-vector groups.ISUVDOT  ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]uabdlb�0Compute the absolute difference between the even-numbered unsigned integer values in elements of the second source vector and the corresponding elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%UABDLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>rcwsswpp1Read check write software swap quadword in memory RCWSSWPP <Xt1>, <Xt2>, [<Xn|SP>]!RCWSSWPPA <Xt1>, <Xt2>, [<Xn|SP>]"RCWSSWPPAL <Xt1>, <Xt2>, [<Xn|SP>]!RCWSSWPPL <Xt1>, <Xt2>, [<Xn|SP>]st64b6Single-copy atomic 64-byte store without status resultST64B <Xt>, [<Xn|SP> {, #0}]tsbTrace synchronization barrierTSB  CSYNC cospCOSP -- A649Clear other speculative prediction restriction by contextCOSP  RCTX, <Xt>SYS   #3, C7, C3, #6, <Xt>fminnmv+Floating-point minimum number across vectorFMINNMV <V><d>, <Vn>.<T>FMINNMV  S <d>, <Vn>.4SFMINNMV <V><d>, <Pg>, <Zn>.<T>bfsub�#Subtract active BFloat16 elements of the second source vector from corresponding BFloat16 elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.&BFSUB <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.HBFSUB <Zd>.H, <Zn>.H, <Zm>.H8BFSUB   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zm1>.H-<Zm2>.H }8BFSUB   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zm1>.H-<Zm4>.H }st1q�Scatter store of quadwords from the active elements of a vector register to the memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. Inactive elements are not written to memory.'ST1Q { <Zt>.Q }, <Pg>, [<Zn>.D{, <Xm>}]CST1Q { <ZAt><HV>.Q[<Ws>, <offs>] }, <Pg>, [<Xn|SP>{, <Xm>, LSL #4}]mvn
MVN -- A64Bitwise NOT (vector)MVN <Vd>.<T>, <Vn>.<T>NOT   <Vd>.<T>, <Vn>.<T>
MVN -- A64Bitwise NOT#MVN <Wd>, <Wm>{, <shift> #<amount>}*ORN   <Wd>, WZR, <Wm>{, <shift> #<amount>}#MVN <Xd>, <Xm>{, <shift> #<amount>}*ORN   <Xd>, XZR, <Xm>{, <shift> #<amount>}rshrnb�aShift each unsigned integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. This instruction is unpredicated.$RSHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>uminUnsigned minimum (vector)!UMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>UMIN <Wd>, <Wn>, #<uimm>UMIN <Xd>, <Xn>, #<uimm>CUMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>CUMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>RUMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }RUMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }UMIN <Wd>, <Wn>, <Wm>UMIN <Xd>, <Xn>, <Xm>+UMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!UMIN <Zdn>.<T>, <Zdn>.<T>, #<imm>saba)Signed absolute difference and accumulate!SABA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"SABA <Zda>.<T>, <Zn>.<T>, <Zm>.<T>ucvtf7Unsigned fixed-point convert to floating-point (vector)UCVTF <V><d>, <V><n>, #<fbits>"UCVTF <Vd>.<T>, <Vn>.<T>, #<fbits>UCVTF <Hd>, <Hn>UCVTF <V><d>, <V><n>UCVTF <Vd>.<T>, <Vn>.<T>UCVTF <Vd>.<T>, <Vn>.<T>UCVTF <Hd>, <Wn>, #<fbits>UCVTF <Hd>, <Xn>, #<fbits>UCVTF <Sd>, <Wn>, #<fbits>UCVTF <Sd>, <Xn>, #<fbits>UCVTF <Dd>, <Wn>, #<fbits>UCVTF <Dd>, <Xn>, #<fbits>UCVTF <Hd>, <Wn>UCVTF <Sd>, <Wn>UCVTF <Dd>, <Wn>UCVTF <Hd>, <Xn>UCVTF <Sd>, <Xn>UCVTF <Dd>, <Xn>.UCVTF { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }.UCVTF { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }UCVTF <Zd>.H, <Pg>/M, <Zn>.HUCVTF <Zd>.H, <Pg>/M, <Zn>.SUCVTF <Zd>.S, <Pg>/M, <Zn>.SUCVTF <Zd>.D, <Pg>/M, <Zn>.SUCVTF <Zd>.H, <Pg>/M, <Zn>.DUCVTF <Zd>.S, <Pg>/M, <Zn>.DUCVTF <Zd>.D, <Pg>/M, <Zn>.DsmaxpSigned maximum 
pairwise"SMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SMAXP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>rdvl�Multiply the current vector register size in bytes by an immediate in the range -32 to 31 and place the result in the 64-bit destination general-purpose register.RDVL <Xd>, #<imm>stlxr Store-release exclusive register!STLXR <Ws>, <Wt>, [<Xn|SP>{, #0}]!STLXR <Ws>, <Xt>, [<Xn|SP>{, #0}]ttestTest transaction state
TTEST <Xt>ldlarbLoad LOAcquire register byteLDLARB <Wt>, [<Xn|SP>{, #0}]fvdott��The instruction computes the fused sum-of-products of each vertical group of two 8-bit floating-point values held in the corresponding elements of the two first source vectors with the higher-numbered horizontal group of two 8-bit floating-point values in the indexed 32-bit group of the corresponding 128-bit segment of the second source vector. The single-precision sum-of-products are scaled by 2GFVDOTT  ZA.S[ <Wv>, <offs>, VGx4], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]st3d�8Contiguous store three-doubleword structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,EST3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]AST3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]wfeWait for eventWFE mnegMNEG -- A64Multiply-negateMNEG <Wd>, <Wn>, <Wm>MSUB   <Wd>, <Wn>, <Wm>, WZRMNEG <Xd>, <Xn>, <Xm>MSUB   <Xd>, <Xn>, <Xm>, XZRldaprLoad-acquire RCpc registerLDAPR <Wt>, [<Xn|SP>], #4LDAPR <Xt>, [<Xn|SP>], #8LDAPR <Wt>, [<Xn|SP> {, #0}]LDAPR <Xt>, [<Xn|SP> {, #0}]shrnb�cShift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. 
This instruction is unpredicated.#SHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>strh#Store register halfword (immediate)STRH <Wt>, [<Xn|SP>], #<simm>STRH <Wt>, [<Xn|SP>, #<simm>]!STRH <Wt>, [<Xn|SP>{, #<pimm>}]8STRH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]stnt1h� Contiguous store non-temporal of halfwords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>STNT1H { <Zt1>.H-<Zt2>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]>STNT1H { <Zt1>.H-<Zt4>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]:STNT1H { <Zt1>.H-<Zt2>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]:STNT1H { <Zt1>.H-<Zt4>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]?STNT1H { <Zt1>.H, <Zt2>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]QSTNT1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}];STNT1H { <Zt1>.H, <Zt2>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]MSTNT1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1])STNT1H { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}])STNT1H { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]4STNT1H { <Zt>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0STNT1H { <Zt>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]sri"Shift right and insert (immediate)SRI  D <d>, D<n>, #<shift> SRI <Vd>.<T>, <Vn>.<T>, #<shift> SRI <Zd>.<T>, <Zn>.<T>, #<const>	cpyfprtwnAMemory copy forward-only, reads unprivileged, writes non-temporal#CPYFPRTWN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFMRTWN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFERTWN  [ <Xd>]!, [<Xs>]!, <Xn>!cpypwn Memory copy, writes non-temporal CPYPWN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYMWN  [ <Xd>]!, [<Xs>]!, <Xn>! 
CPYEWN  [ <Xd>]!, [<Xs>]!, <Xn>!zip�Place the four-way interleaved elements from the four source vectors in the corresponding elements of the four destination vectors.4ZIP { <Zd1>.<T>-<Zd4>.<T> }, { <Zn1>.<T>-<Zn4>.<T> },ZIP { <Zd1>.Q-<Zd4>.Q }, { <Zn1>.Q-<Zn4>.Q }/ZIP { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<T>, <Zm>.<T>'ZIP { <Zd1>.Q-<Zd2>.Q }, <Zn>.Q, <Zm>.Qgcsss1
GCSSS1 -- A64$Guarded Control Stack switch stack 1GCSSS1 <Xt>SYS   #3, C7, C7, #2, <Xt>sqxtunb�Saturate the signed integer value in each source element to an unsigned integer value that is half the original source element width, and place the results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.SQXTUNB <Zd>.<T>, <Zn>.<Tb>setptn)Memory set, unprivileged and non-temporalSETPTN  [ <Xd>]!, <Xn>!, <Xs>SETMTN  [ <Xd>]!, <Xn>!, <Xs>SETETN  [ <Xd>]!, <Xn>!, <Xs>st26Store multiple 2-element structures from two registers'ST2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>].ST2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>-ST2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>,ST2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>],ST2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>],ST2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>],ST2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>]0ST2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], #22ST2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], <Xm>0ST2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], #42ST2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], <Xm>0ST2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], #82ST2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], <Xm>1ST2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], #162ST2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], <Xm>ld4h�3Contiguous load four-halfword structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,PLD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]LLD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]trn1Transpose vectors (primary)!TRN1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!TRN1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!TRN2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!TRN1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>TRN1 <Zd>.Q, <Zn>.Q, <Zm>.Q!TRN2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>TRN2 <Zd>.Q, 
<Zn>.Q, <Zm>.Qlslr��Reversed shift left active elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size. Inactive elements in the destination vector register remain unmodified.+LSLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>uaddlvUnsigned sum long across vectorUADDLV <V><d>, <Vn>.<T>tstTST (immediate) -- A64Test bits (immediate)TST <Wn>, #<imm>ANDS   WZR, <Wn>, #<imm>TST <Xn>, #<imm>ANDS   XZR, <Xn>, #<imm>TST (shifted register) -- A64Test (shifted register)#TST <Wn>, <Wm>{, <shift> #<amount>}+ANDS   WZR, <Wn>, <Wm>{, <shift> #<amount>}#TST <Xn>, <Xm>{, <shift> #<amount>}+ANDS   XZR, <Xn>, <Xm>{, <shift> #<amount>}ldeorah)Atomic exclusive-OR on halfword in memoryLDEORAH <Ws>, <Wt>, [<Xn|SP>]LDEORALH <Ws>, <Wt>, [<Xn|SP>]LDEORH <Ws>, <Wt>, [<Xn|SP>]LDEORLH <Ws>, <Wt>, [<Xn|SP>]ldtrLoad register (unprivileged)LDTR <Wt>, [<Xn|SP>{, #<simm>}]LDTR <Xt>, [<Xn|SP>{, #<simm>}]sqaddSigned saturating addSQADD <V><d>, <V><n>, <V><m>"SQADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SQADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>-SQADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}"SQADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>orr(Bitwise inclusive OR (vector, immediate)&ORR <Vd>.<T>, #<imm8>{, LSL #<amount>}&ORR <Vd>.<T>, #<imm8>{, LSL #<amount>} ORR <Vd>.<T>, <Vn>.<T>, <Vm>.<T>ORR <Wd|WSP>, <Wn>, #<imm>ORR <Xd|SP>, <Xn>, #<imm>)ORR <Wd>, <Wn>, <Wm>{, <shift> #<amount>})ORR <Xd>, <Xn>, <Xm>{, <shift> #<amount>}"ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B*ORR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>"ORR <Zdn>.<T>, <Zdn>.<T>, #<const>ORR <Zd>.D, <Zn>.D, <Zm>.Deortb�?Interleaving exclusive OR between the odd-numbered elements of the first source vector register and the even-numbered elements of the second source vector register, placing the result in the odd-numbered elements 
of the destination vector, leaving the even-numbered elements unchanged. This instruction is unpredicated."EORTB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>saddlvSigned add long across vectorSADDLV <V><d>, <Vn>.<T>saddwb�Add the even-numbered signed elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.$SADDWB <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>fmopsuThe half-precision floating-point sum of outer products and subtract instruction works with a 32-bit element ZA tile..FMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.FMOPS <ZAda>.H, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.FMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S, <Zm>.S.FMOPS <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.D, <Zm>.Dursra8Unsigned rounding shift right and accumulate (immediate)URSRA  D <d>, D<n>, #<shift>"URSRA <Vd>.<T>, <Vn>.<T>, #<shift>#URSRA <Zda>.<T>, <Zn>.<T>, #<const>cmplsCMPLS (vectors)�[Compare active unsigned integer elements in the first source vector being lower than or same as corresponding unsigned elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the *CMPLS <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-CMPHS    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>andv�Bitwise AND horizontally across all lanes of a vector, and place the result in the SIMD&amp;FP scalar destination register. Inactive elements in the source vector are treated as all ones.ANDV <V><d>, <Pg>, <Zn>.<T>sxtb�Sign-extend the least-significant sub-element of each active element of the source vector, and place the results in the corresponding elements of the destination vector. 
Inactive elements in the destination vector register remain unmodified.SXTB <Zd>.<T>, <Pg>/M, <Zn>.<T>SXTH <Zd>.<T>, <Pg>/M, <Zn>.<T>SXTW <Zd>.D, <Pg>/M, <Zn>.DSXTB -- A64Signed extend byteSXTB <Wd>, <Wn>SBFM   <Wd>, <Wn>, #0, #7SXTB <Xd>, <Wn>SBFM   <Xd>, <Xn>, #0, #7pacia171615@Pointer Authentication Code for instruction address, using key APACIA171615 pacib@Pointer Authentication Code for instruction address, using key BPACIB <Xd>, <Xn|SP>PACIZB <Xd>
PACIB1716 PACIBSP PACIBZ sha256hSHA256 hash update (part 1)SHA256H <Qd>, <Qn>, <Vm>.4Sst4d�6Contiguous store four-doubleword structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,NST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]JST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]fcvtmsQFloating-point convert to signed integer, rounding toward minus infinity (vector)
FCVTMS <Hd>, <Hn>FCVTMS <V><d>, <V><n>FCVTMS <Vd>.<T>, <Vn>.<T>FCVTMS <Vd>.<T>, <Vn>.<T>FCVTMS <Wd>, <Hn>FCVTMS <Xd>, <Hn>FCVTMS <Wd>, <Sn>FCVTMS <Xd>, <Sn>FCVTMS <Wd>, <Dn>FCVTMS <Xd>, <Dn>sqxtunt��Saturate the signed integer value in each source element to an unsigned integer value that is half the original source element width, and place the results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged.SQXTUNT <Zd>.<T>, <Zn>.<Tb>fcvtmuSFloating-point convert to unsigned integer, rounding toward minus infinity (vector)
FCVTMU <Hd>, <Hn>FCVTMU <V><d>, <V><n>FCVTMU <Vd>.<T>, <Vn>.<T>FCVTMU <Vd>.<T>, <Vn>.<T>FCVTMU <Wd>, <Hn>FCVTMU <Xd>, <Hn>FCVTMU <Wd>, <Sn>FCVTMU <Xd>, <Sn>FCVTMU <Wd>, <Dn>FCVTMU <Xd>, <Dn>nand�2Bitwise NAND active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.#NAND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bld1rw�Load a single unsigned word from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 4 in the range 0 to 252.-LD1RW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]ssublbt�Subtract the odd-numbered signed elements of the second source vector from the even-numbered signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.&SSUBLBT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>st1h�Contiguous store of halfwords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.<ST1H { <Zt1>.H-<Zt2>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]<ST1H { <Zt1>.H-<Zt4>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST1H { <Zt1>.H-<Zt2>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]8ST1H { <Zt1>.H-<Zt4>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]=ST1H { <Zt1>.H, <Zt2>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]OST1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]9ST1H { <Zt1>.H, <Zt2>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]KST1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1])ST1H { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}])ST1H { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]4ST1H { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0ST1H { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]2ST1H { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #1]2ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #1]/ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]/ST1H { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]0ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #1](ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]CST1H { <ZAt><HV>.H[<Ws>, <offs>] }, <Pg>, [<Xn|SP>{, <Xm>, LSL #1}]	sha256su0SHA256 schedule update 0SHA256SU0 <Vd>.4S, <Vn>.4Ssqdmlal9Signed saturating doubling multiply-add long (by element),SQDMLAL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]5SQDMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]!SQDMLAL <Va><d>, <Vb><n>, <Vb><m>,SQDMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>umaxUnsigned maximum (vector)!UMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>UMAX <Wd>, <Wn>, #<uimm>UMAX <Xd>, <Xn>, #<uimm>CUMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>CUMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>RUMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { 
<Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }RUMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }UMAX <Wd>, <Wn>, <Wm>UMAX <Xd>, <Xn>, <Xm>+UMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!UMAX <Zdn>.<T>, <Zdn>.<T>, #<imm>extq��For each 128-bit vector segment of the result, copy the indexed byte up to and including the last byte of the corresponding first source vector segment to the bottom of the result segment, then fill the remainder of the result segment starting from the first byte of the corresponding second source vector segment. The result segments are destructively placed in the corresponding first source vector segment. This instruction is unpredicated.%EXTQ <Zdn>.B, <Zdn>.B, <Zm>.B, #<imm>frinti�!Round to an integral floating-point value with the specified rounding option from each active floating-point element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.!FRINTI <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTX <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTA <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTN <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTZ <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTM <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTP <Zd>.<T>, <Pg>/M, <Zn>.<T>FRINTI <Vd>.<T>, <Vn>.<T>FRINTI <Vd>.<T>, <Vn>.<T>FRINTI <Hd>, <Hn>FRINTI <Sd>, <Sn>FRINTI <Dd>, <Dn>blBranch with link
BL <label>dmbData memory barrierDMB  ( <option>|#<imm>)ld3d�7Contiguous load three-doubleword structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,GLD3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]CLD3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]sevlSend event localSEVL umlsl4Unsigned multiply-subtract long (vector, by element)
3UMLSL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]*UMLSL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>=UMLSL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4UMLSL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HIUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HIUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }cmpltCMPLT (vectors)�KCompare active signed integer elements in the first source vector being less than corresponding signed elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Sets the *CMPLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-CMPGT    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>stur,Store SIMD&amp;FP register (unscaled offset)STUR <Bt>, [<Xn|SP>{, #<simm>}]STUR <Ht>, [<Xn|SP>{, #<simm>}]STUR <St>, [<Xn|SP>{, #<simm>}]STUR <Dt>, [<Xn|SP>{, #<simm>}]STUR <Qt>, [<Xn|SP>{, #<simm>}]STUR <Wt>, [<Xn|SP>{, #<simm>}]STUR <Xt>, [<Xn|SP>{, #<simm>}]ld2w�-Contiguous load two-word structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,>LD2W { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD2W { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]fjcvtzsMFloating-point Javascript convert to signed fixed-point, rounding toward zeroFJCVTZS <Wd>, <Dn>ushllb�3Shift left by immediate each even-numbered unsigned element of the source vector, and place the results in the overlapping double-width elements of the destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. 
This instruction is unpredicated.$USHLLB <Zd>.<T>, <Zn>.<Tb>, #<const>setffr%Initialise the first-fault register (SETFFR ldaprbLoad-acquire RCpc register byteLDAPRB <Wt>, [<Xn|SP> {, #0}]usqadd.Unsigned saturating accumulate of signed valueUSQADD <V><d>, <V><n>USQADD <Vd>.<T>, <Vn>.<T>-USQADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>gcssttr(Guarded Control Stack unprivileged storeGCSSTTR <Xt>, [<Xn|SP>]cas-Compare and swap word or doubleword in memoryCAS <Ws>, <Wt>, [<Xn|SP>{, #0}] CASA <Ws>, <Wt>, [<Xn|SP>{, #0}]!CASAL <Ws>, <Wt>, [<Xn|SP>{, #0}] CASL <Ws>, <Wt>, [<Xn|SP>{, #0}]CAS <Xs>, <Xt>, [<Xn|SP>{, #0}] CASA <Xs>, <Xt>, [<Xn|SP>{, #0}]!CASAL <Xs>, <Xt>, [<Xn|SP>{, #0}] CASL <Xs>, <Xt>, [<Xn|SP>{, #0}]ldumax7Atomic unsigned maximum on word or doubleword in memoryLDUMAX <Ws>, <Wt>, [<Xn|SP>]LDUMAXA <Ws>, <Wt>, [<Xn|SP>]LDUMAXAL <Ws>, <Wt>, [<Xn|SP>]LDUMAXL <Ws>, <Wt>, [<Xn|SP>]LDUMAX <Xs>, <Xt>, [<Xn|SP>]LDUMAXA <Xs>, <Xt>, [<Xn|SP>]LDUMAXAL <Xs>, <Xt>, [<Xn|SP>]LDUMAXL <Xs>, <Xt>, [<Xn|SP>]2STUMAX <Ws>, [<Xn|SP>]LDUMAX  <Ws>, WZR, [<Xn|SP>]4STUMAXL <Ws>, [<Xn|SP>]LDUMAXL  <Ws>, WZR, [<Xn|SP>]2STUMAX <Xs>, [<Xn|SP>]LDUMAX  <Xs>, XZR, [<Xn|SP>]4STUMAXL <Xs>, [<Xn|SP>]LDUMAXL  <Xs>, XZR, [<Xn|SP>]brBranch to registerBR <Xn>st4w�0Contiguous store four-word structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,NST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]JST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]smstop
SMSTOP -- A64ADisables access to Streaming SVE mode and SME architectural stateSMSTOP  { <option>}MSR   <pstatefield>,   #0ld1d�Contiguous load of unsigned doublewords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>LD1D { <Zt1>.D-<Zt2>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]>LD1D { <Zt1>.D-<Zt4>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD1D { <Zt1>.D-<Zt2>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]:LD1D { <Zt1>.D-<Zt4>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]?LD1D { <Zt1>.D, <Zt2>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]QLD1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}];LD1D { <Zt1>.D, <Zt2>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]MLD1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]+LD1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]4LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]4LD1D { <Zt>.Q }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]0LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]0LD1D { <Zt>.Q }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]4LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #3]1LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]2LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #3]*LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]ELD1D { <ZAt><HV>.D[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #3}]ssublb�Subtract the even-numbered signed elements of the second source vector from the corresponding signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%SSUBLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>usubwUnsigned subtract wide*USUBW{ 2}  <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>fabd+Floating-point absolute difference (vector)FABD <Hd>, <Hn>, <Hm>FABD <V><d>, <V><n>, <V><m>!FABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>+FABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>movzMove wide with zero!MOVZ <Wd>, #<imm>{, LSL #<shift>}!MOVZ <Xd>, #<imm>{, LSL #<shift>}luti4$Lookup table read with 4-bit indices+LUTI4 <Vd>.16B, { <Vn>.16B }, <Vm>[<index>]4LUTI4 <Vd>.8H, { <Vn1>.8H, <Vn2>.8H }, <Vm>[<index>]1LUTI4 { <Zd1>.<T>-<Zd2>.<T> }, ZT0, <Zn>[<index>]2LUTI4 { <Zd1>.<T>, <Zd2>.<T> }, ZT0, <Zn>[<index>]/LUTI4 { <Zd1>.B-<Zd4>.B }, ZT0, { <Zn1>-<Zn2> }BLUTI4 { <Zd1>.B, <Zd2>.B, <Zd3>.B, <Zd4>.B }, ZT0, { <Zn1>-<Zn2> }1LUTI4 { <Zd1>.<T>-<Zd4>.<T> }, ZT0, <Zn>[<index>]@LUTI4 { <Zd1>.H, <Zd2>.H, <Zd3>.H, <Zd4>.H }, ZT0, <Zn>[<index>]"LUTI4 <Zd>.<T>, ZT0, <Zn>[<index>]'LUTI4 <Zd>.B, { <Zn>.B }, <Zm>[<index>]1LUTI4 <Zd>.H, { <Zn1>.H, <Zn2>.H }, <Zm>[<index>]'LUTI4 <Zd>.H, { <Zn>.H }, <Zm>[<index>]gcsss2
GCSSS2 -- A64$Guarded Control Stack switch stack 2GCSSS2 <Xt>SYSL   <Xt>, #3, C7, C7, #3bext�xThis instruction gathers bits in each element of the first source vector from the bit positions indicated by non-zero bits in the corresponding mask element of the second source vector to the lowest-numbered contiguous bits of the corresponding destination element, preserving their order, and sets the remaining higher-numbered bits to zero. This instruction is unpredicated.!BEXT <Zd>.<T>, <Zn>.<T>, <Zm>.<T>sha1su1SHA1 schedule update 1SHA1SU1 <Vd>.4S, <Vn>.4Sldaprh#Load-acquire RCpc register halfwordLDAPRH <Wt>, [<Xn|SP> {, #0}]tcommitCommit current transactionTCOMMIT gcsbGuarded Control Stack barrierGCSB  DSYNC dcps3Debug change PE state to EL3DCPS3  {# <imm>}fmaxqv�,Floating-point maximum of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as -Infinity. FMAXQV <Vd>.<T>, <Pg>, <Zn>.<Tb>sqrdmlshVSigned saturating rounding doubling multiply subtract returning high half (by element)+SQRDMLSH <V><d>, <V><n>, <Vm>.<Ts>[<index>]/SQRDMLSH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]SQRDMLSH <V><d>, <V><n>, <V><m>%SQRDMLSH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>&SQRDMLSH <Zda>.<T>, <Zn>.<T>, <Zm>.<T>'SQRDMLSH <Zda>.H, <Zn>.H, <Zm>.H[<imm>]'SQRDMLSH <Zda>.S, <Zn>.S, <Zm>.S[<imm>]'SQRDMLSH <Zda>.D, <Zn>.D, <Zm>.D[<imm>]wfiWait for interruptWFI uqinch�*Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. 
The result is saturated to the general-purpose register's unsigned integer range.'UQINCH <Wdn>{, <pattern>{, MUL #<imm>}}'UQINCH <Xdn>{, <pattern>{, MUL #<imm>}})UQINCH <Zdn>.H{, <pattern>{, MUL #<imm>}}sha1su0SHA1 schedule update 0!SHA1SU0 <Vd>.4S, <Vn>.4S, <Vm>.4Ssha512h2SHA512 hash update part 2SHA512H2 <Qd>, <Qn>, <Vm>.2DfcvtpsPFloating-point convert to signed integer, rounding toward plus infinity (vector)
FCVTPS <Hd>, <Hn>FCVTPS <V><d>, <V><n>FCVTPS <Vd>.<T>, <Vn>.<T>FCVTPS <Vd>.<T>, <Vn>.<T>FCVTPS <Wd>, <Hn>FCVTPS <Xd>, <Hn>FCVTPS <Wd>, <Sn>FCVTPS <Xd>, <Sn>FCVTPS <Wd>, <Dn>FCVTPS <Xd>, <Dn>ldclrb2Atomic bit clear on byte in memory, without return2STCLRB <Ws>, [<Xn|SP>]LDCLRB  <Ws>, WZR, [<Xn|SP>]4STCLRLB <Ws>, [<Xn|SP>]LDCLRLB  <Ws>, WZR, [<Xn|SP>]fsubr�0Reversed subtract from an immediate each active floating-point element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only. Inactive elements in the destination vector register remain unmodified.+FSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>,FSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>addpAdd pair of elements (scalar)ADDP  D <d>, <Vn>.2D!ADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>+ADDP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>braaz/Branch to register, with pointer authentication
BRAAZ <Xn>BRAA <Xn>, <Xm|SP>
BRABZ <Xn>BRAB <Xn>, <Xm|SP>sm4e
SM4 encodeSM4E <Vd>.4S, <Vn>.4SSM4E <Zdn>.S, <Zdn>.S, <Zm>.Sfcvtx�CConvert active double-precision floating-point elements from the source vector to single-precision, rounding to Odd, and place the results in the even-numbered 32-bit elements of the destination vector, while setting the odd-numbered elements to zero. Inactive elements in the destination vector register remain unmodified.FCVTX <Zd>.S, <Pg>/M, <Zn>.Df1cvtl78-bit floating-point convert to half-precision (vector)F1CVTL{ 2}  <Vd>.8H, <Vn>.<Ta>F2CVTL{ 2}  <Vd>.8H, <Vn>.<Ta>"F1CVTL { <Zd1>.H-<Zd2>.H }, <Zn>.B"F2CVTL { <Zd1>.H-<Zd2>.H }, <Zn>.Bsqdecb�jDetermines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQDECB <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQDECB <Xdn>{, <pattern>{, MUL #<imm>}}mov!MOV (to/from SP) -- A64Move (to/from SP)MOV <Wd|WSP>, <Wn|WSP>ADD   <Wd|WSP>, <Wn|WSP>, #0MOV <Xd|SP>, <Xn|SP>ADD   <Xd|SP>, <Xn|SP>, #0$MOV (predicate, predicated, zeroing)�Read active elements from the source predicate and place in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.MOV <Pd>.B, <Pg>/Z, <Pn>.B#AND  <Pd>.B, <Pg>/Z, <Pn>.B, <Pn>.B$MOV (immediate, predicated, zeroing)�Move a signed integer immediate to each active element in the destination vector. Inactive elements in the destination vector register are set to zero.'MOV <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>},CPY      <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>}$MOV (immediate, predicated, merging)�Move a signed integer immediate to each active element in the destination vector. 
Inactive elements in the destination vector register remain unmodified.'MOV <Zd>.<T>, <Pg>/M, #<imm>{, <shift>},CPY      <Zd>.<T>, <Pg>/M, #<imm>{, <shift>}MOV (scalar, predicated)�Move the general-purpose scalar source register to each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.MOV <Zd>.<T>, <Pg>/M, <R><n|SP>$CPY      <Zd>.<T>, <Pg>/M, <R><n|SP>$MOV (SIMD&amp;FP scalar, predicated)�Move the SIMD &amp; floating-point scalar source register to each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.MOV <Zd>.<T>, <Pg>/M, <V><n>!CPY      <Zd>.<T>, <Pg>/M, <V><n>MOV (scalar) -- A64Move vector element to scalarMOV <V><d>, <Vn>.<T>[<index>]DUP   <V><d>, <Vn>.<T>[<index>]MOV (immediate, unpredicated)�Unconditionally broadcast the signed integer immediate into each element of the destination vector. This instruction is unpredicated.MOV <Zd>.<T>, #<imm>{, <shift>}$DUP      <Zd>.<T>, #<imm>{, <shift>}MOV (scalar, unpredicated)�Unconditionally broadcast the general-purpose scalar source register into each element of the destination vector. This instruction is unpredicated.MOV <Zd>.<T>, <R><n|SP>DUP      <Zd>.<T>, <R><n|SP>&MOV (SIMD&amp;FP scalar, unpredicated)Unconditionally broadcast the SIMD&amp;FP scalar into each element of the destination vector. This instruction is unpredicated.MOV <Zd>.<T>, <Zn>.<T>[<imm>]"DUP      <Zd>.<T>, <Zn>.<T>[<imm>]MOV <Zd>.<T>, <V><n>DUP  <Zd>.<T>, <Zn>.<T>[0]MOV��Unconditionally broadcast the logical bitmask immediate into each element of the destination vector. This instruction is unpredicated. 
The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits.MOV <Zd>.<T>, #<const>DUPM     <Zd>.<T>, #<const>MOV (element) -- A64-Move vector element to another vector element,MOV <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>].INS   <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]MOV (from general) -- A641Move general-purpose register to a vector elementMOV <Vd>.<Ts>[<index>], <R><n> INS   <Vd>.<Ts>[<index>], <R><n>#MOV (tile to vector, two registers)The instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size.9MOV { <Zd1>.B-<Zd2>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs2>]>MOVA     { <Zd1>.B-<Zd2>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs2>];MOV { <Zd1>.H-<Zd2>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs2>]@MOVA     { <Zd1>.H-<Zd2>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs2>];MOV { <Zd1>.S-<Zd2>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs2>]@MOVA     { <Zd1>.S-<Zd2>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs2>];MOV { <Zd1>.D-<Zd2>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs2>]@MOVA     { <Zd1>.D-<Zd2>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs2>]$MOV (tile to vector, four registers)�The instruction operates on four consecutive horizontal or vertical slices within a named ZA tile of the specified element size.9MOV { <Zd1>.B-<Zd4>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs4>]>MOVA     { <Zd1>.B-<Zd4>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs4>];MOV { <Zd1>.H-<Zd4>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs4>]@MOVA     { <Zd1>.H-<Zd4>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs4>];MOV { <Zd1>.S-<Zd4>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs4>]@MOVA     { <Zd1>.S-<Zd4>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs4>];MOV { <Zd1>.D-<Zd4>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs4>]@MOVA     { <Zd1>.D-<Zd4>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs4>]$MOV (array to vector, two registers)8The instruction operates on two ZA single-vector groups.3MOV { <Zd1>.D-<Zd2>.D }, ZA.D[<Wv>, <offs>{, VGx2}]8MOVA     { <Zd1>.D-<Zd2>.D }, ZA.D[<Wv>, <offs>{, VGx2}]%MOV (array 
to vector, four registers)9The instruction operates on four ZA single-vector groups.3MOV { <Zd1>.D-<Zd4>.D }, ZA.D[<Wv>, <offs>{, VGx4}]8MOVA     { <Zd1>.D-<Zd4>.D }, ZA.D[<Wv>, <offs>{, VGx4}]MOV (tile to vector, single)�zThe instruction operates on individual horizontal or vertical slices within a named ZA tile of the specified element size. The slice number within the tile is selected by the sum of the slice index register and immediate offset, modulo the number of such elements in a vector. The immediate offset is in the range 0 to the number of elements in a 128-bit vector segment minus 1.
+MOV <Zd>.B, <Pg>/M, ZA0<HV>.B[<Ws>, <offs>]0MOVA     <Zd>.B, <Pg>/M, ZA0<HV>.B[<Ws>, <offs>]-MOV <Zd>.H, <Pg>/M, <ZAn><HV>.H[<Ws>, <offs>]2MOVA     <Zd>.H, <Pg>/M, <ZAn><HV>.H[<Ws>, <offs>]-MOV <Zd>.S, <Pg>/M, <ZAn><HV>.S[<Ws>, <offs>]2MOVA     <Zd>.S, <Pg>/M, <ZAn><HV>.S[<Ws>, <offs>]-MOV <Zd>.D, <Pg>/M, <ZAn><HV>.D[<Ws>, <offs>]2MOVA     <Zd>.D, <Pg>/M, <ZAn><HV>.D[<Ws>, <offs>]-MOV <Zd>.Q, <Pg>/M, <ZAn><HV>.Q[<Ws>, <offs>]2MOVA     <Zd>.Q, <Pg>/M, <ZAn><HV>.Q[<Ws>, <offs>]#MOV (vector to tile, two registers)The instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size.>MOV     ZA0 <HV>.B[<Ws>, <offs1>:<offs2>], { <Zn1>.B-<Zn2>.B }>MOVA     ZA0<HV>.B[<Ws>, <offs1>:<offs2>], { <Zn1>.B-<Zn2>.B };MOV <ZAd><HV>.H[<Ws>, <offs1>:<offs2>], { <Zn1>.H-<Zn2>.H }@MOVA     <ZAd><HV>.H[<Ws>, <offs1>:<offs2>], { <Zn1>.H-<Zn2>.H };MOV <ZAd><HV>.S[<Ws>, <offs1>:<offs2>], { <Zn1>.S-<Zn2>.S }@MOVA     <ZAd><HV>.S[<Ws>, <offs1>:<offs2>], { <Zn1>.S-<Zn2>.S };MOV <ZAd><HV>.D[<Ws>, <offs1>:<offs2>], { <Zn1>.D-<Zn2>.D }@MOVA     <ZAd><HV>.D[<Ws>, <offs1>:<offs2>], { <Zn1>.D-<Zn2>.D }$MOV (vector to tile, four registers)�The instruction operates on four consecutive horizontal or vertical slices within a named ZA tile of the specified element size.>MOV     ZA0 <HV>.B[<Ws>, <offs1>:<offs4>], { <Zn1>.B-<Zn4>.B }>MOVA     ZA0<HV>.B[<Ws>, <offs1>:<offs4>], { <Zn1>.B-<Zn4>.B };MOV <ZAd><HV>.H[<Ws>, <offs1>:<offs4>], { <Zn1>.H-<Zn4>.H }@MOVA     <ZAd><HV>.H[<Ws>, <offs1>:<offs4>], { <Zn1>.H-<Zn4>.H };MOV <ZAd><HV>.S[<Ws>, <offs1>:<offs4>], { <Zn1>.S-<Zn4>.S }@MOVA     <ZAd><HV>.S[<Ws>, <offs1>:<offs4>], { <Zn1>.S-<Zn4>.S };MOV <ZAd><HV>.D[<Ws>, <offs1>:<offs4>], { <Zn1>.D-<Zn4>.D }@MOVA     <ZAd><HV>.D[<Ws>, <offs1>:<offs4>], { <Zn1>.D-<Zn4>.D }$MOV (vector to array, two registers)8The instruction operates on two ZA single-vector groups.8MOV     ZA.D[ <Wv>, <offs>{, VGx2}], { <Zn1>.D-<Zn2>.D }8MOVA     ZA.D[<Wv>, <offs>{, 
VGx2}], { <Zn1>.D-<Zn2>.D }%MOV (vector to array, four registers)9The instruction operates on four ZA single-vector groups.8MOV     ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.D-<Zn4>.D }8MOVA     ZA.D[<Wv>, <offs>{, VGx4}], { <Zn1>.D-<Zn4>.D }MOV (vector to tile, single)�zThe instruction operates on individual horizontal or vertical slices within a named ZA tile of the specified element size. The slice number within the tile is selected by the sum of the slice index register and immediate offset, modulo the number of such elements in a vector. The immediate offset is in the range 0 to the number of elements in a 128-bit vector segment minus 1.
0MOV     ZA0 <HV>.B[<Ws>, <offs>], <Pg>/M, <Zn>.B0MOVA     ZA0<HV>.B[<Ws>, <offs>], <Pg>/M, <Zn>.B-MOV <ZAd><HV>.H[<Ws>, <offs>], <Pg>/M, <Zn>.H2MOVA     <ZAd><HV>.H[<Ws>, <offs>], <Pg>/M, <Zn>.H-MOV <ZAd><HV>.S[<Ws>, <offs>], <Pg>/M, <Zn>.S2MOVA     <ZAd><HV>.S[<Ws>, <offs>], <Pg>/M, <Zn>.S-MOV <ZAd><HV>.D[<Ws>, <offs>], <Pg>/M, <Zn>.D2MOVA     <ZAd><HV>.D[<Ws>, <offs>], <Pg>/M, <Zn>.D-MOV <ZAd><HV>.Q[<Ws>, <offs>], <Pg>/M, <Zn>.Q2MOVA     <ZAd><HV>.Q[<Ws>, <offs>], <Pg>/M, <Zn>.Q$MOV (inverted wide immediate) -- A64Move (inverted wide immediate)MOV <Wd>, #<imm>$MOVN   <Wd>, #<imm16>, LSL  #<shift>MOV <Xd>, #<imm>$MOVN   <Xd>, #<imm16>, LSL  #<shift>MOV (wide immediate) -- A64Move (wide immediate)MOV <Wd>, #<imm>$MOVZ   <Wd>, #<imm16>, LSL  #<shift>MOV <Xd>, #<imm>$MOVZ   <Xd>, #<imm16>, LSL  #<shift>MOV (vector) -- A64Move vectorMOV <Vd>.<T>, <Vn>.<T>"ORR   <Vd>.<T>, <Vn>.<T>, <Vn>.<T>MOV (bitmask immediate) -- A64Move (bitmask immediate)MOV <Wd|WSP>, #<imm>ORR   <Wd|WSP>, WZR, #<imm>MOV <Xd|SP>, #<imm>ORR   <Xd|SP>, XZR, #<imm>MOV (register) -- A64Move (register)MOV <Wd>, <Wm>ORR   <Wd>, WZR, <Wm>MOV <Xd>, <Xm>ORR   <Xd>, XZR, <Xm>MOV�Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated. Does not set the condition flags.MOV <Pd>.B, <Pn>.B#ORR  <Pd>.B, <Pn>/Z, <Pn>.B, <Pn>.BMOV (vector, unpredicated)7Move vector register. This instruction is unpredicated.MOV <Zd>.D, <Zn>.DORR  <Zd>.D, <Zn>.D, <Zn>.D$MOV (predicate, predicated, merging)�Read active elements from the source predicate and place in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register remain unmodified. Does not set the condition flags.MOV <Pd>.B, <Pg>/M, <Pn>.B!SEL  <Pd>.B, <Pg>, <Pn>.B, <Pd>.BMOV (vector, predicated)�Move elements from the source vector to the corresponding elements of the destination vector. 
Inactive elements in the destination vector register remain unmodified.MOV <Zd>.<T>, <Pv>/M, <Zn>.<T>'SEL  <Zd>.<T>, <Pv>, <Zn>.<T>, <Zd>.<T>MOV (to general) -- A64/Move vector element to general-purpose registerMOV <Wd>, <Vn>.S[<index>]UMOV   <Wd>, <Vn>.S[<index>]MOV <Xd>, <Vn>.D[<index>]UMOV   <Xd>, <Vn>.D[<index>]addgAdd with tag)ADDG <Xd|SP>, <Xn|SP>, #<uimm6>, #<uimm4>ld3q�5Contiguous load three-quadword structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,GLD3Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]CLD3Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #4]ldumin7Atomic unsigned minimum on word or doubleword in memoryLDUMIN <Ws>, <Wt>, [<Xn|SP>]LDUMINA <Ws>, <Wt>, [<Xn|SP>]LDUMINAL <Ws>, <Wt>, [<Xn|SP>]LDUMINL <Ws>, <Wt>, [<Xn|SP>]LDUMIN <Xs>, <Xt>, [<Xn|SP>]LDUMINA <Xs>, <Xt>, [<Xn|SP>]LDUMINAL <Xs>, <Xt>, [<Xn|SP>]LDUMINL <Xs>, <Xt>, [<Xn|SP>]2STUMIN <Ws>, [<Xn|SP>]LDUMIN  <Ws>, WZR, [<Xn|SP>]4STUMINL <Ws>, [<Xn|SP>]LDUMINL  <Ws>, WZR, [<Xn|SP>]2STUMIN <Xs>, [<Xn|SP>]LDUMIN  <Xs>, XZR, [<Xn|SP>]4STUMINL <Xs>, [<Xn|SP>]LDUMINL  <Xs>, XZR, [<Xn|SP>]	sha256su1SHA256 schedule update 1#SHA256SU1 <Vd>.4S, <Vn>.4S, <Vm>.4SngcsNGCS -- A64 Negate with carry, setting flagsNGCS <Wd>, <Wm>SBCS   <Wd>, WZR, <Wm>NGCS <Xd>, <Xm>SBCS   <Xd>, XZR, <Xm>ld3h�5Contiguous load three-halfword structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,GLD3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]CLD3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]smsublSigned 
multiply-subtract longSMSUBL <Xd>, <Wn>, <Wm>, <Xa>frintp?Floating-point round to integral, toward plus infinity (vector)FRINTP <Vd>.<T>, <Vn>.<T>FRINTP <Vd>.<T>, <Vn>.<T>FRINTP <Hd>, <Hn>FRINTP <Sd>, <Sn>FRINTP <Dd>, <Dn>/FRINTP { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }/FRINTP { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }pmullPolynomial multiply long*PMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>uzpq1�Concatenate adjacent even-numbered elements from the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. This instruction is unpredicated."UZPQ1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>mad�&Multiply the corresponding active elements of the first and second source vectors and add to elements of the third (addend) vector. Destructively place the results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.)MAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>bics3Bitwise bit clear (shifted register), setting flags*BICS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}*BICS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}#BICS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bld2rKLoad single 2-element structure and replicate to all lanes of two registers(LD2R  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]/LD2R  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>.LD2R  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>setpnMemory set, non-temporalSETPN  [ <Xd>]!, <Xn>!, <Xs>SETMN  [ <Xd>]!, <Xn>!, <Xs>SETEN  [ <Xd>]!, <Xn>!, <Xs>adrForm PC-relative addressADR <Xd>, <label>4ADR <Zd>.<T>, [<Zn>.<T>, <Zm>.<T>{, <mod> <amount>}]-ADR <Zd>.D, [<Zn>.D, <Zm>.D, SXTW{ <amount>}]-ADR <Zd>.D, [<Zn>.D, <Zm>.D, UXTW{ <amount>}]sqdmlslb�Multiply then double the corresponding even-numbered signed elements of the first and second source vectors. 
Each intermediate value is saturated to the double-width N-bit value's signed integer range -2(SQDMLSLB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>'SQDMLSLB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]'SQDMLSLB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]cmge-Compare signed greater than or equal (vector)CMGE  D <d>, D<n>, D<m>!CMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>CMGE  D <d>, D<n>, #0CMGE <Vd>.<T>, <Vn>.<T>, #0ldnt1d�!Contiguous load non-temporal of doublewords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.@LDNT1D { <Zt1>.D-<Zt2>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]@LDNT1D { <Zt1>.D-<Zt4>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]<LDNT1D { <Zt1>.D-<Zt2>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]<LDNT1D { <Zt1>.D-<Zt4>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]ALDNT1D { <Zt1>.D, <Zt2>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]SLDNT1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]=LDNT1D { <Zt1>.D, <Zt2>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]OLDNT1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]+LDNT1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]6LDNT1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]2LDNT1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]eorv�Bitwise exclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&amp;FP scalar destination register. 
Inactive elements in the source vector are treated as zero.EORV <V><d>, <Pg>, <Zn>.<T>sclamp�dClamp each signed element in the two or four destination vectors to between the signed minimum value in the corresponding element of the first source vector and the signed maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors.2SCLAMP { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<T>, <Zm>.<T>2SCLAMP { <Zd1>.<T>-<Zd4>.<T> }, <Zn>.<T>, <Zm>.<T>#SCLAMP <Zd>.<T>, <Zn>.<T>, <Zm>.<T>shrnShift right narrow (immediate)(SHRN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>cpyprtwn4Memory copy, reads unprivileged, writes non-temporal"CPYPRTWN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYMRTWN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYERTWN  [ <Xd>]!, [<Xs>]!, <Xn>!cash#Compare and swap halfword in memory CASH <Ws>, <Wt>, [<Xn|SP>{, #0}]!CASAH <Ws>, <Wt>, [<Xn|SP>{, #0}]"CASALH <Ws>, <Wt>, [<Xn|SP>{, #0}]!CASLH <Ws>, <Wt>, [<Xn|SP>{, #0}]sqdmlslt�Multiply then double the corresponding odd-numbered signed elements of the first and second source vectors. 
Each intermediate value is saturated to the double-width N-bit value's signed integer range -2(SQDMLSLT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>'SQDMLSLT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]'SQDMLSLT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]ld1h�Contiguous load of unsigned halfwords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>LD1H { <Zt1>.H-<Zt2>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]>LD1H { <Zt1>.H-<Zt4>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD1H { <Zt1>.H-<Zt2>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]:LD1H { <Zt1>.H-<Zt4>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]?LD1H { <Zt1>.H, <Zt2>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]QLD1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}];LD1H { <Zt1>.H, <Zt2>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]MLD1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]+LD1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]+LD1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]4LD1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]4LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]4LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]0LD1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]0LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]0LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]4LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]4LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]1LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]1LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]2LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]*LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]ELD1H { <ZAt><HV>.H[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]sli!Shift left and insert (immediate)SLI  D <d>, D<n>, #<shift> SLI <Vd>.<T>, <Vn>.<T>, #<shift> SLI <Zd>.<T>, <Zn>.<T>, #<const>fccmp1Floating-point conditional quiet compare (scalar)!FCCMP <Hn>, <Hm>, 
#<nzcv>, <cond>!FCCMP <Sn>, <Sm>, #<nzcv>, <cond>!FCCMP <Dn>, <Dm>, #<nzcv>, <cond>shll!Shift left long (by element size)(SHLL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, #<shift>insr�Shift the destination vector left by one element, and then place a copy of the least-significant bits of the general-purpose register in element 0 of the destination vector. This instruction is unpredicated.INSR <Zdn>.<T>, <R><m>INSR <Zdn>.<T>, <V><m>madpt�Multiply with overflow check the elements of the first and second source vectors and add with pointer check to elements of the third (addend) vector. Destructively place the results in the destination and first source (multiplicand) vector.MADPT <Zdn>.D, <Zm>.D, <Za>.Dasr�ZShift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. Inactive elements in the destination vector register remain unmodified.*ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>(ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D*ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T> ASR <Zd>.<T>, <Zn>.<T>, #<const>ASR <Zd>.<T>, <Zn>.<T>, <Zm>.DASR (register) -- A64!Arithmetic shift right (register)ASR <Wd>, <Wn>, <Wm>ASRV   <Wd>, <Wn>, <Wm>ASR <Xd>, <Xn>, <Xm>ASRV   <Xd>, <Xn>, <Xm>ASR (immediate) -- A64"Arithmetic shift right (immediate)ASR <Wd>, <Wn>, #<shift> SBFM   <Wd>, <Wn>, #<shift>, #31ASR <Xd>, <Xn>, #<shift> SBFM   <Xd>, <Xn>, #<shift>, #63facleFACLE��Compare active absolute values of floating-point elements in the first source vector being less than or equal to corresponding absolute values of elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Does not set the condition flags.*FACLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-FACGE    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>whilelt�Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, signed scalar operand is less than the second scalar operand and false thereafter up to the highest numbered element. WHILELT <Pd>.<T>, <R><n>, <R><m>#WHILELT <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILELT { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>yieldYieldYIELD andBitwise AND (vector)	 AND <Vd>.<T>, <Vn>.<T>, <Vm>.<T>AND <Wd|WSP>, <Wn>, #<imm>AND <Xd|SP>, <Xn>, #<imm>)AND <Wd>, <Wn>, <Wm>{, <shift> #<amount>})AND <Xd>, <Xn>, <Xm>{, <shift> #<amount>}"AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B*AND <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>"AND <Zdn>.<T>, <Zdn>.<T>, #<const>AND <Zd>.D, <Zn>.D, <Zm>.DstlrbStore-release register byteSTLRB <Wt>, [<Xn|SP>{, #0}]ushll$Unsigned shift left long (immediate))USHLL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, #<shift>bfmlsl��This BFloat16 floating-point multiply-subtract long instruction widens all 16-bit BFloat16 elements in the one, two, or four first source vectors and the indexed element of the second source vector to single-precision format, then multiplies the corresponding elements and destructively subtracts these values without intermediate rounding from the overlapping 32-bit single-precision elements of the ZA double-vector groups.=BFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4BFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HIBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HIBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { 
<Zm1>.H-<Zm4>.H }pext�$Converts the source predicate-as-counter into a four register wide predicate-as-mask, and copies the portion of the mask value selected by the portion index to the destination predicate register. A portion corresponds to a one predicate register fraction of the wider predicate-as-mask value.PEXT <Pd>.<T>, <PNn>[<imm>]+PEXT { <Pd1>.<T>, <Pd2>.<T> }, <PNn>[<imm>]eorbt�?Interleaving exclusive OR between the even-numbered elements of the first source vector register and the odd-numbered elements of the second source vector register, placing the result in the even-numbered elements of the destination vector, leaving the odd-numbered elements unchanged. This instruction is unpredicated."EORBT <Zd>.<T>, <Zn>.<T>, <Zm>.<T>smlsll��This signed integer multiply-subtract long-long instruction multiplies each signed 8-bit or 16-bit element in the one, two, or four first source vectors with each signed 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively subtracts these values from the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups.=SMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]=SMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>], <Zn>.H, <Zm>.H[<index>]RSMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RSMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RSMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]RSMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]<SMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>], <Zn>.<Tb>, <Zm>.<Tb>TSMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>TSMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>dSMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }dSMLSLL  ZA. 
<T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }svdot��The signed integer vertical dot product instruction computes the vertical dot product of the corresponding two signed 16-bit integer values held in the two first source vectors and two signed 16-bit integer values in the corresponding indexed 32-bit element of the second source vector. The widened dot product results are destructively added to the corresponding 32-bit element of the ZA single-vector groups.ISVDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]ISVDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]ISVDOT   ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]st64bv3Single-copy atomic 64-byte store with status resultST64BV <Xs>, <Xt>, [<Xn|SP>]	sm3partw1	SM3PARTW1#SM3PARTW1 <Vd>.4S, <Vn>.4S, <Vm>.4SbBranch conditionallyB. <cond>  <label>	B <label>brkpbs�eIf the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the %BRKPBS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bbrkpa�yIf the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to and including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. 
Does not set the condition flags.$BRKPA <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.BcsnegConditional select negationCSNEG <Wd>, <Wn>, <Wm>, <cond>CSNEG <Xd>, <Xn>, <Xm>, <cond>uqincw�*Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.'UQINCW <Wdn>{, <pattern>{, MUL #<imm>}}'UQINCW <Xdn>{, <pattern>{, MUL #<imm>}})UQINCW <Zdn>.S{, <pattern>{, MUL #<imm>}}sqrdmulhMSigned saturating rounding doubling multiply returning high half (by element)+SQRDMULH <V><d>, <V><n>, <Vm>.<Ts>[<index>]/SQRDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]SQRDMULH <V><d>, <V><n>, <V><m>%SQRDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>%SQRDMULH <Zd>.<T>, <Zn>.<T>, <Zm>.<T>&SQRDMULH <Zd>.H, <Zn>.H, <Zm>.H[<imm>]&SQRDMULH <Zd>.S, <Zn>.S, <Zm>.S[<imm>]&SQRDMULH <Zd>.D, <Zn>.D, <Zm>.D[<imm>]drpsDebug restore PE stateDRPS sqxtn Signed saturating extract narrowSQXTN <Vb><d>, <Va><n>SQXTN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>cpyfprtnKMemory copy forward-only, reads unprivileged, reads and writes non-temporal"CPYFPRTN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFMRTN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFERTN  [ <Xd>]!, [<Xs>]!, <Xn>!sqrshru�
Shift right by an immediate value, the signed integer value in each element of the two source vectors and place the rounded results in the half-width destination elements. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2-SQRSHRU <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>5SQRSHRU <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>xtnExtract narrowXTN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>uqxtn"Unsigned saturating extract narrowUQXTN <Vb><d>, <Va><n>UQXTN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>cpyfpt7Memory copy forward-only, reads and writes unprivileged CPYFPT  [ <Xd>]!, [<Xs>]!, <Xn>! CPYFMT  [ <Xd>]!, [<Xs>]!, <Xn>! CPYFET  [ <Xd>]!, [<Xs>]!, <Xn>!ldclrp&Atomic bit clear on quadword in memoryLDCLRP <Xt1>, <Xt2>, [<Xn|SP>]LDCLRPA <Xt1>, <Xt2>, [<Xn|SP>] LDCLRPAL <Xt1>, <Xt2>, [<Xn|SP>]LDCLRPL <Xt1>, <Xt2>, [<Xn|SP>]whilehs�Generate a predicate that starting from the highest numbered element is true while the decrementing value of the first, unsigned scalar operand is higher or same as the second scalar operand and false thereafter down to the lowest numbered element. 
WHILEHS <Pd>.<T>, <R><n>, <R><m>#WHILEHS <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILEHS { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>tlbiTLBI -- A64TLB invalidate operationTLBI <tlbi_op>{, <Xt>}(SYS   #<op1>, <Cn>, <Cm>, #<op2>{, <Xt>}ldp"Load pair of SIMD&amp;FP registers#LDP <St1>, <St2>, [<Xn|SP>], #<imm>#LDP <Dt1>, <Dt2>, [<Xn|SP>], #<imm>#LDP <Qt1>, <Qt2>, [<Xn|SP>], #<imm>$LDP <St1>, <St2>, [<Xn|SP>, #<imm>]!$LDP <Dt1>, <Dt2>, [<Xn|SP>, #<imm>]!$LDP <Qt1>, <Qt2>, [<Xn|SP>, #<imm>]!%LDP <St1>, <St2>, [<Xn|SP>{, #<imm>}]%LDP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]%LDP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]#LDP <Wt1>, <Wt2>, [<Xn|SP>], #<imm>#LDP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>$LDP <Wt1>, <Wt2>, [<Xn|SP>, #<imm>]!$LDP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!%LDP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]%LDP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]sxtlSXTL, SXTL2 -- A64Signed extend longSXTL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>#SSHLL {2}  <Vd>.<Ta>, <Vn>.<Tb>, #0orns�+Bitwise inclusive OR inverted active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the #ORNS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.BxarExclusive-OR and rotate&XAR <Vd>.2D, <Vn>.2D, <Vm>.2D, #<imm6>,XAR <Zdn>.<T>, <Zdn>.<T>, <Zm>.<T>, #<const>rshrnt�]Shift each unsigned integer value in the source vector elements right by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. 
This instruction is unpredicated.$RSHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>uhaddUnsigned halving add"UHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UHADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>swp!Swap word or doubleword in memorySWP <Ws>, <Wt>, [<Xn|SP>]SWPA <Ws>, <Wt>, [<Xn|SP>]SWPAL <Ws>, <Wt>, [<Xn|SP>]SWPL <Ws>, <Wt>, [<Xn|SP>]SWP <Xs>, <Xt>, [<Xn|SP>]SWPA <Xs>, <Xt>, [<Xn|SP>]SWPAL <Xs>, <Xt>, [<Xn|SP>]SWPL <Xs>, <Xt>, [<Xn|SP>]fcaddFloating-point complex add-FCADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<rotate>5FCADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>, <const>urshlr��Shift active unsigned elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Inactive elements in the destination vector register remain unmodified.-URSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>sqdmlalb�Multiply then double the corresponding even-numbered signed elements of the first and second source vectors. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2(SQDMLALB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>'SQDMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]'SQDMLALB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]uqcvt�Saturate the unsigned integer value in each element of the two source vectors to half the original source element width, and place the results in the half-width destination elements.!UQCVT <Zd>.H, { <Zn1>.S-<Zn2>.S })UQCVT <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }sabdlSigned absolute difference long*SABDL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>sqshrnt�8Shift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. 
Each result element is saturated to the half-width N-bit element's signed integer range -2%SQSHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>setgpMemory set with tag settingSETGP  [ <Xd>]!, <Xn>!, <Xs>SETGM  [ <Xd>]!, <Xn>!, <Xs>SETGE  [ <Xd>]!, <Xn>!, <Xs>frintz6Floating-point round to integral, toward zero (vector)FRINTZ <Vd>.<T>, <Vn>.<T>FRINTZ <Vd>.<T>, <Vn>.<T>FRINTZ <Hd>, <Hn>FRINTZ <Sd>, <Sn>FRINTZ <Dd>, <Dn>ssublSigned subtract long*SSUBL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>uqdecb�)Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.'UQDECB <Wdn>{, <pattern>{, MUL #<imm>}}'UQDECB <Xdn>{, <pattern>{, MUL #<imm>}}crc32cbCRC32C checksumCRC32CB <Wd>, <Wn>, <Wm>CRC32CH <Wd>, <Wn>, <Wm>CRC32CW <Wd>, <Wn>, <Wm>CRC32CX <Wd>, <Wn>, <Xm>uqxtnt�Saturate the unsigned integer value in each source element to half the original source element width, and place the results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged.UQXTNT <Zd>.<T>, <Zn>.<Tb>bfc
BFC -- A64Bitfield clearBFC <Wd>, #<lsb>, #<width>1BFM   <Wd>, WZR, #(-<lsb>  MOD  32), #(<width>-1)BFC <Xd>, #<lsb>, #<width>1BFM   <Xd>, XZR, #(-<lsb>  MOD  64), #(<width>-1)cmpleCMPLE (vectors)�WCompare active signed integer elements in the first source vector being less than or equal to corresponding signed elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the *CMPLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-CMPGE    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>addhnb�.Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant half of the result in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. This instruction is unpredicated.%ADDHNB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>compact�Read the active elements from the source vector and pack them into the lowest-numbered elements of the destination vector. Then set any remaining elements of the destination vector to zero. COMPACT <Zd>.<T>, <Pg>, <Zn>.<T>faddqv�(Floating-point addition of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as +0.0. FADDQV <Vd>.<T>, <Pg>, <Zn>.<Tb>uqshrn2Unsigned saturating shift right narrow (immediate)!UQSHRN <Vb><d>, <Va><n>, #<shift>*UQSHRN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>absAbsolute valueABS <Wd>, <Wn>ABS <Xd>, <Xn>ABS  D <d>, D<n>ABS <Vd>.<T>, <Vn>.<T>ABS <Zd>.<T>, <Pg>/M, <Zn>.<T>bfmla�NMultiply the corresponding active BFloat16 elements of the first and second source vectors and add to elements of the third source (addend) vector without intermediate rounding. 
Destructively place the results in the destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.%BFMLA <Zda>.H, <Pg>/M, <Zn>.H, <Zm>.H$BFMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>]IBFMLA   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IBFMLA   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]@BFMLA   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H@BFMLA   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HMBFMLA   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }MBFMLA   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }sdot2Dot product signed arithmetic (vector, by element)+SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]$SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>SDOT <Zda>.S, <Zn>.H, <Zm>.H#SDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]$SDOT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>#SDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]#SDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>]ISDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]ISDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]@SDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H@SDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HMSDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }MSDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }ISDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]ISDOT    ZA.D[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]ISDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]ISDOT    ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]KSDOT    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>KSDOT    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>[SDOT    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }[SDOT    ZA. 
<T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }aeseAES single round encryptionAESE <Vd>.16B, <Vn>.16BAESE <Zdn>.B, <Zdn>.B, <Zm>.Bcls Count leading sign bits (vector)CLS <Vd>.<T>, <Vn>.<T>CLS <Wd>, <Wn>CLS <Xd>, <Xn>CLS <Zd>.<T>, <Pg>/M, <Zn>.<T>fcvtzsKFloating-point convert to signed fixed-point, rounding toward zero (vector)FCVTZS <V><d>, <V><n>, #<fbits>#FCVTZS <Vd>.<T>, <Vn>.<T>, #<fbits>FCVTZS <Hd>, <Hn>FCVTZS <V><d>, <V><n>FCVTZS <Vd>.<T>, <Vn>.<T>FCVTZS <Vd>.<T>, <Vn>.<T>FCVTZS <Wd>, <Hn>, #<fbits>FCVTZS <Xd>, <Hn>, #<fbits>FCVTZS <Wd>, <Sn>, #<fbits>FCVTZS <Xd>, <Sn>, #<fbits>FCVTZS <Wd>, <Dn>, #<fbits>FCVTZS <Xd>, <Dn>, #<fbits>FCVTZS <Wd>, <Hn>FCVTZS <Xd>, <Hn>FCVTZS <Wd>, <Sn>FCVTZS <Xd>, <Sn>FCVTZS <Wd>, <Dn>FCVTZS <Xd>, <Dn>/FCVTZS { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }/FCVTZS { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }FCVTZS <Zd>.H, <Pg>/M, <Zn>.HFCVTZS <Zd>.S, <Pg>/M, <Zn>.HFCVTZS <Zd>.D, <Pg>/M, <Zn>.HFCVTZS <Zd>.S, <Pg>/M, <Zn>.SFCVTZS <Zd>.D, <Pg>/M, <Zn>.SFCVTZS <Zd>.S, <Pg>/M, <Zn>.DFCVTZS <Zd>.D, <Pg>/M, <Zn>.Dnbsl�CSelects bits from the first source vector where the corresponding bit in the third source vector is '1', and from the second source vector where the corresponding bit in the third source vector is '0'. The inverted result is placed destructively in the destination and first source vector. This instruction is unpredicated.%NBSL <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.Dretaa3Return from subroutine, with pointer authenticationRETAA RETAB rsubhnt�9Subtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the most significant rounded half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. 
This instruction is unpredicated.&RSUBHNT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>eretException returnERET ld1b�Contiguous load of unsigned bytes to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>LD1B { <Zt1>.B-<Zt2>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]>LD1B { <Zt1>.B-<Zt4>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]2LD1B { <Zt1>.B-<Zt2>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]2LD1B { <Zt1>.B-<Zt4>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]?LD1B { <Zt1>.B, <Zt2>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]QLD1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]3LD1B { <Zt1>.B, <Zt2>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]ELD1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]+LD1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]+LD1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]4LD1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]4LD1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]4LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]4LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}](LD1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>](LD1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>](LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>](LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>]1LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]1LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]*LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D];LD1B { ZA0<HV>.B[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>}]ssubwb�
Subtract the even-numbered signed elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.
SSUBWB <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>

bfmops
The BFloat16 floating-point sum of outer products and subtract instruction works with a 32-bit element ZA tile.
BFMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H
BFMOPS <ZAda>.H, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H

bmopa
This instruction works with 32-bit element ZA tile. This instruction generates an outer product of the first source SVL
BMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S, <Zm>.S

ic
IC -- A64
Instruction cache operation
IC <ic_op>{, <Xt>}
SYS #<op1>, C7, <Cm>, #<op2>{, <Xt>}

uzp
Concatenate every fourth element from each of the four source vectors and place them in the corresponding elements of the four destination vectors.
UZP { <Zd1>.<T>-<Zd4>.<T> }, { <Zn1>.<T>-<Zn4>.<T> }
UZP { <Zd1>.Q-<Zd4>.Q }, { <Zn1>.Q-<Zn4>.Q }
UZP { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<T>, <Zm>.<T>
UZP { <Zd1>.Q-<Zd2>.Q }, <Zn>.Q, <Zm>.Q

fmin
Floating-point minimum (vector)
FMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FMIN <Hd>, <Hn>, <Hm>
FMIN <Sd>, <Sn>, <Sm>
FMIN <Dd>, <Dn>, <Dm>
FMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>
FMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>
FMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }
FMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }
FMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>
FMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

ftmad
The
FTMAD <Zdn>.<T>, <Zdn>.<T>, <Zm>.<T>, #<imm>

uqshrnb
Shift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.
Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2%UQSHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>st4b�0Contiguous store four-byte structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,NST4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]BST4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>, <Xm>]cmnCMN (extended register) -- A64$Compare negative (extended register)*CMN <Wn|WSP>, <Wm>{, <extend> {#<amount>}}2ADDS   WZR, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}+CMN <Xn|SP>, <R><m>{, <extend> {#<amount>}}3ADDS   XZR, <Xn|SP>, <R><m>{, <extend> {#<amount>}}CMN (immediate) -- A64Compare negative (immediate)CMN <Wn|WSP>, #<imm>{, <shift>}'ADDS   WZR, <Wn|WSP>, #<imm>{, <shift>}CMN <Xn|SP>, #<imm>{, <shift>}&ADDS   XZR, <Xn|SP>, #<imm>{, <shift>}CMN (shifted register) -- A64#Compare negative (shifted register)#CMN <Wn>, <Wm>{, <shift> #<amount>}+ADDS   WZR, <Wn>, <Wm>{, <shift> #<amount>}#CMN <Xn>, <Xm>{, <shift> #<amount>}+ADDS   XZR, <Xn>, <Xm>{, <shift> #<amount>}st3b�2Contiguous store three-byte structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,EST3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]9ST3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>, <Xm>]sqxtnb�Saturate the signed integer value in each source element to half the original source element width, and place the results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.SQXTNB <Zd>.<T>, <Zn>.<Tb>umlsll��This unsigned integer 
multiply-subtract long-long instruction multiplies each unsigned 8-bit or 16-bit element in the one, two, or four first source vectors with each unsigned 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively subtracts these values from the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups.=UMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]=UMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>], <Zn>.H, <Zm>.H[<index>]RUMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RUMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RUMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]RUMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]<UMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>], <Zn>.<Tb>, <Zm>.<Tb>TUMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>TUMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>dUMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }dUMLSLL  ZA. 
<T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }sqrdmlahXSigned saturating rounding doubling multiply accumulate returning high half (by element)+SQRDMLAH <V><d>, <V><n>, <Vm>.<Ts>[<index>]/SQRDMLAH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]SQRDMLAH <V><d>, <V><n>, <V><m>%SQRDMLAH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>&SQRDMLAH <Zda>.<T>, <Zn>.<T>, <Zm>.<T>'SQRDMLAH <Zda>.H, <Zn>.H, <Zm>.H[<imm>]'SQRDMLAH <Zda>.S, <Zn>.S, <Zm>.S[<imm>]'SQRDMLAH <Zda>.D, <Zn>.D, <Zm>.D[<imm>]cpyptwn?Memory copy, reads and writes unprivileged, writes non-temporal!CPYPTWN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYMTWN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYETWN  [ <Xd>]!, [<Xs>]!, <Xn>!fmadd*Floating-point fused multiply-add (scalar)FMADD <Hd>, <Hn>, <Hm>, <Ha>FMADD <Sd>, <Sn>, <Sm>, <Sa>FMADD <Dd>, <Dn>, <Dm>, <Da>dghData gathering hintDGH pssbbPSSBB -- A64)Physical speculative store bypass barrierPSSBB DSB   #4cpypwtwn1Memory copy, writes unprivileged and non-temporal"CPYPWTWN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYMWTWN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYEWTWN  [ <Xd>]!, [<Xs>]!, <Xn>!stnp;Store pair of SIMD&amp;FP registers, with non-temporal hint&STNP <St1>, <St2>, [<Xn|SP>{, #<imm>}]&STNP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]&STNP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]&STNP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]&STNP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]cntp�Counts the number of active and true elements in the source predicate and places the scalar result in the destination general-purpose register. 
Inactive predicate elements are not counted.CNTP <Xd>, <Pg>, <Pn>.<T>CNTP <Xd>, <PNn>.<T>, <vl>sminSigned minimum (vector)!SMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>SMIN <Wd>, <Wn>, #<simm>SMIN <Xd>, <Xn>, #<simm>CSMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>CSMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>RSMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }RSMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }SMIN <Wd>, <Wn>, <Wm>SMIN <Xd>, <Xn>, <Xm>+SMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!SMIN <Zdn>.<T>, <Zdn>.<T>, #<imm>pmullb�Polynomial multiply over [0, 1] the corresponding even-numbered elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%PMULLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>PMULLB <Zd>.Q, <Zn>.D, <Zm>.DumaxpUnsigned maximum pairwise"UMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UMAXP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>bfclamp�jClamp each BFloat16 element in the two or four destination vectors to between the BFloat16 minimum value in the corresponding element of the first source vector and the BFloat16 maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors.+BFCLAMP { <Zd1>.H-<Zd2>.H }, <Zn>.H, <Zm>.H+BFCLAMP { <Zd1>.H-<Zd4>.H }, <Zn>.H, <Zm>.HBFCLAMP <Zd>.H, <Zn>.H, <Zm>.Hsaddwt�Add the odd-numbered signed elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. 
This instruction is unpredicated.$SADDWT <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>st47Store multiple 4-element structures from four registers=ST4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]DST4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>CST4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>>ST4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>]>ST4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>]>ST4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>]>ST4  { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>]BST4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], #4DST4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], <Xm>BST4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], #8DST4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], <Xm>CST4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], #16DST4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], <Xm>CST4  { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], #32DST4  { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], <Xm>frintxLFloating-point round to integral exact, using current rounding mode (vector)FRINTX <Vd>.<T>, <Vn>.<T>FRINTX <Vd>.<T>, <Vn>.<T>FRINTX <Hd>, <Hn>FRINTX <Sd>, <Sn>FRINTX <Dd>, <Dn>ldumaxb9Atomic unsigned maximum on byte in memory, without return4STUMAXB <Ws>, [<Xn|SP>]LDUMAXB  <Ws>, WZR, [<Xn|SP>]6STUMAXLB <Ws>, [<Xn|SP>]LDUMAXLB  <Ws>, WZR, [<Xn|SP>]udot4Dot product unsigned arithmetic (vector, by element)+UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]$UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>UDOT <Zda>.S, <Zn>.H, <Zm>.H#UDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]$UDOT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>#UDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]#UDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>]IUDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IUDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]@UDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H 
}, <Zm>.H@UDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HMUDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }MUDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }IUDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]IUDOT    ZA.D[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IUDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]IUDOT    ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]KUDOT    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>KUDOT    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>[UDOT    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }[UDOT    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }trn2Transpose vectors (secondary)!TRN2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>sqdmullb�Multiply the corresponding even-numbered signed elements of the first and second source vectors, double and place the results in the overlapping double-width elements of the destination vector. Each result element is saturated to the double-width N-bit element's signed integer range -2'SQDMULLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>&SQDMULLB <Zd>.S, <Zn>.H, <Zm>.H[<imm>]&SQDMULLB <Zd>.D, <Zn>.S, <Zm>.S[<imm>]cpyptn;Memory copy, reads and writes unprivileged and non-temporal CPYPTN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYMTN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYETN  [ <Xd>]!, [<Xs>]!, <Xn>!bfi
BFI -- A64
Bitfield insert
BFI <Wd>, <Wn>, #<lsb>, #<width>
BFM <Wd>, <Wn>, #(-<lsb> MOD 32), #(<width>-1)
BFI <Xd>, <Xn>, #<lsb>, #<width>
BFM <Xd>, <Xn>, #(-<lsb> MOD 64), #(<width>-1)

dc
DC -- A64
Data cache operation
DC <dc_op>, <Xt>
SYS #<op1>, C7, <Cm>, #<op2>, <Xt>

cfinv
Invert carry flag
CFINV

sbclt
Subtract the odd-numbered elements of the first source vector and the inverted 1-bit carry from the least-significant bit of the odd-numbered elements of the second source vector from the even-numbered elements of the destination and accumulator vector. The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.
SBCLT <Zda>.<T>, <Zn>.<T>, <Zm>.<T>

adclt
Add the odd-numbered elements of the first source vector and the 1-bit carry from the least-significant bit of the odd-numbered elements of the second source vector to the even-numbered elements of the destination and accumulator vector. The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.
ADCLT <Zda>.<T>, <Zn>.<T>, <Zm>.<T>

uqsubr
Subtract active unsigned elements of the first source vector from corresponding unsigned elements of the second source vector and destructively place the results in the corresponding elements of the first source vector.
Each result element is saturated to the N-bit element's unsigned integer range 0 to (2
UQSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

ld1rqh
Load eight contiguous halfwords to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address.
LD1RQH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]
LD1RQH { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

dup
Duplicate vector element to vector or scalar
DUP <V><d>, <Vn>.<T>[<index>]
DUP <Vd>.<T>, <Vn>.<Ts>[<index>]
DUP <Vd>.<T>, <R><n>
DUP <Zd>.<T>, #<imm>{, <shift>}
DUP <Zd>.<T>, <R><n|SP>
DUP <Zd>.<T>, <Zn>.<T>[<imm>]

dvp
DVP -- A64
Data value prediction restriction by context
DVP RCTX, <Xt>
SYS #3, C7, C3, #5, <Xt>

fminnmp
Floating-point minimum number of pair of elements (scalar)
FMINNMP H<d>, <Vn>.2H
FMINNMP <V><d>, <Vn>.<T>
FMINNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FMINNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FMINNMP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

fmmla
The floating-point matrix multiply-accumulate instruction supports single-precision and double-precision data types in a 2×2 matrix contained in segments of 128 or 256 bits, respectively. It multiplies the 2×2 matrix in each segment of the first source vector by the 2×2 matrix in the corresponding segment of the second source vector. The resulting 2×2 matrix product is then destructively added to the matrix accumulator held in the corresponding segment of the addend and destination vector. This is equivalent to performing a 2-way dot product per destination element. This instruction is unpredicated. The single-precision variant is vector length agnostic. The double-precision variant requires that the Effective SVE vector length is at least 256 bits.
FMMLA <Zda>.S, <Zn>.S, <Zm>.S
FMMLA <Zda>.D, <Zn>.D, <Zm>.D

sshll
Signed shift left long (immediate)
SSHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #<shift>

cmpeq
Compare active integer elements in the source vector with an immediate, and place the boolean results of the specified comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero.
Sets the (CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPLO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPLS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPLO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPLS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D*CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>addha��Add each element of the source vector to the corresponding active element of each horizontal slice of a ZA tile. The tile elements are predicated by a pair of governing predicates. An element of a horizontal slice is considered active if its corresponding element in the second governing predicate is TRUE and the element corresponding to its horizontal slice number in the first governing predicate is TRUE. Inactive elements in the destination tile remain unmodified.&ADDHA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S&ADDHA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.Dsm3tt2bSM3TT2B(SM3TT2B <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]ushllt�2Shift left by immediate each odd-numbered unsigned element of the source vector, and place the results in the overlapping double-width elements of the destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. 
This instruction is unpredicated.
USHLLT <Zd>.<T>, <Zn>.<Tb>, #<const>

smstart
SMSTART -- A64
Enables access to Streaming SVE mode and SME architectural state
SMSTART {<option>}
MSR <pstatefield>, #1

umnegl
UMNEGL -- A64
Unsigned multiply-negate long
UMNEGL <Xd>, <Wn>, <Wm>
UMSUBL <Xd>, <Wn>, <Wm>, XZR

msb
Multiply the corresponding active elements of the first and second source vectors and subtract from elements of the third (addend) vector. Destructively place the results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.
MSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

fmaxnmqv
Floating-point maximum number of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as the default NaN.
FMAXNMQV <Vd>.<T>, <Pg>, <Zn>.<Tb>

cntb
Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then places the result in the scalar destination.
CNTB <Xd>{, <pattern>{, MUL #<imm>}}
CNTD <Xd>{, <pattern>{, MUL #<imm>}}
CNTH <Xd>{, <pattern>{, MUL #<imm>}}
CNTW <Xd>{, <pattern>{, MUL #<imm>}}

ldxrb
Load exclusive register byte
LDXRB <Wt>, [<Xn|SP>{, #0}]

shrnt
Shift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. The immediate shift amount is an unsigned value in the range 1 to number of bits per element.
This instruction is unpredicated.
SHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>

mrs
Move System register to general-purpose register
MRS <Xt>, (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>)

st64bv0
Single-copy atomic 64-byte EL0 store with status result
ST64BV0 <Xs>, <Xt>, [<Xn|SP>]

swpah
Swap halfword in memory
SWPAH <Ws>, <Wt>, [<Xn|SP>]
SWPALH <Ws>, <Wt>, [<Xn|SP>]
SWPH <Ws>, <Wt>, [<Xn|SP>]
SWPLH <Ws>, <Wt>, [<Xn|SP>]

fdiv
Floating-point divide (vector)
FDIV <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FDIV <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FDIV <Hd>, <Hn>, <Hm>
FDIV <Sd>, <Sn>, <Sm>
FDIV <Dd>, <Dn>, <Dm>
FDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

blraaz
Branch with link to register, with pointer authentication
BLRAAZ <Xn>
BLRAA <Xn>, <Xm|SP>
BLRABZ <Xn>
BLRAB <Xn>, <Xm|SP>

autib171615
Authenticate instruction address, using key B
AUTIB171615

decp
Counts the number of true elements in the source predicate and then uses the result to decrement the scalar destination.
DECP <Xdn>, <Pm>.<T>
DECP <Zdn>.<T>, <Pm>.<T>

sha1p
SHA1 hash update (parity)
SHA1P <Qd>, <Sn>, <Vm>.4S

autiasppc
Authenticate return address using key A, using an immediate offset
AUTIASPPC <label>

ld4r
Load single 4-element structure and replicate to all lanes of four registers
LD4R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]
LD4R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>
LD4R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>

dupq
Unconditionally broadcast the indexed element within each 128-bit source vector segment to all elements of the corresponding destination vector segment. This instruction is unpredicated.
DUPQ <Zd>.<T>, <Zn>.<T>[<imm>]

sm3partw2
SM3PARTW2
SM3PARTW2 <Vd>.4S, <Vn>.4S, <Vm>.4S

bdep
This instruction scatters the lowest-numbered contiguous bits within each element of the first source vector to the bit positions indicated by non-zero bits in the corresponding mask element of the second source vector, preserving their order, and set the bits corresponding to a zero mask bit to zero.
This instruction is unpredicated.
BDEP <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

wrffr
Read the source predicate register and place in the first-fault register (
WRFFR <Pn>.B

sm3tt2a
SM3TT2A
SM3TT2A <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]

addhnt
Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. This instruction is unpredicated.
ADDHNT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

fcvtlt
Convert odd-numbered floating-point elements from the source vector to the next higher precision, and place the results in the active overlapping double-width elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
FCVTLT <Zd>.S, <Pg>/M, <Zn>.H
FCVTLT <Zd>.D, <Pg>/M, <Zn>.S

ldnf1sb
Contiguous load with non-faulting behavior of signed bytes to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
LDNF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LDNF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LDNF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

csdb
Consumption of speculative data barrier
CSDB

fcvtpu
Floating-point convert to unsigned integer, rounding toward plus infinity (vector)
FCVTPU <Hd>, <Hn>
FCVTPU <V><d>, <V><n>
FCVTPU <Vd>.<T>, <Vn>.<T>
FCVTPU <Vd>.<T>, <Vn>.<T>
FCVTPU <Wd>, <Hn>
FCVTPU <Xd>, <Hn>
FCVTPU <Wd>, <Sn>
FCVTPU <Xd>, <Sn>
FCVTPU <Wd>, <Dn>
FCVTPU <Xd>, <Dn>

ld2q
Contiguous load two-quadword structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,
LD2Q { <Zt1>.Q, <Zt2>.Q }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD2Q { <Zt1>.Q, <Zt2>.Q }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #4]

cmeq
Compare bitwise equal (vector)
CMEQ D<d>, D<n>, D<m>
CMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
CMEQ D<d>, D<n>, #0
CMEQ <Vd>.<T>, <Vn>.<T>, #0

ldff1w
Gather load with first-faulting behavior of unsigned words to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
-LDFF1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]-LDFF1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]4LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]4LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]6LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #2]6LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]3LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]3LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]4LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2],LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]sbcSubtract with carrySBC <Wd>, <Wn>, <Wm>SBC <Xd>, <Xn>, <Xm>sqcvtun�Saturate the signed integer value in each element of the group of two source vectors to unsigned integer value that is half the original source element width, and place the two-way interleaved results in the half-width destination elements.#SQCVTUN <Zd>.H, { <Zn1>.S-<Zn2>.S }+SQCVTUN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }stl1FStore-release a single-element structure from one lane of one register$STL1  { <Vt>.D }[<index>], [<Xn|SP>]uaba+Unsigned absolute difference and accumulate!UABA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"UABA <Zda>.<T>, <Zn>.<T>, <Zm>.<T>lsrr��Reversed shift right, inserting zeroes, active elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size. Inactive elements in the destination vector register remain unmodified.+LSRR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>tbxq�(For each 128-bit destination vector segment, reads each element of the corresponding second source (index) vector segment and uses its value to select an indexed element from the corresponding first source (table) vector segment. The indexed table element is placed in the element of the destination vector that corresponds to the index vector element. 
If an index value is greater than or equal to the number of elements in a 128-bit vector segment then the corresponding destination vector element is left unchanged. This instruction is unpredicated.
TBXQ <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

msr
Move immediate value to special register
MSR <pstatefield>, #<imm>
MSR (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>), <Xt>

whilehi
Generate a predicate that starting from the highest numbered element is true while the decrementing value of the first, unsigned scalar operand is higher than the second scalar operand and false thereafter down to the lowest numbered element.
WHILEHI <Pd>.<T>, <R><n>, <R><m>
WHILEHI <PNd>.<T>, <Xn>, <Xm>, <vl>
WHILEHI { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>

uqdecp
Counts the number of true elements in the source predicate and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.
UQDECP <Wdn>, <Pm>.<T>
UQDECP <Xdn>, <Pm>.<T>
UQDECP <Zdn>.<T>, <Pm>.<T>

saddv
Signed add horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register. Narrow elements are first sign-extended to 64 bits. Inactive elements in the source vector are treated as zero.
SADDV <Dd>, <Pg>, <Zn>.<T>

chkfeat
Check feature status
CHKFEAT  X16 sqrshrn8Signed saturating rounded shift right narrow (immediate)"SQRSHRN <Vb><d>, <Va><n>, #<shift>+SQRSHRN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>-SQRSHRN <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>5SQRSHRN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>cpypwtrn4Memory copy, writes unprivileged, reads non-temporal"CPYPWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYMWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYEWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!movkMove wide with keep!MOVK <Wd>, #<imm>{, LSL #<shift>}!MOVK <Xd>, #<imm>{, LSL #<shift>}fmlslIFloating-point fused multiply-subtract long from accumulator (by element)+FMLSL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>],FMLSL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]%FMLSL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>&FMLSL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>=FMLSL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4FMLSL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HIFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HIFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }uaddwb�Add the even-numbered unsigned elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.$UADDWB <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>ftsselThe #FTSSEL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>sqrshr�Shift right by an immediate value, the signed integer value in each element of the two source vectors and place the rounded results in the half-width destination elements. 
Each result element is saturated to the half-width N-bit element's signed integer range -2,SQRSHR <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>4SQRSHR <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>uqdech�*Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.'UQDECH <Wdn>{, <pattern>{, MUL #<imm>}}'UQDECH <Xdn>{, <pattern>{, MUL #<imm>}})UQDECH <Zdn>.H{, <pattern>{, MUL #<imm>}}fnmadd2Floating-point negated fused multiply-add (scalar)FNMADD <Hd>, <Hn>, <Hm>, <Ha>FNMADD <Sd>, <Sn>, <Sm>, <Sa>FNMADD <Dd>, <Dn>, <Dm>, <Da>sqdmlalt�Multiply then double the corresponding odd-numbered signed elements of the first and second source vectors. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2(SQDMLALT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>'SQDMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]'SQDMLALT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]cpyfprt,Memory copy forward-only, reads unprivileged!CPYFPRT  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFMRT  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFERT  [ <Xd>]!, [<Xs>]!, <Xn>!cpyfpMemory copy forward-onlyCPYFP  [ <Xd>]!, [<Xs>]!, <Xn>!CPYFM  [ <Xd>]!, [<Xs>]!, <Xn>!CPYFE  [ <Xd>]!, [<Xs>]!, <Xn>!ld1rsb�Load a single signed byte from a memory address generated by a 64-bit scalar base address plus an immediate offset which is in the range 0 to 63..LD1RSB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}].LD1RSB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}].LD1RSB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]	sqrshrunt�=Shift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. 
Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2'SQRSHRUNT <Zd>.<T>, <Zn>.<Tb>, #<const>sshlSigned shift left (register)SSHL  D <d>, D<n>, D<m>!SSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>fcmpe)Floating-point signaling compare (scalar)FCMPE <Hn>, <Hm>FCMPE <Hn>, #0.0FCMPE <Sn>, <Sm>FCMPE <Sn>, #0.0FCMPE <Dn>, <Dm>FCMPE <Dn>, #0.0rdffrsRead the first-fault register (RDFFRS <Pd>.B, <Pg>/Zstilp'Store-release ordered pair of registers#STILP <Wt1>, <Wt2>, [<Xn|SP>, #-8]!STILP <Wt1>, <Wt2>, [<Xn|SP>]$STILP <Xt1>, <Xt2>, [<Xn|SP>, #-16]!STILP <Xt1>, <Xt2>, [<Xn|SP>]fnmsb�bMultiply the corresponding active floating-point elements of the first and second source vectors and subtract from elements of the third (addend) vector without intermediate rounding. Destructively place the negated results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.+FNMSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>fnmul'Floating-point multiply-negate (scalar)FNMUL <Hd>, <Hn>, <Hm>FNMUL <Sd>, <Sn>, <Sm>FNMUL <Dd>, <Dn>, <Dm>csincConditional select incrementCSINC <Wd>, <Wn>, <Wm>, <cond>CSINC <Xd>, <Xn>, <Xm>, <cond>sqshlr��Shift active signed elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Each result element is saturated to the N-bit element's signed integer range -2-SQSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>uabalt�Compute the absolute difference between odd-numbered unsigned elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector. 
This instruction is unpredicated.&UABALT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>usvdot��The unsigned by signed integer vertical dot product instruction computes the vertical dot product of corresponding unsigned 8-bit elements from the four first source vectors and four signed 8-bit integer values in the corresponding indexed 32-bit element of the second source vector. The widened dot product result is destructively added to the corresponding 32-bit element of the ZA single-vector groups.IUSVDOT  ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]sunpkhi�Unpack elements from the lowest or highest half of the source vector and then sign-extend them to place in elements of twice their size within the destination vector. This instruction is unpredicated.SUNPKHI <Zd>.<T>, <Zn>.<Tb>SUNPKLO <Zd>.<T>, <Zn>.<Tb>bfmopaqThe BFloat16 floating-point sum of outer products and accumulate instruction works with a 32-bit element ZA tile./BFMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H/BFMOPA <ZAda>.H, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hld3b�1Contiguous load three-byte structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,GLD3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}];LD3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]st2b�.Contiguous store two-byte structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,<ST2B { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0ST2B { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>, <Xm>]sudotBDot product with signed and unsigned integers (vector, by element),SUDOT <Vd>.<Ta>, <Vn>.<Tb>, 
<Vm>.4B[<index>]
SUDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]
SUDOT ZA.S[<Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]
SUDOT ZA.S[<Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]
SUDOT ZA.S[<Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B
SUDOT ZA.S[<Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B
ptest
Sets the
PTEST <Pg>, <Pn>.B
zip1
Zip vectors (primary)
ZIP1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
cpp
CPP -- A640Cache prefetch prediction restriction by contextCPP  RCTX, <Xt>SYS   #3, C7, C3, #7, <Xt>ldsetp$Atomic bit set on quadword in memoryLDSETP <Xt1>, <Xt2>, [<Xn|SP>]LDSETPA <Xt1>, <Xt2>, [<Xn|SP>] LDSETPAL <Xt1>, <Xt2>, [<Xn|SP>]LDSETPL <Xt1>, <Xt2>, [<Xn|SP>]ssubwSigned subtract wide*SSUBW{ 2}  <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>bfcvtnHFloating-point convert from single-precision to BFloat16 format (vector)BFCVTN{ 2}  <Vd>.<Ta>, <Vn>.4S"BFCVTN <Zd>.B, { <Zn1>.H-<Zn2>.H }"BFCVTN <Zd>.H, { <Zn1>.S-<Zn2>.S }fexpaThe FEXPA <Zd>.<T>, <Zn>.<T>fmaxFloating-point maximum (vector)!FMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FMAX <Hd>, <Hn>, <Hm>FMAX <Sd>, <Sn>, <Sm>FMAX <Dd>, <Dn>, <Dm>CFMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>CFMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>RFMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }RFMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }*FMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>+FMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>fsqrt#Floating-point square root (vector)FSQRT <Vd>.<T>, <Vn>.<T>FSQRT <Vd>.<T>, <Vn>.<T>FSQRT <Hd>, <Hn>FSQRT <Sd>, <Sn>FSQRT <Dd>, <Dn> FSQRT <Zd>.<T>, <Pg>/M, <Zn>.<T>frintnGFloating-point round to integral, to nearest with ties to even (vector)FRINTN <Vd>.<T>, <Vn>.<T>FRINTN <Vd>.<T>, <Vn>.<T>FRINTN <Hd>, <Hn>FRINTN <Sd>, <Sn>FRINTN <Dd>, <Dn>/FRINTN { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }/FRINTN { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }lsr�SShift right by immediate, inserting zeroes, each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. 
Inactive elements in the destination vector register remain unmodified.*LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>(LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D*LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T> LSR <Zd>.<T>, <Zn>.<T>, #<const>LSR <Zd>.<T>, <Zn>.<T>, <Zm>.DLSR (register) -- A64Logical shift right (register)LSR <Wd>, <Wn>, <Wm>LSRV   <Wd>, <Wn>, <Wm>LSR <Xd>, <Xn>, <Xm>LSRV   <Xd>, <Xn>, <Xm>LSR (immediate) -- A64Logical shift right (immediate)LSR <Wd>, <Wn>, #<shift> UBFM   <Wd>, <Wn>, #<shift>, #31LSR <Xd>, <Xn>, #<shift> UBFM   <Xd>, <Xn>, #<shift>, #63dupm��Unconditionally broadcast the logical bitmask immediate into each element of the destination vector. This instruction is unpredicated. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits.DUPM <Zd>.<T>, #<const>ld1rsw�Load a single signed word from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 4 in the range 0 to 252..LD1RSW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]frintm@Floating-point round to integral, toward minus infinity (vector)FRINTM <Vd>.<T>, <Vn>.<T>FRINTM <Vd>.<T>, <Vn>.<T>FRINTM <Hd>, <Hn>FRINTM <Sd>, <Sn>FRINTM <Dd>, <Dn>/FRINTM { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }/FRINTM { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }raddhn"Rounding add returning high narrow+RADDHN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>rbitReverse bit order (vector)RBIT <Vd>.<T>, <Vn>.<T>RBIT <Wd>, <Wn>RBIT <Xd>, <Xn>RBIT <Zd>.<T>, <Pg>/M, <Zn>.<T>lslvLogical shift left variableLSLV <Wd>, <Wn>, <Wm>LSLV <Xd>, <Xn>, <Xm>ld1rqw�Load four contiguous words to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address..LD1RQW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]2LD1RQW { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]pmulPolynomial multiply!PMUL <Vd>.<T>, <Vn>.<T>, 
<Vm>.<T>PMUL <Zd>.B, <Zn>.B, <Zm>.BsubSubtract (extended register)4SUB <Wd|WSP>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}4SUB <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}})SUB <Wd|WSP>, <Wn|WSP>, #<imm>{, <shift>}'SUB <Xd|SP>, <Xn|SP>, #<imm>{, <shift>})SUB <Wd>, <Wn>, <Wm>{, <shift> #<amount>})SUB <Xd>, <Xn>, <Xm>{, <shift> #<amount>}SUB  D <d>, D<n>, D<m> SUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>*SUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>+SUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>} SUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>>SUB     ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zm1>.<T>-<Zm2>.<T> }>SUB     ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zm1>.<T>-<Zm4>.<T> }HSUB     ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, <Zm>.<T>HSUB     ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, <Zm>.<T>WSUB     ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }WSUB     ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }sqrshrnt�6Shift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. 
Each result element is saturated to the half-width N-bit element's signed integer range -2&SQRSHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>flogbcThis instruction returns the signed integer base 2 logarithm of each floating-point input element | FLOGB <Zd>.<T>, <Pg>/M, <Zn>.<T>ld1rb�Load a single unsigned byte from a memory address generated by a 64-bit scalar base address plus an immediate offset which is in the range 0 to 63.-LD1RB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]negsNEGS -- A64Negate, setting flags$NEGS <Wd>, <Wm>{, <shift> #<amount>}+SUBS   <Wd>, WZR, <Wm>{, <shift> #<amount>}$NEGS <Xd>, <Xm>{, <shift> #<amount>}+SUBS   <Xd>, XZR, <Xm>{, <shift> #<amount>}famaxFloating-point absolute maximum"FAMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FAMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>SFAMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }SFAMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> },FAMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>cmtst*Compare bitwise test bits nonzero (vector)CMTST  D <d>, D<n>, D<m>"CMTST <Vd>.<T>, <Vn>.<T>, <Vm>.<T>urecpeUnsigned reciprocal estimateURECPE <Vd>.<T>, <Vn>.<T>URECPE <Zd>.S, <Pg>/M, <Zn>.Sincb�Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination.%INCB <Xdn>{, <pattern>{, MUL #<imm>}}%INCD <Xdn>{, <pattern>{, MUL #<imm>}}%INCH <Xdn>{, <pattern>{, MUL #<imm>}}%INCW <Xdn>{, <pattern>{, MUL #<imm>}}ldnt1w�Contiguous load non-temporal of words to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.@LDNT1W { <Zt1>.S-<Zt2>.S 
}, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]@LDNT1W { <Zt1>.S-<Zt4>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]<LDNT1W { <Zt1>.S-<Zt2>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]<LDNT1W { <Zt1>.S-<Zt4>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]ALDNT1W { <Zt1>.S, <Zt2>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]SLDNT1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]=LDNT1W { <Zt1>.S, <Zt2>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]OLDNT1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]+LDNT1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]+LDNT1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]6LDNT1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]2LDNT1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]ssra-Signed shift right and accumulate (immediate)SSRA  D <d>, D<n>, #<shift>!SSRA <Vd>.<T>, <Vn>.<T>, #<shift>"SSRA <Zda>.<T>, <Zn>.<T>, #<const>smlslt�Multiply the corresponding odd-numbered signed elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector. 
This instruction is unpredicated.&SMLSLT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%SMLSLT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%SMLSLT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]usublUnsigned subtract long*USUBL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>movaThe instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size.:MOVA { <Zd1>.B-<Zd2>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs2>]<MOVA { <Zd1>.H-<Zd2>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs2>]<MOVA { <Zd1>.S-<Zd2>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs2>]<MOVA { <Zd1>.D-<Zd2>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs2>]:MOVA { <Zd1>.B-<Zd4>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs4>]<MOVA { <Zd1>.H-<Zd4>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs4>]<MOVA { <Zd1>.S-<Zd4>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs4>]<MOVA { <Zd1>.D-<Zd4>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs4>]4MOVA { <Zd1>.D-<Zd2>.D }, ZA.D[<Wv>, <offs>{, VGx2}]4MOVA { <Zd1>.D-<Zd4>.D }, ZA.D[<Wv>, <offs>{, VGx4}],MOVA <Zd>.B, <Pg>/M, ZA0<HV>.B[<Ws>, <offs>].MOVA <Zd>.H, <Pg>/M, <ZAn><HV>.H[<Ws>, <offs>].MOVA <Zd>.S, <Pg>/M, <ZAn><HV>.S[<Ws>, <offs>].MOVA <Zd>.D, <Pg>/M, <ZAn><HV>.D[<Ws>, <offs>].MOVA <Zd>.Q, <Pg>/M, <ZAn><HV>.Q[<Ws>, <offs>]>MOVA    ZA0 <HV>.B[<Ws>, <offs1>:<offs2>], { <Zn1>.B-<Zn2>.B }<MOVA <ZAd><HV>.H[<Ws>, <offs1>:<offs2>], { <Zn1>.H-<Zn2>.H }<MOVA <ZAd><HV>.S[<Ws>, <offs1>:<offs2>], { <Zn1>.S-<Zn2>.S }<MOVA <ZAd><HV>.D[<Ws>, <offs1>:<offs2>], { <Zn1>.D-<Zn2>.D }>MOVA    ZA0 <HV>.B[<Ws>, <offs1>:<offs4>], { <Zn1>.B-<Zn4>.B }<MOVA <ZAd><HV>.H[<Ws>, <offs1>:<offs4>], { <Zn1>.H-<Zn4>.H }<MOVA <ZAd><HV>.S[<Ws>, <offs1>:<offs4>], { <Zn1>.S-<Zn4>.S }<MOVA <ZAd><HV>.D[<Ws>, <offs1>:<offs4>], { <Zn1>.D-<Zn4>.D }8MOVA    ZA.D[ <Wv>, <offs>{, VGx2}], { <Zn1>.D-<Zn2>.D }8MOVA    ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.D-<Zn4>.D }0MOVA    ZA0 <HV>.B[<Ws>, <offs>], <Pg>/M, <Zn>.B.MOVA <ZAd><HV>.H[<Ws>, <offs>], <Pg>/M, <Zn>.H.MOVA <ZAd><HV>.S[<Ws>, <offs>], <Pg>/M, <Zn>.S.MOVA <ZAd><HV>.D[<Ws>, <offs>], <Pg>/M, <Zn>.D.MOVA <ZAd><HV>.Q[<Ws>, 
<offs>], <Pg>/M, <Zn>.Q
fmlallbb
8-bit floating-point multiply-add long-long to single-precision (vector, by element)
FMLALLBB <Vd>.4S, <Vn>.16B, <Vm>.B[<index>]
FMLALLBT <Vd>.4S, <Vn>.16B, <Vm>.B[<index>]
FMLALLTB <Vd>.4S, <Vn>.16B, <Vm>.B[<index>]
FMLALLTT <Vd>.4S, <Vn>.16B, <Vm>.B[<index>]
FMLALLBB <Vd>.4S, <Vn>.16B, <Vm>.16B
FMLALLBT <Vd>.4S, <Vn>.16B, <Vm>.16B
FMLALLTB <Vd>.4S, <Vn>.16B, <Vm>.16B
FMLALLTT <Vd>.4S, <Vn>.16B, <Vm>.16B
FMLALLBB <Zda>.S, <Zn>.B, <Zm>.B
FMLALLBB <Zda>.S, <Zn>.B, <Zm>.B[<imm>]
paciasppc
Pointer Authentication Code for return address, using key A
PACIASPPC pfirst�Sets the first active element in the destination predicate to true, otherwise elements from the source predicate are passed through unchanged. Sets the PFIRST <Pdn>.B, <Pg>, <Pdn>.Bsmlalt�Multiply the corresponding odd-numbered signed elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.&SMLALT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%SMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%SMLALT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]sqdmullt�Multiply the corresponding odd-numbered signed elements of the first and second source vectors, double and place the results in the overlapping double-width elements of the destination vector. Each result element is saturated to the double-width N-bit element's signed integer range -2'SQDMULLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>&SQDMULLT <Zd>.S, <Zn>.H, <Zm>.H[<imm>]&SQDMULLT <Zd>.D, <Zn>.S, <Zm>.S[<imm>]stnt1d�"Contiguous store non-temporal of doublewords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>STNT1D { <Zt1>.D-<Zt2>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]>STNT1D { <Zt1>.D-<Zt4>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]:STNT1D { <Zt1>.D-<Zt2>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]:STNT1D { <Zt1>.D-<Zt4>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]?STNT1D { <Zt1>.D, <Zt2>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]QSTNT1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}];STNT1D { <Zt1>.D, <Zt2>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]MSTNT1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3])STNT1D { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]4STNT1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0STNT1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]sha256h2SHA256 hash update (part 2)SHA256H2 <Qd>, <Qn>, <Vm>.4Ssttrh&Store register 
halfword (unprivileged) STTRH <Wt>, [<Xn|SP>{, #<simm>}]sha1hSHA1 fixed rotateSHA1H <Sd>, <Sn>smlslb�Multiply the corresponding even-numbered signed elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector. This instruction is unpredicated.&SMLSLB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%SMLSLB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%SMLSLB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]ldnf1sh��Contiguous load with non-faulting behavior of signed halfwords to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.7LDNF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]7LDNF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]bf1cvtl18-bit floating-point convert to BFloat16 (vector)BF1CVTL{ 2}  <Vd>.8H, <Vn>.<Ta>BF2CVTL{ 2}  <Vd>.8H, <Vn>.<Ta>#BF1CVTL { <Zd1>.H-<Zd2>.H }, <Zn>.B#BF2CVTL { <Zd1>.H-<Zd2>.H }, <Zn>.Buqcvtn�Saturate the unsigned integer value in each element of the group of two source vectors to half the original source element width, and place the two-way interleaved results in the half-width destination elements."UQCVTN <Zd>.H, { <Zn1>.S-<Zn2>.S }*UQCVTN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }bfminnm�Determine the minimum number value of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors.<BFMINNM { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, <Zm>.H<BFMINNM { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, <Zm>.HIBFMINNM { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, { <Zm1>.H-<Zm2>.H }IBFMINNM { <Zdn1>.H-<Zdn4>.H }, { 
<Zdn1>.H-<Zdn4>.H }, { <Zm1>.H-<Zm4>.H }(BFMINNM <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.Hrev16-Reverse elements in 16-bit halfwords (vector)REV16 <Vd>.<T>, <Vn>.<T>REV16 <Wd>, <Wn>REV16 <Xd>, <Xn>cmle2Compare signed less than or equal to zero (vector)CMLE  D <d>, D<n>, #0CMLE <Vd>.<T>, <Vn>.<T>, #0ld1sh�=Gather load of signed halfwords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.,LD1SH { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}],LD1SH { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]5LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]5LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]1LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]1LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]5LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]5LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]2LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]2LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]3LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]+LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]uhsubr�9Subtract active unsigned elements of the first source vector from corresponding unsigned elements of the second source vector, shift right one bit, and destructively place the results in the corresponding elements of the first source vector. 
Inactive elements in the destination vector register remain unmodified.-UHSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ld2d�3Contiguous load two-doubleword structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,>LD2D { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD2D { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]lduminah-Atomic unsigned minimum on halfword in memoryLDUMINAH <Ws>, <Wt>, [<Xn|SP>]LDUMINALH <Ws>, <Wt>, [<Xn|SP>]LDUMINH <Ws>, <Wt>, [<Xn|SP>]LDUMINLH <Ws>, <Wt>, [<Xn|SP>]cmploCMPLO (vectors)�PCompare active unsigned integer elements in the first source vector being lower than corresponding unsigned elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the *CMPLO <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-CMPHI    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>ushr Unsigned shift right (immediate)USHR  D <d>, D<n>, #<shift>!USHR <Vd>.<T>, <Vn>.<T>, #<shift>pacga.Pointer Authentication Code, using generic keyPACGA <Xd>, <Xn>, <Xm|SP>movprfxThe predicated %MOVPRFX <Zd>.<T>, <Pg>/<ZM>, <Zn>.<T>MOVPRFX <Zd>, <Zn>smlsl2Signed multiply-subtract long (vector, by element)
3SMLSL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]*SMLSL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>=SMLSL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RSMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RSMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4SMLSL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HISMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HISMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVSMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VSMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }st1PStore multiple single-element structures from one, two, three, or four registersST1  { <Vt>.<T> }, [<Xn|SP>]'ST1  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]2ST1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]=ST1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]#ST1  { <Vt>.<T> }, [<Xn|SP>], <imm>"ST1  { <Vt>.<T> }, [<Xn|SP>], <Xm>.ST1  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>-ST1  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>9ST1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>8ST1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>DST1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>CST1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>#ST1  { <Vt>.B }[<index>], [<Xn|SP>]#ST1  { <Vt>.H }[<index>], [<Xn|SP>]#ST1  { <Vt>.S }[<index>], [<Xn|SP>]#ST1  { <Vt>.D }[<index>], [<Xn|SP>]'ST1  { <Vt>.B }[<index>], [<Xn|SP>], #1)ST1  { <Vt>.B }[<index>], [<Xn|SP>], <Xm>'ST1  { <Vt>.H }[<index>], [<Xn|SP>], #2)ST1  { <Vt>.H }[<index>], [<Xn|SP>], <Xm>'ST1  { <Vt>.S }[<index>], [<Xn|SP>], #4)ST1  { <Vt>.S }[<index>], [<Xn|SP>], <Xm>'ST1  { <Vt>.D }[<index>], [<Xn|SP>], #8)ST1  { <Vt>.D }[<index>], [<Xn|SP>], <Xm>rcwsclrp@Read check write software atomic bit clear on quadword in memory RCWSCLRP <Xt1>, <Xt2>, [<Xn|SP>]!RCWSCLRPA <Xt1>, <Xt2>, 
[<Xn|SP>]"RCWSCLRPAL <Xt1>, <Xt2>, [<Xn|SP>]!RCWSCLRPL <Xt1>, <Xt2>, [<Xn|SP>]uaddwUnsigned add wide*UADDW{ 2}  <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>pacia@Pointer Authentication Code for instruction address, using key APACIA <Xd>, <Xn|SP>PACIZA <Xd>
PACIA1716 PACIASP PACIAZ ldrsw%Load register signed word (immediate)LDRSW <Xt>, [<Xn|SP>], #<simm>LDRSW <Xt>, [<Xn|SP>, #<simm>]! LDRSW <Xt>, [<Xn|SP>{, #<pimm>}]LDRSW <Xt>, <label>9LDRSW <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]brkbs�Sets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the BRKBS <Pd>.B, <Pg>/Z, <Pn>.Bld4d�5Contiguous load four-doubleword structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,PLD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]LLD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]ldnf1sw��Contiguous load with non-faulting behavior of signed words to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. 
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.7LDNF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]rcwscasp=Read check write software compare and swap quadword in memory2RCWSCASP <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]3RCWSCASPA <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]4RCWSCASPAL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]3RCWSCASPL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]rcwsclrBRead check write software atomic bit clear on doubleword in memoryRCWSCLR <Xs>, <Xt>, [<Xn|SP>]RCWSCLRA <Xs>, <Xt>, [<Xn|SP>]RCWSCLRAL <Xs>, <Xt>, [<Xn|SP>]RCWSCLRL <Xs>, <Xt>, [<Xn|SP>]fcvt)Floating-point convert precision (scalar)FCVT <Sd>, <Hn>FCVT <Dd>, <Hn>FCVT <Hd>, <Sn>FCVT <Dd>, <Sn>FCVT <Hd>, <Dn>FCVT <Sd>, <Dn> FCVT { <Zd1>.S-<Zd2>.S }, <Zn>.H FCVT <Zd>.B, { <Zn1>.H-<Zn2>.H } FCVT <Zd>.B, { <Zn1>.S-<Zn4>.S } FCVT <Zd>.H, { <Zn1>.S-<Zn2>.S }FCVT <Zd>.S, <Pg>/M, <Zn>.HFCVT <Zd>.D, <Pg>/M, <Zn>.HFCVT <Zd>.H, <Pg>/M, <Zn>.SFCVT <Zd>.D, <Pg>/M, <Zn>.SFCVT <Zd>.H, <Pg>/M, <Zn>.DFCVT <Zd>.S, <Pg>/M, <Zn>.DudivUnsigned divideUDIV <Wd>, <Wn>, <Wm>UDIV <Xd>, <Xn>, <Xm>+UDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>	cpyfprtrn=Memory copy forward-only, reads unprivileged and non-temporal#CPYFPRTRN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFMRTRN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFERTRN  [ <Xd>]!, [<Xs>]!, <Xn>!bf1cvtlt�Convert each odd-numbered 8-bit floating-point element of the source vector to BFloat16 while downscaling the value, and place the results in the overlapping 16-bit elements of the destination vector. 
BF1CVTLT scales the values by 2BF1CVTLT <Zd>.H, <Zn>.BBF2CVTLT <Zd>.H, <Zn>.Bldsmaxb7Atomic signed maximum on byte in memory, without return4STSMAXB <Ws>, [<Xn|SP>]LDSMAXB  <Ws>, WZR, [<Xn|SP>]6STSMAXLB <Ws>, [<Xn|SP>]LDSMAXLB  <Ws>, WZR, [<Xn|SP>]aesimcAES inverse mix columnsAESIMC <Vd>.16B, <Vn>.16BAESIMC <Zdn>.B, <Zdn>.BcinvCINV -- A64Conditional invertCINV <Wd>, <Wn>, <invcond> CSINV   <Wd>, <Wn>, <Wm>, <cond>CINV <Xd>, <Xn>, <invcond> CSINV   <Xd>, <Xn>, <Xm>, <cond>subs+Subtract (extended register), setting flags1SUBS <Wd>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}2SUBS <Xd>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}&SUBS <Wd>, <Wn|WSP>, #<imm>{, <shift>}%SUBS <Xd>, <Xn|SP>, #<imm>{, <shift>}*SUBS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}*SUBS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}addqv��Unsigned addition of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as zero.ADDQV <Vd>.<T>, <Pg>, <Zn>.<Tb>bfmlslb��This BFloat16 floating-point multiply-subtract long instruction widens the even-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. This instruction is unpredicated.BFMLSLB <Zda>.S, <Zn>.H, <Zm>.H&BFMLSLB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]pmullt�Polynomial multiply over [0, 1] the corresponding odd-numbered elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%PMULLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>PMULLT <Zd>.Q, <Zn>.D, <Zm>.Dldsetah$Atomic bit set on halfword in memoryLDSETAH <Ws>, <Wt>, [<Xn|SP>]LDSETALH <Ws>, <Wt>, [<Xn|SP>]LDSETH <Ws>, <Wt>, [<Xn|SP>]LDSETLH <Ws>, <Wt>, [<Xn|SP>]	sqdmlalbt�Multiply then double the corresponding even-numbered signed elements of the first and odd-numbered signed elements of the second source vector. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2)SQDMLALBT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>orrs�"Bitwise inclusive OR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the #ORRS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bwhilegt�Generate a predicate that starting from the highest numbered element is true while the decrementing value of the first, signed scalar operand is greater than the second scalar operand and false thereafter down to the lowest numbered element. WHILEGT <Pd>.<T>, <R><n>, <R><m>#WHILEGT <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILEGT { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>asrd��Shift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The result rounds toward zero as in a signed division. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. Inactive elements in the destination vector register remain unmodified.+ASRD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>fnmad�[Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of the third (addend) vector without intermediate rounding. Destructively place the negated results in the destination and first source (multiplicand) vector. 
Inactive elements in the destination vector register remain unmodified.+FNMAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>sqdecw�kDetermines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQDECW <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQDECW <Xdn>{, <pattern>{, MUL #<imm>}})SQDECW <Zdn>.S{, <pattern>{, MUL #<imm>}}ssubwt�
Subtract the even-numbered signed elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.$SSUBWT <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>umaddlUnsigned multiply-add longUMADDL <Xd>, <Wn>, <Wm>, <Xa>cbnzCompare and branch on nonzeroCBNZ <Wt>, <label>CBNZ <Xt>, <label>stlurh*Store-release register halfword (unscaled)!STLURH <Wt>, [<Xn|SP>{, #<simm>}]ccmn(Conditional compare negative (immediate)"CCMN <Wn>, #<imm>, #<nzcv>, <cond>"CCMN <Xn>, #<imm>, #<nzcv>, <cond> CCMN <Wn>, <Wm>, #<nzcv>, <cond> CCMN <Xn>, <Xm>, #<nzcv>, <cond>rsubhnb�TSubtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the most significant rounded half of the result in the even-numbered half-width destination elements, while setting the odd-numbered half-width destination elements to zero. This instruction is unpredicated.&RSUBHNB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>at	AT -- A64Address translateAT <at_op>, <Xt>$SYS   #<op1>, C7, <Cm>, #<op2>, <Xt>sysp128-bit system instruction1SYSP  # <op1>, <Cn>, <Cm>, #<op2>{, <Xt1>, <Xt2>}usublt�Subtract the odd-numbered unsigned elements of the second source vector from the corresponding unsigned elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%USUBLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>rorROR (immediate) -- A64Rotate right (immediate)ROR <Wd>, <Ws>, #<shift>!EXTR   <Wd>, <Ws>, <Ws>, #<shift>ROR <Xd>, <Xs>, #<shift>!EXTR   <Xd>, <Xs>, <Xs>, #<shift>ROR (register) -- A64Rotate right (register)ROR <Wd>, <Wn>, <Wm>RORV   <Wd>, <Wn>, <Wm>ROR <Xd>, <Xn>, <Xm>RORV   <Xd>, <Xn>, <Xm>fdivr�5Reversed divide active floating-point elements of the second source vector by corresponding floating-point elements of the first source vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.,FDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>smaxSigned maximum (vector)!SMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>SMAX <Wd>, <Wn>, #<simm>SMAX <Xd>, <Xn>, #<simm>CSMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>CSMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>RSMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }RSMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }SMAX <Wd>, <Wn>, <Wm>SMAX <Xd>, <Xn>, <Xm>+SMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!SMAX <Zdn>.<T>, <Zdn>.<T>, #<imm>hltHalt instructionHLT  # <imm>srhaddSigned rounding halving add#SRHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>-SRHADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>addpl�Add the current predicate register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer and place the result in the 64-bit destination general-purpose register or current stack pointer.ADDPL <Xd|SP>, <Xn|SP>, #<imm>ldtrb!Load register byte (unprivileged) LDTRB <Wt>, [<Xn|SP>{, #<simm>}]ldur+Load SIMD&amp;FP register (unscaled offset)LDUR <Bt>, [<Xn|SP>{, #<simm>}]LDUR <Ht>, [<Xn|SP>{, #<simm>}]LDUR <St>, [<Xn|SP>{, #<simm>}]LDUR <Dt>, [<Xn|SP>{, #<simm>}]LDUR <Qt>, [<Xn|SP>{, #<simm>}]LDUR <Wt>, 
[<Xn|SP>{, #<simm>}]LDUR <Xt>, [<Xn|SP>{, #<simm>}]bfmmlaBBFloat16 floating-point matrix multiply-accumulate into 2x2 matrix BFMMLA <Vd>.4S, <Vn>.8H, <Vm>.8HBFMMLA <Zda>.S, <Zn>.H, <Zm>.Hstnt1w�Contiguous store non-temporal of words from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>STNT1W { <Zt1>.S-<Zt2>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]>STNT1W { <Zt1>.S-<Zt4>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]:STNT1W { <Zt1>.S-<Zt2>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]:STNT1W { <Zt1>.S-<Zt4>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]?STNT1W { <Zt1>.S, <Zt2>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]QSTNT1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}];STNT1W { <Zt1>.S, <Zt2>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]MSTNT1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2])STNT1W { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}])STNT1W { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]4STNT1W { <Zt>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0STNT1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]clasta��From the source vector register extract the element after the last active element, or if the last active element is the final element extract element zero, and then zero-extend that element to destructively place in the destination and first source general-purpose register. If there are no active elements then destructively zero-extend the least significant element-size bits of the destination and first source general-purpose register.'CLASTA <R><dn>, <Pg>, <R><dn>, <Zm>.<T>'CLASTA <V><dn>, <Pg>, <V><dn>, <Zm>.<T>+CLASTA <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>lasta�.If there is an active element then extract the element after the last active element modulo the number of elements from the final source vector register. If there are no active elements, extract element zero. 
Then zero-extend and place the extracted element in the destination general-purpose register.LASTA <R><d>, <Pg>, <Zn>.<T>LASTA <V><d>, <Pg>, <Zn>.<T>autda&Authenticate data address, using key AAUTDA <Xd>, <Xn|SP>AUTDZA <Xd>maddMultiply-addMADD <Wd>, <Wn>, <Wm>, <Wa>MADD <Xd>, <Xn>, <Xm>, <Xa>st3h�6Contiguous store three-halfword structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,EST3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]AST3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]whilege��Generate a predicate that starting from the highest numbered element is true while the decrementing value of the first, signed scalar operand is greater than or equal to the second scalar operand and false thereafter down to the lowest numbered element. WHILEGE <Pd>.<T>, <R><n>, <R><m>#WHILEGE <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILEGE { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>addptAdd checked pointer-ADDPT <Xd|SP>, <Xn|SP>, <Xm>{, LSL #<amount>}&ADDPT <Zdn>.D, <Pg>/M, <Zdn>.D, <Zm>.DADDPT <Zd>.D, <Zn>.D, <Zm>.Dfccmpe5Floating-point conditional signaling compare (scalar)"FCCMPE <Hn>, <Hm>, #<nzcv>, <cond>"FCCMPE <Sn>, <Sm>, #<nzcv>, <cond>"FCCMPE <Dn>, <Dm>, #<nzcv>, <cond>ldursh(Load register signed halfword (unscaled)!LDURSH <Wt>, [<Xn|SP>{, #<simm>}]!LDURSH <Xt>, [<Xn|SP>{, #<simm>}]usubwt�-Subtract the odd-numbered unsigned elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated. 
This instruction is unpredicated.$USUBWT <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>ummla:Unsigned 8-bit integer matrix multiply-accumulate (vector)!UMMLA <Vd>.4S, <Vn>.16B, <Vm>.16BUMMLA <Zda>.S, <Zn>.B, <Zm>.Bldpsw"Load pair of registers signed word%LDPSW <Xt1>, <Xt2>, [<Xn|SP>], #<imm>&LDPSW <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!'LDPSW <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]st3w�2Contiguous store three-word structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,EST3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]AST3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]aesmcAES mix columnsAESMC <Vd>.16B, <Vn>.16BAESMC <Zdn>.B, <Zdn>.Buabdl!Unsigned absolute difference long*UABDL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>f1cvtlt�Convert each odd-numbered 8-bit floating-point element of the source vector to half-precision while downscaling the value, and place the results in the overlapping 16-bit elements of the destination vector. F1CVTLT scales the values by 2F1CVTLT <Zd>.H, <Zn>.BF2CVTLT <Zd>.H, <Zn>.BstlrhStore-release register halfwordSTLRH <Wt>, [<Xn|SP>{, #0}]btiBranch target identificationBTI  { <targets>}frsqrts*Floating-point reciprocal square root stepFRSQRTS <Hd>, <Hn>, <Hm>FRSQRTS <V><d>, <V><n>, <V><m>$FRSQRTS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>$FRSQRTS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>$FRSQRTS <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ld1sb�)Gather load of signed bytes to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. 
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.,LD1SB { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}],LD1SB { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]5LD1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]5LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]5LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}])LD1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>])LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>])LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>]2LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]2LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]+LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]brkn�If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise leaves the destination and second source predicate unchanged. Does not set the condition flags.%BRKN <Pdm>.B, <Pg>/Z, <Pn>.B, <Pdm>.Brcwsetp5Read check write atomic bit set on quadword in memoryRCWSETP <Xt1>, <Xt2>, [<Xn|SP>] RCWSETPA <Xt1>, <Xt2>, [<Xn|SP>]!RCWSETPAL <Xt1>, <Xt2>, [<Xn|SP>] RCWSETPL <Xt1>, <Xt2>, [<Xn|SP>]sadalp'Signed add and accumulate long pairwiseSADALP <Vd>.<Ta>, <Vn>.<Tb>#SADALP <Zda>.<T>, <Pg>/M, <Zn>.<Tb>umlslt�Multiply the corresponding odd-numbered unsigned elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector. 
This instruction is unpredicated.&UMLSLT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%UMLSLT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%UMLSLT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]ldiapp+Load-Acquire RCpc ordered pair of registers"LDIAPP <Wt1>, <Wt2>, [<Xn|SP>], #8LDIAPP <Wt1>, <Wt2>, [<Xn|SP>]#LDIAPP <Xt1>, <Xt2>, [<Xn|SP>], #16LDIAPP <Xt1>, <Xt2>, [<Xn|SP>]frint64z;Floating-point round to 64-bit integer toward zero (vector)FRINT64Z <Vd>.<T>, <Vn>.<T>FRINT64Z <Sd>, <Sn>FRINT64Z <Dd>, <Dn>negNegate (vector)NEG  D <d>, D<n>NEG <Vd>.<T>, <Vn>.<T>NEG <Zd>.<T>, <Pg>/M, <Zn>.<T>NEG (shifted register) -- A64Negate (shifted register)#NEG <Wd>, <Wm>{, <shift> #<amount>}*SUB   <Wd>, WZR, <Wm>{, <shift> #<amount>}#NEG <Xd>, <Xm>{, <shift> #<amount>}*SUB   <Xd>, XZR, <Xm>{, <shift> #<amount>}ldadd*Atomic add on word or doubleword in memoryLDADD <Ws>, <Wt>, [<Xn|SP>]LDADDA <Ws>, <Wt>, [<Xn|SP>]LDADDAL <Ws>, <Wt>, [<Xn|SP>]LDADDL <Ws>, <Wt>, [<Xn|SP>]LDADD <Xs>, <Xt>, [<Xn|SP>]LDADDA <Xs>, <Xt>, [<Xn|SP>]LDADDAL <Xs>, <Xt>, [<Xn|SP>]LDADDL <Xs>, <Xt>, [<Xn|SP>]0STADD <Ws>, [<Xn|SP>]LDADD  <Ws>, WZR, [<Xn|SP>]2STADDL <Ws>, [<Xn|SP>]LDADDL  <Ws>, WZR, [<Xn|SP>]0STADD <Xs>, [<Xn|SP>]LDADD  <Xs>, XZR, [<Xn|SP>]2STADDL <Xs>, [<Xn|SP>]LDADDL  <Xs>, XZR, [<Xn|SP>]fcvtauZFloating-point convert to unsigned integer, rounding to nearest with ties to away (vector)
FCVTAU <Hd>, <Hn>FCVTAU <V><d>, <V><n>FCVTAU <Vd>.<T>, <Vn>.<T>FCVTAU <Vd>.<T>, <Vn>.<T>FCVTAU <Wd>, <Hn>FCVTAU <Xd>, <Hn>FCVTAU <Wd>, <Sn>FCVTAU <Xd>, <Sn>FCVTAU <Wd>, <Dn>FCVTAU <Xd>, <Dn>sm3tt1aSM3TT1A(SM3TT1A <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]cmpCMP (extended register) -- A64Compare (extended register)*CMP <Wn|WSP>, <Wm>{, <extend> {#<amount>}}2SUBS   WZR, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}+CMP <Xn|SP>, <R><m>{, <extend> {#<amount>}}3SUBS   XZR, <Xn|SP>, <R><m>{, <extend> {#<amount>}}CMP (immediate) -- A64Compare (immediate)CMP <Wn|WSP>, #<imm>{, <shift>}'SUBS   WZR, <Wn|WSP>, #<imm>{, <shift>}CMP <Xn|SP>, #<imm>{, <shift>}&SUBS   XZR, <Xn|SP>, #<imm>{, <shift>}CMP (shifted register) -- A64Compare (shifted register)#CMP <Wn>, <Wm>{, <shift> #<amount>}+SUBS   WZR, <Wn>, <Wm>{, <shift> #<amount>}#CMP <Xn>, <Xm>{, <shift> #<amount>}+SUBS   XZR, <Xn>, <Xm>{, <shift> #<amount>}ld64bSingle-copy atomic 64-byte LoadLD64B <Xt>, [<Xn|SP> {, #0}]usublb�	Subtract the even-numbered unsigned elements of the second source vector from the corresponding unsigned elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%USUBLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>ldurh!Load register halfword (unscaled) LDURH <Wt>, [<Xn|SP>{, #<simm>}]sqdmull5Signed saturating doubling multiply long (by element)5SQDMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>],SQDMULL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]!SQDMULL <Va><d>, <Vb><n>, <Vb><m>,SQDMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>rcwsswp3Read check write software swap doubleword in memoryRCWSSWP <Xs>, <Xt>, [<Xn|SP>]RCWSSWPA <Xs>, <Xt>, [<Xn|SP>]RCWSSWPAL <Xs>, <Xt>, [<Xn|SP>]RCWSSWPL <Xs>, <Xt>, [<Xn|SP>]sqcvtu�Saturate the signed integer value in each element of the two source vectors to unsigned integer value that is half the original source element width, and place the results in the half-width destination elements."SQCVTU <Zd>.H, { <Zn1>.S-<Zn2>.S }*SQCVTU <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }uaddlUnsigned add long (vector)*UADDL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>sminpSigned minimum pairwise"SMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SMINP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ld1rNLoad one single-element structure and replicate to all lanes (of one register)LD1R  { <Vt>.<T> }, [<Xn|SP>]$LD1R  { <Vt>.<T> }, [<Xn|SP>], <imm>#LD1R  { <Vt>.<T> }, [<Xn|SP>], <Xm>ldaddb,Atomic add on byte in memory, without return2STADDB <Ws>, [<Xn|SP>]LDADDB  <Ws>, WZR, [<Xn|SP>]4STADDLB <Ws>, [<Xn|SP>]LDADDLB  <Ws>, WZR, [<Xn|SP>]sminqv�%Signed minimum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as the maximum signed integer for the element size. 
SMINQV <Vd>.<T>, <Pg>, <Zn>.<Tb>fminnm&Floating-point minimum number (vector)#FMINNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>#FMINNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FMINNM <Hd>, <Hn>, <Hm>FMINNM <Sd>, <Sn>, <Sm>FMINNM <Dd>, <Dn>, <Dm>EFMINNM { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>EFMINNM { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>TFMINNM { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }TFMINNM { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> },FMINNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>-FMINNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>clrexClear exclusiveCLREX  {# <imm>}ld1rh�Load a single unsigned halfword from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 2 in the range 0 to 126.-LD1RH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]ldrh"Load register halfword (immediate)LDRH <Wt>, [<Xn|SP>], #<simm>LDRH <Wt>, [<Xn|SP>, #<simm>]!LDRH <Wt>, [<Xn|SP>{, #<pimm>}]8LDRH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]faddv�Floating-point add horizontally over all lanes of a vector using a recursive pairwise reduction, and place the result in the SIMD&amp;FP scalar destination register. 
Inactive elements in the source vector are treated as +0.0.FADDV <V><d>, <Pg>, <Zn>.<T>faminFloating-point absolute minimum"FAMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FAMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>SFAMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }SFAMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> },FAMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ld4q�3Contiguous load four-quadword structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,PLD4Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q, <Zt4>.Q }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]LLD4Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q, <Zt4>.Q }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #4]notBitwise NOT (vector)NOT <Vd>.<T>, <Vn>.<T>NOT <Zd>.<T>, <Pg>/M, <Zn>.<T>NOT (predicate)�Bitwise invert each active element of the source predicate, and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.NOT <Pd>.B, <Pg>/Z, <Pn>.B#EOR  <Pd>.B, <Pg>/Z, <Pn>.B, <Pg>.Buaddwt�Add the odd-numbered unsigned elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.$UADDWT <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>umulhUnsigned multiply highUMULH <Xd>, <Xn>, <Xm>,UMULH <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>"UMULH <Zd>.<T>, <Zn>.<T>, <Zm>.<T>uqshrnt�AShift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. 
Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2%UQSHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>gcspopxGCSPOPX -- A641Guarded Control Stack pop exception return recordGCSPOPX SYS   #0, C7, C7, #6{, <Xt>}notsNOTS�Bitwise invert each active element of the source predicate, and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the NOTS <Pd>.B, <Pg>/Z, <Pn>.B$EORS  <Pd>.B, <Pg>/Z, <Pn>.B, <Pg>.BfcvtzuMFloating-point convert to unsigned fixed-point, rounding toward zero (vector)FCVTZU <V><d>, <V><n>, #<fbits>#FCVTZU <Vd>.<T>, <Vn>.<T>, #<fbits>FCVTZU <Hd>, <Hn>FCVTZU <V><d>, <V><n>FCVTZU <Vd>.<T>, <Vn>.<T>FCVTZU <Vd>.<T>, <Vn>.<T>FCVTZU <Wd>, <Hn>, #<fbits>FCVTZU <Xd>, <Hn>, #<fbits>FCVTZU <Wd>, <Sn>, #<fbits>FCVTZU <Xd>, <Sn>, #<fbits>FCVTZU <Wd>, <Dn>, #<fbits>FCVTZU <Xd>, <Dn>, #<fbits>FCVTZU <Wd>, <Hn>FCVTZU <Xd>, <Hn>FCVTZU <Wd>, <Sn>FCVTZU <Xd>, <Sn>FCVTZU <Wd>, <Dn>FCVTZU <Xd>, <Dn>/FCVTZU { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }/FCVTZU { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }FCVTZU <Zd>.H, <Pg>/M, <Zn>.HFCVTZU <Zd>.S, <Pg>/M, <Zn>.HFCVTZU <Zd>.D, <Pg>/M, <Zn>.HFCVTZU <Zd>.S, <Pg>/M, <Zn>.SFCVTZU <Zd>.D, <Pg>/M, <Zn>.SFCVTZU <Zd>.S, <Pg>/M, <Zn>.DFCVTZU <Zd>.D, <Pg>/M, <Zn>.DfrintaGFloating-point round to integral, to nearest with ties to away (vector)FRINTA <Vd>.<T>, <Vn>.<T>FRINTA <Vd>.<T>, <Vn>.<T>FRINTA <Hd>, <Hn>FRINTA <Sd>, <Sn>FRINTA <Dd>, <Dn>/FRINTA { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }/FRINTA { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }index�Populates the destination vector by setting the first element to the first signed immediate integer operand and monotonically incrementing the value by the second signed immediate integer operand for each subsequent element. This instruction is unpredicated. 
INDEX <Zd>.<T>, #<imm1>, #<imm2>INDEX <Zd>.<T>, #<imm>, <R><m>INDEX <Zd>.<T>, <R><n>, #<imm>INDEX <Zd>.<T>, <R><n>, <R><m>brka�RSets destination predicate elements up to and including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to zero, depending on whether merging or zeroing predication is selected. Does not set the condition flags.BRKA <Pd>.B, <Pg>/<ZM>, <Pn>.Buqincp�Counts the number of true elements in the source predicate and then uses the result to increment the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.UQINCP <Wdn>, <Pm>.<T>UQINCP <Xdn>, <Pm>.<T>UQINCP <Zdn>.<T>, <Pm>.<T>brkpas�aIf the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to and including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the %BRKPAS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.BnopNo operationNOP sttrStore register (unprivileged)STTR <Wt>, [<Xn|SP>{, #<simm>}]STTR <Xt>, [<Xn|SP>{, #<simm>}]facltFACLT��Compare active absolute values of floating-point elements in the first source vector being less than corresponding absolute values of elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Does not set the condition flags.*FACLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-FACGT    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>moviMove immediate (vector) MOVI <Vd>.<T>, #<imm8>{, LSL #0}'MOVI <Vd>.<T>, #<imm8>{, LSL #<amount>}'MOVI <Vd>.<T>, #<imm8>{, LSL #<amount>}%MOVI <Vd>.<T>, #<imm8>, MSL #<amount>MOVI <Dd>, #<imm>MOVI <Vd>.2D, #<imm>adds&Add (extended register), setting flags1ADDS <Wd>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}2ADDS <Xd>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}&ADDS <Wd>, <Wn|WSP>, #<imm>{, <shift>}%ADDS <Xd>, <Xn|SP>, #<imm>{, <shift>}*ADDS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}*ADDS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}ld1w�Contiguous load of unsigned words to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>LD1W { <Zt1>.S-<Zt2>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]>LD1W { <Zt1>.S-<Zt4>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD1W { <Zt1>.S-<Zt2>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]:LD1W { <Zt1>.S-<Zt4>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]?LD1W { <Zt1>.S, <Zt2>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]QLD1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}];LD1W { <Zt1>.S, <Zt2>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]MLD1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]+LD1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]+LD1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]4LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]4LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]4LD1W { <Zt>.Q }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]0LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]0LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]0LD1W { <Zt>.Q }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]4LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #2]4LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]1LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, 
<mod>]1LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]2LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]*LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]ELD1W { <ZAt><HV>.S[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]csetmCSETM -- A64Conditional set maskCSETM <Wd>, <invcond>CSINV   <Wd>, WZR, WZR, <cond>CSETM <Xd>, <invcond>CSINV   <Xd>, XZR, XZR, <cond>bic%Bitwise bit clear (vector, immediate)&BIC <Vd>.<T>, #<imm8>{, LSL #<amount>}&BIC <Vd>.<T>, #<imm8>{, LSL #<amount>} BIC <Vd>.<T>, <Vn>.<T>, <Vm>.<T>)BIC <Wd>, <Wn>, <Wm>{, <shift> #<amount>})BIC <Xd>, <Xn>, <Xm>{, <shift> #<amount>}"BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B*BIC <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>BIC <Zd>.D, <Zn>.D, <Zm>.DBIC (immediate)�CBitwise clear bits using immediate with each 64-bit element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. 
This instruction is unpredicated."BIC <Zdn>.<T>, <Zdn>.<T>, #<const>*AND  <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)ldaxrLoad-acquire exclusive registerLDAXR <Wt>, [<Xn|SP>{, #0}]LDAXR <Xt>, [<Xn|SP>{, #0}]rorvRotate right variableRORV <Wd>, <Wn>, <Wm>RORV <Xd>, <Xn>, <Xm>ldeor3Atomic exclusive-OR on word or doubleword in memoryLDEOR <Ws>, <Wt>, [<Xn|SP>]LDEORA <Ws>, <Wt>, [<Xn|SP>]LDEORAL <Ws>, <Wt>, [<Xn|SP>]LDEORL <Ws>, <Wt>, [<Xn|SP>]LDEOR <Xs>, <Xt>, [<Xn|SP>]LDEORA <Xs>, <Xt>, [<Xn|SP>]LDEORAL <Xs>, <Xt>, [<Xn|SP>]LDEORL <Xs>, <Xt>, [<Xn|SP>]0STEOR <Ws>, [<Xn|SP>]LDEOR  <Ws>, WZR, [<Xn|SP>]2STEORL <Ws>, [<Xn|SP>]LDEORL  <Ws>, WZR, [<Xn|SP>]0STEOR <Xs>, [<Xn|SP>]LDEOR  <Xs>, XZR, [<Xn|SP>]2STEORL <Xs>, [<Xn|SP>]LDEORL  <Xs>, XZR, [<Xn|SP>]bcaxBit clear and exclusive-OR+BCAX <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B%BCAX <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.Dldff1d�VGather load with first-faulting behavior of doublewords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.-LDFF1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]4LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #3}]6LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #3]3LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]4LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #3],LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]smaxqv�%Signed maximum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as the minimum signed integer for the element size. 
SMAXQV <Vd>.<T>, <Pg>, <Zn>.<Tb>st2q�2Contiguous store two-quadword structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,<ST2Q { <Zt1>.Q, <Zt2>.Q }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST2Q { <Zt1>.Q, <Zt2>.Q }, <Pg>, [<Xn|SP>, <Xm>, LSL #4]whilerwnThis instruction checks two addresses for a conflict or overlap between address ranges of the form [addr,addr+WHILERW <Pd>.<T>, <Xn>, <Xm>setgpn)Memory set with tag setting, non-temporalSETGPN  [ <Xd>]!, <Xn>!, <Xs>SETGMN  [ <Xd>]!, <Xn>!, <Xs>SETGEN  [ <Xd>]!, <Xn>!, <Xs>sumops>The 8-bit integer variant works with a 32-bit element ZA tile./SUMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B/SUMOPS <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hmlapt�Multiply with overflow check the elements of the first and second source vectors and add pointer check to elements of the third source (addend) vector. Destructively place the results in the destination and third source (addend) vector.MLAPT <Zda>.D, <Zn>.D, <Zm>.Dnor�1Bitwise NOR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags."NOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bsabdlb�&Compute the absolute difference between even-numbered signed integer values in elements of the second source vector and corresponding elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%SABDLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>smcSecure monitor callSMC  # <imm>sxthSXTH -- A64Sign extend halfwordSXTH <Wd>, <Wn>SBFM   <Wd>, <Wn>, #0, #15SXTH <Xd>, <Wn>SBFM   <Xd>, <Xn>, #0, #15asrr��Reversed shift right, preserving the sign bit, active elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size. Inactive elements in the destination vector register remain unmodified.+ASRR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>f1cvt�Convert each 8-bit floating-point element of the source vector to half-precision while downscaling the value, and place the results in the corresponding 16-bit elements of the destination vectors. F1CVT scales the values by 2!F1CVT { <Zd1>.H-<Zd2>.H }, <Zn>.B!F2CVT { <Zd1>.H-<Zd2>.H }, <Zn>.BF1CVT <Zd>.H, <Zn>.BF2CVT <Zd>.H, <Zn>.Bnands�Bitwise NAND active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the $NANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bsrshlr��Shift active signed elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. 
Inactive elements in the destination vector register remain unmodified.
SRSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

ldsminb
Atomic signed minimum on byte in memory, without return
STSMINB <Ws>, [<Xn|SP>]  (equivalent to LDSMINB <Ws>, WZR, [<Xn|SP>])
STSMINLB <Ws>, [<Xn|SP>]  (equivalent to LDSMINLB <Ws>, WZR, [<Xn|SP>])

ftsmul
The
FTSMUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

stzg
Store Allocation Tag, zeroing
STZG <Xt|SP>, [<Xn|SP>], #<simm>
STZG <Xt|SP>, [<Xn|SP>, #<simm>]!
STZG <Xt|SP>, [<Xn|SP>{, #<simm>}]

sev
Send event
SEV

ldff1h
Gather load with first-faulting behavior of unsigned halfwords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
LDFF1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]
LDFF1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]
LDFF1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]
LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]
LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]
LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]
LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]
LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]
LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]
LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]
LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

ldlar
Load LOAcquire register
LDLAR <Wt>, [<Xn|SP>{, #0}]
LDLAR <Xt>, [<Xn|SP>{, #0}]

sqdecd
Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.
SQDECD <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}
SQDECD <Xdn>{, <pattern>{, MUL #<imm>}}
SQDECD <Zdn>.D{, <pattern>{, MUL #<imm>}}

pacda
Pointer Authentication Code for data address, using key A
PACDA <Xd>, <Xn|SP>
PACDZA <Xd>

uunpkhi
Unpack elements from the lowest or highest half of the source vector and then zero-extend them to place in elements of twice their size within the destination vector.
This instruction is unpredicated.
UUNPKHI <Zd>.<T>, <Zn>.<Tb>
UUNPKLO <Zd>.<T>, <Zn>.<Tb>

ubfx
UBFX -- A64
Unsigned bitfield extract
UBFX <Wd>, <Wn>, #<lsb>, #<width>  (equivalent to UBFM <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1))
UBFX <Xd>, <Xn>, #<lsb>, #<width>  (equivalent to UBFM <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1))

whilels
Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, unsigned scalar operand is lower or same as the second scalar operand and false thereafter up to the highest numbered element.
WHILELS <Pd>.<T>, <R><n>, <R><m>
WHILELS <PNd>.<T>, <Xn>, <Xm>, <vl>
WHILELS { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>

ldapursh
Load-acquire RCpc register signed halfword (unscaled)
LDAPURSH <Wt>, [<Xn|SP>{, #<simm>}]
LDAPURSH <Xt>, [<Xn|SP>{, #<simm>}]

nmatch
This instruction compares each active 8-bit or 16-bit character in the first source vector with all of the characters in the corresponding 128-bit segment of the second source vector. Where the first source element detects no matching characters in the second segment it places true in the corresponding element of the destination predicate, otherwise false. Inactive elements in the destination predicate register are set to zero. Sets the
NMATCH <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

sshllb
Shift left by immediate each even-numbered signed element of the source vector, and place the results in the overlapping double-width elements of the destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. This instruction is unpredicated.
SSHLLB <Zd>.<T>, <Zn>.<Tb>, #<const>

sqneg
Signed saturating negate
SQNEG <V><d>, <V><n>
SQNEG <Vd>.<T>, <Vn>.<T>
SQNEG <Zd>.<T>, <Pg>/M, <Zn>.<T>

prfb
Gather prefetch of bytes from the active memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31.
Inactive addresses are not prefetched from memory.
PRFB <prfop>, <Pg>, [<Zn>.S{, #<imm>}]
PRFB <prfop>, <Pg>, [<Zn>.D{, #<imm>}]
PRFB <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFB <prfop>, <Pg>, [<Xn|SP>, <Xm>]
PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]
PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]
PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.D]

ld1rob
Load thirty-two contiguous bytes to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address.
LD1ROB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]
LD1ROB { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

ldnt1sw
Gather load non-temporal of signed words to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
LDNT1SW { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]

saddlb
Add the corresponding even-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.
SADDLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

sha1m
SHA1 hash update (majority)
SHA1M <Qd>, <Sn>, <Vm>.4S

fnmla
Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of the third source (addend) vector without intermediate rounding. Destructively place the negated results in the destination and third source (addend) vector.
Inactive elements in the destination vector register remain unmodified.
FNMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

stllrh
Store LORelease register halfword
STLLRH <Wt>, [<Xn|SP>{, #0}]

uqincb
Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.
UQINCB <Wdn>{, <pattern>{, MUL #<imm>}}
UQINCB <Xdn>{, <pattern>{, MUL #<imm>}}

fcmp
Floating-point quiet compare (scalar)
FCMP <Hn>, <Hm>
FCMP <Hn>, #0.0
FCMP <Sn>, <Sm>
FCMP <Sn>, #0.0
FCMP <Dn>, <Dm>
FCMP <Dn>, #0.0

revd
Reverse the order of 64-bit doublewords within each active element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
REVD <Zd>.Q, <Pg>/M, <Zn>.Q

usmlall
This unsigned by signed integer multiply-add long-long instruction multiplies each unsigned 8-bit element in the one, two, or four first source vectors with each signed 8-bit indexed element of the second source vector, widens each product to 32-bits and destructively adds these values to the corresponding 32-bit elements of the ZA quad-vector groups.
USMLALL ZA.S[<Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]
USMLALL ZA.S[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]
USMLALL ZA.S[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]
USMLALL ZA.S[<Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B
USMLALL ZA.S[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B
USMLALL ZA.S[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B
USMLALL ZA.S[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }
USMLALL ZA.S[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }

ldeorh
Atomic exclusive-OR on halfword in memory, without
return
STEORH <Ws>, [<Xn|SP>]  (equivalent to LDEORH <Ws>, WZR, [<Xn|SP>])
STEORLH <Ws>, [<Xn|SP>]  (equivalent to LDEORLH <Ws>, WZR, [<Xn|SP>])

tblq
For each 128-bit destination vector segment, reads each element of the corresponding second source (index) vector segment and uses its value to select an indexed element from the corresponding first source (table) vector segment. The indexed table element is placed in the element of the destination vector that corresponds to the index vector element. If an index value is greater than or equal to the number of elements in a 128-bit vector segment then it places zero in the corresponding destination vector element. This instruction is unpredicated.
TBLQ <Zd>.<T>, { <Zn>.<T> }, <Zm>.<T>

cmgt
Compare signed greater than (vector)
CMGT D<d>, D<n>, D<m>
CMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
CMGT D<d>, D<n>, #0
CMGT <Vd>.<T>, <Vn>.<T>, #0

smmla
Signed 8-bit integer matrix multiply-accumulate (vector)
SMMLA <Vd>.4S, <Vn>.16B, <Vm>.16B
SMMLA <Zda>.S, <Zn>.B, <Zm>.B
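
The segment-wise lookup described for TBLQ above can be sketched in plain Python. This is only an illustrative model under stated assumptions, not the architectural pseudocode: the function name tblq, the use of Python lists for vectors, and the elem_bits parameter are all hypothetical, and the vector length is assumed to be a whole number of 128-bit segments.

```python
def tblq(table, index, elem_bits=8):
    """Model of a per-128-bit-segment table lookup (hypothetical helper).

    table and index are equal-length lists of non-negative integers,
    one entry per vector element of elem_bits bits each.
    """
    per_seg = 128 // elem_bits  # elements in one 128-bit segment
    if len(table) != len(index) or len(table) % per_seg != 0:
        raise ValueError("vectors must be equal-length multiples of one segment")

    result = []
    for base in range(0, len(table), per_seg):
        for i in range(per_seg):
            idx = index[base + i]
            # An index that is out of range for the segment yields zero;
            # otherwise select from the same segment of the table vector.
            result.append(table[base + idx] if idx < per_seg else 0)
    return result
```

With 32-bit elements each segment holds four elements, so an index of 4 or more selects zero, and indices never reach across into a neighbouring segment.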