asm-lsp 0.10.1 - Docs.rs

decb
Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination.
DECB <Xdn>{, <pattern>{, MUL #<imm>}}
DECD <Xdn>{, <pattern>{, MUL #<imm>}}
DECH <Xdn>{, <pattern>{, MUL #<imm>}}
DECW <Xdn>{, <pattern>{, MUL #<imm>}}

ld3b
Contiguous load three-byte structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,
LD3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

smaxqv
Signed maximum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as the minimum signed integer for the element size.
SMAXQV <Vd>.<T>, <Pg>, <Zn>.<Tb>

ptrues
Set elements of the destination predicate to true if the element number satisfies the named predicate constraint, or to false otherwise. If the constraint specifies more elements than are available at the current vector length then all elements of the destination predicate are set to false.
PTRUES <Pd>.<T>{, <pattern>}

setpn
Memory set, non-temporal
SETPN [<Xd>]!, <Xn>!, <Xs>
SETMN [<Xd>]!, <Xn>!, <Xs>
SETEN [<Xd>]!, <Xn>!, <Xs>

cpyfpwtn
Memory copy forward-only, writes unprivileged, reads and writes non-temporal
CPYFPWTN [<Xd>]!, [<Xs>]!, <Xn>!
CPYFMWTN [<Xd>]!, [<Xs>]!, <Xn>!
CPYFEWTN [<Xd>]!, [<Xs>]!, <Xn>!

fcpy
Copy a floating-point immediate into each active element in the destination vector.
Inactive elements in the destination vector register remain unmodified.
FCPY <Zd>.<T>, <Pg>/M, #<const>

fcvtns
Floating-point convert to signed integer, rounding to nearest with ties to even (vector)
FCVTNS <Hd>, <Hn>
FCVTNS <V><d>, <V><n>
FCVTNS <Vd>.<T>, <Vn>.<T>
FCVTNS <Vd>.<T>, <Vn>.<T>
FCVTNS <Wd>, <Hn>
FCVTNS <Xd>, <Hn>
FCVTNS <Wd>, <Sn>
FCVTNS <Xd>, <Sn>
FCVTNS <Wd>, <Dn>
FCVTNS <Xd>, <Dn>

ssubl
Signed subtract long
SSUBL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

cmle
Compare signed less than or equal to zero (vector)
CMLE D<d>, D<n>, #0
CMLE <Vd>.<T>, <Vn>.<T>, #0

sha512su1
SHA512 schedule update 1
SHA512SU1 <Vd>.2D, <Vn>.2D, <Vm>.2D

st64bv
Single-copy atomic 64-byte store with status result
ST64BV <Xs>, <Xt>, [<Xn|SP>]

wfi
Wait for interrupt
WFI

frint32x
Floating-point round to 32-bit integer, using current rounding mode (vector)
FRINT32X <Vd>.<T>, <Vn>.<T>
FRINT32X <Sd>, <Sn>
FRINT32X <Dd>, <Dn>

brkas
Sets destination predicate elements up to and including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the
BRKAS <Pd>.B, <Pg>/Z, <Pn>.B

fcvtnu
Floating-point convert to unsigned integer, rounding to nearest with ties to even (vector)
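The BRKAS entry above describes a "break after" rule on predicates. A minimal Python sketch of that rule follows, assuming predicates are modelled as lists of booleans with one entry per element; the function name `brka` and the list model are illustrative only, and the flag-setting part of BRKAS (truncated in the description) is not modelled.

```python
def brka(pg, pn):
    """Model the break-after rule on boolean lists (one entry per predicate element).

    pg is the governing predicate, pn the source predicate.
    """
    out = []
    seen_break = False  # True once an active, true source element has been passed
    for active, src in zip(pg, pn):
        if not active:
            out.append(False)  # zeroing predication: inactive elements become false
            continue
        out.append(not seen_break)  # true up to and including the first break
        if src:
            seen_break = True
    return out
```

With every element active, `brka([True] * 5, [False, False, True, False, True])` yields true for the first three elements and false afterwards.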
FCVTNU <Hd>, <Hn>
FCVTNU <V><d>, <V><n>
FCVTNU <Vd>.<T>, <Vn>.<T>
FCVTNU <Vd>.<T>, <Vn>.<T>
FCVTNU <Wd>, <Hn>
FCVTNU <Xd>, <Hn>
FCVTNU <Wd>, <Sn>
FCVTNU <Xd>, <Sn>
FCVTNU <Wd>, <Dn>
FCVTNU <Xd>, <Dn>

ldur
Load SIMD&FP register (unscaled offset)
LDUR <Bt>, [<Xn|SP>{, #<simm>}]
LDUR <Ht>, [<Xn|SP>{, #<simm>}]
LDUR <St>, [<Xn|SP>{, #<simm>}]
LDUR <Dt>, [<Xn|SP>{, #<simm>}]
LDUR <Qt>, [<Xn|SP>{, #<simm>}]
LDUR <Wt>, [<Xn|SP>{, #<simm>}]
LDUR <Xt>, [<Xn|SP>{, #<simm>}]

autib
Authenticate instruction address, using key B
AUTIB <Xd>, <Xn|SP>
AUTIZB <Xd>
AUTIB1716
AUTIBSP
AUTIBZ

smlsl
Signed multiply-subtract long (vector, by element)
SMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]
SMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>
SMLSL ZA.S[<Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]
SMLSL ZA.S[<Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]
SMLSL ZA.S[<Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]
SMLSL ZA.S[<Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H
SMLSL ZA.S[<Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H
SMLSL ZA.S[<Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H
SMLSL ZA.S[<Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }
SMLSL ZA.S[<Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }

autib171615
Authenticate instruction address, using key B
AUTIB171615

fabd
Floating-point absolute difference (vector)
FABD <Hd>, <Hn>, <Hm>
FABD <V><d>, <V><n>, <V><m>
FABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

strb
Store register byte (immediate)
STRB <Wt>, [<Xn|SP>], #<simm>
STRB <Wt>, [<Xn|SP>, #<simm>]!
STRB <Wt>, [<Xn|SP>{, #<pimm>}]
STRB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]
STRB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]

uminqv
Unsigned minimum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as the maximum unsigned integer for the element size.
UMINQV <Vd>.<T>, <Pg>, <Zn>.<Tb>

sudot
Dot product with signed and unsigned integers (vector, by element)
SUDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]
SUDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]
SUDOT ZA.S[<Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]
SUDOT ZA.S[<Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]
SUDOT ZA.S[<Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B
SUDOT ZA.S[<Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B

csel
Conditional select
CSEL <Wd>, <Wn>, <Wm>, <cond>
CSEL <Xd>, <Xn>, <Xm>, <cond>

extq
For each 128-bit vector segment of the result, copy the indexed byte up to and including the last byte of the corresponding first source vector segment to the bottom of the result segment, then fill the remainder of the result segment starting from the first byte of the corresponding second source vector segment. The result segments are destructively placed in the corresponding first source vector segment. This instruction is unpredicated.
EXTQ <Zdn>.B, <Zdn>.B, <Zm>.B, #<imm>

ldnf1w
Contiguous load with non-faulting behavior of unsigned words to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
LDNF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LDNF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

ssbb
SSBB -- A64
Speculative store bypass barrier
SSBB
DSB #0

fnmla
Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of the third source (addend) vector without intermediate rounding. Destructively place the negated results in the destination and third source (addend) vector.
Inactive elements in the destination vector register remain unmodified.
FNMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

lduminab
Atomic unsigned minimum on byte in memory
LDUMINAB <Ws>, <Wt>, [<Xn|SP>]
LDUMINALB <Ws>, <Wt>, [<Xn|SP>]
LDUMINB <Ws>, <Wt>, [<Xn|SP>]
LDUMINLB <Ws>, <Wt>, [<Xn|SP>]

sturh
Store register halfword (unscaled)
STURH <Wt>, [<Xn|SP>{, #<simm>}]

at
AT -- A64
Address translate
AT <at_op>, <Xt>
SYS #<op1>, C7, <Cm>, #<op2>, <Xt>

punpkhi
Unpack elements from the lowest or highest half of the source predicate and place in elements of twice their size within the destination predicate. This instruction is unpredicated.
PUNPKHI <Pd>.H, <Pn>.B
PUNPKLO <Pd>.H, <Pn>.B

stllrh
Store LORelease register halfword
STLLRH <Wt>, [<Xn|SP>{, #0}]

ldgm
Load tag multiple
LDGM <Xt>, [<Xn|SP>]

isb
Instruction synchronization barrier
ISB {<option>|#<imm>}

ld1rqb
Load sixteen contiguous bytes to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address.
LD1RQB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]
LD1RQB { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

ld3h
Contiguous load three-halfword structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,
LD3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

dupm
Unconditionally broadcast the logical bitmask immediate into each element of the destination vector. This instruction is unpredicated.
The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits.
DUPM <Zd>.<T>, #<const>

fnmsub
Floating-point negated fused multiply-subtract (scalar)
FNMSUB <Hd>, <Hn>, <Hm>, <Ha>
FNMSUB <Sd>, <Sn>, <Sm>, <Sa>
FNMSUB <Dd>, <Dn>, <Dm>, <Da>

prfh
Gather prefetch of halfwords from the active memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive addresses are not prefetched from memory.
PRFH <prfop>, <Pg>, [<Zn>.S{, #<imm>}]
PRFH <prfop>, <Pg>, [<Zn>.D{, #<imm>}]
PRFH <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFH <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #1]
PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #1]
PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #1]
PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #1]

ldseth
Atomic bit set on halfword in memory, without return
STSETH <Ws>, [<Xn|SP>]
LDSETH <Ws>, WZR, [<Xn|SP>]
STSETLH <Ws>, [<Xn|SP>]
LDSETLH <Ws>, WZR, [<Xn|SP>]

cfinv
Invert carry flag
CFINV

sttrb
Store register byte (unprivileged)
STTRB <Wt>, [<Xn|SP>{, #<simm>}]

srsra
Signed rounding shift right and accumulate (immediate)
SRSRA D<d>, D<n>, #<shift>
SRSRA <Vd>.<T>, <Vn>.<T>, #<shift>
SRSRA <Zda>.<T>, <Zn>.<T>, #<const>

shll
Shift left long (by element size)
SHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #<shift>

uhadd
Unsigned halving add
UHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
UHADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

fmin
Floating-point minimum (vector)
FMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FMIN <Hd>, <Hn>, <Hm>
FMIN <Sd>, <Sn>, <Sm>
FMIN <Dd>, <Dn>, <Dm>
FMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>
FMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>
FMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }
FMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }
FMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>
FMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

sqrdmulh
Signed saturating rounding doubling multiply returning high half (by element)
SQRDMULH <V><d>, <V><n>, <Vm>.<Ts>[<index>]
SQRDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]
SQRDMULH <V><d>, <V><n>, <V><m>
SQRDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
SQRDMULH <Zd>.<T>, <Zn>.<T>, <Zm>.<T>
SQRDMULH <Zd>.H, <Zn>.H, <Zm>.H[<imm>]
SQRDMULH <Zd>.S, <Zn>.S, <Zm>.S[<imm>]
SQRDMULH <Zd>.D, <Zn>.D, <Zm>.D[<imm>]

fmopa
The 8-bit floating-point sum of outer products and accumulate instruction works with a 16-bit element ZA tile.
FMOPA <ZAda>.H, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B
FMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B
FMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H
FMOPA <ZAda>.H, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H
FMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S, <Zm>.S
FMOPA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.D, <Zm>.D

sha256su0
SHA256 schedule update 0
SHA256SU0 <Vd>.4S, <Vn>.4S

fvdott
The instruction computes the fused sum-of-products of each vertical group of two 8-bit floating-point values held in the corresponding elements of the two first source vectors with the higher-numbered horizontal group of two 8-bit floating-point values in the indexed 32-bit group of the corresponding 128-bit segment of the second source vector.
The single-precision sum-of-products are scaled by 2
FVDOTT ZA.S[<Wv>, <offs>, VGx4], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]

mls
Multiply-subtract from accumulator (vector, by element)
MLS <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]
MLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
MLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>
MLS <Zda>.H, <Zn>.H, <Zm>.H[<imm>]
MLS <Zda>.S, <Zn>.S, <Zm>.S[<imm>]
MLS <Zda>.D, <Zn>.D, <Zm>.D[<imm>]

ld2r
Load single 2-element structure and replicate to all lanes of two registers
LD2R { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
LD2R { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>
LD2R { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>

ld1h
Contiguous load of unsigned halfwords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
LD1H { <Zt1>.H-<Zt2>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1H { <Zt1>.H-<Zt4>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1H { <Zt1>.H-<Zt2>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]
LD1H { <Zt1>.H-<Zt4>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]
LD1H { <Zt1>.H, <Zt2>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1H { <Zt1>.H, <Zt2>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]
LD1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]
LD1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]
LD1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]
LD1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]
LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]
LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]
LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]
LD1H { <ZAt><HV>.H[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]

eorv
Bitwise exclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as zero.
EORV <V><d>, <Pg>, <Zn>.<T>

st2h
Contiguous store two-halfword structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,
ST2H { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
ST2H { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

ld1sb
Gather load of signed bytes to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31.
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
LD1SB { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]
LD1SB { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]
LD1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>]
LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>]
LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>]
LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]
LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]
LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

ret
Return from subroutine
RET {<Xn>}

ldxp
Load exclusive pair of registers
LDXP <Wt1>, <Wt2>, [<Xn|SP>{, #0}]
LDXP <Xt1>, <Xt2>, [<Xn|SP>{, #0}]

fmlslb
This half-precision floating-point multiply-subtract long instruction widens the even-numbered half-precision elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding half-precision elements in the source vectors. This instruction is unpredicated.
FMLSLB <Zda>.S, <Zn>.H, <Zm>.H
FMLSLB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]

lsrr
Reversed shift right, inserting zeroes, active elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size.
Inactive elements in the destination vector register remain unmodified.
LSRR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

ldp
Load pair of SIMD&FP registers
LDP <St1>, <St2>, [<Xn|SP>], #<imm>
LDP <Dt1>, <Dt2>, [<Xn|SP>], #<imm>
LDP <Qt1>, <Qt2>, [<Xn|SP>], #<imm>
LDP <St1>, <St2>, [<Xn|SP>, #<imm>]!
LDP <Dt1>, <Dt2>, [<Xn|SP>, #<imm>]!
LDP <Qt1>, <Qt2>, [<Xn|SP>, #<imm>]!
LDP <St1>, <St2>, [<Xn|SP>{, #<imm>}]
LDP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]
LDP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]
LDP <Wt1>, <Wt2>, [<Xn|SP>], #<imm>
LDP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>
LDP <Wt1>, <Wt2>, [<Xn|SP>, #<imm>]!
LDP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!
LDP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]
LDP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

ldclrb
Atomic bit clear on byte in memory, without return
STCLRB <Ws>, [<Xn|SP>]
LDCLRB <Ws>, WZR, [<Xn|SP>]
STCLRLB <Ws>, [<Xn|SP>]
LDCLRLB <Ws>, WZR, [<Xn|SP>]

umullb
Multiply the corresponding even-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.
UMULLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>
UMULLB <Zd>.S, <Zn>.H, <Zm>.H[<imm>]
UMULLB <Zd>.D, <Zn>.S, <Zm>.S[<imm>]

tbz
Test bit and branch if zero
TBZ <R><t>, #<imm>, <label>

cmple
CMPLE (vectors)
Compare active signed integer elements in the first source vector being less than or equal to corresponding signed elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the
CMPLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

psel
If the indexed element of the second source predicate is true, place the contents of the first source predicate register into the destination predicate register, otherwise set the destination predicate to all-false.
The indexed element is determined by the sum of a general-purpose index register and an immediate, modulo the number of elements. Does not set the condition flags.
PSEL <Pd>, <Pn>, <Pm>.<T>[<Wv>, <imm>]

ror
ROR (immediate) -- A64
Rotate right (immediate)
ROR <Wd>, <Ws>, #<shift>
EXTR <Wd>, <Ws>, <Ws>, #<shift>
ROR <Xd>, <Xs>, #<shift>
EXTR <Xd>, <Xs>, <Xs>, #<shift>
ROR (register) -- A64
Rotate right (register)
ROR <Wd>, <Wn>, <Wm>
RORV <Wd>, <Wn>, <Wm>
ROR <Xd>, <Xn>, <Xm>
RORV <Xd>, <Xn>, <Xm>

prfw
Gather prefetch of words from the active memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive addresses are not prefetched from memory.
PRFW <prfop>, <Pg>, [<Zn>.S{, #<imm>}]
PRFW <prfop>, <Pg>, [<Zn>.D{, #<imm>}]
PRFW <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFW <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #2]
PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #2]
PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #2]
PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #2]

stxr
Store exclusive register
STXR <Ws>, <Wt>, [<Xn|SP>{, #0}]
STXR <Ws>, <Xt>, [<Xn|SP>{, #0}]

uabalb
Compute the absolute difference between even-numbered unsigned elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.
UABALB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

eorbt
Interleaving exclusive OR between the even-numbered elements of the first source vector register and the odd-numbered elements of the second source vector register, placing the result in the even-numbered elements of the destination vector, leaving the odd-numbered elements unchanged. This instruction is unpredicated.
EORBT <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

pacda
Pointer Authentication Code for data address, using key A
PACDA <Xd>, <Xn|SP>
PACDZA <Xd>

trcit
TRCIT -- A64
Trace instrumentation
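The ROR (immediate) entry above pairs each ROR form with an EXTR form that names the same source register twice. A minimal Python sketch of why that works for 32-bit registers follows; the helper names `extr32` and `ror32` are hypothetical, and the model assumes EXTR extracts a register-width field from the concatenation of its two sources.

```python
MASK32 = 0xFFFFFFFF


def extr32(high, low, shift):
    """Extract 32 bits starting at bit `shift` from the 64-bit value high:low."""
    concatenated = ((high & MASK32) << 32) | (low & MASK32)
    return (concatenated >> shift) & MASK32


def ror32(value, shift):
    """Rotate right by `shift` (0 to 31), written as EXTR with both sources equal."""
    return extr32(value, value, shift)
```

Because both halves of the concatenation hold the same value, the bits shifted out at the bottom reappear at the top, which is exactly a rotation.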
TRCIT <Xt>
SYS #3, C7, C2, #7, <Xt>

cpyp
Memory copy
CPYP [<Xd>]!, [<Xs>]!, <Xn>!
CPYM [<Xd>]!, [<Xs>]!, <Xn>!
CPYE [<Xd>]!, [<Xs>]!, <Xn>!

fcadd
Floating-point complex add
FCADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<rotate>
FCADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>, <const>

revd
Reverse the order of 64-bit doublewords within each active element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
REVD <Zd>.Q, <Pg>/M, <Zn>.Q

fmlalb
8-bit floating-point multiply-add long to half-precision (vector, by element)
FMLALB <Vd>.8H, <Vn>.16B, <Vm>.B[<index>]
FMLALT <Vd>.8H, <Vn>.16B, <Vm>.B[<index>]
FMLALB <Vd>.8H, <Vn>.16B, <Vm>.16B
FMLALT <Vd>.8H, <Vn>.16B, <Vm>.16B
FMLALB <Zda>.H, <Zn>.B, <Zm>.B
FMLALB <Zda>.H, <Zn>.B, <Zm>.B[<imm>]
FMLALB <Zda>.S, <Zn>.H, <Zm>.H
FMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]

st1
Store multiple single-element structures from one, two, three, or four registers
ST1 { <Vt>.<T> }, [<Xn|SP>]
ST1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]
ST1 { <Vt>.<T> }, [<Xn|SP>], <imm>
ST1 { <Vt>.<T> }, [<Xn|SP>], <Xm>
ST1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>
ST1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>
ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>
ST1 { <Vt>.B }[<index>], [<Xn|SP>]
ST1 { <Vt>.H }[<index>], [<Xn|SP>]
ST1 { <Vt>.S }[<index>], [<Xn|SP>]
ST1 { <Vt>.D }[<index>], [<Xn|SP>]
ST1 { <Vt>.B }[<index>], [<Xn|SP>], #1
ST1 { <Vt>.B }[<index>], [<Xn|SP>], <Xm>
ST1 { <Vt>.H }[<index>], [<Xn|SP>], #2
ST1 { <Vt>.H }[<index>], [<Xn|SP>], <Xm>
ST1 { <Vt>.S }[<index>], [<Xn|SP>], #4
ST1 { <Vt>.S }[<index>], [<Xn|SP>], <Xm>
ST1 { <Vt>.D }[<index>], [<Xn|SP>], #8
ST1 { <Vt>.D }[<index>], [<Xn|SP>], <Xm>

sqshrnb
Shift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's signed integer range -2
SQSHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>

fcvtps
Floating-point convert to signed integer, rounding toward plus infinity (vector)
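The SQSHRNB entry above combines a truncating right shift, saturation, and placement into even-numbered narrow elements. A minimal Python sketch of that sequence follows. It models vectors as lists of Python integers, and it assumes the saturation bounds are the usual two's-complement range of an N-bit signed integer, since the range is cut short in the description; the function name and the `narrow_bits` parameter are illustrative.

```python
def sqshrnb(src, shift, narrow_bits=8):
    """Model SQSHRNB on a list of signed wide elements.

    Returns twice as many narrow elements: results in the even-numbered
    positions, zeros in the odd-numbered positions.
    """
    # Assumed saturation bounds: the standard signed range for narrow_bits bits.
    low = -(1 << (narrow_bits - 1))
    high = (1 << (narrow_bits - 1)) - 1
    out = []
    for value in src:
        shifted = value >> shift  # arithmetic shift, result truncated (no rounding)
        out.append(max(low, min(high, shifted)))  # even-numbered narrow element
        out.append(0)  # odd-numbered narrow element is zeroed
    return out
```

For example, shifting the 16-bit inputs `[0x7FFF, -0x8000, 256, -3]` right by 4 saturates the first two results and leaves the others untouched.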
FCVTPS <Hd>, <Hn>
FCVTPS <V><d>, <V><n>
FCVTPS <Vd>.<T>, <Vn>.<T>
FCVTPS <Vd>.<T>, <Vn>.<T>
FCVTPS <Wd>, <Hn>
FCVTPS <Xd>, <Hn>
FCVTPS <Wd>, <Sn>
FCVTPS <Xd>, <Sn>
FCVTPS <Wd>, <Dn>
FCVTPS <Xd>, <Dn>

fcmle
Floating-point compare less than or equal to zero (vector)
FCMLE <Hd>, <Hn>, #0.0
FCMLE <V><d>, <V><n>, #0.0
FCMLE <Vd>.<T>, <Vn>.<T>, #0.0
FCMLE <Vd>.<T>, <Vn>.<T>, #0.0
FCMLE (vectors)
Compare active floating-point elements in the first source vector being less than or equal to corresponding elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
FCMLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

asrr
Reversed shift right, preserving the sign bit, active elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size.
Inactive elements in the destination vector register remain unmodified.
ASRR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

decp
Counts the number of true elements in the source predicate and then uses the result to decrement the scalar destination.
DECP <Xdn>, <Pm>.<T>
DECP <Zdn>.<T>, <Pm>.<T>

stp
Store pair of SIMD&FP registers
STP <St1>, <St2>, [<Xn|SP>], #<imm>
STP <Dt1>, <Dt2>, [<Xn|SP>], #<imm>
STP <Qt1>, <Qt2>, [<Xn|SP>], #<imm>
STP <St1>, <St2>, [<Xn|SP>, #<imm>]!
STP <Dt1>, <Dt2>, [<Xn|SP>, #<imm>]!
STP <Qt1>, <Qt2>, [<Xn|SP>, #<imm>]!
STP <St1>, <St2>, [<Xn|SP>{, #<imm>}]
STP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]
STP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]
STP <Wt1>, <Wt2>, [<Xn|SP>], #<imm>
STP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>
STP <Wt1>, <Wt2>, [<Xn|SP>, #<imm>]!
STP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!
STP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]
STP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

saddlb
Add the corresponding even-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.
SADDLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

bfmul
Multiply active BFloat16 elements of the second source vector to corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.
BFMUL <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.H
BFMUL <Zd>.H, <Zn>.H, <Zm>.H
BFMUL <Zd>.H, <Zn>.H, <Zm>.H[<imm>]

saddlt
Add the corresponding odd-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.
SADDLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

uxtb
Zero-extend the least-significant sub-element of each active element of the source vector, and place the results in the corresponding elements of the destination vector.
Inactive elements in the destination vector register remain unmodified.
UXTB <Zd>.<T>, <Pg>/M, <Zn>.<T>
UXTH <Zd>.<T>, <Pg>/M, <Zn>.<T>
UXTW <Zd>.D, <Pg>/M, <Zn>.D
UXTB -- A64
Unsigned extend byte
UXTB <Wd>, <Wn>
UBFM <Wd>, <Wn>, #0, #7

bfvdot
The instruction computes the sum-of-products of each vertical pair of BFloat16 values in the corresponding elements of the two first source vectors with the pair of BFloat16 values in the indexed 32-bit group of the corresponding 128-bit segment of the second source vector. The single-precision sum-of-products are destructively added to the corresponding single-precision elements of the two ZA single-vector groups.
BFVDOT ZA.S[<Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]

ldeorah
Atomic exclusive-OR on halfword in memory
LDEORAH <Ws>, <Wt>, [<Xn|SP>]
LDEORALH <Ws>, <Wt>, [<Xn|SP>]
LDEORH <Ws>, <Wt>, [<Xn|SP>]
LDEORLH <Ws>, <Wt>, [<Xn|SP>]

gcspushm
GCSPUSHM -- A64
Guarded Control Stack push
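The A64 UXTB entry above lists `UBFM <Wd>, <Wn>, #0, #7` alongside `UXTB <Wd>, <Wn>`. A minimal Python sketch of that relationship follows. The helper `ubfm32` is hypothetical and covers only the case where the second immediate is not smaller than the first, in which UBFM copies the selected bit range to the bottom of a zeroed result.

```python
def ubfm32(value, immr, imms):
    """Model UBFM for imms >= immr: copy bits immr..imms down to bit 0, zero the rest."""
    if not 0 <= immr <= imms <= 31:
        raise ValueError("this sketch only models 0 <= immr <= imms <= 31")
    width = imms - immr + 1
    return (value >> immr) & ((1 << width) - 1)


def uxtb(value):
    """Unsigned extend byte, expressed through the UBFM #0, #7 form."""
    return ubfm32(value, 0, 7)
```

Selecting bits 0 through 7 and clearing everything above them is the same as zero-extending the low byte.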
GCSPUSHM <Xt>
SYS #3, C7, C7, #0, <Xt>

brkpbs
If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the
BRKPBS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

sqdecd
Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.
SQDECD <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}
SQDECD <Xdn>{, <pattern>{, MUL #<imm>}}
SQDECD <Zdn>.D{, <pattern>{, MUL #<imm>}}

smstart
SMSTART -- A64
Enables access to Streaming SVE mode and SME architectural state
SMSTART {<option>}
MSR <pstatefield>, #1

lsl
Shift left by immediate each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1.
Inactive elements in the destination vector register remain unmodified.
LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>
LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D
LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>
LSL <Zd>.<T>, <Zn>.<T>, #<const>
LSL <Zd>.<T>, <Zn>.<T>, <Zm>.D
LSL (register) -- A64
Logical shift left (register)
LSL <Wd>, <Wn>, <Wm>
LSLV <Wd>, <Wn>, <Wm>
LSL <Xd>, <Xn>, <Xm>
LSLV <Xd>, <Xn>, <Xm>
LSL (immediate) -- A64
Logical shift left (immediate)
LSL <Wd>, <Wn>, #<shift>
UBFM <Wd>, <Wn>, #(-<shift> MOD 32), #(31-<shift>)
LSL <Xd>, <Xn>, #<shift>
UBFM <Xd>, <Xn>, #(-<shift> MOD 64), #(63-<shift>)

ldsminah
Atomic signed minimum on halfword in memory
LDSMINAH <Ws>, <Wt>, [<Xn|SP>]
LDSMINALH <Ws>, <Wt>, [<Xn|SP>]
LDSMINH <Ws>, <Wt>, [<Xn|SP>]
LDSMINLH <Ws>, <Wt>, [<Xn|SP>]

whilelt
Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, signed scalar operand is less than the second scalar operand and false thereafter up to the highest numbered element.
WHILELT <Pd>.<T>, <R><n>, <R><m>
WHILELT <PNd>.<T>, <Xn>, <Xm>, <vl>
WHILELT { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>

addpt
Add checked pointer
ADDPT <Xd|SP>, <Xn|SP>, <Xm>{, LSL #<amount>}
ADDPT <Zdn>.D, <Pg>/M, <Zdn>.D, <Zm>.D
ADDPT <Zd>.D, <Zn>.D, <Zm>.D

b
Branch conditionally
B.<cond> <label>
B <label>

ldsminh
Atomic signed minimum on halfword in memory, without return
STSMINH <Ws>, [<Xn|SP>]
LDSMINH <Ws>, WZR, [<Xn|SP>]
STSMINLH <Ws>, [<Xn|SP>]
LDSMINLH <Ws>, WZR, [<Xn|SP>]

autda
Authenticate data address, using key A
AUTDA <Xd>, <Xn|SP>
AUTDZA <Xd>

fcvtzu
Floating-point convert to unsigned fixed-point, rounding toward zero (vector)
FCVTZU <V><d>, <V><n>, #<fbits>
FCVTZU <Vd>.<T>, <Vn>.<T>, #<fbits>
FCVTZU <Hd>, <Hn>
FCVTZU <V><d>, <V><n>
FCVTZU <Vd>.<T>, <Vn>.<T>
FCVTZU <Vd>.<T>, <Vn>.<T>
FCVTZU <Wd>, <Hn>, #<fbits>
FCVTZU <Xd>, <Hn>, #<fbits>
FCVTZU <Wd>, <Sn>, #<fbits>
FCVTZU <Xd>, <Sn>, #<fbits>
FCVTZU <Wd>, <Dn>, #<fbits>
FCVTZU <Xd>, <Dn>, #<fbits>
FCVTZU <Wd>, <Hn>
FCVTZU <Xd>, <Hn>
FCVTZU <Wd>, <Sn>
FCVTZU <Xd>, <Sn>
FCVTZU <Wd>, <Dn>
FCVTZU <Xd>, <Dn>
FCVTZU { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }
FCVTZU { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }
FCVTZU <Zd>.H, <Pg>/M, <Zn>.H
FCVTZU <Zd>.S, <Pg>/M, <Zn>.H
FCVTZU <Zd>.D, <Pg>/M, <Zn>.H
FCVTZU <Zd>.S, <Pg>/M, <Zn>.S
FCVTZU <Zd>.D, <Pg>/M, <Zn>.S
FCVTZU <Zd>.S, <Pg>/M, <Zn>.D
FCVTZU <Zd>.D, <Pg>/M, <Zn>.D

srhadd
Signed rounding halving add
SRHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
SRHADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

fcvtms
Floating-point convert to signed integer, rounding toward minus infinity (vector)
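The FCVTZU forms above that take `#<fbits>` produce an unsigned fixed-point result. A minimal Python sketch of that conversion for a scalar follows, assuming `<fbits>` is the number of fractional bits, that out-of-range inputs saturate, and that NaN converts to zero; the function name and the `width` parameter are illustrative, and floating-point exception flags are not modelled.

```python
import math


def fcvtzu_fixed(x, fbits, width=32):
    """Convert float x to unsigned fixed-point with `fbits` fractional bits,
    rounding toward zero and saturating to the `width`-bit unsigned range."""
    if math.isnan(x):
        return 0  # assumption of this sketch: NaN converts to zero
    scaled = x * (2.0 ** fbits)  # scaling by a power of two is exact unless it overflows
    if scaled <= 0:
        return 0  # negative inputs saturate to the bottom of the unsigned range
    max_value = (1 << width) - 1
    if scaled >= max_value:
        return max_value  # also covers +infinity
    return math.trunc(scaled)  # round toward zero
```

For instance, 1.75 with four fractional bits becomes 28, the integer encoding of 1.75 in units of 1/16.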
FCVTMS <Hd>, <Hn>
FCVTMS <V><d>, <V><n>
FCVTMS <Vd>.<T>, <Vn>.<T>
FCVTMS <Vd>.<T>, <Vn>.<T>
FCVTMS <Wd>, <Hn>
FCVTMS <Xd>, <Hn>
FCVTMS <Wd>, <Sn>
FCVTMS <Xd>, <Sn>
FCVTMS <Wd>, <Dn>
FCVTMS <Xd>, <Dn>

bf1cvtlt
Convert each odd-numbered 8-bit floating-point element of the source vector to BFloat16 while downscaling the value, and place the results in the overlapping 16-bit elements of the destination vector. BF1CVTLT scales the values by 2
BF1CVTLT <Zd>.H, <Zn>.B
BF2CVTLT <Zd>.H, <Zn>.B

sqrdcmlah
Multiply without saturation the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of the integral numbers in the first source vector by the corresponding complex number in the second source vector rotated by 0, 90, 180 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation.
SQRDCMLAH <Zda>.<T>, <Zn>.<T>, <Zm>.<T>, <const>
SQRDCMLAH <Zda>.H, <Zn>.H, <Zm>.H[<imm>], <const>
SQRDCMLAH <Zda>.S, <Zn>.S, <Zm>.S[<imm>], <const>

rcwcas
Read check write compare and swap doubleword in memory
RCWCAS <Xs>, <Xt>, [<Xn|SP>]
RCWCASA <Xs>, <Xt>, [<Xn|SP>]
RCWCASAL <Xs>, <Xt>, [<Xn|SP>]
RCWCASL <Xs>, <Xt>, [<Xn|SP>]

sumlall
This signed by unsigned integer multiply-add long-long instruction multiplies each signed 8-bit element in the one, two, or four first source vectors with each unsigned 8-bit indexed element of the second source vector, widens each product to 32-bits and destructively adds these values to the corresponding 32-bit elements of the ZA quad-vector groups.
SUMLALL ZA.S[<Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]
SUMLALL ZA.S[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]
SUMLALL ZA.S[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]
SUMLALL ZA.S[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B
SUMLALL ZA.S[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B

swpp
Swap quadword in memory
SWPP <Xt1>, <Xt2>, [<Xn|SP>]
SWPPA <Xt1>, <Xt2>, [<Xn|SP>]
SWPPAL <Xt1>, <Xt2>, [<Xn|SP>]
SWPPL <Xt1>, <Xt2>, [<Xn|SP>]

clasta
From the source vector register extract the element after the last active element, or if the last active element is the final element extract element zero, and then zero-extend that element to destructively place in the destination and first source general-purpose register. If there are no active elements then destructively zero-extend the least significant element-size bits of the destination and first source general-purpose register.
CLASTA <R><dn>, <Pg>, <R><dn>, <Zm>.<T>
CLASTA <V><dn>, <Pg>, <V><dn>, <Zm>.<T>
CLASTA <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>

sqdmlslb
Multiply then double the corresponding even-numbered signed elements of the first and second source vectors. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2
SQDMLSLB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>
SQDMLSLB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]
SQDMLSLB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]

smov
Signed move vector element to general-purpose register
SMOV <Wd>, <Vn>.<Ts>[<index>]
SMOV <Xd>, <Vn>.<Ts>[<index>]

ldar
Load-acquire register
LDAR <Wt>, [<Xn|SP>{, #0}]
LDAR <Xt>, [<Xn|SP>{, #0}]

ssublbt
Subtract the odd-numbered signed elements of the second source vector from the even-numbered signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.
SSUBLBT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

pfirst
Sets the first active element in the destination predicate to true, otherwise elements from the source predicate are passed through unchanged. Sets the
PFIRST <Pdn>.B, <Pg>, <Pdn>.B

nor
Bitwise NOR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero.
Does not set the condition flags."NOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.BcasbCompare and swap byte in memory CASB <Ws>, <Wt>, [<Xn|SP>{, #0}]!CASAB <Ws>, <Wt>, [<Xn|SP>{, #0}]"CASALB <Ws>, <Wt>, [<Xn|SP>{, #0}]!CASLB <Ws>, <Wt>, [<Xn|SP>{, #0}]frintxLFloating-point round to integral exact, using current rounding mode (vector)FRINTX <Vd>.<T>, <Vn>.<T>FRINTX <Vd>.<T>, <Vn>.<T>FRINTX <Hd>, <Hn>FRINTX <Sd>, <Sn>FRINTX <Dd>, <Dn>stzgm&Store Allocation Tag and zero multipleSTZGM <Xt>, [<Xn|SP>]ldclr0Atomic bit clear on word or doubleword in memoryLDCLR <Ws>, <Wt>, [<Xn|SP>]LDCLRA <Ws>, <Wt>, [<Xn|SP>]LDCLRAL <Ws>, <Wt>, [<Xn|SP>]LDCLRL <Ws>, <Wt>, [<Xn|SP>]LDCLR <Xs>, <Xt>, [<Xn|SP>]LDCLRA <Xs>, <Xt>, [<Xn|SP>]LDCLRAL <Xs>, <Xt>, [<Xn|SP>]LDCLRL <Xs>, <Xt>, [<Xn|SP>]0STCLR <Ws>, [<Xn|SP>]LDCLR  <Ws>, WZR, [<Xn|SP>]2STCLRL <Ws>, [<Xn|SP>]LDCLRL  <Ws>, WZR, [<Xn|SP>]0STCLR <Xs>, [<Xn|SP>]LDCLR  <Xs>, XZR, [<Xn|SP>]2STCLRL <Xs>, [<Xn|SP>]LDCLRL  <Xs>, XZR, [<Xn|SP>]fcvtasXFloating-point convert to signed integer, rounding to nearest with ties to away (vector)
FCVTAS <Hd>, <Hn>FCVTAS <V><d>, <V><n>FCVTAS <Vd>.<T>, <Vn>.<T>FCVTAS <Vd>.<T>, <Vn>.<T>FCVTAS <Wd>, <Hn>FCVTAS <Xd>, <Hn>FCVTAS <Wd>, <Sn>FCVTAS <Xd>, <Sn>FCVTAS <Wd>, <Dn>FCVTAS <Xd>, <Dn>bslBitwise select BSL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>$BSL <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.Dprfum!Prefetch memory (unscaled offset)/PRFUM  ( <prfop>|#<imm5>), [<Xn|SP>{, #<simm>}]faddqv�(Floating-point addition of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as +0.0. FADDQV <Vd>.<T>, <Pg>, <Zn>.<Tb>aesdAES single round decryptionAESD <Vd>.16B, <Vn>.16BAESD <Zdn>.B, <Zdn>.B, <Zm>.Bsttrh&Store register halfword (unprivileged) STTRH <Wt>, [<Xn|SP>{, #<simm>}]st26Store multiple 2-element structures from two registers'ST2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>].ST2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>-ST2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>,ST2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>],ST2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>],ST2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>],ST2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>]0ST2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], #22ST2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], <Xm>0ST2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], #42ST2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], <Xm>0ST2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], #82ST2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], <Xm>1ST2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], #162ST2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], <Xm>sqcvtn�Saturate the signed integer value in each element of the group of two source vectors to half the original source element width, and place the two-way interleaved results in the half-width destination elements."SQCVTN <Zd>.H, { <Zn1>.S-<Zn2>.S }*SQCVTN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }brka�RSets destination predicate elements up to and including the first 
active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to zero, depending on whether merging or zeroing predication is selected. Does not set the condition flags.BRKA <Pd>.B, <Pg>/<ZM>, <Pn>.Bnands�Bitwise NAND active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the $NANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bumlsll��This unsigned integer multiply-subtract long-long instruction multiplies each unsigned 8-bit or 16-bit element in the one, two, or four first source vectors with each unsigned 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively subtracts these values from the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups.=UMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]=UMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>], <Zn>.H, <Zm>.H[<index>]RUMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RUMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RUMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]RUMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]<UMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>], <Zn>.<Tb>, <Zm>.<Tb>TUMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>TUMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>dUMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }dUMLSLL  ZA. 
<T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }bfmlslb��This BFloat16 floating-point multiply-subtract long instruction widens the even-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. This instruction is unpredicated.BFMLSLB <Zda>.S, <Zn>.H, <Zm>.H&BFMLSLB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]saddv�Signed add horizontally across all lanes of a vector, and place the result in the SIMD&amp;FP scalar destination register. Narrow elements are first sign-extended to 64 bits. Inactive elements in the source vector are treated as zero.SADDV <Dd>, <Pg>, <Zn>.<T>wfitWait for interrupt with timeout	WFIT <Xt>facge>Floating-point absolute compare greater than or equal (vector)FACGE <Hd>, <Hn>, <Hm>FACGE <V><d>, <V><n>, <V><m>"FACGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FACGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>umnegl
UMNEGL -- A64Unsigned multiply-negate longUMNEGL <Xd>, <Wn>, <Wm>UMSUBL   <Xd>, <Wn>, <Wm>, XZRtlbiTLBI -- A64TLB invalidate operationTLBI <tlbi_op>{, <Xt>}(SYS   #<op1>, <Cn>, <Cm>, #<op2>{, <Xt>}mad�&Multiply the corresponding active elements of the first and second source vectors and add to elements of the third (addend) vector. Destructively place the results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.)MAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>sqsubr�'Subtract active signed elements of the first source vector from corresponding signed elements of the second source vector and destructively place the results in the corresponding elements of the first source vector. Each result element is saturated to the N-bit element's signed integer range -2-SQSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>fmla=Floating-point fused multiply-add to accumulator (by element) FMLA <Hd>, <Hn>, <Vm>.H[<index>]'FMLA <V><d>, <V><n>, <Vm>.<Ts>[<index>](FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]+FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]!FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>*FMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>#FMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>]#FMLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>]#FMLA <Zda>.D, <Zn>.D, <Zm>.D[<imm>]IFMLA    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IFMLA    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.S-<Zn2>.S }, <Zm>.S[<index>]IFMLA    ZA.D[ <Wv>, <offs>{, VGx2}], { <Zn1>.D-<Zn2>.D }, <Zm>.D[<index>]IFMLA    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]IFMLA    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.S-<Zn4>.S }, <Zm>.S[<index>]IFMLA    ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.D-<Zn4>.D }, <Zm>.D[<index>]HFMLA    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, <Zm>.<T>@FMLA    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HHFMLA    ZA. 
<T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, <Zm>.<T>@FMLA    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HWFMLA    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }MFMLA    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }WFMLA    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }MFMLA    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }casp7Compare and swap pair of words or doublewords in memory4CASP <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{, #0}]5CASPA <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{, #0}]6CASPAL <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{, #0}]5CASPL <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{, #0}]4CASP <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{, #0}]5CASPA <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{, #0}]6CASPAL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{, #0}]5CASPL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{, #0}]	sha512su0SHA512 schedule update 0SHA512SU0 <Vd>.2D, <Vn>.2Dbfsub�#Subtract active BFloat16 elements of the second source vector from corresponding BFloat16 elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.&BFSUB <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.HBFSUB <Zd>.H, <Zn>.H, <Zm>.H8BFSUB   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zm1>.H-<Zm2>.H }8BFSUB   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zm1>.H-<Zm4>.H }fcmlt.Floating-point compare less than zero (vector)FCMLT <Hd>, <Hn>, #0.0FCMLT <V><d>, <V><n>, #0.0FCMLT <Vd>.<T>, <Vn>.<T>, #0.0FCMLT <Vd>.<T>, <Vn>.<T>, #0.0FCMLT (vectors)�\Compare active floating-point elements in the first source vector being less than corresponding elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Does not set the condition flags.*FCMLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-FCMGT    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>st1d�Contiguous store of doublewords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.<ST1D { <Zt1>.D-<Zt2>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]<ST1D { <Zt1>.D-<Zt4>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST1D { <Zt1>.D-<Zt2>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]8ST1D { <Zt1>.D-<Zt4>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]=ST1D { <Zt1>.D, <Zt2>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]OST1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]9ST1D { <Zt1>.D, <Zt2>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]KST1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3])ST1D { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]2ST1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]2ST1D { <Zt>.Q }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}].ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3].ST1D { <Zt>.Q }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]2ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #3]/ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]0ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #3](ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]CST1D { <ZAt><HV>.D[<Ws>, <offs>] }, <Pg>, [<Xn|SP>{, <Xm>, LSL #3}]fmadd*Floating-point fused multiply-add (scalar)FMADD <Hd>, <Hn>, <Hm>, <Ha>FMADD <Sd>, <Sn>, <Sm>, <Sa>FMADD <Dd>, <Dn>, <Dm>, <Da>sqshlr��Shift active signed elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. 
Each result element is saturated to the N-bit element's signed integer range -2-SQSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>csnegConditional select negationCSNEG <Wd>, <Wn>, <Wm>, <cond>CSNEG <Xd>, <Xn>, <Xm>, <cond>fmaxnm&Floating-point maximum number (vector)#FMAXNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>#FMAXNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FMAXNM <Hd>, <Hn>, <Hm>FMAXNM <Sd>, <Sn>, <Sm>FMAXNM <Dd>, <Dn>, <Dm>EFMAXNM { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>EFMAXNM { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>TFMAXNM { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }TFMAXNM { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> },FMAXNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>-FMAXNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>decd�Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements.'DECD <Zdn>.D{, <pattern>{, MUL #<imm>}}'DECH <Zdn>.H{, <pattern>{, MUL #<imm>}}'DECW <Zdn>.S{, <pattern>{, MUL #<imm>}}sadalp'Signed add and accumulate long pairwiseSADALP <Vd>.<Ta>, <Vn>.<Tb>#SADALP <Zda>.<T>, <Pg>/M, <Zn>.<Tb>msb�-Multiply the corresponding active elements of the first and second source vectors and subtract from elements of the third (addend) vector. Destructively place the results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.)MSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>sabalt�Compute the absolute difference between odd-numbered signed elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector. 
This instruction is unpredicated.&SABALT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>nors�Bitwise NOR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the #NORS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bsaddlbt��Add the even-numbered signed elements of the first source vector to the odd-numbered signed elements of the second source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.&SADDLBT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>shsubr�5Subtract active signed elements of the first source vector from corresponding signed elements of the second source vector, shift right one bit, and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.-SHSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>svdot��The signed integer vertical dot product instruction computes the vertical dot product of the corresponding two signed 16-bit integer values held in the two first source vectors and two signed 16-bit integer values in the corresponding indexed 32-bit element of the second source vector. 
The widened dot product results are destructively added to the corresponding 32-bit element of the ZA single-vector groups.ISVDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]ISVDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]ISVDOT   ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]zip1Zip vectors (primary)!ZIP1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>cpyfprtnKMemory copy forward-only, reads unprivileged, reads and writes non-temporal"CPYFPRTN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFMRTN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFERTN  [ <Xd>]!, [<Xs>]!, <Xn>!cpypwtrn4Memory copy, writes unprivileged, reads non-temporal"CPYPWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYMWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYEWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!cnegCNEG -- A64Conditional negateCNEG <Wd>, <Wn>, <invcond> CSNEG   <Wd>, <Wn>, <Wm>, <cond>CNEG <Xd>, <Xn>, <invcond> CSNEG   <Xd>, <Xn>, <Xm>, <cond>negNegate (vector)NEG  D <d>, D<n>NEG <Vd>.<T>, <Vn>.<T>NEG <Zd>.<T>, <Pg>/M, <Zn>.<T>NEG (shifted register) -- A64Negate (shifted register)#NEG <Wd>, <Wm>{, <shift> #<amount>}*SUB   <Wd>, WZR, <Wm>{, <shift> #<amount>}#NEG <Xd>, <Xm>{, <shift> #<amount>}*SUB   <Xd>, XZR, <Xm>{, <shift> #<amount>}smulhSigned multiply highSMULH <Xd>, <Xn>, <Xm>,SMULH <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>"SMULH <Zd>.<T>, <Zn>.<T>, <Zm>.<T>uqxtn"Unsigned saturating extract narrowUQXTN <Vb><d>, <Va><n>UQXTN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>cadd��Add the real and imaginary components of the integral complex numbers from the first source vector to the complex numbers from the second source vector which have first been rotated by 90 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation, equivalent to multiplying the complex numbers in the second source vector by ±,CADD <Zdn>.<T>, <Zdn>.<T>, <Zm>.<T>, <const>ldumaxb9Atomic unsigned maximum on byte in memory, without return4STUMAXB <Ws>, [<Xn|SP>]LDUMAXB  <Ws>, WZR, 
[<Xn|SP>]6STUMAXLB <Ws>, [<Xn|SP>]LDUMAXLB  <Ws>, WZR, [<Xn|SP>]autia171615-Authenticate instruction address, using key AAUTIA171615 ic	IC -- A64Instruction cache operationIC <ic_op>{, <Xt>}&SYS   #<op1>, C7, <Cm>, #<op2>{, <Xt>}uaddlpUnsigned add long pairwiseUADDLP <Vd>.<Ta>, <Vn>.<Tb>uaba+Unsigned absolute difference and accumulate!UABA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"UABA <Zda>.<T>, <Zn>.<T>, <Zm>.<T>rev32)Reverse elements in 32-bit words (vector)REV32 <Vd>.<T>, <Vn>.<T>REV32 <Xd>, <Xn>fmlallbbT8-bit floating-point multiply-add long-long to single-precision (vector, by element)
+FMLALLBB <Vd>.4S, <Vn>.16B, <Vm>.B[<index>]+FMLALLBT <Vd>.4S, <Vn>.16B, <Vm>.B[<index>]+FMLALLTB <Vd>.4S, <Vn>.16B, <Vm>.B[<index>]+FMLALLTT <Vd>.4S, <Vn>.16B, <Vm>.B[<index>]$FMLALLBB <Vd>.4S, <Vn>.16B, <Vm>.16B$FMLALLBT <Vd>.4S, <Vn>.16B, <Vm>.16B$FMLALLTB <Vd>.4S, <Vn>.16B, <Vm>.16B$FMLALLTT <Vd>.4S, <Vn>.16B, <Vm>.16B FMLALLBB <Zda>.S, <Zn>.B, <Zm>.B'FMLALLBB <Zda>.S, <Zn>.B, <Zm>.B[<imm>]ctzCount trailing zerosCTZ <Wd>, <Wn>CTZ <Xd>, <Xn>sqinch�kDetermines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQINCH <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQINCH <Xdn>{, <pattern>{, MUL #<imm>}})SQINCH <Zdn>.H{, <pattern>{, MUL #<imm>}}subpSubtract pointerSUBP <Xd>, <Xn|SP>, <Xm|SP>rshrnb�aShift each unsigned integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. This instruction is unpredicated.$RSHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>movprfxThe predicated %MOVPRFX <Zd>.<T>, <Pg>/<ZM>, <Zn>.<T>MOVPRFX <Zd>, <Zn>eortb�?Interleaving exclusive OR between the odd-numbered elements of the first source vector register and the even-numbered elements of the second source vector register, placing the result in the odd-numbered elements of the destination vector, leaving the even-numbered elements unchanged. 
This instruction is unpredicated."EORTB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>fcvtx�CConvert active double-precision floating-point elements from the source vector to single-precision, rounding to Odd, and place the results in the even-numbered 32-bit elements of the destination vector, while setting the odd-numbered elements to zero. Inactive elements in the destination vector register remain unmodified.FCVTX <Zd>.S, <Pg>/M, <Zn>.Dsmlalt�Multiply the corresponding odd-numbered signed elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.&SMLALT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%SMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%SMLALT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]frintaGFloating-point round to integral, to nearest with ties to away (vector)FRINTA <Vd>.<T>, <Vn>.<T>FRINTA <Vd>.<T>, <Vn>.<T>FRINTA <Hd>, <Hn>FRINTA <Sd>, <Sn>FRINTA <Dd>, <Dn>/FRINTA { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }/FRINTA { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }umlall�~This unsigned integer multiply-add long-long instruction multiplies each unsigned 8-bit or 16-bit element in the one, two, or four first source vectors with each unsigned 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively adds these values to the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups.=UMLALL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]=UMLALL  ZA.D[ <Wv>, <offs1>:<offs4>], <Zn>.H, <Zm>.H[<index>]RUMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RUMLALL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RUMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]RUMLALL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]<UMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>], <Zn>.<Tb>, <Zm>.<Tb>TUMLALL  ZA. 
<T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>TUMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>dUMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }dUMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }xtnExtract narrowXTN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>ldnf1d��Contiguous load with non-faulting behavior of doublewords to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.6LDNF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]andBitwise AND (vector)	 AND <Vd>.<T>, <Vn>.<T>, <Vm>.<T>AND <Wd|WSP>, <Wn>, #<imm>AND <Xd|SP>, <Xn>, #<imm>)AND <Wd>, <Wn>, <Wm>{, <shift> #<amount>})AND <Xd>, <Xn>, <Xm>{, <shift> #<amount>}"AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B*AND <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>"AND <Zdn>.<T>, <Zdn>.<T>, #<const>AND <Zd>.D, <Zn>.D, <Zm>.Dbfmax�Determine the maximum of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors.:BFMAX { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, <Zm>.H:BFMAX { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, <Zm>.HGBFMAX { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, { <Zm1>.H-<Zm2>.H }GBFMAX { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, { <Zm1>.H-<Zm4>.H }&BFMAX <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.Hldnf1sw��Contiguous load with non-faulting behavior of signed words to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is 
multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.7LDNF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]smcSecure monitor callSMC  # <imm>ld1q�Gather load of quadwords to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.)LD1Q { <Zt>.Q }, <Pg>/Z, [<Zn>.D{, <Xm>}]ELD1Q { <ZAt><HV>.Q[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #4}]stxp!Store exclusive pair of registers(STXP <Ws>, <Wt1>, <Wt2>, [<Xn|SP>{, #0}](STXP <Ws>, <Xt1>, <Xt2>, [<Xn|SP>{, #0}]ursqrte(Unsigned reciprocal square root estimateURSQRTE <Vd>.<T>, <Vn>.<T>URSQRTE <Zd>.S, <Pg>/M, <Zn>.SrbitReverse bit order (vector)RBIT <Vd>.<T>, <Vn>.<T>RBIT <Wd>, <Wn>RBIT <Xd>, <Xn>RBIT <Zd>.<T>, <Pg>/M, <Zn>.<T>histseg�*This instruction compares each 8-bit byte element of the first source vector with all of the elements in the corresponding 128-bit segment of the second source vector and places the count of matching elements in the corresponding element of the destination vector. 
This instruction is unpredicated.HISTSEG <Zd>.B, <Zn>.B, <Zm>.Blduminh=Atomic unsigned minimum on halfword in memory, without return4STUMINH <Ws>, [<Xn|SP>]LDUMINH  <Ws>, WZR, [<Xn|SP>]6STUMINLH <Ws>, [<Xn|SP>]LDUMINLH  <Ws>, WZR, [<Xn|SP>]uxtlUXTL, UXTL2 -- A64Unsigned extend longUXTL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>#USHLL {2}  <Vd>.<Ta>, <Vn>.<Tb>, #0	cpyfpwtrnAMemory copy forward-only, writes unprivileged, reads non-temporal#CPYFPWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFMWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFEWTRN  [ <Xd>]!, [<Xs>]!, <Xn>!st3w�2Contiguous store three-word structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,EST3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]AST3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]uqdech�*Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. 
The result is saturated to the general-purpose register's unsigned integer range.'UQDECH <Wdn>{, <pattern>{, MUL #<imm>}}'UQDECH <Xdn>{, <pattern>{, MUL #<imm>}})UQDECH <Zdn>.H{, <pattern>{, MUL #<imm>}}ld1rqh�Load eight contiguous halfwords to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address..LD1RQH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]2LD1RQH { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]sqxtunb�Saturate the signed integer value in each source element to an unsigned integer value that is half the original source element width, and place the results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.SQXTUNB <Zd>.<T>, <Zn>.<Tb>sabal.Signed absolute difference and accumulate long*SABAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>smullt�Multiply the corresponding odd-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%SMULLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>$SMULLT <Zd>.S, <Zn>.H, <Zm>.H[<imm>]$SMULLT <Zd>.D, <Zn>.S, <Zm>.S[<imm>]addpl�Add the current predicate register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer and place the result in the 64-bit destination general-purpose register or current stack pointer.ADDPL <Xd|SP>, <Xn|SP>, #<imm>notsNOTS�Bitwise invert each active element of the source predicate, and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Sets the NOTS <Pd>.B, <Pg>/Z, <Pn>.B$EORS  <Pd>.B, <Pg>/Z, <Pn>.B, <Pg>.BcmpCMP (extended register) -- A64Compare (extended register)*CMP <Wn|WSP>, <Wm>{, <extend> {#<amount>}}2SUBS   WZR, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}+CMP <Xn|SP>, <R><m>{, <extend> {#<amount>}}3SUBS   XZR, <Xn|SP>, <R><m>{, <extend> {#<amount>}}CMP (immediate) -- A64Compare (immediate)CMP <Wn|WSP>, #<imm>{, <shift>}'SUBS   WZR, <Wn|WSP>, #<imm>{, <shift>}CMP <Xn|SP>, #<imm>{, <shift>}&SUBS   XZR, <Xn|SP>, #<imm>{, <shift>}CMP (shifted register) -- A64Compare (shifted register)#CMP <Wn>, <Wm>{, <shift> #<amount>}+SUBS   WZR, <Wn>, <Wm>{, <shift> #<amount>}#CMP <Xn>, <Xm>{, <shift> #<amount>}+SUBS   XZR, <Xn>, <Xm>{, <shift> #<amount>}ldaprh#Load-acquire RCpc register halfwordLDAPRH <Wt>, [<Xn|SP> {, #0}]bfmla�NMultiply the corresponding active BFloat16 elements of the first and second source vectors and add to elements of the third source (addend) vector without intermediate rounding. Destructively place the results in the destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.%BFMLA <Zda>.H, <Pg>/M, <Zn>.H, <Zm>.H$BFMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>]IBFMLA   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IBFMLA   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]@BFMLA   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H@BFMLA   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HMBFMLA   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }MBFMLA   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }lasta�.If there is an active element then extract the element after the last active element modulo the number of elements from the final source vector register. If there are no active elements, extract element zero. 
Then zero-extend and place the extracted element in the destination general-purpose register.LASTA <R><d>, <Pg>, <Zn>.<T>LASTA <V><d>, <Pg>, <Zn>.<T>ldpsw"Load pair of registers signed word%LDPSW <Xt1>, <Xt2>, [<Xn|SP>], #<imm>&LDPSW <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!'LDPSW <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]uunpk�Unpack elements from one or two source vectors and then zero-extend them to place in elements of twice their size within the two or four destination vectors.(UUNPK { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<Tb>8UUNPK { <Zd1>.<T>-<Zd4>.<T> }, { <Zn1>.<Tb>-<Zn2>.<Tb> }ldnt1sb�,Gather load non-temporal of signed bytes to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.,LDNT1SB { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}],LDNT1SB { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]sel�Read active elements from the two or four first source vectors and inactive elements from the two or four second source vectors and place in the corresponding elements of the two or four destination vectors.TSEL { <Zd1>.<T>-<Zd2>.<T> }, <PNg>, { <Zn1>.<T>-<Zn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }TSEL { <Zd1>.<T>-<Zd4>.<T> }, <PNg>, { <Zn1>.<T>-<Zn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> } SEL <Pd>.B, <Pg>, <Pn>.B, <Pm>.B&SEL <Zd>.<T>, <Pv>, <Zn>.<T>, <Zm>.<T>setp
Memory set
SETP [<Xd>]!, <Xn>!, <Xs>
SETM [<Xd>]!, <Xn>!, <Xs>
SETE [<Xd>]!, <Xn>!, <Xs>
uabalt
Compute the absolute difference between odd-numbered unsigned elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.
UABALT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>
ldrsb
Load register signed byte (immediate)
LDRSB <Wt>, [<Xn|SP>], #<simm>
LDRSB <Xt>, [<Xn|SP>], #<simm>
LDRSB <Wt>, [<Xn|SP>, #<simm>]!
LDRSB <Xt>, [<Xn|SP>, #<simm>]!
LDRSB <Wt>, [<Xn|SP>{, #<pimm>}]
LDRSB <Xt>, [<Xn|SP>{, #<pimm>}]
LDRSB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]
LDRSB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]
LDRSB <Xt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]
LDRSB <Xt>, [<Xn|SP>, <Xm>{, LSL <amount>}]
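The LDRSB forms above load one byte and sign-extend it into a 32-bit W or 64-bit X destination register. As a rough illustration of the sign-extension step only (address generation, writeback and memory ordering are not modelled), here is a minimal Python sketch. The helper name `ldrsb_model` and the flat `bytes` object standing in for memory are assumptions made for this example; they are not part of asm-lsp or of the Arm specification.

```python
def ldrsb_model(memory: bytes, address: int, reg_bits: int = 32) -> int:
    """Return the bit pattern a signed-byte load would leave in a register.

    Hypothetical helper: reg_bits=32 stands in for a <Wt> destination,
    reg_bits=64 for an <Xt> destination.
    """
    if reg_bits not in (32, 64):
        raise ValueError("reg_bits must be 32 or 64")
    byte = memory[address]                           # unsigned value 0..255
    signed = byte - 0x100 if byte & 0x80 else byte   # read as two's complement
    return signed & ((1 << reg_bits) - 1)            # truncate to register width


mem = bytes([0x7F, 0x80, 0xFF])
print(hex(ldrsb_model(mem, 0)))       # 0x7f
print(hex(ldrsb_model(mem, 1)))       # 0xffffff80
print(hex(ldrsb_model(mem, 2, 64)))   # 0xffffffffffffffff
```

Bytes with the top bit clear pass through unchanged, while bytes with the top bit set have every higher register bit filled with ones.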
retaasppcrTReturn from subroutine, with enhanced pointer authentication return using a registerRETAASPPCR <Xm>RETABSPPCR <Xm>ldsetab Atomic bit set on byte in memoryLDSETAB <Ws>, <Wt>, [<Xn|SP>]LDSETALB <Ws>, <Wt>, [<Xn|SP>]LDSETB <Ws>, <Wt>, [<Xn|SP>]LDSETLB <Ws>, <Wt>, [<Xn|SP>]xarExclusive-OR and rotate&XAR <Vd>.2D, <Vn>.2D, <Vm>.2D, #<imm6>,XAR <Zdn>.<T>, <Zdn>.<T>, <Zm>.<T>, #<const>whilegt�Generate a predicate that starting from the highest numbered element is true while the decrementing value of the first, signed scalar operand is greater than the second scalar operand and false thereafter down to the lowest numbered element. WHILEGT <Pd>.<T>, <R><n>, <R><m>#WHILEGT <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILEGT { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>luti2$Lookup table read with 2-bit indices	+LUTI2 <Vd>.16B, { <Vn>.16B }, <Vm>[<index>])LUTI2 <Vd>.8H, { <Vn>.8H }, <Vm>[<index>]1LUTI2 { <Zd1>.<T>-<Zd2>.<T> }, ZT0, <Zn>[<index>]2LUTI2 { <Zd1>.<T>, <Zd2>.<T> }, ZT0, <Zn>[<index>]1LUTI2 { <Zd1>.<T>-<Zd4>.<T> }, ZT0, <Zn>[<index>]HLUTI2 { <Zd1>.<T>, <Zd2>.<T>, <Zd3>.<T>, <Zd4>.<T> }, ZT0, <Zn>[<index>]"LUTI2 <Zd>.<T>, ZT0, <Zn>[<index>]'LUTI2 <Zd>.B, { <Zn>.B }, <Zm>[<index>]'LUTI2 <Zd>.H, { <Zn>.H }, <Zm>[<index>]sqabs Signed saturating absolute valueSQABS <V><d>, <V><n>SQABS <Vd>.<T>, <Vn>.<T> SQABS <Zd>.<T>, <Pg>/M, <Zn>.<T>tsbTrace synchronization barrierTSB  CSYNC uqincd�*Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. 
The result is saturated to the general-purpose register's unsigned integer range.'UQINCD <Wdn>{, <pattern>{, MUL #<imm>}}'UQINCD <Xdn>{, <pattern>{, MUL #<imm>}})UQINCD <Zdn>.D{, <pattern>{, MUL #<imm>}}ldlarLoad LOAcquire registerLDLAR <Wt>, [<Xn|SP>{, #0}]LDLAR <Xt>, [<Xn|SP>{, #0}]sabdlSigned absolute difference long*SABDL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>uvdot��The unsigned integer vertical dot product instruction computes the vertical dot product of the corresponding two unsigned 16-bit integer values held in the two first source vectors and two unsigned 16-bit integer values in the corresponding indexed 32-bit element of the second source vector. The widened dot product results are destructively added to the corresponding 32-bit element of the ZA single-vector groups.IUVDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IUVDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]IUVDOT   ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]cmpltCMPLT (vectors)�KCompare active signed integer elements in the first source vector being less than corresponding signed elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the *CMPLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-CMPGT    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>ldtrb!Load register byte (unprivileged) LDTRB <Wt>, [<Xn|SP>{, #<simm>}]zipq1�Interleave alternating elements from low halves of the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. This instruction is unpredicated."ZIPQ1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>brkBreakpoint instructionBRK  # <imm>sdiv
Signed divideSDIV <Wd>, <Wn>, <Wm>SDIV <Xd>, <Xn>, <Xm>+SDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>brBranch to registerBR <Xn>ldff1sb�FGather load with first-faulting behavior of signed bytes to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector..LDFF1SB { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}].LDFF1SB { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]-LDFF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>}]-LDFF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>}]-LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>}]4LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]4LDFF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]-LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]sm3tt1aSM3TT1A(SM3TT1A <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]dcps2Debug change PE state to EL2DCPS2  {# <imm>}cpyfpwt-Memory copy forward-only, writes unprivileged!CPYFPWT  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFMWT  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFEWT  [ <Xd>]!, [<Xs>]!, <Xn>!ld1rqw�Load four contiguous words to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address..LD1RQW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]2LD1RQW { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]ldiapp+Load-Acquire RCpc ordered pair of registers"LDIAPP <Wt1>, <Wt2>, [<Xn|SP>], #8LDIAPP <Wt1>, <Wt2>, [<Xn|SP>]#LDIAPP <Xt1>, <Xt2>, [<Xn|SP>], #16LDIAPP <Xt1>, <Xt2>, [<Xn|SP>]uzpq1�Concatenate adjacent even-numbered elements from the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. 
This instruction is unpredicated."UZPQ1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>absAbsolute valueABS <Wd>, <Wn>ABS <Xd>, <Xn>ABS  D <d>, D<n>ABS <Vd>.<T>, <Vn>.<T>ABS <Zd>.<T>, <Pg>/M, <Zn>.<T>clastb�\From the source vector register extract the last active element, and then zero-extend that element to destructively place in the destination and first source general-purpose register. If there are no active elements then destructively zero-extend the least significant element-size bits of the destination and first source general-purpose register.'CLASTB <R><dn>, <Pg>, <R><dn>, <Zm>.<T>'CLASTB <V><dn>, <Pg>, <V><dn>, <Zm>.<T>+CLASTB <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>sbclb�dSubtract the even-numbered elements of the first source vector and the inverted 1-bit carry from the least-significant bit of the odd-numbered elements of the second source vector from the even-numbered elements of the destination and accumulator vector. The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.#SBCLB <Zda>.<T>, <Zn>.<T>, <Zm>.<T>sqrdmlshVSigned saturating rounding doubling multiply subtract returning high half (by element)+SQRDMLSH <V><d>, <V><n>, <Vm>.<Ts>[<index>]/SQRDMLSH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]SQRDMLSH <V><d>, <V><n>, <V><m>%SQRDMLSH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>&SQRDMLSH <Zda>.<T>, <Zn>.<T>, <Zm>.<T>'SQRDMLSH <Zda>.H, <Zn>.H, <Zm>.H[<imm>]'SQRDMLSH <Zda>.S, <Zn>.S, <Zm>.S[<imm>]'SQRDMLSH <Zda>.D, <Zn>.D, <Zm>.D[<imm>]smaxpSigned maximum pairwise"SMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SMAXP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>raddhn"Rounding add returning high narrow+RADDHN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>ldtrLoad register (unprivileged)LDTR <Wt>, [<Xn|SP>{, #<simm>}]LDTR <Xt>, [<Xn|SP>{, #<simm>}]smlalb�Multiply the corresponding even-numbered signed elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. 
This instruction is unpredicated.&SMLALB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%SMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%SMLALB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]srshr'Signed rounding shift right (immediate)SRSHR  D <d>, D<n>, #<shift>"SRSHR <Vd>.<T>, <Vn>.<T>, #<shift>,SRSHR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>dc	DC -- A64Data cache operationDC <dc_op>, <Xt>$SYS   #<op1>, C7, <Cm>, #<op2>, <Xt>ld1d�Contiguous load of unsigned doublewords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>LD1D { <Zt1>.D-<Zt2>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]>LD1D { <Zt1>.D-<Zt4>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD1D { <Zt1>.D-<Zt2>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]:LD1D { <Zt1>.D-<Zt4>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]?LD1D { <Zt1>.D, <Zt2>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]QLD1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}];LD1D { <Zt1>.D, <Zt2>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]MLD1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]+LD1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]4LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]4LD1D { <Zt>.Q }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]0LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]0LD1D { <Zt>.Q }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]4LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #3]1LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]2LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #3]*LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]ELD1D { <ZAt><HV>.D[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #3}]cmtst*Compare bitwise test bits nonzero (vector)CMTST  D <d>, D<n>, D<m>"CMTST <Vd>.<T>, <Vn>.<T>, <Vm>.<T>frinti�!Round to an integral floating-point value with the specified rounding option from each active floating-point element of the source vector, and place the results in 
the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.!FRINTI <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTX <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTA <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTN <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTZ <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTM <Zd>.<T>, <Pg>/M, <Zn>.<T>!FRINTP <Zd>.<T>, <Pg>/M, <Zn>.<T>FRINTI <Vd>.<T>, <Vn>.<T>FRINTI <Vd>.<T>, <Vn>.<T>FRINTI <Hd>, <Hn>FRINTI <Sd>, <Sn>FRINTI <Dd>, <Dn>sqdmlslt�Multiply then double the corresponding odd-numbered signed elements of the first and second source vectors. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2(SQDMLSLT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>'SQDMLSLT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]'SQDMLSLT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]sqdecw�kDetermines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQDECW <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQDECW <Xdn>{, <pattern>{, MUL #<imm>}})SQDECW <Zdn>.S{, <pattern>{, MUL #<imm>}}smsublSigned multiply-subtract longSMSUBL <Xd>, <Wn>, <Wm>, <Xa>uqdecp�Counts the number of true elements in the source predicate and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.UQDECP <Wdn>, <Pm>.<T>UQDECP <Xdn>, <Pm>.<T>UQDECP <Zdn>.<T>, <Pm>.<T>ldff1h�\Gather load with first-faulting behavior of unsigned halfwords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. 
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.-LDFF1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]-LDFF1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]4LDFF1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]4LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]4LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]6LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]6LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]3LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]3LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]4LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1],LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]tblTable vector lookup&TBL <Vd>.<Ta>, { <Vn>.16B }, <Vm>.<Ta>2TBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B }, <Vm>.<Ta>>TBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B }, <Vm>.<Ta>JTBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B, <Vn+3>.16B }, <Vm>.<Ta>$TBL <Zd>.<T>, { <Zn>.<T> }, <Zm>.<T>0TBL <Zd>.<T>, { <Zn1>.<T>, <Zn2>.<T> }, <Zm>.<T>sqdmlalt�Multiply then double the corresponding odd-numbered signed elements of the first and second source vectors. 
Each intermediate value is saturated to the double-width N-bit value's signed integer range -2(SQDMLALT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>'SQDMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]'SQDMLALT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]ushlUnsigned shift left (register)USHL  D <d>, D<n>, D<m>!USHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>orr(Bitwise inclusive OR (vector, immediate)&ORR <Vd>.<T>, #<imm8>{, LSL #<amount>}&ORR <Vd>.<T>, #<imm8>{, LSL #<amount>} ORR <Vd>.<T>, <Vn>.<T>, <Vm>.<T>ORR <Wd|WSP>, <Wn>, #<imm>ORR <Xd|SP>, <Xn>, #<imm>)ORR <Wd>, <Wn>, <Wm>{, <shift> #<amount>})ORR <Xd>, <Xn>, <Xm>{, <shift> #<amount>}"ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B*ORR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>"ORR <Zdn>.<T>, <Zdn>.<T>, #<const>ORR <Zd>.D, <Zn>.D, <Zm>.Dfmlslt��This half-precision floating-point multiply-subtract long instruction widens the odd-numbered half-precision elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding half-precision elements in the source vectors. 
This instruction is unpredicated.FMLSLT <Zda>.S, <Zn>.H, <Zm>.H%FMLSLT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]sm4ekeySM4 key!SM4EKEY <Vd>.4S, <Vn>.4S, <Vm>.4SSM4EKEY <Zd>.S, <Zn>.S, <Zm>.SwfeWait for eventWFE fcmpe)Floating-point signaling compare (scalar)FCMPE <Hn>, <Hm>FCMPE <Hn>, #0.0FCMPE <Sn>, <Sm>FCMPE <Sn>, #0.0FCMPE <Dn>, <Dm>FCMPE <Dn>, #0.0frintz6Floating-point round to integral, toward zero (vector)FRINTZ <Vd>.<T>, <Vn>.<T>FRINTZ <Vd>.<T>, <Vn>.<T>FRINTZ <Hd>, <Hn>FRINTZ <Sd>, <Sn>FRINTZ <Dd>, <Dn>bfmlslt��This BFloat16 floating-point multiply-subtract long instruction widens the odd-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and subtracts these values without intermediate rounding from the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. This instruction is unpredicated.BFMLSLT <Zda>.S, <Zn>.H, <Zm>.H&BFMLSLT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]umullt�Multiply the corresponding odd-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%UMULLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>$UMULLT <Zd>.S, <Zn>.H, <Zm>.H[<imm>]$UMULLT <Zd>.D, <Zn>.S, <Zm>.S[<imm>]setptn)Memory set, unprivileged and non-temporalSETPTN  [ <Xd>]!, <Xn>!, <Xs>SETMTN  [ <Xd>]!, <Xn>!, <Xs>SETETN  [ <Xd>]!, <Xn>!, <Xs>umaxvUnsigned maximum across vectorUMAXV <V><d>, <Vn>.<T>UMAXV <V><d>, <Pg>, <Zn>.<T>fdup�Unconditionally broadcast the floating-point immediate into each element of the destination vector. 
This instruction is unpredicated.
FDUP <Zd>.<T>, #<const>
retaasppc
Return from subroutine, with enhanced pointer authentication return using an immediate offset
RETAASPPC <label>
RETABSPPC <label>
fcvtpu
Floating-point convert to unsigned integer, rounding toward plus infinity (vector)
FCVTPU <Hd>, <Hn>
FCVTPU <V><d>, <V><n>
FCVTPU <Vd>.<T>, <Vn>.<T>
FCVTPU <Vd>.<T>, <Vn>.<T>
FCVTPU <Wd>, <Hn>
FCVTPU <Xd>, <Hn>
FCVTPU <Wd>, <Sn>
FCVTPU <Xd>, <Sn>
FCVTPU <Wd>, <Dn>
FCVTPU <Xd>, <Dn>
ldsetp
Atomic bit set on quadword in memory
LDSETP <Xt1>, <Xt2>, [<Xn|SP>]
LDSETPA <Xt1>, <Xt2>, [<Xn|SP>]
LDSETPAL <Xt1>, <Xt2>, [<Xn|SP>]
LDSETPL <Xt1>, <Xt2>, [<Xn|SP>]
smin
Signed minimum (vector)
SMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
SMIN <Wd>, <Wn>, #<simm>
SMIN <Xd>, <Xn>, #<simm>
SMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>
SMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>
SMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }
SMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }
SMIN <Wd>, <Wn>, <Wm>
SMIN <Xd>, <Xn>, <Xm>
SMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>
SMIN <Zdn>.<T>, <Zdn>.<T>, #<imm>
rev
Reverse bytes
REV <Wd>, <Wn>
REV <Xd>, <Xn>
REV <Pd>.<T>, <Pn>.<T>
REV <Zd>.<T>, <Zn>.<T>
tcommit
Commit current transaction
TCOMMIT
famax
Floating-point absolute maximum
FAMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FAMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FAMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }
FAMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }
FAMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>
uxth
UXTH -- A64
Unsigned extend halfword
UXTH <Wd>, <Wn>
UBFM <Wd>, <Wn>, #0, #15
umaddl
Unsigned multiply-add long
UMADDL <Xd>, <Wn>, <Wm>, <Xa>
uminp
Unsigned minimum pairwise
UMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
UMINP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>
fminqv
Floating-point minimum of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as +Infinity.
FMINQV <Vd>.<T>, <Pg>, <Zn>.<Tb>fcmla7Floating-point complex multiply accumulate (by element)7FCMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>], #<rotate>-FCMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<rotate>4FCMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>, <const>-FCMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>], <const>-FCMLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>], <const>uqxtnb�Saturate the unsigned integer value in each source element to half the original source element width, and place the results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.UQXTNB <Zd>.<T>, <Zn>.<Tb>st1w�Contiguous store of words from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.<ST1W { <Zt1>.S-<Zt2>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]<ST1W { <Zt1>.S-<Zt4>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST1W { <Zt1>.S-<Zt2>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]8ST1W { <Zt1>.S-<Zt4>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]=ST1W { <Zt1>.S, <Zt2>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]OST1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]9ST1W { <Zt1>.S, <Zt2>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]KST1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2])ST1W { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}])ST1W { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]4ST1W { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]2ST1W { <Zt>.Q }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0ST1W { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>, LSL #2].ST1W { <Zt>.Q }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]2ST1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #2]2ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #2]/ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]/ST1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]0ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #2](ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]CST1W { <ZAt><HV>.S[<Ws>, <offs>] }, 
<Pg>, [<Xn|SP>{, <Xm>, LSL #2}]ld3rMLoad single 3-element structure and replicate to all lanes of three registers3LD3R  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]:LD3R  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>9LD3R  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>sqxtnb�Saturate the signed integer value in each source element to half the original source element width, and place the results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero.SQXTNB <Zd>.<T>, <Zn>.<Tb>	sqdmlalbt�Multiply then double the corresponding even-numbered signed elements of the first and odd-numbered signed elements of the second source vector. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2)SQDMLALBT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>ld1rh�Load a single unsigned halfword from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 2 in the range 0 to 126.-LD1RH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]sshrSigned shift right (immediate)SSHR  D <d>, D<n>, #<shift>!SSHR <Vd>.<T>, <Vn>.<T>, #<shift>fcvtxnJFloating-point convert to lower precision narrow, rounding to odd (vector)FCVTXN  S <d>, D<n>FCVTXN{ 2}  <Vd>.<Tb>, <Vn>.2DstllrbStore LORelease register byteSTLLRB <Wt>, [<Xn|SP>{, #0}]suqadd.Signed saturating accumulate of unsigned valueSUQADD <V><d>, <V><n>SUQADD <Vd>.<T>, <Vn>.<T>-SUQADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>sxtb�Sign-extend the least-significant sub-element of each active element of the source vector, and place the results in the corresponding elements of the destination vector. 
Inactive elements in the destination vector register remain unmodified.SXTB <Zd>.<T>, <Pg>/M, <Zn>.<T>SXTH <Zd>.<T>, <Pg>/M, <Zn>.<T>SXTW <Zd>.D, <Pg>/M, <Zn>.DSXTB -- A64Signed extend byteSXTB <Wd>, <Wn>SBFM   <Wd>, <Wn>, #0, #7SXTB <Xd>, <Wn>SBFM   <Xd>, <Xn>, #0, #7ldrbLoad register byte (immediate)LDRB <Wt>, [<Xn|SP>], #<simm>LDRB <Wt>, [<Xn|SP>, #<simm>]!LDRB <Wt>, [<Xn|SP>{, #<pimm>}]6LDRB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]*LDRB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]cdot�The complex integer dot product instructions delimit the source vectors into pairs of 8-bit or 16-bit signed integer complex numbers. Within each pair, the complex numbers in the first source vector are multiplied by the corresponding complex numbers in the second source vector and the resulting wide real or wide imaginary part of the product is accumulated into a 32-bit or 64-bit destination vector element which overlaps all four of the elements that comprise a pair of complex number values in the first source vector.-CDOT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>, <const>,CDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>], <const>,CDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>], <const>pacga.Pointer Authentication Code, using generic keyPACGA <Xd>, <Xn>, <Xm|SP>fminnmp:Floating-point minimum number of pair of elements (scalar)FMINNMP  H <d>, <Vn>.2HFMINNMP <V><d>, <Vn>.<T>$FMINNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>$FMINNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>.FMINNMP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ldsminab'Atomic signed minimum on byte in memoryLDSMINAB <Ws>, <Wt>, [<Xn|SP>]LDSMINALB <Ws>, <Wt>, [<Xn|SP>]LDSMINB <Ws>, <Wt>, [<Xn|SP>]LDSMINLB <Ws>, <Wt>, [<Xn|SP>]ld1rsw�Load a single signed word from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 4 in the range 0 to 252..LD1RSW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]rsubhnb�TSubtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the 
most significant rounded half of the result in the even-numbered half-width destination elements, while setting the odd-numbered half-width destination elements to zero. This instruction is unpredicated.&RSUBHNB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>umaxUnsigned maximum (vector)!UMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>UMAX <Wd>, <Wn>, #<uimm>UMAX <Xd>, <Xn>, #<uimm>CUMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>CUMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>RUMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }RUMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }UMAX <Wd>, <Wn>, <Wm>UMAX <Xd>, <Xn>, <Xm>+UMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!UMAX <Zdn>.<T>, <Zdn>.<T>, #<imm>sqshlu1Signed saturating shift left unsigned (immediate)SQSHLU <V><d>, <V><n>, #<shift>#SQSHLU <Vd>.<T>, <Vn>.<T>, #<shift>-SQSHLU <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>braaz/Branch to register, with pointer authentication
BRAAZ <Xn>
BRAA <Xn>, <Xm|SP>
BRABZ <Xn>BRAB <Xn>, <Xm|SP>ld1rNLoad one single-element structure and replicate to all lanes (of one register)LD1R  { <Vt>.<T> }, [<Xn|SP>]$LD1R  { <Vt>.<T> }, [<Xn|SP>], <imm>#LD1R  { <Vt>.<T> }, [<Xn|SP>], <Xm>pacib171615@Pointer Authentication Code for instruction address, using key BPACIB171615 ldadd*Atomic add on word or doubleword in memoryLDADD <Ws>, <Wt>, [<Xn|SP>]LDADDA <Ws>, <Wt>, [<Xn|SP>]LDADDAL <Ws>, <Wt>, [<Xn|SP>]LDADDL <Ws>, <Wt>, [<Xn|SP>]LDADD <Xs>, <Xt>, [<Xn|SP>]LDADDA <Xs>, <Xt>, [<Xn|SP>]LDADDAL <Xs>, <Xt>, [<Xn|SP>]LDADDL <Xs>, <Xt>, [<Xn|SP>]0STADD <Ws>, [<Xn|SP>]LDADD  <Ws>, WZR, [<Xn|SP>]2STADDL <Ws>, [<Xn|SP>]LDADDL  <Ws>, WZR, [<Xn|SP>]0STADD <Xs>, [<Xn|SP>]LDADD  <Xs>, XZR, [<Xn|SP>]2STADDL <Xs>, [<Xn|SP>]LDADDL  <Xs>, XZR, [<Xn|SP>]eretaa-Exception return, with pointer authenticationERETAA ERETAB orqv�Bitwise inclusive OR of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. 
Inactive elements in the source vector are treated as all zeros.ORQV <Vd>.<T>, <Pg>, <Zn>.<Tb>sdot2Dot product signed arithmetic (vector, by element)+SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]$SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>SDOT <Zda>.S, <Zn>.H, <Zm>.H#SDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]$SDOT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>#SDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]#SDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>]ISDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]ISDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]@SDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H@SDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HMSDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }MSDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }ISDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]ISDOT    ZA.D[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]ISDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]ISDOT    ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]KSDOT    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>KSDOT    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>[SDOT    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }[SDOT    ZA. 
<T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }fmlslIFloating-point fused multiply-subtract long from accumulator (by element)+FMLSL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>],FMLSL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]%FMLSL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>&FMLSL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>=FMLSL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4FMLSL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HIFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HIFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VFMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }sha1su0SHA1 schedule update 0!SHA1SU0 <Vd>.4S, <Vn>.4S, <Vm>.4SsqrshrunASigned saturating rounded shift right unsigned narrow (immediate)#SQRSHRUN <Vb><d>, <Va><n>, #<shift>,SQRSHRUN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>.SQRSHRUN <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>6SQRSHRUN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>ldapurb*Load-acquire RCpc register byte (unscaled)"LDAPURB <Wt>, [<Xn|SP>{, #<simm>}]ldff1b�HGather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.	
-LDFF1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]-LDFF1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}],LDFF1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, <Xm>}],LDFF1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>}],LDFF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>}],LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>}]3LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]3LDFF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>],LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]umlsl4Unsigned multiply-subtract long (vector, by element)
3UMLSL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]*UMLSL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>=UMLSL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4UMLSL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HIUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HIUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VUMLSL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }umops5This instruction works with a 32-bit element ZA tile..UMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.UMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B.UMOPS <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hins1Insert vector element from another vector element,INS <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]INS <Vd>.<Ts>[<index>], <R><n>bcBranch consistent conditionallyBC. <cond>  <label>fcsel*Floating-point conditional select (scalar)FCSEL <Hd>, <Hn>, <Hm>, <cond>FCSEL <Sd>, <Sn>, <Sm>, <cond>FCSEL <Dd>, <Dn>, <Dm>, <cond>adds&Add (extended register), setting flags1ADDS <Wd>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}2ADDS <Xd>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}&ADDS <Wd>, <Wn|WSP>, #<imm>{, <shift>}%ADDS <Xd>, <Xn|SP>, #<imm>{, <shift>}*ADDS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}*ADDS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}ldclrah&Atomic bit clear on halfword in memoryLDCLRAH <Ws>, <Wt>, [<Xn|SP>]LDCLRALH <Ws>, <Wt>, [<Xn|SP>]LDCLRH <Ws>, <Wt>, [<Xn|SP>]LDCLRLH <Ws>, <Wt>, [<Xn|SP>]movaz�The instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size. 
The tile slices are zeroed after moving their contents to the destination vectors.;MOVAZ { <Zd1>.B-<Zd2>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs2>]=MOVAZ { <Zd1>.H-<Zd2>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs2>]=MOVAZ { <Zd1>.S-<Zd2>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs2>]=MOVAZ { <Zd1>.D-<Zd2>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs2>];MOVAZ { <Zd1>.B-<Zd4>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs4>]=MOVAZ { <Zd1>.H-<Zd4>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs4>]=MOVAZ { <Zd1>.S-<Zd4>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs4>]=MOVAZ { <Zd1>.D-<Zd4>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs4>]5MOVAZ { <Zd1>.D-<Zd2>.D }, ZA.D[<Wv>, <offs>{, VGx2}]5MOVAZ { <Zd1>.D-<Zd4>.D }, ZA.D[<Wv>, <offs>{, VGx4}]%MOVAZ <Zd>.B, ZA0<HV>.B[<Ws>, <offs>]'MOVAZ <Zd>.H, <ZAn><HV>.H[<Ws>, <offs>]'MOVAZ <Zd>.S, <ZAn><HV>.S[<Ws>, <offs>]'MOVAZ <Zd>.D, <ZAn><HV>.D[<Ws>, <offs>]'MOVAZ <Zd>.Q, <ZAn><HV>.Q[<Ws>, <offs>]cpyprtn>Memory copy, reads unprivileged, reads and writes non-temporal!CPYPRTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYMRTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYERTN  [ <Xd>]!, [<Xs>]!, <Xn>!sqxtn Signed saturating extract narrowSQXTN <Vb><d>, <Va><n>SQXTN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>sqaddSigned saturating addSQADD <V><d>, <V><n>, <V><m>"SQADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SQADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>-SQADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}"SQADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>maddptMultiply-add checked pointerMADDPT <Xd>, <Xn>, <Xm>, <Xa>stgp*Store Allocation Tag and pair of registers$STGP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>%STGP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!&STGP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]umsublUnsigned multiply-subtract longUMSUBL <Xd>, <Wn>, <Wm>, <Xa>ld1rb�Load a single unsigned byte from a memory address generated by a 64-bit scalar base address plus an immediate offset which is in the range 0 to 63.-LD1RB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]orn!Bitwise 
inclusive OR NOT (vector) ORN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>)ORN <Wd>, <Wn>, <Wm>{, <shift> #<amount>})ORN <Xd>, <Xn>, <Xm>{, <shift> #<amount>}"ORN <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.BORN (immediate)�KBitwise inclusive OR an inverted immediate with each 64-bit element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated."ORN <Zdn>.<T>, <Zdn>.<T>, #<const>*ORR  <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)csincConditional select incrementCSINC <Wd>, <Wn>, <Wm>, <cond>CSINC <Xd>, <Xn>, <Xm>, <cond>sbclt�cSubtract the odd-numbered elements of the first source vector and the inverted 1-bit carry from the least-significant bit of the odd-numbered elements of the second source vector from the even-numbered elements of the destination and accumulator vector. The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.#SBCLT <Zda>.<T>, <Zn>.<T>, <Zm>.<T>lastb�If there is an active element then extract the last active element from the final source vector register. If there are no active elements, extract the highest-numbered element. 
Then zero-extend and place the extracted element in the destination general-purpose register.LASTB <R><d>, <Pg>, <Zn>.<T>LASTB <V><d>, <Pg>, <Zn>.<T>ld1rqd�Load two contiguous doublewords to elements of a short, 128-bit (quadword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base address..LD1RQD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]2LD1RQD { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]st1h�Contiguous store of halfwords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.<ST1H { <Zt1>.H-<Zt2>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]<ST1H { <Zt1>.H-<Zt4>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST1H { <Zt1>.H-<Zt2>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]8ST1H { <Zt1>.H-<Zt4>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]=ST1H { <Zt1>.H, <Zt2>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]OST1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]9ST1H { <Zt1>.H, <Zt2>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]KST1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1])ST1H { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}])ST1H { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]4ST1H { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0ST1H { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]2ST1H { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #1]2ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #1]/ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]/ST1H { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]0ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #1](ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]CST1H { <ZAt><HV>.H[<Ws>, <offs>] }, <Pg>, [<Xn|SP>{, <Xm>, LSL #1}]uaddv�Unsigned add horizontally across all lanes of a vector, and place the result in the SIMD&amp;FP scalar destination register. 
Narrow elements are first zero-extended to 64 bits. Inactive elements in the source vector are treated as zero.UADDV <Dd>, <Pg>, <Zn>.<T>movzMove wide with zero!MOVZ <Wd>, #<imm>{, LSL #<shift>}!MOVZ <Xd>, #<imm>{, LSL #<shift>}subhnt�1Subtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the most significant half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. This instruction is unpredicated.%SUBHNT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>sha512hSHA512 hash update part 1SHA512H <Qd>, <Qn>, <Vm>.2Dldnt1h�Contiguous load non-temporal of halfwords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.@LDNT1H { <Zt1>.H-<Zt2>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]@LDNT1H { <Zt1>.H-<Zt4>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]<LDNT1H { <Zt1>.H-<Zt2>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]<LDNT1H { <Zt1>.H-<Zt4>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]ALDNT1H { <Zt1>.H, <Zt2>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]SLDNT1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]=LDNT1H { <Zt1>.H, <Zt2>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]OLDNT1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #1]+LDNT1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]+LDNT1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]6LDNT1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]2LDNT1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]ldaprbLoad-acquire RCpc register byteLDAPRB <Wt>, [<Xn|SP> {, #0}]cpyptwn?Memory copy, reads and writes unprivileged, writes non-temporal!CPYPTWN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYMTWN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYETWN  [ <Xd>]!, [<Xs>]!, <Xn>!fmlalltb�This 8-bit floating-point multiply-add long-long instruction widens the third 
8-bit element of each 32-bit container in the first and second source vectors to single-precision format and multiplies the corresponding elements. The intermediate products are scaled by 2 FMLALLTB <Zda>.S, <Zn>.B, <Zm>.B'FMLALLTB <Zda>.S, <Zn>.B, <Zm>.B[<imm>]uqrshl2Unsigned saturating rounding shift left (register)UQRSHL <V><d>, <V><n>, <V><m>#UQRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>-UQRSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>shrnt�_Shift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. This instruction is unpredicated.#SHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>	cpyfprtrn=Memory copy forward-only, reads unprivileged and non-temporal#CPYFPRTRN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFMRTRN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFERTRN  [ <Xd>]!, [<Xs>]!, <Xn>!frint64z;Floating-point round to 64-bit integer toward zero (vector)FRINT64Z <Vd>.<T>, <Vn>.<T>FRINT64Z <Sd>, <Sn>FRINT64Z <Dd>, <Dn>pssbbPSSBB -- A64)Physical speculative store bypass barrierPSSBB DSB   #4sli!Shift left and insert (immediate)SLI  D <d>, D<n>, #<shift> SLI <Vd>.<T>, <Vn>.<T>, #<shift> SLI <Zd>.<T>, <Zn>.<T>, #<const>sqshl(Signed saturating shift left (immediate)SQSHL <V><d>, <V><n>, #<shift>"SQSHL <Vd>.<T>, <Vn>.<T>, #<shift>SQSHL <V><d>, <V><n>, <V><m>"SQSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SQSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>,SQSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>uhsubUnsigned halving subtract"UHSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UHSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ldnf1sb��Contiguous load with non-faulting behavior of signed bytes to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and 
added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.7LDNF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]7LDNF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]7LDNF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]dcps1Debug change PE state to EL1DCPS1  {# <imm>}blraaz9Branch with link to register, with pointer authenticationBLRAAZ <Xn>BLRAA <Xn>, <Xm|SP>BLRABZ <Xn>BLRAB <Xn>, <Xm|SP>	autibsppcBAuthenticate return address using key B, using an immediate offsetAUTIBSPPC <label>subgSubtract with tag)SUBG <Xd|SP>, <Xn|SP>, #<uimm6>, #<uimm4>fmaxFloating-point maximum (vector)!FMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FMAX <Hd>, <Hn>, <Hm>FMAX <Sd>, <Sn>, <Sm>FMAX <Dd>, <Dn>, <Dm>CFMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>CFMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>RFMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }RFMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }*FMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>+FMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>uunpkhi�Unpack elements from the lowest or highest half of the source vector and then zero-extend them to place in elements of twice their size within the destination vector. 
This instruction is unpredicated.UUNPKHI <Zd>.<T>, <Zn>.<Tb>UUNPKLO <Zd>.<T>, <Zn>.<Tb>xaflagIConvert floating-point condition flags from external format to Arm formatXAFLAG ldapur8Load-acquire RCpc SIMD&amp;FP register (unscaled offset)!LDAPUR <Bt>, [<Xn|SP>{, #<simm>}]!LDAPUR <Ht>, [<Xn|SP>{, #<simm>}]!LDAPUR <St>, [<Xn|SP>{, #<simm>}]!LDAPUR <Dt>, [<Xn|SP>{, #<simm>}]!LDAPUR <Qt>, [<Xn|SP>{, #<simm>}]!LDAPUR <Wt>, [<Xn|SP>{, #<simm>}]!LDAPUR <Xt>, [<Xn|SP>{, #<simm>}]sm3tt2bSM3TT2B(SM3TT2B <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]msubMultiply-subtractMSUB <Wd>, <Wn>, <Wm>, <Wa>MSUB <Xd>, <Xn>, <Xm>, <Xa>ld2h�1Contiguous load two-halfword structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,>LD2H { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD2H { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]setgptn:Memory set with tag setting, unprivileged and non-temporalSETGPTN  [ <Xd>]!, <Xn>!, <Xs>SETGMTN  [ <Xd>]!, <Xn>!, <Xs>SETGETN  [ <Xd>]!, <Xn>!, <Xs>ftmadThe ,FTMAD <Zdn>.<T>, <Zdn>.<T>, <Zm>.<T>, #<imm>ldnf1b��Contiguous load with non-faulting behavior of unsigned bytes to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. 
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.6LDNF1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]6LDNF1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]6LDNF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]6LDNF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]lduminah-Atomic unsigned minimum on halfword in memoryLDUMINAH <Ws>, <Wt>, [<Xn|SP>]LDUMINALH <Ws>, <Wt>, [<Xn|SP>]LDUMINH <Ws>, <Wt>, [<Xn|SP>]LDUMINLH <Ws>, <Wt>, [<Xn|SP>]ldeorh9Atomic exclusive-OR on halfword in memory, without return2STEORH <Ws>, [<Xn|SP>]LDEORH  <Ws>, WZR, [<Xn|SP>]4STEORLH <Ws>, [<Xn|SP>]LDEORLH  <Ws>, WZR, [<Xn|SP>]sminpSigned minimum pairwise"SMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SMINP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>cntp�Counts the number of active and true elements in the source predicate and places the scalar result in the destination general-purpose register. Inactive predicate elements are not counted.CNTP <Xd>, <Pg>, <Pn>.<T>CNTP <Xd>, <PNn>.<T>, <vl>ubfxUBFX -- A64Unsigned bitfield extract!UBFX <Wd>, <Wn>, #<lsb>, #<width>-UBFM   <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)!UBFX <Xd>, <Xn>, #<lsb>, #<width>-UBFM   <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)fmaxv$Floating-point maximum across vectorFMAXV <V><d>, <Vn>.<T>FMAXV  S <d>, <Vn>.4SFMAXV <V><d>, <Pg>, <Zn>.<T>eors�"Bitwise exclusive OR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the #EORS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bsmlal-Signed multiply-add long (vector, by element)
3SMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]*SMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>=SMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RSMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RSMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4SMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HISMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HISMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVSMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VSMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }sqcvtu�Saturate the signed integer value in each element of the two source vectors to unsigned integer value that is half the original source element width, and place the results in the half-width destination elements."SQCVTU <Zd>.H, { <Zn1>.S-<Zn2>.S }*SQCVTU <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }smull)Signed multiply long (vector, by element)3SMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]*SMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>SMULL -- A64Signed multiply longSMULL <Xd>, <Wn>, <Wm>SMADDL   <Xd>, <Wn>, <Wm>, XZRusra/Unsigned shift right and accumulate (immediate)USRA  D <d>, D<n>, #<shift>!USRA <Vd>.<T>, <Vn>.<T>, #<shift>"USRA <Zda>.<T>, <Zn>.<T>, #<const>match��This instruction compares each active 8-bit or 16-bit character in the first source vector with all of the characters in the corresponding 128-bit segment of the second source vector. Where the first source element detects any matching characters in the second segment it places true in the corresponding element of the destination predicate, otherwise false. Inactive elements in the destination predicate register are set to zero. 
Sets the *MATCH <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>uaddwUnsigned add wide*UADDW{ 2}  <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>smlslt�Multiply the corresponding odd-numbered signed elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector. This instruction is unpredicated.&SMLSLT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%SMLSLT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%SMLSLT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]ld4w�/Contiguous load four-word structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,PLD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]LLD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]sminvSigned minimum across vectorSMINV <V><d>, <Vn>.<T>SMINV <V><d>, <Pg>, <Zn>.<T>uaddlvUnsigned sum long across vectorUADDLV <V><d>, <Vn>.<T>fnegFloating-point negate (vector)FNEG <Vd>.<T>, <Vn>.<T>FNEG <Vd>.<T>, <Vn>.<T>FNEG <Hd>, <Hn>FNEG <Sd>, <Sn>FNEG <Dd>, <Dn>FNEG <Zd>.<T>, <Pg>/M, <Zn>.<T>cbnzCompare and branch on nonzeroCBNZ <Wt>, <label>CBNZ <Xt>, <label>drpsDebug restore PE stateDRPS bext�xThis instruction gathers bits in each element of the first source vector from the bit positions indicated by non-zero bits in the corresponding mask element of the second source vector to the lowest-numbered contiguous bits of the corresponding destination element, preserving their order, and sets the remaining higher-numbered bits to zero. 
This instruction is unpredicated.!BEXT <Zd>.<T>, <Zn>.<T>, <Zm>.<T>faddp,Floating-point add pair of elements (scalar)FADDP  H <d>, <Vn>.2HFADDP <V><d>, <Vn>.<T>"FADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,FADDP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>gcspushxGCSPUSHX -- A642Guarded Control Stack push exception return record	GCSPUSHX SYS   #0, C7, C7, #4{, <Xt>}fmulx-Floating-point multiply extended (by element)	!FMULX <Hd>, <Hn>, <Vm>.H[<index>](FMULX <V><d>, <V><n>, <Vm>.<Ts>[<index>])FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>],FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]FMULX <Hd>, <Hn>, <Hm>FMULX <V><d>, <V><n>, <V><m>"FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,FMULX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>umaxqv��Unsigned maximum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as zero. UMAXQV <Vd>.<T>, <Pg>, <Zn>.<Tb>fcvtn9Floating-point convert to lower precision narrow (vector)FCVTN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>%FCVTN <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>&FCVTN{ 2}  <Vd>.<Ta>, <Vn>.4S, <Vm>.4S!FCVTN <Zd>.B, { <Zn1>.H-<Zn2>.H }!FCVTN <Zd>.B, { <Zn1>.S-<Zn4>.S }!FCVTN <Zd>.H, { <Zn1>.S-<Zn2>.S }pfalse7Set all elements in the destination predicate to false.
PFALSE <Pd>.Bfvdot�yThe instruction computes the fused sum-of-products of each vertical group of two 8-bit floating-point values held in the corresponding elements of the two first source vectors with horizontal group of two 8-bit floating-point values in the indexed 16-bit group of the corresponding 128-bit segment of the second source vector. The half-precision sum-of-products are scaled by 2IFVDOT   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]IFVDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]revb�
Reverse the order of 8-bit bytes, 16-bit halfwords or 32-bit words within each active element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.REVB <Zd>.<T>, <Pg>/M, <Zn>.<T>REVH <Zd>.<T>, <Pg>/M, <Zn>.<T>REVW <Zd>.D, <Pg>/M, <Zn>.Duqrshrnb�CShift each unsigned integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2&UQRSHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>fnmsb�bMultiply the corresponding active floating-point elements of the first and second source vectors and subtract from elements of the third (addend) vector without intermediate rounding. Destructively place the negated results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.+FNMSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>bfc
BFC -- A64Bitfield clearBFC <Wd>, #<lsb>, #<width>1BFM   <Wd>, WZR, #(-<lsb>  MOD  32), #(<width>-1)BFC <Xd>, #<lsb>, #<width>1BFM   <Xd>, XZR, #(-<lsb>  MOD  64), #(<width>-1)cpyfpwn-Memory copy forward-only, writes non-temporal!CPYFPWN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFMWN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFEWN  [ <Xd>]!, [<Xs>]!, <Xn>!uqdecd�*Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.'UQDECD <Wdn>{, <pattern>{, MUL #<imm>}}'UQDECD <Xdn>{, <pattern>{, MUL #<imm>}})UQDECD <Zdn>.D{, <pattern>{, MUL #<imm>}}cinvCINV -- A64Conditional invertCINV <Wd>, <Wn>, <invcond> CSINV   <Wd>, <Wn>, <Wm>, <cond>CINV <Xd>, <Xn>, <invcond> CSINV   <Xd>, <Xn>, <Xm>, <cond>stgStore Allocation TagSTG <Xt|SP>, [<Xn|SP>], #<simm> STG <Xt|SP>, [<Xn|SP>, #<simm>]!!STG <Xt|SP>, [<Xn|SP>{, #<simm>}]uaddlt�Add the corresponding odd-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%UADDLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>bfm
Bitfield move BFM <Wd>, <Wn>, #<immr>, #<imms> BFM <Xd>, <Xn>, #<immr>, #<imms>ldumin7Atomic unsigned minimum on word or doubleword in memoryLDUMIN <Ws>, <Wt>, [<Xn|SP>]LDUMINA <Ws>, <Wt>, [<Xn|SP>]LDUMINAL <Ws>, <Wt>, [<Xn|SP>]LDUMINL <Ws>, <Wt>, [<Xn|SP>]LDUMIN <Xs>, <Xt>, [<Xn|SP>]LDUMINA <Xs>, <Xt>, [<Xn|SP>]LDUMINAL <Xs>, <Xt>, [<Xn|SP>]LDUMINL <Xs>, <Xt>, [<Xn|SP>]2STUMIN <Ws>, [<Xn|SP>]LDUMIN  <Ws>, WZR, [<Xn|SP>]4STUMINL <Ws>, [<Xn|SP>]LDUMINL  <Ws>, WZR, [<Xn|SP>]2STUMIN <Xs>, [<Xn|SP>]LDUMIN  <Xs>, XZR, [<Xn|SP>]4STUMINL <Xs>, [<Xn|SP>]LDUMINL  <Xs>, XZR, [<Xn|SP>]pacnbiasppcPPointer Authentication Code for return address, using key A, not a branch targetPACNBIASPPC rcwset7Read check write atomic bit set on doubleword in memoryRCWSET <Xs>, <Xt>, [<Xn|SP>]RCWSETA <Xs>, <Xt>, [<Xn|SP>]RCWSETAL <Xs>, <Xt>, [<Xn|SP>]RCWSETL <Xs>, <Xt>, [<Xn|SP>]	sha256su1SHA256 schedule update 1#SHA256SU1 <Vd>.4S, <Vn>.4S, <Vm>.4Suzpq2�Concatenate adjacent odd-numbered elements from the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. 
This instruction is unpredicated."UZPQ2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>setgpMemory set with tag settingSETGP  [ <Xd>]!, <Xn>!, <Xs>SETGM  [ <Xd>]!, <Xn>!, <Xs>SETGE  [ <Xd>]!, <Xn>!, <Xs>blrBranch with link to registerBLR <Xn>ld23Load multiple 2-element structures to two registers'LD2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>].LD2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>-LD2  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>,LD2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>],LD2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>],LD2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>],LD2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>]0LD2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], #22LD2  { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], <Xm>0LD2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], #42LD2  { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], <Xm>0LD2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], #82LD2  { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], <Xm>1LD2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], #162LD2  { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], <Xm>asr�ZShift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. 
Inactive elements in the destination vector register remain unmodified.*ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>(ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D*ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T> ASR <Zd>.<T>, <Zn>.<T>, #<const>ASR <Zd>.<T>, <Zn>.<T>, <Zm>.DASR (register) -- A64!Arithmetic shift right (register)ASR <Wd>, <Wn>, <Wm>ASRV   <Wd>, <Wn>, <Wm>ASR <Xd>, <Xn>, <Xm>ASRV   <Xd>, <Xn>, <Xm>ASR (immediate) -- A64"Arithmetic shift right (immediate)ASR <Wd>, <Wn>, #<shift> SBFM   <Wd>, <Wn>, #<shift>, #31ASR <Xd>, <Xn>, #<shift> SBFM   <Xd>, <Xn>, #<shift>, #63whilele�Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, signed scalar operand is less than or equal to the second scalar operand and false thereafter up to the highest numbered element. WHILELE <Pd>.<T>, <R><n>, <R><m>#WHILELE <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILELE { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>addhnt�*Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. This instruction is unpredicated.%ADDHNT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>msrr>Move two adjacent general-purpose registers to System register?MSRR  ( <systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>), <Xt>, <Xt+1>sqdmullb�Multiply the corresponding even-numbered signed elements of the first and second source vectors, double and place the results in the overlapping double-width elements of the destination vector. 
Each result element is saturated to the double-width N-bit element's signed integer range -2'SQDMULLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>&SQDMULLB <Zd>.S, <Zn>.H, <Zm>.H[<imm>]&SQDMULLB <Zd>.D, <Zn>.S, <Zm>.S[<imm>]rax1Rotate and exclusive-ORRAX1 <Vd>.2D, <Vn>.2D, <Vm>.2DRAX1 <Zd>.D, <Zn>.D, <Zm>.Dsha1hSHA1 fixed rotateSHA1H <Sd>, <Sn>frintnGFloating-point round to integral, to nearest with ties to even (vector)FRINTN <Vd>.<T>, <Vn>.<T>FRINTN <Vd>.<T>, <Vn>.<T>FRINTN <Hd>, <Hn>FRINTN <Sd>, <Sn>FRINTN <Dd>, <Dn>/FRINTN { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }/FRINTN { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }ldsmax5Atomic signed maximum on word or doubleword in memoryLDSMAX <Ws>, <Wt>, [<Xn|SP>]LDSMAXA <Ws>, <Wt>, [<Xn|SP>]LDSMAXAL <Ws>, <Wt>, [<Xn|SP>]LDSMAXL <Ws>, <Wt>, [<Xn|SP>]LDSMAX <Xs>, <Xt>, [<Xn|SP>]LDSMAXA <Xs>, <Xt>, [<Xn|SP>]LDSMAXAL <Xs>, <Xt>, [<Xn|SP>]LDSMAXL <Xs>, <Xt>, [<Xn|SP>]2STSMAX <Ws>, [<Xn|SP>]LDSMAX  <Ws>, WZR, [<Xn|SP>]4STSMAXL <Ws>, [<Xn|SP>]LDSMAXL  <Ws>, WZR, [<Xn|SP>]2STSMAX <Xs>, [<Xn|SP>]LDSMAX  <Xs>, XZR, [<Xn|SP>]4STSMAXL <Xs>, [<Xn|SP>]LDSMAXL  <Xs>, XZR, [<Xn|SP>]urshlr��Shift active unsigned elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Inactive elements in the destination vector register remain unmodified.-URSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>cincCINC -- A64Conditional incrementCINC <Wd>, <Wn>, <invcond> CSINC   <Wd>, <Wn>, <Wm>, <cond>CINC <Xd>, <Xn>, <invcond> CSINC   <Xd>, <Xn>, <Xm>, <cond>cpyfptrnKMemory copy forward-only, reads and writes unprivileged, reads non-temporal"CPYFPTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFMTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFETRN  [ <Xd>]!, [<Xs>]!, <Xn>!bmopswThis instruction works with 32-bit element ZA tile. 
This instruction generates an outer product of the first source SVL.BMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S, <Zm>.SldaxrLoad-acquire exclusive registerLDAXR <Wt>, [<Xn|SP>{, #0}]LDAXR <Xt>, [<Xn|SP>{, #0}]notBitwise NOT (vector)NOT <Vd>.<T>, <Vn>.<T>NOT <Zd>.<T>, <Pg>/M, <Zn>.<T>NOT (predicate)�Bitwise invert each active element of the source predicate, and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.NOT <Pd>.B, <Pg>/Z, <Pn>.B#EOR  <Pd>.B, <Pg>/Z, <Pn>.B, <Pg>.B	sqrshrunb�AShift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2'SQRSHRUNB <Zd>.<T>, <Zn>.<Tb>, #<const>ssubwt�
Subtract the even-numbered signed elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.$SSUBWT <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>uqrshlr��Shift active unsigned elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Each result element is saturated to the N-bit element's unsigned integer range 0 to (2.UQRSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>sxtlSXTL, SXTL2 -- A64Signed extend longSXTL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>#SSHLL {2}  <Vd>.<Ta>, <Vn>.<Tb>, #0ccmn(Conditional compare negative (immediate)"CCMN <Wn>, #<imm>, #<nzcv>, <cond>"CCMN <Xn>, #<imm>, #<nzcv>, <cond> CCMN <Wn>, <Wm>, #<nzcv>, <cond> CCMN <Xn>, <Xm>, #<nzcv>, <cond>ssubwSigned subtract wide*SSUBW{ 2}  <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>subr�Reversed subtract active elements of the first source vector from corresponding elements of the second source vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.+SUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>,SUBR <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}rev64/Reverse elements in 64-bit doublewords (vector)REV64 <Vd>.<T>, <Vn>.<T>REV64 -- A64
Reverse bytesREV64 <Xd>, <Xn>REV   <Xd>, <Xn>mrrs>Move System register to two adjacent general-purpose registers=MRRS <Xt>, <Xt+1>, (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>)sqrshl0Signed saturating rounding shift left (register)SQRSHL <V><d>, <V><n>, <V><m>#SQRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>-SQRSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>bfmaxnm�Determine the maximum number value of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors.<BFMAXNM { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, <Zm>.H<BFMAXNM { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, <Zm>.HIBFMAXNM { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, { <Zm1>.H-<Zm2>.H }IBFMAXNM { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, { <Zm1>.H-<Zm4>.H }(BFMAXNM <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.HbfcvtnHFloating-point convert from single-precision to BFloat16 format (vector)BFCVTN{ 2}  <Vd>.<Ta>, <Vn>.4S"BFCVTN <Zd>.B, { <Zn1>.H-<Zn2>.H }"BFCVTN <Zd>.H, { <Zn1>.S-<Zn2>.S }sysSystem instruction(SYS  # <op1>, <Cn>, <Cm>, #<op2>{, <Xt>}usmops>The 8-bit integer variant works with a 32-bit element ZA tile./USMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B/USMOPS <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hsha1su1SHA1 schedule update 1SHA1SU1 <Vd>.4S, <Vn>.4Sstnt1d�"Contiguous store non-temporal of doublewords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>STNT1D { <Zt1>.D-<Zt2>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]>STNT1D { <Zt1>.D-<Zt4>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]:STNT1D { <Zt1>.D-<Zt2>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]:STNT1D { <Zt1>.D-<Zt4>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]?STNT1D { <Zt1>.D, <Zt2>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL 
VL}]QSTNT1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}];STNT1D { <Zt1>.D, <Zt2>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3]MSTNT1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>, [<Xn|SP>, <Xm>, LSL #3])STNT1D { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]4STNT1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0STNT1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]tlbipTLBIP -- A64TLB invalidate pair operation TLBIP <tlbip_op>{, <Xt1>, <Xt2>}1SYSP   #<op1>, <Cn>, <Cm>, #<op2>{, <Xt1>, <Xt2>}udfPermanently undefinedUDF  # <imm>	sm3partw2	SM3PARTW2#SM3PARTW2 <Vd>.4S, <Vn>.4S, <Vm>.4SmaddMultiply-addMADD <Wd>, <Wn>, <Wm>, <Wa>MADD <Xd>, <Xn>, <Xm>, <Xa>bfi
BFI -- A64Bitfield insert BFI <Wd>, <Wn>, #<lsb>, #<width>2BFM   <Wd>, <Wn>, #(-<lsb>  MOD  32), #(<width>-1) BFI <Xd>, <Xn>, #<lsb>, #<width>2BFM   <Xd>, <Xn>, #(-<lsb>  MOD  64), #(<width>-1)movnMove wide with NOT!MOVN <Wd>, #<imm>{, LSL #<shift>}!MOVN <Xd>, #<imm>{, LSL #<shift>}uabdl!Unsigned absolute difference long*UABDL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>uqrshrnt�9Shift each unsigned integer value in the source vector elements by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2&UQRSHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>	cpyfprtwnAMemory copy forward-only, reads unprivileged, writes non-temporal#CPYFPRTWN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFMRTWN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFERTWN  [ <Xd>]!, [<Xs>]!, <Xn>!autdb&Authenticate data address, using key BAUTDB <Xd>, <Xn|SP>AUTDZB <Xd>ldxrLoad exclusive registerLDXR <Wt>, [<Xn|SP>{, #0}]LDXR <Xt>, [<Xn|SP>{, #0}]subhnb�5Subtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the most significant half of the result in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. This instruction is unpredicated.%SUBHNB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>gcspopmGCSPOPM -- A64Guarded Control Stack popGCSPOPM    { <Xt>}SYSL   <Xt>, #3, C7, C7, #1pmullb�Polynomial multiply over [0, 1] the corresponding even-numbered elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%PMULLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>PMULLB <Zd>.Q, <Zn>.D, <Zm>.Dsaddwt�Add the odd-numbered signed elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.$SADDWT <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>addvl�	Add the current vector register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer, and place the result in the 64-bit destination general-purpose register or current stack pointer.ADDVL <Xd|SP>, <Xn|SP>, #<imm>prfmPrefetch memory (immediate).PRFM  ( <prfop>|#<imm5>), [<Xn|SP>{, #<pimm>}]!PRFM  ( <prfop>|#<imm5>), <label>GPRFM  ( <prfop>|#<imm5>), [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]sunpkhi�Unpack elements from the lowest or highest half of the source vector and then sign-extend them to place in elements of twice their size within the destination vector. 
This instruction is unpredicated.SUNPKHI <Zd>.<T>, <Zn>.<Tb>SUNPKLO <Zd>.<T>, <Zn>.<Tb>ucvtf7Unsigned fixed-point convert to floating-point (vector)UCVTF <V><d>, <V><n>, #<fbits>"UCVTF <Vd>.<T>, <Vn>.<T>, #<fbits>UCVTF <Hd>, <Hn>UCVTF <V><d>, <V><n>UCVTF <Vd>.<T>, <Vn>.<T>UCVTF <Vd>.<T>, <Vn>.<T>UCVTF <Hd>, <Wn>, #<fbits>UCVTF <Hd>, <Xn>, #<fbits>UCVTF <Sd>, <Wn>, #<fbits>UCVTF <Sd>, <Xn>, #<fbits>UCVTF <Dd>, <Wn>, #<fbits>UCVTF <Dd>, <Xn>, #<fbits>UCVTF <Hd>, <Wn>UCVTF <Sd>, <Wn>UCVTF <Dd>, <Wn>UCVTF <Hd>, <Xn>UCVTF <Sd>, <Xn>UCVTF <Dd>, <Xn>.UCVTF { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }.UCVTF { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }UCVTF <Zd>.H, <Pg>/M, <Zn>.HUCVTF <Zd>.H, <Pg>/M, <Zn>.SUCVTF <Zd>.S, <Pg>/M, <Zn>.SUCVTF <Zd>.D, <Pg>/M, <Zn>.SUCVTF <Zd>.H, <Pg>/M, <Zn>.DUCVTF <Zd>.S, <Pg>/M, <Zn>.DUCVTF <Zd>.D, <Pg>/M, <Zn>.Dusmlall�aThis unsigned by signed integer multiply-add long-long instruction multiplies each unsigned 8-bit element in the one, two, or four first source vectors with each signed 8-bit indexed element of the second source vector, widens each product to 32-bits and destructively adds these values to the corresponding 32-bit elements of the ZA quad-vector groups.=USMLALL ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]RUSMLALL ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RUSMLALL ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]4USMLALL ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.BIUSMLALL ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.BIUSMLALL ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.BVUSMLALL ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }VUSMLALL ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }sqxtnt�Saturate the signed integer value in each source element to half the original source element width, and place the results in the odd-numbered half-width destination elements, leaving the 
even-numbered elements unchanged.SQXTNT <Zd>.<T>, <Zn>.<Tb>splice��Select a region from the first source vector and copy it to the lowest-numbered elements of the result. Then set any remaining elements of the result to a copy of the lowest-numbered elements from the second source vector. The region is selected using the first and last true elements in the vector select predicate register. The result is placed destructively in the destination and first source vector, or constructively in the destination vector./SPLICE <Zd>.<T>, <Pv>, { <Zn1>.<T>, <Zn2>.<T> }+SPLICE <Zdn>.<T>, <Pv>, <Zdn>.<T>, <Zm>.<T>ldset.Atomic bit set on word or doubleword in memoryLDSET <Ws>, <Wt>, [<Xn|SP>]LDSETA <Ws>, <Wt>, [<Xn|SP>]LDSETAL <Ws>, <Wt>, [<Xn|SP>]LDSETL <Ws>, <Wt>, [<Xn|SP>]LDSET <Xs>, <Xt>, [<Xn|SP>]LDSETA <Xs>, <Xt>, [<Xn|SP>]LDSETAL <Xs>, <Xt>, [<Xn|SP>]LDSETL <Xs>, <Xt>, [<Xn|SP>]0STSET <Ws>, [<Xn|SP>]LDSET  <Ws>, WZR, [<Xn|SP>]2STSETL <Ws>, [<Xn|SP>]LDSETL  <Ws>, WZR, [<Xn|SP>]0STSET <Xs>, [<Xn|SP>]LDSET  <Xs>, XZR, [<Xn|SP>]2STSETL <Xs>, [<Xn|SP>]LDSETL  <Xs>, XZR, [<Xn|SP>]asrd��Shift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The result rounds toward zero as in a signed division. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. Inactive elements in the destination vector register remain unmodified.+ASRD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>ldapursw1Load-acquire RCpc register signed word (unscaled)#LDAPURSW <Xt>, [<Xn|SP>{, #<simm>}]	autiasppcBAuthenticate return address using key A, using an immediate offsetAUTIASPPC <label>cnt
Count bitsCNT <Wd>, <Wn>CNT <Xd>, <Xn>CNT <Vd>.<T>, <Vn>.<T>CNT <Zd>.<T>, <Pg>/M, <Zn>.<T>sqdmullt�Multiply the corresponding odd-numbered signed elements of the first and second source vectors, double and place the results in the overlapping double-width elements of the destination vector. Each result element is saturated to the double-width N-bit element's signed integer range -2'SQDMULLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>&SQDMULLT <Zd>.S, <Zn>.H, <Zm>.H[<imm>]&SQDMULLT <Zd>.D, <Zn>.S, <Zm>.S[<imm>]eon+Bitwise exclusive-OR NOT (shifted register))EON <Wd>, <Wn>, <Wm>{, <shift> #<amount>})EON <Xd>, <Xn>, <Xm>{, <shift> #<amount>}EON�KBitwise exclusive OR an inverted immediate with each 64-bit element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated."EON <Zdn>.<T>, <Zdn>.<T>, #<const>*EOR  <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)sevlSend event localSEVL sshllb�1Shift left by immediate each even-numbered signed element of the source vector, and place the results in the overlapping double-width elements of the destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. This instruction is unpredicated.$SSHLLB <Zd>.<T>, <Zn>.<Tb>, #<const>prfd�Gather prefetch of doublewords from the active memory addresses generated by a vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. 
Inactive addresses are not prefetched from memory.&PRFD <prfop>, <Pg>, [<Zn>.S{, #<imm>}]&PRFD <prfop>, <Pg>, [<Zn>.D{, #<imm>}]/PRFD <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]+PRFD <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #3]/PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #3]/PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #3]-PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #3]ld1rod�Load four contiguous doublewords to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address..LD1ROD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]2LD1ROD { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]stzgStore Allocation Tag, zeroing STZG <Xt|SP>, [<Xn|SP>], #<simm>!STZG <Xt|SP>, [<Xn|SP>, #<simm>]!"STZG <Xt|SP>, [<Xn|SP>{, #<simm>}]fmlall�7This 8-bit floating-point multiply-add long long instruction widens all 8-bit floating-point elements in the one, two, or four first source vectors and the indexed element of the second source vector to single-precision format and multiplies the corresponding elements. 
The intermediate products are scaled by 2=FMLALL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]RFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]4FMLALL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.BIFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.BIFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.BVFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }VFMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }sbfizSBFIZ -- A64Signed bitfield insert in zeros"SBFIZ <Wd>, <Wn>, #<lsb>, #<width>3SBFM   <Wd>, <Wn>, #(-<lsb>  MOD  32), #(<width>-1)"SBFIZ <Xd>, <Xn>, #<lsb>, #<width>3SBFM   <Xd>, <Xn>, #(-<lsb>  MOD  64), #(<width>-1)stllrStore LORelease registerSTLLR <Wt>, [<Xn|SP>{, #0}]STLLR <Xt>, [<Xn|SP>{, #0}]uabd%Unsigned absolute difference (vector)!UABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>+UABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>fmov&Floating-point move immediate (vector)FMOV <Vd>.<T>, #<imm>FMOV <Vd>.<T>, #<imm>FMOV <Vd>.2D, #<imm>FMOV <Hd>, <Hn>FMOV <Sd>, <Sn>FMOV <Dd>, <Dn>FMOV <Wd>, <Hn>FMOV <Xd>, <Hn>FMOV <Hd>, <Wn>FMOV <Sd>, <Wn>FMOV <Wd>, <Sn>FMOV <Hd>, <Xn>FMOV <Dd>, <Xn>FMOV <Vd>.D[1], <Xn>FMOV <Xd>, <Dn>FMOV <Xd>, <Vn>.D[1]FMOV <Hd>, #<imm>FMOV <Sd>, #<imm>FMOV <Dd>, #<imm>FMOV (zero, predicated)�Move floating-point constant +0.0 to each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.FMOV <Zd>.<T>, <Pg>/M, #0.0CPY  <Zd>.<T>, <Pg>/M, #0FMOV (zero, unpredicated)�Unconditionally broadcast the floating-point constant +0.0 into each element of the destination vector. This instruction is unpredicated.FMOV <Zd>.<T>, #0.0DUP  <Zd>.<T>, #0FMOV (immediate, predicated)�Move a floating-point immediate into each active element in the destination vector. 
Inactive elements in the destination vector register remain unmodified.FMOV <Zd>.<T>, <Pg>/M, #<const>#FCPY     <Zd>.<T>, <Pg>/M, #<const>FMOV (immediate, unpredicated)�Unconditionally broadcast the floating-point immediate into each element of the destination vector. This instruction is unpredicated.FMOV <Zd>.<T>, #<const>FDUP     <Zd>.<T>, #<const>fcvtnb}Convert each single-precision element of the group of two source vectors to 8-bit floating-point while scaling the value by 2"FCVTNB <Zd>.B, { <Zn1>.S-<Zn2>.S }raddhnb�6Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant rounded half of the result in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. This instruction is unpredicated.&RADDHNB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>movt�Move 8 bytes to a general-purpose register from the ZT0 register at the byte offset specified by the immediate index. This instruction is UNDEFINED in Non-debug state.MOVT <Xt>, ZT0[<offs>]MOVT    ZT0[ <offs>], <Xt>$MOVT    ZT0 {[<offs>, MUL VL]}, <Zt>rcwsset@Read check write software atomic bit set on doubleword in memoryRCWSSET <Xs>, <Xt>, [<Xn|SP>]RCWSSETA <Xs>, <Xt>, [<Xn|SP>]RCWSSETAL <Xs>, <Xt>, [<Xn|SP>]RCWSSETL <Xs>, <Xt>, [<Xn|SP>]ldff1sw�WGather load with first-faulting behavior of signed words to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. 
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector..LDFF1SW { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]5LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]7LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]4LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]5LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]-LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]ldnf1sh��Contiguous load with non-faulting behavior of signed halfwords to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.7LDNF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]7LDNF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]sbfmSigned bitfield move!SBFM <Wd>, <Wn>, #<immr>, #<imms>!SBFM <Xd>, <Xn>, #<immr>, #<imms>sumopa>The 8-bit integer variant works with a 32-bit element ZA tile./SUMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B/SUMOPA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Huqinch�*Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. 
The result is saturated to the general-purpose register's unsigned integer range.'UQINCH <Wdn>{, <pattern>{, MUL #<imm>}}'UQINCH <Xdn>{, <pattern>{, MUL #<imm>}})UQINCH <Zdn>.H{, <pattern>{, MUL #<imm>}}fminnmv+Floating-point minimum number across vectorFMINNMV <V><d>, <Vn>.<T>FMINNMV  S <d>, <Vn>.4SFMINNMV <V><d>, <Pg>, <Zn>.<T>ldnt1w�Contiguous load non-temporal of words to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.@LDNT1W { <Zt1>.S-<Zt2>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]@LDNT1W { <Zt1>.S-<Zt4>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]<LDNT1W { <Zt1>.S-<Zt2>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]<LDNT1W { <Zt1>.S-<Zt4>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]ALDNT1W { <Zt1>.S, <Zt2>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]SLDNT1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]=LDNT1W { <Zt1>.S, <Zt2>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]OLDNT1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]+LDNT1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]+LDNT1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]6LDNT1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]2LDNT1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]ssra-Signed shift right and accumulate (immediate)SSRA  D <d>, D<n>, #<shift>!SSRA <Vd>.<T>, <Vn>.<T>, #<shift>"SSRA <Zda>.<T>, <Zn>.<T>, #<const>ld44Load multiple 4-element structures to four registers=LD4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]DLD4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>CLD4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>>LD4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>]>LD4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>]>LD4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>]>LD4  { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D 
}[<index>], [<Xn|SP>]BLD4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], #4DLD4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], <Xm>BLD4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], #8DLD4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], <Xm>CLD4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], #16DLD4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], <Xm>CLD4  { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], #32DLD4  { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], <Xm>st64b6Single-copy atomic 64-byte store without status resultST64B <Xt>, [<Xn|SP> {, #0}]cpyprnMemory copy, reads non-temporal CPYPRN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYMRN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYERN  [ <Xd>]!, [<Xs>]!, <Xn>!usublUnsigned subtract long*USUBL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>setptMemory set, unprivilegedSETPT  [ <Xd>]!, <Xn>!, <Xs>SETMT  [ <Xd>]!, <Xn>!, <Xs>SETET  [ <Xd>]!, <Xn>!, <Xs>st38Store multiple 3-element structures from three registers2ST3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]9ST3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>8ST3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>5ST3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>]5ST3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>]5ST3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>]5ST3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>]9ST3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], #3;ST3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], <Xm>9ST3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], #6;ST3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], <Xm>:ST3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], #12;ST3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], <Xm>:ST3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], #24;ST3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], <Xm>facltFACLT��Compare active absolute values of floating-point elements in the first source vector being less 
than corresponding absolute values of elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.*FACLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-FACGT    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>orv�Bitwise inclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&amp;FP scalar destination register. Inactive elements in the source vector are treated as zero.ORV <V><d>, <Pg>, <Zn>.<T>gcsss1
GCSSS1 -- A64$Guarded Control Stack switch stack 1GCSSS1 <Xt>SYS   #3, C7, C7, #2, <Xt>ldeor3Atomic exclusive-OR on word or doubleword in memoryLDEOR <Ws>, <Wt>, [<Xn|SP>]LDEORA <Ws>, <Wt>, [<Xn|SP>]LDEORAL <Ws>, <Wt>, [<Xn|SP>]LDEORL <Ws>, <Wt>, [<Xn|SP>]LDEOR <Xs>, <Xt>, [<Xn|SP>]LDEORA <Xs>, <Xt>, [<Xn|SP>]LDEORAL <Xs>, <Xt>, [<Xn|SP>]LDEORL <Xs>, <Xt>, [<Xn|SP>]0STEOR <Ws>, [<Xn|SP>]LDEOR  <Ws>, WZR, [<Xn|SP>]2STEORL <Ws>, [<Xn|SP>]LDEORL  <Ws>, WZR, [<Xn|SP>]0STEOR <Xs>, [<Xn|SP>]LDEOR  <Xs>, XZR, [<Xn|SP>]2STEORL <Xs>, [<Xn|SP>]LDEORL  <Xs>, XZR, [<Xn|SP>]ldtrsb(Load register signed byte (unprivileged)!LDTRSB <Wt>, [<Xn|SP>{, #<simm>}]!LDTRSB <Xt>, [<Xn|SP>{, #<simm>}]fdotG8-bit floating-point dot product to half-precision (vector, by element)+FDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.2B[<index>]$FDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>+FDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]$FDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>FDOT <Zda>.S, <Zn>.B, <Zm>.B#FDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]FDOT <Zda>.H, <Zn>.B, <Zm>.B#FDOT <Zda>.H, <Zn>.B, <Zm>.B[<imm>]FDOT <Zda>.S, <Zn>.H, <Zm>.H#FDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]IFDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]IFDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]@FDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B@FDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.BMFDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }MFDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }IFDOT    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]IFDOT    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]@FDOT    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B@FDOT    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.BMFDOT    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }MFDOT    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }IFDOT    ZA.S[ 
<Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IFDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]@FDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H@FDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HMFDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }MFDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }rshrn'Rounding shift right narrow (immediate))RSHRN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>fccmp1Floating-point conditional quiet compare (scalar)!FCCMP <Hn>, <Hm>, #<nzcv>, <cond>!FCCMP <Sn>, <Sm>, #<nzcv>, <cond>!FCCMP <Dn>, <Dm>, #<nzcv>, <cond>cpyfpn7Memory copy forward-only, reads and writes non-temporal CPYFPN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYFMN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYFEN  [ <Xd>]!, [<Xs>]!, <Xn>!rdffrsRead the first-fault register (RDFFRS <Pd>.B, <Pg>/Zumlalt�Multiply the corresponding odd-numbered unsigned elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.&UMLALT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%UMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%UMLALT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]cnot�Logically invert the boolean value in each active element of the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain unmodified.CNOT <Zd>.<T>, <Pg>/M, <Zn>.<T>cash#Compare and swap halfword in memory CASH <Ws>, <Wt>, [<Xn|SP>{, #0}]!CASAH <Ws>, <Wt>, [<Xn|SP>{, #0}]"CASALH <Ws>, <Wt>, [<Xn|SP>{, #0}]!CASLH <Ws>, <Wt>, [<Xn|SP>{, #0}]brkbs�Sets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. 
Sets the
BRKBS <Pd>.B, <Pg>/Z, <Pn>.B

addg
Add with tag
ADDG <Xd|SP>, <Xn|SP>, #<uimm6>, #<uimm4>

cpyfp
Memory copy forward-only
CPYFP [<Xd>]!, [<Xs>]!, <Xn>!
CPYFM [<Xd>]!, [<Xs>]!, <Xn>!
CPYFE [<Xd>]!, [<Xs>]!, <Xn>!

bfmlal
BFloat16 floating-point widening multiply-add long (by element)
BFMLAL<bt> <Vd>.4S, <Vn>.8H, <Vm>.H[<index>]
BFMLAL<bt> <Vd>.4S, <Vn>.8H, <Vm>.8H
BFMLAL ZA.S[<Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]
BFMLAL ZA.S[<Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]
BFMLAL ZA.S[<Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]
BFMLAL ZA.S[<Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H
BFMLAL ZA.S[<Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H
BFMLAL ZA.S[<Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H
BFMLAL ZA.S[<Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }
BFMLAL ZA.S[<Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }

raddhnt
Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant rounded half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. This instruction is unpredicated.
RADDHNT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

dup
Duplicate vector element to vector or scalar
DUP <V><d>, <Vn>.<T>[<index>]
DUP <Vd>.<T>, <Vn>.<Ts>[<index>]
DUP <Vd>.<T>, <R><n>
DUP <Zd>.<T>, #<imm>{, <shift>}
DUP <Zd>.<T>, <R><n|SP>
DUP <Zd>.<T>, <Zn>.<T>[<imm>]

stilp
Store-release ordered pair of registers
STILP <Wt1>, <Wt2>, [<Xn|SP>, #-8]!
STILP <Wt1>, <Wt2>, [<Xn|SP>]
STILP <Xt1>, <Xt2>, [<Xn|SP>, #-16]!
STILP <Xt1>, <Xt2>, [<Xn|SP>]

ldapursh
Load-acquire RCpc register signed halfword (unscaled)
LDAPURSH <Wt>, [<Xn|SP>{, #<simm>}]
LDAPURSH <Xt>, [<Xn|SP>{, #<simm>}]

pacib
Pointer Authentication Code for instruction address, using key B
PACIB <Xd>, <Xn|SP>
PACIZB <Xd>
PACIB1716 PACIBSP PACIBZ ldff1d�VGather load with first-faulting behavior of doublewords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.-LDFF1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]4LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #3}]6LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #3]3LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]4LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #3],LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]st4b�0Contiguous store four-byte structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,NST4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]BST4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>, <Xm>]cmge-Compare signed greater than or equal (vector)CMGE  D <d>, D<n>, D<m>!CMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>CMGE  D <d>, D<n>, #0CMGE <Vd>.<T>, <Vn>.<T>, #0subpsSubtract pointer, setting flagsSUBPS <Xd>, <Xn|SP>, <Xm|SP>smlsll��This signed integer multiply-subtract long-long instruction multiplies each signed 8-bit or 16-bit element in the one, two, or four first source vectors with each signed 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively subtracts these values from the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups.=SMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]=SMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>], <Zn>.H, <Zm>.H[<index>]RSMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RSMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>{, 
VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RSMLSLL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]RSMLSLL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]<SMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>], <Zn>.<Tb>, <Zm>.<Tb>TSMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>TSMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>dSMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }dSMLSLL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }lsrvLogical shift right variableLSRV <Wd>, <Wn>, <Wm>LSRV <Xd>, <Xn>, <Xm>dmbData memory barrierDMB  ( <option>|#<imm>)cbzCompare and branch on zeroCBZ <Wt>, <label>CBZ <Xt>, <label>irgInsert random tagIRG <Xd|SP>, <Xn|SP>{, <Xm>}saba)Signed absolute difference and accumulate!SABA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"SABA <Zda>.<T>, <Zn>.<T>, <Zm>.<T>ldnt1sw�,Gather load non-temporal of signed words to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.,LDNT1SW { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]bf1cvt�Convert each 8-bit floating-point element of the source vector to BFloat16 while downscaling the value, and place the results in the corresponding 16-bit elements of the destination vectors. BF1CVT scales the values by 2"BF1CVT { <Zd1>.H-<Zd2>.H }, <Zn>.B"BF2CVT { <Zd1>.H-<Zd2>.H }, <Zn>.BBF1CVT <Zd>.H, <Zn>.BBF2CVT <Zd>.H, <Zn>.BsbSpeculation barrierSB facgt��Compare active absolute values of floating-point elements in the first source vector with corresponding absolute values of elements in the second source vector, and place the boolean results of the specified comparison in the corresponding elements of the destination predicate. 
Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.*FACGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*FACGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>FACGT <Hd>, <Hn>, <Hm>FACGT <V><d>, <V><n>, <V><m>"FACGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FACGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>crc32bCRC32 checksumCRC32B <Wd>, <Wn>, <Wm>CRC32H <Wd>, <Wn>, <Wm>CRC32W <Wd>, <Wn>, <Wm>CRC32X <Wd>, <Wn>, <Xm>gmiTag mask insertGMI <Xd>, <Xn|SP>, <Xm>st2gStore Allocation Tags ST2G <Xt|SP>, [<Xn|SP>], #<simm>!ST2G <Xt|SP>, [<Xn|SP>, #<simm>]!"ST2G <Xt|SP>, [<Xn|SP>{, #<simm>}]suvdot��The signed by unsigned integer vertical dot product instruction computes the vertical dot product of the corresponding signed 8-bit elements from the four first source vectors and four unsigned 8-bit integer values in the corresponding indexed 32-bit element of the second source vector. The widened dot product result is destructively added to the corresponding 32-bit element of the ZA single-vector groups.ISUVDOT  ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]cmhi Compare unsigned higher (vector)CMHI  D <d>, D<n>, D<m>!CMHI <Vd>.<T>, <Vn>.<T>, <Vm>.<T>adcAdd with carryADC <Wd>, <Wn>, <Wm>ADC <Xd>, <Xn>, <Xm>stlrhStore-release register halfwordSTLRH <Wt>, [<Xn|SP>{, #0}]bfclamp�jClamp each BFloat16 element in the two or four destination vectors to between the BFloat16 minimum value in the corresponding element of the first source vector and the BFloat16 maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors.+BFCLAMP { <Zd1>.H-<Zd2>.H }, <Zn>.H, <Zm>.H+BFCLAMP { <Zd1>.H-<Zd4>.H }, <Zn>.H, <Zm>.HBFCLAMP <Zd>.H, <Zn>.H, <Zm>.Hwhilehs�Generate a predicate that starting from the highest numbered element is true while the decrementing value of the first, unsigned scalar operand is higher or same as the second scalar operand and false 
thereafter down to the lowest numbered element. WHILEHS <Pd>.<T>, <R><n>, <R><m>#WHILEHS <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILEHS { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>rcwsclrBRead check write software atomic bit clear on doubleword in memoryRCWSCLR <Xs>, <Xt>, [<Xn|SP>]RCWSCLRA <Xs>, <Xt>, [<Xn|SP>]RCWSCLRAL <Xs>, <Xt>, [<Xn|SP>]RCWSCLRL <Xs>, <Xt>, [<Xn|SP>]sqincd�kDetermines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQINCD <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQINCD <Xdn>{, <pattern>{, MUL #<imm>}})SQINCD <Zdn>.D{, <pattern>{, MUL #<imm>}}cpyprtwn4Memory copy, reads unprivileged, writes non-temporal"CPYPRTWN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYMRTWN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYERTWN  [ <Xd>]!, [<Xs>]!, <Xn>!smullb�Multiply the corresponding even-numbered signed elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%SMULLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>$SMULLB <Zd>.S, <Zn>.H, <Zm>.H[<imm>]$SMULLB <Zd>.D, <Zn>.S, <Zm>.S[<imm>]fadda�PFloating-point add a SIMD&amp;FP scalar source and all active lanes of the vector source and place the result destructively in the SIMD&amp;FP scalar source register. Vector elements are processed strictly in order from low to high, with the scalar source providing the initial value. Inactive elements in the source vector are ignored.&FADDA <V><dn>, <Pg>, <V><dn>, <Zm>.<T>sqincp�Counts the number of true elements in the source predicate and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. 
A 32-bit saturated result is then sign-extended to 64 bits.
SQINCP <Xdn>, <Pm>.<T>, <Wdn>
SQINCP <Xdn>, <Pm>.<T>
SQINCP <Zdn>.<T>, <Pm>.<T>

sqrshrn
Signed saturating rounded shift right narrow (immediate)
SQRSHRN <Vb><d>, <Va><n>, #<shift>
SQRSHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>
SQRSHRN <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>
SQRSHRN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>

sqrshru
Shift right by an immediate value, the signed integer value in each element of the two source vectors and place the rounded results in the half-width destination elements. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2
SQRSHRU <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>
SQRSHRU <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>

uqrshr
Shift right by an immediate value, the unsigned integer value in each element of the two source vectors and place the rounded results in the half-width destination elements. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2
UQRSHR <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>
UQRSHR <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>

ushr
Unsigned shift right (immediate)
USHR D<d>, D<n>, #<shift>
USHR <Vd>.<T>, <Vn>.<T>, #<shift>

brb
BRB -- A64Branch record bufferBRB <brb_op> SYS   #1, C7, C2, #<op2>{, <Xt>}	sqdmlslbt�Multiply then double the corresponding even-numbered signed elements of the first and odd-numbered signed elements of the second source vector. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2)SQDMLSLBT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>tcancelCancel current transactionTCANCEL  # <imm>sbfxSBFX -- A64Signed bitfield extract!SBFX <Wd>, <Wn>, #<lsb>, #<width>-SBFM   <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)!SBFX <Xd>, <Xn>, #<lsb>, #<width>-SBFM   <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)mov!MOV (to/from SP) -- A64Move (to/from SP)MOV <Wd|WSP>, <Wn|WSP>ADD   <Wd|WSP>, <Wn|WSP>, #0MOV <Xd|SP>, <Xn|SP>ADD   <Xd|SP>, <Xn|SP>, #0$MOV (predicate, predicated, zeroing)�Read active elements from the source predicate and place in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.MOV <Pd>.B, <Pg>/Z, <Pn>.B#AND  <Pd>.B, <Pg>/Z, <Pn>.B, <Pn>.B$MOV (immediate, predicated, zeroing)�Move a signed integer immediate to each active element in the destination vector. Inactive elements in the destination vector register are set to zero.'MOV <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>},CPY      <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>}$MOV (immediate, predicated, merging)�Move a signed integer immediate to each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.'MOV <Zd>.<T>, <Pg>/M, #<imm>{, <shift>},CPY      <Zd>.<T>, <Pg>/M, #<imm>{, <shift>}MOV (scalar, predicated)�Move the general-purpose scalar source register to each active element in the destination vector. 
Inactive elements in the destination vector register remain unmodified.MOV <Zd>.<T>, <Pg>/M, <R><n|SP>$CPY      <Zd>.<T>, <Pg>/M, <R><n|SP>$MOV (SIMD&amp;FP scalar, predicated)�Move the SIMD &amp; floating-point scalar source register to each active element in the destination vector. Inactive elements in the destination vector register remain unmodified.MOV <Zd>.<T>, <Pg>/M, <V><n>!CPY      <Zd>.<T>, <Pg>/M, <V><n>MOV (scalar) -- A64Move vector element to scalarMOV <V><d>, <Vn>.<T>[<index>]DUP   <V><d>, <Vn>.<T>[<index>]MOV (immediate, unpredicated)�Unconditionally broadcast the signed integer immediate into each element of the destination vector. This instruction is unpredicated.MOV <Zd>.<T>, #<imm>{, <shift>}$DUP      <Zd>.<T>, #<imm>{, <shift>}MOV (scalar, unpredicated)�Unconditionally broadcast the general-purpose scalar source register into each element of the destination vector. This instruction is unpredicated.MOV <Zd>.<T>, <R><n|SP>DUP      <Zd>.<T>, <R><n|SP>&MOV (SIMD&amp;FP scalar, unpredicated)Unconditionally broadcast the SIMD&amp;FP scalar into each element of the destination vector. This instruction is unpredicated.MOV <Zd>.<T>, <Zn>.<T>[<imm>]"DUP      <Zd>.<T>, <Zn>.<T>[<imm>]MOV <Zd>.<T>, <V><n>DUP  <Zd>.<T>, <Zn>.<T>[0]MOV��Unconditionally broadcast the logical bitmask immediate into each element of the destination vector. This instruction is unpredicated. 
The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits.MOV <Zd>.<T>, #<const>DUPM     <Zd>.<T>, #<const>MOV (element) -- A64-Move vector element to another vector element,MOV <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>].INS   <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]MOV (from general) -- A641Move general-purpose register to a vector elementMOV <Vd>.<Ts>[<index>], <R><n> INS   <Vd>.<Ts>[<index>], <R><n>#MOV (tile to vector, two registers)The instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size.9MOV { <Zd1>.B-<Zd2>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs2>]>MOVA     { <Zd1>.B-<Zd2>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs2>];MOV { <Zd1>.H-<Zd2>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs2>]@MOVA     { <Zd1>.H-<Zd2>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs2>];MOV { <Zd1>.S-<Zd2>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs2>]@MOVA     { <Zd1>.S-<Zd2>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs2>];MOV { <Zd1>.D-<Zd2>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs2>]@MOVA     { <Zd1>.D-<Zd2>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs2>]$MOV (tile to vector, four registers)�The instruction operates on four consecutive horizontal or vertical slices within a named ZA tile of the specified element size.9MOV { <Zd1>.B-<Zd4>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs4>]>MOVA     { <Zd1>.B-<Zd4>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs4>];MOV { <Zd1>.H-<Zd4>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs4>]@MOVA     { <Zd1>.H-<Zd4>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs4>];MOV { <Zd1>.S-<Zd4>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs4>]@MOVA     { <Zd1>.S-<Zd4>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs4>];MOV { <Zd1>.D-<Zd4>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs4>]@MOVA     { <Zd1>.D-<Zd4>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs4>]$MOV (array to vector, two registers)8The instruction operates on two ZA single-vector groups.3MOV { <Zd1>.D-<Zd2>.D }, ZA.D[<Wv>, <offs>{, VGx2}]8MOVA     { <Zd1>.D-<Zd2>.D }, ZA.D[<Wv>, <offs>{, VGx2}]%MOV (array 
to vector, four registers)9The instruction operates on four ZA single-vector groups.3MOV { <Zd1>.D-<Zd4>.D }, ZA.D[<Wv>, <offs>{, VGx4}]8MOVA     { <Zd1>.D-<Zd4>.D }, ZA.D[<Wv>, <offs>{, VGx4}]MOV (tile to vector, single)�zThe instruction operates on individual horizontal or vertical slices within a named ZA tile of the specified element size. The slice number within the tile is selected by the sum of the slice index register and immediate offset, modulo the number of such elements in a vector. The immediate offset is in the range 0 to the number of elements in a 128-bit vector segment minus 1.
+MOV <Zd>.B, <Pg>/M, ZA0<HV>.B[<Ws>, <offs>]0MOVA     <Zd>.B, <Pg>/M, ZA0<HV>.B[<Ws>, <offs>]-MOV <Zd>.H, <Pg>/M, <ZAn><HV>.H[<Ws>, <offs>]2MOVA     <Zd>.H, <Pg>/M, <ZAn><HV>.H[<Ws>, <offs>]-MOV <Zd>.S, <Pg>/M, <ZAn><HV>.S[<Ws>, <offs>]2MOVA     <Zd>.S, <Pg>/M, <ZAn><HV>.S[<Ws>, <offs>]-MOV <Zd>.D, <Pg>/M, <ZAn><HV>.D[<Ws>, <offs>]2MOVA     <Zd>.D, <Pg>/M, <ZAn><HV>.D[<Ws>, <offs>]-MOV <Zd>.Q, <Pg>/M, <ZAn><HV>.Q[<Ws>, <offs>]2MOVA     <Zd>.Q, <Pg>/M, <ZAn><HV>.Q[<Ws>, <offs>]#MOV (vector to tile, two registers)The instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size.>MOV     ZA0 <HV>.B[<Ws>, <offs1>:<offs2>], { <Zn1>.B-<Zn2>.B }>MOVA     ZA0<HV>.B[<Ws>, <offs1>:<offs2>], { <Zn1>.B-<Zn2>.B };MOV <ZAd><HV>.H[<Ws>, <offs1>:<offs2>], { <Zn1>.H-<Zn2>.H }@MOVA     <ZAd><HV>.H[<Ws>, <offs1>:<offs2>], { <Zn1>.H-<Zn2>.H };MOV <ZAd><HV>.S[<Ws>, <offs1>:<offs2>], { <Zn1>.S-<Zn2>.S }@MOVA     <ZAd><HV>.S[<Ws>, <offs1>:<offs2>], { <Zn1>.S-<Zn2>.S };MOV <ZAd><HV>.D[<Ws>, <offs1>:<offs2>], { <Zn1>.D-<Zn2>.D }@MOVA     <ZAd><HV>.D[<Ws>, <offs1>:<offs2>], { <Zn1>.D-<Zn2>.D }$MOV (vector to tile, four registers)�The instruction operates on four consecutive horizontal or vertical slices within a named ZA tile of the specified element size.>MOV     ZA0 <HV>.B[<Ws>, <offs1>:<offs4>], { <Zn1>.B-<Zn4>.B }>MOVA     ZA0<HV>.B[<Ws>, <offs1>:<offs4>], { <Zn1>.B-<Zn4>.B };MOV <ZAd><HV>.H[<Ws>, <offs1>:<offs4>], { <Zn1>.H-<Zn4>.H }@MOVA     <ZAd><HV>.H[<Ws>, <offs1>:<offs4>], { <Zn1>.H-<Zn4>.H };MOV <ZAd><HV>.S[<Ws>, <offs1>:<offs4>], { <Zn1>.S-<Zn4>.S }@MOVA     <ZAd><HV>.S[<Ws>, <offs1>:<offs4>], { <Zn1>.S-<Zn4>.S };MOV <ZAd><HV>.D[<Ws>, <offs1>:<offs4>], { <Zn1>.D-<Zn4>.D }@MOVA     <ZAd><HV>.D[<Ws>, <offs1>:<offs4>], { <Zn1>.D-<Zn4>.D }$MOV (vector to array, two registers)8The instruction operates on two ZA single-vector groups.8MOV     ZA.D[ <Wv>, <offs>{, VGx2}], { <Zn1>.D-<Zn2>.D }8MOVA     ZA.D[<Wv>, <offs>{, 
VGx2}], { <Zn1>.D-<Zn2>.D }%MOV (vector to array, four registers)9The instruction operates on four ZA single-vector groups.8MOV     ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.D-<Zn4>.D }8MOVA     ZA.D[<Wv>, <offs>{, VGx4}], { <Zn1>.D-<Zn4>.D }MOV (vector to tile, single)�zThe instruction operates on individual horizontal or vertical slices within a named ZA tile of the specified element size. The slice number within the tile is selected by the sum of the slice index register and immediate offset, modulo the number of such elements in a vector. The immediate offset is in the range 0 to the number of elements in a 128-bit vector segment minus 1.
0MOV     ZA0 <HV>.B[<Ws>, <offs>], <Pg>/M, <Zn>.B0MOVA     ZA0<HV>.B[<Ws>, <offs>], <Pg>/M, <Zn>.B-MOV <ZAd><HV>.H[<Ws>, <offs>], <Pg>/M, <Zn>.H2MOVA     <ZAd><HV>.H[<Ws>, <offs>], <Pg>/M, <Zn>.H-MOV <ZAd><HV>.S[<Ws>, <offs>], <Pg>/M, <Zn>.S2MOVA     <ZAd><HV>.S[<Ws>, <offs>], <Pg>/M, <Zn>.S-MOV <ZAd><HV>.D[<Ws>, <offs>], <Pg>/M, <Zn>.D2MOVA     <ZAd><HV>.D[<Ws>, <offs>], <Pg>/M, <Zn>.D-MOV <ZAd><HV>.Q[<Ws>, <offs>], <Pg>/M, <Zn>.Q2MOVA     <ZAd><HV>.Q[<Ws>, <offs>], <Pg>/M, <Zn>.Q$MOV (inverted wide immediate) -- A64Move (inverted wide immediate)MOV <Wd>, #<imm>$MOVN   <Wd>, #<imm16>, LSL  #<shift>MOV <Xd>, #<imm>$MOVN   <Xd>, #<imm16>, LSL  #<shift>MOV (wide immediate) -- A64Move (wide immediate)MOV <Wd>, #<imm>$MOVZ   <Wd>, #<imm16>, LSL  #<shift>MOV <Xd>, #<imm>$MOVZ   <Xd>, #<imm16>, LSL  #<shift>MOV (vector) -- A64Move vectorMOV <Vd>.<T>, <Vn>.<T>"ORR   <Vd>.<T>, <Vn>.<T>, <Vn>.<T>MOV (bitmask immediate) -- A64Move (bitmask immediate)MOV <Wd|WSP>, #<imm>ORR   <Wd|WSP>, WZR, #<imm>MOV <Xd|SP>, #<imm>ORR   <Xd|SP>, XZR, #<imm>MOV (register) -- A64Move (register)MOV <Wd>, <Wm>ORR   <Wd>, WZR, <Wm>MOV <Xd>, <Xm>ORR   <Xd>, XZR, <Xm>MOV�Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated. Does not set the condition flags.MOV <Pd>.B, <Pn>.B#ORR  <Pd>.B, <Pn>/Z, <Pn>.B, <Pn>.BMOV (vector, unpredicated)7Move vector register. This instruction is unpredicated.MOV <Zd>.D, <Zn>.DORR  <Zd>.D, <Zn>.D, <Zn>.D$MOV (predicate, predicated, merging)�Read active elements from the source predicate and place in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register remain unmodified. Does not set the condition flags.MOV <Pd>.B, <Pg>/M, <Pn>.B!SEL  <Pd>.B, <Pg>, <Pn>.B, <Pd>.BMOV (vector, predicated)�Move elements from the source vector to the corresponding elements of the destination vector. 
Inactive elements in the destination vector register remain unmodified.MOV <Zd>.<T>, <Pv>/M, <Zn>.<T>'SEL  <Zd>.<T>, <Pv>, <Zn>.<T>, <Zd>.<T>MOV (to general) -- A64/Move vector element to general-purpose registerMOV <Wd>, <Vn>.S[<index>]UMOV   <Wd>, <Vn>.S[<index>]MOV <Xd>, <Vn>.D[<index>]UMOV   <Xd>, <Vn>.D[<index>]stnt1w�Contiguous store non-temporal of words from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>STNT1W { <Zt1>.S-<Zt2>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]>STNT1W { <Zt1>.S-<Zt4>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]:STNT1W { <Zt1>.S-<Zt2>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]:STNT1W { <Zt1>.S-<Zt4>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]?STNT1W { <Zt1>.S, <Zt2>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]QSTNT1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}];STNT1W { <Zt1>.S, <Zt2>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2]MSTNT1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>, [<Xn|SP>, <Xm>, LSL #2])STNT1W { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}])STNT1W { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]4STNT1W { <Zt>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0STNT1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]ubfizUBFIZ -- A64!Unsigned bitfield insert in zeros"UBFIZ <Wd>, <Wn>, #<lsb>, #<width>3UBFM   <Wd>, <Wn>, #(-<lsb>  MOD  32), #(<width>-1)"UBFIZ <Xd>, <Xn>, #<lsb>, #<width>3UBFM   <Xd>, <Xn>, #(-<lsb>  MOD  64), #(<width>-1)msr(Move immediate value to special registerMSR <pstatefield>, #<imm>6MSR  ( <systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>), <Xt>tblq�(For each 128-bit destination vector segment, reads each element of the corresponding second source (index) vector segment and uses its value to select an indexed element from the corresponding first source (table) vector segment. 
The indexed table element is placed in the element of the destination vector that corresponds to the index vector element. If an index value is greater than or equal to the number of elements in a 128-bit vector segment then it places zero in the corresponding destination vector element. This instruction is unpredicated.%TBLQ <Zd>.<T>, { <Zn>.<T> }, <Zm>.<T>uqrshrn:Unsigned saturating rounded shift right narrow (immediate)"UQRSHRN <Vb><d>, <Va><n>, #<shift>+UQRSHRN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>-UQRSHRN <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>5UQRSHRN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>uzp1Unzip vectors (primary)!UZP1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!UZP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!UZP2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!UZP1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>UZP1 <Zd>.Q, <Zn>.Q, <Zm>.Q!UZP2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>UZP2 <Zd>.Q, <Zn>.Q, <Zm>.Qsha1cSHA1 hash update (choose)SHA1C <Qd>, <Sn>, <Vm>.4Suqshrnt�AShift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. 
Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2^N)-1.
UQSHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>

bfmmla
BFloat16 floating-point matrix multiply-accumulate into 2x2 matrix
BFMMLA <Vd>.4S, <Vn>.8H, <Vm>.8H
BFMMLA <Zda>.S, <Zn>.H, <Zm>.H

fcmeq
Floating-point compare equal (vector)
FCMEQ <Hd>, <Hn>, <Hm>
FCMEQ <V><d>, <V><n>, <V><m>
FCMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FCMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FCMEQ <Hd>, <Hn>, #0.0
FCMEQ <V><d>, <V><n>, #0.0
FCMEQ <Vd>.<T>, <Vn>.<T>, #0.0
FCMEQ <Vd>.<T>, <Vn>.<T>, #0.0
FCMEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0
FCMGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0
FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0
FCMLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0
FCMLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0
FCMNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0
FCMEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
FCMGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
FCMNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
FCMUO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

autiasppcr
Authenticate return address using key A, using a register
AUTIASPPCR <Xn>

fcvtau
Floating-point convert to unsigned integer, rounding to nearest with ties to away (vector)
FCVTAU <Hd>, <Hn>
FCVTAU <V><d>, <V><n>
FCVTAU <Vd>.<T>, <Vn>.<T>
FCVTAU <Vd>.<T>, <Vn>.<T>
FCVTAU <Wd>, <Hn>
FCVTAU <Xd>, <Hn>
FCVTAU <Wd>, <Sn>
FCVTAU <Xd>, <Sn>
FCVTAU <Wd>, <Dn>
FCVTAU <Xd>, <Dn>

ld3w
Contiguous load three-word structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,
LD3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

sabd
Signed absolute difference
SABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
SABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

sha1p
SHA1 hash update (parity)
SHA1P <Qd>, <Sn>, <Vm>.4S

sha256h2
SHA256 hash update (part 2)
SHA256H2 <Qd>, <Qn>, <Vm>.4S

ldrsh
Load register signed halfword (immediate)
LDRSH <Wt>, [<Xn|SP>], #<simm>
LDRSH <Xt>, [<Xn|SP>], #<simm>
LDRSH <Wt>, [<Xn|SP>, #<simm>]!
LDRSH <Xt>, [<Xn|SP>, #<simm>]!
LDRSH <Wt>, [<Xn|SP>{, #<pimm>}]
LDRSH <Xt>, [<Xn|SP>{, #<pimm>}]
LDRSH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]
LDRSH <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

sqdecb
Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range.
A 32-bit saturated result is then sign-extended to 64 bits..SQDECB <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQDECB <Xdn>{, <pattern>{, MUL #<imm>}}uzp�Concatenate every fourth element from each of the four source vectors and place them in the corresponding elements of the four destination vectors.4UZP { <Zd1>.<T>-<Zd4>.<T> }, { <Zn1>.<T>-<Zn4>.<T> },UZP { <Zd1>.Q-<Zd4>.Q }, { <Zn1>.Q-<Zn4>.Q }/UZP { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<T>, <Zm>.<T>'UZP { <Zd1>.Q-<Zd2>.Q }, <Zn>.Q, <Zm>.QcpyfptwnLMemory copy forward-only, reads and writes unprivileged, writes non-temporal"CPYFPTWN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFMTWN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYFETWN  [ <Xd>]!, [<Xs>]!, <Xn>!stlxrh)Store-release exclusive register halfword"STLXRH <Ws>, <Wt>, [<Xn|SP>{, #0}]usdotBDot product with unsigned and signed integers (vector, by element)
,USDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]%USDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>USDOT <Zda>.S, <Zn>.B, <Zm>.B$USDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]IUSDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]IUSDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]@USDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B@USDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.BMUSDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }MUSDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }cmploCMPLO (vectors)�PCompare active unsigned integer elements in the first source vector being lower than corresponding unsigned elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the *CMPLO <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-CMPHI    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>fminnmqv�9Floating-point minimum number of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. 
Inactive elements in the source vector are treated as the default NaN."FMINNMQV <Vd>.<T>, <Pg>, <Zn>.<Tb>gcsbGuarded Control Stack barrierGCSB  DSYNC ldnt1b�Contiguous load non-temporal of bytes to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.@LDNT1B { <Zt1>.B-<Zt2>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]@LDNT1B { <Zt1>.B-<Zt4>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]4LDNT1B { <Zt1>.B-<Zt2>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]4LDNT1B { <Zt1>.B-<Zt4>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]ALDNT1B { <Zt1>.B, <Zt2>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]SLDNT1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]5LDNT1B { <Zt1>.B, <Zt2>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]GLDNT1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]+LDNT1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]+LDNT1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]6LDNT1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]*LDNT1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]psb!Profiling synchronization barrierPSB  CSYNC sqdmlsl>Signed saturating doubling multiply-subtract long (by element),SQDMLSL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]5SQDMLSL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]!SQDMLSL <Va><d>, <Vb><n>, <Vb><m>,SQDMLSL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>scvtf5Signed fixed-point convert to floating-point (vector)SCVTF <V><d>, <V><n>, #<fbits>"SCVTF <Vd>.<T>, <Vn>.<T>, #<fbits>SCVTF <Hd>, <Hn>SCVTF <V><d>, <V><n>SCVTF <Vd>.<T>, <Vn>.<T>SCVTF <Vd>.<T>, <Vn>.<T>SCVTF <Hd>, <Wn>, #<fbits>SCVTF <Hd>, <Xn>, #<fbits>SCVTF <Sd>, <Wn>, #<fbits>SCVTF <Sd>, <Xn>, #<fbits>SCVTF <Dd>, <Wn>, #<fbits>SCVTF <Dd>, <Xn>, #<fbits>SCVTF <Hd>, <Wn>SCVTF <Sd>, <Wn>SCVTF <Dd>, <Wn>SCVTF <Hd>, <Xn>SCVTF <Sd>, <Xn>SCVTF <Dd>, <Xn>.SCVTF { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }.SCVTF { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }SCVTF 
<Zd>.H, <Pg>/M, <Zn>.HSCVTF <Zd>.H, <Pg>/M, <Zn>.SSCVTF <Zd>.S, <Pg>/M, <Zn>.SSCVTF <Zd>.D, <Pg>/M, <Zn>.SSCVTF <Zd>.H, <Pg>/M, <Zn>.DSCVTF <Zd>.S, <Pg>/M, <Zn>.DSCVTF <Zd>.D, <Pg>/M, <Zn>.DfrecpsFloating-point reciprocal stepFRECPS <Hd>, <Hn>, <Hm>FRECPS <V><d>, <V><n>, <V><m>#FRECPS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>#FRECPS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>#FRECPS <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ldclrp&Atomic bit clear on quadword in memoryLDCLRP <Xt1>, <Xt2>, [<Xn|SP>]LDCLRPA <Xt1>, <Xt2>, [<Xn|SP>] LDCLRPAL <Xt1>, <Xt2>, [<Xn|SP>]LDCLRPL <Xt1>, <Xt2>, [<Xn|SP>]bsl2n�CSelects bits from the first source vector where the corresponding bit in the third source vector is '1', and from the inverted second source vector where the corresponding bit in the third source vector is '0'. The result is placed destructively in the destination and first source vector. This instruction is unpredicated.&BSL2N <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.Dsqshrunb�CShift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2&SQSHRUNB <Zd>.<T>, <Zn>.<Tb>, #<const>cmhs(Compare unsigned higher or same (vector)CMHS  D <d>, D<n>, D<m>!CMHS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>mla0Multiply-add to accumulator (vector, by element)*MLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>] MLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>)MLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>"MLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>]"MLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>]"MLA <Zda>.D, <Zn>.D, <Zm>.D[<imm>]sm3ss1SM3SS1)SM3SS1 <Vd>.4S, <Vn>.4S, <Vm>.4S, <Va>.4Suaddlb�Add the corresponding even-numbered unsigned elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.
UADDLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

cpypwtwn
Memory copy, writes unprivileged and non-temporal
CPYPWTWN [<Xd>]!, [<Xs>]!, <Xn>!
CPYMWTWN [<Xd>]!, [<Xs>]!, <Xn>!
CPYEWTWN [<Xd>]!, [<Xs>]!, <Xn>!

setf8
Evaluation of 8-bit or 16-bit flag values
SETF8 <Wn>SETF16 <Wn>ld1rd�Load a single doubleword from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 8 in the range 0 to 504.-LD1RD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]ldaxp(Load-acquire exclusive pair of registers#LDAXP <Wt1>, <Wt2>, [<Xn|SP>{, #0}]#LDAXP <Xt1>, <Xt2>, [<Xn|SP>{, #0}]sha256hSHA256 hash update (part 1)SHA256H <Qd>, <Qn>, <Vm>.4Sld4h�3Contiguous load four-halfword structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,PLD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]LLD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]whilels�Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, unsigned scalar operand is lower or same as the second scalar operand and false thereafter up to the highest numbered element. WHILELS <Pd>.<T>, <R><n>, <R><m>#WHILELS <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILELS { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>saddlvSigned add long across vectorSADDLV <V><d>, <Vn>.<T>dupq�Unconditionally broadcast the indexed element within each 128-bit source vector segment to all elements of the corresponding destination vector segment. 
This instruction is unpredicated.DUPQ <Zd>.<T>, <Zn>.<T>[<imm>]uqsubUnsigned saturating subtractUQSUB <V><d>, <V><n>, <V><m>"UQSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UQSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>-UQSUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}"UQSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>fmul$Floating-point multiply (by element) FMUL <Hd>, <Hn>, <Vm>.H[<index>]'FMUL <V><d>, <V><n>, <Vm>.<Ts>[<index>](FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]+FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]!FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FMUL <Hd>, <Hn>, <Hm>FMUL <Sd>, <Sn>, <Sm>FMUL <Dd>, <Dn>, <Dm>*FMUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>+FMUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!FMUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>"FMUL <Zd>.H, <Zn>.H, <Zm>.H[<imm>]"FMUL <Zd>.S, <Zn>.S, <Zm>.S[<imm>]"FMUL <Zd>.D, <Zn>.D, <Zm>.D[<imm>]ld1sh�=Gather load of signed halfwords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. 
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.,LD1SH { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}],LD1SH { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]5LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]5LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]1LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]1LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]5LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]5LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]2LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]2LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]3LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]+LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]ldapurh.Load-acquire RCpc register halfword (unscaled)"LDAPURH <Wt>, [<Xn|SP>{, #<simm>}]rcwclrp7Read check write atomic bit clear on quadword in memoryRCWCLRP <Xt1>, <Xt2>, [<Xn|SP>] RCWCLRPA <Xt1>, <Xt2>, [<Xn|SP>]!RCWCLRPAL <Xt1>, <Xt2>, [<Xn|SP>] RCWCLRPL <Xt1>, <Xt2>, [<Xn|SP>]cmla��Multiply the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of the integral numbers in the first source vector by the corresponding complex number in the second source vector rotated by 0, 90, 180 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation.+CMLA <Zda>.<T>, <Zn>.<T>, <Zm>.<T>, <const>,CMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>], <const>,CMLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>], <const>ushll$Unsigned shift left long (immediate))USHLL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, #<shift>csdb'Consumption of speculative data barrierCSDB ssubwb�
Subtract the even-numbered signed elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.$SSUBWB <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>smopa5This instruction works with a 32-bit element ZA tile..SMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.SMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B.SMOPA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hldaxrb$Load-acquire exclusive register byteLDAXRB <Wt>, [<Xn|SP>{, #0}]smmla8Signed 8-bit integer matrix multiply-accumulate (vector)!SMMLA <Vd>.4S, <Vn>.16B, <Vm>.16BSMMLA <Zda>.S, <Zn>.B, <Zm>.Bfmmla��The floating-point matrix multiply-accumulate instruction supports single-precision and double-precision data types in a 2×2 matrix contained in segments of 128 or 256 bits, respectively. It multiplies the 2×2 matrix in each segment of the first source vector by the 2×2 matrix in the corresponding segment of the second source vector. The resulting 2×2 matrix product is then destructively added to the matrix accumulator held in the corresponding segment of the addend and destination vector. This is equivalent to performing a 2-way dot product per destination element. This instruction is unpredicated. The single-precision variant is vector length agnostic. 
The double-precision variant requires that the Effective SVE vector length is at least 256 bits.FMMLA <Zda>.S, <Zn>.S, <Zm>.SFMMLA <Zda>.D, <Zn>.D, <Zm>.DpmullPolynomial multiply long*PMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>strh#Store register halfword (immediate)STRH <Wt>, [<Xn|SP>], #<simm>STRH <Wt>, [<Xn|SP>, #<simm>]!STRH <Wt>, [<Xn|SP>{, #<pimm>}]8STRH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]usublt�Subtract the odd-numbered unsigned elements of the second source vector from the corresponding unsigned elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%USUBLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>wfetWait for event with timeout	WFET <Xt>stur,Store SIMD&amp;FP register (unscaled offset)STUR <Bt>, [<Xn|SP>{, #<simm>}]STUR <Ht>, [<Xn|SP>{, #<simm>}]STUR <St>, [<Xn|SP>{, #<simm>}]STUR <Dt>, [<Xn|SP>{, #<simm>}]STUR <Qt>, [<Xn|SP>{, #<simm>}]STUR <Wt>, [<Xn|SP>{, #<simm>}]STUR <Xt>, [<Xn|SP>{, #<simm>}]shrnShift right narrow (immediate)(SHRN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>ssublt�Subtract the odd-numbered signed elements of the second source vector from the corresponding signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%SSUBLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>zip�Place the four-way interleaved elements from the four source vectors in the corresponding elements of the four destination vectors.4ZIP { <Zd1>.<T>-<Zd4>.<T> }, { <Zn1>.<T>-<Zn4>.<T> },ZIP { <Zd1>.Q-<Zd4>.Q }, { <Zn1>.Q-<Zn4>.Q }/ZIP { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<T>, <Zm>.<T>'ZIP { <Zd1>.Q-<Zd2>.Q }, <Zn>.Q, <Zm>.QcsetCSET -- A64Conditional setCSET <Wd>, <invcond>CSINC   <Wd>, WZR, WZR, <cond>CSET <Xd>, <invcond>CSINC   <Xd>, XZR, XZR, <cond>sqsubSigned saturating subtractSQSUB <V><d>, <V><n>, <V><m>"SQSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SQSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>-SQSUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}"SQSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ld4q�3Contiguous load four-quadword structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,PLD4Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q, <Zt4>.Q }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]LLD4Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q, <Zt4>.Q }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #4]fmsub/Floating-point fused multiply-subtract (scalar)FMSUB <Hd>, <Hn>, <Hm>, <Ha>FMSUB <Sd>, <Sn>, <Sm>, <Sa>FMSUB <Dd>, <Dn>, <Dm>, <Da>fminp3Floating-point minimum of pair of elements (scalar)FMINP  H <d>, <Vn>.2HFMINP <V><d>, <Vn>.<T>"FMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,FMINP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>nand�2Bitwise NAND active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Does not set the condition flags.#NAND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Bf1cvtl78-bit floating-point convert to half-precision (vector)F1CVTL{ 2}  <Vd>.8H, <Vn>.<Ta>F2CVTL{ 2}  <Vd>.8H, <Vn>.<Ta>"F1CVTL { <Zd1>.H-<Zd2>.H }, <Zn>.B"F2CVTL { <Zd1>.H-<Zd2>.H }, <Zn>.BbfcvtHFloating-point convert from single-precision to BFloat16 format (scalar)BFCVT <Hd>, <Sn>!BFCVT <Zd>.B, { <Zn1>.H-<Zn2>.H }!BFCVT <Zd>.H, { <Zn1>.S-<Zn2>.S }BFCVT <Zd>.H, <Pg>/M, <Zn>.SpacnbibsppcPPointer Authentication Code for return address, using key B, not a branch targetPACNBIBSPPC ldraa*Load register, with pointer authentication LDRAA <Xt>, [<Xn|SP>{, #<simm>}]!LDRAA <Xt>, [<Xn|SP>{, #<simm>}]! LDRAB <Xt>, [<Xn|SP>{, #<simm>}]!LDRAB <Xt>, [<Xn|SP>{, #<simm>}]!swpabSwap byte in memorySWPAB <Ws>, <Wt>, [<Xn|SP>]SWPALB <Ws>, <Wt>, [<Xn|SP>]SWPB <Ws>, <Wt>, [<Xn|SP>]SWPLB <Ws>, <Wt>, [<Xn|SP>]uqsubr�2Subtract active unsigned elements of the first source vector from corresponding unsigned elements of the second source vector and destructively place the results in the corresponding elements of the first source vector. Each result element is saturated to the N-bit element's unsigned integer range 0 to (2-UQSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>autia-Authenticate instruction address, using key AAUTIA <Xd>, <Xn|SP>AUTIZA <Xd>
AUTIA1716 AUTIASP AUTIAZ fvdotb��The instruction computes the fused sum-of-products of each vertical group of two 8-bit floating-point values held in the corresponding elements of the two first source vectors with the lower-numbered horizontal group of two 8-bit floating-point values in the indexed 32-bit group of the corresponding 128-bit segment of the second source vector. The single-precision sum-of-products are scaled by 2GFVDOTB  ZA.S[ <Wv>, <offs>, VGx4], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]ldff1w�YGather load with first-faulting behavior of unsigned words to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
-LDFF1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]-LDFF1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]4LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]4LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]6LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #2]6LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]3LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]3LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]4LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2],LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]movsMOVS (predicated)�Read active elements from the source predicate and place in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the MOVS <Pd>.B, <Pg>/Z, <Pn>.B$ANDS  <Pd>.B, <Pg>/Z, <Pn>.B, <Pn>.BMOVS (unpredicated)Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated. Sets the MOVS <Pd>.B, <Pn>.B$ORRS  <Pd>.B, <Pn>/Z, <Pn>.B, <Pn>.Bsunpk�Unpack elements from one or two source vectors and then sign-extend them to place in elements of twice their size within the two or four destination vectors.(SUNPK { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<Tb>8SUNPK { <Zd1>.<T>-<Zd4>.<T> }, { <Zn1>.<Tb>-<Zn2>.<Tb> }umlal/Unsigned multiply-add long (vector, by element)
3UMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]*UMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>=UMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4UMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HIUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HIUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VUMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }blBranch with link
BL <label>rcwsswpp1Read check write software swap quadword in memory RCWSSWPP <Xt1>, <Xt2>, [<Xn|SP>]!RCWSSWPPA <Xt1>, <Xt2>, [<Xn|SP>]"RCWSSWPPAL <Xt1>, <Xt2>, [<Xn|SP>]!RCWSSWPPL <Xt1>, <Xt2>, [<Xn|SP>]ldumaxah-Atomic unsigned maximum on halfword in memoryLDUMAXAH <Ws>, <Wt>, [<Xn|SP>]LDUMAXALH <Ws>, <Wt>, [<Xn|SP>]LDUMAXH <Ws>, <Wt>, [<Xn|SP>]LDUMAXLH <Ws>, <Wt>, [<Xn|SP>]uqshrn2Unsigned saturating shift right narrow (immediate)!UQSHRN <Vb><d>, <Va><n>, #<shift>*UQSHRN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>stnt1b�Contiguous store non-temporal of bytes from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>STNT1B { <Zt1>.B-<Zt2>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]>STNT1B { <Zt1>.B-<Zt4>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]2STNT1B { <Zt1>.B-<Zt2>.B }, <PNg>, [<Xn|SP>, <Xm>]2STNT1B { <Zt1>.B-<Zt4>.B }, <PNg>, [<Xn|SP>, <Xm>]?STNT1B { <Zt1>.B, <Zt2>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]QSTNT1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]3STNT1B { <Zt1>.B, <Zt2>.B }, <PNg>, [<Xn|SP>, <Xm>]ESTNT1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>, [<Xn|SP>, <Xm>])STNT1B { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}])STNT1B { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]4STNT1B { <Zt>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}](STNT1B { <Zt>.B }, <Pg>, [<Xn|SP>, <Xm>]pmov�Copy a packed bitmap, where bit value 0b1 represents TRUE and bit value 0b0 represents FALSE, from a portion of the source vector register to elements of the destination SVE predicate register.PMOV <Pd>.B, <Zn>PMOV <Pd>.D, <Zn>{[<imm>]}PMOV <Pd>.H, <Zn>{[<imm>]}PMOV <Pd>.S, <Zn>{[<imm>]}PMOV <Zd>, <Pn>.BPMOV <Zd>{[<imm>]}, <Pn>.DPMOV <Zd>{[<imm>]}, <Pn>.HPMOV <Zd>{[<imm>]}, <Pn>.Sshrnb�cShift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in 
the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. This instruction is unpredicated.#SHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>ldsmin5Atomic signed minimum on word or doubleword in memoryLDSMIN <Ws>, <Wt>, [<Xn|SP>]LDSMINA <Ws>, <Wt>, [<Xn|SP>]LDSMINAL <Ws>, <Wt>, [<Xn|SP>]LDSMINL <Ws>, <Wt>, [<Xn|SP>]LDSMIN <Xs>, <Xt>, [<Xn|SP>]LDSMINA <Xs>, <Xt>, [<Xn|SP>]LDSMINAL <Xs>, <Xt>, [<Xn|SP>]LDSMINL <Xs>, <Xt>, [<Xn|SP>]2STSMIN <Ws>, [<Xn|SP>]LDSMIN  <Ws>, WZR, [<Xn|SP>]4STSMINL <Ws>, [<Xn|SP>]LDSMINL  <Ws>, WZR, [<Xn|SP>]2STSMIN <Xs>, [<Xn|SP>]LDSMIN  <Xs>, XZR, [<Xn|SP>]4STSMINL <Xs>, [<Xn|SP>]LDSMINL  <Xs>, XZR, [<Xn|SP>]dsbData synchronization barrierDSB  ( <option>|#<imm>)DSB <option>nXSsqdmlal9Signed saturating doubling multiply-add long (by element),SQDMLAL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]5SQDMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]!SQDMLAL <Va><d>, <Vb><n>, <Vb><m>,SQDMLAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>mulMultiply (vector, by element)*MUL <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>] MUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>*MUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T> MUL <Zdn>.<T>, <Zdn>.<T>, #<imm> MUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>!MUL <Zd>.H, <Zn>.H, <Zm>.H[<imm>]!MUL <Zd>.S, <Zn>.S, <Zm>.S[<imm>]!MUL <Zd>.D, <Zn>.D, <Zm>.D[<imm>]
MUL -- A64MultiplyMUL <Wd>, <Wn>, <Wm>MADD   <Wd>, <Wn>, <Wm>, WZRMUL <Xd>, <Xn>, <Xm>MADD   <Xd>, <Xn>, <Xm>, XZRorns�+Bitwise inclusive OR inverted active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the #ORNS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Baddspl�Add the Streaming SVE predicate register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer and place the result in the 64-bit destination general-purpose register or current stack pointer.ADDSPL <Xd|SP>, <Xn|SP>, #<imm>ldarhLoad-acquire register halfwordLDARH <Wt>, [<Xn|SP>{, #0}]ldaddah Atomic add on halfword in memoryLDADDAH <Ws>, <Wt>, [<Xn|SP>]LDADDALH <Ws>, <Wt>, [<Xn|SP>]LDADDH <Ws>, <Wt>, [<Xn|SP>]LDADDLH <Ws>, <Wt>, [<Xn|SP>]rcwsswp3Read check write software swap doubleword in memoryRCWSSWP <Xs>, <Xt>, [<Xn|SP>]RCWSSWPA <Xs>, <Xt>, [<Xn|SP>]RCWSSWPAL <Xs>, <Xt>, [<Xn|SP>]RCWSSWPL <Xs>, <Xt>, [<Xn|SP>]rprfmRange prefetch memory+RPRFM  ( <rprfop>|#<imm6>), <Xm>, [<Xn|SP>]ldtrsh,Load register signed halfword (unprivileged)!LDTRSH <Wt>, [<Xn|SP>{, #<simm>}]!LDTRSH <Xt>, [<Xn|SP>{, #<simm>}]sumops>The 8-bit integer variant works with a 32-bit element ZA tile./SUMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B/SUMOPS <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.HbfmopsoThe BFloat16 floating-point sum of outer products and subtract instruction works with a 32-bit element ZA tile./BFMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H/BFMOPS <ZAda>.H, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Huabdlt�+Compute the absolute difference between the odd-numbered unsigned integer values in elements of the second source vector and corresponding elements of the first source vector, and place the results in the overlapping double-width elements of the destination 
vector. This instruction is unpredicated.%UABDLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>umlslb�Multiply the corresponding even-numbered unsigned elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector. This instruction is unpredicated.&UMLSLB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%UMLSLB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%UMLSLB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]usmmlaEUnsigned and signed 8-bit integer matrix multiply-accumulate (vector)"USMMLA <Vd>.4S, <Vn>.16B, <Vm>.16BUSMMLA <Zda>.S, <Zn>.B, <Zm>.BaesmcAES mix columnsAESMC <Vd>.16B, <Vn>.16BAESMC <Zdn>.B, <Zdn>.BadrForm PC-relative addressADR <Xd>, <label>4ADR <Zd>.<T>, [<Zn>.<T>, <Zm>.<T>{, <mod> <amount>}]-ADR <Zd>.D, [<Zn>.D, <Zm>.D, SXTW{ <amount>}]-ADR <Zd>.D, [<Zn>.D, <Zm>.D, UXTW{ <amount>}]st2d�4Contiguous store two-doubleword structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,<ST2D { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST2D { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]umulhUnsigned multiply highUMULH <Xd>, <Xn>, <Xm>,UMULH <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>"UMULH <Zd>.<T>, <Zn>.<T>, <Zm>.<T>brkn�If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise leaves the destination and second source predicate unchanged. 
Does not set the condition flags.%BRKN <Pdm>.B, <Pg>/Z, <Pn>.B, <Pdm>.Bstxrh!Store exclusive register halfword!STXRH <Ws>, <Wt>, [<Xn|SP>{, #0}]eorBitwise exclusive-OR (vector)	 EOR <Vd>.<T>, <Vn>.<T>, <Vm>.<T>EOR <Wd|WSP>, <Wn>, #<imm>EOR <Xd|SP>, <Xn>, #<imm>)EOR <Wd>, <Wn>, <Wm>{, <shift> #<amount>})EOR <Xd>, <Xn>, <Xm>{, <shift> #<amount>}"EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B*EOR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>"EOR <Zdn>.<T>, <Zdn>.<T>, #<const>EOR <Zd>.D, <Zn>.D, <Zm>.DldgLoad Allocation TagLDG <Xt>, [<Xn|SP>{, #<simm>}]smlslb�Multiply the corresponding even-numbered signed elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector. This instruction is unpredicated.&SMLSLB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%SMLSLB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%SMLSLB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]uaddlUnsigned add long (vector)*UADDL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>fscale(Floating-point adjust exponent by vector#FSCALE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>#FSCALE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>EFSCALE { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>EFSCALE { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>TFSCALE { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }TFSCALE { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }-FSCALE <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>bic%Bitwise bit clear (vector, immediate)&BIC <Vd>.<T>, #<imm8>{, LSL #<amount>}&BIC <Vd>.<T>, #<imm8>{, LSL #<amount>} BIC <Vd>.<T>, <Vn>.<T>, <Vm>.<T>)BIC <Wd>, <Wn>, <Wm>{, <shift> #<amount>})BIC <Xd>, <Xn>, <Xm>{, <shift> #<amount>}"BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B*BIC <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>BIC <Zd>.D, <Zn>.D, <Zm>.DBIC (immediate)�CBitwise clear bits using immediate with each 64-bit element of the source vector, and destructively place the results in the corresponding elements of the source vector. 
The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated."BIC <Zdn>.<T>, <Zdn>.<T>, #<const>*AND  <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)ldff1sh�ZGather load with first-faulting behavior of signed halfwords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
.LDFF1SH { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}].LDFF1SH { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]5LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]5LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]7LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]7LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]4LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]4LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]5LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]-LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]movaThe instruction operates on two consecutive horizontal or vertical slices within a named ZA tile of the specified element size.:MOVA { <Zd1>.B-<Zd2>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs2>]<MOVA { <Zd1>.H-<Zd2>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs2>]<MOVA { <Zd1>.S-<Zd2>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs2>]<MOVA { <Zd1>.D-<Zd2>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs2>]:MOVA { <Zd1>.B-<Zd4>.B }, ZA0<HV>.B[<Ws>, <offs1>:<offs4>]<MOVA { <Zd1>.H-<Zd4>.H }, <ZAn><HV>.H[<Ws>, <offs1>:<offs4>]<MOVA { <Zd1>.S-<Zd4>.S }, <ZAn><HV>.S[<Ws>, <offs1>:<offs4>]<MOVA { <Zd1>.D-<Zd4>.D }, <ZAn><HV>.D[<Ws>, <offs1>:<offs4>]4MOVA { <Zd1>.D-<Zd2>.D }, ZA.D[<Wv>, <offs>{, VGx2}]4MOVA { <Zd1>.D-<Zd4>.D }, ZA.D[<Wv>, <offs>{, VGx4}],MOVA <Zd>.B, <Pg>/M, ZA0<HV>.B[<Ws>, <offs>].MOVA <Zd>.H, <Pg>/M, <ZAn><HV>.H[<Ws>, <offs>].MOVA <Zd>.S, <Pg>/M, <ZAn><HV>.S[<Ws>, <offs>].MOVA <Zd>.D, <Pg>/M, <ZAn><HV>.D[<Ws>, <offs>].MOVA <Zd>.Q, <Pg>/M, <ZAn><HV>.Q[<Ws>, <offs>]>MOVA    ZA0 <HV>.B[<Ws>, <offs1>:<offs2>], { <Zn1>.B-<Zn2>.B }<MOVA <ZAd><HV>.H[<Ws>, <offs1>:<offs2>], { <Zn1>.H-<Zn2>.H }<MOVA <ZAd><HV>.S[<Ws>, <offs1>:<offs2>], { <Zn1>.S-<Zn2>.S }<MOVA <ZAd><HV>.D[<Ws>, <offs1>:<offs2>], { <Zn1>.D-<Zn2>.D }>MOVA    ZA0 <HV>.B[<Ws>, <offs1>:<offs4>], { <Zn1>.B-<Zn4>.B }<MOVA <ZAd><HV>.H[<Ws>, <offs1>:<offs4>], { <Zn1>.H-<Zn4>.H }<MOVA <ZAd><HV>.S[<Ws>, <offs1>:<offs4>], { <Zn1>.S-<Zn4>.S }<MOVA <ZAd><HV>.D[<Ws>, <offs1>:<offs4>], { <Zn1>.D-<Zn4>.D }8MOVA    ZA.D[ <Wv>, <offs>{, 
VGx2}], { <Zn1>.D-<Zn2>.D }8MOVA    ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.D-<Zn4>.D }0MOVA    ZA0 <HV>.B[<Ws>, <offs>], <Pg>/M, <Zn>.B.MOVA <ZAd><HV>.H[<Ws>, <offs>], <Pg>/M, <Zn>.H.MOVA <ZAd><HV>.S[<Ws>, <offs>], <Pg>/M, <Zn>.S.MOVA <ZAd><HV>.D[<Ws>, <offs>], <Pg>/M, <Zn>.D.MOVA <ZAd><HV>.Q[<Ws>, <offs>], <Pg>/M, <Zn>.Q	pacibsppc;Pointer Authentication Code for return address, using key B
PACIBSPPC swp!Swap word or doubleword in memorySWP <Ws>, <Wt>, [<Xn|SP>]SWPA <Ws>, <Wt>, [<Xn|SP>]SWPAL <Ws>, <Wt>, [<Xn|SP>]SWPL <Ws>, <Wt>, [<Xn|SP>]SWP <Xs>, <Xt>, [<Xn|SP>]SWPA <Xs>, <Xt>, [<Xn|SP>]SWPAL <Xs>, <Xt>, [<Xn|SP>]SWPL <Xs>, <Xt>, [<Xn|SP>]ldaddabAtomic add on byte in memoryLDADDAB <Ws>, <Wt>, [<Xn|SP>]LDADDALB <Ws>, <Wt>, [<Xn|SP>]LDADDB <Ws>, <Wt>, [<Xn|SP>]LDADDLB <Ws>, <Wt>, [<Xn|SP>]dvp
DVP -- A64,Data value prediction restriction by contextDVP  RCTX, <Xt>SYS   #3, C7, C3, #5, <Xt>fdivr�5Reversed divide active floating-point elements of the second source vector by corresponding floating-point elements of the first source vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.,FDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>zip2�Interleave alternating elements from the lowest or highest halves of the first and second source predicates and place in elements of the destination predicate. This instruction is unpredicated.!ZIP2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!ZIP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!ZIP2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ZIP2 <Zd>.Q, <Zn>.Q, <Zm>.Q!ZIP1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ZIP1 <Zd>.Q, <Zn>.Q, <Zm>.Q!ZIP2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>ldeorb5Atomic exclusive-OR on byte in memory, without return2STEORB <Ws>, [<Xn|SP>]LDEORB  <Ws>, WZR, [<Xn|SP>]4STEORLB <Ws>, [<Xn|SP>]LDEORLB  <Ws>, WZR, [<Xn|SP>]incpxCounts the number of true elements in the source predicate and then uses the result to increment the scalar destination.INCP <Xdn>, <Pm>.<T>INCP <Zdn>.<T>, <Pm>.<T>str-Store SIMD&amp;FP register (immediate offset)!STR <Bt>, [<Xn|SP>], #<simm>STR <Ht>, [<Xn|SP>], #<simm>STR <St>, [<Xn|SP>], #<simm>STR <Dt>, [<Xn|SP>], #<simm>STR <Qt>, [<Xn|SP>], #<simm>STR <Bt>, [<Xn|SP>, #<simm>]!STR <Ht>, [<Xn|SP>, #<simm>]!STR <St>, [<Xn|SP>, #<simm>]!STR <Dt>, [<Xn|SP>, #<simm>]!STR <Qt>, [<Xn|SP>, #<simm>]!STR <Bt>, [<Xn|SP>{, #<pimm>}]STR <Ht>, [<Xn|SP>{, #<pimm>}]STR <St>, [<Xn|SP>{, #<pimm>}]STR <Dt>, [<Xn|SP>{, #<pimm>}]STR <Qt>, [<Xn|SP>{, #<pimm>}]STR <Wt>, [<Xn|SP>], #<simm>STR <Xt>, [<Xn|SP>], #<simm>STR <Wt>, [<Xn|SP>, #<simm>]!STR <Xt>, [<Xn|SP>, #<simm>]!STR <Wt>, [<Xn|SP>{, #<pimm>}]STR <Xt>, [<Xn|SP>{, #<pimm>}]%STR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]5STR <Bt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}])STR <Bt>, [<Xn|SP>, <Xm>{, LSL 
<amount>}]7STR <Ht>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7STR <St>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7STR <Dt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7STR <Qt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7STR <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7STR <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]%STR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]7STR     ZA[ <Wv>, <offs>], [<Xn|SP>{, #<offs>, MUL VL}]STR     ZT0, [ <Xn|SP>]	sqrshrunt�=Shift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2'SQRSHRUNT <Zd>.<T>, <Zn>.<Tb>, #<const>sqincb�jDetermines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQINCB <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQINCB <Xdn>{, <pattern>{, MUL #<imm>}}chkfeatCheck feature status
CHKFEAT  X16 cmgt$Compare signed greater than (vector)CMGT  D <d>, D<n>, D<m>!CMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>CMGT  D <d>, D<n>, #0CMGT <Vd>.<T>, <Vn>.<T>, #0ldursh(Load register signed halfword (unscaled)!LDURSH <Wt>, [<Xn|SP>{, #<simm>}]!LDURSH <Xt>, [<Xn|SP>{, #<simm>}]fnmul'Floating-point multiply-negate (scalar)FNMUL <Hd>, <Hn>, <Hm>FNMUL <Sd>, <Sn>, <Sm>FNMUL <Dd>, <Dn>, <Dm>ngc
NGC -- A64Negate with carryNGC <Wd>, <Wm>SBC   <Wd>, WZR, <Wm>NGC <Xd>, <Xm>SBC   <Xd>, XZR, <Xm>sxtwSXTW -- A64Sign extend wordSXTW <Xd>, <Wn>SBFM   <Xd>, <Xn>, #0, #31bfcvtnt�0Convert to BFloat16 from single-precision in each active floating-point element of the source vector, and place the results in the odd-numbered 16-bit elements of the destination vector, leaving the even-numbered elements unchanged. Inactive elements in the destination vector register remain unmodified.BFCVTNT <Zd>.H, <Pg>/M, <Zn>.SstlrbStore-release register byteSTLRB <Wt>, [<Xn|SP>{, #0}]ssublb�Subtract the even-numbered signed elements of the second source vector from the corresponding signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.%SSUBLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>uminUnsigned minimum (vector)!UMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>UMIN <Wd>, <Wn>, #<uimm>UMIN <Xd>, <Xn>, #<uimm>CUMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>CUMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>RUMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }RUMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }UMIN <Wd>, <Wn>, <Wm>UMIN <Xd>, <Xn>, <Xm>+UMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!UMIN <Zdn>.<T>, <Zdn>.<T>, #<imm>usubwb�Subtract the even-numbered unsigned elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.$USUBWB <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>ctermeq�Detect termination conditions in serialized vector loops. 
Tests whether the comparison between the scalar source operands holds true and if not tests the state of the CTERMEQ <R><n>, <R><m>CTERMNE <R><n>, <R><m>fmlallbt�This 8-bit floating-point multiply-add long-long instruction widens the second 8-bit element of each 32-bit container in the first and second source vectors to single-precision format and multiplies the corresponding elements. The intermediate products are scaled by 2 FMLALLBT <Zda>.S, <Zn>.B, <Zm>.B'FMLALLBT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]umull+Unsigned multiply long (vector, by element)3UMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]*UMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>UMULL -- A64Unsigned multiply longUMULL <Xd>, <Wn>, <Wm>UMADDL   <Xd>, <Wn>, <Wm>, XZRsaddlSigned add long (vector)*SADDL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>sm4e
SM4 encodeSM4E <Vd>.4S, <Vn>.4SSM4E <Zdn>.S, <Zdn>.S, <Zm>.Ssmlall�xThis signed integer multiply-add long-long instruction multiplies each signed 8-bit or 16-bit element in the one, two, or four first source vectors with each signed 8-bit or 16-bit indexed element of second source vector, widens each product to 32-bits or 64-bits and destructively adds these values to the corresponding 32-bit or 64-bit elements of the ZA quad-vector groups.=SMLALL  ZA.S[ <Wv>, <offs1>:<offs4>], <Zn>.B, <Zm>.B[<index>]=SMLALL  ZA.D[ <Wv>, <offs1>:<offs4>], <Zn>.H, <Zm>.H[<index>]RSMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RSMLALL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RSMLALL  ZA.S[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]RSMLALL  ZA.D[ <Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]<SMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>], <Zn>.<Tb>, <Zm>.<Tb>TSMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>TSMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>dSMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }dSMLALL  ZA. <T>[<Wv>, <offs1>:<offs4>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }gcspopcxGCSPOPCX -- A64=Guarded Control Stack pop and compare exception return record	GCSPOPCX SYS   #0, C7, C7, #5{, <Xt>}clrexClear exclusiveCLREX  {# <imm>}ldrh"Load register halfword (immediate)LDRH <Wt>, [<Xn|SP>], #<simm>LDRH <Wt>, [<Xn|SP>, #<simm>]!LDRH <Wt>, [<Xn|SP>{, #<pimm>}]8LDRH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]sabdlt�!Compute the absolute difference between odd-numbered signed integer values in elements of the second source vector and corresponding elements of the first source vector, and place the results in overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%SABDLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>shlShift left (immediate)SHL  D <d>, D<n>, #<shift> SHL <Vd>.<T>, <Vn>.<T>, #<shift>ldlarbLoad LOAcquire register byteLDLARB <Wt>, [<Xn|SP>{, #0}]stnp;Store pair of SIMD&amp;FP registers, with non-temporal hint&STNP <St1>, <St2>, [<Xn|SP>{, #<imm>}]&STNP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]&STNP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]&STNP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]&STNP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]cpyprtMemory copy, reads unprivileged CPYPRT  [ <Xd>]!, [<Xs>]!, <Xn>! CPYMRT  [ <Xd>]!, [<Xs>]!, <Xn>! CPYERT  [ <Xd>]!, [<Xs>]!, <Xn>!pacia171615@Pointer Authentication Code for instruction address, using key APACIA171615 csetmCSETM -- A64Conditional set maskCSETM <Wd>, <invcond>CSINV   <Wd>, WZR, WZR, <cond>CSETM <Xd>, <invcond>CSINV   <Xd>, XZR, XZR, <cond>ldaxrh(Load-acquire exclusive register halfwordLDAXRH <Wt>, [<Xn|SP>{, #0}]sqdecp�Counts the number of true elements in the source predicate and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.SQDECP <Xdn>, <Pm>.<T>, <Wdn>SQDECP <Xdn>, <Pm>.<T>SQDECP <Zdn>.<T>, <Pm>.<T>dcps3Debug change PE state to EL3DCPS3  {# <imm>}sturbStore register byte (unscaled) STURB <Wt>, [<Xn|SP>{, #<simm>}]umlslt�Multiply the corresponding odd-numbered unsigned elements of the first and second source vectors and destructively subtract from the overlapping double-width elements of the addend vector. This instruction is unpredicated.&UMLSLT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%UMLSLT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%UMLSLT <Zda>.D, <Zn>.S, <Zm>.S[<imm>]f1cvtlt�Convert each odd-numbered 8-bit floating-point element of the source vector to half-precision while downscaling the value, and place the results in the overlapping 16-bit elements of the destination vector. 
F1CVTLT scales the values by 2F1CVTLT <Zd>.H, <Zn>.BF2CVTLT <Zd>.H, <Zn>.Bsetgpt)Memory set with tag setting, unprivilegedSETGPT  [ <Xd>]!, <Xn>!, <Xs>SETGMT  [ <Xd>]!, <Xn>!, <Xs>SETGET  [ <Xd>]!, <Xn>!, <Xs>cmplsCMPLS (vectors)�[Compare active unsigned integer elements in the first source vector being lower than or same as corresponding unsigned elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the *CMPLS <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-CMPHS    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>bfadd�Add active BFloat16 elements of the second source vector to corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.&BFADD <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.HBFADD <Zd>.H, <Zn>.H, <Zm>.H8BFADD   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zm1>.H-<Zm2>.H }8BFADD   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zm1>.H-<Zm4>.H }fabs&Floating-point absolute value (vector)FABS <Vd>.<T>, <Vn>.<T>FABS <Vd>.<T>, <Vn>.<T>FABS <Hd>, <Hn>FABS <Sd>, <Sn>FABS <Dd>, <Dn>FABS <Zd>.<T>, <Pg>/M, <Zn>.<T>fmaxqv�,Floating-point maximum of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as -Infinity. FMAXQV <Vd>.<T>, <Pg>, <Zn>.<Tb>fminv$Floating-point minimum across vectorFMINV <V><d>, <Vn>.<T>FMINV  S <d>, <Vn>.4SFMINV <V><d>, <Pg>, <Zn>.<T>pmullt�Polynomial multiply over [0, 1] the corresponding odd-numbered elements of the first and second source vectors, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%PMULLT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>PMULLT <Zd>.Q, <Zn>.D, <Zm>.DurhaddUnsigned rounding halving add#URHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>-URHADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ldtrh%Load register halfword (unprivileged) LDTRH <Wt>, [<Xn|SP>{, #<simm>}]ushllt�2Shift left by immediate each odd-numbered unsigned element of the source vector, and place the results in the overlapping double-width elements of the destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. This instruction is unpredicated.$USHLLT <Zd>.<T>, <Zn>.<Tb>, #<const>lslvLogical shift left variableLSLV <Wd>, <Wn>, <Wm>LSLV <Xd>, <Xn>, <Xm>facleFACLE��Compare active absolute values of floating-point elements in the first source vector being less than or equal to corresponding absolute values of elements in the second source vector, and place the boolean results of the comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.*FACLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>-FACGE    <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>bfxilBFXIL -- A64&Bitfield extract and insert at low end"BFXIL <Wd>, <Wn>, #<lsb>, #<width>,BFM   <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)"BFXIL <Xd>, <Xn>, #<lsb>, #<width>,BFM   <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)subptSubtract checked pointer-SUBPT <Xd|SP>, <Xn|SP>, <Xm>{, LSL #<amount>}&SUBPT <Zdn>.D, <Pg>/M, <Zdn>.D, <Zm>.DSUBPT <Zd>.D, <Zn>.D, <Zm>.Dsmnegl
SMNEGL -- A64Signed multiply-negate longSMNEGL <Xd>, <Wn>, <Wm>SMSUBL   <Xd>, <Wn>, <Wm>, XZRsqrshrnt�6Shift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. Each result element is saturated to the half-width N-bit element's signed integer range -2&SQRSHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>cpyfptnHMemory copy forward-only, reads and writes unprivileged and non-temporal!CPYFPTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFMTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFETN  [ <Xd>]!, [<Xs>]!, <Xn>!fexpaThe FEXPA <Zd>.<T>, <Zn>.<T>udot4Dot product unsigned arithmetic (vector, by element)+UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]$UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>UDOT <Zda>.S, <Zn>.H, <Zm>.H#UDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]$UDOT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>#UDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]#UDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>]IUDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IUDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]@UDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H@UDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HMUDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }MUDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }IUDOT    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]IUDOT    ZA.D[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IUDOT    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]IUDOT    ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]KUDOT    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, <Zm>.<Tb>KUDOT    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, <Zm>.<Tb>[UDOT    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<Tb>-<Zn2>.<Tb> }, { <Zm1>.<Tb>-<Zm2>.<Tb> }[UDOT    ZA. 
<T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<Tb>-<Zn4>.<Tb> }, { <Zm1>.<Tb>-<Zm4>.<Tb> }rcwcasp4Read check write compare and swap quadword in memory1RCWCASP <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]2RCWCASPA <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]3RCWCASPAL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]2RCWCASPL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]nbsl�CSelects bits from the first source vector where the corresponding bit in the third source vector is '1', and from the second source vector where the corresponding bit in the third source vector is '0'. The inverted result is placed destructively in the destination and first source vector. This instruction is unpredicated.%NBSL <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.Dbgrp��This instruction separates bits in each element of the first source vector by gathering from the bit positions indicated by non-zero bits in the corresponding mask element of the second source vector to the lowest-numbered contiguous bits of the corresponding destination element, and from positions indicated by zero bits to the highest-numbered bits of the destination element, preserving the bit order within each group. 
This instruction is unpredicated.!BGRP <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ldarbLoad-acquire register byteLDARB <Wt>, [<Xn|SP>{, #0}]st2w�.Contiguous store two-word structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,<ST2W { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST2W { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]stlxrb%Store-release exclusive register byte"STLXRB <Ws>, <Wt>, [<Xn|SP>{, #0}]stnt1h� Contiguous store non-temporal of halfwords from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>STNT1H { <Zt1>.H-<Zt2>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]>STNT1H { <Zt1>.H-<Zt4>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]:STNT1H { <Zt1>.H-<Zt2>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]:STNT1H { <Zt1>.H-<Zt4>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]?STNT1H { <Zt1>.H, <Zt2>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]QSTNT1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}];STNT1H { <Zt1>.H, <Zt2>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1]MSTNT1H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <PNg>, [<Xn|SP>, <Xm>, LSL #1])STNT1H { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}])STNT1H { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]4STNT1H { <Zt>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0STNT1H { <Zt>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]	paciasppc;Pointer Authentication Code for return address, using key A
PACIASPPC ccmpConditional compare (immediate)"CCMP <Wn>, #<imm>, #<nzcv>, <cond>"CCMP <Xn>, #<imm>, #<nzcv>, <cond> CCMP <Wn>, <Wm>, #<nzcv>, <cond> CCMP <Xn>, <Xm>, #<nzcv>, <cond>sqshrunt�?Shift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2&SQSHRUNT <Zd>.<T>, <Zn>.<Tb>, #<const>ld1roh�Load sixteen contiguous halfwords to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address..LD1ROH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]2LD1ROH { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]subhnSubtract returning high narrow*SUBHN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>stlurh*Store-release register halfword (unscaled)!STLURH <Wt>, [<Xn|SP>{, #<simm>}]uqshrnb�EShift each unsigned integer value in the source vector elements right by an immediate value, and place the truncated results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. 
Each result element is saturated to the half-width N-bit element's unsigned integer range 0 to (2%UQSHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>cmnCMN (extended register) -- A64$Compare negative (extended register)*CMN <Wn|WSP>, <Wm>{, <extend> {#<amount>}}2ADDS   WZR, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}+CMN <Xn|SP>, <R><m>{, <extend> {#<amount>}}3ADDS   XZR, <Xn|SP>, <R><m>{, <extend> {#<amount>}}CMN (immediate) -- A64Compare negative (immediate)CMN <Wn|WSP>, #<imm>{, <shift>}'ADDS   WZR, <Wn|WSP>, #<imm>{, <shift>}CMN <Xn|SP>, #<imm>{, <shift>}&ADDS   XZR, <Xn|SP>, #<imm>{, <shift>}CMN (shifted register) -- A64#Compare negative (shifted register)#CMN <Wn>, <Wm>{, <shift> #<amount>}+ADDS   WZR, <Wn>, <Wm>{, <shift> #<amount>}#CMN <Xn>, <Xm>{, <shift> #<amount>}+ADDS   XZR, <Xn>, <Xm>{, <shift> #<amount>}histcnt�_This instruction compares each active 32 or 64-bit element of the first source vector with all active elements with an element number less than or equal to its own in the second source vector, and places the count of matching elements in the corresponding element of the destination vector. 
Inactive elements in the destination vector are set to zero.,HISTCNT <Zd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>ldumax7Atomic unsigned maximum on word or doubleword in memoryLDUMAX <Ws>, <Wt>, [<Xn|SP>]LDUMAXA <Ws>, <Wt>, [<Xn|SP>]LDUMAXAL <Ws>, <Wt>, [<Xn|SP>]LDUMAXL <Ws>, <Wt>, [<Xn|SP>]LDUMAX <Xs>, <Xt>, [<Xn|SP>]LDUMAXA <Xs>, <Xt>, [<Xn|SP>]LDUMAXAL <Xs>, <Xt>, [<Xn|SP>]LDUMAXL <Xs>, <Xt>, [<Xn|SP>]2STUMAX <Ws>, [<Xn|SP>]LDUMAX  <Ws>, WZR, [<Xn|SP>]4STUMAXL <Ws>, [<Xn|SP>]LDUMAXL  <Ws>, WZR, [<Xn|SP>]2STUMAX <Xs>, [<Xn|SP>]LDUMAX  <Xs>, XZR, [<Xn|SP>]4STUMAXL <Xs>, [<Xn|SP>]LDUMAXL  <Xs>, XZR, [<Xn|SP>]ldapursb1Load-acquire RCpc register signed byte (unscaled)#LDAPURSB <Wt>, [<Xn|SP>{, #<simm>}]#LDAPURSB <Xt>, [<Xn|SP>{, #<simm>}]ldnf1h��Contiguous load with non-faulting behavior of unsigned halfwords to elements of a vector register from the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. 
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.6LDNF1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]6LDNF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]6LDNF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]uabal0Unsigned absolute difference and accumulate long*UABAL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>frsqrts*Floating-point reciprocal square root stepFRSQRTS <Hd>, <Hn>, <Hm>FRSQRTS <V><d>, <V><n>, <V><m>$FRSQRTS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>$FRSQRTS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>$FRSQRTS <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ld1rsh�Load a single signed halfword from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 2 in the range 0 to 126..LD1RSH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}].LD1RSH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]ldlarh Load LOAcquire register halfwordLDLARH <Wt>, [<Xn|SP>{, #0}]tbnzTest bit and branch if nonzeroTBNZ <R><t>, #<imm>, <label>cpypwtn?Memory copy, writes unprivileged, reads and writes non-temporal!CPYPWTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYMWTN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYEWTN  [ <Xd>]!, [<Xs>]!, <Xn>!fcvtzsKFloating-point convert to signed fixed-point, rounding toward zero (vector)FCVTZS <V><d>, <V><n>, #<fbits>#FCVTZS <Vd>.<T>, <Vn>.<T>, #<fbits>FCVTZS <Hd>, <Hn>FCVTZS <V><d>, <V><n>FCVTZS <Vd>.<T>, <Vn>.<T>FCVTZS <Vd>.<T>, <Vn>.<T>FCVTZS <Wd>, <Hn>, #<fbits>FCVTZS <Xd>, <Hn>, #<fbits>FCVTZS <Wd>, <Sn>, #<fbits>FCVTZS <Xd>, <Sn>, #<fbits>FCVTZS <Wd>, <Dn>, #<fbits>FCVTZS <Xd>, <Dn>, #<fbits>FCVTZS <Wd>, <Hn>FCVTZS <Xd>, <Hn>FCVTZS <Wd>, <Sn>FCVTZS <Xd>, <Sn>FCVTZS <Wd>, <Dn>FCVTZS <Xd>, <Dn>/FCVTZS { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }/FCVTZS { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }FCVTZS <Zd>.H, <Pg>/M, <Zn>.HFCVTZS <Zd>.S, <Pg>/M, <Zn>.HFCVTZS <Zd>.D, <Pg>/M, <Zn>.HFCVTZS <Zd>.S, <Pg>/M, <Zn>.SFCVTZS <Zd>.D, <Pg>/M, <Zn>.SFCVTZS <Zd>.S, <Pg>/M, <Zn>.DFCVTZS <Zd>.D, <Pg>/M, <Zn>.Dldaddb,Atomic add on byte in 
memory, without return2STADDB <Ws>, [<Xn|SP>]LDADDB  <Ws>, WZR, [<Xn|SP>]4STADDLB <Ws>, [<Xn|SP>]LDADDLB  <Ws>, WZR, [<Xn|SP>]fminnm&Floating-point minimum number (vector)#FMINNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>#FMINNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FMINNM <Hd>, <Hn>, <Hm>FMINNM <Sd>, <Sn>, <Sm>FMINNM <Dd>, <Dn>, <Dm>EFMINNM { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>EFMINNM { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>TFMINNM { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }TFMINNM { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> },FMINNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>-FMINNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>mrs0Move System register to general-purpose register4MRS <Xt>, (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>)mnegMNEG -- A64Multiply-negateMNEG <Wd>, <Wn>, <Wm>MSUB   <Wd>, <Wn>, <Wm>, WZRMNEG <Xd>, <Xn>, <Xm>MSUB   <Xd>, <Xn>, <Xm>, XZRuqshl*Unsigned saturating shift left (immediate)UQSHL <V><d>, <V><n>, #<shift>"UQSHL <Vd>.<T>, <Vn>.<T>, #<shift>UQSHL <V><d>, <V><n>, <V><m>"UQSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UQSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>,UQSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>usublb�	Subtract the even-numbered unsigned elements of the second source vector from the corresponding unsigned elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%USUBLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>setffr%Initialise the first-fault register (SETFFR sqdmulhDSigned saturating doubling multiply returning high half (by element)*SQDMULH <V><d>, <V><n>, <Vm>.<Ts>[<index>].SQDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]SQDMULH <V><d>, <V><n>, <V><m>$SQDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FSQDMULH { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>FSQDMULH { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>USQDMULH { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }USQDMULH { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }$SQDMULH <Zd>.<T>, <Zn>.<T>, <Zm>.<T>%SQDMULH <Zd>.H, <Zn>.H, <Zm>.H[<imm>]%SQDMULH <Zd>.S, <Zn>.S, <Zm>.S[<imm>]%SQDMULH <Zd>.D, <Zn>.D, <Zm>.D[<imm>]stgmStore Allocation Tag multipleSTGM <Xt>, [<Xn|SP>]bmopawThis instruction works with 32-bit element ZA tile. This instruction generates an outer product of the first source SVL.BMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S, <Zm>.SaddAdd (extended register)4ADD <Wd|WSP>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}4ADD <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}})ADD <Wd|WSP>, <Wn|WSP>, #<imm>{, <shift>}'ADD <Xd|SP>, <Xn|SP>, #<imm>{, <shift>})ADD <Wd>, <Wn>, <Wm>{, <shift> #<amount>})ADD <Xd>, <Xn>, <Xm>{, <shift> #<amount>}ADD  D <d>, D<n>, D<m> ADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>BADD { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>BADD { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>*ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>+ADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>} ADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>>ADD     ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zm1>.<T>-<Zm2>.<T> }>ADD     ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zm1>.<T>-<Zm4>.<T> }HADD     ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, <Zm>.<T>HADD     ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, <Zm>.<T>WADD     ZA. 
<T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }WADD     ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }sshlSigned shift left (register)SSHL  D <d>, D<n>, D<m>!SSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>st4d�6Contiguous store four-doubleword structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,NST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]JST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]bfminnm�Determine the minimum number value of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors.<BFMINNM { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, <Zm>.H<BFMINNM { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, <Zm>.HIBFMINNM { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, { <Zm1>.H-<Zm2>.H }IBFMINNM { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, { <Zm1>.H-<Zm4>.H }(BFMINNM <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.Hptrue�#Set elements of the destination predicate to true if the element number satisfies the named predicate constraint, or to false otherwise. If the constraint specifies more elements than are available at the current vector length then all elements of the destination predicate are set to false.PTRUE <Pd>.<T>{, <pattern>}PTRUE <PNd>.<T>usubwUnsigned subtract wide*USUBW{ 2}  <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>pext�$Converts the source predicate-as-counter into a four register wide predicate-as-mask, and copies the portion of the mask value selected by the portion index to the destination predicate register. 
A portion corresponds to a one predicate register fraction of the wider predicate-as-mask value.PEXT <Pd>.<T>, <PNn>[<imm>]+PEXT { <Pd1>.<T>, <Pd2>.<T> }, <PNn>[<imm>]st4q�4Contiguous store four-quadword structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,NST4Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q, <Zt4>.Q }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]JST4Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q, <Zt4>.Q }, <Pg>, [<Xn|SP>, <Xm>, LSL #4]sqdmull5Signed saturating doubling multiply long (by element)5SQDMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>],SQDMULL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]!SQDMULL <Va><d>, <Vb><n>, <Vb><m>,SQDMULL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>sqincw�kDetermines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits..SQINCW <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}'SQINCW <Xdn>{, <pattern>{, MUL #<imm>}})SQINCW <Zdn>.S{, <pattern>{, MUL #<imm>}}sqnegSigned saturating negateSQNEG <V><d>, <V><n>SQNEG <Vd>.<T>, <Vn>.<T> SQNEG <Zd>.<T>, <Pg>/M, <Zn>.<T>madpt�Multiply with overflow check the elements of the first and second source vectors and add with pointer check to elements of the third (addend) vector. 
Destructively place the results in the destination and first source (multiplicand) vector.MADPT <Zdn>.D, <Zm>.D, <Za>.DsmaxSigned maximum (vector)!SMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>SMAX <Wd>, <Wn>, #<simm>SMAX <Xd>, <Xn>, #<simm>CSMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>CSMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>RSMAX { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }RSMAX { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }SMAX <Wd>, <Wn>, <Wm>SMAX <Xd>, <Xn>, <Xm>+SMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!SMAX <Zdn>.<T>, <Zdn>.<T>, #<imm>subs+Subtract (extended register), setting flags1SUBS <Wd>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}2SUBS <Xd>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}&SUBS <Wd>, <Wn|WSP>, #<imm>{, <shift>}%SUBS <Xd>, <Xn|SP>, #<imm>{, <shift>}*SUBS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}*SUBS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}cpyprtrn0Memory copy, reads unprivileged and non-temporal"CPYPRTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYMRTRN  [ <Xd>]!, [<Xs>]!, <Xn>!"CPYERTRN  [ <Xd>]!, [<Xs>]!, <Xn>!rorvRotate right variableRORV <Wd>, <Wn>, <Wm>RORV <Xd>, <Xn>, <Xm>uqincw�*Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.'UQINCW <Wdn>{, <pattern>{, MUL #<imm>}}'UQINCW <Xdn>{, <pattern>{, MUL #<imm>}})UQINCW <Zdn>.S{, <pattern>{, MUL #<imm>}}rev16-Reverse elements in 16-bit halfwords (vector)REV16 <Vd>.<T>, <Vn>.<T>REV16 <Wd>, <Wn>REV16 <Xd>, <Xn>fmlalltt�This 8-bit floating-point multiply-add long-long instruction widens the fourth 8-bit element of each 32-bit container in the first and second source vectors to single-precision format and multiplies the corresponding elements. 
The intermediate products are scaled by 2 FMLALLTT <Zda>.S, <Zn>.B, <Zm>.B'FMLALLTT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]ngcsNGCS -- A64 Negate with carry, setting flagsNGCS <Wd>, <Wm>SBCS   <Wd>, WZR, <Wm>NGCS <Xd>, <Xm>SBCS   <Xd>, XZR, <Xm>ld2d�3Contiguous load two-doubleword structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,>LD2D { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD2D { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]addhnb�.Add each vector element of the first source vector to the corresponding vector element of the second source vector, and place the most significant half of the result in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. This instruction is unpredicated.%ADDHNB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>sha1mSHA1 hash update (majority)SHA1M <Qd>, <Sn>, <Vm>.4Sfrint64xLFloating-point round to 64-bit integer, using current rounding mode (vector)FRINT64X <Vd>.<T>, <Vn>.<T>FRINT64X <Sd>, <Sn>FRINT64X <Dd>, <Dn>ldnt1sh�0Gather load non-temporal of signed halfwords to active elements of a vector register from memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. 
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.,LDNT1SH { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}],LDNT1SH { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]sqcadd��Add the real and imaginary components of the integral complex numbers from the first source vector to the complex numbers from the second source vector which have first been rotated by 90 or 270 degrees in the direction from the positive real axis towards the positive imaginary axis, when considered in polar representation, equivalent to multiplying the complex numbers in the second source vector by ±.SQCADD <Zdn>.<T>, <Zdn>.<T>, <Zm>.<T>, <const>eor3Three-way exclusive-OR+EOR3 <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B%EOR3 <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.Duzp2Unzip vectors (secondary)!UZP2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>bfmlsl��This BFloat16 floating-point multiply-subtract long instruction widens all 16-bit BFloat16 elements in the one, two, or four first source vectors and the indexed element of the second source vector to single-precision format, then multiplies the corresponding elements and destructively subtracts these values without intermediate rounding from the overlapping 32-bit single-precision elements of the ZA double-vector groups.=BFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4BFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HIBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HIBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VBFMLSL  ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }adclb�TAdd the even-numbered elements of the first source vector and the 1-bit carry from the least-significant bit of 
the odd-numbered elements of the second source vector to the even-numbered elements of the destination and accumulator vector. The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.#ADCLB <Zda>.<T>, <Zn>.<T>, <Zm>.<T>fnmad�[Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of the third (addend) vector without intermediate rounding. Destructively place the negated results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.+FNMAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>insr�Shift the destination vector left by one element, and then place a copy of the least-significant bits of the general-purpose register in element 0 of the destination vector. This instruction is unpredicated.INSR <Zdn>.<T>, <R><m>INSR <Zdn>.<T>, <V><m>umaxpUnsigned maximum pairwise"UMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UMAXP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>bfmlalt��This BFloat16 floating-point multiply-add long instruction widens the odd-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and adds these values without intermediate rounding to the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. 
This instruction is unpredicated.BFMLALT <Zda>.S, <Zn>.H, <Zm>.H&BFMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]flogbcThis instruction returns the signed integer base 2 logarithm of each floating-point input element | FLOGB <Zd>.<T>, <Pg>/M, <Zn>.<T>clz Count leading zero bits (vector)CLZ <Vd>.<T>, <Vn>.<T>CLZ <Wd>, <Wn>CLZ <Xd>, <Xn>CLZ <Zd>.<T>, <Pg>/M, <Zn>.<T>ftsmulThe #FTSMUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>cas-Compare and swap word or doubleword in memoryCAS <Ws>, <Wt>, [<Xn|SP>{, #0}] CASA <Ws>, <Wt>, [<Xn|SP>{, #0}]!CASAL <Ws>, <Wt>, [<Xn|SP>{, #0}] CASL <Ws>, <Wt>, [<Xn|SP>{, #0}]CAS <Xs>, <Xt>, [<Xn|SP>{, #0}] CASA <Xs>, <Xt>, [<Xn|SP>{, #0}]!CASAL <Xs>, <Xt>, [<Xn|SP>{, #0}] CASL <Xs>, <Xt>, [<Xn|SP>{, #0}]ldumaxab)Atomic unsigned maximum on byte in memoryLDUMAXAB <Ws>, <Wt>, [<Xn|SP>]LDUMAXALB <Ws>, <Wt>, [<Xn|SP>]LDUMAXB <Ws>, <Wt>, [<Xn|SP>]LDUMAXLB <Ws>, <Wt>, [<Xn|SP>]ld2w�-Contiguous load two-word structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,>LD2W { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD2W { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]sshll"Signed shift left long (immediate))SSHLL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>, #<shift>axflagBConvert floating-point condition flags from Arm to external formatAXFLAG fsqrt#Floating-point square root (vector)FSQRT <Vd>.<T>, <Vn>.<T>FSQRT <Vd>.<T>, <Vn>.<T>FSQRT <Hd>, <Hn>FSQRT <Sd>, <Sn>FSQRT <Dd>, <Dn> FSQRT <Zd>.<T>, <Pg>/M, <Zn>.<T>smstop
SMSTOP -- A64ADisables access to Streaming SVE mode and SME architectural stateSMSTOP  { <option>}MSR   <pstatefield>,   #0gcsss2
GCSSS2 -- A64$Guarded Control Stack switch stack 2GCSSS2 <Xt>SYSL   <Xt>, #3, C7, C7, #3sysp128-bit system instruction1SYSP  # <op1>, <Cn>, <Cm>, #<op2>{, <Xt1>, <Xt2>}cpypwt Memory copy, writes unprivileged CPYPWT  [ <Xd>]!, [<Xs>]!, <Xn>! CPYMWT  [ <Xd>]!, [<Xs>]!, <Xn>! CPYEWT  [ <Xd>]!, [<Xs>]!, <Xn>!eretException returnERET rcwswp*Read check write swap doubleword in memoryRCWSWP <Xs>, <Xt>, [<Xn|SP>]RCWSWPA <Xs>, <Xt>, [<Xn|SP>]RCWSWPAL <Xs>, <Xt>, [<Xn|SP>]RCWSWPL <Xs>, <Xt>, [<Xn|SP>]rcwscasp=Read check write software compare and swap quadword in memory2RCWSCASP <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]3RCWSCASPA <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]4RCWSCASPAL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]3RCWSCASPL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>]fmlsDFloating-point fused multiply-subtract from accumulator (by element) FMLS <Hd>, <Hn>, <Vm>.H[<index>]'FMLS <V><d>, <V><n>, <Vm>.<Ts>[<index>](FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]+FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]!FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>*FMLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>#FMLS <Zda>.H, <Zn>.H, <Zm>.H[<imm>]#FMLS <Zda>.S, <Zn>.S, <Zm>.S[<imm>]#FMLS <Zda>.D, <Zn>.D, <Zm>.D[<imm>]IFMLS    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IFMLS    ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.S-<Zn2>.S }, <Zm>.S[<index>]IFMLS    ZA.D[ <Wv>, <offs>{, VGx2}], { <Zn1>.D-<Zn2>.D }, <Zm>.D[<index>]IFMLS    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]IFMLS    ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.S-<Zn4>.S }, <Zm>.S[<index>]IFMLS    ZA.D[ <Wv>, <offs>{, VGx4}], { <Zn1>.D-<Zn4>.D }, <Zm>.D[<index>]HFMLS    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, <Zm>.<T>@FMLS    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HHFMLS    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, <Zm>.<T>@FMLS    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HWFMLS    ZA. 
<T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }MFMLS    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }WFMLS    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }MFMLS    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }ldclrab"Atomic bit clear on byte in memoryLDCLRAB <Ws>, <Wt>, [<Xn|SP>]LDCLRALB <Ws>, <Wt>, [<Xn|SP>]LDCLRB <Ws>, <Wt>, [<Xn|SP>]LDCLRLB <Ws>, <Wt>, [<Xn|SP>]uadalp)Unsigned add and accumulate long pairwiseUADALP <Vd>.<Ta>, <Vn>.<Tb>#UADALP <Zda>.<T>, <Pg>/M, <Zn>.<Tb>pacmPointer authentication modifierPACM sqrshlr��Shift active signed elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Each result element is saturated to the N-bit element's signed integer range -2.SQRSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ldaddh0Atomic add on halfword in memory, without return2STADDH <Ws>, [<Xn|SP>]LDADDH  <Ws>, WZR, [<Xn|SP>]4STADDLH <Ws>, [<Xn|SP>]LDADDLH  <Ws>, WZR, [<Xn|SP>]uaddwb�Add the even-numbered unsigned elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.$UADDWB <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>cpyfpt7Memory copy forward-only, reads and writes unprivileged CPYFPT  [ <Xd>]!, [<Xs>]!, <Xn>! CPYFMT  [ <Xd>]!, [<Xs>]!, <Xn>! 
CPYFET  [ <Xd>]!, [<Xs>]!, <Xn>!uqcvt�Saturate the unsigned integer value in each element of the two source vectors to half the original source element width, and place the results in the half-width destination elements.!UQCVT <Zd>.H, { <Zn1>.S-<Zn2>.S })UQCVT <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }fcvtl8Floating-point convert to higher precision long (vector)FCVTL{ 2}  <Vd>.<Ta>, <Vn>.<Tb>!FCVTL { <Zd1>.S-<Zd2>.S }, <Zn>.Hwhilege��Generate a predicate that starting from the highest numbered element is true while the decrementing value of the first, signed scalar operand is greater than or equal to the second scalar operand and false thereafter down to the lowest numbered element. WHILEGE <Pd>.<T>, <R><n>, <R><m>#WHILEGE <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILEGE { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>ld1rw�Load a single unsigned word from a memory address generated by a 64-bit scalar base address plus an immediate offset which is a multiple of 4 in the range 0 to 252.-LD1RW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]-LD1RW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]sev
Send eventSEV zipq2�Interleave alternating elements from high halves of the corresponding 128-bit vector segments of the first and second source vectors and place in elements of the corresponding destination vector segment. This instruction is unpredicated."ZIPQ2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>addha��Add each element of the source vector to the corresponding active element of each horizontal slice of a ZA tile. The tile elements are predicated by a pair of governing predicates. An element of a horizontal slice is considered active if its corresponding element in the second governing predicate is TRUE and the element corresponding to its horizontal slice number in the first governing predicate is TRUE. Inactive elements in the destination tile remain unmodified.&ADDHA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S&ADDHA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.Dldxrh Load exclusive register halfwordLDXRH <Wt>, [<Xn|SP>{, #0}]rcwsetp5Read check write atomic bit set on quadword in memoryRCWSETP <Xt1>, <Xt2>, [<Xn|SP>] RCWSETPA <Xt1>, <Xt2>, [<Xn|SP>]!RCWSETPAL <Xt1>, <Xt2>, [<Xn|SP>] RCWSETPL <Xt1>, <Xt2>, [<Xn|SP>]urshr)Unsigned rounding shift right (immediate)URSHR  D <d>, D<n>, #<shift>"URSHR <Vd>.<T>, <Vn>.<T>, #<shift>,URSHR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>esbError synchronization barrierESB adrp$Form PC-relative address to 4KB pageADRP <Xd>, <label>stz2gStore Allocation Tags, zeroing!STZ2G <Xt|SP>, [<Xn|SP>], #<simm>"STZ2G <Xt|SP>, [<Xn|SP>, #<simm>]!#STZ2G <Xt|SP>, [<Xn|SP>{, #<simm>}]asrvArithmetic shift right variableASRV <Wd>, <Wn>, <Wm>ASRV <Xd>, <Xn>, <Xm>btiBranch target identificationBTI  { <targets>}frint32z;Floating-point round to 32-bit integer toward zero (vector)FRINT32Z <Vd>.<T>, <Vn>.<T>FRINT32Z <Sd>, <Sn>FRINT32Z <Dd>, <Dn>srshl%Signed rounding shift left (register)SRSHL  D <d>, D<n>, D<m>"SRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>DSRSHL { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>DSRSHL { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>SSRSHL { 
<Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }SSRSHL { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> },SRSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>uqincb�)Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.'UQINCB <Wdn>{, <pattern>{, MUL #<imm>}}'UQINCB <Xdn>{, <pattern>{, MUL #<imm>}}ld4d�5Contiguous load four-doubleword structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,PLD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]LLD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]svcSupervisor callSVC  # <imm>st4h�4Contiguous store four-halfword structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,NST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]JST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]tbxTable vector lookup extension&TBX <Vd>.<Ta>, { <Vn>.16B }, <Vm>.<Ta>2TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B }, <Vm>.<Ta>>TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B }, <Vm>.<Ta>JTBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B, <Vn+3>.16B }, <Vm>.<Ta> TBX <Zd>.<T>, <Zn>.<T>, <Zm>.<T>bfmopaqThe BFloat16 floating-point sum of outer products and accumulate instruction works with a 32-bit element ZA tile./BFMOPA <ZAda>.S, <Pn>/M, 
<Pm>/M, <Zn>.H, <Zm>.H/BFMOPA <ZAda>.H, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.HhvcHypervisor callHVC  # <imm>moviMove immediate (vector) MOVI <Vd>.<T>, #<imm8>{, LSL #0}'MOVI <Vd>.<T>, #<imm8>{, LSL #<amount>}'MOVI <Vd>.<T>, #<imm8>{, LSL #<amount>}%MOVI <Vd>.<T>, #<imm8>, MSL #<amount>MOVI <Dd>, #<imm>MOVI <Vd>.2D, #<imm>hintHint instruction
HINT  # <imm>fcvtmuSFloating-point convert to unsigned integer, rounding toward minus infinity (vector)
FCVTMU <Hd>, <Hn>FCVTMU <V><d>, <V><n>FCVTMU <Vd>.<T>, <Vn>.<T>FCVTMU <Vd>.<T>, <Vn>.<T>FCVTMU <Wd>, <Hn>FCVTMU <Xd>, <Hn>FCVTMU <Wd>, <Sn>FCVTMU <Xd>, <Sn>FCVTMU <Wd>, <Dn>FCVTMU <Xd>, <Dn>frintm@Floating-point round to integral, toward minus infinity (vector)FRINTM <Vd>.<T>, <Vn>.<T>FRINTM <Vd>.<T>, <Vn>.<T>FRINTM <Hd>, <Hn>FRINTM <Sd>, <Sn>FRINTM <Dd>, <Dn>/FRINTM { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }/FRINTM { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }ldaprLoad-acquire RCpc registerLDAPR <Wt>, [<Xn|SP>], #4LDAPR <Xt>, [<Xn|SP>], #8LDAPR <Wt>, [<Xn|SP> {, #0}]LDAPR <Xt>, [<Xn|SP> {, #0}]sqshrun9Signed saturating shift right unsigned narrow (immediate)"SQSHRUN <Vb><d>, <Va><n>, #<shift>+SQSHRUN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>lduminb9Atomic unsigned minimum on byte in memory, without return4STUMINB <Ws>, [<Xn|SP>]LDUMINB  <Ws>, WZR, [<Xn|SP>]6STUMINLB <Ws>, [<Xn|SP>]LDUMINLB  <Ws>, WZR, [<Xn|SP>]sqrdmlahXSigned saturating rounding doubling multiply accumulate returning high half (by element)+SQRDMLAH <V><d>, <V><n>, <Vm>.<Ts>[<index>]/SQRDMLAH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]SQRDMLAH <V><d>, <V><n>, <V><m>%SQRDMLAH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>&SQRDMLAH <Zda>.<T>, <Zn>.<T>, <Zm>.<T>'SQRDMLAH <Zda>.H, <Zn>.H, <Zm>.H[<imm>]'SQRDMLAH <Zda>.S, <Zn>.S, <Zm>.S[<imm>]'SQRDMLAH <Zda>.D, <Zn>.D, <Zm>.D[<imm>]setgpn)Memory set with tag setting, non-temporalSETGPN  [ <Xd>]!, <Xn>!, <Xs>SETGMN  [ <Xd>]!, <Xn>!, <Xs>SETGEN  [ <Xd>]!, <Xn>!, <Xs>cfp
CFP -- A64
Control flow prediction restriction by context
CFP RCTX, <Xt>
SYS #3, C7, C3, #4, <Xt>
shsub
Signed halving subtract
SHSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
SHSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>
rdffr
Read the first-fault register (
RDFFR <Pd>.B
RDFFR <Pd>.B, <Pg>/Z
cpp
CPP -- A64
Cache prefetch prediction restriction by context
CPP RCTX, <Xt>
SYS #3, C7, C3, #7, <Xt>
ldsminb
Atomic signed minimum on byte in memory, without return
STSMINB <Ws>, [<Xn|SP>]
LDSMINB <Ws>, WZR, [<Xn|SP>]
STSMINLB <Ws>, [<Xn|SP>]
LDSMINLB <Ws>, WZR, [<Xn|SP>]
bics
Bitwise bit clear (shifted register), setting flags
BICS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}
BICS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}
BICS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
ld1b
Contiguous load of unsigned bytes to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
LD1B { <Zt1>.B-<Zt2>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1B { <Zt1>.B-<Zt4>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1B { <Zt1>.B-<Zt2>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]
LD1B { <Zt1>.B-<Zt4>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]
LD1B { <Zt1>.B, <Zt2>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1B { <Zt1>.B, <Zt2>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]
LD1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>/Z, [<Xn|SP>, <Xm>]
LD1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]
LD1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]
LD1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]
LD1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>]
LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>]
LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>]
LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]
LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]
LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]
LD1B { ZA0<HV>.B[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>}]
umopa
This instruction works with a 32-bit element ZA tile.
UMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H,
<Zm>.H
UMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B
UMOPA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H
usmopa
The 8-bit integer variant works with a 32-bit element ZA tile.
USMOPA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B
USMOPA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H
mvn
MVN -- A64
Bitwise NOT (vector)
MVN <Vd>.<T>, <Vn>.<T>
NOT <Vd>.<T>, <Vn>.<T>
MVN -- A64
Bitwise NOT
MVN <Wd>, <Wm>{, <shift> #<amount>}
ORN <Wd>, WZR, <Wm>{, <shift> #<amount>}
MVN <Xd>, <Xm>{, <shift> #<amount>}
ORN <Xd>, XZR, <Xm>{, <shift> #<amount>}
fmad
Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of the third (addend) vector without intermediate rounding. Destructively place the results in the destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.
FMAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>
uqxtnt
Saturate the unsigned integer value in each source element to half the original source element width, and place the results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged.
UQXTNT <Zd>.<T>, <Zn>.<Tb>
stlur
Store-release SIMD&FP register (unscaled offset)
STLUR <Bt>, [<Xn|SP>{, #<simm>}]
STLUR <Ht>, [<Xn|SP>{, #<simm>}]
STLUR <St>, [<Xn|SP>{, #<simm>}]
STLUR <Dt>, [<Xn|SP>{, #<simm>}]
STLUR <Qt>, [<Xn|SP>{, #<simm>}]
STLUR <Wt>, [<Xn|SP>{, #<simm>}]
STLUR <Xt>, [<Xn|SP>{, #<simm>}]
ttest
Test transaction state
TTEST <Xt>stlurb&Store-release register byte (unscaled)!STLURB <Wt>, [<Xn|SP>{, #<simm>}]fmaxnmv+Floating-point maximum number across vectorFMAXNMV <V><d>, <Vn>.<T>FMAXNMV  S <d>, <Vn>.4SFMAXNMV <V><d>, <Pg>, <Zn>.<T>sha512h2SHA512 hash update part 2SHA512H2 <Qd>, <Qn>, <Vm>.2Dfmaxnmp:Floating-point maximum number of pair of elements (scalar)FMAXNMP  H <d>, <Vn>.2HFMAXNMP <V><d>, <Vn>.<T>$FMAXNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>$FMAXNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>.FMAXNMP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>uhsubr�9Subtract active unsigned elements of the first source vector from corresponding unsigned elements of the second source vector, shift right one bit, and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.-UHSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>whilerwnThis instruction checks two addresses for a conflict or overlap between address ranges of the form [addr,addr+WHILERW <Pd>.<T>, <Xn>, <Xm>wrffrJRead the source predicate register and place in the first-fault register (WRFFR <Pn>.Bcmpeq�Compare active integer elements in the source vector with an immediate, and place the boolean results of the specified comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. 
Sets the (CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPLO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPLS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>(CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPLO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPLS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D(CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D*CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>*CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>compact�Read the active elements from the source vector and pack them into the lowest-numbered elements of the destination vector. Then set any remaining elements of the destination vector to zero. COMPACT <Zd>.<T>, <Pg>, <Zn>.<T>fccmpe5Floating-point conditional signaling compare (scalar)"FCCMPE <Hn>, <Hm>, #<nzcv>, <cond>"FCCMPE <Sn>, <Sm>, #<nzcv>, <cond>"FCCMPE <Dn>, <Dm>, #<nzcv>, <cond>pmulPolynomial multiply!PMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>PMUL <Zd>.B, <Zn>.B, <Zm>.Bsqrshrnb�:Shift each signed integer value in the source vector elements right by an immediate value, and place the rounded results in the even-numbered half-width destination elements, while setting the odd-numbered elements to zero. 
Each result element is saturated to the half-width N-bit element's signed integer range -2&SQRSHRNB <Zd>.<T>, <Zn>.<Tb>, #<const>whilelo�Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, unsigned scalar operand is lower than the second scalar operand and false thereafter up to the highest numbered element. WHILELO <Pd>.<T>, <R><n>, <R><m>#WHILELO <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILELO { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>st3q�6Contiguous store three-quadword structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,EST3Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]AST3Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q }, <Pg>, [<Xn|SP>, <Xm>, LSL #4]sm3tt1bSM3TT1B(SM3TT1B <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]sri"Shift right and insert (immediate)SRI  D <d>, D<n>, #<shift> SRI <Vd>.<T>, <Vn>.<T>, #<shift> SRI <Zd>.<T>, <Zn>.<T>, #<const>ldap1JLoad-acquire RCpc one single-element structure to one lane of one register%LDAP1  { <Vt>.D }[<index>], [<Xn|SP>]saddlpSigned add long pairwiseSADDLP <Vd>.<Ta>, <Vn>.<Tb>frecpx+Floating-point reciprocal exponent (scalar)FRECPX <Hd>, <Hn>FRECPX <V><d>, <V><n>!FRECPX <Zd>.<T>, <Pg>/M, <Zn>.<T>ldursb$Load register signed byte (unscaled)!LDURSB <Wt>, [<Xn|SP>{, #<simm>}]!LDURSB <Xt>, [<Xn|SP>{, #<simm>}]cmlt&Compare signed less than zero (vector)CMLT  D <d>, D<n>, #0CMLT <Vd>.<T>, <Vn>.<T>, #0bdep�PThis instruction scatters the lowest-numbered contiguous bits within each element of the first source vector to the bit positions indicated by non-zero bits in the corresponding mask element of the second source vector, preserving their order, and set the bits corresponding to a zero mask bit to zero. 
This instruction is unpredicated.
BDEP <Zd>.<T>, <Zn>.<T>, <Zm>.<T>
fcvtlt
Convert odd-numbered floating-point elements from the source vector to the next higher precision, and place the results in the active overlapping double-width elements of the destination vector. Inactive elements in the destination vector register remain unmodified.FCVTLT <Zd>.S, <Pg>/M, <Zn>.HFCVTLT <Zd>.D, <Pg>/M, <Zn>.Sldnp:Load pair of SIMD&amp;FP registers, with non-temporal hint&LDNP <St1>, <St2>, [<Xn|SP>{, #<imm>}]&LDNP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]&LDNP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]&LDNP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]&LDNP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]brkpa�yIf the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to and including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.$BRKPA <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.BfmopsuThe half-precision floating-point sum of outer products and subtract instruction works with a 32-bit element ZA tile..FMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.FMOPS <ZAda>.H, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.FMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S, <Zm>.S.FMOPS <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.D, <Zm>.Dfclamp�|Clamp each floating-point element in the two or four destination vectors to between the floating-point minimum value in the corresponding element of the first source vector and the floating-point maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors.2FCLAMP { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<T>, <Zm>.<T>2FCLAMP { <Zd1>.<T>-<Zd4>.<T> }, <Zn>.<T>, <Zm>.<T>#FCLAMP <Zd>.<T>, <Zn>.<T>, <Zm>.<T>fnmadd2Floating-point negated fused multiply-add (scalar)FNMADD <Hd>, <Hn>, <Hm>, <Ha>FNMADD <Sd>, <Sn>, <Sm>, <Sa>FNMADD <Dd>, <Dn>, <Dm>, <Da>sbcSubtract 
with carrySBC <Wd>, <Wn>, <Wm>SBC <Xd>, <Xn>, <Xm>smops5This instruction works with a 32-bit element ZA tile..SMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.H.SMOPS <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.B, <Zm>.B.SMOPS <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.H, <Zm>.Hsqrshr�Shift right by an immediate value, the signed integer value in each element of the two source vectors and place the rounded results in the half-width destination elements. Each result element is saturated to the half-width N-bit element's signed integer range -2,SQRSHR <Zd>.H, { <Zn1>.S-<Zn2>.S }, #<const>4SQRSHR <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }, #<const>cls Count leading sign bits (vector)CLS <Vd>.<T>, <Vn>.<T>CLS <Wd>, <Wn>CLS <Xd>, <Xn>CLS <Zd>.<T>, <Pg>/M, <Zn>.<T>ld64bSingle-copy atomic 64-byte LoadLD64B <Xt>, [<Xn|SP> {, #0}]ldsetah$Atomic bit set on halfword in memoryLDSETAH <Ws>, <Wt>, [<Xn|SP>]LDSETALH <Ws>, <Wt>, [<Xn|SP>]LDSETH <Ws>, <Wt>, [<Xn|SP>]LDSETLH <Ws>, <Wt>, [<Xn|SP>]rmifRotate, mask insert flagsRMIF <Xn>, #<shift>, #<mask>st47Store multiple 4-element structures from four registers=ST4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]DST4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>CST4  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>>ST4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>]>ST4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>]>ST4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>]>ST4  { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>]BST4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], #4DST4  { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], <Xm>BST4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], #8DST4  { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], <Xm>CST4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], #16DST4  { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], <Xm>CST4  { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], 
#32
ST4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], <Xm>
urecpe
Unsigned reciprocal estimate
URECPE <Vd>.<T>, <Vn>.<T>
URECPE <Zd>.S, <Pg>/M, <Zn>.S
ldclrh
Atomic bit clear on halfword in memory, without return
STCLRH <Ws>, [<Xn|SP>]
LDCLRH <Ws>, WZR, [<Xn|SP>]
STCLRLH <Ws>, [<Xn|SP>]
LDCLRLH <Ws>, WZR, [<Xn|SP>]
pacia
Pointer Authentication Code for instruction address, using key A
PACIA <Xd>, <Xn|SP>
PACIZA <Xd>
PACIA1716
PACIASP
PACIAZ
sm3tt2a
SM3TT2A
SM3TT2A <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]
stlxr
Store-release exclusive register
STLXR <Ws>, <Wt>, [<Xn|SP>{, #0}]
STLXR <Ws>, <Xt>, [<Xn|SP>{, #0}]
ldsetb
Atomic bit set on byte in memory, without return
STSETB <Ws>, [<Xn|SP>]
LDSETB <Ws>, WZR, [<Xn|SP>]
STSETLB <Ws>, [<Xn|SP>]
LDSETLB <Ws>, WZR, [<Xn|SP>]
fcmgt
Floating-point compare greater than (vector)
FCMGT <Hd>, <Hn>, <Hm>
FCMGT <V><d>, <V><n>, <V><m>
FCMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FCMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FCMGT <Hd>, <Hn>, #0.0
FCMGT <V><d>, <V><n>, #0.0
FCMGT <Vd>.<T>, <Vn>.<T>, #0.0
FCMGT <Vd>.<T>, <Vn>.<T>, #0.0
stxrb
Store exclusive register byte
STXRB <Ws>, <Wt>, [<Xn|SP>{, #0}]
bsl1n
Selects bits from the inverted first source vector where the corresponding bit in the third source vector is '1', and from the second source vector where the corresponding bit in the third source vector is '0'. The result is placed destructively in the destination and first source vector. This instruction is unpredicated.
BSL1N <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.D
bfmls
Multiply the corresponding active BFloat16 elements of the first and second source vectors and subtract from elements of the third source (addend) vector without intermediate rounding. Destructively place the results in the destination and third source (addend) vector.
Inactive elements in the destination vector register remain unmodified.%BFMLS <Zda>.H, <Pg>/M, <Zn>.H, <Zm>.H$BFMLS <Zda>.H, <Zn>.H, <Zm>.H[<imm>]IBFMLS   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]IBFMLS   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]@BFMLS   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H@BFMLS   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HMBFMLS   ZA.H[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }MBFMLS   ZA.H[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }sabalb�!Compute the absolute difference between even-numbered signed integer values in elements of the second source vector and corresponding elements of the first source vector, and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.&SABALB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>trn2Transpose vectors (secondary)!TRN2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>tstTST (immediate) -- A64Test bits (immediate)TST <Wn>, #<imm>ANDS   WZR, <Wn>, #<imm>TST <Xn>, #<imm>ANDS   XZR, <Xn>, #<imm>TST (shifted register) -- A64Test (shifted register)#TST <Wn>, <Wm>{, <shift> #<amount>}+ANDS   WZR, <Wn>, <Wm>{, <shift> #<amount>}#TST <Xn>, <Xm>{, <shift> #<amount>}+ANDS   XZR, <Xn>, <Xm>{, <shift> #<amount>}xpacd!Strip Pointer Authentication Code
XPACD <Xd>
XPACI <Xd>XPACLRI prfb�Gather prefetch of bytes from the active memory addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive addresses are not prefetched from memory.&PRFB <prfop>, <Pg>, [<Zn>.S{, #<imm>}]&PRFB <prfop>, <Pg>, [<Zn>.D{, #<imm>}]/PRFB <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]#PRFB <prfop>, <Pg>, [<Xn|SP>, <Xm>],PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod>],PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]%PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.D]umlalb�Multiply the corresponding even-numbered unsigned elements of the first and second source vectors and destructively add to the overlapping double-width elements of the addend vector. This instruction is unpredicated.&UMLALB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>%UMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]%UMLALB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]sqxtun)Signed saturating extract unsigned narrowSQXTUN <Vb><d>, <Va><n> SQXTUN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>fdivFloating-point divide (vector)!FDIV <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FDIV <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FDIV <Hd>, <Hn>, <Hm>FDIV <Sd>, <Sn>, <Sm>FDIV <Dd>, <Dn>, <Dm>+FDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>aeseAES single round encryptionAESE <Vd>.16B, <Vn>.16BAESE <Zdn>.B, <Zdn>.B, <Zm>.Bldrsw%Load register signed word (immediate)LDRSW <Xt>, [<Xn|SP>], #<simm>LDRSW <Xt>, [<Xn|SP>, #<simm>]! LDRSW <Xt>, [<Xn|SP>{, #<pimm>}]LDRSW <Xt>, <label>9LDRSW <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]uaddwt�Add the odd-numbered unsigned elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. 
This instruction is unpredicated.
UADDWT <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>
ldeorab
Atomic exclusive-OR on byte in memory
LDEORAB <Ws>, <Wt>, [<Xn|SP>]
LDEORALB <Ws>, <Wt>, [<Xn|SP>]
LDEORB <Ws>, <Wt>, [<Xn|SP>]
LDEORLB <Ws>, <Wt>, [<Xn|SP>]
uminv
Unsigned minimum across vector
UMINV <V><d>, <Vn>.<T>
UMINV <V><d>, <Pg>, <Zn>.<T>
zero
The instruction zeroes two or four ZA single-vector groups.
ZERO ZA.D[<Wv>, <offs>, VGx2]
ZERO ZA.D[<Wv>, <offs>, VGx4]
ZERO ZA.D[<Wv>, <offs1>:<offs2>]
ZERO ZA.D[<Wv>, <offs1>:<offs2>, VGx2]
ZERO ZA.D[<Wv>, <offs1>:<offs2>, VGx4]
ZERO ZA.D[<Wv>, <offs1>:<offs4>]
ZERO ZA.D[<Wv>, <offs1>:<offs4>, VGx2]
ZERO ZA.D[<Wv>, <offs1>:<offs4>, VGx4]
ZERO { <mask> }
ZERO { ZT0 }
usubwt
Subtract the odd-numbered unsigned elements of the second source vector from the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. This instruction is unpredicated.
USUBWT <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>
sqdech
Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-extended to 64 bits.
SQDECH <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}
SQDECH <Xdn>{, <pattern>{, MUL #<imm>}}
SQDECH <Zdn>.H{, <pattern>{, MUL #<imm>}}
ldnt1d
Contiguous load non-temporal of doublewords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
LDNT1D { <Zt1>.D-<Zt2>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LDNT1D { <Zt1>.D-<Zt4>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LDNT1D { <Zt1>.D-<Zt2>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]
LDNT1D { <Zt1>.D-<Zt4>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]
LDNT1D { <Zt1>.D, <Zt2>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LDNT1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LDNT1D { <Zt1>.D, <Zt2>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]
LDNT1D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D },
<PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]+LDNT1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]6LDNT1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]2LDNT1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]mvni Move inverted immediate (vector)'MVNI <Vd>.<T>, #<imm8>{, LSL #<amount>}'MVNI <Vd>.<T>, #<imm8>{, LSL #<amount>}%MVNI <Vd>.<T>, #<imm8>, MSL #<amount>retaa3Return from subroutine, with pointer authenticationRETAA RETAB smaddlSigned multiply-add longSMADDL <Xd>, <Wn>, <Wm>, <Xa>uqdecb�)Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is saturated to the general-purpose register's unsigned integer range.'UQDECB <Wdn>{, <pattern>{, MUL #<imm>}}'UQDECB <Xdn>{, <pattern>{, MUL #<imm>}}gcspopxGCSPOPX -- A641Guarded Control Stack pop exception return recordGCSPOPX SYS   #0, C7, C7, #6{, <Xt>}bfmin�Determine the minimum of BFloat16 elements of the second source vector and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors.:BFMIN { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, <Zm>.H:BFMIN { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, <Zm>.HGBFMIN { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, { <Zm1>.H-<Zm2>.H }GBFMIN { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, { <Zm1>.H-<Zm4>.H }&BFMIN <Zdn>.H, <Pg>/M, <Zdn>.H, <Zm>.Hbrkpb�}If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. 
Does not set the condition flags.$BRKPB <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.BadcsAdd with carry, setting flagsADCS <Wd>, <Wn>, <Wm>ADCS <Xd>, <Xn>, <Xm>clrbhbClear branch historyCLRBHB ldsmaxah+Atomic signed maximum on halfword in memoryLDSMAXAH <Ws>, <Wt>, [<Xn|SP>]LDSMAXALH <Ws>, <Wt>, [<Xn|SP>]LDSMAXH <Ws>, <Wt>, [<Xn|SP>]LDSMAXLH <Ws>, <Wt>, [<Xn|SP>]fcvtnt}Convert each single-precision element of the group of two source vectors to 8-bit floating-point while scaling the value by 2"FCVTNT <Zd>.B, { <Zn1>.S-<Zn2>.S }FCVTNT <Zd>.H, <Pg>/M, <Zn>.SFCVTNT <Zd>.S, <Pg>/M, <Zn>.DnopNo operationNOP ld1row�Load eight contiguous words to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address..LD1ROW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]2LD1ROW { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]srshlr��Shift active signed elements of the second source vector by corresponding elements of the first source vector and destructively place the rounded results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Inactive elements in the destination vector register remain unmodified.-SRSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>
autibsppcr9Authenticate return address using key B, using a registerAUTIBSPPCR <Xn>andqv��Bitwise AND of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as all ones.ANDQV <Vd>.<T>, <Pg>, <Zn>.<Tb>ld4b�/Contiguous load four-byte structures, each to the same element number in four vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,PLD4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]DLD4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]swpahSwap halfword in memorySWPAH <Ws>, <Wt>, [<Xn|SP>]SWPALH <Ws>, <Wt>, [<Xn|SP>]SWPH <Ws>, <Wt>, [<Xn|SP>]SWPLH <Ws>, <Wt>, [<Xn|SP>]st2q�2Contiguous store two-quadword structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,<ST2Q { <Zt1>.Q, <Zt2>.Q }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]8ST2Q { <Zt1>.Q, <Zt2>.Q }, <Pg>, [<Xn|SP>, <Xm>, LSL #4]cpypn*Memory copy, reads and writes non-temporalCPYPN  [ <Xd>]!, [<Xs>]!, <Xn>!CPYMN  [ <Xd>]!, [<Xs>]!, <Xn>!CPYEN  [ <Xd>]!, [<Xs>]!, <Xn>!msubpt!Multiply-subtract checked pointerMSUBPT <Xd>, <Xn>, <Xm>, <Xa>fsubr�0Reversed subtract from an immediate each active floating-point element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only. 
Inactive elements in the destination vector register remain unmodified.+FSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>,FSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>sminqv�%Signed minimum of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as the maximum signed integer for the element size. SMINQV <Vd>.<T>, <Pg>, <Zn>.<Tb>ptest	Sets the PTEST <Pg>, <Pn>.BextrExtract registerEXTR <Wd>, <Wn>, <Wm>, #<lsb>EXTR <Xd>, <Xn>, <Xm>, #<lsb>st1q�Scatter store of quadwords from the active elements of a vector register to the memory addresses generated by a vector base plus a 64-bit unscaled scalar register offset. Inactive elements are not written to memory.'ST1Q { <Zt>.Q }, <Pg>, [<Zn>.D{, <Xm>}]CST1Q { <ZAt><HV>.Q[<Ws>, <offs>] }, <Pg>, [<Xn|SP>{, <Xm>, LSL #4}]st1b�Contiguous store of bytes from elements of two or four consecutive vector registers to the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.<ST1B { <Zt1>.B-<Zt2>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]<ST1B { <Zt1>.B-<Zt4>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]0ST1B { <Zt1>.B-<Zt2>.B }, <PNg>, [<Xn|SP>, <Xm>]0ST1B { <Zt1>.B-<Zt4>.B }, <PNg>, [<Xn|SP>, <Xm>]=ST1B { <Zt1>.B, <Zt2>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]OST1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>, [<Xn|SP>{, #<imm>, MUL VL}]1ST1B { <Zt1>.B, <Zt2>.B }, <PNg>, [<Xn|SP>, <Xm>]CST1B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <PNg>, [<Xn|SP>, <Xm>])ST1B { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}])ST1B { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]4ST1B { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}](ST1B { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>]/ST1B { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]/ST1B { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>](ST1B { <Zt>.D }, <Pg>, [<Xn|SP>, 
<Zm>.D]9ST1B { ZA0<HV>.B[<Ws>, <offs>] }, <Pg>, [<Xn|SP>{, <Xm>}]syslSystem instruction with result%SYSL <Xt>, #<op1>, <Cn>, <Cm>, #<op2>whilehi�Generate a predicate that starting from the highest numbered element is true while the decrementing value of the first, unsigned scalar operand is higher than the second scalar operand and false thereafter down to the lowest numbered element. WHILEHI <Pd>.<T>, <R><n>, <R><m>#WHILEHI <PNd>.<T>, <Xn>, <Xm>, <vl>,WHILEHI { <Pd1>.<T>, <Pd2>.<T> }, <Xn>, <Xm>stlxp)Store-release exclusive pair of registers)STLXP <Ws>, <Wt1>, <Wt2>, [<Xn|SP>{, #0}])STLXP <Ws>, <Xt1>, <Xt2>, [<Xn|SP>{, #0}]fmaxp3Floating-point maximum of pair of elements (scalar)FMAXP  H <d>, <Vn>.2HFMAXP <V><d>, <Vn>.<T>"FMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,FMAXP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>st2b�.Contiguous store two-byte structures, each from the same element number in two vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,<ST2B { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]0ST2B { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>, <Xm>]stlrStore-release registerSTLR <Wt>, [<Xn|SP>{, #0}]STLR <Xt>, [<Xn|SP>{, #0}]STLR <Wt>, [<Xn|SP>, #-4]!STLR <Xt>, [<Xn|SP>, #-8]!subSubtract (extended register)4SUB <Wd|WSP>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}4SUB <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}})SUB <Wd|WSP>, <Wn|WSP>, #<imm>{, <shift>}'SUB <Xd|SP>, <Xn|SP>, #<imm>{, <shift>})SUB <Wd>, <Wn>, <Wm>{, <shift> #<amount>})SUB <Xd>, <Xn>, <Xm>{, <shift> #<amount>}SUB  D <d>, D<n>, D<m> SUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>*SUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>+SUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>} SUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>>SUB     ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zm1>.<T>-<Zm2>.<T> }>SUB     ZA. 
<T>[<Wv>, <offs>{, VGx4}], { <Zm1>.<T>-<Zm4>.<T> }HSUB     ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, <Zm>.<T>HSUB     ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, <Zm>.<T>WSUB     ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zn1>.<T>-<Zn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }WSUB     ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zn1>.<T>-<Zn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> }umov8Unsigned move vector element to general-purpose registerUMOV <Wd>, <Vn>.<Ts>[<index>]UMOV <Xd>, <Vn>.D[<index>]cospCOSP -- A649Clear other speculative prediction restriction by contextCOSP  RCTX, <Xt>SYS   #3, C7, C3, #6, <Xt>fcmge5Floating-point compare greater than or equal (vector)FCMGE <Hd>, <Hn>, <Hm>FCMGE <V><d>, <V><n>, <V><m>"FCMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FCMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FCMGE <Hd>, <Hn>, #0.0FCMGE <V><d>, <V><n>, #0.0FCMGE <Vd>.<T>, <Vn>.<T>, #0.0FCMGE <Vd>.<T>, <Vn>.<T>, #0.0rsubhn'Rounding subtract returning high narrow+RSUBHN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>cpyfprn,Memory copy forward-only, reads non-temporal!CPYFPRN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFMRN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYFERN  [ <Xd>]!, [<Xs>]!, <Xn>!sshllt�0Shift left by immediate each odd-numbered signed element of the source vector, and place the results in the overlapping double-width elements of the destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. This instruction is unpredicated.$SSHLLT <Zd>.<T>, <Zn>.<Tb>, #<const>ssubltb�Subtract the even-numbered signed elements of the second source vector from the odd-numbered signed elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. This instruction is unpredicated.&SSUBLTB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>ldursw$Load register signed word (unscaled)!LDURSW <Xt>, [<Xn|SP>{, #<simm>}]uqincp�Counts the number of true elements in the source predicate and then uses the result to increment the scalar destination. 
The result is saturated to the general-purpose register's unsigned integer range.UQINCP <Wdn>, <Pm>.<T>UQINCP <Xdn>, <Pm>.<T>UQINCP <Zdn>.<T>, <Pm>.<T>st4w�0Contiguous store four-word structures, each from the same element number in four vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that is multiplied by the vector's in-memory size, irrespective of predication,NST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]JST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]addhnAdd returning high narrow*ADDHN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>cpy�Copy a signed integer immediate to each active element in the destination vector. Inactive elements in the destination vector register are set to zero.'CPY <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>}'CPY <Zd>.<T>, <Pg>/M, #<imm>{, <shift>}CPY <Zd>.<T>, <Pg>/M, <R><n|SP>CPY <Zd>.<T>, <Pg>/M, <V><n>cpypwn Memory copy, writes non-temporal CPYPWN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYMWN  [ <Xd>]!, [<Xs>]!, <Xn>! 
CPYEWN  [ <Xd>]!, [<Xs>]!, <Xn>!whilewrnThis instruction checks two addresses for a conflict or overlap between address ranges of the form [addr,addr+WHILEWR <Pd>.<T>, <Xn>, <Xm>csinvConditional select invertCSINV <Wd>, <Wn>, <Wm>, <cond>CSINV <Xd>, <Xn>, <Xm>, <cond>ld35Load multiple 3-element structures to three registers2LD3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]9LD3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>8LD3  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>5LD3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>]5LD3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>]5LD3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>]5LD3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>]9LD3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], #3;LD3  { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], <Xm>9LD3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], #6;LD3  { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], <Xm>:LD3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], #12;LD3  { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], <Xm>:LD3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], #24;LD3  { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], <Xm>luti4$Lookup table read with 4-bit indices+LUTI4 <Vd>.16B, { <Vn>.16B }, <Vm>[<index>]4LUTI4 <Vd>.8H, { <Vn1>.8H, <Vn2>.8H }, <Vm>[<index>]1LUTI4 { <Zd1>.<T>-<Zd2>.<T> }, ZT0, <Zn>[<index>]2LUTI4 { <Zd1>.<T>, <Zd2>.<T> }, ZT0, <Zn>[<index>]/LUTI4 { <Zd1>.B-<Zd4>.B }, ZT0, { <Zn1>-<Zn2> }BLUTI4 { <Zd1>.B, <Zd2>.B, <Zd3>.B, <Zd4>.B }, ZT0, { <Zn1>-<Zn2> }1LUTI4 { <Zd1>.<T>-<Zd4>.<T> }, ZT0, <Zn>[<index>]@LUTI4 { <Zd1>.H, <Zd2>.H, <Zd3>.H, <Zd4>.H }, ZT0, <Zn>[<index>]"LUTI4 <Zd>.<T>, ZT0, <Zn>[<index>]'LUTI4 <Zd>.B, { <Zn>.B }, <Zm>[<index>]1LUTI4 <Zd>.H, { <Zn1>.H, <Zn2>.H }, <Zm>[<index>]'LUTI4 <Zd>.H, { <Zn>.H }, <Zm>[<index>]sqcvtun�Saturate the signed integer value in each element of the group of two source vectors to unsigned integer value that is half the original source element width, and place the 
two-way interleaved results in the half-width destination elements.#SQCVTUN <Zd>.H, { <Zn1>.S-<Zn2>.S }+SQCVTUN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }index�Populates the destination vector by setting the first element to the first signed immediate integer operand and monotonically incrementing the value by the second signed immediate integer operand for each subsequent element. This instruction is unpredicated. INDEX <Zd>.<T>, #<imm1>, #<imm2>INDEX <Zd>.<T>, #<imm>, <R><m>INDEX <Zd>.<T>, <R><n>, #<imm>INDEX <Zd>.<T>, <R><n>, <R><m>movkMove wide with keep!MOVK <Wd>, #<imm>{, LSL #<shift>}!MOVK <Xd>, #<imm>{, LSL #<shift>}negsNEGS -- A64Negate, setting flags$NEGS <Wd>, <Wm>{, <shift> #<amount>}+SUBS   <Wd>, WZR, <Wm>{, <shift> #<amount>}$NEGS <Xd>, <Xm>{, <shift> #<amount>}+SUBS   <Xd>, XZR, <Xm>{, <shift> #<amount>}uqdecw�*Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. 
The result is saturated to the general-purpose register's unsigned integer range.'UQDECW <Wdn>{, <pattern>{, MUL #<imm>}}'UQDECW <Xdn>{, <pattern>{, MUL #<imm>}})UQDECW <Zdn>.S{, <pattern>{, MUL #<imm>}}dghData gathering hintDGH fmlalBFloating-point fused multiply-add long to accumulator (by element)+FMLAL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>],FMLAL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]%FMLAL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>&FMLAL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>=FMLAL   ZA.H[ <Wv>, <offs1>:<offs2>], <Zn>.B, <Zm>.B[<index>]RFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.B[<index>]RFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]4FMLAL   ZA.H[ <Wv>, <offs1>:<offs2>], <Zn>.B, <Zm>.BIFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.B-<Zn2>.B }, <Zm>.BIFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.BVFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.B-<Zn2>.B }, { <Zm1>.B-<Zm2>.B }VFMLAL   ZA.H[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.B-<Zn4>.B }, { <Zm1>.B-<Zm4>.B }=FMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.H[<index>]RFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]RFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]4FMLAL   ZA.S[ <Wv>, <offs1>:<offs2>], <Zn>.H, <Zm>.HIFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.HIFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.HVFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }VFMLAL   ZA.S[ <Wv>, <offs1>:<offs2>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }ld2b�-Contiguous load two-byte structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,>LD2B { <Zt1>.B, 
<Zt2>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]2LD2B { <Zt1>.B, <Zt2>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]ands&Bitwise AND (immediate), setting flagsANDS <Wd>, <Wn>, #<imm>ANDS <Xd>, <Xn>, #<imm>*ANDS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}*ANDS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}#ANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Brshrnt�]Shift each unsigned integer value in the source vector elements right by an immediate value, and place the rounded results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. This instruction is unpredicated.$RSHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>pnext�]An instruction used to construct a loop which iterates over all true elements in the vector select predicate register. If all elements in the first source predicate register are false it determines the first true element in the vector select predicate register, otherwise it determines the next true element in the vector select predicate register that follows the last true element in the first source predicate register. All elements of the destination predicate register are set to false, except the element corresponding to the determined vector select element, if any, which is set to true. Sets the  PNEXT <Pdn>.<T>, <Pv>, <Pdn>.<T>stl1FStore-release a single-element structure from one lane of one register$STL1  { <Vt>.D }[<index>], [<Xn|SP>]fnmls�cMultiply the corresponding active floating-point elements of the first and second source vectors and subtract from elements of the third source (addend) vector without intermediate rounding. Destructively place the negated results in the destination and third source (addend) vector. 
Inactive elements in the destination vector register remain unmodified.+FNMLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>fcmp%Floating-point quiet compare (scalar)FCMP <Hn>, <Hm>FCMP <Hn>, #0.0FCMP <Sn>, <Sm>FCMP <Sn>, #0.0FCMP <Dn>, <Dm>FCMP <Dn>, #0.0cpyptrn>Memory copy, reads and writes unprivileged, reads non-temporal!CPYPTRN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYMTRN  [ <Xd>]!, [<Xs>]!, <Xn>!!CPYETRN  [ <Xd>]!, [<Xs>]!, <Xn>!sttrStore register (unprivileged)STTR <Wt>, [<Xn|SP>{, #<simm>}]STTR <Xt>, [<Xn|SP>{, #<simm>}]uqaddUnsigned saturating addUQADD <V><d>, <V><n>, <V><m>"UQADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,UQADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>-UQADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}"UQADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>cpypt*Memory copy, reads and writes unprivilegedCPYPT  [ <Xd>]!, [<Xs>]!, <Xn>!CPYMT  [ <Xd>]!, [<Xs>]!, <Xn>!CPYET  [ <Xd>]!, [<Xs>]!, <Xn>!ld1MLoad multiple single-element structures to one, two, three, or four registersLD1  { <Vt>.<T> }, [<Xn|SP>]'LD1  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]2LD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]=LD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]#LD1  { <Vt>.<T> }, [<Xn|SP>], <imm>"LD1  { <Vt>.<T> }, [<Xn|SP>], <Xm>.LD1  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>-LD1  { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>9LD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>8LD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>DLD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>CLD1  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>#LD1  { <Vt>.B }[<index>], [<Xn|SP>]#LD1  { <Vt>.H }[<index>], [<Xn|SP>]#LD1  { <Vt>.S }[<index>], [<Xn|SP>]#LD1  { <Vt>.D }[<index>], [<Xn|SP>]'LD1  { <Vt>.B }[<index>], [<Xn|SP>], #1)LD1  { <Vt>.B }[<index>], [<Xn|SP>], <Xm>'LD1  { <Vt>.D }[<index>], [<Xn|SP>], #8)LD1  { <Vt>.D }[<index>], [<Xn|SP>], <Xm>'LD1  { <Vt>.H }[<index>], [<Xn|SP>], #2)LD1  { <Vt>.H }[<index>], [<Xn|SP>], <Xm>'LD1  { <Vt>.S }[<index>], [<Xn|SP>], #4)LD1  { <Vt>.S 
}[<index>], [<Xn|SP>], <Xm>faminFloating-point absolute minimum"FAMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>"FAMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>SFAMIN { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }SFAMIN { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> },FAMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>uclamp�jClamp each unsigned element in the two or four destination vectors to between the unsigned minimum value in the corresponding element of the first source vector and the unsigned maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors.2UCLAMP { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<T>, <Zm>.<T>2UCLAMP { <Zd1>.<T>-<Zd4>.<T> }, <Zn>.<T>, <Zm>.<T>#UCLAMP <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ushllb�3Shift left by immediate each even-numbered unsigned element of the source vector, and place the results in the overlapping double-width elements of the destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element minus 1. This instruction is unpredicated.$USHLLB <Zd>.<T>, <Zn>.<Tb>, #<const>ldumaxh=Atomic unsigned maximum on halfword in memory, without return4STUMAXH <Ws>, [<Xn|SP>]LDUMAXH  <Ws>, WZR, [<Xn|SP>]6STUMAXLH <Ws>, [<Xn|SP>]LDUMAXLH  <Ws>, WZR, [<Xn|SP>]bcaxBit clear and exclusive-OR+BCAX <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B%BCAX <Zdn>.D, <Zdn>.D, <Zm>.D, <Zk>.Dfmlalt�This 8-bit floating-point multiply-add long instruction widens the odd 8-bit elements in the first and second source vectors to half-precision format and multiplies the corresponding elements. 
The intermediate products are scaled by 2FMLALT <Zda>.H, <Zn>.B, <Zm>.B%FMLALT <Zda>.H, <Zn>.B, <Zm>.B[<imm>]FMLALT <Zda>.S, <Zn>.H, <Zm>.H%FMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]ldurh!Load register halfword (unscaled) LDURH <Wt>, [<Xn|SP>{, #<simm>}]rdvl�Multiply the current vector register size in bytes by an immediate in the range -32 to 31 and place the result in the 64-bit destination general-purpose register.RDVL <Xd>, #<imm>cmppCMPP -- A64Compare with tagCMPP <Xn|SP>, <Xm|SP>SUBPS   XZR, <Xn|SP>, <Xm|SP>brkns�If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise leaves the destination and second source predicate unchanged. Sets the &BRKNS <Pdm>.B, <Pg>/Z, <Pn>.B, <Pdm>.Bmlapt�Multiply with overflow check the elements of the first and second source vectors and add pointer check to elements of the third source (addend) vector. Destructively place the results in the destination and third source (addend) vector.MLAPT <Zda>.D, <Zn>.D, <Zm>.Dtrn1Transpose vectors (primary)!TRN1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!TRN1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!TRN2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>!TRN1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>TRN1 <Zd>.Q, <Zn>.Q, <Zm>.Q!TRN2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>TRN2 <Zd>.Q, <Zn>.Q, <Zm>.Qaddsvl�Add the Streaming SVE vector register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source general-purpose register or current stack pointer, and place the result in the 64-bit destination general-purpose register or current stack pointer.ADDSVL <Xd|SP>, <Xn|SP>, #<imm>st3d�8Contiguous store three-doubleword structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,EST3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]AST3D { <Zt1>.D, 
<Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]fcvtxnt�?Convert active double-precision floating-point elements from the source vector to single-precision, rounding to Odd, and place the results in the odd-numbered 32-bit elements of the destination vector, leaving the even-numbered elements unchanged. Inactive elements in the destination vector register remain unmodified.FCVTXNT <Zd>.S, <Pg>/M, <Zn>.DtstartStart transactionTSTART <Xt>crc32cbCRC32C checksumCRC32CB <Wd>, <Wn>, <Wm>CRC32CH <Wd>, <Wn>, <Wm>CRC32CW <Wd>, <Wn>, <Wm>CRC32CX <Wd>, <Wn>, <Xm>bf1cvtl18-bit floating-point convert to BFloat16 (vector)BF1CVTL{ 2}  <Vd>.8H, <Vn>.<Ta>BF2CVTL{ 2}  <Vd>.8H, <Vn>.<Ta>#BF1CVTL { <Zd1>.H-<Zd2>.H }, <Zn>.B#BF2CVTL { <Zd1>.H-<Zd2>.H }, <Zn>.Bld2q�1Contiguous load two-quadword structures, each to the same element number in two vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that is multiplied by the vector's in-memory size, irrespective of predication,>LD2Q { <Zt1>.Q, <Zt2>.Q }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD2Q { <Zt1>.Q, <Zt2>.Q }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #4]rcwsclrp@Read check write software atomic bit clear on quadword in memory RCWSCLRP <Xt1>, <Xt2>, [<Xn|SP>]!RCWSCLRPA <Xt1>, <Xt2>, [<Xn|SP>]"RCWSCLRPAL <Xt1>, <Xt2>, [<Xn|SP>]!RCWSCLRPL <Xt1>, <Xt2>, [<Xn|SP>]saddwb�Add the even-numbered signed elements of the second source vector to the overlapping double-width elements of the first source vector and place the results in the corresponding double-width elements of the destination vector. 
This instruction is unpredicated.$SADDWB <Zd>.<T>, <Zn>.<T>, <Zm>.<Tb>rcwclr9Read check write atomic bit clear on doubleword in memoryRCWCLR <Xs>, <Xt>, [<Xn|SP>]RCWCLRA <Xs>, <Xt>, [<Xn|SP>]RCWCLRAL <Xs>, <Xt>, [<Xn|SP>]RCWCLRL <Xs>, <Xt>, [<Xn|SP>]aesimcAES inverse mix columnsAESIMC <Vd>.16B, <Vn>.16BAESIMC <Zdn>.B, <Zdn>.Bandv�Bitwise AND horizontally across all lanes of a vector, and place the result in the SIMD&amp;FP scalar destination register. Inactive elements in the source vector are treated as all ones.ANDV <V><d>, <Pg>, <Zn>.<T>ftsselThe #FTSSEL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>lsr�SShift right by immediate, inserting zeroes, each active element of the source vector, and destructively place the results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 1 to number of bits per element. Inactive elements in the destination vector register remain unmodified.*LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>(LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D*LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T> LSR <Zd>.<T>, <Zn>.<T>, #<const>LSR <Zd>.<T>, <Zn>.<T>, <Zm>.DLSR (register) -- A64Logical shift right (register)LSR <Wd>, <Wn>, <Wm>LSRV   <Wd>, <Wn>, <Wm>LSR <Xd>, <Xn>, <Xm>LSRV   <Xd>, <Xn>, <Xm>LSR (immediate) -- A64Logical shift right (immediate)LSR <Wd>, <Wn>, #<shift> UBFM   <Wd>, <Wn>, #<shift>, #31LSR <Xd>, <Xn>, #<shift> UBFM   <Xd>, <Xn>, #<shift>, #63brkb�VSets destination predicate elements up to but not including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to zero, depending on whether merging or zeroing predication is selected. 
Does not set the condition flags.BRKB <Pd>.B, <Pg>/<ZM>, <Pn>.Bgcssttr(Guarded Control Stack unprivileged storeGCSSTTR <Xt>, [<Xn|SP>]rdsvl�Multiply the Streaming SVE vector register size in bytes by an immediate in the range -32 to 31 and place the result in the 64-bit destination general-purpose register.RDSVL <Xd>, #<imm>ldsmaxb7Atomic signed maximum on byte in memory, without return4STSMAXB <Ws>, [<Xn|SP>]LDSMAXB  <Ws>, WZR, [<Xn|SP>]6STSMAXLB <Ws>, [<Xn|SP>]LDSMAXLB  <Ws>, WZR, [<Xn|SP>]fsub Floating-point subtract (vector)!FSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FSUB <Hd>, <Hn>, <Hm>FSUB <Sd>, <Sn>, <Sm>FSUB <Dd>, <Dn>, <Dm>*FSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>+FSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!FSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>>FSUB    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zm1>.<T>-<Zm2>.<T> }8FSUB    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zm1>.H-<Zm2>.H }>FSUB    ZA. <T>[<Wv>, <offs>{, VGx4}], { <Zm1>.<T>-<Zm4>.<T> }8FSUB    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zm1>.H-<Zm4>.H }bifBitwise insert if false BIF <Vd>.<T>, <Vn>.<T>, <Vm>.<T>f1cvt�Convert each 8-bit floating-point element of the source vector to half-precision while downscaling the value, and place the results in the corresponding 16-bit elements of the destination vectors. F1CVT scales the values by 2!F1CVT { <Zd1>.H-<Zd2>.H }, <Zn>.B!F2CVT { <Zd1>.H-<Zd2>.H }, <Zn>.BF1CVT <Zd>.H, <Zn>.BF2CVT <Zd>.H, <Zn>.BaddpAdd pair of elements (scalar)ADDP  D <d>, <Vn>.2D!ADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>+ADDP <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>ldsmaxab'Atomic signed maximum on byte in memoryLDSMAXAB <Ws>, <Wt>, [<Xn|SP>]LDSMAXALB <Ws>, <Wt>, [<Xn|SP>]LDSMAXB <Ws>, <Wt>, [<Xn|SP>]LDSMAXLB <Ws>, <Wt>, [<Xn|SP>]sabdlb�&Compute the absolute difference between even-numbered signed integer values in elements of the second source vector and corresponding elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%SABDLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>cpyptn;Memory copy, reads and writes unprivileged and non-temporal CPYPTN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYMTN  [ <Xd>]!, [<Xs>]!, <Xn>! CPYETN  [ <Xd>]!, [<Xs>]!, <Xn>!ld3d�7Contiguous load three-doubleword structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,GLD3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]CLD3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]	sm3partw1	SM3PARTW1#SM3PARTW1 <Vd>.4S, <Vn>.4S, <Vm>.4Slslr��Reversed shift left active elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size. 
Inactive elements in the destination vector register remain unmodified.+LSLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>st64bv07Single-copy atomic 64-byte EL0 store with status resultST64BV0 <Xs>, <Xt>, [<Xn|SP>]cmeqCompare bitwise equal (vector)CMEQ  D <d>, D<n>, D<m>!CMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>CMEQ  D <d>, D<n>, #0CMEQ <Vd>.<T>, <Vn>.<T>, #0frintp?Floating-point round to integral, toward plus infinity (vector)FRINTP <Vd>.<T>, <Vn>.<T>FRINTP <Vd>.<T>, <Vn>.<T>FRINTP <Hd>, <Hn>FRINTP <Sd>, <Sn>FRINTP <Dd>, <Dn>/FRINTP { <Zd1>.S-<Zd2>.S }, { <Zn1>.S-<Zn2>.S }/FRINTP { <Zd1>.S-<Zd4>.S }, { <Zn1>.S-<Zn4>.S }pacdb9Pointer Authentication Code for data address, using key BPACDB <Xd>, <Xn|SP>PACDZB <Xd>ext#Extract vector from pair of vectors*EXT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<index>(EXT <Zd>.B, { <Zn1>.B, <Zn2>.B }, #<imm>$EXT <Zdn>.B, <Zdn>.B, <Zm>.B, #<imm>udivr� Unsigned reversed divide active elements of the second source vector by corresponding elements of the first source vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.,UDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>bitBitwise insert if true BIT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>uabdlb�0Compute the absolute difference between the even-numbered unsigned integer values in elements of the second source vector and the corresponding elements of the first source vector, and place the results in the overlapping double-width elements of the destination vector. 
This instruction is unpredicated.%UABDLB <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>frsqrte.Floating-point reciprocal square root estimateFRSQRTE <Hd>, <Hn>FRSQRTE <V><d>, <V><n>FRSQRTE <Vd>.<T>, <Vn>.<T>FRSQRTE <Vd>.<T>, <Vn>.<T>FRSQRTE <Zd>.<T>, <Zn>.<T>sdivr�Signed reversed divide active elements of the second source vector by corresponding elements of the first source vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the destination vector register remain unmodified.,SDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>rcwssetp>Read check write software atomic bit set on quadword in memory RCWSSETP <Xt1>, <Xt2>, [<Xn|SP>]!RCWSSETPA <Xt1>, <Xt2>, [<Xn|SP>]"RCWSSETPAL <Xt1>, <Xt2>, [<Xn|SP>]!RCWSSETPL <Xt1>, <Xt2>, [<Xn|SP>]fmaxnmqv�9Floating-point maximum number of the same element numbers from each 128-bit source vector segment using a recursive pairwise reduction, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. Inactive elements in the source vector are treated as the default NaN."FMAXNMQV <Vd>.<T>, <Pg>, <Zn>.<Tb>shaddSigned halving add"SHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>,SHADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>cntb�Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then places the result in the scalar destination.$CNTB <Xd>{, <pattern>{, MUL #<imm>}}$CNTD <Xd>{, <pattern>{, MUL #<imm>}}$CNTH <Xd>{, <pattern>{, MUL #<imm>}}$CNTW <Xd>{, <pattern>{, MUL #<imm>}}addva��Add each element of the source vector to the corresponding active element of each vertical slice of a ZA tile. The tile elements are predicated by a pair of governing predicates. 
An element of a vertical slice is considered active if its corresponding element in the first governing predicate is TRUE and the element corresponding to its vertical slice number in the second governing predicate is TRUE. Inactive elements in the destination tile remain unmodified.&ADDVA <ZAda>.S, <Pn>/M, <Pm>/M, <Zn>.S&ADDVA <ZAda>.D, <Pn>/M, <Pm>/M, <Zn>.Dld1rob�Load thirty-two contiguous bytes to elements of a 256-bit (octaword) vector from the memory address generated by a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base address..LD1ROB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]*LD1ROB { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]ld4rLLoad single 4-element structure and replicate to all lanes of four registers>LD4R  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]ELD4R  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>DLD4R  { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>faddFloating-point add (vector)!FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>!FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>FADD <Hd>, <Hn>, <Hm>FADD <Sd>, <Sn>, <Sm>FADD <Dd>, <Dn>, <Dm>*FADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>+FADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>!FADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>>FADD    ZA. <T>[<Wv>, <offs>{, VGx2}], { <Zm1>.<T>-<Zm2>.<T> }8FADD    ZA.H[ <Wv>, <offs>{, VGx2}], { <Zm1>.H-<Zm2>.H }>FADD    ZA. 
<T>[<Wv>, <offs>{, VGx4}], { <Zm1>.<T>-<Zm4>.<T> }8FADD    ZA.H[ <Wv>, <offs>{, VGx4}], { <Zm1>.H-<Zm4>.H }fjcvtzsMFloating-point Javascript convert to signed fixed-point, rounding toward zeroFJCVTZS <Wd>, <Dn>st3b�2Contiguous store three-byte structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,EST3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]9ST3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>, <Xm>]uqshlr��Shift active unsigned elements of the second source vector by corresponding elements of the first source vector and destructively place the results in the corresponding elements of the first source vector. A positive shift amount performs a left shift, otherwise a right shift by the negated shift amount is performed. Each result element is saturated to the N-bit element's unsigned integer range 0 to (2-UQSHLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>urshl'Unsigned rounding shift left (register)URSHL  D <d>, D<n>, D<m>"URSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>DURSHL { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, <Zm>.<T>DURSHL { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, <Zm>.<T>SURSHL { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zdn1>.<T>-<Zdn2>.<T> }, { <Zm1>.<T>-<Zm2>.<T> }SURSHL { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zdn1>.<T>-<Zdn4>.<T> }, { <Zm1>.<T>-<Zm4>.<T> },URSHL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>usqadd.Unsigned saturating accumulate of signed valueUSQADD <V><d>, <V><n>USQADD <Vd>.<T>, <Vn>.<T>-USQADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>rsubhnt�9Subtract each vector element of the second source vector from the corresponding vector element in the first source vector, and place the most significant rounded half of the result in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. 
This instruction is unpredicated.&RSUBHNT <Zd>.<T>, <Zn>.<Tb>, <Zm>.<Tb>saddwSigned add wide*SADDW{ 2}  <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>ldr,Load SIMD&amp;FP register (immediate offset)&LDR <Bt>, [<Xn|SP>], #<simm>LDR <Ht>, [<Xn|SP>], #<simm>LDR <St>, [<Xn|SP>], #<simm>LDR <Dt>, [<Xn|SP>], #<simm>LDR <Qt>, [<Xn|SP>], #<simm>LDR <Bt>, [<Xn|SP>, #<simm>]!LDR <Ht>, [<Xn|SP>, #<simm>]!LDR <St>, [<Xn|SP>, #<simm>]!LDR <Dt>, [<Xn|SP>, #<simm>]!LDR <Qt>, [<Xn|SP>, #<simm>]!LDR <Bt>, [<Xn|SP>{, #<pimm>}]LDR <Ht>, [<Xn|SP>{, #<pimm>}]LDR <St>, [<Xn|SP>{, #<pimm>}]LDR <Dt>, [<Xn|SP>{, #<pimm>}]LDR <Qt>, [<Xn|SP>{, #<pimm>}]LDR <Wt>, [<Xn|SP>], #<simm>LDR <Xt>, [<Xn|SP>], #<simm>LDR <Wt>, [<Xn|SP>, #<simm>]!LDR <Xt>, [<Xn|SP>, #<simm>]!LDR <Wt>, [<Xn|SP>{, #<pimm>}]LDR <Xt>, [<Xn|SP>{, #<pimm>}]LDR <St>, <label>LDR <Dt>, <label>LDR <Qt>, <label>LDR <Wt>, <label>LDR <Xt>, <label>%LDR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]5LDR <Bt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}])LDR <Bt>, [<Xn|SP>, <Xm>{, LSL <amount>}]7LDR <Ht>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7LDR <St>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7LDR <Dt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7LDR <Qt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7LDR <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]7LDR <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]%LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]7LDR     ZA[ <Wv>, <offs>], [<Xn|SP>{, #<offs>, MUL VL}]LDR     ZT0, [ <Xn|SP>]sbcs"Subtract with carry, setting flagsSBCS <Wd>, <Wn>, <Wm>SBCS <Xd>, <Xn>, <Xm>sqxtunt��Saturate the signed integer value in each source element to an unsigned integer value that is half the original source element width, and place the results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged.SQXTUNT <Zd>.<T>, <Zn>.<Tb>sxthSXTH -- A64Sign extend halfwordSXTH <Wd>, <Wn>SBFM   <Wd>, <Wn>, #0, #15SXTH <Xd>, <Wn>SBFM   <Xd>, <Xn>, #0, #15bfmlalb��This BFloat16 floating-point 
multiply-add long instruction widens the even-numbered BFloat16 elements in the first source vector and the corresponding elements in the second source vector to single-precision format and then destructively multiplies and adds these values without intermediate rounding to the single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the source vectors. This instruction is unpredicated.BFMLALB <Zda>.S, <Zn>.H, <Zm>.H&BFMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]adclt�SAdd the odd-numbered elements of the first source vector and the 1-bit carry from the least-significant bit of the odd-numbered elements of the second source vector to the even-numbered elements of the destination and accumulator vector. The 1-bit carry output is placed in the corresponding odd-numbered element of the destination vector.#ADCLT <Zda>.<T>, <Zn>.<T>, <Zm>.<T>brkpas�aIf the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise sets destination predicate elements up to and including the first active and true source element to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the %BRKPAS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.Baddqv��Unsigned addition of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&amp;FP destination register. 
Inactive elements in the source vector are treated as zero.ADDQV <Vd>.<T>, <Pg>, <Zn>.<Tb>incb�Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination.%INCB <Xdn>{, <pattern>{, MUL #<imm>}}%INCD <Xdn>{, <pattern>{, MUL #<imm>}}%INCH <Xdn>{, <pattern>{, MUL #<imm>}}%INCW <Xdn>{, <pattern>{, MUL #<imm>}}ldxrbLoad exclusive register byteLDXRB <Wt>, [<Xn|SP>{, #0}]nmatch��This instruction compares each active 8-bit or 16-bit character in the first source vector with all of the characters in the corresponding 128-bit segment of the second source vector. Where the first source element detects no matching characters in the second segment it places true in the corresponding element of the destination predicate, otherwise false. Inactive elements in the destination predicate register are set to zero. Sets the +NMATCH <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>	cpyfpwtwn>Memory copy forward-only, writes unprivileged and non-temporal#CPYFPWTWN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFMWTWN  [ <Xd>]!, [<Xs>]!, <Xn>!#CPYFEWTWN  [ <Xd>]!, [<Xs>]!, <Xn>!rcwscas?Read check write software compare and swap doubleword in memoryRCWSCAS <Xs>, <Xt>, [<Xn|SP>]RCWSCASA <Xs>, <Xt>, [<Xn|SP>]RCWSCASAL <Xs>, <Xt>, [<Xn|SP>]RCWSCASL <Xs>, <Xt>, [<Xn|SP>]sclamp�dClamp each signed element in the two or four destination vectors to between the signed minimum value in the corresponding element of the first source vector and the signed maximum value in the corresponding element of the second source vector and destructively place the clamped results in the corresponding elements of the two or four destination vectors.2SCLAMP { <Zd1>.<T>-<Zd2>.<T> }, <Zn>.<T>, <Zm>.<T>2SCLAMP { <Zd1>.<T>-<Zd4>.<T> }, <Zn>.<T>, <Zm>.<T>#SCLAMP <Zd>.<T>, <Zn>.<T>, <Zm>.<T>ldsmaxh;Atomic signed maximum on halfword in memory, without return4STSMAXH <Ws>, [<Xn|SP>]LDSMAXH  <Ws>, WZR, 
[<Xn|SP>]6STSMAXLH <Ws>, [<Xn|SP>]LDSMAXLH  <Ws>, WZR, [<Xn|SP>]ubfmUnsigned bitfield move!UBFM <Wd>, <Wn>, #<immr>, #<imms>!UBFM <Xd>, <Xn>, #<immr>, #<imms>uqcvtn�Saturate the unsigned integer value in each element of the group of two source vectors to half the original source element width, and place the two-way interleaved results in the half-width destination elements."UQCVTN <Zd>.H, { <Zn1>.S-<Zn2>.S }*UQCVTN <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }incd�Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements.'INCD <Zdn>.D{, <pattern>{, MUL #<imm>}}'INCH <Zdn>.H{, <pattern>{, MUL #<imm>}}'INCW <Zdn>.S{, <pattern>{, MUL #<imm>}}fcvt)Floating-point convert precision (scalar)FCVT <Sd>, <Hn>FCVT <Dd>, <Hn>FCVT <Hd>, <Sn>FCVT <Dd>, <Sn>FCVT <Hd>, <Dn>FCVT <Sd>, <Dn> FCVT { <Zd1>.S-<Zd2>.S }, <Zn>.H FCVT <Zd>.B, { <Zn1>.H-<Zn2>.H } FCVT <Zd>.B, { <Zn1>.S-<Zn4>.S } FCVT <Zd>.H, { <Zn1>.S-<Zn2>.S }FCVT <Zd>.S, <Pg>/M, <Zn>.HFCVT <Zd>.D, <Pg>/M, <Zn>.HFCVT <Zd>.H, <Pg>/M, <Zn>.SFCVT <Zd>.D, <Pg>/M, <Zn>.SFCVT <Zd>.H, <Pg>/M, <Zn>.DFCVT <Zd>.S, <Pg>/M, <Zn>.Dld1w�Contiguous load of unsigned words to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and immediate index which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.>LD1W { <Zt1>.S-<Zt2>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]>LD1W { <Zt1>.S-<Zt4>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]:LD1W { <Zt1>.S-<Zt2>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]:LD1W { <Zt1>.S-<Zt4>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]?LD1W { <Zt1>.S, <Zt2>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]QLD1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <PNg>/Z, [<Xn|SP>{, #<imm>, MUL VL}];LD1W { <Zt1>.S, <Zt2>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]MLD1W { <Zt1>.S, <Zt2>.S, <Zt3>.S, 
<Zt4>.S }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #2]
LD1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]
LD1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]
LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1W { <Zt>.Q }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]
LD1W { <Zt>.Q }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]
LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #2]
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]
LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]
LD1W { <ZAt><HV>.S[<Ws>, <offs>] }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]

rcwswpp
Read check write swap quadword in memory
RCWSWPP <Xt1>, <Xt2>, [<Xn|SP>]
RCWSWPPA <Xt1>, <Xt2>, [<Xn|SP>]
RCWSWPPAL <Xt1>, <Xt2>, [<Xn|SP>]
RCWSWPPL <Xt1>, <Xt2>, [<Xn|SP>]

bfdot
BFloat16 floating-point dot product (vector, by element)
BFDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.2H[<index>]
BFDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>
BFDOT <Zda>.S, <Zn>.H, <Zm>.H
BFDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]
BFDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]
BFDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H[<index>]
BFDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H
BFDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, <Zm>.H
BFDOT   ZA.S[ <Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, { <Zm1>.H-<Zm2>.H }
BFDOT   ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.H-<Zn4>.H }, { <Zm1>.H-<Zm4>.H }

eorqv
Bitwise exclusive OR of the same element numbers from each 128-bit source vector segment, placing each result into the corresponding element number of the 128-bit SIMD&FP destination register. Inactive elements in the source vector are treated as all zeros.
EORQV <Vd>.<T>, <Pg>, <Zn>.<Tb>

ldurb
Load register byte (unscaled)
LDURB <Wt>, [<Xn|SP>{, #<simm>}]

sqdmlalb
Multiply then double the corresponding even-numbered signed elements of the first and second source vectors. Each intermediate value is saturated to the double-width N-bit value's signed integer range -2
SQDMLALB <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>
SQDMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]
SQDMLALB <Zda>.D, <Zn>.S, <Zm>.S[<imm>]

faddv
Floating-point add horizontally over all lanes of a vector using a recursive pairwise reduction, and place the result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as +0.0.
FADDV <V><d>, <Pg>, <Zn>.<T>

sqshrn
Signed saturating shift right narrow (immediate)
SQSHRN <Vb><d>, <Va><n>, #<shift>
SQSHRN{ 2}  <Vd>.<Tb>, <Vn>.<Ta>, #<shift>

ld1sw
Gather load of signed words to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124.
Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
LD1SW { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

st3h
Contiguous store three-halfword structures, each from the same element number in three vector registers to the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,
ST3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
ST3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

ld1rsb
Load a single signed byte from a memory address generated by a 64-bit scalar base address plus an immediate offset which is in the range 0 to 63.
LD1RSB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]
LD1RSB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]
LD1RSB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

sqcvt
Saturate the signed integer value in each element of the two source vectors to half the original source element width, and place the results in the half-width destination elements.
SQCVT <Zd>.H, { <Zn1>.S-<Zn2>.S }
SQCVT <Zd>.<T>, { <Zn1>.<Tb>-<Zn4>.<Tb> }

ldtrsw
Load register signed word (unprivileged)
LDTRSW <Xt>, [<Xn|SP>{, #<simm>}]

fmsb
Multiply the corresponding active floating-point elements of the first and second source vectors and subtract from elements of the third (addend) vector without intermediate rounding. Destructively place the results in the destination and first source (multiplicand) vector.
Inactive elements in the destination vector register remain unmodified.
FMSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

orrs
Bitwise inclusive OR active elements of the second source predicate with corresponding elements of the first source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the
ORRS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

sqshrnt
Shift each signed integer value in the source vector elements right by an immediate value, and place the truncated results in the odd-numbered half-width destination elements, leaving the even-numbered elements unchanged. Each result element is saturated to the half-width N-bit element's signed integer range -2
SQSHRNT <Zd>.<T>, <Zn>.<Tb>, #<const>

usvdot
The unsigned by signed integer vertical dot product instruction computes the vertical dot product of corresponding unsigned 8-bit elements from the four first source vectors and four signed 8-bit integer values in the corresponding indexed 32-bit element of the second source vector.
The widened dot product result is destructively added to the corresponding 32-bit element of the ZA single-vector groups.
USVDOT  ZA.S[ <Wv>, <offs>{, VGx4}], { <Zn1>.B-<Zn4>.B }, <Zm>.B[<index>]

yield
Yield
YIELD

ummla
Unsigned 8-bit integer matrix multiply-accumulate (vector)
UMMLA <Vd>.4S, <Vn>.16B, <Vm>.16B
UMMLA <Zda>.S, <Zn>.B, <Zm>.B

ursra
Unsigned rounding shift right and accumulate (immediate)
URSRA  D <d>, D<n>, #<shift>
URSRA <Vd>.<T>, <Vn>.<T>, #<shift>
URSRA <Zda>.<T>, <Zn>.<T>, #<const>

frecpe
Floating-point reciprocal estimate
FRECPE <Hd>, <Hn>
FRECPE <V><d>, <V><n>
FRECPE <Vd>.<T>, <Vn>.<T>
FRECPE <Vd>.<T>, <Vn>.<T>
FRECPE <Zd>.<T>, <Zn>.<T>

gcsstr
Guarded Control Stack store
GCSSTR <Xt>, [<Xn|SP>]

ld3q
Contiguous load three-quadword structures, each to the same element number in three vector registers from the memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that is multiplied by the vector's in-memory size, irrespective of predication,
LD3Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
LD3Q { <Zt1>.Q, <Zt2>.Q, <Zt3>.Q }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #4]

addv
Add across vector
ADDV <V><d>, <Vn>.<T>

hlt
Halt instruction
HLT  # <imm>

smaxv
Signed maximum across vector
SMAXV <V><d>, <Vn>.<T>
SMAXV <V><d>, <Pg>, <Zn>.<T>

tbxq
For each 128-bit destination vector segment, reads each element of the corresponding second source (index) vector segment and uses its value to select an indexed element from the corresponding first source (table) vector segment. The indexed table element is placed in the element of the destination vector that corresponds to the index vector element. If an index value is greater than or equal to the number of elements in a 128-bit vector segment then the corresponding destination vector element is left unchanged.
This instruction is unpredicated.
TBXQ <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

cpyfprt
Memory copy forward-only, reads unprivileged
CPYFPRT  [ <Xd>]!, [<Xs>]!, <Xn>!
CPYFMRT  [ <Xd>]!, [<Xs>]!, <Xn>!
CPYFERT  [ <Xd>]!, [<Xs>]!, <Xn>!

udiv
Unsigned divide
UDIV <Wd>, <Wn>, <Wm>
UDIV <Xd>, <Xn>, <Xm>
UDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>