Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
EVEX.128.66.0F38.W0 8D /r VPERMB xmm1 {k1}{z}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512_VBMI | Permute bytes in xmm3/m128 using byte indexes in xmm2 and store the result in xmm1 using writemask k1. |
EVEX.256.66.0F38.W0 8D /r VPERMB ymm1 {k1}{z}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512_VBMI | Permute bytes in ymm3/m256 using byte indexes in ymm2 and store the result in ymm1 using writemask k1. |
EVEX.512.66.0F38.W0 8D /r VPERMB zmm1 {k1}{z}, zmm2, zmm3/m512 | A | V/V | AVX512_VBMI | Permute bytes in zmm3/m512 using byte indexes in zmm2 and store the result in zmm1 using writemask k1. |
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|---|
A | Full Mem | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | N/A |
Description
Copies bytes from the second source operand (the third operand) to the destination operand (the first operand) according to the byte indices in the first source operand (the second operand). Note that this instruction permits a byte in the source operand to be copied to more than one location in the destination operand.
Only the low 6(EVEX.512)/5(EVEX.256)/4(EVEX.128) bits of each byte index is used to select the location of the source byte from the second source operand.
The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM reg-ister, a 512/256/128-bit memory location. The destination operand is a ZMM/YMM/XMM register updated at byte granularity by the writemask k1.
Operation
VPERMB (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
IF VL = 128:
n := 3;
ELSE IF VL = 256:
n := 4;
ELSE IF VL = 512:
n := 5;
FI;
FOR j := 0 TO KL-1:
id := SRC1[j*8 + n : j*8] ; // location of the source byte
IF k1[j] OR *no writemask* THEN
DEST[j*8 + 7: j*8] := SRC2[id*8 +7: id*8];
ELSE IF zeroing-masking THEN
DEST[j*8 + 7: j*8] := 0;
*ELSE
DEST[j*8 + 7: j*8] remains unchanged*
FI
ENDFOR
DEST[MAX_VL-1:VL] := 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPERMB __m512i _mm512_permutexvar_epi8( __m512i idx, __m512i a);
VPERMB __m512i _mm512_mask_permutexvar_epi8(__m512i s, __mmask64 k, __m512i idx, __m512i a);
VPERMB __m512i _mm512_maskz_permutexvar_epi8( __mmask64 k, __m512i idx, __m512i a);
VPERMB __m256i _mm256_permutexvar_epi8( __m256i idx, __m256i a);
VPERMB __m256i _mm256_mask_permutexvar_epi8(__m256i s, __mmask32 k, __m256i idx, __m256i a);
VPERMB __m256i _mm256_maskz_permutexvar_epi8( __mmask32 k, __m256i idx, __m256i a);
VPERMB __m128i _mm_permutexvar_epi8( __m128i idx, __m128i a);
VPERMB __m128i _mm_mask_permutexvar_epi8(__m128i s, __mmask16 k, __m128i idx, __m128i a);
VPERMB __m128i _mm_maskz_permutexvar_epi8( __mmask16 k, __m128i idx, __m128i a);
SIMD Floating-Point Exceptions
None.
Other Exceptions
See Exceptions Type E4NF.nb in Table 2-50, “Type E4NF Class Exception Conditions.”