Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
EVEX.128.NP.0F3A.W0 56 /r /ib VREDUCEPH xmm1{k1}{z}, xmm2/ imm8 m128/m16bcst, |
A | V/V | AVX512-FP16 AVX512VL | Perform reduction transformation on packed FP16 values in xmm2/m128/m16bcst by subtracting a number of fraction bits specified by the imm8 field. Store the result in xmm1 subject to writemask k1. |
EVEX.256.NP.0F3A.W0 56 /r /ib VREDUCEPH ymm1{k1}{z}, ymm2/m256/m16bcst, imm8 |
A | V/V | AVX512-FP16 AVX512VL | Perform reduction transformation on packed FP16 values in ymm2/m256/m16bcst by subtracting a number of fraction bits specified by the imm8 field. Store the result in ymm1 subject to writemask k1. |
EVEX.512.NP.0F3A.W0 56 /r /ib VREDUCEPH zmm1{k1}{z}, zmm2/m512/m16bcst {sae}, imm8 |
A | V/V | AVX512-FP16 | Perform reduction transformation on packed FP16 values in zmm2/m512/m16bcst by subtracting a number of fraction bits specified by the imm8 field. Store the result in zmm1 subject to writemask k1. |
Op/En | Tuple | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|---|
A | Full | ModRM:reg (w) | ModRM:r/m (r) | imm8 (r) | N/A |
Description
This instruction performs a reduction transformation of the packed binary encoded FP16 values in the source operand (the second operand) and store the reduced results in binary FP format to the destination operand (the first operand) under the writemask k1.
The reduction transformation subtracts the integer part and the leading M fractional bits from the binary FP source value, where M is a unsigned integer specified by imm8[7:4]. Specifically, the reduction transformation can be expressed as:
dest = src − (ROUND(2M * src)) * 2−M
where ROUND() treats src, 2M, and their product as binary FP numbers with normalized significand and biased exponents.
The magnitude of the reduced result can be expressed by considering src = 2p * man2, where ‘man2’ is the normal-ized significand and ‘p’ is the unbiased exponent.
Then if RC=RNE: 0 ≤ |ReducedResult| ≤ 2−M−1.
Then if RC ≠ RNE: 0 ≤ |ReducedResult| < 2−M.
This instruction might end up with a precision exception set. However, in case of SPE set (i.e., Suppress Precision Exception, which is imm8[3]=1), no precision exception is reported.
This instruction may generate tiny non-zero result. If it does so, it does not report underflow exception, even if underflow exceptions are unmasked (UM flag in MXCSR register is 0).
For special cases, see Table 5-20.
Input value | Round Mode | Returned Value |
---|---|---|
|Src1| < 2−M−1 |