Reworked instruction set documentation

This commit is contained in:
tevador 2019-01-10 23:36:53 +01:00
parent d1a808643d
commit 6941b2cb69
2 changed files with 294 additions and 197 deletions

129
doc/isa-ops.md Normal file
View File

@ -0,0 +1,129 @@
# RandomX instruction listing
There are 31 unique instructions divided into 3 groups:
|group|# operations|# opcodes||
|---------|-----------------|----|-|
|integer (IA)|22|144|56.3%|
|floating point (FP)|5|76|29.7%|
|control (CL)|4|36|14.0%
||**31**|**256**|**100%**
## Integer instructions
There are 22 integer instructions. They are divided into 3 classes (MATH, DIV, SHIFT) with different B operand selection rules.
|# opcodes|instruction|class|signed|A width|B width|C|C width|
|-|-|-|-|-|-|-|-|
|12|ADD_64|MATH|no|64|64|`A + B`|64|
|2|ADD_32|MATH|no|32|32|`A + B`|32|
|12|SUB_64|MATH|no|64|64|`A - B`|64|
|2|SUB_32|MATH|no|32|32|`A - B`|32|
|21|MUL_64|MATH|no|64|64|`A * B`|64|
|10|MULH_64|MATH|no|64|64|`A * B`|64|
|15|MUL_32|MATH|no|32|32|`A * B`|64|
|15|IMUL_32|MATH|yes|32|32|`A * B`|64|
|10|IMULH_64|MATH|yes|64|64|`A * B`|64|
|4|DIV_64|DIV|no|64|32|`A / B`|64|
|4|IDIV_64|DIV|yes|64|32|`A / B`|64|
|4|AND_64|MATH|no|64|64|`A & B`|64|
|2|AND_32|MATH|no|32|32|`A & B`|32|
|4|OR_64|MATH|no|64|64|`A | B`|64|
|2|OR_32|MATH|no|32|32|`A | B`|32|
|4|XOR_64|MATH|no|64|64|`A ^ B`|64|
|2|XOR_32|MATH|no|32|32|`A ^ B`|32|
|3|SHL_64|SHIFT|no|64|6|`A << B`|64|
|3|SHR_64|SHIFT|no|64|6|`A >> B`|64|
|3|SAR_64|SHIFT|yes|64|6|`A >> B`|64|
|6|ROL_64|SHIFT|no|64|6|`A <<< B`|64|
|6|ROR_64|SHIFT|no|64|6|`A >>> B`|64|
#### 32-bit operations
Instructions ADD_32, SUB_32, AND_32, OR_32, XOR_32 only use the low-order 32 bits of the input operands. The result of these operations is 32 bits long and bits 32-63 of C are set to zero.
#### Multiplication
There are 5 different multiplication operations. MUL_64 and MULH_64 both take 64-bit unsigned operands, but MUL_64 produces the low 64 bits of the result and MULH_64 produces the high 64 bits. MUL_32 and IMUL_32 use only the low-order 32 bits of the operands and produce a 64-bit result. The signed variant interprets the arguments as signed integers. IMULH_64 takes two 64-bit signed operands and produces the high-order 64 bits of the result.
#### Division
For the division instructions, the dividend is 64 bits long and the divisor 32 bits long. The IDIV_64 instruction interprets both operands as signed integers. In case of division by zero or signed overflow, the result is equal to the dividend `A`.
75% of division instructions use a runtime-constant divisor and can be optimized using a multiplication and shifts.
#### Shift and rotate
The shift/rotate instructions use just the bottom 6 bits of the `B` operand (`imm8` is used as the immediate value). All treat `A` as unsigned except SAR_64, which performs an arithmetic right shift by copying the sign bit.
## Floating point instructions
There are 5 floating point instructions. All floating point instructions are vector instructions that operate on two packed double precision floating point values.
|# opcodes|instruction|C|
|-|-|-|-|
|20|FPADD|`A + B`|
|20|FPSUB|`A - B`|
|22|FPMUL|`A * B`|
|8|FPDIV|`A / B`|
|6|FPSQRT|`sqrt(abs(A))`|
#### Conversion of operand A
Operand A is loaded from memory as a 64-bit value. All floating point instructions interpret A as two packed 32-bit signed integers and convert them into two packed double precision floating point values.
#### Rounding
FPU instructions conform to the IEEE-754 specification, so they must give correctly rounded results. Initial rounding mode is *roundTiesToEven*. Rounding mode can be changed by the `FPROUND` control instruction. Denormal values must be always flushed to zero.
#### NaN
If an operation produces NaN, the result is converted into positive zero. NaN results may never be written into registers or memory. Only division and multiplication must be checked for NaN results (`0.0 / 0.0` and `0.0 * Infinity` result in NaN).
## Control instructions
There are 4 control instructions.
|# opcodes|instruction|description|condition|
|-|-|-|-|
|2|FPROUND|change floating point rounding mode|-
|11|JUMP|conditional jump|(see condition table below)
|11|CALL|conditional procedure call|(see condition table below)
|12|RET|return from procedure|stack is not empty
All control instructions behave as 'arithmetic no-op' and simply copy the input operand A into the destination C.
The JUMP and CALL instructions use a condition function, which takes the lower 32 bits of operand B (register) and the value `imm32` and evaluates a condition based on the `B.LOC.C` flag:
|`B.LOC.C`|signed|jump condition|probability|*x86*|*ARM*
|---|---|----------|-----|--|----|
|0|no|`B <= imm32`|0% - 100%|`JBE`|`BLS`
|1|no|`B > imm32`|0% - 100%|`JA`|`BHI`
|2|yes|`B - imm32 < 0`|50%|`JS`|`BMI`
|3|yes|`B - imm32 >= 0`|50%|`JNS`|`BPL`
|4|yes|`B - imm32` overflows|0% - 50%|`JO`|`BVS`
|5|yes|`B - imm32` doesn't overflow|50% - 100%|`JNO`|`BVC`
|6|yes|`B < imm32`|0% - 100%|`JL`|`BLT`
|7|yes|`B >= imm32`|0% - 100%|`JGE`|`BGE`
The 'signed' column specifies if the operands are interpreted as signed or unsigned 32-bit numbers. Column 'probability' lists the expected jump probability (range means that the actual value for a specific instruction depends on `imm32`). *Columns 'x86' and 'ARM' list the corresponding hardware instructions (following a `CMP` instruction).*
### FPROUND
The FPROUND instruction changes the rounding mode for all subsequent FPU operations depending on a two-bit flag. The flag is calculated by rotating A `imm8` bits to the right and taking the two least-significant bits:
```
rounding flag = (A >>> imm8)[1:0]
```
|rounding flag|rounding mode|
|-------|------------|
|00|roundTiesToEven|
|01|roundTowardNegative|
|10|roundTowardPositive|
|11|roundTowardZero|
The rounding modes are defined by the IEEE-754 standard.
*The two-bit flag value exactly corresponds to bits 13-14 of the x86 `MXCSR` register and bits 23 and 22 (reversed) of the ARM `FPSCR` register.*
### JUMP
If the jump condition is `true`, the JUMP instruction performs a forward jump relative to the value of `pc`. The forward offset is equal to `16 * (imm8[6:0] + 1)` bytes (1-128 instructions forward).
### CALL
If the jump condition is `true`, the CALL instruction pushes the value of `pc` (program counter) onto the stack and then performs a forward jump relative to the value of `pc`. The forward offset is equal to `16 * (imm8[6:0] + 1)` bytes (1-128 instructions forward).
### RET
If the stack is not empty, the RET instruction pops the return address from the stack (it's the instruction following the previous CALL) and jumps to it.
## Reference implementation
A portable C++ implementation of all integer and floating point instructions is available in [instructionsPortable.cpp](../src/instructionsPortable.cpp).

View File

@ -1,213 +1,181 @@
# RandomX instruction encoding
The instruction set was designed in such way that any random 16-byte word is a valid instruction and any sequence of valid instructions is a valid program. There are no syntax rules.
## RandomX instruction set The encoding of each 128-bit instruction word is following:
RandomX uses a simple low-level language (instruction set), which was designed so that any random bitstring forms a valid program.
Each RandomX instruction has a length of 128 bits. The encoding is following: ![Imgur](https://i.imgur.com/xi8zuAZ.png)
![Imgur](https://i.imgur.com/mbndESz.png) ## opcode
There are 256 opcodes, which are distributed between 3 groups of instructions. There are 31 distinct operations (each operation can be encoded using multiple opcodes - for example opcodes `0x00` to `0x0d` correspond to integer addition).
*All flags are aligned to an 8-bit boundary for easier decoding.* **Table 1: Instruction groups**
|group|# operations|# opcodes||
|---------|-----------------|----|-|
|integer (IA)|22|144|56.3%|
|floating point (FP)|5|76|29.7%|
|control (CL)|4|36|14.0%
||**31**|**256**|**100%**
#### Opcode Full description of all instructions: [isa-ops.md](isa-ops.md).
There are 256 opcodes, which are distributed between 30 instructions based on their weight (how often they will occur in the program on average). Instructions are divided into 5 groups:
|group|number of opcodes||comment| ## A.LOC
|---------|-----------------|----|------| **Table 2: `A.LOC` encoding**
|IA|115|44.9%|integer arithmetic operations
|IS|21|8.2%|bitwise shift and rotate
|FA|70|27.4%|floating point arithmetic operations
|FS|8|3.1%|floating point single-input operations
|CF|42|16.4%|control flow instructions (branches)
||**256**|**100%**
#### Operand A |bits|description|
The first 64-bit operand is read from memory. The location is determined by the `loc(a)` flag: |----|--------|
|0-1|`A.LOC.W` flag|
|2-5|Reserved|
|6-7|`A.LOC.X` flag|
|loc(a)[2:0]|read A from|address size (W) The `A.LOC.W` flag determines the address width when reading operand A from the scratchpad:
**Table 3: Operand A read address width**
|`A.LOC.W`|address width (W)
|---------|-|-| |---------|-|-|
|000|dataset|32 bits| |0|15 bits (256 KiB)|
|001|dataset|32 bits| |1-3|11 bits (16 KiB)|
|010|dataset|32 bits|
|011|dataset|32 bits|
|100|scratchpad|15 bits|
|101|scratchpad|11 bits|
|110|scratchpad|11 bits|
|111|scratchpad|11 bits|
Flag `reg(a)` encodes an integer register `r0`-`r7`. The read address is calculated as: If the `A.LOC.W` flag is zero, the address space covers the whole 256 KiB scratchpad. Otherwise, just the first 16 KiB of the scratchpad are addressed.
If the `A.LOC.X` flag is zero, the instruction mixes the scratchpad read address into the `mx` register using XOR. This mixing happens before the address is truncated to W bits (see pseudocode below).
## A.REG
**Table 4: `A.REG` encoding**
|bits|description|
|----|--------|
|0-2|`A.REG.R` flag|
|3-7|Reserved|
The `A.REG.R` flag encodes "readAddressRegister", which is an integer register `r0`-`r7` to be used for scratchpad read address generation. Read address is generated as follows (pseudocode):
```python
readAddressRegister = IntegerRegister(A.REG.R)
readAddressRegister = readAddressRegister XOR SignExtend(A.mask32)
readAddress = readAddressRegister[31:0]
# dataset is read if the ic register is divisible by 64
IF ic mod 64 == 0:
DatasetRead(readAddress)
# optional mixing into the mx register
IF A.LOC.X == 0:
mx = mx XOR readAddress
# truncate to W bits
W = GetAddressWidth(A.LOC.W)
readAddress = readAddress[W-1:0]
``` ```
reg(a) = reg(a) XOR signExtend(addr(a))
read_addr = reg(a)[W-1:0] Note that the value of the read address register is modified during address generation.
## B.LOC
**Table 5: `B.LOC` encoding**
|bits|description|
|----|--------|
|0-1|`B.LOC.L` flag|
|0-2|`B.LOC.C` flag|
|3-7|Reserved|
The `B.LOC.L` flag determines the B operand. It can be either a register or immediate value.
**Table 6: Operand B**
|`B.LOC.L`|IA/DIV|IA/SHIFT|IA/MATH|FP|CL|
|----|--------|----|------|----|---|
|0|register|register|register|register|register|
|1|`imm32`|register|register|register|register|
|2|`imm32`|`imm8`|register|register|register|
|3|`imm32`|`imm8`|`imm32`|register|register|
Integer instructions are split into 3 classes: integer division (IA/DIV), shift and rotate (IA/SHIFT) and other (IA/MATH). Floating point (FP) and control (CL) instructions always use a register operand.
Register to be used as operand B is encoded in the `B.REG.R` flag (see below).
The `B.LOC.C` flag determines the condition for the JUMP and CALL instructions. The flag partially overlaps with the `B.LOC.L` flag.
## B.REG
**Table 7: `B.REG` encoding**
|bits|description|
|----|--------|
|0-2|`B.REG.R` flag|
|3-7|Reserved|
Register encoded by the `B.REG.R` depends on the instruction group:
**Table 8: Register operands by group**
|group|registers|
|----|--------|
|IA|`r0`-`r7`|
|FP|`f0`-`f7`|
|CL|`r0`-`r7`|
## C.LOC
**Table 9: `C.LOC` encoding**
|bits|description|
|----|--------|
|0-1|`C.LOC.W` flag|
|2|`C.LOC.R` flag|
|3-6|Reserved|
|7|`C.LOC.H` flag|
The `C.LOC.W` flag determines the address width when writing operand C to the scratchpad:
**Table 10: Operand C write address width**
|`C.LOC.W`|address width (W)
|---------|-|-|
|0|15 bits (256 KiB)|
|1-3|11 bits (16 KiB)|
If the `C.LOC.W` flag is zero, the address space covers the whole 256 KiB scratchpad. Otherwise, just the first 16 KiB of the scratchpad are addressed.
The `C.LOC.R` determines the destination where operand C is written:
**Table 11: Operand C destination**
|`C.LOC.R`|groups IA, CL|group FP
|---------|-|-|
|0|scratchpad|register
|1|register|register + scratchpad
Integer and control instructions (groups IA and CL) write either to the scratchpad or to a register. Floating point instructions always write to a register and can also write to the scratchpad. In that case, flag `C.LOC.H` determines if the low or high half of the register is written:
**Table 12: Floating point register write**
|`C.LOC.H`|write bits|
|---------|----------|
|0|0-63|
|1|64-127|
## C.REG
**Table 13: `C.REG` encoding**
|bits|description|
|----|--------|
|0-2|`C.REG.R` flag|
|3-7|Reserved|
The destination register encoded in the `C.REG.R` flag encodes both the write address register (if writing to the scratchpad) and the destination register (if writing to a register). The destination register depends on the instruction group (see Table 8). Write address is always generated from an integer register:
```python
writeAddressRegister = IntegerRegister(C.REG.R)
writeAddress = writeAddressRegister[31:0] XOR C.mask32
# truncate to W bits
W = GetAddressWidth(C.LOC.W)
writeAddress = writeAddress [W-1:0]
``` ```
`W` is the address width from the above table. For reading from the scratchpad, `read_addr` is multiplied by 8 for 8-byte aligned access.
#### Operand B ## imm8
The second operand is loaded either from a register or from an immediate value encoded within the instruction. The `reg(b)` flag encodes an integer register (instruction groups IA and IS) or a floating point register (instruction group FA). Instruction group FS doesn't use operand B. `imm8` is an 8-bit immediate value that is used as the B operand by IA/SHIFT instructions (see Table 6). Additionally, it's used by some control instructions.
|loc(b)[2:0]|B (IA)|B (IS)|B (FA)|B (FS) ## A.mask32
|---------|-|-|-|-| `A.mask32` is a 32-bit address mask that is used to calculate the read address for the A operand. It's sign-extended to 64 bits before use.
|000|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|001|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|010|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|011|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|100|integer `reg(b)`|`imm8`|floating point `reg(b)`|-
|101|integer `reg(b)`|`imm8`|floating point `reg(b)`|-
|110|`imm32`|`imm8`|floating point `reg(b)`|-
|111|`imm32`|`imm8`|floating point `reg(b)`|-
`imm8` is an 8-bit immediate value, which is used for shift and rotate integer instructions (group IS). Only bits 0-5 are used. ## imm32
`imm32` is a 32-bit immediate value which is used for integer instructions from groups IA/DIV and IA/OTHER (see Table 6). The immediate value is sign-extended for instructions that expect 64-bit operands.
`imm32` is a 32-bit immediate value which is used for integer instructions from group IA. ## C.mask32
`C.mask32` is a 32-bit address mask that is used to calculate the write address for the C operand. `C.mask32` is equal to `imm32`.
Floating point instructions don't use immediate values.
#### Operand C
The third operand is the location where the result is stored. It can be a register or a 64-bit scratchpad location, depending on the value of flag `loc(c)`.
|loc\(c\)[2:0]|address size (W)| C (IA, IS)|C (FA, FS)
|---------|-|-|-|-|-|
|000|15 bits|scratchpad|floating point `reg(c)`
|001|11 bits|scratchpad|floating point `reg(c)`
|010|11 bits|scratchpad|floating point `reg(c)`
|011|11 bits|scratchpad|floating point `reg(c)`
|100|15 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
|101|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
|110|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
|111|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
Integer operations write either to the scratchpad or to a register. Floating point operations always write to a register and can also write to the scratchpad. In that case, bit 3 of the `loc(c)` flag determines if the low or high half of the register is written:
|loc\(c\)[3]|write to scratchpad|
|------------|-----------------------|
|0|floating point `reg(c)[63:0]`
|1|floating point `reg(c)[127:64]`
The FPROUND instruction is an exception and always writes the low half of the register.
For writing to the scratchpad, an integer register is always used to calculate the address:
```
write_addr = 8 * (addr(c) XOR reg(c)[31:0])[W-1:0]
```
*CPUs are typically designed for a 2:1 load:store ratio, so each VM instruction performs on average 1 memory read and 0.5 writes to memory.*
#### imm8
An 8-bit immediate value that is used as the shift/rotate count by group IS instructions and as the jump offset of the CALL instruction.
#### addr(a)
A 32-bit address mask that is used to calculate the read address for the A operand. It's sign-extended to 64 bits.
#### addr\(c\)
A 32-bit address mask that is used to calculate the write address for the C operand. `addr(c)` is equal to `imm32`.
### ALU instructions
|weight|instruction|group|signed|A width|B width|C|C width|
|-|-|-|-|-|-|-|-|
|10|ADD_64|IA|no|64|64|`A + B`|64|
|2|ADD_32|IA|no|32|32|`A + B`|32|
|10|SUB_64|IA|no|64|64|`A - B`|64|
|2|SUB_32|IA|no|32|32|`A - B`|32|
|21|MUL_64|IA|no|64|64|`A * B`|64|
|10|MULH_64|IA|no|64|64|`A * B`|64|
|15|MUL_32|IA|no|32|32|`A * B`|64|
|15|IMUL_32|IA|yes|32|32|`A * B`|64|
|10|IMULH_64|IA|yes|64|64|`A * B`|64|
|1|DIV_64|IA|no|64|32|`A / B`|32|
|1|IDIV_64|IA|yes|64|32|`A / B`|32|
|4|AND_64|IA|no|64|64|`A & B`|64|
|2|AND_32|IA|no|32|32|`A & B`|32|
|4|OR_64|IA|no|64|64|`A | B`|64|
|2|OR_32|IA|no|32|32|`A | B`|32|
|4|XOR_64|IA|no|64|64|`A ^ B`|64|
|2|XOR_32|IA|no|32|32|`A ^ B`|32|
|3|SHL_64|IS|no|64|6|`A << B`|64|
|3|SHR_64|IS|no|64|6|`A >> B`|64|
|3|SAR_64|IS|yes|64|6|`A >> B`|64|
|6|ROL_64|IS|no|64|6|`A <<< B`|64|
|6|ROR_64|IS|no|64|6|`A >>> B`|64|
##### 32-bit operations
Instructions ADD_32, SUB_32, AND_32, OR_32, XOR_32 only use the low-order 32 bits of the input operands. The result of these operations is 32 bits long and bits 32-63 of C are set to zero.
##### Multiplication
There are 5 different multiplication operations. MUL_64 and MULH_64 both take 64-bit unsigned operands, but MUL_64 produces the low 64 bits of the result and MULH_64 produces the high 64 bits. MUL_32 and IMUL_32 use only the low-order 32 bits of the operands and produce a 64-bit result. The signed variant interprets the arguments as signed integers. IMULH_64 takes two 64-bit signed operands and produces the high-order 64 bits of the result.
##### Division
For the division instructions, the dividend is 64 bits long and the divisor 32 bits long. The IDIV_64 instruction interprets both operands as signed integers. In case of division by zero or signed overflow, the result is equal to the dividend `A`.
*Division by zero can be handled without branching by a conditional move. Signed overflow happens only for the signed variant when the minimum negative value is divided by -1. This rare case must be handled in x86 (ARM produces the "correct" result).*
##### Shift and rotate
The shift/rotate instructions use just the bottom 6 bits of the `B` operand (`imm8` is used as the immediate value). All treat `A` as unsigned except SAR_64, which performs an arithmetic right shift by copying the sign bit.
### FPU instructions
|weight|instruction|group|C|
|-|-|-|-|
|20|FPADD|FA|`A + B`|
|20|FPSUB|FA|`A - B`|
|22|FPMUL|FA|`A * B`|
|8|FPDIV|FA|`A / B`|
|6|FPSQRT|FS|`sqrt(abs(A))`|
|2|FPROUND|FS|`convertSigned52(A)`|
All floating point instructions apart FPROUND are vector instructions that operate on two packed double precision floating point values.
#### Conversion of operand A
Operand A is loaded from memory as a 64-bit value. All floating point instructions apart FPROUND interpret A as two packed 32-bit signed integers and convert them into two packed double precision floating point values.
The FPROUND instruction has a scalar output and interprets A as a 64-bit signed integer. The 11 least-significant bits are cleared before conversion to a double precision format. This is done so the number fits exactly into the 52-bit mantissa without rounding. Output of FPROUND is always written into the lower half of the result register and only this lower half may be written into the scratchpad.
#### Rounding
FPU instructions conform to the IEEE-754 specification, so they must give correctly rounded results. Initial rounding mode is *roundTiesToEven*. Rounding mode can be changed by the `FPROUND` instruction. Denormal values must be flushed to zero.
#### NaN
If an operation produces NaN, the result is converted into positive zero. NaN results may never be written into registers or memory. Only division and multiplication must be checked for NaN results (`0.0 / 0.0` and `0.0 * Infinity` result in NaN).
##### FPROUND
The FPROUND instruction changes the rounding mode for all subsequent FPU operations depending on the two least-significant bits of A.
|A[1:0]|rounding mode|
|-------|------------|
|00|roundTiesToEven|
|01|roundTowardNegative|
|10|roundTowardPositive|
|11|roundTowardZero|
The rounding modes are defined by the IEEE-754 standard.
*The two-bit flag value exactly corresponds to bits 13-14 of the x86 `MXCSR` register and bits 23 and 22 (reversed) of the ARM `FPSCR` register.*
### Control instructions
The following 2 control instructions are supported:
|weight|instruction|function|condition|
|-|-|-|-|
|20|CALL|near procedure call|(see condition table below)
|22|RET|return from procedure|stack is not empty
Both instructions are conditional. If the condition evaluates to `false`, CALL and RET behave as "arithmetic no-op" and simply copy operand A into destination C without jumping.
##### CALL
The CALL instruction uses a condition function, which takes the lower 32 bits of integer register `reg(b)` and the value `imm32` and evaluates a condition based on the `loc(b)` flag:
|loc(b)[2:0]|signed|jump condition|probability|*x86*|*ARM*
|---|---|----------|-----|--|----|
|000|no|`reg(b)[31:0] <= imm32`|0% - 100%|`JBE`|`BLS`
|001|no|`reg(b)[31:0] > imm32`|0% - 100%|`JA`|`BHI`
|010|yes|`reg(b)[31:0] - imm32 < 0`|50%|`JS`|`BMI`
|011|yes|`reg(b)[31:0] - imm32 >= 0`|50%|`JNS`|`BPL`
|100|yes|`reg(b)[31:0] - imm32` overflows|0% - 50%|`JO`|`BVS`
|101|yes|`reg(b)[31:0] - imm32` doesn't overflow|50% - 100%|`JNO`|`BVC`
|110|yes|`reg(b)[31:0] < imm32`|0% - 100%|`JL`|`BLT`
|111|yes|`reg(b)[31:0] >= imm32`|0% - 100%|`JGE`|`BGE`
The 'signed' column specifies if the operands are interpreted as signed or unsigned 32-bit numbers. Column 'probability' lists the expected jump probability (range means that the actual value for a specific instruction depends on `imm32`). *Columns 'x86' and 'ARM' list the corresponding hardware instructions (following a `CMP` instruction).*
Taken CALL instruction pushes the values `A` and `pc` (program counter) onto the stack and then performs a forward jump relative to the value of `pc`. The forward offset is equal to `16 * (imm8[6:0] + 1)`. Maximum jump distance is therefore 128 instructions forward (this means that at least 4 correctly spaced CALL instructions are needed to form a loop in the program).
##### RET
The RET instruction is taken only if the stack is not empty. Taken RET instruction pops the return address `raddr` from the stack (it's the instruction following the previous CALL), then pops a return value `retval` from the stack and sets `C = A XOR retval`. Finally, the instruction jumps back to `raddr`.
## Reference implementation
A portable C++ implementation of all ALU and FPU instructions is available in [instructionsPortable.cpp](../src/instructionsPortable.cpp).