From 3daceace48727fcb0da786951f9054e830437fb1 Mon Sep 17 00:00:00 2001
From: tevador
Date: Mon, 10 Jun 2019 16:36:55 +0200
Subject: [PATCH] Clarifications in the documentation

---
 README.md     | 10 +++++-----
 doc/design.md |  6 +++---
 doc/specs.md  | 40 +++++++++++++++++++++++++++++++++-------
 3 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 9a6f251..2563e3e 100644
--- a/README.md
+++ b/README.md
@@ -3,14 +3,14 @@ RandomX is a proof-of-work (PoW) algorithm that is optimized for general-purpose
 ## Overview
-RandomX behaves like a keyed hashing function: it accepts a key `K` and arbitrary input `H` and produces a 256-bit result `R`. Under the hood, RandomX utilizes a virtual machine that executes programs in a special instruction set that consists of a mix of integer math, floating point math and branches. These programs can be translated into the CPU's native machine code on the fly. Example of a RandomX program translated into x86-64 assembly is [program.asm](doc/program.asm). A portable interpreter mode is also provided.
+RandomX utilizes a virtual machine that executes programs in a special instruction set that consists of integer math, floating point math and branches. These programs can be translated into the CPU's native machine code on the fly (example: [program.asm](doc/program.asm)). At the end, the outputs of the executed programs are consolidated into a 256-bit result using a cryptographic hashing function ([Blake2b](https://blake2.net/)).
 RandomX can operate in two main modes with different memory requirements:
 * **Fast mode** - requires 2080 MiB of shared memory.
 * **Light mode** - requires only 256 MiB of shared memory, but runs significantly slower
-Both modes are interchangeable as they give the same results. The fast mode is suitable for mining, while the light mode is expected to be used only for proof verification.
+Both modes are interchangeable as they give the same results. The fast mode is suitable for "mining", while the light mode is expected to be used only for proof verification.
 ## Documentation
@@ -43,11 +43,11 @@ Precompiled `benchmark` binaries are available on the [Releases page](https://gi
 RandomX was primarily designed as a PoW algorithm for [Monero](https://www.getmonero.org/). The recommended usage is following:
 * The key `K` is selected to be the hash of a block in the blockchain - this block is called the 'key block'. For optimal mining and verification performance, the key should change every 2048 blocks (~2.8 days) and there should be a delay of 64 blocks (~2 hours) between the key block and the change of the key `K`. This can be achieved by changing the key when `blockHeight % 2048 == 64` and selecting key block such that `keyBlockHeight % 2048 == 0`.
-* The input `H` is the standard hashing blob.
+* The input `H` is the standard hashing blob with a selected nonce value.
 If you wish to use RandomX as a PoW algorithm for your cryptocurrency, we strongly recommend not using the [default parameters](src/configuration.h) to avoid compatibility with Monero.
-### CPU mining performance
+### CPU performance
 Preliminary performance of selected CPUs using the optimal number of threads (T) and large pages (if possible), in hashes per second (H/s):
 |CPU|RAM|OS|AES|Fast mode|Light mode|
@@ -59,7 +59,7 @@ Raspberry Pi 3|1 GB DDR2|Ubuntu 16.04|software|-|2.0 H/s (4T) †|
 † Using the interpreter mode. Compiled mode is expected to increase performance by a factor of 10.
-### GPU mining performance
+### GPU performance
 SChernykh is developing GPU mining code for RandomX. Benchmarks are included in the following repositories:
diff --git a/doc/design.md b/doc/design.md
index a007127..14aa7b8 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -9,7 +9,7 @@ There are two distinct classes of general processing devices: central processing
 ## 1. Design considerations
-The most basic idea of a CPU-bound proof of work is that the work must be dynamic. This takes advantage of the fact that CPUs accept two kinds of inputs: *data* (the main input) and *code* (which specifies what to perform with the data).
+The most basic idea of a CPU-bound proof of work is that the "work" must be dynamic. This takes advantage of the fact that CPUs accept two kinds of inputs: *data* (the main input) and *code* (which specifies what to perform with the data).
 Conversely, typical cryptographic hashing functions [[3](https://en.wikipedia.org/wiki/Cryptographic_hash_function)] do not represent suitable work for the CPU because their only input is *data*, while the sequence of operations is fixed and can be performed more efficiently by a specialized integrated circuit.
@@ -147,7 +147,7 @@ The VM uses 8 integer registers and 12 floating point registers. This is the max
 ### 2.4 Integer operations
-RandomX uses all primitive integer operations that preserve entropy: addition (IADD_RS, IADD_M), subtraction (ISUB_R, ISUB_M, INEG_R), multiplication (IMUL_R, IMUL_M, IMULH_R, IMULH_M, ISMULH_R, ISMULH_M, IMUL_RCP), exclusive or (IXOR_R, IXOR_M) and rotation (IROR_R, IROL_R).
+RandomX uses all primitive integer operations that have high output entropy: addition (IADD_RS, IADD_M), subtraction (ISUB_R, ISUB_M, INEG_R), multiplication (IMUL_R, IMUL_M, IMULH_R, IMULH_M, ISMULH_R, ISMULH_M, IMUL_RCP), exclusive or (IXOR_R, IXOR_M) and rotation (IROR_R, IROL_R).
 #### 2.4.1 IADD_RS
@@ -466,7 +466,7 @@ The following figure shows the sensitivity of SuperscalarHash to changing a sing
 This shows that SuperscalaHash has quite low sensitivity to high-order bits and somewhat decreased sensitivity to the lowest-order bits. Sensitivity is highest for bits 3-53 (inclusive).
-When calculating a Dataset item, the input of the first SuperscalarHash depends only on the item number. To ensure a good distribution of results, the initial set of register values must have unique values of bits 3-53 for *all* item numbers in the range 0-34078718 (the Dataset contains 34078719 items). The constants described in section 7.3 of the Specification were chosen to meet this requirement. All initial register values for all Dataset item numbers were checked to make sure bits 3-53 of each register are unique and there are no collisions (source code: [superscalar-init.cpp](../src/tests/superscalar-init.cpp)).
+When calculating a Dataset item, the input of the first SuperscalarHash depends only on the item number. To ensure a good distribution of results, the constants described in section 7.3 of the Specification were chosen to provide unique values of bits 3-53 for *all* item numbers in the range 0-34078718 (the Dataset contains 34078719 items). All initial register values for all Dataset item numbers were checked to make sure bits 3-53 of each register are unique and there are no collisions (source code: [superscalar-init.cpp](../src/tests/superscalar-init.cpp)). While this is not strictly necessary to get unique output from SuperscalarHash, it's a security precaution that mitigates the non-perfect avalanche properties of the randomly generated SuperscalarHash instances.
 ## References
diff --git a/doc/specs.md b/doc/specs.md
index 7872cd8..c59037a 100644
--- a/doc/specs.md
+++ b/doc/specs.md
@@ -88,21 +88,47 @@ and outputs a 256-bit result `R`.
 The algorithm consists of the following steps:
-1. The Dataset is initialized using the key value `K` (see chapter 7 for details).
+1. The Dataset is initialized using the key value `K` (described in chapter 7).
 1. 64-byte seed `S` is calculated as `S = Hash512(H)`.
 1. Let `gen1 = AesGenerator1R(S)`.
 1. The Scratchpad is filled with `RANDOMX_SCRATCHPAD_L3` random bytes using generator `gen1`.
 1. Let `gen4 = AesGenerator4R(gen1.state)` (use the final state of `gen1`).
-1. The value of the VM register `fprc` is set to 0 (default rounding mode - see chapter 4.3).
-1. The VM is programmed using `128 + 8 * RANDOMX_PROGRAM_SIZE` random bytes using generator `gen4` (see chapter 4.5).
-1. The VM is executed (see chapter 4.6).
-1. New 64-byte seed is calculated as `S = Hash512(RegisterFile)`.
+1. The value of the VM register `fprc` is set to 0 (default rounding mode - chapter 4.3).
+1. The VM is programmed using `128 + 8 * RANDOMX_PROGRAM_SIZE` random bytes using generator `gen4` (chapter 4.5).
+1. The VM is executed (chapter 4.6).
+1. A new 64-byte seed is calculated as `S = Hash512(RegisterFile)`.
 1. Set `gen4.state = S` (modify the state of the generator).
 1. Steps 7-10 are performed a total of `RANDOMX_PROGRAM_COUNT` times. The last iteration skips steps 9 and 10.
 1. Scratchpad fingerprint is calculated as `A = AesHash1R(Scratchpad)`.
-1. The binary values of the VM registers `a0`-`a3` (4×16 bytes) are set to the value of `A`.
+1. Bytes 192-255 of the Register File are set to the value of `A`.
 1. Result is calculated as `R = Hash256(RegisterFile)`.
+The input of the `Hash512` function in step 9 is the following 256 bytes:
+```
+ +---------------------------------+
+ |         registers r0-r7         | (64 bytes)
+ +---------------------------------+
+ |         registers f0-f3         | (64 bytes)
+ +---------------------------------+
+ |         registers e0-e3         | (64 bytes)
+ +---------------------------------+
+ |         registers a0-a3         | (64 bytes)
+ +---------------------------------+
+```
+
+The input of the `Hash256` function in step 14 is the following 256 bytes:
+```
+ +---------------------------------+
+ |         registers r0-r7         | (64 bytes)
+ +---------------------------------+
+ |         registers f0-f3         | (64 bytes)
+ +---------------------------------+
+ |         registers e0-e3         | (64 bytes)
+ +---------------------------------+
+ |      AesHash1R(Scratchpad)      | (64 bytes)
+ +---------------------------------+
+```
+
 ## 3 Custom functions
 ### 3.1 Definitions
@@ -909,5 +935,5 @@ The item data is represented by 8 64-bit integer registers: `r0`-`r7`.
 The constants used to initialize register values in step 1 were determined as follows:
 * Multiplier `6364136223846793005` was selected because it gives an excellent distribution for linear generators (D. Knuth: The Art of Computer Programming – Vol 2., also listed in [Commonly used LCG parameters](https://en.wikipedia.org/wiki/Linear_congruential_generator#Parameters_in_common_use))
-* XOR constants used to initialize registers `r1`-`r7` were determined by calculating a 512-bit Blake2b hash of the ASCII value `RandomX SuperScalarHash initialize` and taking bytes 8-63 as 7 little-endian unsigned 64-bit integers. Additionally, the constant for `r1` was increased by <code>2<sup>33</sup>+700</code> and the constant for `r3` was increased by <code>2<sup>14</sup></code> (these changes are necessary to ensure that all registers have unique initial values for all values of `itemNumber`).
+* XOR constants used to initialize registers `r1`-`r7` were determined by calculating `Hash512` of the ASCII value `"RandomX SuperScalarHash initialize"` and taking bytes 8-63 as 7 little-endian unsigned 64-bit integers. Additionally, the constant for `r1` was increased by <code>2<sup>33</sup>+700</code> and the constant for `r3` was increased by <code>2<sup>14</sup></code> (these changes are necessary to ensure that all registers have unique initial values for all values of `itemNumber`).
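
Note on the last hunk: the derivation of the XOR constants can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the RandomX repository. It assumes `Hash512` is unkeyed Blake2b with a 64-byte digest (the removed line calls it a "512-bit Blake2b hash"), that the two increments read as 2^33+700 and 2^14, and that additions wrap modulo 2^64. The helper names `derive_xor_constants` and `adjust_constants` are hypothetical.

```python
import hashlib
import struct

# ASCII seed string quoted in the patch.
SEED = b"RandomX SuperScalarHash initialize"
MASK64 = 2**64 - 1


def derive_xor_constants(seed=SEED):
    """Return 7 unsigned 64-bit integers taken from bytes 8-63 of the hash.

    Assumption: Hash512 is plain Blake2b with a 64-byte digest.
    """
    digest = hashlib.blake2b(seed, digest_size=64).digest()
    # "<7Q" = seven little-endian unsigned 64-bit integers (56 bytes).
    return list(struct.unpack("<7Q", digest[8:64]))


def adjust_constants(constants):
    """Apply the two increments described in the patch (for r1 and r3).

    Assumptions: the increments are 2^33+700 and 2^14, and the
    addition wraps modulo 2^64.
    """
    adjusted = list(constants)
    adjusted[0] = (adjusted[0] + 2**33 + 700) & MASK64  # constant for r1
    adjusted[2] = (adjusted[2] + 2**14) & MASK64        # constant for r3
    return adjusted


if __name__ == "__main__":
    raw = derive_xor_constants()
    for reg, value in enumerate(adjust_constants(raw), start=1):
        print(f"r{reg}: {value}")
```

Comparing the printed values against the constants listed in section 7.3 of the Specification would be a quick way to check whether these assumptions match the documented derivation.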