Updated readme

tevador 2019-05-03 22:09:52 +02:00
parent b62b1f8717
commit 197cd90e07
2 changed files with 52 additions and 46 deletions


@ -4,88 +4,94 @@ RandomX is a proof-of-work (PoW) algorithm that is optimized for general-purpose
* Prevent the development of a single-chip [ASIC](https://en.wikipedia.org/wiki/Application-specific_integrated_circuit) * Prevent the development of a single-chip [ASIC](https://en.wikipedia.org/wiki/Application-specific_integrated_circuit)
* Minimize the efficiency advantage of specialized hardware compared to a general-purpose CPU * Minimize the efficiency advantage of specialized hardware compared to a general-purpose CPU
## Specification ## Overview
RandomX behaves like a keyed hashing function: it accepts a key `K` and an arbitrary input `H` and produces a 256-bit result `R`. Under the hood, RandomX utilizes a virtual machine that executes programs in a special instruction set that consists of a mix of integer math, floating point math and branches. These programs can be translated into the CPU's native machine code on the fly. An example of a RandomX program translated into x86-64 assembly is available in [program.asm](doc/program.asm). A portable interpreter mode is also provided.
RandomX can operate in two modes:
* **Fast mode** - requires 2080 MiB of shared memory.
* **Light mode** - requires only 256 MiB of shared memory, but runs significantly slower and uses more power per hash.
## Documentation
Full specification available in [specs.md](doc/specs.md). Full specification available in [specs.md](doc/specs.md).
## Design
Design notes available in [design.md](doc/design.md). Design notes available in [design.md](doc/design.md).
## Build ## Build
Build using `make`. Requires a C++11 compliant compiler. There are no dependencies. RandomX is written in C++11 and builds a static library with a C API provided by header file [randomx.h](src/randomx.h). Minimal API usage example is provided in [api-example1.c](src/tests/api-example1.c). The reference code includes a `benchmark` executable for testing.
Precompiled test binaries are available on the [Releases page](https://github.com/tevador/RandomX/releases). ### Ubuntu/Debian
## Usage Build dependencies: `make` and `gcc` (minimum version 4.8, but version 7+ is recommended).
``` Build using the provided makefile.
Usage: randomx [OPTIONS]
Supported options:
--help shows this message
--mine mining mode: 2 GiB, x86-64 JIT compiled VM
--verify verification mode: 256 MiB
--jit x86-64 JIT compiled verification mode (default: interpreter)
--largePages use large pages
--softAes use software AES (default: x86 AES-NI)
--threads T use T threads (default: 1)
--init Q initialize dataset with Q threads (default: 1)
--nonces N run N nonces (default: 1000)
--genAsm generate x86-64 asm code for nonce N
--genNative generate RandomX code for nonce N
```
### Mining mode ### Windows
Mining mode requires >2 GiB of RAM and optimal performance should be obtained with at least 16 KiB of L1 cache, 256 KiB of L2 cache and 2 MiB of L3 cache per mining thread.
The reference miner supports only x86 64-bit CPUs at the moment. [AES-NI](https://en.wikipedia.org/wiki/AES_instruction_set) support is not required, but using the `--softAes` option reduces mining performance by about 40%. Build dependencies: Visual Studio 2017.
It is recommended to use [large pages](https://en.wikipedia.org/wiki/Page_(computer_memory)#Multiple_page_sizes) with the `--largePages` option. Using the default page size can reduce performance by up to 50% due to [TLB thrashing](https://en.wikipedia.org/wiki/Thrashing_(computer_science)#TLB_thrashing). A solution file is provided.
[NUMA](https://en.wikipedia.org/wiki/Non-uniform_memory_access) systems should run one instance of RandomX per NUMA node. ### Precompiled binaries
### Light mode Precompiled `benchmark` binaries are available on the [Releases page](https://github.com/tevador/RandomX/releases).
Verification is done in the 'light' mode, which requires only 256 MiB of memory, but runs much slower than the mining mode. Use the `--jit` option on x86-64 CPUs for maximum verification performance. ## Proof of work
RandomX was primarily designed as a PoW algorithm for [Monero](https://www.getmonero.org/). The recommended usage is as follows:
* The key `K` is selected to be the hash of a block in the blockchain - this block is called the 'key block'. For optimal mining and verification performance, the key should change every 2048 blocks (~2.8 days) and there should be a delay of 64 blocks (~2 hours) between the key block and the change of the key `K`. This can be achieved by changing the key when `blockHeight % 2048 == 64` and selecting the key block such that `keyBlockHeight % 2048 == 0`.
* The input `H` is the standard hashing blob.
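The key rotation rule above can be sketched in a few lines. This is only an illustration of the arithmetic, not code from the reference implementation: the function name `key_block_height` and the constants `KEY_EPOCH` and `KEY_DELAY` are hypothetical, and using block 0 as the key block for the first 64 heights is an assumption made here so that the function is defined for every height.

```python
KEY_EPOCH = 2048  # the key changes every 2048 blocks
KEY_DELAY = 64    # delay between the key block and the key change


def key_block_height(block_height):
    """Return the height of the key block in effect at block_height.

    The key changes when block_height % KEY_EPOCH == KEY_DELAY, and the
    selected key block always satisfies key_block_height % KEY_EPOCH == 0.
    """
    if block_height < KEY_DELAY:
        # Assumption: before the first key change, block 0 is the key block.
        return 0
    return ((block_height - KEY_DELAY) // KEY_EPOCH) * KEY_EPOCH


if __name__ == "__main__":
    for height in (0, 2111, 2112, 4159, 4160):
        print(height, key_block_height(height))
```

With these rules, heights 64 through 2111 use block 0 as the key block, and the key switches to block 2048 at height 2112, 64 blocks after that block was mined.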
### Performance ### Performance
Preliminary performance using the optimal number of threads and large pages (if possible): Preliminary performance of selected CPUs using the optimal number of threads (T) and large pages (if possible), in hashes per second (H/s):
|CPU|RAM|OS|AES|RandomX (mining)|RandomX (light)| |CPU|RAM|OS|AES|Fast mode|Light mode|
|---|---|--|---|---------|--------------| |---|---|--|---|---------|--------------|
AMD Ryzen 7 1700|16 GB DDR4|Ubuntu 16.04|HW|4250 H/s (8T)|640 H/s (16T)| AMD Ryzen 7 1700|16 GB DDR4|Ubuntu 16.04|hardware|4080 H/s (8T)|620 H/s (16T)|
Intel Core i7-8550U|16 GB DDR4|Windows 10|HW|1660 H/s (4T)|128 H/s (4T)| Intel Core i7-8550U|16 GB DDR4|Windows 10|hardware|1700 H/s (4T)|350 H/s (8T)|
Intel Core i3-3220|2 GB DDR3|Ubuntu 16.04|software|-|187 H/s (4T)| Intel Core i3-3220|2 GB DDR3|Ubuntu 16.04|software|-|120 H/s (4T)|
Raspberry Pi 3|1 GB DDR2|Ubuntu 16.04|software|-|12.3 H/s (4T)| Raspberry Pi 3|1 GB DDR2|Ubuntu 16.04|software|-|2.0 H/s (4T) †|
† Using the interpreter mode. Compiled mode is expected to increase performance by a factor of 10.
# FAQ # FAQ
### Can RandomX run on a GPU? ### Can RandomX run on a GPU?
RandomX was designed to be efficient on CPUs. Designing an algorithm compatible with both CPUs and GPUs brings too many limitations and ultimately decreases ASIC resistance. RandomX was designed to be efficient on CPUs. Designing an algorithm compatible with both CPUs and GPUs brings many limitations and ultimately decreases ASIC resistance.
GPUs are expected to be at a disadvantage when running RandomX, but the exact performance has not been determined yet due to lack of a working GPU implementation. GPUs are expected to be at a disadvantage when running RandomX, but the exact performance has not been determined yet due to lack of a working GPU implementation.
A rough estimate for AMD Vega 56 GPU gave an upper limit of 1200 H/s, comparable to a quad core CPU (details in issue [#24](https://github.com/tevador/RandomX/issues/24)). A rough estimate for AMD Vega 56 GPU gave an upper limit of 1200 H/s, comparable to a quad core CPU (details in issue [#24](https://github.com/tevador/RandomX/issues/24)).
### Does RandomX facilitate botnets/malware mining or web mining? ### Does RandomX facilitate botnets/malware mining or web mining?
Quite the opposite. Efficient mining requires 2 GiB of memory, which is difficult to hide in an infected computer and disqualifies many low-end machines such as IoT devices. Web mining is nearly impossible due to the large memory requirements and low performance in interpreted mode. Efficient mining requires more than 2 GiB of memory, which is difficult to hide in an infected computer and disqualifies many low-end machines such as IoT devices. Web mining is nearly impossible due to the large memory requirement and low performance in interpreted mode.
### Since RandomX uses floating point calculations, how can it give reproducible results on different platforms? ### Since RandomX uses floating point math, does it give reproducible results on different platforms?
RandomX uses only operations that are guaranteed to give correctly rounded results by the [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) standard: addition, subtraction, multiplication, division and square root. Special care is taken to avoid corner cases such as NaN values or denormals. RandomX uses only operations that are guaranteed to give correctly rounded results by the [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) standard: addition, subtraction, multiplication, division and square root. Special care is taken to avoid corner cases such as NaN values or denormals.
The reference implementation has been validated on the following platforms:
* x86+SSE2 (32-bit, little-endian)
* x86-64 (64-bit, little-endian)
* ARMv7+NEON (32-bit, little-endian)
* ARMv8 (64-bit, little-endian)
* PPC64 (64-bit, big-endian)
## Acknowledgements ## Acknowledgements
The following people have contributed to the design of RandomX: * [SChernykh](https://github.com/SChernykh) - contributed significantly to the design of RandomX
* [SChernykh](https://github.com/SChernykh) * [hyc](https://github.com/hyc) - original idea of using random code execution for PoW
* [hyc](https://github.com/hyc) * [nioroso-x3](https://github.com/nioroso-x3) - provided access to PowerPC for testing purposes
RandomX uses some source code from the following 3rd party repositories: RandomX uses some source code from the following 3rd party repositories:
* Argon2d, Blake2b hashing functions: https://github.com/P-H-C/phc-winner-argon2 * Argon2d, Blake2b hashing functions: https://github.com/P-H-C/phc-winner-argon2
## Donations ## Donations
XMR: XMR (tevador):
``` ```
845xHUh5GvfHwc2R8DVJCE7BT2sd4YEcmjG8GNSdmeNsP5DTEjXd1CNgxTcjHjiFuthRHAoVEJjM7GyKzQKLJtbd56xbh7V 845xHUh5GvfHwc2R8DVJCE7BT2sd4YEcmjG8GNSdmeNsP5DTEjXd1CNgxTcjHjiFuthRHAoVEJjM7GyKzQKLJtbd56xbh7V
``` ```


@ -622,20 +622,20 @@ Whenever a register is selected as the operand of a CBRANCH instruction, its `co
The CBRANCH instruction performs the following steps: The CBRANCH instruction performs the following steps:
1. A constant `b` is calculated as `mod.cond + RANDOMX_JUMP_OFFSET`. 1. A constant `b` is calculated as `mod.cond + RANDOMX_JUMP_OFFSET`.
1. A constant `conditionImmediate` is constructed as sign-extended `imm32` with bit `b` set to 1 and bit `b-1` set to 0 (if `b > 0`). 1. A constant `cimm` is constructed as sign-extended `imm32` with bit `b` set to 1 and bit `b-1` set to 0 (if `b > 0`).
1. `conditionImmediate` is added to `creg`. 1. `cimm` is added to `creg`.
1. If bits `b` to `b + RANDOMX_JUMP_BITS - 1` of `creg` are zero, execution jumps to instruction `creg.lastUsed + 1` (the instruction following the instruction where `creg` was last modified). 1. If bits `b` to `b + RANDOMX_JUMP_BITS - 1` of `creg` are zero, execution jumps to instruction `creg.lastUsed + 1` (the instruction following the instruction where `creg` was last modified).
Bits in immediate and register values are numbered from 0 to 63 with 0 being the least significant bit. For example, for `b = 10` and `RANDOMX_JUMP_BITS = 8`, the bits are arranged like this: Bits in immediate and register values are numbered from 0 to 63 with 0 being the least significant bit. For example, for `b = 10` and `RANDOMX_JUMP_BITS = 8`, the bits are arranged like this:
``` ```
conditionImmediate = SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSMMMMMMMMMMMMMMMMMMMMM10MMMMMMMMM cimm = SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSMMMMMMMMMMMMMMMMMMMMM10MMMMMMMMM
creg = ..............................................XXXXXXXX.......... creg = ..............................................XXXXXXXX..........
``` ```
`S` is a copied sign bit from `imm32`. `M` denotes bits of `imm32`. Bit 9 is set to 0 and bit 10 is set to 1. This value would be added to `creg`. `S` is a copied sign bit from `imm32`. `M` denotes bits of `imm32`. Bit 9 is set to 0 and bit 10 is set to 1. This value would be added to `creg`.
The second line uses `X` to mark bits of `creg` that would be checked by the condition. If all these bits are 0 after adding `conditionImmediate`, the jump is executed. The second line uses `X` to mark bits of `creg` that would be checked by the condition. If all these bits are 0 after adding `cimm`, the jump is executed.
The construction of the CBRANCH instruction ensures that no infinite loops are possible in the program. The construction of the CBRANCH instruction ensures that no infinite loops are possible in the program.
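The four CBRANCH steps can be modelled with plain integer arithmetic. The following is a minimal sketch, not the reference implementation: the helper names `sign_extend_imm32`, `make_cimm` and `cbranch` are hypothetical, `b` and `jump_bits` are passed in as parameters rather than read from `mod.cond`, `RANDOMX_JUMP_OFFSET` and `RANDOMX_JUMP_BITS`, and the jump target (`creg.lastUsed + 1`) is left to the caller.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


def sign_extend_imm32(imm32):
    """Sign-extend a 32-bit immediate to an unsigned 64-bit value."""
    imm32 &= 0xFFFFFFFF
    if imm32 & 0x80000000:
        return imm32 | 0xFFFFFFFF00000000
    return imm32


def make_cimm(imm32, b):
    """Step 2: sign-extend imm32, set bit b to 1 and bit b-1 to 0 (if b > 0)."""
    cimm = sign_extend_imm32(imm32)
    cimm |= 1 << b
    if b > 0:
        cimm &= ~(1 << (b - 1)) & MASK64
    return cimm


def cbranch(creg, imm32, b, jump_bits):
    """Steps 3 and 4: add cimm to creg, then test bits b..b+jump_bits-1.

    Returns the updated creg and True if the jump would be taken.
    """
    creg = (creg + make_cimm(imm32, b)) & MASK64
    condition_mask = ((1 << jump_bits) - 1) << b
    return creg, (creg & condition_mask) == 0


if __name__ == "__main__":
    # Values from the example above: b = 10, RANDOMX_JUMP_BITS = 8.
    print(cbranch(0x3FC00, 0, 10, 8))
    print(cbranch(0, 0, 10, 8))
```

For `b = 10` and 8 condition bits, the mask covers bits 10 to 17, matching the `X` positions in the diagram. In the first call the addition carries out of that range, leaving all checked bits zero, so the jump is taken. In the second call bit 10 remains set and the jump is not taken.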