Minimalistic JIT code generator for random math sequence in CryptonightR.
Usage:
- Allocate writable and executable memory
- Call v4_generate_JIT_code with "buf" pointed to memory allocated on the previous step
- Call the generated code instead of "v4_random_math(code, r)", omit the "code" parameter
NetBSD emits:
warning: Warning: reference to the libc supplied alloca(3); this most likely will not work. Please use the compiler provided version of alloca(3), by supplying the appropriate compiler flags (e.g. not -std=c89).
and man 3 alloca says:
Normally, gcc(1) translates calls to alloca() with inlined code. This is not done when either the -ansi, -std=c89, -std=c99, or the
-std=c11 option is given and the header <alloca.h> is not included. Otherwise, (without an -ansi or -std=c* option) the glibc version of
<stdlib.h> includes <alloca.h> and that contains the lines:
#ifdef __GNUC__
#define alloca(size) __builtin_alloca (size)
#endif
It looks like alloca is a bad idea in modern C/C++, so we use
VLAs for C and std::vector for C++.
- These functions are declared twice in slow-hash.c. Remove one of the copies.
- The declarations have the wrong return type, should be void, not int.
Function definitions here: 1e74586ee9/src/crypto/aesb.c (L151-L180)
Test plan: make release-test
9137ad2c blockchain: add a testnet v9 a day after v8 (moneromooo-monero)
ac4f71c2 wallet2: bump testnet rollback to account for coming reorg (moneromooo-monero)
8f418a6d bulletproofs: #include <openssl/bn.h> (moneromooo-monero)
2bf63650 bulletproofs: speed up the latest changes a bit (moneromooo-monero)
044dff5a bulletproofs: scale points by 8 to ensure subgroup validity (moneromooo-monero)
c83012c4 bulletproofs: match aggregated verification to sarang's latest prototype (moneromooo-monero)
ce0c7432 performance_tests: add padded bulletproof construction (moneromooo-monero)
1224e53b core_tests: add a test for 4-aggregated BP verification (moneromooo-monero)
0e6ed559 fuzz_tests: add a bulletproof fuzz test (moneromooo-monero)
463434d1 more comprehensive test for ge_p3 comparison to identity/point at infinity (moneromooo-monero)
d0a0565f unit_tests: add a few more multiexp unit tests (moneromooo-monero)
6526d87f core_tests: add a test for a tx with empty bulletproof (moneromooo-monero)
a129bbd9 multiexp: fix maxscalar off by one (moneromooo-monero)
7ed496cc ringct: error out when hashToPoint* returns the point at infinity (moneromooo-monero)
d1591853 cryptonote_basic: check output type before using it (moneromooo-monero)
61632dc1 ringct: prevent a potential very large allocation (moneromooo-monero)
a4317e61 crypto: some paranoid checks in generate_signature/check_signature (moneromooo-monero)
7434df1c crypto: never return zero in random32_unbiased (moneromooo-monero)
0825e974 multiexp: fix wrong Bos-Coster result for 1 non trivial input (moneromooo-monero)
a1359ad4 Check inputs to addKeys are in range (moneromooo-monero)
fe0fa3b9 bulletproofs: reject x, y, z, or w[i] being zero (moneromooo-monero)
5ffb2ff9 v8: per byte fee, pad bulletproofs, fixed 11 ring size (moneromooo-monero)
869b3bf8 bulletproofs: a few fixes from the Kudelski review (moneromooo-monero)
c4291762 bulletproofs: reject points not in the main subgroup (moneromooo-monero)
15697177 bulletproofs: speed up a few multiplies using existing Hi cache (moneromooo-monero)
0b05a0fa Add Pippenger cache and limit Straus cache size (moneromooo-monero)
51eb3bdc add pippenger unit tests (moneromooo-monero)
b17b8db3 performance_tests: add stats and loop count multiplier options (moneromooo-monero)
7314d919 perf_timer: split timer class into a base one and a logging one (moneromooo-monero)
d126a02b performance_tests: add aggregated bulletproof tx verification (moneromooo-monero)
263431c4 Pippenger multiexp (moneromooo-monero)
1ed0ed4d multiexp: cut down on memory allocations (moneromooo-monero)
1b867e7f precalc the ge_p3 representation of H (moneromooo-monero)
ef56529f performance_tests: document the tested bulletproof layouts (moneromooo-monero)
30111780 unit_tests: a couple more bulletproof unit tests for gamma (moneromooo-monero)
c444b1b2 require canonical multi output bulletproof layout (moneromooo-monero)
7e67c52f Add a define for the max number of bulletproof multi-outputs (moneromooo-monero)
2a8fcb42 Bulletproof aggregated verification and tests (moneromooo-monero)
126196b0 multiexp: some speedups (moneromooo-monero)
71d67bda aligned: aligned memory alloc/realloc/free (moneromooo-monero)
cb9ecab1 performance_tests: add signature generation/verification (moneromooo-monero)
bacf0a1e bulletproofs: add aggregated verification (moneromooo-monero)
e895c3de make straus cached mode thread safe, and add tests for it (moneromooo-monero)
7f48bf05 multiexp: bos coster now works for just one point (moneromooo-monero)
9ce9f8ca bulletproofs: add multi output bulletproofs to rct (moneromooo-monero)
f34e2e20 performance_tests: add tx checking tests with more than 2 outputs (moneromooo-monero)
0793184b performance_tests: add a --verbose flag, and default to terse (moneromooo-monero)
939bc223 add Straus multiexp (moneromooo-monero)
9ff6e6a0 ringct: add bos coster multiexp (moneromooo-monero)
e9164bb3 bulletproofs: misc optimizations (moneromooo-monero)
112f32f0 performance_tests: add crypto ops (moneromooo-monero)
f5d7b993 performance_tests: add bulletproofs (moneromooo-monero)
8f4ce989 performance_tests: add RingCT MLSAG gen/ver tests (moneromooo-monero)
1aa10c43 performance_tests: add (Borromean) range proofs (moneromooo-monero)
aacfd6e3 bulletproofs: multi-output bulletproofs (moneromooo-monero)
cb1cc757 performance_tests: don't override log level to 0 (moneromooo-monero)
- fix integer overflow in n_bulletproof_amounts
- check input scalars are in range
- remove use of environment variable to tweak straus performance
- do not use implementation defined signed shift for signum
Contains two modifications to improve ASIC resistance: shuffle and integer math.
Shuffle makes use of the whole 64-byte cache line instead of 16 bytes only, making Cryptonight 4 times more demanding for memory bandwidth.
Integer math adds 64:32 bit integer division followed by 64 bit integer square root, adding large and unavoidable computational latency to the main loop.
More details and performance numbers: https://github.com/SChernykh/xmr-stak-cpu/blob/master/README.md