You are right... I owe you that tip... sorry I have been very busy can you send me a BTC address. Thanks
Ditto.
In other news, I think I've finally, finally, actually done it (again).
As we discussed previously, it looks like using a cycle-detection algorithm is no-go because the hash doesn't map to itself.
However, what does work is making a pseudo-rainbow table in memory, on the fly.
The result: the ratio of memory used vs search rate remains exactly the same, but we can scale the size arbitrarily. For example, instead of taking 768 MB memory and 1 minute per search, we can use (768 MB / X) memory and X minutes per search.
This sounds useless at first, but consider the following: given that we can arbitrarily scale the memory requirements, the only limiting factor how fast we can find collisions is our brute computing power. What's good at brute computing power? GPUs and ASICs.
Here's my proof:
Overview:
https://en.wikipedia.org/wiki/Rainbow_table
Make a "rainbow table" map to hold the equivalent hash space of SEARCH_SPACE
For chains of length CHAIN_LENGTH,
rainbow table needs (SEARCH_SPACE / CHAIN_LENGTH) elements [name: RAINBOW_ENTRIES]
Rainbow table will require (8 bytes (hash) + 4 bytes (nonce)) * RAINBOW_ENTRIES bytes total
To compress each SEARCH_SPACE_BITS hash to a new nonce in the range MAX_MOMENTUM_NONCE,
simply use (hash % MAX_MOMENTUM_NONCE)
Pseudocode:
For the first RAINBOW_ENTRIES
nonce = element number
For i = 1 to CHAIN_LENGTH
cur_hash = HASH(nonce)
Check if any part of cur_hash is in rainbow table
If yes and generated by different nonce: add to result list
For each chunk in cur_hash
rainbow_map[chunk] = element number
nonce = compress(cur_hash)
For the next (SEARCH_SPACE - RAINBOW_ENTRIES) elements:
nonce = element number
For i = 1 to CHAIN_LENGTH
cur_hash = Hash(nonce)
Check if any part of cur_hash is in rainbow table
If yes and generated by different nonce: add to result list
nonce = compress(cur_hash)
return result list
// MATH SECTION /////
Given:
CALLS(CHAIN_LENGTH) = total calls to SHA512 using a rainbow table with chain length CHAIN_LENGTH
CALLS(CHAIN_LENGTH) = (SEARCH_SPACE / CHAIN_LENGTH) + (SEARCH_SPACE * CHAIN_LENGTH) - (SEARCH_SPACE / CHAIN_LENGTH)
+ (SEARCH_SPACE / CHAIN_LENGTH) = Calls to generate rainbow table
+ (SEARCH_SPACE * CHAIN_LENGTH) = Calls to check SEARCH_SPACE nonce values
- (SEARCH_SPACE * CHAIN_LENGTH) = Except we skip the nonces used to make the rainbow table
CALLS(CHAIN_LENGTH) = (SEARCH_SPACE * CHAIN_LENGTH)
Given:
MEMORY(CHAIN_LENGTH) = number of bytes required to store a rainbow table representing
SEARCH_SPACE elements with chain length CHAIN_LENGTH
MEMORY(CHAIN_LENGTH) = rainbow table elements * (size of hash chunk + size of nonce)
MEMORY(CHAIN_LENGTH) = (8 + 4) * SEARCH_SPACE / CHAIN_LENGTH
Examples:
Minimum number of SHA512 calls is CHAIN_LENGTH = 1 (equivalent to the "default" semiordered map setup)
Requires (SEARCH_SPACE * 12) = 768 MB memory
Requires (SEARCH_SPACE) calls to SHA512
For CHAIN_LENGTH = N
Requires (SEARCH_SPACE * 12 / N) bytes of memory
Requires (SEARCH_SPACE * N) calls to SHA512
For CHAIN_LENGTH = 1024
Requires (2^26 * 12 / 1024) = 768 KB of memory
Requires (2*26 * 1024) calls to SHA512
Bottom line:
Using a rainbow table map of chain length N,
memory usage is multiplied by a factor of 1/N
and SHA512 calls are multiplied by a factor of N
This means that regardless the CHAIN_LENGTH selected,
the ratio of CALLS(CHAIN_LENGTH) to MEMORY(CHAIN_LENGTH)
is inversely proportional.
EFFICIENCY IS CONSTANT AT ANY MEMORY TARGET.
All that matters is brute computational strength.
Example:
If a GPU/ASIC/FPGA has PROCESSOR number of parallel computing units,
and MEM_MAX bytes of available memory, then each
Momentum hash instance can use up to (MEM_MAX / PROCESSOR) bytes of memory.
To calculate the CHAIN_LEN to acheive this:
CHAIN_LEN(MEM_MAX, PROCESSOR) = (8 + 4) * SEARCH_SPACE / (MEM_MAX / PROCESSOR)
CHAIN_LEN(MEM_MAX, PROCESSOR) = (8 + 4) * SEARCH_SPACE * PROCESSOR / MEM_MAX
Note: This calculation is just the inverse of MEMORY(CHAIN_LEN)
Assuming: GPU with 128 compute units, 1 GB memory, and SEARCH_SPACE = MAX_MOMENTUM_NONCE
CHAIN_LEN(2^30, 2^7) = (8 + 4) * SEARCH_SPACE * PROCESSOR / MEM_MAX
CHAIN_LEN(2^30, 2^7) = (8 + 4) * SEARCH_SPACE * 2^7 / 2^30
CHAIN_LEN(2^30, 2^7) = 12 * SEARCH_SPACE / 2^23
CHAIN_LEN(2^30, 2^7) = 12 * 2^26 / 2^23
CHAIN_LEN(2^30, 2^7) = 12 * 2^3
CHAIN_LEN(2^30, 2^7) = 96
Verification:
MEMORY(CHAIN_LENGTH) = (8 + 4) * SEARCH_SPACE / CHAIN_LENGTH
MEMORY(96) = (8 + 4) * 2^26 / 96
MEMORY(96) = 8388608 bytes
MEMORY(96) = 8 MB
CALLS(CHAIN_LENGTH) = (SEARCH_SPACE * CHAIN_LENGTH)
Result: 1/96th the normal memory usage of 768 MB, 96x the SHA512 calls.
The problem is no longer about memory, it's about computation speed.
I've actually got a somewhat working proof of concept, although it's quite sub-optimal. Right now the main issue is using the overkill boost::unordered_map for storage instead of a more bespoke memory structure. The boost implementation likes to decide it needs to redo the bucket arrangement from time to time, resulting in slowdowns.
It also has an unholy number of debug statements.
If you'd like to check on my progress,
the source is right here. It's a drop-in replacement for "momentum.cpp".
Edit: There's other optimizations possible when using the time/memory tradeoff. For example, if we normally took 1 minute per nonce space search, cranking up the time/memory ratio to 16 would mean we'd expect 16 minutes per search, which is considerably longer than the target time per block. To compensate, we can have the algorithm return immediately on first collision discovery. When a new block is detected, all the miner threads should be stopped and recreated.