Here's a proof that your algorithm scales sublinearly with RAM, i.e. adding more RAM up to TARGET_RAM_USE actually has slight diminishing returns, and even in the asymptotic case the speedup is at best linear.
Informally, my impression is that you misunderstand what the actual steps in your algorithm look like. Because of the difficulty function, a user wants to find as many BirthdayHash collisions as possible rather than just the first match, and each time a new nonce is BirthdayHashed and put into the hash table it is compared against the entries already stored, rather than all hashes being compared in one n^2 pass at the end.
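To make that step structure concrete, here is a minimal sketch of that loop in Python. Everything in it (birthday_hash as a truncated SHA-256, the 2^50 birthday space, the table-size cap) is a stand-in I'm assuming, not your actual implementation:

    import hashlib

    BIRTHDAY_SPACE = 1 << 50   # assumed size of the BirthdayHash output space (B)

    def birthday_hash(nonce, header):
        # Stand-in for the real BirthdayHash: truncated SHA-256 of (nonce + H).
        digest = hashlib.sha256(header + nonce.to_bytes(8, 'little')).digest()
        return int.from_bytes(digest[:8], 'little') % BIRTHDAY_SPACE

    def find_matches(header, num_nonces, max_stored):
        table = {}       # BirthdayHash value -> nonces already stored (bounded by RAM)
        stored = 0
        matches = []
        for nonce in range(num_nonces):
            h = birthday_hash(nonce, header)
            # Each new hash is compared to the table as soon as it is computed...
            for other in table.get(h, []):
                matches.append((other, nonce))
            # ...and is only added to the table while RAM (max_stored) allows.
            if stored < max_stored:
                table.setdefault(h, []).append(nonce)
                stored += 1
        return matches

The point is that every stored result keeps paying off against every later hash, so matching is incremental rather than one big n^2 pass at the end.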
I'll reference my original post on the bitcoin forum and quote extensively from it.
https://bitcointalk.org/index.php?topic=313479.msg3365654#msg3365654

Let B = the number of 'birthdays' (the hash space of BirthdayHash).
Let S = the number of BirthdayHash(nonce+H) stored in RAM.
Let M = the maximum value S can reach given a particular system's RAM.
If we compute a new BirthdayHash(A + H), the expected number of matches (stored nonces X with BirthdayHash(A+H) == BirthdayHash(X+H)) is S / B.
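A quick Monte Carlo sanity check of that per-hash expectation, with toy numbers (B = 2^20 so it runs in a second):

    import random

    B = 1 << 20        # toy birthday space
    S = 1 << 14        # results already stored
    TRIALS = 200000

    stored = [random.randrange(B) for _ in range(S)]
    counts = {}
    for v in stored:
        counts[v] = counts.get(v, 0) + 1

    total = sum(counts.get(random.randrange(B), 0) for _ in range(TRIALS))
    print(total / TRIALS, "vs expected", S / B)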
Now we calculate the total number of expected matches during a time frame (the average block creation time, for instance). Take the number of BirthdayHash() computations we perform in that time (N). S will increase from 0 to N unless M < N, meaning we do not have enough RAM to store N results; from your comments I gather that you intend for M (TARGET_RAM_USE / (nonce size + hashtable overhead per nonce)) >= N.
N * Mean(S) / B = Total expected matches
If M>=N:
Mean(S) = (1 + 2 + 3 + … + N)/N = (N+1)/2
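In Python, with a small hypothetical N just to check the algebra:

    B = 1 << 50
    N = 1 << 16                                  # hypothetical hash count

    mean_S = sum(range(1, N + 1)) / N            # S grows 1, 2, ..., N
    print(mean_S == (N + 1) / 2)                 # closed form: (N+1)/2
    print("expected matches:", N * mean_S / B)   # N * Mean(S) / B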
Now let's see what happens when a user only has M/2 RAM available and M/2 < N <= M. To win your challenge I must prove that with M/2 RAM a user would still find at least half as many matches, i.e. a mean of at least ((N+1)/2) * 0.5.
For the first M/2 hashes both users produce the same results. After that, the user stuck at M/2 RAM only has M/2 stored nonces to compare each new hash against, so the sum of S over all N hashes looks like this:
(1 + 2 + 3 + … + M/2 + M/2 + … + M/2) = (M/2)*(M/2 + 1)/2 + (M/2)*(N - M/2)
The mean then is ((M/2)*(M/2 + 1)/2 + (M/2)*(N-M/2))/N
Total expected matches for M/2 RAM is (Ns cancel out): ((M/2)*(M/2 + 1)/2 + (M/2)*(N-M/2))/B
(M/2 scenario) ((M/2)*(M/2 + 1)/2 + (M/2)*(N-M/2))/B vs (N*(N+1)/2)/B (N<=M scenario)
Plug in the numbers: M = 2^26 nonces' worth of RAM, N = 2^26 hashes during the time period (and B = 2^50 birthdays), since it's always more efficient to continue hashing until the nonce space is exhausted to avoid the extra cost of rebuilding the hashtable of potential matches.
(M/2 scenario)
((2^26/2)*(2^26/2 + 1)/2 + (2^26/2)*(2^26 - 2^26/2)) / 2^50 = 1.5000000149
vs,
(M scenario)
(2^26*(2^26+1)/2)/2^50 = 2.0000000298
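You can check both numbers directly (a few lines of Python, using exact integer sums before the final division):

    B = 1 << 50
    M = 1 << 26
    N = 1 << 26
    half = M // 2

    half_ram = (half * (half + 1) // 2 + half * (N - half)) / B
    full_ram = (N * (N + 1) // 2) / B

    print(half_ram)              # ~1.5000000149
    print(full_ram)              # ~2.0000000298
    print(full_ram / half_ram)   # doubling the RAM buys only ~1.33x the matches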
So in the same time frame with half the RAM, I will find ~1.5 matches vs your ~2 matches; your scaling is actually worse than linear when you double the RAM.
Let me know if you have any questions. I suggest you run the empirical experiment yourself if you're not convinced. Just change the code to stop adding new results to the hash table once you're halfway through the nonce space to simulate halving your RAM.
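If it's easier, here is a self-contained simulation of the same experiment, with random values standing in for BirthdayHash and a toy 2^24 birthday space so collisions show up quickly; the full-RAM/half-RAM ratio should come out near 4/3 rather than 2:

    import random

    B = 1 << 24      # toy birthday space
    N = 1 << 14      # nonces hashed during the block time

    def run(max_stored):
        table = {}
        stored = 0
        matches = 0
        for _ in range(N):
            h = random.randrange(B)
            matches += len(table.get(h, []))     # compare against what's stored
            if stored < max_stored:              # stop storing once 'RAM' is full
                table.setdefault(h, []).append(True)
                stored += 1
        return matches

    runs = 50
    full = sum(run(N) for _ in range(runs)) / runs        # store everything
    half = sum(run(N // 2) for _ in range(runs)) / runs   # stop storing halfway
    print("full RAM:", full, "half RAM:", half, "ratio:", full / half)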
Hope this helps your design. You can get asymptotically linear returns per unit of additional RAM if you make TARGET_RAM_USE ridiculously huge, like 1 TB+.
Thanks,
Nathaniel
16EeLa9BrZc1A4sasztaRdnfp3aU8DmuVC