Author Topic: $5000 Bounty - Momentum Proof-of-Work Hacking [PAID]  (Read 54057 times)

0 Members and 1 Guest are viewing this topic.

Offline bytemaster

gigawatt:
Quote
I've managed to make a huge step forward in proving Momentum being not nearly as good for proof-of-work as intended.

The magic lies in using a Bloom filter to store the intermediate hashes.
As a result, instead of using 12 bytes per hash/nonce in a Semi-Ordered Map (which results in ~750 MB of memory), the required memory is (-1 * (2^26 * ln(0.01)) / (ln(2)^2)), or about ~76 MB.

This number can be reduced arbitrarily if we're willing to have a false positive rate greater than 1%.  For example, if we allowed up to a 50% chance of having a false positive, the memory requirement drops to ~11 MB.


Here's a overview of how the algorithm works:

Make a "main" bloom filter of size 2^26 with a false positive rate 1%: ~76 MB
Make a tiny "clash" bloom filter of size 2^16 and false positive rate 2^-26: ~0.7 MB
Make a vector of pairs< hash, nonce > to store candidate birthday collisions.
For each nonce in the search space, check if its hash exists in the "main" bloom filter.  If it is, add it's entry to the "clash" bloom filter.
The "main" bloom is no longer required and can be discarded.
For each nonce in the search check if its hash exists in the "clash" bloom filter.  If it does, add < hash, nonce > to a candidate list for investigation.
Sort the list of candidates by hash.
For each pair in the candidate list, see if the previous element has the same hash.  If it does, add it to the output list.  This step removes false positives by comparing the actual hash instead of the bloom filter's idea of a hash.
Return the output list as normal.


For your testing pleasure, I also built a working proof of concept.
(Most of the magic is right here.  The bloom filter is a modified version of "bloom.cpp" called "bigbloom.cpp")

Unmodified source: http://i.imgur.com/k2cNrmd.png

Source using bloom filters: http://i.imgur.com/w8Enf9e.png

In exchange for lowering the memory requirements by a factor of ~10, the algorithm runs at about 1/4th speed, mainly due to the doubling calls to SHA512 and the hash calls within the bloom filters.  The overall result is a net efficiency gain of approximately 2.5
The reduction in memory requirement means that if we could fit N instances of Momentum in GPU memory, we can instead fit 10*N instances.  If we up the false positive rate in exchange for more time spent sorting, we can achieve ratios of up to 70*N.

Given that bloom filters, SHA512, and sorting data are all parallel/GPU friendly, we can conclude that Momentum as a proof-of-work isn't nearly as GPU-hard as initially intended.
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline bytemaster

Thanks. I still think you're underestimating the issue. I think you should change it to have a large nonce space much larger than anyone could store in memory so that you have asymptotically linear scaling in terms of RAM.

Could you clarify what exactly you require in order to claim the full bounty?

Nathaniel,  to collect the full bounty I must be convinced that a momentum derived proof-of-work is not much better than SCRYPT at hindering an ASIC take over.    There is clearly a lot to learn about the structure and behavior of this class of proof-of-work.   From what I can tell it is still clearly the best proof-of-work system conceived of thus far.     

Let me repost an attack on this proof of work from bitcointalk for discussion here.
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline Nathaniel B

  • Newbie
  • *
  • Posts: 6
    • View Profile
Thanks. I still think you're underestimating the issue. I think you should change it to have a large nonce space much larger than anyone could store in memory so that you have asymptotically linear scaling in terms of RAM.

Could you clarify what exactly you require in order to claim the full bounty?
« Last Edit: November 07, 2013, 11:54:17 pm by Nathaniel B »

Offline bytemaster

Nathaniel B...  I have been thinking about it and will tip you 0.5 BTC for your contribution to understanding this proof of work and because I specified a poor metric for evaluating whether or not the proof of work was 'broken'.   You have clearly helped.

Thank you.
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline bytemaster

Ok... my own graphs show that the area 'over' the green curve is less than half the area under the 'blue' curve which would seem to indicate the principle you are referring to.

Namely, that reducing memory by 50% only hurts performance by ~20% (eyeballing it).       However, it does hurt performance.   

I calculated the area under the curve for a 1/1000 reduction in memory requirement (down to the size of SCRYPT) and performance is 253.58x slower...

To get back to equal performance with single threading you would have to go parallel 254 threads each using 1 MB of memory for a total of 253 MB of memory, a 75% memory reduction.

To run this on a GPU the most parallel threads you could get going on a graphics card your would need  2GB video memory.   The most CPU-core equivalent performance you could get is 8 x.   

All of this assumes of course that the GPU threads are of equal performance to the CPU threads (which they are not).   So an 8-core CPU is still in the same neighborhood has a 2GB GPU using your trick to reduce memory usage.
« Last Edit: November 07, 2013, 10:15:10 pm by bytemaster »
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline Nathaniel B

  • Newbie
  • *
  • Posts: 6
    • View Profile
I decided I would graph the probability of NOT FINDING a hash after searching the NONCE space using three different amounts of memory... 100%   50% and 25%



Both use cases search the same number of nonces and thus do the same number of SHA512... but you take a performance hit if you reduce your memory...

Obviously, slight reductions in memory don't hurt too much, but if you attempted to reduce your memory from 768 MB down to 1 MB so that you could operate 1000 runs in parallel then your probability of NOT finding a match would be: 
0.368196 WITH MEMORY,
0.997402 WITHOUT MEMORY

So your probability of finding a match is 243x smaller for the same number of computations. 

So I guess this means that you can trade 765 MB of RAM for 243 parallel processes each with 1 MB of RAM and have the SAME hash rate.    Now you only actually decreased your RAM requirements by 33%...  while increasing the number of computations required by 243x...

The issue is that mining is NOT about finding 1 or more matches it's about finding as many matches as possible. Your numbers and graphs ALL describe mining as if the goal is to find just the first match whereas your algorithm let's any match be tested against the difficulty function to find a block. A miner wants to find many matches not just the first one.

What sort of implementation proof do you want? If I find the hashtable in the code base and make the change I suggested to artificially limit the number of nonces stored to half and then show it mining at more than half the speed would that be sufficient to convince you and win the bounty?

Offline AnonyMint

  • Jr. Member
  • **
  • Posts: 30
    • View Profile
You have NOT CONVINCED ME...

Perfect. Confirmed my suspicions about what I've heard.

I have hashed (pun intended) this out with many other intelligent people who also disagree with your assessments.

And so why don't you list the technical reasons why you and they disagree.
« Last Edit: November 07, 2013, 11:31:53 pm by AnonyMint »

Offline bytemaster

Ok, lets do this on a GPU..... I have found through experience that theoretical gains on the GPU often do not pan out as you expect.   So I need to see an implementation.

Your bounty only called for showing it is NO BETTER than Scrypt. I already did. I don't have to prove it is much faster on the CPU (but it will also be, yet that is irrelevant to how you stated your bounty).

If you choose to blissfully ignore what has been revealed to you, then so be it. I love it if you go down the wrong road. Please do. Worth much more to me than the $5000.

You have NOT CONVINCED ME... perhaps you have convinced yourself, but that is not the same thing.   If I had to choose between momentum and scrypt I would pick momentum... I have hashed (pun intended) this out with many other intelligent people who also disagree with your assessments.    It is possible to convince me, but it will require much more than handwaving about theoretical performance on theoretical GPUs with a theoretical GPU algorithm...   

ProtoShares is ultimately a big bounty on Momentum that can easily be claimed by anyone who is able to execute the designs you describe with any meaningful effect.    Trust me or not... I have much more to lose than $5000 if this is no better than SCRYPT.   
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline AnonyMint

  • Jr. Member
  • **
  • Posts: 30
    • View Profile
Ok, lets do this on a GPU..... I have found through experience that theoretical gains on the GPU often do not pan out as you expect.   So I need to see an implementation.

Your bounty only called for showing it is NO BETTER than Scrypt. I already did. I don't have to prove it is much faster on the GPU (but it will also be, yet that is irrelevant to how you stated your bounty).

If you choose to blissfully ignore what has been revealed to you, then so be it. I love it if you go down the wrong road. Please do. Worth much more to me than the $5000.
« Last Edit: November 07, 2013, 09:09:53 pm by AnonyMint »

Offline AnonyMint

  • Jr. Member
  • **
  • Posts: 30
    • View Profile
[redacted my explanation of why $100 per hour is an insult to me]

You wanted help on insights into weaknesses. I have explained it. If there is something you don't understand, ask.

Your attitude sucks. If you want to discuss why you doubt my claim, then fine we can engage. But if you expect everyone to be your servant, then I will stop helping you.
« Last Edit: November 07, 2013, 09:38:52 pm by AnonyMint »

Offline bytemaster

Your original bounty didn't ask for an implementation. And $5000 doesn't buy enough time from me.

It is up to you how far you want to go down the wrong road.

There is no way to make CPU-only and fast verification. I realized that recently in analysis and research. I realize this is big problem for your coin design.

If you cannot build a GPU algorithm in 6 days at $100 an hour... someone else will.   Even so, you have to CONVINCE ME it is not better than SCRYPT and a GPU implementation would have to have as much gain in performance vs the CPU as the GPU implementation of SCRYPT vs the CPU implementation of SCRYPT just to be in the running.       
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline bytemaster

From bitcointalk.org thread,  link=topic=313479.msg3363346#msg3363346 date=1382116292

Quote from: bytemaster
If you are able to convince me this proof-of-work is no better than Scrypt then you will win the 30 btc bounty.

Given my immediately prior post, I hereby claim the 30 BTC bounty.

Wow... so you are a mind reader and a troll.    I have dealt with you extensively in the past and set bitcointalk to IGNORE you... So I will do the same here.   I kindly suggest you go else where because I do not have the time to reply to you with this attitude.
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline AnonyMint

  • Jr. Member
  • **
  • Posts: 30
    • View Profile
Your original bounty didn't ask for an implementation. And $5000 doesn't buy enough time from me.

It is up to you how far you want to go down the wrong road.

There is no way to make CPU-only and fast verification. I realized that recently in analysis and research. I realize this is big problem for your coin design.

Offline bytemaster

The point of Scrypt was to make parallel operation difficult and its primary weakness is that validation time suffers as you scale it.

If you don't require that the first match in the sequence of generated pseudo-random numbers, then my upthread described parallelism attack works to allow GPUs to blow away CPUs, assuming the pseudo-random sequence can be generated much faster than the random memory latency bound on the CPU so the GPU can try 100s of values from the same pseudo-random sequence in parallel to hide the memory latency.

If you do require it, then you lose the fast validation time.

Do you get it now or you going to continue to ignore me?

Ok, lets do this on a GPU..... I have found through experience that theoretical gains on the GPU often do not pan out as you expect.   So I need to see an implementation.  $5000 is more than enough to pay for an implementation if you are so sure that you are right.

For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline AnonyMint

  • Jr. Member
  • **
  • Posts: 30
    • View Profile
From bitcointalk.org thread,  link=topic=313479.msg3363346#msg3363346 date=1382116292

Quote from: bytemaster
If you are able to convince me this proof-of-work is no better than Scrypt then you will win the 30 btc bounty.

Given my immediately prior post, I hereby claim the 30 BTC bounty.