Author Topic: $5000 Bounty - Momentum Proof-of-Work Hacking [PAID]  (Read 54067 times)

0 Members and 1 Guest are viewing this topic.

Offline digitalindustry

  • Jr. Member
  • **
  • Posts: 41
    • View Profile
Seriously though, I have no idea how to compile it anyhow, but if it works I guess it's a share changer. 

Offline digitalindustry

  • Jr. Member
  • **
  • Posts: 41
    • View Profile
I'm happy to report that it works like a charm.
I've updated the source code.
Feel free to test it for yourself.


It correctly finds nonce collisions and follows all the predictions made in my earlier post.
I've conclusively demonstrated that memory requirements can be scaled arbitrarily while retaining an equivalent memory-to-computation ratio.  Latency has been made a non-issue due to immediate return on collision detection and external thread termination.


so this is for cpu? how to compile for windows?


^
This is barwizi hating life lol, I think I'll ramp up the Core 2 Duo, barwizi...

Offline bytemaster

I'm happy to report that it works like a charm.
I've updated the source code.
Feel free to test it for yourself.


It correctly finds nonce collisions and follows all the predictions made in my earlier post.
I've conclusively demonstrated that memory requirements can be scaled arbitrarily while retaining an equivalent memory-to-computation ratio.  Latency has been made a non-issue due to immediate return on collision detection and external thread termination.


Wow, this looks good... I need to understand it a bit better first. You have come up with a new algorithm; how many HPM do you get with it on a CPU, and how much RAM are you using?
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline barwizi

  • Hero Member
  • *****
  • Posts: 764
  • Noirbits, NoirShares, NoirEx.....lol, noir anyone?
    • View Profile
    • Noirbitstalk.org
I'm happy to report that it works like a charm.
I've updated the source code.
Feel free to test it for yourself.


It correctly finds nonce collisions and follows all the predictions made in my earlier post.
I've conclusively demonstrated that memory requirements can be scaled arbitrarily while retaining an equivalent memory-to-computation ratio.  Latency has been made a non-issue due to immediate return on collision detection and external thread termination.


so this is for cpu? how to compile for windows?
--Bar--  PiNEJGUv4AZVZkLuF6hV4xwbYTRp5etWWJ

The magical land of crypto, no freebies, people.

Offline iruu

  • Jr. Member
  • **
  • Posts: 46
    • View Profile
So what are the precise terms of the $5000 bounty now, so that there are no doubts about whether something meets them? 

Let's say a gpu miner uses x amount of memory and searches the 2^23 sha space in y seconds on a modern graphics card. What are the x and y for the bounty? 
Or is memory unimportant, just that it can work effectively on a gpu? Effectively defined how?
Or is it the relation? When memory use is cut in half, speed drops by less than half (i.e. the relation is sublinear). 

Can a proof of concept use simpler hash functions than sha512, if the algorithm is hash algorithm-agnostic? 

Or is it off?
« Last Edit: November 10, 2013, 09:32:22 am by iruu »

Offline gigawatt

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
I'm happy to report that it works like a charm.
I've updated the source code.
Feel free to test it for yourself.


It correctly finds nonce collisions and follows all the predictions made in my earlier post.
I've conclusively demonstrated that memory requirements can be scaled arbitrarily while retaining an equivalent memory-to-computation ratio.  Latency has been made a non-issue due to immediate return on collision detection and external thread termination.

Here's a snippet of what the output looks like using CHAIN_LENGTH = 4:

Step 31 of 128...
  HITS: 743672
  MISS: 1353480
 TOTAL: 1966081
   PCT: 35
Step 32 of 128...
  HITS: 760208
  MISS: 1336944
 TOTAL: 2031617
   PCT: 36
RAINBOW HIT
  SEEK HASH: 423473256423089
 ORIG NONCE: 28524699
 RAIN NONCE: 5439512 + 1
         0: 203104387400228
         1: 423473256423089
         2: 1041792957055939
         3: 780112116212973
         4: 629783454762940
         5: 675829228796259
         6: 738539531672861
         7: 577969205224283
NAILED IT! Nonce: 28524696 + 3
         0: 959239184726270
         1: 186912846539772
         2: 633556876464486
         3: 423473256423089
         4: 553069074389047
         5: 746169572180730
         6: 539623141106693
         7: 588368479534609
Checking Work:
    Nonce A: 5439513
    Nonce B: 28524699
     Hash A: 423473256423089
     Hash B: 423473256423089




And the view from the outside: http://i.imgur.com/rMUKK5t.png
« Last Edit: November 10, 2013, 08:54:58 am by gigawatt »
BTC: 1E2egHUcLDAmcxcqZqpL18TPLx9Xj1akcV
PTS: Pstkk1gZCxc4GEwS1eBAykYwVmcubU1P8L

Offline gigawatt

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
I don't have time at the moment to go through your method, but does momentum_verify(...) work on your results?

I'm working out a few kinks right now.  It said it found a hit earlier, but I accidentally had it dump the wrong info, so I had no way to verify.

This may be true for bandwidth of calculations, but what of latency?    Suppose you turn this from a 1 minute search into a 768 minute search... if you perform it a billion times in parallel you will have amazing bandwidth, but terrible latency.

Absolutely correct.  I edited my main post to reflect handling that.
The simple solution: as soon as you find any birthday collision, return immediately.  If a hit was found 2 seconds in, spending the next ~750 minutes finishing the run before returning would make the collision stale, regardless of how many more were found.

The second fix is allowing the parent thread to terminate the children whenever a new block is found.  This is already handled in the main ProtoShares branch since commit 3dfc3cc33.

Given both of these capabilities, latency is no longer an issue.
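The two fixes can be sketched together in a few lines of Python. This is an illustration, not the momentum.cpp code: toy_hash is a hypothetical stand-in for the 50-bit birthday hash, and the worker simply scans nonces instead of walking chains.

```python
import threading

def toy_hash(n: int) -> int:
    """Hypothetical stand-in for the birthday hash (not SHA512)."""
    return (n * 2654435761) % 4096

def search_worker(start: int, count: int, stop_event: threading.Event):
    """Sketch of the two latency fixes: return on the first collision
    found, and bail out whenever the parent signals a new block."""
    table = {}  # birthday -> nonce
    for nonce in range(start, start + count):
        if stop_event.is_set():      # parent saw a new block: work is stale
            return None
        h = toy_hash(nonce)
        prev = table.get(h)
        if prev is not None:         # collision: report it immediately,
            return (prev, nonce)     # don't finish the remaining sweep
        table[h] = nonce
    return None
```

A parent thread holds the Event and sets it when a new block arrives, which is the external-termination hook the post says already exists in the ProtoShares branch.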

Offline bytemaster

I don't have time at the moment to go through your method, but does momentum_verify(...) work on your results?

Quote
This sounds useless at first, but consider the following: given that we can arbitrarily scale the memory requirements, the only limiting factor in how fast we can find collisions is our brute computing power.  What's good at brute computing power?  GPUs and ASICs.

This may be true for bandwidth of calculations, but what of latency?    Suppose you turn this from a 1 minute search into a 768 minute search... if you perform it a billion times in parallel you will have amazing bandwidth, but terrible latency.

Offline gigawatt

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
You are right... I owe you that tip... sorry, I have been very busy. Can you send me a BTC address? Thanks

Ditto.  ;D



In other news, I think I've finally, finally, actually done it (again).
As we discussed previously, it looks like using a cycle-detection algorithm is a no-go because the hash doesn't map to itself.

However, what does work is making a pseudo-rainbow table in memory, on the fly.
The result: the ratio of memory used vs search rate remains exactly the same, but we can scale the size arbitrarily.  For example, instead of taking 768 MB memory and 1 minute per search, we can use (768 MB / X) memory and X minutes per search.

This sounds useless at first, but consider the following: given that we can arbitrarily scale the memory requirements, the only limiting factor in how fast we can find collisions is our brute computing power.  What's good at brute computing power?  GPUs and ASICs.


Here's my proof:

Overview:
https://en.wikipedia.org/wiki/Rainbow_table
Make a "rainbow table" map to hold the equivalent hash space of SEARCH_SPACE
  For chains of length CHAIN_LENGTH,
    rainbow table needs (SEARCH_SPACE / CHAIN_LENGTH) elements [name: RAINBOW_ENTRIES]
  Rainbow table will require (8 bytes (hash) + 4 bytes (nonce)) * RAINBOW_ENTRIES bytes total
To compress each SEARCH_SPACE_BITS hash to a new nonce in the range MAX_MOMENTUM_NONCE,
  simply use (hash % MAX_MOMENTUM_NONCE)

Pseudocode:
For the first RAINBOW_ENTRIES
  nonce = element number
  For i = 1 to CHAIN_LENGTH
    cur_hash = HASH(nonce)
    Check if any part of cur_hash is in rainbow table
      If yes and generated by different nonce: add to result list
    For each chunk in cur_hash
      rainbow_map[chunk] = element number
    nonce = compress(cur_hash)

For the next (SEARCH_SPACE - RAINBOW_ENTRIES) elements:
  nonce = element number
  For i = 1 to CHAIN_LENGTH
    cur_hash = Hash(nonce)
    Check if any part of cur_hash is in rainbow table
      If yes and generated by different nonce: add to result list
    nonce = compress(cur_hash)

return result list
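As a rough illustration, the two-phase pseudocode above can be sketched in runnable Python. The parameters here are toy assumptions (the post uses SEARCH_SPACE = 2^26, 50-bit birthdays, and much longer chains), and birthday() is a truncated-SHA512 stand-in, not the actual momentum.cpp hash.

```python
import hashlib

# Toy parameters for illustration only -- assumptions, not the post's values.
SEARCH_SPACE = 1 << 12   # number of starting nonces to cover
MAX_NONCE = 1 << 12      # nonce range targeted by compress()
BIRTHDAY_BITS = 16       # toy "birthday" width
CHAIN_LENGTH = 4         # time/memory tradeoff factor

def birthday(nonce: int) -> int:
    """Stand-in for one SEARCH_SPACE_BITS chunk of SHA512(nonce)."""
    digest = hashlib.sha512(nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - BIRTHDAY_BITS)

def compress(h: int) -> int:
    """Map a birthday hash back into nonce space to extend a chain."""
    return h % MAX_NONCE

def find_collisions():
    """Two-phase search from the pseudocode: build chains into an in-memory
    table, then chase chains for the rest of the space without storing."""
    rainbow_entries = SEARCH_SPACE // CHAIN_LENGTH
    table = {}       # birthday hash -> nonce that produced it
    results = []

    def visit(nonce, store):
        # Walk one chain of CHAIN_LENGTH hashes starting from `nonce`.
        for _ in range(CHAIN_LENGTH):
            h = birthday(nonce)
            prev = table.get(h)
            if prev is not None and prev != nonce:
                results.append((prev, nonce))  # same birthday, different nonce
            if store:
                table[h] = nonce
            nonce = compress(h)

    for start in range(rainbow_entries):                # phase 1: fill table
        visit(start, store=True)
    for start in range(rainbow_entries, SEARCH_SPACE):  # phase 2: scan only
        visit(start, store=False)
    return results
```

The table holds at most SEARCH_SPACE / CHAIN_LENGTH starting points' worth of chain entries, while every nonce in the space still gets checked, which is the memory-for-computation trade the post describes.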


// MATH SECTION /////
Given:
CALLS(CHAIN_LENGTH) = total calls to SHA512 using a rainbow table with chain length CHAIN_LENGTH
CALLS(CHAIN_LENGTH) = (SEARCH_SPACE / CHAIN_LENGTH) + (SEARCH_SPACE * CHAIN_LENGTH) - (SEARCH_SPACE / CHAIN_LENGTH)
  + (SEARCH_SPACE / CHAIN_LENGTH) = Calls to generate rainbow table
  + (SEARCH_SPACE * CHAIN_LENGTH) = Calls to check SEARCH_SPACE nonce values
  - (SEARCH_SPACE / CHAIN_LENGTH) = Except we skip the nonces used to make the rainbow table
CALLS(CHAIN_LENGTH) = (SEARCH_SPACE * CHAIN_LENGTH)

Given:
MEMORY(CHAIN_LENGTH) = number of bytes required to store a rainbow table representing
                         SEARCH_SPACE elements with chain length CHAIN_LENGTH
MEMORY(CHAIN_LENGTH) = rainbow table elements * (size of hash chunk + size of nonce)
MEMORY(CHAIN_LENGTH) = (8 + 4) * SEARCH_SPACE / CHAIN_LENGTH

Examples:
Minimum number of SHA512 calls is CHAIN_LENGTH = 1 (equivalent to the "default" semiordered map setup)
  Requires (SEARCH_SPACE * 12) = 768 MB memory
  Requires (SEARCH_SPACE) calls to SHA512
For CHAIN_LENGTH = N
  Requires (SEARCH_SPACE * 12 / N) bytes of memory
  Requires (SEARCH_SPACE * N) calls to SHA512
For CHAIN_LENGTH = 1024
  Requires (2^26 * 12 / 1024) = 768 KB of memory
  Requires (2^26 * 1024) calls to SHA512
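The example figures check out numerically. A quick sanity check of the two formulas, using the post's SEARCH_SPACE = 2^26 and 12-byte table entries:

```python
SEARCH_SPACE = 1 << 26   # MAX_MOMENTUM_NONCE
ENTRY_BYTES = 8 + 4      # 8-byte hash chunk + 4-byte nonce per table entry

def memory_bytes(chain_length: int) -> int:
    # MEMORY(CHAIN_LENGTH) = (8 + 4) * SEARCH_SPACE / CHAIN_LENGTH
    return ENTRY_BYTES * SEARCH_SPACE // chain_length

def sha512_calls(chain_length: int) -> int:
    # CALLS(CHAIN_LENGTH) = SEARCH_SPACE * CHAIN_LENGTH
    return SEARCH_SPACE * chain_length

print(memory_bytes(1))      # 805306368 bytes = 768 MB
print(memory_bytes(1024))   # 786432 bytes   = 768 KB
```

Note that sha512_calls(N) * memory_bytes(N) is the same for every N, which is the constant-efficiency claim in the "Bottom line" below.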

Bottom line:
Using a rainbow table map of chain length N,
  memory usage is multiplied by a factor of 1/N
  and SHA512 calls are multiplied by a factor of N
This means that regardless of the CHAIN_LENGTH selected,
  CALLS(CHAIN_LENGTH) and MEMORY(CHAIN_LENGTH) are inversely
  proportional: their product is constant.
EFFICIENCY IS CONSTANT AT ANY MEMORY TARGET.
All that matters is brute computational strength.

Example:
If a GPU/ASIC/FPGA has PROCESSOR number of parallel computing units,
  and MEM_MAX bytes of available memory, then each
  Momentum hash instance can use up to (MEM_MAX / PROCESSOR) bytes of memory.
To calculate the CHAIN_LEN to achieve this:
  CHAIN_LEN(MEM_MAX, PROCESSOR) = (8 + 4) * SEARCH_SPACE / (MEM_MAX / PROCESSOR)
  CHAIN_LEN(MEM_MAX, PROCESSOR) = (8 + 4) * SEARCH_SPACE * PROCESSOR / MEM_MAX
  Note: This calculation is just the inverse of MEMORY(CHAIN_LEN)

Assuming: GPU with 128 compute units, 1 GB memory, and SEARCH_SPACE = MAX_MOMENTUM_NONCE
  CHAIN_LEN(2^30, 2^7) = (8 + 4) * SEARCH_SPACE * PROCESSOR / MEM_MAX
  CHAIN_LEN(2^30, 2^7) = (8 + 4) * SEARCH_SPACE * 2^7 / 2^30
  CHAIN_LEN(2^30, 2^7) = 12 * SEARCH_SPACE / 2^23
  CHAIN_LEN(2^30, 2^7) = 12 * 2^26 / 2^23
  CHAIN_LEN(2^30, 2^7) = 12 * 2^3
  CHAIN_LEN(2^30, 2^7) = 96
  Verification:
  MEMORY(CHAIN_LENGTH) = (8 + 4) * SEARCH_SPACE / CHAIN_LENGTH
  MEMORY(96) = (8 + 4) * 2^26 / 96
  MEMORY(96) = 8388608 bytes
  MEMORY(96) = 8 MB
  CALLS(CHAIN_LENGTH) = (SEARCH_SPACE * CHAIN_LENGTH)
Result: 1/96th the normal memory usage of 768 MB, 96x the SHA512 calls.
  The problem is no longer about memory, it's about computation speed.
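The sizing rule above can be wrapped in a small helper. This is just the post's formula restated in Python; the GPU figures (128 compute units, 1 GB) are the post's worked example, not vendor-validated numbers.

```python
SEARCH_SPACE = 1 << 26   # MAX_MOMENTUM_NONCE
ENTRY_BYTES = 8 + 4      # bytes per rainbow-table entry

def chain_len(mem_max: int, processors: int) -> int:
    # CHAIN_LEN(MEM_MAX, PROCESSOR) = (8 + 4) * SEARCH_SPACE * PROCESSOR / MEM_MAX
    return ENTRY_BYTES * SEARCH_SPACE * processors // mem_max

def memory_per_instance(mem_max: int, processors: int) -> int:
    # Memory each hashing instance ends up with at that chain length.
    return ENTRY_BYTES * SEARCH_SPACE // chain_len(mem_max, processors)

print(chain_len(1 << 30, 1 << 7))            # 96
print(memory_per_instance(1 << 30, 1 << 7))  # 8388608 bytes = 8 MB
```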



I've actually got a somewhat working proof of concept, although it's quite sub-optimal.  Right now the main issue is using the overkill boost::unordered_map for storage instead of a more bespoke memory structure.  The boost implementation likes to decide it needs to redo the bucket arrangement from time to time, resulting in slowdowns.

It also has an unholy number of debug statements.

If you'd like to check on my progress, the source is right here.  It's a drop-in replacement for "momentum.cpp".



Edit: There are other optimizations possible when using the time/memory tradeoff.  For example, if we normally took 1 minute per nonce-space search, cranking the time/memory ratio up to 16 would mean we'd expect 16 minutes per search, which is considerably longer than the target time per block.  To compensate, we can have the algorithm return immediately on the first collision discovery.  When a new block is detected, all the miner threads should be stopped and recreated.
« Last Edit: November 10, 2013, 06:10:36 am by gigawatt »

Offline bytemaster

I am looking for algorithmic weaknesses in the Momentum Proof of Work: http://bitsharestalk.org/index.php?topic=21.msg24#msg24 

If you can identify an alternative algorithm for solving the proof of work that does not require the target RAM and yet is able to find hashes at a rate greater than MomentumAlgoHashRate / (TARGET_RAM_USE/ALT_RAM_USE), then you could win this bounty.

For example: Suppose Momentum as specced requires about 1 GB of RAM for maximum performance and produces hashes at 1 Hz.  If you design an algorithm that only requires 1 MB of RAM and can find hashes at 0.001 Hz, then you win, because you will have proven that computational complexity is linear with RAM use rather than non-linear as I believe the problem calls for.
      I paid out a tip for showing this property doesn't quite hold as I thought, but this was more a weakness in my devising an effective, objective measure for determining whether or not Momentum is 'broken' or 'hacked'.
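For concreteness, the bounty criterion above can be written as a predicate. This formalization is my reading of the quoted terms, not an official statement of them; only the RAM ratio matters, so the units just need to agree.

```python
def wins_bounty(momentum_rate_hz: float, target_ram: float,
                alt_rate_hz: float, alt_ram: float) -> bool:
    """True if the alternative finds hashes faster than the memory-scaled
    baseline, i.e. its hash rate falls off sublinearly with RAM use."""
    return alt_rate_hz > momentum_rate_hz / (target_ram / alt_ram)

# The worked example: 1 GB @ 1 Hz baseline vs 1 MB @ 0.001 Hz.
print(wins_bounty(1.0, 1024, 0.001, 1))  # True: 0.001 Hz > 1/1024 Hz
```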

For the record, I did not receive any tip.

You are right... I owe you that tip... sorry, I have been very busy. Can you send me a BTC address? Thanks

Offline AnonyMint

  • Jr. Member
  • **
  • Posts: 30
    • View Profile

1) What is the performance of SCRYPT on CPU vs GPU? 

From the Litecoin Wiki, I seem to remember roughly 5-10 times faster for a high-end GPU compared to an i7, versus roughly 100x faster for Bitcoin. You can double-check.

https://litecoin.info/Comparison_between_Litecoin_and_Bitcoin
https://litecoin.info/Mining_hardware_comparison
https://en.bitcoin.it/wiki/Mining_hardware_comparison

If we compare the i7 2600 on pooler's cpuminer for both coins, the ratio to the HD7970 is roughly 30 for Bitcoin and 15 for Litecoin. So Litecoin is only a 2x improvement over Bitcoin, not the order of magnitude I had indicated.
« Last Edit: November 09, 2013, 09:52:21 pm by AnonyMint »

Offline AnonyMint

  • Jr. Member
  • **
  • Posts: 30
    • View Profile
The whole theory/practice thing is what makes this whole discussion so hard.

For additional threads masking latency, you already have evidence from your own experience with hyperthreads, and you can also see it by analyzing reports of cpuminer settings for Litecoin on the GPU.

On cpuminer, the "lookup gap" setting is the memory-for-computation tradeoff. The other main setting is the number of threads to run.
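The "lookup gap" idea can be sketched independently of scrypt's details: keep only every gap-th element of an iterated sequence and recompute the rest on demand. Here seq_step is a hypothetical stand-in (a 64-bit LCG) for scrypt's BlockMix; this only illustrates the tradeoff being described.

```python
def seq_step(x: int) -> int:
    """Hypothetical stand-in for scrypt's BlockMix: a 64-bit LCG step."""
    return (x * 6364136223846793005 + 1442695040888963407) & ((1 << 64) - 1)

class GappedTable:
    """Keep every `gap`-th element of an iterated sequence; recompute the rest.
    Memory shrinks by a factor of `gap`; lookups cost up to `gap - 1` extra steps."""

    def __init__(self, seed: int, n: int, gap: int):
        self.gap = gap
        self.stored = []
        x = seed
        for i in range(n):
            if i % gap == 0:
                self.stored.append(x)   # checkpoint every gap-th value
            x = seq_step(x)

    def lookup(self, i: int) -> int:
        x = self.stored[i // self.gap]  # nearest checkpoint at or before i
        for _ in range(i % self.gap):   # recompute forward to index i
            x = seq_step(x)
        return x
```

With gap = 1 this degenerates to storing everything (maximum memory, no recomputation), which is the same knob cpuminer exposes.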
« Last Edit: November 08, 2013, 10:54:32 pm by AnonyMint »

Offline AnonyMint

  • Jr. Member
  • **
  • Posts: 30
    • View Profile
Some high-end CPUs don't have AGPUs (integrated GPUs). Look at the Bitcointalk thread for your ProtoShares launch; you see many Xeons and high-end i7s. The server chips don't have AGPUs.

Integrated Graphics can saturate its memory bus (60 GB/Sec on latest hardware)

Is that limited amount of sideport memory or the full main DRAM?

The fastest I'd seen was 40 GB/sec for Ivy Bridge-E, and those don't have AGPUs.

On Haswell, Ivy, Sandy with AGPU I saw roughly 20 GB/sec, but I wasn't clear if the AGPU can attain that since it has worse latency to main memory than the CPU.

though I think in this particular use case moving more than 64 bytes (cache line size) per cycle doesn't help due to the random access nature.   In fact, it may hurt the GPU by being forced to consume its memory bandwidth in larger than necessary chunks when it is randomly accessed.

If you have enough parallel threads they will to some extent statistically clump in locality of memory. I haven't attempted to quantify that effect.

Hashtables would point to arrays of items in the bucket.

So there is a 26x difference between average CPU based systems and high end graphics cards.

Rough estimate agreed. On Haswell or latest, maybe half that.

Then higher if no AGPU integrated.

1) What is the performance of SCRYPT on CPU vs GPU? 

From the Litecoin Wiki, I seem to remember roughly 5 - 10 times faster for high-end GPU compared i7. Compared to roughly 100x faster for Bitcoin. You can double-check.

2) Does momentum have a lower performance ratio than SCRYPT?

Appears to be worse than Litecoin. But you gain fast verification, so for you that might be the correct tradeoff, especially if integrated graphics will be improving on CPUs (so you might reach parity with Litecoin or better in future) and the ASIC risk is not higher than Scrypt (or if you are not so worried about ASIC near-term).

3) When it comes to making an ASIC, which would be harder SCRYPT or Momentum?

I haven't analyzed this for Momentum. I have analyzed it for Scrypt. I can comment after others have summarized if I have something to add.

As far as how to divide the bounty among the various contributors, I am thinking that AnonyMint and gigawatt have both provided some reasonable contributions.   I will wait to see gigawatt's proof-of-concept implementation, to be sure there is a cycle and that the computational time is not larger than the available parallelism, before making the final decision about relative payout.   Gigawatt's proof-of-concept shows more time and energy invested and is also more compelling than theoretical estimations of performance gains provided by AnonyMint.  Clearly writing code is of greater value than writing forum posts, and also more definitive.

I have no problem with that. I am not expecting anything, but I think it does help you in the community to be fair and reward help. But I see we have a new comment claiming that gigawatt's vulnerability is not valid. I would go slowly and be sure first.

Good work everyone, it is a pleasure debating this with you all.  I hope you recognize that I am in this to find the best algorithm and NOT to defend my ego and put my head in the sand.   This bounty and ProtoShares has brought out great minds and we sharpen one another.

I appreciate your clarification. I know you were sick the prior day, because you mentioned it.
« Last Edit: November 08, 2013, 11:43:02 pm by AnonyMint »

Offline bytemaster

The cycle algorithm idea is wrong; there's no cycle in the 50-bit hash. The cycle is in the 512-bit sha output, but that's equivalent to finding sha512 collisions.

Regardless of the birthday algorithm, generating 2^23 sha512 hashes on a Radeon 5870 (no oc) takes 180 ms. 64-bit values and the 1024-bit block make parallelism on current compute units hard. That's close to the output of a new Intel cpu, so gpus aren't going to win by a big margin like with bitcoin's sha256. 
The collision-finding time, even on a cpu, is almost insignificant; 2^26 is a small value. 
I'm writing a gpu miner as an opencl learning experience. Looks like finding all birthdays in 2^26 hashes is going to take 200-400 ms on the Radeon 5870.

iruu, thank you for joining the discussion.  You confirmed my belief that there were no cycles and that the GPU would have a hard time in practice doing this.  The whole theory/practice thing is what makes this whole discussion so hard.

Offline iruu

  • Jr. Member
  • **
  • Posts: 46
    • View Profile
The cycle algorithm idea is wrong; there's no cycle in the 50-bit hash. The cycle is in the 512-bit sha output, but that's equivalent to finding sha512 collisions.

Regardless of the birthday algorithm, generating 2^23 sha512 hashes on a Radeon 5870 (no oc) takes 180 ms. 64-bit values and the 1024-bit block make parallelism on current compute units hard. That's close to the output of a new Intel cpu, so gpus aren't going to win by a big margin like with bitcoin's sha256. 
The collision-finding time, even on a cpu, is almost insignificant; 2^26 is a small value. 
I'm writing a gpu miner as an opencl learning experience. Looks like finding all birthdays in 2^26 hashes is going to take 200-400 ms on the Radeon 5870.

edit: 70 ms for sha512, so not that bad (optimizing opencl is completely different from cpu, and I'm not talking about the obvious sequential-vs-parallel differences). Time for the harder part.
« Last Edit: November 08, 2013, 11:36:59 pm by iruu »