Show Posts



Messages - gigawatt

16
My patch should fix this issue:

Code:
--- ProtoShares/src/makefile.unix       2013-11-11 11:46:13.019922787 -0800
+++ ProtoShares-new/src/makefile.unix   2013-11-11 11:47:52.255921671 -0800
@@ -141,7 +141,8 @@
     obj/bloom.o \
     obj/noui.o \
     obj/leveldb.o \
-    obj/txdb.o
+    obj/txdb.o \
+    obj/momentum.o
 
 
 all: bitcoind



Save the code above as makefile.patch, run "patch -p0 < makefile.patch", and you should be able to compile again. 8)

That's weird.  I wonder why "momentum.o" wasn't included in the makefile to begin with?  The makefile's last edit date was 7 months ago, so it must have been like that from the beginning.

How the hell did mine compile?

17
BitShares PTS / Re: Client keeps crashing
« on: November 11, 2013, 04:50:55 pm »
Running a few tests to see if I can replicate the issue.  I've got a watchdog wrapper for my protoshare client, so if it crashes I'll capture the debug.log file for analysis.

I'll let you know if I spot anything.

18
Check the "src" folder.  Are the "momentum.cpp" and "momentum.h" files there?

You might want to delete the whole directory and pull it from git again.  I ran into a few cases where git didn't fetch the files correctly.

19
General Discussion / Re: $5000 Bounty - Momentum Proof-of-Work Hacking
« on: November 11, 2013, 03:57:54 pm »
I can play with a 16-core (32 threads with hyperthreading) machine and 64 GB of memory. Let me know if you want me to test something new.

Would it be possible to set it for so little memory that everything takes place on L2 cache? That memory should be a lot faster than main RAM - so might see some dramatic improvements at that level - don't know if they could be reproduced on GPU though.

It should be possible, but I doubt it would be advisable.  You'd be looking at a CHAIN_LENGTH of 768 to get it down to 1 MB, but the flip side is requiring 768x the processing.  I'm not an expert, but I can't see the miner running fast enough to make up the difference.
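
To spell out that arithmetic (using the MEMORY and CALLS formulas from my rainbow-table post further down the page, with SEARCH_SPACE = 2^26 and 12 bytes per table entry):

MEMORY(768) = 12 * 2^26 / 768 = 1048576 bytes = 1 MB
CALLS(768)  = 2^26 * 768 (768x the SHA512 calls of the full-memory search)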


bytemaster: Could you send me a PM?  I'd like to send you a few ideas on how to harden momentum hash.

20
General Discussion / Re: $5000 Bounty - Momentum Proof-of-Work Hacking
« on: November 11, 2013, 05:23:51 am »

A CPU-only miner would definitely help cryptos on a lot of levels. Hopefully those who can help do so!

I completely agree.



I also have a few ideas on how the momentum algorithm can be salvaged, but I'd like to work directly with the developers on the topic if possible.
If one of the Invictus staff could drop me a PM, it would be much appreciated.

21
General Discussion / Re: $5000 Bounty - Momentum Proof-of-Work Hacking
« on: November 10, 2013, 07:25:07 pm »
I tested gigawatt's implementation.

14.33 collisions per minute instead of 180 collisions per minute with the stock client, 32 threads.

The point is that gigawatt's algorithm could yield a GPU implementation that runs the slower algorithm 100x faster, for a net gain.

But I will try to change the memory requirements and see what happens on the CPU if we bring it down to using very little memory.

What were you using for CHAIN_LENGTH?

I've been running a test with CHAIN_LENGTH = 2 on a 1 CPU/1GB test box overnight and got this:
http://i.imgur.com/wknTeLb.png

If I try to run the default implementation, it crashes because it requires too much memory.  On a 2 CPU/2 GB test box running only one thread, I get around 2.2 hpm.  However, I need to rerun these tests because the hash-per-minute calculation was updated in a recent commit.


I've also updated the source code: http://pastebin.com/VTCMFQXg
It now uses the thread interruption from the recent commit and makes debug messages optional.


To compile: Pull the source from the main github branch, then take my code and overwrite momentum.cpp with it.  Compile as normal.

22
General Discussion / Re: $5000 Bounty - Momentum Proof-of-Work Hacking
« on: November 10, 2013, 08:48:14 am »
I'm happy to report that it works like a charm.
I've updated the source code.
Feel free to test it for yourself.


It correctly finds nonce collisions and follows all the predictions made in my earlier post.
I've conclusively demonstrated that memory requirements can be scaled arbitrarily while keeping the memory-times-computation product constant.  Latency has been made a non-issue by returning immediately on collision detection and allowing external thread termination.
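
To restate the tradeoff with the formulas from my earlier post: the calls-times-memory product doesn't depend on CHAIN_LENGTH at all,

CALLS(C) * MEMORY(C) = (SEARCH_SPACE * C) * ((8 + 4) * SEARCH_SPACE / C) = 12 * SEARCH_SPACE^2

so halving memory always costs exactly a doubling of SHA512 calls, never more, never less.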

Here's a snippet of what the output looks like using CHAIN_LENGTH = 4:

Step 31 of 128...
  HITS: 743672
  MISS: 1353480
 TOTAL: 1966081
   PCT: 35
Step 32 of 128...
  HITS: 760208
  MISS: 1336944
 TOTAL: 2031617
   PCT: 36
RAINBOW HIT
  SEEK HASH: 423473256423089
 ORIG NONCE: 28524699
 RAIN NONCE: 5439512 + 1
         0: 203104387400228
         1: 423473256423089
         2: 1041792957055939
         3: 780112116212973
         4: 629783454762940
         5: 675829228796259
         6: 738539531672861
         7: 577969205224283
NAILED IT! Nonce: 28524696 + 3
         0: 959239184726270
         1: 186912846539772
         2: 633556876464486
         3: 423473256423089
         4: 553069074389047
         5: 746169572180730
         6: 539623141106693
         7: 588368479534609
Checking Work:
    Nonce A: 5439513
    Nonce B: 28524699
     Hash A: 423473256423089
     Hash B: 423473256423089




And the view from the outside: http://i.imgur.com/rMUKK5t.png

23
General Discussion / Re: $5000 Bounty - Momentum Proof-of-Work Hacking
« on: November 10, 2013, 07:22:51 am »
I don't have time at the moment to go through your method, but does momentum_verify(...) work on your results?

I'm working out a few kinks right now.  It said it found a hit earlier, but I accidentally had it dump the wrong info, so I had no way to verify.

This may be true for bandwidth of calculations, but what of latency?  Suppose you turn this from a 1 minute search into a 768 minute search... if you perform it a billion times in parallel you will have amazing bandwidth, but terrible latency.

Absolutely correct.  I edited my main post to reflect handling that.
The first fix is simple: as soon as you find any birthday collision, return immediately.  If a hit turns up 2 seconds in, spending the next ~750 minutes finishing the run before returning makes the collisions stale, regardless of how many were found.

The second is allowing the parent thread to terminate the children whenever a new block is found.  This is already handled in the main ProtoShares branch since commit 3dfc3cc33.

Given both of these capabilities, latency is no longer an issue.
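
To make that concrete, here's a minimal sketch of the search loop shape (illustrative only, not the actual client code: check_nonce_for_collision is a hypothetical helper, while boost::this_thread::interruption_point() is the standard Boost mechanism for this kind of cooperative cancellation):

Code:
#include <boost/thread.hpp>
#include <stdint.h>
#include <utility>
#include <vector>

// Hypothetical helper: checks one nonce against the collision table and
// records any (nonceA, nonceB) pair it finds; returns true on a hit.
bool check_nonce_for_collision(uint32_t nonce,
                               std::vector<std::pair<uint32_t, uint32_t> >& out);

std::vector<std::pair<uint32_t, uint32_t> > search_one_pass()
{
    std::vector<std::pair<uint32_t, uint32_t> > results;
    for (uint32_t nonce = 0; nonce < (1 << 26); ++nonce)  // MAX_MOMENTUM_NONCE
    {
        // Throws boost::thread_interrupted if the parent called interrupt(),
        // e.g. because a new block just arrived.
        boost::this_thread::interruption_point();

        // Return the moment we have anything: a collision found 2 seconds in
        // is worthless if we sit on it for another ~750 minutes.
        if (check_nonce_for_collision(nonce, results))
            return results;
    }
    return results;
}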

24
General Discussion / Re: $5000 Bounty - Momentum Proof-of-Work Hacking
« on: November 10, 2013, 05:55:46 am »
You are right... I owe you that tip... sorry, I have been very busy.  Can you send me a BTC address?  Thanks

Ditto.  ;D



In other news, I think I've finally, finally, actually done it (again).
As we discussed previously, using a cycle-detection algorithm is a no-go because the hash doesn't map to itself.

However, what does work is making a pseudo-rainbow table in memory, on the fly.
The result: the memory-to-computation tradeoff stays exactly as efficient, but we can scale the footprint arbitrarily.  For example, instead of taking 768 MB of memory and 1 minute per search, we can use (768 MB / X) of memory and X minutes per search.

This sounds useless at first, but consider the following: given that we can arbitrarily scale the memory requirements, the only limiting factor in how fast we can find collisions is our brute computing power.  What's good at brute computing power?  GPUs and ASICs.


Here's my proof:

Overview:
https://en.wikipedia.org/wiki/Rainbow_table
Make a "rainbow table" map to hold the equivalent hash space of SEARCH_SPACE
  For chains of length CHAIN_LENGTH,
    rainbow table needs (SEARCH_SPACE / CHAIN_LENGTH) elements [name: RAINBOW_ENTRIES]
  Rainbow table will require (8 bytes (hash) + 4 bytes (nonce)) * RAINBOW_ENTRIES bytes total
To compress each SEARCH_SPACE_BITS hash to a new nonce in the range MAX_MOMENTUM_NONCE,
  simply use (hash % MAX_MOMENTUM_NONCE)

Pseudocode:
For the first RAINBOW_ENTRIES
  nonce = element number
  For i = 1 to CHAIN_LENGTH
    cur_hash = HASH(nonce)
    Check if any part of cur_hash is in rainbow table
      If yes and generated by different nonce: add to result list
    For each chunk in cur_hash
      rainbow_map[chunk] = element number
    nonce = compress(cur_hash)

For the next (SEARCH_SPACE - RAINBOW_ENTRIES) elements:
  nonce = element number
  For i = 1 to CHAIN_LENGTH
    cur_hash = Hash(nonce)
    Check if any part of cur_hash is in rainbow table
      If yes and generated by different nonce: add to result list
    nonce = compress(cur_hash)

return result list
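
For the C++-inclined, here's a single-threaded transcription of the pseudocode above (my sketch, not the pastebin code: it uses one hash chunk per nonce instead of eight, stores the generating nonce directly rather than the chain's element number so no chain re-walking is needed, and substitutes a toy mixer for the real SHA512 step):

Code:
#include <boost/unordered_map.hpp>
#include <stdint.h>
#include <utility>
#include <vector>

static const uint64_t SEARCH_SPACE    = 1ULL << 26;  // MAX_MOMENTUM_NONCE
static const uint64_t CHAIN_LENGTH    = 96;
static const uint64_t RAINBOW_ENTRIES = SEARCH_SPACE / CHAIN_LENGTH;

// Toy stand-in for the real SHA512-based hash; NOT cryptographic.
uint64_t HASH(uint64_t nonce)
{
    uint64_t x = (nonce + 1) * 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 29)) * 0xBF58476D1CE4E5B9ULL;
    return (x ^ (x >> 32)) & ((1ULL << 50) - 1);  // truncate to SEARCH_SPACE_BITS
}

// Compress a hash back into the nonce range so the chain can continue.
uint64_t compress(uint64_t h) { return h % SEARCH_SPACE; }

typedef boost::unordered_map<uint64_t, uint64_t> RainbowMap;  // hash -> nonce

// Walk one chain of CHAIN_LENGTH hashes from 'start', checking every hash
// against the table; only the table-building pass inserts entries.
static void walk_chain(uint64_t start, RainbowMap& rainbow, bool build,
                       std::vector<std::pair<uint64_t, uint64_t> >& results)
{
    uint64_t nonce = start;
    for (uint64_t i = 0; i < CHAIN_LENGTH; ++i) {
        uint64_t h = HASH(nonce);
        RainbowMap::const_iterator it = rainbow.find(h);
        if (it != rainbow.end() && it->second != nonce)
            results.push_back(std::make_pair(it->second, nonce));  // birthday collision
        if (build)
            rainbow[h] = nonce;
        nonce = compress(h);
    }
}

std::vector<std::pair<uint64_t, uint64_t> > find_collisions()
{
    RainbowMap rainbow;
    std::vector<std::pair<uint64_t, uint64_t> > results;
    for (uint64_t s = 0; s < RAINBOW_ENTRIES; ++s)             // build the table
        walk_chain(s, rainbow, true, results);
    for (uint64_t s = RAINBOW_ENTRIES; s < SEARCH_SPACE; ++s)  // scan the rest
        walk_chain(s, rainbow, false, results);
    return results;
}

Phase 1 both checks and fills the table; phase 2 only checks, and that asymmetry is exactly where the CHAIN_LENGTH-for-memory trade comes from.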


// MATH SECTION /////
Given:
CALLS(CHAIN_LENGTH) = total calls to SHA512 using a rainbow table with chain length CHAIN_LENGTH
CALLS(CHAIN_LENGTH) = (SEARCH_SPACE / CHAIN_LENGTH) + (SEARCH_SPACE * CHAIN_LENGTH) - (SEARCH_SPACE / CHAIN_LENGTH)
  + (SEARCH_SPACE / CHAIN_LENGTH) = Calls to generate rainbow table
  + (SEARCH_SPACE * CHAIN_LENGTH) = Calls to check SEARCH_SPACE nonce values
  - (SEARCH_SPACE / CHAIN_LENGTH) = Except we skip the nonces used to make the rainbow table
CALLS(CHAIN_LENGTH) = (SEARCH_SPACE * CHAIN_LENGTH)

Given:
MEMORY(CHAIN_LENGTH) = number of bytes required to store a rainbow table representing
                         SEARCH_SPACE elements with chain length CHAIN_LENGTH
MEMORY(CHAIN_LENGTH) = rainbow table elements * (size of hash chunk + size of nonce)
MEMORY(CHAIN_LENGTH) = (8 + 4) * SEARCH_SPACE / CHAIN_LENGTH

Examples:
Minimum number of SHA512 calls is CHAIN_LENGTH = 1 (equivalent to the "default" semiordered map setup)
  Requires (SEARCH_SPACE * 12) = 768 MB memory
  Requires (SEARCH_SPACE) calls to SHA512
For CHAIN_LENGTH = N
  Requires (SEARCH_SPACE * 12 / N) bytes of memory
  Requires (SEARCH_SPACE * N) calls to SHA512
For CHAIN_LENGTH = 1024
  Requires (2^26 * 12 / 1024) = 768 KB of memory
  Requires (2^26 * 1024) calls to SHA512

Bottom line:
Using a rainbow table map of chain length N,
  memory usage is multiplied by a factor of 1/N
  and SHA512 calls are multiplied by a factor of N
This means that regardless of the CHAIN_LENGTH selected,
  CALLS(CHAIN_LENGTH) and MEMORY(CHAIN_LENGTH) are inversely
  proportional, so their product is constant.
EFFICIENCY IS CONSTANT AT ANY MEMORY TARGET.
All that matters is brute computational strength.

Example:
If a GPU/ASIC/FPGA has PROCESSOR number of parallel computing units,
  and MEM_MAX bytes of available memory, then each
  Momentum hash instance can use up to (MEM_MAX / PROCESSOR) bytes of memory.
To calculate the CHAIN_LEN to achieve this:
  CHAIN_LEN(MEM_MAX, PROCESSOR) = (8 + 4) * SEARCH_SPACE / (MEM_MAX / PROCESSOR)
  CHAIN_LEN(MEM_MAX, PROCESSOR) = (8 + 4) * SEARCH_SPACE * PROCESSOR / MEM_MAX
  Note: This calculation is just the inverse of MEMORY(CHAIN_LEN)

Assuming: GPU with 128 compute units, 1 GB memory, and SEARCH_SPACE = MAX_MOMENTUM_NONCE
  CHAIN_LEN(2^30, 2^7) = (8 + 4) * SEARCH_SPACE * PROCESSOR / MEM_MAX
  CHAIN_LEN(2^30, 2^7) = (8 + 4) * SEARCH_SPACE * 2^7 / 2^30
  CHAIN_LEN(2^30, 2^7) = 12 * SEARCH_SPACE / 2^23
  CHAIN_LEN(2^30, 2^7) = 12 * 2^26 / 2^23
  CHAIN_LEN(2^30, 2^7) = 12 * 2^3
  CHAIN_LEN(2^30, 2^7) = 96
  Verification:
  MEMORY(CHAIN_LENGTH) = (8 + 4) * SEARCH_SPACE / CHAIN_LENGTH
  MEMORY(96) = (8 + 4) * 2^26 / 96
  MEMORY(96) = 8388608 bytes
  MEMORY(96) = 8 MB
  CALLS(CHAIN_LENGTH) = (SEARCH_SPACE * CHAIN_LENGTH)
Result: 1/96th the normal memory usage of 768 MB, 96x the SHA512 calls.
  The problem is no longer about memory, it's about computation speed.



I've actually got a somewhat working proof of concept, although it's quite sub-optimal.  Right now the main issue is using the overkill boost::unordered_map for storage instead of a more bespoke memory structure.  The boost implementation periodically decides it needs to redo the bucket arrangement, resulting in slowdowns.

It also has an unholy number of debug statements.
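
One band-aid I'm considering for the rehashing (an idea, not something in the pastebin code): the final entry count is known up front, so the bucket allocation can be paid for once at startup:

Code:
#include <boost/unordered_map.hpp>
#include <stdint.h>
#include <stddef.h>

// Pre-size the table so boost never rearranges buckets mid-search.
// expected_entries would be RAINBOW_ENTRIES (SEARCH_SPACE / CHAIN_LENGTH).
void presize(boost::unordered_map<uint64_t, uint64_t>& table,
             size_t expected_entries)
{
    table.max_load_factor(0.9f);  // let buckets fill further before any rehash
    // Request enough buckets to hold every entry under that load factor.
    table.rehash(static_cast<size_t>(expected_entries / 0.9f) + 1);
}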

If you'd like to check on my progress, the source is right here.  It's a drop-in replacement for "momentum.cpp".



Edit: There are other optimizations possible when using the time/memory tradeoff.  For example, if we normally took 1 minute per nonce-space search, cranking the time/memory ratio up to 16 means we'd expect 16 minutes per search, which is considerably longer than the target time per block.  To compensate, we can have the algorithm return immediately on the first collision discovery.  When a new block is detected, all the miner threads should be stopped and recreated.

25
General Discussion / Re: $5000 Bounty - Momentum Proof-of-Work Hacking
« on: November 08, 2013, 05:28:01 pm »
It looks like there's a major flaw with birthday collisions as a proof of work:

For any hash algorithm with an L-bit output, a collision can be found in 2^(L/2) + 1 computations and a constant amount of memory using a cycle-detection algorithm.  Given that Momentum uses a 50-bit hash, a collision can be found in 2^25 + 1 steps, which 1) is, ironically, less than MAX_MOMENTUM_NONCE, and 2) relies almost exclusively on SHA512 calls, so it can be made stupid parallel.
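
For reference, the textbook version of that attack is Floyd's cycle-finding applied to iterating the hash on its own output. A sketch (the toy H() below stands in for a hash that maps the 50-bit space to itself; getting Momentum's hash into that form is the catch, as discussed above):

Code:
#include <stdint.h>
#include <utility>

// Toy 50-bit mixer standing in for the hash; NOT cryptographic.
uint64_t H(uint64_t x)
{
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return (x ^ (x >> 31)) & ((1ULL << 50) - 1);
}

// Constant-memory collision search: expected ~2^(L/2) calls to H for an
// L-bit hash, using only a handful of 64-bit variables.
std::pair<uint64_t, uint64_t> find_collision(uint64_t seed)
{
    // Phase 1: find a meeting point inside the cycle of x, H(x), H(H(x)), ...
    uint64_t tortoise = H(seed);
    uint64_t hare     = H(H(seed));
    while (tortoise != hare) {
        tortoise = H(tortoise);
        hare     = H(H(hare));
    }

    // Phase 2: restart from the seed; both pointers now meet at the start of
    // the cycle, and their predecessors are two distinct inputs with equal
    // hashes. (A seed that already lies on the cycle gives prev_t == prev_h;
    // just retry with a different seed.)
    uint64_t prev_t = seed, prev_h = hare;
    tortoise = seed;
    while (tortoise != hare) {
        prev_t   = tortoise;
        prev_h   = hare;
        tortoise = H(tortoise);
        hare     = H(hare);
    }
    return std::make_pair(prev_t, prev_h);  // H(prev_t) == H(prev_h)
}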

In fact, there's a whole section of the book "Introduction to Modern Cryptography" dedicated to collision attacks.  An erratum covering that section can be found here: http://www.cs.umd.edu/~jkatz/imc/hash-erratum.pdf (it contains mathematical proofs of the claims above).

Additional discussion on the matter can be found here: http://crypto.stackexchange.com/questions/3295/how-does-a-birthday-attack-on-a-hashing-algorithm-work

The fact that the algorithm uses a constant amount of memory regardless of the hash output size means that no matter how large the MAX_MOMENTUM_NONCE or SEARCH_SPACE_BITS values are, the memory requirements are essentially zero.  This means that no matter how slow the algorithm is, it's infinitely more efficient (memory vs collision rate) than the current implementation.
Given that it requires a constant amount of memory, Momentum as a proof-of-work is essentially equivalent to scrypt/md5/sha512 on a higher difficulty.


To reinforce these claims, I'm actively working on a proof of concept that uses constant memory and almost nothing but SHA512 calls.

26
General Discussion / Re: $5000 Bounty - Momentum Proof-of-Work Hacking
« on: November 08, 2013, 04:41:45 pm »
The point of Scrypt was to make parallel operation difficult and its primary weakness is that validation time suffers as you scale it.

If you don't require the first match in the sequence of generated pseudo-random numbers, then the parallelism attack I described upthread lets GPUs blow away CPUs, assuming the pseudo-random sequence can be generated much faster than the random-memory-latency bound on the CPU, so the GPU can try hundreds of values from the same pseudo-random sequence in parallel to hide the memory latency.

If you do require it, then you lose the fast validation time.

Do you get it now, or are you going to continue to ignore me?

I expended the time to explain this in more detail:

https://bitcointalk.org/index.php?topic=325261.msg3520992#msg3520992

I think that is the last effort I will apply to this.

Even if you allow that a GPU is "leaps and bounds" faster than CPUs, you still run into major problems.

Searching with a CPU using memory lookups is O(n), whereas brute-forcing combinations with a GPU is still O(n^2).
No matter how much faster the GPU is, it's faster by a constant multiple, while the required number of calculations is effectively squared.
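
To put rough numbers on that (my own back-of-the-envelope, ignoring constant factors):

n = 2^26 nonces
GPU work / CPU work = n^2 / n = n = 2^26, roughly 6.7 * 10^7

So even a GPU that hashes 1,000x faster than a CPU would still finish the O(n^2) search about 67,000x slower than a CPU doing O(n) memory lookups.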

27
General Discussion / Re: $5000 Bounty - Momentum Proof-of-Work Hacking
« on: November 08, 2013, 01:38:19 am »
Quote
gigawatt,
     Thank you for providing the first innovative algorithm for reducing the memory requirements.  Let me attempt to post mitigating factors to your algorithm.

     From a CPU miner perspective, your reduction in memory comes at the expense of performance, and so does not break the algorithmic complexity of the algorithm.

     From a GPU perspective you have to populate a bloom filter with 2^26 results... based upon my understanding of how bloom filters operate, this would require updating a common data structure from every thread, and the resulting memory race conditions could create false negatives.  If you have to do this step sequentially, then you might as well use a CPU with memory.

     So do you have any solid algorithms that can populate a bloom filter with 2^26 results in parallel? 

There are actually a few academic papers on the topic, mainly because bloom filters have lots of applications; see the sketch after the links.

https://sites.google.com/site/venkateshnrao/projects/parallel-bloom-filter-implementation-on-gpu-using-cuda
https://sbs.cse.wustl.edu/pubs/mcbf11.pdf
http://www.gpucomputing.net/?q=node/13381
http://dl.acm.org/citation.cfm?id=1982363
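
At least part of the answer, as I understand it, is to set the filter's bits with atomic OR operations: an atomic read-modify-write can't lose a concurrent update, so parallel population can't introduce the false negatives you're describing. A minimal CPU-side sketch of the principle (my illustration, single hash function for brevity, not code from any of the papers):

Code:
#include <atomic>
#include <stdint.h>
#include <stddef.h>
#include <vector>

// Bloom filter that many threads may populate concurrently. fetch_or is an
// atomic read-modify-write, so two threads setting bits in the same word
// can never clobber each other; the race that would cause false negatives
// only exists with plain, non-atomic updates.
class ConcurrentBloom {
public:
    explicit ConcurrentBloom(size_t bits)
        : words_((bits + 63) / 64) {}  // sized once, never resized (atomics can't move)

    void insert(uint64_t h) {          // h: a precomputed hash of the item
        size_t bit = static_cast<size_t>(h % (words_.size() * 64));
        words_[bit / 64].fetch_or(1ULL << (bit % 64), std::memory_order_relaxed);
    }

    bool maybe_contains(uint64_t h) const {
        size_t bit = static_cast<size_t>(h % (words_.size() * 64));
        return (words_[bit / 64].load(std::memory_order_relaxed) >> (bit % 64)) & 1;
    }

private:
    std::vector<std::atomic<uint64_t> > words_;
};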
