Topic: Open source optimized PTS CPU miner (BETA)

barwizi

  • Hero Member
  • Posts: 764
  • Noirbits, NoirShares, NoirEx.....lol, noir anyone?
    • Noirbitstalk.org
Re: Open source optimized PTS CPU miner (BETA)
« Reply #131 on: February 19, 2014, 10:55:19 pm »
Can this be adapted to work with the client?
--Bar--  PiNEJGUv4AZVZkLuF6hV4xwbYTRp5etWWJ

The magical land of crypto, no freebies people.

ptsrush

  • Full Member
  • Posts: 84
Re: Open source optimized PTS CPU miner (BETA)
« Reply #130 on: February 10, 2014, 09:49:06 pm »
On my Haswell E3-1230 v3, yam-M7m does about 640 cpm,
and ptsminer-avx2-beta10 is almost the same.

At present I prefer yam-M7m because it:

1. is as fast as ptsminer;
2. supports Windows, so I can run the same software/config on
   all my computers;
3. supports more protocols, which avoids pool lock-in;
4. has only a 1% dev fee;
5. lets me set up a backup pool/coin.

dga

  • Full Member
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #129 on: February 10, 2014, 04:50:46 pm »
I have released beta10 for avx2 with others to follow.  Note that because beeeeer has increased the difficulty target, share/min numbers are now lower than they used to be, so comparing collisions per minute (c/m) is probably the most useful metric.

http://www.cs.cmu.edu/~dga/ptsminer/

This one's getting in the 650-660 c/m range when run on 8 threads on an i7-4770.  (Update: not a K, sorry, just the normal 4770.)  Note that it's now fastest to run 8 threads, not 7, though it kinda destroys interactive use of your computer. :-)  @ptsrush, this one should keep you very happily over 600.  I haven't finished benchmarking it yet.  It's faster based on internal metrics, but it'll take a bit to see how that shakes out in cpm.

[STATS] 2014-Feb-10 10:23:54 | 657.7 c/m | 2.8 sh/m | VL: 43 (100.0%), RJ: 0 (0.0%), ST: 0 (0.0%)

Not bad, little CPU, not bad.
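The point about c/m versus sh/m can be made concrete. Assuming, for illustration only, that every collision is checked against the pool's share target and passes with some fixed probability p, sh/m is simply c/m × p. Raising the difficulty target lowers p, and therefore sh/m, without saying anything about how fast the miner is. The sketch below backs p out of the [STATS] line above; the function names are hypothetical.

```python
def implied_share_probability(collisions_per_min: float, shares_per_min: float) -> float:
    """Fraction of collisions that met the share target, assuming each
    collision is tested independently against a fixed target."""
    if collisions_per_min <= 0:
        raise ValueError("collisions_per_min must be positive")
    return shares_per_min / collisions_per_min


def expected_shares_per_min(collisions_per_min: float, p_share: float) -> float:
    """Expected sh/m for a given c/m under the same assumption."""
    return collisions_per_min * p_share


# Numbers taken from the [STATS] line: 657.7 c/m, 2.8 sh/m.
p = implied_share_probability(657.7, 2.8)
print(f"implied share probability per collision: {p:.5f}")
# implied share probability per collision: 0.00426

# If the pool doubled its difficulty target, p would roughly halve (assumption),
# so sh/m halves even though the miner's c/m has not changed at all.
print(f"sh/m at twice the difficulty: {expected_shares_per_min(657.7, p / 2):.2f}")
# sh/m at twice the difficulty: 1.40
```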

yvg1900

  • Full Member
  • Posts: 198
  • BitShares: yvg1900
Re: Open source optimized PTS CPU miner (BETA)
« Reply #128 on: February 10, 2014, 04:50:05 pm »
Quote from: dga on February 10, 2014, 03:13:15 pm

I'm curious to compare it against yam M7m on the same config. That one should hit 700+ cpm on that machine.
Follow @yvg1900 on Twitter for yam miner updates and support

bytemaster

Re: Open source optimized PTS CPU miner (BETA)
« Reply #127 on: February 10, 2014, 04:13:46 pm »
Quote from: dga on February 10, 2014, 03:13:15 pm

Nice... not bad momentum POW, not bad...  looks like CPU mining of PTS will remain viable for a long time to come.
For the latest updates check out my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

dga

  • Full Member
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #126 on: February 10, 2014, 03:13:15 pm »
I have released beta10 for avx2 with others to follow.  Note that because beeeeer has increased the difficulty target, share/min numbers are now lower than they used to be, so comparing collisions per minute (c/m) is probably the most useful metric.

http://www.cs.cmu.edu/~dga/ptsminer/

This one's getting in the 650-660 range when run on 8 threads on an i7-4770k.  Note that it's now fastest to run 8 threads, not 7, though it kinda destroys interactive use of your computer. :-)  @ptsrush, this one should keep you very happily over 600.  I haven't finished benchmarking it yet.  It's faster based upon internal metrics, but it'll take a bit to see how it shakes out in cpm.

[STATS] 2014-Feb-10 10:23:54 | 657.7 c/m | 2.8 sh/m | VL: 43 (100.0%), RJ: 0 (0.0%), ST: 0 (0.0%)

Not bad, little CPU, not bad.
« Last Edit: February 10, 2014, 03:24:36 pm by dga »

dga

  • Full Member
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #125 on: February 04, 2014, 07:06:26 am »
Quote from: bytemaster on February 04, 2014, 03:37:18 am

When I pulled this change in it stopped finding matches...

Whoops - thanks, I'd misunderstood test_momentum_pow.

I've fixed it in a second pull request.  It was a missing enc.reset().
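A missing reset is an easy bug class to reproduce. The sketch below is a Python analogue, not the actual C++ encoder code from the repository: it uses hashlib in place of the encoder and assumes a nonce-then-header byte layout. Reusing one SHA-512 object across nonces makes every digest after the first also cover all earlier inputs, so the hashes, and therefore the collisions, come out wrong.

```python
import hashlib
import struct


def hash_nonces_without_reset(header: bytes, nonces):
    """Buggy: one SHA-512 object is reused, so each digest after the first
    covers every earlier nonce as well as the current one."""
    enc = hashlib.sha512()
    digests = []
    for n in nonces:
        enc.update(struct.pack("<I", n) + header)
        digests.append(enc.digest())  # digest() does not clear the running state
    return digests


def hash_nonces_with_reset(header: bytes, nonces):
    """Fixed: each nonce starts from a fresh SHA-512 state, the Python
    equivalent of resetting the encoder between hashes."""
    digests = []
    for n in nonces:
        enc = hashlib.sha512()
        enc.update(struct.pack("<I", n) + header)
        digests.append(enc.digest())
    return digests


header = bytes(32)  # placeholder 256-bit header hash (hypothetical input)
good = hash_nonces_with_reset(header, [0, 1, 2])
bad = hash_nonces_without_reset(header, [0, 1, 2])
print(good[0] == bad[0], good[1] == bad[1])  # True False
```

Only the first digest happens to match, which is why this kind of bug can look like "it runs fine but never finds matches".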

Interestingly, you'll find that my version now finds a few more collisions than the original code did, which should produce a further speed-up.  These extra collisions all verify.

Old:

3522368ms th_a       momentum_test.cpp:29          main                 ] [[25908781,36251059],[36251059,25908781],[14409167,49012845],[49012845,14409167],[32190345,58604277],[58604277,32190345],[11166445,59732725],[59732725,11166445],[41830614,64427554],[64427554,41830614]]

   User time (seconds): 5.09

New:

/usr/bin/time --verbose ./tests/momentum_pow_test  5959592
98735ms th_a       momentum_test.cpp:29          main                 ] [[29995035,64113291],[64113291,29995035],[41830614,64427554],[64427554,41830614],[32190345,58604277],[58604277,32190345],[11166445,59732725],[59732725,11166445],[14409167,49012845],[49012845,14409167],[25908781,36251059],[36251059,25908781]]

   User time (seconds): 3.43
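Checking that a reported pair really is a collision just means recomputing both hashes and comparing their prefixes; the check is symmetric, which is consistent with each pair appearing in both orders in the output above. This is a simplified model, not the exact PTS verification code: the 50-bit prefix and 2^26 nonce bound are assumed Momentum parameters, and the byte layout (little-endian nonce followed by the header) and use of only the first 64-bit word of the digest are assumptions.

```python
import hashlib
import struct

SEARCH_SPACE_BITS = 50   # prefix bits that must match (assumed Momentum parameter)
MAX_NONCE = 1 << 26      # exclusive nonce bound (assumed Momentum parameter)


def hash_prefix(header: bytes, nonce: int, bits: int = SEARCH_SPACE_BITS) -> int:
    """Top `bits` bits of the first 64-bit word of SHA-512(nonce || header).
    The byte layout here is a simplification, not the PTS wire format."""
    digest = hashlib.sha512(struct.pack("<I", nonce) + header).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


def verify_collision(header: bytes, a: int, b: int, bits: int = SEARCH_SPACE_BITS) -> bool:
    """True if a and b are distinct, in range, and their hash prefixes match."""
    if a == b or not (0 <= a < MAX_NONCE and 0 <= b < MAX_NONCE):
        return False
    return hash_prefix(header, a, bits) == hash_prefix(header, b, bits)
```

Finding real 50-bit collisions in pure Python is impractical, so trying this out means passing a much smaller `bits` (e.g. 12) and brute-forcing a few thousand nonces.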

Sorry for the double-try on that one.

bytemaster

Re: Open source optimized PTS CPU miner (BETA)
« Reply #124 on: February 04, 2014, 03:37:18 am »
Quote from: dga on February 04, 2014, 01:54:57 am

When I pulled this change in it stopped finding matches...

dga

  • Full Member
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #123 on: February 04, 2014, 01:54:57 am »
yes

Original source, looking at time spent in momentum_pow_test:
   User time (seconds): 5.24
   Maximum resident set size (kbytes): 1050652
       43.84%  momentum_pow_te  libcrypto.so.1.0.0  [.] 0x000000000006c729

With patch 1 for algorithmic changes:
   User time (seconds): 4.38
   Maximum resident set size (kbytes): 528452
       71.51%  momentum_pow_te  libcrypto.so.1.0.0   [.] 0x000000000006d29f

The reason it isn't faster still is that it's spending a little more time on allocation in the sha512 routine, which is being used differently here than in PTS.  I'll clean that up as part of the sha512 optimizations in chunk-of-work #2.  That part is straightforward engineering.

Probably the biggest benefit to the current version is that, as shown above, it uses half the memory and is about 20% faster with no changes to the crypto or any other libraries.
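The headline numbers follow directly from the /usr/bin/time figures in this post. Note that "about 20% faster" is a throughput ratio; expressed as user time saved, it is about 16%.

```python
# Figures copied from the /usr/bin/time output in this post.
before_user_s, before_rss_kb = 5.24, 1050652
after_user_s, after_rss_kb = 4.38, 528452

speedup = before_user_s / after_user_s        # throughput ratio (>1 is faster)
time_saved = 1 - after_user_s / before_user_s  # fraction of user time removed
memory_ratio = after_rss_kb / before_rss_kb    # fraction of peak RSS remaining

print(f"speedup: {speedup:.2f}x, user time saved: {time_saved:.1%}, "
      f"memory: {memory_ratio:.1%} of original")
# speedup: 1.20x, user time saved: 16.4%, memory: 50.3% of original
```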

Sending pull request now.

dga

  • Full Member
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #122 on: February 04, 2014, 12:19:21 am »
yes

Cool.  I'll need to change a little bit of the code to match the style, but should have it done soon.

bytemaster

Re: Open source optimized PTS CPU miner (BETA)
« Reply #121 on: February 04, 2014, 12:14:57 am »
yes

dga

  • Full Member
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #120 on: February 04, 2014, 12:06:29 am »
Quote from: bytemaster on February 04, 2014, 12:03:48 am

There seems to have been a misunderstanding :)   I was looking for an update to the BitShares repository for the same method.

Repository URL?
Update: Ah, you mean this one?

https://github.com/InvictusInnovations/bitshares

Confirm and I'll get the patch done.  Should be straightforward.
« Last Edit: February 04, 2014, 12:08:58 am by dga »

bytemaster

Re: Open source optimized PTS CPU miner (BETA)
« Reply #119 on: February 04, 2014, 12:03:48 am »
Quote from: dga on February 03, 2014, 10:44:21 pm

There seems to have been a misunderstanding :)   I was looking for an update to the BitShares repository for the same method.

dga

  • Full Member
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #118 on: February 03, 2014, 10:44:21 pm »
Sounds fair enough as long as the result is a pull request that simply works with CMake.

First batch of changes are now in a pull request to you:

https://github.com/InvictusInnovations/ProtoShares/pull/8

I spent a lot of time today thinking about how to strike the best balance between performance improvement and keeping the reference code as easy as possible for people to use on any platform of their choice.  As a consequence, I've refactored some of the algorithmic changes a little to make the best use of the existing SHA512 from OpenSSL.  Tomorrow I'm going to do a set of benchmark runs to determine how much non-AVX2/Haswell platforms benefit from more architecture-specific code.  If the results don't justify complicating the build, I'll put the AVX2 changes in a small module that people can integrate on their own if they wish, but that won't touch anything in the build.  If they do justify it, I'll make deeper modifications.

The current changes preserve the exact interface and code structure of the existing momentum_search, per your request.  They don't touch anything outside of the mining core code.  The result is about an 8x speedup on my test platform using about 50% of the memory.  4x of that speedup and all of the memory savings come from the algorithmic improvements; the other 2x comes from testing the nonces in both directions when evaluating a collision.
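For readers wondering what the algorithmic side can look like, the sketch below shows one common memory-saving approach to the birthday search, not necessarily the one in the pull request: store only the nonce in a fixed-size table indexed by the low bits of its hash prefix, and recompute the prefix on a slot hit instead of storing it. It also emits each collision as both (a, b) and (b, a), which is one reading of "testing the nonces in both directions", assuming the proof hash depends on the order of the pair. The function names, parameters, and hash layout are illustrative assumptions.

```python
import hashlib
import struct


def hash_prefix(header: bytes, nonce: int, bits: int) -> int:
    # Simplified hash-to-prefix; the real PTS byte layout may differ (assumption).
    d = hashlib.sha512(struct.pack("<I", nonce) + header).digest()
    return int.from_bytes(d[:8], "big") >> (64 - bits)


def birthday_search_compact(header: bytes, nonce_count: int, bits: int, table_bits: int):
    """Birthday search whose table holds only nonces, indexed by the low
    `table_bits` bits of each prefix. A slot hit is only a candidate, so the
    stored nonce's prefix is recomputed to confirm a real collision."""
    empty = -1
    table = [empty] * (1 << table_bits)
    mask = (1 << table_bits) - 1
    results = []
    for n in range(nonce_count):
        p = hash_prefix(header, n, bits)
        slot = p & mask
        other = table[slot]
        if other != empty and hash_prefix(header, other, bits) == p:
            # Both orderings are separate candidate proofs (assumed order-dependent).
            results.append((other, n))
            results.append((n, other))
        table[slot] = n  # overwrite: the newest nonce always takes the slot
    return results
```

Overwriting a slot on every insert trades some missed collisions for a table with no keys and no chaining; in C, keeping a 4-byte nonce per slot instead of a key/value pair is one place a memory halving could come from. The 8x in the post composes multiplicatively: roughly 4x faster per search times 2x candidate proofs per collision.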

I put some performance evaluation numbers in the pull request, but to briefly summarize, before the changes, each thread was taking about 28-28 seconds to do one execution of momentum_search.  After the changes, they take 7-8.

Before:
 83.10%  bitcoind  bitcoind                   [.] bts::momentum_search(uint256)
 12.95%  bitcoind  libcrypto.so.1.0.0         [.] 0x000000000006d764

Only 13% of the time was being spent in computing SHA512 hashes.  After:

 70.36%  bitcoind  libcrypto.so.1.0.0         [.] 0x000000000006cece

(update, forgot to give my PTS address:   Pr8cnhz5eDsUegBZD4VZmGDARcKaozWbBc   )

   -Dave

[Update 2:  As another way to view the stats, here is what a quad-core i7-4770 is doing:

dga@homewell:~/coin/ProtoShares/src$ ./bitcoind getmininginfo
{
    "blocks" : 47838,
    "currentblocksize" : 5063,
    "currentblocktx" : 18,
    "difficulty" : 0.01374487,
    "errors" : "",
    "generate" : true,
    "genproclimit" : -1,
    "collisionspermin" : 240.86192739,
    "pooledtx" : 31,
    "testnet" : false
}

with no AVX2 optimizations, so this speed is probably what one might expect on an SSE or AVX platform.  Quite a bit faster than the default code.]
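To track the same figure over time without reading it off by hand, the collisionspermin field can be pulled out of the getmininginfo JSON. A minimal sketch: the ./bitcoind path comes from the command above, and the helper names are hypothetical.

```python
import json
import subprocess


def collisions_per_min(getmininginfo_json: str) -> float:
    """Extract the collisionspermin field from `bitcoind getmininginfo` output."""
    return float(json.loads(getmininginfo_json)["collisionspermin"])


def query_collisions_per_min(bitcoind: str = "./bitcoind") -> float:
    """Run the same command as above and parse its output (needs a running daemon)."""
    result = subprocess.run([bitcoind, "getmininginfo"],
                            capture_output=True, text=True, check=True)
    return collisions_per_min(result.stdout)
```

Fed the output shown above, collisions_per_min returns 240.86192739.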
« Last Edit: February 03, 2014, 11:58:58 pm by dga »

bytemaster

Re: Open source optimized PTS CPU miner (BETA)
« Reply #117 on: February 03, 2014, 03:24:06 am »
Sounds fair enough as long as the result is a pull request that simply works with CMake.