Topic: Open source optimized PTS CPU miner (BETA)


dga
Re: Open source optimized PTS CPU miner (BETA)
« Reply #116 on: February 03, 2014, 01:04:12 am »
Quote
Well you know this stuff better than most, so come up with something solid and we will include it.

Sounds good.

Here's what I propose.  Perhaps surprisingly, it's taken way more work to do the fast CPU implementation of PTS than the basic GPU implementation.  Instead of going by my consulting rates (grin), I'll admit that I did it for fun, too, and judge it to be about 450 PTS worth of work based upon the previous rates you were offering, plus about 50 PTS more work to actually manage the integration into momentum.cpp, since it differs substantially from the codebase I've been developing on.

Instead of having it all in one chunk, though, I think it makes more sense to split it in half for two different deliverables to help reduce risk and get something in your hands faster:

(a)  Algorithmic improvements to mining that are completely platform-independent.  (250 PTS)
(b)  Platform-optimized implementation for sse4, avx, and avx2, delivered as GNU assembly code along with the original source code files used to generate that assembly.  (250 PTS)

Both documented, of course.

I think I can get (a) done reasonably straightforwardly.  For (b), I'll need to spend more time understanding the Makefile setup for it so that I can integrate it without breaking things.

As a nitpicky note based upon the copyright issues that arose in my previous release, just to be up front:  Like other high-performance miners, for everything but avx2, I use the Intel sha512 implementation.  Its license is compatible (redistributions must include the copyright notice).  The code I'd integrate into momentum.cpp is entirely my own at this point, and I'd simply integrate it under the existing license.

bytemaster

Re: Open source optimized PTS CPU miner (BETA)
« Reply #115 on: February 03, 2014, 12:46:45 am »
Quote
That's do-able.  It's mostly just making sure that compilation isn't a mess.  Basically, I'd modularize it as:

  generate_sha512(buf, num_hashes, starting_nonce);

And as long as there was a version of generate_sha512 that worked reasonably well, it would be fine.

The tweaks:  My first reaction is adding more branches/conditionals to cause warp divergence on a SIMD machine.  That would slow down the CPUs, too, of course, but it would be really painful for the GPUs.  I'm not sure exactly where I'd add them.  Possibly a changed sha512 core with a slightly variable or tweaked number of rounds depending on something in the input that caused divergence on a per-nonce basis.  It'd make it more evil, at least.  If you could force all 16 or 32 units in the vector to have to diverge early on and remain diverged through the end of the sha512, you'd slow the GPUs by a factor of 16 or 32 while only slowing the CPU down by 2-4x.

Well you know this stuff better than most, so come up with something solid and we will include it. 
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Coindgr
Re: Open source optimized PTS CPU miner (BETA)
« Reply #114 on: February 02, 2014, 10:57:46 pm »
Will this be released for Windows?
I hope so.

dga
Re: Open source optimized PTS CPU miner (BETA)
« Reply #113 on: February 02, 2014, 10:03:48 pm »
Quote
As long as the high-performance mode has a fallback option to low performance if the instructions are not supported.

Are there any small tweaks that you can think of that would make it harder for a gpu?

That's doable.  It's mostly just making sure that compilation isn't a mess.  Basically, I'd modularize it as:

  generate_sha512(buf, num_hashes, starting_nonce);

And as long as there was a version of generate_sha512 that worked reasonably well, it would be fine.

The tweaks:  My first reaction is adding more branches/conditionals to cause warp divergence on a SIMD machine.  That would slow down the CPUs, too, of course, but it would be really painful for the GPUs.  I'm not sure exactly where I'd add them.  Possibly a changed sha512 core with a slightly variable or tweaked number of rounds depending on something in the input that caused divergence on a per-nonce basis.  It'd make it more evil, at least.  If you could force all 16 or 32 units in the vector to have to diverge early on and remain diverged through the end of the sha512, you'd slow the GPUs by a factor of 16 or 32 while only slowing the CPU down by 2-4x.

(PTS deposits:  Pr8cnhz5eDsUegBZD4VZmGDARcKaozWbBc   )
« Last Edit: February 03, 2014, 11:12:36 pm by dga »

bytemaster

Re: Open source optimized PTS CPU miner (BETA)
« Reply #112 on: February 02, 2014, 09:56:17 pm »
As long as the high-performance mode has a fallback option to low performance if the instructions are not supported.

Are there any small tweaks that you can think of that would make it harder for a gpu?   



dga
Re: Open source optimized PTS CPU miner (BETA)
« Reply #111 on: February 02, 2014, 09:52:57 pm »
Quote
Any chance I can get the code from this miner integrated with the bitshares/src/momentum.cpp API?

I would be willing to pay a reasonable number of PTS for the work.

API:
Code:
std::vector< std::pair<uint32_t,uint32_t> > momentum_search( pow_seed_type head )
Thoughts?

Sure, I'm happy to figure out a value that works.

Let me lay out the catch a little bit:  The compilation chain is ugly because I generate a few CPU-specific chunks of code.  I can put all of that in a repository, and by outputting assembly from the first step, it could all be compilable by gcc -- or from the original source if someone installed some other compiler support tools.

There are really three major contributions that make it fast:
  - Some algorithmic changes that make the memory-hard parts faster;
  - A re-implementation of the sha512 code for AVX2;
  - An AVX/SSE implementation of other high-performance parts of the code.

The algorithmic changes are easy and will make any codebase faster and use less memory.  The nitty gritty implementation bits start to get architecture specific.  But I'm happy to include them.

The only drawback from my perspective is that the AVX2 SHA512 changes are also very pertinent to making Memorycoin faster, and I haven't yet started writing a miner for that one.  *grins*  But I'm willing to be scooped.

Same license as the original momentum is fine.

bytemaster

Re: Open source optimized PTS CPU miner (BETA)
« Reply #110 on: February 02, 2014, 05:15:42 pm »
Any chance I can get the code from this miner integrated with the bitshares/src/momentum.cpp API?

I would be willing to pay a reasonable number of PTS for the work. 

API:   
Code:
std::vector< std::pair<uint32_t,uint32_t> > momentum_search( pow_seed_type head )
Thoughts?

ptsrush
Re: Open source optimized PTS CPU miner (BETA)
« Reply #109 on: February 02, 2014, 04:11:08 am »
It ran at 599.6 c/m for a long time and I was really starting to panic.

Thank god, now cpm: 600.5.

[STATS] 2014-Feb-02 12:08:19 | 600.5 c/m | 9.3 sh/m | VL: 1885 (99.5%), RJ: 9 (0.5%), ST: 0 (0.0%)

ptsrush
Re: Open source optimized PTS CPU miner (BETA)
« Reply #108 on: February 02, 2014, 02:40:51 am »
Haswell E3-1230 v3, upgraded to avx2 beta9: cpm 595

[STATS] 2014-Feb-02 10:35:29 | 595.1 c/m | 8.8 sh/m | VL: 1004 (99.5%), RJ: 5 (0.5%), ST: 0 (0.0%)

bytemaster

Re: Open source optimized PTS CPU miner (BETA)
« Reply #107 on: February 02, 2014, 12:03:20 am »
Quote
The haswell/AVX2 release is very solid and beats low-end GPUs:  It's sitting just above 600 c/m.  A cheap GPU (GT 640 GDDR5 -- $85) can get about 250 cpm.  The fastest ($600-$1000) get around 2000-2200cpm.  The GPUs are still ahead in cpm/$, but not by a shocking margin.  Haswell is 610cpm for $300, or about 2cpm/$.  An R9 290x is 2200cpm/$610 = 3.6cpm/$.

Considering you can build a high end CPU miner for less than the cost of a high end GPU miner I would have to contend that momentum has served its intended goals quite well. 

dga
Re: Open source optimized PTS CPU miner (BETA)
« Reply #106 on: February 01, 2014, 09:57:11 pm »
Well, I'll be.  I guess we're entering the CPU mess zone.  (Deleted old post)

Solved, thanks to some help from mikaelh_ on #beeeeer. 

There's now only one binary, but on AMD, run with sse4 explicitly:

./ptsminer...   <addr>  <threads>  sse4

You'll be much happier than with avx.  For Intel, auto-detect works, and avx is better.
« Last Edit: February 01, 2014, 10:30:30 pm by dga »

dga
Re: Open source optimized PTS CPU miner (BETA)
« Reply #105 on: February 01, 2014, 09:40:24 pm »
Quote
beta9 for AVX2 is now online in the usual place:  http://www.cs.cmu.edu/~dga/ptsminer/

beta9 for AVX is also now online.  This one should be a good speed boost - I'm seeing my test machine go from about 780cpm to 1020cpm.

Note:  Unlike prior avxsse releases, this avx release really does require AVX.  It's compiled to target sandy bridge and higher.  I've changed the name of the binary to reflect this, and left the old avxsse one (which will run on sse4) online.

Direct link:  http://www.cs.cmu.edu/~dga/ptsminer/ptsminer-dga-beta9-avx-linux64-static.bin

Happy mining!

Nice... how does this compare to the latest GPU mining?

I think I broke something.  This one is a lot better on my AMD test CPU and absolutely horrible on my Intel CPUs.  Back to the drawing board.  Beta8 is the one to stick with for Intel. (update:  beta9 is now working properly for Intel)

The haswell/AVX2 release is very solid and beats low-end GPUs:  It's sitting just above 600 c/m.  A cheap GPU (GT 640 GDDR5 -- $85) can get about 250 cpm.  The fastest ($600-$1000) get around 2000-2200cpm.  The GPUs are still ahead in cpm/$, but not by a shocking margin.  Haswell is 610cpm for $300, or about 2cpm/$.  An R9 290x is 2200cpm/$610 = 3.6cpm/$.
« Last Edit: February 02, 2014, 01:05:37 am by dga »

bytemaster

Re: Open source optimized PTS CPU miner (BETA)
« Reply #104 on: February 01, 2014, 09:12:13 pm »
Quote
beta9 for AVX2 is now online in the usual place:  http://www.cs.cmu.edu/~dga/ptsminer/

beta9 for AVX is also now online.  This one should be a good speed boost - I'm seeing my test machine go from about 780cpm to 1020cpm.

Note:  Unlike prior avxsse releases, this avx release really does require AVX.  It's compiled to target sandy bridge and higher.  I've changed the name of the binary to reflect this, and left the old avxsse one (which will run on sse4) online.

Direct link:  http://www.cs.cmu.edu/~dga/ptsminer/ptsminer-dga-beta9-avx-linux64-static.bin

Happy mining!

Nice... how does this compare to the latest GPU mining?

dga
Re: Open source optimized PTS CPU miner (BETA)
« Reply #103 on: February 01, 2014, 08:24:32 pm »
beta9 for AVX2 is now online in the usual place:  http://www.cs.cmu.edu/~dga/ptsminer/

beta9 for AVX is also now online.  This one should be a good speed boost - I'm seeing my test machine go from about 780cpm to 1020cpm.

Note:  Unlike prior avxsse releases, this avx release really does require AVX.  It's compiled to target Sandy Bridge and higher.  I've changed the name of the binary to reflect this, and left the old avxsse one (which will run on sse4) online.

Direct link:  http://www.cs.cmu.edu/~dga/ptsminer/ptsminer-dga-beta9-avx-linux64-static.bin

Happy mining!

Update:  This one is producing very mixed results.  Try beta8 and beta9 and use whichever is better for you.  Beta9 is rocking on my AMD test CPU, but it seems slower on some others.  Definitely needs improvement still.
« Last Edit: February 01, 2014, 09:26:39 pm by dga »

dga
Re: Open source optimized PTS CPU miner (BETA)
« Reply #102 on: February 01, 2014, 07:19:38 pm »
beta9 for AVX2 is now online in the usual place:  http://www.cs.cmu.edu/~dga/ptsminer/

This is a speed-boost release.  I'm still doing the benchmarking runs, but on my i7-4770, it's the first of my releases to crack 600 cpm.  Looks like it's going to settle in between 610 and 620 cpm with 7 threads running on my test box.

beta9 is Haswell-only right now;  its optimizations are specific to avx2.  I plan to address some of the portability/pool selection issues soon (because I'm running out of great ideas for how to make this thing faster without getting ugly).