Topic: Open source optimized PTS CPU miner (BETA)

Offline noobster

Re: Open source optimized PTS CPU miner (BETA)
« Reply #45 on: January 17, 2014, 08:56:33 am »
Slight update:  There's now a beta3 that tries to reduce rejects a bit

http://www.cs.cmu.edu/~dga/ptsminer/

The miner works by processing an entire block of 2^26 hashes at once, so if new work came in, it would still submit anything found in the previous block.  This could lead to excessive numbers of rejects (and thus, a disconnection).  Beta3 tries a little harder to avoid this - and the wasted work it entails - and also raises the number of rejects tolerated before reconnecting, for safety.

There are some small speed-tuning changes, but probably nothing measurably different.  I'm still seeing rates in the 450-475 cpm range on an i7-4770.  The reconnect changes I just made, plus the beta2 dev mining changes, should make it a lot easier for people to get longer-running performance measurements out of this code.

I've figured out that several of these changes should help improve performance on non-avx2 systems.  Once I get to that phase, if there's interest in beta testing a linux avx build for sandybridge/ivybridge, I can do that too.  Perhaps one optimized for Amazon's machines?  *grin*

Code: [Select]
$ ptsminer-dga-beta3-avx2-linux64.bin PkzbnN7Nkv6TcqJuNjpcLfmPqpPUphpu5W 2 sse4
ptsminer-dga-beta3-avx2-linux64.bin: error while loading shared libraries: libboost_system.so.1.53.0: cannot open shared object file: No such file or directory

I'm on Gentoo Linux with Boost 1.52 from the repository, but the binary you provided is linked against Boost 1.53. Thanks.
« Last Edit: January 17, 2014, 09:01:26 am by noobster »
BTC: 15mey7vTkkvHm4UoZgVEP4Yo3REDpH87KW
PTS: PkzbnN7Nkv6TcqJuNjpcLfmPqpPUphpu5W
drop some =)

Offline unsoindovo

« Reply #46 on: January 17, 2014, 09:14:09 am »
yea, i did that already but thanks anyway :D

and I'm getting this now:
Code: [Select]
Could not mmap hugepage, reverting to malloc: Cannot allocate memory


btw is there any way to reduce memory usage to say 512 or 768 MB per thread?

Not from the command line.  That's my next planned optimization.  I need to finish poking at some other constants to figure out how aggressive I want to be about pushing the memory.

Stay tuned.  I think I can get that into the binary by tonight.  For now, you can run on fewer threads -- you'll find that 4 is actually nearly as happy as 6, and 6 is typically happier for me than 8.

  -Dave

Ok.  I've replaced the binary at the old URL with a new build that uses about 600MB of RAM per thread.  Thanks for the feature request - I'd been meaning to implement this optimization, and it looks from here like it's giving a very pleasant speedup just from using less memory (for those who care, this helps reduce TLB misses).  I haven't run it long enough to get a stable number out of it, but it's looking like 460-475 cpm on an i7-4770.  The 4770k users should be cracking 500cpm.

  -Dave

Hi dga!!
Very good job!!!

When will there be a release for Windows OS???



Offline noobster

« Reply #47 on: January 17, 2014, 11:54:18 am »
Anyway, I did notice one thing: why reserve so many hugepages if the miner at 4 threads only uses 4 of them (512 total, 508 free):

Code: [Select]
# cat /proc/meminfo |grep -i hugepages
AnonHugePages:         0 kB
HugePages_Total:     512
HugePages_Free:      508
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

perhaps
Code: [Select]
echo 4 > /proc/sys/vm/nr_hugepages
could be enough? More than that I would consider a waste of memory, or might it use more hugepages over time?

P.S. Still getting this:
Code: [Select]
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
even after I recompiled my kernel with
Code: [Select]
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
thanks
« Last Edit: January 17, 2014, 12:00:12 pm by noobster »

Offline dga

« Reply #48 on: January 17, 2014, 11:56:58 am »
Code: [Select]
$ ptsminer-dga-beta3-avx2-linux64.bin PkzbnN7Nkv6TcqJuNjpcLfmPqpPUphpu5W 2 sse4
ptsminer-dga-beta3-avx2-linux64.bin: error while loading shared libraries: libboost_system.so.1.53.0: cannot open shared object file: No such file or directory

I'm on Gentoo Linux with Boost 1.52 from the repository, but the binary you provided is linked against Boost 1.53. Thanks.

Ahh.  Can you try:

ptsminer-dga-beta3-avx2-linux64-static.bin.gz

from that same directory and let me know if it works for you?  You'll have to gunzip it before running, obviously. :)
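
The library error quoted above can be checked before running a dynamically linked build. A minimal sketch, assuming a Linux system with `ldd` and `awk` available; the function name `missing_libs` is made up for illustration and is not part of the miner:

```shell
# Sketch: list shared libraries a binary needs that the loader cannot find.
# `ldd` prints a line like "libfoo.so.1 => not found" for each unresolved one.
# The function name is illustrative, not something shipped with the miner.

missing_libs() {
    # Reads ldd output on stdin and prints the name of each unresolved library.
    awk '/=> not found/ { print $1 }'
}

# Usage (binary name taken from the post above):
#   ldd ./ptsminer-dga-beta3-avx2-linux64.bin | missing_libs
```

If this prints nothing, every library resolved; if it prints something like libboost_system.so.1.53.0, the -static build is probably the easier route.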

Offline dga

« Reply #49 on: January 17, 2014, 12:31:53 pm »
Anyway, I did notice one thing: why reserve so many hugepages if the miner at 4 threads only uses 4 of them (512 total, 508 free):

Code: [Select]
# cat /proc/meminfo |grep -i hugepages
AnonHugePages:         0 kB
HugePages_Total:     512
HugePages_Free:      508
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

perhaps
Code: [Select]
echo 4 > /proc/sys/vm/nr_hugepages
could be enough? More than that I would consider a waste of memory, or might it use more hugepages over time?

P.S. Still getting this:
Code: [Select]
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
even after I recompiled my kernel with
Code: [Select]
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
thanks

Each hugepage is 2MB.  Each thread needs about 600MB.  I'll reduce that by another 50MB in beta4 later today, but for now, that's the math.  So you need 300 hugepages per thread.  With 6 threads, that's 1800 hugepages.

echo 2048 > /proc/sys/vm/nr_hugepages

covers 6 threads with a little headroom; go somewhat higher if you want to try more threads.

  -Dave
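
The hugepage arithmetic in the post above (about 600MB per thread, 2MB per hugepage) can be sketched as a small shell helper. The function names and default values are illustrative assumptions rather than anything shipped with the miner, and the per-thread figure will change between betas:

```shell
# Sketch: estimate how many hugepages to reserve for N miner threads.
# Figures from the post above: ~600 MB per thread, 2 MB (2048 kB) per hugepage.
# Function names and defaults are illustrative, not part of the miner.

hugepages_needed() {
    threads=$1
    mb_per_thread=${2:-600}   # assumption: per-thread footprint in MB
    page_mb=${3:-2}           # Hugepagesize from /proc/meminfo, in MB
    # Round up so a partial page still counts as a whole page.
    pages_per_thread=$(( (mb_per_thread + page_mb - 1) / page_mb ))
    echo $(( threads * pages_per_thread ))
}

hugepages_free() {
    # Reads /proc/meminfo-style text on stdin, prints the HugePages_Free count.
    awk '/^HugePages_Free:/ { print $2 }'
}

hugepages_needed 6    # prints 1800
hugepages_needed 4    # prints 1200

# Usage on Linux:
#   hugepages_free < /proc/meminfo
```

Comparing the two numbers before starting the miner shows whether the mmap is likely to fall back to malloc. Writing the result to /proc/sys/vm/nr_hugepages needs root.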

Offline dga

« Reply #50 on: January 17, 2014, 02:44:01 pm »
ok - thanks again for the feedback on this.  I've put beta4 online in the usual place:

http://www.cs.cmu.edu/~dga/ptsminer/

Along with a static build to address the gentoo library versioning issue.

I'm kind of proud of this one - it's the first of the Haswell builds that cracks 500 cpm on a non-overclocked CPU.  I haven't quite determined if 6 or 7 threads is better, but it's one of those two settings.

Delta from beta3:
  - Uses about 20MB less memory per thread
  - Further optimized sha512 computation code
  - Static build is now part of my default build chain, so we'll keep this one around.
  - Still 3% advanced-build dev fee, but I hope the extra 170 cpm you'll get over any other miner will more than compensate for that.  :-)

Offline jernau

« Reply #51 on: January 17, 2014, 03:10:50 pm »
That sounds good. Just to be clear, do we actually need a Haswell CPU to use this build?
PTS: PgiEykg2RATYwWYhFtyNRqwSxQyEApLSmW

Offline noobster

« Reply #52 on: January 17, 2014, 03:28:33 pm »
The static binary doesn't work for me either; I'm on a non-AVX CPU (1st-gen Core i3).
I get
Code: [Select]
Illegal instruction
« Last Edit: January 17, 2014, 03:52:05 pm by noobster »

Offline archit

« Reply #53 on: January 17, 2014, 05:14:11 pm »
dga, work on cudapts too please

Offline Gwynbleidd

« Reply #54 on: January 17, 2014, 05:36:32 pm »
How do I compile it?

Offline dga

« Reply #55 on: January 17, 2014, 05:53:01 pm »
The static binary doesn't work for me either; I'm on a non-AVX CPU (1st-gen Core i3).
I get
Code: [Select]
Illegal instruction

Right.  The pre-built one is *just* my advanced preview for avx2.  For other architectures, just grab the current version from the open source release and build.

  -Dave

Offline dga

« Reply #56 on: January 17, 2014, 10:57:43 pm »
Ok - there's now an advanced preview of beta4 for avx/sse in addition to the avx2 one.

http://www.cs.cmu.edu/~dga/ptsminer/

Be sure to grab the right version (beta4) and architecture (avx2 or avxsse) for your machine.  If you're not using the latest Ubuntu, grab the -static version to have a better chance of it working.

Feedback welcome.  I don't have a good set of avxsse machines to compare on, so I don't know how this one compares against yam.  Where the avx2 version is quite a bit faster, this one is still probably just in the same ballpark.  3% dev fee, but I'll cut that down to 1% if it's not beating yam by enough to make it worth paying the dev fee.  *grin*

Making headway at getting the build working better, but it's still a ghastly piece of spaghetti and not fit for pushing to the repository.

  -Dave
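
Since picking the wrong architecture ends in "Illegal instruction", a quick look at the CPU flags can help choose a download. A rough sketch, with two loud assumptions that the thread does not confirm: that the "avxsse" build runs on any CPU reporting avx or sse4_1, and that the avx2 flag alone is enough for the avx2 build. The function name `pick_build` is made up for illustration:

```shell
# Sketch: pick a build variant from the CPU flags reported in /proc/cpuinfo.
# Assumption (not confirmed in the thread): the "avxsse" build runs on any CPU
# that reports avx or sse4_1; anything older should build from source.

pick_build() {
    # $1: space-separated CPU flags. Padding with spaces makes "avx" match
    # only as a whole word, so it is not confused with "avx2".
    case " $1 " in
        *" avx2 "*)              echo "avx2" ;;
        *" avx "*|*" sse4_1 "*)  echo "avxsse" ;;
        *)                       echo "source" ;;
    esac
}

# Usage on Linux:
#   pick_build "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

A result of "source" means neither pre-built binary is expected to run, so the open source release is the fallback.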

Offline ptsrush

« Reply #57 on: January 18, 2014, 01:29:50 am »
Hi dga,

beta4 avxsse gets 315-320 cpm on my E3-1230 v2 (AVX) Gentoo box.

yam gets about 305-310 cpm.

So you're about 3% ahead :)

Offline dga

« Reply #58 on: January 18, 2014, 01:35:10 am »
Hah.  Thanks!  Not quite enough to justify that dev fee, though.  I'll see if I can make it a bit faster and earn my keep.

In the meantime, I'm just going to go ahead and admit that I have a problem.  I can't keep my toes out of optimizing the avx2 build, so I've put beta5 online.  This one is even more annoyingly architecture-specific, so ONLY haswell / avx2 people should even bother with it.  I just put the static build online because I'm still working out more kinks I introduced into the build process for optimization.  It's getting about 530 cpm on my stock i7-4770.

  -Dave
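
The "not quite enough to justify that dev fee" remark comes down to simple arithmetic: a 3% fee on a roughly 3% speed lead nets out about even. A small sketch using awk for the floating-point math; the function name is illustrative, the rates are the midpoints of the ranges ptsrush reported, and any fee charged by the other miner is not accounted for because the thread does not state one:

```shell
# Sketch: effective rate after a dev fee.
# net = raw_cpm * (1 - fee_percent / 100)
# The function name is illustrative; input rates come from the posts above.

net_cpm() {
    awk -v cpm="$1" -v fee="$2" 'BEGIN { printf "%.1f\n", cpm * (1 - fee / 100) }'
}

net_cpm 317.5 3   # midpoint of 315-320 cpm with the 3% fee: prints 308.0
net_cpm 317.5 1   # same rate with the proposed 1% fee: prints 314.3
```

Against the other miner's midpoint of 307.5 cpm, the 3% fee leaves essentially no net gain on this machine, while a 1% fee would leave about 2%.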

Offline dga

« Reply #59 on: January 18, 2014, 01:47:56 am »
[email protected]:/home/proto/ptsminer/src# make -f makefile.unix.no-chrono

g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread
obj/sha512.o: In function `Init_SHA512_avx2':
sha512.c:(.text+0x27): undefined reference to `sha512_transform_rorx'
sha512.c:(.text+0x32): undefined reference to `sha512_transform_single_rorx'
collect2: ld returned 1 exit status
make: *** [ptsminer] Error 1

What do I need to do? Thanks in advance

Ah - I haven't updated the no-chrono makefile

You can run by hand:

gcc -c intel/sha512_avx2.S -O3 -o obj/sha512_avx2.o

g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_avx2.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread

Or try grabbing one of the newer, faster binary builds.

I'll patch up that makefile pretty soon.  Thanks for letting me know - I wasn't sure if anyone wanted to use it.

  -Dave