Author Topic: Open source optimized PTS CPU miner (BETA)  (Read 47683 times)

Offline dga

  • Full Member
  • ***
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #56 on: January 17, 2014, 05:53:01 pm »
The static bin does not work for me either. I am using a non-AVX CPU (1st gen. Core i3).
I get
Code:
Illegal instruction

Right.  The pre-built one is *just* my advanced preview for avx2.  For other architectures, just grab the current version from the open source release and build.
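
For anyone unsure whether their CPU can run the avx2 preview, here is a minimal sketch of a check. It assumes an x86 Linux box where /proc/cpuinfo has a "flags" line, and a grep that supports -w (GNU and BSD grep both do). The function name has_cpu_flag is made up for this example and is not part of the miner.

```shell
#!/bin/sh
# Hypothetical helper, not part of ptsminer.
# has_cpu_flag FLAG [CPUINFO_FILE]
# Succeeds if FLAG appears as a whole word on a "flags" line.
# CPUINFO_FILE defaults to /proc/cpuinfo (x86 Linux assumed).
has_cpu_flag() {
    grep -E '^flags[[:space:]]*:' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

# Report on the current machine, if /proc/cpuinfo is readable at all.
if [ -r /proc/cpuinfo ]; then
    if has_cpu_flag avx2; then
        echo "avx2: yes"
    else
        echo "avx2: no"
    fi
fi
```

If this prints "avx2: no", the "Illegal instruction" error from the pre-built binary is what you would expect, and building from the open source release is the way to go.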

  -Dave

Offline Gwynbleidd

  • Jr. Member
  • **
  • Posts: 35
Re: Open source optimized PTS CPU miner (BETA)
« Reply #55 on: January 17, 2014, 05:36:32 pm »
How do I compile it?

Offline archit

  • Full Member
  • ***
  • Posts: 161
Re: Open source optimized PTS CPU miner (BETA)
« Reply #54 on: January 17, 2014, 05:14:11 pm »
dga, work on cudapts too please

Offline noobster

  • Jr. Member
  • **
  • Posts: 35
  • cryptocurrencies vs. fed
Re: Open source optimized PTS CPU miner (BETA)
« Reply #53 on: January 17, 2014, 03:28:33 pm »
The static bin does not work for me either. I am using a non-AVX CPU (1st gen. Core i3).
I get
Code:
Illegal instruction
« Last Edit: January 17, 2014, 03:52:05 pm by noobster »
BTC: 15mey7vTkkvHm4UoZgVEP4Yo3REDpH87KW
PTS: PkzbnN7Nkv6TcqJuNjpcLfmPqpPUphpu5W
drop some =)

Offline jernau

  • Full Member
  • ***
  • Posts: 78
Re: Open source optimized PTS CPU miner (BETA)
« Reply #52 on: January 17, 2014, 03:10:50 pm »
ok - thanks again for the feedback on this.  I've put beta4 online in the usual place:

http://www.cs.cmu.edu/~dga/ptsminer/

Along with a static build to address the gentoo library versioning issue.

I'm kind of proud of this one - it's the first of the Haswell builds that cracks 500 cpm on a non-overclocked CPU.  I haven't quite determined if 6 or 7 threads is better, but it's one of those two settings.

Delta from beta3:
  - Uses about 20MB less memory per thread
  - Further optimized sha512 computation code
  - Static build is now part of my default build chain, so we'll keep this one around.
  - Still 3% advanced-build dev fee, but I hope that the extra 170 cpm you'll get over any other miner should more than compensate for that.  :-)

That sounds good. Just to be clear, do we actually need a Haswell CPU to use this build?
PTS: PgiEykg2RATYwWYhFtyNRqwSxQyEApLSmW

Offline dga

  • Full Member
  • ***
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #51 on: January 17, 2014, 02:44:01 pm »
ok - thanks again for the feedback on this.  I've put beta4 online in the usual place:

http://www.cs.cmu.edu/~dga/ptsminer/

Along with a static build to address the gentoo library versioning issue.

I'm kind of proud of this one - it's the first of the Haswell builds that cracks 500 cpm on a non-overclocked CPU.  I haven't quite determined if 6 or 7 threads is better, but it's one of those two settings.

Delta from beta3:
  - Uses about 20MB less memory per thread
  - Further optimized sha512 computation code
  - Static build is now part of my default build chain, so we'll keep this one around.
  - Still 3% advanced-build dev fee, but I hope that the extra 170 cpm you'll get over any other miner should more than compensate for that.  :-)

Offline dga

  • Full Member
  • ***
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #50 on: January 17, 2014, 12:31:53 pm »
Anyway, I did notice one thing: why use so many hugepages if the miner at 4 threads only uses 4 hugepages?

Code:
# cat /proc/meminfo |grep -i hugepages
AnonHugePages:         0 kB
HugePages_Total:     512
HugePages_Free:      508
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Perhaps
Code:
echo 4 > /proc/sys/vm/nr_hugepages
could be enough? More than that I would consider a waste of memory. Or might it use more hugepages over time?

PS: I'm still getting this:
Code:
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
even after I recompiled my kernel with
Code:
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
Thanks

Each hugepage is 2MB.  Each thread needs about 600MB.  I'll reduce that by another 50MB in beta4 later today, but for now, that's the math.  So you need 300 hugepages per thread.  With 6 threads, that's 1800 hugepages.

echo 2048 > /proc/sys/vm/nr_hugepages

for 6 threads, or something a little higher if you want to try more threads.
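
The arithmetic above can be sketched as a couple of small shell helpers. This is only an illustration using the numbers quoted in this post (about 600MB per thread, 2MB hugepages). The function names hugepages_needed and hugepages_total are made up for this example and are not part of the miner.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ptsminer.

# hugepages_needed THREADS MB_PER_THREAD HUGEPAGE_MB
# Prints how many hugepages THREADS threads need, rounding up.
hugepages_needed() {
    threads=$1
    mb_per_thread=$2
    hugepage_mb=$3
    total_mb=$((threads * mb_per_thread))
    echo $(( (total_mb + hugepage_mb - 1) / hugepage_mb ))
}

# hugepages_total [MEMINFO_FILE]
# Prints the HugePages_Total value from a meminfo-style file
# (defaults to /proc/meminfo).
hugepages_total() {
    awk '/^HugePages_Total:/ { print $2 }' "${1:-/proc/meminfo}"
}

# 6 threads at about 600MB each with 2MB hugepages: prints 1800
hugepages_needed 6 600 2
```

Comparing that figure with hugepages_total shows whether the current reservation is large enough. Writing a new value to /proc/sys/vm/nr_hugepages requires root, and the kernel may reserve fewer pages than requested if memory is fragmented.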

  -Dave

Offline dga

  • Full Member
  • ***
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #49 on: January 17, 2014, 11:56:58 am »
Code:
$ ptsminer-dga-beta3-avx2-linux64.bin PkzbnN7Nkv6TcqJuNjpcLfmPqpPUphpu5W 2 sse4
ptsminer-dga-beta3-avx2-linux64.bin: error while loading shared libraries: libboost_system.so.1.53.0: cannot open shared object file: No such file or directory

I'm on Gentoo Linux using the repository's Boost 1.52, but the binary you provided is compiled against Boost 1.53. Thanks

Ahh.  Can you try:

ptsminer-dga-beta3-avx2-linux64-static.bin.gz

from that same directory and let me know if it works for you?  You'll have to gunzip it before running, obviously. :)

Offline noobster

  • Jr. Member
  • **
  • Posts: 35
  • cryptocurrencies vs. fed
Re: Open source optimized PTS CPU miner (BETA)
« Reply #48 on: January 17, 2014, 11:54:18 am »
Anyway, I did notice one thing: why use so many hugepages if the miner at 4 threads only uses 4 hugepages?

Code:
# cat /proc/meminfo |grep -i hugepages
AnonHugePages:         0 kB
HugePages_Total:     512
HugePages_Free:      508
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Perhaps
Code:
echo 4 > /proc/sys/vm/nr_hugepages
could be enough? More than that I would consider a waste of memory. Or might it use more hugepages over time?

PS: I'm still getting this:
Code:
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
even after I recompiled my kernel with
Code:
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
Thanks
« Last Edit: January 17, 2014, 12:00:12 pm by noobster »

Offline unsoindovo

  • Full Member
  • ***
  • Posts: 123
Re: Open source optimized PTS CPU miner (BETA)
« Reply #47 on: January 17, 2014, 09:14:09 am »
Hi dga!!
Very good job!!!

When will there be a release for Windows???



Offline noobster

  • Jr. Member
  • **
  • Posts: 35
  • cryptocurrencies vs. fed
Re: Open source optimized PTS CPU miner (BETA)
« Reply #46 on: January 17, 2014, 08:56:33 am »
Code:
$ ptsminer-dga-beta3-avx2-linux64.bin PkzbnN7Nkv6TcqJuNjpcLfmPqpPUphpu5W 2 sse4
ptsminer-dga-beta3-avx2-linux64.bin: error while loading shared libraries: libboost_system.so.1.53.0: cannot open shared object file: No such file or directory

I'm on Gentoo Linux using the repository's Boost 1.52, but the binary you provided is compiled against Boost 1.53. Thanks
« Last Edit: January 17, 2014, 09:01:26 am by noobster »

Offline Brekyrself

  • Hero Member
  • *****
  • Posts: 512
Re: Open source optimized PTS CPU miner (BETA)
« Reply #45 on: January 17, 2014, 04:47:41 am »
Would like to test a non avx win64 build :)  I'm stuck with a few x58 systems still!

Offline ptsrush

  • Full Member
  • ***
  • Posts: 84
Re: Open source optimized PTS CPU miner (BETA)
« Reply #44 on: January 17, 2014, 04:29:38 am »
Update for beta3:

After running for 30 minutes, it now holds steady at 462 cpm.

Offline ptsrush

  • Full Member
  • ***
  • Posts: 84
Re: Open source optimized PTS CPU miner (BETA)
« Reply #43 on: January 17, 2014, 03:40:58 am »
Hi dga,

I'm running ptsminer-dga-beta2-avx2-linux64 on my Gentoo box.

echo "3072" > /proc/sys/vm/nr_hugepages

With that setting and 6 worker threads, I get 450 cpm on its E3-1230 v3 Haswell CPU.

echo "3584" > /proc/sys/vm/nr_hugepages

With that setting and 7 worker threads, I get 458 cpm.

For your information, yam runs at about 330 cpm on the same machine.

Offline dga

  • Full Member
  • ***
  • Posts: 122
Re: Open source optimized PTS CPU miner (BETA)
« Reply #42 on: January 17, 2014, 02:17:07 am »
Yeah, I did that already, but thanks anyway :D

And I'm getting this now:
Code:
Could not mmap hugepage, reverting to malloc: Cannot allocate memory


BTW, is there any way to reduce memory usage to, say, 512 or 768 MB per thread?

Not from the command line.  That's my next planned optimization.  I need to finish poking at some other constants to figure out how aggressive I want to be about pushing the memory.

Stay tuned.  I think I can get that into the binary by tonight.  For now, you can run on fewer threads -- you'll find that 4 is actually nearly as happy as 6, and 6 is typically happier for me than 8.

  -Dave

Ok.  I've replaced the binary at the old URL with a new build that uses about 600MB of RAM per thread.  Thanks for the feature request - I'd been meaning to implement this optimization, and it looks from here like it's giving a very pleasant speedup just from using less memory (for those who care, this helps reduce TLB misses).  I haven't run it long enough to get a stable number out of it, but it's looking like 460-475 cpm on an i7-4770.  The 4770k users should be cracking 500cpm.

  -Dave

Slight update:  There's now a beta3 that tries to reduce rejects a bit

http://www.cs.cmu.edu/~dga/ptsminer/

The miner works by processing an entire block of 2^26 hashes at once, and so if new work came in, it would still submit anything found in the previous block.  This could lead to excessive numbers of rejects (and thus, a disconnection).  Beta3 tries a little harder to avoid this - and the wasted work it entails - and also bumps up the number of rejects before reconnecting a bit for safety.

There are some small speed tuning-related changes, but probably not anything measurably different.  I'm still seeing in the 450-475 range on i7-4770.   The reconnect changes I just made + the beta2 dev mining changes should make it a lot easier for people to get longer-running performance measurements out of this code.

I've figured out several of these changes that should help improve performance on non-avx2 systems.  Once I get to that phase, if there's interest in beta testing a linux avx build for sandybridge/ivybridge, I can do that too.  Perhaps one optimized for Amazon's machines?  *grin*