Author Topic: Open source optimized PTS CPU miner (BETA)  (Read 17103 times)

Offline honger18

  • Newbie
  • *
  • Posts: 5
    • View Profile
Re: Open source optimized PTS CPU miner (BETA)
« Reply #30 on: January 16, 2014, 03:14:23 pm »
Quote
Oof.  This is going to be a problem on a 32 bit system.  There are some very x86_64 specific chunks of code in the assembly-optimized sha512 routines (which you need if you want this thing to be fast).

Sorry.

I was afraid of that, no problem. Maybe this is finally a good reason to convert my main desktop to 64-bit...

Offline dga

« Reply #31 on: January 16, 2014, 04:01:49 pm »
I've released beta 2 of my AVX2-optimized build for Linux x64:

http://www.cs.cmu.edu/~dga/ptsminer/ptsminer-dga-beta2-avx2-linux64.bin

(Note the changed URL).  This one is still binary-only -- I've been focusing on speed, not making it possible for anyone else to build this hunk o'junk code.

This is the first version of my code that beats 400 cpm on a stock i7-4770.  You i7-4770k overclocked folks should see very happy results.  I've affectionately termed this release "herbivore", because, of course, that's what eats yams for dinner.   :)

This version has a 3% dev fee, which I'll reduce further in later builds.  If it's not clear, I'm using the ratcheting-down dev fee as a good reason for people to upgrade to the later releases and not have old versions of the code floating around.

I've updated the dev fee mechanism a little, so don't freak out:
  - It mines for the first 60 seconds for dev
  - It mines for the next 2000 seconds for the user
  - After that, those numbers are multiplied by 20, so that the miner runs with fewer interruptions:  20 minutes of dev mining followed by 1.3 days of blissfully uninterrupted user mining.
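
The schedule above can be sketched as plain arithmetic. This is only an illustration of the numbers in the post, not the miner's actual code: the names `DEV_SECONDS`, `USER_SECONDS`, `SCALE`, `fee_fraction` and `schedule` are made up here, and it assumes the scaled cycle simply repeats. Taken literally, 2000 s x 20 is 40,000 s, or roughly 11 hours, so the real schedule may differ from this sketch in its details.

```python
# Sketch of the dev-fee schedule described in the post (all names hypothetical).
DEV_SECONDS = 60      # first dev-mining window, from the post
USER_SECONDS = 2000   # first user-mining window, from the post
SCALE = 20            # multiplier applied after the first cycle, from the post


def fee_fraction(dev_s: float, user_s: float) -> float:
    """Share of wall-clock time spent mining for the developer."""
    return dev_s / (dev_s + user_s)


def schedule(cycles: int):
    """Yield (owner, seconds) windows: one short cycle, then scaled cycles.

    Assumption: every cycle after the first uses the x20 windows.
    """
    yield ("dev", DEV_SECONDS)
    yield ("user", USER_SECONDS)
    for _ in range(cycles - 1):
        yield ("dev", DEV_SECONDS * SCALE)
        yield ("user", USER_SECONDS * SCALE)


if __name__ == "__main__":
    for owner, seconds in schedule(2):
        print(f"{owner:>4}: {seconds:>6} s ({seconds / 3600:.2f} h)")
    print(f"dev share of time: {fee_fraction(DEV_SECONDS, USER_SECONDS):.1%}")
```

The dev share comes out just under 3%, which matches the fee quoted above, and scaling both windows by the same factor leaves that share unchanged.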
 
Still tied to beeeeeeer.  Are there other pools that use the same protocol as beer?  I can support those easily.

  -Dave

Offline archit

« Reply #32 on: January 16, 2014, 04:04:08 pm »
dga any plans of blessing the people who only have avx?

Offline dga

« Reply #33 on: January 16, 2014, 04:09:57 pm »
Quote
dga any plans of blessing the people who only have avx?

It's a lot harder to beat Intel's assembly-optimized sha512 on avx than it was on avx2.  I'll port my most recent speed improvements back, but the biggest speed gain came from rewriting the sha512 computation, and I'm not going to do that for avx.  I'll give a few more % in the avx version of my code, but it won't be the same as the 80cpm jump I just introduced for avx2.

It'll be a while.  I've used up my free time coding quota for the week. :)

Offline dga

« Reply #34 on: January 16, 2014, 04:18:14 pm »
Quote
I've released beta 2 of my AVX2-optimized build for Linux x64:

http://www.cs.cmu.edu/~dga/ptsminer/ptsminer-dga-beta2-avx2-linux64.bin

EEeeeeeek.  If you grabbed it in the prior 30 minutes, download again.  I botched the dev-fee switching when I implemented the new dev mining code and it's not switching properly.

Sorry about that.  Re-tested and it's happy.

Offline noobster

« Reply #35 on: January 16, 2014, 05:29:36 pm »
Code: [Select]
Couldn't use the hugepage speed optimization.  Enable huge pages for a slight speed boost.
kernel config:
Code: [Select]
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

Code: [Select]
$ cat /proc/meminfo | grep HugePages
AnonHugePages:     14336 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0

followed https://wiki.archlinux.org/index.php/KVM#Enabling_huge_pages

What am I missing here?
BTC: 15mey7vTkkvHm4UoZgVEP4Yo3REDpH87KW
PTS: PkzbnN7Nkv6TcqJuNjpcLfmPqpPUphpu5W
drop some =)

Offline dga

« Reply #36 on: January 16, 2014, 05:46:39 pm »
sudo bash
echo "4096" > /proc/sys/vm/nr_hugepages
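
For anyone hitting the same thing: the kernel options quoted in the question enable transparent huge pages, which show up as `AnonHugePages` in `/proc/meminfo`, while writing to `nr_hugepages` reserves the separate explicit pool that shows up as `HugePages_Total`. That the miner wants the explicit pool is an inference from the reply above, not something the thread states outright. A minimal sketch for checking the pool follows; the helper names `hugepage_counters` and `explicit_pool_ready` are hypothetical.

```python
import re

# /proc/meminfo excerpt quoted earlier in the thread (explicit pool is empty).
SAMPLE = """AnonHugePages:     14336 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
"""


def hugepage_counters(meminfo_text: str) -> dict:
    """Return every meminfo field whose name mentions huge pages."""
    counters = {}
    for line in meminfo_text.splitlines():
        match = re.match(r"^(\w*Huge\w*):\s+(\d+)", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def explicit_pool_ready(counters: dict, pages_wanted: int) -> bool:
    """True if the explicit (hugetlb) pool has enough free pages."""
    return counters.get("HugePages_Free", 0) >= pages_wanted


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the output quoted above
    counters = hugepage_counters(text)
    print(counters)
    print("pool ready for 4096 pages:", explicit_pool_ready(counters, 4096))
```

After the `echo` above succeeds, `HugePages_Total` should report a non-zero value; if it stays at 0, the kernel could not find enough contiguous memory to reserve.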

Offline noobster

« Reply #37 on: January 16, 2014, 06:08:48 pm »
yea, i did that already but thanks anyway :D

and I'm getting this now:
Code: [Select]
Could not mmap hugepage, reverting to malloc: Cannot allocate memory


btw is there any way to reduce memory usage to say 512 or 768 MB per thread?
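
For context on the message above: it reads like a try-huge-pages-then-fall-back allocation, where "Cannot allocate memory" is the kernel's ENOMEM for a huge-page mapping that the reserved pool cannot satisfy. The sketch below shows that general pattern only. It is not the miner's code, the `allocate` helper is hypothetical, and the `MAP_HUGETLB` value and the 2 MiB page size are hard-coded assumptions for Linux on x86_64.

```python
import mmap
import sys

MAP_HUGETLB = 0x40000              # assumption: Linux value on x86_64
HUGE_PAGE_BYTES = 2 * 1024 * 1024  # assumption: 2 MiB huge pages


def allocate(size: int):
    """Return (buffer, kind): explicit huge pages if possible, else ordinary memory."""
    if sys.platform.startswith("linux") and size % HUGE_PAGE_BYTES == 0:
        try:
            flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_HUGETLB
            return mmap.mmap(-1, size, flags=flags), "hugepage"
        except OSError as err:
            # Typically ENOMEM when the reserved pool is too small or empty.
            print(f"Could not mmap hugepage, falling back: {err}")
    return bytearray(size), "fallback"


if __name__ == "__main__":
    buf, kind = allocate(4 * HUGE_PAGE_BYTES)
    print(f"got {len(buf)} bytes via {kind}")
```

If the fallback path is taken even after setting `nr_hugepages`, comparing `HugePages_Free` against the amount the program asks for is a reasonable next check.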

Offline dga

« Reply #38 on: January 16, 2014, 06:17:04 pm »
Not from the command line.  That's my next planned optimization.  I need to finish poking at some other constants to figure out how aggressive I want to be about pushing the memory.

Stay tuned.  I think I can get that into the binary by tonight.  For now, you can run on fewer threads -- you'll find that 4 is actually nearly as happy as 6, and 6 is typically happier for me than 8.

  -Dave

Offline Aber

« Reply #39 on: January 16, 2014, 06:53:27 pm »
Nice work dga :) can u add 1gh?

Offline dga

« Reply #40 on: January 16, 2014, 07:51:21 pm »
Quote
btw is there any way to reduce memory usage to say 512 or 768 MB per thread?

Not from the command line.  That's my next planned optimization.  I need to finish poking at some other constants to figure out how aggressive I want to be about pushing the memory.

Stay tuned.  I think I can get that into the binary by tonight.  For now, you can run on fewer threads -- you'll find that 4 is actually nearly as happy as 6, and 6 is typically happier for me than 8.

  -Dave

Ok.  I've replaced the binary at the old URL with a new build that uses about 600MB of RAM per thread.  Thanks for the feature request - I'd been meaning to implement this optimization, and it looks from here like it's giving a very pleasant speedup just from using less memory (for those who care, this helps reduce TLB misses).  I haven't run it long enough to get a stable number out of it, but it's looking like 460-475 cpm on an i7-4770.  The 4770k users should be cracking 500cpm.

  -Dave

Offline dga

« Reply #41 on: January 17, 2014, 02:17:07 am »
Slight update:  There's now a beta3 that tries to reduce rejects a bit.

http://www.cs.cmu.edu/~dga/ptsminer/

The miner works by processing an entire block of 2^26 hashes at once, so if new work came in partway through, it would still submit anything found in the previous block.  This could lead to excessive numbers of rejects (and thus, a disconnection).  Beta3 tries a little harder to avoid this (and the wasted work it entails), and also slightly raises the number of rejects it tolerates before reconnecting, for safety.
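
One way to picture the stale-work problem is a check between finishing a batch and submitting its results. The sketch below is a guess at the general idea, not the beta3 implementation: `WorkTracker`, `submit_if_fresh` and the job-id scheme are all hypothetical, and only the 2^26 batch size comes from the post.

```python
import threading

BATCH_HASHES = 2 ** 26  # hashes processed per batch, as quoted in the post


class WorkTracker:
    """Remembers the newest job announced by the pool (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = 0

    def new_work(self) -> int:
        """Called when the pool sends new work; returns the new job id."""
        with self._lock:
            self._current += 1
            return self._current

    def is_current(self, job_id: int) -> bool:
        with self._lock:
            return job_id == self._current


def submit_if_fresh(tracker: WorkTracker, job_id: int, shares, submit) -> int:
    """Submit shares only if job_id is still current; return how many were sent."""
    if not tracker.is_current(job_id):
        return 0  # stale batch: the pool would reject these shares
    for share in shares:
        submit(share)
    return len(shares)


if __name__ == "__main__":
    tracker = WorkTracker()
    job = tracker.new_work()
    sent = []
    print(submit_if_fresh(tracker, job, ["share-1"], sent.append))  # fresh
    tracker.new_work()                                              # pool moved on
    print(submit_if_fresh(tracker, job, ["share-2"], sent.append))  # stale
```

Checking once per batch is cheap, but with batches this large a finer-grained check inside the batch would be needed to also avoid the wasted hashing.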

There are some small speed-tuning changes, but probably nothing measurably different.  I'm still seeing numbers in the 450-475 cpm range on an i7-4770.  The reconnect changes I just made, plus the beta2 dev mining changes, should make it a lot easier for people to get longer-running performance measurements out of this code.

I've figured out that several of these changes should help improve performance on non-avx2 systems.  Once I get to that phase, if there's interest in beta testing a linux avx build for sandybridge/ivybridge, I can do that too.  Perhaps one optimized for Amazon's machines?  *grin*

Offline ptsrush

« Reply #42 on: January 17, 2014, 03:40:58 am »
Hi dga,

I'm running ptsminer-dga-beta2-avx2-linux64 on my Gentoo box.

echo "3072" > /proc/sys/vm/nr_hugepages

With that and 6 worker threads, I get 450 cpm on its E3-1230 v3 Haswell CPU.

echo "3584" > /proc/sys/vm/nr_hugepages

With that and 7 worker threads, I get 458 cpm.

For your information, yam runs at about 330 cpm on the same machine.
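
Both settings above work out to 512 huge pages per thread (3072 / 6 and 3584 / 7). If the pages are 2 MiB each, the usual default on x86_64 and an assumption here, that is 1 GiB per thread. A small sketch for sizing `nr_hugepages` for other thread counts, with hypothetical helper names:

```python
HUGE_PAGE_BYTES = 2 * 1024 ** 2  # assumption: 2 MiB huge pages
GIB = 1024 ** 3
MIB = 1024 ** 2


def pages_for(threads: int, bytes_per_thread: int,
              page_bytes: int = HUGE_PAGE_BYTES) -> int:
    """Huge pages needed so every thread gets bytes_per_thread, rounded up."""
    per_thread = -(-bytes_per_thread // page_bytes)  # ceiling division
    return threads * per_thread


def bytes_per_thread(nr_hugepages: int, threads: int,
                     page_bytes: int = HUGE_PAGE_BYTES) -> int:
    """Memory each thread gets if the pool is split evenly."""
    return nr_hugepages * page_bytes // threads


if __name__ == "__main__":
    print(pages_for(6, GIB), pages_for(7, GIB))   # the two settings used above
    print(bytes_per_thread(3072, 6) // MIB, "MiB per thread")
    print(pages_for(8, 600 * MIB), "pages for 8 threads at 600 MiB each")
```

Treating "600MB" as 600 MiB, the last line is only a rough guide for the smaller-footprint build; the exact per-thread size is not given in the thread.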

Offline ptsrush

« Reply #43 on: January 17, 2014, 04:29:38 am »
Update for beta3:

After running for 30 minutes, it holds steady at 462 cpm.

Offline Brekyrself

« Reply #44 on: January 17, 2014, 04:47:41 am »
I would like to test a non-AVX win64 build :)  I'm still stuck with a few X58 systems!