BitShares Forum

Other => Graveyard => BitShares PTS => Topic started by: dga on January 13, 2014, 01:22:25 am

Title: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 13, 2014, 01:22:25 am
Following up on yvg1900's release of yam, I figured I'd improve the state of the art of the open source versions a bit:

https://github.com/dave-andersen/ptsminer

I haven't gotten it to build on Windows yet (it just needs to compile the avx2 assembly code - should be straightforward if someone wants to clue me in on how to invoke gcc appropriately there), but it should work on other platforms.  As a warning, I've only really tested it on avx2, since I'm a fan of Haswell.  THIS SOFTWARE SHOULD BE CONSIDERED A BETA QUALITY RELEASE.  At best.

As with my GPU release, this one is based very directly on ptsminer, so it's tied to beeeeer for the moment.  I plan to fix that and let it be used with other pools in the near future, but that's going to take some dev work.  sigh.

There's a lot of optimization still to be done, but this gets the basics right as far as memory subsystem optimization goes, and bridges a lot of the gap between the old OSS version and yam M7i.  I haven't tried out M7j, mind you -- it's probably a bit faster still.

It incorporates the same optional, extendible 1% dev fee that the gpu miner does.  Prior ptsminer devs, if you feel like you should be in the list, please PM me and I'll get you added!

With gratitude to FreeTrade for the donation that kept me interested in hacking on and releasing this stuff, and to yvg1900 for some very engaging unofficial competition. *grin*

  -Dave
Title: Re: Open source optimized PTS miner (BETA)
Post by: r05 on January 13, 2014, 01:25:13 am
Went from 80c/m to 160c/m on a Q9550.

Outstanding, dga. :)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: zvs on January 13, 2014, 01:51:31 am
Looked like it was faster than yam on my two junk servers, but slower on the rest

it was also dumping cores everywhere with mmap failing
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 13, 2014, 01:56:12 am
Quote from: zvs
Looked like it was faster than yam on my two junk servers, but slower on the rest

it was also dumping cores everywhere with mmap failing

Thanks for the report.  The mmap failure is just a warning - it falls back to malloc.

To silence it - and run a little faster with both yam and my code - run this as root:

echo "2048" > /proc/sys/vm/nr_hugepages

But the core dumps are bad.  Could you send me a little more detail, or a stack trace?  (And what kind of machine is it?)

Being slower on the rest isn't too surprising.  There are a lot of optimizations to be done yet, particularly for huge servers with respect to thread affinity and other things.  And the SHA512 code is virtually untouched from the Intel release.  The goal isn't to beat yam with this release, it's just to start the ball rolling a little bit.

There are some constants to play with to tune for different platforms, but it's not worth going there yet (unless you're interested in poking in the code).

  -Dave
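
For readers who want to see the shape of that fallback, here is a minimal Python sketch. It is not the miner's code (the miner is C/C++ and falls back to malloc); the function name `alloc_buffer` is made up for illustration, the ordinary anonymous mapping stands in for malloc, and the hard-coded `MAP_HUGETLB` value is the Linux x86 constant, assumed for Python versions that do not export it.

```python
import mmap

# MAP_HUGETLB is Linux-specific. Assumption: if this Python build does not
# export it, use the Linux x86 value 0x40000.
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)


def alloc_buffer(size, try_huge=True):
    """Return (buffer, used_hugepages).

    Tries an anonymous huge-page mapping first and falls back to a normal
    anonymous mapping (a stand-in for the malloc fallback) if that fails.
    """
    if try_huge:
        try:
            flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_HUGETLB
            return mmap.mmap(-1, size, flags=flags), True
        except (OSError, ValueError) as exc:
            # Not fatal: the program keeps running, just without huge pages.
            print(f"Could not mmap hugepage, reverting to normal allocation: {exc}")
    return mmap.mmap(-1, size), False


if __name__ == "__main__":
    buf, huge = alloc_buffer(2 * 1024 * 1024)  # one 2 MiB huge page worth
    print("huge pages in use:", huge)
```

With no huge pages reserved, the first mapping attempt is expected to fail and the second path is taken, which matches the "just a warning" behaviour described above.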
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: xolokram on January 13, 2014, 09:36:25 am
you replaced my "Hello, World!" log output >:(

keep up the good work & thank you :D
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 13, 2014, 12:29:16 pm
Quote from: xolokram
you replaced my "Hello, World!" log output >:(

keep up the good work & thank you :D

Oops.  I didn't think anyone would notice.  *grins*

Also, I've reduced the severity of that "oh my god no huge pages" message.  It's now presented as a suggestion for how to get better throughput.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: zvs on January 13, 2014, 03:22:22 pm
Quote from: dga
The mmap failure is just a warning - it falls back to malloc.

To silence - and run a little faster with both yam and my code - run:

echo "2048" > /proc/sys/vm/nr_hugepages

Hmm, I'll try it out again later tonight.  I did check /proc/sys/vm/nr_hugepages and it comes back as 0, yet /proc/meminfo shows:

AnonHugePages:   4194304 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

and afterwards:

AnonHugePages:   4194304 kB
HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

....  so even though it had the size allocation, it couldn't create any?  d'oh.  I guess maybe once I fix that everywhere it'll run faster. 

Is there any reason not to make it perm by putting vm.nr_hugepages = 2048 in sysctl.conf?
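
A note on reading those two listings: AnonHugePages reports transparent huge pages the kernel hands out on its own, while HugePages_Total is the explicit pool sized by nr_hugepages, so the first can be non-zero while the second is 0. The sketch below parses meminfo-style text and works out how much memory the explicit pool reserves. The helper names are hypothetical, and the sample text is the "afterwards" listing from this post.

```python
def parse_hugepage_info(meminfo_text):
    """Collect the huge-page fields of /proc/meminfo-style text as integers."""
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and "Huge" in key:
            # The first token is the number; a trailing "kB" unit is dropped.
            info[key.strip()] = int(rest.split()[0])
    return info


def reserved_kib(info):
    """Memory held by the explicit huge-page pool, in KiB."""
    return info["HugePages_Total"] * info["Hugepagesize"]


AFTER = """AnonHugePages:   4194304 kB
HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

if __name__ == "__main__":
    info = parse_hugepage_info(AFTER)
    # 2048 pages * 2048 kB per page = 4194304 kB, i.e. 4096 MiB
    print(reserved_kib(info) // 1024, "MiB reserved for huge pages")
```

That 4 GiB is set aside whether or not a miner is running, which is the "tie up a little more memory" trade-off to weigh before making the setting permanent.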
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 13, 2014, 03:29:26 pm

Quote from: zvs
Is there any reason not to make it perm by putting vm.nr_hugepages = 2048 in sysctl.conf?

It's what I do on my machines.  It may tie up a little more memory on your system, but if it's used for a lot of mining, it's a good plan.

My miner uses them less than yam does, I believe, so you'll get a boost on yam also, if it's not going under the covers and enabling them for you. :-)

  -Dave
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: archit on January 13, 2014, 05:52:25 pm
Why not AVX2 for SHA256 as well?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 13, 2014, 05:54:03 pm
Quote from: archit
Why not AVX2 for SHA256 as well?

Why bother?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: archit on January 13, 2014, 05:54:34 pm
Quote from: dga
Why bother?

Won't improve anything?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 13, 2014, 05:56:41 pm
Quote from: archit
Won't improve anything?

There are 4-10 hash collisions per group of 2^23 SHA512 hashes that have to be pushed through SHA256.  Making SHA256 faster would make 1/1,000,000th of the computation faster. :-)
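
The arithmetic behind that estimate can be checked in a few lines. This is only a back-of-the-envelope sketch: the function name is made up, and it assumes one SHA256 call costs about the same as one SHA512 call, which the post does not state.

```python
def sha256_share(collisions, sha512_calls=2 ** 23):
    """Fraction of all hash calls that are SHA256 calls (equal cost per call assumed)."""
    return collisions / (sha512_calls + collisions)


if __name__ == "__main__":
    for collisions in (4, 10):  # the 4-10 range quoted in the post
        print(f"{collisions} collisions per 2^23 SHA512 hashes -> {sha256_share(collisions):.2e}")
```

Both ends of the range come out at roughly one part in a million or less, so even an infinitely fast SHA256 would leave the total run time essentially unchanged.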
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: archit on January 13, 2014, 06:34:25 pm
Quote from: dga
There are 4-10 hash collisions per group of 2^23 SHA512 hashes that have to be pushed through SHA256.  Making SHA256 faster would make 1/1,000,000th of the computation faster. :-)

??? ::) :o :)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: relm9 on January 13, 2014, 07:20:15 pm
Getting 390-400 cpm on an i7-4770K @ 4 GHz, nice :)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: earntodie on January 13, 2014, 09:50:06 pm
Cool. Good luck!
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: gordonhucn on January 15, 2014, 01:06:24 am
Great work! Can you add ypool support (from the official jhProtominer v0.1e) to it?
Btw, according to Intel, AVX can do two SHA512 computations at the same time and AVX2 can do four.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 15, 2014, 06:10:52 am
Quote from: gordonhucn
great works, can you add ypool support(from the official jhProtominer v0.1e) to it?
btw. according to intel, avx can do two sha512 and avx2 can do four at the same time~~

Going to work on it.  I'm having more fun with the algorithms than with the stuff around it - patches accepted to be able to support either pool. :-)

Speaking of avx2, I've been targeting it a little in a build I'm working on.  If people want to test it, there's a linux binary-only build available for avx2 CPUs at:

http://www.cs.cmu.edu/~dga/ptsminer-dga-adv-avx2-linux64.bin

I'll be releasing the code for this series once I've finished ironing out the kinks, but I'd be curious to know if people are seeing the same kind of speed gains that I am on their Intel Haswell-based systems.  On mine - a stock i7-4770 running at normal clock rates - I see:

[STATS] 2014-Jan-15 01:09:04 | 356.8 c/m | 5.5 sh/m | VL: 1195 (98.6%), RJ: 17 (1.4%), ST: 0 (0.0%)

Because it's a preview dev build, I changed the mining fee a little - it's higher just for this one - 5% - but in big blocks (200 seconds dev and then 4000 seconds user) so that it has more time to stabilize so you can see the true cpm rate.

I believe this one substantially outperforms yam, but only for avx2.  I'd be curious if others can confirm.

  -Dave
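
For anyone collecting numbers to compare builds, the [STATS] line above is easy to parse. The sketch below is an illustration only: the regular expression is fitted to the single sample line in this post, and reading VL, RJ and ST as valid, rejected and stale share counts is an assumption based on the percentages adding up.

```python
import re

# Pattern fitted to the one sample line in the post; other builds may differ.
STATS_RE = re.compile(
    r"\[STATS\] (?P<date>\S+) (?P<time>\S+) \| (?P<cpm>[\d.]+) c/m \| "
    r"(?P<shpm>[\d.]+) sh/m \| VL: (?P<valid>\d+) \([\d.]+%\), "
    r"RJ: (?P<rejected>\d+) \([\d.]+%\), ST: (?P<stale>\d+) \([\d.]+%\)"
)


def parse_stats(line):
    """Return the numeric fields of a [STATS] line, or None if it does not match."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "cpm": float(fields["cpm"]),
        "shpm": float(fields["shpm"]),
        "valid": int(fields["valid"]),
        "rejected": int(fields["rejected"]),
        "stale": int(fields["stale"]),
    }


def reject_percent(stats):
    """Rejected shares as a percentage of all submitted shares."""
    total = stats["valid"] + stats["rejected"] + stats["stale"]
    return 100.0 * stats["rejected"] / total if total else 0.0


LINE = ("[STATS] 2014-Jan-15 01:09:04 | 356.8 c/m | 5.5 sh/m | "
        "VL: 1195 (98.6%), RJ: 17 (1.4%), ST: 0 (0.0%)")

if __name__ == "__main__":
    stats = parse_stats(LINE)
    print(stats["cpm"], round(reject_percent(stats), 1))
```

Recomputing the reject rate from the raw counts (17 of 1212) gives the same 1.4% the miner prints.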
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: allano on January 15, 2014, 07:15:12 am
I have Ubuntu 12.04

I've done the following

sudo apt-get install build-essential libboost-system-dev libboost-filesystem-dev libboost-program-options-dev libboost-thread-dev zlib1g-dev yasm

git clone https://github.com/dave-andersen/ptsminer

cd ptsminer/src

make -f makefile.unix.no-chrono


Code: [Select]
g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread
obj/sha512.o: In function `Init_SHA512_avx2':
sha512.c:(.text+0x27): undefined reference to `sha512_transform_rorx'
sha512.c:(.text+0x32): undefined reference to `sha512_transform_single_rorx'
collect2: ld returned 1 exit status
make: *** [ptsminer] Error 1

I hope you can help me.

Edit:
I fixed the problem by installing the Boost 1.48 packages (including chrono) and building with makefile.unix:

sudo aptitude install libboost-system1.48-dev libboost-filesystem1.48-dev libboost-program-options1.48-dev libboost-thread1.48-dev libboost-chrono1.48-dev

 make -f makefile.unix


Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: gordonhucn on January 15, 2014, 09:18:21 am
Quote from: dga
I believe this one substantially outperforms yam, but only for avx2.  I'd be curious if others can confirm.

AVX/SSE code performs more or less the same as plain x86 instructions if SIMD is not used; those bit operations (the 80-round calculation, etc.) must be done with Intel SIMD intrinsics to get a great performance boost.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: jernau on January 15, 2014, 10:29:32 am
Using the following CPU (Intel Xeon(R) CPU E5506  @ 2.13GHz)

Code: [Select]
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Stepping:              5
CPU MHz:               2133.408
BogoMIPS:              4266.81
Hypervisor vendor:     Xen
Virtualization type:   para
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0,1

Running on 2 threads, with hugepages disabled, sse4 mode, I'm getting ~50cpm vs ~35cpm for xolokram's ptsminer.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 15, 2014, 12:03:07 pm
Quote from: gordonhucn
avx/sse performs more or less the same as x86 instruction if SIMD is not used, those bit operations(80 round calulation etc.) must be done with intel intrinsics SIMD to get a great performance boost.

Yes.  The reason this one is a binary-only build is that I rewrote the sha512 code.  It was previously using Intel's hand-optimized avx2 SIMD code; the binary I linked uses my own version, also SIMD using avx2, which is faster.  It's binary-only in part because I haven't made it anywhere near easy for anyone else to compile it - or made a version that generalizes across architectures.  For now, this one is a toy for Haswell owners while I play with it more.

  -Dave
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on January 16, 2014, 01:56:54 am
Hi Dave,

I've been running your cudapts and donating for a while, and it runs great.

I'd like to try your cpu miner on my haswell machine, but it requires boost-1.53.0.

I have only boost-1.52.0 installed and a lot of software depends on it.

Of course I can upgrade to 1.53.0 and rebuild my gentoo system, but I think maybe
you could build a statically linked binary, so it depends only on glibc.

If you do that your miner will run on almost any linux box, just as yam does.

thank you very much.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: allano on January 16, 2014, 06:37:39 am
Hi Dave,

I get the following error:
Code: [Select]
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
Could not mmap hugepage, reverting to malloc: Cannot allocate memory


I have tried the following:
Code: [Select]
echo "2048" > /proc/sys/vm/nr_hugepages
vm.nr_hugepages = 2048 in sysctl.conf
reboot

The server has the following data:
Code: [Select]
16GB RAM
AnonHugePages:         0 kB
HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Ubuntu 12.04 64bit

I compiled your ptsminer with makefile.unix
and start it with ./ptsminer <address> 8 avx

if you need more information, just tell me which

Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: honger18 on January 16, 2014, 10:15:17 am
I got the following trying to compile...
anyone know how to fix this ? I'm on a 32-bit kernel(PAE), not sure if that's relevant...

Code: [Select]
~/comps/ptsminer/src$ make -f makefile.unix
cc -c -O3 intel/sha512_avx2.S -o obj/sha512_avx2.o
intel/sha512_avx2.S: Assembler messages:
intel/sha512_avx2.S:64: Error: bad expression
intel/sha512_avx2.S:64: Error: junk at end of line, first unrecognized character is `y'
intel/sha512_avx2.S:67: Error: bad expression
intel/sha512_avx2.S:67: Error: junk at end of line, first unrecognized character is `y'
intel/sha512_avx2.S:70: Error: bad expression
intel/sha512_avx2.S:70: Error: junk at end of line, first unrecognized character is `r'
intel/sha512_avx2.S:72: Error: bad expression
intel/sha512_avx2.S:72: Error: junk at end of line, first unrecognized character is `r'
...
intel/sha512_avx2.S:784: Error: bad register name `%rsp)'
intel/sha512_avx2.S:785: Error: bad register name `%rsp)'
intel/sha512_avx2.S:788: Error: bad register name `%rsp)'
make: *** [obj/sha512_avx2.o] Error 1
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 16, 2014, 12:18:41 pm
Quote from: honger18
Code: [Select]
cc -c -O3 intel/sha512_avx2.S -o obj/sha512_avx2.o
intel/sha512_avx2.S:64: Error: bad expression
...
make: *** [obj/sha512_avx2.o] Error 1

Are you using an older version of gcc?  If so, try upgrading -- or I can explain how to remove the need for the avx2 code if you're not on a Haswell CPU.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: honger18 on January 16, 2014, 01:40:39 pm
Hi dga
Quote
Are you using an older version of gcc?  If so, try upgrading
I have
gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu9)
the latest available on ubuntu. is that too old ?

Quote
or I can explain how to remove the need for the avx2 code if you're not on a Haswell CPU.

I have to admit I'm not sure if I have a Haswell CPU. I have the following processor:
Code: [Select]
model name      : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid

If it's an easy fix, e.g. to comment some code out I'd appreciate it.

p.s. thanks for all the effort, I really appreciate keeping the source open !
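
For anyone else unsure what their CPU supports: the flags line from /proc/cpuinfo answers it directly. The sketch below is a hypothetical helper, not part of ptsminer. The mode names "avx" and "sse4" are the ones used in this thread; "avx2" as a mode name is an assumption. The sample string is an abbreviated excerpt of the flags posted above.

```python
def best_sha512_mode(flags_line):
    """Pick the most capable SHA512 code path the CPU flags allow."""
    flags = set(flags_line.split())
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    if "sse4_1" in flags or "sse4_2" in flags:
        return "sse4"
    return None


def is_64bit_capable(flags_line):
    """The 'lm' (long mode) flag means the CPU itself can run x86_64 code."""
    return "lm" in flags_line.split()


# Abbreviated excerpt of the i7-2600K flags from the post above.
FLAGS = "fpu vme de pse tsc msr pae sse sse2 ht lm ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm"

if __name__ == "__main__":
    print(best_sha512_mode(FLAGS), is_64bit_capable(FLAGS))
```

For these flags the result is avx without avx2, on a CPU that is 64-bit capable even though the installed kernel is 32-bit.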
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: fishrat on January 16, 2014, 02:12:00 pm
very good
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: honger18 on January 16, 2014, 02:12:30 pm
I googled Haswell and since mine isn't one I tried commenting out the failing sha512_avx2.S bit in the makefile.unix , since apparently I can't use it anyway, but now it fails with the following. Not sure if I need to do more than mess with the makefile...


Code: [Select]
~/comps/ptsminer/src$ make -f makefile.unix
g++ -c -O3  -fpermissive -o obj/cpuid.o cpuid.c
yasm -f elf32 -o obj/sha512_avx.o intel/sha512_avx.asm
yasm -f elf32 -o obj/sha512_sse4.o intel/sha512_sse4.asm
g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_sse4.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -l boost_chrono -Wl,-Bdynamic -l z -l dl -l pthread
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 0 has invalid symbol index 11
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 1 has invalid symbol index 12
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 2 has invalid symbol index 2
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 3 has invalid symbol index 2
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 4 has invalid symbol index 11
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 5 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 6 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 7 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 8 has invalid symbol index 12
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 9 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 10 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 11 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 12 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 13 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 14 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 15 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 16 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 17 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 18 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 19 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 20 has invalid symbol index 13
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 21 has invalid symbol index 22
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_line): relocation 0 has invalid symbol index 2
/usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/crt1.o: In function `_start':
(.text+0x18): undefined reference to `main'
collect2: error: ld returned 1 exit status
make: *** [ptsminer] Error 1
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: daem0n on January 16, 2014, 02:45:16 pm
Works on Ubuntu 13.10!  :D

1 - git clone https://github.com/dave-andersen/ptsminer

2 - sudo apt-get install build-essential libboost-system-dev libboost-filesystem-dev libboost-program-options-dev libboost-thread-dev zlib1g-dev yasm

3 - cd ptsminer/src

4 - make -f makefile.unix

5 - ./ptsminer <address> <threads> avx

Example: ./ptsminer Padf809dfgdf9OP23nht8f02j3f0 8 avx

 8)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 16, 2014, 02:50:25 pm
Quote from: honger18
I googled Haswell and since mine isn't one I tried commenting out the failing sha512_avx2.S bit in the makefile.unix , since apparently I can't use it anyway, but now it fails with the following. Not sure if I need to do more than mess with the makefile...


Code: [Select]
~/comps/ptsminer/src$ make -f makefile.unix
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_info): relocation 11 has invalid symbol index 13
make: *** [ptsminer] Error 1

Oof.  This is going to be a problem on a 32 bit system.  There are some very x86_64 specific chunks of code in the assembly-optimized sha512 routines (which you need if you want this thing to be fast).

Sorry. :(
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: honger18 on January 16, 2014, 03:14:23 pm
Quote
Oof.  This is going to be a problem on a 32 bit system.  There are some very x86_64 specific chunks of code in the assembly-optimized sha512 routines (which you need if you want this thing to be fast).

Sorry.

I was afraid of that, no problem. Maybe finally a good reason to convert my main desktop to 64bit...
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 16, 2014, 04:01:49 pm
I've released beta 2 of my AVX2-optimized build for Linux x64:

http://www.cs.cmu.edu/~dga/ptsminer/ptsminer-dga-beta2-avx2-linux64.bin

(Note the changed URL).  This one is still binary-only -- I've been focusing on speed, not making it possible for anyone else to build this hunk o'junk code.

This is the first version of my code that beats 400 cpm on a stock i7-4770.  You i7-4770k overclocked folks should see very happy results.  I've affectionately termed this release "herbivore", because, of course, that's what eats yams for dinner.   :)

This version has a 3% dev fee, which I'll reduce further in later builds.  If it's not clear, I'm using the ratcheting-down dev fee as a good reason for people to upgrade to the later releases and not have old versions of the code floating around.

I've updated the dev fee mechanism a little, so don't freak out:
  - It mines for the first 60 seconds for dev
  - It mines for the next 2000 seconds for the user
  - After that, those numbers are multiplied by 20, so that the miner runs with fewer interruptions:  20 minutes of dev mining followed by about 11 hours (40,000 seconds) of blissfully uninterrupted user mining.
 
Still tied to beeeeer.  Are there other pools that use the same protocol as beeeeer?  I can support those easily.

  -Dave
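
The fee percentages quoted in this thread follow from the time splits. A small sketch, assuming the fee is simply dev time divided by total time (the helper names are hypothetical):

```python
def fee_fraction(dev_seconds, user_seconds):
    """Share of mining time that goes to the dev address."""
    return dev_seconds / (dev_seconds + user_seconds)


def scaled(dev_seconds, user_seconds, factor):
    """Both intervals stretched by the same factor, as described for later rounds."""
    return dev_seconds * factor, user_seconds * factor


if __name__ == "__main__":
    preview = (200, 4000)          # split quoted for the earlier preview build
    beta2_first = (60, 2000)       # first round of this beta
    beta2_later = scaled(60, 2000, 20)

    for name, (dev, user) in [("preview", preview),
                              ("beta2 first round", beta2_first),
                              ("beta2 later rounds", beta2_later)]:
        print(f"{name}: {dev / 60:.0f} min dev, {user / 3600:.1f} h user, "
              f"fee {100 * fee_fraction(dev, user):.1f}%")
```

Scaling both intervals by the same factor leaves the fee percentage unchanged; it only makes the switches less frequent.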
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: archit on January 16, 2014, 04:04:08 pm
dga any plans of blessing the people who only have avx?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 16, 2014, 04:09:57 pm
Quote from: archit
dga any plans of blessing the people who only have avx?

It's a lot harder to beat Intel's assembly-optimized sha512 on avx than it was on avx2.  I'll port my most recent speed improvements back, but the biggest speed gain came from rewriting the sha512 computation, and I'm not going to do that for avx.  I'll give a few more % in the avx version of my code, but it won't be the same as the 80cpm jump I just introduced for avx2.

It'll be a while.  I've used up my free time coding quota for the week. :)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 16, 2014, 04:18:14 pm
Quote from: dga
I've released beta 2 of my AVX2-optimized build for Linux x64:

http://www.cs.cmu.edu/~dga/ptsminer/ptsminer-dga-beta2-avx2-linux64.bin

EEeeeeeek.  If you grabbed it in the prior 30 minutes, download again.  I botched the dev-fee switching when I implemented the new dev mining code and it's not switching properly.

Sorry about that.  Re-tested and it's happy.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: noobster on January 16, 2014, 05:29:36 pm
Code: [Select]
Couldn't use the hugepage speed optimization.  Enable huge pages for a slight speed boost.
kernel config:
Code: [Select]
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

Code: [Select]
$ cat /proc/meminfo | grep HugePages
AnonHugePages:     14336 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0

followed https://wiki.archlinux.org/index.php/KVM#Enabling_huge_pages

What am I missing here?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 16, 2014, 05:46:39 pm
Quote from: noobster
HugePages_Total:       0

What am I missing here?

sudo bash
echo "4096" > /proc/sys/vm/nr_hugepages
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: noobster on January 16, 2014, 06:08:48 pm
yea, i did that already but thanks anyway :D

and I'm getting this now:
Code: [Select]
Could not mmap hugepage, reverting to malloc: Cannot allocate memory


btw is there any way to reduce memory usage to say 512 or 768 MB per thread?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 16, 2014, 06:17:04 pm
Quote from: noobster
btw is there any way to reduce memory usage to say 512 or 768 MB per thread?

Not from the command line.  That's my next planned optimization.  I need to finish poking at some other constants to figure out how aggressive I want to be about pushing the memory.

Stay tuned.  I think I can get that into the binary by tonight.  For now, you can run on fewer threads -- you'll find that 4 is actually nearly as happy as 6, and 6 is typically happier for me than 8.

  -Dave
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: Aber on January 16, 2014, 06:53:27 pm
Nice work dga :) can u add 1gh?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 16, 2014, 07:51:21 pm
Quote from: dga
Stay tuned.  I think I can get that into the binary by tonight.

Ok.  I've replaced the binary at the old URL with a new build that uses about 600MB of RAM per thread.  Thanks for the feature request - I'd been meaning to implement this optimization, and it looks from here like it's giving a very pleasant speedup just from using less memory (for those who care, this helps reduce TLB misses).  I haven't run it long enough to get a stable number out of it, but it's looking like 460-475 cpm on an i7-4770.  The 4770k users should be cracking 500cpm.

  -Dave
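
To put the TLB remark in numbers: every page of the working set needs its own address translation, so a smaller working set (or larger pages) means fewer translations competing for TLB entries. The sketch below only counts pages. It reads "about 600MB" as 600 MiB, compares it with a 1 GiB working set chosen purely for illustration, and says nothing about the TLB sizes of any particular CPU.

```python
def pages_needed(working_set_bytes, page_bytes):
    """Number of pages, and so distinct translations, covering the working set."""
    return -(-working_set_bytes // page_bytes)  # ceiling division


MIB = 1024 * 1024
SMALL_PAGE = 4 * 1024   # standard x86-64 page size
HUGE_PAGE = 2 * MIB     # matches "Hugepagesize: 2048 kB" seen in this thread

if __name__ == "__main__":
    for label, size in [("600 MiB", 600 * MIB), ("1 GiB (illustrative)", 1024 * MIB)]:
        print(label,
              pages_needed(size, SMALL_PAGE), "small pages,",
              pages_needed(size, HUGE_PAGE), "huge pages")
```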
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 17, 2014, 02:17:07 am
Quote from: dga
Ok.  I've replaced the binary at the old URL with a new build that uses about 600MB of RAM per thread.

Slight update:  There's now a beta3 that tries to reduce rejects a bit

http://www.cs.cmu.edu/~dga/ptsminer/

The miner works by processing an entire block of 2^26 hashes at once, and so if new work came in, it would still submit anything found in the previous block.  This could lead to excessive numbers of rejects (and thus, a disconnection).  Beta3 tries a little harder to avoid this - and the wasted work it entails - and also bumps up the number of rejects before reconnecting a bit for safety.

There are some small speed tuning-related changes, but probably not anything measurably different.  I'm still seeing in the 450-475 range on i7-4770.   The reconnect changes I just made + the beta2 dev mining changes should make it a lot easier for people to get longer-running performance measurements out of this code.

I've figured out several of these changes that should help improve performance on non-avx2 systems.  Once I get to that phase, if there's interest in beta testing a linux avx build for sandybridge/ivybridge, I can do that too.  Perhaps one optimized for Amazon's machines?  *grin*
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on January 17, 2014, 03:40:58 am
Hi dga,

I'm running  ptsminer-dga-beta2-avx2-linux64 on my gentoo box.

echo "3072" > /proc/sys/vm/nr_hugepages

and start 6 worker threads, I get 450 cpm on its E3-1230 v3 Haswell CPU.

echo "3584" > /proc/sys/vm/nr_hugepages

and start 7 worker threads, I get 458 cpm.

for your information, the yam runs at about 330 cpm on the same machine.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on January 17, 2014, 04:29:38 am
update for beta3:

after running 30 min, it keeps 462 cpm now.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: Brekyrself on January 17, 2014, 04:47:41 am
Would like to test a non avx win64 build :)  I'm stuck with a few x58 systems still!
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: noobster on January 17, 2014, 08:56:33 am
Slight update:  There's now a beta3 that tries to reduce rejects a bit

http://www.cs.cmu.edu/~dga/ptsminer/

The miner works by processing an entire block of 2^26 hashes at once, and so if new work came in, it would still submit anything found in the previous block.  This could lead to excessive numbers of rejects (and thus, a disconnection).  Beta3 tries a little harder to avoid this - and the wasted work it entails - and also bumps up the number of rejects before reconnecting a bit for safety.

There are some small speed tuning-related changes, but probably not anything measurably different.  I'm still seeing in the 450-475 range on i7-4770.   The reconnect changes I just made + the beta2 dev mining changes should make it a lot easier for people to get longer-running performance measurements out of this code.

I've figured out several of these changes that should help improve performance on non-avx2 systems.  Once I get to that phase, if there's interest in beta testing a linux avx build for sandybridge/ivybridge, I can do that too.  Perhaps one optimized for Amazon's machines?  *grin*

Code: [Select]
$ ptsminer-dga-beta3-avx2-linux64.bin PkzbnN7Nkv6TcqJuNjpcLfmPqpPUphpu5W 2 sse4
ptsminer-dga-beta3-avx2-linux64.bin: error while loading shared libraries: libboost_system.so.1.53.0: cannot open shared object file: No such file or directory

I have Gentoo Linux and am using the repository's boost 1.52 libs; the binary you provided is compiled against boost 1.53. Thanks
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: unsoindovo on January 17, 2014, 09:14:09 am
yea, i did that already but thanks anyway :D

and I'm getting this now:
Code: [Select]
Could not mmap hugepage, reverting to malloc: Cannot allocate memory


btw is there any way to reduce memory usage to say 512 or 768 MB per thread?

Not from the command line.  That's my next planned optimization.  I need to finish poking at some other constants to figure out how aggressive I want to be about pushing the memory.

Stay tuned.  I think I can get that into the binary by tonight.  For now, you can run on fewer threads -- you'll find that 4 is actually nearly as happy as 6, and 6 is typically happier for me than 8.

  -Dave

Ok.  I've replaced the binary at the old URL with a new build that uses about 600MB of RAM per thread.  Thanks for the feature request - I'd been meaning to implement this optimization, and it looks from here like it's giving a very pleasant speedup just from using less memory (for those who care, this helps reduce TLB misses).  I haven't run it long enough to get a stable number out of it, but it's looking like 460-475 cpm on an i7-4770.  The 4770k users should be cracking 500cpm.

  -Dave

Slight update:  There's now a beta3 that tries to reduce rejects a bit

http://www.cs.cmu.edu/~dga/ptsminer/

The miner works by processing an entire block of 2^26 hashes at once, and so if new work came in, it would still submit anything found in the previous block.  This could lead to excessive numbers of rejects (and thus, a disconnection).  Beta3 tries a little harder to avoid this - and the wasted work it entails - and also bumps up the number of rejects before reconnecting a bit for safety.

There are some small speed tuning-related changes, but probably not anything measurably different.  I'm still seeing in the 450-475 range on i7-4770.   The reconnect changes I just made + the beta2 dev mining changes should make it a lot easier for people to get longer-running performance measurements out of this code.

I've figured out several of these changes that should help improve performance on non-avx2 systems.  Once I get to that phase, if there's interest in beta testing a linux avx build for sandybridge/ivybridge, I can do that too.  Perhaps one optimized for Amazon's machines?  *grin*

hy dga!!
very good job!!!

when a release for Windows OS???


Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: noobster on January 17, 2014, 11:54:18 am
anyway I did notice one thing, why using so many hugepages if the miner @ 4 threads only uses 4 hugepages:

Code: [Select]
# cat /proc/meminfo |grep -i hugepages
AnonHugePages:         0 kB
HugePages_Total:     512
HugePages_Free:      508
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

perhaps
Code: [Select]
echo 4 > /proc/sys/vm/nr_hugepages
could be enough? more than that i would consider waste of memory, or may it use more hugepages over time?

ps. still getting this:
Code: [Select]
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
even after I recompiled my kernel with
Code: [Select]
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
thanks
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 17, 2014, 11:56:58 am
Code: [Select]
$ ptsminer-dga-beta3-avx2-linux64.bin PkzbnN7Nkv6TcqJuNjpcLfmPqpPUphpu5W 2 sse4
ptsminer-dga-beta3-avx2-linux64.bin: error while loading shared libraries: libboost_system.so.1.53.0: cannot open shared object file: No such file or directory

I have Gentoo Linux and am using the repository's boost 1.52 libs; the binary you provided is compiled against boost 1.53. Thanks

Ahh.  Can you try:

ptsminer-dga-beta3-avx2-linux64-static.bin.gz

from that same directory and let me know if it works for you?  You'll have to gunzip it before running, obviously. :)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 17, 2014, 12:31:53 pm
anyway I did notice one thing, why using so many hugepages if the miner @ 4 threads only uses 4 hugepages:

Code: [Select]
# cat /proc/meminfo |grep -i hugepages
AnonHugePages:         0 kB
HugePages_Total:     512
HugePages_Free:      508
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

perhaps
Code: [Select]
echo 4 > /proc/sys/vm/nr_hugepages
could be enough? more than that i would consider waste of memory, or may it use more hugepages over time?

ps. still getting this:
Code: [Select]
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
even after I recompiled my kernel with
Code: [Select]
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
thanks

Each hugepage is 2MB.  Each thread needs about 600MB.  I'll reduce that by another 50MB in beta4 later today, but for now, that's the math.  so you need 300 hugepages per thread.  With 6 threads, that's 1800 hugepages.

echo 2048 > /proc/sys/vm/nr_hugepages

for 6 threads, or something a little higher if you want to try more threads.

  -Dave
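
The arithmetic above can be sketched as a small shell helper. This is only an illustration: the function names `hugepages_needed` and `hugepages_free` are made up here, and the 600 MB per thread and 2048 kB page size are the figures quoted in this thread, not values read from the miner itself.

```shell
#!/bin/sh
# Sketch: how many huge pages to reserve for N miner threads.
# Assumed defaults (taken from the posts above): about 600 MB per thread,
# 2048 kB per huge page. Both can be overridden.

# hugepages_needed THREADS [MB_PER_THREAD] [HUGEPAGE_KB]
# Prints the number of huge pages, rounded up.
hugepages_needed() {
    threads=$1
    mb_per_thread=${2:-600}
    page_kb=${3:-2048}
    total_kb=$((threads * mb_per_thread * 1024))
    echo $(( (total_kb + page_kb - 1) / page_kb ))
}

# hugepages_free
# Reads /proc/meminfo-style text on stdin and prints the HugePages_Free value.
hugepages_free() {
    awk '/^HugePages_Free:/ { print $2 }'
}

hugepages_needed 6    # 6 threads at 600 MB each: prints 1800
```

Writing the result to /proc/sys/vm/nr_hugepages needs root. Rounding up a bit (2048 rather than 1800 for 6 threads, as suggested above) leaves some headroom.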
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 17, 2014, 02:44:01 pm
ok - thanks again for the feedback on this.  I've put beta4 online in the usual place:

http://www.cs.cmu.edu/~dga/ptsminer/

Along with a static build to address the gentoo library versioning issue.

I'm kind of proud of this one - it's the first of the Haswell builds that cracks 500 cpm on a non-overclocked CPU.  I haven't quite determined if 6 or 7 threads is better, but it's one of those two settings.

Delta from beta3:
  - Uses about 20MB less memory per thread
  - Further optimized sha512 computation code
  - Static build is now part of my default build chain, so we'll keep this one around.
  - Still 3% advanced-build dev fee, but I hope that the 170cpm you'll get more than any other miner should more than compensate for that.  :-)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: jernau on January 17, 2014, 03:10:50 pm
ok - thanks again for the feedback on this.  I've put beta4 online in the usual place:

http://www.cs.cmu.edu/~dga/ptsminer/

Along with a static build to address the gentoo library versioning issue.

I'm kind of proud of this one - it's the first of the Haswell builds that cracks 500 cpm on a non-overclocked CPU.  I haven't quite determined if 6 or 7 threads is better, but it's one of those two settings.

Delta from beta3:
  - Uses about 20MB less memory per thread
  - Further optimized sha512 computation code
  - Static build is now part of my default build chain, so we'll keep this one around.
  - Still 3% advanced-build dev fee, but I hope that the 170cpm you'll get more than any other miner should more than compensate for that.  :-)

That sounds good. Just to be clear, do we actually need a Haswell CPU to use this build?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: noobster on January 17, 2014, 03:28:33 pm
the static bin does not work for me either; I am using a non-AVX CPU (1st gen. Core i3)
I get
Code: [Select]
Illegal instruction
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: archit on January 17, 2014, 05:14:11 pm
dga, work on cudapts too please
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: Gwynbleidd on January 17, 2014, 05:36:32 pm
How to compile it?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 17, 2014, 05:53:01 pm
the static bin does not work for me either; I am using a non-AVX CPU (1st gen. Core i3)
I get
Code: [Select]
Illegal instruction

Right.  The pre-built one is *just* my advanced preview for avx2.  For other architectures, just grab the current version from the open source release and build.

  -Dave
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 17, 2014, 10:57:43 pm
Ok - there's now an advanced preview of beta4 for avx/sse in addition to the avx2 one.

http://www.cs.cmu.edu/~dga/ptsminer/

Be sure to grab the right version (beta4) and architecture (avx2 or avxsse) for your machine.  If you're not using the latest Ubuntu, grab the -static version to have a better chance of it working.

Feedback welcome.  I don't have a good set of avxsse machines to compare on, so I don't know how this one compares against yam.  Where the avx2 version is quite a bit faster, this one is still probably just in the same ballpark.  3% dev fee, but I'll cut that down to 1% if it's not beating yam by enough to make it worth paying the dev fee.  *grin*

Making headway at getting the build working better, but it's still a ghastly piece of spaghetti and not fit for pushing to the repository.

  -Dave
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on January 18, 2014, 01:29:50 am
Hi dga,

beta4 avxsse has 315 - 320 cpm in my e3-1230 v2 avx gentoo box.

the yam is about 305 - 310 cpm.

so you win about 3% ahead :)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 18, 2014, 01:35:10 am
Hi dga,

beta4 avxsse has 315 - 320 cpm in my e3-1230 v2 avx gentoo box.

the yam is about 305 - 310 cpm.

so you win about 3% ahead :)

Hah.  Thanks!  Not quite enough to justify that dev fee, though.  I'll see if I can make it a bit faster and earn my keep.

In the meantime, I'm just going to go ahead and admit that I have a problem.  I can't keep my toes out of optimizing the avx2 build, so I've put beta5 online.  This one is even more annoyingly architecture-specific, so ONLY haswell / avx2 people should even bother with it.  I just put the static build online because I'm still working out more kinks I introduced into the build process for optimization.  It's getting about 530 cpm on my stock i7-4770.

  -Dave
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 18, 2014, 01:47:56 am
user@testbox:/home/proto/ptsminer/src# make -f makefile.unix.no-chrono

g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread
obj/sha512.o: In function `Init_SHA512_avx2':
sha512.c:(.text+0x27): undefined reference to `sha512_transform_rorx'
sha512.c:(.text+0x32): undefined reference to `sha512_transform_single_rorx'
collect2: ld returned 1 exit status
make: *** [ptsminer] Error 1

What do I need to do? Thanks in advance

Ah - I haven't updated the no-chrono makefile

You can run by hand:

gcc -c -O3 intel/sha512_avx2.S -o obj/sha512_avx2.o

g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_avx2.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread

Or try grabbing one of the newer, faster binary builds.

I'll patch up that makefile pretty soon.  Thanks for letting me know - I wasn't sure if anyone wanted to use it.

  -Dave
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: aliasme on January 18, 2014, 01:52:24 am
user@testbox:/home/proto/ptsminer/src# make -f makefile.unix.no-chrono

g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread
obj/sha512.o: In function `Init_SHA512_avx2':
sha512.c:(.text+0x27): undefined reference to `sha512_transform_rorx'
sha512.c:(.text+0x32): undefined reference to `sha512_transform_single_rorx'
collect2: ld returned 1 exit status
make: *** [ptsminer] Error 1

What do I need to do? Thanks in advance

Ah - I haven't updated the no-chrono makefile

You can run by hand:

gcc -c -O3 intel/sha512_avx2.S -o obj/sha512_avx2.o

g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_avx2.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread

Or try grabbing one of the newer, faster binary builds.

I'll patch up that makefile pretty soon.  Thanks for letting me know - I wasn't sure if anyone wanted to use it.

  -Dave

Thanks much. I nuked my post after finding the makefile and assuming I needed to comment out the avx2 code.

Going to try again --- here's the CPU output, which binary?

processor   : 7
vendor_id   : GenuineIntel
cpu family   : 6
model      : 44
model name   : Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz
stepping   : 2
cpu MHz      : 2400.260
cache size   : 4096 KB
fpu      : yes
fpu_exception   : yes
cpuid level   : 11
wp      : yes
flags      : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc nopl pni ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm
bogomips   : 4800.52
clflush size   : 64
cache_alignment   : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:


Thanks for your help. Understanding the CPU features and seeing this performance improvement is cool. Great stuff you are doing.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: aliasme on January 18, 2014, 01:54:06 am
Output when compiling by hand:

user@testbox:/home/proto/ptsminer/src# g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_avx2.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread
g++: obj/sha512_avx2.o: No such file or directory
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: plane501 on January 18, 2014, 01:55:16 am
Windows version please :)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 18, 2014, 01:59:54 am
model name   : Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz

http://ark.intel.com/products/48768/Intel-Xeon-Processor-E5645-12M-Cache-2_40-ghz-5_86-gts-Intel-qpi

The SSE one.  LMK if it works - I haven't tested on that model.

  -Dave
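
For anyone else unsure which binary to grab, here is a rough sketch of the decision as a shell function. It assumes the two architecture names used for the downloads in this thread (avx2 and avxsse), and it assumes sse4_1 is the minimum for the avxsse build, which has not been confirmed here. The function name `pick_build` is hypothetical.

```shell
#!/bin/sh
# Sketch: choose a build name from a CPU "flags" string as shown in
# /proc/cpuinfo. The mapping is an assumption based on this thread:
#   avx2 present           -> avx2 build (Haswell)
#   avx or sse4_1 present  -> avxsse build
#   otherwise              -> unknown (untested territory)

# pick_build "FLAGS"
pick_build() {
    flags=" $1 "    # pad with spaces so each flag matches as a whole word
    case "$flags" in
        *" avx2 "*)              echo "avx2" ;;
        *" avx "*|*" sse4_1 "*)  echo "avxsse" ;;
        *)                       echo "unknown" ;;
    esac
}

# On Linux, feed it the first flags line of /proc/cpuinfo if it is readable.
if [ -r /proc/cpuinfo ]; then
    pick_build "$(awk -F: '/^flags/ { print $2; exit }' /proc/cpuinfo)"
fi
```

With the E5645 flags posted above (sse4_1 and sse4_2 but no avx or avx2), this prints avxsse, which matches the answer given.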
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 18, 2014, 02:00:29 am
Output when compiling by hand:

user@testbox:/home/proto/ptsminer/src# g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_avx2.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread
g++: obj/sha512_avx2.o: No such file or directory

First run:

gcc -c -O3 intel/sha512_avx2.S -o obj/sha512_avx2.o
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on January 18, 2014, 02:12:29 am
Hi dga,

ptsminer-dga-beta5-avx2-linux64-static rush at about 530 - 535 cpm on my e3-1230 v3 haswell gentoo box.

Really crazy.

I'm not native speaker, but maybe in such case guess I should say it's "damn fast"!

Anyway, once a new block starts, it seems all 6 of my workers print "Aborting scan run because of new work."

It's a little annoying. Could you output it only once?

And sometimes it says "Not inserted: <298324783427234980742398> at 7744". Is that OK?

The RJ rate is about 5% in China. I'm not sure whether that's due to a flaky internet connection or something else.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: aliasme on January 18, 2014, 02:15:27 am
[STATS] 2014-Jan-18 02:46:35 | 255.8 c/m | 2.2 sh/m | VL: 12 (100.0%), RJ: 0 (0.0%), ST: 0 (0.0%)

This is at least 2x.

Wow, nice work.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: allano on January 18, 2014, 06:40:15 am
Hi dga,

ptsminer-dga-beta5-avx2-linux64-static.bin rush at about 580 - 610 cpm on my i7-4770 CPU @ 3.40GHz Ubuntu.

I'd like to compile it for my AMD Opteron. Can you push the source code to GitHub, please?

Very nice work from you ;)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: noobster on January 18, 2014, 09:52:02 am
together with the pool fee, the 3% miner fee is pretty high

also it would be nice if one can specify the amount of ram per thread

thanks
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 18, 2014, 10:13:36 am
Replying to a bunch of the previous comments:

- I'll disable the "aborting scan" message in the next set of changes.  I still want to put in a slightly better method for doing this, because I think it is increasing the reject rate a little bit.  @ptsrush, my guess is it's a little bit because of latency and a little bit because the miner works in bigger chunks.  I'll make it a priority to get that in the next build.

@allano, re source code to github: 
  (a)  Which opteron do you have?  I'm running the SSE one on AMD CPUs with SSE support.  It works just fine.
  (b)  I have a few reasons I'm not pushing the source yet:  First, the build is a mess, and I need to move it to using autoconf or some other method.  There are currently four different makefiles that the project inherited from ptsminer, and part of my recent optimization adds different code that runs on different CPUs.  The combination of this is currently unmanageable.

  Second, I'm trying to figure out what the right strategy is for this one to fund development.  It's pretty clear based on what happened with the GPU miners that if I release all of my tricks without supporting more pools and windows, a whole lot of clones will spring up with binary builds for other platforms with high dev fees that go to someone else.  I'm fine with that in the long term, but I want to be careful.  I've put .. um .. rather a lot of work into the current optimized CPU version.

  My current thought is that I'm going to wait until *I* can provide easy-to-use binaries for both linux and windows with a 1-2% dev fee that supports both beeeeer and ypool, and then release the source as well.  Of course, I'd also be happy to go with a sponsored code release again or explore other options, but the dev fee really does seem like a very equitable way.

@noobster: Agreed.  The 3% dev fee is temporary.  I'm going to cut it down to 2% pretty soon.
  - Do you mind the 60 second dev mine starting time?  I'd rather increase the user time from 2000 to 3000 seconds than reduce dev mine from 60 to 40 --- it's more efficient to run for longer.

- Amount of RAM per thread:  The algorithms I'm using in this are completely different from what previous miners did.  There's no way to use less RAM because of the way I store the data, and using more RAM will actually make it slower.

Did you want to use *less* RAM, or more because you want it to go faster?

  -Dave
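
To make the dev-fee timing above concrete, here is a small sketch of the arithmetic. It assumes the miner simply alternates a block of user time with a block of dev time, so the fee is dev / (user + dev). That reading of the 2000 s and 60 s figures is an assumption, and `dev_fee_pct` is a made-up name.

```shell
#!/bin/sh
# Sketch: effective dev-fee percentage for a cycle of USER seconds of
# user mining followed by DEV seconds of dev mining (assumed model).

# dev_fee_pct USER_SECONDS DEV_SECONDS
dev_fee_pct() {
    awk -v user="$1" -v dev="$2" \
        'BEGIN { printf "%.2f\n", 100 * dev / (user + dev) }'
}

dev_fee_pct 2000 60   # current setting:        prints 2.91
dev_fee_pct 3000 60   # longer user time:       prints 1.96
dev_fee_pct 2000 40   # shorter dev-mine time:  prints 1.96
```

Under that model, both options land at about 2%, so the choice between them comes down to the efficiency point made above: fewer, longer runs mean fewer switches.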
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: noobster on January 18, 2014, 11:31:34 am
I wanted more ram options so I can experiment a bit with different settings. Say from 512 to 2048 MB per thread adjustable by 256MB… no idea this could take a lot of work maybe.

And you were right about the rejects, beta4 update increased the reject rate to over 30% in my case lol
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 18, 2014, 01:12:12 pm
I wanted more ram options so I can experiment a bit with different settings. Say from 512 to 2048 MB per thread adjustable by 256MB… no idea this could take a lot of work maybe.

And you were right about the rejects, beta4 update increased the reject rate to over 30% in my case lol

Gotcha.  The thing to tune isn't RAM - as I said, the goal is actually to solve the problem with as *little* RAM per thread as possible, because it's faster if you can do that.  Google a bit about the TLB and L2 DTLB if you're curious to understand the background behind this.  Some of the big changes between the beta2 and beta5 releases were related to using less memory in order to make things faster.

The thing to play around with tuning is the number of threads.  If you're not also doing GPU mining, you want to use at least as many threads as your CPU has real cores.  But you might also want to use more - but the optimal number is tricky.  On a 4 core CPU, the right answer might be anywhere from 4-7.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: mmao on January 18, 2014, 01:48:28 pm
I got some error messages when I compile the source:
Code: [Select]
cc -c -O3 intel/sha512_avx2.S -o obj/sha512_avx2.o
intel/sha512_avx2.S: Assembler messages:
intel/sha512_avx2.S:606: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:607: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:608: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:609: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:618: Error: suffix or operands invalid for `vpaddq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpalignr'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpaddq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpalignr'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpsrlq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpsllq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpor'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpsrlq'
intel/sha512_avx2.S:620: Error: no such instruction: `rorx $41,e,y0'
intel/sha512_avx2.S:620: Error: no such instruction: `rorx $18,e,y1'
intel/sha512_avx2.S:620: Error: no such instruction: `rorx $34,a,T1'
why ?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 18, 2014, 02:00:28 pm
I got some error messages when I compile the source:
Code: [Select]
cc -c -O3 intel/sha512_avx2.S -o obj/sha512_avx2.o
intel/sha512_avx2.S: Assembler messages:
intel/sha512_avx2.S:606: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:607: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:608: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:609: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:618: Error: suffix or operands invalid for `vpaddq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpalignr'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpaddq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpalignr'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpsrlq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpsllq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpor'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpsrlq'
intel/sha512_avx2.S:620: Error: no such instruction: `rorx $41,e,y0'
intel/sha512_avx2.S:620: Error: no such instruction: `rorx $18,e,y1'
intel/sha512_avx2.S:620: Error: no such instruction: `rorx $34,a,T1'
why ?

What compiler?  It's possible you're using too old a version of gcc that doesn't understand the avx2 instructions.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: mmao on January 18, 2014, 02:03:51 pm
it's a centos6 box
Code: [Select]
%gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 18, 2014, 02:09:30 pm
it's a centos6 box
Code: [Select]
%gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)

Ow.  gcc doesn't support avx2 until somewhere in the 4.7 release.  Best suggestion for now is to try the static avxsse binary - does it work for you?

Or if you're on an avx2 machine, upgrade. :-)

Next best is that it's getting higher on the TODO to make it easy to disable avx2 building.
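
One thing that may be worth checking: the errors quoted above are labelled "Assembler messages", which suggests they come from the assembler (`as`, part of binutils) that gcc calls for .S files, so upgrading gcc alone might not be enough. The sketch below is a quick probe, not part of the miner. The function name `assembler_supports_avx2` is hypothetical, and it only tests one instruction form (256-bit `vpaddq`, which is AVX2).

```shell
#!/bin/sh
# Sketch: ask the system assembler whether it accepts an AVX2 instruction.
# Prints "yes", "no", or "no-assembler" (if `as` or mktemp is unavailable).

assembler_supports_avx2() {
    command -v as >/dev/null 2>&1 || { echo "no-assembler"; return 0; }
    tmp=$(mktemp) || { echo "no-assembler"; return 0; }
    # With no input file named, GNU as reads the source from stdin.
    if echo 'vpaddq %ymm0, %ymm1, %ymm2' | as -o "$tmp" 2>/dev/null; then
        result=yes
    else
        result=no
    fi
    rm -f "$tmp"
    echo "$result"
}

assembler_supports_avx2
```

If this prints "no" on an x86-64 box, a newer binutils is probably needed before the avx2 assembly file will build, whatever the gcc version.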
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 18, 2014, 02:48:16 pm
I'm not native speaker, but maybe in such case guess I should say it's "damn fast"!

Anyway, once a new block starts, it seems all 6 of my workers print "Aborting scan run because of new work."

It's a little annoying. Could you output it only once?

And sometimes it says "Not inserted: <298324783427234980742398> at 7744". Is that OK?

The RJ rate is about 5% in China. I'm not sure whether that's due to a flaky internet connection or something else.

Thanks for these bug reports.  I've fixed both and pushed beta6 for haswell/avx2.

My benchmark hasn't been running long enough to really stabilize, but...

Beta5:   546.7 c/m | 8.7 sh/m | VL: 3937 (97.8%), RJ: 90 (2.2%), ST: 0 (0.0%)

Beta6:   543.0 c/m | 8.7 sh/m | VL: 142 (100.0%), RJ: 0 (0.0%), ST: 0 (0.0%)

I wouldn't read too much into the c/m and sh/m differences - it hasn't been running long enough - but the reject rate is reduced substantially.  The speed should be within 10c/m plus or minus once it's been running long enough to tell.

With more data:  540.2 c/m | 7.8 sh/m | VL: 1219 (99.5%), RJ: 6 (0.5%), ST: 0 (0.0%)

Much better reject rate with beta6.
  -Dave
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 18, 2014, 07:12:55 pm
beta6 is now available as a dynamically linked build as well.  I think the static is a better way to go in general, but I threw this one up there in case anyone wants to test it.  I've removed the dependencies upon boost_filesystem and boost_chrono (which is the first step towards getting rid of at least one of those darned makefiles, and simplifying compilation on other platforms).

Me being the high-quality software engineering house that I am, there are a few other hopefully-insignificant tweaks in the one I just put online vs the static beta6, 'cause you're just getting builds out of my dev directory as I muddle through this, but (ha ..) nothing that should cause noticeable performance difference.

Happy mining.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: noobster on January 18, 2014, 08:46:08 pm
is a new sse4 version going to be available?

i get

core 2 duo t8100 @ [STATS] 2014-Jan-18 21:44:54 | 60.6 c/m | 1.0 sh/m | VL: 502 (83.8%), RJ: 97 (16.2%), ST: 0 (0.0%)

core i3 380um @ [STATS] 2014-Jan-18 21:43:03 | 47.2 c/m | 0.7 sh/m | VL: 409 (85.9%), RJ: 67 (14.1%), ST: 0 (0.0%)

damn i wish I had faster cpu
thanks
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 18, 2014, 09:50:08 pm
is a new sse4 version going to be available?

i get

core 2 duo t8100 @ [STATS] 2014-Jan-18 21:44:54 | 60.6 c/m | 1.0 sh/m | VL: 502 (83.8%), RJ: 97 (16.2%), ST: 0 (0.0%)

core i3 380um @ [STATS] 2014-Jan-18 21:43:03 | 47.2 c/m | 0.7 sh/m | VL: 409 (85.9%), RJ: 67 (14.1%), ST: 0 (0.0%)

damn i wish I had faster cpu
thanks

Yes.  beta7 is now online for both avx2 and sse/avx. 

There aren't major changes from beta6 for avx2 users;  if you're running really happily, I wouldn't bother upgrading.  The changes are mostly internal to trying to make it easier to build, and to make available the reject-reducing improvements for SSE and AMD CPUs.  It should be even more aggressive for slower CPUs, but it's quite a bit better than it was.  Note:  You may still see a batch of rejects at the very start when the miner switches out of dev-mining mode for the first time.  Just depends on where things were when it switches.

One note:  More individually slower cores will result in a higher reject rate.  I'm seeing this, for example, if I push the hyperthreading too hard, and on AMD CPUs, which have more cores but no hyperthreading, with each core a bit slower.  Not horrible, but something you'll notice.

Updated:  I've also put the Mac build online for beta7, and tried to improve the static-ness of this one so it should be easier to run.  This also meets a second personal goal of mine:  It's now about as fast to mine with the CPU on the MacBook Pro as it is to use the GPU with cudapts.  Take that, cudapts!  It's about time to put the GPU-hardness back in Protoshares.  <grin>  (It does, however, make the fans spin more.)  I'm getting about 200 cpm using 4 threads on MBP, which is pretty close to GPU.  I don't recommend mining on a laptop, though, unless you don't like your laptop.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dclark44 on January 19, 2014, 12:24:05 am
Hi,

I was using beta7 and got the following error message

[MASTER] work received - sharetarget: 03ffffffffffffffffffffffffffffffffffffffffffffffffffffffbeefde4d
ptsminer-dga-beta7-avxsse-linux64-static.bin: malloc.c:2369: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t)) < __alignof__ (long double) ? __alignof__ (long double) : 2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t)) < __alignof__ (long double) ? __alignof__ (long double) : 2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.

Thanks,
D
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: mmao on January 19, 2014, 08:45:25 am
it's a centos6 box
Code: [Select]
%gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)

Ow.  gcc doesnt support avx2 until somewhere in the 4.7 release.  Best suggestion for now is to try the static avxsse binary - does it work for you?

Or if you're on an avx2 machine, upgrade. :-)

Next best is that it's getting higher on the TODO to make it easy to disable avx2 building.

I installed gcc 4.7.2 but got the same errors.
I hit exactly the same problem when compiling girino's opencl miner.
What version of gcc do you use?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 19, 2014, 09:37:15 am
Odd.  I use a more recent one:

gcc --version
gcc (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1

This may also be about your version of the assembler, though:

as --version
GNU assembler (GNU Binutils for Ubuntu) 2.23.52.20130913
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: noobster on January 19, 2014, 11:24:05 am
try upgrading your entire system, gcc might not be enough since it uses libraries such as libstdc++ and more
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dclark44 on January 19, 2014, 02:47:13 pm
I upgraded my libraries and left off the mode from the command line and got it to work.

Thanks,
D
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on January 20, 2014, 04:01:41 am
Hi dga,

ptsminer-dga-beta7-avx2-linux64-static.bin running at 530 cpm on my e3-1230 v3.

RJ is about 0.5% - 1.2%. Very nice.

I'm now running your miner on all my Haswell machines.

On the other hand, ptsminer-dga-beta7-avxsse-linux64-static.bin very often shows
20% RJ. Terrible.

minor issue:

When ptsminer starts, all the workers seem to print at the same time,
so the screen gets a little messy. For example:

spawning 6 worker thread(s)
[WORKER[WORKER1] starting
0] starting

Fix it or not, I guess no one really cares about that :)
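
The interleaved "[WORKER[WORKER1] starting" output above is what unsynchronized writes from several threads typically look like. A minimal sketch of the usual remedy follows, written in Python rather than the miner's own C++ and assuming a single shared lock; `print_lock`, `log_line` and `start_workers` are hypothetical names, not anything from ptsminer.

```python
import threading

# Hypothetical shared lock: every worker takes it before writing a line.
print_lock = threading.Lock()


def log_line(lines, text):
    """Record and print one complete line while holding the lock."""
    with print_lock:
        lines.append(text)
        print(text)


def start_workers(count):
    """Start `count` threads that each log one start-up line; return the lines."""
    lines = []
    threads = [
        threading.Thread(target=log_line, args=(lines, f"[WORKER{i}] starting"))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lines
```

The order of the lines can still vary from run to run, but each line is written whole.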

Then some feature requests:

At present, ptsminer outputs 3 kinds of information:
[MASTER], [WORKER] and [STATS]

As a miner, I don't care about [MASTER] or [WORKER]. I just want to see [STATS].

"66101067<->34749031 #29878"???

No one understands it except you, the developer, I guess.

I don't want to see "[MASTER] submitted share -> SHARE", since I have VL already.

"[MASTER] work received" is useful, though.

By the way, RJ is reject and ST is stale, but what is "VL"?

Also, ST is always 0 on my machine.

Lastly, maybe you should not open source your code. Now @archit has started to port
your code back to linux, hell.

In China a repackager rebuilt your code and earns 20%.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on January 21, 2014, 01:29:32 am
Hi dga,

More stats report.

I run your avx2 beta7 for 2 days.

In the beginning, the RJ was almost 0.0%. But as time passed, it slowly rose to 6.4%.

I saw "too many rejecets (3) in a row, forcing reconnect" several times.

Following that error message, ptsminer reconnects and I get even more "[MASTER] submitted share -> REJECET" lines.

I had to Ctrl-C and restart ptsminer.

Thank you.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on January 21, 2014, 01:51:51 am
Using the same Internet connection, my ivybridge (avx sse beta7) has a higher RJ than my haswell (avx2 beta7).

Now, on several machines at VL 1013, RJ is 14.3% for avxsse versus 0.1% for avx2.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 22, 2014, 08:40:21 pm
Ok, replying to a few of these:
 - I've put beta 7.1 online just for avxsse.  I've let a few small tweaks from what will be beta8 slip in, but it's basically the same as beta7 from a performance perspective.
 Changes:
    - It should improve the reject rate.  It's a bit more aggressive about checking for updates now without slowing down mining. (ptsrush)
    - Better handling of and diagnostic messages for out-of-memory / allocation errors. (dclark)

Update:  After about an hour of testing on a 64 core AMD machine:

    789.7 c/m | 12.3 sh/m | VL: 623 (99.7%), RJ: 2 (0.3%), ST: 0 (0.0%)

Looks like this one successfully pulls the submitted rejects down for avxsse also, though an hour isn't quite long enough to say what the overall reject rate will be.

Update 2:  After a day (the c/m and sh/m got reset but VL/RJ didn't):
    758.6 c/m | 12.0 sh/m | VL: 16570 (98.6%), RJ: 228 (1.4%), ST: 0 (0.0%)

Looks solid on rejects.

RJ is reject, ST is stale, VL is valid. 
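
For anyone who only wants the numbers from the [STATS] lines, here is a small parsing sketch. It assumes the line layout shown in this thread and that each percentage is that counter's share of VL + RJ + ST, which is consistent with the figures posted above; `parse_stats` is a hypothetical helper, not part of ptsminer.

```python
import re

# Pattern for the part of a [STATS] line after the timestamp, as seen in this thread.
STATS_RE = re.compile(
    r"(?P<cpm>[\d.]+) c/m \| (?P<shm>[\d.]+) sh/m \| "
    r"VL: (?P<vl>\d+) \([\d.]+%\), RJ: (?P<rj>\d+) \([\d.]+%\), ST: (?P<st>\d+)"
)


def parse_stats(line):
    """Return the counters from one stats line, or None if it does not match."""
    m = STATS_RE.search(line)
    if m is None:
        return None
    vl, rj, st = int(m["vl"]), int(m["rj"]), int(m["st"])
    total = vl + rj + st
    return {
        "cpm": float(m["cpm"]),        # collisions per minute
        "sh_per_min": float(m["shm"]),  # shares per minute
        "valid": vl,
        "rejected": rj,
        "stale": st,
        # Reject rate recomputed from the raw counters.
        "reject_pct": 100.0 * rj / total if total else 0.0,
    }
```

Recomputing the rate from the raw counters is handy when comparing two builds over runs of different length.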

I'm pondering the open-sourceness.  In the case of GPU, I'm happy - Invictus paid for the release.  Now that I'm doing unpaid improvements to the CPU miner, I want to see how it plays out - but I'm increasingly leaning towards keeping at least some of the cutting edge private as a way to get a bit of return on development time, and trying to keep the open source version updated at a reasonable level that lags a bit behind the latest and greatest but is still a good basis for people who want to learn / explore / improve.  It's a tough question.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: Brekyrself on January 23, 2014, 02:59:33 am
Just to clarify the miner, does it require AVX or can it be run on SSE alone?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: jernau on January 23, 2014, 06:51:22 am
You can use just sse4.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on January 24, 2014, 03:01:26 am
hi dga,

I found that yam supports beeeeer.org and reports about 7% RJ too.

Since both miners report almost the same reject rate, I guess your code is OK and
you don't need to worry about RJ any more.

Thank you very much.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: mmao on January 24, 2014, 10:18:12 am
I downloaded the beta7.1 static binary and ran it on my centos6 box, but it failed:

Code: [Select]
using SSE4
spawning 4 worker thread(s)
[WORKER[WORKER[WORKER2] starting
3] starting
1] starting
[WORKER0] starting
Couldn't use the hugepage speed optimization for big table.  Enable huge pages for a slight speed boost.
Couldn't use the hugepage speed optimization for big table.  Enable huge pages for a slight speed boost.
Couldn't use the hugepage speed optimization for big table.  Enable huge pages for a slight speed boost.
Couldn't use the hugepage speed optimization for big table.  Enable huge pages for a slight speed boost.
Couldn't use the hugepage speed optimization for small table.  Enable huge pages for a slight speed boost
Couldn't use the hugepage speed optimization for small table.  Enable huge pages for a slight speed boost
Couldn't use the hugepage speed optimization for small table.  Enable huge pages for a slight speed boost
Couldn't use the hugepage speed optimization for small table.  Enable huge pages for a slight speed boost
[WORKER1] GoGoGo!
[WORKER2] GoGoGo!
[WORKER0] GoGoGo!
[WORKER3] GoGoGo!
connecting to 54.201.26.128:1337
Mining for approx 60 seconds to support further development
Payments to: Pr8cnhz5eDsUegBZD4VZmGDARcKaozWbBc
[MASTER] work received - sharetarget: 03ffffffffffffffffffffffffffffffffffffffffffffffffffffffbeefde4d
ptsminer-dga-beta7.1-avxsse-linux64-static.bin: malloc.c:2369: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t)) < __alignof__ (long double) ? __alignof__ (long double) : 2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t)) < __alignof__ (long double) ? __alignof__ (long double) : 2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
aborted (core dumped)

It also failed with sph mode:
Code: [Select]
using SPHLIB
spawning 4 worker thread(s)
[WORKER[WORKER12[WORKER] starting] starting

3] starting
[WORKER0] starting
Couldn't use the hugepage speed optimization for big table.  Enable huge pages for a slight speed boost.
Couldn't use the hugepage speed optimization for big table.  Enable huge pages for a slight speed boost.
Couldn't use the hugepage speed optimization for big table.  Enable huge pages for a slight speed boost.
Couldn't use the hugepage speed optimization for big table.  Enable huge pages for a slight speed boost.
Couldn't use the hugepage speed optimization for small table.  Enable huge pages for a slight speed boost
Couldn't use the hugepage speed optimization for small table.  Enable huge pages for a slight speed boost
Couldn't use the hugepage speed optimization for small table.  Enable huge pages for a slight speed boost
Couldn't use the hugepage speed optimization for small table.  Enable huge pages for a slight speed boost
[WORKER2] GoGoGo!
[WORKER3] GoGoGo!
[WORKER0] GoGoGo!
[WORKER1] GoGoGo!
connecting to 54.201.26.128:1337
Mining for approx 60 seconds to support further development
Payments to: Pr8cnhz5eDsUegBZD4VZmGDARcKaozWbBc
[MASTER] work received - sharetarget: 03ffffffffffffffffffffffffffffffffffffffffffffffffffffffbeefde4d
Segmentation fault (core dumped)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 24, 2014, 07:02:20 pm
Hm.  What happens if you run with only 1 thread?

What happens if you first run, as root:
   echo 2048 > /proc/sys/vm/nr_hugepages

and then run with only one thread?

If it works with 1 thread, does it also work with 2?
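
To check whether the `nr_hugepages` setting actually took effect, the counters in `/proc/meminfo` can be inspected. A small sketch, assuming the usual Linux field names (`HugePages_Total`, `HugePages_Free`, `Hugepagesize`); `hugepage_summary` is a hypothetical helper. With the common 2048 kB page size on x86-64, 2048 pages reserve 4096 MiB, but the page size is system-dependent.

```python
def hugepage_summary(meminfo_text):
    """Summarize huge page counters from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total = fields.get("HugePages_Total", 0)
    free = fields.get("HugePages_Free", 0)
    size_kb = fields.get("Hugepagesize", 0)  # reported in kB
    return {
        "total": total,
        "free": free,
        "reserved_mib": total * size_kb // 1024,
    }
```

On a Linux box this could be called as `hugepage_summary(open("/proc/meminfo").read())`. If `total` comes back lower than requested, the kernel could not find enough contiguous free memory.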

  -Dave

Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: mmao on January 25, 2014, 10:50:49 am
after executing "echo 2048 > /proc/sys/vm/nr_hugepages", it works with 4 threads!!!
and with 4 threads sse4 mode on my i7-920@2.67GHZ, it gives about 170 cpm
it's quite good, thank you!
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 25, 2014, 08:18:14 pm
I've placed beta8 for haswell/avx2 and now for avxsse online.  It fixes (I believe) the bug that was causing failures when not enough hugepages were available, and incorporates the latest round of speed improvements for Haswell/AVX2.  Speed isn't changed much for avxsse, but I'm narrowing in on some more general improvements that should help there too.

http://www.cs.cmu.edu/~dga/ptsminer/

This is a quite worthwhile upgrade for the Haswell/AVX2 crowd.  Expect at least a 20cpm jump and probably more - I'm still letting the cpm benchmarks run, but my dev benchmarks suggest somewhere between a 5-10% speedup over beta7.  I'll update this post tonight with some actual CPM numbers from an i7-4770.

Update 2:
Totally rough guesstimate:

[STATS] 2014-Jan-25 17:34:57 | 570.1 c/m | 8.9 sh/m | VL: 299 (99.7%), RJ: 1 (0.3%), ST: 0 (0.0%)

I expect sustained rates of 565 c/m over a longer period of time.  Not bad, little CPU, not bad.

  -Dave
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on January 25, 2014, 11:20:41 pm
haswell e3-1230 v3 upgraded.

beta7 avx2 cpm : 530
beta8 avx2 cpm : 542

It's now faster than my 530 cpm GT 560!

By the way, it seems ptsminer works 60 s for the developer, then 200 s for the miner.
In the next round it works 1200 s for the developer and 40,000 s for the miner.

That's good if I'm running a server and never power off.

But when I run ptsminer on my desktop, in most cases I run it for about 20,000 s
and then power off.

So 1200 / 20000 = 6%.

For avx2, even at 6%, ptsminer is still far better than yam, so this is just for your information.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 26, 2014, 12:40:56 am
Glad to hear it's running well on the E3.

Noted about the devmine fee.  I'll fix that in a few betas.  What it really should be is an exponentially increasing sequence (with a cap) -- dev 60, user 2000;  dev 120, user 4000;  dev 240, user 8000; etc., which would reduce the problem you're seeing if you kill at exactly the wrong time, while still reducing the amount of interruption due to mining switches.  There are a few other things I want to do to make the dev mining more robustly fair under disconnects/etc., which is why I haven't just thrown out the exponential version.
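
The effect of stopping early can be worked through with a short sketch. It takes the (dev, user) second counts quoted in this thread at face value and assumes dev time comes first in each round; `dev_share` and `doubling_schedule` are hypothetical helpers, and the cap value is made up for illustration.

```python
def dev_share(schedule, run_seconds):
    """Fraction of a run of `run_seconds` spent dev-mining.

    `schedule` is a list of (dev_seconds, user_seconds) rounds, dev first.
    """
    dev_total = 0.0
    elapsed = 0.0
    for dev_s, user_s in schedule:
        if elapsed >= run_seconds:
            break
        dev_slice = min(dev_s, run_seconds - elapsed)
        dev_total += dev_slice
        elapsed += dev_slice
        elapsed += min(user_s, run_seconds - elapsed)
    return dev_total / run_seconds


def doubling_schedule(first_dev, first_user, cap_dev, rounds):
    """Rounds that double in length until the dev slice reaches `cap_dev`."""
    dev_s, user_s = first_dev, first_user
    out = []
    for _ in range(rounds):
        out.append((dev_s, user_s))
        if dev_s * 2 <= cap_dev:
            dev_s, user_s = dev_s * 2, user_s * 2
    return out


# Schedule as observed by ptsrush, and the doubling sequence from the reply.
observed = [(60, 200), (1200, 40000)]
doubling = doubling_schedule(60, 2000, cap_dev=480, rounds=4)

print(dev_share(observed, 20000))  # share of a 20,000 s desktop session
print(dev_share(doubling, 20000))
```

Note that the example numbers in the reply (60 dev per 2000 user) work out to about 3% in the long run, so they read as an illustration of the shape of the sequence rather than the final fee ratio.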
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on January 30, 2014, 12:19:43 am
Hi dga,

After 3 or 4 days, the v8 avx slowed down from 320 to 260 cpm.
I'm not sure about v8 avx2 since it runs as a daemon, but profit seems to have dropped to 75%.

Killing ptsminer and restarting it seems to fix it.

I'm not sure what's happening, since yam gets a lot of rejects from beeeeer.org too.

yam on 1GH has no problem using the new xpt2h protocol and port 18120.

Right now ptsminer is locked in to beeeeer.org.

Any plans to support the xpt2h protocol?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on January 30, 2014, 11:04:27 am
I haven't seen avx2 slow down, but beeeeer has had a bad string of luck lately with block finding - my profit is also down a fair bit. 

My own avx beta8 client hasn't slowed down:

2014-Jan-25 17:00:19 | 760.0 c/m | 14.1 sh/m
2014-Jan-30 06:00:07 | 774.9 c/m | 12.2 sh/m

but that doesn't mean there's not something wrong.  What CPU are you running the avx one on and with how many threads?

I do hope to add more protocol support.  I have real work taking up all of my time until this weekend, but I'll check out xpt2h then.

  -Dave
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: rots on February 01, 2014, 01:06:22 pm
Any plan to release a Windows binary?

Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 01, 2014, 07:19:38 pm
beta9 for AVX2 is now online in the usual place:  http://www.cs.cmu.edu/~dga/ptsminer/

This is a speed-boost release.  I'm still doing the benchmarking runs, but on my i7-4770, it's the first of my releases to crack 600 cpm.  Looks like it's going to settle in between 610 and 620 cpm with 7 threads running on my test box.

beta9 is haswell-only right now;  its optimizations are specific to avx2.  I plan to address some of the portability/pool selection issues soon (because I'm running out of great ideas for how to make this thing faster without getting ugly).
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 01, 2014, 08:24:32 pm
beta9 for AVX2 is now online in the usual place:  http://www.cs.cmu.edu/~dga/ptsminer/

beta9 for AVX is also now online.  This one should be a good speed boost - I'm seeing my test machine go from about 780cpm to 1020cpm.

Note:  Unlike prior avxsse releases, this avx release really does require AVX.  It's compiled to target sandy bridge and higher.  I've changed the name of the binary to reflect this, and left the old avxsse one (which will run on sse4) online.

Direct link:  http://www.cs.cmu.edu/~dga/ptsminer/ptsminer-dga-beta9-avx-linux64-static.bin

Happy mining!

Update:  This one is producing very mixed results.  Try beta8 and beta9 and use whichever is better for you.  Beta9 is rocking on my AMD test CPU, but it seems slower on some others.  Definitely needs improvement still.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: bytemaster on February 01, 2014, 09:12:13 pm
Nice... how does this compare to the latest GPU mining?
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 01, 2014, 09:40:24 pm
I think I broke something.  This one is a lot better on my AMD test CPU and absolutely horrible on my Intel CPUs.  Back to the drawing board.  Beta8 is the one to stick with for Intel. (update:  beta9 is now working properly for Intel)

The haswell/AVX2 release is very solid and beats low-end GPUs:  It's sitting just above 600 c/m.  A cheap GPU (GT 640 GDDR5 -- $85) can get about 250 cpm.  The fastest ($600-$1000) get around 2000-2200cpm.  The GPUs are still ahead in cpm/$, but not by a shocking margin.  Haswell is 610cpm for $300, or about 2cpm/$.  An R9 290x is 2200cpm/$610 = 3.6cpm/$.
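
The cpm/$ comparison is simple division; here it is spelled out using only the prices and rates quoted in the post above. `cpm_per_dollar` and the dictionary labels are just names chosen for this sketch.

```python
# (collisions per minute, price in USD) as quoted in the post above.
hardware = {
    "Haswell CPU": (610, 300),
    "GT 640 GDDR5": (250, 85),
    "R9 290x": (2200, 610),
}


def cpm_per_dollar(cpm, price_usd):
    """Collisions per minute bought by each dollar of hardware."""
    return cpm / price_usd


for name, (cpm, price) in hardware.items():
    print(f"{name}: {cpm_per_dollar(cpm, price):.1f} cpm/$")
```

This ignores power draw and the rest of the system, so treat it as a rough ranking only.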
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 01, 2014, 09:57:11 pm
Well, I'll be.  I guess we're entering the CPU mess zone.  (Deleted old post)

Solved, thanks to some help from mikaelh_ on #beeeeer. 

There's now only one binary, but on AMD, run with sse4 explicitly:

./ptsminer...   <addr>  <threads>  sse4

You'll be much happier than with avx.  For Intel, auto-detect works, and avx is better.
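
The rule of thumb above can be written down as a small selector. This is only a sketch of the advice in this post, not the miner's actual auto-detection: it assumes the `vendor_id` and `flags` fields of Linux `/proc/cpuinfo`, and while `sse4`, `avx`, `avx2` and `sph` are modes named in this thread, passing `avx2` or `avx` as an argument is an assumption. `parse_cpuinfo` and `pick_mode` are hypothetical helpers.

```python
def parse_cpuinfo(text):
    """Return (vendor_id, flags) for the first CPU listed in /proc/cpuinfo text."""
    vendor, flags = "", []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "vendor_id" and not vendor:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = value.split()
    return vendor, flags


def pick_mode(vendor_id, flags):
    """Pick a mining mode following the advice in this thread."""
    flags = set(flags)
    if "avx2" in flags:
        return "avx2"
    # Per the post above: AMD is reported to do better with sse4 than avx.
    if vendor_id == "AuthenticAMD" and "sse4_1" in flags:
        return "sse4"
    if "avx" in flags:
        return "avx"
    if "sse4_1" in flags:
        return "sse4"
    return "sph"  # portable fallback mode mentioned earlier in the thread
```

On Linux, `pick_mode(*parse_cpuinfo(open("/proc/cpuinfo").read()))` gives a starting point; benchmarking both modes is still the safer test.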
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: bytemaster on February 02, 2014, 12:03:20 am
Quote
The haswell/AVX2 release is very solid and beats low-end GPUs:  It's sitting just above 600 c/m.  A cheap GPU (GT 640 GDDR5 -- $85) can get about 250 cpm.  The fastest ($600-$1000) get around 2000-2200cpm.  The GPUs are still ahead in cpm/$, but not by a shocking margin.  Haswell is 610cpm for $300, or about 2cpm/$.  An R9 290x is 2200cpm/$610 = 3.6cpm/$.

Considering you can build a high end CPU miner for less than the cost of a high end GPU miner I would have to contend that momentum has served its intended goals quite well. 
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on February 02, 2014, 02:40:51 am
haswell e3-1230 v3 avx2 beta9 upgraded, cpm : 595

[STATS] 2014-Feb-02 10:35:29 | 595.1 c/m | 8.8 sh/m | VL: 1004 (99.5%), RJ: 5 (0.5%), ST: 0 (0.0%)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on February 02, 2014, 04:11:08 am
It ran at 599.6 c/m for a long time and I was really panicking.

thank god now cpm : 600.5.

[STATS] 2014-Feb-02 12:08:19 | 600.5 c/m | 9.3 sh/m | VL: 1885 (99.5%), RJ: 9 (0.5%), ST: 0 (0.0%)
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: bytemaster on February 02, 2014, 05:15:42 pm
Any chance I can get the code from this miner integrated with bitshares/src/momentum.cpp  API?

I would be willing to pay a reasonable number of PTS for the work. 

API:   
Code: [Select]
std::vector< std::pair<uint32_t,uint32_t> > momentum_search( pow_seed_type head )
Thoughts?
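
For readers unfamiliar with what a function of this shape computes, here is a toy birthday-collision search that returns pairs of nonces. It is a sketch under loud assumptions: one SHA-512 call per nonce, a little-endian 32-bit nonce prepended to the seed, and a tiny collision width so it runs instantly. None of these details are taken from momentum.cpp, and `birthday_value` is a hypothetical helper.

```python
import hashlib
import struct


def birthday_value(seed, nonce, bits):
    """Top `bits` bits of SHA-512(nonce || seed); the layout is an assumption."""
    digest = hashlib.sha512(struct.pack("<I", nonce) + seed).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


def momentum_search(seed, max_nonce, bits):
    """Return (nonce_a, nonce_b) pairs whose birthday values collide."""
    seen = {}   # birthday value -> first nonce that produced it
    pairs = []
    for nonce in range(max_nonce):
        value = birthday_value(seed, nonce, bits)
        if value in seen:
            pairs.append((seen[value], nonce))
        else:
            seen[value] = nonce
    return pairs
```

The dictionary is the memory-hungry part: at a realistic collision width it grows to many millions of entries, which is where the memory subsystem optimizations discussed in this thread matter.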
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 02, 2014, 09:52:57 pm
Sure, I'm happy to figure out a value that works.

Let me lay out the catch a little bit:  The compilation chain is ugly because I generate a few CPU-specific chunks of code.  I can put all of that in a repository, and by outputting assembly from the first step, it could all be compilable by gcc -- or from the original source if someone installed some other compiler support tools.

There are really three major contributions that make it fast:
  - Some algorithmic changes that make the memory-hard parts faster;
  - A re-implementation of the sha512 code for AVX2;
  - An AVX/SSE implementation of other high-performance parts of the code.

The algorithmic changes are easy and will make any codebase faster and use less memory.  The nitty gritty implementation bits start to get architecture specific.  But I'm happy to include them.

The only drawback from my perspective is that the AVX2 SHA512 changes are also very pertinent to making Memorycoin faster, and I haven't yet started writing a miner for that one.  *grins*  But I'm willing to be scooped.

Same license as the original momentum is fine.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: bytemaster on February 02, 2014, 09:56:17 pm
As long as the high-performance mode has a fallback option to low performance when the instructions are not supported.

Are there any small tweaks that you can think of that would make it harder for a gpu?   


Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 02, 2014, 10:03:48 pm
That's do-able.  It's mostly just making sure that compilation isn't a mess.  Basically, I'd modularize it as:

  generate_sha512(buf, num_hashes, starting_nonce);

And as long as there was a version of generate_sha512 that worked reasonably well, it would be fine.
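
One way to picture that modularization is a registry of interchangeable `generate_sha512` variants with a portable fallback. The sketch below is in Python for brevity, and its details are assumptions: the thread gives only the call shape, so the buffer layout (64 bytes per digest), the hashing of the bare nonce, and the names `generate_sha512_portable` and `select_generate_sha512` are all made up for illustration.

```python
import hashlib
import struct


def generate_sha512_portable(buf, num_hashes, starting_nonce):
    """Fill `buf` with SHA-512 digests of consecutive nonces (placeholder input)."""
    for i in range(num_hashes):
        digest = hashlib.sha512(struct.pack("<I", starting_nonce + i)).digest()
        buf[i * 64:(i + 1) * 64] = digest


# Optimized variants would register themselves under "avx2" or "avx";
# only the portable one exists in this sketch.
_REGISTRY = {"portable": generate_sha512_portable}
_PREFERENCE = ["avx2", "avx", "portable"]


def select_generate_sha512(cpu_flags):
    """Return the fastest registered variant that the CPU flags allow."""
    for name in _PREFERENCE:
        if name in _REGISTRY and (name == "portable" or name in cpu_flags):
            return _REGISTRY[name]
    raise RuntimeError("no generate_sha512 implementation available")
```

Because every variant shares one signature, the search code never needs to know which one it was handed.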

The tweaks:  My first reaction is adding more branches/conditionals to cause warp divergence on a SIMD machine.  That would slow down the CPUs, too, of course, but it would be really painful for the GPUs.  I'm not sure exactly where I'd add them.  Possibly a changed sha512 core with a slightly variable or tweaked number of rounds depending on something in the input that caused divergence on a per-nonce basis.  It'd make it more evil, at least.  If you could force all 16 or 32 units in the vector to have to diverge early on and remain diverged through the end of the sha512, you'd slow the GPUs by a factor of 16 or 32 while only slowing the CPU down by 2-4x.
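
The 16x or 32x figure follows from a very crude lockstep model: a SIMD group has to run every distinct code path taken by any of its lanes, one after another. The sketch below encodes only that model and ignores caches, scheduling and everything else a real GPU or CPU does; `lockstep_cost` and `slowdown` are hypothetical names.

```python
def lockstep_cost(lane_paths, path_cost):
    """Time for one SIMD group: each distinct path is executed serially."""
    return len(set(lane_paths)) * path_cost


def slowdown(width, diverged):
    """Cost relative to a fully converged group of the same width."""
    paths = list(range(width)) if diverged else [0] * width
    return lockstep_cost(paths, 1) / lockstep_cost([0] * width, 1)


for width in (4, 16, 32):
    print(width, slowdown(width, diverged=True))
```

Under this model a 4-wide CPU vector loses at most 4x while a 32-wide warp loses up to 32x, which matches the estimate in the post.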

(PTS deposits:  Pr8cnhz5eDsUegBZD4VZmGDARcKaozWbBc   )
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: Coindgr on February 02, 2014, 10:57:46 pm
Will this be released to windows?
I hope so
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: bytemaster on February 03, 2014, 12:46:45 am
Well you know this stuff better than most, so come up with something solid and we will include it. 
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 03, 2014, 01:04:12 am
Sounds good.

Here's what I propose.  Perhaps surprisingly, it's taken way more work to do the fast CPU implementation of PTS than the basic GPU implementation.  Instead of going by my consulting rates (grin), I'll admit that I did it for fun, too, and judge it to be about 450 PTS worth of work based upon the previous rates you were offering, plus about 50 PTS more work to actually manage the integration into momentum.cpp, since it differs substantially from the codebase I've been developing on.

Instead of having it all in one chunk, though, I think it makes more sense to split it in half for two different deliverables to help reduce risk and get something in your hands faster:

(a)  Algorithmic improvements to mining that are completely platform-independent.  (250).
(b)  Platform-optimized implementation for sse4, avx, and avx2, delivered as GNU assembly code along with original source code files to generate that assembly.  (250)

Both documented, of course.

I think I can get (a) done reasonably straightforwardly.  For (b), I'll need to spend more time understanding the Makefile setup for it so that I can integrate it without breaking things.
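
For the "fallback to low performance" requirement, one common shape is to pick an implementation once at run time. The following is a minimal sketch under stated assumptions, not the code from the pull request: the generate_fn signature is only modelled on the generate_sha512(buf, num_hashes, starting_nonce) call mentioned earlier in the thread, toy_mix is a stand-in that is NOT SHA-512, and generate_avx2 is a placeholder for where hand-written assembly would be linked. __builtin_cpu_supports is a GCC/Clang builtin for x86 targets.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical signature, modelled on generate_sha512(buf, num_hashes,
// starting_nonce) from the thread; the real one may differ.
using generate_fn = void (*)(std::uint64_t* buf, std::size_t num_hashes,
                             std::uint32_t starting_nonce);

// Stand-in for hashing: NOT SHA-512, just a deterministic 64-bit mixer
// so the sketch runs without a crypto library.
static std::uint64_t toy_mix(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Portable fallback: works on any CPU.
void generate_portable(std::uint64_t* buf, std::size_t n, std::uint32_t start) {
    for (std::size_t i = 0; i < n; ++i) buf[i] = toy_mix(start + i);
}

// Placeholder for the AVX2 path. A real build would call optimized code
// here; it must produce exactly the same output as the fallback.
void generate_avx2(std::uint64_t* buf, std::size_t n, std::uint32_t start) {
    generate_portable(buf, n, start);
}

// Choose an implementation at run time, defaulting to the portable one.
generate_fn select_generate() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2")) return generate_avx2;
#endif
    return generate_portable;
}
```

The caller would fetch the function pointer once (for example at miner start-up) and use it for every batch, so unsupported CPUs never execute the optimized path.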

As a nitpicky note based upon the copyright issues that arose in my previous release, just to be up front:  Like other high-performance miners, for everything but avx2, I use the Intel sha512 implementation.  Its license is compatible (redistributions must include the copyright notice).  The code I'd integrate into momentum.cpp is entirely my own at this point, and I'd simply integrate it under the existing license.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: bytemaster on February 03, 2014, 03:24:06 am
Sounds fair enough as long as the result is a pull request that simply works with CMake.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 03, 2014, 10:44:21 pm
The first batch of changes is now in a pull request to you:

https://github.com/InvictusInnovations/ProtoShares/pull/8

I spent a lot of time today thinking about how to provide the best balance of performance improvement while ensuring that the reference code is as easy as possible for people to use on any platform of their choice.  As a consequence, I've refactored some of the algorithmic changes a little to try to make the best use of the existing SHA512 from OpenSSL.  I'm going to do a set of benchmark runs tomorrow to determine how much of a benefit there is on non-AVX2/Haswell platforms to being more architecture specific.  If the results don't justify complicating the build, I'll put the AVX2 changes in a small module that people can integrate on their own if they wish, but that won't touch anything in the build.  If they're good, I'll do deeper modifications.

The current changes preserve the exact interface and code structure of the existing momentum_search, per your request.  They don't touch anything outside of the mining core code.  The result is about an 8x speedup on my test platform using about 50% of the memory.  4x of that speedup and all of the memory savings come from the algorithmic improvements;  the other 2x comes from testing the nonces in both directions when evaluating the collision.
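
For readers unfamiliar with the search itself, here is a minimal sketch of a birthday-collision search that reports each collision in both nonce orders. It is an illustration under stated assumptions, not the algorithm in the pull request: toy_birthday is a stand-in mixer rather than SHA-512, the names and the bits parameter are hypothetical, and the plain hash map makes no attempt at the memory savings described above.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the birthday hash (NOT SHA-512). Keeps only the low
// `bits` bits of a mixed value; `bits` must be between 1 and 63.
std::uint64_t toy_birthday(std::uint64_t seed, std::uint32_t nonce, unsigned bits) {
    std::uint64_t x = seed + 0x9e3779b97f4a7c15ULL * (nonce + 1ULL);
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x & ((1ULL << bits) - 1);
}

using nonce_pair = std::pair<std::uint32_t, std::uint32_t>;

// Scan nonces 0..max_nonce-1 and remember the first nonce seen for each
// birthday value. Whenever a later nonce hits an occupied value, emit
// the collision as (a, b) and (b, a), so both orders can be tested.
std::vector<nonce_pair> find_collisions(std::uint64_t seed,
                                        std::uint32_t max_nonce,
                                        unsigned bits) {
    std::unordered_map<std::uint64_t, std::uint32_t> first_seen;
    std::vector<nonce_pair> out;
    for (std::uint32_t n = 0; n < max_nonce; ++n) {
        const std::uint64_t b = toy_birthday(seed, n, bits);
        auto it = first_seen.find(b);
        if (it == first_seen.end()) {
            first_seen.emplace(b, n);
            continue;
        }
        out.emplace_back(it->second, n);
        out.emplace_back(n, it->second);
    }
    return out;
}
```

Every matching pair of nonces yields two ordered candidates for the same hashing work, which is one way to read the extra 2x attributed to testing both directions.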

I put some performance evaluation numbers in the pull request, but to briefly summarize: before the changes, each thread was taking about 28 seconds to do one execution of momentum_search.  After the changes, each takes 7-8 seconds.

Before:
 83.10%  bitcoind  bitcoind                   [.] bts::momentum_search(uint256)
 12.95%  bitcoind  libcrypto.so.1.0.0         [.] 0x000000000006d764

Only 13% of the time was being spent in computing SHA512 hashes.  After:

 70.36%  bitcoind  libcrypto.so.1.0.0         [.] 0x000000000006cece

(update, forgot to give my PTS address:   Pr8cnhz5eDsUegBZD4VZmGDARcKaozWbBc   )

   -Dave

[Update 2:  As another way to view the stats, quad-core i7-4770 is doing:

dga@homewell:~/coin/ProtoShares/src$ ./bitcoind getmininginfo
{
    "blocks" : 47838,
    "currentblocksize" : 5063,
    "currentblocktx" : 18,
    "difficulty" : 0.01374487,
    "errors" : "",
    "generate" : true,
    "genproclimit" : -1,
    "collisionspermin" : 240.86192739,
    "pooledtx" : 31,
    "testnet" : false
}

with no AVX2 optimizations, so this speed is probably what one might expect on an SSE or AVX platform.  Quite a bit faster than the default code.]
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: bytemaster on February 04, 2014, 12:03:48 am
There seems to have been a misunderstanding :)   I was looking for an update to the BitShares repository for the same method.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 04, 2014, 12:06:29 am
Repository URL?
update:  Ahh, you mean this one?

https://github.com/InvictusInnovations/bitshares

Confirm and I'll get the patch done.  Should be straightforward.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: bytemaster on February 04, 2014, 12:14:57 am
yes
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 04, 2014, 12:19:21 am
Cool.  I'll need to change a little bit of the code to match the style, but should have it done soon.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 04, 2014, 01:54:57 am
Original source, looking at time spent in momentum_pow_test:
   User time (seconds): 5.24
   Maximum resident set size (kbytes): 1050652
       43.84%  momentum_pow_te  libcrypto.so.1.0.0  [.] 0x000000000006c729

With patch 1 for algorithmic changes:
   User time (seconds): 4.38
   Maximum resident set size (kbytes): 528452
       71.51%  momentum_pow_te  libcrypto.so.1.0.0   [.] 0x000000000006d29f

The reason the speedup is smaller here than it was in PTS is that it's spending a little more time in allocation in the sha512 routine, which is used differently from the one in PTS.  I'll clean that up as part of the sha512 optimizations in chunk-of-work #2.  That part is straightforward engineering.

Probably the biggest benefit to the current version is that, as shown above, it uses half the memory and is about 20% faster with no changes to the crypto or any other libraries.
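
As a quick check of those two claims against the measurements above, the ratios can be worked out directly. The helper names below are hypothetical; the inputs are the User time and Maximum resident set size figures quoted in this post.

```cpp
// How many times more work per second the new run does.
double throughput_ratio(double old_seconds, double new_seconds) {
    return old_seconds / new_seconds;
}

// Fraction of the original running time that was saved.
double time_saved_fraction(double old_seconds, double new_seconds) {
    return 1.0 - new_seconds / old_seconds;
}

// New memory footprint as a fraction of the old one.
double memory_fraction(double old_kbytes, double new_kbytes) {
    return new_kbytes / old_kbytes;
}
```

With 5.24 s versus 4.38 s, throughput_ratio is about 1.20 (the "20% faster" figure), which is the same thing as roughly a 16% shorter run. With 1050652 versus 528452 kbytes, memory_fraction is about 0.50.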

Sending pull request now.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: bytemaster on February 04, 2014, 03:37:18 am
When I pulled this change in it stopped finding matches...
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 04, 2014, 07:06:26 am
Whoops - thanks, I'd misunderstood test_momentum_pow.

I've fixed it in a second pull request.  It was a missing enc.reset().
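
The thread only says the fix was a missing enc.reset(), so the following is a general illustration of that failure mode rather than the actual patch. toy_encoder is a hypothetical streaming encoder (FNV-1a, NOT SHA-512, and not the class used in the BitShares code) with the usual write / result / reset shape. If it is reused for a second input without reset(), the second result covers both inputs concatenated, so no expected match is ever found.

```cpp
#include <cstdint>
#include <string>

// FNV-1a 64-bit constants.
constexpr std::uint64_t kFnvOffset = 0xcbf29ce484222325ULL;
constexpr std::uint64_t kFnvPrime = 0x100000001b3ULL;

// Hypothetical streaming hash encoder: state accumulates across
// write() calls until reset() is called.
class toy_encoder {
public:
    void write(const std::string& data) {
        for (unsigned char c : data) {
            state_ ^= c;
            state_ *= kFnvPrime;
        }
    }
    std::uint64_t result() const { return state_; }
    void reset() { state_ = kFnvOffset; }

private:
    std::uint64_t state_ = kFnvOffset;
};

// Reference value: hash one input with a fresh encoder.
std::uint64_t hash_fresh(const std::string& input) {
    toy_encoder enc;
    enc.write(input);
    return enc.result();
}
```

Writing "nonce-1" and then "nonce-2" to one encoder without a reset gives the hash of "nonce-1nonce-2"; calling reset() between them gives the same value as hash_fresh("nonce-2").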

Interestingly, you'll find that my version now finds a few more collisions than the original code did, which should produce a further speed-up.  These collisions verify.

Old:

3522368ms th_a       momentum_test.cpp:29          main                 ] [[25908781,36251059],[36251059,25908781],[14409167,49012845],[49012845,14409167],[32190345,58604277],[58604277,32190345],[11166445,59732725],[59732725,11166445],[41830614,64427554],[64427554,41830614]]

   User time (seconds): 5.09

New:

/usr/bin/time --verbose ./tests/momentum_pow_test  5959592
98735ms th_a       momentum_test.cpp:29          main                 ] [[29995035,64113291],[64113291,29995035],[41830614,64427554],[64427554,41830614],[32190345,58604277],[58604277,32190345],[11166445,59732725],[59732725,11166445],[14409167,49012845],[49012845,14409167],[25908781,36251059],[36251059,25908781]]

   User time (seconds): 3.43

Sorry for the double-try on that one.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 10, 2014, 03:13:15 pm
I have released beta10 for avx2 with others to follow.  Note that because beeeeer has increased the difficulty target, share/min numbers are now lower than they used to be, so comparing CPS is probably the most useful metric.

http://www.cs.cmu.edu/~dga/ptsminer/

This one's getting in the 650-660 range when run on 8 threads on an i7-4770k.  Note that it's now fastest to run 8 threads, not 7, though it kinda destroys interactive use of your computer. :-)  @ptsrush, this one should keep you very happily over 600.  I haven't finished benchmarking it yet.  It's faster based upon internal metrics, but it'll take a bit to see how it shakes out in cpm.

[STATS] 2014-Feb-10 10:23:54 | 657.7 c/m | 2.8 sh/m | VL: 43 (100.0%), RJ: 0 (0.0%), ST: 0 (0.0%)

Not bad, little CPU, not bad.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: bytemaster on February 10, 2014, 04:13:46 pm
Nice... not bad momentum POW, not bad...  looks like CPU mining of PTS will remain viable for a long time to come.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: yvg1900 on February 10, 2014, 04:50:05 pm
Curious to compare it against yam M7m on the same config. That one should hit 700+ cpm on that machine.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: dga on February 10, 2014, 04:50:46 pm
I have released beta10 for avx2 with others to follow.  Note that because beeeeer has increased the difficulty target, share/min numbers are now lower than they used to be, so comparing CPS is probably the most useful metric.

http://www.cs.cmu.edu/~dga/ptsminer/

This one's getting in the 650-660 range when run on 8 threads on an i7-4770.  (updated:  not a k, sorry, just the normal 4770) Note that it's now fastest to run 8 threads, not 7, though it kinda destroys interactive use of your computer. :-)  @ptsrush, this one should keep you very happily over 600.  I haven't finished benchmarking it yet.  It's faster based upon internal metrics, but it'll take a bit to see how it shakes out in cpm.

[STATS] 2014-Feb-10 10:23:54 | 657.7 c/m | 2.8 sh/m | VL: 43 (100.0%), RJ: 0 (0.0%), ST: 0 (0.0%)

Not bad, little CPU, not bad.
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: ptsrush on February 10, 2014, 09:49:06 pm
On my Haswell E3-1230 v3, yam-M7m gets about 640 cpm, and
ptsminer-avx2-beta10 is almost the same.

At present I prefer yam-M7m since it:

1. is as fast as ptsminer
2. supports Windows, so I can run the same software/config on
   all my computers
3. supports more protocols, to avoid pool lock-in
4. has a 1% dev fee
5. lets me set up a backup pool/coin
Title: Re: Open source optimized PTS CPU miner (BETA)
Post by: barwizi on February 19, 2014, 10:55:19 pm
Can this be adapted to work with the client?