Author Topic: Open source optimized PTS CPU miner (BETA) (Read 144575 times)

honger18 · « **Reply #26 on:** January 16, 2014, 01:40:39 pm »

Hi dga

Quote

Are you using an older version of gcc? If so, try upgrading

I have
gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu9),
the latest available on Ubuntu. Is that too old?

Quote

or I can explain how to remove the need for the avx2 code if you're not on a Haswell CPU.

I have to admit I'm not sure whether I have a Haswell CPU. I have the following processor:

Code: [Select]

model name      : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid

If it's an easy fix, e.g. commenting some code out, I'd appreciate it.

P.S. Thanks for all the effort, I really appreciate you keeping the source open!
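
For anyone else unsure whether their CPU is Haswell-class: the practical question is whether `avx2` appears in the `flags` line of `/proc/cpuinfo`. Below is a minimal sketch of that check. The helper names are hypothetical (not part of ptsminer), and the sample text is an abbreviated copy of the flags posted above, which list `avx` but not `avx2`.

```python
# Sketch only: cpu_flags() and supported_simd() are hypothetical helpers,
# not part of ptsminer.

CANDIDATES = ("avx2", "avx", "sse4_1")  # widest first


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_simd(cpuinfo_text):
    """List which of the candidate SIMD extensions the CPU advertises."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in CANDIDATES if name in flags]


# Abbreviated excerpt of the i7-2600K output posted above.
sample = (
    "model name      : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz\n"
    "flags           : fpu vme sse sse2 ssse3 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm\n"
)

print(supported_simd(sample))  # ['avx', 'sse4_1']
```

On a real machine, pass `open("/proc/cpuinfo").read()` instead of `sample`. An empty `avx2` result means the avx2 code path cannot run on that CPU, whatever the compiler version.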

dga · « **Reply #25 on:** January 16, 2014, 12:18:41 pm »

Quote from: honger18 on January 16, 2014, 10:15:17 am

I got the following trying to compile...
Does anyone know how to fix this? I'm on a 32-bit kernel (PAE), not sure if that's relevant...

Code: [Select]

~/comps/ptsminer/src$ make -f makefile.unix
cc -c -O3 intel/sha512_avx2.S -o obj/sha512_avx2.o
intel/sha512_avx2.S: Assembler messages:
intel/sha512_avx2.S:64: Error: bad expression
intel/sha512_avx2.S:64: Error: junk at end of line, first unrecognized character is `y'
intel/sha512_avx2.S:67: Error: bad expression
intel/sha512_avx2.S:67: Error: junk at end of line, first unrecognized character is `y'
intel/sha512_avx2.S:70: Error: bad expression
intel/sha512_avx2.S:70: Error: junk at end of line, first unrecognized character is `r'
intel/sha512_avx2.S:72: Error: bad expression
intel/sha512_avx2.S:72: Error: junk at end of line, first unrecognized character is `r'
...
intel/sha512_avx2.S:784: Error: bad register name `%rsp)'
intel/sha512_avx2.S:785: Error: bad register name `%rsp)'
intel/sha512_avx2.S:788: Error: bad register name `%rsp)'
make: *** [obj/sha512_avx2.o] Error 1

Are you using an older version of gcc? If so, try upgrading -- or I can explain how to remove the need for the avx2 code if you're not on a Haswell CPU.

honger18 · « **Reply #24 on:** January 16, 2014, 10:15:17 am »

I got the following trying to compile...
Does anyone know how to fix this? I'm on a 32-bit kernel (PAE), not sure if that's relevant...

Code: [Select]

~/comps/ptsminer/src$ make -f makefile.unix
cc -c -O3 intel/sha512_avx2.S -o obj/sha512_avx2.o
intel/sha512_avx2.S: Assembler messages:
intel/sha512_avx2.S:64: Error: bad expression
intel/sha512_avx2.S:64: Error: junk at end of line, first unrecognized character is `y'
intel/sha512_avx2.S:67: Error: bad expression
intel/sha512_avx2.S:67: Error: junk at end of line, first unrecognized character is `y'
intel/sha512_avx2.S:70: Error: bad expression
intel/sha512_avx2.S:70: Error: junk at end of line, first unrecognized character is `r'
intel/sha512_avx2.S:72: Error: bad expression
intel/sha512_avx2.S:72: Error: junk at end of line, first unrecognized character is `r'
...
intel/sha512_avx2.S:784: Error: bad register name `%rsp)'
intel/sha512_avx2.S:785: Error: bad register name `%rsp)'
intel/sha512_avx2.S:788: Error: bad register name `%rsp)'
make: *** [obj/sha512_avx2.o] Error 1

allano · « **Reply #23 on:** January 16, 2014, 06:37:39 am »

Hi Dave,

I get the following error:

Code: [Select]

Could not mmap hugepage, reverting to malloc: Cannot allocate memory
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
Could not mmap hugepage, reverting to malloc: Cannot allocate memory

I have tried the following:

Code: [Select]

echo "2048" > /proc/sys/vm/nr_hugepages
vm.nr_hugepages = 2048 in sysctl.conf
reboot

The server has the following data:

Code: [Select]

16GB RAM
AnonHugePages:         0 kB
HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Ubuntu 12.04 64bit

I compiled your ptsminer with makefile.unix.
I start it with ./ptsminer <address> 8 avx

If you need more information, just tell me what.
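
As a sanity check on the numbers above, here is a small sketch that turns the hugepage lines of `/proc/meminfo` into pool sizes. The function name is hypothetical and the sample is copied from the post. It only confirms how much memory is reserved and free; it does not explain why the miner's mmap call still fails.

```python
# Sketch only: hugepage_pool() is a hypothetical helper, not part of ptsminer.

def hugepage_pool(meminfo_text):
    """Return total and free hugepage memory in MiB from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        parts = value.split()
        # Keep only "Name: <integer> [unit]" lines; skip everything else.
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    page_kib = fields.get("Hugepagesize", 0)  # reported in kB
    return {
        "total_mib": fields.get("HugePages_Total", 0) * page_kib // 1024,
        "free_mib": fields.get("HugePages_Free", 0) * page_kib // 1024,
    }


# Values copied from the post above.
sample = """\
AnonHugePages:         0 kB
HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

print(hugepage_pool(sample))  # {'total_mib': 4096, 'free_mib': 4096}
```

So 2048 pages of 2048 kB amount to 4 GiB reserved, all of it still free at the time of the reading.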

ptsrush · « **Reply #22 on:** January 16, 2014, 01:56:54 am »

Hi Dave,

I've been running your cudapts and donating for a while, and it runs great.

I'd like to try your cpu miner on my haswell machine, but it requires boost-1.53.0.

I have only boost-1.52.0 installed and a lot of software depends on it.

Of course I can upgrade to 1.53.0 and rebuild my Gentoo system, but maybe
you could build a statically linked binary, so it depends only on glibc.

If you do that, your miner will run on almost any Linux box, just as yam did.

thank you very much.

dga · « **Reply #21 on:** January 15, 2014, 12:03:07 pm »

Quote from: gordonhucn on January 15, 2014, 09:18:21 am

Quote from: dga on January 15, 2014, 06:10:52 am
Quote from: gordonhucn on January 15, 2014, 01:06:24 am
Great work! Can you add ypool support (from the official jhProtominer v0.1e) to it?
By the way, according to Intel, AVX can do two SHA512 and AVX2 can do four at the same time.

Speaking of avx2, I've been targeting it a little in a build I'm working on. If people want to test it, there's a linux binary-only build available for avx2 CPUs at:

http://www.cs.cmu.edu/~dga/ptsminer-dga-adv-avx2-linux64.bin

I believe this one substantially outperforms yam, but only for avx2. I'd be curious if others can confirm.

-Dave

AVX/SSE performs more or less the same as plain x86 instructions if SIMD is not used; those bit operations (the 80-round calculation etc.) must be done with Intel SIMD intrinsics to get a big performance boost.

Yes. This one is a binary-only build because I rewrote the sha512 generating code. It previously used Intel's hand-optimized avx2 SIMD code; the binary I linked uses my own version, also SIMD using avx2, which is faster. It's binary-only in part because I haven't made it anywhere near easy for anyone else to compile, nor made a version that generalizes across architectures. This one is a toy for Haswell owners for now while I play with it more.

-Dave

jernau · « **Reply #20 on:** January 15, 2014, 10:29:32 am »

Using the following CPU (Intel Xeon(R) CPU E5506 @ 2.13GHz)

Code: [Select]

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Stepping:              5
CPU MHz:               2133.408
BogoMIPS:              4266.81
Hypervisor vendor:     Xen
Virtualization type:   para
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0,1

Running on 2 threads in sse4 mode, with hugepages disabled, I'm getting ~50 cpm vs ~35 cpm for xolokram's ptsminer.

gordonhucn · « **Reply #19 on:** January 15, 2014, 09:18:21 am »

Quote from: dga on January 15, 2014, 06:10:52 am

Quote from: gordonhucn on January 15, 2014, 01:06:24 am
Great work! Can you add ypool support (from the official jhProtominer v0.1e) to it?
By the way, according to Intel, AVX can do two SHA512 and AVX2 can do four at the same time.

Going to work on it. I'm having more fun with the algorithms than with the stuff around it - patches accepted to be able to support either pool. :-)

Speaking of avx2, I've been targeting it a little in a build I'm working on. If people want to test it, there's a linux binary-only build available for avx2 CPUs at:

http://www.cs.cmu.edu/~dga/ptsminer-dga-adv-avx2-linux64.bin

I'll be releasing the code for this series once I've finished ironing out the kinks, but I'd be curious to know if people are seeing the same kind of speed gains that I am on their Intel Haswell-based systems. On mine - a stock i7-4770 running at normal clock rates - I see:

[STATS] 2014-Jan-15 01:09:04 | 356.8 c/m | 5.5 sh/m | VL: 1195 (98.6%), RJ: 17 (1.4%), ST: 0 (0.0%)

Because it's a preview dev build, I changed the mining fee a little: it's higher just for this one (5%), but taken in big blocks (200 seconds dev, then 4000 seconds user) so that the miner has more time to stabilize and you can see the true cpm rate.

I believe this one substantially outperforms yam, but only for avx2. I'd be curious if others can confirm.

-Dave

AVX/SSE performs more or less the same as plain x86 instructions if SIMD is not used; those bit operations (the 80-round calculation etc.) must be done with Intel SIMD intrinsics to get a big performance boost.

allano · « **Reply #18 on:** January 15, 2014, 07:15:12 am »

I have Ubuntu 12.04

I've done the following

sudo apt-get install build-essential libboost-system-dev libboost-filesystem-dev libboost-program-options-dev libboost-thread-dev zlib1g-dev yasm

git clone https://github.com/dave-andersen/ptsminer

cd ptsminer/src

make -f makefile.unix.no-chrono

Code: [Select]

g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread
obj/sha512.o: In function `Init_SHA512_avx2':
sha512.c:(.text+0x27): undefined reference to `sha512_transform_rorx'
sha512.c:(.text+0x32): undefined reference to `sha512_transform_single_rorx'
collect2: ld returned 1 exit status
make: *** [ptsminer] Error 1

I hope you can help me.

Edit:
I have fixed the problem:

sudo aptitude install libboost-system1.48-dev libboost-filesystem1.48-dev libboost-program-options1.48-dev libboost-thread1.48-dev libboost-chrono1.48-dev

make -f makefile.unix

dga · « **Reply #17 on:** January 15, 2014, 06:10:52 am »

Quote from: gordonhucn on January 15, 2014, 01:06:24 am

Great work! Can you add ypool support (from the official jhProtominer v0.1e) to it?
By the way, according to Intel, AVX can do two SHA512 and AVX2 can do four at the same time.

Going to work on it. I'm having more fun with the algorithms than with the stuff around it - patches accepted to be able to support either pool. :-)

Speaking of avx2, I've been targeting it a little in a build I'm working on. If people want to test it, there's a linux binary-only build available for avx2 CPUs at:

http://www.cs.cmu.edu/~dga/ptsminer-dga-adv-avx2-linux64.bin

I'll be releasing the code for this series once I've finished ironing out the kinks, but I'd be curious to know if people are seeing the same kind of speed gains that I am on their Intel Haswell-based systems. On mine - a stock i7-4770 running at normal clock rates - I see:

[STATS] 2014-Jan-15 01:09:04 | 356.8 c/m | 5.5 sh/m | VL: 1195 (98.6%), RJ: 17 (1.4%), ST: 0 (0.0%)

Because it's a preview dev build, I changed the mining fee a little: it's higher just for this one (5%), but taken in big blocks (200 seconds dev, then 4000 seconds user) so that the miner has more time to stabilize and you can see the true cpm rate.

I believe this one substantially outperforms yam, but only for avx2. I'd be curious if others can confirm.

-Dave
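
A quick worked check of the fee figure in the post above, assuming the schedule really is a repeating cycle of 200 seconds for the developer followed by 4000 seconds for the user. The function name is hypothetical and this is only arithmetic on the two numbers given.

```python
# Sketch only: dev_fee_fraction() is a hypothetical helper for the arithmetic.

def dev_fee_fraction(dev_seconds, user_seconds):
    """Share of mining time that goes to the developer in one full cycle."""
    return dev_seconds / (dev_seconds + user_seconds)


fee = dev_fee_fraction(200, 4000)
print(f"{fee:.2%}")  # 4.76%
```

Over whole cycles that is 200 out of 4200 seconds, which rounds to the stated 5%. If the cycle starts with the dev block, as the wording suggests, a run shorter than one full cycle would see a higher effective share.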

gordonhucn · « **Reply #16 on:** January 15, 2014, 01:06:24 am »

Great work! Can you add ypool support (from the official jhProtominer v0.1e) to it?
By the way, according to Intel, AVX can do two SHA512 and AVX2 can do four at the same time.

earntodie · « **Reply #15 on:** January 13, 2014, 09:50:06 pm »

Cool. Good luck!

relm9 · « **Reply #14 on:** January 13, 2014, 07:20:15 pm »

Getting 390-400 cpm on an i7-4770K @ 4 GHz, nice

archit · « **Reply #13 on:** January 13, 2014, 06:34:25 pm »

Quote from: dga on January 13, 2014, 05:56:41 pm

Quote from: archit on January 13, 2014, 05:54:34 pm
Quote from: dga on January 13, 2014, 05:54:03 pm
Quote from: archit on January 13, 2014, 05:52:25 pm
Why not AVX2 for SHA256 as well?

Why bother?

Won't improve anything?

There are 4-10 hash collisions per group of 2^23 SHA512 hashes that have to be pushed through SHA256. Making SHA256 faster would make 1/1,000,000th of the computation faster. :-)

dga · « **Reply #12 on:** January 13, 2014, 05:56:41 pm »

Quote from: archit on January 13, 2014, 05:54:34 pm

Quote from: dga on January 13, 2014, 05:54:03 pm
Quote from: archit on January 13, 2014, 05:52:25 pm
Why not AVX2 for SHA256 as well?

Why bother?

Won't improve anything?

There are 4-10 hash collisions per group of 2^23 SHA512 hashes that have to be pushed through SHA256. Making SHA256 faster would make 1/1,000,000th of the computation faster. :-)
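
The "1/1,000,000th" figure can be reproduced with a few lines of arithmetic. This sketch assumes each collision triggers exactly one SHA256 call and that a SHA256 call costs about the same as a SHA512 call; both are simplifications for illustration, and the function name is hypothetical.

```python
# Sketch only: sha256_share() is a hypothetical helper; it assumes one SHA256
# call per collision and equal cost per SHA256 and SHA512 call.

GROUP_SIZE = 2 ** 23  # SHA512 hashes per group, as stated in the post


def sha256_share(collisions, group_size=GROUP_SIZE):
    """Fraction of hash calls that are SHA256 rather than SHA512."""
    return collisions / group_size


for collisions in (4, 10):
    print(f"{collisions} collisions: {sha256_share(collisions):.2e}")
# 4 collisions: 4.77e-07
# 10 collisions: 1.19e-06
```

With 2^23 = 8,388,608 hashes per group, the SHA256 share is between roughly half a millionth and one millionth of the work, so even an infinitely fast SHA256 would not change the overall rate measurably.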
