Messages - dga

31
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 18, 2014, 02:00:28 pm »
I got some error messages when I compile the source:
Code: [Select]
cc -c -O3 intel/sha512_avx2.S -o obj/sha512_avx2.o
intel/sha512_avx2.S: Assembler messages:
intel/sha512_avx2.S:606: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:607: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:608: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:609: Error: suffix or operands invalid for `vpshufb'
intel/sha512_avx2.S:618: Error: suffix or operands invalid for `vpaddq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpalignr'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpaddq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpalignr'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpsrlq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpsllq'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpor'
intel/sha512_avx2.S:620: Error: suffix or operands invalid for `vpsrlq'
intel/sha512_avx2.S:620: Error: no such instruction: `rorx $41,e,y0'
intel/sha512_avx2.S:620: Error: no such instruction: `rorx $18,e,y1'
intel/sha512_avx2.S:620: Error: no such instruction: `rorx $34,a,T1'
why ?

What compiler?  It's possible your gcc (or the assembler it invokes, since these are assembler messages) is too old to understand the avx2 instructions and rorx, which is a BMI2 instruction.
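One way to check whether the toolchain is the culprit is to feed it a single instruction and see whether it assembles. The sketch below is only that: `assembler_accepts` is a helper name made up here, and it assumes a `cc` on the PATH that accepts `-x assembler` and reads from stdin, as gcc does.

```python
import os
import subprocess
import tempfile


def assembler_accepts(instruction: str, compiler: str = "cc") -> bool:
    """Return True if `compiler` can assemble one line of AT&T-syntax assembly.

    Hypothetical helper: it only probes the toolchain, it says nothing
    about whether the CPU can actually execute the instruction.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "probe.o")
        try:
            result = subprocess.run(
                [compiler, "-c", "-x", "assembler", "-", "-o", out],
                input=instruction + "\n",
                text=True,
                capture_output=True,
                timeout=30,
            )
        except (OSError, subprocess.TimeoutExpired):
            # Compiler missing or hung: treat as "not supported".
            return False
    return result.returncode == 0


if __name__ == "__main__":
    # Two of the instructions the error log above complains about.
    print("rorx:", assembler_accepts("rorx $41, %rax, %rbx"))
    print("vpaddq on ymm:", assembler_accepts("vpaddq %ymm0, %ymm1, %ymm2"))
```

If both probes print False on a machine where plain instructions assemble fine, upgrading gcc/binutils is the likely fix.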

32
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 18, 2014, 01:12:12 pm »
I wanted more RAM options so I can experiment a bit with different settings, say from 512 to 2048 MB per thread, adjustable in 256 MB steps… no idea, maybe this would take a lot of work.

And you were right about the rejects, beta4 update increased the reject rate to over 30% in my case lol

Gotcha.  The thing to tune isn't RAM - as I said, the goal is actually to solve the problem with as *little* RAM per thread as possible, because it's faster if you can do that.  Google a bit about the TLB and L2 DTLB if you're curious to understand the background behind this.  Some of the big changes between the beta2 and beta5 releases were related to using less memory in order to make things faster.

The thing to play around with tuning is the number of threads.  If you're not also doing GPU mining, you want to use at least as many threads as your CPU has real cores.  You might also want to use more, but the optimal number is tricky.  On a 4 core CPU, the right answer might be anywhere from 4-7.
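A rough sketch of how one might sweep the thread count. It generalizes the "4-7 on a 4 core CPU" range to "cores up to 2*cores - 1", which is an assumption, and `measure_cpm` is a placeholder for however you measure collisions per minute (for example, running the miner for a while at each setting). None of these names come from the miner.

```python
from typing import Callable, List


def candidate_thread_counts(physical_cores: int) -> List[int]:
    """Thread counts worth trying: from one per real core up to 2*cores - 1.

    Assumption: this just extends the 4-7 range quoted for a 4 core CPU.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return list(range(physical_cores, 2 * physical_cores))


def best_thread_count(physical_cores: int,
                      measure_cpm: Callable[[int], float]) -> int:
    """Return the candidate thread count with the highest measured cpm."""
    return max(candidate_thread_counts(physical_cores), key=measure_cpm)


if __name__ == "__main__":
    print(candidate_thread_counts(4))
```

In practice each measurement needs to run long enough to give a stable cpm number, so the sweep is slow but only has to be done once per machine.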

33
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 18, 2014, 10:13:36 am »
Replying to a bunch of the previous comments:

- I'll disable the "aborting scan" message in the next set of changes.  I still want to put in a slightly better method for doing this, because I think it is increasing the reject rate a little bit.  @ptsrush, my guess is it's a little bit because of latency and a little bit because the miner works in bigger chunks.  I'll make it a priority to get that in the next build.

@allano, re source code to github: 
  (a)  Which opteron do you have?  I'm running the SSE one on AMD CPUs with SSE support.  It works just fine.
  (b)  I have a few reasons I'm not pushing the source yet:  First, the build is a mess, and I need to move it to using autoconf or some other method.  There are currently four different makefiles that the project inherited from ptsminer, and part of my recent optimization adds different code that runs on different CPUs.  The combination of this is currently unmanageable.

  Second, I'm trying to figure out the right strategy for funding development of this one.  It's pretty clear based on what happened with the GPU miners that if I release all of my tricks without supporting more pools and Windows, a whole lot of clones will spring up with binary builds for other platforms with high dev fees that go to someone else.  I'm fine with that in the long term, but I want to be careful.  I've put .. um .. rather a lot of work into the current optimized CPU version.

  My current thought is that I'm going to wait until *I* can provide easy-to-use binaries for both Linux and Windows with a 1-2% dev fee that supports both beeeeer and ypool, and then release the source as well.  Of course, I'd also be happy to go with a sponsored code release again or explore other options, but the dev fee really does seem like a very equitable way.

@noobster: Agreed.  The 3% dev fee is temporary.  I'm going to cut it down to 2% pretty soon.
  - Do you mind the 60 second dev mine starting time?  I'd rather increase the user time from 2000 to 3000 seconds than reduce dev mine from 60 to 40 --- it's more efficient to run for longer.

- Amount of RAM per thread:  The algorithms I'm using in this are completely different from what previous miners did.  There's no way to use less RAM because of the way I store the data, and using more RAM will actually make it slower.

Did you want to use *less* RAM, or more because you want it to go faster?

  -Dave
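On the dev-mine timing question above, the arithmetic can be sketched like this. It assumes the fee is simply dev seconds divided by user seconds (which matches 60/2000 = 3%), and the function names are invented for the illustration.

```python
def dev_fee_fraction(dev_seconds: float, user_seconds: float) -> float:
    """Dev mining time as a fraction of user mining time (assumed definition)."""
    return dev_seconds / user_seconds


def cycles_per_day(dev_seconds: float, user_seconds: float) -> float:
    """How many user+dev cycles fit in a day; each cycle means switching work twice."""
    return 86400 / (dev_seconds + user_seconds)


if __name__ == "__main__":
    for dev, user in [(60, 2000), (60, 3000), (40, 2000)]:
        print(dev, user,
              f"fee={dev_fee_fraction(dev, user):.1%}",
              f"cycles/day={cycles_per_day(dev, user):.1f}")
```

Both 60/3000 and 40/2000 come out to a 2% fee, but the longer cycle switches between user and dev work less often, which is the efficiency argument for stretching the user time instead of shortening the dev time.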

34
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 18, 2014, 02:00:29 am »
Output when compiling by hand:

user@testbox:/home/proto/ptsminer/src# g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_avx2.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread
g++: obj/sha512_avx2.o: No such file or directory

First run:

gcc -c -O3 intel/sha512_avx2.S -o obj/sha512_avx2.o

35
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 18, 2014, 01:59:54 am »
model name   : Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz

http://ark.intel.com/products/48768/Intel-Xeon-Processor-E5645-12M-Cache-2_40-ghz-5_86-gts-Intel-qpi

The SSE one.  LMK if it works - I haven't tested on that model.

  -Dave

36
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 18, 2014, 01:47:56 am »
user@testbox:/home/proto/ptsminer/src# make -f makefile.unix.no-chrono

g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread
obj/sha512.o: In function `Init_SHA512_avx2':
sha512.c:(.text+0x27): undefined reference to `sha512_transform_rorx'
sha512.c:(.text+0x32): undefined reference to `sha512_transform_single_rorx'
collect2: ld returned 1 exit status
make: *** [ptsminer] Error 1

What do I need to do? Thanks in advance

Ah - I haven't updated the no-chrono makefile

You can run by hand:

gcc -c intel/sha512_avx2.S -O3 -o obj/sha512_avx2.o

g++ -Wl,-z,relro -Wl,-z,now  -o ptsminer  obj/cpuid.o obj/sha512_avx.o obj/sha512_avx2.o obj/sha512_sse4.o obj/sha512.o obj/sph_sha2.o obj/sph_sha2big.o obj/main_poolminer.o  -Wl,-Bdynamic -l boost_system -l boost_filesystem -l boost_program_options -l boost_thread -Wl,-Bdynamic -l z -l dl -l pthread

Or try grabbing one of the newer, faster binary builds.

I'll patch up that makefile pretty soon.  Thanks for letting me know - I wasn't sure if anyone wanted to use it.

  -Dave

37
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 18, 2014, 01:35:10 am »
Hi dga,

beta4 avxsse has 315 - 320 cpm in my e3-1230 v2 avx gentoo box.

the yam is about 305 - 310 cpm.

so you win about 3% ahead :)

Hah.  Thanks!  Not quite enough to justify that dev fee, though.  I'll see if I can make it a bit faster and earn my keep.

In the meantime, I'm just going to go ahead and admit that I have a problem.  I can't keep my toes out of optimizing the avx2 build, so I've put beta5 online.  This one is even more annoyingly architecture-specific, so ONLY haswell / avx2 people should even bother with it.  I just put the static build online because I'm still working out more kinks I introduced into the build process for optimization.  It's getting about 530 cpm on my stock i7-4770.

  -Dave

38
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 17, 2014, 10:57:43 pm »
Ok - there's now an advanced preview of beta4 for avx/sse in addition to the avx2 one.

http://www.cs.cmu.edu/~dga/ptsminer/

Be sure to grab the right version (beta4) and architecture (avx2 or avxsse) for your machine.  If you're not using the latest Ubuntu, grab the -static version to have a better chance of it working.
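If you're not sure which architecture your machine is, the flags line in /proc/cpuinfo tells you. The following is a sketch under assumptions: the mapping from flags to builds is a guess (avx2 build needs `avx2` and `bmi2`, avxsse build needs `avx` or `sse4_1`), and the function names are made up for the example.

```python
from typing import Set


def flags_from_cpuinfo(cpuinfo_text: str) -> Set[str]:
    """Extract the CPU feature flags from the text of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_build(flags: Set[str]) -> str:
    """Guess which build to grab. The flag requirements are assumptions."""
    if {"avx2", "bmi2"} <= flags:
        return "avx2"
    if "avx" in flags or "sse4_1" in flags:
        return "avxsse"
    # Anything else: build from the open source release instead.
    return "source"


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(pick_build(flags_from_cpuinfo(f.read())))
```

Running a build the CPU doesn't support typically dies with "Illegal instruction", so it's worth checking before downloading.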

Feedback welcome.  I don't have a good set of avxsse machines to compare on, so I don't know how this one compares against yam.  Where the avx2 version is quite a bit faster, this one is still probably just in the same ballpark.  3% dev fee, but I'll cut that down to 1% if it's not beating yam by enough to make it worth paying the dev fee.  *grin*

Making headway at getting the build working better, but it's still a ghastly piece of spaghetti and not fit for pushing to the repository.

  -Dave

39
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 17, 2014, 05:53:01 pm »
The static bin does not work for me either; I am using a non-AVX CPU (1st gen. Core i3).
I get
Code: [Select]
Illegal instruction

Right.  The pre-built one is *just* my advanced preview for avx2.  For other architectures, just grab the current version from the open source release and build.

  -Dave

40
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 17, 2014, 02:44:01 pm »
ok - thanks again for the feedback on this.  I've put beta4 online in the usual place:

http://www.cs.cmu.edu/~dga/ptsminer/

Along with a static build to address the gentoo library versioning issue.

I'm kind of proud of this one - it's the first of the Haswell builds that cracks 500 cpm on a non-overclocked CPU.  I haven't quite determined if 6 or 7 threads is better, but it's one of those two settings.

Delta from beta3:
  - Uses about 20MB less memory per thread
  - Further optimized sha512 computation code
  - Static build is now part of my default build chain, so we'll keep this one around.
  - Still 3% advanced-build dev fee, but I hope that the extra 170cpm you'll get over any other miner should more than compensate for that.  :-)

41
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 17, 2014, 12:31:53 pm »
Anyway, I did notice one thing. Why reserve so many hugepages if the miner @ 4 threads only uses 4 of them:

Code: [Select]
# cat /proc/meminfo |grep -i hugepages
AnonHugePages:         0 kB
HugePages_Total:     512
HugePages_Free:      508
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

perhaps
Code: [Select]
echo 4 > /proc/sys/vm/nr_hugepages
could be enough? more than that i would consider waste of memory, or may it use more hugepages over time?

ps. still getting this:
Code: [Select]
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
even after I recompiled my kernel with
Code: [Select]
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
thanks

Each hugepage is 2MB.  Each thread needs about 600MB.  I'll reduce that by another 50MB in beta4 later today, but for now, that's the math.  So you need 300 hugepages per thread.  With 6 threads, that's 1800 hugepages.

echo 2048 > /proc/sys/vm/nr_hugepages

for 6 threads, or something a little higher if you want to try more threads.
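The same math as a small sketch, plus a check against the HugePages_ counters that /proc/meminfo reports. The 600MB and 2MB defaults are the numbers from this post; the function names are made up for the example.

```python
import math
from typing import Dict


def hugepages_needed(threads: int, mb_per_thread: int = 600,
                     hugepage_mb: int = 2) -> int:
    """Hugepages required so every thread's memory fits in hugepages."""
    return threads * math.ceil(mb_per_thread / hugepage_mb)


def parse_hugepage_counters(meminfo_text: str) -> Dict[str, int]:
    """Pull the HugePages_* counters out of the text of /proc/meminfo."""
    counters = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().startswith("HugePages_"):
            counters[key.strip()] = int(value.split()[0])
    return counters


if __name__ == "__main__":
    need = hugepages_needed(6)
    with open("/proc/meminfo") as f:
        free = parse_hugepage_counters(f.read()).get("HugePages_Free", 0)
    print(f"need {need} hugepages, {free} free")
```

With the 512 pages reserved in the output quoted above, even one thread (300 pages) fits but two do not, so the rest would end up on the malloc fallback.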

  -Dave

42
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 17, 2014, 11:56:58 am »
Code: [Select]
$ ptsminer-dga-beta3-avx2-linux64.bin PkzbnN7Nkv6TcqJuNjpcLfmPqpPUphpu5W 2 sse4
ptsminer-dga-beta3-avx2-linux64.bin: error while loading shared libraries: libboost_system.so.1.53.0: cannot open shared object file: No such file or directory

I have Gentoo Linux and am using the repository's boost 1.52 libs; the binary you provided is compiled against boost 1.53. Thanks.

Ahh.  Can you try:

ptsminer-dga-beta3-avx2-linux64-static.bin.gz

from that same directory and let me know if it works for you?  You'll have to gunzip it before running, obviously. :)

43
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 17, 2014, 02:17:07 am »
Slight update:  There's now a beta3 that tries to reduce rejects a bit

http://www.cs.cmu.edu/~dga/ptsminer/

The miner works by processing an entire block of 2^26 hashes at once, so if new work came in mid-block, it would still submit anything found in the previous block.  This could lead to excessive numbers of rejects (and thus, a disconnection).  Beta3 tries a little harder to avoid this - and the wasted work it entails - and also slightly raises the number of rejects allowed before reconnecting, for safety.
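For the curious, the general idea of not submitting stale results looks something like the sketch below. This is an illustration of the technique, not the miner's actual code: `Share`, `work_id` and `fresh_shares` are names invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Share:
    work_id: int   # which unit of pool work this result was computed for
    nonce: int     # placeholder for whatever identifies the result


def fresh_shares(found: List[Share], current_work_id: int) -> List[Share]:
    """Keep only results that still belong to the work the pool wants now.

    Results from a block that was started before new work arrived are
    dropped, since submitting them would be rejected.
    """
    return [s for s in found if s.work_id == current_work_id]


if __name__ == "__main__":
    found = [Share(work_id=7, nonce=1), Share(work_id=8, nonce=2)]
    # New work (id 8) arrived while the block for id 7 was still running.
    print(fresh_shares(found, current_work_id=8))
```

Checking once at submit time is cheap; abandoning the stale block early is the part that also saves the wasted work.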

There are some small speed tuning-related changes, but probably not anything measurably different.  I'm still seeing cpm in the 450-475 range on i7-4770.   The reconnect changes I just made + the beta2 dev mining changes should make it a lot easier for people to get longer-running performance measurements out of this code.

I've figured out several of these changes that should help improve performance on non-avx2 systems.  Once I get to that phase, if there's interest in beta testing a linux avx build for sandybridge/ivybridge, I can do that too.  Perhaps one optimized for Amazon's machines?  *grin*

44
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 16, 2014, 07:51:21 pm »
Ok.  I've replaced the binary at the old URL with a new build that uses about 600MB of RAM per thread.  Thanks for the feature request - I'd been meaning to implement this optimization, and it looks from here like it's giving a very pleasant speedup just from using less memory (for those who care, this helps reduce TLB misses).  I haven't run it long enough to get a stable number out of it, but it's looking like 460-475 cpm on an i7-4770.  The 4770k users should be cracking 500cpm.

  -Dave

45
BitShares PTS / Re: Open source optimized PTS CPU miner (BETA)
« on: January 16, 2014, 06:17:04 pm »
yea, i did that already but thanks anyway :D

and I'm getting this now:
Code: [Select]
Could not mmap hugepage, reverting to malloc: Cannot allocate memory


btw is there any way to reduce memory usage to say 512 or 768 MB per thread?

Not from the command line.  That's my next planned optimization.  I need to finish poking at some other constants to figure out how aggressive I want to be about pushing the memory.

Stay tuned.  I think I can get that into the binary by tonight.  For now, you can run on fewer threads -- you'll find that 4 is actually nearly as happy as 6, and 6 is typically happier for me than 8.

  -Dave
