Yeah, I did that already, but thanks anyway.
and I'm getting this now:
Could not mmap hugepage, reverting to malloc: Cannot allocate memory
Btw, is there any way to reduce memory usage to, say, 512 or 768 MB per thread?
Not from the command line. That's my next planned optimization. I need to finish poking at some other constants to figure out how aggressive I want to be about pushing the memory.
Stay tuned. I think I can get that into the binary by tonight. For now, you can run on fewer threads -- you'll find that 4 is actually nearly as happy as 6, and 6 is typically happier for me than 8.
-Dave
Ok. I've replaced the binary at the old URL with a new build that uses about 600 MB of RAM per thread. Thanks for the feature request - I'd been meaning to implement this optimization, and it looks from here like it's giving a very pleasant speedup just from using less memory (for those who care, this helps reduce TLB misses). I haven't run it long enough to get a stable number out of it, but it's looking like 460-475 cpm on an i7-4770. The 4770K users should be cracking 500 cpm.
-Dave
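For anyone curious about the "Could not mmap hugepage, reverting to malloc" message quoted earlier: below is a minimal Linux-only sketch in C of what such a fallback can look like. It is not the miner's actual code. The names `big_buffer`, `big_alloc`, and `big_free` are hypothetical, and the 2 MiB huge page size is an assumption (the usual x86-64 default). On a box with no huge pages reserved, the `mmap` call typically fails with "Cannot allocate memory" and the plain `malloc` path is used instead.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes MAP_ANONYMOUS / MAP_HUGETLB on glibc */
#endif
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages (common x86-64 default). */
#define ASSUMED_HUGE_PAGE ((size_t)2 * 1024 * 1024)

/* Hypothetical handle that remembers how the memory was obtained. */
typedef struct {
    void  *ptr;
    size_t len;      /* length actually requested from mmap or malloc */
    bool   is_mmap;  /* true if ptr came from mmap and needs munmap */
} big_buffer;

static big_buffer big_alloc(size_t len) {
    big_buffer b = { NULL, len, false };
#ifdef MAP_HUGETLB
    /* Round up so a later munmap sees a huge-page-multiple length. */
    size_t rounded = (len + ASSUMED_HUGE_PAGE - 1) / ASSUMED_HUGE_PAGE
                     * ASSUMED_HUGE_PAGE;
    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        b.ptr = p;
        b.len = rounded;
        b.is_mmap = true;
        return b;
    }
    /* Same wording as the message quoted in the thread. */
    fprintf(stderr, "Could not mmap hugepage, reverting to malloc: %s\n",
            strerror(errno));
#endif
    b.ptr = malloc(len);   /* ordinary pages; may still be NULL on failure */
    return b;
}

static void big_free(big_buffer *b) {
    if (b->ptr == NULL) return;
    if (b->is_mmap) munmap(b->ptr, b->len);
    else            free(b->ptr);
    b->ptr = NULL;
}
```

With this shape the warning is harmless: the buffer still works, it just sits on regular 4 KiB pages, which means more TLB pressure than the hugepage path.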
Slight update: There's now a beta3 that tries to reduce rejects a bit
http://www.cs.cmu.edu/~dga/ptsminer/
The miner works by processing an entire block of 2^26 hashes at once, and so if new work came in, it would still submit anything found in the previous block. This could lead to excessive numbers of rejects (and thus, a disconnection). Beta3 tries a little harder to avoid this - and the wasted work it entails - and also bumps up the number of rejects before reconnecting a bit for safety.
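To illustrate the stale-work problem described above, here is a rough sketch in C of one common way to avoid submitting results from an outdated block: tag each block with a "generation" number and drop its results if new work arrived in the meantime. This is an assumption about how such a check could be done, not beta3's actual logic. All names here (`work_source`, `work_block`, `announce_new_work`, and so on) are made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared state: bumped by the network thread on new work. */
typedef struct {
    atomic_uint_fast64_t generation;
} work_source;

/* Hypothetical per-block state: the generation the block was started under. */
typedef struct {
    uint_fast64_t generation;
} work_block;

/* Called when the pool sends new work. */
static void announce_new_work(work_source *ws) {
    atomic_fetch_add(&ws->generation, 1);
}

/* Called by a mining thread before it starts a block of hashes. */
static work_block begin_block(work_source *ws) {
    work_block b = { atomic_load(&ws->generation) };
    return b;
}

/* True if new work arrived after this block was started. */
static bool block_is_stale(work_source *ws, const work_block *blk) {
    return atomic_load(&ws->generation) != blk->generation;
}

/* How many of the `found` results should be sent: none if the block is
 * stale, since the pool would reject them anyway. */
static size_t shares_to_submit(work_source *ws, const work_block *blk,
                               size_t found) {
    return block_is_stale(ws, blk) ? 0 : found;
}
```

A mining thread could also call `block_is_stale` partway through a block and abandon it early, which would save the wasted work as well as the rejects.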
There are some small speed-tuning changes, but probably not anything measurably different. I'm still seeing numbers in the 450-475 cpm range on i7-4770. The reconnect changes I just made, plus the beta2 dev mining changes, should make it a lot easier for people to get longer-running performance measurements out of this code.
I've figured out that several of these changes should also help improve performance on non-AVX2 systems. Once I get to that phase, if there's interest in beta testing a Linux AVX build for Sandy Bridge/Ivy Bridge, I can do that too. Perhaps one optimized for Amazon's machines? *grin*