As long as the high-performance mode has a fallback option to low performance if the instructions are not supported.
Are there any small tweaks you can think of that would make it harder for a GPU?
That's do-able. It's mostly just making sure that compilation isn't a mess. Basically, I'd modularize it as:
generate_sha512(buf, num_hashes, starting_nonce);
And as long as there was a version of generate_sha512 that worked reasonably well, it would be fine.
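To make the fallback idea concrete, here is a minimal sketch of runtime dispatch behind a single generate_sha512 entry point. Everything beyond that function name is an assumption for illustration: the parameter types, the names generate_sha512_portable, generate_sha512_avx2 and select_backend, and the 64-byte-per-hash buffer layout. The backends are placeholders that do not compute SHA-512; only the selection logic is the point. The CPU check uses the GCC/Clang builtin __builtin_cpu_supports on x86 and reports "not supported" everywhere else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kHashBytes = 64;  // SHA-512 digest size in bytes

// Assumed signature; the post only names generate_sha512(buf, num_hashes, starting_nonce).
using HashFn = void (*)(std::uint8_t* buf, std::size_t num_hashes,
                        std::uint32_t starting_nonce);

// PLACEHOLDER, not SHA-512: writes the nonce (little-endian) into the first
// four bytes of each 64-byte slot and zeroes the rest, so the dispatch can be
// exercised without a real hash core.
static void placeholder_fill(std::uint8_t* buf, std::size_t num_hashes,
                             std::uint32_t starting_nonce) {
    assert(buf != nullptr);
    for (std::size_t i = 0; i < num_hashes; ++i) {
        const std::uint32_t nonce = starting_nonce + static_cast<std::uint32_t>(i);
        for (std::size_t j = 0; j < kHashBytes; ++j) {
            buf[i * kHashBytes + j] =
                static_cast<std::uint8_t>(j < 4 ? nonce >> (8 * j) : 0u);
        }
    }
}

// Hypothetical backends. In a real build the AVX2 one would live in its own
// file compiled with AVX2 enabled (or be hand-written assembly).
void generate_sha512_portable(std::uint8_t* buf, std::size_t n, std::uint32_t nonce) {
    placeholder_fill(buf, n, nonce);
}
void generate_sha512_avx2(std::uint8_t* buf, std::size_t n, std::uint32_t nonce) {
    placeholder_fill(buf, n, nonce);
}

// True only when the compiler offers the builtin and the CPU reports AVX2.
static bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

// Pick the fastest backend the machine supports, else fall back.
HashFn select_backend() {
    return cpu_has_avx2() ? generate_sha512_avx2 : generate_sha512_portable;
}

// Single entry point: the backend is chosen once, on first call.
void generate_sha512(std::uint8_t* buf, std::size_t num_hashes,
                     std::uint32_t starting_nonce) {
    static const HashFn impl = select_backend();
    impl(buf, num_hashes, starting_nonce);
}
```

Callers only ever see generate_sha512, so adding an sse4 or avx backend later would just mean another branch in select_backend.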
The tweaks: My first reaction is adding more branches/conditionals to cause warp divergence on a SIMD machine. That would slow down the CPUs too, of course, but it would be really painful for the GPUs. I'm not sure exactly where I'd add them. Possibly a changed sha512 core with a slightly variable or tweaked number of rounds, depending on something in the input, so that divergence happens on a per-nonce basis. It'd make it more evil, at least. If you could force all 16 or 32 units in the vector to diverge early on and stay diverged through the end of the sha512, you'd slow the GPUs by a factor of 16 or 32 while only slowing the CPU down by 2-4x.
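The arithmetic behind that 16x or 32x figure can be sketched with a toy cost model. This is not a GPU simulator and not a proposed hash tweak: it only assumes that a lockstep SIMD group has to make one serialized pass per distinct branch path taken by its lanes. The names choose_path, warp_paths and simd_passes are hypothetical, and the path selector simply uses nonce modulo num_paths; a real tweak would derive the path from hashed state so it can't be predicted cheaply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical per-nonce path selector (illustration only).
std::uint32_t choose_path(std::uint32_t nonce, std::uint32_t num_paths) {
    assert(num_paths > 0);
    return nonce % num_paths;
}

// Paths taken by `lanes` consecutive nonces sharing one SIMD group.
std::vector<std::uint32_t> warp_paths(std::uint32_t starting_nonce,
                                      std::size_t lanes,
                                      std::uint32_t num_paths) {
    std::vector<std::uint32_t> paths;
    paths.reserve(lanes);
    for (std::size_t i = 0; i < lanes; ++i) {
        paths.push_back(
            choose_path(starting_nonce + static_cast<std::uint32_t>(i), num_paths));
    }
    return paths;
}

// Toy model: lanes on the same path run together, lanes on different paths
// are serialized, so the group needs one pass per distinct path.
std::size_t simd_passes(const std::vector<std::uint32_t>& lane_paths) {
    return std::set<std::uint32_t>(lane_paths.begin(), lane_paths.end()).size();
}
```

Under this model, 32 lanes that all land on different paths need 32 passes where a uniform group needs 1, which is the worst case described above. A scalar CPU core runs one nonce at a time and pays only for the branch itself, not for the other lanes' paths.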
Well you know this stuff better than most, so come up with something solid and we will include it.
Sounds good.
Here's what I propose. Perhaps surprisingly, it's taken way more work to do the fast CPU implementation of PTS than the basic GPU implementation. Instead of going by my consulting rates (grin), I'll admit that I did it for fun, too, and judge it to be about 450 PTS worth of work based upon the previous rates you were offering, plus about 50 PTS more work to actually manage the integration into momentum.cpp, since it differs substantially from the codebase I've been developing on.
Instead of having it all in one chunk, though, I think it makes more sense to split it in half for two different deliverables to help reduce risk and get something in your hands faster:
(a) Algorithmic improvements to mining that are completely platform-independent. (250).
(b) Platform-optimized implementation for sse4, avx, and avx2, delivered as GNU assembly code along with original source code files to generate that assembly. (250)
Both documented, of course.
I think I can get (a) done reasonably straightforwardly. For (b), I'll need to spend more time understanding the Makefile setup for it so that I can integrate it without breaking things.
As a nitpicky note based upon the copyright issues that arose in my previous release, just to be up front: Like other high-performance miners, for everything but avx2, I use the Intel sha512 implementation. Its license is compatible (redistributions must include the copyright notice). The code I'd integrate into momentum.cpp is entirely my own at this point, and I'd simply integrate it under the existing license.