---- UPDATE -----
I have changed the core GPU code a bit and have now managed to eke out about 10% more performance on sm35 devices (GTX Titan, GTX 780, GTX 780 Ti, and an unusual variant of the 640). On my GTX Titan I now average 2200 cpm (overclocked) or 2000 cpm (stock); before the change I was getting 2000 (overclocked) and 1800 (stock). I still haven't made any improvements for older devices: I don't own any, so I wasn't as interested in non-sm35 hardware. I will send DGA a pull request to get the sm35 changes backported to his code as well, or maybe just submit a patch, depending on what he prefers.
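For anyone comparing numbers, here is a quick sketch of how the "about 10%" figure falls out of the GTX Titan rates quoted above. It only uses the cpm values from this post; the dictionary and the `speedup_percent` helper are illustrative names, not part of the miner.

```python
# cpm figures quoted in this post for a GTX Titan,
# as (before the sm35 change, after the sm35 change).
rates = {
    "stock": (1800, 2000),
    "overclocked": (2000, 2200),
}


def speedup_percent(before, after):
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


for mode, (before, after) in rates.items():
    gain = speedup_percent(before, after)
    print(f"{mode}: {before} -> {after} cpm = +{gain:.1f}%")
```

The stock clocks come out at roughly +11.1% and the overclocked run at +10.0%, so "about 10%" is a slightly conservative summary of both.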
My changes have also increased register pressure, which I haven't managed to reduce yet, so I am sure it could get even faster. Lastly, I updated the donation to 2%: 1% for me and 1% for DGA. As always, I'm looking forward to hearing whether this release works better for those of you with top-end cards, since I don't have a GTX 780 or 780 Ti to test on.
The new sm35 binary can be downloaded below, or as always you can pull the updated source from GitHub and compile it yourself. If you incorporate these changes into your own miner, I would love a small donation:
Pc9oQoKptcwnQMoTj3RBvHzDVxx97fu6Kq
Updated Binary for SM35:
https://dl.dropboxusercontent.com/u/33838/cudaptswin-0.2-SM35.7z