Messages - slothlike

1
BitShares PTS / Re: GPU miners comparsion
« on: January 22, 2014, 10:39:50 am »
2200 cpm on my GTX Titan with my SM35 variant of DGA's code with an improved sha512 core.



https://dl.dropboxusercontent.com/u/33838/cudaptswin-0.2-SM35.7z

2
---- UPDATE -----
I have changed the core gpu code a bit and have now managed to eke out about 10% more performance on sm35 devices (GTX Titan, GTX 780, GTX 780 Ti, and some weird variant of the 640). On my GTX Titan I now average 2200 cpm (overclocked) or 2000 cpm (stock). My previous rates before the change were 2000 (overclocked) and 1800 (stock). Anyways, I still haven't made any improvements for older devices; I don't own any, so I wasn't as interested in non-sm35 devices. I will send DGA a pull request to get the sm35 changes backported to his code as well, or maybe just submit a patch, depending on what he prefers.
Anyways, my changes have currently caused some increased register pressure that I haven't been able to get rid of, so I am sure it could get even faster. Lastly, I updated the donation to 2%: 1% for me and 1% for DGA. As always, I'm looking forward to hearing feedback on whether this release works better for those of you with top-end cards, as I don't have a GTX 780 or 780 Ti to test on.
Anyways, the new sm35 binary can be downloaded below. Or, as always, you can pull the updated source from github and compile it yourself. If you incorporate these changes into your own miners I would love a small donation ;). Pc9oQoKptcwnQMoTj3RBvHzDVxx97fu6Kq

Updated Binary for SM35:
https://dl.dropboxusercontent.com/u/33838/cudaptswin-0.2-SM35.7z
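
As a quick sanity check on the "about 10%" figure, the gain can be worked out from the cpm numbers quoted in the update. This is only a small sketch; the function name is made up for illustration.

```python
def percent_gain(before_cpm, after_cpm):
    """Relative improvement of after_cpm over before_cpm, in percent."""
    return (after_cpm - before_cpm) / before_cpm * 100.0

# cpm figures quoted in the update for a GTX Titan
stock_gain = percent_gain(1800, 2000)
overclocked_gain = percent_gain(2000, 2200)

print(f"stock: {stock_gain:.1f}%  overclocked: {overclocked_gain:.1f}%")
```

Both cases land at or slightly above 10%, which matches the claim.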

3
Only curious--did my posts about iruu's suggestion and also any of my code help out?

(I still vote you get mining donation prizes for getting it done.)

I should mention... I tried out your code yesterday and it worked for me. Initially it wasn't printing anything to the screen after the startup messages; however, I merged the latest changes dga made with your source and it worked fine. Was going to post but then saw this so didn't bother. This version seems to perform the same, 1600-1700 c/m on a 780.

None of these versions should have any changes in speed, all of them are based on dga's code.

Actually, I find what matters most for speed is whether you pick the right architecture for your card. Archit, when I run your code I get 1500 cpm on my Titans. When I run this code compiled with sm35 or sm20 I get 1500 cpm, but when I run this code compiled with sm30 I get 2000 cpm per Titan. I haven't had the time to decompile and see what the cuda compiler is doing. By all rights sm35 should be the fastest, as the sha512 code can take advantage of the new funnel shift operators, but it's not.
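
For anyone wondering why funnel shifts matter for sha512: the hash is built on 64-bit rotations, while the GPU works on 32-bit registers, so each rotation has to be assembled from the two halves. The sketch below emulates a 32-bit right funnel shift in Python (the same idea as CUDA's `__funnelshift_r` intrinsic) and builds a 64-bit rotate from two of them. It is only an illustration of the technique, not the code from the miner, and all the function names are made up.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def funnelshift_r(lo, hi, shift):
    """Python emulation of a 32-bit right funnel shift:
    take the 64-bit value hi:lo, shift it right by (shift & 31),
    and return the low 32 bits."""
    return (((hi << 32) | lo) >> (shift & 31)) & MASK32

def rotr64_funnel(x, n):
    """Rotate a 64-bit value right by n using two 32-bit funnel shifts."""
    lo, hi = x & MASK32, (x >> 32) & MASK32
    n &= 63
    if n >= 32:
        # Rotating by 32 or more is a swap of the halves plus a smaller rotate.
        lo, hi = hi, lo
        n -= 32
    new_lo = funnelshift_r(lo, hi, n)
    new_hi = funnelshift_r(hi, lo, n)
    return (new_hi << 32) | new_lo

def rotr64_reference(x, n):
    """Plain 64-bit rotate right, for comparison."""
    n &= 63
    return ((x >> n) | (x << (64 - n))) & MASK64

# Rotation amounts used by the SHA-512 round functions.
for amount in (28, 34, 39, 14, 18, 41, 1, 8, 19, 61):
    value = 0x0123456789ABCDEF
    assert rotr64_funnel(value, amount) == rotr64_reference(value, amount)
```

Without a funnel shift, each half needs two shifts and an OR, so in principle the sm35 path should need fewer instructions per rotation.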

4
Only curious--did my posts about iruu's suggestion and also any of my code help out?

(I still vote you get mining donation prizes for getting it done.)

Nope. I started playing with making some example cuda projects from scratch and realized I was just being dumb about how I was setting things up in the project file, so I started over with a fresh cuda project and imported dga's source. The other gotcha is making sure the sha helper file is not set to compile, only to be included. DGA also tried to make a few changes to make it easier to compile on windows. Sadly, visual studio 2012 isn't c99 compatible, so you have to compile it as a c++ project and remove all the restricts as well.

5
Hello.

I tried the compiled version. I get the error:

---------------------------
CUDAPTSWIN.exe - System Error
---------------------------
The program can't start because MSVCP110.dll is missing from your computer. Try reinstalling the program to fix this problem.

Windows 8 x64 | Phenom II X6 | Asus Geforce 660 GTX

Any idea what I should do? Thanks.


Ah, my apologies, that means you need the Microsoft Visual C++ Redistributable. It can be found at the link below. Installing it should fix your problem.
http://www.microsoft.com/en-au/download/details.aspx?id=30679

6
What sm are you using? And are you sure that a higher sm gives better performance?
sm30. It gives me 300 extra cpm, not sure why. sm35 drops it as well, so it may be something specific to the GTX Titan.

7
please add cuda 2.1 version, 560ti
The SM 2.0 version should work on your 560 Ti, but it is slower than the sm30 version (if you can run that one), at least in my testing: https://dl.dropboxusercontent.com/u/33838/CUDAAPTSWINSM20.7z

8
I won't add ypool. Is there another pool you want? Ypool has way too much of the market already. Plus their fees are outrageous.

9
I pulled in his very latest changes from the 14th

"EDIT: Doesn't work for me on a GTX 560 ti"

The 560 Ti is compute capability 2.1, not 3.0. Only Kepler cards (GTX 650 and up, I believe) support compute 3.0. If you want to give this build a go, you can change the cuda architecture to 2.0 in the project properties under CUDA. Using compute 3.0 is a huge performance boost for those of us who have it. I will try to make another build config in the project to support lower compute versions for older cards.
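
To make the "which build runs on which card" question concrete, here is a rough sketch of picking a build from a card's compute capability. It assumes a build is usable whenever its target is no higher than what the card reports, and the table and function name are made up for illustration. Note that the newest usable build is not guaranteed to be the fastest; elsewhere in this thread the sm30 build beat sm35 on a Titan.

```python
# Builds mentioned in this thread, keyed by the compute capability they target.
BUILDS = {(2, 0): "sm20", (3, 0): "sm30", (3, 5): "sm35"}

def pick_build(capability):
    """Return the newest build whose target does not exceed the card's
    compute capability, or None if no build fits."""
    usable = [target for target in BUILDS if target <= capability]
    return BUILDS[max(usable)] if usable else None

print(pick_build((2, 1)))  # GTX 560 Ti (reports 2.1, so it gets the 2.0 build)
print(pick_build((3, 0)))  # e.g. a GTX 660
print(pick_build((3, 5)))  # GTX Titan
```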

10
As a warning: while it is stable and fast right now, it does need some work to get threading to work. If you have multiple devices you must specify a device number or it will crash. Also please note that this appears to be the fastest windows miner I have used, thanks to DGA. I get 1950 cpm on my Titan with it. I only get about 1500 cpm with abc's miner, which was the previous best miner I had seen for windows. I have posted a rough draft of the windows README below. This miner only works with the ptsweb.beeeeer.org pool (YPOOL controls way, way too much of the network anyway; 51%+ is bad for PTS). You do not need to register: just use your address and they will send you PTS every time you reach 0.1 PTS.
--Adam
 
PURPOSE:
The purpose of CUDAPTSWIN is that I got sick of seeing all these ports of DGA's code with no one releasing their source code or explaining how they got it to compile under windows, so I decided to do it myself. (Note that I also added my own address to the donation list, so this miner has a 1% fee with 2/3 going to DGA and 1/3 to me. Feel free to take it out when you build.) The most current windows code can always be found at:
https://github.com/acarasso/cudapts.git
I have changed as little code as possible, as my intention is to keep pulling in DGA's improvements from the linux miner as he makes them. I will do my best to keep it up to date with his latest changes and with any other changes people offer up that might improve performance.
BINARIES:
The latest binary distribution can be found at:
https://dl.dropboxusercontent.com/u/33838/CUDAPTSWINx64sm30.zip
This binary is for x64 systems and requires compute 3.0 capability (GTX 650 and up, I believe). I currently get 1950 to 2000 cpm on my GTX Titan with this release.
You might also need to download and install the Visual C++ redistributable: http://www.microsoft.com/en-au/download/details.aspx?id=30679
It comes with most windows games though, so if you have skyrim, photoshop, or a host of other programs installed already you probably won't need it.
IF YOU HAVE MORE THAN ONE CUDA DEVICE YOU MUST SPECIFY A DEVICE NUMBER, like the example below (I broke threading when I ported it to windows and you will get an error if you don't; hopefully I will fix that soon):
CUDAPTSWIN Pc9oQoKptcwnQMoTj3RBvHzDVxx97fu6Kq 0
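
The device-number rule above could be checked up front instead of letting the miner crash. The sketch below is hypothetical and is not how CUDAPTSWIN actually parses its arguments; the function name is made up, and the device count is passed in so the example runs without a GPU.

```python
def parse_miner_args(argv, device_count):
    """Parse `CUDAPTSWIN <address> [device]` style arguments.

    argv excludes the program name; device_count is how many CUDA
    devices are present. Returns (address, device_index) or raises
    ValueError with a readable message.
    """
    if not argv:
        raise ValueError("usage: CUDAPTSWIN <pts-address> [device-number]")
    address = argv[0]
    if len(argv) >= 2:
        device = int(argv[1])
    elif device_count == 1:
        device = 0  # only one card, so the choice is unambiguous
    else:
        raise ValueError("more than one CUDA device: specify a device number")
    if not 0 <= device < device_count:
        raise ValueError(f"device {device} out of range (0..{device_count - 1})")
    return address, device

print(parse_miner_args(["Pc9oQoKptcwnQMoTj3RBvHzDVxx97fu6Kq", "0"], 2))
```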


REQUIREMENTS TO COMPILE:
To compile this code yourself you will need the CUDA SDK 5.5 for windows. You can get this from nvidia: https://developer.nvidia.com/cuda-toolkit
You will also need the boost libraries and headers for windows:
http://www.boost.org/doc/libs/1_55_0/more/getting_started/windows.html
I REALLY REALLY suggest downloading the binary release rather than compiling boost yourself, as a self compile of boost takes days. The binaries can be found here:
http://sourceforge.net/projects/boost/files/boost-binaries/1.55.0/
I put my version of boost in c:/local/boost_1_55_0/ and my project file expects it to be there. If you move it you need to change 3 settings on the project. Under Visual C++ Directories in the project config, add your boost header location to the header path and add your boost library location to the library path. Then add your boost library path to the linker library path.
If you have any questions feel free to ask me and I will try to answer them.
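
Before opening the project it can save some time to confirm that boost is where the project file expects it. This is a minimal sketch with a made-up helper name. The headers folder `boost` under the install root is the usual layout, but the library folder name depends on which binary package you installed, so it is passed in as an assumption rather than hard-coded.

```python
import os

def missing_boost_paths(root, lib_subdir):
    """Return the expected boost directories under root that do not exist.

    root       -- boost install root, e.g. c:/local/boost_1_55_0
    lib_subdir -- folder holding the prebuilt .lib files (assumed name,
                  depends on which binary package was installed)
    """
    expected = [
        os.path.join(root, "boost"),     # headers live in <root>/boost
        os.path.join(root, lib_subdir),  # library search path
    ]
    return [path for path in expected if not os.path.isdir(path)]

# "lib64-msvc-11.0" is only a guess at the folder name; adjust to your install.
for path in missing_boost_paths("c:/local/boost_1_55_0", "lib64-msvc-11.0"):
    print("missing:", path)
```

If it prints nothing, both folders exist and the default project settings should line up.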

11
Can you please add a quick command line option to select the cuda device this runs on, i.e. -d 0 or -d 1? Right now it only runs on device 0.

12
Archit, was it problems with yasm? I have been messing around with it all night.


13


Slothlike - I did it but it's a lot slower than abc123
Archit, did you compile for compute 3.5, and did you change the constant that Dave recommended (I copied his post below)?



"I see you found which constants to tune to get a little more speed at the use of a little more memory. :-)

If anyone running Linux wants to try to match his performance numbers, change the constant:

#define NUM_COUNTBITS_POWER 31

in gpuhash.cu from 31 to 32.  You'll have to have about 1.2GB of memory on your GPU, but if you have it, you'll get a better c/m rate that should match what the above author posted.  I plan a future release that auto-selects this a bit more carefully.

n.b.  It's perfectly within the license of the code I released to re-brand and add your own donation to it, but I just want to be clear to the forum that these donations aren't going to me.  Obviously, however, it's also a pain to get these things working on Windows."
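
Dave mentions a future release that picks this constant automatically. A rough sketch of what such a choice could look like is below. It is not Dave's implementation: the function name is made up, and treating "about 1.2GB" as a hard 1.2 GiB cutoff on total card memory is an assumption.

```python
GIB = 1024 ** 3

def choose_countbits_power(gpu_memory_bytes, needed_for_32=int(1.2 * GIB)):
    """Pick 32 when the card has roughly 1.2 GB or more, otherwise 31.

    The 1.2 GB threshold is the figure quoted above; treating it as
    1.2 GiB and as a hard cutoff is an assumption of this sketch.
    """
    return 32 if gpu_memory_bytes >= needed_for_32 else 31

print(choose_countbits_power(1 * GIB))  # a 1 GB card stays at 31
print(choose_countbits_power(6 * GIB))  # a 6 GB GTX Titan can use 32
```

A real version would probably look at free memory rather than total memory, since the desktop and other programs take their share.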



14
Guys, I am pretty sure we are going about building a windows release all wrong. Cuda is basically impossible to get to work under cygwin. The nvidia site is riddled with support requests about this, with nvidia's official response being that they aren't ever going to support it. Instead, I suggest we take Dave's gpuhash kernel code, which is relatively self-contained and easy to get to compile under visual studio, and merge it in with the source for jhProtominer (https://github.com/jh000/jhProtominer). I have gotten the kernel to compile with no problem. If anyone who is more familiar with the protoshares algorithm is willing to point me in the right direction on which function call I need to replace with a call to Dave's kernel, I will happily give it a shot and put up a github link with the code.
--Adam

Pretty sure this is what abc123 did, btw, as it's the only way I can think of to get past the yasm roadblock with Dave's source.
