Messages - dga

76
Well, let's not let the AMD folks win, shall we?

Obviously, they can keep cloning my software, but let's let the folks using the open version have some fun for a while longer.

I've committed another optimization to the repository.  My GTX690 is now getting about 1850 c/m - not a huge boost, but a little.

The more important thing about this commit is that it removes the dependency upon yasm, which I know was causing some folks a headache trying to compile it on Windows.  Let's see if this makes it easier to use the open source version instead of a who-knows-what-you're-getting closed clone. :-)

  -Dave

And - many thanks to the people who're running the miner with dev donations to me enabled.  I know you don't need to, and I appreciate it.  That, and the few direct donations I've received, are what's going to keep these improvements flowing.

  -Dave

77

dga, do you have any idea why the GTX680 is not faster than the 580? And are there any improvements in sm3.5 over sm3.0? In my test, an sm3.5 card (GTX780) loses 20% cpm when using sm3.5 rather than sm3.0.
Thanks in advance if anyone could explain it for me.

Because the biggest bottleneck with my current design is memory latency and the number of memory references that can be in flight at a time.  This isn't something that improves nearly as fast with different generations of cards (and sometimes gets worse).  Though I'm surprised at your 780 slowing down with sm_35.  I'll have to try that.

Let's see.  On a K20c I get 1104 c/m with sm_35.  Results are nearly unchanged with sm_30.

The biggest crypto-related improvement in sm_35 is the funnel shifter, but the design of Momentum is less crypto-intensive than a lot of PoW functions because it generates 8 memory references for every one execution of SHA512.  This is in part why I don't expect there to be as big an nvidia-AMD gap (except that by being so much faster, the AMD crew can try to exploit space/time tradeoffs in different ways).

I'm playing around with some very different design options for faster cards, but haven't decided on anything I really like yet.

Someone over at the 1gh pool did something similar to what you did, dga.
https://bitsharestalk.org/index.php?topic=1784.0

I just gave it a short test, and so far I can say that this GPU miner performs similarly to your miner (around 10% more, depending on the card, but it uses at least 2 CPU threads at 100% load). Card temperatures stay the same, but more memory is required (1.2GB).

Since he hasn't published his code I can't say for sure, but it may be using your code. And there are some reports that it works with AMD cards too.

PTS mining is exploding in a few days.

Well, let's not let the AMD folks win, shall we?

Obviously, they can keep cloning my software, but let's let the folks using the open version have some fun for a while longer.

I've committed another optimization to the repository.  My GTX690 is now getting about 1850 c/m - not a huge boost, but a little.

The more important thing about this commit is that it removes the dependency upon yasm, which I know was causing some folks a headache trying to compile it on Windows.  Let's see if this makes it easier to use the open source version instead of a who-knows-what-you're-getting closed clone. :-)

  -Dave

78
Hey, @Archit, @crz, and anyone else who's been porting the software to other platforms:

Does it cause you increased pain if I include any of the Thrust header files?

You could try, e.g., uncommenting thrust/sort.h in gpuhash.cu and see if it breaks anything.  (I'll warn that I've actually had the most problems with it at link time when functions are used, though, not just from using the headers).

I'm trying to get rid of the need for yasm by moving a bit more functionality onto the GPU, and using Thrust will let me be lazy about it.  *grin*

  -Dave

79

dga, do you have any idea why the GTX680 is not faster than the 580? And are there any improvements in sm3.5 over sm3.0? In my test, an sm3.5 card (GTX780) loses 20% cpm when using sm3.5 rather than sm3.0.
Thanks in advance if anyone could explain it for me.

Because the biggest bottleneck with my current design is memory latency and the number of memory references that can be in flight at a time.  This isn't something that improves nearly as fast with different generations of cards (and sometimes gets worse).  Though I'm surprised at your 780 slowing down with sm_35.  I'll have to try that.

Let's see.  On a K20c I get 1104 c/m with sm_35.  Results are nearly unchanged with sm_30.

The biggest crypto-related improvement in sm_35 is the funnel shifter, but the design of Momentum is less crypto-intensive than a lot of PoW functions because it generates 8 memory references for every one execution of SHA512.  This is in part why I don't expect there to be as big an nvidia-AMD gap (except that by being so much faster, the AMD crew can try to exploit space/time tradeoffs in different ways).
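
To make the funnel-shifter point concrete: SHA-512 is built from 64-bit rotations, and a GPU working with 32-bit registers has to assemble each rotation from two 32-bit halves. The sketch below is a host-side C++ emulation, not code from the miner. `funnelshift_r` is a hypothetical stand-in that mimics the documented behavior of CUDA's `__funnelshift_r` intrinsic, and `rotr64_funnel` shows how a 64-bit rotate reduces to two such operations.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical host-side stand-in for CUDA's __funnelshift_r:
// concatenate hi:lo, shift right by (shift & 31), return the low 32 bits.
inline uint32_t funnelshift_r(uint32_t lo, uint32_t hi, uint32_t shift) {
    const uint64_t v = (static_cast<uint64_t>(hi) << 32) | lo;
    return static_cast<uint32_t>(v >> (shift & 31u));
}

// 64-bit rotate right expressed as two funnel shifts on 32-bit halves.
inline uint64_t rotr64_funnel(uint64_t x, unsigned n) {
    assert(n < 64);
    uint32_t lo = static_cast<uint32_t>(x);
    uint32_t hi = static_cast<uint32_t>(x >> 32);
    if (n >= 32) {  // rotating by 32 just swaps the two halves
        std::swap(lo, hi);
        n -= 32;
    }
    const uint32_t new_lo = funnelshift_r(lo, hi, n);
    const uint32_t new_hi = funnelshift_r(hi, lo, n);
    return (static_cast<uint64_t>(new_hi) << 32) | new_lo;
}

// Plain 64-bit reference rotate, for comparison.
inline uint64_t rotr64_ref(uint64_t x, unsigned n) {
    assert(n < 64);
    return n == 0 ? x : (x >> n) | (x << (64 - n));
}
```

Without a hardware funnel shift, each of the two 32-bit results costs two shifts and an OR, so the saving is real but only matters to the extent that hashing, rather than memory, is the limit.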

I'm playing around with some very different design options for faster cards, but haven't decided on anything I really like yet.
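
The memory-latency bottleneck described above can be put into rough numbers with Little's law: sustained throughput is the number of requests in flight divided by the latency of each one. The sketch below is a back-of-the-envelope model, not a measurement. The function names are made up for illustration, any latency and in-flight values you plug in are assumptions, and the only figure taken from this thread is the 8 memory references per SHA512 execution.

```cpp
#include <cassert>

// Little's law: throughput = concurrency / latency.
// in_flight:  memory references outstanding at once (assumed, hardware-dependent)
// latency_ns: time for one reference to complete, in nanoseconds (assumed)
inline double refs_per_second(double in_flight, double latency_ns) {
    assert(in_flight > 0.0 && latency_ns > 0.0);
    return in_flight / (latency_ns * 1e-9);
}

// Upper bound on SHA512 executions per second if memory is the only limit,
// using the 8 references per hash mentioned in the thread.
inline double memory_bound_hashes_per_second(double in_flight, double latency_ns) {
    const double refs_per_hash = 8.0;
    return refs_per_second(in_flight, latency_ns) / refs_per_hash;
}
```

In this model, doubling the number of references in flight doubles the bound, while a faster SHA512 unit does not move it at all, which is one way to read why a newer card generation may not help.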

80
GTX 780 - 1800cpm

Jesus Christ!

* donschoe went off shopping!

grin.  Now you're seeing which platforms I'm able to develop for.  Gorgeous result.

I have a dual GTX690 rig that's currently doing about 3600 c/m.  I like it.

81
from 0 to ~560c/m @ AWS EC2 g2.2xlarge:

5. Profit
(well maybe a bit too optimistic, but i was pretty excited learning all that)
Just curious, is that specific EC2 instance around $16 a day? And at current diff this would generate 0.2244 PTS?
(http://mrx.im/pts.php?cpm=560) I'm not knocking what you have provided in the way of guidance; I'm checking my math to work out the ROI of upgrading some existing equipment (CPUs) versus waiting for an ATI GPU miner, as I have stacks of those cards left over from the early BTC days.

It's cheaper than that on the spot market, sometimes.

http://aws.amazon.com/ec2/spot-instances/

For example, right now, most markets do cost $16/day on spot (because of all the damn miners. grin.)

But Northern California is only $0.2/hour = $4.80 per day.

Which, alas, still isn't cheap enough to be profitable.

But, um, if you grab my most recent code you might find that the profitability threshold on ec2 just went up. I'm getting over 800 c/m on Amazon, but my instances only run when the spot market prices drop below about $0.11/hour -- which they don't seem very likely to do.
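
The arithmetic in this exchange can be sketched as a few helper functions. This is only an illustration: the function names are hypothetical, the 560 c/m to 0.2244 PTS/day data point is the calculator estimate quoted above, and scaling it linearly with c/m at a fixed difficulty is an assumption.

```cpp
#include <cassert>

// Cost of running one instance for a full day at a given hourly price.
inline double daily_cost_usd(double usd_per_hour) {
    assert(usd_per_hour >= 0.0);
    return usd_per_hour * 24.0;
}

// Estimated PTS per day, scaled linearly from the data point quoted above
// (560 c/m -> 0.2244 PTS/day). Linear scaling and a fixed difficulty are
// assumptions of this sketch.
inline double pts_per_day(double cpm) {
    assert(cpm >= 0.0);
    return 0.2244 * (cpm / 560.0);
}

// PTS price (USD) at which a day of mining exactly pays for the instance.
inline double break_even_usd_per_pts(double usd_per_hour, double cpm) {
    assert(cpm > 0.0);
    return daily_cost_usd(usd_per_hour) / pts_per_day(cpm);
}
```

For example, `daily_cost_usd(0.2)` reproduces the $4.80 per day figure, and `break_even_usd_per_pts(0.2, 560.0)` comes out a little over $21 per PTS under these assumptions; a higher c/m or a lower spot price pushes that break-even price down.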

Note:  Running on spot instances is a little scary.  It's VERY easy to typo in the box where you say how much you'll pay per hour.  DOUBLE AND TRIPLE CHECK YOUR BIDS.  From watching the prices historically, it's very clear that some people missed the decimal point and ended up paying more on the spot market than if they just paid the $16/day for a normal g2.2xlarge instance.  (!!!)
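
One cheap guard against a misplaced decimal point is to compare the bid with the on-demand price before submitting it. The sketch below is a hypothetical local check, not part of any AWS tool; the function names are invented here, and the $16/day on-demand figure is the one quoted in this thread.

```cpp
#include <cassert>

// Hourly equivalent of a per-day price, e.g. the $16/day figure quoted above.
inline double hourly_from_daily(double usd_per_day) {
    assert(usd_per_day > 0.0);
    return usd_per_day / 24.0;
}

// A bid is suspicious if it is not below the on-demand hourly price:
// at that point the spot market offers no saving at all.
inline bool bid_looks_sane(double bid_usd_per_hour, double on_demand_usd_per_hour) {
    assert(on_demand_usd_per_hour > 0.0);
    return bid_usd_per_hour > 0.0 && bid_usd_per_hour < on_demand_usd_per_hour;
}
```

With $16/day as the on-demand price (about $0.67/hour), a bid of 0.11 passes, while 1.1 or 11 typed by mistake is rejected.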

You also have to watch the market prices of the things you're mining very carefully.

  -Dave

82
2014-01-01 update:

I've committed some new changes to the repository.  Some are cosmetic, but three are important:

  1)  Memory use on the host side is reduced by about 500MB.  This may or may not matter for you.

  2)  Speed is boosted by 10-20% on a lot of platforms.  I have another speed boost patch coming next week once I've made it not horrible, but this one gets a decent chunk of the gains.

  3)  There's now a developer fee that goes to me.  Kinda. 

I'm doing an experiment with the developer fee in this code release:  It's easy to disable.  It's not hidden.  But it's also just a list of addresses that share the dev fee equally.

So here's my proposal:  If you port this software to another platform or release a binary, don't remove my address.  Instead, add yours to the list -- I've tried to make it super easy for you to get your own share.  If this works out, I'll continue to release improvements and try to make it even easier for other developers who improve upon the code, because we'll all have a reason to make software that remains open source and which is user-friendly and high performance.

If you think this is horrible, let me know and let's try to find a way to make it work better.

If you're a user who hates the idea of a dev fee, the source is yours and you can delete the addresses listed there and/or add your own.
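
One way a list of addresses could share a fee equally is simple rotation. The sketch below is an assumption about how such a scheme might look, not the miner's actual mechanism: `payout_address`, the `fee_rounds_per_100` knob, and the idea of a per-round payout are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decide who gets paid for a given mining round.
// fee_rounds_per_100 is a hypothetical knob: out of every 100 rounds, this many
// go to the developer list; 0 disables the fee entirely.
inline std::string payout_address(std::size_t round,
                                  const std::string& user_address,
                                  const std::vector<std::string>& dev_addresses,
                                  std::size_t fee_rounds_per_100) {
    assert(fee_rounds_per_100 <= 100);
    const std::size_t slot = round % 100;
    if (dev_addresses.empty() || slot >= fee_rounds_per_100) {
        return user_address;  // normal round, or fee disabled
    }
    // Number the fee rounds globally so consecutive fee rounds rotate through
    // the list and every listed developer gets an equal share over time.
    const std::size_t fee_index = (round / 100) * fee_rounds_per_100 + slot;
    return dev_addresses[fee_index % dev_addresses.size()];
}
```

In a scheme like this, a porter who appends an address to the list takes an equal slice of the fee rounds without changing the total fee, and emptying the list or setting the knob to 0 turns the fee off.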

  -Dave

83
I have just the original version, which compiles and runs too! In gpuhash.cu there was a problem with __restrict__, though; you may want to fix that.

Cool.  Yeah, the Windows nvcc doesn't seem to like __restrict__ for some reason.  Odd.
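
A common way to sidestep this kind of compiler disagreement is to hide the qualifier behind a macro. The sketch below is a generic portability pattern, not the fix that went into the miner: `GPU_RESTRICT` and `GPU_NO_RESTRICT` are hypothetical names, and whether the Windows toolchain accepts the MSVC spelling in device code is an assumption that would need testing.

```cpp
#include <cassert>
#include <cstddef>

// Pick a spelling of "restrict" the current compiler accepts.
// GPU_RESTRICT and GPU_NO_RESTRICT are hypothetical names, not from the miner.
#if defined(GPU_NO_RESTRICT)
#  define GPU_RESTRICT              /* escape hatch: drop the hint entirely */
#elif defined(_MSC_VER)
#  define GPU_RESTRICT __restrict   /* MSVC spelling */
#elif defined(__GNUC__) || defined(__clang__)
#  define GPU_RESTRICT __restrict__ /* GCC / Clang spelling */
#else
#  define GPU_RESTRICT
#endif

// The qualifier only promises the compiler that the pointers do not alias;
// removing it never changes the result of correct code.
inline void add_arrays(const float* GPU_RESTRICT a,
                       const float* GPU_RESTRICT b,
                       float* GPU_RESTRICT out,
                       std::size_t n) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

If a build still fails, compiling with `-DGPU_NO_RESTRICT` removes the hint at the cost of whatever optimization it enabled.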

84
dga the Windows version of your code compiled finally and it works

I saw - congrats!  You went by re-basing it on jhprotominer and incorporating the mining core, right?

Any changes to the mining core to make it easier next time?  I'm going to release some patches in a week that give a decent speedup, and I'd like to not cause people a huge build headache.

  -Dave

85
Thanks! 5000 MMC on its way to you. Keep up the good work.

Wow - thank you!  That's a very nice incentive to keep pounding on this thing. :)

86
dga, thanks for releasing this! I think it's going to be helpful for MemoryCoin users too - do you have an MMC address? I'd like to make a tip.

Hmm.  MMC looks interesting - I'll have to check it out.

Thanks for the tip offer.  Donation addresses:

MMC:  MTZGEoyforE2ikMWcu9QipHimZgX12Dgom
PTS: Pr8cnhz5eDsUegBZD4VZmGDARcKaozWbBc
BTC: 17sb5mcCnnt4xH3eEkVi6kHvhzQRjPRBtS

87
No offence, but why post it with a newbie account...

with 0 tax ...... so weird


Perhaps after bigger fish.  Running a binary file from an unknown source is probably the most dangerous thing you can do.  At the very least I'd only run this binary on a computer with a throw-away Windows install connected to a completely separate network from my other machines and then only if I felt this binary would return a substantial profit vs other avenues.

If you're getting something for free that you'd otherwise need to pay for, you're not the customer; you're the product.

I gotta agree with this.

Screw reverse-engineering this. I am not going to try to patch it to mine to some other address--I had thought of establishing a donation address that way, to fund developing an open-source Windows compile (for example by mining five percent of the time to the donation address), but I'm not going to bother. And I am not going to run this executable.

If someone very expert with reverse-engineering/computer forensics wants to make sure this executable does nothing malicious, and/or patch it to mine to a donation address for the purpose I mention, I'd welcome that.

And as an alternative:  If someone wants to team up to make binary builds of my miner that split donation mining between us, I'd be delighted to collaborate.  I don't have the time to maintain the software at the level that makes it really easy to use, but I _do_ have the ability to devise some more speedups to keep it a fast and effective tool.

88
Any help with the compiler errors I listed?

Archit, how did you get it to compile in Visual Studio 12?

dga, how did you discover what code abc123 changed? And I'd really rather the latter open source however the hockeysticks he got that (apparently modded) compile . . .

abc123, if you're reading this, any help on how to get it to compile?

I didn't - I'm guessing based upon the performance numbers he posted and the memory requirements.  *grins*  For all I know, he did something completely different.  There are several optimizations possible right now.

  -Dave

89
dga
are you expecting that AMD cards will be able to mine this faster once someone ports the code like they are with other gpu mined coins?

I haven't made up my mind about that yet.  The PTS PoW is a little more interesting on GPU than the previous ones, and there are more design choices involved.  The actual work of doing the sha512 hashing *will* be faster on AMD, which gives more opportunities to exploit some of the time/space tradeoffs, but Nvidia cards have a nice memory architecture.

Prediction #1:  There's a lot of optimization left for GPU PTS mining.  My miner is not going to be the fastest one on the block -- I've already improved its speed by about 20% on my development branch, though it uses more memory.  I have a few more optimizations in mind.  I plan a second release in two weeks once I've had a chance to iron out the kinks.  I would be shocked if people didn't at least double the speed of my miner on all architectures in the longer run.

Prediction #2:  The AMD-Nvidia gap will be narrower for PTS than for other coins.  I don't know which way it will go, though.

Past that, my crystal ball is murky.  *grins*

  -Dave

90

System Requirement: Main Memory >=2GB, Video memory >=1.5GB. Less memory may work; this is to be tested. An NVIDIA card that supports at least SM 3.0 is required. View this page to find out whether an NVIDIA card supports it:
http://en.wikipedia.org/wiki/CUDA

I see you found which constants to tune to get a little more speed at the cost of a little more memory. :-)

If anyone running Linux wants to try to match his performance numbers, change the constant:

#define NUM_COUNTBITS_POWER 31

in gpuhash.cu from 31 to 32.  You'll have to have about 1.2GB of memory on your GPU, but if you have it, you'll get a better c/m rate that should match what the above author posted.  I plan a future release that auto-selects this a bit more carefully.
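
As a rough idea of what auto-selection might look like, here is a minimal sketch that picks the setting from the amount of free GPU memory. It is not the planned implementation: `choose_countbits_power` and `kBytesNeededFor32` are hypothetical names, and the threshold is simply the "about 1.2GB" figure above taken as 1200 * 10^6 bytes. In a CUDA program the free-memory value could come from `cudaMemGetInfo`.

```cpp
#include <cstdint>

// Approximate GPU memory needed for the larger setting, per the post above.
// 1.2 GB is taken as 1200 * 10^6 bytes; the exact requirement is an assumption.
constexpr std::uint64_t kBytesNeededFor32 = 1200ull * 1000ull * 1000ull;

// Hypothetical helper: choose the value for NUM_COUNTBITS_POWER from the
// amount of free GPU memory. Falls back to the shipped default of 31.
inline int choose_countbits_power(std::uint64_t free_gpu_bytes) {
    return free_gpu_bytes >= kBytesNeededFor32 ? 32 : 31;
}
```

Since the constant is a compile-time `#define` in the released code, a runtime choice like this would also mean turning it into a variable or building both variants.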

n.b.  It's perfectly within the license of the code I released to re-brand and add your own donation to it, but I just want to be clear to the forum that these donations aren't going to me.  Obviously, however, it's also a pain to get these things working on Windows.

  -Dave
