Messages - Ykw

16
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 08, 2014, 05:41:50 pm »
As I have mentioned many times, the "slowness" of the R9 290(X) seems to be caused by the driver. Every kernel executes much faster than on an R9 280X, but the card (or the driver) is not able to overlap the execution of the kernels the way the 280X does. On Windows the 3 kernels are processed completely sequentially, so the Linux version is faster because at least one kernel is able to overlap with another. Perhaps a different set of kernels that needs less than 1 GB of memory would circumvent this behaviour, but I can't give any guarantees because the R9 290(X) also has different ACEs than the 280X.
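To illustrate why a card with faster kernels can still end up slower overall, here is a minimal timing sketch. It is only a toy model: the kernel names `k1` to `k3`, the durations, and the assumption that an overlapped pair costs as much as its longer member are all hypothetical, not measurements or the miner's actual scheduling.

```python
# Toy timing model of kernel overlap. All durations are hypothetical
# illustration values, not measurements from any card.

def iteration_time(kernels, overlapped_pairs=()):
    """Return the time (ms) for one pass over all kernels.

    kernels: dict mapping kernel name -> duration in ms.
    overlapped_pairs: pairs of kernel names assumed to run concurrently;
    a pair costs max(a, b) instead of a + b. Each kernel may appear in
    at most one pair.
    """
    total = sum(kernels.values())
    for first, second in overlapped_pairs:
        # Overlapping hides the shorter kernel behind the longer one.
        total -= min(kernels[first], kernels[second])
    return total


# Hypothetical card A: slower kernels, but k2 and k3 overlap.
card_a = {"k1": 4.0, "k2": 4.0, "k3": 4.0}
# Hypothetical card B: every kernel is faster, but nothing overlaps.
card_b = {"k1": 3.0, "k2": 3.0, "k3": 3.0}

time_a = iteration_time(card_a, overlapped_pairs=[("k2", "k3")])
time_b = iteration_time(card_b)

print(f"card A: {time_a} ms per pass, card B: {time_b} ms per pass")
```

With these made-up numbers, card A finishes a pass sooner than card B even though each of its kernels is slower, which is the effect described above.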

Could Mantle deliver higher performance? Can the miner be compiled for it, or would it need to be "ported" to that SDK?

17
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 06, 2014, 05:15:57 pm »
@Ykw

I did the experiment for you :)
So, in the screenshot the first cmd window shows vga1, which is connected to the PCIe x16 slot, and the second cmd window shows vga2 on PCIe x1.
https://docs.google.com/file/d/0B3YMX5z3l5htQ21UbGgtZzZ3Vlk/edit?pli=1
There is, however, one catch ;)
vga1 is a Sapphire 280X Dual OC with Elpida memory.
vga2 is an Asus 280X DirectCU II with Hynix memory ;)
I think the Asus is the better card ;)
Anyway, in principle it makes no difference which slot the card is attached to.

Very nice, thanks for taking the time...

There is only one way to know for sure: swap the cards.

My 290X does not change much with memory frequency, but timings would be a different matter, so I can give some credit to what you say. Still, I am not sure the huge difference is down to the card alone, assuming you have all frequencies the same.

18
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 05, 2014, 10:17:32 pm »
Quote
I know I am being a bit annoying, but if you get the chance, can you run 2 separate instances and report the speeds? =) Although you have 280Xs, I would like to see if there is a difference between the two PCIe slots you are using on that board. When and if you get a chance, of course. And by the way, thanks in advance =)

@Ykw
For the last few hours I have been mining a coin, so I don't want to interrupt the miner :)
However, I have done this experiment in the past: there was no big difference when running separate processes (one instance per device).
I remember that I got slightly worse collision totals than with one instance on all devices (-t 0,0,1,1, etc.).

Quote
If it is down to the PCI bus, then we only need to raise it from 100 to 110 MHz to get 10% more performance.


When you have time and patience...
What I would like to see is which card gets the lower performance =) The one that uses the PCIe lanes from the CPU directly, or the one on the ICH7 (in your case).

19
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 04, 2014, 12:44:58 am »
so now I have time :)

I have

CPU type   QuadCore AMD Phenom X4 Black Edition 9950, 2600 MHz (13 x 200)
CPU clock   2611.9 MHz  (original: 2600 MHz)
Motherboard name   Asus M2N32-SLI Deluxe  (2 PCI, 1 PCI-E x1, 1 PCI-E x4, 2 PCI-E x16, 4 DDR2 DIMM, Audio, Dual Gigabit LAN, IEEE-1394)
Motherboard chipset   nVIDIA nForce 590 SLI, AMD K10
System memory   4096 MB  (DDR2-800 DDR2 SDRAM)
Memory bus   535.8 MHz x2 1066 MHz unganged mode
Command Rate (CR)   2T
HyperTransport clock   1004.6 MHz
Northbridge clock    2009.3 MHz
Graphics card   Asus R9290X-DC2OC-4GD5
GPU codename   Hawaii XT  (PCI Express 3.0 x16 1002 / 67B0, Rev 00)
GPU clock   1050 MHz  (original: 1050 MHz)
Memory clock   1350 MHz  (original: 1350 MHz)


Many things could affect mining with algos that do not depend on CPU memory, but I still think my explanation for why some get ~5200 and we get about ~4700 with 290X cards is the PCIe generation.

PCIe 3.0 has better encoding than PCIe 2, so it costs fewer CPU cycles and lets the CPU pass data and commands to the GPU more efficiently. I can't prove this right now, but I already did some experiments on boards that had PCIe 3, and I remember my results: they were easily better. For SHA it might not make much difference, but for scrypt and similar algos it does.

Remember, it's not the throughput that matters in crypto; it's the latency and command efficiency per Hz or pipeline cycle.

I also have PCIe 2.0. Does anyone have PCIe 2.0 and 5200 col/m with a 290X to prove me wrong? =) Thanks

Hello, maybe you are right; however, I do not think so.
My 2x 280X cards are running in x1 v1.1 mode on 30 cm extenders (one to PCIe x16, the second to PCIe x1) on an old mainboard.
They make ~9980 to 10060 c/m in total, i.e. ~4990 to 5030 c/m per card.
I'm not saying there is no logic in what you're saying, but in practice it doesn't matter: the important operations are performed inside the video cards.
If anything, it may affect the percentage of rejected/discarded work.
However, I have no way to check it, as I don't have any new hardware ;)

Only the 290/290X are affected; the 280X and below are not. That's my experience. But again, I am not sure either, just saying. Once I get my hands on 2 new boards, this will get sorted.

What chipset do you have on that board? I am also considering that PCIe could be splitting (switching) in my case, and therefore creating more "waiting time". I have to check.

As I said before, an old ECS G31 motherboard, s775, 4 GB DDR2, E6600 :)

I know I am being a bit annoying, but if you get the chance, can you run 2 separate instances and report the speeds? =) Although you have 280Xs, I would like to see if there is a difference between the two PCIe slots you are using on that board. When and if you get a chance, of course. And by the way, thanks in advance =)

20
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 03, 2014, 11:38:57 pm »
so now I have time :)

I have

CPU type   QuadCore AMD Phenom X4 Black Edition 9950, 2600 MHz (13 x 200)
CPU clock   2611.9 MHz  (original: 2600 MHz)
Motherboard name   Asus M2N32-SLI Deluxe  (2 PCI, 1 PCI-E x1, 1 PCI-E x4, 2 PCI-E x16, 4 DDR2 DIMM, Audio, Dual Gigabit LAN, IEEE-1394)
Motherboard chipset   nVIDIA nForce 590 SLI, AMD K10
System memory   4096 MB  (DDR2-800 DDR2 SDRAM)
Memory bus   535.8 MHz x2 1066 MHz unganged mode
Command Rate (CR)   2T
HyperTransport clock   1004.6 MHz
Northbridge clock    2009.3 MHz
Graphics card   Asus R9290X-DC2OC-4GD5
GPU codename   Hawaii XT  (PCI Express 3.0 x16 1002 / 67B0, Rev 00)
GPU clock   1050 MHz  (original: 1050 MHz)
Memory clock   1350 MHz  (original: 1350 MHz)


Many things could affect mining with algos that do not depend on CPU memory, but I still think my explanation for why some get ~5200 and we get about ~4700 with 290X cards is the PCIe generation.

PCIe 3.0 has better encoding than PCIe 2, so it costs fewer CPU cycles and lets the CPU pass data and commands to the GPU more efficiently. I can't prove this right now, but I already did some experiments on boards that had PCIe 3, and I remember my results: they were easily better. For SHA it might not make much difference, but for scrypt and similar algos it does.

Remember, it's not the throughput that matters in crypto; it's the latency and command efficiency per Hz or pipeline cycle.

I also have PCIe 2.0. Does anyone have PCIe 2.0 and 5200 col/m with a 290X to prove me wrong? =) Thanks

Hello, maybe you are right; however, I do not think so.
My 2x 280X cards are running in x1 v1.1 mode on 30 cm extenders (one to PCIe x16, the second to PCIe x1) on an old mainboard.
They make ~9980 to 10060 c/m in total, i.e. ~4990 to 5030 c/m per card.
I'm not saying there is no logic in what you're saying, but in practice it doesn't matter: the important operations are performed inside the video cards.
If anything, it may affect the percentage of rejected/discarded work.
However, I have no way to check it, as I don't have any new hardware ;)

Only the 290/290X are affected; the 280X and below are not. That's my experience. But again, I am not sure either, just saying. Once I get my hands on 2 new boards, this will get sorted.

What chipset do you have on that board? I am also considering that PCIe could be splitting (switching) in my case, and therefore creating more "waiting time". I have to check.

21
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 03, 2014, 03:27:04 pm »
I'm not seeing a very heavy cpu-load for clpts though, even though I'm not using pci-3 for every card. Oh I've tried the "-a 1" option again and that does seem to be a lot faster than -a 0, but it also takes longer to get up to speed and fluctuates more.

It looks like I'm now averaging very close to 5200 c/m, with my 280Xs, with some up and down swings.

Yes, I found the same about -a 1 and -a 0, but 0 is still faster for the 290X.

But regarding CPU load: you will not see it, that's not how it works. Low CPU load does not mean an application is running as efficiently as possible. It depends on other things, like the code and the hardware tech you are using. Numbers nowadays just serve to sell hardware; you need more to understand how you get to a result. Actually, you need just one thing: math =)

22
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 03, 2014, 11:32:19 am »
No, there should be no pci-e bandwidth limitations because only a few bytes and the program code is transferred over pci-e. The quoted performance was for a factory overclocked 280X, so subtract 10% to get a performance estimate for a reference card. I used Catalyst 13.12 because Catalyst 14.4 was slower. Could you post your command line options?

As NaN said, it is very unlikely that pci-e bandwidth is the issue. Quite possibly it has something to do with GPU and memory clocks; the relation between those two seems to be very important. I'm averaging closer to 5100 than 5200 for each of my 280X cards, but have not messed with overclocking and such, mostly because I haven't found an easy way to do it on a headless Linux system.

And I agree: not bandwidth, latency (EDIT: to be more precise, it's not latency, it's efficiency in this case)! PCIe 3 has less overhead than PCIe 2, and that changes how many CPU cycles the CPU spends, on average, commanding processes to do things over PCIe.
The more GPUs you have, for example, the more PCIe overhead you will have. This is of course very low, but my indication is that it causes some differences. My 290X already did >210% of what my 270X does (PCIe 3.0); now it does <200% (195% at the moment), using the same miners and GPU clocks. Only the CPU/board/RAM was different. Shame I can't run tests right now with the new miner...
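For reference, the line-encoding overhead of the two PCIe generations can be put into numbers. The sketch below only uses the published signalling rates and encodings (5 GT/s with 8b/10b for PCIe 2.0, 8 GT/s with 128b/130b for PCIe 3.0); the function names are made up for illustration. It quantifies the encoding overhead of the link itself and says nothing about whether that overhead explains the hash-rate differences discussed here.

```python
# Line-encoding overhead per PCIe generation.
# Rates and encodings are the published per-lane figures.
PCIE_GENERATIONS = {
    "2.0": {"gt_per_s": 5.0, "payload_bits": 8, "encoded_bits": 10},     # 8b/10b
    "3.0": {"gt_per_s": 8.0, "payload_bits": 128, "encoded_bits": 130},  # 128b/130b
}


def encoding_efficiency(generation):
    """Fraction of transferred bits that carry payload."""
    gen = PCIE_GENERATIONS[generation]
    return gen["payload_bits"] / gen["encoded_bits"]


def lane_bandwidth_mb_s(generation):
    """Usable bandwidth of one lane in MB/s, after line encoding only."""
    gen = PCIE_GENERATIONS[generation]
    bits_per_second = gen["gt_per_s"] * 1e9 * encoding_efficiency(generation)
    return bits_per_second / 8 / 1e6


for name in PCIE_GENERATIONS:
    overhead_pct = 100.0 * (1.0 - encoding_efficiency(name))
    print(f"PCIe {name}: {overhead_pct:.1f}% encoding overhead, "
          f"{lane_bandwidth_mb_s(name):.0f} MB/s per lane")
```

So the encoding overhead drops from 20% to about 1.5% between the two generations, and the per-lane rate roughly doubles.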

But a few more months, and a new board and CPU will come =) Let's be patient =)

23
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 03, 2014, 10:48:45 am »
so now I have time :)

I have

CPU type   QuadCore AMD Phenom X4 Black Edition 9950, 2600 MHz (13 x 200)
CPU clock   2611.9 MHz  (original: 2600 MHz)
Motherboard name   Asus M2N32-SLI Deluxe  (2 PCI, 1 PCI-E x1, 1 PCI-E x4, 2 PCI-E x16, 4 DDR2 DIMM, Audio, Dual Gigabit LAN, IEEE-1394)
Motherboard chipset   nVIDIA nForce 590 SLI, AMD K10
System memory   4096 MB  (DDR2-800 DDR2 SDRAM)
Memory bus   535.8 MHz x2 1066 MHz unganged mode
Command Rate (CR)   2T
HyperTransport clock   1004.6 MHz
Northbridge clock    2009.3 MHz
Graphics card   Asus R9290X-DC2OC-4GD5
GPU codename   Hawaii XT  (PCI Express 3.0 x16 1002 / 67B0, Rev 00)
GPU clock   1050 MHz  (original: 1050 MHz)
Memory clock   1350 MHz  (original: 1350 MHz)


Many things could affect mining with algos that do not depend on CPU memory, but I still think my explanation for why some get ~5200 and we get about ~4700 with 290X cards is the PCIe generation.

PCIe 3.0 has better encoding than PCIe 2, so it costs fewer CPU cycles and lets the CPU pass data and commands to the GPU more efficiently. I can't prove this right now, but I already did some experiments on boards that had PCIe 3, and I remember my results: they were easily better. For SHA it might not make much difference, but for scrypt and similar algos it does.

Remember, it's not the throughput that matters in crypto; it's the latency and command efficiency per Hz or pipeline cycle.

I also have PCIe 2.0. Does anyone have PCIe 2.0 and 5200 col/m with a 290X to prove me wrong? =) Thanks

24
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 02, 2014, 10:28:13 am »
[MASTER] @ 12:23:14 | work received | sharetarget 0x0080000000000000000[...]000000000000000000000000000000000000
[WORKER1] share found: 20754872 <-> 18110106 #892 (433718) @ 1399026195
[WORKER1] share found: 39356499 <-> 24461962 #893 (433966) @ 1399026198
12:23:23 | 4636 c/m | 9.536 sh/m | VL: 884 (98.99%) | RJ: 9 (1.01%) | S[...]0%) | ART: 4.076e+008 ms (22.43/9.899/15.55)
[WORKER0] share found: 4842003 <-> 10828914 #894 (434396) @ 1399026204
[WORKER1] share found: 29758071 <-> 12186142 #895 (435240) @[...]15
12:23:33 | 4641 c/m | 9.54 sh/m | VL: 885 (98.99%) | RJ: 9 (1.01%) | ST: 0 (0.00%) | A[...]e+008 ms (22.43/9.899/15.55)
[WORKER1] share found: 4620972 <-> 45861533 #896 (435504) @ 1399026218
[MASTER] @ 12:23:42 | work received | sharetarget 0x0080000000000000000[...]000000000000000000000000000000000000
12:23:43 | 4636 c/m | 9.534 sh/m | VL: 887 (99.00%) | RJ: 9 (1.00%) | S[...]0%) | ART: 4.076e+008 ms (22.43/9.899/15.55)
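To compare runs like the one above without reading the console by eye, the status lines can be parsed. This is a minimal sketch that assumes the status line layout seen in this log (`time | c/m | sh/m | VL | RJ`); the function name `parse_status` and the returned keys are made up for illustration, and the reject percentage is recomputed from VL and RJ only, ignoring the ST field.

```python
import re

# Matches the leading fields of a status line as they appear in the log above.
STATUS_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}) \| (?P<cpm>\d+) c/m \| (?P<shpm>[\d.]+) sh/m"
    r" \| VL: (?P<valid>\d+) \([\d.]+%\) \| RJ: (?P<rejected>\d+) \([\d.]+%\)"
)


def parse_status(line):
    """Return the status fields as a dict, or None for non-status lines."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    valid = int(match.group("valid"))
    rejected = int(match.group("rejected"))
    total = valid + rejected
    return {
        "time": match.group("time"),
        "cpm": int(match.group("cpm")),        # collisions per minute
        "shpm": float(match.group("shpm")),    # shares per minute
        "valid": valid,
        "rejected": rejected,
        # Recomputed from VL and RJ only (assumption: ST is not counted).
        "reject_pct": 100.0 * rejected / total if total else 0.0,
    }


sample = "12:23:43 | 4636 c/m | 9.534 sh/m | VL: 887 (99.00%) | RJ: 9 (1.00%) | S"
status = parse_status(sample)
print(status["cpm"], status["shpm"], round(status["reject_pct"], 2))
```

Lines such as `[WORKER1] share found: ...` do not match the pattern and return `None`, so they can simply be skipped.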

I get the same

Your board + CPU + RAM (and clocks), plus GPU clocks?

Normally it has nothing to do with these, but you might have hit the same problem I have. Let's check.

25
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 02, 2014, 10:19:37 am »
I'd like to know: how many sh/m do you get?

fixed..
I get around 9.406 sh/m (4768 col/m) on the 290X, but I would say this is not the usual value; you should get above 5200 col/m. Something is holding my card back (acknowledged).

EDIT: Forgot I am EU... I use , for decimal... :S LOL

26
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 02, 2014, 10:10:55 am »
Damn

With AMD update 14.4 and NaN's v1.2 update, I get 5200 c/m with a 280X Toxic, before 3600 c/m.

You are a great Man.

Which -a option?

How many sh/m?

I have also tested the 14.4 driver. I found that for my R9 290X the options -t 1.1 -a 1 are best on my Win7 x64 with a monitor attached to the device.

The miner has 4 code variants you can choose from with the -a flag: 0, 1, 2 or 3.

Usually 0 is for the 290/290X and 1 for the others, but it depends a bit on the OS/board/CPU, your setup in general.
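Since the best -a value depends on the setup, one way to find it is to try each value in turn and compare the c/m readings. The sketch below only builds the command lines; the helper name `build_commands` is hypothetical, and the -o, -u and -t values are copied from command lines quoted elsewhere in this thread without their meaning being documented here.

```python
def build_commands(exe, o_value, credentials, threads, a_values=(0, 1, 2, 3)):
    """Build one argument list per -a value (0 to 3, as listed above)."""
    return [
        [exe, "-o", o_value, "-u", credentials, "-t", threads, "-a", str(a)]
        for a in a_values
    ]


# Placeholder values taken from the command lines quoted in this thread.
commands = build_commands("clpts_x86-64.exe", "6", "XXX:XXX", "0,0")
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` to start the miner, leaving every variant running long enough for the c/m average to settle, since -a 1 was reported to take longer to get up to speed.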

27
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 02, 2014, 09:23:06 am »
NaN, can NVIDIA/AMD cards profit from DMA access in a multi-card miner implementation? There would be tons of memory available for allocation. Plus, threads can be NUMA-aware, so is it possible to design the NRS/PTS algos to take advantage of this?

For SHA it is less likely, I know, but scrypt might have its advantages...

EDIT: Can you share profiling data of your miner running on a 290/290X card, or do you want me to run it? I would like to see if there is room for implementing something like this. If you are interested, give me a ping.

28
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: May 01, 2014, 07:01:09 pm »
Great work. Windows is still slower than Linux, but I am getting ~4800 c/m now with my 290X and Catalyst 14.

yep...

After changing "-a 1" to "-a 0"...
290X (2 thread) - 4817 col/m (from 4679)...
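To put such before/after readings into perspective, the relative gain can be computed directly. A minimal sketch; the function name `percent_change` is made up, the first pair of figures is the one above, and the second pair is the 280X report (3600 to 5200 c/m) from the driver update post in this thread.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# 290X, switching from "-a 1" to "-a 0".
print(f"290X: {percent_change(4679, 4817):+.2f}%")
# 280X, Catalyst 14.4 with the v1.2 miner versus before.
print(f"280X: {percent_change(3600, 5200):+.2f}%")
```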

29
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: April 30, 2014, 01:16:33 am »
I've installed Catalyst 14, downloaded the correct version of clpts for catalyst 14 and I'm now averaging over 5000 c/m.

So a huge improvement compared to using Catalyst 13, almost doubling what I got with 1.1 before, thanks.

Told you, matey!!! =) Nice to have you on board. Let's go on. I am not an average coder, but I can read code like a wizard, let me say! But time is far too precious to me, and unfortunately I don't have all the time I would like to have.

Learn, is my advice. Never reject learning, because that's the BEST THING IN OUR LIVES!

Sorry, too much celebrating =) It's obvious I am too happy, but because of other affairs.

Anyway, keep going. Just let me give you this feedback: crypto WILL BE HUGE!

30
BitShares PTS / Re: fast AMD OpenCL PTS/NRS miner released
« on: April 29, 2014, 11:05:15 pm »
No, there should be no pci-e bandwidth limitations because only a few bytes and the program code is transferred over pci-e. The quoted performance was for a factory overclocked 280X, so subtract 10% to get a performance estimate for a reference card. I used Catalyst 13.12 because Catalyst 14.4 was slower.

Yes, I acknowledge that, NaN, but my overclock simply slowed down my timings, plus I might be having some errors/instability due to high voltage, CPU+GPU latency (to get better CPU+RAM throughput) and the CPU overclock. So in my case, with my board, I know I have some limitations. But I have done some tests on other boards (Zxx) and the performance goes quite a bit further up with the same GPU cards.

Another big detail is that I am using a test system whose CPU + board + RAM have been running 24/7 since, well, 2009. Only the CPU changed, once: a free upgrade to a Xeon. So it might have some issues. It's OK for a 30-day loop (system restart + WC water refill + Windows updates + other hardware experiments).


I might have a new test board by August or September, for free, so by then I will have updated performance metrics, tests, tuning, etc. One thing is for sure: I have never had a system that lasted so long, nor do I think I will have one again. 5 years and counting!

Keep it up. Catalyst 14 is only good for the 290X or 290; the others are still optimized for Catalyst 13. But things will change =) New specs for the 3xx series will come by the end of 2014 =) I find them promising, since AMD finally understood that DP has to go UP UP UP.

EDIT:
My command lines (given my conditions, these were the best specs/attributes over 5 hours):

290X
clpts_x86-64.exe -o 6 -u XXX:XXX -t 0,0 -a 1

270X
clpts_x86-64.exe -o 6 -u XXX:XXX -t 1 -a 1
