Author Topic: BTS 0.8.1 is forking (Read 24894 times)

vikram

Quote from: emf on March 26, 2015, 03:16:43 pm

Quote from: arubi on March 26, 2015, 02:10:19 pm
I still don't understand why minority-fork clients don't just discard whatever short chain they're on and join the main chain, where we have >90% delegate participation. We can't rely on checkpoints for long.
We only save the undo state for about an hour's worth of blocks (maintaining it is expensive). For short forks this works out well, but if you get on a fork with >360 blocks on it, you won't have enough undo history available to roll your blockchain back to the fork point, thus you'll never be able to rejoin the main chain even if your client knows it is better and has downloaded all the blocks on the main chain. If you don't have the undo history to get there, the only other way to get back to the fork point is to reindex or re-download the blockchain from the beginning. And when re-downloading, as long as there are publicly-accessible nodes out there serving up the wrong blockchain, there's a chance you'll connect to them first and download the wrong blockchain and end up back in the same state, unless you force your client to reject the wrong chain with checkpoints. Fortunately, as more public clients switch over to the main chain, the chance anyone will end up syncing to a minority fork decreases.

I haven't read the rest of the thread but the above is the primary problem, and is common whenever we hardfork because we get long minority chains from delegates who haven't updated.

There is also a second problem: blocks after the hardfork are not necessarily invalid for old clients. So a user might upgrade after the hardfork and, if they don't reindex, their client will have an inconsistent state but cannot tell until it finally starts rejecting blocks.

It seems like we can mitigate the most common issues by a combination of:

Force all blocks after a hardfork to be invalid for old clients
Always reindex for hardfork releases
Always release a version with a checkpoint right after the hardfork as soon as possible, and force a reindex on startup if any checkpoints don't match the current state
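The last item above (reindex on startup when a checkpoint doesn't match) can be sketched roughly as follows. This is only an illustration in Python, not the client's actual code: the function names, and the idea of representing checkpoints and local state as plain `block number -> block ID` maps, are assumptions made for the example.

```python
from typing import Dict, Optional


def first_checkpoint_mismatch(checkpoints: Dict[int, str],
                              local_block_ids: Dict[int, str]) -> Optional[int]:
    """Return the lowest checkpointed block number whose local ID differs, else None.

    Both arguments are hypothetical {block_number: block_id} maps.
    """
    for block_num in sorted(checkpoints):
        local_id = local_block_ids.get(block_num)
        # A block the client has not reached yet cannot disagree with a checkpoint.
        if local_id is not None and local_id != checkpoints[block_num]:
            return block_num
    return None


def needs_reindex(checkpoints: Dict[int, str],
                  local_block_ids: Dict[int, str]) -> bool:
    """True if any locally stored block contradicts a checkpoint."""
    return first_checkpoint_mismatch(checkpoints, local_block_ids) is not None
```

A startup routine could call `needs_reindex(...)` before connecting to peers and trigger a reindex if it returns `True`.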

Also theoretical had an idea to stop old clients syncing before a hardfork. I am not convinced it is worth the effort, but here is part of our conversation:

Quote

[3/26/15, 4:49:46 PM] Vikram Rajkumar: there is a remaining problem due to our implementation—some delegates will not upgrade and so old clients will go down that minority chain—once they go too far and are past the undo limit, and upgrade to say 0.8 for example which has the new rules but no new checkpoints after the hardfork—they are still stuck. it will not resync or reindex
[3/26/15, 4:49:56 PM] Vikram Rajkumar: automatically. so not sure how to address that
[3/26/15, 4:54:09 PM] theoretical bts: i think we talked about this the other day...have delegates publish in public_data the block number of next hardfork as well as the version they are running
[3/26/15, 4:54:53 PM] theoretical bts: then if client notices majority of delegates claim there is a hardfork at block #X but block #X does not exist in local hardfork database, client warns user that they are out of date and stops syncing at block #X
[3/26/15, 4:57:30 PM] Vikram Rajkumar: yea i guess that’s better than what we currently do, which is just disconnect all old clients from the main chain
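A rough sketch of the idea from the conversation above, in Python. Everything here is hypothetical: how announcements would be read out of public_data, the function names, and the strict-majority rule are assumptions for illustration, not an existing client feature.

```python
from collections import Counter
from typing import Iterable, Optional, Set


def announced_hardfork(announcements: Iterable[Optional[int]],
                       total_delegates: int) -> Optional[int]:
    """Return the hardfork block number announced by a strict majority, or None.

    `announcements` holds one entry per delegate: the block number of the next
    hardfork they published, or None if they published nothing.
    """
    counts = Counter(b for b in announcements if b is not None)
    if not counts:
        return None
    block_num, votes = counts.most_common(1)[0]
    return block_num if votes * 2 > total_delegates else None


def sync_limit(announcements: Iterable[Optional[int]],
               total_delegates: int,
               known_hardforks: Set[int]) -> Optional[int]:
    """Block number at which an out-of-date client should stop syncing.

    Returns None when the client already knows the announced hardfork
    (or when no majority announcement exists), meaning: keep syncing.
    """
    block_num = announced_hardfork(announcements, total_delegates)
    if block_num is not None and block_num not in known_hardforks:
        return block_num
    return None
```

For example, if 60 of 101 delegates announce a hardfork at block 500 and the local hardfork database only knows block 100, `sync_limit` returns 500 and the client would warn the user and stop there.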

Thom

Paging @xeroc or other devs, can you answer the questions I had above? I did notice a post by Vikram that he is checking into a forking issue, so this is beginning to sound like a bug...

Thom

This is a very important thread to understand for ALL delegates.

I must admit I don't fully understand how to handle the "forking" discussed here. I'm quite technical, but many delegates are not. I suspect many marketing delegates like fuzzy & methodx will struggle to understand this without help from the more technically savvy delegates.

I understand what a transaction fork is. I was under the impression such forking is natural in bitcoin blockchains, where you have zillions of miners asynchronously trying to find the next puzzle key, but in dpos it's a round robin process, not a competition. So how do these forks occur on our dpos blockchain? Should they? Is it a bug or an attack?

I have been led to believe that, aside from the typical unix admin issues one needs to be competent to handle when running a delegate, and paying attention to version releases, the client runs and automatically takes care of itself. I've always had reservations about those claims (even BM himself has made statements to the effect that it's pretty easy to run a delegate) and this thread heightens my suspicions.

delegate.verbaltech has yet to be voted in, so I don't check it every day. I have to jump thru some hoops in an effort to minimize / obscure access to the delegate node. So I am quite interested in how issues such as this one can be detected and resolved.

I see info like this:

Code: [Select]

"blockchain_average_delegate_participation": "90.99 %"
But I don't have the command-line commands memorized to know how to obtain this info. Is it simply the info command, or something else? It would be good to know.

Is this command sufficient to detect being on the correct fork? If so, tools like wackou's bts_tools could monitor that and send a notification that the node is on a minority fork.
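As a rough idea of what such monitoring could look like, here is a small Python sketch that parses the participation field shown above and flags a low value. The function names and the 50% threshold are assumptions for illustration, and this is not bts_tools code. A low reading is a hint rather than proof, since delegates may simply be offline.

```python
import json


def parse_participation(info: dict) -> float:
    """Convert a value such as "90.99 %" into the float 90.99."""
    raw = info["blockchain_average_delegate_participation"]
    return float(raw.replace("%", "").strip())


def looks_like_minority_fork(info: dict, threshold: float = 50.0) -> bool:
    """True if participation is below `threshold` (an assumed cut-off)."""
    return parse_participation(info) < threshold


# Example using the exact field format quoted in this thread.
info = json.loads('{"blockchain_average_delegate_participation": "90.99 %"}')
if looks_like_minority_fork(info):
    print("WARNING: low delegate participation, possibly on a minority fork")
```

How the `info` output is fetched from the client (RPC, command line, etc.) is left out on purpose; only the parsing and the check are shown.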

xeroc

I see .. fork resolution seems to be way over my head .. good to see smarter people around!

emf

Quote from: xeroc ¯\_(ツ)_/¯ on March 26, 2015, 03:35:50 pm

so .. Is there a reason a node on a minority fork (<50%) should publish blocks to other nodes?

I guess I'd say that just because you see <50% delegates, that doesn't necessarily mean it's not the best fork out there. There could be a three-way split, or there could really be 60% of delegates offline (maybe a crash that only happens on one OS).

The other way of dealing with the problem that comes to mind is don't publish your blocks if you know there's a longer fork out there, but that is vulnerable to attack because the only way to verify that the longer fork is valid is to roll back to the fork point and try to switch to the fork (and we can't do that when the fork point was too far in the past).

Quote from: arubi on March 26, 2015, 03:38:24 pm

Hopefully in the future clients could perform a re-index from a specific block height (like Bitcoin's re-org) on their own without the user's intervention.

It sounds nice, but my gut feeling (it's not really my area of the code) is that it would be a lot of work to reorganize the database to support that.

arubi

Quote from: emf on March 26, 2015, 03:16:43 pm

We only save the undo state for about an hour's worth of blocks (maintaining it is expensive). For short forks this works out well, but if you get on a fork with >360 blocks on it, you won't have enough undo history available to roll your blockchain back to the fork point, thus you'll never be able to rejoin the main chain even if your client knows it is better and has downloaded all the blocks on the main chain. If you don't have the undo history to get there, the only other way to get back to the fork point is to reindex or re-download the blockchain from the beginning. And when re-downloading, as long as there are publicly-accessible nodes out there serving up the wrong blockchain, there's a chance you'll connect to them first and download the wrong blockchain and end up back in the same state, unless you force your client to reject the wrong chain with checkpoints. Fortunately, as more public clients switch over to the main chain, the chance anyone will end up syncing to a minority fork decreases.

I see. Thanks.
Hopefully in the future clients could perform a re-index from a specific block height (like Bitcoin's re-org) on their own without the user's intervention.

xeroc

Quote from: emf on March 26, 2015, 03:16:43 pm

Quote from: arubi on March 26, 2015, 02:10:19 pm
I still don't understand why minority-fork clients don't just discard whatever short chain they're on and join the main chain, where we have >90% delegate participation. We can't rely on checkpoints for long.
We only save the undo state for about an hour's worth of blocks (maintaining it is expensive). For short forks this works out well, but if you get on a fork with >360 blocks on it, you won't have enough undo history available to roll your blockchain back to the fork point, thus you'll never be able to rejoin the main chain even if your client knows it is better and has downloaded all the blocks on the main chain. If you don't have the undo history to get there, the only other way to get back to the fork point is to reindex or re-download the blockchain from the beginning. And when re-downloading, as long as there are publicly-accessible nodes out there serving up the wrong blockchain, there's a chance you'll connect to them first and download the wrong blockchain and end up back in the same state, unless you force your client to reject the wrong chain with checkpoints. Fortunately, as more public clients switch over to the main chain, the chance anyone will end up syncing to a minority fork decreases.

so .. Is there a reason a node on a minority fork (<50%) should publish blocks to other nodes?

emf

Quote from: arubi on March 26, 2015, 02:10:19 pm

I still don't understand why minority-fork clients don't just discard whatever short chain they're on and join the main chain, where we have >90% delegate participation. We can't rely on checkpoints for long.

We only save the undo state for about an hour's worth of blocks (maintaining it is expensive). For short forks this works out well, but if you get on a fork with >360 blocks on it, you won't have enough undo history available to roll your blockchain back to the fork point, thus you'll never be able to rejoin the main chain even if your client knows it is better and has downloaded all the blocks on the main chain. If you don't have the undo history to get there, the only other way to get back to the fork point is to reindex or re-download the blockchain from the beginning. And when re-downloading, as long as there are publicly-accessible nodes out there serving up the wrong blockchain, there's a chance you'll connect to them first and download the wrong blockchain and end up back in the same state, unless you force your client to reject the wrong chain with checkpoints. Fortunately, as more public clients switch over to the main chain, the chance anyone will end up syncing to a minority fork decreases.
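To make the limit concrete, here is a minimal Python sketch of the decision described above. It is not the client's code: the function names and the returned labels are made up for the example, and only the 360-block figure comes from this post (an hour's worth of blocks, which implies roughly 10-second blocks).

```python
UNDO_HISTORY_BLOCKS = 360  # about an hour's worth of blocks, per the post above


def can_roll_back(head_block_num: int, fork_point_block_num: int,
                  undo_history: int = UNDO_HISTORY_BLOCKS) -> bool:
    """True if the fork point is still within the saved undo history."""
    blocks_to_undo = head_block_num - fork_point_block_num
    return 0 <= blocks_to_undo <= undo_history


def recovery_action(head_block_num: int, fork_point_block_num: int) -> str:
    """Pick a (hypothetically named) recovery path for a client on the wrong fork."""
    if can_roll_back(head_block_num, fork_point_block_num):
        # Short fork: undo back to the fork point and switch chains.
        return "rollback"
    # Too far past the fork point: undo state is gone.
    return "reindex_or_redownload"
```

So a client at block 1000 that forked at block 900 can roll back, while one that forked at block 600 (400 blocks ago) has to reindex or re-download.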

arubi

Quote from: kokojie on March 26, 2015, 02:07:06 pm

how do I know if I'm on the wrong chain?

That's what I get:

Code: [Select]

"blockchain_average_delegate_participation": "90.99 %"

Quote from: emf on March 26, 2015, 01:58:20 pm

Quote from: arubi on March 26, 2015, 12:22:43 pm
My guess is that forked clients want to tell us about their alternative chain and our client drops the connection once it sees their chain is alternative to our own

That's correct. It doesn't mean that it is switching to that fork, it just means it has connected to someone with the fork and it will start downloading and indexing those blocks until it hits the first one it considers invalid, then it disconnects.

I've made a checkpoints.json here with ~hourly checkpoints since the last hard fork. None of my clients have yet wandered off onto a minority fork so I'm not sure where things are going astray. Depending on when the minority fork(s?) appeared and how many blocks they contain, one checkpoint might not be enough to force your client to sync to the right fork. I *think* the json file has enough checkpoints that you could reindex instead of redownload, but I'd just as soon download from scratch.

I still don't understand why minority-fork clients don't just discard whatever short chain they're on and join the main chain, where we have >90% delegate participation. We can't rely on checkpoints for long.

kokojie

how do I know if I'm on the wrong chain?

emf

Quote from: arubi on March 26, 2015, 12:22:43 pm

My guess is that forked clients want to tell us about their alternative chain and our client drops the connection once it sees their chain is alternative to our own

That's correct. It doesn't mean that it is switching to that fork, it just means it has connected to someone with the fork and it will start downloading and indexing those blocks until it hits the first one it considers invalid, then it disconnects.

I've made a checkpoints.json here with ~hourly checkpoints since the last hard fork. None of my clients have yet wandered off onto a minority fork so I'm not sure where things are going astray. Depending on when the minority fork(s?) appeared and how many blocks they contain, one checkpoint might not be enough to force your client to sync to the right fork. I *think* the json file has enough checkpoints that you could reindex instead of redownload, but I'd just as soon download from scratch.

arubi

Quote from: monsterer on March 26, 2015, 11:03:22 am

Not quite sure what is going on with my delegate, got this in the output:

Code: [Select]
--- there are now 44 active connections to the p2p network
--- syncing with p2p network, 13991 blocks left to fetch
--- in sync with p2p network
--- syncing with p2p network, 121260 blocks left to fetch
--- in sync with p2p network
which suggests it's having trouble trying to figure out which fork to sync to, yet info returns this:

Code: [Select]
"blockchain_average_delegate_participation": "83.47 %",
which seems to imply it's on the main fork...

I'm seeing this too. My guess is that forked clients want to tell us about their alternative chain and our client drops the connection once it sees their chain is alternative to our own

monsterer

Not quite sure what is going on with my delegate, got this in the output:

Code: [Select]

--- there are now 44 active connections to the p2p network
--- syncing with p2p network, 13991 blocks left to fetch
--- in sync with p2p network
--- syncing with p2p network, 121260 blocks left to fetch
--- in sync with p2p network

which suggests it's having trouble trying to figure out which fork to sync to, yet info returns this:

Code: [Select]

"blockchain_average_delegate_participation": "83.47 %",
which seems to imply it's on the main fork...

svk

Quote from: xeroc ¯\_(ツ)_/¯ on March 26, 2015, 07:21:40 am

Quote from: svk on March 26, 2015, 07:08:44 am
Just woke up to find that my delegate had lost all connections once again!
Same here .. almost ... 2 out of 3 machines had 0 connections ... luckily my backup delegate is still 'on-track' so I switched over to that one to produce blocks ..
When I restarted the other clients, they initiated a resync automatically?! Never saw that happen before

Yea, that's happened twice to me as well, the automatic clearing of the blockchain and resyncing. It's very annoying because it takes most of a day to get synced back up.

xeroc

Quote from: svk on March 26, 2015, 07:08:44 am

Just woke up to find that my delegate had lost all connections once again!

Same here .. almost ... 2 out of 3 machines had 0 connections ... luckily my backup delegate is still 'on-track' so I switched over to that one to produce blocks ..
When I restarted the other clients, they initiated a resync automatically?! Never saw that happen before
