Author Topic: BTS 0.8.1 is forking  (Read 14382 times)


Offline vikram

I still don't understand why minority fork clients don't just discard whatever short chain they're on and join the main chain where we have >90% delegates participation. We can't rely on checkpoints for long :(
We only save the undo state for about an hour's worth of blocks (maintaining it is expensive).  For short forks this works out well, but if you get on a fork with >360 blocks on it, you won't have enough undo history available to roll your blockchain back to the fork point, thus you'll never be able to rejoin the main chain even if your client knows it is better and has downloaded all the blocks on the main chain.  If you don't have the undo history to get there, the only other way to get back to the fork point is to reindex or re-download the blockchain from the beginning.  And when re-downloading, as long as there are publicly-accessible nodes out there serving up the wrong blockchain, there's a chance you'll connect to them first and download the wrong blockchain and end up back in the same state, unless you force your client to reject the wrong chain with checkpoints.  Fortunately, as more public clients switch over to the main chain, the chance anyone will end up syncing to a minority fork decreases.

I haven't read the rest of the thread but the above is the primary problem, and is common whenever we hardfork because we get long minority chains from delegates who haven't updated.

There is also a second problem: blocks after the hardfork are not necessarily invalid for old clients. So delegates might upgrade after the hardfork, and if they don't reindex, they will have an inconsistent state but cannot tell until they finally start rejecting blocks.
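To illustrate the undo-history limit described above, here is a toy model (not the actual client code; the 360-block window matches the ~hour of 10-second blocks mentioned):

```python
from collections import deque

UNDO_WINDOW = 360  # ~1 hour of 10-second blocks; older undo records are discarded

class Chain:
    """Toy model of a chain that keeps only a bounded undo history."""
    def __init__(self):
        self.height = 0
        self.undo_stack = deque(maxlen=UNDO_WINDOW)  # oldest entries silently dropped

    def apply_block(self):
        self.height += 1
        self.undo_stack.append(self.height)  # stand-in for the real undo record

    def can_rewind_to(self, fork_height):
        # We can only roll back through blocks whose undo records we still hold.
        oldest_undoable = self.height - len(self.undo_stack)
        return fork_height >= oldest_undoable

chain = Chain()
for _ in range(1000):
    chain.apply_block()

# A fork point 100 blocks back is reachable; one 500 blocks back is not.
short_fork_ok = chain.can_rewind_to(900)   # True
long_fork_ok = chain.can_rewind_to(500)    # False
```

Once the fork point falls outside the window, the only options left are reindexing or redownloading, exactly as described above.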

It seems like we can mitigate the most common issues by a combination of:
  • Force all blocks after a hardfork to be invalid for old clients
  • Always reindex for hardfork releases
  • Always release a version with a checkpoint right after the hardfork as soon as possible, and force a reindex on startup if any checkpoints don't match the current state
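A sketch of how the startup checkpoint check in the last point might work (hypothetical only; the function names and checkpoint values are placeholders, not the real client API):

```python
# Placeholder checkpoints: block height -> expected block id (illustrative values)
CHECKPOINTS = {
    100: "id-at-100",
    200: "id-at-200",
}

def checkpoints_match(get_local_block_id, checkpoints=CHECKPOINTS):
    """Return False (i.e. a reindex is needed) if any known checkpoint
    disagrees with the locally indexed chain."""
    for height, expected in sorted(checkpoints.items()):
        local = get_local_block_id(height)  # None if we haven't indexed that far
        if local is not None and local != expected:
            return False
    return True

# On startup: force a reindex if the local state contradicts any checkpoint.
local_chain = {100: "id-at-100", 200: "some-other-id"}  # diverged at height 200
if not checkpoints_match(local_chain.get):
    print("checkpoint mismatch: reindexing")  # stand-in for the actual reindex
```

Blocks not yet indexed are skipped, so the check stays cheap and only fires on a genuine divergence.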

theoretical also had an idea to stop old clients from syncing past a hardfork. I am not convinced it is worth the effort, but here is part of our conversation:
Quote
[3/26/15, 4:49:46 PM] Vikram Rajkumar: there is a remaining problem due to our implementation—some delegates will not upgrade and so old clients will go down that minority chain—once they go too far and are past the undo limit, and upgrade to say 0.8 for example which has the new rules but no new checkpoints after the hardfork—they are still stuck. it will not resync or reindex
[3/26/15, 4:49:56 PM] Vikram Rajkumar: automatically. so not sure how to address that
[3/26/15, 4:54:09 PM] theoretical bts: i think we talked about this the other day...have delegates publish in public_data the block number of next hardfork as well as the version they are running
[3/26/15, 4:54:53 PM] theoretical bts: then if client notices majority of delegates claim there is a hardfork at block #X but block #X does not exist in local hardfork database, client warns user that they are out of date and stops syncing at block #X
[3/26/15, 4:57:30 PM] Vikram Rajkumar: yea i guess that's better than what we currently do, which is just disconnect all old clients from the main chain
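For the record, the check theoretical describes could be sketched like this (hypothetical; the public_data shape, field name, and majority threshold are assumptions, not the real format):

```python
from collections import Counter

def next_hardfork_claimed(delegate_public_data, majority=0.51):
    """Return the hardfork block number a majority of delegates claim, or None.
    delegate_public_data: list of dicts like {"next_hardfork_block": 123456}."""
    claims = Counter(d.get("next_hardfork_block") for d in delegate_public_data
                     if d.get("next_hardfork_block") is not None)
    if not claims:
        return None
    block, votes = claims.most_common(1)[0]
    return block if votes >= majority * len(delegate_public_data) else None

def should_stop_syncing_at(delegate_public_data, known_hardforks):
    """If most delegates announce a hardfork our client doesn't know about,
    warn the user and return the block at which to stop syncing."""
    claimed = next_hardfork_claimed(delegate_public_data)
    if claimed is not None and claimed not in known_hardforks:
        print(f"warning: client out of date, halting sync at block {claimed}")
        return claimed
    return None
```

An up-to-date client already has the claimed block in its local hardfork database, so it syncs through normally; only stale clients stop and warn.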

Offline Thom

Paging @xeroc or other devs, can you answer the questions I had above? I did notice a post by Vikram that he is checking into a forking issue, so this is beginning to sound like a bug...
Injustice anywhere is a threat to justice everywhere - MLK |  Verbaltech2 Witness Reports: https://bitsharestalk.org/index.php/topic,23902.0.html

Offline Thom

This is a very important thread to understand for ALL delegates.

I must admit I don't fully understand how to handle the "forking" discussed here. I'm quite technical but there are many delegates that are not. I suspect many marketing delegates like fuzzy & methodx will struggle to understand this without help from the more technically savvy delegates.

I understand what a blockchain fork is. I was under the impression such forking is natural in bitcoin blockchains, where you have zillions of miners asynchronously trying to find the next puzzle key, but in dpos it's a round-robin process, not a competition. So how do these forks occur on our dpos blockchain? Should they? Is it a bug or an attack?

I have been led to believe that, aside from the typical unix admin issues one needs to be competent to handle when running a delegate, and paying attention to version releases, the client runs and automatically takes care of itself. I've always had reservations about those claims (even BM himself has made statements to the effect that it's pretty easy to run a delegate), and this thread heightens my suspicions.

delegate.verbaltech has yet to be voted in, so I don't check it every day. I have to jump through some hoops in an effort to minimize / obscure access to the delegate node. So I am quite interested in how issues such as this one can be detected and resolved.

I see info like this:
Code: [Select]
"blockchain_average_delegate_participation": "90.99 %"
But I don't have the cmd line commands memorized to know how to obtain this info. Is it simply the info cmd or something else? It would be good to know.

Is this cmd sufficient to detect being on the correct fork? If so, tools like wackou's bts_tools could monitor that and send a notification that the node is on a minority fork.
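Something along these lines could be the monitor (just a sketch; the RPC endpoint, port, and request shape are assumptions — adapt them to however your node actually exposes the info command):

```python
import json
import urllib.request

RPC_URL = "http://localhost:9989/rpc"  # assumed local RPC endpoint, not verified
THRESHOLD = 60.0  # percent; participation well below normal suggests a minority fork

def parse_participation(info):
    # The client reports the value as a string like "90.99 %"
    return float(info["blockchain_average_delegate_participation"].rstrip(" %"))

def fetch_info():
    # Assumed JSON-RPC shape for calling the client's info command
    payload = json.dumps({"method": "info", "params": [], "id": 1}).encode()
    req = urllib.request.Request(RPC_URL, payload,
                                 {"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]

def on_minority_fork(info):
    return parse_participation(info) < THRESHOLD
```

A cron job could call fetch_info(), test on_minority_fork(), and send an email or restart the node when it trips.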


Offline xeroc

  • Board Moderator
  • Hero Member
  • *****
  • Posts: 12922
  • ChainSquad GmbH
    • View Profile
    • ChainSquad GmbH
  • BitShares: xeroc
  • GitHub: xeroc
I see .. fork resolution seems to be way over my head .. good to see smarter people around!

Offline emf

  • Jr. Member
  • **
  • Posts: 21
    • View Profile
so .. Is there a reason a node on a minority (<50%) fork should publish blocks to other nodes?

I guess I'd say that just because you see <50% delegates, that doesn't necessarily mean it's not the best fork out there.  There could be a three-way split, or there could really be 60% of delegates offline (maybe a crash that only happens on one OS).

The other way of dealing with the problem that comes to mind is to not publish your blocks if you know there's a longer fork out there, but that is vulnerable to attack, because the only way to verify that the longer fork is valid is to roll back to the fork point and try to switch to it (and we can't do that when the fork point is too far in the past).

Hopefully in the future clients could perform a re-index from a specific block height (like Bitcoin's re-org) on their own without the user's intervention.

It sounds nice, but my gut feeling (it's not really my area of the code) is that it would be a lot of work to reorganize the database to support that.

Offline arubi

  • Sr. Member
  • ****
  • Posts: 209
    • View Profile

We only save the undo state for about an hour's worth of blocks (maintaining it is expensive).  For short forks this works out well, but if you get on a fork with >360 blocks on it, you won't have enough undo history available to roll your blockchain back to the fork point, thus you'll never be able to rejoin the main chain even if your client knows it is better and has downloaded all the blocks on the main chain.  If you don't have the undo history to get there, the only other way to get back to the fork point is to reindex or re-download the blockchain from the beginning.  And when re-downloading, as long as there are publicly-accessible nodes out there serving up the wrong blockchain, there's a chance you'll connect to them first and download the wrong blockchain and end up back in the same state, unless you force your client to reject the wrong chain with checkpoints.  Fortunately, as more public clients switch over to the main chain, the chance anyone will end up syncing to a minority fork decreases.

I see. Thanks.
Hopefully in the future clients could perform a re-index from a specific block height (like Bitcoin's re-org) on their own without the user's intervention.

Offline xeroc

I still don't understand why minority fork clients don't just discard whatever short chain they're on and join the main chain where we have >90% delegates participation. We can't rely on checkpoints for long :(
We only save the undo state for about an hour's worth of blocks (maintaining it is expensive).  For short forks this works out well, but if you get on a fork with >360 blocks on it, you won't have enough undo history available to roll your blockchain back to the fork point, thus you'll never be able to rejoin the main chain even if your client knows it is better and has downloaded all the blocks on the main chain.  If you don't have the undo history to get there, the only other way to get back to the fork point is to reindex or re-download the blockchain from the beginning.  And when re-downloading, as long as there are publicly-accessible nodes out there serving up the wrong blockchain, there's a chance you'll connect to them first and download the wrong blockchain and end up back in the same state, unless you force your client to reject the wrong chain with checkpoints.  Fortunately, as more public clients switch over to the main chain, the chance anyone will end up syncing to a minority fork decreases.
so .. Is there a reason a node on a minority (<50%) fork should publish blocks to other nodes?

Offline emf

I still don't understand why minority fork clients don't just discard whatever short chain they're on and join the main chain where we have >90% delegates participation. We can't rely on checkpoints for long :(
We only save the undo state for about an hour's worth of blocks (maintaining it is expensive).  For short forks this works out well, but if you get on a fork with >360 blocks on it, you won't have enough undo history available to roll your blockchain back to the fork point, thus you'll never be able to rejoin the main chain even if your client knows it is better and has downloaded all the blocks on the main chain.  If you don't have the undo history to get there, the only other way to get back to the fork point is to reindex or re-download the blockchain from the beginning.  And when re-downloading, as long as there are publicly-accessible nodes out there serving up the wrong blockchain, there's a chance you'll connect to them first and download the wrong blockchain and end up back in the same state, unless you force your client to reject the wrong chain with checkpoints.  Fortunately, as more public clients switch over to the main chain, the chance anyone will end up syncing to a minority fork decreases.

Offline arubi

how do I know if I'm on the wrong chain?

That's what I get:
Code: [Select]
"blockchain_average_delegate_participation": "90.99 %"
My guess is that forked clients want to tell us about their alternative chain and our client drops the connection once it sees their chain is alternative to our own

That's correct.  It doesn't mean that it is switching to that fork, it just means it has connected to someone with the fork and it will start downloading and indexing those blocks until it hits the first one it considers invalid, then it disconnects.

I've made a checkpoints.json here with ~hourly checkpoints since the last hard fork.  None of my clients have yet wandered off onto a minority fork so I'm not sure where things are going astray.  Depending on when the minority fork(s?) appeared and how many blocks they contain, one checkpoint might not be enough to force your client to sync to the right fork.  I *think* the json file has enough checkpoints that you could reindex instead of redownload, but I'd just as soon download from scratch.

I still don't understand why minority fork clients don't just discard whatever short chain they're on and join the main chain where we have >90% delegates participation. We can't rely on checkpoints for long :(

Offline kokojie

  • Sr. Member
  • ****
  • Posts: 286
    • View Profile
how do I know if I'm on the wrong chain?

Offline emf

My guess is that forked clients want to tell us about their alternative chain and our client drops the connection once it sees their chain is alternative to our own

That's correct.  It doesn't mean that it is switching to that fork, it just means it has connected to someone with the fork and it will start downloading and indexing those blocks until it hits the first one it considers invalid, then it disconnects.

I've made a checkpoints.json here with ~hourly checkpoints since the last hard fork.  None of my clients have yet wandered off onto a minority fork so I'm not sure where things are going astray.  Depending on when the minority fork(s?) appeared and how many blocks they contain, one checkpoint might not be enough to force your client to sync to the right fork.  I *think* the json file has enough checkpoints that you could reindex instead of redownload, but I'd just as soon download from scratch.
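For anyone wiring the file into tooling, a loader might look like this (the JSON shape and values here are assumptions — check them against the actual checkpoints.json before relying on it):

```python
import json

# Assumed shape: a JSON object mapping block height (string key) to block id.
# The heights and ids below are illustrative only.
SAMPLE = '{"1800000": "abc123", "1800360": "def456"}'

def load_checkpoints(text):
    raw = json.loads(text)
    # Normalize keys to ints and keep them sorted by height for easy lookup
    return dict(sorted((int(height), block_id) for height, block_id in raw.items()))

checkpoints = load_checkpoints(SAMPLE)
```

With the map in hand, a client (or a monitoring script) can compare each height's local block id against the checkpointed one and flag any divergence.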

Offline arubi

Not quite sure what is going on with my delegate, got this in the output:

Code: [Select]
--- there are now 44 active connections to the p2p network
--- syncing with p2p network, 13991 blocks left to fetch
--- in sync with p2p network
--- syncing with p2p network, 121260 blocks left to fetch
--- in sync with p2p network

which suggests it's having trouble trying to figure out which fork to sync to, yet info returns this:

Code: [Select]
  "blockchain_average_delegate_participation": "83.47 %",
which seems to imply it's on the main fork...

I'm seeing this too. My guess is that forked clients want to tell us about their alternative chain and our client drops the connection once it sees their chain is alternative to our own

Offline monsterer

Not quite sure what is going on with my delegate, got this in the output:

Code: [Select]
--- there are now 44 active connections to the p2p network
--- syncing with p2p network, 13991 blocks left to fetch
--- in sync with p2p network
--- syncing with p2p network, 121260 blocks left to fetch
--- in sync with p2p network

which suggests it's having trouble trying to figure out which fork to sync to, yet info returns this:

Code: [Select]
  "blockchain_average_delegate_participation": "83.47 %",
which seems to imply it's on the main fork...
My opinions do not represent those of metaexchange unless explicitly stated.
https://metaexchange.info | Bitcoin<->Altcoin exchange | Instant | Safe | Low spreads

Offline svk

Just woke up to find that my delegate had lost all connections once again!
Same here .. almost ... 2 out of 3 machines had 0 connections ... luckily my backup delegate is still 'on-track' so I switched over to that one to produce blocks ..
When I restarted the other clients, they initiated a resync automatically?! never saw that happen before

Yea that's happened twice to me as well, the automatic clearing of the blockchain and resyncing. It's very annoying cause it takes most of a day to get synced back up..
Worker: dev.bitsharesblocks

Offline xeroc

Just woke up to find that my delegate had lost all connections once again!
Same here .. almost ... 2 out of 3 machines had 0 connections ... luckily my backup delegate is still 'on-track' so I switched over to that one to produce blocks ..
When I restarted the other clients, they initiated a resync automatically?! never saw that happen before