Author Topic: Can we get formal documentation on the blockchain consensus algorithm?  (Read 3035 times)


Offline arhag

  • Hero Member
  • *****
  • Posts: 1214
    • View Profile
    • My posts on Steem
  • BitShares: arhag
  • GitHub: arhag
I'm looking for good documentation about the consensus protocol of DPOS (either BitShares 0.x or 2.0 is fine, but I would prefer the 2.0 one) and how nodes decide upon the particular blockchain to use. Specifically, I want to know everything about blockchain reorganization. When does a client, given some existing validated blockchain stored locally, accept a reorganization and when does it not? I want to understand all the possible failure modes we are aware of and how they can be achieved. How does the timing of which blocks are received in which order and by when (relative to the syncing process) change the outcome of which blockchain a client ends up choosing? Under what conditions can a full node get stuck and be unable to sync forward? Are all such cases bugs, or are there cases in which this is the legitimate behavior? If it is the latter (which I believe to be the case), is the only way to fix the problem to add a trusted checkpoint to the client? If the problem can be fixed by resyncing the blockchain from scratch (which I have noticed was recommended in the past as a solution to stuck-blockchain problems), is that not an indicator of a bug in that particular case? In other words, if it wasn't a bug and it got legitimately stuck, shouldn't it get stuck in the same place again (otherwise, how is such non-deterministic behavior not a bug)?

Offline sittingduck

  • Sr. Member
  • ****
  • Posts: 246
    • View Profile
Good questions. 


Sent from my iPhone using Tapatalk

Offline xeroc

  • Board Moderator
  • Hero Member
  • *****
  • Posts: 12896
  • ChainSquad GmbH
    • View Profile
    • ChainSquad GmbH
  • BitShares: xeroc
  • GitHub: xeroc
We need this to be published in a DPOS/TaPOS whitepaper .. can devs spare some time and put the tech together? I can help with the writing/formatting around it, but I need a formal description first ..
Give BitShares a try! Use the http://testnet.bitshares.eu provided by http://bitshares.eu powered by ChainSquad GmbH

Offline bytemaster

This is on my mind a lot.   It is hard to pick and choose which things should take priority.   Right now I think having a working product released is highest priority. 

I can answer some questions here:

1.  The blockchain always picks the longest chain signed by valid witnesses that doesn't have timestamps in the future.
2.  The blockchain maintains an "UNDO" history for the last 1000 blocks which means that if there is a fork it must be resolved within 1000 blocks or manual intervention will be required via a checkpoint.   With 1 second blocks this is about 16 minutes *IF* one block was produced every second, but if that were the case there couldn't be a fork.   So if we assume a "worst case" 50/50 split in the network, then it would take 32 minutes of uninterrupted forking before manual intervention would be required.
3.  The 1000 block number can be increased to as large as desired and in theory could be all the way back to the last "checkpoint" or even genesis.   If it went all the way back to genesis then you would never get "stuck" on a fork, but this is overkill.     In theory we could make it dynamic and only hold history while witness participation is below 90% and then clear it as soon as it gets above 90%.
4. TaPOS is there to protect users from having their transactions included on alternative forks which may have different meanings (different account IDs, etc).  TaPOS could in theory be used as a check on the witnesses and be used to determine the "best chain", but it is much more costly to do and ultimately unnecessary.   No metrics have been defined for using TaPOS as part of the "best chain" consensus algorithm.
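Points 1 and 2 above can be summarized in a short sketch. This is purely illustrative Python, not the actual client code; the `Block` structure, function names, and the way "valid witness" is checked are all assumptions, but the three rules (no future timestamps, strictly longest chain wins, fork point must be within the undo history) are the ones described above.

```python
MAX_UNDO_BLOCKS = 1000  # blocks of "UNDO" history kept for automatic reorgs

class Block:
    def __init__(self, num, timestamp, witness):
        self.num = num
        self.timestamp = timestamp
        self.witness = witness

def prefer_fork(current_chain, candidate_chain, active_witnesses, now):
    """Return True if the client should reorganize onto candidate_chain."""
    # Rule 1: reject blocks with timestamps in the future.
    if any(b.timestamp > now for b in candidate_chain):
        return False
    # Rule 1b: every block must be signed by a valid witness
    # (simplified here to membership in a single set).
    if any(b.witness not in active_witnesses for b in candidate_chain):
        return False
    # Rule 1c: only switch to a strictly longer chain.
    if len(candidate_chain) <= len(current_chain):
        return False
    # Rule 2: the fork point must lie within the undo history,
    # otherwise manual intervention (a checkpoint) is required.
    fork_point = 0
    for a, b in zip(current_chain, candidate_chain):
        if (a.num, a.witness) != (b.num, b.witness):
            break
        fork_point = a.num
    if len(current_chain) - fork_point > MAX_UNDO_BLOCKS:
        return False  # stuck: fork too deep to undo automatically
    return True
```

Under this model a node never reorganizes onto an equal-length fork, and a fork deeper than the undo window is simply rejected, which is exactly the "stuck" condition discussed later in this thread.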

For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline arhag

  • Hero Member
  • *****
  • Posts: 1214
    • View Profile
    • My posts on Steem
  • BitShares: arhag
  • GitHub: arhag
1.  The blockchain always picks the longest chain signed by valid witnesses that doesn't have timestamps in the future.

When does the client choose a blockchain? Is it always evaluating all possible valid chains? Or rather is it always evaluating all possible valid chains other than ones that would cause it to reorganize from its current chain in a way such that the fork point of the change is older than 1000 blocks? I guess I need to explain that better (see the scenario I describe below).

2.  The blockchain maintains an "UNDO" history for the last 1000 blocks which means that if there is a fork it must be resolved within 1000 blocks or manual intervention will be required via a checkpoint.   With 1 second blocks this is about 16 minutes *IF* one block was produced every second, but if that were the case there couldn't be a fork.   So if we assume a "worst case" 50/50 split in the network, then it would take 32 minutes of uninterrupted forking before manual intervention would be required.

Let's say Alice has the following blockchain stored locally: B1, B2, ..., B1500.
The real consensus blockchain at time T is: B1, B2, ..., B3168.
There is a fake blockchain floating around in the network which first appeared at time T and looks like this: B1, B2, ..., B2000, B2001', B2002', ..., B3200'.
The common fork point between this fake blockchain and the real consensus chain is B2000. After this point this blockchain diverges from the real blockchain. The witnesses (which I will call "fake witnesses" from this point forward) at the time of B2000 continue to be active in the fake blockchain whereas they were all voted out in the real chain and replaced by new witnesses (which I will call "real witnesses" from this point forward) at block B2100 (start of new maintenance interval). Let's assume all the significant votes that caused the "fake witnesses" to be replaced by the "real witnesses" appeared between blocks B2001 and B2099.
This fake blockchain is a valid chain signed by witnesses according to the rules of the protocol following that chain. At time T it is longer than the real blockchain by about 32 blocks because the "fake witnesses" were able to have 100% witness participation while the "real witnesses" were only able to maintain 99% witness participation over the last 1000 blocks. Therefore, by the longest chain rule, people should prefer the fake chain over the real chain, but because the fork point is more than 1000 blocks old at the time T when the fake blockchain appeared on the network, none of the nodes online at that time would be tricked onto that fake chain (am I correct in this understanding?).

Now right around this time (a little bit after time T), Alice connects with the network and starts syncing. She can see there are two blockchains that build off the chain she currently has stored locally and both seem to follow the protocol rules. Does she pick the real chain (B1, B2, ..., B3168), the fake chain (B1, B2, ..., B2000, B2001', B2002', ..., B3200'), does she get stuck at block B2000, or something else entirely? Is the outcome dependent on which blocks she receives first? In other words, what if she gets the blocks from the fake blockchain well before the blocks from the real one? Can she sync to the present timestamp and have her local blockchain become the fake chain, and then later when she gets the blocks from the real chain she rejects them because the fork point is more than 1000 blocks old from her new current head block? And similarly, if she got the blocks from the real blockchain well before the fake one, would she instead sync to the real blockchain and reject the fake one?
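The timing dependence in this scenario reduces to a single window check. A minimal sketch, assuming the 1000-block undo window described earlier in the thread and using the block numbers from the scenario above:

```python
UNDO_WINDOW = 1000  # assumed automatic rollback limit, per the thread

def reorg_allowed(head_num, fork_point_num):
    """A reorg is only automatic if the fork point is inside the undo history."""
    return head_num - fork_point_num <= UNDO_WINDOW

# If Alice syncs the fake chain first, her head becomes B3200' and the real
# chain (fork point B2000) can no longer be adopted automatically:
assert not reorg_allowed(3200, 2000)
# If she syncs the real chain first, her head becomes B3168 and the fake
# chain is rejected for the same reason:
assert not reorg_allowed(3168, 2000)
# Either way she keeps whichever chain she happened to receive first.
```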
« Last Edit: July 12, 2015, 06:13:09 pm by arhag »

Offline bytemaster

There are certainly a lot of edge cases that we need to carefully document and I think we can vaguely categorize them by the percentage of colluding witnesses and the time horizon for the attack:

100% collusion of witnesses means they can block transactions that would vote them out.  The network would be held hostage to their demands until a hard-fork can be distributed. 

Given that reality, the case of witnesses turning corrupt *AFTER* they have been voted out is rather unlikely and far less effective; therefore, it is safe to assume it wouldn't happen.

The network is "safe" against both of those attacks so long as 5% of the witnesses do not collude.  That 5% would be enough to allow the minority chain that had support of the stakeholders to elect new witnesses and eventually overtake the attackers chain which would be stuck at 95% participation assuming that in a healthy network participation averages over 95%.

During the attack it is not safe to transact without PICKING a fork that you know will eventually win.  It would essentially be a "bank holiday" while votes are cast and a new consensus is reached.  Fortunately, everyone would be well aware of what was going on because the "minority" witnesses would indicate their intent to "fork" after the colluding witnesses start rejecting their otherwise valid blocks only because they vote against the colluding witnesses.   It would be an "open and shut" case and the network would likely recover in hours.


Offline arhag

  • Hero Member
  • *****
  • Posts: 1214
    • View Profile
    • My posts on Steem
  • BitShares: arhag
  • GitHub: arhag
There are certainly a lot of edge cases that we need to carefully document and I think we can vaguely categorize them by the percentage of colluding witnesses and the time horizon for the attack:

100% collusion of witnesses means they can block transactions that would vote them out.  The network would be held hostage to their demands until a hard-fork can be distributed. 

Given that reality, the case of witnesses turning corrupt *AFTER* they have been voted out is rather unlikely and far less effective; therefore, it is safe to assume it wouldn't happen.

Can you please answer what the clients would do according to the protocol in the scenario I described in my previous post? I first want to just understand the protocol well enough and understand what attacks are possible before considering the attack economics and the probability of attacks occurring or the feasibility of successfully carrying out a particular attack.

Also, I don't understand why you think it is unlikely for witnesses that have already been voted out to collude to create a fake blockchain. At that point they have nothing to lose, especially if their real identities were never revealed.


The network is "safe" against both of those attacks so long as 5% of the witnesses do not collude.  That 5% would be enough to allow the minority chain that had support of the stakeholders to elect new witnesses and eventually overtake the attackers chain which would be stuck at 95% participation assuming that in a healthy network participation averages over 95%.

I think this is a different scenario from the one I was discussing, but an interesting one nonetheless. I don't understand how the protocol allows a minority chain to recover. I understand that a minority chain could include transactions that change enough votes to vote out the other 95% of bad witnesses by the next maintenance interval, but how exactly do clients decide to switch to that chain given that it is a minority chain? What if there were multiple minority chains that elected a different set of witnesses? How do clients choose which one to adopt?


During the attack it is not safe to transact without PICKING a fork that you know will eventually win.  It would essentially be a "bank holiday" while votes are cast and a new consensus is reached.  Fortunately, everyone would be well aware of what was going on because the "minority" witnesses would indicate their intent to "fork" after the colluding witnesses start rejecting their otherwise valid blocks only because they vote against the colluding witnesses.   It would be an "open and shut" case and the network would likely recover in hours.

This is all very interesting to further explore in detail and try to work out the edge cases. But I would also like clarification on how much of what you are describing is part of the current protocol that either is already implemented or will be by the time BitShares 2.0 is released, and how much of this are just future ideas that could and probably should eventually be included as part of the protocol and implemented in code in the clients. In other words, if right now 95% of delegates started rejecting transactions that voted them out or voted others in, would BitShares automatically recover (and by automatically I am still assuming enough stakeholders actually sign and broadcast transactions that change their votes to new non-censoring delegates) without any work necessary from the developers and without trusted checkpoints?

Offline bytemaster

The protocol is VERY simple: longest valid chain wins.

Blocks from other forks are not considered until those forks are longer than the current fork. 

The code is written with a maximum "automatic" rollback period; any forks longer than that automatic period require manual intervention. This is not part of consensus; it is there as a protection to the user, because if it happens it indicates an attack of some form and closer attention may be needed.

Everything else I wrote is merely identifying what the possible scenarios are for the "longest chain" and explaining why the "longest chain" is all that matters.
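The "not considered until longer" rule can be sketched with a toy fork database. This is only an illustration of the rule stated above; the class, its fields, and the fork ids are hypothetical, not the actual client's data structures.

```python
class ForkDB:
    """Tracks the length of each known fork; head moves only to a strictly longer fork."""
    def __init__(self):
        self.forks = {"main": 0}   # fork id -> length in blocks
        self.head = "main"

    def push_block(self, fork_id):
        self.forks[fork_id] = self.forks.get(fork_id, 0) + 1
        # Blocks on side forks are stored but ignored until that fork
        # becomes strictly longer than the current head fork.
        if self.forks[fork_id] > self.forks[self.head]:
            self.head = fork_id

db = ForkDB()
for _ in range(3):
    db.push_block("main")
for _ in range(3):
    db.push_block("side")
assert db.head == "main"     # a tie does not trigger a switch
db.push_block("side")
assert db.head == "side"     # a strictly longer fork wins
```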

There are THREE primary use cases:

1. Majority of Witnesses Collude
2. Minority of Witnesses Collude
3. No Collusion, just network problems / bugs.

The minority will never be able to produce a longer chain than the majority unless the majority resign and yield to the minority or the majority have poor performance for a long period of time.

If the Majority of Witnesses Collude we must ask what they are colluding to do:

1. Block transactions
2. Block anyone from voting them out.

As long as they are not colluding to prevent voting, the network will suffer from low delegate participation (as the minority delegates are ignored) until the majority is voted out.
If they are preventing people from voting then the minority needs to stop producing and users need to stop transacting on the majority chain.  Everyone would pick a minority fork and vote out the bad witnesses.  At this point the minority fork would have 100% block production and the old majority fork will still be stumbling along at 50 to 90% corrupt-witness participation.   Eventually the minority fork will overtake it and become the longest chain.
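A rough growth model makes the "eventually overtake" claim concrete. The numbers here are assumptions for illustration (101 witness slots per round, integer-percent participation, and the head-start values in the test); the model only captures that the minority fork fills all of its slots while the attacker fork is stuck at the colluders' participation rate.

```python
SLOTS = 101  # assumed witness slots per round

def blocks_after(rounds, participation_pct):
    # integer percent avoids float rounding in the comparison below
    return rounds * SLOTS * participation_pct // 100

def rounds_to_overtake(attacker_head_start_rounds, attacker_pct):
    """Rounds until the minority fork (100% participation) becomes the longest chain."""
    r = 0
    while True:
        r += 1
        minority = blocks_after(r, 100)
        attacker = blocks_after(attacker_head_start_rounds + r, attacker_pct)
        if minority > attacker:
            return r

# With no head start, a fork stuck at 95% participation is overtaken in one round:
assert rounds_to_overtake(0, 95) == 1
```

With a head start the recovery takes longer: at 95% attacker participation and a 10-round head start, the model gives 191 rounds before the minority fork is longest.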

Technically it is possible for the majority of stakeholders to collude to fracture the network and that is what most of your "attacks" are assuming: that the same users will vote differently on different forks and that they will not work to reach consensus.    I think this is not a realistic case.



Offline arhag

  • Hero Member
  • *****
  • Posts: 1214
    • View Profile
    • My posts on Steem
  • BitShares: arhag
  • GitHub: arhag
If they are preventing people from voting then the minority needs to stop producing and users need to stop transacting on the majority chain.  Everyone would pick a minority fork and vote out the bad witnesses.  At this point the minority fork would have 100% block production and the old majority fork will still be stumbling along at 50 to 90% corrupt-witness participation.   Eventually the minority fork will overtake it and become the longest chain.

Everyone "picking" the minority fork is not an automatic process done by code. It needs to be selected by each user, presumably through a set of blacklisted witnesses that everyone else has agreed to use (to make sure everyone gets on the same fork). If that is the case, why does it even matter if the "minority fork" ever becomes longer than the original "majority chain"? It just needs to vote out the bad witnesses so that it can get back to normal high witness participation.


Technically it is possible for the majority of stakeholders to collude to fracture the network and that is what most of your "attacks" are assuming: that the same users will vote differently on different forks and that they will not work to reach consensus.    I think this is not a realistic case.

I'm not trying to argue what the probability of any particular event occurring is. I am not trying to say there aren't ways of resolving issues that arise in the rare events that they do. All I am trying to understand at this stage is all the possible ways attacks can force the system into a state that requires manual action (such as relying on social networks to gain consensus) because the protocol cannot automatically resolve it.

For example, you still haven't answered what happens with Alice in the situation described in my earlier post. My assumption from what you have said so far is that her client will pick the fake chain because it is longer without giving her any warning that something weird is going on with the blockchain. Is that correct or would the client recognize the two possible blockchains forking at block B2000? It is confusing because if the fake chain was released into the wild after Alice already synced, I presume she would stay on the real chain because of the 1000 block limit of the undo history. So it seems the chain her client ends up on depends on the timing of when her client syncs relative to when blocks are received from the network (is that correct?). Again, I'm not saying this is necessarily an attack, I just want to understand what happens, what the client is aware of, and what the client warns to the user.

Let me rephrase the problem. Let's say I was an active witness 3 months ago along with 100 of my friends. By now, we have all been voted out and replaced by 101 new witnesses that are doing a good job but on average have a 99% witness participation rate. My friends and I decide to all get together to produce a fake blockchain forking off from the point where we were all active witnesses 3 months ago to the present, and we distribute this fake blockchain on the peer-to-peer network. What does this do to the full nodes currently online and active on the network (I would guess nothing)? Now what if someone with a full node that hasn't synced with the network for 4 months decides to finally resync. What happens to that full node?
« Last Edit: July 14, 2015, 02:01:54 am by arhag »

Offline sittingduck

  • Sr. Member
  • ****
  • Posts: 246
    • View Profile
If there is an attack chain in the wild then a checkpoint will have to be added.   You seem to grasp the issue very well.


Offline bytemaster

A new node coming on line can get "stuck" on a fork created by 100% collusion of former witnesses and requires manual intervention of specifying a checkpoint to get off that fork.

In the event that 100% of former witnesses collude then the blockchain will require 1 checkpoint every 1000 blocks until the attackers are voted out to prevent new nodes from accidentally getting stuck on a fork created by the attackers.


Offline bytemaster

In theory we could add a second metric for measuring the best blockchain via TAPOS which would be "total coindays approved".

Every time an account signs a transaction with a TAPOS header then we can perform the following calculation:

CHAIN_COINDAYS_APPROVED  +=   ACCOUNT_BALANCE * (CURRENT_TAPOS_UPDATE - LAST_TAPOS_UPDATE)

Under this model the colluding witnesses would require both the longest chain *AND* the highest CHAIN_COINDAYS_APPROVED.   

This metric would be relatively cheap to compute, but is still an extra computation on EVERY transaction.
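The accumulator update above, as a hedged Python sketch (the function name and parameter names simply mirror the pseudocode; the protocol defines no such function):

```python
def update_chain_coindays(chain_coindays, account_balance,
                          current_tapos_update, last_tapos_update):
    """Applied each time an account signs a transaction with a TaPOS header:
    CHAIN_COINDAYS_APPROVED += ACCOUNT_BALANCE * (CURRENT_TAPOS_UPDATE - LAST_TAPOS_UPDATE)
    """
    return chain_coindays + account_balance * (current_tapos_update - last_tapos_update)

# e.g. a balance of 500 whose TaPOS reference advances by 3 days approves 1500 coindays:
assert update_chain_coindays(0, 500, 10, 7) == 1500
```

A colluding fork that cannot attract real stakeholder transactions would accumulate coindays slowly, which is why this could serve as a second "best chain" metric alongside length.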

Offline arhag

  • Hero Member
  • *****
  • Posts: 1214
    • View Profile
    • My posts on Steem
  • BitShares: arhag
  • GitHub: arhag
A new node coming on line can get "stuck" on a fork created by 100% collusion of former witnesses and requires manual intervention of specifying a checkpoint to get off that fork.

In the event that 100% of former witnesses collude then the blockchain will require 1 checkpoint every 1000 blocks until the attackers are voted out to prevent new nodes from accidentally getting stuck on a fork created by the attackers.

Thank you for your responses.

But I am still interested in better understanding the client logic and how it determines it is stuck. When does the client determine it is stuck? If it receives a full round of 101 (fake) blocks after the (yet to be known) fork point prior to receiving 90 real blocks (signed by a subset of the same witnesses who signed the 101 fake blocks) after the fork point, does it then cause the client to realize the fork and go back and get "stuck" at the fork point (and thus warn the user it is stuck)? I presume so because the fork point is less than 1000 blocks in the past.

Now if the attacker is able to control the victim's network connection for long enough (more than 1000 blocks) to prevent the real chain from appearing, the victim will continue on thinking everything is fine on the fake chain. When the attacker gives up control of the victim's network connection, the victim is then able to see all the real blocks flooding into the system. But by this point the fork point is more than 1000 blocks older than the present, so it does not do a chain reorganization nor get stuck, correct? In fact, the client happily continues on in the fake chain thinking everything is fine until one day the victim realizes he isn't receiving transactions from a user on the real chain or that his transactions to a user on the real chain are not being received, at which point he discovers that there is a problem and ends up requiring a trusted checkpoint (and probably ends up realizing he lost money from a double spend attack at that point).


In theory we could add a second metric for measuring the best blockchain via TAPOS which would be "total coindays approved".

It sounds like another interesting statistic that can be learned from the blockchain but I'm not very interested in that as part of the protocol because it makes it harder to reason about and I worry it can potentially add new attack vectors. In the case of a stuck blockchain, we could have each specific user's full client (this problem doesn't even apply for lightweight clients since those depend on trust in the wallet provider anyway) offer to reindex the chain from the last trusted checkpoint to calculate the total coindays approved since then and also other valuable statistics, rather than burdening that task (as small as it may be) on all full nodes by making it part of the protocol. The user could then use this information to pick a particular chain themselves if they wanted, although it would still be better for them to just rely on their social network for a trusted checkpoint in these rare cases.

Offline BTSdac

  • Hero Member
  • *****
  • Posts: 1219
    • View Profile
  • BitShares: K1


Let's say Alice has the following blockchain stored locally: B1, B2, ..., B1500.
The real consensus blockchain at time T is: B1, B2, ..., B3168.
There is a fake blockchain floating around in the network which first appeared at time T and looks like this: B1, B2, ..., B2000, B2001', B2002', ..., B3200'.
The common fork point between this fake blockchain and the real consensus chain is B2000. After this point this blockchain diverges from the real blockchain. The witnesses (which I will call "fake witnesses" from this point forward) at the time of B2000 continue to be active in the fake blockchain whereas they were all voted out in the real chain and replaced by new witnesses (which I will call "real witnesses" from this point forward) at block B2100 (start of new maintenance interval). Let's assume all the significant votes that caused the "fake witnesses" to be replaced by the "real witnesses" appeared between blocks B2001 and B2099.
This fake blockchain is a valid chain signed by witnesses according to the rules of the protocol following that chain. At time T it is longer than the real blockchain by about 32 blocks because the "fake witnesses" were able to have 100% witness participation while the "real witnesses" were only able to maintain 99% witness participation over the last 1000 blocks. Therefore, by the longest chain rule, people should prefer the fake chain over the real chain, but because the fork point is more than 1000 blocks old at the time T when the fake blockchain appeared on the network, none of the nodes online at that time would be tricked onto that fake chain (am I correct in this understanding?).

I don't know if I understand you correctly, but the following is my thinking.
After the fake delegates fork the blockchain, the fake chain will be missing the blocks that should have been produced by the real delegates, and meanwhile the real chain will be missing the blocks that should have been produced by the fake witnesses, until the fake delegates are voted out.
So, assume there are 50 fake witnesses and 51 real witnesses, and ignore delegates going offline and delegate ordering:

time                            after C rounds (C*101 slots)    after the fake delegates are voted out, plus N rounds
block height of fake chain      2000 + 50*C                     2000 + 50*C + 50*N
block height of real chain      2000 + 51*C                     2000 + 51*C + 101*N

So we can see that if there are fewer fake delegates than real delegates, the real chain must end up longer than the fake chain, if we ignore delegate order.
But if we do consider the delegate order:
assume all 50 fake delegates land in positions 1-50 of the first round after the fork, and all 51 real delegates land in positions 51-101 of that round. With the fork at B2000, then from B2001 to B2050 the fake chain is the longer chain.
But the probability of that is very small, I think.
On the other hand, that situation would look like the following:
B0->    ->     ->    ->    ->    ->    ->    ->    ->    ->
B0->B1->B2->B3->B4->B5->B6->B7->B8->B9->
The first chain is a chain with no blocks following, so it is not actually a fork.
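The height arithmetic above can be checked with a quick sketch (assumed numbers from the scenario: 50 fake witnesses, 51 real witnesses, fork at height 2000, 101 slots per round once the fake witnesses are voted out of the real chain):

```python
FORK_HEIGHT = 2000  # assumed fork point from the scenario above

def fake_chain_height(C, N):
    # fake witnesses fill 50 slots per round throughout
    return FORK_HEIGHT + 50 * C + 50 * N

def real_chain_height(C, N):
    # real witnesses fill 51 slots per round for the C rounds until the fake
    # witnesses are voted out, then all 101 slots for the next N rounds
    return FORK_HEIGHT + 51 * C + 101 * N

# The real chain is strictly longer for any C, N >= 1:
assert real_chain_height(3, 5) > fake_chain_height(3, 5)
```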
« Last Edit: July 15, 2015, 05:01:53 am by BTSdac »
github.com :pureland
BTS2.0 API (data source): ws://139.196.37.179:8091

Offline arhag

  • Hero Member
  • *****
  • Posts: 1214
    • View Profile
    • My posts on Steem
  • BitShares: arhag
  • GitHub: arhag
The network is "safe" against both of those attacks so long as 5% of the witnesses do not collude.  That 5% would be enough to allow the minority chain that had support of the stakeholders to elect new witnesses and eventually overtake the attackers chain which would be stuck at 95% participation assuming that in a healthy network participation averages over 95%.

I think this is a different scenario from the one I was discussing, but an interesting one nonetheless. I don't understand how the protocol allows a minority chain to recover. I understand that a minority chain could include transactions that change enough votes to vote out the other 95% of bad witnesses by the next maintenance interval, but how exactly do clients decide to switch to that chain given that it is a minority chain? What if there were multiple minority chains that elected a different set of witnesses? How do clients choose which one to adopt?

I also want to add to these questions a bit to hopefully make it more clear what I am asking here.

Let's say there are N = 100 witnesses all producing blocks and everything is good. A new round begins (which is also the second to last round of this maintenance interval) and it is ordered according to: W1, W2, ..., W100. When it is W2's turn they produce a block that has a transaction that votes out witnesses W1 and W3 to W100, and replaces them with W101 to W198 (this transaction is the deciding vote, meaning if included it will kick those witnesses out and replace them with W101 to W198). When it is W3's turn to produce a block, they build off of W1's block, not W2's block, but nevertheless include that transaction that votes out the other witnesses. When it is W4's turn to produce a block, they build off of W1's block, not that of W2 or W3, and they do not include that vote changing transaction. W5 builds off of W4's block and doesn't include the vote changing transaction as well. And similarly for W6 to W100, thus ending that round with a witness participation of 98% (since the blocks produced by W2 and W3 were not included in the majority chain and therefore considered missing).

Now at the start of the next round (which is the last round of this maintenance interval), a witness shuffle takes place in the majority chain using the accumulated random number (which I guess did not include W2's and W3's random numbers since they were not revealed in the main chain). W2 and W3 are ordered somewhere in the middle of that round and will not produce blocks that round, even if they did they would likely be skipped over by the other witnesses. The other witnesses will also not include the vote change transaction.

What about the other orphan forks by W2 and W3? What happens at the end of the previous round when 99% of their witnesses were missing blocks? Is there a witness shuffle? If so how does the algorithm work? In fact, more generally how does the random number commitment, reveal, and accumulation process work in light of missing blocks in rounds or witnesses being voted in and out in maintenance interval transitions? I presume however the witnesses are ordered in the new round, both W2 and W3 will have an opportunity to produce a block in each chain. So let's say W2 builds a block off their own block in their chain and W3 builds a block off their own block in their own chain.

Let me first get some notation out of the way.

B(W, t) means a block signed by witness W with timestamp t.
The blockchain interval is Δt.

B1 ← B2 is a blockchain subsequence that means block B2 builds off block B1.
B1 ← B2 = B(W3, t) ← B3 denotes a blockchain subsequence of three blocks where the middle block is given an alias of B2 and has a timestamp t and was signed by W3.
B(n) denotes a blockchain starting from the genesis block and ending with a block with block number n.

P(t : time, i : bounded_int([0, size(L)!)), L : [optional(witness_t)]) defines a particular blockchain subsequence given by the following pseudocode:
Code:
for all types I,
fn permuted<I>(acc : [I], seq : [I], j : uint) -> [I];
// Deterministically permutes seq using index j according to whatever blockchain protocol rules require.
// Note that with no restriction on the permutation the type of j should actually be bounded_int( [0, size(seq)!) ).
// However, if the blockchain protocol rules are more restrictive, then the bounds on j need to be tighter.

// For completeness here is one permutation implementation that allows any permutation of indices (no further restrictions like those needed for DPOS 2.0).
fn permuted<I>(acc : [I], seq : [I], j : uint ) -> [I]
 {
     ASSERT( j < size(seq)! );
     match size(seq) {
          0 => acc,
          m =>  let j1 = j % m in
                    let j2 = j / m in
                    let swapped_seq = swap(seq, 0, j1) in // let swapped_seq equal seq except with the values at the index 0 and j1 swapped
                    permuted<I>(append(acc, head(swapped_seq)), tail(swapped_seq), j2)
     }
}

fn helper(start : time, acc : [(time, witness_t)], seq : [optional(witness_t)]) -> [(time, witness_t)] {
     match seq {
          empty_list => acc,
          prepend(head, tail) => match head {
               Some(w : witness_t) => helper(start + Δt, append(acc, (start, w)), tail),
                None  => helper(start + Δt, acc, tail) // a missed slot still consumes one Δt
           }
      }
}

datatype block = Block(witness_t, time)
datatype block_subsequence = Rest | Chain(block_subsequence, block)

fn helper2(L' : [(time, witness_t)], acc : block_subsequence) -> block_subsequence {
     match L' {
          prepend((head_t, head_w), tail) => helper2(tail, Chain(acc, Block(head_w, head_t))),
          _ => acc
     }
}

fn P(t : time, i : bounded_int( [0, size(L)!) ), L : [optional(witness_t)]) -> block_subsequence {
     let L' = helper(t, [], permuted<optional(witness_t)>([], L, i)) in
     helper2(L', Rest)
}
L is a sequence of witnesses, e.g. [Some(W1), Some(W2), None, Some(W5), ...], that defines the witnesses that will be participating in block signing and how many are missing. As a shorthand [Some(W1), Some(W2), None, Some(W5), ...] can be written as {W1, W2, _, W5, ...}. L' is a sequence of (time, witness_t) tuples, e.g. [(t, W2), (t+2Δt, W5), (t+3Δt, W1), ...], that defines the blocks of the blockchain subsequence returned by P, e.g. B(W2, t) ← B(W5, t+2Δt) ← B(W1, t+3Δt) ← ..., in the same order. Note that the timestamps of the blocks of the resulting blockchain subsequence will always be within the interval [t, t+(size(L)-1)*Δt].
Examples:
B(W1, t) ← P(t+Δt, 0, {W2, W3}) is equivalent to the blockchain subsequence B(W1, t) ← B(W2, t+Δt) ← B(W3, t+2Δt).
B(W1, t) ← P(t+Δt, 1, {W2, W3}) is equivalent to the blockchain subsequence B(W1, t) ← B(W3, t+Δt) ← B(W2, t+2Δt).
B(W1, t) ← P(t+Δt, 6, {W2, _, W3, W4}) is equivalent to the blockchain subsequence B(W1, t) ← B(W3, t+Δt) ← B(W2, t+2Δt) ← B(W4, t+4Δt).
B(W1, t) ← P(t+Δt, 2, {W2, _ x 5, W3}), which is the shorthand for B(W1, t) ← P(t+Δt, 6, {W2, _, _, _, _, _, W3}), is equivalent to the blockchain subsequence B(W1, t) ← B(W2, t+3Δt) ← B(W3, t+7Δt).
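To make the notation concrete, here is a small Python sketch of permuted and P (my own rendering of the pseudocode above, not client code; Δt is taken as 1, witnesses are strings, and None marks a missed slot):

```python
from math import factorial

DT = 1  # blockchain interval Δt, taken as 1 for simplicity

def permuted(seq, j):
    # Deterministically permute seq using index j, 0 <= j < size(seq)!.
    seq, acc = list(seq), []
    assert 0 <= j < factorial(len(seq))
    while seq:
        j, j1 = divmod(j, len(seq))        # j1 = j % m, next j = j / m
        seq[0], seq[j1] = seq[j1], seq[0]  # swap slots 0 and j1
        acc.append(seq.pop(0))
    return acc

def P(t, i, L):
    # Returns the blockchain subsequence as a list of (timestamp, witness) tuples.
    out, slot = [], t
    for w in permuted(L, i):
        if w is not None:
            out.append((slot, w))
        slot += DT  # a missed slot still consumes one interval
    return out
```

Running the four examples above through this sketch reproduces them exactly; e.g. P(t+Δt, 6, {W2, _, W3, W4}) yields W3 at t+Δt, W2 at t+2Δt, and W4 at t+4Δt.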
 
Okay, enough notation.

So by the end of this round (the last round of the maintenance interval), just after time t+199Δt, the three blockchains look like this:

Chain 1 (majority chain with length n + 196): Bbase ← Bf = B(W1, t) ← B(W4, t+3Δt) ← B(W5, t+4Δt) ← ... ← Bl,1 = B(W100, t+99Δt) ← P(t+100Δt, i1,1, {W1, _, _, W4, W5, ..., W100}),
where i1,1 is some index derived from the accumulated random value as of block Bl,1, and Bbase = B(n) for some positive integer n.

Chain 2 (minority chain with length n + 3): Bbase ← B(W1, t) ← Bl,2 = B(W2, t+Δt) ← Bv,2 = B(W2, t+k1Δt),
where 100 ≤ k1 ≤ 199 depending on the witness shuffling algorithm determined just after time t+99Δt using information as of block Bl,2.

Chain 3 (minority chain with length n + 3): Bbase ← B(W1, t) ← Bl,3 = B(W3, t+2Δt) ← Bv,3 = B(W3, t+k2Δt),
where 100 ≤ k2 ≤ 199 depending on the witness shuffling algorithm determined just after time t+99Δt using information as of block Bl,3.

The common fork point of all three chains is block Bf which has block number n+1.

By time t+200Δt, a new maintenance interval has started and therefore the votes have been updated and some witnesses might have been removed and replaced by others. This happens in each of the three chains despite the fact that chain 2 and chain 3 had only 1 block in the previous round, correct? Let's assume that no change of witnesses occur in chain 1. So the same witness W1 to W100 are still active in that chain, there is just a regular reordering since it is the start of a new round.

But in chain 2 and 3, we can imagine enough votes were accumulated in Bv,2 and Bv,3, respectively, that witnesses W1 and W4 to W100 were voted out and replaced by witnesses W101 to W198. Therefore, I assume that in the next round, a witness shuffling will occur for the sets {W2, W3, W101, W102, ..., W198} in both chain 2 and chain 3 (but each with their own random number used for the shuffling), correct? If so, the blockchain another 100 blocks later could end up something like this:

Chain 1 (majority chain with length n + 294): Bbase ← B(W1, t) ← B(W4, t+3Δt) ← B(W5, t+4Δt) ← ... ← B(W100, t+99Δt) ← P(t+100Δt, i1,1, {W1, _, _, W4, W5, ..., W100}) ← P(t+200Δt, i1,2, {W1, _, _, W4, W5, ..., W100}),
where i1,2 is some index that gives the "appropriate" permutation to satisfy the blockchain protocol rules for witness shuffling at this point in the blockchain.

Chain 2 (majority chain, at least for this past round since it has 91% witness participation, with length n + 94): Bbase ← B(W1, t) ← Bl,2 = B(W2, t+Δt) ← B(W2, t+k1Δt) ← P(t+200Δt, i2,2, {_ x 9, W2, W101, W102, ..., W190}).

Chain 3 (minority chain, since it only has 9% witness participation in this last round, with length n + 12): Bbase ← B(W1, t) ← Bl,3 = B(W3, t+2Δt) ← B(W3, t+k2Δt)  ← P(t+200Δt, i3,2, {_ x 91, W3, W191, W192, ..., W198}).

Here I assumed that nearly all (90 out of 98) of the newly elected witnesses chose to build on chain 2, while the rest (8 out of 98) decided to support chain 3. Notice that none of the 98 could build on chain 1 because they were active witnesses in that chain (the transactions that got them elected in were not included in chain 1).

Also notice that someone looking at just the past round would think that chain 2 is doing pretty decently since 91% of its active witnesses produced blocks in the last round (despite the fact that only 1% of witnesses produced blocks in the prior round). However, going by the longest chain metric neither chain 2 nor chain 3 win compared to chain 1, which is clearly the longest chain. Therefore, all full nodes following the protocol still stay on chain 1. If we assume that the remaining witnesses that are wasting their time on chain 3 decide to finally join chain 2, then chain 2 can theoretically achieve 100% witness participation from that point forward.

If witnesses W2, W3, W191, ..., W198 continue to always produce blocks for every round thereafter, and if witnesses W1, W4, ..., W100 continue to always produce blocks for every round as well, then the block height of chain 1 will increase by 98 each round while the block height of chain 2 will increase by 100 each round. So after r rounds, chain 1 has n1 = n + 294 + 98*r  blocks and chain 2 has n2 = n + 94 + 100*r blocks. We can see that n1 equals n2 at r = 100, or after 100 more rounds (or 10,000 more blocks in chain 2 or just before time t + 10199Δt) chain 1 and chain 2 are of equal length. In the next round afterwards, chain 2 will become the longest chain. However, by this point the common fork point between these two chains (which remember is at block Bf and block number n+1) is way more than 1000 blocks old. In fact it is at least 10094 blocks old. So none of the live full nodes will switch over to chain 2, correct?
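The crossover arithmetic above can be checked with a few lines of Python (a sketch using only the per-round growth rates from this scenario):

```python
def lengths(n, r):
    # Chain 1 gains 98 blocks per round, chain 2 gains 100, starting from
    # the lengths n + 294 and n + 94 reached above.
    return n + 294 + 98 * r, n + 94 + 100 * r

# The chains are equal in length after 100 more rounds...
r_equal = next(r for r in range(1000) if lengths(0, r)[0] == lengths(0, r)[1])
# ...and chain 2 becomes strictly longer one round after that.
r_win = next(r for r in range(1000) if lengths(0, r)[1] > lengths(0, r)[0])
```

At r = 100 both chains have length n + 10094, consistent with the fork point being over 10,000 blocks old by the time chain 2 could win.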

Now some other user, Alice, runs their BitShares full node client and connects to the network. The last block they had synced in their previous session of using the client just prior to shutting down was block Bf which has block number n+1 and timestamp t+Δt. Now it is time t+10201Δt (10200 seconds, or 2.83 hours, later with Δt = 1 second) and Alice's client needs to sync the last nearly 3 hours worth of blocks. From Alice's perspective, the longest chain is chain 2 not chain 1. Her client was not online to see what happened between time t+Δt and t+10201Δt. The client has no way of knowing that everyone has decided to stick with chain 1 because the fork point was too old to trigger a blockchain reorganization. So after getting the two candidate chains, does Alice's client conclude that it must get stuck at block Bf and warn the user? At what time in the syncing process does it figure this out? Given the two chains, 1 and 2, stored locally that are in the state as I described it as of time  t+10199Δt, is the decision Alice's client makes deterministic or does it depend on when the client receives the corresponding blocks as it is syncing? Does the client keep going back and forth between chain 1 and 2 (using its undo history to constantly rewind back to Bf) as each chain alternates in winning over the other in longest length like a close horse race as blocks from each chain come in through the peer to peer network?

My understanding of how this should work is that the client should be designed to get stuck unless it can sync to a head block whose valid timestamp is as close as possible to the current time (as reported by NTP) without exceeding it, AND the common fork point of any competing valid chain that diverges after the block last synced by the client in its previous session is not too old to permit reorganization. I believe that is the only case where it makes sense to keep moving forward along the longest chain without error; any other scenario requires the client to get stuck and warn the user (would you say this characterization of the problem is correct?).
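As a rough sketch of that characterization (hypothetical code, not the actual BitShares client logic; chains are modeled as lists of block ids sharing a genesis prefix, and max_reorg stands in for the 1000-block limit):

```python
def fork_index(a, b):
    # Index of the last common block of two chains.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i - 1

def choose(current, candidate, max_reorg=1000):
    # Follow the longest chain, but refuse a reorganization whose fork point
    # is more than max_reorg blocks deep on the current chain; a real client
    # would get stuck and warn the user rather than silently staying put.
    if len(candidate) <= len(current):
        return current
    depth = len(current) - 1 - fork_index(current, candidate)
    return candidate if depth <= max_reorg else current
```

For example, a longer candidate forking two blocks below the current head is adopted when max_reorg ≥ 2 but rejected when max_reorg = 1.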


Now, putting those questions aside for now, I think it is worth looking at a simpler scenario of just two chains, and trying to determine what it takes, in terms of honest witnesses and average witness participation, for the "good" chain to win in longest length before the chain reorganization limit is reached. So in this new scenario we will suppose that there are G good witnesses (Worig,1, ..., Worig,G) that start on the new chain as soon as they have enough votes to vote out the bad witnesses (Wbad,1, ..., Wbad,N-G), presumably because the latter are censoring transactions, and no one wastes time on any other minority chains. The newly elected witnesses (Wnew,1, ..., Wnew,N-G), along with the original good witnesses, then produce blocks each round with an average witness participation rate of f (1 - G/N < f ≤ 1). In that case we might have the following chains early in the process (just before time t + 2*N*Δt):

Bad chain (length n + 2*(N-G)): Bbase ← P(t, ib,1, {_ x G, Wbad,1, ..., Wbad,N-G}) ← P(t + N*Δt, ib,2, {_ x G, Wbad,1, ..., Wbad,N-G}).

Good chain (length n + k + G + N): Bbase ← B(Wbad,j1, t) ← B(Wbad,j2, t+Δt) ← ... ← Bfork = B(Wbad,jk, t+(k-1)*Δt) ← P(t+k*Δt, ig,1, {_ x (N-G-k), Worig,1, ..., Worig,G}) ← P(t + N*Δt, ig,2, {Worig,1, ..., Worig,G, Wnew,1, ..., Wnew,N-G}),
where 0 ≤ k ≤ N - G.

At this point the bad chain is the longer chain as long as N - k - 3G > 0. If the good witnesses get lucky (k = N - G), then the good chain will already be the longer chain by this point in time. While we cannot tell what k will be (since it depends on the witness shuffling and thus on the random values), we can calculate the probability distribution p(k) assuming a uniformly random permutation. The distribution is p(k) = (G * (N-G)! * (N-k-1)! ) / ( (N-G-k)! * N! ) for 0 ≤ k ≤ N - G, and so the expected value for k is ∑ k*p(k) from k = 0 to (N-G), which according to WolframAlpha is (N-G)/(G+1). Substituting that expression for k in the inequality and manipulating the terms a little gives the inequality 3*G - N + 2 < 0, or N > T(G) where T(G) = 3*G + 2. When N > T(G), the bad chain will still be the longer chain at this point in time, but if N < T(G), the good chain will already have become the longer chain. With N = 100, T(G) first becomes greater than N at G = 33.
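The distribution p(k) and its expected value can be verified with exact rational arithmetic (a quick numerical check of the formula above, not a derivation):

```python
from fractions import Fraction
from math import factorial as fact

def p(k, N, G):
    # p(k) = G * (N-G)! * (N-k-1)! / ((N-G-k)! * N!) for 0 <= k <= N-G:
    # the probability that exactly k bad witnesses precede the first good
    # witness in a uniformly random shuffle.
    return Fraction(G * fact(N - G) * fact(N - k - 1), fact(N - G - k) * fact(N))

N, G = 100, 33
ks = range(N - G + 1)
total = sum(p(k, N, G) for k in ks)          # should be exactly 1
mean = sum(k * p(k, N, G) for k in ks)       # should be exactly (N-G)/(G+1)
```

Both checks pass exactly, confirming that the p(k) values sum to 1 and that E[k] = (N-G)/(G+1).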

Despite estimating the expected value for k, I will proceed with the worse case assumption that k = 0. Assuming the bad chain is still the longer chain, we can see what happens after several rounds. We should expect the bad chain to add (N-G) blocks each round, but we will only assume the good chain can add f*N blocks each round. So after r additional rounds (just before time t + (2*N + r)*Δt) the length of the bad chain is n + (2+r)*(N-G), and the length of the good chain is at least n + G + N + r*f*N. The difference between the length of the good chain and the length of the bad chain after r rounds is approximately D(r) = 3*G - N + r*(G - (1-f)*N). We can see that D(r) will be positive under the constraints (1-f)*N < G < N/2 and 0.5 < f < 1 when r > (g^2 - 2*f*g+ 2*f - 1)/(f+g-1) where g = G/N. We need D(r) to become positive (so that the good chain becomes the longest chain) before the fork block, Bfork at block number n+k, becomes older than R = 1000 blocks. This means that as soon as (2+r)*(N-G) > R, which is at about r = R/(N-G) - 2, D(r) must be positive which thus requires that r = (R/N)/(1-g) - 2 > (g^2 - 2*f*g+ 2*f - 1)/(f+g-1), or R > N*(1-g)*(2 + (g^2 - 2*f*g+ 2*f - 1)/(f+g-1)). This can also be rewritten as f > (g^2  - 2*(1-g) + R' - 1) / (R'/(1-g) - 2*(2-g)), where R' = R/N, and assuming R' > 2*(1-g)*(2-g) which given the previous inequality holds true for f > 2/3. So with f > 2/3, which means g < 1/3 (for g > 1/3 the good chain wins without any extra rounds being necessary), and the constraint that 1-g < f < 1, we find that R' > (-g^3 + g^2 - g + 1)/g. 
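The overtaking condition can also be checked by direct round-by-round simulation instead of the closed-form manipulation (a sketch under the worst-case k = 0 assumption, measuring the fork depth along the bad chain — an interpretation the post leaves open — so exact thresholds may differ slightly from the closed-form ones):

```python
def good_chain_wins(N, G, f, R=1000):
    # Worst case k = 0: the good chain starts with G + N blocks past the fork
    # and adds f*N blocks per round; the bad chain adds N - G per round.
    for r in range(10 * R):
        bad = (2 + r) * (N - G)
        good = G + N + r * f * N
        if good > bad:
            return bad <= R  # must overtake while reorganization is still allowed
        if bad > R:
            return False     # fork point now too deep; nodes never switch
    return False
```

Under this model, G = 15 with f = 0.92 succeeds while G = 10 with f = 0.90 fails, and G > N/3 wins without needing any extra rounds — broadly consistent with the closed-form analysis.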

So with R = 1000 and N = 100 (thus R' = 10), the minimum value of g necessary to guarantee the good chain becomes the longer chain prior to the chain reorganization limit being reached is approximately 0.0916. This means at least 10 good witnesses out of the total 100 witnesses are needed. The minimum value of f needed as a function of g is given by this plot. With G = 10, the average witness participation needs to be at least 98.62%. With G = 15, we get a more sensible value of 90.8% as the minimum witness participation needed to ensure the good chain becomes the longest chain before the chain reorganization limit is reached.
« Last Edit: July 15, 2015, 08:11:45 pm by arhag »