Author Topic: Can we get formal documentation on the blockchain consensus algorithm?  (Read 7502 times)


Offline sittingduck

  • Sr. Member
  • ****
  • Posts: 246
    • View Profile
Complexity kills.  The more complex the system, the more possible attacks there are.  A manual vote tally could provide some protection.  It should only charge a fee if the votes didn't change the outcome.


Sent from my iPhone using Tapatalk

Offline arhag

  • Hero Member
  • *****
  • Posts: 1214
    • View Profile
    • My posts on Steem
  • BitShares: arhag
  • GitHub: arhag
Every time a block is missed the undo history grows by 2 so if 34% of the witnesses notice vote censoring they can simply stop producing on one fork and start producing on the other.  This would cause the undo history on the majority fork to grow to its maximum value (1000) and then all block production would stop.     

Yeah, it isn't a problem if more than a third of the witnesses are good. But I think the probability of successfully recovering from a censored chain degrades very quickly once the honest fraction falls below a third.


Chain reorganizations are VERY BAD.  A reorganization of more than 10 blocks is a disaster because many transactions would get orphaned, since they reference a more recent TAPOS block.

That's true. In that case we need an alternative method to allow the good guys to recover from a censored chain.

First, there should be a way to call an early vote tally (and an early maintenance-interval change) at critical times. I believe the only reason the maintenance interval isn't every block, or even every round, is the unnecessary burden it would place on computing the vote updates? I'd appreciate it if you could explain the reasoning, seeing how DPOS 1.0 already updates votes every block (or is it every round?). I'm assuming it wouldn't be a problem to add that burden in rare situations (a witness could even post a security deposit that they forfeit if the rare situation, which can only be confirmed by actually doing the vote tally, turns out not to be true).

Second, I think we should change the longest chain metric to a more general longest-accumulated-weight metric. Each active witness has a dynamic weight at each point along the blockchain. The initial (and typical) weight would be 1/N, where N is the number of active witnesses. But whenever a new active witness is elected, they get a one-round boost in their first round (meaning any round in which they are an active witness but were not one in the previous round): their weight becomes b/N for that round (with boosting factor b > 1). In the next round their weight drops back to 1/N.
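The weighting rule above could be sketched like this (illustrative Python only, not Graphene code; the helper names and the choice of b = 2 are my own assumptions):

```python
# Sketch of the proposed longest-accumulated-weight metric. Each block
# contributes the weight of the witness that produced it: 1/N normally,
# b/N in that witness's first round after being voted in.

def block_weight(witness, round_no, first_active_round, n_witnesses, boost=2.0):
    """Weight contributed by one block; `boost` is the assumed b > 1."""
    if round_no == first_active_round.get(witness):
        return boost / n_witnesses   # one-round boost for a freshly elected witness
    return 1.0 / n_witnesses

def chain_weight(blocks, first_active_round, n_witnesses, boost=2.0):
    """Accumulated weight of a chain given (witness, round_no) per block."""
    return sum(block_weight(w, r, first_active_round, n_witnesses, boost)
               for (w, r) in blocks)
```

Fork choice would then compare `chain_weight` of the competing chains instead of their raw lengths, so a freshly elected honest witness's blocks count for more than a censoring incumbent's.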

If we set an appropriate boosting factor, and allow an active witness to call an early vote tally (potentially with a security deposit, though I think the threat of being voted out for crying wolf is enough to prevent abuse) which goes into effect immediately in the next block (as a new round with the new set of witnesses shuffled), then we can allow even a single honest witness to recover a censored chain, assuming they can gather enough legitimate, unexpired vote transactions to vote the censoring witnesses out.

Edit: In fact, we could go further and allow the system to recover even if all active witnesses were compromised. Every round, the system randomly selects a single standby witness, with the probability of selection proportional to the witness's approval. The selected witness has the opportunity to produce a single block at the beginning of the round, but only if they also pay a small fee in that block (so not only do they not get paid to produce the block, they actually pay for the privilege). Their weight for that block would be boosted to b'/N with b' > 1, giving their block an advantage over the block of the regular active witness scheduled for the same slot.

The fee is set so that it is normally unlikely for any standby witness to take "advantage" of this opportunity (which avoids situations where a couple of blocks must be rewound every round, or where the first active witness of each round gets skipped), and so that it compensates the network for the cost of having to tally votes. However, if the block by a standby witness causes a significant fraction of active witnesses to be voted out (it doesn't have to be a majority, but it should probably be larger than 25%), then an early maintenance interval is called, and the standby witness that produced that block gets back their fee plus some interest (perhaps considerable interest). So even if the standby witness scheduled for that round isn't going to get voted in himself, he is still economically motivated to do the right thing, which helps the network recover.
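The approval-weighted selection step could be sketched as follows (illustrative Python, not Graphene code; the function name and the use of a seed standing in for the chain's accumulated random value are my assumptions):

```python
import random

def pick_standby(standbys, approvals, seed):
    """Select one standby witness per round, with probability proportional
    to stakeholder approval. `seed` stands in for the blockchain's
    accumulated random value so every node picks the same witness."""
    rng = random.Random(seed)
    total = sum(approvals[w] for w in standbys)
    x = rng.uniform(0, total)
    acc = 0.0
    for w in standbys:
        acc += approvals[w]
        if x <= acc:
            return w
    return standbys[-1]
```

Because the seed is derived from on-chain data, the lottery is deterministic: every node agrees on which standby witness holds the option to produce the round's first block.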
« Last Edit: July 16, 2015, 01:41:56 am by arhag »

Offline bytemaster

The number of blocks of "undo history" that we maintain is now dynamic based upon witness participation.

https://github.com/cryptonomex/graphene/issues/160

Doesn't a minimum undo history of 10 mean that a good witness who wants to correct a chain controlled by censoring witnesses must become the longest chain, with the new set of elected witnesses, within 10 blocks of the fork point? This seems way too limiting.

I haven't yet properly done the analysis with this new dynamic undo-history change, but from some initial examination, I think the minimum undo-history size should also be dynamic and proportional to the number of active witnesses. I believe a minimum of 5 times the number of active witnesses should allow even 10% honest active witnesses to recover a censored blockchain from the bad majority with high probability of success, by strategically collecting (hopefully still unexpired) vote-change transactions and initiating the fork in the round just before the end of a maintenance interval, so that the newly voted-in honest witnesses become active immediately in the following round.

Chain reorganizations are VERY BAD.  A reorganization of more than 10 blocks is a disaster because many transactions would get orphaned, since they reference a more recent TAPOS block.

Every time a block is missed the undo history grows by 2 so if 34% of the witnesses notice vote censoring they can simply stop producing on one fork and start producing on the other.  This would cause the undo history on the majority fork to grow to its maximum value (1000) and then all block production would stop.     

This actually introduces a new "attack": 34% of the witnesses can turn off and the blockchain will halt until a checkpoint is introduced and new witnesses are voted in.

I suspect there is a balancing act between preventing forks and allowing a large minority to halt the blockchain without manual intervention.
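The growth rule above can be modeled in a few lines (a toy model of my reading of graphene issue #160; the floor of 10, cap of 1000, and +2 per missed block come from this thread, everything else is assumed):

```python
# Toy model of the dynamic undo-history rule: each missed block grows the
# required undo history by 2, bounded between a floor of 10 and a cap of
# 1000 blocks.

UNDO_MIN, UNDO_MAX = 10, 1000

def required_undo_history(missed_blocks):
    """Required undo-history size after `missed_blocks` misses."""
    return min(UNDO_MAX, UNDO_MIN + 2 * missed_blocks)

# Number of misses before the cap is hit and block production stops:
misses_to_halt = (UNDO_MAX - UNDO_MIN) // 2   # (1000 - 10) / 2 = 495
```

So on a fork where roughly a third of slots are missed, the cap is reached after about 15 hundred-witness rounds, which is the halting behavior described above.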

For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline arhag

The number of blocks of "undo history" that we maintain is now dynamic based upon witness participation.

https://github.com/cryptonomex/graphene/issues/160

Doesn't a minimum undo history of 10 mean that a good witness who wants to correct a chain controlled by censoring witnesses must become the longest chain, with the new set of elected witnesses, within 10 blocks of the fork point? This seems way too limiting.

I haven't yet properly done the analysis with this new dynamic undo-history change, but from some initial examination, I think the minimum undo-history size should also be dynamic and proportional to the number of active witnesses. I believe a minimum of 5 times the number of active witnesses should allow even 10% honest active witnesses to recover a censored blockchain from the bad majority with high probability of success, by strategically collecting (hopefully still unexpired) vote-change transactions and initiating the fork in the round just before the end of a maintenance interval, so that the newly voted-in honest witnesses become active immediately in the following round.
« Last Edit: July 15, 2015, 10:06:15 pm by arhag »

Offline bytemaster

The number of blocks of "undo history" that we maintain is now dynamic based upon witness participation.

https://github.com/cryptonomex/graphene/issues/160

If greater than 66% of witnesses are participating then the undo history is kept to a minimum of 10 blocks. Fully synced nodes with high participation never fork and never get tricked.

If an evil witness decides to create a minority fork with less than 33% participation then the required undo history will grow until it hits a maximum limit and no more blocks can be pushed without a checkpoint.

This means that an "attacker" wishing to maximally disrupt the syncing of a new node would create as many forks as possible off every block they produce, and then run a node for each fork they wish to trick clients into going down.

This attack can be completely prevented by syncing from trusted seed nodes and it does not result in a risk of double spend. 

It would be an annoying attack, but it wouldn't prevent the node from eventually syncing. 


Offline arhag

As far as understanding what happens, nodes ONLY get stuck if they travel down the wrong fork for 1000 blocks otherwise they will resolve the issue themselves.    That said it should be impossible to create a longer fork that is 1000 blocks different from the main fork without over 51% witness collusion.   

Great, this was very useful information to help me understand.

Now I am trying to understand how the client even evaluates other potential blockchains. Does it not need to consider every possible blockchain that forks off a block within the last 1000 blocks of its current head block? Does it do this concurrently while syncing? Would this not potentially take a considerable amount of processing and memory, since there is nothing to prevent an unlimited number of potential (not necessarily longer) forks (especially if the offending witnesses don't care that they double-sign)?

I mean, if a client is syncing from the past along a particular blockchain (with many more blocks still to sync) and a new block appears that forks off a block 10 blocks before the client's current head block, shouldn't it evaluate that block, since it could potentially be the start of a much longer blockchain than the one it is currently on? Imagine a nearly fractal-like tree of blocks of short length (so all but one branch is less than 1000 blocks from the trunk). In order for the client to find the one branch that is actually the longer chain, doesn't it need to evaluate all of those branches? A branch that has low (e.g. 60%) witness participation early on might end up consistently maintaining 100% participation later in its branch, while a (fake) branch with 90% witness participation at the beginning of the branch (during the same time the real branch was at 60%) may never exceed 90% or may even stop dead soon after. So I am not sure how the client could dismiss branches early.
« Last Edit: July 15, 2015, 05:40:09 pm by arhag »

Offline bytemaster

Assume good witnesses   G1, G2, and G3
Assume bad witnesses   B1, B2, and B3
Assume 'v' means block contains a vote changing from B* to G*

Code:
B1  ->  B2  -> B3v  -> G1 -> G2 -> G3
           \-> B3   -> B1 -> B2 -> B3

Aside from the fact that B3 has no motive to ever create B3v in the first place, we have a situation where all nodes will stay on their current fork until either B* or G* becomes longer.  If B* becomes longer then G* will switch over and stop producing.  If they stay tied for 1000 blocks then it would require manual intervention to resolve.

For the sake of simplicity I do not want to consider any attacks that depend upon 51% witness collusion (past or present).     Nothing short of 95% witness collusion (past or present) would be able to manufacture an alternative chain that is longer. 

As far as understanding what happens, nodes ONLY get stuck if they travel down the wrong fork for 1000 blocks otherwise they will resolve the issue themselves.    That said it should be impossible to create a longer fork that is 1000 blocks different from the main fork without over 51% witness collusion.   

1. Normal witnesses running the standard code will stop producing blocks anytime they find themselves on a fork with 33% or less participation.   They would have to manually override the setting to produce blocks on such a low-participation chain.   This will result in minority chains "self-terminating" and/or freezing the blockchain state until whatever issue led to the network split has been reviewed by humans and resolved.    This will prevent an attacker from hijacking honest witnesses to produce blocks on a minority fork. 


There is only one scenario that results in a stuck client on a chain produced by less than 51% of the witnesses.  A bad witness presents a user with 1000 blocks with 1% participation prior to the user connecting to the real network.  At this point in time the client will remain stuck on the 1% participation fork without manual intervention.

This particular scenario can be mitigated by having clients stop applying new blocks any time the participation rate falls below 33%, unless there was a checkpoint within the last 1000 blocks. In that case, even with 33% witness collusion, attackers would be unable to produce an alternative chain that could get a user stuck.
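That mitigation reduces to a simple predicate (a sketch with hypothetical names; the 33% threshold and 1000-block window come from the post above):

```python
# Refuse to apply new blocks when witness participation drops below 1/3,
# unless a checkpoint appeared within the last 1000 blocks.

def should_apply_block(participation, blocks_since_checkpoint,
                       min_participation=1/3, checkpoint_window=1000):
    """True if the client may keep applying blocks on this fork."""
    return (participation > min_participation
            or blocks_since_checkpoint <= checkpoint_window)
```

A low-participation chain is only accepted while it is still anchored by a recent checkpoint; otherwise the client stops and waits for manual intervention.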

The risk here is that if there are severe network issues like we have experienced a couple times in the past it would require all users to manually add checkpoints to get back on track.  I think this is probably a fair requirement given that severe network issues are either "bugs" or attacks.  Most users should be using light wallets / web wallets so it would be a service provider that is resolving these issues anyway.   


 

« Last Edit: July 15, 2015, 01:35:45 pm by bytemaster »

Offline sittingduck

Dude.  Your posts are too wordy to follow.  I suggest you create a timeline with just 5 witnesses total.   Show the two chains. 


Sent from my iPhone using Tapatalk

Offline arhag

The network is "safe" against both of those attacks so long as 5% of the witnesses do not collude.  That 5% would be enough to allow the minority chain that had the support of the stakeholders to elect new witnesses and eventually overtake the attackers' chain, which would be stuck at 95% participation, assuming that in a healthy network participation averages over 95%.

I think this is a different scenario from the one I was discussing, but an interesting one nonetheless. I don't understand how the protocol allows a minority chain to recover. I understand that a minority chain could include transactions that change enough votes to vote out the other 95% of bad witnesses by the next maintenance interval, but how exactly do clients decide to switch to that chain given that it is a minority chain? What if there were multiple minority chains that elected a different set of witnesses? How do clients choose which one to adopt?

I also want to add to these questions a bit to hopefully make it more clear what I am asking here.

Let's say there are N = 100 witnesses all producing blocks and everything is good. A new round begins (which is also the second to last round of this maintenance interval) and it is ordered according to: W1, W2, ..., W100. When it is W2's turn they produce a block that has a transaction that votes out witnesses W1 and W3 to W100, and replaces them with W101 to W198 (this transaction is the deciding vote, meaning if included it will kick those witnesses out and replace them with W101 to W198). When it is W3's turn to produce a block, they build off of W1's block, not W2's block, but nevertheless include that transaction that votes out the other witnesses. When it is W4's turn to produce a block, they build off of W1's block, not that of W2 or W3, and they do not include that vote changing transaction. W5 builds off of W4's block and doesn't include the vote changing transaction as well. And similarly for W6 to W100, thus ending that round with a witness participation of 98% (since the blocks produced by W2 and W3 were not included in the majority chain and therefore considered missing).

Now at the start of the next round (which is the last round of this maintenance interval), a witness shuffle takes place in the majority chain using the accumulated random number (which I guess did not include W2's and W3's random numbers, since they were not revealed in the main chain). W2 and W3 are ordered somewhere in the middle of that round and will not produce blocks that round; even if they did, they would likely be skipped over by the other witnesses. The other witnesses will also not include the vote-change transaction.

What about the other orphan forks by W2 and W3? What happens at the end of the previous round, when 99% of their witnesses missed blocks? Is there a witness shuffle? If so, how does the algorithm work? In fact, more generally, how does the random number commitment, reveal, and accumulation process work in light of missing blocks in rounds, or witnesses being voted in and out at maintenance-interval transitions? I presume that however the witnesses are ordered in the new round, both W2 and W3 will have an opportunity to produce a block in each chain. So let's say W2 builds a block off their own block in their chain, and W3 builds a block off their own block in their own chain.

Let me first get some notation out of the way.

B(W, t) means a block signed by witness W with timestamp t.
The blockchain interval is Δt.

B1 ← B2 is a blockchain subsequence that means block B2 builds off block B1.
B1 ← B2 = B(W3, t) ← B3 denotes a blockchain subsequence of three blocks where the middle block is given an alias of B2 and has a timestamp t and was signed by W3.
B(n) denotes a blockchain starting from the genesis block and ending with a block with block number n.

P(t : time, i : bounded_int([0, size(L)!)), L : [optional(witness_t)]) defines a particular blockchain subsequence given by the following pseudocode:
Code:
for all types I,
fn permuted<I>(acc : [I], seq : [I], j : uint) -> [I];
// Deterministically permutes seq using index j according to whatever blockchain protocol rules require.
// Note that with no restriction on the permutation, the type of j should actually be bounded_int( [0, size(seq)!) ).
// However, if the blockchain protocol rules are more restrictive, then the bounds on j need to be tighter.

// For completeness here is one permutation implementation that allows any permutation of indices (no further restrictions like those needed for DPOS 2.0).
fn permuted<I>(acc : [I], seq : [I], j : uint ) -> [I]
 {
     ASSERT( j < size(seq)! );
     match size(seq) {
          0 => acc,
          m =>  let j1 = j % m in
                    let j2 = j / m in
                    let swapped_seq = swap(seq, 0, j1) in // let swapped_seq equal seq except with the values at the index 0 and j1 swapped
                    permuted<I>(append(acc, head(swapped_seq)), tail(swapped_seq), j2)
     }
}

fn helper(start : time, acc : [(time, witness_t)], seq : [optional(witness_t)]) -> [(time, witness_t)] {
     match seq {
          empty_list => acc,
          prepend(head, tail) => match head {
               Some(w : witness_t) => helper(start + Δt, append(acc, (start, w)), tail),
               None  => helper(start + Δt, acc, tail) // a missed slot still advances time by one Δt
          }
     }
}

datatype block = Block(witness_t, time)
datatype block_subsequence = Rest | Chain(block_subsequence, block)

fn helper2(L' : [(time, witness_t)], acc : block_subsequence) -> block_subsequence {
     match L' {
          prepend((head_t, head_w), tail) => helper2(tail, Chain(acc, Block(head_w, head_t))),
          _ => acc
     }
}

fn P(t : time, i : bounded_int( [0, size(L)!) ), L : [optional(witness_t)]) -> block_subsequence {
     let L' = helper(t, [], permuted<optional(witness_t)>([], L, i)) in
     helper2(L', Rest)
}
L is a sequence of witnesses, e.g. [Some(W1), Some(W2), None, Some(W5), ...], that defines the witnesses that will be participating in block signing and how many are missing. As a shorthand [Some(W1), Some(W2), None, Some(W5), ...] can be written as {W1, W2, _, W5, ...}. L' is a sequence of (time, witness_t) tuples, e.g. [(t, W2), (t+2Δt, W5), (t+3Δt, W1), ...], that defines the blocks of the blockchain subsequence returned by P, e.g. B(W2, t) ← B(W5, t+2Δt) ← B(W1, t+3Δt) ← ..., in the same order. Note that the timestamps of the blocks of the resulting blockchain subsequence will always be within the interval [t, t+(size(L)-1)*Δt].
Examples:
B(W1, t) ← P(t+Δt, 0, {W2, W3}) is equivalent to the blockchain subsequence B(W1, t) ← B(W2, t+Δt) ← B(W3, t+2Δt).
B(W1, t) ← P(t+Δt, 1, {W2, W3}) is equivalent to the blockchain subsequence B(W1, t) ← B(W3, t+Δt) ← B(W2, t+2Δt).
B(W1, t) ← P(t+Δt, 6, {W2, _, W3, W4}) is equivalent to the blockchain subsequence B(W1, t) ← B(W3, t+Δt) ← B(W2, t+2Δt) ← B(W4, t+4Δt).
B(W1, t) ← P(t+Δt, 2, {W2, _ x 5, W3}), which is the shorthand for B(W1, t) ← P(t+Δt, 6, {W2, _, _, _, _, _, W3}), is equivalent to the blockchain subsequence B(W1, t) ← B(W2, t+3Δt) ← B(W3, t+7Δt).
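To sanity-check these examples, here is a runnable Python translation of P (my own translation, not protocol code; timestamps are integers in units of Δt, None marks a missed slot, and a missed slot advances time by one Δt, as the worked examples imply):

```python
def permuted(seq, j):
    """Deterministically permute seq using index j (0 <= j < len(seq)!),
    mirroring the recursive swap-based pseudocode above."""
    seq = list(seq)
    acc = []
    while seq:
        m = len(seq)
        j, j1 = divmod(j, m)              # j1 = j % m, then j = j / m
        seq[0], seq[j1] = seq[j1], seq[0]  # swap values at index 0 and j1
        acc.append(seq.pop(0))
    return acc

def P(t, i, L):
    """Return the block subsequence as [(witness, timestamp), ...].
    A None entry consumes a time slot but produces no block."""
    blocks = []
    for w in permuted(L, i):
        if w is not None:
            blocks.append((w, t))
        t += 1                             # every slot, hit or missed, advances time
    return blocks
```

Running this reproduces the examples, e.g. P(t+Δt, 6, {W2, _, W3, W4}) yields blocks at t+Δt, t+2Δt, and t+4Δt for W3, W2, and W4 respectively.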
 
Okay, enough notation.

So by the end of this round (the last round of the maintenance interval), just after time t+199Δt, the three blockchains look like this:

Chain 1 (majority chain with length n + 196): Bbase ← Bf = B(W1, t) ← B(W4, t+3Δt) ← B(W5, t+4Δt) ← ... ← Bl,1 = B(W100, t+99Δt) ← P(t+100Δt, i1,1, {W1, _, _, W4, W5, ..., W100}),
where i1,1 is some index derived from the accumulated random value as of block Bl,1, and Bbase = B(n) for some positive integer n.

Chain 2 (minority chain with length n + 3): Bbase ← B(W1, t) ← Bl,2 = B(W2, t+Δt) ← Bv,2 = B(W2, t+k1Δt),
where 100 ≤ k1 ≤ 199 depending on the witness shuffling algorithm determined just after time t+99Δt using information from block Bl,2.

Chain 3 (minority chain with length n + 3): Bbase ← B(W1, t) ← Bl,3 = B(W3, t+2Δt) ← Bv,3 = B(W3, t+k2Δt),
where 100 ≤ k2 ≤ 199 depending on the witness shuffling algorithm determined just after time t+99Δt using information from block Bl,3.

The common fork point of all three chains is block Bf which has block number n+1.

By time t+200Δt, a new maintenance interval has started and therefore the votes have been updated and some witnesses might have been removed and replaced by others. This happens in each of the three chains despite the fact that chain 2 and chain 3 had only 1 block in the previous round, correct? Let's assume that no change of witnesses occur in chain 1. So the same witness W1 to W100 are still active in that chain, there is just a regular reordering since it is the start of a new round.

But in chain 2 and 3, we can imagine enough votes were accumulated in Bv,2 and Bv,3, respectively, that witnesses W1 and W4 to W100 were voted out and replaced by witnesses W101 to W198. Therefore, I assume that in the next round, a witness shuffling will occur for the sets {W2, W3, W101, W102, ..., W198} in both chain 2 and chain 3 (but each with their own random number used for the shuffling), correct? If so, the blockchain another 100 blocks later could end up something like this:

Chain 1 (majority chain with length n + 294): Bbase ← B(W1, t) ← B(W4, t+3Δt) ← B(W5, t+4Δt) ← ... ← B(W100, t+99Δt) ← P(t+100Δt, i1,1, {W1, _, _, W4, W5, ..., W100}) ← P(t+200Δt, i1,2, {W1, _, _, W4, W5, ..., W100}),
where i1,2 is some index that gives the "appropriate" permutation to satisfy the blockchain protocol rules for witness shuffling at this point in the blockchain.

Chain 2 (majority chain, at least for this past round since it has 91% witness participation, with length n + 94): Bbase ← B(W1, t) ← Bl,2 = B(W2, t+Δt) ← B(W2, t+k1Δt) ← P(t+200Δt, i2,2, {_ x 9, W2, W101, W102, ..., W190}).

Chain 3 (minority chain, since it only has 9% witness participation in this last round, with length n + 12): Bbase ← B(W1, t) ← Bl,3 = B(W3, t+2Δt) ← B(W3, t+k2Δt)  ← P(t+200Δt, i3,2, {_ x 91, W3, W191, W192, ..., W198}).

Here I assumed that nearly all (90 out of 98) of the newly elected witnesses chose to build on chain 2, while the rest (8 out of 98) decided to support chain 3. Notice that none of the 98 could build on chain 1 because they were active witnesses in that chain (the transactions that got them elected in were not included in chain 1).

Also notice that someone looking at just the past round would think that chain 2 is doing pretty decently since 91% of its active witnesses produced blocks in the last round (despite the fact that only 1% of witnesses produced blocks in the prior round). However, going by the longest chain metric neither chain 2 nor chain 3 win compared to chain 1, which is clearly the longest chain. Therefore, all full nodes following the protocol still stay on chain 1. If we assume that the remaining witnesses that are wasting their time on chain 3 decide to finally join chain 2, then chain 2 can theoretically achieve 100% witness participation from that point forward.

If witnesses W2, W3, W191, ..., W198 continue to always produce blocks for every round thereafter, and if witnesses W1, W4, ..., W100 continue to always produce blocks for every round as well, then the block height of chain 1 will increase by 98 each round while the block height of chain 2 will increase by 100 each round. So after r rounds, chain 1 has n1 = n + 294 + 98*r  blocks and chain 2 has n2 = n + 94 + 100*r blocks. We can see that n1 equals n2 at r = 100, or after 100 more rounds (or 10,000 more blocks in chain 2 or just before time t + 10199Δt) chain 1 and chain 2 are of equal length. In the next round afterwards, chain 2 will become the longest chain. However, by this point the common fork point between these two chains (which remember is at block Bf and block number n+1) is way more than 1000 blocks old. In fact it is at least 10094 blocks old. So none of the live full nodes will switch over to chain 2, correct?
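The race arithmetic above is easy to double-check (lengths measured beyond the common prefix of n blocks):

```python
# Chain 1 keeps 98% participation (98 blocks per round) starting from
# length n + 294; chain 2 reaches 100% (100 blocks per round) starting
# from length n + 94.

def n1(r):
    """Length of chain 1 beyond n, after r further rounds."""
    return 294 + 98 * r

def n2(r):
    """Length of chain 2 beyond n, after r further rounds."""
    return 94 + 100 * r
```

The chains tie at r = 100 (both at n + 10094), chain 2 pulls ahead one round later, and by then the common fork point is over 10,000 blocks deep, far past the 1000-block reorganization limit.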

Now some other user, Alice, runs their BitShares full node client and connects to the network. The last block they had synced in their previous session of using the client just prior to shutting down was block Bf which has block number n+1 and timestamp t+Δt. Now it is time t+10201Δt (10200 seconds, or 2.83 hours, later with Δt = 1 second) and Alice's client needs to sync the last nearly 3 hours worth of blocks. From Alice's perspective, the longest chain is chain 2 not chain 1. Her client was not online to see what happened between time t+Δt and t+10201Δt. The client has no way of knowing that everyone has decided to stick with chain 1 because the fork point was too old to trigger a blockchain reorganization. So after getting the two candidate chains, does Alice's client conclude that it must get stuck at block Bf and warn the user? At what time in the syncing process does it figure this out? Given the two chains, 1 and 2, stored locally that are in the state as I described it as of time  t+10199Δt, is the decision Alice's client makes deterministic or does it depend on when the client receives the corresponding blocks as it is syncing? Does the client keep going back and forth between chain 1 and 2 (using its undo history to constantly rewind back to Bf) as each chain alternates in winning over the other in longest length like a close horse race as blocks from each chain come in through the peer to peer network?

My understanding of how this should work is that clients should be designed to get stuck unless they can sync to a head block whose valid timestamp is as close as possible (relative to the time reported by NTP) without exceeding it, AND where the common fork point of any competing valid chain that forked after the last block synced in the client's previous session is not too old to allow reorganization. I believe that is the only case where it makes sense to keep moving forward with the longest chain without an error; any other scenario requires the client to get stuck and warn the user (would you say this characterization of the problem is correct?).


Putting those questions aside for now, I think it is worth looking at a simpler scenario of just two chains, to try to determine what it takes, in terms of honest witnesses and average witness participation, for the "good" chain to win in length before the chain-reorganization limit is reached. So in this new scenario we suppose there are G good witnesses (Worig,1, ..., Worig,G) that start on the new chain as soon as they have enough votes to vote out the bad witnesses (Wbad,1, ..., Wbad,N-G), presumably because the latter are censoring transactions, and no one wastes time on any other minority chains. The newly elected witnesses (Wnew,1, ..., Wnew,N-G), along with the original good witnesses, then produce blocks each round with an average witness participation rate f (1 - G/N < f ≤ 1). In that case we might have the following chains early in the process (just before time t + 2*N*Δt):

Bad chain (length n + 2*(N-G)): Bbase ← P(t, ib,1, {_ x G, Wbad,1, ..., Wbad,N-G}) ← P(t + N*Δt, ib,2, {_ x G, Wbad,1, ..., Wbad,N-G}).

Good chain (length n + k + G + N): Bbase ← B(Wbad,j1, t) ← B(Wbad,j2, t+Δt) ← ... ← Bfork = B(Wbad,jk, t+(k-1)*Δt) ← P(t+k*Δt, ig,1, {_ x (N-G-k), Worig,1, ..., Worig,G}) ← P(t + N*Δt, ig,2, {Worig,1, ..., Worig,G, Wnew,1, ..., Wnew,N-G}),
where 0 ≤ k ≤ N - G.

At this point the bad chain is the longer chain as long as N - k - 3G > 0. If the good witnesses get lucky (k = N - G), then the good chain will already be the longer chain by this point in time. While we cannot tell what k will be (since it depends on the witness shuffling and thus on the random values), we can calculate the probability distribution p(k) assuming a uniformly random permutation. The distribution is p(k) = (G * (N-G)! * (N-k-1)! ) / ( (N-G-k)! * N! ) for 0 ≤ k ≤ N - G, and so the expected value of k is ∑ k*p(k) from k = 0 to (N-G), which according to WolframAlpha is (N-G)/(G+1). Substituting that expression for k into the inequality and manipulating the terms a little gives the inequality 3*G - N + 2 < 0, or N > T(G) where T(G) = 3*G + 2. When N > T(G), the bad chain will still be the longer chain at this point in time, but if N < T(G), the good chain will have already become the longer chain. With N = 100, T(G) first becomes greater than N at G = 33.
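The distribution and its mean can be checked numerically with exact rational arithmetic (a verification sketch, not part of any protocol):

```python
from fractions import Fraction
from math import factorial

def p(k, N, G):
    """p(k) = G * (N-G)! * (N-k-1)! / ((N-G-k)! * N!), the probability that
    exactly k bad witnesses are shuffled in before the first good one."""
    return Fraction(G * factorial(N - G) * factorial(N - k - 1),
                    factorial(N - G - k) * factorial(N))

N, G = 100, 30
total = sum(p(k, N, G) for k in range(N - G + 1))          # should be exactly 1
mean = sum(k * p(k, N, G) for k in range(N - G + 1))       # should be (N-G)/(G+1)
```

Using Fraction keeps everything exact, so both the normalization and the (N-G)/(G+1) mean can be confirmed with equality rather than floating-point tolerance.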

Despite estimating the expected value for k, I will proceed with the worst-case assumption that k = 0. Assuming the bad chain is still the longer chain, we can see what happens after several rounds. We should expect the bad chain to add (N-G) blocks each round, but we will only assume the good chain can add f*N blocks each round. So after r additional rounds (just before time t + (2*N + r)*Δt) the length of the bad chain is n + (2+r)*(N-G), and the length of the good chain is at least n + G + N + r*f*N. The difference between the length of the good chain and the length of the bad chain after r rounds is approximately D(r) = 3*G - N + r*(G - (1-f)*N). We can see that D(r) will be positive under the constraints (1-f)*N < G < N/2 and 0.5 < f < 1 when r > (g^2 - 2*f*g + 2*f - 1)/(f+g-1) where g = G/N. We need D(r) to become positive (so that the good chain becomes the longest chain) before the fork block, Bfork at block number n+k, becomes older than R = 1000 blocks. This means that as soon as (2+r)*(N-G) > R, which is at about r = R/(N-G) - 2, D(r) must be positive which thus requires that r = (R/N)/(1-g) - 2 > (g^2 - 2*f*g + 2*f - 1)/(f+g-1), or R > N*(1-g)*(2 + (g^2 - 2*f*g + 2*f - 1)/(f+g-1)). This can also be rewritten as f > (g^2 - 2*(1-g) + R' - 1) / (R'/(1-g) - 2*(2-g)), where R' = R/N, and assuming R' > 2*(1-g)*(2-g), which given the previous inequality holds true for f > 2/3. So with f > 2/3, which means g < 1/3 (for g > 1/3 the good chain wins without any extra rounds being necessary), and the constraint that 1-g < f < 1, we find that R' > (-g^3 + g^2 - g + 1)/g.

So with R = 1000 and N = 100 (thus R' = 10), the minimum value of g necessary to guarantee the good chain will become the longer chain prior to the chain reorganization limit being reached is approximately 0.0916. This means at least 10 good witnesses out of the total 100 witnesses are needed. And the minimum value of f needed as a function of g is given by this plot. With G = 10, the average witness participation needs to be at least 98.62%. With G = 15, we get a more sensible value of 90.8% as the minimum witness participation needed to ensure the good chain becomes the longest chain prior to the chain reorganization limit being reached.
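These closing numbers can be reproduced with a short script from the formulas above; the bisection bounds and tolerances here are my own choices:

```python
def rprime_bound(g):
    # The good chain wins only if R' > (-g^3 + g^2 - g + 1)/g
    return (-g**3 + g**2 - g + 1) / g

def min_f(g, rp):
    # Minimum participation rate: f > (g^2 - 2*(1-g) + R' - 1) / (R'/(1-g) - 2*(2-g))
    return (g**2 - 2*(1 - g) + rp - 1) / (rp / (1 - g) - 2*(2 - g))

R_PRIME = 10.0  # R = 1000, N = 100

# Bisect for the smallest g with rprime_bound(g) < R' (the bound is decreasing in g)
lo, hi = 0.01, 0.5
for _ in range(60):
    mid = (lo + hi) / 2
    if rprime_bound(mid) < R_PRIME:
        hi = mid
    else:
        lo = mid
g_min = hi

assert abs(g_min - 0.0916) < 1e-3                  # ~10 good witnesses out of 100
assert abs(min_f(0.10, R_PRIME) - 0.9862) < 1e-3   # G = 10 needs ~98.62% participation
assert abs(min_f(0.15, R_PRIME) - 0.908) < 1e-3    # G = 15 needs ~90.8% participation
```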
« Last Edit: July 15, 2015, 08:11:45 pm by arhag »

Offline BTSdac

  • Hero Member
  • *****
  • Posts: 1219
    • View Profile
  • BitShares: K1


Let's say Alice has the following blockchain stored locally: B1, B2, ..., B1500.
The real consensus blockchain at time T is: B1, B2, ..., B3168.
There is a fake blockchain floating around in the network which first appeared at time T and looks like this: B1, B2, ..., B2000, B2001', B2002', ..., B3200'.
The common fork point between this fake blockchain and the real consensus chain is B2000. After this point this blockchain diverges from the real blockchain. The witnesses (which I will call "fake witnesses" from this point forward) at the time of B2000 continue to be active in the fake blockchain whereas they were all voted out in the real chain and replaced by new witnesses (which I will call "real witnesses" from this point forward) at block B2100 (start of new maintenance interval). Let's assume all the significant votes that caused the "fake witnesses" to be replaced by the "real witnesses" appeared between blocks B2001 and B2099.
This fake blockchain is a valid chain signed by witnesses according to the rules of the protocol following that chain. At time T it is longer than the real blockchain by about 32 blocks because the "fake witnesses" were able to have 100% witness participation while the "real witnesses" were only able to maintain 99% witness participation over the last 1000 blocks. Therefore, by the longest chain rule, people should prefer the fake chain over the real chain, but because the fork point is more than 1000 blocks old at the time T when the fake blockchain appeared on the network, none of the nodes online at that time would be tricked onto that fake chain (am I correct in this understanding?).

I don't know if I understand you; the following is my thinking.
After the fake delegates fork the blockchain, the fake chain will be missing the blocks that should have been produced by the real delegates, and meanwhile the real chain will be missing the blocks that should have been produced by the fake witnesses, until the fake delegates are voted out.
So:
Assume there are 50 fake witnesses and 51 real witnesses, and ignore delegates being offline and delegate order for now.
time                               after C*101 blocks        after the fake delegates are voted out, N*101 more blocks
block height of fake chain         2000 + 50*C               2000 + 50*C + 50*N
block height of real chain         2000 + 51*C               2000 + 51*C + 101*N

We can see that if there are fewer fake delegates than real delegates, the real chain must end up longer than the fake chain, if we don't consider delegate order.
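That claim can be checked with a quick sketch, assuming 50 fake and 51 real witnesses, a fork at block 2000, and the fake delegates voted out on the real chain after C rounds:

```python
def heights(C, N):
    # C rounds of 101 slots before the fake delegates are voted out on the
    # real chain, then N more rounds; after the vote-out the real chain
    # fills all 101 slots per round while the fake chain still has only 50.
    fake = 2000 + 50 * C + 50 * N
    real = 2000 + 51 * C + 101 * N
    return fake, real

for C in range(1, 10):
    for N in range(0, 10):
        fake, real = heights(C, N)
        assert real > fake  # fewer fake delegates => the real chain is longer
```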
But if we do consider the delegate order:
Assume all 50 fake delegates occupy positions 1-50 in the first round after the fork, and all 51 real delegates occupy positions 51-101 in that round, with the fork at B2000. Then from B2001 to B2050 the fake chain is the longer chain.
But I think the probability of that is very small.
On the other hand, the situation would look as follows:
B0->    ->     ->    ->    ->    ->    ->    ->    ->    ->
B0->B1->B2->B3->B4->B5->B6->B7->B8->B9->
The first chain has no blocks following B0, so this is not actually a fork.
« Last Edit: July 15, 2015, 05:01:53 am by BTSdac »
github.com :pureland
BTS2.0 API :ws://139.196.37.179:8091
BTS2.0 API data source: ws://139.196.37.179:8091

Offline arhag

  • Hero Member
  • *****
  • Posts: 1214
    • View Profile
    • My posts on Steem
  • BitShares: arhag
  • GitHub: arhag
A new node coming on line can get "stuck" on a fork created by 100% collusion of former witnesses and requires manual intervention of specifying a checkpoint to get off that fork.

In the event that 100% of former witnesses collude then the blockchain will require 1 checkpoint every 1000 blocks until the attackers are voted out to prevent new nodes from accidentally getting stuck on a fork created by the attackers.

Thank you for your responses.

But I am still interested in better understanding the client logic and how it determines it is stuck. When does the client determine it is stuck? If it receives a full round of 101 (fake) blocks after the (yet to be known) fork point prior to receiving 90 real blocks (signed by a subset of the same witnesses who signed the 101 fake blocks) after the fork point, does it then cause the client to realize the fork and go back and get "stuck" at the fork point (and thus warn the user it is stuck)? I presume so because the fork point is less than 1000 blocks in the past.

Now if the attacker is able to control the victim's network connection for long enough (more than 1000 blocks) to prevent the real chain from appearing, the victim will continue on thinking everything is fine on the fake chain. When the attacker gives up control of the victim's network connection, the victim is then able to see all the real blocks flooding into the system. But by this point the fork point is more than 1000 blocks older than the present, so the client does not do a chain reorganization nor get stuck, correct? In fact, the client happily continues on in the fake chain thinking everything is fine until one day the victim realizes he isn't receiving transactions from a user on the real chain, or that his transactions to a user on the real chain are not being received, at which point he discovers that there is a problem and ends up requiring a trusted checkpoint (and probably realizes at that point that he lost money to a double spend attack).
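The behavior described above can be summarized in a small sketch; the function name and return values here are hypothetical, not actual client code:

```python
UNDO_HISTORY_LIMIT = 1000  # maximum reorganization depth discussed above

def on_rival_chain(head_num, fork_point_num, rival_length, own_length):
    """Hypothetical sketch of what a node does on seeing a competing chain."""
    if head_num - fork_point_num > UNDO_HISTORY_LIMIT:
        # Fork point is outside the undo history: the node cannot switch,
        # so it stays on its current chain without any warning (the
        # silent-victim scenario described above).
        return "ignore"
    if rival_length > own_length:
        # Fork point is recent enough: the longest-chain rule applies.
        return "reorganize"
    return "stay"

# Alice's case: fork at B2000, head already past B3000 => rival is ignored
assert on_rival_chain(3168, 2000, 3200, 3168) == "ignore"
# A fork only 500 blocks deep with a longer rival triggers a reorganization
assert on_rival_chain(2500, 2000, 2600, 2500) == "reorganize"
```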


In theory we could add a second metric for measuring the best blockchain via TAPOS which would be "total coindays approved".

It sounds like another interesting statistic that can be learned from the blockchain, but I'm not very interested in it as part of the protocol because it makes the protocol harder to reason about and I worry it can potentially add new attack vectors. In the case of a stuck blockchain, we could have each specific user's full client (this problem doesn't even apply to lightweight clients since those depend on trust in the wallet provider anyway) offer to reindex the chain from the last trusted checkpoint to calculate the total coindays approved since then, along with other valuable statistics, rather than burdening all full nodes with that task (as small as it may be) by making it part of the protocol. The user could then use this information to pick a particular chain themselves if they wanted, although it would still be better for them to just rely on their social network for a trusted checkpoint in these rare cases.

Offline bytemaster

In theory we could add a second metric for measuring the best blockchain via TAPOS which would be "total coindays approved".

Every time an account signs a transaction with a TAPOS header then we can perform the following calculation:

CHAIN_COINDAYS_APPROVED  +=   ACCOUNT_BALANCE * (CURRENT_TAPOS_UPDATE - LAST_TAPOS_UPDATE)

Under this model the colluding witnesses would require both the longest chain *AND* the highest CHAIN_COINDAYS_APPROVED.   

This metric would be relatively cheap to compute, but is still an extra computation on EVERY transaction.
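The dual-metric rule can be sketched as follows; the type and function names are hypothetical, chosen only to illustrate the update formula and the "longest AND most coindays" preference:

```python
from dataclasses import dataclass

@dataclass
class ChainStats:
    length: int
    coindays_approved: int

def approve(stats, account_balance, current_tapos_update, last_tapos_update):
    # CHAIN_COINDAYS_APPROVED += ACCOUNT_BALANCE * (CURRENT_TAPOS_UPDATE - LAST_TAPOS_UPDATE)
    stats.coindays_approved += account_balance * (current_tapos_update - last_tapos_update)

def strictly_better(a, b):
    # Colluding witnesses must win on BOTH metrics for their chain to be preferred
    return a.length > b.length and a.coindays_approved > b.coindays_approved

real = ChainStats(length=3168, coindays_approved=0)
fake = ChainStats(length=3200, coindays_approved=0)
# A stakeholder approves the real chain via a TAPOS header
approve(real, account_balance=1_000, current_tapos_update=3100, last_tapos_update=2100)
assert not strictly_better(fake, real)  # longer alone is no longer enough
```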
For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline bytemaster

A new node coming on line can get "stuck" on a fork created by 100% collusion of former witnesses and requires manual intervention of specifying a checkpoint to get off that fork.

In the event that 100% of former witnesses collude then the blockchain will require 1 checkpoint every 1000 blocks until the attackers are voted out to prevent new nodes from accidentally getting stuck on a fork created by the attackers.

For the latest updates checkout my blog: http://bytemaster.bitshares.org
Anything said on these forums does not constitute an intent to create a legal obligation or contract between myself and anyone else.   These are merely my opinions and I reserve the right to change them at any time.

Offline sittingduck

  • Sr. Member
  • ****
  • Posts: 246
    • View Profile
If there is an attack chain in the wild then a checkpoint will have to be added.   You seem to grasp the issue very well.




Sent from my iPhone using Tapatalk

Offline arhag

  • Hero Member
  • *****
  • Posts: 1214
    • View Profile
    • My posts on Steem
  • BitShares: arhag
  • GitHub: arhag
If they are preventing people from voting then the minority needs to stop producing and users need to stop transacting on the majority chain.  Everyone would pick a minority fork and vote out the bad witnesses.  At this point the minority fork would have 100% block production and the old majority fork will still be stumbling along at 50 to 90% corrupt-witness participation.   Eventually the minority fork will overtake it and become the longest chain.

Everyone "picking" the minority fork is not an automatic process done by code. It needs to be selected by each user, presumably through a set of blacklisted witnesses that everyone else has agreed to use (to make sure everyone gets on the same fork). If that is the case, why does it even matter whether the "minority fork" ever becomes longer than the original "majority chain"? It just needs to vote out the bad witnesses so that it can get back to normal high witness participation.


Technically it is possible for the majority of stakeholders to collude to fracture the network and that is what most of your "attacks" are assuming: that the same users will vote differently on different forks and that they will not work to reach consensus.    I think this is not a realistic case.

I'm not trying to argue what the probability of any particular event occurring is. I am not trying to say there aren't ways of resolving issues that arise in the rare events that they do. All I am trying to understand at this stage is all the possible ways attacks can force the system into a state that requires manual action (such as relying on social networks to gain consensus) because the protocol cannot automatically resolve it.

For example, you still haven't answered what happens with Alice in the situation described in my earlier post. My assumption from what you have said so far is that her client will pick the fake chain because it is longer, without giving her any warning that something weird is going on with the blockchain. Is that correct, or would the client recognize the two possible blockchains forking at block B2000? It is confusing because if the fake chain was released into the wild after Alice had already synced, I presume she would stay on the real chain because of the 1000 block limit of the undo history. So it seems that which chain her client ends up on depends on the timing of when her client syncs relative to when blocks are received from the network (is that correct?). Again, I'm not saying this is necessarily an attack; I just want to understand what happens, what the client is aware of, and what the client warns the user about.

Let me rephrase the problem. Let's say I was an active witness 3 months ago along with 100 of my friends. By now, we have all been voted out and replaced by 101 new witnesses that are doing a good job but on average have a 99% witness participation rate. My friends and I decide to all get together to produce a fake blockchain forking off from the point where we were all active witnesses 3 months ago to the present, and we distribute this fake blockchain on the peer-to-peer network. What does this do to the full nodes currently online and active on the network (I would guess nothing)? Now suppose someone with a full node that hasn't synced with the network for 4 months decides to finally resync. What happens to that full node?
« Last Edit: July 14, 2015, 02:01:54 am by arhag »