Fault­Tolerant Systems

1. With respect to Byzantine Agreement (BA) answer the following questions: (a) Why does simple majority not solve the agreement problem? (b) In general, what is the total number of messages sent using the OM(m) algorithm. “In general” means of course that your answer should be a function of only N and m, and not an “example”. However, to test your general formula, compute the number of messages for m = 2, 3 and 4. i. For the simplex solution ii. For the multiplex solution (c) The OM(m) algorithm does not work for partially connected algorithms. What is the general problem with achieving BA in partially connected networks? (d) In Lamport’s paper the SM(m) algorithms is discussed. State the assumptions and indicate if/how one can meet these assumptions.

2. We mentioned in class that the algorithm proposed by Thambidurai and Park has been shown to have a flaw. Describe the what the flaw is, which is obviously not referenced in their own paper.

3. We have indicated how the Byzantine General problem started, i.e. there was a problem in SIFT that allowed a malicious processor to prevent the non-faulty processors from converging their clocks. Describe how this could happen, i.e. give the example that allows the clocks of the good processor to diverge.

4. With respect clock synchronization: (a) What is the relationship between F IXi p (T), AI and adji p ? (b) Why would it be important to deal with an adjustment interval. Give an example of an application which would cause problems if one would not use adjustment intervals. (c) Given the convergence function CF(p, x1, x2, …, xn) = CF(1, 2, 10, 6, 3, 5, 5, 2, 11, 6, 10, 2, 6), δ = k = 3, what are the results for the 4 convergence functions of page 11 of the Schneider paper Understanding Protocols for Byzantine Clock Synchronization? Note that the first number in CF is the processor, and NOT a value. i. Egocentric: ii. Fast convergence: iii. FT Midpoint: iv. FT Average:

5. Consider the Probabilistic Approach to Distributed Clock Synchronization presented by F. Cristian. (a) When giving the final expression for QP Q, (which is “P’s reading of Q’s clock”), why did he select the midpoint in the clock interval, e.g. rather than the right or left value of the interval? (b) In the paper it indicates that “a process can read the clock of another process with a given precision with a probability as close to 1 as desired”. Explain how this is possible. Hint: Look no further than assumptions made about roundtrip delays. (c) Let CQ(t) ∈ [T +min, T +2D −min] describe the interval for clock values under the assumption that there is no clock drift at all, based on the interval from the paper by F. Cristian. Given this new inerval CQ(t), derive an expression for the clock value that should be selected to minimize possible errors.