Abstract
The termination detection problem involves detecting whether an ongoing distributed computation has ceased all its activities. We investigate the termination detection problem in an asynchronous distributed system under crash-recovery model. It has been shown that the problem is impossible to solve under crash-recovery model in general. We identify two conditions under which the termination detection problem can be solved in a safe manner. We also propose algorithms to detect termination under the conditions identified.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Tel, G.: Distributed Control for AI. Technical Report UU-CS-1998-17, Information and Computing Sciences, Utrecht University, The Netherlands (1998)
Dijkstra, E.W., Scholten, C.S.: Termination Detection for Diffusing Computations. Information Processing Letters (IPL) 11(1), 1–4 (1980)
Francez, N.: Distributed Termination. ACM Transactions on Programming Languages and Systems (TOPLAS) 2(1), 42–55 (1980)
Stupp, G.: Stateless Termination Detection. In: Proceedings of the 16th Symposium on Distributed Computing (DISC), Toulouse, France, pp. 163–172 (2002)
Khokhar, A.A., Hambrusch, S.E., Kocalar, E.: Termination Detection in Data-Driven Parallel Computations/Applications. Journal of Parallel and Distributed Computing (JPDC) 63(3), 312–326 (2003)
Mittal, N., Venkatesan, S., Peri, S.: Message-Optimal and Latency-Optimal Termination Detection Algorithms for Arbitrary Topologies. In: Proceedings of the 18th Symposium on Distributed Computing (DISC), Amsterdam, The Netherlands, pp. 290–304 (2004)
Matocha, J., Camp, T.: A Taxonomy of Distributed Termination Detection Algorithms. The Journal of Systems and Software 43(3), 207–221 (1998)
Venkatesan, S.: Reliable Protocols for Distributed Termination Detection. IEEE Transactions on Reliability 38(1), 103–110 (1989)
Lai, T.H., Wu, L.F.: An (N − 1)-Resilient Algorithm for Distributed Termination Detection. IEEE Transactions on Parallel and Distributed Systems (TPDS) 6(1), 63–78 (1995)
Tseng, Y.C.: Detecting Termination by Weight-Throwing in a Faulty Distributed System. Journal of Parallel and Distributed Computing (JPDC) 25(1), 7–15 (1995)
Hélary, J.M., Murfin, M., Mostefaoui, A., Raynal, M., Tronel, F.: Computing Global Functions in Asynchronous Distributed Systems with Perfect Failure Detectors. IEEE Transactions on Parallel and Distributed Systems (TPDS) 11(9), 897–909 (2000)
Mittal, N., Freiling, F.C., Venkatesan, S., Penso, L.D.: Efficient Reduction for Wait-Free Termination Detection in a Crash-Prone Distributed System. In: Proceedings of the 19th Symposium on Distributed Computing (DISC), pp. 93–107 (2005)
Wu, L.F., Lai, T.H., Tseng, Y.C.: Consensus and Termination Detection in the Presence of Faulty Processes. In: Proceedings of the International Conference on Parallel and Distributed Systems (ICPADS), Hsinchu, Taiwan, pp. 267–274 (1992)
Majuntke, M.: Termination Detection in Systems Where Processes May Crash and Recover. Master’s thesis, RWTH Aachen University (2006)
Mittal, N., Phaneesh, K.L., Freiling, F.C.: Safe Termination Detection in an Asynchronous Distributed System when Processes may Crash and Recover. Technical Report UTDCS-41-06, Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083, USA (2006)
Lamport, L.: Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM (CACM) 21(7), 558–565 (1978)
Aguilera, M.K., Chen, W., Toueg, S.: Failure Detection and Consensus in the Crash Recovery Model. Distributed Computing (DC) 13(2), 99–125 (2000)
Basu, A., Charron-Bost, B., Toueg, S.: Simulating Reliable Links with Unreliable Links in the Presence of Process Crashes. In: Babaoğlu, Ö., Marzullo, K. (eds.) WDAG 1996. LNCS, vol. 1151, pp. 105–122. Springer, Heidelberg (1996)
Chandra, T.D., Toueg, S.: Unreliable Failure Detectors for Reliable Distributed Systems. Journal of the ACM 43(2), 225–267 (1996)
Larrea, M., Fernández, A., Arévalo, S.: On the Implementation of Unreliable Failure Detectors in Partially Synchronous Systems. IEEE Transactions on Computers 53(7), 815–828 (2004)
Delporte-Gallet, C., Fauconnier, H., Guerraoui, R.: A Realistic Look At Failure Detectors. In: Proceedings of the International Conference on Dependable Systems and Networks (DSN), Washington, DC, USA, pp. 345–353 (2002)
Mattern, F.: Virtual Time and Global States of Distributed Systems. In: Bermond, J.-C., Raynal, M. (eds.) WDAG 1989. LNCS, vol. 392, pp. 215–226. Springer, Heidelberg (1989)
Fidge, C.J.: Logical Time in Distributed Computing Systems. IEEE Computer 24(8), 28–33 (1991)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mittal, N., Phaneesh, K.L., Freiling, F.C. (2006). Safe Termination Detection in an Asynchronous Distributed System When Processes May Crash and Recover. In: Shvartsman, M.M.A.A. (eds) Principles of Distributed Systems. OPODIS 2006. Lecture Notes in Computer Science, vol 4305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11945529_10
Download citation
DOI: https://doi.org/10.1007/11945529_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49990-9
Online ISBN: 978-3-540-49991-6
eBook Packages: Computer ScienceComputer Science (R0)Springer Nature Proceedings Computer Science
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.