Safe Termination Detection in an Asynchronous Distributed System When Processes May Crash and Recover

Mittal, Neeraj; Phaneesh, Kuppahalli L.; Freiling, Felix C.

doi:10.1007/11945529_10

Neeraj Mittal¹⁷,
Kuppahalli L. Phaneesh¹⁷ &
Felix C. Freiling¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4305))

Included in the following conference series:

International Conference On Principles Of Distributed Systems

635 Accesses
5 Citations

Abstract

The termination detection problem involves detecting whether an ongoing distributed computation has ceased all its activities. We investigate the termination detection problem in an asynchronous distributed system under crash-recovery model. It has been shown that the problem is impossible to solve under crash-recovery model in general. We identify two conditions under which the termination detection problem can be solved in a safe manner. We also propose algorithms to detect termination under the conditions identified.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Asynchronous Crash Fault Tolerance

Resilient Optimistic Termination Detection for the Async-Finish Model

Improving Simplicity by Discovering Nested Groups in Declarative Models

References

Tel, G.: Distributed Control for AI. Technical Report UU-CS-1998-17, Information and Computing Sciences, Utrecht University, The Netherlands (1998)
Google Scholar
Dijkstra, E.W., Scholten, C.S.: Termination Detection for Diffusing Computations. Information Processing Letters (IPL) 11(1), 1–4 (1980)
Article MATH MathSciNet Google Scholar
Francez, N.: Distributed Termination. ACM Transactions on Programming Languages and Systems (TOPLAS) 2(1), 42–55 (1980)
Article MATH Google Scholar
Stupp, G.: Stateless Termination Detection. In: Proceedings of the 16th Symposium on Distributed Computing (DISC), Toulouse, France, pp. 163–172 (2002)
Google Scholar
Khokhar, A.A., Hambrusch, S.E., Kocalar, E.: Termination Detection in Data-Driven Parallel Computations/Applications. Journal of Parallel and Distributed Computing (JPDC) 63(3), 312–326 (2003)
Article MATH Google Scholar
Mittal, N., Venkatesan, S., Peri, S.: Message-Optimal and Latency-Optimal Termination Detection Algorithms for Arbitrary Topologies. In: Proceedings of the 18th Symposium on Distributed Computing (DISC), Amsterdam, The Netherlands, pp. 290–304 (2004)
Google Scholar
Matocha, J., Camp, T.: A Taxonomy of Distributed Termination Detection Algorithms. The Journal of Systems and Software 43(3), 207–221 (1998)
Article Google Scholar
Venkatesan, S.: Reliable Protocols for Distributed Termination Detection. IEEE Transactions on Reliability 38(1), 103–110 (1989)
Article Google Scholar
Lai, T.H., Wu, L.F.: An (N − 1)-Resilient Algorithm for Distributed Termination Detection. IEEE Transactions on Parallel and Distributed Systems (TPDS) 6(1), 63–78 (1995)
Article Google Scholar
Tseng, Y.C.: Detecting Termination by Weight-Throwing in a Faulty Distributed System. Journal of Parallel and Distributed Computing (JPDC) 25(1), 7–15 (1995)
Article Google Scholar
Hélary, J.M., Murfin, M., Mostefaoui, A., Raynal, M., Tronel, F.: Computing Global Functions in Asynchronous Distributed Systems with Perfect Failure Detectors. IEEE Transactions on Parallel and Distributed Systems (TPDS) 11(9), 897–909 (2000)
Article Google Scholar
Mittal, N., Freiling, F.C., Venkatesan, S., Penso, L.D.: Efficient Reduction for Wait-Free Termination Detection in a Crash-Prone Distributed System. In: Proceedings of the 19th Symposium on Distributed Computing (DISC), pp. 93–107 (2005)
Google Scholar
Wu, L.F., Lai, T.H., Tseng, Y.C.: Consensus and Termination Detection in the Presence of Faulty Processes. In: Proceedings of the International Conference on Parallel and Distributed Systems (ICPADS), Hsinchu, Taiwan, pp. 267–274 (1992)
Google Scholar
Majuntke, M.: Termination Detection in Systems Where Processes May Crash and Recover. Master’s thesis, RWTH Aachen University (2006)
Google Scholar
Mittal, N., Phaneesh, K.L., Freiling, F.C.: Safe Termination Detection in an Asynchronous Distributed System when Processes may Crash and Recover. Technical Report UTDCS-41-06, Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083, USA (2006)
Google Scholar
Lamport, L.: Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM (CACM) 21(7), 558–565 (1978)
Article MATH Google Scholar
Aguilera, M.K., Chen, W., Toueg, S.: Failure Detection and Consensus in the Crash Recovery Model. Distributed Computing (DC) 13(2), 99–125 (2000)
Article Google Scholar
Basu, A., Charron-Bost, B., Toueg, S.: Simulating Reliable Links with Unreliable Links in the Presence of Process Crashes. In: Babaoğlu, Ö., Marzullo, K. (eds.) WDAG 1996. LNCS, vol. 1151, pp. 105–122. Springer, Heidelberg (1996)
Google Scholar
Chandra, T.D., Toueg, S.: Unreliable Failure Detectors for Reliable Distributed Systems. Journal of the ACM 43(2), 225–267 (1996)
Article MATH MathSciNet Google Scholar
Larrea, M., Fernández, A., Arévalo, S.: On the Implementation of Unreliable Failure Detectors in Partially Synchronous Systems. IEEE Transactions on Computers 53(7), 815–828 (2004)
Article Google Scholar
Delporte-Gallet, C., Fauconnier, H., Guerraoui, R.: A Realistic Look At Failure Detectors. In: Proceedings of the International Conference on Dependable Systems and Networks (DSN), Washington, DC, USA, pp. 345–353 (2002)
Google Scholar
Mattern, F.: Virtual Time and Global States of Distributed Systems. In: Bermond, J.-C., Raynal, M. (eds.) WDAG 1989. LNCS, vol. 392, pp. 215–226. Springer, Heidelberg (1989)
Google Scholar
Fidge, C.J.: Logical Time in Distributed Computing Systems. IEEE Computer 24(8), 28–33 (1991)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, The University of Texas at Dallas, Richardson, TX, 75083, USA
Neeraj Mittal & Kuppahalli L. Phaneesh
Department of Computer Science, University of Mannheim, D-68131, Mannheim, Germany
Felix C. Freiling

Authors

Neeraj Mittal
View author publications
Search author on:PubMed Google Scholar
Kuppahalli L. Phaneesh
View author publications
Search author on:PubMed Google Scholar
Felix C. Freiling
View author publications
Search author on:PubMed Google Scholar

Editor information

Editors and Affiliations

Computer Science and Artificial Intelligence Laboratory, MIT, USA
Mariam Momenzadeh Alexander A. Shvartsman

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mittal, N., Phaneesh, K.L., Freiling, F.C. (2006). Safe Termination Detection in an Asynchronous Distributed System When Processes May Crash and Recover. In: Shvartsman, M.M.A.A. (eds) Principles of Distributed Systems. OPODIS 2006. Lecture Notes in Computer Science, vol 4305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11945529_10

Download citation

DOI: https://doi.org/10.1007/11945529_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49990-9
Online ISBN: 978-3-540-49991-6
eBook Packages: Computer ScienceComputer Science (R0)Springer Nature Proceedings Computer Science

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Publish with us

Policies and ethics