Skip to main content

Safe Termination Detection in an Asynchronous Distributed System When Processes May Crash and Recover

  • Conference paper
Principles of Distributed Systems (OPODIS 2006)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4305))

Included in the following conference series:

  • 635 Accesses

  • 5 Citations

Abstract

The termination detection problem involves detecting whether an ongoing distributed computation has ceased all its activities. We investigate the termination detection problem in an asynchronous distributed system under crash-recovery model. It has been shown that the problem is impossible to solve under crash-recovery model in general. We identify two conditions under which the termination detection problem can be solved in a safe manner. We also propose algorithms to detect termination under the conditions identified.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Tel, G.: Distributed Control for AI. Technical Report UU-CS-1998-17, Information and Computing Sciences, Utrecht University, The Netherlands (1998)

    Google Scholar 

  2. Dijkstra, E.W., Scholten, C.S.: Termination Detection for Diffusing Computations. Information Processing Letters (IPL) 11(1), 1–4 (1980)

    Article  MATH  MathSciNet  Google Scholar 

  3. Francez, N.: Distributed Termination. ACM Transactions on Programming Languages and Systems (TOPLAS) 2(1), 42–55 (1980)

    Article  MATH  Google Scholar 

  4. Stupp, G.: Stateless Termination Detection. In: Proceedings of the 16th Symposium on Distributed Computing (DISC), Toulouse, France, pp. 163–172 (2002)

    Google Scholar 

  5. Khokhar, A.A., Hambrusch, S.E., Kocalar, E.: Termination Detection in Data-Driven Parallel Computations/Applications. Journal of Parallel and Distributed Computing (JPDC) 63(3), 312–326 (2003)

    Article  MATH  Google Scholar 

  6. Mittal, N., Venkatesan, S., Peri, S.: Message-Optimal and Latency-Optimal Termination Detection Algorithms for Arbitrary Topologies. In: Proceedings of the 18th Symposium on Distributed Computing (DISC), Amsterdam, The Netherlands, pp. 290–304 (2004)

    Google Scholar 

  7. Matocha, J., Camp, T.: A Taxonomy of Distributed Termination Detection Algorithms. The Journal of Systems and Software 43(3), 207–221 (1998)

    Article  Google Scholar 

  8. Venkatesan, S.: Reliable Protocols for Distributed Termination Detection. IEEE Transactions on Reliability 38(1), 103–110 (1989)

    Article  Google Scholar 

  9. Lai, T.H., Wu, L.F.: An (N − 1)-Resilient Algorithm for Distributed Termination Detection. IEEE Transactions on Parallel and Distributed Systems (TPDS) 6(1), 63–78 (1995)

    Article  Google Scholar 

  10. Tseng, Y.C.: Detecting Termination by Weight-Throwing in a Faulty Distributed System. Journal of Parallel and Distributed Computing (JPDC) 25(1), 7–15 (1995)

    Article  Google Scholar 

  11. Hélary, J.M., Murfin, M., Mostefaoui, A., Raynal, M., Tronel, F.: Computing Global Functions in Asynchronous Distributed Systems with Perfect Failure Detectors. IEEE Transactions on Parallel and Distributed Systems (TPDS) 11(9), 897–909 (2000)

    Article  Google Scholar 

  12. Mittal, N., Freiling, F.C., Venkatesan, S., Penso, L.D.: Efficient Reduction for Wait-Free Termination Detection in a Crash-Prone Distributed System. In: Proceedings of the 19th Symposium on Distributed Computing (DISC), pp. 93–107 (2005)

    Google Scholar 

  13. Wu, L.F., Lai, T.H., Tseng, Y.C.: Consensus and Termination Detection in the Presence of Faulty Processes. In: Proceedings of the International Conference on Parallel and Distributed Systems (ICPADS), Hsinchu, Taiwan, pp. 267–274 (1992)

    Google Scholar 

  14. Majuntke, M.: Termination Detection in Systems Where Processes May Crash and Recover. Master’s thesis, RWTH Aachen University (2006)

    Google Scholar 

  15. Mittal, N., Phaneesh, K.L., Freiling, F.C.: Safe Termination Detection in an Asynchronous Distributed System when Processes may Crash and Recover. Technical Report UTDCS-41-06, Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083, USA (2006)

    Google Scholar 

  16. Lamport, L.: Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM (CACM) 21(7), 558–565 (1978)

    Article  MATH  Google Scholar 

  17. Aguilera, M.K., Chen, W., Toueg, S.: Failure Detection and Consensus in the Crash Recovery Model. Distributed Computing (DC) 13(2), 99–125 (2000)

    Article  Google Scholar 

  18. Basu, A., Charron-Bost, B., Toueg, S.: Simulating Reliable Links with Unreliable Links in the Presence of Process Crashes. In: Babaoğlu, Ö., Marzullo, K. (eds.) WDAG 1996. LNCS, vol. 1151, pp. 105–122. Springer, Heidelberg (1996)

    Google Scholar 

  19. Chandra, T.D., Toueg, S.: Unreliable Failure Detectors for Reliable Distributed Systems. Journal of the ACM 43(2), 225–267 (1996)

    Article  MATH  MathSciNet  Google Scholar 

  20. Larrea, M., Fernández, A., Arévalo, S.: On the Implementation of Unreliable Failure Detectors in Partially Synchronous Systems. IEEE Transactions on Computers 53(7), 815–828 (2004)

    Article  Google Scholar 

  21. Delporte-Gallet, C., Fauconnier, H., Guerraoui, R.: A Realistic Look At Failure Detectors. In: Proceedings of the International Conference on Dependable Systems and Networks (DSN), Washington, DC, USA, pp. 345–353 (2002)

    Google Scholar 

  22. Mattern, F.: Virtual Time and Global States of Distributed Systems. In: Bermond, J.-C., Raynal, M. (eds.) WDAG 1989. LNCS, vol. 392, pp. 215–226. Springer, Heidelberg (1989)

    Google Scholar 

  23. Fidge, C.J.: Logical Time in Distributed Computing Systems. IEEE Computer 24(8), 28–33 (1991)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mittal, N., Phaneesh, K.L., Freiling, F.C. (2006). Safe Termination Detection in an Asynchronous Distributed System When Processes May Crash and Recover. In: Shvartsman, M.M.A.A. (eds) Principles of Distributed Systems. OPODIS 2006. Lecture Notes in Computer Science, vol 4305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11945529_10

Download citation

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Publish with us

Policies and ethics