Data dependence path reduction with tunneling load instructions

High Performance Computing (ISHPC 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1336)

Abstract

A technique for reducing the length of the data dependence path is presented. The technique, named tunneling-load, uses the register specifier buffer to hide load latency and thereby shortens the data dependence path. True data dependences cannot be removed by techniques such as register renaming, so they are an unavoidable obstacle that limits instruction-level parallelism. A data dependence path that includes load instructions is longer than one that does not, because a load instruction has a longer latency than other instructions. To shorten dependence paths that include load instructions, we propose the tunneling-load technique. We have evaluated the effects of tunneling-load and found that, on an in-order-issue superscalar platform, instruction-level parallelism increases by over 10%.
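
The abstract's argument about path length can be illustrated with a small sketch. It is not the paper's mechanism and does not model the register specifier buffer; it only computes the longest true-dependence path through a hypothetical five-instruction sequence, first with a load latency of 2 cycles and then with the load latency assumed to be hidden down to 1 cycle. The instruction names, the dependence edges, and both latency tables are assumptions chosen for illustration.

```python
from functools import lru_cache

# Hypothetical instruction sequence (not taken from the paper):
# name -> (instruction kind, producers it truly depends on).
PROGRAM = {
    "i1": ("alu", []),
    "i2": ("load", ["i1"]),        # load address is produced by i1
    "i3": ("alu", ["i2"]),         # consumes the loaded value
    "i4": ("load", ["i3"]),
    "i5": ("alu", ["i4", "i1"]),
}


def path_length(program, latency):
    """Return the longest true-dependence path through `program`, in cycles."""

    @lru_cache(maxsize=None)
    def finish(name):
        kind, deps = program[name]
        # An instruction starts once all of its producers have finished.
        start = max((finish(dep) for dep in deps), default=0)
        return start + latency[kind]

    return max(finish(name) for name in program)


# Assumed latencies: loads take 2 cycles, other instructions take 1.
baseline = path_length(PROGRAM, {"alu": 1, "load": 2})
# Assumed effect of hiding the extra load cycle, by whatever means.
hidden = path_length(PROGRAM, {"alu": 1, "load": 1})

print(baseline, hidden)  # prints "7 5"
```

In this toy sequence each load on the critical path adds one extra cycle, so hiding that cycle shortens the path from 7 to 5. The figures say nothing about the improvement reported in the paper, which depends on the evaluated workloads and machine model.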

Editor information

Constantine Polychronopoulos, Kazuki Joe, Keijiro Araki, Makoto Amamiya

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sato, T. (1997). Data dependence path reduction with tunneling load instructions. In: Polychronopoulos, C., Joe, K., Araki, K., Amamiya, M. (eds) High Performance Computing. ISHPC 1997. Lecture Notes in Computer Science, vol 1336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024210

  • DOI: https://doi.org/10.1007/BFb0024210

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63766-0

  • Online ISBN: 978-3-540-69644-5

  • eBook Packages: Springer Book Archive
