Data dependence path reduction with tunneling load instructions

Sato, Toshinori

doi:10.1007/BFb0024210

Toshinori Sato¹

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1336)

Included in the following conference series:

International Symposium on High Performance Computing


Abstract

A technique for reducing the length of the data dependence path is presented. The technique, named tunneling-load, uses a register specifier buffer to hide load latency and thereby shortens the data dependence path. True data dependences cannot be removed by techniques such as register renaming, and are an unavoidable obstacle that limits instruction-level parallelism. A data dependence path that includes load instructions is longer than one that does not, because the latency of a load instruction is longer than that of other instructions. To reduce the length of dependence paths that include load instructions, we propose the tunneling-load technique. We have evaluated the effects of tunneling-load and found that, on an in-order-issue superscalar platform, instruction-level parallelism increases by over 10%.
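The argument about load latency and path length can be illustrated with a small sketch. It computes the longest chain of true (read-after-write) dependences for a short instruction sequence under two latency settings. The instruction sequence, the register names, and the latency values are all assumptions chosen for illustration. The sketch does not model the register specifier buffer or any other detail of the tunneling-load mechanism; it only shows how hiding one cycle of load latency shortens the path.

```python
# Illustrative sketch only: the latencies and the instruction sequence are
# assumed, and the paper's register specifier buffer is not modeled here.

def dependence_path_length(instrs, latency):
    """Return the longest true-dependence (RAW) path length in cycles.

    instrs:  list of (opcode, dest_register, [source_registers]) in program order.
    latency: dict mapping opcode -> execution latency in cycles.
    """
    ready = {}    # register -> cycle at which its most recent value is available
    longest = 0
    for op, dest, srcs in instrs:
        # An instruction can start once all of its source values are ready.
        start = max((ready.get(r, 0) for r in srcs), default=0)
        finish = start + latency[op]
        ready[dest] = finish
        longest = max(longest, finish)
    return longest

# Hypothetical sequence: a load feeding a chain of dependent ALU operations.
program = [
    ("add",  "r1", ["r2", "r3"]),   # address computation
    ("load", "r4", ["r1"]),         # load depends on the computed address
    ("add",  "r5", ["r4", "r6"]),   # first consumer of the loaded value
    ("add",  "r7", ["r5", "r5"]),   # second consumer in the chain
]

baseline = {"add": 1, "load": 2}    # assumed: a load takes one cycle more than an ALU op
hidden = {"add": 1, "load": 1}      # assumed: the extra load cycle is hidden

base_len = dependence_path_length(program, baseline)
hidden_len = dependence_path_length(program, hidden)
print(base_len, hidden_len)
```

Under these assumed latencies the path through the load is one cycle shorter when the extra load cycle is hidden. The actual gain on a real machine depends on the workload and the microarchitecture, which this sketch does not attempt to capture.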



Author information

Authors and Affiliations

Microelectronics Engineering Lab., Toshiba Corp., 210, Kawasaki, Japan
Toshinori Sato

Editor information

Constantine Polychronopoulos, Kazuki Joe, Keijiro Araki, Makoto Amamiya

About this paper

Cite this paper

Sato, T. (1997). Data dependence path reduction with tunneling load instructions. In: Polychronopoulos, C., Joe, K., Araki, K., Amamiya, M. (eds) High Performance Computing. ISHPC 1997. Lecture Notes in Computer Science, vol 1336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024210

DOI: https://doi.org/10.1007/BFb0024210
Published: 09 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63766-0
Online ISBN: 978-3-540-69644-5
eBook Packages: Springer Book Archive