Abstract
A technique for reducing the length of the data dependence path is presented. The technique, named tunneling-load, uses the register specifier buffer to hide load latency and thereby shortens the data dependence path. True data dependences cannot be removed by techniques such as register renaming, and they are an unavoidable obstacle limiting instruction-level parallelism. A data dependence path that includes load instructions is longer than one that does not, because the latency of a load instruction is longer than that of other instructions. To shorten dependence paths that include load instructions, we propose the tunneling-load technique. We have evaluated the effect of tunneling-load and found that, on an in-order-issue superscalar platform, instruction-level parallelism increases by over 10%.
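The relationship between load latency and dependence path length can be illustrated with a small sketch. It is not the tunneling-load mechanism itself, which the abstract does not detail. It only computes the longest true-dependence chain for a hypothetical four-instruction sequence under two assumed latency settings. The function name, the instruction tuples, and the latencies (1 cycle for ALU operations, 2 cycles for a load, 1 cycle when the load latency is hidden) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Each instruction: (name, opcode, names of the producers it truly depends on).
Program = List[Tuple[str, str, List[str]]]


def dependence_path_length(program: Program, latency: Dict[str, int]) -> int:
    """Return the length in cycles of the longest true-dependence chain.

    Assumes instructions are listed in program order, so every producer
    appears before its consumers.
    """
    finish: Dict[str, int] = {}
    for name, opcode, producers in program:
        # An instruction can start once its latest producer has finished.
        start = max((finish[p] for p in producers), default=0)
        finish[name] = start + latency[opcode]
    return max(finish.values(), default=0)


# Hypothetical chain: address computation -> load -> two dependent ALU ops.
program: Program = [
    ("i1", "alu", []),
    ("i2", "load", ["i1"]),
    ("i3", "alu", ["i2"]),
    ("i4", "alu", ["i3"]),
]

# Assumed latencies, not figures from the paper.
baseline = dependence_path_length(program, {"alu": 1, "load": 2})
hidden = dependence_path_length(program, {"alu": 1, "load": 1})

print(baseline, hidden)
```

Under these assumed latencies the chain shrinks from 5 cycles to 4 when the extra load cycle is hidden. With more loads on the path, the saving grows with the number of loads.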
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sato, T. (1997). Data dependence path reduction with tunneling load instructions. In: Polychronopoulos, C., Joe, K., Araki, K., Amamiya, M. (eds) High Performance Computing. ISHPC 1997. Lecture Notes in Computer Science, vol 1336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024210
DOI: https://doi.org/10.1007/BFb0024210
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63766-0
Online ISBN: 978-3-540-69644-5
eBook Packages: Springer Book Archive

