{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T19:33:59Z","timestamp":1774121639914,"version":"3.50.1"},"reference-count":76,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T00:00:00Z","timestamp":1675123200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"MUR projects \u201cDSURF\u201d","award":["PRIN 2015B8TRFM"],"award-info":[{"award-number":["PRIN 2015B8TRFM"]}]},{"name":"\u201cT-LADIES\u201d","award":["PRIN 2020TL3X8X"],"award-info":[{"award-number":["PRIN 2020TL3X8X"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2023,1,31]]},"abstract":"<jats:p>We introduce a novel approach to source code representation to be used in combination with neural networks. Such a representation is designed to permit the production of a continuous vector for each code statement. In particular, we present how the representation is produced in the case of Java source code. We test our representation for three tasks:<jats:italic>code summarization<\/jats:italic>,<jats:italic>statement separation<\/jats:italic>, and<jats:italic>code search<\/jats:italic>. We compare with the state-of-the-art<jats:italic>non-autoregressive<\/jats:italic>and<jats:italic>end-to-end<\/jats:italic>models for these tasks. We conclude that all tasks benefit from the proposed representation to boost their performance in terms of F1-score, accuracy, and mean reciprocal rank, respectively. 
Moreover, we show how models trained on code summarization and models trained on statement separation can be combined to address methods with tangled responsibilities, meaning that these models can be used to detect code misconduct.<\/jats:p>","DOI":"10.1145\/3514232","type":"journal-article","created":{"date-parts":[[2022,4,6]],"date-time":"2022-04-06T10:04:10Z","timestamp":1649239450000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Fold2Vec: Towards a Statement-Based Representation of Code for Code Comprehension"],"prefix":"10.1145","volume":"32","author":[{"given":"Francesco","family":"Bertolotti","sequence":"first","affiliation":[{"name":"Universit\u00e0 degli Studi di Milano, Milan, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4652-8113","authenticated-orcid":false,"given":"Walter","family":"Cazzola","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Milano, Milan, Italy"}]}],"member":"320","published-online":{"date-parts":[[2023,2,13]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"38","volume-title":"Proceedings of the 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC\/FSE\u201915)","author":"Allamanis Miltiadis","year":"2015","unstructured":"Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton. 2015. Suggesting accurate method and class names. In Proceedings of the 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC\/FSE\u201915). ACM, New York, NY, 38\u201349."},{"key":"e_1_3_2_3_2","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)","author":"Allamanis Miltiadis","year":"2018","unstructured":"Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. 
Learning to represent programs with graphs. In Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918). Vancouver, BC, Canada."},{"key":"e_1_3_2_4_2","first-page":"2091","volume-title":"Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916)","author":"Allamanis Miltiadis","year":"2016","unstructured":"Miltiadis Allamanis, Hao Peng, and Charles A. Sutton. 2016. A convolutional attention network for extreme summarization of source code. In Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916). PMLR, New York, NY, USA, 2091\u20132100."},{"key":"e_1_3_2_5_2","first-page":"207","volume-title":"Proceedings of the 10th Working Conference on Mining Software Repositories (MSR\u201913)","author":"Allamanis Miltiadis","year":"2013","unstructured":"Miltiadis Allamanis and Charles Sutton. 2013. Mining source code repositories at massive scale using language modeling. In Proceedings of the 10th Working Conference on Mining Software Repositories (MSR\u201913). IEEE, Los Alamitos, CA, 207\u2013216."},{"key":"e_1_3_2_6_2","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919)","author":"Alon Uri","year":"2019","unstructured":"Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2019. code2seq: Generating sequences from structured representations of code. In Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919). New Orleans, LA, USA."},{"key":"e_1_3_2_7_2","first-page":"404","volume-title":"Proceedings of the 39th ACM Conference on Programming Language Design and Implementation (PLDI\u201918)","author":"Alon Uri","year":"2018","unstructured":"Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2018. A general path-based representation for predicting program properties. In Proceedings of the 39th ACM Conference on Programming Language Design and Implementation (PLDI\u201918). 
ACM, New York, NY, 404\u2013419."},{"key":"e_1_3_2_8_2","volume-title":"Proceedings of the 46th Annual Symposium on Principles of Programming Languages (POPL\u201919)","author":"Alon Uri","year":"2019","unstructured":"Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. code2vec: Learning distributed representations of code. In Proceedings of the 46th Annual Symposium on Principles of Programming Languages (POPL\u201919). ACM, New York, NY."},{"key":"e_1_3_2_9_2","article-title":"Neural attribute machines for program generation","volume":"1705","author":"Amodio Matthew","year":"2017","unstructured":"Matthew Amodio, Swarat Chaudhuri, and Thomas Reps. 2017. Neural attribute machines for program generation. arXiv e-prints arXiv:1705.09231 (2017).","journal-title":"arXiv e-prints"},{"key":"e_1_3_2_10_2","article-title":"Layer normalization","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. arXiv e-prints arXiv:1607.06450 (2016).","journal-title":"arXiv e-prints arXiv:1607.06450"},{"key":"e_1_3_2_11_2","volume-title":"Proceedings of the British Machine Vision Conference (BMVC\u201916)","author":"Balntas Vassileios","year":"2016","unstructured":"Vassileios Balntas, Edgar Riba, Daniel Ponsa, and Krystian Mikolajczyk. 2016. Learning local feature descriptors with triplets and shallow convolutional neural networks. In Proceedings of the British Machine Vision Conference (BMVC\u201916). Article 119, 11 pages."},{"key":"e_1_3_2_12_2","first-page":"3589","volume-title":"Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS\u201918)","author":"Ben-Nun Tal","year":"2018","unstructured":"Tal Ben-Nun, Alice Shoshana Jakobovits, and Torsten Hoefler. 2018. Neural code comprehension: A learnable representation of code semantics. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS\u201918). 
3589\u20133601."},{"issue":"2","key":"e_1_3_2_13_2","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1109\/72.279181","article-title":"Learning long-term dependencies with gradient descent is difficult","volume":"5","author":"Bengio Yoshua","year":"1994","unstructured":"Yoshua Bengio, Patrice Simard, and Paolo Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5, 2 (March 1994), 157\u2013166.","journal-title":"IEEE Transactions on Neural Networks"},{"key":"e_1_3_2_14_2","first-page":"310","volume-title":"Proceedings of the 19th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201904)","author":"Breu Silvia","year":"2004","unstructured":"Silvia Breu and Jens Krinke. 2004. Aspect mining using event traces. In Proceedings of the 19th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201904). IEEE, Los Alamitos, CA, 310\u2013315."},{"key":"e_1_3_2_15_2","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919, poster session)","author":"Brockschmidt Marc","year":"2019","unstructured":"Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, and Oleksandr Polozov. 2019. Generative code modeling with graphs. In Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919, poster session). New Orleans, LA, USA."},{"key":"e_1_3_2_16_2","doi-asserted-by":"crossref","unstructured":"Simon Butler, Michel Wermelinger, Yijun Yu, and Helen Sharp. 2011. Improving the tokenisation of identifier names. In ECOOP 2011\u2014Object-Oriented Programming. Lecture Notes in Computer Science, Vol. 6813. 
Springer 130\u2013154.","DOI":"10.1007\/978-3-642-22655-7_7"},{"key":"e_1_3_2_17_2","volume-title":"Proceedings of the 32nd Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA\u201917)","author":"Celik Ahmet","year":"2017","unstructured":"Ahmet Celik, Pai Sreepathi, Sarfraz Khurshid, and Milos Gligoric. 2017. Bounded exhaustive test-input generation on GPUs. In Proceedings of the 32nd Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA\u201917). ACM, New York, NY."},{"key":"e_1_3_2_18_2","article-title":"Evaluating large language models trained on code","volume":"2107","author":"Chen Mark","year":"2021","unstructured":"Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, et\u00a0al. 2021. Evaluating large language models trained on code. arXiv e-prints arXiv:2107.03374 (2021).","journal-title":"arXiv e-prints"},{"key":"e_1_3_2_19_2","first-page":"826","volume-title":"Proceedings of the 33rd IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201918)","author":"Chen Qingying","year":"2018","unstructured":"Qingying Chen and Minghui Zhou. 2018. A neural framework for retrieval and summarization of source code. In Proceedings of the 33rd IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201918). ACM, New York, NY, 826\u2013831."},{"key":"e_1_3_2_20_2","first-page":"2552","volume-title":"Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS\u201918)","author":"Chen Xinyun","year":"2018","unstructured":"Xinyun Chen, Chang Liu, and Dawn Song. 2018. Tree-to-tree neural networks for program translation. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS\u201918). 
2552\u20132562."},{"issue":"2","key":"e_1_3_2_21_2","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1111\/j.2517-6161.1958.tb00292.x","article-title":"The regression analysis of binary sequences","volume":"20","author":"Cox David R.","year":"1958","unstructured":"David R. Cox. 1958. The regression analysis of binary sequences. Journal of the Royal Statistical Society 20, 2 (July 1958), 215\u2013232.","journal-title":"Journal of the Royal Statistical Society"},{"issue":"12","key":"e_1_3_2_22_2","first-page":"46","article-title":"Neural networks primer, part I","volume":"2","author":"Caudill Maureen","year":"1987","unstructured":"Maureen Caudill. 1987. Neural networks primer, part I. AI Expert 2, 12 (Dec. 1987), 46\u201352.","journal-title":"AI Expert"},{"key":"e_1_3_2_23_2","first-page":"46","volume-title":"Proceedings of the 16th International Conference on Mining Software Repositories (MSR\u201919)","author":"Dam Hoa Khanh","year":"2019","unstructured":"Hoa Khanh Dam, Trang Pham, Shien Wee Ng, Truyen Tran, John Grundy, Aditya Ghose, Taeksu Kim, and Chul-Joo Kim. 2019. Lessons learned from using a deep tree-based model for software defect prediction in practice. In Proceedings of the 16th International Conference on Mining Software Repositories (MSR\u201919). IEEE, Los Alamitos, CA, 46\u201357."},{"key":"e_1_3_2_24_2","volume-title":"Proceedings of the 32nd Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA\u201917)","author":"Donaldson Alastair","year":"2017","unstructured":"Alastair Donaldson, Hugues Evrard, Andrei Lascu, and Paul Thomson. 2017. Bounded exhaustive test-input generation on GPUs. In Proceedings of the 32nd Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA\u201917). 
ACM, New York, NY."},{"key":"e_1_3_2_25_2","first-page":"205","volume-title":"Proceedings of the 14th International Conference on Natural Language Processing (ICON\u201917)","author":"Dwivedi Vijay Prakash","year":"2017","unstructured":"Vijay Prakash Dwivedi and Manish Shrivastava. 2017. Beyond Word2Vec: Embedding words and phrases in same vector space. In Proceedings of the 14th International Conference on Natural Language Processing (ICON\u201917). 205\u2013211."},{"key":"e_1_3_2_26_2","first-page":"1536","volume-title":"Proceedings of the Empirical Methods in Natural Language Processing (EMNLP\u201920)","author":"Feng Zhangyin","year":"2020","unstructured":"Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, et\u00a0al. 2020. CodeBERT: A pre-trained model for programming and natural languages. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP\u201920). Association for Computational Linguistics, 1536\u20131547."},{"issue":"10","key":"e_1_3_2_27_2","first-page":"2451","article-title":"Learning to forget: Continual prediction with LSTM","volume":"12","author":"Gers Felix A.","year":"2000","unstructured":"Felix A. Gers, J\u00fcrgen A. Schmidhuber, and Fred A. Cummins. 2000. Learning to forget: Continual prediction with LSTM. Neural Computation 12, 10 (Oct. 2000), 2451\u20132471.","journal-title":"Neural Computation"},{"key":"e_1_3_2_28_2","first-page":"933","volume-title":"Proceedings of the 40th International Conference on Software Engineering (ICSE\u201918)","author":"Gu Xiaodong","year":"2018","unstructured":"Xiaodong Gu, Hongyu Zhang, and Sunghun Kim. 2018. Deep code search. In Proceedings of the 40th International Conference on Software Engineering (ICSE\u201918). 
ACM, New York, NY, 933\u2013944."},{"key":"e_1_3_2_29_2","first-page":"223","volume-title":"Proceedings of the 32nd ACM\/IEEE International Conference on Software Engineering (ICSE\u201910)","author":"Haiduc Sonia","year":"2010","unstructured":"Sonia Haiduc, Jairo Aponte, and Andrian Marcus. 2010. Supporting program comprehension with source code summarization. In Proceedings of the 32nd ACM\/IEEE International Conference on Software Engineering (ICSE\u201910). IEEE, Los Alamitos, CA, 223\u2013226."},{"key":"e_1_3_2_30_2","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MIS.2009.36","article-title":"The unreasonable effectiveness of data","volume":"24","author":"Halevy Alon","year":"2009","unstructured":"Alon Halevy, Peter Norvig, and Fernando Pereira. 2009. The unreasonable effectiveness of data. IEEE Intelligent Systems 24 (March\/April2009), 8\u201312.","journal-title":"IEEE Intelligent Systems"},{"key":"e_1_3_2_31_2","first-page":"763","volume-title":"Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC\/FSE\u201917)","author":"Hellendoorn Vincent J.","year":"2017","unstructured":"Vincent J. Hellendoorn and Premkumar Devanbu. 2017. Are deep neural network the best choice for modeling source code? In Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC\/FSE\u201917). ACM, New York, NY, 763\u2013773."},{"issue":"6","key":"e_1_3_2_32_2","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1007\/s10664-013-9261-0","article-title":"An empirical study of identifier splitting techniques","volume":"19","author":"Hill Emily","year":"2014","unstructured":"Emily Hill, David Binkley, Dawn Lawrie, Lori Pollock, and K. Vujay-Shanker. 2014. An empirical study of identifier splitting techniques. 
Empirical Software Engineering 19, 6 (Dec.2014), 1754\u20131780.","journal-title":"Empirical Software Engineering"},{"key":"e_1_3_2_33_2","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1145\/3377811.3380361","volume-title":"Proceedings of the 42nd ACM\/IEEE International Conference on Software Engineering (ICSE\u201920)","author":"Hoang Thong","year":"2020","unstructured":"Thong Hoang, Hong Jin Kang, David Lo, and Julia Lawall. 2020. CC2Vec: Distributed representations of code changes. In Proceedings of the 42nd ACM\/IEEE International Conference on Software Engineering (ICSE\u201920). ACM, New York, NY, 518\u2013529."},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","unstructured":"Han Hu Qiuyuan Chen and Zhaoyi Liu. 2019. Code generation from supervised code embeddings. In Proceedings of the 26th International Conference on Neural Information Processing (ICONIP\u201919) . Communications in Computer and Information Science Vol. 1142 Tom Gedeon Kok Wai Wong and Minho Lee (Eds.). Springer Sydney Australia 388\u2013396.","DOI":"10.1007\/978-3-030-36808-1_42"},{"key":"e_1_3_2_35_2","first-page":"200","volume-title":"Proceedings of the 26th Conference on Program Comprehension (ICPC\u201918)","author":"Hu Xing","year":"2018","unstructured":"Xing Hu, Ge Li, Xin Xia, David Lo, and Zhi Jin. 2018. Deep code comment generation. In Proceedings of the 26th Conference on Program Comprehension (ICPC\u201918). ACM, New York, NY, 200\u2013210."},{"issue":"24","key":"e_1_3_2_36_2","first-page":"Article 653, 3","article-title":"Spiral: Splitters for identifiers in source code files","volume":"3","author":"Hucka Michael","year":"2018","unstructured":"Michael Hucka. 2018. Spiral: Splitters for identifiers in source code files. 
Journal of Open Source Software 3, 24 (April2018), Article 653, 3 pages.","journal-title":"Journal of Open Source Software"},{"key":"e_1_3_2_37_2","article-title":"CodeSearchNet Challenge: Evaluating the state of semantic code search","author":"Husain Hamel","year":"2019","unstructured":"Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. CodeSearchNet Challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436 (2019).","journal-title":"arXiv preprint arXiv:1909.09436 (2019)"},{"key":"e_1_3_2_38_2","first-page":"2073","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL\u201916)","author":"Iyer Srinivasan","year":"2016","unstructured":"Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL\u201916). 2073\u20132083."},{"key":"e_1_3_2_39_2","first-page":"602","volume-title":"Proceedings of the 34th International Conference on Automated Software Engineering (ASE\u201919)","author":"Jiang Lin","year":"2019","unstructured":"Lin Jiang, Huai Liu, and He Jiang. 2019. Machine learning based recommendation of method names: How far are we? In Proceedings of the 34th International Conference on Automated Software Engineering (ASE\u201919). IEEE, Los Alamitos, CA, 602\u2013614."},{"key":"e_1_3_2_40_2","first-page":"1161","volume-title":"Proceedings of the 43rd ACM\/IEEE International Conference on Software Engineering (ICSE\u201921)","author":"Jiang Nan","year":"2021","unstructured":"Nan Jiang, Thibaud Lutellier, and Lin Tan. 2021. CURE: Code-aware neural machine translation for automatic program repair. In Proceedings of the 43rd ACM\/IEEE International Conference on Software Engineering (ICSE\u201921). 
IEEE, Los Alamitos, CA, 1161\u20131173."},{"key":"e_1_3_2_41_2","first-page":"135","volume-title":"Proceedings of the 32nd IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201917)","author":"Jiang Siyuan","year":"2017","unstructured":"Siyuan Jiang, Ameer Armaly, and Collin McMillan. 2017. Automatically generating commit messages from diffs using neural machine translation. In Proceedings of the 32nd IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201917). IEEE, Los Alamitos, CA, 135\u2013146."},{"key":"e_1_3_2_42_2","first-page":"1","volume-title":"Proceedings of the 34th International Conference on Automated Software Engineering (ASE\u201919)","author":"Kang Hong Jin","year":"2019","unstructured":"Hong Jin Kang, Tegawend\u00e9 F. Bissyand\u00e9, and David Lo. 2019. Assessing the generalizability of code2vec token embeddings. In Proceedings of the 34th International Conference on Automated Software Engineering (ASE\u201919). IEEE, Los Alamitos, CA, 1\u201312."},{"issue":"1","key":"e_1_3_2_43_2","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/TSE.2013.45","article-title":"Variability mining: Consistent semi-automatic detection of product-line features","volume":"40","author":"K\u00e4stner Christian","year":"2014","unstructured":"Christian K\u00e4stner, Alexander Dreiling, and Klaus Ostermann. 2014. Variability mining: Consistent semi-automatic detection of product-line features. IEEE Transactions on Software Engineering 40, 1 (Jan.2014), 67\u201382.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","unstructured":"Jan Keim Angelika Kaplan Anne Koziolek and Mehdi Mirakhorli. 2020. Does BERT understand code?\u2014An exploratory study on the detection of architectural tactics in code. In Software Architecture . Lecture Notes in Computer Science Vol. 12292. 
Springer 220\u2013228.","DOI":"10.1007\/978-3-030-58923-3_15"},{"key":"e_1_3_2_45_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201915)","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR\u201915). San Diego, CA, USA."},{"key":"e_1_3_2_46_2","first-page":"3294","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS\u201915)","author":"Kiros Ryan","year":"2015","unstructured":"Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. 2015. Skip-thought vectors. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS\u201915). MIT Press, Montr\u00e9al, Canada, 3294\u20133302."},{"key":"e_1_3_2_47_2","article-title":"Data-driven program completion","volume":"1705","author":"Lu Yanxin","year":"2017","unstructured":"Yanxin Lu, Swarat Chaudhuri, Chris Jermaine, and David Melski. 2017. Data-driven program completion. arXiv e-prints arXiv:1705.09042 (2017).","journal-title":"arXiv e-prints"},{"key":"e_1_3_2_48_2","doi-asserted-by":"crossref","first-page":"1412","DOI":"10.18653\/v1\/D15-1166","volume-title":"Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201915)","author":"Luong Minh-Thang","year":"2015","unstructured":"Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201915). 
Association for Computational Linguistics, Lisbon, Portugal, 1412\u20131421."},{"key":"e_1_3_2_49_2","first-page":"649","volume-title":"Proceedings of the 31st International Conference on Machine Learning (ICML\u201914)","author":"Maddison Chris","year":"2014","unstructured":"Chris Maddison and Daniel Tarlow. 2014. Structured generative models of natural source code. In Proceedings of the 31st International Conference on Machine Learning (ICML\u201914). PMLR, Beijing, China, 649\u2013657."},{"key":"e_1_3_2_50_2","volume-title":"Agile Software Development: Principles, Patterns and Practices","author":"Martin Robert C.","year":"2003","unstructured":"Robert C. Martin, James W. Newkirk, and Robert S. Koss. 2003. Agile Software Development: Principles, Patterns and Practices. Prentice Hall, Upper Saddle River, NJ."},{"key":"e_1_3_2_51_2","first-page":"396","volume-title":"Proceedings of the 30th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201915)","author":"Martinez Jabier","year":"2015","unstructured":"Jabier Martinez, Tewfik Ziadi, Tegawend\u00e9 F. Bissyand\u00e9, Jacques Klein, and Yves le Traon. 2015. Automating the extraction of model-based software product lines from model variants. In Proceedings of the 30th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201915). IEEE, Los Alamitos, CA, 396\u2013406."},{"key":"e_1_3_2_52_2","first-page":"3111","volume-title":"Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS\u201913)","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS\u201913). 
Curran Associates Inc., Lake Tahoe, NV, USA, 3111\u20133119."},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","first-page":"997","DOI":"10.1145\/2384616.2384689","volume-title":"Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA\u201912)","author":"Mishne Alon","year":"2012","unstructured":"Alon Mishne, Sharon Shoham, and Eran Yahav. 2012. Typestate-based semantic code search over partial programs. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA\u201912). ACM, New York, NY, 997\u20131016."},{"key":"e_1_3_2_54_2","first-page":"36","volume-title":"Proceedings of the 2nd Workshop on Linking Aspect Technology and Evolution (LATE\u201906)","author":"Moldovan Grigoreta Sofia","year":"2006","unstructured":"Grigoreta Sofia Moldovan and Gabriela \u015eerban. 2006. Aspect mining using a vector-space model based clustering approach. In Proceedings of the 2nd Workshop on Linking Aspect Technology and Evolution (LATE\u201906). CWI, Bonn, Germany, 36\u201340."},{"key":"e_1_3_2_55_2","first-page":"35","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL\u201913)","author":"Movshovitz-Attias Dana","year":"2013","unstructured":"Dana Movshovitz-Attias and William W. Cohen. 2013. Natural language models for predicting programming comments. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL\u201913). Association for Computational Linguistics, Sofia, Bulgaria, 35\u201340."},{"key":"e_1_3_2_56_2","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)","author":"Murali Vijayaraghavan","year":"2018","unstructured":"Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, and Chris Jermaine. 2018. Neural sketch learning for conditional program generation. 
In Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918), Vancouver, BC, Canada."},{"key":"e_1_3_2_57_2","first-page":"219","volume-title":"Proceedings of the 36th International Conference on Computer Software and Applications (COMPSAC\u201912)","author":"Niu Nan","year":"2012","unstructured":"Nan Niu, Juha Savolainen, Tanmay Bhowmik, Anas Mahmoud, and Sandeep Reddivari. 2012. A framework for examining topical locality in object-oriented software. In Proceedings of the 36th International Conference on Computer Software and Applications (COMPSAC\u201912). IEEE, Los Alamitos, CA, 219\u2013224."},{"key":"e_1_3_2_58_2","first-page":"574","volume-title":"Proceedings of the 30th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201915)","author":"Oda Yusuke","year":"2015","unstructured":"Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. 2015. Learning to generate pseudo-code from source code using statistical machine translation. In Proceedings of the 30th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201915). IEEE, Los Alamitos, CA, 574\u2013584."},{"key":"e_1_3_2_59_2","first-page":"854","volume-title":"Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP\u201919)","author":"Omote Yutaro","year":"2019","unstructured":"Yutaro Omote, Akihiro Tamura, and Takashi Ninomiya. 2019. Dependency-based relative positional encoding for transformer NMT. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP\u201919). Varna, Bulgaria, 854\u2013861."},{"key":"e_1_3_2_60_2","volume-title":"Proceedings of the 30th International Conference on Machine Learning (ICML\u201913)","author":"Pascanu Razvan","year":"2013","unstructured":"Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. 
In Proceedings of the 30th International Conference on Machine Learning (ICML\u201913). Atlanta, GA, USA, III-1310\u2013III-1318."},{"key":"e_1_3_2_61_2","first-page":"1532","volume-title":"Proceedings of the Empirical Methods in Natural Language Processing (EMNLP\u201914)","author":"Pennington Jeffrey","year":"2014","unstructured":"Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP\u201914). Association for Computational Linguistics, Doha, Qatar, 1532\u20131543."},{"key":"e_1_3_2_62_2","first-page":"1093","volume-title":"Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915)","author":"Piech Chris","year":"2015","unstructured":"Chris Piech, Jonathan Huang, Andy Nguyen, Mike Phulsuksombati, Mehran Sahami, and Leonidas Guibas. 2015. Learning program embeddings to propagate feedback on student code. In Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915). PMLR, Lille, France, 1093\u20131102."},{"key":"e_1_3_2_63_2","first-page":"564","volume-title":"Proceedings of the 4th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD\u201907)","author":"Qu Liping","year":"2007","unstructured":"Liping Qu and Daxin Liu. 2007. Extending dynamic aspect mining using formal concept analysis. In Proceedings of the 4th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD\u201907). IEEE, Los Alamitos, CA, 564\u2013567."},{"key":"e_1_3_2_64_2","article-title":"On the generalizability of neural program models with respect to semantic-preserving program transformations","volume":"135","author":"Rabin Rafiqul Islam","year":"2021","unstructured":"Rafiqul Islam Rabin, Nghi D. Q. Bui, Ke Wang, Yijun Yu, Lingxiao Jiang, and Mohammad Amin Alipour. 2021. 
On the generalizability of neural program models with respect to semantic-preserving program transformations. Information and Software Technology 135 (Feb. 2021), 106552.","journal-title":"Information and Software Technology"},{"key":"e_1_3_2_65_2","first-page":"29","volume-title":"Proceedings of the 1st International Workshop on Representation Learning for Software Engineering and Program Languages (RL+SE&PL\u201920)","author":"Rabin Rafiqul Islam","year":"2020","unstructured":"Rafiqul Islam Rabin, Arjun Mukherjee, Omprakash Gnawali, and Mohammad Amin Alipour. 2020. Towards demystifying dimensions of source code embeddings. In Proceedings of the 1st International Workshop on Representation Learning for Software Engineering and Program Languages (RL+SE&PL\u201920). ACM, New York, NY, 29\u201338."},{"key":"e_1_3_2_66_2","first-page":"111","volume-title":"Proceedings of the 42nd Annual Symposium on Principles of Programming Languages (POPL\u201915)","author":"Raychev Veselin","year":"2015","unstructured":"Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting program properties from \u2018big code.\u2019 In Proceedings of the 42nd Annual Symposium on Principles of Programming Languages (POPL\u201915). ACM, New York, NY, 111\u2013124."},{"key":"e_1_3_2_67_2","first-page":"419","volume-title":"Proceedings of the 35th ACM Conference on Programming Language Design and Implementation (PLDI\u201914)","author":"Raychev Veselin","year":"2014","unstructured":"Veselin Raychev, Martin Vechev, and Eran Yahav. 2014. Code completion with statistical language models. In Proceedings of the 35th ACM Conference on Programming Language Design and Implementation (PLDI\u201914). 
ACM, New York, NY, 419\u2013428."},{"key":"e_1_3_2_68_2","doi-asserted-by":"crossref","first-page":"4631","DOI":"10.18653\/v1\/D18-1492","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201918)","author":"Shi Haoyue","year":"2018","unstructured":"Haoyue Shi, Hao Zhou, Jiaze Chen, and Lei Li. 2018. On tree-based neural sentence modeling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201918). Association for Computational Linguistics, Brussels, Belgium, 4631\u20134641."},{"key":"e_1_3_2_69_2","article-title":"PathPair2Vec: An AST path pair-based code representation method for defect prediction","volume":"59","author":"Shi Ke","year":"2020","unstructured":"Ke Shi, Yang Lu, Jingfei Chang, and Zhen Wei. 2020. PathPair2Vec: An AST path pair-based code representation method for defect prediction. Journal of Computer Languages 59 (Aug. 2020), 100979.","journal-title":"Journal of Computer Languages"},{"key":"e_1_3_2_70_2","first-page":"1556","volume-title":"Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL\/IJCNLP\u201915)","author":"Tai Kai Sheng","year":"2015","unstructured":"Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL\/IJCNLP\u201915). 1556\u20131566."},{"key":"e_1_3_2_71_2","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1109\/WCRE.2004.13","volume-title":"Proceedings of the 11th Working Conference on Reverse Engineering (WCRE\u201904)","author":"Tonella Paolo","year":"2004","unstructured":"Paolo Tonella and Mariano Ceccato. 2004. 
Aspect mining through the formal concept analysis of execution traces. In Proceedings of the 11th Working Conference on Reverse Engineering (WCRE\u201904). IEEE, Los Alamitos, CA, 112\u2013121."},{"issue":"4","key":"e_1_3_2_72_2","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1109\/TSE.2011.57","article-title":"A semi-automatic approach for extracting software product-lines","volume":"38","author":"Valente Marco Tullio","year":"2012","unstructured":"Marco Tullio Valente, Virgilio Borges, and Leonardo Passos. 2012. A semi-automatic approach for extracting software product-lines. IEEE Transactions on Software Engineering 38, 4 (July\u2013Aug. 2012), 737\u2013754.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_2_73_2","first-page":"6000","volume-title":"Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS\u201917)","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan M. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS\u201917). Curran Associates, Inc., Long Beach, CA, USA, 6000\u20136010."},{"key":"e_1_3_2_74_2","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)","author":"Wang Ke","year":"2018","unstructured":"Ke Wang, Rishabh Singh, and Zhendong Su. 2018. Dynamic neural program embedding for program repair. In Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918), Vancouver, BC, Canada."},{"key":"e_1_3_2_75_2","article-title":"Linformer: Self-attention with linear complexity","volume":"2006","author":"Wang Sinong","year":"2017","unstructured":"Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, and Hao Ma. 2017. Linformer: Self-attention with linear complexity. 
arXiv e-prints arXiv:2006.04768v3 (2017).","journal-title":"arXiv e-prints"},{"key":"e_1_3_2_76_2","article-title":"Reinforcement-learning-guided source code summarization via hierarchical attention","author":"Wang Wenhua","year":"2022","unstructured":"Wenhua Wang, Yuqun Zhang, Yulei Sui, Yao Wan, Zhou Zhao, Jian Yu, Philip Yu, and Guandong Xu. 2022. Reinforcement-learning-guided source code summarization via hierarchical attention. IEEE Transactions on Software Engineering 48, 1 (2022), 102\u2013119.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_2_77_2","first-page":"1","volume-title":"Proceedings of the 35th Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA\u201920)","author":"Wang Yu","year":"2020","unstructured":"Yu Wang, Ke Wang, Fengjuan Gao, and Linzhang Wang. 2020. Learning semantic program embeddings with graph interval neural network. In Proceedings of the 35th Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA\u201920). 
ACM, New York, NY, 1\u201327."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3514232","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3514232","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:14Z","timestamp":1750183814000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3514232"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,31]]},"references-count":76,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1,31]]}},"alternative-id":["10.1145\/3514232"],"URL":"https:\/\/doi.org\/10.1145\/3514232","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,31]]},"assertion":[{"value":"2021-05-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-26","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-02-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}