Characterizing and classifying developer forum posts with their intentions

Wu, Xingfang; Laufer, Eric; Li, Heng; Khomh, Foutse; Srinivasan, Santhosh; Luo, Jayden

doi:10.1007/s10664-024-10487-z


Published: 05 June 2024

Volume 29, article number 84 (2024)

Empirical Software Engineering


Abstract

With the rapid growth of the developer community, the number of posts on online technical forums has increased sharply, making it difficult for users to filter useful posts and find important information. Tags provide a concise feature dimension that helps users locate posts of interest and helps search engines index the posts most relevant to a query. Most tags, however, focus only on the technical perspective (e.g., programming language, platform, tool). In most cases, posts in online developer communities reveal the author's intention: to solve a problem, ask for advice, share information, etc. Modeling the intentions of posts can therefore add an extra dimension to the current tag taxonomy. Drawing on previous studies and on industrial perspectives, we create a refined taxonomy of the intentions of technical forum posts. Through manual labeling and analysis of a sampled dataset of posts extracted from online forums, we examine how the constitution of posts (e.g., code, error messages) relates to their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforming the state-of-the-art baseline approach. Our characterization and automated classification of forum posts by intention may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums.
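The abstract reports multi-label evaluation metrics, including Micro F1 and Top-k accuracy. The Python sketch below illustrates how these two metrics can be computed when a post may carry several intention labels. It is not the authors' implementation. The label names, the 0.5 decision threshold, and the top-k hit rule (a post counts as correct if any of its gold labels appears among its k highest-scoring labels) are illustrative assumptions, and the paper may define these metrics differently. AUC is omitted for brevity.

```python
from typing import Dict, List, Sequence, Set


def threshold(scores: Sequence[Dict[str, float]], t: float = 0.5) -> List[Set[str]]:
    """Turn per-label scores into predicted label sets (assumed cut-off t)."""
    return [{label for label, v in s.items() if v >= t} for s in scores]


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all posts and labels, then compute F1."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but wrong
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_k_accuracy(gold: Sequence[Set[str]],
                   scores: Sequence[Dict[str, float]], k: int) -> float:
    """Share of posts whose k highest-scoring labels hit at least one gold label."""
    hits = 0
    for g, s in zip(gold, scores):
        ranked = sorted(s, key=s.get, reverse=True)[:k]
        if g & set(ranked):
            hits += 1
    return hits / len(gold)


# Hypothetical intention labels and model scores for three posts.
gold = [{"solve_problem"}, {"ask_advice", "share_info"}, {"share_info"}]
scores = [
    {"solve_problem": 0.9, "ask_advice": 0.3, "share_info": 0.1},
    {"solve_problem": 0.6, "ask_advice": 0.5, "share_info": 0.2},
    {"solve_problem": 0.2, "ask_advice": 0.7, "share_info": 0.4},
]

print(micro_f1(gold, threshold(scores)))
print(top_k_accuracy(gold, scores, 1), top_k_accuracy(gold, scores, 2))
```

Pooling counts before computing precision and recall (micro averaging) lets frequent intentions dominate the score, whereas macro averaging would weight every intention equally; which one is appropriate depends on how imbalanced the label distribution is.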

Data Availability Statements (DAS)

We have released our annotated dataset and code in the supplementary material package, hosted on a GitHub repository, which can be accessed at: https://github.com/mooselab/suppmaterial-TechnicalPostIntention.

Notes

https://stackexchange.com/sites?view=list
https://stackoverflow.com/tags
A question-and-answer website that covers a wide range of topics and domains. The data dump only contains contents from selected technical subforums.
A forum software developed by Lithium Technologies.
An open-source forum software.


Acknowledgements

We would like to gratefully acknowledge the Mitacs-Accelerate program and the Natural Sciences and Engineering Research Council of Canada (NSERC) for funding this project.

Author information

Authors and Affiliations

Department of Computer Engineering and Software Engineering, Polytechnique Montréal, Montréal, Québec, Canada
Xingfang Wu, Heng Li & Foutse Khomh
Peritus.ai Canada Inc., Montréal, Québec, Canada
Eric Laufer & Jayden Luo
Peritus.ai, Inc., Palo Alto, CA, 94301, USA
Santhosh Srinivasan


Corresponding author

Correspondence to Xingfang Wu.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare relevant to this article’s content.

Additional information

Communicated by: Christoph Treude.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Wu, X., Laufer, E., Li, H. et al. Characterizing and classifying developer forum posts with their intentions. Empir Software Eng 29, 84 (2024). https://doi.org/10.1007/s10664-024-10487-z


Accepted: 11 April 2024
Published: 05 June 2024
Version of record: 05 June 2024
DOI: https://doi.org/10.1007/s10664-024-10487-z
