{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T10:20:44Z","timestamp":1776680444889,"version":"3.51.2"},"reference-count":166,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,2,11]],"date-time":"2022-02-11T00:00:00Z","timestamp":1644537600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>In recent years, the exponential growth of digital documents has been met by rapid progress in text classification techniques. Newly proposed machine learning algorithms leverage the latest advancements in deep learning methods, allowing for the automatic extraction of expressive features. The swift development of these methods has led to a plethora of strategies to encode natural language into machine-interpretable data. The latest language modelling algorithms are used in conjunction with ad hoc preprocessing procedures, of which the description is often omitted in favour of a more detailed explanation of the classification step. This paper offers a concise review of recent text classification models, with emphasis on the flow of data, from raw text to output labels. We highlight the differences between earlier methods and more recent, deep learning-based methods in both their functioning and in how they transform input data. To give a better perspective on the text classification landscape, we provide an overview of datasets for the English language, as well as supplying instructions for the synthesis of two new multilabel datasets, which we found to be particularly scarce in this setting. 
Finally, we provide an outline of new experimental results and discuss the open research challenges posed by deep learning-based language models.<\/jats:p>","DOI":"10.3390\/info13020083","type":"journal-article","created":{"date-parts":[[2022,2,13]],"date-time":"2022-02-13T21:08:43Z","timestamp":1644786523000},"page":"83","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":202,"title":["A Survey on Text Classification Algorithms: From Text to Predictions"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4986-0442","authenticated-orcid":false,"given":"Andrea","family":"Gasparetto","sequence":"first","affiliation":[{"name":"Department of Management, Ca\u2019 Foscari University, 30123 Venice, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0451-4899","authenticated-orcid":false,"given":"Matteo","family":"Marcuzzo","sequence":"additional","affiliation":[{"name":"Department of Management, Ca\u2019 Foscari University, 30123 Venice, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3634-6607","authenticated-orcid":false,"given":"Alessandro","family":"Zangari","sequence":"additional","affiliation":[{"name":"Department of Management, Ca\u2019 Foscari University, 30123 Venice, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3659-5099","authenticated-orcid":false,"given":"Andrea","family":"Albarelli","sequence":"additional","affiliation":[{"name":"Department of Environmental Sciences, Informatics and Statistics, Ca\u2019 Foscari University, 30123 Venice, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2022,2,11]]},"reference":[{"key":"ref_1","unstructured":"Li, Q., Peng, H., Li, J., Xia, C., Yang, R., Sun, L., Yu, P.S., and He, L. (2020). A Survey on Text Classification: From Shallow to Deep Learning. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., and Brown, D. 
(2019). Text Classification Algorithms: A Survey. Information, 10.","DOI":"10.3390\/info10040150"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3439726","article-title":"Deep Learning\u2013Based Text Classification: A Comprehensive Review","volume":"54","author":"Minaee","year":"2021","journal-title":"Acm Comput. Surv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Graves, A. (2013). Generating Sequences With Recurrent Neural Networks. arXiv.","DOI":"10.1007\/978-3-642-24797-2_3"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Manning, C.D., Raghavan, P., and Sch\u00fctze, H. (2008). Introduction to Information Retrieval, Cambridge University Press.","DOI":"10.1017\/CBO9780511809071"},{"key":"ref_6","unstructured":"Mielke, S.J., Alyafeai, Z., Salesky, E., Raffel, C., Dey, M., Gall\u00e9, M., Raja, A., Si, C., Lee, W.Y., and Sagot, B. (2021). Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP. arXiv."},{"key":"ref_7","unstructured":"Saif, H., Fernandez, M., He, Y., and Alani, H. (2014, January 26\u201331). On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of Twitter. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC\u201914), Reykjavik, Iceland."},{"key":"ref_8","first-page":"1930","article-title":"A Comparative Study of Stemming Algorithms","volume":"2","author":"Jivani","year":"2011","journal-title":"Int. J. Comput. Technol. Appl."},{"key":"ref_9","unstructured":"Plisson, J., Lavrac, N., and Mladenic, D. (2004, January 11\u201315). A rule based approach to word lemmatization. 
Proceedings of the 7th International Multiconference on Information Society (IS04), Ljubljana, Slovenia."},{"key":"ref_10","first-page":"23","article-title":"A New Algorithm for Data Compression","volume":"12","author":"Gage","year":"1994","journal-title":"C Users J."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Sennrich, R., Haddow, B., and Birch, A. (2016, January 7\u201312). Neural Machine Translation of Rare Words with Subword Units. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany.","DOI":"10.18653\/v1\/P16-1162"},{"key":"ref_12","first-page":"9","article-title":"Language Models are Unsupervised Multitask Learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"ref_13","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_14","first-page":"9154","article-title":"Neural Machine Translation with Byte-Level Subwords","volume":"34","author":"Wang","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Schuster, M., and Nakajima, K. (2012, January 25\u201330). Japanese and Korean voice search. Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan.","DOI":"10.1109\/ICASSP.2012.6289079"},{"key":"ref_16","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minnesota, MN, USA."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Kudo, T. (2018, January 15\u201320). 
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1007"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Kudo, T., and Richardson, J. (November, January 31). SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-2012"},{"key":"ref_19","unstructured":"Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., and Le, Q.V. (2019, January 8\u201314). XLNet: Generalized Autoregressive Pretraining for Language Understanding. Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzm\u00e1n, F., Grave, E., Ott, M., Zettlemoyer, L., and Stoyanov, V. (2020, January 5\u201310). Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online event.","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"ref_21","unstructured":"Pratikakis, I., Dupont, F., and Ovsjanikov, M. (2017, January 23\u201324). Deformable Shape Retrieval with Missing Parts. Proceedings of the Eurographics Workshop on 3D Object Retrieval, Lyon, France."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Gasparetto, A., Minello, G., and Torsello, A. (2015, January 19\u201322). Non-parametric Spectral Model for Shape Retrieval. 
Proceedings of the 2015 International Conference on 3D Vision, Lyon, France.","DOI":"10.1109\/3DV.2015.46"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"428","DOI":"10.1016\/j.optlaseng.2019.05.006","article-title":"Robust phase unwrapping by probabilistic consensus","volume":"121","author":"Pistellato","year":"2019","journal-title":"Opt. Lasers Eng."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1108\/eb026526","article-title":"A statistical interpretation of term specificity and its application in retrieval","volume":"28","author":"Jones","year":"1972","journal-title":"J. Doc."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1016\/0098-3004(93)90090-R","article-title":"Principal components analysis (PCA)","volume":"19","author":"Ratajczak","year":"1993","journal-title":"Comput. Geosci."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"169","DOI":"10.3233\/AIC-170729","article-title":"Linear discriminant analysis: A detailed tutorial","volume":"30","author":"Tharwat","year":"2017","journal-title":"Ai Commun."},{"key":"ref_27","unstructured":"Tsuge, S., Shishibori, M., Kuroiwa, S., and Kita, K. (2001, January 7\u201310). Dimensionality reduction using non-negative matrix factorization for information retrieval. Proceedings of the 2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace (Cat.No.01CH37236), Tucson, AZ, USA."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1270","DOI":"10.1109\/5.880083","article-title":"Two decades of statistical language modeling: Where do we go from here?","volume":"88","author":"Rosenfeld","year":"2000","journal-title":"Proc. IEEE"},{"key":"ref_29","unstructured":"Jurafsky, D., and Martin, J. (2021, December 28). 
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Available online: https:\/\/web.stanford.edu\/~jurafsky\/slp3\/ed3book.pdf."},{"key":"ref_30","unstructured":"Huang, E.H., Socher, R., Manning, C.D., and Ng, A.Y. (2012, January 8\u201314). Improving Word Representations via Global Context and Multiple Word Prototypes. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Korea."},{"key":"ref_31","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013, January 2\u20134). Efficient Estimation of Word Representations in Vector Space. Proceedings of the 1st International Conference on Learning Representations (ICLR 2013), Scottsdale, AZ, USA."},{"key":"ref_32","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., and Dean, J. (2013, January 5\u201310). Distributed Representations of Words and Phrases and their Compositionality. Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C. (2014, January 25\u201329). GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Baroni, M., Dinu, G., and Kruszewski, G. (2014, January 22\u201327). Don\u2019t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. 
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA.","DOI":"10.3115\/v1\/P14-1023"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching Word Vectors with Subword Information","volume":"5","author":"Bojanowski","year":"2017","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Xu, S., Li, Y., and Wang, Z. (2017). Bayesian Multinomial Na\u00efve Bayes Classifier to Text Classification. Advanced Multimedia and Ubiquitous Engineering, Springer.","DOI":"10.1007\/978-981-10-5041-1_57"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Van den Bosch, A. (2017). Hidden Markov Models. Encyclopedia of Machine Learning and Data Mining, Springer.","DOI":"10.1007\/978-1-4899-7687-1_124"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1561\/2200000013","article-title":"An Introduction to Conditional Random Fields","volume":"4","author":"Sutton","year":"2012","journal-title":"Found. Trends\u00ae Mach. Learn."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","article-title":"Nearest Neighbor pattern classification","volume":"13","author":"Cover","year":"1967","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_40","unstructured":"Li, B., Yu, S., and Lu, Q. (2003). An Improved k-Nearest Neighbor Algorithm for Text Categorization. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s42452-019-1356-9","article-title":"Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets","volume":"1","author":"Ali","year":"2019","journal-title":"SN Appl. 
Sci."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1126\/science.153.3731.34","article-title":"Dynamic Programming","volume":"153","author":"Bellman","year":"1966","journal-title":"Science"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-Vector Networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Boser, B.E., Guyon, I.M., and Vapnik, V.N. (1992, January 27\u201329). A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA.","DOI":"10.1145\/130385.130401"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1109\/21.97458","article-title":"A survey of decision tree classifier methodology","volume":"21","author":"Safavian","year":"1991","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_46","unstructured":"Ho, T.K. (1995, January 14\u201315). Random decision forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Islam, M.Z., Liu, J., Li, J., Liu, L., and Kang, W. (2019, January 3\u20137). A Semantics Aware Random Forest for Text Classification. 
Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM \u201919), Beijing, China.","DOI":"10.1145\/3357384.3357891"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1198\/004017007000000245","article-title":"Large-Scale Bayesian Logistic Regression for Text Categorization","volume":"49","author":"Genkin","year":"2007","journal-title":"Technometrics"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"957","DOI":"10.1109\/TPAMI.2005.127","article-title":"Sparse multinomial logistic regression: Fast algorithms and generalization bounds","volume":"27","author":"Krishnapuram","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"2004","journal-title":"Mach. Learn."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/BF00116037","article-title":"The Strength of Weak Learnability","volume":"5","author":"Schapire","year":"1990","journal-title":"Mach. Learn."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1006\/jcss.1997.1504","article-title":"A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting","volume":"55","author":"Freund","year":"1997","journal-title":"J. Comput. Syst. Sci."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T. (2017, January 3\u20137). Bag of Tricks for Efficient Text Classification. 
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain.","DOI":"10.18653\/v1\/E17-2068"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"103699","DOI":"10.1016\/j.jbi.2021.103699","article-title":"GHS-NET a generic hybridized shallow neural network for multi-label biomedical text classification","volume":"116","author":"Ibrahim","year":"2021","journal-title":"J. Biomed. Inform."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Iyyer, M., Manjunatha, V., Boyd-Graber, J., and Daum\u00e9, H. (2015, January 27\u201331). Deep Unordered Composition Rivals Syntactic Methods for Text Classification. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China.","DOI":"10.3115\/v1\/P15-1162"},{"key":"ref_56","unstructured":"Le, Q., and Mikolov, T. (2014, January 21\u201326). Distributed Representations of Sentences and Documents. Proceedings of the 31st International Conference on Machine Learning, Beijing, China."},{"key":"ref_57","unstructured":"Mikolov, T., Le, Q.V., and Sutskever, I. (2013). Exploiting Similarities among Languages for Machine Translation. arXiv."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long Short-term Memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Tai, K.S., Socher, R., and Manning, C.D. (2015, January 27\u201331). Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. 
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China.","DOI":"10.3115\/v1\/P15-1150"},{"key":"ref_60","unstructured":"Dieng, A.B., Wang, C., Gao, J., and Paisley, J. (2016). TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency. arXiv."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Howard, J., and Ruder, S. (2018, January 15\u201320). Universal Language Model Fine-tuning for Text Classification. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1031"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Wang, B. (2018, January 15\u201320). Disconnected Recurrent Neural Networks for Text Categorization. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1215"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Cho, K., van Merri\u00ebnboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014, January 25\u201329). Learning Phrase Representations using RNN Encoder\u2013Decoder for Statistical Machine Translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"2673","DOI":"10.1109\/78.650093","article-title":"Bidirectional recurrent neural networks","volume":"45","author":"Schuster","year":"1997","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018, January 1\u20136). Deep Contextualized Word Representations. 
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA.","DOI":"10.18653\/v1\/N18-1202"},{"key":"ref_66","unstructured":"Zhang, Y., and Wallace, B.C. (2017, January 27\u201330). A Sensitivity Analysis of (and Practitioners\u2019 Guide to) Convolutional Neural Networks for Sentence Classification. Proceedings of the Eighth International Joint Conference on Natural Language Processing (IJCNLP), Taipei, Taiwan."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Stone, A., Wang, H., Stark, M., Liu, Y., Phoenix, D., and George, D. (2017, January 21\u201326). Teaching Compositionality to CNNs. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.85"},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Pistellato, M., Cosmo, L., Bergamasco, F., Gasparetto, A., and Albarelli, A. (2018, January 20\u201324). Adaptive Albedo Compensation for Accurate Phase-Shift Coding. Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China.","DOI":"10.1109\/ICPR.2018.8545465"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Kim, Y. (2014, January 25\u201329). Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1181"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Lea, C., Flynn, M.D., Vidal, R., Reiter, A., and Hager, G.D. (2017, January 21\u201326). Temporal Convolutional Networks for Action Segmentation and Detection. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.113"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Abreu, J., Fred, L., Mac\u00eado, D., and Zanchettin, C. (2019). Hierarchical Attentional Hybrid Neural Networks for Document Classification. Artificial Neural Networks and Machine Learning\u2014ICANN 2019: Workshop and Special Sessions, Springer International Publishing.","DOI":"10.1007\/978-3-030-30493-5_39"},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"8055","DOI":"10.1038\/s41598-020-65070-5","article-title":"Temporal Convolutional Networks for the Advance Prediction of ENSO","volume":"10","author":"Yan","year":"2020","journal-title":"Sci. Rep."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Conneau, A., Schwenk, H., Barrault, L., and Lecun, Y. (2017, January 3\u20137). Very Deep Convolutional Networks for Text Classification. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain.","DOI":"10.18653\/v1\/E17-1104"},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Duque, A.B., Santos, L.L.J., Mac\u00eado, D., and Zanchettin, C. (2019). Squeezed Very Deep Convolutional Neural Networks for Text Classification. Artificial Neural Networks and Machine Learning\u2014ICANN 2019: Theoretical Neural Computation, Springer International Publishing.","DOI":"10.1007\/978-3-030-30487-4_16"},{"key":"ref_75","first-page":"3104","article-title":"Sequence to Sequence Learning with Neural Networks","volume":"Volume 2","author":"Sutskever","year":"2014","journal-title":"Proceedings of the 27th International Conference on Neural Information Processing Systems"},{"key":"ref_76","unstructured":"Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. 
arXiv."},{"key":"ref_77","first-page":"1310","article-title":"On the difficulty of training recurrent neural networks","volume":"Volume 28","author":"Dasgupta","year":"2013","journal-title":"Proceedings of the 30th International Conference on Machine Learning"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Luong, T., Pham, H., and Manning, C.D. (2015, January 17\u201321). Effective Approaches to Attention-based Neural Machine Translation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1166"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Bender, E.M., Gebru, T., McMillan-Major, A., and Shmitchell, S. (2021, January 3\u201310). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Online event.","DOI":"10.1145\/3442188.3445922"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., and Hovy, E. (2016, January 12\u201317). Hierarchical Attention Networks for Document Classification. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA.","DOI":"10.18653\/v1\/N16-1174"},{"key":"ref_81","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_82","unstructured":"Liu, P.J., Saleh, M., Pot, E., Goodrich, B., Sepassi, R., Kaiser, L., and Shazeer, N. (2018). Generating Wikipedia by Summarizing Long Sequences. arXiv."},{"key":"ref_83","unstructured":"Radford, A., and Narasimhan, K. (2021, December 28). 
Improving Language Understanding by Generative Pre-Training. OpenAI Blog. Available online: https:\/\/www.cs.ubc.ca\/~amuham01\/LING530\/papers\/radford2018improving.pdf."},{"key":"ref_84","unstructured":"Von Platen, P. (2021, December 28). Transformers-Based Encoder-Decoder Models. Available online: https:\/\/huggingface.co\/blog\/encoder-decoder."},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., and Bowman, S. (November, January 31). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium.","DOI":"10.18653\/v1\/W18-5446"},{"key":"ref_86","unstructured":"Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020, January 6\u201312). Language Models are Few-Shot Learners. Proceedings of the 34th Annual Conference on Neural Information Processing Systems, Online event."},{"key":"ref_87","first-page":"1","article-title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_88","unstructured":"Sanh, V., Webson, A., Raffel, C., Bach, S.H., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Scao, T.L., and Raja, A. (2021). Multitask Prompted Training Enables Zero-Shot Task Generalization. arXiv."},{"key":"ref_89","unstructured":"He, P., Liu, X., Gao, J., and Chen, W. (2021, January 4\u20138). DeBERTa: Decoding-Enhanced BERT with Disentangled Attention. Proceedings of the 2021 International Conference on Learning Representations (ICLR 2021), Vienna, Austria."},{"key":"ref_90","unstructured":"Sun, Y., Wang, S., Feng, S., Ding, S., Pang, C., Shang, J., Liu, J., Chen, X., Zhao, Y., and Lu, Y. (2021). 
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation. arXiv."},{"key":"ref_91","unstructured":"Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q., and Salakhutdinov, R. (August, January 28). Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_92","unstructured":"Wei, J., Bosma, M., Zhao, V.Y., Guu, K., Yu, A.W., Lester, B., Du, N., Dai, A.M., and Le, Q.V. (2021). Finetuned Language Models Are Zero-Shot Learners. arXiv."},{"key":"ref_93","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1007\/978-3-319-24261-3_12","article-title":"Transitive assignment kernels for structural classification","volume":"9370","author":"Schiavinato","year":"2015","journal-title":"Lect. Notes Comput. Sci."},{"key":"ref_94","doi-asserted-by":"crossref","first-page":"1616","DOI":"10.1109\/TKDE.2018.2807452","article-title":"A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications","volume":"30","author":"Cai","year":"2018","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_95","unstructured":"Battaglia, P.W., Hamrick, J.B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V.F., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., and Faulkner, R. (2018). Relational inductive biases, deep learning, and graph networks. arXiv."},{"key":"ref_96","unstructured":"Bruna, J., Zaremba, W., Szlam, A., and Lecun, Y. (2014, January 14\u201316). Spectral networks and locally connected networks on graphs. Proceedings of the International Conference on Learning Representations (ICLR 2014), Banff, AB, Canada."},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1007\/978-3-662-44415-3_3","article-title":"Transitive State Alignment for the Quantum Jensen-Shannon Kernel","volume":"8621","author":"Torsello","year":"2014","journal-title":"Lect. Notes Comput. 
Sci."},{"key":"ref_98","first-page":"7370","article-title":"Graph Convolutional Networks for Text Classification","volume":"33","author":"Yao","year":"2019","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Church, K.W., and Hanks, P. (1989, January 26\u201329). Word Association Norms, Mutual Information, and Lexicography. Proceedings of the 27th Annual Meeting on Association for Computational Linguistics, Vancouver, BC, Canada.","DOI":"10.3115\/981623.981633"},{"key":"ref_100","doi-asserted-by":"crossref","unstructured":"Lin, Y., Meng, Y., Sun, X., Han, Q., Kuang, K., Li, J., and Wu, F. (2021). BertGCN: Transductive Text Classification by Combining GNN and BERT. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2021.findings-acl.126"},{"key":"ref_101","first-page":"8544","article-title":"Message Passing Attention Networks for Document Understanding","volume":"34","author":"Nikolentzos","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_102","unstructured":"Gammerman, A., Vovk, V., and Vapnik, V. (1998, January 24\u201326). Learning by Transduction. Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, USA."},{"key":"ref_103","doi-asserted-by":"crossref","unstructured":"Huang, L., Ma, D., Li, S., Zhang, X., and Wang, H. (2019, January 3\u20137). Text Level Graph Neural Network for Text Classification. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1345"},{"key":"ref_104","unstructured":"Wu, F., Souza, A., Zhang, T., Fifty, C., Yu, T., and Weinberger, K. (2019, January 13\u201316). Simplifying Graph Convolutional Networks. 
Proceedings of the 36th International Conference on Machine Learning, Irvine, CA, USA."},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Li, Q., Han, Z., and Wu, X.M. (2018, January 2\u20137). Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11604"},{"key":"ref_106","unstructured":"Chen, M., Wei, Z., Huang, Z., Ding, B., and Li, Y. (2020, January 13\u201318). Simple and Deep Graph Convolutional Networks. Proceedings of the 37th International Conference on Machine Learning, Online event."},{"key":"ref_107","unstructured":"Zhu, H., and Koniusz, P. (2021, January 4\u20138). Simple Spectral Graph Convolution. Proceedings of the 2021 International Conference on Learning Representations (ICLR 2021), Vienna, Austria."},{"key":"ref_108","doi-asserted-by":"crossref","unstructured":"Klicpera, J., Bojchevski, A., and G\u00fcnnemann, S. (2019). Predict then Propagate: Graph Neural Networks meet Personalized PageRank. arXiv.","DOI":"10.1145\/3394486.3403296"},{"key":"ref_109","doi-asserted-by":"crossref","unstructured":"Gasparetto, A., Cosmo, L., Rodola, E., Bronstein, M., and Torsello, A. (2017, January 10\u201312). Spatial Maps: From low rank spectral to sparse spatial functional representations. Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China.","DOI":"10.1109\/3DV.2017.00061"},{"key":"ref_110","doi-asserted-by":"crossref","unstructured":"Ethayarajh, K. (2019, January 3\u20137). How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. 
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1006"},{"key":"ref_111","unstructured":"Peters, M.E., Ammar, W., Bhagavatula, C., and Power, R. (August, January 31). Semi-supervised sequence tagging with bidirectional language models. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada."},{"key":"ref_112","unstructured":"McCann, B., Bradbury, J., Xiong, C., and Socher, R. (2017, January 4\u20139). Learned in Translation: Contextualized Word Vectors. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_113","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv."},{"key":"ref_114","unstructured":"Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv."},{"key":"ref_115","doi-asserted-by":"crossref","unstructured":"Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L., Wang, F., and Liu, Q. (2020, January 16\u201320). TinyBERT: Distilling BERT for Natural Language Understanding. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online event.","DOI":"10.18653\/v1\/2020.findings-emnlp.372"},{"key":"ref_116","unstructured":"Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (2019). ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv."},{"key":"ref_117","unstructured":"Clark, K., Luong, M.T., Le, Q.V., and Manning, C.D. (May, January 26). ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. 
Proceedings of the ICLR 2020: Eighth International Conference on Learning Representations, Online event."},{"key":"ref_118","doi-asserted-by":"crossref","unstructured":"Liu, J., Chang, W.C., Wu, Y., and Yang, Y. (2017, January 7\u201311). Deep Learning for Extreme Multi-Label Text Classification. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku Tokyo, Japan. SIGIR \u201917.","DOI":"10.1145\/3077136.3080834"},{"key":"ref_119","doi-asserted-by":"crossref","unstructured":"Zhang, W., Yan, J., Wang, X., and Zha, H. (2018, January 11\u201314). Deep Extreme Multi-Label Learning. Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Yokohama, Japan.","DOI":"10.1145\/3206025.3206030"},{"key":"ref_120","first-page":"361","article-title":"RCV1: A New Benchmark Collection for Text Categorization Research","volume":"5","author":"Lewis","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"ref_121","unstructured":"(2021, December 28). Wikipedia:Portal. Available online: https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Portal."},{"key":"ref_122","unstructured":"(2021, December 28). AG\u2019s Corpus of News Articles. Available online: http:\/\/groups.di.unipi.it\/~gulli\/AG_corpus_of_news_articles.html."},{"key":"ref_123","unstructured":"(2021, December 28). The 20 Newsgroups Data Set. Available online: http:\/\/qwone.com\/~jason\/20Newsgroups."},{"key":"ref_124","unstructured":"(2021, December 28). Ohsumed-R8-R52. Available online: https:\/\/www.kaggle.com\/weipengfei\/ohr8r52."},{"key":"ref_125","unstructured":"Zhang, X., Zhao, J., and LeCun, Y. (2015, January 7\u201312). Character-Level Convolutional Networks for Text Classification. Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_126","unstructured":"(2021, December 28). Yelp Open Dataset: An all-Purpose Dataset for Learning. 
Available online: https:\/\/www.yelp.com\/dataset."},{"key":"ref_127","unstructured":"Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., and Potts, C. (2011, January 19\u201324). Learning Word Vectors for Sentiment Analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_128","doi-asserted-by":"crossref","unstructured":"Pang, B., Lee, L., and Vaithyanathan, S. (2002, January 6\u20137). Thumbs up? Sentiment Classification Using Machine Learning Techniques. Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Philadelphia, PA, USA.","DOI":"10.3115\/1118693.1118704"},{"key":"ref_129","unstructured":"Li, X., and Roth, D. (September, January 24). Learning Question Classifiers. Proceedings of the 19th International Conference on Computational Linguistics, Taipei, Taiwan."},{"key":"ref_130","doi-asserted-by":"crossref","unstructured":"Joachims, T. (1998, January 21\u201323). Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Proceedings of the 10th European Conference on Machine Learning, Chemnitz, Germany.","DOI":"10.1007\/BFb0026683"},{"key":"ref_131","unstructured":"(2021, December 28). Text Categorization Corpora. Available online: https:\/\/disi.unitn.it\/moschitti\/corpora.htm."},{"key":"ref_132","doi-asserted-by":"crossref","unstructured":"Sun, C., Qiu, X., Xu, Y., and Huang, X. (2019). How to Fine-Tune BERT for Text Classification?. Chinese Computational Linguistics, Springer International Publishing.","DOI":"10.1007\/978-3-030-32381-3_16"},{"key":"ref_133","unstructured":"Adhikari, A., Ram, A., Tang, R., and Lin, J. (2019). DocBERT: BERT for Document Classification. arXiv."},{"key":"ref_134","unstructured":"Xie, Q., Dai, Z., Hovy, E., Luong, M.T., and Le, Q.V. (2020). Unsupervised Data Augmentation for Consistency Training. 
arXiv."},{"key":"ref_135","first-page":"6940","article-title":"Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function","volume":"33","author":"Sachan","year":"2019","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_136","unstructured":"Le, H., Tran, T., and Venkatesh, S. (2019). Learning to Remember More with Less Memorization. arXiv."},{"key":"ref_137","doi-asserted-by":"crossref","unstructured":"Prabhu, A., Dognin, C., and Singh, M. (2019, January 3\u20137). Sampling Bias in Deep Active Classification: An Empirical Study. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1417"},{"key":"ref_138","doi-asserted-by":"crossref","unstructured":"Cer, D., Yang, Y., Kong, S.-y., Hua, N., Limtiaco, N., John, R.S., Constant, N., Guajardo-Cespedes, M., Yuan, S., and Tar, C. (2018). Universal Sentence Encoder. arXiv.","DOI":"10.18653\/v1\/D18-2029"},{"key":"ref_139","doi-asserted-by":"crossref","unstructured":"Shin, B., Yang, H., and Choi, J.D. (2019, January 10\u201316). The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, Macao, China.","DOI":"10.24963\/ijcai.2019\/477"},{"key":"ref_140","doi-asserted-by":"crossref","unstructured":"Ionescu, R.T., and Butnaru, A. (2019, January 2\u20137). Vector of Locally-Aggregated Word Embeddings (VLAWE): A Novel Document-level Representation. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA.","DOI":"10.18653\/v1\/N19-1033"},{"key":"ref_141","doi-asserted-by":"crossref","unstructured":"Yadav, R.K., Jiao, L., Granmo, O.C., and Goodwin, M. 
(2021, January 11). Enhancing Interpretable Clauses Semantically using Pretrained Word Representation. Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.blackboxnlp-1.19"},{"key":"ref_142","doi-asserted-by":"crossref","unstructured":"Ding, S., Shang, J., Wang, S., Sun, Y., Tian, H., Wu, H., and Wang, H. (2021, January 1\u20136). ERNIE-Doc: A Retrospective Long-Document Modeling Transformer. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online event.","DOI":"10.18653\/v1\/2021.acl-long.227"},{"key":"ref_143","unstructured":"Zaheer, M., Guruganesh, G., Dubey, A., Ainslie, J., Alberti, C., Ontanon, S., Pham, P., Ravula, A., Wang, Q., and Yang, L. (2020). Big Bird: Transformers for Longer Sequences. arXiv."},{"key":"ref_144","unstructured":"Thongtan, T., and Phienthrakul, T. (August, January 28). Sentiment Classification Using Document Embeddings Trained with Cosine Similarity. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Florence, Italy."},{"key":"ref_145","unstructured":"Sun, Z., Fan, C., Sun, X., Meng, Y., Wu, F., and Li, J. (2020). Neural Semi-supervised Learning for Text Classification Under Large-Scale Pretraining. arXiv."},{"key":"ref_146","unstructured":"Kowsari, K., Heidarysafa, M., Brown, D.E., Meimandi, K.J., and Barnes, L.E. (2018, January 9\u201311). RMDL: Random Multimodel Deep Learning for Classification. Proceedings of the 2nd International Conference on Information System and Data Mining, Lakeland, FL, USA."},{"key":"ref_147","unstructured":"Lu, H., Huang, S.H., Ye, T., and Guo, X. (2019). Graph Star Net for Generalized Multi-Task Learning. arXiv."},{"key":"ref_148","unstructured":"Johnson, R., and Zhang, T. (August, January 31). 
Deep Pyramid Convolutional Neural Networks for Text Categorization. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada."},{"key":"ref_149","doi-asserted-by":"crossref","unstructured":"Adhikari, A., Ram, A., Tang, R., and Lin, J. (2019, January 2\u20137). Rethinking Complex Neural Network Architectures for Document Classification. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA.","DOI":"10.18653\/v1\/N19-1408"},{"key":"ref_150","unstructured":"Ren, H., and Lu, H. (2018). Compositional coding capsule network with k-means routing for text classification. arXiv."},{"key":"ref_151","unstructured":"Wang, S., Fang, H., Khabsa, M., Mao, H., and Ma, H. (2021). Entailment as Few-Shot Learner. arXiv."},{"key":"ref_152","doi-asserted-by":"crossref","unstructured":"Khodak, M., Saunshi, N., Liang, Y., Ma, T., Stewart, B.M., and Arora, S. (2018, January 15\u201320). A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1002"},{"key":"ref_153","unstructured":"Zhou, P., Qi, Z., Zheng, S., Xu, J., Bao, H., and Xu, B. (, January 11\u201316). Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling. Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan."},{"key":"ref_154","unstructured":"Johnson, R., and Zhang, T. (2016, January 19\u201324). Supervised and Semi-Supervised Text Categorization Using LSTM for Region Embeddings. 
Proceedings of the 33rd International Conference on International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_155","unstructured":"Han, K., Chen, J., Zhang, H., Xu, H., Peng, Y., Wang, Y., Ding, N., Deng, H., Gao, Y., and Guo, T. (2019). DELTA: A DEep learning based Language Technology plAtform. arXiv."},{"key":"ref_156","first-page":"2030","article-title":"Improving Document Classification with Multi-Sense Embeddings","volume":"325","author":"Gupta","year":"2020","journal-title":"Front. Artif. Intell. Appl."},{"key":"ref_157","unstructured":"Guidotti, E., and Ferrara, A. (2021). An Explainable Probabilistic Classifier for Categorical Data Inspired to Quantum Physics. arXiv."},{"key":"ref_158","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2020, January 16\u201320). Transformers: State-of-the-Art Natural Language Processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online event.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"ref_159","first-page":"145","article-title":"On the Stratification of Multi-label Data","volume":"Volume 6913","author":"Sechidis","year":"2011","journal-title":"Machine Learning and Knowledge Discovery in Databases"},{"key":"ref_160","unstructured":"Lu\u00eds Torgo, P.B., and Moniz, N. (2017, January 22). A Network Perspective on Stratification of Multi-Label Data. Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, Skopje, Macedonia."},{"key":"ref_161","first-page":"8018","article-title":"Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment","volume":"34","author":"Jin","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_162","unstructured":"Wang, B., Pan, B., Li, X., and Li, B. (2020). 
Towards Evaluating the Robustness of Chinese BERT Classifiers. arXiv."},{"key":"ref_163","unstructured":"Wang, B., Xu, C., Wang, S., Gan, Z., Cheng, Y., Gao, J., Awadallah, A.H., and Li, B. (2021, January 6\u201314). Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models. Proceedings of the 35th Annual Conference on Neural Information Processing System (NeurIPS 2021), Online event."},{"key":"ref_164","unstructured":"Wang, B., Wang, S., Cheng, Y., Gan, Z., Jia, R., Li, B., and Liu, J. (2020). InfoBERT: Improving Robustness of Language Models from an Information Theoretic Perspective. arXiv."},{"key":"ref_165","unstructured":"Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., van den Driessche, G., Lespiau, J., Damoc, B., and Clark, A. (2021). Improving language models by retrieving from trillions of tokens. arXiv."},{"key":"ref_166","unstructured":"Nakano, R., Hilton, J., Balaji, S., Wu, J., Ouyang, L., Kim, C., Hesse, C., Jain, S., Kosaraju, V., and Saunders, W. (2021). WebGPT: Browser-assisted question-answering with human feedback. arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/2\/83\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:17:39Z","timestamp":1760134659000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/2\/83"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,11]]},"references-count":166,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["info13020083"],"URL":"https:\/\/doi.org\/10.3390\/info13020083","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,11]]}}}