{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,3,8]],"date-time":"2024-03-08T00:47:27Z","timestamp":1709858847803},"reference-count":42,"publisher":"Wiley","issue":"6","license":[{"start":{"date-parts":[[2012,4,20]],"date-time":"2012-04-20T00:00:00Z","timestamp":1334880000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J Am Soc Inf Sci Tec"],"published-print":{"date-parts":[[2012,6]]},"abstract":"<jats:p>This study empirically evaluates the effectiveness of different feature types for the classification of the first language of an author. In particular, it examines the utility of psycholinguistic features, extracted by the <jats:styled-content style=\"fixed-case\">L<\/jats:styled-content>inguistic <jats:styled-content style=\"fixed-case\">I<\/jats:styled-content>nquiry and <jats:styled-content style=\"fixed-case\">W<\/jats:styled-content>ord <jats:styled-content style=\"fixed-case\">C<\/jats:styled-content>ount (<jats:styled-content style=\"fixed-case\">LIWC<\/jats:styled-content>) tool, that have not previously been applied to the task of author profiling. As <jats:styled-content style=\"fixed-case\">LIWC<\/jats:styled-content> is a tool that has been developed in the psycholinguistic field rather than the computational linguistics field, it was hypothesized that it would be effective, both as a single type feature set because of its psycholinguistic basis, and in combination with other feature sets, because it should be sufficiently different to add insight rather than redundancy. It was found that <jats:styled-content style=\"fixed-case\">LIWC<\/jats:styled-content> features were competitive with previously used feature types in identifying the first language of an author, and that combined feature sets including <jats:styled-content style=\"fixed-case\">LIWC<\/jats:styled-content> features consistently showed better accuracy rates and average <jats:italic><jats:styled-content style=\"fixed-case\">F<\/jats:styled-content><\/jats:italic> measures than were achieved by the same feature sets without the <jats:styled-content style=\"fixed-case\">LIWC<\/jats:styled-content> features. As a secondary issue, this study also examined how effectively first language classification scaled up to a larger number of possible languages. It was found that the classification scheme scaled up effectively to the entire 16 language collection from the <jats:styled-content style=\"fixed-case\">I<\/jats:styled-content>nternational <jats:styled-content style=\"fixed-case\">C<\/jats:styled-content>orpus of <jats:styled-content style=\"fixed-case\">L<\/jats:styled-content>earner <jats:styled-content style=\"fixed-case\">E<\/jats:styled-content>nglish, when compared with results achieved on just 5 languages in previous research.<\/jats:p>","DOI":"10.1002\/asi.22627","type":"journal-article","created":{"date-parts":[[2012,4,20]],"date-time":"2012-04-20T15:49:22Z","timestamp":1334936962000},"page":"1256-1269","source":"Crossref","is-referenced-by-count":7,"title":["Using psycholinguistic features for profiling first language of authors"],"prefix":"10.1002","volume":"63","author":[{"given":"Rosemary","family":"Torney","sequence":"first","affiliation":[{"name":"Internet Commerce Security Lab University of Ballarat  Australia"}]},{"given":"Peter","family":"Vamplew","sequence":"additional","affiliation":[{"name":"School of Science, Information Technology and Engineering University of Ballarat  Australia"}]},{"given":"John","family":"Yearwood","sequence":"additional","affiliation":[{"name":"School of Science, Information Technology and Engineering University of Ballarat  Australia"}]}],"member":"311","published-online":{"date-parts":[[2012,4,20]]},"reference":[{"key":"e_1_2_9_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2005.81"},{"key":"e_1_2_9_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1344411.1344413"},{"issue":"4","key":"e_1_2_9_4_1","first-page":"401","article-title":"Lexical predictors of personality type","volume":"17","author":"Argamon S.","year":"2005","journal-title":"Literary and Linguistic Computing"},{"key":"e_1_2_9_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1461928.1461959"},{"key":"e_1_2_9_6_1","first-page":"1","volume-title":"Proceedings of the AAAI Workshop of Learning for Text Categorization","author":"Argamon\u2010Engelson S.","year":"1998"},{"key":"e_1_2_9_7_1","unstructured":"Baayen H. vanHalteren H. Neijt A. &Tweedie F.(2002).An experiment in authorship attribution. InProceedings of JADT '02(pp.29\u201337). doi:10.1.1.131.6139"},{"key":"e_1_2_9_8_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.0956-7976.2004.00741.x"},{"key":"e_1_2_9_9_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139524599"},{"key":"e_1_2_9_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00994018"},{"key":"e_1_2_9_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/604264.604272"},{"key":"e_1_2_9_12_1","unstructured":"Estival D. Gaustad T. Pham S.B. Radford W. &Hutchinson B.(2007).TAT: An author profiling tool with application to Arabic emails. InProceedings of the Australasian Language Technology Workshop 2007(pp.21\u201330).http:\/\/www.aclweb.org\/anthology\u2010new\/U\/U07\/U07\u20101.pdf#page=31"},{"key":"e_1_2_9_13_1","first-page":"13","volume-title":"Discrete Methods in Epidemiology, DIMACS Series in Discrete Mathematics and Theoretical Computer Science","author":"Fradkin D.","year":"2006"},{"key":"e_1_2_9_14_1","volume-title":"International Corpus of Learner English","author":"Granger S.","year":"2002"},{"key":"e_1_2_9_15_1","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqm020"},{"issue":"4","key":"e_1_2_9_16_1","first-page":"35","article-title":"More effective web search using bigrams and trigrams","volume":"3","author":"Johnson D.","year":"2006","journal-title":"Webology"},{"key":"e_1_2_9_17_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000005"},{"key":"e_1_2_9_18_1","doi-asserted-by":"crossref","unstructured":"Kerremans K. Tang Y. Temmerman R. &Zhao G.(2005).Towards Ontology\u2010based E\u2010mail Fraud Detection. In:C.Bento A.Cardoso &G.Dias(Eds.) Proceedings of EPIA 2005 BAOSW Workshop of 12th Portuguese conference on AI Covilha Portugal pp.106\u2013111.http:\/\/ieeexplore.ieee.org\/xpls\/abs_all.jsp?arnumber=4145934","DOI":"10.1109\/EPIA.2005.341275"},{"issue":"1","key":"e_1_2_9_19_1","first-page":"2","article-title":"Computational methods in authorship attribution","volume":"60","author":"Koppel M.","year":"2008","journal-title":"Journal of the American Society for Information Science and Technology"},{"key":"e_1_2_9_20_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20961"},{"key":"e_1_2_9_21_1","unstructured":"Koppel M. Schler J. &Argamon S.(2010).Authorship attribution in the wild. doi:10.1007\/s10579\u2010009\u20109111\u20102"},{"key":"e_1_2_9_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1081870.1081947"},{"key":"e_1_2_9_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1121949.1121951"},{"key":"e_1_2_9_24_1","first-page":"41","volume-title":"English with an accent: Language, ideology, and discrimination in the United States","author":"Lippi\u2010Green R.","year":"1997"},{"key":"e_1_2_9_25_1","doi-asserted-by":"crossref","unstructured":"Luyckx K. &Daelemans W.(2008).Authorship attribution and verification with many authors and limited data. InProceedings of the 22nd International Conference on Computational Linguistics Manchester England(pp.513\u2013520).http:\/\/dl.acm.org\/citation.cfm?id=1599146","DOI":"10.3115\/1599081.1599146"},{"key":"e_1_2_9_26_1","doi-asserted-by":"crossref","unstructured":"Ma L. Ofoghi B.Watters P. &Brown S.(2009). \u201cDetecting phishing emails using hybrid features\u201d inProceedings of the Cybercrime and Trustworthy Computing Workshop (CTC\u20102009) Brisbane Australia 2009.http:\/\/ieeexplore.ieee.org\/xpls\/abs_all.jsp?arnumber=5319188","DOI":"10.1109\/UIC-ATC.2009.103"},{"key":"e_1_2_9_27_1","unstructured":"Mason O.(2006).QTag. Retrieved fromhttp:\/\/phrasys.net\/uob\/om\/software"},{"key":"e_1_2_9_28_1","doi-asserted-by":"publisher","DOI":"10.1037\/0022\u20103514.84.4.857"},{"key":"e_1_2_9_29_1","unstructured":"Mukherjee A. &Liu B.(2010).Improving gender classification of blog authors. InProceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (Cambridge USA October 9\u201311 2010). EMNLP '10. Association for Computational Linguistics Stroudsburg PA USA 207\u2013217.http:\/\/www.aclweb.org\/anthology\/D10\u20101021"},{"key":"e_1_2_9_30_1","doi-asserted-by":"publisher","DOI":"10.1177\/0146167203029005010"},{"key":"e_1_2_9_31_1","volume-title":"Linguistic Inquiry and Word Count","author":"Pennebaker J.W.","year":"2007"},{"key":"e_1_2_9_32_1","doi-asserted-by":"publisher","DOI":"10.1146\/annurev.psych.54.101601.145041"},{"key":"e_1_2_9_33_1","unstructured":"Platt J.C.(1998).Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR\u2010TR\u201098\u201014.http:\/\/www.bradblock.com\/Sequential_Minimal_Optimization_A_Fast_Algorithm_for_Training_Support_Vector_Machine.pdf"},{"key":"e_1_2_9_34_1","first-page":"1","article-title":"The state of non\u2010traditional authorship attribution studies\u20142010: Some problems and solutions","author":"Rudman J.","year":"2010","journal-title":"Digital Humanities"},{"key":"e_1_2_9_35_1","unstructured":"Spracklin L.M. Inkpen D.Z. &Nayak A.(2008).Using the complexity of the distribution of lexical elements as a feature in authorship attribution.LREC 2008 Proceedings Marrakech Morocco.http:\/\/www.mercubuana.com\/03\/892_paper.pdf"},{"key":"e_1_2_9_36_1","doi-asserted-by":"crossref","unstructured":"Tsur O. &Rappoport A.(2007).Using classifier features for studying the effect of native language on the choice of written second language words.Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition Table of Contents Prague Czech Republic pp.9\u201316.http:\/\/dl.acm.org\/citation.cfm?id=1629797","DOI":"10.3115\/1629795.1629797"},{"key":"e_1_2_9_37_1","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1218981"},{"key":"e_1_2_9_38_1","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/11.3.121"},{"key":"e_1_2_9_39_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139524735"},{"key":"e_1_2_9_40_1","doi-asserted-by":"publisher","DOI":"10.1108\/10662249910265025"},{"key":"e_1_2_9_41_1","volume-title":"Data mining: Practical machine learning tools and techniques with JAVA implementations","author":"Witten I.H.","year":"2005"},{"key":"e_1_2_9_42_1","unstructured":"Wong S.J. &Dras M.(2009).Contrastive analysis and native language identification.Proceedings of the Australasian Language Technology Association Workshop 2009 53\u201361.http:\/\/sia\u2010online1.mercubuana.ac.id\/48\/U09\u20101.pdf#page=60"},{"key":"e_1_2_9_43_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20316"}],"container-title":["Journal of the American Society for Information Science and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fasi.22627","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/asi.22627","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,31]],"date-time":"2023-10-31T13:05:47Z","timestamp":1698757547000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/asi.22627"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,4,20]]},"references-count":42,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2012,6]]}},"alternative-id":["10.1002\/asi.22627"],"URL":"https:\/\/doi.org\/10.1002\/asi.22627","archive":["Portico"],"relation":{},"ISSN":["1532-2882","1532-2890"],"issn-type":[{"value":"1532-2882","type":"print"},{"value":"1532-2890","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,4,20]]}}}