{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T18:47:39Z","timestamp":1772563659558,"version":"3.50.1"},"reference-count":59,"publisher":"PeerJ","license":[{"start":{"date-parts":[[2021,12,24]],"date-time":"2021-12-24T00:00:00Z","timestamp":1640304000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0"}],"funder":[{"DOI":"10.13039\/501100011898","name":"Marianne and Marcus Wallenberg Foundation","doi-asserted-by":"crossref","award":["MMW 2018.0059"],"award-info":[{"award-number":["MMW 2018.0059"]}],"id":[{"id":"10.13039\/501100011898","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100009244","name":"Stockholm University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100009244","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"abstract":"<jats:p>\n                    We investigated emotion classification from brief video recordings from the GEMEP database wherein actors portrayed 18 emotions. Vocal features consisted of acoustic parameters related to frequency, intensity, spectral distribution, and durations. Facial features consisted of facial action units. We first performed a series of person-independent supervised classification experiments. Best performance (AUC = 0.88) was obtained by merging the output from the best unimodal vocal (Elastic Net, AUC = 0.82) and facial (Random Forest, AUC = 0.80) classifiers using a late fusion approach and the product rule method. All 18 emotions were recognized with above-chance recall, although recognition rates varied widely across emotions (\n                    <jats:italic>e.g<\/jats:italic>\n                    ., high for amusement, anger, and disgust; and low for shame). Multimodal feature patterns for each emotion are described in terms of the vocal and facial features that contributed most to classifier performance. Next, a series of exploratory unsupervised classification experiments were performed to gain more insight into how emotion expressions are organized. Solutions from traditional clustering techniques were interpreted using decision trees in order to explore which features underlie clustering. Another approach utilized various dimensionality reduction techniques paired with inspection of data visualizations. Unsupervised methods did not cluster stimuli in terms of emotion categories, but several explanatory patterns were observed. Some could be interpreted in terms of valence and arousal, but actor and gender specific aspects also contributed to clustering. Identifying explanatory patterns holds great potential as a meta-heuristic when unsupervised methods are used in complex classification tasks.\n                  <\/jats:p>","DOI":"10.7717\/peerj-cs.804","type":"journal-article","created":{"date-parts":[[2021,12,24]],"date-time":"2021-12-24T04:15:34Z","timestamp":1640319334000},"page":"e804","source":"Crossref","is-referenced-by-count":13,"title":["Comparing supervised and unsupervised approaches to multimodal emotion recognition"],"prefix":"10.7717","volume":"7","author":[{"given":"Marcos","family":"Fern\u00e1ndez Carbonell","sequence":"first","affiliation":[{"name":"Department of Software and Computer Systems, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7949-1815","authenticated-orcid":true,"given":"Magnus","family":"Boman","sequence":"additional","affiliation":[{"name":"Department of Software and Computer Systems, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden"},{"name":"Department of Learning, Informatics, Management and Ethics (LIME), Karolinska Institutet, Stockholm, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8771-6818","authenticated-orcid":true,"given":"Petri","family":"Laukka","sequence":"additional","affiliation":[{"name":"Department of Psychology, Stockholm University, Stockholm, Sweden"}]}],"member":"4443","published-online":{"date-parts":[[2021,12,24]]},"reference":[{"key":"10.7717\/peerj-cs.804\/ref-1","first-page":"420","article-title":"On the surprising behavior of distance metrics in high dimensional space","volume-title":"Database theory \u2013 ICDT","author":"Aggarwal","year":"2001"},{"issue":"6","key":"10.7717\/peerj-cs.804\/ref-2","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1007\/s00530-010-0182-0","article-title":"Multimodal fusion for multimedia analysis: a survey","volume":"16","author":"Atrey","year":"2010","journal-title":"Multimedia Systems"},{"issue":"1","key":"10.7717\/peerj-cs.804\/ref-3","doi-asserted-by":"publisher","first-page":"20284","DOI":"10.1038\/s41598-020-77117-8","article-title":"Comparing supervised and unsupervised approaches to emotion categorization in the human brain, body, and subjective experience","volume":"10","author":"Azari","year":"2020","journal-title":"Scientific Reports"},{"key":"10.7717\/peerj-cs.804\/ref-4","first-page":"1","article-title":"Cross-dataset learning and person-specific normalisation for automatic Action Unit detection","author":"Baltru\u0161aitis","year":"2015"},{"key":"10.7717\/peerj-cs.804\/ref-5","first-page":"59","article-title":"Openface 2.0: facial behavior analysis toolkit","author":"Baltru\u0161aitis","year":"2018"},{"issue":"5","key":"10.7717\/peerj-cs.804\/ref-6","doi-asserted-by":"publisher","first-page":"1161","DOI":"10.1037\/a0025827","article-title":"Introducing the Geneva multimodal expression corpus for experimental research on emotion perception","volume":"12","author":"B\u00e4nziger","year":"2012","journal-title":"Emotion"},{"key":"10.7717\/peerj-cs.804\/ref-7","first-page":"271","article-title":"Introducing the Geneva Multimodal Emotion Portrayal (GEMEP) corpus","volume-title":"Blueprint for Affective Computing: A Sourcebook","author":"B\u00e4nziger","year":"2010"},{"issue":"1","key":"10.7717\/peerj-cs.804\/ref-8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1177\/1529100619832930","article-title":"Emotional expressions reconsidered: challenges to inferring emotion from human facial movements","volume":"20","author":"Barrett","year":"2019","journal-title":"Psychological Science in the Public Interest"},{"key":"10.7717\/peerj-cs.804\/ref-9","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2021.3071503","article-title":"Exploring the contextual factors affecting multimodal emotion recognition in videos","author":"Bhattacharya","year":"2021","journal-title":"IEEE Transactions on Affective Computing"},{"issue":"1","key":"10.7717\/peerj-cs.804\/ref-10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/03610927408827101","article-title":"A dendrite method for cluster analysis","volume":"3","author":"Calinski","year":"1974","journal-title":"Communications in Statistics"},{"issue":"1","key":"10.7717\/peerj-cs.804\/ref-11","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1037\/emo0000302","article-title":"Universals and cultural variations in 22 emotional expressions across five cultures","volume":"18","author":"Cordaro","year":"2018","journal-title":"Emotion"},{"issue":"4","key":"10.7717\/peerj-cs.804\/ref-12","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1038\/s41562-019-0533-6","article-title":"The primacy of categories in the recognition of 12 emotions in speech prosody across two cultures","volume":"3","author":"Cowen","year":"2019","journal-title":"Nature Human Behaviour"},{"key":"10.7717\/peerj-cs.804\/ref-13","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1038\/s41467-017-02597-8","article-title":"Cooperating with machines","volume":"9","author":"Crandall","year":"2018","journal-title":"Nature Communications"},{"issue":"3","key":"10.7717\/peerj-cs.804\/ref-14","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1145\/2682899","article-title":"A review and meta-analysis of multimodal affect detection systems","volume":"47","author":"D\u2019Mello","year":"2015","journal-title":"ACM Computing Surveys"},{"issue":"10","key":"10.7717\/peerj-cs.804\/ref-15","doi-asserted-by":"publisher","first-page":"881","DOI":"10.14778\/2732951.2732962","article-title":"From data fusion to knowledge fusion","volume":"7","author":"Dong","year":"2015","journal-title":"Proceedings of the VLDB Endowment"},{"key":"10.7717\/peerj-cs.804\/ref-16","volume-title":"Emotions revealed","author":"Ekman","year":"2003"},{"key":"10.7717\/peerj-cs.804\/ref-17","volume-title":"Facial action coding system: a technique for the measurement of facial movement","author":"Ekman","year":"1978"},{"issue":"2","key":"10.7717\/peerj-cs.804\/ref-18","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1037\/0033-2909.128.2.203","article-title":"On the universality and cultural specificity of emotion recognition: a meta-analysis","volume":"128","author":"Elfenbein","year":"2002","journal-title":"Psychological Bulletin"},{"issue":"2","key":"10.7717\/peerj-cs.804\/ref-19","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1109\/TAFFC.2015.2457417","article-title":"The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for voice research and affective computing","volume":"7","author":"Eyben","year":"2016","journal-title":"IEEE Transactions on Affective Computing"},{"key":"10.7717\/peerj-cs.804\/ref-20","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1145\/2502081.2502224","article-title":"Recent developments in openSMILE, the Munich open-source multimedia feature extractor","volume-title":"Proceedings of the 21st ACM International Conference on Multimedia","author":"Eyben","year":"2013"},{"key":"10.7717\/peerj-cs.804\/ref-21","first-page":"575","article-title":"Predicting treatment outcome from patient texts: The case of internet-based cognitive behavioural therapy","author":"Gogoulou","year":"2021"},{"key":"10.7717\/peerj-cs.804\/ref-22","volume-title":"Emotion in therapy: from science to practice","author":"Hofmann","year":"2016"},{"issue":"1","key":"10.7717\/peerj-cs.804\/ref-23","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1109\/34.824819","article-title":"Statistical pattern recognition: a review","volume":"22","author":"Jain","year":"2000","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"10.7717\/peerj-cs.804\/ref-24","first-page":"245","article-title":"Facing imbalanced data recommendations for the use of performance metrics","author":"Jeni","year":"2013"},{"key":"10.7717\/peerj-cs.804\/ref-25","volume-title":"Emotions and affect in human factors and human-computer interaction","author":"Jeon","year":"2017"},{"issue":"5","key":"10.7717\/peerj-cs.804\/ref-26","doi-asserted-by":"publisher","first-page":"770","DOI":"10.1037\/0033-2909.129.5.770","article-title":"Communication of emotion in vocal expression and music performance: different channels, same code?","volume":"129","author":"Juslin","year":"2003","journal-title":"Psychological Bulletin"},{"issue":"1","key":"10.7717\/peerj-cs.804\/ref-27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10919-017-0268-x","article-title":"The mirror to our soul? Comparisons of spontaneous and posed vocal expression of emotion","volume":"42","author":"Juslin","year":"2018","journal-title":"Journal of Nonverbal Behavior"},{"key":"10.7717\/peerj-cs.804\/ref-28","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511974175","volume-title":"Multivariable analysis: a practical guide for clinicians and public health researchers","author":"Katz","year":"2011"},{"issue":"2","key":"10.7717\/peerj-cs.804\/ref-29","doi-asserted-by":"publisher","first-page":"447","DOI":"10.1037\/emo0000712","article-title":"Emotion recognition from posed and spontaneous dynamic expressions: human observers versus machine analysis","volume":"21","author":"Krumhuber","year":"2021a","journal-title":"Emotion"},{"issue":"2","key":"10.7717\/peerj-cs.804\/ref-30","doi-asserted-by":"publisher","first-page":"686","DOI":"10.3758\/s13428-020-01443-y","article-title":"Human and machine validation of 14 databases of dynamic facial expressions","volume":"53","author":"Krumhuber","year":"2021b","journal-title":"Behavior Research Methods"},{"issue":"1","key":"10.7717\/peerj-cs.804\/ref-31","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1177\/1754073919897295","article-title":"Cross-cultural emotion recognition and in-group advantage in vocal expression: a meta-analysis","volume":"13","author":"Laukka","year":"2021","journal-title":"Emotion Review"},{"key":"10.7717\/peerj-cs.804\/ref-32","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2020.2981446","article-title":"Deep facial expression recognition: a survey","author":"Li","year":"2020","journal-title":"IEEE Transactions on Affective Computing"},{"issue":"4","key":"10.7717\/peerj-cs.804\/ref-33","doi-asserted-by":"publisher","first-page":"410","DOI":"10.1109\/TAFFC.2016.2635124","article-title":"Asynchronous and event-based fusion systems for affect recognition on naturalistic data in comparison to conventional approaches","volume":"9","author":"Lingenfelser","year":"2018","journal-title":"IEEE Transactions on Affective Computing"},{"key":"10.7717\/peerj-cs.804\/ref-34","first-page":"911","article-title":"Understanding of internal clustering validation measures","author":"Liu","year":"2010"},{"key":"10.7717\/peerj-cs.804\/ref-35","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2020.3000510","article-title":"Multi-fusion residual memory network for multimodal human sentiment comprehension","author":"Mai","year":"2020","journal-title":"IEEE Transactions on Affective Computing"},{"key":"10.7717\/peerj-cs.804\/ref-36","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1007\/978-3-030-16272-6_11","article-title":"Survey on AI based multimodal methods for emotion detection","volume-title":"High-performance Modelling and Simulation for Big Data Applications","author":"Marechal","year":"2019"},{"issue":"3","key":"10.7717\/peerj-cs.804\/ref-37","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1109\/TAFFC.2017.2731763","article-title":"Automatic analysis of facial actions: a survey","volume":"10","author":"Martinez","year":"2019","journal-title":"IEEE Transactions on Affective Computing"},{"key":"10.7717\/peerj-cs.804\/ref-38","article-title":"UMAP: uniform manifold approximation and projection for dimension reduction","author":"McInnes","year":"2018"},{"key":"10.7717\/peerj-cs.804\/ref-39","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1016\/j.inffus.2017.02.003","article-title":"A review of affective computing: from unimodal analysis to multimodal fusion","volume":"37","author":"Poria","year":"2017","journal-title":"Information Fusion"},{"key":"10.7717\/peerj-cs.804\/ref-40","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"Journal of Computational and Applied Mathematics"},{"issue":"1","key":"10.7717\/peerj-cs.804\/ref-41","doi-asserted-by":"publisher","first-page":"329","DOI":"10.1146\/annurev.psych.54.101601.145102","article-title":"Facial and vocal expressions of emotion","volume":"54","author":"Russell","year":"2003","journal-title":"Annual Review of Psychology"},{"key":"10.7717\/peerj-cs.804\/ref-42","article-title":"TreeInterpreter","author":"Saabas","year":"2015"},{"key":"10.7717\/peerj-cs.804\/ref-43","first-page":"145","article-title":"Emotion theories and concepts (psychological perspectives)","volume-title":"Oxford Companion to Emotion and the Affective Sciences","author":"Scherer","year":"2009"},{"issue":"5","key":"10.7717\/peerj-cs.804\/ref-44","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1145\/3129340","article-title":"Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends","volume":"61","author":"Schuller","year":"2018","journal-title":"Communications of the ACM"},{"issue":"6","key":"10.7717\/peerj-cs.804\/ref-45","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1016\/j.csl.2018.02.004","article-title":"Affective and behavioural computing: lessons learnt from the first computational paralinguistics challenge","volume":"53","author":"Schuller","year":"2019","journal-title":"Computer Speech and Language"},{"key":"10.7717\/peerj-cs.804\/ref-46","article-title":"Hierarchical clustering (scipy.cluster.hierarchy.linkage)","author":"SciPy","year":"2019"},{"key":"10.7717\/peerj-cs.804\/ref-47","article-title":"A tutorial on principal component analysis","author":"Shlens","year":"2014","journal-title":"ArXiv"},{"issue":"1","key":"10.7717\/peerj-cs.804\/ref-48","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1631\/FITEE.1700826","article-title":"From Eliza to XiaoIce: challenges and opportunities with social chatbots","volume":"19","author":"Shum","year":"2018","journal-title":"Frontiers of Information Technology and Electronic Engineering"},{"key":"10.7717\/peerj-cs.804\/ref-49","doi-asserted-by":"publisher","first-page":"176274","DOI":"10.1109\/ACCESS.2020.3026823","article-title":"Multimodal emotion recognition with transformer-based self supervised feature fusion","volume":"8","author":"Siriwardhana","year":"2020","journal-title":"IEEE Access"},{"key":"10.7717\/peerj-cs.804\/ref-50","volume-title":"Large scale machine learning with Python","author":"Sjardin","year":"2016"},{"issue":"3","key":"10.7717\/peerj-cs.804\/ref-51","doi-asserted-by":"publisher","first-page":"707","DOI":"10.1109\/TAFFC.2018.2887267","article-title":"Cross-cultural and cultural-specific production and perception of facial expressions of emotion in the wild","volume":"12","author":"Srinivasan","year":"2021","journal-title":"IEEE Transactions on Affective Computing"},{"issue":"8","key":"10.7717\/peerj-cs.804\/ref-52","doi-asserted-by":"publisher","first-page":"1301","DOI":"10.1109\/JSTSP.2017.2764438","article-title":"End-to-end multimodal emotion recognition using deep neural networks","volume":"11","author":"Tzirakis","year":"2017","journal-title":"IEEE Journal of Selected Topics in Signal Processing"},{"issue":"4","key":"10.7717\/peerj-cs.804\/ref-53","doi-asserted-by":"publisher","first-page":"966","DOI":"10.1109\/TSMCB.2012.2200675","article-title":"Meta-analysis of the first facial expression recognition challenge","volume":"42","author":"Valstar","year":"2012","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)"},{"key":"10.7717\/peerj-cs.804\/ref-54","first-page":"2579","article-title":"Visualizing data using t-sne","volume":"9","author":"van der Maaten","year":"2008","journal-title":"Journal of Machine Learning Research"},{"key":"10.7717\/peerj-cs.804\/ref-55","first-page":"5998","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems 30 (NIPS 2017)","author":"Vaswani","year":"2017"},{"issue":"2","key":"10.7717\/peerj-cs.804\/ref-56","doi-asserted-by":"publisher","first-page":"324","DOI":"10.1016\/j.neucom.2020.01.017","article-title":"Joint low rank embedded multiple features learning for audio-visual emotion recognition","volume":"388","author":"Wang","year":"2020","journal-title":"Neurocomputing"},{"issue":"2","key":"10.7717\/peerj-cs.804\/ref-57","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1016\/j.imavis.2012.03.001","article-title":"LSTM-modeling of continuous emotions in an audiovisual affect recognition framework","volume":"31","author":"W\u00f6llmer","year":"2013","journal-title":"Image and Vision Computing"},{"issue":"1","key":"10.7717\/peerj-cs.804\/ref-58","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TVCG.2017.2744878","article-title":"Visualizing dataflow graphs of deep learning models in TensorFlow","volume":"24","author":"Wongsuphasawat","year":"2018","journal-title":"IEEE Transactions on Visualization and Computer Graphics"},{"key":"10.7717\/peerj-cs.804\/ref-59","doi-asserted-by":"publisher","first-page":"97515","DOI":"10.1109\/ACCESS.2019.2928625","article-title":"Exploring deep spectrum representations via attention-based recurrent and convolutional neural networks for speech emotion recognition","volume":"7","author":"Zhao","year":"2019","journal-title":"IEEE Access"}],"container-title":["PeerJ Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/peerj.com\/articles\/cs-804.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-804.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-804.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-804.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,24]],"date-time":"2021-12-24T04:15:50Z","timestamp":1640319350000},"score":1,"resource":{"primary":{"URL":"https:\/\/peerj.com\/articles\/cs-804"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,24]]},"references-count":59,"alternative-id":["10.7717\/peerj-cs.804"],"URL":"https:\/\/doi.org\/10.7717\/peerj-cs.804","archive":["CLOCKSS","LOCKSS","Portico"],"relation":{"has-review":[{"id-type":"doi","id":"10.7287\/peerj-cs.804v0.1\/reviews\/2","asserted-by":"object"},{"id-type":"doi","id":"10.7287\/peerj-cs.804v0.1\/reviews\/3","asserted-by":"object"},{"id-type":"doi","id":"10.7287\/peerj-cs.804v0.1\/reviews\/1","asserted-by":"object"},{"id-type":"doi","id":"10.7287\/peerj-cs.804v0.2\/reviews\/3","asserted-by":"object"}]},"ISSN":["2376-5992"],"issn-type":[{"value":"2376-5992","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,24]]},"article-number":"e804"}}