{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T22:04:14Z","timestamp":1777413854596,"version":"3.51.4"},"reference-count":85,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2020,5,22]],"date-time":"2020-05-22T00:00:00Z","timestamp":1590105600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2020,5,31]]},"abstract":"<jats:p>Matching images and sentences demands a fine understanding of both modalities. In this article, we propose a new system to discriminatively embed the image and text to a shared visual-textual space. In this field, most existing works apply the ranking loss to pull the positive image\/text pairs close and push the negative pairs apart from each other. However, directly deploying the ranking loss on heterogeneous features (i.e., text and image features) is less effective, because it is hard to find appropriate triplets at the beginning. So the naive way of using the ranking loss may compromise the network from learning inter-modal relationship. To address this problem, we propose the instance loss, which explicitly considers the intra-modal data distribution. It is based on an unsupervised assumption that each image\/text group can be viewed as a class. So the network can learn the fine granularity from every image\/text group. The experiment shows that the instance loss offers better weight initialization for the ranking loss, so that more discriminative embeddings can be learned. Besides, existing works usually apply the off-the-shelf features, i.e., word2vec and fixed visual feature. 
As a minor contribution, this article constructs an end-to-end dual-path convolutional network to learn the image and text representations. End-to-end learning allows the system to learn directly from the data and fully utilize the supervision. On two generic retrieval datasets (Flickr30k and MSCOCO), experiments demonstrate that our method yields competitive accuracy compared to state-of-the-art methods. Moreover, in language-based person retrieval, we improve the state of the art by a large margin. The code has been made publicly available.<\/jats:p>","DOI":"10.1145\/3383184","type":"journal-article","created":{"date-parts":[[2020,5,25]],"date-time":"2020-05-25T22:07:21Z","timestamp":1590444441000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":477,"title":["Dual-path Convolutional Image-Text Embeddings with Instance Loss"],"prefix":"10.1145","volume":"16","author":[{"given":"Zhedong","family":"Zheng","sequence":"first","affiliation":[{"name":"University of Technology Sydney, Ultimo NSW, Australia"}]},{"given":"Liang","family":"Zheng","sequence":"additional","affiliation":[{"name":"The Australian National University, Australia"}]},{"given":"Michael","family":"Garrett","sequence":"additional","affiliation":[{"name":"CingleVue International Australia and Edith Cowan University, Joondalup WA, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0512-880X","authenticated-orcid":false,"given":"Yi","family":"Yang","sequence":"additional","affiliation":[{"name":"University of Technology Sydney, Ultimo NSW, Australia"}]},{"given":"Mingliang","family":"Xu","sequence":"additional","affiliation":[{"name":"Zhengzhou University, Zhengzhou, Henan, China"}]},{"given":"Yi-Dong","family":"Shen","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, 
China"}]}],"member":"320","published-online":{"date-parts":[[2020,5,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.321"},{"key":"e_1_2_1_2_1","volume-title":"Yuille","author":"Chen Liang-Chieh","year":"2016","unstructured":"Liang-Chieh Chen , George Papandreou , Iasonas Kokkinos , Kevin Murphy , and Alan L . Yuille . 2016 . Deeplab : Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv:1606.00915. Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. 2016. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv:1606.00915."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1017"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3177745"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2508146"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2903661"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2607421"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.201"},{"key":"e_1_2_1_10_1","volume-title":"Jamie Ryan Kiros, and Sanja Fidler","author":"Faghri Fartash","year":"2018","unstructured":"Fartash Faghri , David J. Fleet , Jamie Ryan Kiros, and Sanja Fidler . 2018 . VSE++: Improved visual-semantic embeddings. In Proceeding of BMVC ( 2018). Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, and Sanja Fidler. 2018. VSE++: Improved visual-semantic embeddings. 
In Proceeding of BMVC (2018)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3243316"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808205"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the NIPS.","author":"Frome Andrea","year":"2013","unstructured":"Andrea Frome , Greg S. Corrado , Jon Shlens , Samy Bengio , Jeff Dean , Tomas Mikolov , 2013 . Devise: A deep visual-semantic embedding model . In Proceedings of the NIPS. Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Tomas Mikolov, et al. 2013. Devise: A deep visual-semantic embedding model. In Proceedings of the NIPS."},{"key":"e_1_2_1_14_1","volume-title":"Dauphin","author":"Gehring Jonas","year":"2017","unstructured":"Jonas Gehring , Michael Auli , David Grangier , Denis Yarats , and Yann N . Dauphin . 2017 . Convolutional sequence to sequence learning. In Proceedings of the ICML. Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional sequence to sequence learning. In Proceedings of the ICML."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the AISTAT.","author":"Glorot Xavier","year":"2010","unstructured":"Xavier Glorot and Yoshua Bengio . 2010 . Understanding the difficulty of training deep feedforward neural networks . In Proceedings of the AISTAT. Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the AISTAT."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the PETS.","author":"Gray Douglas","year":"2007","unstructured":"Douglas Gray , Shane Brennan , and Hai Tao . 2007 . Evaluating appearance models for recognition, reacquisition, and tracking . In Proceedings of the PETS. Douglas Gray, Shane Brennan, and Hai Tao. 2007. Evaluating appearance models for recognition, reacquisition, and tracking. 
In Proceedings of the PETS."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00750"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1162\/0899766042321814"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2015.2466106"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2017.10.018"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2558463"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/2566972.2566993"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the NIPS.","author":"Hu Baotian","year":"2014","unstructured":"Baotian Hu , Zhengdong Lu , Hang Li , and Qingcai Chen . 2014 . Convolutional neural network architectures for matching natural language sentences . In Proceedings of the NIPS. Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In Proceedings of the NIPS."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2760101"},{"key":"e_1_2_1_27_1","volume-title":"Ng","author":"Huang Eric H.","year":"2012","unstructured":"Eric H. Huang , Richard Socher , Christopher D. Manning , and Andrew Y . Ng . 2012 . Improving word representations via global context and multiple word prototypes. In Proceedings of the ACL. Eric H. Huang, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. 2012. Improving word representations via global context and multiple word prototypes. 
In Proceedings of the ACL."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.767"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00645"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the NIPS.","author":"Karpathy Andrej","year":"2014","unstructured":"Andrej Karpathy , Armand Joulin , and Fei Fei F. Li . 2014 . Deep fragment embeddings for bidirectional image sentence mapping . In Proceedings of the NIPS. Andrej Karpathy, Armand Joulin, and Fei Fei F. Li. 2014. Deep fragment embeddings for bidirectional image sentence mapping. In Proceedings of the NIPS."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299073"},{"key":"e_1_2_1_34_1","volume-title":"Hinton","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E . Hinton . 2012 . Imagenet classification with deep convolutional neural networks. In Proceedings of the NIPS. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Proceedings of the NIPS."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_13"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46466-4_50"},{"key":"e_1_2_1_37_1","first-page":"2","article-title":"Learning label preserving binary codes for multimedia retrieval: A general approach","volume":"14","author":"Li Kai","year":"2017","unstructured":"Kai Li , Guo-Jun Qi , and Kien A. Hua . 2017 . Learning label preserving binary codes for multimedia retrieval: A general approach . ACM Trans. Multimedia Comput. Commun. Appl. 14 , 1 (2017), 2 . Kai Li, Guo-Jun Qi, and Kien A. Hua. 2017. 
Learning label preserving binary codes for multimedia retrieval: A general approach. ACM Trans. Multimedia Comput. Commun. Appl. 14, 1 (2017), 2.","journal-title":"ACM Trans. Multimedia Comput. Commun. Appl."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.209"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.551"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.27"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the ECCV.","author":"Lin Tsung-Yi","unstructured":"Tsung-Yi Lin , Michael Maire , Serge Belongie , James Hays , Pietro Perona , Deva Ramanan , Piotr Doll\u00e1r , and C. Lawrence Zitnick . 2014. Microsoft coco: Common objects in context . In Proceedings of the ECCV. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In Proceedings of the ECCV."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_17"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2019.06.006"},{"key":"e_1_2_1_44_1","first-page":"1","article-title":"Modality-invariant image-text embedding for image-sentence matching","volume":"15","author":"Liu Ruoyu","year":"2019","unstructured":"Ruoyu Liu , Yao Zhao , Shikui Wei , Liang Zheng , and Yi Yang . 2019 . Modality-invariant image-text embedding for image-sentence matching . ACM Trans. Multimedia Comput. Commun. Appl. 15 , 1 (2019), 1 -- 19 . DOI:https:\/\/doi.org\/10.1145\/3300939 10.1145\/3300939 Ruoyu Liu, Yao Zhao, Shikui Wei, Liang Zheng, and Yi Yang. 2019. Modality-invariant image-text embedding for image-sentence matching. ACM Trans. Multimedia Comput. Commun. Appl. 15, 1 (2019), 1--19. DOI:https:\/\/doi.org\/10.1145\/3300939","journal-title":"ACM Trans. Multimedia Comput. Commun. 
Appl."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2593344"},{"key":"e_1_2_1_46_1","volume-title":"Lew","author":"Liu Yu","year":"2017","unstructured":"Yu Liu , Yanming Guo , Erwin M. Bakker , and Michael S . Lew . 2017 . Learning a recurrent residual fusion network for multimodal matching. In Proceedings of the ICCV. Yu Liu, Yanming Guo, Erwin M. Bakker, and Michael S. Lew. 2017. Learning a recurrent residual fusion network for multimodal matching. In Proceedings of the ICCV."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.301"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the ICLR.","author":"Mao Junhua","year":"2015","unstructured":"Junhua Mao , Wei Xu , Yi Yang , Jiang Wang , Zhiheng Huang , and Alan Yuille . 2015 . Deep captioning with multimodal recurrent neural networks (m-rnn) . In Proceedings of the ICLR. Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille. 2015. Deep captioning with multimodal recurrent neural networks (m-rnn). In Proceedings of the ICLR."},{"key":"e_1_2_1_49_1","unstructured":"Tomas Mikolov Kai Chen Greg Corrado and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781.  Tomas Mikolov Kai Chen Greg Corrado and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2010-343"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.232"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.208"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.577"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the NAACL HLT. Association for Computational Linguistics, 139--147","author":"Rashtchian Cyrus","year":"2010","unstructured":"Cyrus Rashtchian , Peter Young , Micah Hodosh , and Julia Hockenmaier . 2010 . 
Collecting image annotations using Amazon\u2019s mechanical turk . In Proceedings of the NAACL HLT. Association for Computational Linguistics, 139--147 . Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. 2010. Collecting image annotations using Amazon\u2019s mechanical turk. In Proceedings of the NAACL HLT. Association for Computational Linguistics, 139--147."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1873987"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.13"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_2_1_58_1","volume-title":"Jacobs","author":"Sharma Abhishek","year":"2012","unstructured":"Abhishek Sharma , Abhishek Kumar , Hal Daume , and David W . Jacobs . 2012 . Generalized multiview analysis: A discriminative latent space. In Proceedings of the CVPR. Abhishek Sharma, Abhishek Kumar, Hal Daume, and David W. Jacobs. 2012. Generalized multiview analysis: A discriminative latent space. In Proceedings of the CVPR."},{"key":"e_1_2_1_59_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.  Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556."},{"key":"e_1_2_1_60_1","volume-title":"Proceedings of the ACM MM.","author":"Vedaldi A.","unstructured":"A. Vedaldi and K. Lenc . 2015. MatConvNet\u2014Convolutional neural networks for MATLAB . In Proceedings of the ACM MM. A. Vedaldi and K. Lenc. 2015. MatConvNet\u2014Convolutional neural networks for MATLAB. 
In Proceedings of the ACM MM."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2587640"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3115432"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2592800"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.261"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.541"},{"key":"e_1_2_1_67_1","unstructured":"Liwei Wang Yin Li and Svetlana Lazebnik. 2017. Learning two-branch neural networks for image-text matching tasks. arXiv:1704.03470.  Liwei Wang Yin Li and Svetlana Lazebnik. 2017. Learning two-branch neural networks for image-text matching tasks. arXiv:1704.03470."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732296.2732301"},{"key":"e_1_2_1_69_1","first-page":"449","article-title":"Cross-modal retrieval with cnn visual features: A new baseline","volume":"47","author":"Wei Yunchao","year":"2017","unstructured":"Yunchao Wei , Yao Zhao , Canyi Lu , Shikui Wei , Luoqi Liu , Zhenfeng Zhu , and Shuicheng Yan . 2017 . Cross-modal retrieval with cnn visual features: A new baseline . IEEE Trans. Cybernet. 47 , 2 (2017), 449 -- 460 . Yunchao Wei, Yao Zhao, Canyi Lu, Shikui Wei, Luoqi Liu, Zhenfeng Zhu, and Shuicheng Yan. 2017. Cross-modal retrieval with cnn visual features: A new baseline. IEEE Trans. Cybernet. 47, 2 (2017), 449--460.","journal-title":"IEEE Trans. Cybernet."},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/2775109"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502097"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2891895"},{"key":"e_1_2_1_73_1","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V. 
Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey etal 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144.  Yonghui Wu Mike Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey et al. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298966"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2602938"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2861991"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.170"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00166"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2627806"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10590-1_54"},{"key":"e_1_2_1_81_1","volume-title":"Proceedings of the NIPS.","author":"Zhang Xiang","year":"2015","unstructured":"Xiang Zhang , Junbo Zhao , and Yann LeCun . 2015 . Character-level convolutional networks for text classification . In Proceedings of the NIPS. Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Proceedings of the NIPS."},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.122"},{"key":"e_1_2_1_83_1","volume-title":"Hauptmann","author":"Zheng Liang","year":"2016","unstructured":"Liang Zheng , Yi Yang , and Alexander G . Hauptmann . 2016 . Person re-identification: Past , present, and future. arXiv:1610.02984. Liang Zheng, Yi Yang, and Alexander G. Hauptmann. 2016. Person re-identification: Past, present, and future. 
arXiv:1610.02984."},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159171"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-017-1033-7"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3383184","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3383184","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:02:00Z","timestamp":1750197720000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3383184"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,22]]},"references-count":85,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,5,31]]}},"alternative-id":["10.1145\/3383184"],"URL":"https:\/\/doi.org\/10.1145\/3383184","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,22]]},"assertion":[{"value":"2018-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}