{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T21:59:18Z","timestamp":1775253558189,"version":"3.50.1"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"name":"FAIR\u2014Future Artificial Intelligence Research"},{"name":"European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR)\u2014MISSIONE 4 COMPONENTE 2, INVESTIMENTO","award":["1.3\u2014D.D. 1555 11\/10\/2022, PE00000013"],"award-info":[{"award-number":["1.3\u2014D.D. 1555 11\/10\/2022, PE00000013"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>To promote the responsible development and use of data-driven technologies\u2014such as machine learning and artificial intelligence\u2014principles of trustworthiness, accountability, and fairness should be followed. The quality of the dataset on which these applications rely is crucial to achieve compliance with the required ethical principles. Quantitative approaches to measure data quality are abundant in the literature and among practitioners; however, they are not sufficient to cover all the principles and ethical challenges involved.<\/jats:p>\n          <jats:p>In this article, we show that complementing data quality with measurable dimensions of data documentation and of data balance helps to cover a wider range of ethical challenges connected to the use of datasets in algorithms. A synthetic report of the metrics applied (the Extended Data Brief) and a set of Risk Labels for the Ethical Challenges provide a practical overview of the potential ethical harms due to data composition. We believe that the proposed data labeling scheme will enable practitioners to improve the overall quality of datasets and to build more responsible data-driven software systems.<\/jats:p>","DOI":"10.1145\/3726872","type":"journal-article","created":{"date-parts":[[2025,3,29]],"date-time":"2025-03-29T09:56:45Z","timestamp":1743242205000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Experience: Bridging Data Measurement and Ethical Challenges with Extended Data Briefs"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-8819-3623","authenticated-orcid":false,"given":"Marco","family":"Rondina","sequence":"first","affiliation":[{"name":"DAUIN-Department of Control and Computer Engineering, Politecnico di Torino","place":["Torino, Italy"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2027-3308","authenticated-orcid":false,"given":"Antonio","family":"Vetr\u00f2","sequence":"additional","affiliation":[{"name":"DAUIN-Department of Control and Computer Engineering, Politecnico di Torino","place":["Torino, Italy"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6108-9940","authenticated-orcid":false,"given":"Alessandro","family":"Fabris","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Cyber Security and Privacy","place":["Bochum, Germany"]},{"name":"Department of Mathematics Informatics and Geoscience, Universit\u00e0 degli Studi di Trieste","place":["Bochum, Germany"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4970-4554","authenticated-orcid":false,"given":"Gianmaria","family":"Silvello","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, Universit\u00e0 degli Studi di Padova","place":["Padova, Italy"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5739-9639","authenticated-orcid":false,"given":"Gian Antonio","family":"Susto","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, Universit\u00e0 degli Studi di Padova","place":["Padova, Italy"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5328-368X","authenticated-orcid":false,"given":"Marco","family":"Torchiano","sequence":"additional","affiliation":[{"name":"DAUIN-Department of Control and Computer Engineering, Politecnico di Torino","place":["Torino, Italy"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7867-1926","authenticated-orcid":false,"given":"Juan Carlos","family":"De Martin","sequence":"additional","affiliation":[{"name":"DAUIN-Department of Control and Computer Engineering, Politecnico di Torino","place":["Torino, Italy"]}]}],"member":"320","published-online":{"date-parts":[[2025,6,24]]},"reference":[{"key":"e_1_3_4_2_2","doi-asserted-by":"publisher","unstructured":"Jack Bandy and Nicholas Vincent. 2021. Addressing \u201cDocumentation Debt\u201d in Machine Learning Research: A Retrospective Datasheet for BookCorpus. DOI:10.48550\/arXiv.2105.05241arxiv:2105.05241 [cs]","DOI":"10.48550\/arXiv.2105.05241"},{"key":"e_1_3_4_3_2","doi-asserted-by":"publisher","unstructured":"Michelle Bao Angela Zhou Samantha Zottola Brian Brubach Sarah Desmarais Aaron Horowitz Kristian Lum and Suresh Venkatasubramanian. 2022. It\u2019s COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks. DOI:10.48550\/arXiv.2106.05498arxiv:2106.05498 [cs]","DOI":"10.48550\/arXiv.2106.05498"},{"key":"e_1_3_4_4_2","first-page":"19","volume-title":"Data Quality","author":"Batini Carlo","year":"2006","unstructured":"Carlo Batini and Monica Scannapieca. 2006. Data quality dimensions. In Data Quality. Springer, Berlin, 19\u201349. https:\/\/link.springer.com\/chapter\/10.1007\/3-540-33173-5_2"},{"key":"e_1_3_4_5_2","doi-asserted-by":"publisher","unstructured":"Emily M. Bender and Batya Friedman. 2018. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics 2018 6 (2018) 587\u2013604. DOI:10.1162\/tacl_a_00041","DOI":"10.1162\/tacl_a_00041"},{"key":"e_1_3_4_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445922"},{"key":"e_1_3_4_7_2","doi-asserted-by":"publisher","unstructured":"Elena Beretta Antonio Santangelo Bruno Lepri Antonio Vetr\u00f2 and Juan Carlos De Martin. 2019. The Invisible Power of Fairness. How Machine Learning Shapes Democracy. DOI:10.48550\/arXiv.1903.09493arxiv:1903.09493 [cs stat]","DOI":"10.48550\/arXiv.1903.09493"},{"key":"e_1_3_4_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445940"},{"key":"e_1_3_4_9_2","first-page":"77","volume-title":"Proceedings of the 1st Conference on Fairness, Accountability, and Transparency","author":"Buolamwini Joy","year":"2018","unstructured":"Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the 1st Conference on Fairness, Accountability, and Transparency. PMLR, New York, NY, USA, 77\u201391. https:\/\/proceedings.mlr.press\/v81\/buolamwini18a.html"},{"key":"e_1_3_4_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40300-016-0088-5"},{"key":"e_1_3_4_11_2","doi-asserted-by":"publisher","unstructured":"Zhenpeng Chen Jie M. Zhang Federica Sarro and Mark Harman. 2023. A Comprehensive Empirical Study of Bias Mitigation Methods for Machine Learning Classifiers. DOI:10.48550\/arXiv.2207.03277arxiv:2207.03277 [cs]","DOI":"10.48550\/arXiv.2207.03277"},{"key":"e_1_3_4_12_2","doi-asserted-by":"publisher","DOI":"10.2307\/j.ctv1ghv45t"},{"key":"e_1_3_4_13_2","volume-title":"Ethics Guidelines for Trustworthy AI | Shaping Europe\u2019s Digital Future","author":"Commission European","year":"2019","unstructured":"European Commission. 2019. Ethics Guidelines for Trustworthy AI | Shaping Europe\u2019s Digital Future. Technical Report. European Commission. https:\/\/digital-strategy.ec.europa.eu\/en\/library\/ethics-guidelines-trustworthy-ai"},{"key":"e_1_3_4_14_2","unstructured":"European Union Agency for Fundamental Rights. 2015. EU Charter of Fundamental Rights\u2014Title III: Quality\u2014Article 21\u2014Non-discrimination. http:\/\/fra.europa.eu\/en\/eu-charter\/article\/21-non-discrimination"},{"key":"e_1_3_4_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-022-00854-z"},{"key":"e_1_3_4_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3362121"},{"key":"e_1_3_4_17_2","doi-asserted-by":"publisher","unstructured":"Jessica Fjeld Nele Achten Hannah Hilligoss Adam Nagy and Madhulika Srikumar. 2020. Principled Artificial Intelligence: Mapping Consensus in Ethical and Rights-Based Approaches to Principles for AI. DOI:10.2139\/ssrn.3518482","DOI":"10.2139\/ssrn.3518482"},{"key":"e_1_3_4_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11023-018-9482-5"},{"key":"e_1_3_4_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3458723"},{"key":"e_1_3_4_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.239"},{"key":"e_1_3_4_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3531146.3533184"},{"key":"e_1_3_4_22_2","doi-asserted-by":"publisher","unstructured":"Sarah Holland Ahmed Hosny Sarah Newman Joshua Joseph and Kasia Chmielinski. 2018. The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards. DOI:10.48550\/arXiv.1805.03677arxiv:1805.03677","DOI":"10.48550\/arXiv.1805.03677"},{"key":"e_1_3_4_23_2","unstructured":"ISO. 2008. Software Engineering\u2014Software Product Quality Requirements and Evaluation (SQuaRE)\u2014Data Quality Model (ISO-IEC 25012-2008). https:\/\/www.iso.org\/standard\/35736.html"},{"key":"e_1_3_4_24_2","unstructured":"ISO. 2014. Systems and Software Engineering\u2014Systems and Software Quality Requirements and Evaluation (SQuaRE)\u2014Guide to SQuaRE (ISO-IEC 25000-2014). https:\/\/www.iso.org\/standard\/64764.html"},{"key":"e_1_3_4_25_2","unstructured":"ISO. 2015. Systems and Software Engineering\u2014Systems and Software Quality Requirements and Evaluation (SQuaRE)\u2014Measurement of Data Quality (ISO-IEC 25024-2015). https:\/\/www.iso.org\/standard\/35749.html"},{"key":"e_1_3_4_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3406477"},{"key":"e_1_3_4_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3351095.3372829"},{"key":"e_1_3_4_28_2","doi-asserted-by":"publisher","DOI":"10.1177\/1833358318774357"},{"key":"e_1_3_4_29_2","doi-asserted-by":"publisher","unstructured":"Bernard Koch Emily Denton Alex Hanna and Jacob G. Foster. 2021. Reduced Reused and Recycled: The Life of a Dataset in Machine Learning Research. DOI:10.48550\/arXiv.2112.01716arxiv:2112.01716 [cs stat]","DOI":"10.48550\/arXiv.2112.01716"},{"key":"e_1_3_4_30_2","doi-asserted-by":"publisher","DOI":"10.1108\/DPRG-03-2021-0047"},{"key":"e_1_3_4_31_2","unstructured":"Jeff Larson Surya Mattu Lauren Kirchner and Julia Angwin. 2016. How We Analyzed the COMPAS Recidivism Algorithm. https:\/\/www.propublica.org\/article\/how-we-analyzed-the-compas-recidivism-algorithm"},{"key":"e_1_3_4_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2023.3252370"},{"key":"e_1_3_4_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigData52589.2021.9671443"},{"key":"e_1_3_4_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3530787"},{"key":"e_1_3_4_35_2","doi-asserted-by":"publisher","unstructured":"Margaret Mitchell Alexandra Sasha Luccioni Nathan Lambert Marissa Gerchick Angelina McMillan-Major Ezinwanne Ozoani Nazneen Rajani Tristan Thrush Yacine Jernite and Douwe Kiela. 2023. Measuring Data. DOI:10.48550\/arXiv.2212.05129arxiv:2212.05129 [cs]","DOI":"10.48550\/arXiv.2212.05129"},{"key":"e_1_3_4_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3287560.3287596"},{"key":"e_1_3_4_37_2","doi-asserted-by":"publisher","DOI":"10.1177\/2053951716679679"},{"key":"e_1_3_4_38_2","volume-title":"Artificial Intelligence in Society","year":"2019","unstructured":"OECD. 2019. Artificial Intelligence in Society. Organisation for Economic Co-operation and Development, Paris, France. https:\/\/www.oecd-ilibrary.org\/science-and-technology\/artificial-intelligence-in-society_eedfee77-en"},{"key":"e_1_3_4_39_2","volume-title":"Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy","author":"O\u2019Neil Cathy","year":"2016","unstructured":"Cathy O\u2019Neil. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown Publishing Group."},{"key":"e_1_3_4_40_2","unstructured":"Pew Research Center. 2018. Public Attitudes Toward Computer Algorithms. https:\/\/www.pewresearch.org\/internet\/2018\/11\/16\/public-attitudes-toward-computer-algorithms\/"},{"key":"e_1_3_4_41_2","doi-asserted-by":"publisher","unstructured":"Giada Pistilli Carlos Munoz Ferrandis Yacine Jernite and Margaret Mitchell. 2023. Stronger Together: On the Articulation of Ethical Charters Legal Tools and Technical Documentation in ML. DOI:10.1145\/3593013.3594002arxiv:2305.18615 [cs]","DOI":"10.1145\/3593013.3594002"},{"key":"e_1_3_4_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3531146.3533231"},{"key":"e_1_3_4_43_2","doi-asserted-by":"publisher","DOI":"10.4301\/S1807-1775202017003"},{"key":"e_1_3_4_44_2","doi-asserted-by":"publisher","unstructured":"John Richards David Piorkowski Michael Hind Stephanie Houde and Aleksandra Mojsilovi\u0107. 2020. A Methodology for Creating AI FactSheets. DOI:10.48550\/arXiv.2006.13796arxiv:2006.13796","DOI":"10.48550\/arXiv.2006.13796"},{"key":"e_1_3_4_45_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-49008-8_7"},{"key":"e_1_3_4_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445518"},{"key":"e_1_3_4_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/InfRKM.2012.6204995"},{"key":"e_1_3_4_48_2","unstructured":"Antonio Vetr\u00f2. 2021. Imbalanced data as risk factor of discriminating automated decisions: A measurement-based approach. Journal of Intellectual Property Information Technology and Electronic Commerce Law 12 4 (Dec.2021) 272\u2013288. https:\/\/www.jipitec.eu\/jipitec\/article\/view\/325"},{"key":"e_1_3_4_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.giq.2021.101619"},{"key":"e_1_3_4_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3193568"},{"key":"e_1_3_4_51_2","doi-asserted-by":"publisher","unstructured":"Meike Zehlike Ke Yang and Julia Stoyanovich. 2021. Fairness in Ranking: A Survey. DOI:10.48550\/arXiv.2103.14000arxiv:2103.14000 [cs]","DOI":"10.48550\/arXiv.2103.14000"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3726872","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T12:16:24Z","timestamp":1750767384000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3726872"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,24]]},"references-count":50,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3726872"],"URL":"https:\/\/doi.org\/10.1145\/3726872","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"value":"1936-1955","type":"print"},{"value":"1936-1963","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,24]]},"assertion":[{"value":"2023-11-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-08","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}