Why worry about Data Ethics?
Data ethics addresses fundamental social, political, and technological questions at the heart of data production, data infrastructures, and data journeys. These issues are becoming ever more pressing as research, government, and civil society organisations increasingly work with complex forms of data.
(For an extended version of this text, please see Hajek, K.M., Trauttmansdorff, P., Leonelli, S., Guttinger, S., Milano, S. (2025) How to Foster Responsible and Resilient Data. Computer [IEEE Computer Society] https://doi.ieeecomputersociety.org/10.1109/MC.2024.3522696.)
Injustice and Bias
Current practices for collecting, sharing, and interpreting data frequently include forms of bias and discrimination (not always intentional) that reinforce existing social and global hierarchies and power imbalances. This constitutes a form of injustice, as it disproportionately impacts already marginalized groups. A well-known example is the underrepresentation of people of color in datasets used to train facial recognition algorithms, which leads to increased errors and potentially fatal consequences in contexts such as law enforcement and security. People, particularly from marginalized groups, are often misrepresented, made invisible, or subjected to large-scale surveillance practices. The contexts in which data, algorithms, and technological systems operate matter, and they have wide-ranging discriminatory implications for social, political, and economic outcomes.
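To make this mechanism concrete, here is a minimal sketch using synthetic data; the group sizes, score distributions, and single-threshold classifier are assumptions chosen purely for illustration, not a model of any real system. When one group dominates the training data, a decision rule tuned for overall accuracy can carry a much higher error rate for the underrepresented group:

```python
# A minimal sketch (synthetic data, illustrative only): a threshold tuned
# for overall accuracy fits the majority group well and the minority poorly.
import numpy as np

rng = np.random.default_rng(0)

# Group A supplies 95% of the data, group B only 5%. The score that
# separates the two classes sits at a different point for each group.
def sample(group, n):
    labels = rng.integers(0, 2, n)
    shift = 0.0 if group == "A" else 1.5   # assumed distribution shift
    scores = labels * 3.0 + shift + rng.normal(0, 1, n)
    return scores, labels

xa, ya = sample("A", 9500)
xb, yb = sample("B", 500)
x, y = np.concatenate([xa, xb]), np.concatenate([ya, yb])

# Pick the single decision threshold that maximizes *overall* accuracy.
thresholds = np.linspace(x.min(), x.max(), 200)
accs = [((x > t).astype(int) == y).mean() for t in thresholds]
t_best = thresholds[int(np.argmax(accs))]

for name, xs, ys in [("A", xa, ya), ("B", xb, yb)]:
    err = ((xs > t_best).astype(int) != ys).mean()
    print(f"Group {name}: error rate {err:.1%}")
```

Under these assumptions, the majority group's error rate stays in single digits while the minority group's is several times higher; real systems are far more complex, but the underlying dynamic is the same.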
How can we include diverse populations and relevant local actors in making and using data? What forms of labour go into producing and maintaining data, and how can we ensure this work is properly recognised and non-exploitative?
More insights and resources:
- Data & Society: Trustworthy Infrastructures (Independent non-profit research organization)
- Data Solidarity Glossary (University of Vienna, Digital Transformations for Health Lab)
Inequity and Access
The push toward “bigger data” and data-driven innovation does not affect all regions and populations equally but instead tends to deepen global inequalities and the existing digital divide. Large-scale databases and infrastructures are mostly concentrated in wealthy, well-resourced areas and countries, with data access primarily limited to well-funded institutions. Data monopolies arise when companies or organizations collect and control large amounts of data behind closed doors, often for commercial purposes, without transparency about whose data is collected or how it is stored and (re)used. Access to reliable digital infrastructures is limited in many parts of the world and segmented along class, ethnic, and gender lines. What is needed is a digital environment that values regional and local autonomy and avoids replicating colonial-era power structures. What concrete measures can help promote transparent and democratic decision-making on access to data?
More insights and resources:
- UNESCO Chair on Diversity and Inclusion in Global Science (Leiden University)
- Data Solidarity Project (University of Vienna)
Privacy and Confidentiality
Privacy and confidentiality in data work are essential for safeguarding individuals’ autonomy and control over their personal information. Without effective principles of data privacy, the risks of misuse, unauthorized tracking and sharing, data breaches, and manipulation increase, potentially exposing individuals and creating significant vulnerabilities. Even when information is anonymized, machine learning practices and AI algorithms can compromise confidentiality, revealing personal details through indirect proxy information or patterns in aggregated data. In today’s increasingly automated data environments, many data practices thus lack transparency, leaving digital subjects unaware of what information is collected, in what settings, or how it will be used. What mechanisms for data privacy and consent will be effective in building trust and confidence in data?
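A minimal sketch of one such risk, the classic linkage attack, is given below. The records are hypothetical, but the pattern mirrors well-documented cases in which quasi-identifiers such as ZIP code, birth date, and sex sufficed to re-identify individuals in supposedly anonymized data:

```python
# A minimal sketch (hypothetical records): a dataset with names removed
# can still be re-identified by joining quasi-identifiers against a
# publicly available register.
anonymized_health = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "..."},
    {"zip": "90210", "dob": "1980-01-15", "sex": "M", "diagnosis": "..."},
]
public_register = [  # e.g. an openly available voter roll
    {"name": "J. Doe", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")

def link(record, register):
    """Return register entries matching the record on all quasi-identifiers."""
    key = tuple(record[q] for q in QUASI_IDENTIFIERS)
    return [p for p in register
            if tuple(p[q] for q in QUASI_IDENTIFIERS) == key]

for rec in anonymized_health:
    for person in link(rec, public_register):
        print(f"{person['name']} re-identified, diagnosis exposed: {rec['diagnosis']}")
```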
More insights and resources:
- European Digital Rights Network (EDRi) (European network of non-governmental organizations for defending rights and freedoms online)
Openness and Ownership
Advocates for greater openness have long called for increased sharing of resources, enhanced access to data pools and infrastructures, and the re-use of data. Yet unreflective forms of openness may also have unwanted effects, such as limiting epistemic diversity and fostering epistemic injustice, for instance when “open information” is harvested without informed consent or proper attribution. Bigger datasets and unlimited access do not necessarily produce better scientific or societal results; they are often less representative and less reliable than smaller datasets that are carefully curated and produced under a responsible ethos of data work. Treating data ethically thus overlaps with efforts to improve the quality and trustworthiness of datasets for research and other purposes, and with investment in intelligent openness, planned and mediated by expert data stewards. The work of creating and maintaining data needs to be recognized and rewarded adequately; all too often this does not happen.
At the same time, we note that a focus on ownership obscures the question of whether data on individuals and communities should be treated as a tradable asset at all, rather than as a “common good.” Concrete measures are needed to preserve autonomy in data usage and sharing without data becoming commercialized and dominated by existing profit-oriented economic and regulatory regimes. This is particularly urgent given the difficulties of controlling data flows, especially for personal digitized data, which are so easily copied, traded, and mobilized that it is hard, if not impossible, to track their travels and identify who may be accountable for their use.
More insights and resources:
- Creative Commons (CC) (Non-profit organization)
- The Open Data Institute (ODI) (Non-profit company)
- The Open and Universal Science (OPUS) project (EU-funded consortium)
Misuse and Error
The rapid expansion of large datasets and the growing automation of data analysis are not accompanied by proper auditing mechanisms to ensure data quality. Mistakes and errors are hard to track, especially in AI systems, and even when they are identified, correcting them across interconnected systems can be challenging and costly. The rise of automation further increases the risk of producing mistakes at large scale, undermining trust and harming individuals or communities affected by such errors, particularly in highly sensitive areas like healthcare, migration, or criminal justice. Even small error rates can have significant consequences for thousands of people. The risk of misuse is amplified when data are presented or interpreted out of context and without a proper understanding of the limitations of data technologies and systems.
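A back-of-envelope sketch makes the scale effect concrete; the figures below are assumptions for illustration, not estimates of any real system:

```python
# A minimal sketch of the scale effect: even a modest error rate, applied
# automatically to a large population, harms many individuals.
decisions_per_year = 1_000_000   # assumed volume of automated decisions
error_rate = 0.005               # a seemingly small 0.5% error rate

affected = decisions_per_year * error_rate
print(f"{affected:,.0f} people affected by erroneous decisions per year")
# -> 5,000 people affected by erroneous decisions per year
```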
There is, moreover, significant diversity in the expertise and practices used to produce and make sense of data. Applying one standard across the board has the potential to degrade trust in certain areas of research and even hamper scientific and technological advancement.
More insights and resources:
- The 100 Questions: Disinformation (GovLab & OECD)
- Fostering Trustworthy Information (Boumans et al. 2024) (workshop report)
Environmental Damage and Sustainability
Current data ecosystems are dramatically unsustainable, both in terms of durability (significant resources are needed for ongoing maintenance and repair) and from an environmental standpoint. Yet these issues have been largely ignored in public discussion, with digital solutions often proposed as the “clean alternative.” The increasing volume of data storage and the demand for fast processing require immense energy and material resources, and depend on technologies (such as batteries and chips) that are unevenly produced and distributed across the globe. At the same time, electronic waste piles up in landfill. The rise of AI only intensifies these demands, with current machine learning models consuming vast amounts of energy and producing significant carbon emissions. Future datafication must balance technical imperatives against environmental harms, while ensuring the financial and material outlays needed to keep data infrastructures operating reliably in the long term.
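To illustrate the orders of magnitude involved, here is a back-of-envelope sketch; every figure in it is an assumption chosen for illustration, not a measurement of any particular system or provider:

```python
# A back-of-envelope sketch of training-run energy and emissions.
# All figures below are assumptions for illustration only.
gpu_count = 1_000            # assumed size of a training cluster
power_per_gpu_kw = 0.4       # assumed average draw per accelerator (kW)
training_days = 30           # assumed training duration
pue = 1.5                    # assumed data-centre overhead (PUE)
grid_kg_co2_per_kwh = 0.4    # assumed grid carbon intensity

energy_kwh = gpu_count * power_per_gpu_kw * 24 * training_days * pue
emissions_t = energy_kwh * grid_kg_co2_per_kwh / 1000
print(f"Energy: {energy_kwh:,.0f} kWh, emissions: {emissions_t:,.0f} t CO2")
# -> Energy: 432,000 kWh, emissions: 173 t CO2 (under these assumptions)
```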
What kinds of funding structures will ensure data infrastructures are maintained durably and remain reliably accessible over the longer term? How can we foster forms of datafication that safeguard future sustainability and limit climate impacts?
More insights and resources:
- Data & Society: Climate, Technology, and Justice (Independent nonprofit research organization)
Artificial Intelligence
AI models tend to rely on huge amounts of data originating from the internet and online platforms, harvested without clear regulations, attention to boundaries, or ethical guidelines, prompting significant worries around fair use, proprietary content, and proper attribution. Data mined for AI purposes is, moreover, frequently stripped of contextual content or metadata, which makes it hard to distinguish reliable data from unreliable information. In turn, uncertain data quality and representativeness have ethical consequences for AI models and applications, as when biased or unrepresentative information patterns are replicated at large scale and produce one-sided insights and potentially harmful outcomes.
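As an illustration of what such contextual metadata might look like, here is a minimal sketch of a provenance record; the field names are hypothetical, loosely inspired by “datasheets for datasets”-style documentation:

```python
# A minimal sketch of the contextual metadata that web-scale harvesting
# typically strips away; all field names and values are illustrative.
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    source_url: str          # where the item was collected
    collected_on: str        # collection date (ISO 8601)
    license: str             # terms under which reuse is permitted
    consent_basis: str       # e.g. "explicit opt-in", "terms of service"
    known_limitations: list[str] = field(default_factory=list)

item = ProvenanceRecord(
    source_url="https://example.org/post/123",   # hypothetical source
    collected_on="2024-06-01",
    license="CC-BY-4.0",
    consent_basis="explicit opt-in",
    known_limitations=["English-only", "self-selected community"],
)
print(item)
```

Keeping records of this kind alongside harvested data would not resolve the ethical questions above, but it would make questions of attribution, consent, and reliability tractable in the first place.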
How can we shape the development of AI technologies in ethically responsible ways? Is further regulation a productive means of addressing concerns around AI?
More insights and resources:
- Center for Responsible AI Technologies (Technical University of Munich)
- The Royal Society 2024 report: Science in the Age of AI