Can an Archaeological Approach Help Leverage Biased AI Data To Improve Medicine?


Key Highlights

  • Researchers advocate taking an archaeological approach to address biased AI data for healthcare, emphasizing the role of historical and social factors. 
  • Treating biased data as informative artifacts can lead to better AI practices.

While computer scientists might at first see data bias and errors as mere annoyances, researchers contend that they form a hidden record that mirrors societal values.

Challenging the Notion of “Garbage In, Garbage Out”

In an article published in the New England Journal of Medicine (NEJM), computer science and bioethics experts from MIT, Johns Hopkins University, and the Alan Turing Institute have proposed a novel perspective on addressing biased AI data, particularly in medicine. The authors challenge the conventional wisdom encapsulated in the adage “garbage in, garbage out,” stressing that biased data can still be informative and that recognizing and addressing the bias has value.

Bias in AI and Its Relevance to Medicine

The rise of artificial intelligence has accentuated the importance of addressing biased AI data, a problem identified by the White House Office of Science and Technology Policy in its Blueprint for an AI Bill of Rights. While the typical response to biased data is to gather more data from underrepresented groups or generate synthetic data, the authors argue for a broader sociotechnical perspective.

They introduce the concept of “data as artifact,” akin to how anthropologists and archaeologists view physical objects to understand past civilizations. In this context, biased clinical data can be regarded as “artifacts” that reveal societal practices, belief systems, and cultural values. These artifacts highlight the historical and social factors that have contributed to existing healthcare disparities.

  • The authors emphasize that addressing data bias requires collaboration with bioethicists and clinicians who possess expertise in discerning the social and historical factors influencing data collection. 
  • This multidisciplinary approach can lead to better-tailored AI models, acknowledging that generalized models may not work well for specific subgroups.

Dilemma of Racially Corrected Data and Self-Reported Race

The authors also address challenges related to racially corrected data and self-reported race in clinical risk scores. They argue for a case-by-case approach, emphasizing that the evidence should guide the decision on whether to include self-reported race data.

  • This new perspective aligns with the National Institutes of Health’s (NIH) focus on collecting high-quality, ethically sourced datasets. 
  • It acknowledges that ethical data collection is pivotal for developing safe and high-performance clinical AI models.
  • Treating biased datasets as artifacts offers several potential benefits, including the ability to consider local context when training algorithms and identify discriminatory practices that might be embedded in algorithms.
  • It can also lead to the development of new policies and structures aimed at eliminating the root causes of bias in datasets.

The authors advocate for a shift from a narrow technical view of data to a more comprehensive understanding of the issues surrounding AI and data bias in healthcare. This approach sets the stage for addressing current healthcare problems and creating a more equitable future.


1. What is the “artifact” approach to biased AI data in healthcare?

The artifact approach treats biased clinical data the way archaeologists treat physical objects: as artifacts that reveal the social and historical influences on how the data were collected.

2. Should self-reported race be included in clinical risk scores?

Inclusion depends on context. Race is a social construct, so the decision to include self-reported race should be made case by case and guided by the evidence.

3. How can the artifact-based approach impact AI in healthcare?

It can lead to better algorithms, more context-aware AI, and policies eliminating bias at its root.
