Help Students Identify Bias in AI Datasets: 3 Key Questions

Every year, countless students enroll in courses designed to teach them how to implement artificial intelligence (AI) models that assist doctors in diagnosing diseases and recommending treatments. However, many of these programs overlook a crucial aspect: training students to identify flaws in the training data used to build these models.

Leo Anthony Celi, a senior research scientist at MIT’s Institute for Medical Engineering and Science, a physician at Beth Israel Deaconess Medical Center, and an associate professor at Harvard Medical School, addresses these shortcomings in a recent study. He aims to encourage educators to stress the importance of thoroughly evaluating training data before integrating it into AI models. Previous research has shown that models trained primarily on clinical data from white males frequently fail to perform well for patients from other demographic groups. In this discussion, Celi elaborates on the implications of such biases and how educators can improve their teaching about AI models.

Q: How does bias enter these datasets, and how can we rectify this issue?

A: Any flaws within the data inherently carry over into any model built from it. Historically, we have documented instances where medical devices do not work uniformly across populations. For instance, pulse oximeters have been found to overestimate oxygen levels in people of color because the clinical trials that validated these devices enrolled too few participants from diverse groups. We remind our students that medical instruments are usually optimized for healthy young males, not for more vulnerable populations such as elderly women with heart conditions. Currently, the FDA mandates only that devices work on healthy subjects.

Moreover, using electronic health records (EHRs) as foundational data for AI poses challenges. These records weren’t designed to train learning algorithms, so caution is critical. Although the EHR system is expected to be overhauled eventually, that process will take time, so in the meantime we need to be smarter about how we use the data we have. One promising approach we’re pursuing involves developing a transformer model that processes numeric EHR data, including lab results. This model aims to mitigate the effects of incomplete data stemming from social determinants of health and implicit provider biases.
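To make that idea concrete, here is a minimal sketch of one way a transformer can consume numeric labs, not the group’s actual model: each measurement becomes a token built from a learned feature embedding plus a projection of its value, and an attention mask lets the encoder skip missing measurements rather than silently imputing them. The class name, dimensions, and risk-prediction target are all illustrative assumptions.

```python
# Hypothetical sketch of a transformer over numeric EHR features (not the
# authors' model). Each lab/vital is one token; missing values are masked out.
import torch
import torch.nn as nn

class NumericEHRTransformer(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # A learned embedding per measurement type says *which* lab a token is.
        self.feature_embed = nn.Embedding(n_features, d_model)
        # Project each scalar lab value into the model dimension.
        self.value_proj = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)  # e.g., a single risk logit

    def forward(self, values: torch.Tensor, missing: torch.Tensor) -> torch.Tensor:
        # values: (batch, n_features) lab results; missing: bool, True where absent.
        values = values.masked_fill(missing, 0.0)  # don't feed junk values
        batch, n_features = values.shape
        idx = torch.arange(n_features, device=values.device).expand(batch, -1)
        tokens = self.feature_embed(idx) + self.value_proj(values.unsqueeze(-1))
        # src_key_padding_mask makes attention ignore missing measurements.
        hidden = self.encoder(tokens, src_key_padding_mask=missing)
        # Mean-pool only over observed features.
        keep = (~missing).unsqueeze(-1).float()
        pooled = (hidden * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        return self.head(pooled).squeeze(-1)

model = NumericEHRTransformer(n_features=20)
values = torch.randn(8, 20)           # toy lab panel for 8 patients
missing = torch.rand(8, 20) > 0.7     # ~30% of measurements never taken
risk_logits = model(values, missing)  # shape (8,)
```

The masking detail is the point of the sketch: in EHR data, which labs are missing is itself shaped by access to care, so how a model handles absence deserves as much scrutiny as the values themselves.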

Q: Why is it essential for AI courses to cover sources of potential bias? What did your analysis reveal about course content?

A: Our MIT course debuted in 2016, and we soon recognized that we were inadvertently pushing students to create models overly focused on statistical performance measures, often neglecting the flawed nature of the data they were analyzing. This led us to question: Just how prevalent is this issue in other courses?

Upon reviewing syllabi from various courses, particularly online offerings, we found a glaring absence of instruction about the need for skepticism regarding data quality. Indeed, out of 11 programs assessed, only five included any mention of data bias, with merely two offering substantial discussions on the subject.

While we appreciate the value these courses provide, we aim to highlight the necessity for enhanced curricula focusing on data analysis skills. With more individuals drawn into this multifaceted AI landscape, it is crucial to empower them with the competencies required to handle AI responsibly. We hope this study brings attention to this significant educational gap.

Q: What content should developers emphasize in AI curricula?

A: First and foremost, instructors should provide students with a checklist of questions to consider, such as: Where did this data originate? Who were the medical professionals involved in collecting it? Understanding the context of the data collection is vital—for instance, if analyzing an ICU database, one must consider the demographic factors influencing who gets admitted to intensive care. If marginalized patients lack access to timely treatment, any resultant models will falter for these groups.
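As a starting point for that checklist, the brief sketch below shows the kind of audit students can run before any modeling; the file name and columns are hypothetical, not a real schema.

```python
import pandas as pd

# Hypothetical cohort file and columns (race, sex, predicted_risk, died);
# substitute the actual schema of the dataset being audited.
df = pd.read_csv("icu_cohort.csv")

# 1. Who is represented at all? Admission patterns shape every downstream model.
print(df["race"].value_counts(normalize=True))

# 2. Does a candidate model err differently across groups? Large gaps are a red flag.
df["abs_error"] = (df["predicted_risk"] - df["died"]).abs()
print(df.groupby(["race", "sex"])["abs_error"].agg(["mean", "count"]))
```

Even a two-step audit like this surfaces the admission-access question above: if a group barely appears in the cohort, or a model’s errors concentrate there, the model is answering a narrower question than its users assume.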

I believe that at least half, if not more, of the course curriculum should focus on comprehending data, as modeling becomes significantly easier once that foundation is established. Since 2014, the MIT Critical Data consortium has hosted datathons, where healthcare professionals collaborate with data scientists to explore local health issues in context. Conventional textbooks and papers often present diseases based on studies involving limited demographics primarily from well-resourced countries.

Our current focus is on instilling critical thinking skills in students. A crucial component of this is encouraging participation from individuals across various backgrounds. Critical thinking cannot be effectively taught in a homogeneous environment; when diverse teams convene during datathons, critical discussions tend to emerge naturally. Our guidance to participants remains clear: refrain from developing any models until you thoroughly understand the data, the patient demographics, and the reliability of the measurement devices.

Throughout our global events, we emphasize the importance of using locally relevant datasets. Some may hesitate, fearing they will unearth issues with data quality. We assert that acknowledging the shortcomings is the first step toward improvement; without recognizing these flaws, poor data practices will continue, and future collections will be just as compromised. Accepting that you won’t perfect data collection immediately is essential, and identifying these issues is part of the journey. The Medical Information Mart for Intensive Care (MIMIC) database, for example, took years to develop a reliable schema, largely thanks to feedback about its limitations.

While we may not have answers to all questions regarding data quality, we can inspire participants to recognize and address the myriad issues it presents. It’s always uplifting to read the blog posts of datathon attendees who express how their perspectives have shifted. They leave with not just excitement about the field but also a keen awareness of the risks associated with negligent practices.

Photo credit & article inspired by: Massachusetts Institute of Technology
