Teaching Data Bias in AI: A Lesson in Fairness and Accuracy


Last week, as we celebrated culture, diversity, and inclusion, I was also teaching a lesson on AI and data bias. What began as a fun and engaging activity quickly turned into a deep exploration of how AI can be unfair if it isn’t trained on the right data. This lesson helped students understand not only the importance of fair training data but also the potential consequences of AI’s inherent biases.

Exploring Data Bias Through AI

Students got hands-on by creating their own machine learning models using Gen AI (https://tm.gen-ai.fi/image/general). This tool lets them build models by simply dragging and dropping images, making the process interactive and easy to understand. The task was to create a model that could distinguish between apples and tomatoes, much like the AI used in Amazon Go, a checkout-free supermarket. At Amazon Go, AI recognises the items customers pick up and charges them accordingly: no queues, no checkout lines!

The Experiment: Training the Model

We began with a small set of images for training the model. Once students tested the model on unseen data, they quickly noticed that it didn’t work very well. The predictions were off, highlighting how limited training data can affect the accuracy of an AI model. To improve the results, we added more images to the training dataset. With the larger dataset, the model performed better, though we still had some interesting conversations about how confident the AI was in its predictions.
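For readers who want to reproduce the effect away from the drag-and-drop tool, here is a minimal sketch in Python. It is not how the classroom tool works internally. It assumes each image has been boiled down to a single made-up "redness" number and uses a simple nearest-centroid classifier, with all names and values invented for illustration. It compares the average accuracy of models trained on 1 image per class with models trained on 50.

```python
import random
from statistics import mean

def sample(rng, label, n):
    # Hypothetical 1-D "redness" feature: apples centre on 0.0, tomatoes on 1.5.
    centre = 0.0 if label == "apple" else 1.5
    return [rng.gauss(centre, 1.0) for _ in range(n)]

def train(apples, tomatoes):
    # Nearest-centroid model: remember the mean feature value of each class.
    return {"apple": mean(apples), "tomato": mean(tomatoes)}

def predict(model, x):
    # Pick the class whose remembered mean is closest to the new value.
    return min(model, key=lambda label: abs(x - model[label]))

def accuracy(model, rng, n_test=200):
    # Score the model on freshly drawn, unseen examples of both classes.
    correct = 0
    for label in ("apple", "tomato"):
        for x in sample(rng, label, n_test):
            correct += predict(model, x) == label
    return correct / (2 * n_test)

def average_accuracy(n_train, trials=200, seed=0):
    # Repeat the experiment many times so one lucky draw does not mislead us.
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        model = train(sample(rng, "apple", n_train), sample(rng, "tomato", n_train))
        scores.append(accuracy(model, rng))
    return mean(scores)

small = average_accuracy(n_train=1)
large = average_accuracy(n_train=50)
print(f"1 image per class:   {small:.2f}")
print(f"50 images per class: {large:.2f}")
```

With only one example per class, the model's idea of a "typical" apple or tomato depends heavily on which single image it happened to see, so its accuracy on unseen data is lower on average than with 50 examples per class.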

The Fun Twist

To keep the students engaged, I uploaded a photograph of myself to the model. Rather than the expected “Unexpected image in bagging area,” the model proudly declared that I was 87% apple! This amusing moment helped spark a discussion about how AI models can sometimes misinterpret data, especially when unexpected or unfamiliar data is introduced. What happens when an AI model encounters something it wasn’t trained on?
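One likely explanation, and this is an assumption about how such classifiers are commonly built rather than something checked against the tool, is that the model's final step turns its raw scores into percentages that must add up to 100% across the classes it knows. With only "apple" and "tomato" available, there is no way to answer "neither". The sketch below uses made-up raw scores to show the effect.

```python
import math

def softmax(scores):
    # Convert raw scores into probabilities that always sum to 1.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["apple", "tomato"]

# Hypothetical raw scores for a photo of a person: neither is a good match,
# but "apple" happens to score a little higher.
raw_scores = [1.9, 0.0]

probs = softmax(raw_scores)
for label, p in zip(labels, probs):
    print(f"{label}: {p:.0%}")
```

With these invented scores the output is 87% apple and 13% tomato. The high percentage does not mean the photo looks like an apple. It only means that, of the two options on offer, apple was the less bad fit.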

The Serious Side: The Impact of Data Bias

What started as a fun activity soon took on a more serious tone when we discussed real-world examples of AI bias. We looked at Amazon’s failed hiring algorithm, which was scrapped after it showed a strong bias against women, especially for technical roles. This happened because the model was trained on résumés from the company’s historical hiring data, which reflected a gender imbalance. In effect, the AI learned that men had been preferred for tech jobs, and it started to unfairly favour male candidates. Despite attempts to fix the algorithm by adjusting it and removing gendered language, the bias couldn’t be fully eliminated because the model was learning from historical patterns that were themselves biased.
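To make the mechanism concrete, here is a deliberately tiny sketch of how a model can pick up bias from past decisions. The records are entirely invented and the scoring method is a simple stand-in, not Amazon's system. Every candidate in the toy history has the same skills, but those whose CV mentions "women's" were hired less often in the past, so the model learns to mark that word down.

```python
from collections import defaultdict

# Invented toy history: (keywords on the CV, was the candidate hired?)
history = [
    ({"python", "chess"}, True),
    ({"python", "chess"}, True),
    ({"python", "chess"}, True),
    ({"python", "chess"}, False),
    ({"python", "chess", "women's"}, True),
    ({"python", "chess", "women's"}, False),
    ({"python", "chess", "women's"}, False),
    ({"python", "chess", "women's"}, False),
]

def keyword_hire_rates(records):
    # For each keyword, the fraction of past CVs containing it that were hired.
    seen = defaultdict(int)
    hired = defaultdict(int)
    for keywords, was_hired in records:
        for keyword in keywords:
            seen[keyword] += 1
            hired[keyword] += was_hired
    return {keyword: hired[keyword] / seen[keyword] for keyword in seen}

def score(cv_keywords, rates):
    # Score a new CV as the average historical hire rate of its keywords.
    known = [rates[k] for k in cv_keywords if k in rates]
    return sum(known) / len(known)

rates = keyword_hire_rates(history)
score_without = score({"python", "chess"}, rates)
score_with = score({"python", "chess", "women's"}, rates)

print(f"Learned rate for \"women's\": {rates[chr(34) and 'women' + chr(39) + 's']:.2f}")
print(f"CV without the word: {score_without:.2f}")
print(f"CV with the word:    {score_with:.2f}")
```

Nobody told the model to treat the two CVs differently. It simply reproduced the pattern in the decisions it was given, and deleting one word would not help if other words on the CV carried the same signal.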

We also examined how social media algorithms are designed to show content that users have already liked, creating a filter bubble. This bias towards previously liked content often means that users miss out on discovering new and diverse perspectives.

Gender Shades: A Powerful Video on AI Discrimination

We then watched the powerful video from “Gender Shades”, a research project led by Dr. Joy Buolamwini, which revealed that commercial facial analysis systems had higher error rates when classifying the gender of darker-skinned faces, particularly women’s, than for lighter-skinned and male faces. The study linked these gaps to a lack of diversity in the datasets used to build and test such systems. The project examined systems from IBM, Microsoft, and Face++, and the companies went on to improve their models after the results were published. The lesson was clear: without diverse data, AI systems can unintentionally discriminate against certain groups, and the impacts can be harmful.
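The key technique here is to report error rates separately for each group instead of quoting one overall figure. The sketch below uses invented numbers and placeholder group names, not the study's data, to show how a respectable overall accuracy can hide a large gap.

```python
from collections import defaultdict

# Invented test results: (group, was the prediction correct?)
# group_a is heavily represented in the test set; group_b is not.
results = (
    [("group_a", True)] * 99 + [("group_a", False)] * 1
    + [("group_b", True)] * 14 + [("group_b", False)] * 6
)

def overall_accuracy(records):
    # Share of all predictions that were correct, ignoring group.
    return sum(correct for _, correct in records) / len(records)

def error_rate_by_group(records):
    # Share of wrong predictions within each group.
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        errors[group] += not correct
    return {group: errors[group] / totals[group] for group in totals}

overall = overall_accuracy(results)
error_rates = error_rate_by_group(results)

print(f"Overall accuracy: {overall:.0%}")
for group, rate in error_rates.items():
    print(f"{group} error rate: {rate:.0%}")
```

In this made-up example the overall accuracy is about 94%, yet the error rate is 1% for group_a and 30% for group_b. A single headline number would have hidden that difference completely.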

The Takeaways: Fairness in AI Is Crucial

After exploring these real-world issues, we wrapped up the lesson by discussing both the benefits of AI and the risks associated with biased data. We reflected on the importance of making sure AI models are trained on fair, representative, and diverse datasets to prevent them from reinforcing harmful stereotypes or making unfair predictions. It’s also crucial to remember that AI bias can arise from more than just unbalanced data. It can also come from biased labelling (where human error or assumptions influence how data is categorised) and feature selection (deciding which factors the AI considers).

Conclusion: AI Is Only as Fair as Its Data

This lesson didn’t just teach students how to build and test an AI model; it also highlighted the ethical implications of biased data. By engaging with real-world examples and interactive activities, students gained a deeper understanding of how crucial it is to ensure fairness and transparency in AI development.

As students learn about AI, it’s essential for them to understand not only how AI works but also its ethical implications. Fields like AI ethics and data science are growing rapidly, and many professionals are dedicated to ensuring that AI benefits everyone fairly. By teaching the next generation to recognise and address AI bias, we’re empowering them to be part of the solution.

Key message: AI is powerful, but its fairness and accuracy depend heavily on the quality of the data we provide and the choices we make when preparing it. Teaching students to recognise and address bias is an essential step in preparing them for a future where AI continues to shape our world.