Recently, many of our partners and clients have asked us: how do you handle bias in artificial intelligence and machine learning (Ai/ML)?
In our humble opinion, technology alone cannot solve the bias problem. And the reason is simple…
People often do not agree on whether bias exists or not!
Let’s look at a (made-up) computer vision example.
What is computer vision?
Computer vision is a scientific field that enables computers to gain a high-level understanding of digital images or videos.
Using images or video footage, the computer seeks to understand and automate tasks that humans can do, but at a much larger scale and a much faster pace. The computer identifies and classifies objects in the image or video, such as a human being. It can also recognize individual faces.
Machine learning then helps the computer intelligently react to that information by continually learning.
In our example, we have trained a facial-recognition algorithm. We found the algorithm achieved 90% precision in recognizing male faces. But female faces? There, the algorithm achieved only 80% precision.
Why is that? Is this algorithm biased?
Some people would say yes, while others may express uncertainty.
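For readers who want to see how per-group figures like the 90% and 80% above could be measured, here is a minimal sketch. It assumes a binary "match / no match" setup where precision is true positives divided by all positive predictions. The function name precision_by_group and the toy data are our own illustration, not part of any production system.

```python
from collections import defaultdict

def precision_by_group(y_true, y_pred, groups):
    """Return {group: precision}, with precision = TP / (TP + FP).

    Hypothetical helper for illustration. Groups that receive no
    positive predictions are left out of the result.
    """
    tp = defaultdict(int)  # true positives per group
    fp = defaultdict(int)  # false positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred == 1:
            if truth == 1:
                tp[group] += 1
            else:
                fp[group] += 1
    seen = set(tp) | set(fp)
    return {g: tp[g] / (tp[g] + fp[g]) for g in seen}

# Made-up toy data: 10 positive predictions per group.
# 9 of 10 are correct for "male", 8 of 10 for "female".
y_true = [1] * 9 + [0] * 1 + [1] * 8 + [0] * 2
y_pred = [1] * 20
groups = ["male"] * 10 + ["female"] * 10

scores = precision_by_group(y_true, y_pred, groups)
print(scores)
```

The point of splitting the metric this way is that a single overall score (85% here) would hide the gap between the two groups entirely.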
What if we said 100,000 male images and 10,000 female images were used to train the model? Now, with this information, do you think the algorithm is biased?
It’s more convincing that the algorithm is biased, given the number of images used for males and females. But say that we raise the number of female images to 100,000 and retrain the model, and the model’s performance doesn’t change.
Huh. That’s interesting. The same number of images is now used for males and females, and yet the model still recognizes male faces more accurately.
Again, we ask: is the algorithm biased? Many would hesitate to answer yes or no. What do you think?
To try to counteract the bias, even more images are added to the training set. Would adding 1,000,000 female images to the training set change anything?
Not even that many images changed anything. The model still achieved 90% on male faces and 80% on female faces. What is causing this disparity?
The uneven recognition may stem from technological limitations. It may simply be more difficult for the machine to recognize female faces.
At this juncture, we are faced with three choices:
The right choice depends on the business context. If the performance difference may cause bias, we should be cautious and practice awareness.
From our facial recognition example, we can see there are two factors to be considered:
To answer these two questions, we need to understand the model output differences on separate slices and dices. Sound familiar? (Re)introducing Business Intelligence (BI) for data insights.
At ElectrifAi, we have rigorous quality assurance (QA) checks in place to ensure our machine learning models are top-notch. We established an engineering process that checks the number of training records, model output distributions, and model performance for each important feature combination (e.g., age group, income level, location, gender, race).
These features are typically not used to train machine learning models but can be used to analyze the model’s results. This not only helps us detect bias in our models but also helps us understand what areas our models can improve.
Our machine learning models only pass our QA checks after proving they can perform well even in worst-case scenarios. We then provide a model test report to our clients to help them reduce bias in the final business solutions.
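As a rough illustration of what slice-level checks like these can look like, here is a minimal sketch. It is not our actual QA pipeline. The record layout, the function names slice_report and passes_worst_case, and the thresholds are all assumptions made for the example. It groups evaluation records by a combination of attributes, reports the record count, the share of positive outputs, and the accuracy per slice, and then applies a worst-case gate.

```python
from collections import defaultdict

def slice_report(records, slice_keys):
    """Summarise model results per slice (hypothetical helper).

    Each record is a dict holding the slice attributes plus
    'label' and 'prediction'. Returns {slice_tuple: summary}.
    """
    buckets = defaultdict(list)
    for rec in records:
        key = tuple(rec[k] for k in slice_keys)
        buckets[key].append(rec)

    report = {}
    for key, recs in buckets.items():
        n = len(recs)
        correct = sum(1 for r in recs if r["prediction"] == r["label"])
        positives = sum(1 for r in recs if r["prediction"] == 1)
        report[key] = {
            "count": n,                      # records in this slice
            "positive_rate": positives / n,  # output distribution
            "accuracy": correct / n,         # performance
        }
    return report

def passes_worst_case(report, min_accuracy, min_count):
    """True only if every slice meets both (assumed) thresholds."""
    return all(
        s["accuracy"] >= min_accuracy and s["count"] >= min_count
        for s in report.values()
    )

def make(gender, label, prediction):
    # Made-up record builder; the age group is fixed for brevity.
    return {"gender": gender, "age_group": "18-35",
            "label": label, "prediction": prediction}

records = (
    [make("male", 1, 1)] * 2 + [make("male", 0, 0)] * 2
    + [make("female", 1, 1)] * 2 + [make("female", 0, 0)]
    + [make("female", 1, 0)]  # the one error, in the female slice
)

report = slice_report(records, ["gender", "age_group"])
for key, summary in sorted(report.items()):
    print(key, summary)
print("passes at 0.8:", passes_worst_case(report, 0.8, 3))
```

Note that gender and age_group are used here only to group the results, in line with the point above that such features are typically kept out of training. The gate looks at the weakest slice instead of the average, so a model cannot pass on the strength of its best-served group.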
You might ask… why don’t we go even further and use Ai/ML to automatically detect and reduce bias in Ai/ML models?
Yes, we can actively work towards greater Ai/ML abilities in the effort to reduce bias. But … that’s a discussion for another day as it’s still a work in progress. We are always improving ourselves to be better, stronger, and more aware.
Do you know your data? What’s interesting is that today’s machine learning models in production rarely know their data, whether training or inference data. Data scientists consider that a significant weakness in a model.
To find out more about how to use your data effectively and get the best results, check out our upcoming blogs, “Know Your Data, Garbage In and You May Get Gold Out” and another blog that will reveal an exciting secret. Patents, auto-evolving machine learning models, and much more.