Fairness here does not mean being white or glowy!!!😂😂
Disclaimer: This article contains examples of unethical practices in the field of machine learning and artificial intelligence. These examples are intended solely for educational and awareness purposes. We do not endorse or condone any unethical behavior, and we acknowledge the importance of addressing these issues to build fair, unbiased, and ethical AI systems. Our goal is to promote a better understanding of the challenges within the AI industry and to encourage a collective effort to rectify and improve these aspects. Please read with the intention of fostering positive change within the AI and ML community.
- Definition of ML Fairness
- Where can bias enter an ML pipeline?
- A few examples of how we can remove bias
These days, ML fairness is a hot topic and everybody is talking about it. Most companies want fair models, and many are trying to make their existing models fair too. So, before we start making our own model fair, let us understand what fairness means.
Let’s look at these images:
- What do we see here?
We see that the first three images have labels like ceremony, bride, groom, and wedding. The last image has just two labels: person and people.
- What do we understand from this?
We understand that the model cannot recognize an Indian Hindu-style wedding. This indicates that the model is being unfair to the people in that image.
- Why couldn’t the model recognize that?
The training dataset provided to the model most likely had few or no images of people marrying in the Indian Hindu style.
- What could be the solution to it?
The data fed to the model was geographically skewed. So, one fix is to add more pictures of the under-represented wedding styles to the data and retrain our model. After that, we should be able to see the desired results.
In machine learning, an algorithm is said to be fair, or to have fairness, if its results are independent of given variables, especially those considered sensitive: traits of individuals that should not correlate with the outcome (e.g., gender, ethnicity, sexual orientation, or disability).
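One common way to make this definition measurable is demographic parity: compare the rate of positive predictions across the groups defined by a sensitive attribute. The snippet below is only a minimal sketch of that idea, not a full fairness audit. The function names, the toy predictions, and the group labels "a" and "b" are all made up for illustration.

```python
from collections import defaultdict

def positive_rates(predictions, groups):
    """Return the fraction of positive (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Toy data (hypothetical): 1 = favourable outcome, 0 = unfavourable.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(positive_rates(preds, groups))          # {'a': 0.75, 'b': 0.25}
print(demographic_parity_gap(preds, groups))  # 0.5
```

A gap of 0 would mean both groups receive favourable outcomes at the same rate. A large gap, like the 0.5 here, is a signal to look more closely at the data and the model.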
A product, a technology, or anything else that we build for an entire group of people shouldn’t fail for any individual in it.
Unfairness can enter the system at any point in the ML pipeline, from data collection and handling to model training to end-use.
So, we can broadly divide the pipeline into 3 main categories:
- Fairness by Data
- Fairness by Measurement and Modelling
- Fairness by Design
- Here we can collect data from all around the world and then train our model.
- It will be difficult to do that in one go, so we have to keep collecting data and re-training our model.
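Before collecting more data, it helps to know which regions are actually missing. The sketch below assumes each training example carries a region tag, which is an assumption and not something every dataset has. The function name `underrepresented`, the 10% threshold, and the region counts are hypothetical.

```python
from collections import Counter

def underrepresented(regions, min_share=0.10):
    """Return regions whose share of the dataset falls below min_share."""
    counts = Counter(regions)
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items() if c / total < min_share}

# Hypothetical region tag for each training image.
image_regions = ["north_america"] * 60 + ["europe"] * 35 + ["south_asia"] * 5

print(underrepresented(image_regions))  # {'south_asia': 0.05}
```

Running a check like this after every new round of data collection shows whether the skew is actually shrinking.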
In one experiment, two people from anywhere could have a conversation, and a model was supposed to score the toxicity of every message sent. Everything worked fine until an issue came up, which is shown in the image below. Each message is text sent from one person to the other, and the small square above it shows the toxicity level assigned to that sentence.
Understanding what went wrong
- The issue was that the person who wrote “I am Gay” was just stating their sexual orientation, yet the sentence was considered toxic, which is wrong.
- Words like gay, lesbian, and bisexual are often used to abuse someone, which is ethically wrong. Because most of the conversations in the training data used them that way, the model learned to associate these words with toxicity.
- That is why the sentence was scored as highly toxic.
To solve this, we can do two things:
- The first is simple: collect more data containing non-toxic uses of words like gay and lesbian, then train the model again. We might get better results.
- The second is to combine the toxicity model with sentiment analysis, which should be able to tell that this is a neutral sentence.
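Before collecting more data, we can measure how skewed the labels around identity terms are. The sketch below computes, for each term, the share of messages containing it that are labelled toxic, and compares an identity term with a neutral word. It illustrates the audit only and is not the method used in the experiment above. The messages, labels, and the function name `toxic_rate_by_term` are invented for this example.

```python
def toxic_rate_by_term(examples, terms):
    """For each term, the share of messages containing it that are labelled toxic."""
    rates = {}
    for term in terms:
        # Match whole lower-cased tokens so "tall" does not match "taller".
        labels = [toxic for text, toxic in examples if term in text.lower().split()]
        if labels:
            rates[term] = sum(labels) / len(labels)
    return rates

# Hypothetical labelled messages: (text, is_toxic)
examples = [
    ("you are so gay", 1),
    ("ugh that is so gay", 1),
    ("shut up you gay idiot", 1),
    ("i am gay", 0),
    ("i am tall", 0),
    ("my brother is tall", 0),
    ("tall people are the worst", 1),
    ("she is tall", 0),
]

print(toxic_rate_by_term(examples, ["gay", "tall"]))  # {'gay': 0.75, 'tall': 0.25}
```

If an identity term shows a much higher toxic rate than neutral words, the model is likely to learn the term itself as a toxicity signal, and that term is where extra non-toxic examples are most needed.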
The English word “friend” translates to “amigo” in Spanish for a male friend and “amiga” for a female friend. So if we use Google Translate to convert from English to Spanish, how will the translator know whether the user is talking about a girl or a guy?
To tackle this issue, Google simply shows both words, as can be seen in the image below.
Sometimes, as in the example above, a solution already exists and we just need to tweak our GUI to present it in a bias-free way.
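The design idea can be sketched in a few lines: when the source word does not specify gender, return every gendered form and let the interface display all of them instead of silently picking one. This is a toy illustration and says nothing about how Google Translate is actually implemented. The `LEXICON` dictionary and the `translate_word` function are hypothetical.

```python
# Tiny hypothetical lexicon: English word -> Spanish forms by grammatical gender.
LEXICON = {
    "friend": {"masculine": "amigo", "feminine": "amiga"},
    "house": {"feminine": "casa"},
}

def translate_word(word):
    """Return every gendered Spanish form so the UI can show all of them."""
    forms = LEXICON.get(word.lower())
    if forms is None:
        return []
    return [f"{form} ({gender})" for gender, form in forms.items()]

print(translate_word("friend"))  # ['amigo (masculine)', 'amiga (feminine)']
print(translate_word("house"))   # ['casa (feminine)']
```

The model does not have to guess here. The ambiguity is passed on to the user, who knows which form they meant.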
That’s it for Part 1!! Stay tuned for Part 2, where I will explain the research paper “Delayed Impact of Fair Machine Learning”.
Pssss….It’s a secret!! Don’t tell anyone that I will be releasing Part 3 which will be a Comic that will explain both Part 1 and Part 2!! If you found this boring then you can wait for it and go through that.