When gathering data, we also need to look at wildcards. They may not turn out to be wild cards at all, but they might be enough to shake our biases before those biases get baked into a digital solution.
I am not writing here only about racism, though that can be included: think about historic redlining affecting your ability to get a loan from an automated system. Don't know what I mean?
Here is IBM realizing that a common bias can become part of virtual reality and artificial intelligence simply because the designers have unstated assumptions. For example, anchoring bias: the assumption that the first thing we hear is most likely right. Or C.S. Lewis's chronological snobbery: the assumption that because we are more modern, we know more than previous generations. Or gender bias: when searching for an image of a "cop" or "pilot" you (wrongly) expect, and (wrongly) get, men, and when you google "nurse" you (wrongly) expect, and (wrongly) get, women.
These things can go really badly when there are medical conditions we stay quiet about, or when we treat the male body as the norm for a condition's symptoms. A big one is the symptoms of a heart attack. A friend of mine had a heart attack while out on a hike, and she did not go to the doctor for several hours because her symptoms presented as tiredness and indigestion, which are common presenting symptoms in the female body. Or think of when our data gap outright ignores whole people groups where those differences might matter.
This is happening everywhere, and you can read about it with regard to women in Caroline Criado Perez's excellent book Invisible Women. You can get around this by purposefully including groups (like women) as part of your research, not only recruiting around roles. Here I would argue that you also need a randomized wildcard: a draw that deliberately excludes the common traits you are looking for.
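The wildcard idea above can be sketched in code. This is a minimal, hypothetical illustration, not a prescribed method: all names (`recruit_panel`, `target_attr`, the candidate fields) are my own assumptions. The point is that after filling your quota of people who match the trait you are studying, you add a random draw from everyone who does not match it, so the panel can surface differences you were not looking for.

```python
import random

def recruit_panel(candidates, target_attr, quota, wildcard_count, seed=None):
    """Build a research panel.

    Fill `quota` slots with candidates matching the attribute you are
    studying, then add `wildcard_count` randomly drawn candidates who do
    NOT match it, to shake out assumptions baked into the selection.
    """
    rng = random.Random(seed)
    matching = [c for c in candidates if c[target_attr]]
    others = [c for c in candidates if not c[target_attr]]
    panel = rng.sample(matching, min(quota, len(matching)))
    panel += rng.sample(others, min(wildcard_count, len(others)))
    return panel

# Hypothetical usage: 30 candidates, 10 of whom match the trait under study.
candidates = [{"id": i, "matches_trait": i < 10} for i in range(30)]
panel = recruit_panel(candidates, "matches_trait", quota=5, wildcard_count=3)
```

The wildcard draw is small on purpose: it is not meant to be statistically representative, only to guarantee that the panel contains voices the selection criteria would otherwise have excluded.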
I am not saying this is the only answer, but we need to do better, lest we forget large swaths of humans in our supposedly human-centered solutions.
I’ve updated this post with newer articles.