Hi, Dina here. Last week, the New York Daily News reported that Google, via contractor Randstad NV, was collecting images of people with darker skin using misleading methods that targeted the homeless and lied to test subjects about the fact that their pictures were being recorded and stored.
Why would Google do this? It needs more images of people of color to make sure its artificial intelligence software recognizes users regardless of skin tone. Studies have found that facial recognition software has biases that cause it to work far worse on people with darker skin, especially women. Part of that is down to a lack of diverse faces in the images that are used to train AI models.
Google and Randstad have suspended the research program. "We're taking these claims seriously and investigating them," a Google spokeswoman said. "The allegations regarding truthfulness and consent are in violation of our requirements for volunteer research studies and the training that we provided."
It's an issue that continues to crop up as AI systems require ever more data to cover all possibilities the software might be asked to answer questions about, classify or make predictions on. Chinese AI companies are striking deals to provide African countries with facial recognition surveillance technology, which in turn gives the companies African facial images to improve their models. In June, Microsoft pulled MS Celeb, a widely used database of images used to train this kind of AI, after the Financial Times found that some of the images were not of celebrities at all but of private individuals whose images were scraped off the web. Then there are digital assistants like Amazon.com Inc.'s Alexa that need voice recordings and transcripts to improve. And medical applications that aim to match cancers to promising therapies fare best when trained and retrained on massive and varied data sets.
This raises new challenges for how to obtain the information, and new land-grabs by companies for whom data is the new oil. Companies must be held accountable for obtaining information fairly and with clear consent, while finding ways to strip personally identifiable information from the data where possible. (With pictures of faces, that's difficult.) The extra scrutiny may mean there's less information to train AI systems, and that could leave them incomplete or biased. New algorithms are being created that are less data-hungry, but these aren't ready yet. There is another option: Companies like Google can stop widely distributing the software until they can find an ethical way to perfect it.
--Dina Bass