Artificial intelligence has made some impressive leaps forward in recent years thanks to algorithms that learn from large amounts of training data. This approach, known as machine learning, has taught algorithms to recognize faces, transcribe speech, and respond to typed or spoken prompts with impressive skill. But for all the technology's progress, errors as simple as mislabeling can undermine even the best AI, and MIT researchers just found that some of the field's most historically important data sets are riddled with them.

If an algorithm is tested on data containing labeling errors, it may appear better, or worse, than it actually is. For instance, if an algorithm decides an image is 70 percent likely to be a cat but its label says "spoon," the more plausible explanation is that the image is mislabeled and actually shows a cat. The algorithm is correct even though some of the data it's fed is not. But when that faulty label is used to score the algorithm's accuracy, the mistake can suggest the AI is better, or worse, than it really is in practice.

Similar errors may be lurking in the big data sets used to develop algorithms for various industrial uses of AI. Millions of annotated images of road scenes, for example, are fed to algorithms that help autonomous vehicles perceive obstacles on the road. Vast collections of labeled medical records likewise help the technology predict a person's likelihood of developing a particular disease.

Read more about the researchers' findings, and what the industry should do about them, here.

Will Knight | Senior Writer, WIRED
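The confidence check in the cat-versus-spoon example above can be sketched in a few lines. This is a minimal, hypothetical illustration (function and variable names are my own, not the researchers' method): if a model is highly confident an example belongs to a class other than its given label, flag that label as a likely annotation error.

```python
def flag_suspect_labels(probs, given_labels, threshold=0.7):
    """Return indices of examples whose given label looks wrong.

    probs: list of per-class probability lists, one per example.
    given_labels: list of integer class indices, one per example.
    threshold: minimum model confidence in a *different* class
        required before we flag the given label as suspect.
    """
    suspects = []
    for i, (p, label) in enumerate(zip(probs, given_labels)):
        # Class the model believes in most strongly.
        predicted = max(range(len(p)), key=p.__getitem__)
        if predicted != label and p[predicted] >= threshold:
            suspects.append(i)
    return suspects

# Hypothetical classes mirroring the article's example.
CAT, DOG, SPOON = 0, 1, 2
probs = [
    [0.70, 0.20, 0.10],  # model says 70% "cat", labeled "spoon" -> suspect
    [0.30, 0.40, 0.30],  # low confidence everywhere -> not flagged
    [0.05, 0.90, 0.05],  # confident "dog", labeled "dog" -> fine
]
labels = [SPOON, CAT, DOG]
print(flag_suspect_labels(probs, labels))  # prints [0]
```

Real label-error-finding methods are more careful than this sketch, for example calibrating the threshold per class rather than using one fixed cutoff, but the core idea is the same: disagreement between a confident model and the given label is evidence against the label, not just against the model.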