Before the Facebook app could automatically tag your friends in a photo, computers had to be taught to “see.” The road to artificial intelligence must pass through perception, the ability of machines to see. The field of computer vision enables such sight — it’s the science of extracting information from visual data, including images, videos, and scans. It’s computer vision that helps you deposit checks at an ATM or through your mobile phone, and that tags your friends’ pictures on social media.
How it works
Humans make sight look effortless. A two-month-old infant can recognize caregivers. A toddler knows the difference between a four-legged chair and a four-legged dog. But computers have only recently been able to recognize things in images.
Most of today’s computer vision algorithms are based on a branch of artificial intelligence (AI) called “machine learning,” in particular, “supervised learning.” To teach a computer to recognize a chair, you label thousands of images of chairs seen from all kinds of angles, along with images that don’t contain chairs, and feed them to a statistical model. With each example image, an algorithm trains the model, so that it can later correctly identify a chair in images it has never seen before.
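The supervised recipe described above — labeled examples in, a trained statistical model out — can be sketched in a few lines of code. This is purely a toy illustration: the “images” are made-up three-number feature vectors, and the “training” is a nearest-centroid rule standing in for the far more elaborate models used in real computer vision. All names and numbers here are invented for the sketch.

```python
import random

random.seed(0)

# Toy stand-in for labeled images: each "image" is a 3-number feature
# vector (entirely synthetic; real systems extract thousands of
# features per image). Label 1 = chair, 0 = not a chair.
def make_example(is_chair):
    base = [4.0, 1.0, 0.0] if is_chair else [0.0, 3.0, 2.0]
    return [v + random.uniform(-0.5, 0.5) for v in base], int(is_chair)

training_set = [make_example(i % 2 == 0) for i in range(1000)]

# "Training" here just averages the feature vectors of each class
# (a nearest-centroid model) -- the simplest possible statistical model.
def train(examples):
    sums = {0: [0.0, 0.0, 0.0], 1: [0.0, 0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for features, label in examples:
        counts[label] += 1
        for i, v in enumerate(features):
            sums[label][i] += v
    return {lbl: [s / counts[lbl] for s in sums[lbl]] for lbl in sums}

# Prediction: pick the class whose average example is closest.
def predict(model, features):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda lbl: dist(model[lbl], features))

model = train(training_set)

# The trained model classifies a vector it has never seen before.
unseen_chair_features, _ = make_example(True)
print(predict(model, unseen_chair_features))  # prints 1 (chair)
```

The key point the sketch preserves is that the model never memorizes individual images: it distills the labeled examples into a summary (here, two class averages) that generalizes to new inputs.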