What is… computer vision?

A monthly tech explainer series about the technology shaping our world today, from the Garage.

By Poornima Apte — October 13, 2022

Before the Facebook app could automatically tag your friends in a photo, computers had to be taught to “see.” The road to artificial intelligence must pass through perception, the ability of machines to see. The field of computer vision enables such sight: it is the science of extracting information from visual data, including images, videos, and scans. It’s computer vision that lets you deposit checks at an ATM or through your mobile phone and that tags your friends’ pictures on social media.

How it works

Humans make sight look effortless. A two-month-old infant can recognize caregivers. A toddler knows the difference between a four-legged chair and a four-legged dog. But computers have only recently been able to recognize things in images.

Most of today’s computer vision algorithms are based on a branch of artificial intelligence (AI) called “machine learning,” in particular “supervised learning.” To teach a computer to recognize a chair, you label thousands of images of chairs photographed from all kinds of angles, along with images that don’t contain chairs, and feed them to a statistical model. With each example image, an algorithm adjusts the model so that it can later correctly identify a chair in images it has never seen before.
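
For a concrete, if simplified, picture of that training loop, here is a minimal sketch in Python. It leans on the scikit-learn library and its small built-in dataset of handwritten digits as a stand-in for thousands of labeled chair photos, and a basic classifier as a stand-in for a production vision model; the dataset, model choice, and accuracy check are illustrative assumptions, not the pipeline behind any product mentioned in this article.

# A minimal sketch of supervised learning for image recognition.
# Assumption: scikit-learn's bundled 8x8 digit images stand in for
# "thousands of labeled chair photos," and logistic regression stands
# in for a production computer vision model.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load labeled example images: each 8x8 grayscale digit comes with
# the correct answer (0 through 9) attached.
digits = load_digits()
images, labels = digits.data, digits.target

# Hold some images back so the model can be tested on pictures it has never seen.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.25, random_state=0
)

# Training: the algorithm adjusts the model using the labeled examples.
model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)

# The trained model now labels unseen images on its own.
predictions = model.predict(X_test)
print(f"Accuracy on unseen images: {accuracy_score(y_test, predictions):.2f}")

Swap the digit images for labeled photographs and the simple classifier for a deep neural network, and the same recipe scales up to recognizing chairs, faces, or street signs.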

Illustration by Eric Chow

The a-ha moment

Computer vision has been used in some form since the 1950s, when machines could process checks. For this process, called magnetic ink character recognition (MICR), to work, the numbers had to be printed along the bottom edge of the check in a specific shape and size. The advent of more muscular computing power in the 1970s enabled computer vision algorithms called optical character recognition (OCR) to recognize characters from scanned or photographed images.

Today’s machine learning models can recognize handwritten digits or faces in photographs. In the late 2000s, with phone cameras and the internet making a wealth of photographic images available, researchers built ImageNet, an online database of millions of everyday images. The library was used in an open contest that let researchers test the performance of their computer vision algorithms against one another. In 2012, researchers from Canada debuted a deep learning model that aced the test and blew away the competition: its error rate was roughly 40 percent lower than that of the next-best entry.

What computer vision is used for today

Today, computer vision enables the “lane assist” function on your vehicle and can find COVID-19 in X-rays ten times faster than radiologists, and with greater accuracy. The technology also helps detect cancer earlier and enables automatic toll collection for vehicles without transponders by photographing license plates and mailing the owners an invoice. Computer vision helps HP find manufacturing defects on the production line and assemble the best photo collages before sending them to be printed.

How computer vision might change the world

Computer vision is the primary pillar of autonomous driving, and its role will likely grow as vehicles take on more self-driving functions. Before we get there, scientists will have to understand how machines arrive at the decisions they make and whether those decisions align with ours. In the future, expect drones equipped with computer vision to execute search-and-rescue missions in remote locations or to help submarines track marine ecosystems autonomously. Computer vision might also deliver more capable robots able to map their surroundings and perform basic functions with ease. A domestic robot butler? Why not?
