Computers have gotten really good at visual recognition. They can sometimes rival humans at recognizing the objects in a series of images. But does the similar end result mean that computers are mimicking the human visual system? Answering that question could reveal whether there are still areas where computer systems can't keep up with humans.
A new paper in PNAS examines just how different computer and human visual systems are.
The difference really boils down to a flexibility that human brains have and computers don't. It's much the same problem that speech recognition systems face: humans can figure out that a mangled word "meant" something recognizable, while a computer can't. Likewise with images: humans can piece together what a blurry image might depict from small clues in the picture, where a computer would be at a loss.
The authors of the PNAS paper used a set of blurry, tricky images to pinpoint the differences between computer vision models and the human brain. They used pictures called “minimal recognizable configurations” (MIRCs) that were either so small or so low-resolution that any further reduction would prevent a person from being able to recognize them.
They built this set by showing thousands of people on Amazon Mechanical Turk a series of progressively smaller and lower-resolution versions of each picture and noting the last level at which the image could still be recognized. That last recognizable version was designated an MIRC; anything reduced beyond it, to the point where people could no longer recognize it, was called a sub-MIRC.
The first and most obvious comparison is whether humans and computers recognize MIRCs and sub-MIRCs at similar rates. To test this, the researchers selected all the MIRCs that humans recognized correctly more than 65 percent of the time, along with a group of sub-MIRCs recognized correctly less than 20 percent of the time. The computer models didn't perform very well on these images: they correctly classified only around seven percent of the MIRCs and two percent of the sub-MIRCs. That's a win for the humans.
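To make the selection procedure concrete, here is a minimal sketch in Python. It assumes each picture yields a chain of recognition rates, one per reduction step, and it uses a hypothetical 50 percent cutoff for "recognizable" (the article doesn't state the cutoff used to locate MIRCs). The function names `find_mirc` and `in_test_set` are invented for illustration; only the 65 and 20 percent thresholds come from the text above, and the sample rates are made up.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical cutoff for "recognizable"; the article does not give the real one.
RECOGNIZED = 0.5


def find_mirc(rates: Sequence[float], threshold: float = RECOGNIZED) -> Optional[Tuple[int, int]]:
    """Return (mirc_step, sub_mirc_step) for one chain of reductions.

    rates[i] is the fraction of viewers who correctly named the image at
    reduction step i (0 = starting image; each later step is smaller or
    lower-resolution). Returns None if no recognizable-to-unrecognizable
    transition is found.
    """
    if not rates or rates[0] < threshold:
        return None  # the starting image itself is not recognizable
    for step in range(1, len(rates)):
        if rates[step] < threshold:
            # The previous step was the last recognizable one: the MIRC.
            return step - 1, step
    return None  # recognition never dropped below the cutoff in this chain


def in_test_set(rates: Sequence[float], pair: Tuple[int, int],
                mirc_min: float = 0.65, sub_max: float = 0.20) -> bool:
    """Keep a pair only if the MIRC is recognized by more than 65% of people
    and the sub-MIRC by less than 20%, the thresholds described above."""
    mirc, sub = pair
    return rates[mirc] > mirc_min and rates[sub] < sub_max


chain = [0.98, 0.96, 0.93, 0.03]  # invented recognition rates for one picture
pair = find_mirc(chain)
print(pair)                      # (2, 3)
print(in_test_set(chain, pair))  # True
```

Scanning forward and stopping at the first unrecognizable step mirrors the idea that an MIRC is the point where any further reduction breaks recognition.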
There was also a dramatic difference in the way that recognition broke down. For people, recognition of MIRCs suddenly fell off a cliff at a particular level: the last recognizable image might be identified correctly by 93 percent of people, but after a tiny change, the resulting sub-MIRC was identified by only three percent.
Computers didn’t show this sharp drop-off. “None of the models came close to replicating the large drop shown in human recognition,” the authors write.
The computer models did better after they were trained specifically on the MIRCs, but their accuracy was still low compared to human performance. The reason, the authors suggest, is that the models can't pick out the individual components of an image, whereas humans can. For instance, in a blurry picture of just the head and wings of an eagle, people could point to the smudges that represented the eye, beak, and wing. This kind of interpretation is "beyond the capacities of current neural network models," the authors write.
Overall, this means that computers can do really well at image recognition, but the processes they rely on aren't a very close approximation of how humans handle the same task. They don't use the individual components of an image to work out what it shows, and so they aren't as good as we are at interpreting an image from minimal information.
Ultimately, we may need to figure out what’s going on in our own brains in order to get our computer models working better. It’s possible that humans first figure out what an image might be and then look for individual features that confirm or contradict this initial idea. If this is the case, then it’s clear that current computer models work very differently.
Until we figure out our own heads, however, we won’t be able to get our computers to match them.