Peripheral vision allows humans to see shapes that are not directly in our line of sight, albeit with less detail. This ability expands our field of vision and is helpful in many situations, such as detecting a vehicle approaching our car from the side.
Unlike humans, AIs do not have peripheral vision. Equipping computer vision models with this capability could help them more effectively detect approaching hazards or predict whether human drivers will notice an approaching object.
Taking a step in this direction, MIT researchers developed a dataset of images that simulates peripheral vision in a machine learning model. They found that training models on this dataset improved their ability to detect objects in the visual periphery, although the models still performed worse than humans.
The results also showed that, unlike in humans, neither the size of objects nor the amount of visual clutter in a scene had a strong impact on the AI’s performance.
“There is something fundamental going on here. We tested a lot of different models, and while we got a little better at training them, they weren’t quite the same as humans. So the question is, what is missing from this model?” says Vasha DuTell, a postdoctoral fellow and co-author of a paper detailing the study.
Answering this question can help researchers build machine learning models that can see the world like humans do. In addition to improving driver safety, these models could be used to develop displays that are easier for people to see.
Additionally, a deeper understanding of peripheral vision in AI models could help researchers better predict human behavior, added lead author Anne Harrington MEng ’23.
“Modeling peripheral vision can help us understand the features of a visual scene that make the eyes move to gather more information, if we can actually capture the essence of what is represented around us,” she explains.
Co-authors include Mark Hamilton, a graduate student in electrical engineering and computer science; Ayush Tewari, a postdoctoral fellow; Simon Stent, research manager at the Toyota Research Institute; senior author William T. Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Ruth Rosenholtz, senior research scientist in the Department of Brain and Cognitive Sciences and a CSAIL member. The research will be presented at the International Conference on Learning Representations.
“Anytime a person interacts with a machine, whether it’s a car, robot, or user interface, it’s critical to understand what the person can see. Peripheral vision plays an important role in that understanding,” says Rosenholtz.
Simulating peripheral vision
Extend your arm in front of you and hold your thumb up. The small area around your thumbnail is seen by your fovea, the small depression in the center of the retina that provides the sharpest vision. Everything else you can see is in your visual periphery. Your visual cortex represents a scene with less detail and reliability the farther it is from that sharp point of focus.
Many existing approaches to modeling peripheral vision in AI represent this deteriorating detail by blurring the edges of the image, but the information loss that occurs in the optic nerve and visual cortex is much more complex.
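For illustration, here is a minimal Python sketch of that simpler blur-based approach, using NumPy and SciPy. The function name, ring count, and blur strengths are assumptions for demonstration, not the researchers’ code.

import numpy as np
from scipy.ndimage import gaussian_filter

def foveated_blur(image, fixation, n_rings=6, max_sigma=8.0):
    """Blend progressively blurred copies of a grayscale image,
    keyed to each pixel's distance from a fixation point."""
    h, w = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Eccentricity: normalized distance of each pixel from the fixation point.
    ecc = np.hypot(rows - fixation[0], cols - fixation[1])
    ecc = ecc / ecc.max()
    out = np.zeros_like(image)
    edges = np.linspace(0.0, 1.0, n_rings + 1)
    for i in range(n_rings):
        # Blur strength grows with eccentricity, mimicking the loss of detail.
        sigma = max_sigma * i / (n_rings - 1)
        blurred = gaussian_filter(image, sigma=sigma) if sigma > 0 else image
        ring = (ecc >= edges[i]) & (ecc < edges[i + 1] + (i == n_rings - 1))
        out[ring] = blurred[ring]
    return out

# Example: sharp at the center, increasingly blurred toward the borders.
img = np.random.rand(128, 128)
periph = foveated_blur(img, fixation=(64, 64))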
For a more accurate approach, the MIT researchers started with techniques used to model peripheral vision in humans. Known as a texture tiling model, this method transforms images to represent the loss of human visual information.
They modified this model so that it could transform images similarly, but in a more flexible way that didn’t require humans or AI to know in advance where to direct their gaze.
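As a rough illustration of the idea behind texture-based transforms, the toy Python sketch below scrambles pixels within blocks that grow with distance from a fixation point, destroying spatial detail while preserving one simple local statistic (the pixel histogram). The real texture tiling model matches far richer texture statistics, and the researchers’ modified version removes the need for a predetermined gaze point; the block growth rate and fixation handling here are purely illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def toy_texture_tiling(image, fixation, growth=0.08, min_block=1, max_block=32):
    """Scramble pixels inside blocks whose size grows with eccentricity."""
    out = image.copy()
    h, w = image.shape
    r = 0
    while r < h:
        # Row-strip height from this row's distance to the fixation row.
        b_row = int(np.clip(growth * abs(r - fixation[0]), min_block, max_block))
        c = 0
        while c < w:
            # Block width from the 2-D distance to the fixation point.
            dist = np.hypot(r - fixation[0], c - fixation[1])
            b = int(np.clip(growth * dist, min_block, max_block))
            block = out[r:r + b_row, c:c + b]
            flat = block.ravel()
            rng.shuffle(flat)                       # destroy spatial detail...
            block[...] = flat.reshape(block.shape)  # ...but keep the histogram
            c += b
        r += b_row
    return out

Near the fixation point the blocks shrink to single pixels, so the “fovea” stays intact, while the periphery dissolves into texture-like patches.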
“This allows us to faithfully model peripheral vision in the same way that is done in human vision studies,” Harrington says.
The researchers used this modified technique to generate a huge dataset of transformed images that look more texture-like in certain areas, representing the loss of detail that occurs when a human looks toward the periphery.
The researchers then used this dataset to train several computer vision models and compared their performance with that of humans on an object detection task.
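For a sense of what assembling such a dataset might look like in practice, here is a hypothetical Python sketch that applies a periphery-simulating transform (such as the toy one sketched above) to a folder of images; the paths, file format, and transform signature are placeholders, not the study’s pipeline.

from pathlib import Path
import numpy as np
from PIL import Image

def build_dataset(src_dir, dst_dir, transform):
    """Apply a periphery-simulating transform to every PNG in src_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.png")):
        img = np.asarray(Image.open(path).convert("L"), dtype=float) / 255.0
        h, w = img.shape
        # Fixation at the image center here; the study's modified transform
        # avoids needing a predetermined gaze point.
        out = transform(img, fixation=(h // 2, w // 2))
        Image.fromarray((out * 255).astype(np.uint8)).save(dst / path.name)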
“We had to be very clever in how we set up the experiments so we could also test the machine learning models. We didn’t want to have to retrain the models on a toy task they weren’t meant to do,” she says.
Peculiar performance
Humans and models were shown pairs of transformed images that were identical, except that one image of each pair contained a target object in the periphery. Each was then asked to pick the image containing the target object.
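This is a two-alternative forced-choice setup. A minimal sketch of scoring a model on such a task, assuming a hypothetical PyTorch model that outputs a scalar “target present” score, might look like this:

import torch

@torch.no_grad()
def two_afc_accuracy(model, pairs):
    """pairs: iterable of (image_with_target, image_without_target)
    tensors, each shaped (channels, height, width)."""
    model.eval()
    correct = 0
    total = 0
    for with_target, without_target in pairs:
        # The model is scored as correct when it rates the image that
        # actually contains the target higher than its counterpart.
        score_pos = model(with_target.unsqueeze(0)).squeeze()
        score_neg = model(without_target.unsqueeze(0)).squeeze()
        correct += int(score_pos > score_neg)
        total += 1
    return correct / total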
“One thing that really surprised us was how good people were at detecting objects in their periphery. We went through at least 10 different sets of images that were just too easy. We had to keep using smaller and smaller objects,” adds Harrington.
The researchers found that training models from scratch on the dataset produced the largest gains, improving the models’ ability to detect and recognize objects. The gains were smaller when the dataset was used to fine-tune a pre-trained model, a process that adapts it to perform new tasks.
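To make the distinction concrete, here is a minimal PyTorch/torchvision sketch of the two regimes; the architecture, two-class head, frozen-backbone variant, and hyperparameters are placeholder assumptions, not the study’s settings.

import torch
from torchvision import models

# From scratch: random weights, every parameter is learned from the new data.
scratch = models.resnet18(weights=None)
scratch.fc = torch.nn.Linear(scratch.fc.in_features, 2)  # e.g., target present/absent

# Fine-tuning: start from ImageNet weights and adapt them to the new task.
finetune = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in finetune.parameters():
    p.requires_grad = False          # freeze the pre-trained backbone (one common variant)
finetune.fc = torch.nn.Linear(finetune.fc.in_features, 2)  # retrain only the head

opt_scratch = torch.optim.SGD(scratch.parameters(), lr=0.1, momentum=0.9)
opt_finetune = torch.optim.SGD(finetune.fc.parameters(), lr=0.01, momentum=0.9)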
But in all cases, the machines fell short of humans, especially at detecting objects in the far periphery. Their performance also did not follow the same patterns as human performance.
“This may suggest that the models are not using context in the same way humans do to perform these detection tasks. The models’ strategies may be different,” says Harrington.
The researchers plan to continue exploring these differences, with the goal of finding a model that can predict human performance in the visual periphery. This could, for example, enable AI systems that warn drivers of hazards they may not see. They also hope to encourage other researchers to conduct additional computer vision research with their publicly available dataset.
“This work makes an important contribution to the understanding that human peripheral vision should not be considered simply impoverished vision due to the limited number of photoreceptors we have, but rather a representation that is optimized for performing important real-world tasks,” says Justin Gardner, associate professor of psychology at Stanford University. “Furthermore, the study shows that, despite recent advances, neural network models are still unable to match human performance in this regard, which should lead to more AI research that learns from the neuroscience of human vision. This future research will benefit greatly from the database of images the authors provide to mimic peripheral human vision.”
This work is supported in part by the Toyota Research Institute and the MIT CSAIL METEOR Fellowship.