Artificial Intelligence is meant to be taking over the world, but along the way, it seems to be helping out humans from time to time. Facebook debuted a new AI system that can describe photos in stunning detail, making the social network even more accessible.
The tool is called Automatic Alternative Text, and it dovetails with text-to-speech engines that allow blind people to use Facebook in other ways.
Using deep neural networks, the system can identify particular objects in a photo, from cars and boats to ice cream. It can pick out particular characteristics of the people in the photo, including smiles, beards and eyeglasses. And it can analyze a photo in a more general sense, determining that it depicts sun, ocean waves or snow. The text-to-speech engine will then read these things aloud.
The new audio photo captions will begin by describing the number of people in a photo and whether they are smiling, and then list each object detected, ordered by the algorithm’s confidence in what it is seeing. The image’s properties, such as whether it’s indoors, a selfie or a meme, will be announced at the end of the description.
For each image, the AI system returns a confidence score indicating how sure it is that it can identify what’s in the picture. If this is above 80 percent, an automatically generated caption appears. According to the engineers behind the system, that target is already being hit for half of all the pictures on the social network, and the underlying technology is getting better all the time.
However, it is not perfect and will continue to learn, the company said, noting that for now, Facebook’s automatic alternative text will begin with the words “image may contain” to convey uncertainty.
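The caption logic described above can be sketched in a few lines of code. This is an illustrative reconstruction, not Facebook's actual implementation: the function name, inputs, and the sample detections are all hypothetical, and only the ordering rules and the 80 percent threshold come from the article.

```python
# Illustrative sketch (not Facebook's code) of assembling an alt-text
# caption in the order the article describes: people first, then
# objects by detector confidence, then image properties last.

CONFIDENCE_THRESHOLD = 0.8  # detections below this are dropped

def build_alt_text(people_count, smiling, detections, properties):
    """Build a caption prefixed with "Image may contain" to convey
    uncertainty. `detections` is a list of (label, confidence) pairs."""
    parts = []
    if people_count:
        parts.append(f"{people_count} people" if people_count > 1 else "1 person")
        if smiling:
            parts.append("smiling")
    # Keep only confident detections, most confident first.
    confident = [label for label, score in
                 sorted(detections, key=lambda d: d[1], reverse=True)
                 if score >= CONFIDENCE_THRESHOLD]
    parts.extend(confident)
    parts.extend(properties)  # e.g. indoors / selfie / meme, announced last
    if not parts:
        return None  # nothing confident enough to caption
    return "Image may contain: " + ", ".join(parts)

caption = build_alt_text(
    people_count=2, smiling=True,
    detections=[("ocean", 0.93), ("ice cream", 0.55), ("sunglasses", 0.88)],
    properties=["outdoor"],
)
# "ice cream" falls below the threshold, so the caption lists only the
# people, the two confident objects, and the image property.
```

In this sketch, an image with no confident detections simply gets no caption, mirroring the behavior of only showing auto-generated text above the threshold.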
The experience is currently only available on iOS devices and in English. However, Facebook plans to bring automatic alternative text to other platforms and to support more languages in the future.
Facebook’s scale is enormous: each day, users upload 2 billion photos across Facebook, Instagram, Messenger, and WhatsApp. And so, the accessibility team turned to Facebook’s AI division, which is building software that recognizes images automatically. “We need a solution to that problem if people who cannot see photos and understand what’s in them are going to be part of the community and get the same enjoyment and benefit out of the platform as the people who can,” says Matt King, an accessibility specialist at Facebook.
At this stage, automatic alt tags represent a fascinating demonstration of technology. But at scale, they could also represent a growth opportunity: people with disabilities have been less likely to use Facebook on average, for obvious reasons. “Inclusion is really powerful and exclusion is really painful,” King says. “The impact of doing something like this is really telling people who are blind, your ability to participate in the social conversation that is going on around the world is really important to us. It is saying as a person, you matter, and we care about you. We want to include everybody and we will do what it takes to include everybody.”
Facebook is not alone in using machine learning to understand photos; it is one of a few things AI can currently do with any level of sophistication. Similar technology powers keyword searches in Google Photos and Flickr. But the technology is still prone to errors, and millions of objects have yet to be parsed. Last year, Google was forced to apologize after Photos tagged two black people as “gorillas.”