Microsoft announced today that it has developed a new AI system that automatically generates captions for images. The capability launches first in Azure Cognitive Services today and will soon roll out to Microsoft Word, Outlook, and PowerPoint, the company said.
The researchers first pre-trained the AI model on a huge dataset of images paired with word tags, with each tag mapped to a distinct object in an image. They then fine-tuned the pre-trained model for captioning on a dataset of images that already had captions. This training process taught the model how to compose a sentence. The new model draws on this visual vocabulary to accurately caption images that contain novel objects.
The researchers admit the AI isn’t perfect, but they say it is twice as good as the image captioning model currently used in the company’s products and services. They also found that it can write captions that are more descriptive and accurate than captions written by humans.
“We’re taking this AI breakthrough to Azure as a platform to serve a broader set of customers,” said Xuedong Huang, a Microsoft technical fellow and the chief technology officer of Azure AI Cognitive Services in Redmond, Washington. “It is not just a breakthrough on the research; the time it took to turn that breakthrough into production on Azure is also a breakthrough.”
For the last couple of years, Microsoft has been pursuing an ambitious goal of infusing AI across its products and services to improve productivity. With this new automatic image captioning system, Microsoft aims to help all users, especially people with vision impairments, access the important content in any image.
What do you think about the project, and how could the researchers improve it? Let us know in the comments below.