Researchers use bots and artificial intelligence to automatically tag and title videos

Laurent Giret

If you have already uploaded some of your pictures to OneDrive, you may be aware that Microsoft’s cloud storage service can automatically tag your photos, categorize them, group them by location, and more. By adding this metadata to user-generated content, Microsoft’s artificial intelligence tools also make it easier for OneDrive users to find relevant pictures using OneDrive’s search feature.

But could artificial intelligence accomplish the same sort of magic with video content? That’s exactly what Chia-Wen Lin and Min Sun, professors in the Electrical Engineering department of National Tsing Hua University in Taiwan, are trying to do. In a new post on the Microsoft Research blog, the company explains that both professors partnered in 2015 with Dr. Tao Mei, a lead researcher in multimedia at Microsoft Research Asia, who worked on a new image recognition, segmentation, and captioning dataset called COCO (Common Objects in Context).

Using the dataset, the professors built a system that leverages bots and artificial intelligence to determine the highlights of a video, add a title to it, and suggest people with whom to share it:

Professor Sun created a video title generation method based on deep learning to automatically find the special moments—or highlights—in videos, and generate an accurate and interesting title for the highlights. In parallel, Professor Lin developed a method to detect and cluster the faces in videos to provide richer summaries of the videos and relevant suggestions about whom to share them with. Working together, their algorithms can detect highlights, generate descriptions of highlights and tag potential viewers of user-generated videos.
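
To give a rough idea of what "detect and cluster the faces" can mean in practice, here is a minimal Python sketch. It is not the researchers' method, which the blog post does not detail. It assumes each detected face has already been turned into a numeric embedding vector by some face recognition model, and the function names (`cluster_faces`, `rank_people`) and the 0.8 similarity threshold are hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_faces(embeddings, threshold=0.8):
    """Greedy clustering (an illustrative stand-in, not the paper's algorithm).

    Each face joins the first cluster whose representative (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `embeddings`.
    """
    clusters = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine(emb, embeddings[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def rank_people(clusters):
    """Order clusters by how often the person appears, most frequent first."""
    return sorted(clusters, key=len, reverse=True)


# Toy 2-D "embeddings" for five detected faces (real ones have far more dimensions).
faces = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95], [1.0, 0.05]]
people = rank_people(cluster_faces(faces))
print(people)
```

In this toy setup, the people who appear most often in a video come first in the ranking, which is one plausible way to decide whom to suggest sharing it with.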

It’s important to note that the system is ultimately designed to improve the discoverability of video content and help creators reach a bigger audience. Professor Sun and his students recently participated in the VideoToText challenge (sponsored by Microsoft Research) to improve the system, and the results of their work will be unveiled at the European Conference on Computer Vision, which is currently underway in Amsterdam. “Our research has taken us one step closer to the holy-grail of visual intelligence, understanding visual content in user-generated videos,” explained Professor Sun.

While this new research is definitely interesting, Microsoft actually paved the way with Azure Media Services. In April, the company announced new machine learning features for its cloud media streaming offering, including automatic video summarization, a speech-to-text indexer, a face recognition feature, and more.