15.9 - Learn to Segment

Project - Summary

Automatic image content description is a vital problem at the intersection of computer vision, artificial intelligence and natural language processing. The primary challenge is to design a multimodal approach that is rich enough to reason simultaneously about the contents of images and their representation in words or sentences. We present a multimodal approach based on a deep learning architecture that combines recent advances in computer vision, such as salient object proposal prediction and object detection, to generate natural sentences describing an image. Existing methods leverage recent advances in recognizing objects, their attributes and their locations, but they are limited in their expressivity. Moreover, current object detection methods still suffer from problems in localization and processing time: they remain slow at test time, which makes them unreliable and inadequate for this task. We target the high-level goal of annotating the contents of images based on their salient regions or segments, and we study the multimodal correspondence between words and images. Whereas conventional approaches aim to correctly label scenes, objects and regions with a fixed set of categories, our focus is on richer, higher-level descriptions of regions. The proposed approaches can also be used for text-to-image search in large-scale image retrieval systems.
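
The summary does not specify how the correspondence between words and image regions is scored. As a hypothetical illustration only, the sketch below assumes that some model has already embedded each image region and each query word into a shared vector space; the embeddings and the names `cosine`, `alignment_score` and `rank_images` are assumptions, not part of the project. It matches each word to its most similar region, sums those best-match cosine similarities into an image-sentence score, and ranks candidate images by that score, which is one simple way text-to-image search could use such a correspondence.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def alignment_score(regions: List[Vector], words: List[Vector]) -> float:
    """Hypothetical image-sentence score: match every word embedding to its
    most similar region embedding and sum those best-match similarities."""
    return sum(max(cosine(w, r) for r in regions) for w in words)


def rank_images(query_words: List[Vector], images: List[List[Vector]]) -> List[int]:
    """Return image indices sorted from best to worst alignment with the query."""
    scores = [alignment_score(regions, query_words) for regions in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy 3-d embeddings (made up for illustration): each image is a list of region vectors.
    images = [
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # image 0
        [[0.0, 0.0, 1.0], [0.7, 0.7, 0.0]],  # image 1
    ]
    query = [[0.0, 0.1, 1.0]]  # one word, closest to the first region of image 1
    print(rank_images(query, images))  # [1, 0]
```

Taking the maximum over regions lets each word pick out the single region it describes, so an image scores well only if some region matches each word. Summing rather than averaging favors longer queries, which does not affect the ranking because all candidate images are scored against the same query.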

Project - Team

Team Member - Role - Institution
Moncef Gabbouj - PI - Tampere University
Serkan Kiranyaz - PI - Tampere University
Iftikhar Ahmad - PI - Tampere University
Alexandros Iosifidis - PI - Tampere University
Muhammad Adeel Waris - PhD Student - Tampere University

Project - Impact and Uses/Benefits

An industrial partner can train this system for any object classification task it requires. Such systems can serve companies working on computer vision problems such as smart object-based camera auto-focusing, advertisement assessment and face detection, as well as many other applications, essentially any object or region detection task.


