15.8 - Let the Image Speak (Real time image captioning based on object detection and localization)

Project - Summary

Automatic image content description is a vital problem that connects computer vision, artificial intelligence, and natural language processing. The primary challenge is designing a multi-model approach rich enough to reason simultaneously about the contents of images and their representation in words or sentences. We present a multi-model approach based on a deep learning architecture that combines recent advances in computer vision, such as salient object proposal prediction and object detection, to generate natural sentences describing an image. Existing approaches leverage recent advances in recognizing objects, their attributes, and their locations; however, they are limited in their expressivity. Moreover, current object detection methods still suffer from problems in localization and processing time: they remain slow at test time, which makes them unreliable and inadequate for real-time use. We target the high-level goal of annotating the contents of images based on salient regions or segments, and we study the multimodal correspondence between words and images. Rather than only labeling scenes, objects, and regions correctly with a fixed set of categories, our focus is on richer, higher-level descriptions of regions. The proposed approaches can also be used for text-to-image search in large-scale image retrieval systems.
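To make the idea of moving from localized detections to a sentence concrete, the following is a minimal sketch and not the project's deep learning method. It assumes a detector has already produced labeled bounding boxes with confidence scores, ranks them with a hypothetical saliency proxy (confidence times relative box area), and fills a fixed sentence template. The names `Detection`, `saliency`, `horizontal_position`, and `caption`, as well as the thresholds, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detector output (hypothetical structure for this sketch)."""
    label: str
    score: float                             # detector confidence in [0, 1]
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels


def saliency(det: Detection, image_w: float, image_h: float) -> float:
    """Assumed saliency proxy: confidence times relative box area."""
    x0, y0, x1, y1 = det.box
    area = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    return det.score * area / (image_w * image_h)


def horizontal_position(det: Detection, image_w: float) -> str:
    """Describe where the box center falls along the image width (thirds)."""
    cx = (det.box[0] + det.box[2]) / 2.0
    if cx < image_w / 3.0:
        return "on the left"
    if cx > 2.0 * image_w / 3.0:
        return "on the right"
    return "in the center"


def caption(dets: List[Detection], image_w: float, image_h: float,
            top_k: int = 2, min_score: float = 0.5) -> str:
    """Build a templated sentence from the most salient confident detections."""
    kept = [d for d in dets if d.score >= min_score]
    kept.sort(key=lambda d: saliency(d, image_w, image_h), reverse=True)
    kept = kept[:top_k]
    if not kept:
        return "No salient objects detected."
    phrases = [f"a {d.label} {horizontal_position(d, image_w)}" for d in kept]
    return "The image shows " + " and ".join(phrases) + "."


if __name__ == "__main__":
    detections = [
        Detection("dog", 0.9, (50, 200, 250, 400)),
        Detection("person", 0.8, (400, 50, 600, 450)),
        Detection("ball", 0.3, (300, 300, 320, 320)),  # dropped: low confidence
    ]
    print(caption(detections, image_w=640, image_h=480))
```

A template like this can only express what the fixed categories and positions allow, which is the expressivity limit the summary describes; a learned language model over region features would replace the template in a full system.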

Project - Team

Team Member | Role | Email | Phone Number | Academic Site/IAB
Moncef Gabbouj | PI | | +358 400 736613 | Tampere University
Serkan Kiranyaz | PI | | 97 43 063 5600 | Tampere University
Iftikhar Ahmad | PI | Not available | Not available | Tampere University
Alexandros Iosifidis | PI | | +45 9350 8875 | Tampere University
Muhammad Adeel Waris | PhD Student | Not available | Not available | Tampere University

Project - Impact and Uses/Benefits

An industrial partner can train this system for any object classification task it requires. Such systems can serve companies that deal with computer vision problems such as smart camera object auto-focusing, advertisement assessment, face detection, and many other applications that involve essentially any object or region detection task.

Project - Deep Dive

Project - Documents

Filename | Filesize | Last modified
15.8_year_4_executive_summary.pdf | 160.0 KiB | 2019/08/22 11:50
15.8_year_4_ip_letter_combined.pdf | 371.0 KiB | 2019/08/22 11:50
15.8_year_4_final_report.pdf | 1.6 MiB | 2019/08/22 10:33
projects/year4/15.8.txt · Last modified: 2021/06/02 15:38 by sally.johnson