
15.8 - Let the Image Speak (Real-time image captioning based on object detection and localization)

Project - Summary

Automatic image content description is a vital problem at the intersection of computer vision, artificial intelligence, and natural language processing. The primary challenge is designing a multimodal approach rich enough to reason simultaneously about the contents of images and their representation in words or sentences. We present a multimodal approach based on a deep learning architecture that combines recent advances in computer vision, such as salient object proposal prediction and object detection, to generate natural sentences describing an image. Existing approaches leverage recent advances in recognizing objects, their attributes, and their locations, but they are limited in their expressivity. Moreover, current object detection methods still suffer from problems in localization and processing time; in particular, they remain slow at test time, which makes them unreliable and inadequate for real-time use. We target the high-level goal of annotating the contents of images based on salient regions or segments, and we study the multimodal correspondence between words and images. Whereas the standard task is to label scenes, objects, and regions with a fixed set of categories, our focus is on richer, higher-level descriptions of regions. The proposed approach can also be used for text-to-image search in large-scale image retrieval systems.
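To make the detect, select, and describe pipeline above more concrete, here is a toy sketch. It is not the project's model. It assumes detections arrive as hypothetical `Detection` records (label, confidence, pixel box) from some detector, uses confidence times box area as a stand-in for saliency, and fills a fixed sentence template from the two most salient confident objects and their coarse spatial relation. A fixed template like this is exactly the limited-expressivity kind of approach the summary aims to move beyond; the sketch only shows how localization output can be turned into words.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detector output (hypothetical format, not the project's)."""
    label: str                               # predicted object category
    score: float                             # detector confidence in [0, 1]
    box: tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels


def saliency(det: Detection) -> float:
    """Toy saliency proxy (an assumption): confidence times box area."""
    x0, y0, x1, y1 = det.box
    return det.score * max(0.0, x1 - x0) * max(0.0, y1 - y0)


def center(det: Detection) -> tuple[float, float]:
    """Center point of a detection's bounding box."""
    x0, y0, x1, y1 = det.box
    return (x0 + x1) / 2, (y0 + y1) / 2


def relation(a: Detection, b: Detection) -> str:
    """Coarse spatial relation of box a relative to box b, from their centers."""
    (ax, ay), (bx, by) = center(a), center(b)
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "to the left of" if dx < 0 else "to the right of"
    return "above" if dy < 0 else "below"  # image y-axis points downward


def with_article(label: str) -> str:
    """Prefix a label with 'a' or 'an' (naive first-letter rule)."""
    return ("an " if label[:1].lower() in ("a", "e", "i", "o", "u") else "a ") + label


def caption(dets: list[Detection], min_score: float = 0.5) -> str:
    """Template caption from the two most salient confident detections."""
    kept = sorted((d for d in dets if d.score >= min_score), key=saliency, reverse=True)
    if not kept:
        return "An image."
    phrase = with_article(kept[0].label)
    if len(kept) > 1:
        phrase += f" {relation(kept[0], kept[1])} {with_article(kept[1].label)}"
    return phrase[:1].upper() + phrase[1:] + "."


if __name__ == "__main__":
    dets = [
        Detection("dog", 0.90, (200, 150, 400, 350)),
        Detection("person", 0.95, (10, 20, 180, 400)),
        Detection("cat", 0.30, (0, 0, 50, 50)),  # dropped: below min_score
    ]
    print(caption(dets))  # A person to the left of a dog.
```

With these made-up detections, the low-confidence cat is filtered out, the person box ranks as most salient, and the caption reads "A person to the left of a dog." A learned language model conditioned on region features would replace the fixed template in a richer system.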

Project - Team

Team Member | Role | Email | Phone Number | Academic Site/IAB
Moncef Gabbouj | PI | Not available | Not available | Tampere University
Serkan Kiranyaz | PI | Not available | Not available | Tampere University
Iftikhar Ahmad | PI | Not available | Not available | Tampere University
Alexandros Iosifidis | PI | Not available | Not available | Tampere University
Muhammad Adeel Waris | PhD Student | Not available | Not available | Tampere University

Project - Impact and Uses/Benefits

The impact and benefits of our work include the following:

  • A more accurate view of data and algorithm reuse.
  • A platform that enables radical new adaptation combinations and documents the reuse of data and algorithms.

For industry, our work can help companies provide services that support better science and informed decision making. The actual impact on science is hard to measure, but the growth in digital data and data-intensive research provides opportunities to address society's grand challenges in ways that were previously unimaginable. The cost of data gathering and software development is not trivial, and federal agencies are mandating and encouraging the reuse of these resources. Industry also recognizes the value of these approaches, as seen in efforts such as the recent launch of the NSF Big Data Regional Hubs. The work carried out in our CVDI project leads to a better return on investment (ROI) for resources allocated to data and software creation, use, and archiving by enabling accurate and resourceful reuse. The work may also yield a deeper, more sustainable understanding of the ontological connections among knowledge assets. Finally, we believe the work can support future efforts to explore predictive capabilities, although more research is needed in this area.

Project - Deep Dive

Project - Documents

projects/year4/15.8.1566409060.txt.gz · Last modified: 2019/08/21 12:37 by sally.johnson