Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/108358

| Title: | A hybrid image captioning architecture with instance segmentation and saliency prediction |
| Authors: | Saliba, Chantelle (2022) |
| Keywords: | Computer vision; Neural networks (Computer science) |
| Issue Date: | 2022 |
| Citation: | Saliba, C. (2022). A hybrid image captioning architecture with instance segmentation and saliency prediction (Master's dissertation). |
| Abstract: | In recent years, image captioning has grown in popularity, as evidenced by the surge of research in this area within the Artificial Intelligence community. Recognising its potential as an assistive technology, a novel framework is presented that makes use of a hybrid architecture comprising a convolutional neural network together with novel image and language transformers. Following a review of the current state-of-the-art technologies, a rich encoder was constructed to extract information at both object and scene level. This was achieved by combining an instance segmentation technique and a saliency predictor to identify objects within a visual scene, together with a scene classifier to determine environmental factors. Features extracted from the concatenation of a hybrid vision transformer, used for the former, and a convolutional neural network, used for the latter, are then passed through a dedicated image-to-sequence language transformer to complete the architecture. The presented pipeline, informed by a rich body of literature, is built on well-argued design decisions and a modular framework, thereby providing an opportunity for modernisation and improvement of results. Furthermore, the discussed pipeline facilitates the future explainability of image captioning architectures while also adopting a more efficient training strategy. This novel architecture was benchmarked on the Flickr8K and Flickr30K datasets and achieved results comparable with, and on several metrics exceeding, the current state-of-the-art architectures while retaining the above advantages. This research strives to contribute to the improvement of image captioning, to review current state-of-the-art techniques such as instance segmentation and scene classification, and to identify the potential of saliency prediction as an attention mechanism, while also focusing on the readability of the generated sentences. |
| Description: | M.Sc.(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/108358 |
| Appears in Collections: | Dissertations - FacICT - 2022; Dissertations - FacICTAI - 2022 |
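
The abstract outlines an encoder that fuses object-level features (from instance segmentation and saliency prediction) with scene-level features (from a CNN scene classifier) and feeds the concatenated representation to an image-to-sequence language transformer. The PyTorch-style sketch below illustrates one plausible way such a hybrid encoder-decoder could be wired together; the module choices, dimensions, and concatenation-plus-projection fusion are illustrative assumptions only and do not reproduce the dissertation's actual implementation.

```python
# Minimal sketch of a hybrid captioning pipeline: two image branches are fused
# by concatenation and decoded into a caption by a transformer decoder.
# All names and sizes here are hypothetical stand-ins for illustration.
import torch
import torch.nn as nn

class HybridCaptioner(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        # Stand-in for the object-level branch (instance segmentation + saliency).
        self.object_branch = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)),
        )
        # Stand-in for the scene-level branch (CNN scene classifier features).
        self.scene_branch = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)),
        )
        # Project the concatenated object+scene features to the decoder width.
        self.fuse = nn.Linear(2 * d_model, d_model)
        # Image-to-sequence language transformer: the decoder attends over the
        # fused visual features while generating the caption tokens.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, captions):
        obj = self.object_branch(images).flatten(2).transpose(1, 2)  # (B, 49, d)
        scn = self.scene_branch(images).flatten(2).transpose(1, 2)   # (B, 49, d)
        memory = self.fuse(torch.cat([obj, scn], dim=-1))            # (B, 49, d)
        tgt = self.embed(captions)                                   # (B, T, d)
        # Causal mask so each caption position only attends to earlier tokens.
        T = captions.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(hidden)                                      # (B, T, vocab)

if __name__ == "__main__":
    model = HybridCaptioner(vocab_size=10000)
    images = torch.randn(2, 3, 224, 224)
    captions = torch.randint(0, 10000, (2, 12))
    print(model(images, captions).shape)  # torch.Size([2, 12, 10000])
```

Because each branch is a self-contained module, any component could be swapped for a stronger one (e.g. a pretrained segmentation or scene network) without touching the decoder, which mirrors the modular, upgradeable framework the abstract emphasises.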
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| 2219ICTICS520000010893_1.PDF | | 10.09 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
