Modern techniques at the confluence of Computer Vision and Natural Language Processing, built on recent Deep Neural Network architectures, have made ground-breaking advances in areas such as automated captioning and image retrieval. Most of these learning approaches require a substantial training set of images with human annotations that describe their visual content. This work focuses on more complex scenarios in which the textual descriptions are only weakly related to the images.
In news articles, the textual content typically expresses connotative and ambiguous relations that can only be suggested, not directly inferred, from the accompanying images. We address source detection, article illustration, and article geolocation with an adaptable CNN architecture that shares most of its structure across these tasks. Article illustration builds on Deep Canonical Correlation Analysis, while for geolocation we propose a novel loss function based on the Great Circle Distance. We also introduce BreakingNews, a new dataset of roughly 100,000 news articles, each enriched with a variety of meta-data (such as GPS coordinates and user comments).
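To make the geolocation objective concrete, the sketch below computes the great-circle (haversine) distance between predicted and ground-truth coordinates and averages it over a batch. This is a minimal illustration of a Great-Circle-Distance-based loss, not the authors' exact formulation; the function names and the use of NumPy are assumptions for the example.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def great_circle_distance(pred, target):
    """Haversine distance in km between (latitude, longitude) pairs in degrees.

    Hypothetical helper illustrating the kind of geodesic error a
    Great-Circle-Distance loss would penalize; the paper's exact loss
    may differ in parameterization and units.
    """
    lat1, lon1 = np.radians(pred[..., 0]), np.radians(pred[..., 1])
    lat2, lon2 = np.radians(target[..., 0]), np.radians(target[..., 1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def gcd_loss(pred, target):
    # Mean great-circle distance over a batch of coordinate predictions.
    return float(np.mean(great_circle_distance(np.asarray(pred), np.asarray(target))))
```

Unlike a plain Euclidean loss on latitude/longitude, this formulation respects the spherical geometry of geographic coordinates (e.g., longitudes wrapping around the antimeridian).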
Using a variety of Deep Learning architectures, we show that this dataset is well suited to investigating all of the aforementioned challenges, and we report baseline performance with several representations of the textual and visual features. Given these encouraging results, we hope to drive progress in the field by exposing the limits of the current state of the art.
This work is licensed under a Creative Commons Attribution 4.0 International License.