Federated Learning for Vision-and-Language Grounding Problems

by Fenglin Liu, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou

Published in PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE by Association for the Advancement of Artificial Intelligence (AAAI).

2020, Volume 34, Issue 07, pp. 11572-11579

Abstract

Recently, vision-and-language grounding problems, e.g., image captioning and visual question answering (VQA), have attracted extensive interest from both academia and industry. However, despite the similarity of these tasks, efforts to obtain better results by combining the merits of their algorithms have not been well studied. Inspired by the recent success of federated learning, we propose a federated learning framework that obtains various types of image representations from different tasks, which are then fused to form fine-grained image representations. These representations merge useful features from different vision-and-language grounding problems and are thus much more powerful than the original representations used alone in individual tasks. To learn such image representations, we propose the Aligning, Integrating and Mapping Network (aimNet). aimNet is validated in three federated learning settings: horizontal federated learning, vertical federated learning, and federated transfer learning. Experiments with the aimNet-based federated learning framework on two representative tasks, i.e., image captioning and VQA, demonstrate effective and universal improvements on all metrics over the baselines. In image captioning, we obtain 14% and 13% relative gains on the task-specific metrics CIDEr and SPICE, respectively. In VQA, we also boost the performance of strong baselines by up to 3%.

Archived Files and Locations

application/pdf  926.5 kB
file_heqnquu4o5bujn4vxmxuymfpnm
aaai.org (web)
web.archive.org (webarchive)
Preserved and Accessible
Type: article-journal
Stage: published
Date: 2020-04-03
Proceedings Metadata
Not in DOAJ
Not in Keepers Registry
ISSN-L:  2159-5399
Catalog Record
Revision: 7e512494-5224-4e6f-9a8c-d9125669eea2