Federated Learning for Vision-and-Language Grounding Problems
by
Fenglin Liu, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou
2020 Volume 34, Issue 07, p11572-11579
Abstract
Recently, vision-and-language grounding problems, e.g., image captioning and visual question answering (VQA), have attracted extensive interest from both academia and industry. However, despite the similarity of these tasks, efforts to obtain better results by combining the merits of their algorithms have not been well studied. Inspired by the recent success of federated learning, we propose a federated learning framework to obtain various types of image representations from different tasks, which are then fused together to form fine-grained image representations. These representations merge useful features from different vision-and-language grounding problems, and are thus much more powerful than the original representations used alone in individual tasks. To learn such image representations, we propose the Aligning, Integrating and Mapping Network (aimNet). aimNet is validated in three federated learning settings: horizontal federated learning, vertical federated learning, and federated transfer learning. Experiments with the aimNet-based federated learning framework on two representative tasks, i.e., image captioning and VQA, demonstrate effective and universal improvements on all metrics over the baselines. In image captioning, we obtain 14% and 13% relative gains on the task-specific metrics CIDEr and SPICE, respectively. In VQA, we also boost the performance of strong baselines by up to 3%.
Type: article-journal
Stage: published
Date: 2020-04-03