Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning
by
Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Xu Sun, Liangyou Li
2020
Abstract
In sequence-to-sequence learning, the decoder relies on the attention
mechanism to efficiently extract information from the encoder. While it is
common practice to draw information from only the last encoder layer, recent
work has proposed to use representations from different encoder layers for
diversified levels of information. Nonetheless, the decoder still obtains only
a single view of the source sequences, which might lead to insufficient
training of the encoder layer stack due to the hierarchy bypassing problem. In
this work, we propose layer-wise cross-view decoding: each decoder layer
receives the representations from the last encoder layer, which serve as a
global view, supplemented by those from other encoder layers, giving a
stereoscopic view of the source sequences. Systematic experiments show that we
successfully address the hierarchy bypassing problem and substantially improve
the performance of sequence-to-sequence learning with deep representations on
diverse tasks.
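
The idea of giving each decoder layer two views of the source can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `attend` and `cross_view_context`, the pairing of decoder layer i with encoder layer i, and the averaging of the two contexts are all assumptions made for illustration, since the abstract does not specify which encoder layers are paired with which decoder layers or how the views are combined.

```python
import math


def attend(query, keys):
    """Single-head scaled dot-product attention; keys also act as values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]


def cross_view_context(query, encoder_layers, decoder_layer_index):
    """Combine a global view and a second view for one decoder layer.

    Assumption: decoder layer i is paired with encoder layer i, and the two
    attention contexts are averaged. Both choices are hypothetical.
    """
    global_view = attend(query, encoder_layers[-1])  # last encoder layer
    other_view = attend(query, encoder_layers[decoder_layer_index])
    return [(g + o) / 2.0 for g, o in zip(global_view, other_view)]


# Toy data: 3 encoder layers, 2 source positions, hidden size 2.
encoder_layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
context = cross_view_context([1.0, 0.0], encoder_layers, 0)
print(context)
```

In this sketch every decoder layer always sees the top encoder layer, so gradients reach the full encoder stack, while the second view exposes a lower layer directly. A real model would use learned projections and multi-head attention rather than raw dot products.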
arXiv: 2005.08081v3 (arxiv.org repository; archived copy at web.archive.org)