Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition

by Jason Pelecanos, Quan Wang, and Ignacio Lopez Moreno

Released as an article.

2021  

Abstract

Many neural network speaker recognition systems model each speaker using a fixed-dimensional embedding vector. These embeddings are generally compared using either linear or second-order scoring and, until recently, did not handle utterance-specific uncertainty. In this work we propose scoring these representations in a way that can capture uncertainty, enroll/test asymmetry, and additional non-linear information. This is achieved by incorporating a second-stage neural network (known as a decision network) as part of an end-to-end training regimen. In particular, we propose the concept of decision residual networks, which uses a compact decision network to leverage cosine scores and to model the residual signal that is needed. Additionally, we present a modification to the generalized end-to-end softmax loss function to target the separation of same/different speaker scores. We observed significant performance gains for the two techniques.
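The abstract describes scoring a pair of speaker embeddings as a cosine-score baseline plus a residual produced by a compact second-stage network. A minimal sketch of that idea is below; the class name, layer sizes, and the concatenated enroll/test input are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np


def cosine_score(e, t):
    # Baseline similarity between enrollment and test embeddings.
    return float(np.dot(e, t) / (np.linalg.norm(e) * np.linalg.norm(t)))


class DecisionResidualNetwork:
    """Hypothetical sketch of a decision residual network: a small MLP
    models a residual on top of the cosine score. Concatenating the
    enrollment and test embeddings (in order) lets the network capture
    enroll/test asymmetry and non-linear interactions."""

    def __init__(self, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # Input is the concatenated [enroll, test] embedding pair.
        self.W1 = rng.normal(scale=0.1, size=(2 * dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden, 1))
        self.b2 = np.zeros(1)

    def residual(self, e, t):
        x = np.concatenate([e, t])
        h = np.maximum(0.0, x @ self.W1 + self.b1)  # ReLU hidden layer
        return (h @ self.W2 + self.b2).item()

    def score(self, e, t):
        # Final score = cosine baseline + learned non-linear residual.
        return cosine_score(e, t) + self.residual(e, t)
```

In an end-to-end system the residual weights would be trained jointly with the embedding network; here they are random, so only the decomposition of the score is demonstrated.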

Archived Files and Locations

application/pdf  443.7 kB
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2021-06-16
Version   v2
Language   en
arXiv  2104.01989v2