Improved Optimistic Algorithm For The Multinomial Logit Contextual Bandit release_abp3jwxqdne77bsizabococ4uy

by Priyank Agrawal, Vashist Avadhanula, Theja Tulabandhula

Released as an article.

2020  

Abstract

We consider a dynamic assortment selection problem where the goal is to offer a sequence of assortments of cardinality at most K, out of N items, so as to minimize the expected cumulative regret (loss of revenue). The feedback is given by a multinomial logit (MNL) choice model. This sequential decision-making problem is studied under the MNL contextual bandit framework. The existing algorithms for the MNL contextual bandit have frequentist regret guarantees of Õ(κ√T), where κ is an instance-dependent constant. κ can be arbitrarily large, e.g., exponentially dependent on the model parameters, which makes the existing regret guarantees substantially loose. We propose an optimistic algorithm with a carefully designed exploration bonus term and show that it enjoys Õ(√T) regret. In our bounds, the κ factor affects only the poly-log term and not the leading term of the regret bound.
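To make the feedback model in the abstract concrete: under the standard MNL choice model, each offered item i in an assortment S is purchased with probability proportional to its preference weight v_i (in the contextual setting, typically v_i = exp(x_iᵀθ)), with a "no purchase" outside option whose weight is normalized to 1. A minimal sketch, assuming these standard MNL conventions (the function names and the dict-based interface are illustrative, not from the paper):

```python
def mnl_choice_probs(utilities, assortment):
    """MNL purchase probabilities for an offered assortment.

    utilities:  dict item -> preference weight v_i (e.g., v_i = exp(x_i^T theta)
                in the contextual setting); weights assumed positive.
    assortment: iterable of offered items, |assortment| <= K.
    Returns a dict item -> purchase probability; the key None holds the
    probability of the outside (no-purchase) option.
    """
    # The "1" in the denominator is the normalized weight of the outside option.
    denom = 1.0 + sum(utilities[i] for i in assortment)
    probs = {i: utilities[i] / denom for i in assortment}
    probs[None] = 1.0 / denom
    return probs


def expected_revenue(utilities, revenues, assortment):
    """Expected one-step revenue of offering `assortment` under the MNL model."""
    probs = mnl_choice_probs(utilities, assortment)
    return sum(revenues[i] * probs[i] for i in assortment)
```

The cumulative regret in the abstract is then the gap, summed over rounds, between the expected revenue of the best cardinality-K assortment and that of the assortment actually offered.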

Archived Files and Locations

application/pdf  784.8 kB
file_sdobkh5rf5a63mt5xxfq3t2aqq
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2020-11-28
Version   v1
Language   en
arXiv  2011.14033v1
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: f75a7534-7e39-4186-9b5e-1faefc91cd1b