Improved Optimistic Algorithm For The Multinomial Logit Contextual Bandit

by Priyank Agrawal, Vashist Avadhanula, Theja Tulabandhula



Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, pages 2312-2320, 2011.

Marc Abeille and Alessandro Lazaric. Linear Thompson sampling revisited. Electronic Journal of Statistics, 2017. doi:10.1214/17-ejs1341si.

Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, and Assaf Zeevi. Thompson Sampling for the MNL-Bandit. Pre-print, 2017. arXiv:1706.00977v1.

Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, and Assaf Zeevi. MNL-Bandit: A Dynamic Learning Approach to Assortment Selection. Operations Research, 2019.

Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov):397-422, 2002.

Francis Bach. Self-concordant analysis for logistic regression. Electronic Journal of Statistics, 4:384-414, 2010.

S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning. Now Publishers, 2012. ISBN 9781601986269.