A Contextual-Bandit Approach to Online Learning to Rank for Relevance
and Diversity
by
Chang Li, Haoyun Feng, Maarten de Rijke
2019
Abstract
Online learning to rank (LTR) focuses on learning a policy from user
interactions that builds a list of items sorted in decreasing order of the item
utility. It is a core area in modern interactive systems, such as search
engines, recommender systems, or conversational assistants. Previous online LTR
approaches assume either that the relevance of an item in the list is independent
of the other items in the list or that the relevance of an item is a submodular
function of the utility of the list. The former type of approach may result in
a list of low diversity that has relevant items covering the same aspects,
while the latter approaches may lead to a highly diversified list but with some
non-relevant items.
In this paper, we study an online LTR problem that considers both item
relevance and topical diversity. We assume cascading user behavior, where a
user browses the displayed list of items from top to bottom, clicks the
first attractive item, and then stops browsing. We propose a hybrid
contextual bandit approach, called CascadeHybrid, for solving this problem.
CascadeHybrid models item relevance and topical diversity using two independent
functions and simultaneously learns those functions from user click feedback.
We derive a gap-free bound on the n-step regret of CascadeHybrid. We conduct
experiments to evaluate CascadeHybrid on the MovieLens and Yahoo music
datasets. Our experimental results show that CascadeHybrid outperforms the
baselines on both datasets.
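
The setup described above can be illustrated with a minimal Python sketch. It is not the CascadeHybrid algorithm itself: the abstract does not give the functional forms, so the additive score (a relevance term plus a weighted topic-coverage gain), the probabilistic coverage update, and all names (`hybrid_score`, `greedy_hybrid_list`, `cascade_click`, `topic_weights`) are assumptions chosen for illustration. Only the cascading click behavior follows the abstract directly.

```python
import random


def hybrid_score(item, relevance, topics, topic_weights, covered):
    """Assumed hybrid score: relevance plus weighted gain in topic coverage.

    The coverage gain for a topic is (1 - covered) * item_topic_prob, which
    shrinks as the list already covers that topic (a diminishing-returns term).
    """
    gain = sum(w * (1.0 - c) * t
               for w, c, t in zip(topic_weights, covered, topics[item]))
    return relevance[item] + gain


def greedy_hybrid_list(items, relevance, topics, topic_weights, k):
    """Greedily build a list of up to k items by repeatedly taking the best score."""
    covered = [0.0] * len(topic_weights)  # coverage of each topic so far
    remaining = list(items)
    ranked = []
    while remaining and len(ranked) < k:
        best = max(remaining,
                   key=lambda i: hybrid_score(i, relevance, topics,
                                              topic_weights, covered))
        ranked.append(best)
        remaining.remove(best)
        # Probabilistic coverage update: a topic stays uncovered only if
        # every chosen item fails to cover it.
        covered = [1.0 - (1.0 - c) * (1.0 - t)
                   for c, t in zip(covered, topics[best])]
    return ranked


def cascade_click(ranked, attraction_prob, rng):
    """Cascading user: scan top to bottom, click the first attractive item, stop.

    Returns the 0-based click position, or None if nothing was clicked.
    """
    for position, item in enumerate(ranked):
        if rng.random() < attraction_prob[item]:
            return position
    return None


# Hypothetical toy data: items "a" and "b" share a topic, "c" covers another.
relevance = {"a": 0.5, "b": 0.45, "c": 0.3}
topics = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}

diverse = greedy_hybrid_list(["a", "b", "c"], relevance, topics, [0.2, 0.2], k=2)
relevance_only = greedy_hybrid_list(["a", "b", "c"], relevance, topics, [0.0, 0.0], k=2)
click = cascade_click(diverse, {"a": 0.0, "b": 1.0, "c": 1.0}, random.Random(0))
```

With the toy data, the diversity term promotes "c" over the slightly more relevant but redundant "b", while setting the topic weights to zero falls back to a purely relevance-ordered list. In an online setting the relevance values and topic weights would be unknown and estimated from click feedback, which this sketch leaves out.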
arXiv:1912.00508v2