Tokic and Palm. Value-difference Based Exploration: Adaptive Control Between Epsilon-greedy and Softmax. Springer Berlin Heidelberg, 2011, doi:10.1007/978-3-642-24455-1_33.