Haarnoja, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. 8 Aug. 2018.