2016-08-23 Efficient Exploration for Dialog Policy Learning with Deep BBQ Networks & Replay Buffer Spiking Read later http://arxiv.org/pdf/1608.05081.pdf Apparently BBQ is what comes after DQN!