Information retrieval + RL
1. Ranking as Sequential Decision Making
Advantages: goes beyond treating each document's relevance independently
2. RL: Learn to make good sequences of decisions
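3. AlphaGo:
Breadth reduction: Policy Network. At a given move, playing in certain regions is clearly bad; the policy network can identify these moves so they need not be searched, which reduces the width of the tree.
Depth reduction: Value Network. During tree search, if the position at some node is bound to lose, the value network can prune it, which reduces the depth of the tree.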
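The two reductions can be illustrated with a minimal sketch. It is not AlphaGo's actual algorithm: the game is a toy Nim variant (take 1 to 3 stones, whoever takes the last stone wins), and `policy_fn` and `value_fn` are hypothetical hand-written stand-ins for the trained policy and value networks. `top_k` limits how many moves are expanded per node (breadth), and `depth` limits how far the search goes before the value function is asked for an estimate (depth).

```python
def moves_fn(pile):
    """Legal moves in the toy Nim game: take 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]


def policy_fn(pile, moves):
    """Hypothetical stand-in for a policy network: a prior for each move.
    It favours moves that leave the opponent a multiple of 4."""
    return {m: 1.0 if (pile - m) % 4 == 0 else 0.1 for m in moves}


def value_fn(pile):
    """Hypothetical stand-in for a value network: an estimate in [-1, 1]
    from the point of view of the player to move."""
    if pile == 0:
        return -1.0  # the previous player took the last stone and won
    return 1.0 if pile % 4 != 0 else -1.0


def search(pile, depth, top_k):
    """Negamax search with policy-based breadth reduction and
    value-based depth reduction."""
    moves = moves_fn(pile)
    if not moves or depth == 0:
        # Depth reduction: stop here and trust the value estimate.
        return value_fn(pile)
    priors = policy_fn(pile, moves)
    # Breadth reduction: expand only the top_k moves by prior.
    kept = sorted(moves, key=lambda m: priors[m], reverse=True)[:top_k]
    return max(-search(pile - m, depth - 1, top_k) for m in kept)


if __name__ == "__main__":
    print(search(5, depth=2, top_k=1))
    print(search(4, depth=2, top_k=1))
```

With `top_k=1` and `depth=2` each call expands a single line of play two moves deep instead of the full game tree. The result is only as good as the two stand-in functions.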
4. Ranking evaluation: NDCG (Normalized Discounted Cumulative Gain); MAP (Mean Average Precision)
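A minimal sketch of both metrics, under stated assumptions: the gain is the common 2^rel - 1 form with a log2 position discount (other variants use the raw relevance as gain), and average precision is normalised by the number of relevant documents that appear in the ranked list, which assumes every relevant document was retrieved. The function names are illustrative.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of graded relevances."""
    rels = relevances[:k] if k is not None else relevances
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(rels))


def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision(binary_relevances):
    """Mean of precision@i over the positions i that hold a relevant document."""
    hits = 0
    total = 0.0
    for i, rel in enumerate(binary_relevances, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def mean_average_precision(queries):
    """Average of per-query average precision."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    print(ndcg([3, 2, 1]))                # ideal ordering
    print(ndcg([1, 3]))                   # best document ranked second
    print(average_precision([1, 0, 1]))   # (1/1 + 2/3) / 2
```

Both metrics depend on position: moving a relevant document up the list raises the score, which is why they suit the evaluation of a whole ranking and not of single documents.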
5. Monte Carlo search
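As a rough illustration, the sketch below implements flat Monte Carlo search, the simplest variant: for each candidate move, play many uniformly random games to the end and pick the move with the best average outcome. This is not full Monte Carlo tree search, and the toy Nim game (take 1 to 3 stones, whoever takes the last stone wins), the function names, and the rollout count are all assumptions made for the example.

```python
import random


def legal_moves(pile):
    """Legal moves in the toy Nim game: take 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]


def rollout(pile, my_turn, rng):
    """Play uniformly random moves until the pile is empty.
    Returns 1.0 if 'we' take the last stone, else 0.0. Requires pile > 0."""
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return 1.0 if my_turn else 0.0
        my_turn = not my_turn


def monte_carlo_search(pile, n_rollouts=200, seed=0):
    """Estimate each move's win rate by random rollouts and return
    the best move together with all estimates."""
    rng = random.Random(seed)
    estimates = {}
    for move in legal_moves(pile):
        remaining = pile - move
        if remaining == 0:
            estimates[move] = 1.0  # taking the last stone wins immediately
            continue
        # After our move it is the opponent's turn.
        wins = sum(rollout(remaining, False, rng) for _ in range(n_rollouts))
        estimates[move] = wins / n_rollouts
    best = max(estimates, key=estimates.get)
    return best, estimates


if __name__ == "__main__":
    best, estimates = monte_carlo_search(3)
    print(best, estimates)
```

The estimates measure performance against random play, not optimal play, so more rollouts reduce the noise but do not remove that bias. Tree-based variants address this by reusing statistics to focus later rollouts on promising moves.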
posted on 2017-11-22 17:36 WegZumHimmel