Summary:
So, is the process similar to a one-to-many RNN? Learns much more efficiently than model-free methods; iteratively gets better; fewer than 300 trials (~25 min).
posted @ 2018-05-02 23:02
ecoflex
Summary:
You wouldn't try to explore any problem structure in DFO. Low-dimensional policy: 30 degrees of freedom, 120 parameters to tune. Keep the positive results i…
posted @ 2018-05-02 13:08
ecoflex
