Summary:
There are some problems: mismatch between model and reality; gradient explosion. So the dynamics can be quite messy, and backpropagating can be quite probl…
Summary:
You have to force experts to handle some uncommon and extreme situations. A mechanical way to learn. However, we don't know rt if you use sequence GAN, …
Summary:
Not only JS divergence can be applied to GAN; other divergences are all applicable! f* is convex. Several ACG icons become very similar, if trai…
Summary:
The Gaussian model is too limiting. The images are too blurry. So, is there a more general model? But if P_G(x; θ) is a neural network, it's impossible to calcula…
Summary:
HW2: input a sentence, output an ACG icon. 3 targets: trains from the front view and side views, so the output would be the average of the three pictures. …
Summary:
https://www.bilibili.com/video/av15997678/ My own deep reinforcement learning code: https://github.com/ysgclight/Reinforcement-Learning-with-Pytorch D…
Summary:
10 free hours to run on AWS. Click this one, click on new machine, pick a region, choose Linux Ubuntu 16; 250GB is preferred. Ctrl+Shift+V to paste your passwo…
Summary:
https://www.bilibili.com/video/av22940029 Left-hand side: NN being constructed. Right-hand side: NN being called. Turn the NN code into GPU-compatible m…
Summary:
High bias if the robot has learnt something (no changes appear with iterations). However, in real-world tasks, the task could change a little bit, …
Summary:
Intrinsic ambiguity: move toward the purple triangle? Move away from the red triangle? Move along the grey arrow? Or a combination of them? The right part of the ri…
Summary:
So, the process is similar to a one-to-many RNN? Learns much more efficiently than model-free methods. Iteratively gets better. Less than 300 trials, ~25 min …
Summary:
You wouldn't try to explore any problem structure in DFO. Low-dimensional policy. 30 degrees of freedom. 120 parameters to tune. Keep the positive results i…
Summary:
^ is the square root of epsilon. A simplified version of the hard version. A smoother way to find the correct solution. The first term is the REINFORCE term, …
Summary:
Fast feedback to the robot with a better-shaped reward function, and learning could be much faster. OpenAI Baselines, rllab. Multiple tasks and multiple seeds to te…