Archive: May 2018
Summary: https://en.wikipedia.org/wiki/Bayesian_inference
Summary: skip over
Summary: contact, friction, etc. are unknown
Summary: skip over
Summary: skip this lecture
Summary: After the break, we'll extend our IRL to continuous spaces.
Summary: The yellow region corresponds to β, the blue region to α.
Summary: Make a compromise between the learnt policy and minimal cost! π hat uses states; π theta uses observations.
Summary: MPC means replanning at every step. Every N steps, rebuild the dynamics model.
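A minimal sketch of that loop, under stated assumptions: the 1-D system `true_step`, the linear model x' = x + b * a, the random-shooting planner, and every function name here are hypothetical illustrations, not taken from the lecture. It replans at every step, executes only the first planned action, and refits the model every `refit_every` steps.

```python
import random

def true_step(x, a):
    # Hypothetical real system, unknown to the planner: x' = x + 0.5 * a
    return x + 0.5 * a

def fit_gain(transitions):
    # Least-squares fit of b in the assumed model x' = x + b * a
    num = sum(a * (x2 - x) for x, a, x2 in transitions)
    den = sum(a * a for x, a, x2 in transitions)
    return num / den if den > 0 else 1.0

def plan_first_action(x, b, rng, horizon=5, candidates=200):
    # Random shooting: sample action sequences, roll them out in the model,
    # and return only the first action of the cheapest sequence.
    best_cost, best_action = float("inf"), 0.0
    for _ in range(candidates):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        sim_x, cost = x, 0.0
        for a in seq:
            sim_x = sim_x + b * a
            cost += sim_x ** 2  # cost: squared distance from the origin
        if cost < best_cost:
            best_cost, best_action = cost, seq[0]
    return best_action

def run_mpc(x0=3.0, steps=40, refit_every=10, seed=0):
    rng = random.Random(seed)
    x, b, data = x0, 1.0, []  # b = 1.0 is a deliberately wrong initial guess
    for t in range(steps):
        a = plan_first_action(x, b, rng)  # replan at every step
        x_next = true_step(x, a)
        data.append((x, a, x_next))
        x = x_next
        if (t + 1) % refit_every == 0:
            b = fit_gain(data)  # rebuild the dynamics model every N steps
    return x, b
```

Calling `run_mpc()` returns the final state and the fitted gain. Because the toy system is noise-free, the fitted gain should match the true value of 0.5 after the first refit.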
Summary: The transition probability is unknown, and we don't even need to estimate it.
Summary: Understand that correlated samples cause problems, and how parallelism solves the problem. Another solution is replay buffers, fully utilizing the advantag
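A minimal sketch of a uniform replay buffer, assuming transitions are stored as plain tuples; the class and method names are hypothetical. Sampling at random from a large pool of past transitions breaks the correlation between consecutive samples.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of past transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen silently drops the oldest transition when full
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling without replacement, uniformly over stored transitions
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a training loop, one would typically call `add` after every environment step and `sample` once the buffer holds at least one batch.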
Summary: In most AC algorithms, we actually just fit the value function; it is less common to fit the Q function as well. Batch: offline, Monte Carlo. Online: bootstrap, TD in
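A minimal sketch of the two kinds of value-function target, assuming a discount factor `gamma`; the function names are hypothetical. The Monte Carlo target needs the whole episode (batch), while the bootstrapped TD target needs only one transition and the current value estimate (online).

```python
def monte_carlo_targets(rewards, gamma=0.99):
    # Batch/offline: discounted reward-to-go, computed after the episode ends
    targets, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        targets.append(g)
    return targets[::-1]

def td_target(reward, next_value, done, gamma=0.99):
    # Online: bootstrap from the current estimate of the next state's value
    return reward + (0.0 if done else gamma * next_value)
```

For example, `monte_carlo_targets([1, 1, 1], gamma=0.5)` gives `[1.75, 1.5, 1.0]`, and `td_target(1.0, 2.0, False, gamma=0.5)` gives `2.0`.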
Summary: The green bars are the reward function; the blue curve is the probability of different trajectories. If the green bars are equally increased to the yellow bars, the res
Summary: First-order Markov chain. On-policy algorithms are easier to parallelize. Off-policy algorithms have to fit a transition net and a policy net. much more comp
Summary: I got it wrong earlier: I should have watched the fall 2017 course, but ended up watching the spring one. A neural network controls a virtual robot by imitating human motion. Domain shift causes the failure of supervised learning in
Summary: Initialization dramatically influences the trajectory. The current state depends on all past decisions. The ones reflect the dimensions being counted.
Summary: There are some problems: mismatch between model and reality; gradient explosion. So the dynamics can be quite messy, and backpropagating can be quite probl
Summary: Normally solved by sequential quadratic programming algorithms. An example of a linear system
Summary: You have to force experts to handle some uncommon and extreme situations. A mechanical way to learn. However, we don't know rt. If you use sequence GAN,
Summary: Not only JS divergence can be applied to GAN; other divergences are all applicable! f star is convex. Several ACG icons become very similar, if trai
Summary: The Gaussian model is too limited; the images are too blurry. So is there a more general model? But if PG(x;θ) is a neural network, it's impossible to calcula
Summary: HW2: input a sentence, output an ACG icon. 3 targets: trains seen from the front view and side views, so the output would be the average of the three pictures.
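Summary: The more useful one is the conditioned generator, which lets you control the corresponding text, audio, and images through the input vector. https://zhuanlan.zhihu.com/p/24767059 Merely generating faces is of little value, since you could just photograph any passerby. But generating a frontal photo from left and right photos is quite amazing. You must learn to discern the turning points. Varia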
Summary: https://www.bilibili.com/video/av15997678/ My own deep reinforcement learning code: https://github.com/ysgclight/Reinforcement-Learning-with-Pytorch D
Summary: data augmentation
Summary: 10 free hours to run on AWS. Click this one, click on new machine, pick a region, choose Linux Ubuntu 16; 250GB is preferred. Ctrl+Shift+V to paste your passwo
Summary: https://www.bilibili.com/video/av22940029 Left-hand side: the NN being constructed. Right-hand side: the NN being called. Turn the NN code into GPU-compatible m
Summary: High bias if the robot has learnt something (no changes appear with iterations). However, in real-world tasks, the task could change a little bit,
Summary: Model-free: high variance. Model-based: high bias. Within 1h of human demonstration of each task, VR!!!
Summary: Intrinsic ambiguity: move toward the purple triangle? Move away from the red triangle? Move along the grey arrow? Or a combination of them? the right part of the ri
Summary: So, is the process similar to a one-to-many RNN? Learns much more efficiently than model-free methods. Iteratively gets better. Less than 300 trials ~ 25min
Summary: You wouldn't try to explore any problem structure in DFO. Low-dimensional policy: 30 degrees of freedom, 120 parameters to tune. Keep the positive results i
Summary: ^ is the square root of epsilon. A simplified version of the hard version. A smoother way to find the correct solution. The first term is the REINFORCE term,
Summary: Fast feedback to the robot with a better-shaped reward function, and learning could be much faster. OpenAI Baselines, rllab. Multiple tasks and multiple seeds to te
Summary: https://statweb.stanford.edu/~owen/mc/Ch-var-is.pdf https://zhuanlan.zhihu.com/p/29934206 The blue curve is the lower-bounded one. Conjugate gradient to so
Summary: https://drive.google.com/file/d/0BxXI_RttTZAhTUpqUFdEZ3BXNFE/view The game of Pong is an MDP. Finally got to see what AK really looks like; full of ideas and very humorous. http://karpathy.github.io/
Summary: http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html https://zhuanlan.zhihu.com/p/22252270