Post archive "May 2018" - ecoflex

Summary: Understand that correlated samples cause problems, and how parallelism solves the problem. Another solution is replay buffers, fully utilizing the advantag Read more

posted @ 2018-05-26 19:57 ecoflex Views (223) Comments (0) Recommended (0)
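
The replay-buffer idea mentioned in the summary above can be illustrated with a minimal sketch. This is not code from the post: the `ReplayBuffer` class, its method names, and the transition layout are hypothetical, chosen only to show how uniform sampling from stored transitions breaks up the order in which they were collected.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions (hypothetical helper, not from the post)."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement: consecutive (correlated)
        # transitions are unlikely to end up next to each other in a batch.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Usage: store 150 dummy transitions in a buffer that holds only 100.
buffer = ReplayBuffer(capacity=100, seed=0)
for t in range(150):
    buffer.add(t, 0, 1.0, t + 1, False)
batch = buffer.sample(8)
```

Sampling uniformly is the simplest choice; other sampling schemes exist, but nothing in the summary indicates which one the lecture uses.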

CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) NO.5 Actor-critic introduction

Summary: In most AC algorithms, we actually just fit the value function; it is less common to fit the Q function as well. Batch: offline, Monte Carlo. Online: bootstrap, TD in Read more

posted @ 2018-05-26 12:28 ecoflex Views (218) Comments (0) Recommended (0)
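
The contrast in the summary above between batch Monte Carlo targets and online bootstrapped (TD) targets for the value function can be sketched as below. The function names, the discount factor `gamma`, and the toy reward list are assumptions made for illustration; they do not come from the post or the lecture.

```python
def monte_carlo_targets(rewards, gamma):
    """Discounted reward-to-go for every step of one finished episode."""
    targets = []
    running = 0.0
    # Walk backwards so each target reuses the one computed for the next step.
    for r in reversed(rewards):
        running = r + gamma * running
        targets.append(running)
    targets.reverse()
    return targets


def td_target(reward, next_value, gamma, done):
    """One-step bootstrapped target: r + gamma * V(s'), with V(s') = 0 at episode end."""
    return reward + (0.0 if done else gamma * next_value)


# Usage: the Monte Carlo version needs the whole episode, the TD version
# needs only one transition plus the current value estimate of the next state.
rewards = [1.0, 1.0, 1.0]
mc = monte_carlo_targets(rewards, gamma=0.5)
td = td_target(reward=1.0, next_value=2.0, gamma=0.5, done=False)
```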

CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) NO.4 Policy gradients introduction

Summary: The green bars are the reward function; the blue curve is the probability of different trajectories. If the green bars are all increased equally to the yellow bars, the res Read more

posted @ 2018-05-24 23:13 ecoflex Views (145) Comments (0) Recommended (0)

CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) NO.3 Reinforcement learning introduction

Summary: First-order Markov chain. On-policy algorithms are easier to parallelize. Off-policy algorithms have to fit a transition net and a policy net. much more comp Read more

posted @ 2018-05-24 18:13 ecoflex Views (161) Comments (0) Recommended (0)

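CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) NO.1 Introduction NO.2 Supervised learning and imitation

Summary: I made a mistake earlier: I should have watched the Fall 2017 course, but ended up watching the spring course. A neural network controls a virtual robot by imitating human motion. Domain shift causes the failure of supervised learning in Read more

posted @ 2018-05-24 16:43 ecoflex Views (1090) Comments (0) Recommended (0)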

CS294-112 Deep Reinforcement Learning course (UC Berkeley, 2017) NO.5 Guest lecture: Igor Mordatch (OpenAI)

Summary: Initialization dramatically influences the trajectory. The current state depends on all past decisions. The ones reflect the dimensions being counted. Read more

posted @ 2018-05-24 13:59 ecoflex Views (319) Comments (0) Recommended (0)

CS294-112 Deep Reinforcement Learning course (UC Berkeley, 2017) NO.4 Learning policies by imitating optimal controllers

Summary: There are some problems: mismatch between model and reality; gradient explosion. So the dynamics can be quite messy, and backpropagating can be quite probl Read more

posted @ 2018-05-23 19:14 ecoflex Views (348) Comments (0) Recommended (0)

CS294-112 Deep Reinforcement Learning course (UC Berkeley, 2017) NO.3 Learning dynamical system models from data

Summary: ... Read more

posted @ 2018-05-22 19:58 ecoflex Views (214) Comments (0) Recommended (0)

CS294-112 Deep Reinforcement Learning course (UC Berkeley, 2017) NO.2 Optimal control and planning

Summary: Normally solved by sequential quadratic programming algorithms. An example of a linear system Read more

posted @ 2018-05-21 20:33 ecoflex Views (221) Comments (0) Recommended (0)

[Hung-yi Lee Deep Learning Collection] Advanced Topics in Deep Learning - Imitation Learning

Summary: You have to force experts to handle some uncommon and extreme situations. A mechanical way to learn. However, we don't know rt if you use sequence GAN, Read more

posted @ 2018-05-19 20:21 ecoflex Views (469) Comments (0) Recommended (0)

Hung-yi Lee 2018 latest GAN course, class 4 fGAN: General Framework of GAN

Summary: Not only JS divergence can be applied to GAN; other divergences are all applicable! f star (f*) is convex. Several ACG icons become very similar, if trai Read more

posted @ 2018-05-15 18:43 ecoflex Views (664) Comments (0) Recommended (0)
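
For reference, the standard definitions behind the summary above can be written out as follows. This is the textbook form of an f-divergence and of the Fenchel conjugate, assuming the convex function the summary mentions is the conjugate f*; the symbols p, q, and t are notation chosen here, not taken from the post.

```latex
% f-divergence between distributions P and Q with densities p and q;
% f is assumed convex with f(1) = 0.
D_f(P \,\|\, Q) = \int_x q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx

% Fenchel conjugate of f; it is convex in t because it is a
% supremum of functions that are affine in t.
f^*(t) = \sup_{x \in \operatorname{dom} f} \bigl\{ x t - f(x) \bigr\}
```

Different choices of f give different divergences, which is consistent with the summary's point that JS divergence is only one option.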

Hung-yi Lee 2018 latest GAN course, class 3 Theory behind GAN

Summary: The Gaussian model is too limited. The images are too blurry. So is there a more general model? But if P_G(x;θ) is a neural network, it's impossible to calcula Read more

posted @ 2018-05-15 14:50 ecoflex Views (459) Comments (0) Recommended (0)

Hung-yi Lee 2018 latest GAN course, class 2 Conditional Generation by GAN

Summary: HW2: input a sentence, output an ACG icon. 3 targets: trains from the front view and side views, so the output would be the average of the three pictures. Read more

posted @ 2018-05-14 23:12 ecoflex Views (1661) Comments (0) Recommended (0)

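Hung-yi Lee 2018 latest GAN course, class 1 Introduction

Summary: What is more useful is the conditioned generator, which can control the corresponding text, audio, and images by controlling the input vector. https://zhuanlan.zhihu.com/p/24767059 Simply generating faces is not very meaningful, since you could just photograph any passer-by. But being able to generate a frontal photo from left and right photos is quite amazing. Must learn to discern the turning points. Varia Read more

posted @ 2018-05-13 13:12 ecoflex Views (3583) Comments (0) Recommended (0)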

Records of PyTorch in Practice

Summary: https://www.bilibili.com/video/av15997678/ My own deep reinforcement learning code: https://github.com/ysgclight/Reinforcement-Learning-with-Pytorch D Read more

posted @ 2018-05-06 14:54 ecoflex Views (183) Comments (0) Recommended (0)

fast.ai Lesson 2: Deep Learning 2018

Summary: Data augmentation Read more

posted @ 2018-05-05 19:46 ecoflex Views (187) Comments (0) Recommended (0)

fast.ai Lesson 1: Deep Learning 2018

Summary: 10 free hours run on AWS. Click this one. Click on new machine. Pick a region. Choose Linux Ubuntu 16. 250GB is preferred. Ctrl+Shift+V to paste your passwo Read more

posted @ 2018-05-05 18:49 ecoflex Views (572) Comments (0) Recommended (0)

Learn You a PyTorch! (aka Introduction Into PyTorch)

Summary: https://www.bilibili.com/video/av22940029 Left-hand side: NN being constructed. Right-hand side: NN being called. Turn the NN code into GPU compatible m Read more

posted @ 2018-05-04 18:02 ecoflex Views (160) Comments (0) Recommended (0)

Deep RL Bootcamp Frontiers Lecture I: Recent Advances,

Summary: High bias if the robot has learnt something (no changes appear with iterations). However, in real-world tasks, the task could change a little bit, Read more

posted @ 2018-05-04 17:14 ecoflex Views (265) Comments (0) Recommended (0)

Deep RL Bootcamp TAs Research Overview

Summary: Model-free: high variance. Model-based: high bias. Within 1h of human demonstration of each task, VR!!! Read more

posted @ 2018-05-04 15:34 ecoflex Views (255) Comments (0) Recommended (0)

Deep RL Bootcamp Lecture 10B Inverse Reinforcement Learning

Summary: Intrinsic ambiguity: move toward the purple triangle? Move away from the red triangle? Move along the grey arrow? Or a combination of them? The right part of the ri Read more

posted @ 2018-05-04 13:58 ecoflex Views (365) Comments (0) Recommended (0)

Deep RL Bootcamp Lecture 10A Utilities

Summary: Read more

posted @ 2018-05-03 18:55 ecoflex Views (186) Comments (0) Recommended (0)

Deep RL Bootcamp Lecture 9 Model-based Reinforcement

Summary: So, the process is similar to a one-to-many RNN? Learns much more efficiently than model-free methods. Iteratively gets better. Less than 300 trials ~ 25min Read more

posted @ 2018-05-02 23:02 ecoflex Views (230) Comments (0) Recommended (0)

Deep RL Bootcamp Lecture 8 Derivative Free Methods

Summary: You wouldn't try to explore any problem structure in DFO. Low-dimensional policy. 30 degrees of freedom. 120 parameters to tune. Keep the positive results i Read more

posted @ 2018-05-02 13:08 ecoflex Views (200) Comments (0) Recommended (0)

Deep RL Bootcamp Lecture 7: SVG, DDPG, and Stochastic Computation Graphs

Summary: ^ is the square root of epsilon. A simplified version of the hard version. A smoother way to find the correct solution. The first term is the REINFORCE term, Read more

posted @ 2018-05-01 22:38 ecoflex Views (285) Comments (0) Recommended (0)

Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation

Summary: Fast feedback to the robot with a better-shaped reward function, and learning could be much faster. OpenAI Baselines, rllab. Multiple tasks and multiple seeds to te Read more

posted @ 2018-05-01 21:34 ecoflex Views (352) Comments (0) Recommended (0)

Deep RL Bootcamp Lecture 5: Natural Policy Gradients, TRPO, PPO

Summary: https://statweb.stanford.edu/~owen/mc/Ch-var-is.pdf https://zhuanlan.zhihu.com/p/29934206 The blue curve is the lower-bound one. Conjugate gradient to so Read more

posted @ 2018-05-01 17:38 ecoflex Views (368) Comments (0) Recommended (0)

Deep RL Bootcamp Lecture 4B Policy Gradients Revisited

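Summary: https://drive.google.com/file/d/0BxXI_RttTZAhTUpqUFdEZ3BXNFE/view The game of Pong is an MDP. Finally got to see what AK actually looks like; very thoughtful and very humorous. http://karpathy.github.io/ Read more

posted @ 2018-05-01 12:52 ecoflex Views (178) Comments (0) Recommended (0)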

Comparison among several SGD variants

Summary: http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html https://zhuanlan.zhihu.com/p/22252270 Read more

posted @ 2018-05-01 12:45 ecoflex Views (105) Comments (0) Recommended (0)
