Excerpts

## Tips

  1. Abbreviations: avoid abbreviations in the abstract as much as possible; introduce and use them in the main text.
  2. When using pronouns such as "its", make clear exactly what they refer to.
  3. Always use "that" for restrictive relative clauses.
  4. Sentences that carry no information or are overly abstract can be deleted (especially would-be summary sentences placed at the start of a paragraph).
  5. Always check singular/plural agreement.

## Abstract

## Introduction

Methods mentioned in this section should appear only when they are accompanied by a citation.
Specific contributions of this paper are listed as follows.

  1. We formulate an
  2. We develop an NC-FG to characterize the formulated optimization
  3. We propose a robust

Since reinforcement learning techniques do not use an explicit teacher or supervisor, they construct an internal evaluator, or critic, capable of evaluating the dynamic system's performance. Constructing this critic so that it properly evaluates the performance in a way that is useful to the control objective is itself a significant problem in reinforcement learning. Given the critic's evaluation, the other problem in reinforcement learning is how to adjust the control signal.
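As a minimal sketch of these two problems, the actor-critic skeleton below keeps a state-value table as the critic and a table of action preferences as the controller; the state/action counts and step sizes are placeholder assumptions, not values from the excerpted work.

```python
import numpy as np

n_states, n_actions = 10, 2               # placeholder sizes
alpha_v, alpha_p, gamma = 0.1, 0.01, 0.99

V = np.zeros(n_states)                    # critic: internal evaluator of performance
prefs = np.zeros((n_states, n_actions))   # controller: preferences over control actions

def actor_critic_step(s, a, r, s_next):
    """One learning step: the critic's TD error both refines the evaluator (problem 1)
    and tells the controller how to adjust the selected control signal (problem 2)."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha_v * td_error
    prefs[s, a] += alpha_p * td_error
```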

Reinforcement learning techniques assume that, during the learning process, no supervisor is present to directly judge the quality of the selected control action; instead, the final evaluation of a process becomes known only after a long sequence of actions. The reinforcements received by the learning system can only be used to learn how to predict the outcome of the selected actions. Sutton used two neuro-like elements to solve the learning problem in cart-pole balancing. In this approach, the state space is partitioned into non-overlapping smaller regions, and credit assignment is then performed on a local basis. In fuzzy reinforcement learning, however, the partitions of the state space can overlap, leading to the use of fuzzy partitions in the antecedents and consequents of fuzzy rules. The reinforcements from the environment are then used to refine the fuzzy membership functions in the rules.
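The sketch below illustrates the contrast between crisp (non-overlapping) and overlapping fuzzy partitions of a single state variable; the pole-angle sets, centers, and widths are illustrative assumptions only.

```python
import numpy as np

def crisp_region(x, edges):
    """Crisp partition: each state value falls into exactly one region."""
    return int(np.digitize(x, edges))

def triangular_membership(x, left, center, right):
    """Degree of membership in a triangular fuzzy set; overlapping sets share the credit."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

# Illustrative pole-angle variable (radians) covered by three overlapping fuzzy sets.
centers, width = [-0.2, 0.0, 0.2], 0.2   # "negative", "zero", "positive"
angle = 0.05
degrees = [triangular_membership(angle, c - width, c, c + width) for c in centers]
print(degrees)   # [0.0, 0.75, 0.25] -- credit is spread across two neighbouring rules
```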

Temporal Difference and Q-learning methods can be used to learn how to select an appropriate continuous or discrete action, respectively. The Temporal Difference method and Q-learning are similar in that both learn from a distal teacher. In TD, two fundamental memory structures are kept, one for evaluation and another for the policy. Q-learning, however, collapses these two and maintains only a single structure that is a cross between an evaluation function and a policy.
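A minimal sketch of the two update rules the paragraph contrasts: a TD(0) update for a separate evaluation structure, and a tabular Q-learning update that merges evaluation and policy into one action-value table. The table sizes, step size, and discount factor are placeholders.

```python
import numpy as np

n_states, n_actions = 10, 4       # placeholder sizes
alpha, gamma = 0.1, 0.99

# TD keeps two structures: an evaluation (value function) and, separately, a policy.
V = np.zeros(n_states)

def td0_update(s, r, s_next):
    """TD(0) evaluation update: move V(s) toward the bootstrapped target r + gamma * V(s')."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

# Q-learning collapses evaluation and policy into a single action-value table.
Q = np.zeros((n_states, n_actions))

def q_learning_update(s, a, r, s_next):
    """Q-learning update: the greedy max over next actions acts as both evaluator and policy."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```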

For our purpose of demonstrating Fuzzy Q-learning, we assume that whenever these states are reached, the state of the system immediately transfers to the starting state, which can be either fixed or randomly selected. A fuzzy constraint on each state-action pair takes a value in \([0,1]\), where the lower the value, the harder it is to take the action.
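One plausible way to realize such a constraint, shown only as an illustrative assumption and not necessarily the formulation of the excerpted work, is to combine the (normalized) Q-value of each action with its constraint degree through a fuzzy AND (minimum) before choosing greedily:

```python
import numpy as np

def constrained_greedy_action(q_values, constraints):
    """Select the action maximizing the fuzzy AND (minimum) of Q-value and constraint.

    q_values:    Q(s, a) for the current state, rescaled to [0, 1]
    constraints: fuzzy constraint degrees C(s, a) in [0, 1];
                 the lower the value, the harder it is to take the action.
    """
    return int(np.argmax(np.minimum(q_values, constraints)))

# A constraint of 0.1 effectively vetoes action 0 even though its Q-value is highest.
q = np.array([0.9, 0.6, 0.4])
c = np.array([0.1, 0.8, 1.0])
print(constrained_greedy_action(q, c))   # -> 1
```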

Generalization is an important concern when applying machine learning to large problems. In most learning techniques, including the family of reinforcement learning algorithms, parameterized function approximators are needed to generalize between similar situations and actions.

In many tasks to which we would like to apply reinforcement learning, most states will be encountered only once, which makes it very difficult to learn the value of these states. This is almost always the case when the state space is very large or continuous. The only way to learn in these domains is to generalize the learning experience from previously encountered states to similar ones that have not been visited yet.
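As a minimal sketch of such generalization, the semi-gradient Q-learning update below uses a linear function approximator: one weight update affects every state whose feature vector overlaps with the visited one. The feature dimension, step size, and discount factor are placeholder assumptions.

```python
import numpy as np

n_features, n_actions = 8, 3      # placeholder sizes
alpha, gamma = 0.01, 0.99

# One weight vector per action: Q(s, a) is approximated by W[a] . phi(s).
W = np.zeros((n_actions, n_features))

def q_values(phi):
    """Approximate action values for a state described by feature vector phi."""
    return W @ phi

def semi_gradient_q_update(phi, a, r, phi_next):
    """Semi-gradient Q-learning with linear features: updating W[a] changes the
    value estimate of every state sharing features with phi, visited or not."""
    target = r + gamma * q_values(phi_next).max()
    W[a] += alpha * (target - q_values(phi)[a]) * phi
```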
