Contents
- AAAI 2021
- https://arxiv.org/pdf/1911.10244.pdf
- a sequence of high-level objectives
- synthesis of compact, human-interpretable automata
- enrich the state space, discover structure
1 Introduction
- hard to solve: sparse rewards, non-Markovian structure, long-term sequential tasks
- ours: DeepSynth, infers unknown sequential dependencies among high-level objectives
- automata: interpretable
- off-the-shelf unsupervised image segmentation
4 Background on Automata Synthesis
- the algorithm used for automatic inference of unknown high-level sequential structures as automata
- input: a trace sequence. output: N-state automaton
- briefly: search over sizes 1 to N for the smallest automaton conforming to the trace
- sliding window of width \(w\) (hyperparameter); only unique segments are processed, which shrinks the input (000 012 222 222 234 444 ...); see the sketch after this list
- hyperparameter \(l\): length of the generated negative examples. too small: the automaton accepts everything; too large: synthesis becomes intractable (the problem is NP-hard)
- on failure: add the offending trace to the example set and re-synthesize
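
A minimal Python sketch of the windowed deduplication mentioned above. `compress_trace` is a hypothetical name, and the stride and deduplication scheme are assumptions, not details taken from the paper:

```python
def compress_trace(trace, w):
    """Slide a window of width w over a label trace and keep only
    segments not seen before, shrinking the input handed to synthesis.
    A sketch of the Sec. 4 preprocessing; the paper's exact windowing
    scheme may differ."""
    seen = set()
    segments = []
    for i in range(len(trace) - w + 1):
        seg = tuple(trace[i:i + w])
        if seg not in seen:          # only unique segments are processed
            seen.add(seg)
            segments.append(seg)
    return segments

# long runs of repeated labels collapse into a handful of segments,
# echoing the "000 012 222 222 234 444 ..." example above
print(compress_trace([0, 0, 0, 1, 2, 2, 2, 2, 2, 3, 4, 4, 4], w=3))
```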
5 DeepSynth

- other Atari games: avoid or collect objects, in no particular order; Montezuma's Revenge: a long, complex sequence
- actually not a strict sequence: there is a shortcut!
- the input image is segmented into enough objects whose correlation can guide the agent
Step 1
- total reward \(r^T = \hat r + \mu r^i\): extrinsic \(\hat r\) plus an intrinsic term \(r^i\) that depends on the inferred automaton
- input: a stack of four \(84\times84\) frames, i.e. an \(84\times84\times4\) tensor
- transition tuple: additionally stores \(L(s')\) to detect overlap with a segmented object; the no-overlap and multiple-overlap cases are handled as well (sketched below)
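
A hedged sketch of Step 1's bookkeeping: the combined reward, the four-frame input stack, and a transition tuple extended with the label \(L(s')\). All names and the value of \(\mu\) are assumptions for illustration, not the paper's code:

```python
import numpy as np
from collections import deque, namedtuple

# Transition extended with the label L(s') of the next state
# (field names are ours; the paper only specifies the extra label).
Transition = namedtuple("Transition", "s a r s_next label")

def total_reward(r_ext, r_intr, mu=0.01):
    """r^T = r_hat + mu * r^i; mu (value assumed here) keeps the
    intrinsic bonus much smaller than the extrinsic reward."""
    return r_ext + mu * r_intr

class FrameStack:
    """Stacks the last four 84x84 grayscale frames into an 84x84x4 input."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def push(self, frame):
        """frame: (84, 84) array; returns the (84, 84, 4) stacked tensor."""
        self.frames.append(frame)
        while len(self.frames) < self.frames.maxlen:
            self.frames.append(frame)  # pad with the first frame on reset
        return np.stack(self.frames, axis=-1)
```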
Step 2
- an automaton conforms to the labels in the trace; each label is a symbol of the alphabet (conformance check sketched below)
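
A toy conformance check for Step 2, assuming the DFA is given as a transition map; the states, labels, and the `conforms` helper are illustrative only:

```python
def conforms(trace, delta, q0):
    """Check that a label trace can be replayed on the synthesized DFA:
    every label must have a transition from the current state.
    delta: dict mapping (state, label) -> next state; q0: initial state.
    A minimal check; the paper's synthesis additionally rejects the
    generated negative examples."""
    q = q0
    for label in trace:
        if (q, label) not in delta:
            return False
        q = delta[(q, label)]
    return True

# toy 3-state automaton over labels {"key", "door"} (hypothetical)
delta = {(0, "key"): 1, (1, "door"): 2}
assert conforms(["key", "door"], delta, q0=0)
assert not conforms(["door"], delta, q0=0)
```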
Step 3
- on a new label: expand (re-synthesize) the automaton and reward the new transitions
- the intrinsic reward is kept much smaller than the extrinsic reward
- the optimal policy is invariant under this shaping (proof given)
- product MDP: incorporates the two MDPs (the DFA is a trivial MDP)
- non-Markovianity is resolved because the DFA state encodes the history
- multiple NNs, one per automaton state
- interconnected through Q-learning ("backwards" value propagation between states)
- no shared weights, replay buffers, or \(\epsilon\) for \(\epsilon\)-greedy (sketched after this list)
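
A sketch of Step 3's per-automaton-state learners, with tabular Q standing in for the paper's per-state DQNs; the cross-state bootstrap in `update` is where the "backwards" interconnection happens. Class and function names are ours, not the paper's:

```python
from collections import defaultdict, deque

class PerStateAgent:
    """One learner per DFA state: its own value function, replay buffer,
    and epsilon for epsilon-greedy; nothing is shared across states.
    Tabular Q stands in for the per-state DQNs of the paper."""
    def __init__(self, n_actions, eps=1.0):
        self.Q = defaultdict(lambda: [0.0] * n_actions)
        self.buffer = deque(maxlen=100_000)
        self.eps = eps

def update(agents, q, s, a, r, q_next, s_next, alpha=0.1, gamma=0.99):
    """Q-learning on the product MDP: the learner for DFA state q
    bootstraps from the learner for q_next, so value propagates
    'backwards' through the automaton toward earlier states."""
    target = r + gamma * max(agents[q_next].Q[s_next])
    agents[q].Q[s][a] += alpha * (target - agents[q].Q[s][a])
```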
Appendix

Discussion and Comments
- strong prior baked into "the state space of the DFA": what exactly counts as the overlapping object?
- not general
- perhaps a pioneering work in this field.

