Reinforcement Learning Paper Notes

BLENDING IMITATION AND REINFORCEMENT LEARNING FOR ROBUST POLICY IMPROVEMENT

To address the demand for robust policy improvement in real-world scenarios, the authors introduce a novel algorithm, Robust Policy Improvement (RPI), which actively interleaves imitation learning (IL) and reinforcement learning (RL) based on an online estimate of their performance. The algorithm is capable of learning from, and improving upon, a diverse set of black-box oracles.
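The interleaving idea can be illustrated with a minimal stdlib-only sketch (not the authors' implementation): each round, the learner imitates whichever black-box oracle currently has the highest online performance estimate, and switches to its own RL policy once that estimate is surpassed. The helper `select_learner` and the estimate values are hypothetical.

```python
def select_learner(oracle_estimates, rl_estimate):
    """Pick which policy to roll out this round: the strongest oracle
    (imitation) or the learner's own RL policy, based on online value
    estimates. Hypothetical helper sketching RPI's IL/RL interleaving."""
    best_oracle = max(oracle_estimates, key=oracle_estimates.get)
    if rl_estimate >= oracle_estimates[best_oracle]:
        return "rl"          # the RL policy now beats every oracle
    return best_oracle       # otherwise imitate the best oracle

# Toy usage: two black-box oracles with running performance estimates.
choice = select_learner({"oracle_a": 0.7, "oracle_b": 0.5}, rl_estimate=0.6)
```

As the RL policy's estimate improves over training, the same rule automatically weans the learner off the oracles.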

PREDICTIVE AUXILIARY OBJECTIVES IN DEEP RL MIMIC LEARNING IN THE BRAIN

1.The ability to predict upcoming events has been hypothesized to comprise a key aspect of natural and machine cognition.

2.The authors study the effects that predictive auxiliary objectives have on representation learning across different modules of an RL system, and how these mimic representational changes observed in the brain.

3.The authors find that representational changes in this RL system bear a striking resemblance to changes in neural activity observed in the brain across various experiments.

4.Their work demonstrates how representation learning in deep RL systems can provide an interpretable framework for modeling multi-region interactions in the brain.
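A predictive auxiliary objective simply adds a next-observation prediction loss alongside the RL loss, which shapes the learned representation. Below is a deliberately tiny stdlib-only sketch with a one-parameter linear predictor on scalar observations; the function names and learning rate are illustrative, not the paper's architecture.

```python
def predictive_aux_loss(w, obs, next_obs):
    """Squared error of a one-parameter predictor w*obs -> next_obs,
    standing in for the predictive auxiliary objective."""
    return (w * obs - next_obs) ** 2

def sgd_step(w, obs, next_obs, lr=0.1):
    """One gradient step on the auxiliary loss:
    d/dw (w*obs - next_obs)^2 = 2*obs*(w*obs - next_obs)."""
    grad = 2 * obs * (w * obs - next_obs)
    return w - lr * grad

# Toy transitions where the next observation is always 2x the current one:
w = 0.0
for obs, nxt in [(1.0, 2.0)] * 50:
    w = sgd_step(w, obs, nxt)
# w converges toward 2.0, i.e. the predictor has absorbed the
# transition structure into its (one-parameter) "representation".
```

In the paper's setting the same principle applies to deep network features rather than a single scalar weight.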

PRE-TRAINING GOAL-BASED MODELS FOR SAMPLE-EFFICIENT REINFORCEMENT LEARNING

1.The authors present PTGM (Pre-Training Goal-based Models) to augment RL by providing temporal abstractions and behavior regularization.

2.They propose clustering goals in the dataset to form a discrete high-level action space.

3.They also introduce a pre-trained goal prior model to regularize the behavior of the high-level policy in RL.
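The goal-clustering step (point 2) can be sketched in a few lines: map each continuous goal to its nearest cluster centroid, so the high-level policy only chooses among a small discrete set of cluster indices. This stdlib-only example uses 1-D goals and fixed centroids for simplicity; the real method clusters learned goal embeddings.

```python
def discretize_goals(goals, centroids):
    """Map each (here 1-D) goal to the index of its nearest centroid,
    yielding a discrete high-level action space of len(centroids) actions."""
    return [min(range(len(centroids)), key=lambda k: abs(g - centroids[k]))
            for g in goals]

# Toy dataset of scalar goals and two hypothetical cluster centers.
actions = discretize_goals([0.1, 0.2, 0.9, 1.1], centroids=[0.0, 1.0])
# actions = [0, 0, 1, 1]: each goal collapsed to a discrete action index
```

The pre-trained goal prior (point 3) would then assign probabilities over these same cluster indices to regularize the high-level policy.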

EFFICIENT EPISODIC MEMORY UTILIZATION OF COOPERATIVE MULTI-AGENT REINFORCEMENT LEARNING

Existing MARL algorithms are effective but still require significant learning time and often get trapped in local optima on complex tasks. The authors therefore introduce EMU (Efficient episodic Memory Utilization) for MARL.

1.They add a trainable encoder/decoder structure alongside MARL to leverage coherent memory.

2.EMU introduces a novel reward structure that promotes desirable transitions, preventing convergence to local optima.
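The reward-structure idea in point 2 amounts to episodic reward shaping: transitions that reach states previously tagged as desirable in memory receive a bonus, nudging exploration away from local optima. A minimal sketch, assuming a hypothetical set-based memory of desirable state keys and an illustrative bonus value:

```python
def shaped_reward(reward, state_key, desirable_memory, bonus=0.5):
    """Add an episodic bonus when the transition reaches a state that
    earlier episodes marked desirable (hypothetical shaping rule)."""
    return reward + (bonus if state_key in desirable_memory else 0.0)

memory = {"goal_room"}                    # states tagged desirable earlier
r_hit = shaped_reward(1.0, "goal_room", memory)   # bonus applied -> 1.5
r_miss = shaped_reward(1.0, "corridor", memory)   # no bonus      -> 1.0
```

In EMU itself, states are compared in the learned encoder/decoder embedding space rather than by exact keys.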

SWE-BENCH: CAN LANGUAGE MODELS RESOLVE REAL-WORLD GITHUB ISSUES?

1.To study the frontier of language-model capabilities, the authors introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues.

2.Their evaluations show that the best-performing state-of-the-art proprietary model, Claude 2, is able to resolve a mere 1.96% of the issues.
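For scale, the reported 1.96% resolve rate over 2,294 issues corresponds to roughly 45 resolved instances (an inferred count, since the paper's figure is a percentage); the arithmetic is:

```python
def resolve_rate(resolved, total):
    """Percentage of benchmark issues the model resolved."""
    return 100.0 * resolved / total

# ~45 of 2,294 issues reproduces the reported 1.96% figure.
rate = resolve_rate(45, 2294)
```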

posted on 2025-05-18 08:13 by bnbncch