Paper title
Marginal Utility for Planning in Continuous or Large Discrete Action Spaces
Paper authors
Paper abstract
Sample-based planning is a powerful family of algorithms for generating intelligent behavior from a model of the environment. Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces. Typically, candidate action generation exhausts the action space, uses domain knowledge, or, more recently, involves learning a stochastic policy to provide such search guidance. In this paper we explore explicitly learning a candidate action generator by optimizing a novel objective, marginal utility. The marginal utility of an action generator measures the increase in value of an action over previously generated actions. We validate our approach in both curling, a challenging stochastic domain with continuous state and action spaces, and a location game with a discrete but large action space. We show that a generator trained with the marginal utility objective outperforms hand-coded schemes built on substantial domain knowledge, trained stochastic policies, and other natural objectives for generating actions for sample-based planners.
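The abstract describes marginal utility only in words, as the increase in value of an action over previously generated actions. The sketch below is one possible reading of that sentence, not the paper's formal definition: the function name `marginal_utilities`, the `baseline` parameter, and the clipping of negative gains to zero are all assumptions made for illustration.

```python
from typing import List, Sequence


def marginal_utilities(action_values: Sequence[float],
                       baseline: float = 0.0) -> List[float]:
    """Gain of each candidate action over the best candidate generated before it.

    Assumption (not from the paper): the first action is compared against
    `baseline`, and an action that does not improve on earlier ones gets 0.
    """
    gains: List[float] = []
    best_so_far = baseline
    for value in action_values:
        # Credit the action only for the amount by which it beats earlier ones.
        gains.append(max(0.0, value - best_so_far))
        best_so_far = max(best_so_far, value)
    return gains


if __name__ == "__main__":
    # Hypothetical action values for four candidates, in generation order.
    print(marginal_utilities([1.0, 3.0, 2.0, 6.0]))
```

Under these assumptions the third candidate receives no credit because an earlier candidate already has a higher value, which illustrates why such an objective would push a generator toward producing actions that add value beyond those already proposed.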
