Paper Title
Direct Policy Optimization using Deterministic Sampling and Collocation
Paper Authors
Paper Abstract
We present an approach for approximately solving discrete-time stochastic optimal-control problems by combining direct trajectory optimization, deterministic sampling, and policy optimization. Our feedback motion-planning algorithm uses a quasi-Newton method to simultaneously optimize a reference trajectory, a set of deterministically chosen sample trajectories, and a parameterized policy. We demonstrate that this approach exactly recovers LQR policies in the case of linear dynamics, quadratic objective, and Gaussian disturbances. We also demonstrate the algorithm on several nonlinear, underactuated robotic systems to highlight its performance and ability to handle control limits, safely avoid obstacles, and generate robust plans in the presence of unmodeled dynamics.
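For orientation, the LQR baseline mentioned in the abstract can be computed with the standard backward Riccati recursion. The sketch below is not the paper's algorithm; it only illustrates the finite-horizon, discrete-time LQR policy that the approach is stated to recover. The function name `lqr_gains` and the double-integrator matrices are hypothetical, chosen purely for illustration. With additive Gaussian disturbances, the optimal gains do not depend on the noise covariance, so no noise term appears in the recursion.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    """Backward Riccati recursion for finite-horizon discrete-time LQR.

    Returns a list of gains K[0], ..., K[T-1] for the policy u_t = -K[t] @ x_t.
    """
    P = Qf  # cost-to-go Hessian at the final time step
    gains = []
    for _ in range(T):
        # Gain that minimizes the one-step cost plus the cost-to-go
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A'PA - A'PBK
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # computed backward in time, returned forward in time
    return gains

# Hypothetical double-integrator example (assumed values, not from the paper)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])
Qf = 10.0 * np.eye(2)

K = lqr_gains(A, B, Q, R, Qf, T=50)
```

Each `K[t]` is a 1-by-2 matrix here, and applying `u_t = -K[t] @ x_t` gives the time-varying linear feedback policy against which an optimized policy could be compared.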
