Paper title
Physics-Based Dexterous Manipulations with Estimated Hand Poses and Residual Reinforcement Learning
Authors
Abstract
Dexterous manipulation of objects in virtual environments with bare hands, using only a depth sensor and a state-of-the-art 3D hand pose estimator (HPE), is challenging. Although virtual environments are governed by physics, e.g. object weights and surface friction, the absence of force feedback makes the task difficult, as even slight inaccuracies in the fingertips or contact points estimated by the HPE may cause interactions to fail. Prior work simply generates contact forces in the direction of the fingers' closure when finger joints penetrate virtual objects. Although useful for simple grasping scenarios, such methods cannot be applied to dexterous manipulations such as in-hand manipulation. Existing reinforcement learning (RL) and imitation learning (IL) approaches train agents that learn skills using task-specific rewards, without considering any online user input. In this work, we propose to learn a model that maps noisy input hand poses to target virtual poses, which introduce the contacts needed to accomplish the tasks on a physics simulator. The agent is trained in a residual setting using a model-free hybrid RL+IL approach. A 3D hand pose estimation reward is introduced, leading to an improvement in HPE accuracy when the physics-guided corrected target poses are remapped to the input space. As the model corrects HPE errors by applying minor but crucial joint displacements for contacts, the generated motion stays visually close to the user input. Since HPE sequences performing successful virtual interactions do not exist, a data generation scheme to train and evaluate the system is proposed. We test our framework in two applications that use hand pose estimates for dexterous manipulations: hand-object interactions in VR and hand-object motion reconstruction in-the-wild.
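To make the residual setting described in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the hand pose is a flat list of joint angles, that the learned policy outputs one correction per joint, and that corrections are clipped to a small bound so the target pose stays close to the user input. The names `residual_target_pose`, `toy_policy`, and `max_residual` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence


def residual_target_pose(
    estimated_pose: Sequence[float],
    policy: Callable[[Sequence[float]], Sequence[float]],
    max_residual: float = 0.25,
) -> List[float]:
    """Add a bounded per-joint correction to a noisy estimated hand pose.

    Hypothetical helper: the pose representation, the policy signature and
    the clipping bound are assumptions made for illustration only.
    """
    residual = policy(estimated_pose)
    if len(residual) != len(estimated_pose):
        raise ValueError("policy must return one residual per joint")
    # Clip each correction so the target pose stays close to the user input.
    clipped = [max(-max_residual, min(max_residual, r)) for r in residual]
    # Target pose = estimated pose + small residual correction.
    return [q + r for q, r in zip(estimated_pose, clipped)]


def toy_policy(pose: Sequence[float]) -> List[float]:
    """Stand-in for a trained policy: returns fixed corrections (assumption)."""
    return [1.0, -0.125, 0.0][: len(pose)]


# Example: the first correction exceeds the bound and is clipped to 0.25.
target = residual_target_pose([0.5, 1.0, 0.0], toy_policy, max_residual=0.25)
```

In this sketch the policy only nudges the estimated pose, which mirrors the abstract's point that minor joint displacements are enough to create the needed contacts. In the actual system, the resulting target pose would be passed to a physics simulator, which is not modeled here.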
