Our approach combines the strengths of reinforcement learning (RL) - planning in high-dimensional observation spaces with complex stochastic dynamics, and of trajectory optimization - guaranteeing constraints satisfaction while executing whole-body trajectories. It does not require collecting a task-specific dataset on the system and transfers zero-shot to hardware.
Website: https://thomasjlew.github.io/publication/table_wiping
Paper: https://arxiv.org/abs/2210.10865