Imitation learning has proven effective at mimicking demonstrations across various robotic manipulation tasks. However, to develop robust policies, current imitation methods such as diffusion policy require training on extensive demonstrations, making data collection labor-intensive. In contrast, model-based planning with dynamics models can cover a sufficient range of configurations using only off-policy data. Yet, without the guidance of expert demonstrations, many tasks are difficult and time-consuming to plan with dynamics models alone. We therefore take the best of both dynamics models and imitation learning, and propose neural dynamics augmented imitation learning, which covers a large range of scene configurations with few-shot demonstrations. The method trains a robust diffusion policy in a local supporting region using few-shot demonstrations, and manipulates the randomly initialized object into this region with neural dynamics models trained offline. Extensive experiments across various tasks in both simulation and real-world scenarios, including granular manipulation, contact-rich tasks, and multi-object interaction tasks, demonstrate that, trained with only 1 to 30 demonstrations, our proposed method robustly covers a significantly larger area than a policy trained purely on the demonstrations.
(Left) We propose the neural dynamics augmented diffusion policy, in which a few-shot diffusion policy robustly covers a local supporting region and a dynamics model extends the initial configuration space. The green region denotes the supporting region covered by the few-shot diffusion policy, and the red region denotes the space outside the supporting region that model-based planning with the dynamics model can cover. (Right) The proposed method demonstrates its performance on various tasks. The dark green region denotes the area the few-shot diffusion policy can cover, and the light green region denotes the additional space covered by our method's augmentation.
Our Proposed Framework. (a) Collecting few-shot human demonstrations that cover a convex hull in the configuration space. (b) The diffusion policy trained on the few-shot human demonstrations is robust within the local supporting region, but lacks robustness for configurations outside it. (c) Model-based planning equipped with dynamics models generates manipulation trajectories from various initial poses to the supporting region. (d) The full policy, leveraging the trajectories generated in (c), is robust over the large space.
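To make the interplay between (b) and (c) concrete, the sketch below illustrates one possible high-level control loop under stated assumptions: when the object lies outside the supporting region, a sampling-based planner rolls out a learned dynamics model to pick actions that move the object toward the region; once inside, the few-shot diffusion policy takes over. All class and function names here (e.g., plan_to_region, region.contains, dynamics.predict) are hypothetical and are not the authors' actual interfaces.

```python
# Minimal sketch of the two-stage control loop in (b)-(d); all interfaces are assumed.
import numpy as np


def plan_to_region(dynamics, state, region, action_dim, horizon=10, n_samples=256):
    """Random-shooting MPC with a learned dynamics model: sample action sequences,
    roll them out through the model, and keep the first action of the sequence whose
    predicted final state lands closest to the supporting region."""
    best_action, best_dist = None, np.inf
    for _ in range(n_samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s = state
        for a in actions:
            s = dynamics.predict(s, a)        # learned one-step prediction
        dist = region.distance(s)             # distance of predicted end state to region
        if dist < best_dist:
            best_action, best_dist = actions[0], dist
    return best_action                        # execute the first action, then replan


def run_episode(env, dynamics, diffusion_policy, region, action_dim, max_steps=200):
    """Outside the supporting region: model-based planning. Inside: few-shot diffusion policy."""
    obs = env.reset()
    for _ in range(max_steps):
        if region.contains(obs["object_state"]):
            action = diffusion_policy.act(obs)
        else:
            action = plan_to_region(dynamics, obs["object_state"], region, action_dim)
        obs, done = env.step(action)
        if done:
            break
    return obs
```

In this sketch, planning is responsible only for reaching the supporting region; fine-grained manipulation is always delegated to the diffusion policy, mirroring the division of labor described in the caption.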
InsertT: inserting a T-shape (initially placed at a random position with a random orientation on the table) into a slot. The manipulation succeeds when the T-shape is inserted into the slot.
Stow: stowing a book (initially placed at a random position with a random orientation on the table) onto a bookshelf that already holds a few books. The manipulation succeeds when all books are placed in an upright posture.
DustPan: sweeping sparsely scattered granular pieces into the dustpan. The evaluation metric in simulation is the ratio of granular pieces successfully swept into the dustpan; in the real world, a trial succeeds when at least 90% of the pieces are swept into the dustpan (a minimal sketch of this metric follows the task descriptions).
HangMug: hanging a mug (randomly positioned on the table) on the rack. The manipulation succeeds when the mug is hung on the rack.
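The DustPan metric above amounts to a simple ratio with a threshold for real-world success. The sketch below is a minimal illustration under assumed inputs (piece positions and a membership test for the dustpan region); the names are hypothetical and not the authors' evaluation code.

```python
# Minimal sketch of the DustPan metric; inputs and names are assumed.
import numpy as np


def dustpan_ratio(piece_positions, in_dustpan):
    """Simulation metric: fraction of granular pieces lying inside the dustpan region."""
    inside = np.array([in_dustpan(p) for p in piece_positions], dtype=float)
    return float(inside.mean())


def dustpan_success(piece_positions, in_dustpan, threshold=0.9):
    """Real-world success: at least 90% of the pieces are swept into the dustpan."""
    return dustpan_ratio(piece_positions, in_dustpan) >= threshold
```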
Qualitative Analysis on InsertT. With the same few-shot human demonstrations, the original diffusion policy is robust only in a limited local region, whereas our proposed method, augmented with the neural dynamics model, remains robust over a much wider space. The demonstrations under Diffusion Policy show the few-shot (10) human demonstrations, and those under \textbf{Ours} show how the dynamics-augmented demonstrations cover the larger space.
Qualitative Analysis on DustPan, Stow, and HangMug. While the diffusion policy only covers specific regions, our method covers a significantly larger space by using model-based planning to manipulate diverse objects across different tasks into the local supporting region, followed by the few-shot diffusion policy. For DustPan, "Planning" denotes that this step is performed by model-based planning. For HangMug, red denotes success and blue denotes failure.