Finetuning Offline World Models in the Real World

Paper ID #57

Anonymous Authors
Anonymous Affiliation

Approach. We propose a framework for offline pretraining and online finetuning of world models directly in the real world, without reliance on simulators or synthetic data. Our method iteratively collects new data by planning with the learned model, and finetunes the model on a combination of pre-existing and newly collected data. By leveraging a novel test-time regularization during planning, our method can be finetuned few-shot to unseen task variations in ≤20 trials.
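The loop described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the class names, the discrete two-action environment, and the visit-count uncertainty penalty (standing in for the paper's test-time regularization, which keeps the planner close to in-distribution behavior) are all assumptions made for the sketch.

```python
class ToyWorldModel:
    """Stand-in for a learned world model over a tiny discrete task.
    All names and update rules here are illustrative, not the paper's API."""
    def __init__(self):
        self.actions = [0, 1]
        self.return_estimate = {0: 0.0, 1: 0.0}
        self.visit_counts = {0: 1, 1: 1}

    def predict_return(self, state, action):
        return self.return_estimate[action]

    def uncertainty(self, state, action):
        # crude epistemic proxy: rarely seen actions are more uncertain
        return 1.0 / self.visit_counts[action]

    def update(self, buffer):
        # running-average value update from (s, a, r, s') tuples
        for _, a, r, _ in buffer:
            self.visit_counts[a] += 1
            n = self.visit_counts[a]
            self.return_estimate[a] += (r - self.return_estimate[a]) / n


class ToyEnv:
    """One-step environment: action 1 yields reward 1, action 0 yields 0."""
    def reset(self):
        return 0

    def step(self, action):
        return 0, float(action == 1), True  # next_state, reward, done


def plan(model, state, reg_weight=0.5):
    """Select the action maximizing predicted return minus an uncertainty
    penalty -- a hedged stand-in for regularized planning, which discourages
    out-of-distribution actions at test time."""
    scores = [model.predict_return(state, a) - reg_weight * model.uncertainty(state, a)
              for a in model.actions]
    return model.actions[scores.index(max(scores))]


def finetune_online(model, env, offline_buffer, num_trials=20):
    """Collect trials by planning with the current model, then finetune
    on a mix of pre-existing offline data and newly collected data."""
    online_buffer = []
    for _ in range(num_trials):
        state, done = env.reset(), False
        while not done:
            action = plan(model, state)
            next_state, reward, done = env.step(action)
            online_buffer.append((state, action, reward, next_state))
            state = next_state
        model.update(offline_buffer + online_buffer)  # mixed replay
    return model
```

Seeded with even a single offline transition for the rewarding action, the sketch steers the planner toward it within a handful of trials, mirroring the few-shot behavior described above at toy scale.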

Tasks

Tasks. We consider a diverse set of tasks in simulation (D4RL, xArm) and on a real robot. Our real-world tasks use raw pixels as input. Our method achieves high success rates in offline-to-online transfer to both seen and unseen tasks within just 20 online trials on a real robot.

Qualitative Results

Videos. Our method can be finetuned few-shot to unseen task variations by online RL, directly in the real world. Inputs are raw RGB images, and we use sparse rewards. The videos below are generated by our method after just 20 trials.

Reach


Pick

Few-shot Finetuning to Unseen Task Variations

Raw training footage. The videos below contain raw training footage of our method being finetuned with online RL to an unseen task variation (20 trials). While zero-shot transfer fails due to the domain gap, we observe a noticeable improvement after a handful of trials. Playback speed has been increased for easier viewing.

Quantitative Results

Results. Our method significantly improves offline-to-online finetuning of world models, achieving high task success rates on both seen and unseen task variations with as few as 20 online trials on a real robot.