A Benchmark for High-Dimensional Robot Control


We introduce a new benchmarking suite for high-dimensional control, targeted at testing high spatial and temporal precision, coordination, and planning, all with an underactuated system that frequently makes and breaks contact. The proposed challenge is mastering the piano through bimanual dexterity, using a pair of simulated anthropomorphic robot hands. We call it RoboPianist, and the initial version covers a broad set of 150 songs of varying difficulty. We investigate both model-free and model-based methods on the benchmark, characterizing their performance envelopes. We show that while certain existing methods, when well-tuned, can achieve impressive levels of performance in certain aspects, there is significant room for improvement. RoboPianist provides a rich quantitative benchmarking environment with human-interpretable results. It is easy to expand by simply augmenting the repertoire with new songs, and it opens opportunities for further research, including multi-task learning, zero-shot generalization, multimodal (sound, vision, touch) learning, and imitation.



The code for RoboPianist is fully open-sourced on GitHub. It is built on top of MuJoCo and dm_control, which makes it plug-and-play with existing robot learning libraries.



We would like to thank Philipp Wu and Mohit Shridhar for being a constant source of inspiration and support, Ilya Kostrikov for raising the bar for RL engineering and for invaluable debugging help, the Magenta team for helpful pointers and feedback, and the MuJoCo team for the development of the MuJoCo physics engine and their support throughout the project.