Robot Chameleons: How SLoMo Transfers Animal and Human Motion Skills from Wild Videos to Legged Robots


Imagine a future where robots can mimic the graceful movements of animals and humans. It sounds like science fiction, but the SLoMo framework suggests this future may be closer than we think. SLoMo, short for “Skilled Locomotion from Monocular Videos,” is a method that enables legged robots to imitate animal and human motions by transferring those skills from casually captured, real-world videos.

SLoMo: A Three-Stage Framework

The SLoMo framework works in three stages:

  1. Synthesize a physically plausible reconstructed key-point trajectory from monocular videos.
  2. Optimize a dynamically feasible reference trajectory for the robot offline that includes body and foot motion, as well as contact sequences that closely track the key points.
  3. Track the reference trajectory online using a general-purpose model-predictive controller on robot hardware.
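To make the data flow between the three stages concrete, here is a deliberately tiny Python sketch. It is not the authors' implementation: every function name is hypothetical, the "video reconstruction" is a synthetic foot-height curve, the offline optimizer is replaced by a simple speed-limit clamp, and the model-predictive controller is replaced by a one-step receding-horizon tracker on a kinematic model. It only illustrates how a reconstructed trajectory becomes a feasible reference with a contact sequence, which is then tracked online.

```python
import math
from typing import List


def reconstruct_keypoints(num_steps: int, dt: float) -> List[float]:
    """Stage 1 stand-in: a synthetic foot-height track in metres.

    A real system would reconstruct key points from video; this toy
    version just generates a periodic swing so the sketch is runnable.
    """
    return [max(0.0, 0.08 * math.sin(2.0 * math.pi * k * dt))
            for k in range(num_steps)]


def optimize_reference(keypoints: List[float], dt: float,
                       max_speed: float) -> List[float]:
    """Stage 2 stand-in: follow the key points as closely as possible
    while never moving faster than max_speed (a toy feasibility limit)."""
    limit = max_speed * dt
    ref = [keypoints[0]]
    for target in keypoints[1:]:
        step = max(-limit, min(limit, target - ref[-1]))
        ref.append(ref[-1] + step)
    return ref


def contact_sequence(ref: List[float], threshold: float = 0.005) -> List[bool]:
    """Toy contact rule (an assumption): the foot counts as in contact
    whenever its reference height is within `threshold` of the ground."""
    return [height <= threshold for height in ref]


def track_online(ref: List[float], dt: float, max_speed: float,
                 start: float) -> List[float]:
    """Stage 3 stand-in: one-step receding-horizon tracking on the
    kinematic model p[k+1] = p[k] + u * dt with |u| <= max_speed."""
    pos = [start]
    for target in ref[1:]:
        u = max(-max_speed, min(max_speed, (target - pos[-1]) / dt))
        pos.append(pos[-1] + u * dt)
    return pos


dt = 0.02
keypoints = reconstruct_keypoints(100, dt)
reference = optimize_reference(keypoints, dt, max_speed=0.4)
contacts = contact_sequence(reference)
# Start the "robot" off the reference to show the tracker recovering.
tracked = track_online(reference, dt, max_speed=0.6, start=0.02)
final_error = abs(tracked[-1] - reference[-1])
print(f"final tracking error: {final_error:.2e}")
```

Because the toy reference never moves faster than the tracker is allowed to, the tracker can catch up from its initial offset and then follow the reference closely. That is the intuition behind optimizing a feasible reference offline before handing it to an online controller.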

This approach sidesteps the requirements of traditional motion imitation techniques, which often depend on expert animators, collaborative demonstrations, or expensive motion capture equipment. With SLoMo, all that’s needed is readily available monocular video footage, such as clips found on YouTube.

Successful Demonstrations and Comparisons

SLoMo has been demonstrated in a range of hardware experiments on a Unitree Go1 quadruped robot and in simulation experiments on the Atlas humanoid robot. The authors report that the approach is more general and robust than previous motion imitation methods: it handles unmodeled terrain height mismatch on hardware and generates offline references directly from videos without annotation.

Limitations and Future Work

Despite its promise, SLoMo does have limitations, such as key model simplifications and assumptions, as well as manual scaling of reconstructed characters. To further refine and improve the framework, future research should focus on:

  1. Extending the work to use full-body dynamics in both offline and online optimization steps.
  2. Automating the scaling process and addressing morphological differences between video characters and corresponding robots.
  3. Investigating improvements and trade-offs by using combinations of other methods in each stage of the framework, such as leveraging RGB-D video data.
  4. Deploying the SLoMo pipeline on humanoid hardware, imitating more challenging behaviors, and executing behaviors on more challenging terrains.
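The scaling step in item 2 is currently manual. As a purely illustrative sketch of what one automated heuristic might look like, the Python snippet below rescales reconstructed key points uniformly by the ratio of the robot's leg length to the video character's leg length. The function name, the leg-length inputs, and the choice of a single uniform scale are all assumptions made for this example; they are not taken from the SLoMo paper, and a uniform scale does nothing to address deeper morphological differences.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres


def scale_keypoints(keypoints: List[Point],
                    character_leg_length: float,
                    robot_leg_length: float) -> List[Point]:
    """Uniformly rescale key points so the character's leg length
    matches the robot's (a hypothetical heuristic, not SLoMo's method)."""
    if character_leg_length <= 0.0:
        raise ValueError("character_leg_length must be positive")
    scale = robot_leg_length / character_leg_length
    return [(scale * x, scale * y, scale * z) for x, y, z in keypoints]


# Example: a character with 0.5 m legs mapped onto a robot with 0.25 m legs.
scaled = scale_keypoints([(1.0, 2.0, 4.0)], 0.5, 0.25)
print(scaled)
```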

As SLoMo continues to evolve, it could greatly widen the range of motions that legged robots can learn from ordinary video. This framework may be a step toward a future where robots blend in with the natural world, walking, running, and even playing alongside their animal and human counterparts.

SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos

John Z. Zhang, Shuo Yang, Gengshan Yang, Arun L. Bishop, Deva Ramanan, Zachary Manchester
Robotics Institute, Carnegie Mellon University

AWS Cloud Credit for Research
Ava Martinez is a passionate AI enthusiast and technical writer based just outside of New York City. With a background in computer science and a strong affinity for artificial intelligence, Ava has made a name for herself by contributing to numerous publications and online forums on various AI topics. As a proud Latina, she enjoys bringing a unique perspective to the world of technology, bridging cultural gaps and promoting diversity within the field. When she's not busy writing, Ava can often be found exploring the city's hidden gems or engaging in thought-provoking conversations at local tech meetups.

