CMU researchers revolutionize robot training by teaching tasks through videos

By Srikanth

Robots can now learn household chores simply by watching videos of humans performing them.


Researchers at the School of Computer Science at Carnegie Mellon University (CMU) made this breakthrough.

This advance in robotics could make robots more useful in domestic settings, allowing them to assist with tasks such as cooking and cleaning.

Two robots were trained to perform 12 different household tasks. These include opening a drawer, an oven, or a lid, taking a pot off the stove, and picking up objects such as a telephone.

Deepak Pathak, an assistant professor at CMU’s Robotics Institute, said that the robot can learn where and how humans interact with different objects just by watching videos.

The knowledge gained from these videos was used to train a model that allowed the robots to perform similar tasks in various environments.

Conventionally, training robots involves either humans manually demonstrating tasks or extensive training in a simulated environment.

The WHIRL method vs the VRB method

Pathak and his students previously proposed a method in which robots learn by observing humans complete tasks. This method is named In-the-Wild Human Imitating Robot Learning (WHIRL).

Pathak’s latest research, the Vision-Robotics Bridge (VRB), expands and refines the WHIRL concept. The new model removes the requirement for the robot to operate in an environment identical to the one in which the task was demonstrated.

However, the robot still needs practice to master a task. According to the researchers, it can learn a new task in as little as 25 minutes.

Shikhar Bahl, a PhD student in robotics, said the team was able to take the robots around campus and have them do all sorts of tasks.

Robots can use this model to explore their surroundings with curiosity.

The key to teaching the robots was applying the concept of affordances. This idea, rooted in psychology, refers to what an environment offers an individual.

In VRB, affordances are used to determine where and how a robot might interact with an object.

For example, when a robot watches a human open a drawer, it identifies the points of contact and the direction of the drawer’s movement.
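
As a rough illustration of the idea, and not the VRB team's actual code, an affordance can be thought of as a contact point plus a direction of motion. The minimal Python sketch below assumes 2D image coordinates. The `Affordance` class, its fields, and the `waypoint` helper are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Affordance:
    """Hypothetical affordance: where to touch an object and which way to move it."""
    contact_point: Tuple[float, float]  # (x, y) where the hand meets the object
    direction: Tuple[float, float]      # (dx, dy) motion observed after contact

    def unit_direction(self) -> Tuple[float, float]:
        """Return the motion direction scaled to length 1."""
        dx, dy = self.direction
        norm = math.hypot(dx, dy)
        if norm == 0:
            raise ValueError("direction must be a non-zero vector")
        return (dx / norm, dy / norm)

    def waypoint(self, distance: float) -> Tuple[float, float]:
        """Return the point reached by moving `distance` from the contact point."""
        ux, uy = self.unit_direction()
        x, y = self.contact_point
        return (x + distance * ux, y + distance * uy)


# Assumed example values: a drawer handle at (120, 80) that moves straight along +y.
drawer = Affordance(contact_point=(120.0, 80.0), direction=(0.0, 30.0))
target = drawer.waypoint(50.0)
print(target)  # (120.0, 130.0)
```

In this simplified picture, the robot would reach for the contact point and then move along the direction vector. How a real system predicts these values from video is outside the scope of the sketch.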

To train the robots, the team used large video datasets, including Ego4D and Epic Kitchens. The Ego4D dataset comprises nearly 4,000 hours of first-person video of daily activities.

Epic Kitchens similarly contains videos of cooking, cleaning, and other kitchen tasks. Both datasets are typically used to train computer vision models.

The team is using these datasets in a new and different way. This work could enable robots to learn from the vast amount of video available on the Internet and YouTube.

This forward-thinking application of existing datasets points to a future in which domestic robots can be trained to perform a wide range of tasks, making our lives easier and more efficient.
