Learning aggressive animal locomotion skills for quadrupedal robots solely from monocular videos

Released on Sep 19, 2025 · 7 min read · Category: Research

2D pose extraction and tracking analysis

This section evaluates the performance of 2D pose estimation and tracking from monocular videos. The first step, 2D skeleton extraction, involves annotating dedicated datasets to fine-tune the DeepLabCut model [21]. As depicted in Fig. S2, we visualize the results of 2D pose estimation, with the right-leg joints drawn in red to distinguish them from the left legs. During visualization, we connect the key points according to the real skeletal structure of the dog, forming the 2D skeleton graph.
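
As a concrete reference, the sketch below outlines this fine-tuning workflow with the stock DeepLabCut Python API; the project name and video paths are placeholders, not the ones used in this work.

```python
# Sketch of the 2D pose-extraction workflow with DeepLabCut.
# Project name and video paths are hypothetical placeholders.
import deeplabcut

# Create a project from the monocular dog videos.
config_path = deeplabcut.create_new_project(
    "dog-locomotion", "annotator", ["videos/gallop_01.mp4"], copy_videos=True
)

# Extract and manually label frames to build the fine-tuning set.
deeplabcut.extract_frames(config_path, mode="automatic")
deeplabcut.label_frames(config_path)

# Fine-tune the network and run inference on the full videos.
deeplabcut.create_training_dataset(config_path)
deeplabcut.train_network(config_path)
deeplabcut.analyze_videos(config_path, ["videos/gallop_01.mp4"], save_as_csv=True)
```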

3D pose estimation module analysis


Ablation experiment of the STG module

To verify the effectiveness of the designed spatio-temporal graph convolution module, we vary the number and width of the stacked STG modules. We first investigate the impact of the stacking quantity: according to Table S2, the 3D pose MPJPE of STGNet is best with four stacked STG modules and a receptive field of 243 frames. We then vary the network width of the STG modules, defining large, middle, and small variants. With four stacked modules, Table S3 shows that the middle STG module achieves the best 3D pose estimation results.
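
To make the ablation knobs concrete, here is a minimal PyTorch sketch of stacking configurable STG blocks. The block internals (learnable adjacency, temporal kernel, channel widths) are assumptions for illustration, and the dilation schedule that would yield the 243-frame receptive field is omitted for brevity; the actual STGNet layers may differ.

```python
# Hypothetical sketch of stacking N spatio-temporal graph (STG) blocks.
import torch
import torch.nn as nn

class STGBlock(nn.Module):
    """One block: spatial graph conv over joints, then temporal conv over frames."""
    def __init__(self, channels, num_joints, temporal_kernel=3):
        super().__init__()
        # Learnable adjacency over the skeleton graph (an assumption here).
        self.adj = nn.Parameter(torch.eye(num_joints))
        self.spatial = nn.Linear(channels, channels)
        self.temporal = nn.Conv1d(channels, channels, temporal_kernel,
                                  padding=temporal_kernel // 2)
        self.act = nn.ReLU()

    def forward(self, x):  # x: (batch, frames, joints, channels)
        res = x
        x = torch.einsum("jk,btkc->btjc", self.adj, x)   # mix joint features
        x = self.act(self.spatial(x))                    # per-joint projection
        b, t, j, c = x.shape
        x = x.permute(0, 2, 3, 1).reshape(b * j, c, t)   # fold joints into batch
        x = self.temporal(x).reshape(b, j, c, t).permute(0, 3, 1, 2)
        return self.act(x + res)                         # residual connection

# Four stacked "middle"-width blocks, mirroring the best ablation setting.
num_joints, width = 17, 256
stgnet_trunk = nn.Sequential(*[STGBlock(width, num_joints) for _ in range(4)])
```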

Experiment of STGNet efficiency

To validate the effectiveness of our designed 3D pose estimation algorithm, we compare it with the VideoPose3D [29], LiftPose3D [30], SemGCN [31], and GLA-GCN [32] algorithms to evaluate accuracy in 3D skeleton estimation. Our model achieves a lower 3D pose loss during training and a lower MPJPE on the validation dataset, demonstrating that STGNet extracts features from the GST more effectively, leading to more accurate 3D pose predictions.
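
MPJPE (Mean Per Joint Position Error) is the standard metric behind this comparison: the mean Euclidean distance between predicted and ground-truth joint positions. A minimal sketch, with the array shapes and units chosen for illustration:

```python
# Mean Per Joint Position Error (MPJPE), as used to compare STGNet
# against VideoPose3D, LiftPose3D, SemGCN, and GLA-GCN.
import numpy as np

def mpjpe(pred, gt):
    """Average Euclidean distance per joint.

    pred, gt: arrays of shape (frames, joints, 3), in the same units (e.g. mm).
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Example with random sequences of 243 frames and 17 joints.
pred = np.random.rand(243, 17, 3)
gt = np.random.rand(243, 17, 3)
print(f"MPJPE: {mpjpe(pred, gt):.4f}")
```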

Gallop

We included gallop as one of the real-world experiments to achieve agile and rapid quadrupedal motion. The robot's galloping performance is showcased in Movie S2, where we compare the galloping motion of a real dog in a video with the gallop imitated by AlienGo. During the yellow-highlighted phase, the robot's front feet leave the ground, leaving only the rear feet in contact, while the calf joints of the two rear legs exert force, propelling the robot upward and forward along the direction of motion.
