KeyCLD

Learning Constrained Lagrangian Dynamics in Keypoint Coordinates from Images

Posted by Rembert Daems on June 10, 2022

Rembert Daems, Jeroen Taets, Francis wyffels, Guillaume Crevecoeur

This paper was published in Neurocomputing 573 (2024): 127175 and presented (oral, top 1.6%) at the Machine Learning and the Physical Sciences Workshop at NeurIPS 2023.

paper | code | slides

Abstract

We present KeyCLD, a framework to learn Lagrangian dynamics from images. Learned keypoints represent semantic landmarks in images and can directly represent state dynamics. Interpreting this state as Cartesian coordinates, coupled with explicit holonomic constraints, allows expressing the dynamics with a constrained Lagrangian. KeyCLD is trained unsupervised, end-to-end, on sequences of images. Our method explicitly models the mass matrix, potential energy and input matrix, thus allowing energy-based control. We demonstrate learning of Lagrangian dynamics from images on the dm_control pendulum, cartpole and acrobot environments. We show that KeyCLD can be learned on these systems, whether they are unactuated, underactuated or fully actuated. Trained models are able to produce long-term video predictions, showing that the dynamics are accurately learned. We compare with Lag-VAE, Lag-caVAE and HGN, and with ablations without constraints and without the Lagrangian prior.
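
As a rough sketch of this setup (the notation here is generic; see the paper for the exact formulation), interpreting the keypoint state $x$ as Cartesian coordinates with holonomic constraints $\phi(x) = 0$ gives a Lagrangian of the form

$$
L(x, \dot{x}) = \tfrac{1}{2}\,\dot{x}^\top M(x)\,\dot{x} - V(x),
\qquad \phi(x) = 0,
$$

with learned mass matrix $M(x)$ and potential energy $V(x)$. With a control input $u$ entering through a learned input matrix $g(x)$, the dynamics follow the constrained Euler-Lagrange equations with multipliers $\lambda$:

$$
\frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x}
= g(x)\,u + \left( \frac{\partial \phi}{\partial x} \right)^{\!\top} \lambda .
$$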

KeyCLD learns Lagrangian dynamics from images. (a) An observation of a dynamical system is processed by a keypoint estimator model. (b) The model represents the positions of the keypoints with a set of spatial probability heatmaps. (c) Cartesian coordinates are extracted using a spatial softmax and used as state representations to learn Lagrangian dynamics. (d) The information in the keypoint-coordinate bottleneck suffices for a learned renderer model to reconstruct the original observation, including background, reflections and shadows. The keypoint estimator model, Lagrangian dynamics models and renderer model are jointly learned unsupervised on sequences of images.
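
Step (c), extracting Cartesian coordinates from a heatmap with a spatial softmax, can be sketched as follows (a minimal illustration in JAX; the function and variable names are ours, not taken from the KeyCLD codebase):

```python
import jax
import jax.numpy as jnp

def spatial_softmax(heatmap):
    """Expected (x, y) coordinate of one keypoint from its score map.

    heatmap: (H, W) array of unnormalized scores for a single keypoint.
    Returns coordinates normalized to [-1, 1].
    """
    H, W = heatmap.shape
    probs = jax.nn.softmax(heatmap.reshape(-1)).reshape(H, W)
    xs = jnp.linspace(-1.0, 1.0, W)   # column index -> x coordinate
    ys = jnp.linspace(-1.0, 1.0, H)   # row index -> y coordinate
    x = jnp.sum(probs * xs[None, :])  # expectation over columns
    y = jnp.sum(probs * ys[:, None])  # expectation over rows
    return jnp.array([x, y])

# A sharp peak at pixel (32, 32) of a 64x64 map lands just past the grid
# midpoint, since an even-sized grid has no exact center pixel.
hm = -10.0 * jnp.ones((64, 64))
hm = hm.at[32, 32].set(10.0)
print(spatial_softmax(hm))  # ~ [0.016, 0.016]
```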

Results

KeyCLD predicts future frames for the pendulum, cartpole and acrobot environments. Every predicted sequence is based on the first three frames of the ground-truth sequence (column 1), which are used to estimate the initial velocities. KeyCLD (column 2) is capable of making accurate long-term predictions, including reflections and shadows. We compare these results with ablated models and related work from the literature (other columns). See the paper for more details.
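
For reference, an initial velocity estimate from three consecutive keypoint observations could look like the sketch below (a hypothetical helper assuming a uniform timestep; the paper specifies the scheme actually used):

```python
import jax.numpy as jnp

def initial_state(x0, x1, x2, dt):
    """Hypothetical sketch: state at the third frame from three consecutive
    keypoint observations, using a second-order one-sided finite difference
    for the velocity at the latest frame."""
    v2 = (3.0 * x2 - 4.0 * x1 + x0) / (2.0 * dt)
    return x2, v2

# Example: samples of x(t) = t**2 at t = 0, 0.1, 0.2 give velocity ~ 0.4.
x, v = initial_state(jnp.array([0.0]), jnp.array([0.01]), jnp.array([0.04]), 0.1)
```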

Learning explicit energy models allows simple and robust energy-shaping control. The videos below show that we can reach a target state by leveraging the learned potential energy models; see the paper for more details.
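
As a hedged illustration of the idea (written for a fully actuated system with invertible input matrix; the gains $K_p$, $K_d$ and target $x^*$ are our notation, and the paper derives the controller actually used), potential energy shaping replaces the learned potential $V$ with a desired potential $V_d$ whose minimum lies at the target state, plus damping injection:

$$
u = g(x)^{-1}\Big(\nabla_x V(x) - \nabla_x V_d(x) - K_d\,\dot{x}\Big),
\qquad
V_d(x) = \tfrac{1}{2}\,(x - x^*)^\top K_p\,(x - x^*).
$$

Because the mass matrix, potential energy and input matrix are modeled explicitly, every term in such a control law is available directly from the trained model.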