27753 Learned Operators for Hand GMM


Two-dimensional images and videos give a distorted representation of an object, subject to lighting, pose and occlusions. Face alignment and pose estimation are therefore essential tools for recognition, animation, tracking, image restoration, and many other applications. To recover 3D information about an object in an image, a 3D Morphable Model [3] is often used.
This model gives a deformable 3D representation of the underlying structure of the target object in the form of a latent vector, which encodes features learned thanks to recent advances in deep learning on 3D data.
Graph Morphable Models (GMM) represent the 3D object as a mesh (Fig. 1), i.e. an embedding of a graph in Euclidean space, and leverage the geometric information given by the graph to express the desired mesh in terms of local transformations.
Kulon et al. [4] propose to infer the pose and shape of a hand from a picture by first creating a latent representation of the GMM with an auto-encoder. A Convolutional Neural Network that takes the image as input then replaces the graph encoder. An added camera regressor produces a weak projection that aligns the mesh with the image (Fig. 2). This thesis expands on the work of Kulon et al. by improving their down-sampling method, which the authors themselves report introduces artefacts and has a significant impact on overall performance. Furthermore, this thesis explores recent breakthroughs in Geometric Deep Learning and tests the efficacy of a learnable operator in this setting.
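The alignment step can be illustrated with a minimal sketch. It assumes that "weak projection" denotes the usual weak-perspective camera model, in which depth is dropped and the remaining coordinates are scaled and shifted. The function name and the parameterisation (one scale and a 2D translation) are hypothetical and are not taken from [4].

```python
def weak_perspective_project(vertices, scale, translation):
    """Project 3D vertices to 2D: drop depth, then scale and shift.

    Assumed camera model (not taken from [4]): u = s * x + tx, v = s * y + ty.
    """
    tx, ty = translation
    return [(scale * x + tx, scale * y + ty) for x, y, _z in vertices]


# Toy "mesh" (hypothetical): three vertices at different depths.
vertices = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0)]

# In the pipeline these camera parameters would come from the regressor;
# here they are fixed by hand for illustration.
projected = weak_perspective_project(vertices, scale=2.0, translation=(10.0, 20.0))
```

Under this assumption the depth of each vertex has no effect on its image position, so the regressor only needs to predict a handful of camera parameters per image.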

Figure 1: Hand GMM in [4].


Learnable operators in Geometric Deep Learning.
Geometric Deep Learning [1] covers a wide range of techniques for processing graph data (GMMs, but also collaborative filtering and molecular modelling), which often share similar approaches. One approach is to use spectral methods, which indirectly exploit the eigenvalues of an operator and thus perform local transformations while keeping the desired invariance. This invariance commonly implies independence from the ordering of the mesh’s vertices, robustness to different discretisations and perturbations, and invariance to transformations such as rigid motions. Graph Convolutional Networks (GCN) also use the Graph Laplacian to enforce (graph) translational invariance and are equivalent to CNNs on two-dimensional grids.
At the same time, the Graph Laplacian is an intrinsic operator, so it depends only on the underlying Riemannian manifold. The resulting features therefore lose information about the actual pose (on a hand, this could be the curvature of a finger). Recently, Wang et al. [6] showed how to define a learnable operator whose invariance is itself learned, allowing the model to automatically calibrate the amount of extrinsic and intrinsic information required for the specific task.
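The two properties discussed above, independence from the vertex ordering and blindness to the pose, can be checked on a toy example. The sketch below uses the unnormalised combinatorial Laplacian L = D - A built from triangle connectivity. This is a simplifying assumption: the operator and normalisation used in [4] or in a given GCN may differ, and all function names are illustrative.

```python
def graph_laplacian(num_vertices, faces):
    """Combinatorial Laplacian L = D - A, built from triangle connectivity only."""
    neighbours = [set() for _ in range(num_vertices)]
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            neighbours[u].add(v)
            neighbours[v].add(u)
    laplacian = [[0.0] * num_vertices for _ in range(num_vertices)]
    for i, nbrs in enumerate(neighbours):
        laplacian[i][i] = float(len(nbrs))  # vertex degree on the diagonal
        for j in nbrs:
            laplacian[i][j] = -1.0  # -1 for every edge (i, j)
    return laplacian


def apply_operator(operator, signal):
    """Dense matrix-vector product: one scalar feature per vertex."""
    n = len(signal)
    return [sum(operator[i][j] * signal[j] for j in range(n)) for i in range(n)]


# Toy mesh (hypothetical): two triangles sharing the edge (1, 2).
faces = [(0, 1, 2), (1, 3, 2)]
laplacian = graph_laplacian(4, faces)

signal = [1.0, 2.0, 3.0, 4.0]
filtered = apply_operator(laplacian, signal)

# Relabel the vertices: old index i becomes new index perm[i].
perm = [2, 0, 3, 1]
faces_perm = [tuple(perm[v] for v in face) for face in faces]
signal_perm = [0.0] * 4
for old, new in enumerate(perm):
    signal_perm[new] = signal[old]
filtered_perm = apply_operator(graph_laplacian(4, faces_perm), signal_perm)

# The output is relabelled in exactly the same way as the input.
assert all(filtered_perm[perm[i]] == filtered[i] for i in range(4))
```

Note that `graph_laplacian` never reads vertex coordinates, so bending a finger of the mesh leaves this operator unchanged. This is an extreme form of the loss of pose information described above, and it is the kind of limitation a learned operator in the spirit of [6] is meant to address.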

Figure 2: Pipeline of the Hand Reconstruction model in [4].


The goal of this Master thesis is to:
1. Understand and critically evaluate the reconstruction algorithm with convolutions [4] on a mesh.
2. Investigate alternative ways to regularise the encoders (variational losses, sparsity constraints, ...).
3. Solve the current problems experienced during the downsampling of the mesh with a novel downsampling technique that does not oversimplify the complex topology of a hand.
4. Replace the current Laplacian with a learned operator. Here you can draw inspiration from [6] and adapt their approach or improve upon it further.
5. Evaluate the method on the Panoptic Hand dataset [5] and conduct a thorough analysis of its performance compared to the reference reconstruction model, with an ablation study of its improvements.
The student will also be encouraged and supported in adding more theoretical justification for the considered approaches, linking them to Discrete Differential Geometry [2], and in providing new insights on geometric deep learning for 3D reconstruction.
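As an illustration of the regularisers named in goal 2, the following minimal sketch shows two standard penalty terms on a latent code: the KL divergence of a diagonal Gaussian posterior to a standard normal prior (the usual variational term) and an L1 sparsity penalty. The function names and the `beta` and `lam` weights are hypothetical, and which terms actually suit the hand GMM encoder is exactly what the thesis is meant to investigate.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def l1_sparsity(latent):
    """L1 norm of the latent code; penalising it pushes entries towards zero."""
    return sum(abs(z) for z in latent)


def regularised_loss(reconstruction_error, mu, log_var, beta=1.0, lam=0.0):
    """Reconstruction error plus weighted penalties (weights are assumptions)."""
    return reconstruction_error + beta * kl_to_standard_normal(mu, log_var) + lam * l1_sparsity(mu)


# A posterior equal to the prior incurs no variational penalty.
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
```

Setting `beta` or `lam` to zero switches the corresponding term off, which makes the two regularisers easy to compare in an ablation study.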

[1] M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, P. Vandergheynst, “Geometric deep learning: going beyond Euclidean data”. IEEE Signal Processing Magazine, 2016.
[2] K. Crane, F. de Goes, M. Desbrun, P. Schröder, “Digital Geometry Processing with Discrete Exterior Calculus”. ACM SIGGRAPH 2013.
[3] B. Egger, W. A. P. Smith, A. Tewari, S. Wuhrer, M. Zollhoefer, T. Beeler, F. Bernard, T. Bolkart, A. Kortylewski, S. Romdhani, C. Theobalt, V. Blanz, T. Vetter, “3D Morphable Face Models – Past, Present and Future”. ACM Transactions on Graphics (TOG).
[4] D. Kulon, H. Wang, R. A. Güler, M. Bronstein, S. Zafeiriou, “Single Image 3D Hand Reconstruction with Mesh Convolutions”. BMVC 2019.
[5] T. Simon, H. Joo, I. Matthews, Y. Sheikh, “Hand Keypoint Detection in Single Images using Multiview Bootstrapping”. CVPR 2017.
[6] Y. Wang, V. Kim, M. Bronstein, J. Solomon, “Learning Geometric Operators on Meshes”. ICLR 2019.