Publications

2025

  1. Geometry-Aware Texture Generation for 3D Head Modeling with Artist-driven Control
    Amin Fadaeinejad, Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amaury Depierre, Nikolaus F. Troje, Marcus A. Brubaker, Marc-André Carbonneau

    2025.

    Creating realistic 3D head assets for virtual characters that match a precise artistic vision remains labor-intensive. We present a novel framework that streamlines this process by providing artists with intuitive control over generated 3D heads. Our approach uses a geometry-aware texture synthesis pipeline that learns correlations between head geometry and skin texture maps across different demographics. The framework offers three levels of artistic control: manipulation of overall head geometry, adjustment of skin tone while preserving facial characteristics, and fine-grained editing of details such as wrinkles or facial hair. Our pipeline allows artists to make edits to a single texture map using familiar tools, with our system automatically propagating these changes coherently across the remaining texture maps needed for realistic rendering. Experiments demonstrate that our method produces diverse results with clean geometries. We showcase practical applications focusing on intuitive control for artists, including skin tone adjustments and simplified editing workflows for adding age-related details or removing unwanted features from scanned models. This integrated approach aims to streamline the artistic workflow in virtual character creation.

2024

  1. MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading
    Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amin Fadaeinejad, Rafael M. O. Cruz, Marc-André Carbonneau

    2024.

    Reconstructing an avatar from a portrait image has many applications in multimedia, but remains a challenging research problem. Extracting reflectance maps and geometry from one image is ill-posed: recovering geometry is a one-to-many mapping problem and reflectance and light are difficult to disentangle. Accurate geometry and reflectance can be captured under the controlled conditions of a light stage, but it is costly to acquire large datasets in this fashion. Moreover, training solely with this type of data leads to poor generalization with in-the-wild images. This motivates the introduction of MoSAR, a method for 3D avatar generation from monocular images. We propose a semi-supervised training scheme that improves generalization by learning from both light stage and in-the-wild datasets. This is achieved using a novel differentiable shading formulation. We show that our approach effectively disentangles the intrinsic face parameters, producing relightable avatars. As a result, MoSAR estimates a richer set of skin reflectance maps, and generates more realistic avatars than existing state-of-the-art methods. We also introduce a new dataset, named FFHQ-UV-Intrinsics, the first public dataset providing intrinsic face attributes at scale (diffuse, specular, ambient occlusion and translucency maps) for a total of 10k subjects.

Blog Posts

  1. Can a Language Model Hear? A Technical Tour of Audio & Music Understanding LLMs
  2. Learning to See Without Labels: A Technical Tour of Meta's DINO, DINOv2, and DINOv3
  3. Real-Time Video Generation: How Distillation and Rolling Forcing Get Us to Interactive Frame Rates
  4. The Self-Driving Car Stack: A Mental Model for Engineers New to Autonomy
  5. Fine-Tuning an LLM with LoRA and QLoRA: A Hands-On Guide
  6. Fine-Tuning a Vision-Language Model with LoRA and QLoRA: A Hands-On Guide
  7. From Teacher Forcing to Self-Forcing: A Tutorial on Autoregressive Video Generation
  8. Distribution Matching Distillation: How a 50-Step Diffusion Model Becomes a 1-Step Generator
  9. Understanding JEPA: From Latent Prediction to Video World Models
  10. Fine-Tuning Qwen3-VL: A Practical Guide for Vision-Language Model Adaptation
  11. Post-Training Memory Reduction Techniques for Model Inference
  12. World Models in Deep Learning: A Technical Deep Dive
  13. VBench: Why We Needed to Rethink How We Evaluate Video Generation
  14. Vision Transformers vs CNNs: A Complete Technical Comparison