Publications

2025

CVPRW

Geometry-Aware Texture Generation for 3D Head Modeling with Artist-driven Control

Amin Fadaeinejad, Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amaury Depierre, Nikolaus F. Troje, Marcus A Brubaker, Marc-André Carbonneau

Creating realistic 3D head assets for virtual characters that match a precise artistic vision remains labor-intensive. We present a novel framework that streamlines this process by providing artists with intuitive control over generated 3D heads. Our approach uses a geometry-aware texture synthesis pipeline that learns correlations between head geometry and skin texture maps across different demographics. The framework offers three levels of artistic control: manipulation of overall head geometry, adjustment of skin tone while preserving facial characteristics, and fine-grained editing of details such as wrinkles or facial hair. Our pipeline allows artists to make edits to a single texture map using familiar tools, with our system automatically propagating these changes coherently across the remaining texture maps needed for realistic rendering. Experiments demonstrate that our method produces diverse results with clean geometries. We showcase practical applications focusing on intuitive control for artists, including skin tone adjustments and simplified editing workflows for adding age-related details or removing unwanted features from scanned models. This integrated approach aims to streamline the artistic workflow in virtual character creation.

2024

CVPR

MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading

Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amin Fadaeinejad, Rafael M. O. Cruz, Marc-André Carbonneau

Reconstructing an avatar from a portrait image has many applications in multimedia, but remains a challenging research problem. Extracting reflectance maps and geometry from one image is ill-posed: recovering geometry is a one-to-many mapping problem and reflectance and light are difficult to disentangle. Accurate geometry and reflectance can be captured under the controlled conditions of a light stage, but it is costly to acquire large datasets in this fashion. Moreover, training solely with this type of data leads to poor generalization with in-the-wild images. This motivates the introduction of MoSAR, a method for 3D avatar generation from monocular images. We propose a semi-supervised training scheme that improves generalization by learning from both light stage and in-the-wild datasets. This is achieved using a novel differentiable shading formulation. We show that our approach effectively disentangles the intrinsic face parameters, producing relightable avatars. As a result, MoSAR estimates a richer set of skin reflectance maps, and generates more realistic avatars than existing state-of-the-art methods. We also introduce a new dataset, named FFHQ-UV-Intrinsics, the first public dataset providing intrinsic face attributes at scale (diffuse, specular, ambient occlusion and translucency maps) for a total of 10k subjects.

Blog Posts

Medium

Can a Language Model Hear? A Technical Tour of Audio & Music Understanding LLMs

Learning to See Without Labels: A Technical Tour of Meta's DINO, DINOv2, and DINOv3

Real-Time Video Generation: How Distillation and Rolling Forcing Get Us to Interactive Frame Rates

The Self-Driving Car Stack: A Mental Model for Engineers New to Autonomy

Fine-Tuning an LLM with LoRA and QLoRA: A Hands-On Guide

Fine-Tuning a Vision-Language Model with LoRA and QLoRA: A Hands-On Guide

From Teacher Forcing to Self-Forcing: A Tutorial on Autoregressive Video Generation

Distribution Matching Distillation: How a 50-Step Diffusion Model Becomes a 1-Step Generator

Understanding JEPA: From Latent Prediction to Video World Models

Fine-Tuning Qwen3-VL: A Practical Guide for Vision-Language Model Adaptation

Post-Training Memory Reduction Techniques for Model Inference

World Models in Deep Learning: A Technical Deep Dive

VBench: Why We Needed to Rethink How We Evaluate Video Generation

Vision Transformers vs CNNs: A Complete Technical Comparison

