GTAvatar broadens applications of monocular Gaussian Splatting head avatars beyond reenactment and relighting, enabling interactive editing through textures. From a single video, we reconstruct a state-of-the-art 3D avatar, integrating material textures while preserving the training efficiency, rendering speed and visual fidelity of Gaussian Splatting.
We leverage the explicit surface representation of 2D Gaussians to map each ray-splat intersection to UV space.
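The exact formulation is given in the paper. As an illustrative sketch (in our notation here, with standard conventions assumed), one natural construction expresses the intersection point $\mathbf{p}$ in barycentric coordinates of the Gaussian's host triangle and interpolates that triangle's vertex UVs:

$$\mathbf{u}(\mathbf{p}) = \alpha\,\mathbf{u}_1 + \beta\,\mathbf{u}_2 + \gamma\,\mathbf{u}_3,$$

where $(\alpha, \beta, \gamma)$ are the barycentric coordinates of $\mathbf{p}$ with respect to the triangle and $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$ are the UV coordinates of its vertices.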
This mapping is computed efficiently, as detailed in our paper. From the resulting UV coordinate, we retrieve texel values for albedo, roughness, specular reflectance, and normal via bilinear texture filtering. These values replace the Spherical Harmonics used in standard Gaussian Splatting. The final image is then rendered using the Cook-Torrance BRDF with the rasterized G-buffers. The Gaussian attributes and texture maps are jointly optimized end-to-end during training, with photometric, geometric, and regularization losses.
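The NumPy sketch below illustrates this deferred shading step: a bilinear texel fetch followed by a single-light Cook-Torrance evaluation. It assumes standard microfacet terms (GGX distribution, Smith-Schlick geometry, Schlick Fresnel); function names and conventions are illustrative, and the exact terms of our renderer are specified in the paper.

import numpy as np

def sample_bilinear(tex, uv):
    # Bilinearly sample an (H, W, C) texture at uv in [0, 1]^2.
    h, w = tex.shape[:2]
    x, y = uv[0] * (w - 1), uv[1] * (h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * tex[y0, x0] + fx * tex[y0, x1]
    bot = (1 - fx) * tex[y1, x0] + fx * tex[y1, x1]
    return (1 - fy) * top + fy * bot

def cook_torrance(albedo, roughness, f0, n, v, l, radiance):
    # Single-light Cook-Torrance shading at one G-buffer sample.
    # n, v, l: unit normal, view and light directions; f0: specular reflectance.
    h = (v + l) / np.linalg.norm(v + l)
    nv, nl = max(np.dot(n, v), 1e-4), max(np.dot(n, l), 1e-4)
    nh, vh = max(np.dot(n, h), 0.0), max(np.dot(v, h), 0.0)
    a = roughness ** 2
    d = a ** 2 / (np.pi * (nh ** 2 * (a ** 2 - 1) + 1) ** 2)   # GGX distribution
    k = (roughness + 1) ** 2 / 8
    g = (nv / (nv * (1 - k) + k)) * (nl / (nl * (1 - k) + k))  # Smith-Schlick geometry
    f = f0 + (1 - f0) * (1 - vh) ** 5                          # Schlick Fresnel
    specular = d * g * f / (4 * nv * nl)
    return (albedo / np.pi + specular) * radiance * nl

In the actual pipeline these per-sample quantities come from the rasterized G-buffers and the shading runs over the whole image; the per-sample form above is only meant to make the math concrete.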
Each Gaussian is embedded onto a triangle of the FLAME 3D Morphable Model with a fixed triangle ID, and parameterized by barycentric coordinates within the triangle, an offset along the normal, a relative rotation, two scales, and an opacity. Given pose and expression parameters, the Gaussian attributes are updated to follow the deforming surface.
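The precise update rule is given in the paper; a plausible form, written in the style of mesh-rigged Gaussian avatars and in our own notation, is

$$\boldsymbol{\mu} = \alpha\,\mathbf{v}_1 + \beta\,\mathbf{v}_2 + \gamma\,\mathbf{v}_3 + d\,k\,\hat{\mathbf{n}}, \qquad \mathbf{R} = \mathbf{R}_{\triangle}\,\mathbf{R}_{\mathrm{rel}}, \qquad \mathbf{s} = k\,\mathbf{s}_{\mathrm{rel}},$$

where $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ are the vertices of the posed FLAME triangle, $(\alpha, \beta, \gamma)$ the barycentric coordinates, $d$ the offset along the triangle normal $\hat{\mathbf{n}}$, $k$ a per-triangle scale factor, $\mathbf{R}_{\triangle}$ the rotation of the triangle's tangent frame, and $\mathbf{R}_{\mathrm{rel}}, \mathbf{s}_{\mathrm{rel}}$ the learned relative rotation and two scales.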
At initialization, Gaussians are created at the center of each triangle; the standard densification and pruning of 2DGS then refine the distribution of Gaussians over the surface during optimization.
The semantic texture space enables intuitive editing of the avatars' appearance with widely available tools. This section illustrates applications of our method through simple modifications.
Here, the albedo texture is replaced or edited manually in image editing software. The modified texture integrates seamlessly with the original reconstruction, preserving high-frequency details. Note that these edits are fully compatible with animation and relighting.
In the following examples, we partially or completely replace the material maps, resulting in substantial changes to the appearance. As shown, normal mapping can be used to add intricate details to the geometry. Our pipeline is compatible with off-the-shelf material definitions.
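For reference, this is how a tangent-space normal map typically perturbs the shading normal; the decode and TBN convention below is the common one, assumed here for illustration rather than taken from a specific material definition.

import numpy as np

def apply_normal_map(texel_rgb, t, b, n):
    # Decode a tangent-space normal-map texel from [0, 1] to [-1, 1] and
    # rotate it into world space with the TBN frame (t, b, n: unit tangent,
    # bitangent and geometric normal). Standard convention, assumed here.
    n_ts = 2.0 * np.asarray(texel_rgb) - 1.0
    tbn = np.stack([t, b, n], axis=1)  # columns are the tangent-frame axes
    n_ws = tbn @ n_ts
    return n_ws / np.linalg.norm(n_ws)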
The following avatars are rendered with head poses and expressions tracked from video frames that were not seen during training (self-reenactment).
In addition to new poses and expressions, we modify the illumination of the scene using environment maps. Our approach achieves results that are comparable to state-of-the-art relighting methods.
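For intuition, relighting queries radiance from the environment map along directions determined by the shading model; a minimal equirectangular lookup (with an assumed y-up, latitude-longitude convention, not necessarily ours) looks like this:

import numpy as np

def sample_equirect(env, d):
    # Nearest-neighbor lookup into an (H, W, 3) latitude-longitude environment
    # map along unit direction d = (x, y, z), y up. Convention assumed here.
    u = 0.5 + np.arctan2(d[0], -d[2]) / (2 * np.pi)  # azimuth -> [0, 1]
    v = np.arccos(np.clip(d[1], -1.0, 1.0)) / np.pi  # polar angle -> [0, 1]
    h, w = env.shape[:2]
    return env[min(int(v * h), h - 1), min(int(u * w), w - 1)]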
For more examples along with comparisons, quantitative results and ablation studies, please refer to the paper.
If you find this work useful for your research, please cite:
(coming soon)
@article{baert2026gta,
  author  = {},
  title   = {},
  journal = {},
  number  = {},
  volume  = {},
  month   = {},
  year    = {},
  url     = {}
}