I am Haoyuan Wang, a researcher on the Tencent Hunyuan 3D team. Before that, I received my Ph.D. degree from the Department of Computer Science, City University of Hong Kong, supervised by Prof. Rynson Lau, and my bachelor's degree from the School of Computer Science and Technology, Huazhong University of Science and Technology, in 2021.
My current research interest is world models. My long-term goal is to contribute to building fully immersive virtual worlds.
* denotes equal contribution, and † denotes the corresponding author.
We present WorldCompass, a novel RL post-training framework for long-horizon interactive world models that combines clip-level rollout, complementary reward functions, and an efficient RL algorithm to improve interaction accuracy and visual fidelity.
HY-World 1.5 enables real-time, interactive 3D world modeling through WorldPlay, a streaming video diffusion model that maintains long-term geometric consistency. Key contributions include Dual Action Representation, Reconstituted Context Memory, and an RL post-training framework, together achieving 24 FPS streaming video generation across diverse scenes.

MoCA introduces a mixture-of-components attention mechanism for scalable compositional 3D generation, enabling efficient generation of complex 3D scenes with multiple objects.


We propose SeHDR, a novel HDR 3D Gaussian Splatting approach that learns an HDR scene representation from single-exposure multi-view LDR images by estimating Bracketed 3D Gaussians and merging them through Differentiable Neural Exposure Fusion.

StyleSculptor is a zero-shot style-guided image-to-3D generation model that supports various style controls, including geometry-only, texture-only, and dual texture-geometry guided generation.

An open-source 3D scene generation model that supports both text-to-3D and image-to-3D world generation, enabling 360° immersive roaming and the export of 3D mesh scene assets that are both interactive and suitable for simulation.
We improve the temporal coherence of video-based surface normal estimation via Semantic Feature Regularization and a two-stage training protocol in latent and pixel space.
We propose a novel G-buffer estimation model for high-quality material-aware 3D reconstruction from just a single image.

We propose a novel 5D Neural Plenoptic Function (NeP), built on NeRFs and ray tracing, for inverse rendering of glossy objects that reconstructs both geometry and materials.
We propose an unsupervised method that decomposes and enhances NeRF to reconstruct a high-quality NeRF from low-quality, low-light images with heavy noise.


We enhance photos containing both over- and under-exposed regions using a lightweight CNN guided by multi-scale local color priors, trained on our proposed dataset.