Guangkai Xu 徐光锴

Ph.D. Candidate

State Key Lab of CAD & CG, Zhejiang University

Email: guangkai.xu@gmail.com / guangkai.xu@zju.edu.cn
Google Scholar: [Link]
GitHub: https://github.com/guangkaixu
WeChat: Grank_Xu (Discussions and collaborations are welcome!)


Biography

I am a third-year Ph.D. student in the College of Computer Science and Technology at Zhejiang University, advised by Prof. Chunhua Shen and Hao Chen. Before that, I received my M.S. degree from the Department of Automation, University of Science and Technology of China (USTC) in 2023, where I was a member of USTC-BIVLab advised by Prof. Feng Zhao. I received my B.E. degree from the University of Electronic Science and Technology of China (UESTC) in 2020.

My research interests include embodied AI with vision-language models (VLMs) and multimodal large language models (MLLMs), as well as visual perception.

Awards

News

Publications

* indicates equal contribution (co-first authors). † indicates corresponding author.

♠ (Co-)First-Author Papers

Unlocking the Power of Critical Factors for 3D Visual Geometry Estimation
Guangkai Xu*, Hua Geng*, Huanyi Zheng, Songyi Yin, Yanlong Sun, Hao Chen, Chunhua Shen
CVPR, 2026
We identify the critical factors behind 3D visual geometry estimation and show how explicitly modeling them leads to stronger and more reliable geometry predictions.
What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?
Guangkai Xu, Yongtao Ge, Mingyu Liu, Chengxiang Fan, Kangyang Xie, Zhiyue Zhao, Hao Chen, Chunhua Shen
ICLR, 2025
[PDF] [Code]
Instead of running many diffusion steps at test time, the method fine-tunes a deterministic one-step model for dense prediction, making diffusion-based perception simpler and more efficient.
FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models
Guangkai Xu*, Wei Yin*, Hao Chen, Chunhua Shen, Kai Cheng, Feng Zhao
ICCV, 2023
[PDF] [Code] [Homepage]
FrozenRecon keeps a pre-trained depth model fixed and only optimizes a small set of geometric correction parameters, offering a practical way to build coherent 3D scenes from ordinary video.
DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-based Dense Incident Map Generation
Xiankang He*, Guangkai Xu*, Bo Zhang, Hao Chen, Ying Cui, Dongyan Guo
AAAI, 2025 (Oral)
[PDF] [Code]
DiffCalib predicts a dense incident map and depth from one RGB image, then recovers camera intrinsics with RANSAC, making single-image calibration more reliable in everyday scenes.
Towards Domain-Agnostic Depth Completion
Guangkai Xu*, Wei Yin*, Jianming Zhang, Oliver Wang, Simon Niklaus, Simon Chen, Jia-Wang Bian
Machine Intelligence Research (MIR), 2024
[PDF]
A single model uses image-based depth estimates to fill missing regions in sparse and noisy depth maps, improving robustness when depth comes from varied sensors and real-world conditions.
Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth
Guangkai Xu*, Wei Yin*, Hao Chen*, Chunhua Shen, Kai Cheng, Feng Wu, Feng Zhao
arXiv, 2022
[PDF]
Local scale-and-shift recovery from sparse anchor points turns monocular video depth into consistent metric geometry for accurate dense 3D scene reconstruction.

♠ Co-Author Papers

Generative Video Matting
Yongtao Ge, Kangyang Xie, Guangkai Xu, Li Ke, Mingyu Liu, Longtao Huang, Hui Xue, Hao Chen, Chunhua Shen
SIGGRAPH, 2025
[PDF]
A pre-trained video diffusion model is repurposed for matting so the system can preserve fine structures like hair while keeping predictions coherent over time.
POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction
Songyan Zhang*, Yongtao Ge*, Jinyuan Tian*, Guangkai Xu, Hao Chen, Chen Lv, Chunhua Shen
ICCV, 2025
[PDF] [Code]
POMATO learns 3D geometry and cross-view matching together, so it can reconstruct dynamic scenes while also estimating motion masks and tracking points over time.
Unleashing the Potential of the Diffusion Model in Few-Shot Semantic Segmentation
Muzhi Zhu*, Yang Liu*, Zekai Luo*, Chenchen Jing, Hao Chen, Guangkai Xu, Xinlong Wang, Chunhua Shen
NeurIPS, 2024
[PDF] [Code]
DiffewS adapts a latent diffusion model to few-shot segmentation by fusing support and query features and predicting masks directly, making diffusion priors useful for segmenting new categories from limited examples.
Improving Neural Indoor Surface Reconstruction with Mask-Guided Adaptive Consistency Constraints
Xinyi Yu, Liqin Lu, Jintao Rong, Guangkai Xu, Linlin Ou
ICRA, 2024
[PDF]
Rather than treating all rays equally, the approach uses adaptive masks and virtual viewpoints to enforce cross-view consistency, which improves self-supervised indoor reconstruction from posed images.
The Second Monocular Depth Estimation Challenge
CVPR Workshop, 2023
[PDF] [Homepage]
We achieved 1st place in the CVPR 2023 Monocular Depth Estimation Challenge.
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models
Yongtao Ge, Guangkai Xu, Zhiyue Zhao, Libo Sun, Zheng Huang, Yanlong Sun, Hao Chen, Chunhua Shen
arXiv, 2024
[PDF] [Code]
A unified benchmark and codebase compares monocular depth and normal estimators under matched settings, making it easier to attribute gains to model design, training data, or evaluation choices.
Exploiting Correspondences with All-Pairs Correlations for Multi-View Depth Estimation
Kai Cheng, Hao Chen, Wei Yin, Guangkai Xu, Xuejin Chen
arXiv, 2022
[PDF]
All-pairs correlations and iterative refinement let the system recover fine structure and sharper boundaries in multi-view depth maps without relying on predefined cost-volume depth bins.

Academic Service