I am a third-year undergraduate in Physics at Shanghai Jiao Tong University, working on reinforcement learning for reasoning, verifiable AI systems, and multimodal reasoning evaluation.
My long-term goal is to understand how reasoning abilities can emerge, stabilize, and improve under scalable feedback. I approach this through reinforcement learning, verifiable task environments, and hierarchical evaluation of multimodal reasoning.
I am currently a research intern at THU C3I, supervised by Ning Ding. Previously, I worked with Jie Fu at Shanghai AI Lab and with Junchi Yan and Renqiu Xia at Shanghai Jiao Tong University.
Research Agenda
Reinforcement learning for generalizable reasoning. I am interested in when and why RL can improve reasoning beyond memorized trajectories, especially under weak supervision, sparse feedback, and long-horizon exploration. I also care about practical failure modes such as training instability, reward hacking, and train-inference mismatch.
Verifiable and self-evolving AI systems. I study how models can interact with formal or programmatic verifiers, generate tasks for themselves, and use curriculum or self-play mechanisms to create scalable supervision. The broader question is how to move from externally curated data toward systems that can propose, solve, and verify increasingly challenging problems.
Hierarchical evaluation of multimodal reasoning. I am interested in decomposing reasoning failures into interpretable stages such as perception, planning, theorem application, and self-reflection. This motivates my work on GeoBench, where geometry problem solving is used as a structured testbed for multimodal reasoning.
Selected Publications and Projects
GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation
ICLR 2026
We propose a hierarchical benchmark for multimodal geometry reasoning, decomposing model failures into visual perception, goal-oriented planning, rigorous theorem application, and self-reflective backtracking.
Co4ICF: Co-evolving Physics-informed Surrogate and RL-based Pulse Optimizer for Inertial Confinement Fusion
Manuscript
This project explores a closed-loop AI4Physics system that couples a physics-informed surrogate with an RL-based optimizer for pulse design.
News
- Jan 2026: Our paper GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation was accepted to ICLR 2026.
- Oct 2025: I received the Xiaomi Scholarship.
- Sep 2025: I joined THU C3I as a research intern, supervised by Ning Ding.
Research Experience
- Sep 2025 - Present: Research Intern, THU C3I. Supervised by Ning Ding.
- Jul 2025 - Dec 2025: Research Intern, Big AI Dream Lab, Shanghai AI Lab. Supervised by Jie Fu.
- Mar 2025 - May 2025: Research Intern, SAI, Shanghai Jiao Tong University. Supervised by Junchi Yan and Renqiu Xia.
- Aug 2024: Research Assistant, Zhangjiang National Laboratory, working on an AI4Physics project.
Education
- BSc in Physics, Zhiyuan Honor College, Shanghai Jiao Tong University, expected 2027.
