Towards Better Multimodal Reasoning: From Staged Reinforcement Learning to Thinking with Images

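Published: 2026-04-07. Source: School of Information and Electronic Engineering (School of Integrated Circuit Science and Engineering), East China Normal University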

Speaker: Prof. Yu Cheng (成宇)

Time: April 10, 2026, 9:30-10:30

Venue: Room 133, Information Building, Minhang Campus

Speaker bio:

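Yu Cheng is a professor in the computer science department at The Chinese University of Hong Kong and Chief Scientist of Kunlun Wanwei Group (昆仑万维) and Tiangong AI (天工AI). He is also an adjunct professor/advisor at Shanghai Artificial Intelligence Laboratory and Shanghai Chuangzhi (上海创智). From 2018 to 2023 he was a Principal Researcher at Microsoft Research Redmond. His research covers deep learning, with a particular focus on model compression and efficiency, deep generative models, and large language/multimodal models. Starting in 2021, he led a team that worked closely with OpenAI to optimize the efficiency, robustness, and scalability of the GPT series models and to productize related services and applications, including New Bing, with GPT-4 as its main model; GitHub Copilot, backed by GPT-3.5; and Image Creator, powered by DALL-E-2. From 2023 to 2025 he led or contributed to products and models including MiniMax's abab6.5/7 and M1/Hailuo Video, as well as Skywork R1V2/V3, Matrix-Game, and Super Agent.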

Abstract:

Multimodal reasoning requires iterative coordination between language and vision, yet it remains unclear what constitutes an effective reasoning pipeline or a meaningful chain of thought. In this talk, we will introduce how to build effective vision-language (VL) reasoning models, covering staged reinforcement learning, visually perceptive policy optimization, and interleaved chain-of-thought methods.