Title: Towards Interactive Multi-Modal Visual Understanding
Time: June 4, 2024, 16:00-17:00
Venue: Conference Room B404
Speaker: Chun-Mei Feng (馮春梅)
Nationality: China
Affiliation: Agency for Science, Technology and Research (A*STAR), Singapore

Speaker Bio: Chun-Mei Feng is currently a research scientist at A*STAR, Singapore. She obtained her Ph.D. from Harbin Institute of Technology, Shenzhen, in 2022. During her Ph.D. studies, she interned at the Inception Institute of Artificial Intelligence (IIAI) in 2020 and visited ETH Zurich in 2021. Her research interests lie in multi-modal visual understanding, medical imaging, and decentralized AI in the era of large pretrained models. She has numerous peer-reviewed publications, most in flagship conference proceedings including CVPR, ICCV, ICLR (Spotlight), MICCAI (Early Accept), and AAAI, as well as journals such as TIP, TNNLS, and TMI.
Abstract: Language and visual interactions play a crucial role in our comprehension of the real world, making Interactive Multi-Modal Visual Understanding a promising field. This presentation focuses on two steps: interaction across modalities and interaction with clients. The first step covers Composed Image Retrieval (CIR) and Referring Image Segmentation (RIS): CIR uses relevant captions to refine image retrieval results, while RIS uses language descriptions to precisely identify segmentation targets. The second step, interaction with clients, provides a privacy-preserving approach to both CIR and medical RIS, since these tasks often involve data held by different platforms such as Alibaba, Amazon, and eBay, as well as multiple hospitals. The talk is dedicated to enhancing multi-modal collaboration and reasoning capabilities. It will also include a discussion aimed at promoting the integration of additional modalities and interaction techniques, e.g., developing practical systems for real-world e-commerce applications, turning understanding abilities into execution, and empowering AI agents to achieve human-object interaction.
Host: 董性平 (Xingping Dong)
