Title: Vision with A Billion Eyes (亿眼睽睽)
Speaker: Professor Jiebo Luo (IEEE Fellow), University of Rochester, USA
A recent trend in computer vision is driven by images and video generated by heterogeneous, multi-perspective visual sensing networks. We present a few examples of research along this line. First, we will present a framework for event recognition. Using GPS information, we obtain satellite images corresponding to picture locations and investigate their novel use in recognizing the picture-taking environment. We then combine this inference with classical vision-based event detection methods and demonstrate the synergistic fusion of the two approaches. Second, to determine the viewing direction of geotagged photos, we utilize both Google StreetView and Google Earth satellite images. Third, we explore using phone-captured images for localization, because such images contain more contextual information than the GPS coordinates from the phone's embedded sensors. We then build applications that enable people to enjoy ubiquitous location-based services (LBS) using their phones. Fourth, we leverage crowd-sourced photos to remove unwanted bystanders from tourist photos taken at popular attractions and to measure air pollution in major cities in China. Furthermore, given a new source of visual data from public webcams deployed in urban environments, we will present some ongoing work on crowd analytics using such data.
Professor Jiebo Luo joined the University of Rochester (UR) in 2011 after a prolific career of over fifteen years at Kodak Research Laboratories. He has been involved in numerous technical conferences, including serving as the program co-chair of ACM Multimedia 2010, IEEE CVPR 2012, and IEEE ICIP 2017. He has served as the Editor-in-Chief of the Journal of Multimedia, and on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), IEEE Transactions on Multimedia (TMM), IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Pattern Recognition, Machine Vision and Applications, and Journal of Electronic Imaging. He is a Fellow of the SPIE, IEEE, and IAPR. He is a Data Science Distinguished Researcher with the CoE Goergen Institute for Data Science (IDS) at UR.
Title: Video Captioning and Commenting: Bridging Video and Language with Deep Learning
Speaker: Tao Mei, Lead Researcher, Microsoft Research Asia
Recent advances in deep learning have boosted research on video analysis. For example, convolutional neural networks have demonstrated superiority in modeling high-level visual concepts, while recurrent neural networks have proven to be good at modeling mid-level temporal dynamics in video data. We present a few recent advances in understanding video content using deep learning techniques. Specifically, this talk will focus on translating video to sentences with joint embedding and translation, which achieves the best performance to date in this nascent vision task. We will also talk about future directions for video captioning, as well as a new “Video to Language” Grand Challenge, with a dataset, organized by Microsoft Research Asia.
Dr. Tao Mei is a Lead Researcher with Microsoft Research Asia. His current research interests include multimedia information retrieval and computer vision. He has authored or co-authored over 150 papers in journals and conferences and holds 15 granted U.S. patents. Tao has shipped a dozen video technologies to Microsoft products, with millions of daily users. He has received several paper awards from prestigious multimedia journals and conferences, including the IEEE Communications Society MMTC Best Journal Paper Award in 2015, the IEEE Trans. on Circuits and Systems for Video Technology Best Paper Award in 2014, the IEEE Trans. on Multimedia Prize Paper Award in 2013, and the Best Paper Awards at ACM Multimedia in 2009 and 2007. He is an Associate Editor of IEEE Trans. on Multimedia, ACM Trans. on Multimedia Computing, Communications, and Applications, Machine Vision and Applications, and Multimedia Systems. He is the General Co-chair and Program Co-chair of several multimedia conferences. He received the B.E. and Ph.D. degrees from the University of Science and Technology of China, Hefei, China, in 2001 and 2006, respectively. He is also an adjunct professor (Ph.D. supervisor) at the University of Science and Technology of China (USTC) and Sun Yat-sen University (SYSU).