deepid-net: deformable deep convolutional neural networks for object detection
abstract: in this talk, i will introduce a deep learning based framework for general object detection on imagenet. it outperforms well-known object detection approaches such as googlenet, vgg and rcnn by large margins on the ilsvrc2014 detection test set. the proposed pipeline integrates region proposal, bounding box rejection, a new pre-training strategy based on object-level annotations, feature learning, part-deformation learning, contextual modeling, bounding box regression, and model averaging. a detailed component-wise analysis will be provided through extensive experimental evaluation, which gives a global view of the deep learning object detection pipeline. in the proposed deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with a geometric constraint and penalty.
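as a rough illustration of the def-pooling idea (a minimal sketch, not the speaker's implementation), the snippet below takes the response map of a single part detector and returns the best response after subtracting a deformation penalty that grows with the distance from an expected anchor position. the function name `def_pooling`, the quadratic form of the penalty, and the weights are assumptions made for illustration only; the abstract does not specify them.

```python
def def_pooling(score_map, anchor, penalty_weights):
    """Return the best deformation-penalized part score and its location.

    score_map: 2D list of part-detector responses (rows x cols).
    anchor: (row, col) where the part is expected to appear.
    penalty_weights: (w_row, w_col), hypothetical quadratic penalty weights
        (assumed here; in a trained layer these would be learned).
    """
    a_r, a_c = anchor
    w_r, w_c = penalty_weights
    best_score, best_pos = float("-inf"), None
    for r, row in enumerate(score_map):
        for c, s in enumerate(row):
            # Penalize placements that move away from the anchor.
            penalty = w_r * (r - a_r) ** 2 + w_c * (c - a_c) ** 2
            if s - penalty > best_score:
                best_score, best_pos = s - penalty, (r, c)
    return best_score, best_pos


# Toy response map: the strongest response sits one column right of the anchor.
scores = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0],
]

mild = def_pooling(scores, anchor=(1, 1), penalty_weights=(0.5, 0.5))
strict = def_pooling(scores, anchor=(1, 1), penalty_weights=(4.0, 4.0))
```

with a mild penalty the displaced but stronger response wins; with a strict penalty the layer falls back to the anchor position. setting both weights to zero reduces the sketch to ordinary max pooling over the map.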
through the application of object detection, i would also like to highlight two key points on deep learning. (1) in order to learn feature representations with high discriminative power and good generalization capability, it is better to train deep models on challenging supervision tasks with high-dimensional prediction. once features are learned on challenging tasks, they can be applied well to easier tasks. (2) instead of treating deep learning as a black box, one can build connections between the layers of deep models and the key components of existing vision systems. the research experience from existing vision systems can help us propose new layers and new training strategies.
xiaogang wang received his bachelor's degree in electrical engineering and information science from the special class of gifted young at the university of science and technology of china in 2001, his m.phil. degree in information engineering from the chinese university of hong kong in 2004, and his phd degree in computer science from massachusetts institute of technology in 2009. he has been an assistant professor in the department of electronic engineering at the chinese university of hong kong since august 2009. he received the outstanding young researcher in automatic human behaviour analysis award in 2011, the hong kong rgc early career award in 2012, and the young researcher award of the chinese university of hong kong. he is an associate editor of the image and visual computing journal. he was an area chair of iccv 2011, eccv 2014, accv 2014, and iccv 2015. his research interests include computer vision, deep learning, crowd video surveillance, object detection, and face recognition.