Real-time Pedestrian Attribute Detection from Surveillance Cameras
Betül Ay1*, Galip Aydin2
1Fırat University, Elazığ, Turkey
2Fırat University, Elazığ, Turkey
* Corresponding author: betulay@firat.edu.tr
Presented at the International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA2019), Ürgüp, Turkey, Jul 05, 2019
SETSCI Conference Proceedings, 2019, 8, Page(s): 7-10, https://doi.org/10.36287/setsci.4.5.002
Published Date: 12 October 2019
Abstract
In recent years, computer vision, together with deep learning technologies, has taken great strides in understanding and recognizing visual scenes. Object recognition is one of the key areas of computer vision applications. It is mainly concerned with the recognition and localization of specific objects in an image. Open-source, pre-trained models exist for detecting general objects (such as cars, persons, cats, and dogs). However, recognizing special objects (such as a headscarf) requires developing problem-specific algorithms and training deep neural networks on purpose-built annotated training data. Furthermore, objects appear in different shapes, sizes, angles, and colors, and are subject to additional noise from changes in the real-world environment, perspective, lighting, and shadows. Taking these real-world problems and needs into account, this paper focuses on developing problem-specific deep neural network algorithms and on creating labeled, reliable training datasets for the objects to be recognized. The contribution of the paper is the use of transfer learning with optimized R-FCN and Faster R-CNN pre-trained models to recognize pedestrian attributes, including hat, headscarf, eyeglasses, and bag objects as well as gender, from security cameras. The proposed detection model was trained on a large-scale labeled dataset using the open-source TensorFlow platform. The performance of the neural network model was evaluated using Average Precision (AP) values for each class, and a Mean Average Precision (mAP) of over 75% across all classes was achieved.
Keywords - Object detection, deep learning, pedestrian attribute detection
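The abstract reports per-class Average Precision (AP) and a Mean Average Precision (mAP) over all classes. As a rough illustration of how these two quantities relate, the following is a minimal sketch and not the authors' evaluation code. It assumes the all-point interpolated AP commonly used for PASCAL VOC-style evaluation (the abstract does not say which AP variant was used), and it assumes each detection has already been matched against the ground truth. The function names and the AP values in the usage note are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def average_precision(
    detections: Sequence[Tuple[float, bool]], num_ground_truth: int
) -> float:
    """Area under the interpolated precision-recall curve for one class.

    detections: (confidence, is_true_positive) pairs. Matching detections
    to ground-truth boxes (e.g. by an IoU threshold) is assumed to have
    been done beforehand.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Interpolate: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas over every step where recall increases.
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """Unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

For example, `mean_average_precision({"hat": 0.8, "headscarf": 0.7, "eyeglasses": 0.6, "bag": 0.9})` averages four made-up AP values; these numbers are placeholders for illustration and are not results from the paper.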
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.