Conference Proceedings

SETSCI - Volume 1 (2017)
ISMSIT2017 - International Symposium on Multidisciplinary Studies and Innovative Technologies, Tokat, Turkey, Dec 02, 2017

Structured Deep Learning Supported with Point Cloud for 3D Human Pose Estimation
Erdal Özbay1, Ahmet Çınar2, Zafer Güler3*
1Fırat University, Elazığ, Turkey
2Fırat University, Elazığ, Turkey
3Fırat University, Elazığ, Turkey
* Corresponding author:
Published Date: 2017-12-08   |   Page(s): 304-309

ABSTRACT In this paper, a structured output is obtained to estimate 3D human pose from a 3D human point cloud and monocular images. The neural network takes a human image and a 3D pose as inputs and outputs a score value. A Conditional Random Field (CRF) approach is used to semantically classify the human limbs in the point cloud for 3D human pose production. Voxel cloud connectivity segmentation (VCCS), which voxelises the 3D point cloud, is used as the segmentation method. The network structure consists of a convolutional neural network for image feature extraction and maps the image features and the pose into a joint embedding. The score function is calculated as the dot product between the image and pose embeddings, which is high when the image-pose pair matches and low otherwise. The image-pose embedding and the score function are jointly trained using a max-margin cost function. Finally, we present visualizations of the image-pose embedding space, showing that the network has learned a high-level embedding of body orientation and pose configuration.
KEYWORDS Pose estimation, Point Cloud, Deep learning, Structured learning, 3D
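
The scoring and training idea described in the abstract can be illustrated with a minimal sketch. It assumes the image and pose embeddings are already computed as plain vectors, and it uses a fixed margin; the function names, the toy values, and the fixed margin are assumptions made for illustration and are not taken from the paper, whose cost function may, for example, scale the margin with the pose error.

```python
from typing import Sequence


def score(image_emb: Sequence[float], pose_emb: Sequence[float]) -> float:
    """Dot product between an image embedding and a pose embedding."""
    if len(image_emb) != len(pose_emb):
        raise ValueError("embeddings must have the same dimension")
    return sum(a * b for a, b in zip(image_emb, pose_emb))


def max_margin_loss(
    image_emb: Sequence[float],
    true_pose_emb: Sequence[float],
    other_pose_emb: Sequence[float],
    margin: float,
) -> float:
    """Hinge loss: zero once the true pose outscores the other pose
    by at least `margin` (a fixed margin is an assumption here)."""
    return max(
        0.0,
        margin + score(image_emb, other_pose_emb) - score(image_emb, true_pose_emb),
    )


# Toy 3-dimensional embeddings (made-up values, for illustration only).
image = [1.0, 0.0, 2.0]
matching_pose = [1.0, 0.5, 1.0]
mismatched_pose = [-1.0, 0.5, 0.0]

print(score(image, matching_pose))    # 3.0
print(score(image, mismatched_pose))  # -1.0
print(max_margin_loss(image, matching_pose, mismatched_pose, margin=1.0))  # 0.0
print(max_margin_loss(image, mismatched_pose, matching_pose, margin=1.0))  # 5.0
```

The loss is zero when the matching pose already scores higher than the mismatched pose by the margin, and positive otherwise, which is what pushes matching image-pose pairs toward high scores during training.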

SET Technology - Turkey

eISSN  : 2687-5527    
