Data Sets and Software Libraries Used for Deep Learning
Özkan İnik1*, Erkan Ülker2
1Gaziosmanpaşa University, Tokat, Turkey
2Selçuk University, Konya, Turkey
* Corresponding author: ozkan.inik@gop.edu.tr
Presented at the International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT2017), Tokat, Turkey, Dec 02, 2017
SETSCI Conference Proceedings, 2017, 1, Pages: 72-77
Published Date: 08 December 2017
Abstract
The purpose of this study is to examine the software libraries and data sets used for Deep Learning architectures. Deep Learning brings a different perspective to the field of artificial intelligence and has come to be used in an incredibly wide range of fields in recent years. In computer vision, Deep Learning models process high-resolution photographs directly. Unlike traditional machine learning methods, there is no pre-processing phase, such as cropping the image or extracting hand-crafted features, before identifying the objects in a photograph. Similarly, while older networks could only distinguish two types of objects (or, in some cases, the absence or presence of a single object), modern networks can recognize many different categories of objects. There are two main reasons why Deep Learning has emerged especially in recent years. The first is the availability of today's large volumes of training data; the second is hardware powerful enough to process this data. In this context, many software libraries have been developed and many data sets have been created for different purposes. A total of 10 data sets and 6 software libraries are examined in this study. The data sets are the MNIST, CIFAR-10, CIFAR-100, STL-10, Street View House Numbers (SVHN), Large Scale Visual Recognition Challenge (LSVRC), Caltech-101, Caltech-256, Labeled Faces in the Wild and Pascal VOC data sets. The number of images, the number of classes and other properties of each data set are explained in detail. The software libraries are Theano, Caffe, Torch, TensorFlow, Keras and MatConvNet. The advantages and disadvantages of these libraries are explained, and their supported platforms and performance are given in detail. This information on Deep Learning libraries and data sets is presented especially for researchers who want to work in the field of Deep Learning.
Keywords - Caffe, Classification, CIFAR10, CNN, Deep Learning, Keras, MNIST, Pascal VOC, TensorFlow, Theano, Torch
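As a brief illustration of how the surveyed data sets and libraries are used together, the sketch below loads the MNIST data set [10] and trains a small convolutional network with Keras [27] running on a TensorFlow backend [26]. The architecture, layer sizes and training settings are illustrative assumptions made for this example and are not taken from the paper.

# Minimal sketch (illustrative only): a small CNN trained on MNIST with Keras.
# Assumes Keras 2.x with the TensorFlow backend; hyperparameters are arbitrary choices.
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical

# MNIST: 60,000 training and 10,000 test images, 28x28 grey-scale, 10 digit classes.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

# Two convolution/pooling stages followed by a dense classifier.
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_data=(x_test, y_test))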
References
[1] S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE transactions on pattern analysis and machine intelligence, vol. 35, pp. 221-231, 2013.
[2] K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," in Advances in neural information processing systems, 2014, pp. 568-576.
[3] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, "Large-scale video classification with convolutional neural networks," in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2014, pp. 1725-1732.
[4] Y. Taigman, M. Yang, M. A. Ranzato, and L. Wolf, "Deepface: Closing the gap to human-level performance in face verification," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1701-1708.
[5] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, et al., "Vqa: Visual question answering," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2425-2433.
[6] A. Karpathy and L. Fei-Fei, "Deep visual-semantic alignments for generating image descriptions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3128-3137.
[7] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.
[8] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[9] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation from predicting 10,000 classes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1891-1898.
[10] Y. LeCun, C. Cortes, and C. J. Burges, "MNIST handwritten digit database," AT&T Labs, 2010.
[11] A. Krizhevsky, V. Nair, and G. Hinton, "The CIFAR-10 dataset," 2014. URL: http://www.cs.toronto.edu/kriz/cifar.html.
[12] A. Krizhevsky, V. Nair, and G. Hinton, "The CIFAR-100 dataset," 2014. URL: http://www.cs.toronto.edu/kriz/cifar.html.
[13] A. Coates, A. Ng, and H. Lee, "An analysis of single-layer networks in unsupervised feature learning," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 215-223. URL: http://cs.stanford.edu/~acoates/stl10.
[14] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading digits in natural images with unsupervised feature learning," in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011, p. 5. URL: http://ufldl.stanford.edu/housenumbers.
[15] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009, pp. 248-255.
[16] L. Fei-Fei, R. Fergus, and P. Perona, "Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories," Computer Vision and Image Understanding, vol. 106, pp. 59-70, 2007. URL: http://www.vision.caltech.edu/Image_Datasets/Caltech101/.
[17] G. Griffin, A. Holub, and P. Perona, "Caltech-256 object category dataset," Caltech Technical Report, 2007. URL: http://www.vision.caltech.edu/Image_Datasets/Caltech256/.
[18] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," Technical Report 07-49, University of Massachusetts, Amherst, 2007. URL: http://vis-www.cs.umass.edu/lfw/index.html.
[19] "Pascal VOC Challenges Datasets(2005-2010)," URL http://host.robots.ox.ac.uk/pascal/VOC/.
[20] J. Bergstra, O. Breuleux, P. Lamblin, R. Pascanu, O. Delalleau, G. Desjardins, et al., "Theano: Deep learning on gpus with python," 2011 URL: http://deeplearning.net/software/theano/index.html.
[21] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, et al., "Theano: A CPU and GPU math compiler in Python," in Proc. 9th Python in Science Conf, 2010, pp. 1-7.
[22] Y. Jia, URL: http://daggerfs.com/.
[23] Caffe, URL: http://caffe.berkeleyvision.org/.
[24] R. Collobert, K. Kavukcuoglu, and C. Farabet, "Torch7: A matlab-like environment for machine learning," in BigLearn, NIPS Workshop, 2011.
[25] Torch, URL: http://torch.ch/.
[26] TensorFlow, URL: https://www.tensorflow.org/.
[27] F. Chollet, "Keras (2015)," URL: http://keras.io, 2017.
[28] V. Kovalev, A. Kalinovsky, and S. Kovalev, "Deep Learning with Theano, Torch, Caffe, Tensorflow, and Deeplearning4J: Which One is the Best in Speed and Accuracy?," 2016.
[29] A. Vedaldi and K. Lenc, "MatConvNet: Convolutional neural networks for MATLAB," in Proceedings of the 23rd ACM International Conference on Multimedia, 2015, pp. 689-692. URL: http://www.vlfeat.org/matconvnet/.