CS231n (1): Introduction to Computer Vision

2016-02-29 | Research | CNN, Computer Vision, Machine Learning Applications

This CNN series consists mainly of notes on Stanford University's course “CS231n: Convolutional Neural Networks for Visual Recognition”. Stanford's machine vision courses include CS131, CS231a, CS231n, CS331, and CS431.

A Brief History of Machine Vision

1959: Hubel & Wiesel [1];
1963: Larry Roberts, Block World [2];
1966: The Summer Vision Project;
1970s: David Marr, “Vision”, Stages of Visual Representation [3];
1973: Fischler & Elschlager, Pictorial Structure [4];
1979: Brooks & Binford, Generalized Cylinder [5];
1987: David Lowe [6];
1997: Shi & Malik, Normalized Cut [7];
1999: David Lowe, SIFT & Object Recognition [8];
2001: Viola & Jones, Face Detection [9];
2005: Dalal & Triggs, HOG (Histogram of Oriented Gradients) [10];
2005–2012: PASCAL Visual Object Classes Challenge [11], [12];
2006: Lazebnik, Schmid & Ponce, Spatial Pyramid Matching [13];
2009: Felzenszwalb, McAllester & Ramanan, Deformable Part Model [14];
2009: ImageNet, Large Scale Visual Recognition Challenge [15], [16].

In 2006, Fujifilm used the Viola & Jones method [9] to build the first digital camera with face detection.

Introduction to Image Classification

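Image classification is related to a whole range of visual recognition problems, such as object recognition, image captioning, and action recognition. The convolutional neural network (CNN) is an important tool for object recognition.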

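In the ILSVRC competition, the 2011 winner used classical feature extraction with a linear classifier [17]. From 2012 onward, every winning team used a deep neural network [18], [19], [20], [21], and by 2015 MSRA's deep neural network was as deep as 152 layers.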

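The deep neural network that Krizhevsky used in 2012 in fact changed little relative to LeCun's network [22], but thanks to greater computing power and more data it won the ImageNet ILSVRC competition.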

Visual intelligence aims far beyond object recognition: the goal is not only to recognize objects but also to understand what an image means [23].

References

[1]D. H. Hubel and T. N. Wiesel, “Receptive fields of single neurones in the cat’s striate cortex,” The Journal of physiology, vol. 148, no. 3, pp. 574–591, 1959.
[2]L. G. Roberts, “Machine perception of three-dimensional solids,” PhD thesis, Massachusetts Institute of Technology, 1963.
[3]D. Marr, Vision: A computational investigation into the human representation and processing of visual information. The MIT Press, 2010.
[4]M. A. Fischler and R. A. Elschlager, “The representation and matching of pictorial structures,” IEEE Transactions on computers, no. 1, pp. 67–92, 1973.
[5]R. A. Brooks, R. Greiner, and T. O. Binford, “The ACRONYM model-based vision system,” in Proceedings of the 6th international joint conference on Artificial intelligence, 1979, pp. 105–113.
[6]D. G. Lowe, “Three-dimensional object recognition from single two-dimensional images,” Artificial intelligence, vol. 31, no. 3, pp. 355–395, 1987.
[7]J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
[8]D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
[9]P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2001, vol. 1, pp. I–511.
[10]N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2005, vol. 1, pp. 886–893.
[11]M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” International journal of computer vision, vol. 88, no. 2, pp. 303–338, 2010.
[12]M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The Pascal Visual Object Classes Challenge: A Retrospective,” International Journal of Computer Vision, vol. 111, no. 1, pp. 98–136, Jan. 2015.
[13]S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in IEEE Conference on Computer Vision and Pattern Recognition, 2006, vol. 2, pp. 2169–2178.
[14]P. Felzenszwalb, D. McAllester, and D. Ramanan, “A discriminatively trained, multiscale, deformable part model,” in IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[15]J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[16]O. Russakovsky et al., “Imagenet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[17]Y. Lin et al., “Large-scale image classification: fast feature extraction and svm training,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1689–1696.
[18]A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
[19]C. Szegedy et al., “Going deeper with convolutions,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[20]K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[21]K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015.
[22]Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[23]L. Fei-Fei, A. Iyer, C. Koch, and P. Perona, “What do we perceive in a glance of a real-world scene?,” Journal of vision, vol. 7, no. 1, p. 10, 2007.
