CS231n (1): Introduction to Computer Vision


This CNN series contains my notes for Stanford's course "CS231n: Convolutional Neural Networks for Visual Recognition". Stanford's computer-vision-related courses include CS131, CS231a, CS231n, CS331, and CS431.

A Brief History of Computer Vision

  • 1959: Hubel & Wiesel, receptive fields in the cat's striate cortex [1]
  • 1963: Larry Roberts, Block World [2]
  • 1966: The Summer Vision Project
  • 1970s: David Marr, "Vision", stages of visual representation [3]
  • 1973: Fischler & Elschlager, Pictorial Structures [4]
  • 1979: Brooks & Binford, Generalized Cylinder [5]
  • 1987: David Lowe, 3D object recognition from single 2D images [6]
  • 1997: Shi & Malik, Normalized Cuts [7]
  • 1999: David Lowe, SIFT & object recognition [8]
  • 2001: Viola & Jones, face detection [9]
  • 2005: Dalal & Triggs, HOG (Histogram of Oriented Gradients) [10]
  • 2005–2012: PASCAL Visual Object Challenge [11], [12]
  • 2006: Lazebnik, Schmid & Ponce, Spatial Pyramid Matching [13]
  • 2009: Felzenszwalb, McAllester & Ramanan, Deformable Part Model [14]
  • 2009: ImageNet and the Large Scale Visual Recognition Challenge [15], [16]

In 2006, Fujifilm used the Viola & Jones method [9] to build the first digital camera with built-in face detection.

Introduction to Image Classification

Image classification is closely related to a whole family of visual recognition problems, such as object recognition, image captioning, and action recognition. The convolutional neural network (CNN) is the essential tool for object recognition.
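To make the image-classification setting concrete, below is a minimal sketch (not the course's reference code) of scoring an image with a small convolutional network. It assumes PyTorch is available and CIFAR-10-style 32×32 RGB inputs with 10 classes; the layer sizes are illustrative only.

```python
# Minimal CNN classifier sketch; assumes PyTorch and 32x32 RGB inputs with 10 classes.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # local feature maps, spatial size kept at 32x32
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample to 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample to 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                   # one score per class
)

x = torch.randn(1, 3, 32, 32)   # dummy image batch: (batch, channels, height, width)
scores = model(x)               # shape (1, 10): unnormalized class scores
pred = scores.argmax(dim=1)     # index of the predicted class
print(pred)
```

In practice these scores would be fed to a softmax/cross-entropy loss and the weights learned by gradient descent.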

In the ILSVRC competition, the 2011 winner used a classical pipeline of hand-crafted feature extraction followed by a linear classifier [17]. From 2012 onward, every winning entry has been a deep neural network [18], [19], [20], [21], and the 2015 network from MSRA was 152 layers deep.
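For contrast with those deep-network entries, here is a rough sketch of the older hand-crafted-features-plus-linear-classifier approach, in the spirit of HOG [10] followed by a linear SVM. It assumes scikit-image and scikit-learn are installed; the random arrays are placeholders for a real labelled image set.

```python
# Sketch of the classical pipeline: HOG descriptors + linear SVM.
# Assumes scikit-image and scikit-learn; the data below is placeholder only.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def extract_hog(images):
    """Compute one HOG descriptor per grayscale image."""
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        for img in images
    ])

# Placeholder data: 20 grayscale 64x64 "images" with binary labels.
train_images = np.random.rand(20, 64, 64)
train_labels = np.random.randint(0, 2, size=20)

X_train = extract_hog(train_images)           # fixed, hand-designed features
clf = LinearSVC().fit(X_train, train_labels)  # linear classifier on top
print(clf.predict(extract_hog(train_images[:2])))
```

The key contrast with a CNN is that here the feature extractor is fixed by hand, whereas a deep network learns the features and the classifier jointly from data.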

In 2012, the deep network used by Krizhevsky was in fact only a modest refinement of LeCun's earlier network [22]; it won the ImageNet LSVRC competition largely thanks to increased computing power and much larger training data.

Visual intelligence aims far beyond object recognition: the goal is not only to recognize the objects in an image but also to understand what the image as a whole conveys [23].

References

  [1] D. H. Hubel and T. N. Wiesel, “Receptive fields of single neurones in the cat’s striate cortex,” The Journal of Physiology, vol. 148, no. 3, pp. 574–591, 1959.
  [2] L. G. Roberts, “Machine perception of three-dimensional solids,” PhD thesis, Massachusetts Institute of Technology, 1963.
  [3] D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. The MIT Press, 2010. [Online]
  [4] M. A. Fischler and R. A. Elschlager, “The representation and matching of pictorial structures,” IEEE Transactions on Computers, no. 1, pp. 67–92, 1973.
  [5] R. A. Brooks, R. Creiner, and T. O. Binford, “The ACRONYM model-based vision system,” in Proceedings of the 6th International Joint Conference on Artificial Intelligence, 1979, pp. 105–113.
  [6] D. G. Lowe, “Three-dimensional object recognition from single two-dimensional images,” Artificial Intelligence, vol. 31, no. 3, pp. 355–395, 1987.
  [7] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
  [8] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
  [9] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in IEEE Conference on Computer Vision and Pattern Recognition, 2001, vol. 1, pp. I–511.
  [10] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2005, vol. 1, pp. 886–893.
  [11] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010.
  [12] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge: A Retrospective,” International Journal of Computer Vision, vol. 111, no. 1, pp. 98–136, Jan. 2015.
  [13] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in IEEE Conference on Computer Vision and Pattern Recognition, 2006, vol. 2, pp. 2169–2178.
  [14] P. Felzenszwalb, D. McAllester, and D. Ramanan, “A discriminatively trained, multiscale, deformable part model,” in IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.
  [15] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
  [16] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
  [17] Y. Lin et al., “Large-scale image classification: fast feature extraction and SVM training,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1689–1696.
  [18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
  [19] C. Szegedy et al., “Going deeper with convolutions,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
  [20] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  [21] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015.
  [22] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  [23] L. Fei-Fei, A. Iyer, C. Koch, and P. Perona, “What do we perceive in a glance of a real-world scene?,” Journal of Vision, vol. 7, no. 1, pp. 10–10, 2007.

