# Tracking-Learning-Detection


## 1 Introduction

1. Handle arbitrarily complex video streams in which tracking failures are frequent;
2. Never degrade the detector when the video contains no relevant information;
3. Operate in real time.

## 2 Related Work

### 2.1 Object Tracking

• First, the environment is searched for supporting objects whose motion is validated against the object of interest [25][26]. These supporting objects help tracking when the object of interest disappears from the camera view or undergoes complex changes.
• Second, the environment is treated as a negative class that the tracker should discriminate against.

## 3 Tracking-Learning-Detection

TLD is a framework for long-term tracking of an unknown object in a video stream. Its block diagram is shown in the figure above. The components of the framework have the following characteristics:

• The tracker estimates the object's motion between consecutive frames under the assumption that the frame-to-frame motion is limited and the object is visible. The tracker is likely to fail, and never recover, if the object moves out of the camera view.
• The detector treats every frame as independent and performs a full scan of the image to localize all appearances that have been observed and learned in the past. As with any other detector, it makes two types of errors: false positives and false negatives.
• The learning component observes the performance of both the tracker and the detector, estimates the detector's errors, and generates training examples to avoid these errors in the future. It assumes that both the tracker and the detector can fail. Thanks to learning, the detector generalizes to more object appearances while discriminating against background.
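The interaction of the three components can be sketched as a toy per-frame loop. The 1-D "box" locations, the `track`/`detect` helpers, and the integration rule below are illustrative assumptions, not the paper's implementation:

```python
# Toy sketch of one TLD frame step. Locations are integers; the
# detector's "model" is simply the set of locations it has learned.

def track(prev_box, motion):
    """Frame-to-frame tracker: shift the previous box by estimated motion."""
    return None if prev_box is None else prev_box + motion

def detect(frame_objects, model):
    """Detector: scan the frame and report every location the model knows."""
    return [x for x in frame_objects if x in model]

def tld_step(frame_objects, prev_box, motion, model):
    tracked = track(prev_box, motion)
    detections = detect(frame_objects, model)
    if tracked is not None:
        box = tracked
        if tracked not in detections:
            model.add(tracked)    # learning: a missed target becomes a positive
    elif detections:
        box = detections[0]       # tracker failed: re-initialize from detector
    else:
        box = None
    return box, model

model = {10}                      # detector initialized on location 10
box, model = tld_step({12}, 10, 2, model)
print(box, sorted(model))         # → 12 [10, 12]
```

The point of the sketch is the division of labor: the tracker supplies temporal continuity, the detector supplies recovery after failure, and learning closes the loop by feeding the detector new appearances.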

## 4 P-N Learning

### 4.1 Formalization

1. A classifier to be learned;
2. A training set — a collection of labeled training examples;
3. Supervised training — a method for training the classifier from the training set;
4. P-N experts — functions that generate positive and negative training examples during learning.

The key component of P-N learning is the estimation of the classifier's errors. The core idea is to separate the estimation of false positives from the estimation of false negatives. For this reason, the unlabeled set is split into two parts according to the current classification, and each part is analyzed by an independent expert:

• The P-expert analyzes examples classified as negative, estimates the false negatives, and adds them to the training set with a positive label. In iteration $k$, the P-expert outputs $n^+(k)$ positive examples.
• The N-expert analyzes examples classified as positive, estimates the false positives, and adds them to the training set with a negative label. In iteration $k$, the N-expert outputs $n^-(k)$ negative examples.

The P-expert increases the classifier's generality. The N-expert increases the classifier's discriminability.
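One P-N iteration can be sketched on toy 1-D data. The threshold classifier and the `truth` oracle below are illustrative assumptions standing in for the structural constraints the real experts exploit:

```python
# Toy sketch of one P-N learning iteration on integer samples.

def classify(x, threshold):
    return x >= threshold           # True = classified positive

def pn_iteration(unlabeled, threshold, truth):
    pos, neg = [], []
    # P-expert: scan samples classified negative, rescue false negatives.
    for x in unlabeled:
        if not classify(x, threshold) and truth(x):
            pos.append((x, True))
    # N-expert: scan samples classified positive, flag false positives.
    for x in unlabeled:
        if classify(x, threshold) and not truth(x):
            neg.append((x, False))
    return pos, neg

truth = lambda x: x >= 5            # ground-truth concept
pos, neg = pn_iteration(range(10), 7, truth)
print(pos)   # false negatives relabeled positive: samples 5 and 6
print(neg)   # no false positives at this threshold
```

The two loops never interact: each expert only inspects the half of the unlabeled data assigned to the other class, which is exactly the separation of error estimation described above.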

### 4.2 Stability

\begin{align} \label{eq:1a} \alpha(k+1) &= \alpha(k)-n_c^-(k)+n_f^+(k)\\ \label{eq:1b} \beta(k+1) &= \beta(k)-n_c^+(k)+n_f^-(k). \end{align}

1. P-precision — the reliability of the positive labels, i.e., the number of correct positive examples divided by the total number of positive examples output by the P-expert, $P^+=n_c^+/\left(n_c^++n_f^+\right)$.
2. P-recall — the percentage of false-negative errors that are identified, i.e., the number of correct positive examples divided by the total number of false negatives made by the classifier, $R^+=n_c^+/\beta$.
3. N-precision — the reliability of the negative labels, i.e., the number of correct negative examples divided by the total number of negative examples output by the N-expert, $P^-=n_c^-/\left(n_c^-+n_f^-\right)$.
4. N-recall — the percentage of false-positive errors that are identified, i.e., the number of correct negative examples divided by the total number of false positives made by the classifier, $R^-=n_c^-/\alpha$.

\begin{align} \label{eq:2a} n_c^+(k) = R^+\beta(k),\quad &n_f^+(k)={(1-P^+)\over P^+}R^+\beta(k) \\ \label{eq:2b} n_c^-(k) = R^-\alpha(k),\quad &n_f^-(k)={(1-P^-)\over P^-}R^-\alpha(k). \end{align}

\begin{align} \label{eq:3a} \alpha(k+1) &= (1-R^-)\alpha(k)+{(1-P^+)\over P^+}R^+\beta(k) \\ \label{eq:3b} \beta(k+1) &= {(1-P^-)\over P^-}R^-\alpha(k)+(1-R^+)\beta(k). \end{align}

$$\vec{x}(k)=\begin{bmatrix}\alpha(k)\\ \beta(k)\end{bmatrix},\qquad \mathbf M=\begin{bmatrix}1-R^- & {(1-P^+)\over P^+}R^+\\ {(1-P^-)\over P^-}R^- & 1-R^+\end{bmatrix},\qquad \vec{x}(k+1)=\mathbf M\vec{x}(k).$$
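The error dynamics can be checked numerically. The expert qualities below ($P^+=P^-=0.9$, $R^+=R^-=0.8$) are assumed values chosen only to illustrate the decay:

```python
# Iterate x(k+1) = M x(k) for the 2x2 transition matrix defined above,
# with illustrative expert qualities; both error counts should decay.

P_pos, R_pos, P_neg, R_neg = 0.9, 0.8, 0.9, 0.8

M = [[1 - R_neg,                    (1 - P_pos) / P_pos * R_pos],
     [(1 - P_neg) / P_neg * R_neg,  1 - R_pos]]

alpha, beta = 10.0, 10.0            # initial false positives / false negatives
for k in range(50):
    alpha, beta = (M[0][0] * alpha + M[0][1] * beta,
                   M[1][0] * alpha + M[1][1] * beta)
print(alpha, beta)                  # both errors decay toward zero
```

With these values the spectral radius of $\mathbf M$ is about $0.29 < 1$, so both error counts shrink geometrically, which is the stability condition the section is building toward.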

### 4.3 Experiments with Simulated Experts

The P-N experts are characterized by four quality measures: $P^+$, $R^+$, $P^-$, and $R^-$. To reduce this four-dimensional space, the parameters are set to $P^+=R^+=P^-=R^-=1-\epsilon$, where $\epsilon$ denotes the error of the experts. The transition matrix then becomes $\mathbf M=\epsilon\mathbf 1$, where $\mathbf 1$ is a $2\times 2$ matrix with all elements equal to $1$. The eigenvalues of this matrix are $\lambda_1=0$ and $\lambda_2=2\epsilon$. Therefore, P-N learning improves performance whenever $\epsilon<0.5$. The error was varied over the range $\epsilon=0$ to $0.9$.
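The eigenvalue claim is easy to verify directly with the closed-form solution for a $2\times 2$ matrix:

```python
# Verify that M = eps * ones(2x2) has eigenvalues 0 and 2*eps, so the
# dynamics converge exactly when 2*eps < 1, i.e. eps < 0.5.
import math

def eigvals_2x2(m):
    """Eigenvalues of a 2x2 matrix via trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    d = math.sqrt(tr * tr - 4 * det)
    return (tr - d) / 2, (tr + d) / 2

for eps in (0.1, 0.4, 0.6):
    M = [[eps, eps], [eps, eps]]
    lo, hi = eigvals_2x2(M)
    print(eps, lo, hi, "converges" if hi < 1 else "diverges")
```

For $\mathbf M=\epsilon\mathbf 1$ the trace is $2\epsilon$ and the determinant is $0$, so the eigenvalues are $0$ and $2\epsilon$; the $\epsilon=0.6$ case crosses the stability boundary and diverges.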

### 4.4 Design of Real Experts

P-N learning is initialized by supervised training of a so-called initial detector. At every frame, P-N learning performs the following steps:

1. evaluate the detector on the current frame;
2. estimate the detector's errors with the P-N experts;
3. update the detector with the labeled examples output by the experts.

The P-expert exploits the temporal structure of the video and assumes that the object moves along a trajectory. The P-expert remembers the object's location in the previous frame and estimates its location in the current frame with a frame-to-frame tracker. If the detector labels the current location as negative (i.e., makes a false-negative error), the P-expert generates a positive example.

The N-expert exploits the spatial structure of the video and assumes that the object can appear at only one location. The N-expert analyzes all responses of the detector in the current frame as well as the response of the tracker, and selects the single most confident location. Patches that do not overlap with the most confident patch are labeled as negative. The most confident patch also re-initializes the tracker's location.
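The two experts can be sketched on bounding boxes. The `(x, y, w, h)` box format and the overlap thresholds below are illustrative assumptions:

```python
# Sketch of the P-expert (temporal) and N-expert (spatial) on boxes.

def overlap(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def p_expert(tracked_box, detections, iou=0.5):
    """If no detection covers the tracked location, the detector made a
    false negative: emit the tracked patch as a positive example."""
    if all(overlap(tracked_box, d) < iou for d in detections):
        return [tracked_box]
    return []

def n_expert(best_box, detections, iou=0.2):
    """Detections far from the single most confident box are labeled
    negative, since the object can appear at only one location."""
    return [d for d in detections if overlap(best_box, d) < iou]

track = (10, 10, 20, 20)            # tracker's (most confident) response
dets = [(100, 100, 20, 20)]         # a distant detector response
print(p_expert(track, dets))        # missed target -> one positive example
print(n_expert(track, dets))        # distant response -> one negative example
```

Note the symmetry: the P-expert trusts the trajectory to overrule the detector, while the N-expert trusts the single-location constraint to prune everything else.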

## References

1. [1] Z. Kalal, K. Mikolajczyk, and J. Matas, “Tracking-learning-detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1409–1422, 2012.
2. [2] A. Blum and T. Mitchell, “Combining labeled and unlabeled data with co-training,” in Proceedings of the eleventh annual conference on Computational learning theory, 1998, pp. 92–100.
3. [3] B. D. Lucas, T. Kanade, and others, “An iterative image registration technique with an application to stereo vision,” in IJCAI, 1981, vol. 81, pp. 674–679.
4. [4] J. Shi and C. Tomasi, “Good features to track,” in Computer Vision and Pattern Recognition, 1994. Proceedings CVPR’94., 1994 IEEE Computer Society Conference on, 1994, pp. 593–600.
5. [5] P. Sand and S. Teller, “Particle video: Long-range motion estimation using point trajectories,” International Journal of Computer Vision, vol. 80, no. 1, pp. 72–91, 2008.
6. [6] L. Wang, W. Hu, and T. Tan, “Recent developments in human motion analysis,” Pattern recognition, vol. 36, no. 3, pp. 585–601, 2003.
7. [7] D. Ramanan, D. A. Forsyth, and A. Zisserman, “Tracking people by learning their appearance,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 29, no. 1, pp. 65–81, 2007.
8. [8] P. Buehler, M. Everingham, D. P. Huttenlocher, and A. Zisserman, “Long term arm and hand tracking for continuous sign language TV broadcasts,” in Proceedings of the 19th British Machine Vision Conference, 2008, pp. 1105–1114.
9. [9] S. Birchfield, “Elliptical head tracking using intensity gradients and color histograms,” in Computer Vision and Pattern Recognition, 1998. Proceedings. 1998 IEEE Computer Society Conference on, 1998, pp. 232–237.
10. [10] M. Isard and A. Blake, “Condensation—conditional density propagation for visual tracking,” International journal of computer vision, vol. 29, no. 1, pp. 5–28, 1998.
11. [11] C. Bibby and I. Reid, “Robust real-time visual tracking using pixel-wise posteriors,” in Computer Vision–ECCV 2008, Springer, 2008, pp. 831–844.
12. [12] C. Bibby and I. Reid, “Real-time tracking of multiple occluding objects using level sets,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, 2010, pp. 1307–1314.
13. [13] B. K. Horn and B. G. Schunck, “Determining optical flow,” in 1981 Technical Symposium East, 1981, pp. 319–331.
14. [14] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical flow estimation based on a theory for warping,” in Computer Vision-ECCV 2004, Springer, 2004, pp. 25–36.
15. [15] J. L. Barron, D. J. Fleet, and S. S. Beauchemin, “Performance of optical flow techniques,” International journal of computer vision, vol. 12, no. 1, pp. 43–77, 1994.
16. [16] D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based object tracking,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 25, no. 5, pp. 564–577, 2003.
17. [17] I. Matthews, T. Ishikawa, and S. Baker, “The template update problem,” IEEE transactions on pattern analysis and machine intelligence, vol. 26, no. 6, pp. 810–815, 2004.
18. [18] N. D. H. Dowson and R. Bowden, “Simultaneous modeling and tracking (smat) of feature sets,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, 2005, vol. 2, pp. 99–105.
19. [19] A. Rahimi, L.-P. Morency, and T. Darrell, “Reducing drift in differential tracking,” Computer Vision and Image Understanding, vol. 109, no. 2, pp. 97–111, 2008.
20. [20] A. D. Jepson, D. J. Fleet, and T. F. El-Maraghi, “Robust online appearance models for visual tracking,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 25, no. 10, pp. 1296–1311, 2003.
21. [21] A. Adam, E. Rivlin, and I. Shimshoni, “Robust fragments-based tracking using the integral histogram,” in Computer vision and pattern recognition, 2006 IEEE Computer Society Conference on, 2006, vol. 1, pp. 798–805.
22. [22] M. J. Black and A. D. Jepson, “Eigentracking: Robust matching and tracking of articulated objects using a view-based representation,” International Journal of Computer Vision, vol. 26, no. 1, pp. 63–84, 1998.
23. [23] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, “Incremental learning for robust visual tracking,” International Journal of Computer Vision, vol. 77, no. 1-3, pp. 125–141, 2008.
24. [24] J. Kwon and K. M. Lee, “Visual tracking decomposition,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, 2010, pp. 1269–1276.
25. [25] M. Yang, Y. Wu, and G. Hua, “Context-aware visual tracking,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 7, pp. 1195–1209, 2009.
26. [26] H. Grabner, J. Matas, L. Van Gool, and P. Cattin, “Tracking the invisible: Learning where the object might be,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, 2010, pp. 1285–1292.
27. [27] S. Avidan, “Support vector tracking,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 8, pp. 1064–1072, 2004.
28. [28] R. T. Collins, Y. Liu, and M. Leordeanu, “Online selection of discriminative tracking features,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 10, pp. 1631–1643, 2005.
29. [29] S. Avidan, “Ensemble tracking,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 29, no. 2, pp. 261–271, 2007.
30. [30] H. Grabner and H. Bischof, “On-line boosting and vision,” in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, 2006, vol. 1, pp. 260–267.
31. [31] B. Babenko, M.-H. Yang, and S. Belongie, “Visual tracking with online multiple instance learning,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009, pp. 983–990.
32. [32] H. Grabner, C. Leistner, and H. Bischof, “Semi-supervised on-line boosting for robust tracking,” in Computer Vision–ECCV 2008, Springer, 2008, pp. 234–247.
33. [33] F. Tang, S. Brennan, Q. Zhao, and H. Tao, “Co-tracking using semi-supervised support vector machines,” in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, 2007, pp. 1–8.
34. [34] Q. Yu, T. B. Dinh, and G. Medioni, “Online tracking and reacquisition using co-trained generative and discriminative trackers,” in Computer Vision–ECCV 2008, Springer, 2008, pp. 678–691.
35. [35] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
36. [36] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, 2001, vol. 1, pp. I–511.
37. [37] V. Lepetit, P. Lagger, and P. Fua, “Randomized trees for real-time keypoint recognition,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, 2005, vol. 2, pp. 775–781.
38. [38] L. Vacchetti, V. Lepetit, and P. Fua, “Stable real-time 3d tracking using online and offline information,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 10, pp. 1385–1391, 2004.
39. [39] S. Taylor and T. Drummond, “Multiple target localisation at over 100 fps,” 2009.
40. [40] J. Pilet and H. Saito, “Virtually augmenting hundreds of real pictures: An approach based on learning, retrieval, and tracking,” in Virtual Reality Conference (VR), 2010 IEEE, 2010, pp. 71–78.
41. [41] S. Obdrzalek and J. Matas, “Sub-linear Indexing for Large Scale Object Recognition,” in BMVC, 2005, pp. 1–10.
42. [42] S. Hinterstoisser, O. Kutter, N. Navab, P. Fua, and V. Lepetit, “Real-time learning of accurate patch rectification,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009, pp. 2945–2952.
43. [43] O. Chapelle, B. Schölkopf, A. Zien, and others, Semi-supervised learning. MIT press Cambridge, 2006.
44. [44] X. Zhu and A. B. Goldberg, “Introduction to semi-supervised learning,” Synthesis lectures on artificial intelligence and machine learning, vol. 3, no. 1, pp. 1–130, 2009.
45. [45] K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell, “Text classification from labeled and unlabeled documents using EM,” Machine learning, vol. 39, no. 2-3, pp. 103–134, 2000.
46. [46] R. Fergus, P. Perona, and A. Zisserman, “Object class recognition by unsupervised scale-invariant learning,” in Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, 2003, vol. 2, pp. II–264.
47. [47] C. Rosenberg, M. Hebert, and H. Schneiderman, “Semi-supervised self-training of object detection models,” 2005.
48. [48] N. Poh, R. Wong, J. Kittler, and F. Roli, “Challenges and research directions for adaptive biometric recognition systems,” in Advances in Biometrics, Springer, 2009, pp. 753–764.
49. [49] A. Levin, P. Viola, and Y. Freund, “Unsupervised improvement of visual detectors using cotraining,” in Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, 2003, pp. 626–633.
50. [50] O. Javed, S. Ali, and M. Shah, “Online detection and classification of moving objects using progressively improving detectors,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, 2005, vol. 1, pp. 696–701.
51. [51] O. Williams, A. Blake, and R. Cipolla, “Sparse bayesian learning for efficient visual tracking,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 8, pp. 1292–1304, 2005.
52. [52] Y. Li, H. Ai, T. Yamashita, S. Lao, and M. Kawade, “Tracking in low frame rate video: A cascade particle filter with discriminative observers of different life spans,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 10, pp. 1728–1740, 2008.
53. [53] K. Okuma, A. Taleghani, N. De Freitas, J. J. Little, and D. G. Lowe, “A boosted particle filter: Multitarget detection and tracking,” in Computer Vision-ECCV 2004, Springer, 2004, pp. 28–39.
54. [54] B. Leibe, K. Schindler, and L. Van Gool, “Coupled detection and trajectory estimation for multi-object tracking,” in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, 2007, pp. 1–8.
55. [55] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool, “Robust tracking-by-detection using a detector confidence particle filter,” in Computer Vision, 2009 IEEE 12th International Conference on, 2009, pp. 1515–1522.
56. [56] K.-K. Sung and T. Poggio, “Example-based learning for view-based human face detection,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 20, no. 1, pp. 39–51, 1998.
57. [57] K. Zhou, J. C. Doyle, K. Glover, and others, Robust and optimal control, vol. 40. Prentice hall New Jersey, 1996.
58. [58] K. Ogata, Modern control engineering. Prentice-Hall Englewood Cliffs, 2009.