SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks
* means equal contribution
Siamese network based trackers formulate tracking as a cross-correlation between the
convolutional features of a target template and a search region.
However, Siamese trackers still have an accuracy gap compared with
state-of-the-art algorithms, and they cannot take advantage of features
from deep networks such as ResNet-50 or deeper.
In this work we prove that the core reason is the lack of strict
translation invariance. Through comprehensive theoretical analysis and experimental validation,
we break this restriction with a simple yet effective spatially
aware sampling strategy and successfully train a ResNet-driven Siamese tracker
with a significant performance gain. Moreover, we propose a new model architecture
to perform layer-wise and depth-wise aggregations, which not only further improves
the accuracy but also reduces the model size. We conduct extensive ablation studies
to demonstrate the effectiveness of the proposed tracker, which currently obtains
the best results on five large tracking benchmarks:
OTB2015, VOT2018, UAV123, LaSOT, and TrackingNet.
Our model will be released to facilitate further research.
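
To make the cross-correlation formulation and the depth-wise aggregation concrete, below is a minimal PyTorch sketch of a plain cross-correlation head and a depth-wise variant. The function names, feature shapes, and channel counts are illustrative assumptions and are not taken from the released implementation.

# Minimal sketch of Siamese cross-correlation, plus a depth-wise variant.
# All shapes and names here are assumptions for illustration only.
import torch
import torch.nn.functional as F


def naive_xcorr(z_feat, x_feat):
    """Plain cross-correlation: the template features act as a conv kernel.

    z_feat: template features, shape (B, C, Hz, Wz)
    x_feat: search-region features, shape (B, C, Hx, Wx)
    returns: single-channel response map, shape (B, 1, Hx-Hz+1, Wx-Wz+1)
    """
    b, c, hz, wz = z_feat.shape
    # Fold the batch into the channel dimension so each template is only
    # correlated with its own search region (grouped-convolution trick).
    x = x_feat.reshape(1, b * c, *x_feat.shape[2:])
    kernel = z_feat.reshape(b, c, hz, wz)
    out = F.conv2d(x, kernel, groups=b)          # (1, B, H', W')
    return out.reshape(b, 1, *out.shape[2:])


def depthwise_xcorr(z_feat, x_feat):
    """Depth-wise cross-correlation: correlate channel by channel so the
    response keeps C channels for a lightweight head to fuse afterwards.

    z_feat: (B, C, Hz, Wz), x_feat: (B, C, Hx, Wx)
    returns: (B, C, Hx-Hz+1, Wx-Wz+1)
    """
    b, c, hz, wz = z_feat.shape
    x = x_feat.reshape(1, b * c, *x_feat.shape[2:])
    kernel = z_feat.reshape(b * c, 1, hz, wz)
    out = F.conv2d(x, kernel, groups=b * c)      # (1, B*C, H', W')
    return out.reshape(b, c, *out.shape[2:])


if __name__ == "__main__":
    z = torch.randn(2, 256, 7, 7)    # template features (assumed sizes)
    x = torch.randn(2, 256, 31, 31)  # search-region features
    print(naive_xcorr(z, x).shape)      # torch.Size([2, 1, 25, 25])
    print(depthwise_xcorr(z, x).shape)  # torch.Size([2, 256, 25, 25])

The multi-channel output of the depth-wise variant is what allows responses from several backbone layers to be aggregated with small per-layer heads instead of a single heavy correlation module.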
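
The spatially aware sampling strategy mentioned above can likewise be sketched in a few lines: instead of always centring the target in the training search patch, the target is shifted by a uniform random offset so that a padded, deep backbone cannot learn a centre bias. The offset range and crop size below are assumed values, not the authors' exact training settings.

import random


def sample_search_window(target_cx, target_cy, search_size=255, max_shift=64):
    """Pick a search crop whose centre is the target centre plus a random
    shift in [-max_shift, +max_shift] pixels (assumed range).

    Returns the crop's top-left corner and the target centre expressed in
    crop coordinates, which now lies away from the crop centre.
    """
    dx = random.uniform(-max_shift, max_shift)
    dy = random.uniform(-max_shift, max_shift)
    crop_cx = target_cx + dx
    crop_cy = target_cy + dy
    x0 = crop_cx - search_size / 2.0
    y0 = crop_cy - search_size / 2.0
    # Inside the crop the target sits at (search_size/2 - dx, search_size/2 - dy),
    # i.e. off-centre, which is what breaks the learned centre bias.
    return x0, y0, search_size / 2.0 - dx, search_size / 2.0 - dy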
BibTeX:
@article{li2018siamrpn++,
  Author = {Li, Bo and Wu, Wei and Wang, Qiang and Zhang, Fangyi and Xing, Junliang and Yan, Junjie},
  Title = {SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks},
  Journal = {arXiv preprint arXiv:1812.11703},
  Year = {2018}
}