Tracking Scheme for Rigid Object with Instance Matching and Online Learning
Jui-Hsin (Larry) Lai



Demo Video www.larry-lai.com/tracking.html
2
Outline
Difficulty in Object Tracking

Proposed Tracking Scheme

Experimental Results

Conclusion & Extensions
3
Difficulty in Object Tracking (1/4)
Translation
translation is the simplest difficulty in object tracking

most previous works were designed to solve this kind of problem
Zooming
the tracking features should be well designed

e.g., scale-invariant features

e.g., a scalable object size
4
Difficulty in Object Tracking (2/4)
Rotation
cannot be solved without directional features

most previous works fail on this problem
Panning/Tilting
a very difficult challenge in object tracking

the object’s appearance goes beyond the initial training
5
Difficulty in Object Tracking (3/4)
Occlusion
a common challenge in real cases

tracking features should be tolerant to outliers
Illumination
a common challenge in real cases

tracking features should be tolerant to luminance change
6
Difficulty in Object Tracking (4/4)
Blur
out-of-focus capture is common

tracking features should be tolerant to image blur
The real cases...
a combination of all the difficulties

designing a robust tracking algorithm seems like mission impossible
7
Challenge
overcome these difficulties

translation, zooming, rotation, tilting/panning

occlusion, illumination, blur, and their combination
A tracking algorithm
highly accurate performance

high precision rate

high recall rate
low computational load

real-time tracking system

target: 640x480 at 30 fps
8
Outline
Difficulty in Object Tracking

Proposed Tracking Scheme

Experimental Results

Conclusion & Extensions
9
Tracking Scheme -- Step 1
... Target Instance
Possible Region
Region filter to preserve possible regions
10
Reduce total execution time

feature extraction on the whole frame is time-consuming

instance matching on the whole frame is also time-consuming
Find the possible regions
Region Filter
Increase tracking accuracy

tracking performance would decrease if matching against the whole frame
11
?
t+3
Particles are randomly distributed, and the important particles are resampled.
t
t+1
t+2
Particle Filter as the Region Filter
Particles with high matching
probability are considered as
the possible regions
Sampling Importance Resampling (SIR)
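The sampling-importance-resampling step above can be sketched as follows (a minimal illustration, not the deck's implementation; `match_score` is a stand-in for the histogram-based importance described on the next slide):

```python
import random

def systematic_resample(particles, weights):
    """Systematic resampling: particles with high importance weights are
    duplicated, low-weight ones are dropped; the particle count is kept."""
    n = len(particles)
    total = sum(weights)
    step = total / n
    u = random.uniform(0, step)
    resampled, cum, i = [], weights[0], 0
    for k in range(n):
        target = u + k * step
        while target > cum:
            i += 1
            cum += weights[i]
        resampled.append(particles[i])
    return resampled

def region_filter(particles, match_score, spread=5.0):
    """One tracking step: weight each particle region by its matching
    probability, resample, then perturb to re-spread the particles."""
    weights = [match_score(p) for p in particles]
    survivors = systematic_resample(particles, weights)
    return [p + random.gauss(0.0, spread) for p in survivors]
```

After resampling, the surviving particles mark the possible regions handed to the later matching stages.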
12
Similarity of luminance histogram
a regional statistical feature for the luminance distribution

a histogram-shift method compensates for luminance variation

copes with translation, zooming, and blur
“Importance” in Particle Matching
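A minimal sketch of the histogram-shift similarity (our own illustration; the bin count and shift range are assumptions, not the deck's settings):

```python
import numpy as np

def hist_similarity(patch_a, patch_b, bins=32, max_shift=4):
    """Luminance-histogram similarity with histogram shift: sliding one
    histogram by a few bins compensates for a global luminance change."""
    ha, _ = np.histogram(patch_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(patch_b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    best = 0.0
    for s in range(-max_shift, max_shift + 1):
        # histogram intersection of ha with a shifted copy of hb
        best = max(best, float(np.minimum(ha, np.roll(hb, s)).sum()))
    return best
```

Because the statistic ignores pixel positions, it tolerates translation, zooming, and blur, as the slide notes.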
[Figure: luminance histograms (pixel counts, 0-20000, over 256 luminance levels) of the target and candidate regions at several scales]
Scaling
13
Tracking Scheme -- Step 2
Unified Instance Size
Feature Detection
...
...
...
...
Descriptor
Feature extraction on unified instance size
14
More Accurate Features
Need more accurate features to cope with Rotation and Tilting/Panning

feature detectors with accurate feature location/size/orientation/transformation

feature descriptors with more detailed descriptions
How to choose the feature detector

FAST, MSER, STAR, DoG, Harris, Hessian, GoodFeaturesToTrack (GFTT), ...

Harris-Affine, Hessian-Affine, ...
How to choose the feature descriptor

SIFT, A-SIFT, SURF, BRIEF, ...
15
More Powerful Feature Detectors
Simple features

only cope with translation and rotation, without scale invariance

computation time is short
Affine-invariant features

invariant to affine transformations

but computation time increases drastically

affine-invariant features have lower accuracy than simple features, especially at angles below 40°

most importantly, the real case is usually a perspective transformation, not an affine transformation
Trade-off
In our opinion, no feature detector can completely handle the real cases.
16
Unified Instance Size
Normalize the candidate region from the Region Filter

apply a simple feature detector and descriptor

cope with translation, zooming, and rotation
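The normalization step can be sketched as follows (our own nearest-neighbour illustration; the experiments later fix the unified size at 152x152):

```python
import numpy as np

def unify_instance(region, size=152):
    """Resize a candidate region to the unified instance size with
    nearest-neighbour sampling, so a simple (non-scale-invariant)
    feature detector always sees the object at the same scale."""
    h, w = region.shape[:2]
    ys = np.arange(size) * h // size   # source row for each output row
    xs = np.arange(size) * w // size   # source column for each output column
    return region[ys[:, None], xs]
```

Feature detection and description then run on this fixed-size patch, which also bounds the per-object extraction time.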
17
Tracking Scheme -- Step 3
Panning in X-axis
Tilting
in
Y-axis
Instance Number
Instance
Database
Instance Matching
& Pose Estimation
Instance matching and online learning
18
Instance Matching
Matching against the instances in the database
Find the best-matching instance

calculate the perspective transformation from the matched features

the successful matching angle is below 30° because a simple feature detector is used
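The feature-matching step can be sketched as a nearest-neighbour search with Lowe's ratio test (our own illustration; the perspective transformation would then be fitted to the surviving matches, e.g. with RANSAC):

```python
import numpy as np

def match_features(desc_query, desc_instance, ratio=0.8):
    """Return (query_index, instance_index) pairs whose nearest neighbour
    is clearly closer than the second nearest (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc_query):
        dists = np.linalg.norm(desc_instance - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

The instance with the most surviving matches is taken as the best-matching one.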
19
Pose Estimation
Find the object’s XYZ rotation and XYZ translation

refer to the AR-calibration method to find the rotation and translation in real-world coordinates

a refinement model is proposed to solve the jitter problem
A perspective model alone is not enough
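The slide does not give the refinement model itself; as one hypothetical illustration of jitter suppression, small frame-to-frame pose changes can be damped with an exponential filter (all names and thresholds here are our assumptions, not the deck's method):

```python
def refine_pose(prev_pose, new_pose, alpha=0.7, jitter_eps=0.5):
    """Hypothetical jitter filter: blend each pose component toward the
    new estimate; changes below jitter_eps are treated as noise and
    damped much harder than genuine motion."""
    refined = []
    for p, n in zip(prev_pose, new_pose):
        gain = alpha if abs(n - p) > jitter_eps else 0.1
        refined.append(p + gain * (n - p))
    return refined
```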
20
Online Learning (1/2)
Panning in X-axis
Tilting
in
Y-axis
Instance Number
3D Instance Model

learn all of the instance’s appearances online

the appearance at each panning/tilting angle is recorded

construct the instance’s 3D model
Construct the database online
Multiple Instances

multiple instances at each panning/tilting angle

record varying luminance, occlusion, and blur situations
21
Online Learning (2/2)
The database keeps growing
Trade-off

a more complete database provides higher tracking performance

but the computation time increases linearly as the database grows
Set an upper bound on the database size

the total instance number at each angle is fixed

a First-In-First-Out mechanism removes the earliest instance from the database

=> computation time stays fixed and performance remains high
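The bounded database can be sketched with a per-angle FIFO queue (a minimal illustration; the experiments later fix the per-angle bound at 17 instances):

```python
from collections import deque

class InstanceDatabase:
    """Instance store keyed by quantized (pan, tilt) angle; each angle
    keeps at most max_per_angle instances, evicting the oldest first,
    so matching time stays bounded as online learning continues."""
    def __init__(self, max_per_angle=17):
        self.max_per_angle = max_per_angle
        self._bins = {}

    def add(self, pan_tilt, instance):
        # deque(maxlen=...) silently drops the earliest entry when full
        bin_ = self._bins.setdefault(pan_tilt, deque(maxlen=self.max_per_angle))
        bin_.append(instance)

    def instances(self, pan_tilt):
        return list(self._bins.get(pan_tilt, ()))
```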
22
Review of Proposed Tracking Scheme
Region filter to preserve possible regions
... Target Instance
Possible Region
...
...
...
...
Descriptor
Feature Detection
Scaling
Unified Instance Size
Feature extraction on unified instance size
Instance matching and online learning
Panning in X-axis
Tilting
in
Y-axis
Instance Number
Instance Matching
& Pose Estimation
Instance
Database
Cope with


Translation


Zooming


Blur
Cope with


Translation


Zooming


Rotation
Cope with


Pan/Tilt


Illumination


Occlusion


Blur
23
Outline
Difficulty in Object Tracking

Proposed Tracking Scheme

Experimental Results

Conclusion & Extensions
24
Testing Videos (1/3)
Synthetic videos by 3ds Max

isolate each tracking difficulty

ground-truth results are available
Testing videos with ground-truth results
Background: Simple vs. Complex
Object: Textured vs. Textureless
Provide a complete set of testing videos for future researchers.
25
Testing Videos (2/3)
Testing videos with 13 factors; each video is 10 s long.
1 Zooming for the change of object size


2 Zooming for the change of camera focal length
3 Rotation about the z-axis of the object


4 Rotation about the z-axis of the camera


5 Panning/Tilting change by spinning the object


6 Panning/Tilting change by spinning the camera


7 Translation of the object


8 Translation of the camera


9 Occlusion (by a textured object)


10 Illumination change


11 Deformation
12 Blur
13 Combination
26
Testing Videos (3/3)
The total number of testing videos is 52 (4×13)!
Complex B. & Textured O. Simple B. & Textured O.
Complex B. & Textureless O. Simple B. & Textureless O.
27
Experiments Overview
Exp. 1 -- Performance analysis of tracking scheme


performance w/o region filter


performance w/o unified instance size


performance w/o online learning
Exp. 2 -- Performance analysis of various feature
detector and descriptor


tracking accuracy


tracking time
Exp. 3 -- Performance comparison

3 related studies (state-of-the-art)

objective evaluation vs. subjective evaluation
Exp. 4 -- Real-Time tracking system


video of live demos
28
Exp. 1 -- Tracking Scheme Analysis (1/9)
Figure description
Index of Testing Video
F-Value
Average F-value over the testing video (10 seconds)
The overlapping region between the tracking result and the ground truth is used to calculate the precision and recall rates
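The evaluation above can be sketched as follows (our own illustration, with boxes as (x1, y1, x2, y2)): precision is overlap area over predicted area, recall is overlap area over ground-truth area, and the F-value is their harmonic mean.

```python
def f_value(pred, gt):
    """F-value of a predicted box against a ground-truth box, both given
    as (x1, y1, x2, y2): F = 2PR / (P + R) with precision P and recall R
    computed from the overlapping region."""
    ix = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    iy = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    overlap = ix * iy
    if overlap == 0.0:
        return 0.0
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    precision = overlap / area(pred)
    recall = overlap / area(gt)
    return 2 * precision * recall / (precision + recall)
```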
29
Exp. 1 -- Tracking Scheme Analysis (2/9)
F-values of the proposed tracking scheme
Complex B. & Textureless O. Simple B. & Textureless O.
Complex B. & Textured O. Simple B. & Textured O.
Occlusion (by a textured object)
Rotation Translation
30
Discussion
Exp. 1 -- Tracking Scheme Analysis (3/9)
Performance with the proposed tracking scheme

high precision and recall rates on most testing videos

textureless objects have lower performance than textured objects

a complex background or an occluding object sometimes decreases tracking performance
Complex Background / Textured occluding object
31
Exp. 1 -- Tracking Scheme Analysis (4/9)
F-values with and without Region Filter
Complex B. & Textureless O. Simple B. & Textureless O.
Complex B. & Textured O. Simple B. & Textured O.
32
Performance without Region Filter

precision and recall rates of most testing videos drastically decrease

it is difficult to find correct feature matches when facing a large number of features

execution time is 12.36 times slower on average due to the large number of features in the whole frame
Discussion
Exp. 1 -- Tracking Scheme Analysis (5/9)
33
F-values with and without Unified Instance Size
Exp. 1 -- Tracking Scheme Analysis (6/9)
Complex B. & Textureless O. Simple B. & Textureless O.
Complex B. & Textured O. Simple B. & Textured O.
34
Discussion
Performance without Unified Instance Size

performance drops in some cases and the overall performance becomes less robust

execution time varies per tracked object due to different object sizes
Exp. 1 -- Tracking Scheme Analysis (7/9)
Settings of Unified Instance Size
set the instance size to 152x152 pixels

employ the FAST detector + SIFT descriptor
35
Exp. 1 -- Tracking Scheme Analysis (8/9)
F-values with and without Online Learning
Complex B. & Textureless O. Simple B. & Textureless O.
Complex B. & Textured O. Simple B. & Textured O.
36
Exp. 1 -- Tracking Scheme Analysis (9/9)
Discussion
Performance without Online Learning

precision and recall rates of most testing videos drastically decrease, especially for textureless objects

but execution time is 5.31 times faster due to single matching
Settings of Online Learning
the upper bound on the instance number is set to 17
37
Exp. 2 -- Feature Extractor Analysis (1/3)
Performance analysis under various feature
detectors and descriptors
28 combinations with 52 testing videos!
7 feature detectors


FAST, Harris, GFTT


MSER, STAR, DoG, Hessian
4 feature descriptors


SIFT, SURF, BRIEF, ORB
All methods are taken from the OpenCV library
38
Exp. 2 -- Feature Extractor Analysis (2/3)
Accuracy of Descriptor
noise robustness and a clear description are the key factors

SIFT performs the best among these 4 descriptors

BRIEF and ORB achieve good results on textured objects

SURF works poorly in our tracking scheme
Accuracy of Detector
stability of pixel location and feature count are the key factors

lightweight detectors (like FAST and Harris) work comparably well in our framework
Best performance is obtained by FAST + SIFT
39
Exp. 2 -- Feature Extractor Analysis (3/3)
SIFT : SURF : BRIEF : ORB


15.80 : 3.55 : 1.00 : 1.58
Extraction time of Descriptor
SIFT : SURF : BRIEF : ORB


9.23 : 4.83 : 2.22 : 1.00
Matching time of Descriptor
FAST : Harris : GFTT : MSER : DoG : SURF : STAR : ORB


1.00 : 4.38 : 5.84 : 34.36 : 163.63 : 47.38 : 4.54 : 6.88
Extraction time of Detector
40
Exp. 3 -- Performance Comparison (1/8)
Online AdaBoost

Multiple Instance Learning

[2011 PAMI] Robust Object Tracking with Online Multiple Instance Learning
41
Exp. 3 -- Performance Comparison (2/8)
http://vision.ucsd.edu/~bbabenko/project_miltrack.shtml
Performance of PAMI2011
42
Exp. 3 -- Performance Comparison (3/8)
[2010 ICPR] Forward-Backward Error: Automatic Detection of Tracking Failures
Predator
[2010 ICIP] Face-TLD: Tracking-Learning-Detection Applied to Faces
[2010 CVPR] P-N Learning: Bootstrapping Binary Classifiers by Structural
Constraints
[2009 OLCV] Online learning of robust object detectors during unstable tracking
43
Exp. 3 -- Performance Comparison (4/8)
http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html
Performance of Predator
44
Exp. 3 -- Performance Comparison (5/8)
F-value comparison for OAB, MIL and Ours
Complex B. & Textureless O. Simple B. & Textureless O.
Complex B. & Textured O. Simple B. & Textured O.
45
Exp. 3 -- Performance Comparison (6/8)
Comparison of tracking abilities
Predator OAB MIL Ours
Translation
Zooming
Rotation
Panning/Tilting
Occlusion
Illumination
Blur
46
Exp. 3 -- Performance Comparison (7/8)
Computation time (normalized by our approach): MIL 1.298, OAB 0.563, Our Approach 1.00
47
Exp. 3 -- Performance Comparison (8/8)
No gold-standard answer, only user ranking
Subjective evaluation
48
Exp. 4 -- Video of Live Demos
Under Construction
49
Outline
Difficulty in Object Tracking

Proposed Tracking Scheme

Experimental Results

Conclusion & Extensions
50
Conclusion (1/2)
Tracking Scheme
Region Filter
only candidate regions are used for subsequent feature processing

drastically improves tracking accuracy and efficiency

copes with translation, zooming, and blur
Unified Instance Size
a simple feature detector on a unified size obtains scale invariance

makes tracking accuracy more robust and fixes the tracking execution time

copes with translation, zooming, and rotation
51
Conclusion (2/2)
Tracking Scheme
Online Learning
construct the instance’s appearance as a 3D instance model

multiple instances record varying instance appearances

drastically improves tracking ability and increases accuracy

copes with panning/tilting, illumination, occlusion, and blur
52
Contribution
Propose a tracking scheme
Accuracy
comparable performance to the state-of-the-art in translation/zooming/occlusion/illumination/blur

better performance than the state-of-the-art in rotation and panning/tilting
Capability
copes with the 7 difficulties of real cases

experimental results show the robustness
Computation
low computation requirement

achieves 640x480 at 30 fps
Computation
53
Extensions
It’s just the beginning
Combination with Object Recognition
feature points as the tracking feature are powerful and versatile

no manual user initialization is needed

tour navigation, advertisement extension, ...
Face Application
a face can be seen as a rigid instance

improve face recognition with face detection and face tracking
Non-Rigid Objects
the proposed scheme can be extended to non-rigid objects

modify the feature extraction and the 3D instance model
54
Thanks for Your Attention
