The Kalman filter is an important generalization
of the Wiener filter. Unlike Wiener filters, which are designed
under the assumption that the signal and noise are
stationary, the Kalman filter has the ability
to adapt itself to non-stationary
environments.
The Kalman filter can be viewed as a
sequential minimum MSE estimator of a
signal in additive noise. If the signal and noise are jointly
Gaussian, then the Kalman filter is optimal in a minimum MSE
sense (minimizes expected quadratic loss).
If the signal and/or noise are non-Gaussian, then
the Kalman filter is the best linear estimator (linear estimator that
minimizes MSE among all possible linear estimators).
Recall the simple DC signal estimation problem.
$$x_n = A + w_n, \qquad n = 0, \dots, N-1 \tag{1}$$
where $A$ is the unknown DC level and $w_n$ is white Gaussian noise.
$A$ could represent the voltage of a DC power supply. We know how to find several good
estimators of $A$ given the measurements $x_0, \dots, x_{N-1}$.
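One familiar estimator of $A$ in this model is the sample mean of the measurements. The sketch below simulates Equation 1 and averages the samples. The function names and the numeric values (a 10 V level, unit noise standard deviation) are illustrative assumptions, not part of the original notes.

```python
import random


def simulate_dc_measurements(A, sigma_w, N, rng):
    """Draw x_n = A + w_n for n = 0..N-1, with w_n ~ N(0, sigma_w^2) iid."""
    return [A + rng.gauss(0.0, sigma_w) for _ in range(N)]


def sample_mean_estimate(x):
    """Sample-mean estimate of the constant level A."""
    return sum(x) / len(x)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Assumed values for illustration: A = 10 V, sigma_w = 1 V, N = 1000.
    x = simulate_dc_measurements(A=10.0, sigma_w=1.0, N=1000, rng=rng)
    print("estimate of A:", sample_mean_estimate(x))
```

Averaging $N$ samples reduces the noise variance of the estimate by a factor of $N$, which is exactly the benefit that is lost once the level is allowed to change at every time step.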
In practical situations this model may be too
simplistic. The load on the power supply may change over time,
and there will be other variations due to temperature and
component aging.
To account for these variations we can employ a
more accurate measurement model:
$$x_n = A_n + w_n, \qquad n = 0, \dots, N-1 \tag{2}$$
where the voltage $A_n$ is the true voltage at time $n$.
Now the estimation problem is significantly more
complicated since we must estimate
$A_0, \dots, A_{N-1}$. Suppose that the true voltage $A_n$
does not vary too rapidly over time. Then successive
samples of $A_n$
will not be too different, suggesting that the voltage
signal displays a high degree of correlation.
This reasoning suggests that it may be reasonable
to regard the sequence
$A_0, \dots, A_{N-1}$ as a realization of a correlated (not white) random
process. Adopting a random process model for $A_n$
allows us to pursue a Bayesian approach to the
estimation problem (Figure 1).
Using the model in Equation 2, it is easy to verify
that the maximum likelihood and minimum variance unbiased (MVUB) estimators are given by
$$\hat{A}_n = x_n \tag{3}$$
Our estimate is simply the noisy measurements! No averaging
takes place, so there is no noise reduction.
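A small numerical check makes this concrete: with $\hat{A}_n = x_n$ the estimation error is just $w_n$, so the mean squared error equals the noise variance $\sigma_w^2$. The sketch below is a minimal illustration under assumed values; the slowly drifting voltage sequence and the function name are hypothetical.

```python
import random


def raw_measurement_mse(A_true, sigma_w, rng):
    """Empirical MSE of the estimate A_hat_n = x_n over one realization."""
    # Measurements from Equation 2: x_n = A_n + w_n.
    x = [A_n + rng.gauss(0.0, sigma_w) for A_n in A_true]
    A_hat = x  # the estimate is the measurement itself, no averaging
    return sum((ah - at) ** 2 for ah, at in zip(A_hat, A_true)) / len(A_true)


if __name__ == "__main__":
    # Assumed example: a voltage drifting slowly upward from 10 V.
    A_true = [10.0 + 0.001 * n for n in range(20000)]
    mse = raw_measurement_mse(A_true, sigma_w=0.5, rng=random.Random(0))
    print("empirical MSE:", mse, "noise variance:", 0.5 ** 2)
```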
Let's look at the example again, Figure 2.
The voltage $A_n$ is varying about an average value of 10 V. Assume this
average value is known and write
$$A_n = 10 + y_n \tag{4}$$
where $y_n$ is a zero-mean random process. A simple model for $y_n$
that allows us to specify the correlation between samples is
the first-order Gauss-Markov process model:
$$y_n = a\,y_{n-1} + u_n, \qquad n = 1, 2, \dots \tag{5}$$
where $u_n \sim \mathcal{N}(0, \sigma_u^2)$ iid (a white Gaussian noise process). To initialize the
process we take $y_0$ to be the realization of a Gaussian random variable:
$y_0 \sim \mathcal{N}(0, \sigma_y^2)$.
$u_n$ is called the driving or excitation noise. The model in
Equation 5 is called the dynamical or state model. The current output
$y_n$ depends only on the state of the system at the previous time,
$y_{n-1}$, and the current input $u_n$ (Figure 3).
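The state model is easy to simulate, which helps build intuition for how $a$ and $\sigma_u$ shape the signal. The following is a minimal sketch of Equation 5 with the Gaussian initialization described above. The function name and the parameter values in the demo are assumptions chosen for illustration.

```python
import random


def simulate_gauss_markov(a, sigma_u, sigma_y, N, rng):
    """Return [y_0, ..., y_{N-1}] generated by y_n = a*y_{n-1} + u_n."""
    y = [rng.gauss(0.0, sigma_y)]        # y_0 ~ N(0, sigma_y^2)
    for _ in range(1, N):
        u = rng.gauss(0.0, sigma_u)      # excitation noise u_n ~ N(0, sigma_u^2)
        y.append(a * y[-1] + u)          # next output depends only on the previous state
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed values: strongly correlated fluctuation around a 10 V mean.
    y = simulate_gauss_markov(a=0.98, sigma_u=0.1, sigma_y=0.5, N=10, rng=rng)
    voltage = [10.0 + v for v in y]      # Equation 4: A_n = 10 + y_n
    print(voltage)
```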
Unrolling the recursion in Equation 5 gives
$$
\begin{aligned}
y_1 &= a\,y_0 + u_1 \\
y_2 &= a\,y_1 + u_2 = a\,(a\,y_0 + u_1) + u_2 = a^2 y_0 + a\,u_1 + u_2 \\
&\;\;\vdots \\
y_n &= a^n y_0 + \sum_{k=0}^{n-1} a^k u_{n-k}
\end{aligned}
\tag{6}
$$
Mean:
$$E[y_n] = a^n E[y_0] + \sum_{k=0}^{n-1} a^k E[u_{n-k}] = 0 \tag{7}$$
Correlation:
$$
\begin{aligned}
E[y_m y_n] &= E\left[\left(a^m y_0 + \sum_{k=0}^{m-1} a^k u_{m-k}\right)\left(a^n y_0 + \sum_{l=0}^{n-1} a^l u_{n-l}\right)\right] \\
&= a^{m+n} E[y_0^2] + \sum_{k=0}^{m-1}\sum_{l=0}^{n-1} a^{k+l} E[u_{m-k} u_{n-l}]
\end{aligned}
\tag{8}
$$
where the cross terms vanish, taking $y_0$ to be independent of the zero-mean excitation noise. Since the excitation noise is white,
$$E[u_{m-k} u_{n-l}] = \begin{cases} \sigma_u^2 & \text{if } m-k = n-l \\ 0 & \text{otherwise} \end{cases} \tag{9}$$
If $m > n$, then
$$E[y_m y_n] = a^{m+n}\sigma_y^2 + a^{m-n}\sigma_u^2 \sum_{k=0}^{n-1} a^{2k} \tag{10}$$
If $|a| > 1$, then it's obvious that the process diverges (variance $\to \infty$). This is equivalent to having a pole outside the unit
circle, as shown in Figure 4.
So, let's assume $|a| < 1$ and hence a stable system. Thus as $m$ and $n$ get large,
$$a^{m+n}\sigma_y^2 \to 0$$
Now let $m - n = \tau$. Then for $m$ and $n$ large we have
$$E[y_m y_n] \approx a^{\tau}\sigma_u^2 \sum_{k=0}^{n-1} a^{2k} \approx \frac{a^{\tau}\sigma_u^2}{1-a^2} \tag{11}$$
This shows us how correlated the process is:
- $|a| \to 1 \Rightarrow$ heavily correlated (or anticorrelated)
- $|a| \to 0 \Rightarrow$ weakly correlated
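One way to see this numerically is to look at the normalized correlation $E[y_{n+\tau} y_n] / E[y_n^2]$, which for a stable first-order process tends to $a^{\tau}$ once the initial condition has died out. The sketch below estimates it by time averaging over a single long realization. It assumes the process has settled after a burn-in period, and all names and parameter values are illustrative rather than taken from the notes.

```python
import random


def simulate_gauss_markov(a, sigma_u, sigma_y, N, rng):
    """Return [y_0, ..., y_{N-1}] generated by y_n = a*y_{n-1} + u_n."""
    y = [rng.gauss(0.0, sigma_y)]
    for _ in range(1, N):
        y.append(a * y[-1] + rng.gauss(0.0, sigma_u))
    return y


def normalized_correlation(y, tau, burn_in=0):
    """Time-average estimate of E[y_{n+tau} y_n] / E[y_n^2] after burn_in samples."""
    z = y[burn_in:]
    num = sum(z[n + tau] * z[n] for n in range(len(z) - tau)) / (len(z) - tau)
    den = sum(v * v for v in z) / len(z)
    return num / den


if __name__ == "__main__":
    for a in (0.95, 0.5, 0.1):  # assumed example values of the pole location
        y = simulate_gauss_markov(a, 1.0, 1.0, 50000, random.Random(0))
        est = normalized_correlation(y, tau=1, burn_in=200)
        print(f"a = {a}: lag-1 estimate = {est:.3f}")
```

Values of $a$ near 1 give a lag-1 estimate near 1 (neighbouring samples almost identical), while values near 0 give an estimate near 0.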
How can we use this model to help us in our estimation problem?
Let's look at a more general formulation of the
problem at hand. Suppose that we have a vector-valued dynamical
equation
$$\mathbf{y}_{n+1} = \mathbf{A}\mathbf{y}_n + \mathbf{b}\,u_n \tag{12}$$
where $\mathbf{y}_n$ is $p \times 1$, $\mathbf{A}$ is $p \times p$,
and $\mathbf{b}$ is $p \times 1$.
The initial state vector is $\mathbf{y}_0 \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_0)$, where
$\mathbf{R}_0$ is the covariance matrix, and
$u_n \sim \mathcal{N}(0, \sigma_u^2)$ iid (white Gaussian
excitation noise).
This reduces to the case we just looked at when $p = 1$
(apart from a one-step shift in the index of the excitation noise, which is immaterial for a white process).
This model could represent a $p$th-order Gauss-Markov process:
$$y_{n+1} = a_1 y_n + a_2 y_{n-1} + \dots + a_p y_{n-p+1} + u_n \tag{13}$$
Define
$$\mathbf{y}_n = \begin{bmatrix} y_{n-p+1} \\ y_{n-p+2} \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix} \tag{14}$$
Then,
$$
\mathbf{y}_{n+1} = \mathbf{A}\mathbf{y}_n + \mathbf{b}\,u_n =
\begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 1 \\
a_p & a_{p-1} & \cdots & a_2 & a_1
\end{bmatrix}
\begin{bmatrix} y_{n-p+1} \\ y_{n-p+2} \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}
+
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} u_n
\tag{15}
$$
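As a concrete illustration, the matrix and vector in the state equation can be assembled directly from the scalar coefficients. The sketch below assumes the convention that $a_1$ multiplies the most recent sample $y_n$ and that the state vector is ordered from oldest to newest sample, so the last row of the matrix reads $a_p, \dots, a_1$. The function names and the nested-list representation are illustrative choices.

```python
def companion_model(coeffs):
    """Build (A, b) from coeffs = [a_1, ..., a_p] as nested lists."""
    p = len(coeffs)
    A = [[0.0] * p for _ in range(p)]
    for i in range(p - 1):
        A[i][i + 1] = 1.0  # shift rows: entry i of the new state is entry i+1 of the old one
    # Last row produces the new sample; a_1 sits in the last column (newest sample).
    A[p - 1] = [coeffs[p - 1 - j] for j in range(p)]
    b = [0.0] * (p - 1) + [1.0]  # the excitation enters only the newest sample
    return A, b


def step(A, b, y, u):
    """One update of the state equation: returns A*y + b*u."""
    p = len(y)
    return [sum(A[i][j] * y[j] for j in range(p)) + b[i] * u for i in range(p)]


if __name__ == "__main__":
    # Assumed second-order example: y_{n+1} = 0.5*y_n + 0.3*y_{n-1} + u_n.
    A, b = companion_model([0.5, 0.3])
    print(A, b)
    print(step(A, b, [1.0, 2.0], 0.1))  # state [y_{n-1}, y_n] = [1, 2]
```

For $p = 1$ the construction collapses to the scalar pair $A = [a_1]$, $b = [1]$.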
Here $\mathbf{A}$ is the state transition matrix. Since
$\mathbf{y}_n$ is a linear combination of Gaussian vectors:
$$\mathbf{y}_n = \mathbf{A}^n \mathbf{y}_0 + \sum_{k=1}^{n} \mathbf{A}^{k-1}\mathbf{b}\,u_{n-k} \tag{16}$$
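we know that $\mathbf{y}_n$ is also Gaussian distributed, with mean zero and covariance
$\mathbf{R}_n = E[\mathbf{y}_n \mathbf{y}_n^T]$:
$\mathbf{y}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_n)$. The covariance can be recursively computed from the
basic state equation:
$$\mathbf{R}_{n+1} = \mathbf{A}\mathbf{R}_n\mathbf{A}^T + \sigma_u^2\,\mathbf{b}\mathbf{b}^T \tag{17}$$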
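The covariance recursion is simple to iterate numerically. In the scalar case ($p = 1$, $A = a$, $b = 1$) it reads $R_{n+1} = a^2 R_n + \sigma_u^2$, whose fixed point for $|a| < 1$ is $\sigma_u^2 / (1 - a^2)$. The sketch below implements the general update with plain nested lists. The function name and the example values are assumptions made for illustration.

```python
def propagate_covariance(A, b, R, sigma_u2):
    """One step of R_next = A R A^T + sigma_u^2 * b b^T (nested-list matrices)."""
    p = len(b)
    # AR = A * R
    AR = [[sum(A[i][k] * R[k][j] for k in range(p)) for j in range(p)]
          for i in range(p)]
    # (AR * A^T)[i][j] = sum_k AR[i][k] * A[j][k], plus the excitation term.
    return [[sum(AR[i][k] * A[j][k] for k in range(p)) + sigma_u2 * b[i] * b[j]
             for j in range(p)] for i in range(p)]


if __name__ == "__main__":
    # Assumed scalar example: a = 0.5, sigma_u^2 = 1, initial variance 4.
    A, b, R = [[0.5]], [1.0], [[4.0]]
    for n in range(5):
        R = propagate_covariance(A, b, R, sigma_u2=1.0)
        print(n + 1, R[0][0])  # approaches 1 / (1 - 0.25) = 4/3
```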
Assume that measurements of the state are available:
$$x_n = \mathbf{C}^T\mathbf{y}_n + w_n \tag{18}$$
where $w_n \sim \mathcal{N}(0, \sigma_w^2)$ iid, independent of $u_n$ (white Gaussian
observation noise).
For example, if $\mathbf{C} = [0, \dots, 0, 1]^T$, then
$$x_n = y_n + w_n \tag{19}$$
where $x_n$ is the observation, $y_n$ is the signal, and $w_n$
is the noise. Since our model for the signal is
Gaussian, as is the observation noise, it follows that
$x_n \sim \mathcal{N}(0, \sigma_n^2)$,
where $\sigma_n^2 = \mathbf{C}^T\mathbf{R}_n\mathbf{C} + \sigma_w^2$ (Figure 5).
Kalman first posed the problem of estimating the
state $\mathbf{y}_n$ from the sequence of measurements
$$\mathbf{x}_n = \begin{bmatrix} x_0 \\ \vdots \\ x_n \end{bmatrix}$$
To derive the Kalman filter we will call upon the Gauss-Markov
Theorem.
First note that the conditional distribution of $\mathbf{y}_n$ given $\mathbf{x}_n$ is Gaussian:
$$\mathbf{y}_n \mid \mathbf{x}_n \sim \mathcal{N}\left(\hat{\mathbf{y}}_{n|n}, \mathbf{P}_{n|n}\right)$$
where $\hat{\mathbf{y}}_{n|n}$ is the conditional mean and $\mathbf{P}_{n|n}$ is the covariance.
We know that this is the
form of the conditional distribution because $\mathbf{y}_n$ and $\mathbf{x}_n$ are jointly Gaussian
distributed. In
$$\mathbf{y}_n \mid \mathbf{x}_n \sim \mathcal{N}\left(\hat{\mathbf{y}}_{n|n}, \mathbf{P}_{n|n}\right)$$
$\mathbf{y}_n$ collects the signal samples $y_n, \dots, y_{n-p+1}$,
$\mathbf{x}_n$ collects the observations/measurements $x_0, \dots, x_n$, and
$\hat{\mathbf{y}}_{n|n}$ is the best (minimum MSE) estimator of $\mathbf{y}_n$ given $\mathbf{x}_n$.
This is all well and good, but we need to know what
the conditional mean and covariance are explicitly. So the
problem is now to find/compute
$\hat{\mathbf{y}}_{n|n}$ and $\mathbf{P}_{n|n}$. We can take advantage of the recursive state equation
to obtain a recursive procedure for this calculation. To begin,
consider the "predictor" $\hat{\mathbf{y}}_{n|n-1}$:
$$\mathbf{y}_n \mid \mathbf{x}_{n-1} \sim \mathcal{N}\left(\hat{\mathbf{y}}_{n|n-1}, \mathbf{P}_{n|n-1}\right)$$
where $\mathbf{y}_n$ collects the signal samples $y_n, \dots, y_{n-p+1}$,
$\mathbf{x}_{n-1}$ collects the observations $x_0, \dots, x_{n-1}$, and
$\hat{\mathbf{y}}_{n|n-1}$ is the best (minimum MSE) estimator of $\mathbf{y}_n$ given $\mathbf{x}_{n-1}$.
Although we don't know what forms $\hat{\mathbf{y}}_{n|n-1}$ and $\mathbf{P}_{n|n-1}$
have, we do know two important facts:
- The predictor $\hat{\mathbf{y}}_{n|n-1}$