We derived the minimum mean-squared error estimator in the
previous section with no constraint on
the form of the estimator. Depending on the problem, the
estimator could be a linear function of the observations
(which is always the case in Gaussian problems) or
nonlinear. Deriving this estimator is often difficult, which
limits its application. We consider here a variation of MMSE
estimation by constraining the estimator to be linear while
minimizing the mean-squared estimation error. Such linear
estimators may not be optimum; the conditional expected value
may be nonlinear, and it always has the smallest mean-squared
error. Despite this occasional performance deficit, linear
estimators have well-understood properties, they interact well
with other signal processing
algorithms because of linearity, and they can always be derived,
no matter what the problem.
Let the parameter estimate $\hat{\theta}(r)$ be expressed as
$\mathcal{L}(r)$, where $\mathcal{L}(\cdot)$ is a linear operator:
$$\mathcal{L}(a_1 r_1 + a_2 r_2) = a_1 \mathcal{L}(r_1) + a_2 \mathcal{L}(r_2),$$
where $a_1$, $a_2$ are scalars. Although all estimators of this form are
obviously linear, the term
linear estimator denotes
that member of this family that minimizes the mean-squared
error.
$$\arg\min_{\mathcal{L}(r)} E[\epsilon^T \epsilon] = \hat{\theta}_{\mathrm{LIN}}(r) \tag{1}$$
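The superposition property that defines a linear operator can be checked directly. The sketch below is only an illustration: the matrix `L_OP`, the vector `OFFSET`, and the helper names are hypothetical and do not come from the text. It applies a matrix to two vectors and confirms superposition, then shows that adding a constant offset (an affine map) breaks it.

```python
def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def combine(a1, v1, a2, v2):
    """Form the weighted sum a1*v1 + a2*v2 element by element."""
    return [a1 * x + a2 * y for x, y in zip(v1, v2)]

# Hypothetical 2x3 matrix and offset, chosen only for illustration.
L_OP = [[2, -1, 0], [1, 3, 4]]
OFFSET = [5, -2]

def linear(r):
    return apply(L_OP, r)

def affine(r):
    # A matrix multiplication followed by a constant shift.
    return [y + b for y, b in zip(apply(L_OP, r), OFFSET)]

def satisfies_superposition(op, a1, r1, a2, r2):
    """Check op(a1 r1 + a2 r2) == a1 op(r1) + a2 op(r2) for one test case."""
    return op(combine(a1, r1, a2, r2)) == combine(a1, op(r1), a2, op(r2))

r1, r2 = [1, 2, 3], [-4, 0, 5]
linear_ok = satisfies_superposition(linear, 3, r1, 2, r2)
affine_ok = satisfies_superposition(affine, 3, r1, 2, r2)
print(linear_ok, affine_ok)
```

A single passing test case does not prove linearity, of course; it only illustrates the definition. The scalars are chosen so that they do not sum to one, since an affine map happens to pass the test when they do.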
Because of the transformation's linearity, the theory of linear
vector spaces can be fruitfully used to derive the estimator and
to specify its properties. One result of that theoretical
framework is the well-known Orthogonality Principle
(Papoulis, pp. 407-414): The linear
estimator is that particular linear transformation that yields
an estimation error orthogonal to all linear transformations of
the data. The orthogonality of the error to
all linear transformations is termed the
universality constraint. This principle provides us
not only with a formal definition of the linear estimator but
also with the mechanism to derive it. To demonstrate this
intriguing result, let $\langle \cdot, \cdot \rangle$ denote the abstract
inner product between two vectors and $\|\cdot\|$ the associated norm.
$$\|x\|^2 = \langle x, x \rangle \tag{2}$$
For example, if $x$ and $y$ are each column matrices (matrices having
only one column), their inner product might be defined as
$\langle x, y \rangle = x^T y$. Thus, the linear estimator as defined by the
Orthogonality Principle must satisfy
$$E[\langle \hat{\theta}_{\mathrm{LIN}}(r) - \theta, \mathcal{L}(r) \rangle] = 0 \quad \text{for all linear transformations } \mathcal{L}(\cdot) \tag{3}$$
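A small numerical sketch can make the principle concrete. It rests on assumptions that are not part of the text: the parameter and the observation are zero-mean scalars, sample averages stand in for expected values, and all names (`theta`, `r`, `c_lin`, and so on) are hypothetical. In the scalar case every linear transformation of the data is a multiple $m\,r$, so orthogonality to all of them can be spot-checked by trying a few values of $m$.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical zero-mean scalar problem: r = theta + noise.
N = 1000
theta = [random.gauss(0.0, 2.0) for _ in range(N)]
r = [t + random.gauss(0.0, 1.0) for t in theta]

def avg(values):
    """Sample average, standing in for the expected value E[.]."""
    values = list(values)
    return sum(values) / len(values)

# Scalar linear estimators have the form c * r. Minimizing the sample
# mean-squared error over c gives c = avg(theta * r) / avg(r * r).
c_lin = avg(t * x for t, x in zip(theta, r)) / avg(x * x for x in r)

def mse(c):
    """Sample mean-squared error of the estimator c * r."""
    return avg((c * x - t) ** 2 for t, x in zip(theta, r))

def error_data_correlation(c, m):
    """Sample version of E[<c r - theta, m r>] for the transformation m * r."""
    return avg((c * x - t) * (m * x) for t, x in zip(theta, r))

# The minimizing estimator's error is orthogonal to every multiple of r ...
orthogonal = [error_data_correlation(c_lin, m) for m in (-3.0, 0.5, 1.0, 7.0)]
# ... whereas a perturbed estimator's error is not.
perturbed = error_data_correlation(c_lin + 0.1, 1.0)

print(orthogonal)
print(perturbed, mse(c_lin), mse(c_lin + 0.1))
```

The orthogonality values are zero only up to floating-point rounding, so any check should use a tolerance rather than exact equality.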
To see that this principle produces the MMSE linear estimator,
we express the mean-squared estimation error
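$E[\epsilon^T \epsilon] = E[\|\epsilon\|^2]$ for any choice of linear
estimator $\hat{\theta}$ as
$$
\begin{aligned}
E[\|\hat{\theta} - \theta\|^2]
&= E[\|(\hat{\theta}_{\mathrm{LIN}} - \theta) - (\hat{\theta}_{\mathrm{LIN}} - \hat{\theta})\|^2] \\
&= E[\|\hat{\theta}_{\mathrm{LIN}} - \theta\|^2] + E[\|\hat{\theta}_{\mathrm{LIN}} - \hat{\theta}\|^2] - 2 E[\langle \hat{\theta}_{\mathrm{LIN}} - \theta, \hat{\theta}_{\mathrm{LIN}} - \hat{\theta} \rangle]
\end{aligned}
\tag{4}
$$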
As $\hat{\theta}_{\mathrm{LIN}} - \hat{\theta}$ is the difference of two
linear transformations, it too is linear and is orthogonal to the
estimation error resulting from $\hat{\theta}_{\mathrm{LIN}}$. As a
result, the last term is zero and the
mean-squared estimation error is the sum of two squared norms,
each of which is, of course, nonnegative. Only the second norm
varies with estimator choice; we minimize the mean-squared
estimation error by choosing the estimator
$\hat{\theta}$ to be the estimator $\hat{\theta}_{\mathrm{LIN}}$, which
sets the second term to zero.
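The decomposition used in this argument can be verified exactly on a toy problem. The sketch below assumes a hypothetical set of five equally likely scalar $(\theta, r)$ pairs (not taken from the text) and uses exact rational arithmetic, so the identity can be tested with equality rather than a tolerance. In the scalar case each linear estimator is $c\,r$ for some constant $c$.

```python
from fractions import Fraction

# Hypothetical equally likely (theta, r) pairs, chosen only for illustration.
pairs = [(-2, -3), (-1, 0), (0, 1), (1, -1), (2, 3)]
pairs = [(Fraction(t), Fraction(r)) for t, r in pairs]

def expect(f):
    """Expected value of f(theta, r) over the equally likely pairs."""
    return sum(f(t, r) for t, r in pairs) / len(pairs)

# The MMSE linear estimator is c_lin * r with c_lin = E[theta r] / E[r^2].
c_lin = expect(lambda t, r: t * r) / expect(lambda t, r: r * r)

def mse(c):
    """E[(c r - theta)^2], the mean-squared error of the estimator c * r."""
    return expect(lambda t, r: (c * r - t) ** 2)

def gap(c):
    """E[(c_lin r - c r)^2], the second squared norm in the decomposition."""
    return expect(lambda t, r: (c_lin * r - c * r) ** 2)

def cross(c):
    """E[(c_lin r - theta)(c_lin r - c r)], the cross term."""
    return expect(lambda t, r: (c_lin * r - t) * (c_lin * r - c * r))

candidates = [Fraction(0), Fraction(1, 4), Fraction(1), Fraction(-3, 2)]
for c in candidates:
    # Columns: c, its error, the two-norm sum, and the cross term.
    print(c, mse(c), mse(c_lin) + gap(c), cross(c))
```

For every candidate the cross term is zero and the error equals the minimum error plus the nonnegative gap, so no choice of `c` other than `c_lin` can do better.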
The estimation error for the minimum mean-squared linear
estimator can be calculated to some degree without knowledge of
the form of the estimator. The mean-squared estimation error is
given by
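$$
\begin{aligned}
E[\|\hat{\theta}_{\mathrm{LIN}} - \theta\|^2]
&= E[\langle \hat{\theta}_{\mathrm{LIN}} - \theta, \hat{\theta}_{\mathrm{LIN}} - \theta \rangle] \\
&= E[\langle \hat{\theta}_{\mathrm{LIN}} - \theta, \hat{\theta}_{\mathrm{LIN}} \rangle] + E[\langle \hat{\theta}_{\mathrm{LIN}} - \theta, -\theta \rangle]
\end{aligned}
\tag{5}
$$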
The first term is zero because of the Orthogonality
Principle. Rewriting the second term yields a general expression
for the MMSE linear estimator's mean-squared error.
$$E[\|\epsilon\|^2] = E[\|\theta\|^2] - E[\langle \hat{\theta}_{\mathrm{LIN}}, \theta \rangle] \tag{6}$$
This error is the difference of two terms. The first, the
mean-squared value of the parameter, represents the largest
value that the estimation error can be for any reasonable
estimator. That error can be obtained by the estimator that
ignores the data and has a value of zero. The second term
reduces this maximum error and represents the degree to which
the estimate and the parameter agree on the average.
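As a worked special case, suppose (purely as an illustrative assumption) that the parameter and the observation are both zero-mean scalars, so that every linear estimator has the form $c\,r$ for some constant $c$; the symbol $c$ is introduced here and does not appear elsewhere in the text. With the inner product $\langle x, y \rangle = xy$, the Orthogonality Principle requires $E[(c\,r - \theta)\,r] = 0$, and the error expression becomes

$$
c = \frac{E[\theta r]}{E[r^2]}, \qquad
E[\epsilon^2] = E[\theta^2] - c\,E[r\theta] = E[\theta^2] - \frac{(E[\theta r])^2}{E[r^2]}.
$$

The subtracted term is nonnegative, so the error never exceeds $E[\theta^2]$. It vanishes exactly when the parameter and the observation are uncorrelated, in which case the best linear estimator is the zero estimator.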
Note that the definition of the minimum mean-squared error
linear estimator makes no explicit
assumptions about the parameter estimation problem being
solved. This property makes this kind of estimator attractive in
many applications where neither the a priori density of the
parameter vector nor the density of the observations is known
precisely. Linear transformations, however, are homogeneous: a
zero-valued input yields a zero output. Thus, the linear
estimator is especially pertinent to those problems where the
expected value of the parameter is zero. If the expected value
is nonzero, the linear estimator would not necessarily yield the
best result (see this problem).
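The cost of homogeneity when the parameter's mean is nonzero can be seen on a toy problem. The sketch below is an illustration under stated assumptions, none of which come from the text: the parameter takes the values 1 or 3 with equal probability, the noise is -1 or +1 with equal probability and independent of the parameter, and the comparison estimator $a\,r + b$ (a linear term plus a constant offset) is not derived in this section. Exact rational arithmetic keeps the numbers unambiguous.

```python
from fractions import Fraction

# Hypothetical toy problem: theta is 1 or 3 (mean 2, not zero), noise is
# -1 or +1, all four combinations equally likely, and r = theta + noise.
points = [(Fraction(t), Fraction(t + n)) for t in (1, 3) for n in (-1, 1)]

def expect(f):
    """Expected value of f(theta, r) over the equally likely points."""
    return sum(f(t, r) for t, r in points) / len(points)

# Best homogeneous linear estimator c * r, with c = E[theta r] / E[r^2].
c = expect(lambda t, r: t * r) / expect(lambda t, r: r * r)
mse_linear = expect(lambda t, r: (c * r - t) ** 2)

# Best estimator of the form a * r + b (an assumption of this sketch):
# a = cov(theta, r) / var(r), and b matches the means.
mean_t = expect(lambda t, r: t)
mean_r = expect(lambda t, r: r)
a = (expect(lambda t, r: (t - mean_t) * (r - mean_r))
     / expect(lambda t, r: (r - mean_r) ** 2))
b = mean_t - a * mean_r
mse_affine = expect(lambda t, r: (a * r + b - t) ** 2)

print(c, mse_linear)        # homogeneous linear estimator and its error
print(a, b, mse_affine)     # offset estimator and its error
```

In this toy problem the homogeneous estimator's error is 5/6, while allowing a constant offset lowers it to 1/2, which illustrates why the zero-mean case is the natural setting for the linear estimator.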
Example 1
Express the first example in vector notation so that the observation
vector is written as $r = A\theta + n$, where the matrix $A$ has the form
$A = [1 \; \ldots \; 1]^T$. The expected value of the parameter is zero. The
linear estimator has the form $\hat{\theta}_{\mathrm{LIN}} = \mathbf{L} r$,
where $\mathbf{L}$ is a $1 \times L$ matrix. The Orthogonality Principle states that the
linear estimator satisfies
$$E[(\mathbf{L} r - \theta)^T M r] = 0 \quad \text{for all } 1 \times L \text{ matrices } M$$
To use the Orthogonality Principle to derive an equation
implicitly specifying the linear estimator, the "for all
linear transformations" phrase must be interpreted. Usually
the quantity specifying the linear transformation must be
removed from the constraining inner product by imposing a very
stringent but equivalent condition. In this example, this
phrase becomes one about matrices. The elements of the matrix
$M$ can be such that
each element of the observation vector multiplies each element
of the estimation error. Thus, in this problem the
Orthogonality Principle means that the expected value of the
matrix consisting of all pairwise products of these elements
must be zero.
$$E[(\mathbf{L} r - \theta)\, r^T] = 0$$
Thus, two terms must equal each other:
$E[\mathbf{L} r r^T] = E[\theta r^T]$. The second term equals
$E[\theta^2] A^T$, as the additive noise and the parameter are assumed
to be statistically independent quantities. The quantity $E[r r^T]$
in the first term is the correlation matrix of the observations,
which is given by $A A^T E[\theta^2] + K_n$. Here, $K_n$ is the noise
covariance matrix, and $E[\theta^2]$ is the parameter's variance
$\sigma_\theta^2$. The quantity $A A^T$ is an $L \times L$ matrix with
each element equaling 1. The noise vector has independent components;
the covariance matrix thus equals $\sigma_n^2 I$. The equation that
$\mathbf{L}$ must satisfy is therefore
given by
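$$
\begin{bmatrix} L_1 & \cdots & L_L \end{bmatrix}
\begin{bmatrix}
\sigma_n^2 + \sigma_\theta^2 & \sigma_\theta^2 & \cdots & \sigma_\theta^2 \\
\sigma_\theta^2 & \sigma_n^2 + \sigma_\theta^2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \sigma_\theta^2 \\
\sigma_\theta^2 & \cdots & \sigma_\theta^2 & \sigma_n^2 + \sigma_\theta^2
\end{bmatrix}
=
\begin{bmatrix} \sigma_\theta^2 & \cdots & \sigma_\theta^2 \end{bmatrix}
$$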
The components of $\mathbf{L}$ are equal and are given by
$L_i = \dfrac{\sigma_\theta^2}{\sigma_n^2 + L \sigma_\theta^2}$.
Thus, the minimum mean-squared error linear
estimator has the form
$$\hat{\theta}_{\mathrm{LIN}}(r) = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \dfrac{\sigma_n^2}{L}} \cdot \frac{1}{L} \sum_{l} r_l$$
Note that this result equals the minimum mean-squared error
estimate derived earlier under the condition that $E[\theta] = 0$.
Mean-squared error, linear estimators, and Gaussian
problems are intimately related to each other. The linear
minimum mean-squared error solution to a problem is optimal if
the underlying distributions are Gaussian.
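The closed-form weights of Example 1 can be cross-checked by solving the matrix equation directly. The sketch below is a minimal illustration, not a general-purpose solver: the values of $L$, $\sigma_\theta^2$, and $\sigma_n^2$ are hypothetical, the helper `solve` is a plain Gauss-Jordan elimination written for this check, and exact rational arithmetic is used so the comparison needs no tolerance. The noise is assumed to be zero-mean, as the use of its covariance matrix in the correlation matrix implies.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row, then clear the column in every other row.
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [v - factor * p for v, p in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

# Hypothetical values chosen only for illustration.
num_obs = 4               # L, the number of observations
var_theta = Fraction(3)   # sigma_theta^2
var_noise = Fraction(2)   # sigma_n^2

# Correlation matrix of the observations: sigma_theta^2 in every entry,
# plus sigma_n^2 on the diagonal.
R = [[var_theta + (var_noise if i == j else 0) for j in range(num_obs)]
     for i in range(num_obs)]

# The row equation (weights) R = sigma_theta^2 [1 ... 1] becomes
# R (weights)^T = sigma_theta^2 [1 ... 1]^T because R is symmetric.
weights = solve(R, [var_theta] * num_obs)

# Closed forms from the example: each weight, and the same weight written
# as a shrinkage factor applied to the sample average.
closed_form = var_theta / (var_noise + num_obs * var_theta)
shrink_times_average = var_theta / (var_theta + var_noise / num_obs) / num_obs

# Mean-squared error from E[theta^2] - E[theta_hat theta], where
# E[theta_hat theta] = sum(weights) * sigma_theta^2 for this model.
mse = var_theta - sum(weights) * var_theta

print(weights, closed_form, mse)
```

With these values every weight equals 3/14, matching both closed forms, and the resulting mean-squared error of 3/7 is well below the parameter's variance of 3.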
References
A. Papoulis. (1984). Probability, Random Variables, and Stochastic Processes. (second edition). New York: McGraw-Hill.