Problem 1
Estimates for identical parameters are heavily dependent on the
assumed underlying probability densities. To understand this
sensitivity better, consider the following variety of
problems, each of which asks for estimates of quantities
related to variance. Determine the bias and consistency in
each case.
1.a)
Compute the maximum a posteriori and maximum likelihood estimates of $\theta$ based on $L$ statistically independent observations of a Maxwellian random variable $r$.
$$p_{r|\theta}(r|\theta) = \sqrt{\frac{2}{\pi}}\,\theta^{-3/2}\, r^2\, e^{-\frac{1}{2} r^2/\theta}, \quad r > 0,\ \theta > 0$$
$$p_\theta(\theta) = \lambda e^{-\lambda\theta}, \quad \theta > 0$$
1.b)
Find the maximum a posteriori estimate of the variance $\sigma^2$ from $L$ statistically independent observations having the exponential density
$$p_r(r) = \frac{1}{\sqrt{\sigma^2}}\, e^{-r/\sqrt{\sigma^2}}, \quad r > 0$$
where the variance is uniformly distributed over the interval $[0, \sigma_{\max}^2)$.
1.c)
Find the maximum likelihood estimate of the variance of $L$ identically distributed, but dependent Gaussian random variables. Here, the covariance matrix is written $\mathbf{K}_r = \sigma^2 \tilde{\mathbf{K}}_r$, where the normalized covariance matrix has trace $\mathrm{tr}[\tilde{\mathbf{K}}_r] = L$.
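A hand-derived estimate for part 1.a can be sanity-checked numerically. The sketch below is only an illustration: it assumes the Maxwellian variate can be simulated as the length of a three-dimensional Gaussian vector with independent components of variance $\theta$, and it tests the candidate estimate $\frac{1}{3L}\sum_l r_l^2$ (the stationary point of the log-likelihood) against nearby values. The function names are hypothetical.

```python
import math
import random

def sample_maxwell(theta, rng):
    """Draw one Maxwellian variate: the length of a 3-D Gaussian vector
    whose components are independent N(0, theta). (Assumed simulation model.)"""
    s = math.sqrt(theta)
    return math.sqrt(sum(rng.gauss(0.0, s) ** 2 for _ in range(3)))

def ml_theta(obs):
    """Candidate ML estimate: sum of squared observations divided by 3L."""
    return sum(r * r for r in obs) / (3 * len(obs))

def log_likelihood(theta, obs):
    """Log-likelihood of theta for independent Maxwellian observations."""
    L = len(obs)
    return (0.5 * L * math.log(2.0 / math.pi)
            - 1.5 * L * math.log(theta)
            + sum(2.0 * math.log(r) for r in obs)
            - sum(r * r for r in obs) / (2.0 * theta))

if __name__ == "__main__":
    rng = random.Random(0)
    data = [sample_maxwell(2.0, rng) for _ in range(5000)]
    print("estimate of theta:", ml_theta(data))
```

Evaluating `log_likelihood` slightly above and below the candidate is a cheap way to confirm that it is a maximum rather than some other stationary point.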
Problem 2
Imagine yourself idly standing on the corner in a large city
when you note the serial number of a passing beer truck.
Because you are idle, you wish to estimate (guess may be
more accurate here) how many beer trucks the city has from
this single observation.
2.a)
Making appropriate assumptions, the beer truck's number is
drawn from a uniform probability density ranging between
zero and some unknown upper limit. Find the maximum
likelihood estimate of the upper limit.
2.b)
Show that this estimate is biased.
2.c)
In one of your extraordinarily idle moments, you observe
$L$ beer trucks throughout the city. Assuming them to be independent
observations, now what is the maximum likelihood estimate
of the total?
2.d)
Is this estimate of $\theta$ biased? asymptotically biased? consistent?
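A small Monte Carlo experiment can make the bias question concrete. This is a minimal sketch, assuming serial numbers are modeled as continuous values uniform on $[0, \theta]$ and that the ML estimate is the largest observed number (the likelihood $\theta^{-L}$ is largest at the smallest feasible $\theta$). The helper names are hypothetical.

```python
import random

def ml_upper_limit(obs):
    """ML estimate of the upper limit of a uniform density on [0, theta]."""
    return max(obs)

def average_estimate(theta, L, trials, rng):
    """Monte Carlo average of the ML estimate over repeated experiments,
    each using L independent observations."""
    total = 0.0
    for _ in range(trials):
        total += ml_upper_limit([rng.uniform(0.0, theta) for _ in range(L)])
    return total / trials

if __name__ == "__main__":
    rng = random.Random(1)
    for L in (1, 10, 100):
        print(L, average_estimate(100.0, L, 20000, rng))
```

Comparing the averages with the true limit for increasing $L$ shows whether the estimate falls short on average and whether the shortfall shrinks as more trucks are observed.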
Problem 3
We make $L$ observations $r_1, \dots, r_L$ of a parameter $\theta$ corrupted by additive noise ($r_l = \theta + n_l$). The parameter $\theta$ is a Gaussian random variable [$\theta \sim \mathcal{N}(0, \sigma_\theta^2)$] and $n_l$ are statistically independent Gaussian random variables [$n_l \sim \mathcal{N}(0, \sigma_n^2)$].
3.a)
Find the MMSE estimate of $\theta$.
3.b)
Find the maximum a posteriori estimate of $\theta$.
3.c)
Compute the resulting mean-squared error for each estimate.
3.d)
Consider an alternate procedure based on the same observations $r_l$. Using the MMSE criterion, we estimate $\theta$ immediately after each observation. This procedure yields the sequence of estimates $\hat{\theta}_1(r_1)$, $\hat{\theta}_2(r_1, r_2)$, …, $\hat{\theta}_L(r_1, \dots, r_L)$. Express $\hat{\theta}_l$ as a function of $\hat{\theta}_{l-1}$, $\sigma_{l-1}^2$, and $r_l$. Here, $\sigma_l^2$ denotes the variance of the estimation error of the $l$th estimate. Show that
$$\frac{1}{\sigma_l^2} = \frac{1}{\sigma_\theta^2} + \frac{l}{\sigma_n^2}$$
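The sequential procedure in part 3.d can be compared with the batch estimate numerically. The sketch below is one possible formulation, not a derivation: it assumes the standard Gaussian result that each update treats the previous estimate and error variance as the prior mean and variance for the next observation. Function and variable names are hypothetical.

```python
def batch_mmse(obs, var_theta, var_n):
    """Batch MMSE estimate and error variance for r_l = theta + n_l with
    theta ~ N(0, var_theta) and independent n_l ~ N(0, var_n)."""
    L = len(obs)
    est = var_theta * sum(obs) / (L * var_theta + var_n)
    err_var = 1.0 / (1.0 / var_theta + L / var_n)
    return est, err_var

def sequential_mmse(obs, var_theta, var_n):
    """Update the estimate after each observation, using the previous
    estimate and error variance as the prior for the next step."""
    est, err_var = 0.0, var_theta  # prior: mean 0, variance var_theta
    for r in obs:
        gain = err_var / (err_var + var_n)
        est = est + gain * (r - est)
        err_var = err_var * var_n / (err_var + var_n)
    return est, err_var

if __name__ == "__main__":
    data = [1.0, 2.0, 0.5]
    print(batch_mmse(data, 4.0, 1.0))
    print(sequential_mmse(data, 4.0, 1.0))
```

If the recursion is correct, both functions return the same estimate and the same error variance for any data set, which is a quick check on a hand-derived update rule.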
Problem 4
Although the maximum likelihood estimation procedure was not
clearly defined until early in the 20th century, Gauss
showed in 1809 that the Gaussian density
was the sole density for which the
maximum likelihood estimate of the mean equaled the sample
average. Let $r_0, \dots, r_{L-1}$
be a sequence of statistically independent, identically
distributed random variables.
4.a)
What equation defines the maximum likelihood estimate $\hat{m}_{\mathrm{ML}}$ of the mean $m$ when the common probability density function of the data has the form $p(r - m)$?
4.b)
The sample average is, of course, $\frac{1}{L}\sum_l r_l$.
Show that it minimizes the mean-square error $\sum_l (r_l - m)^2$.
4.c)
Equating the sample average to $\hat{m}_{\mathrm{ML}}$, combine this equation with the maximum
likelihood equation to show that the Gaussian density
uniquely satisfies the equations.
note:
Because both equations equal 0, they can be equated. Use
the fact that they must hold for all $L$
to derive the result. Gauss thus showed that mean-squared
error and the Gaussian density were closely linked,
presaging ideas from modern robust estimation theory.
Problem 5
In this example, we derived the maximum likelihood estimate of the mean and
variance of a Gaussian random vector. You might wonder why
we chose to estimate the variance $\sigma^2$ rather than the standard deviation $\sigma$. Using the same
assumptions provided in the example, let's explore the
consequences of estimating a function of a parameter (van Trees: Probs 2.4.9, 2.4.10).
5.a)
Assuming that the mean is known, find the maximum
likelihood estimates of first the variance, then the
standard deviation.
5.b)
Are these estimates biased?
5.c)
Describe how these two estimates are related. Assuming that $f(\cdot)$ is a monotonic function, how are $\hat{\theta}_{\mathrm{ML}}$ and $\widehat{f(\theta)}_{\mathrm{ML}}$ related in general? These results suggest a general
question. Consider the problem of estimating some
function of a parameter $\theta$, say $f_1(\theta)$.
The observed quantity is $r$ and the conditional density $p_{r|\theta}(r|\theta)$ is known. Assume that $\theta$ is a nonrandom parameter.
5.d)
What are the conditions for an efficient estimate $\widehat{f_1(\theta)}$ to exist?
5.e)
What is the lower bound on the variance of the error of
any unbiased estimate of $f_1(\theta)$?
5.f)
Assume an efficient estimate of $f_1(\theta)$ exists; when can an efficient estimate of some other function $f_2(\theta)$ exist?
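Parts 5.a through 5.c can be explored by simulation before attempting the general questions. The following sketch assumes scalar Gaussian observations with known mean, takes the variance estimate to be the average squared deviation from that mean, and takes the standard deviation estimate to be its square root. These are the forms one would expect from the ML derivation, but the code is meant as a check, not a proof. All names are hypothetical.

```python
import math
import random

def ml_variance(obs, mean):
    """Average squared deviation from the known mean."""
    return sum((r - mean) ** 2 for r in obs) / len(obs)

def ml_std(obs, mean):
    """Square root of the variance estimate (a monotonic transformation)."""
    return math.sqrt(ml_variance(obs, mean))

def average_estimates(sigma, mean, L, trials, rng):
    """Average both estimates over many independent data sets of length L."""
    var_sum = 0.0
    std_sum = 0.0
    for _ in range(trials):
        obs = [rng.gauss(mean, sigma) for _ in range(L)]
        var_sum += ml_variance(obs, mean)
        std_sum += ml_std(obs, mean)
    return var_sum / trials, std_sum / trials

if __name__ == "__main__":
    rng = random.Random(2)
    print(average_estimates(2.0, 1.0, 5, 20000, rng))
```

With small $L$, the averaged variance estimate sits near $\sigma^2$ while the averaged standard deviation estimate falls below $\sigma$, which illustrates that a nonlinear function of an unbiased estimate need not be unbiased.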
Problem 6
Let the observations $r(l)$ consist of statistically independent, identically
distributed Gaussian random variables having zero mean but
unknown variance. We wish to estimate $\sigma^2$, their variance.
6.a)
Find the maximum likelihood estimate $\hat{\sigma}^2_{\mathrm{ML}}$ and compute the resulting mean-squared error.
6.b)
Show that this estimate is efficient.
6.c)
Consider a new estimate $\hat{\sigma}^2_{\mathrm{NEW}}$ given by $\hat{\sigma}^2_{\mathrm{NEW}} = \alpha\,\hat{\sigma}^2_{\mathrm{ML}}$, where $\alpha$ is a constant. Find the value of $\alpha$ that minimizes the mean-squared error for $\hat{\sigma}^2_{\mathrm{NEW}}$.
Show that the mean-squared error of $\hat{\sigma}^2_{\mathrm{NEW}}$ is less than that of $\hat{\sigma}^2_{\mathrm{ML}}$. Is this result compatible with
the previous part?
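The trade-off in part 6.c between bias and variance can be tabulated directly. This sketch assumes the ML estimate is the average of the squared observations, so that it is unbiased with variance $2\sigma^4/L$ (a scaled chi-squared variable with $L$ degrees of freedom). The mean-squared error of the scaled estimate is then the sum of a variance term and a squared-bias term. The function names are hypothetical.

```python
def mse_scaled(alpha, L, var=1.0):
    """MSE of alpha * (ML variance estimate) for L zero-mean Gaussian samples,
    assuming E[ML] = var and Var[ML] = 2 * var**2 / L."""
    variance_term = alpha ** 2 * 2.0 * var ** 2 / L
    bias_term = (alpha - 1.0) ** 2 * var ** 2
    return variance_term + bias_term

def best_alpha_on_grid(L, steps=10000):
    """Brute-force search for the minimizing alpha on [0, 2]."""
    grid = [2.0 * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: mse_scaled(a, L))

if __name__ == "__main__":
    for L in (2, 8, 50):
        a = best_alpha_on_grid(L)
        print(L, a, mse_scaled(a, L), mse_scaled(1.0, L))
```

The grid search is deliberately crude; its purpose is to give a number against which a closed-form minimizer can be compared.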
Problem 7
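Let the observations be of the form $\mathbf{r} = \mathbf{H}\boldsymbol{\theta} + \mathbf{n}$ where $\boldsymbol{\theta}$ and $\mathbf{n}$ are statistically independent Gaussian random vectors.
$$\boldsymbol{\theta} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_\theta), \qquad \mathbf{n} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_n)$$
The vector $\boldsymbol{\theta}$ has dimension $M$; the vectors $\mathbf{r}$ and $\mathbf{n}$ have dimension $N$.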
7.a)
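Derive the minimum mean-squared error estimate of $\boldsymbol{\theta}$, $\hat{\boldsymbol{\theta}}_{\mathrm{MMSE}}$, from the relationship $\hat{\boldsymbol{\theta}}_{\mathrm{MMSE}} = E[\boldsymbol{\theta} \mid \mathbf{r}]$.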
7.b)
Show that this estimate and the optimum linear estimate $\hat{\boldsymbol{\theta}}_{\mathrm{LIN}}$ derived by the Orthogonality Principle are equal.
7.c)
Find an expression for the mean-squared error when these
estimates are used.
Problem 8
To illustrate the power of importance sampling, let's
consider a somewhat naïve example. Let $r$ have a zero-mean Laplacian distribution; we want to employ
importance sampling techniques to estimate $\Pr[r > \gamma]$ (despite the fact that we can calculate it easily). Let the
density for $\tilde{r}$ be Laplacian having mean $\gamma$.
8.a)
Find the weight $c_l$ that must be applied to each decision based on the variable $\tilde{r}$.
8.b)
Find the importance sampling gain. Show that this gain
means that a fixed number of
simulations are needed to achieve a given percentage
estimation error (as defined by the coefficient of
variation). Express this number as a function of the
criterion value for the coefficient of variation.
8.c)
Now assume that the density for $\tilde{r}$ is Laplacian, but with mean $m$. Optimize $m$ by finding the value that maximizes the importance
sampling gain.
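The setup of parts 8.a and 8.b can be tried out in a few lines. This is a minimal sketch under stated assumptions: the Laplacian has unit scale by default, a Laplacian variate is generated as the difference of two independent unit exponentials, and each exceedance is weighted by the ratio of the original density to the shifted density. The function names are hypothetical.

```python
import math
import random

def laplacian_sample(mean, scale, rng):
    """Laplacian variate: the scaled difference of two independent exponentials."""
    return mean + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def laplacian_pdf(x, mean, scale):
    """Laplacian density with the given mean and scale."""
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)

def tail_prob_importance(gamma, n, rng, scale=1.0):
    """Estimate Pr[r > gamma] for zero-mean Laplacian r by sampling from a
    Laplacian centred at gamma and weighting each exceedance by the
    density ratio p_r / p_r~."""
    total = 0.0
    for _ in range(n):
        x = laplacian_sample(gamma, scale, rng)
        if x > gamma:
            total += laplacian_pdf(x, 0.0, scale) / laplacian_pdf(x, gamma, scale)
    return total / n

def tail_prob_exact(gamma, scale=1.0):
    """Closed-form tail probability, valid for gamma >= 0."""
    return 0.5 * math.exp(-gamma / scale)

if __name__ == "__main__":
    rng = random.Random(5)
    for g in (2.0, 10.0, 20.0):
        print(g, tail_prob_importance(g, 20000, rng), tail_prob_exact(g))
```

Running the comparison for several thresholds with the same number of samples gives a feel for how the relative accuracy behaves as the event becomes rarer.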
Problem 9
Suppose we consider an estimate of the parameter $\theta$ having the form $\hat{\theta} = \mathcal{L}(\mathbf{r}) + C$, where $\mathbf{r}$ denotes the vector of the observables and $\mathcal{L}(\cdot)$ is a linear operator. The quantity $C$ is a constant. This estimate is not a
linear function of the observables unless $C = 0$. We are interested in finding applications for
which it is advantageous to allow $C \neq 0$. Estimates of this form we term
"quasi-linear".
9.a)
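Show that the optimum (minimum mean-squared error)
quasi-linear estimate satisfies
$$E\left[\left\langle \mathcal{L}_\diamond(\mathbf{r}) + C_\diamond - \theta,\ \mathcal{L}(\mathbf{r}) + C \right\rangle\right] = 0$$
for all $\mathcal{L}(\cdot)$ and $C$, where $\hat{\theta}_{\mathrm{QLIN}} = \mathcal{L}_\diamond(\mathbf{r}) + C_\diamond$.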
9.b)
Find a general expression for the mean-squared error
incurred by the optimum quasi-linear estimate.
9.c)
Such estimates yield a smaller mean-squared error when
the parameter $\theta$ has a nonzero mean. Let $\theta$ be a scalar parameter with mean $m$. The observables comprise a vector $\mathbf{r}$ having components given by $r_l = \theta + n_l$, $l \in \{1, \dots, N\}$, where $n_l$ are statistically independent Gaussian random variables
[$n_l \sim \mathcal{N}(0, \sigma_n^2)$] independent of $\theta$. Compute expressions for $\hat{\theta}_{\mathrm{QLIN}}$ and $\hat{\theta}_{\mathrm{LIN}}$. Verify that $\hat{\theta}_{\mathrm{QLIN}}$ yields a smaller mean-squared error when $m \neq 0$.
Problem 10
In
this section, we
questioned the existence of an efficient estimator for
signal parameters. We found in the succeeding example that
an unbiased efficient estimator exists for the signal
amplitude. Can a nonlinearly represented parameter, such as
time delay, have an efficient estimator?
10.a)
Simplify the condition for the existence of an efficient
estimator by assuming it to be unbiased. Note carefully
the dimensions of the matrices involved.
10.b)
Show that the only solution in this case occurs when the
signal depends "linearly" on the parameter vector.
Problem 11
In Poisson problems, the number of events $n$ occurring in the interval $[0, T)$ is governed by the probability distribution (see
The Poisson Process)
$$\Pr[n] = \frac{(\lambda T)^n}{n!}\, e^{-\lambda T}$$
where $\lambda$ is the average rate at which events occur.
11.a)
What is the maximum likelihood estimate of average rate?
11.b)
Does this estimate satisfy the Cramér-Rao bound?
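Both parts of this problem can be probed by simulation. The sketch below assumes the natural candidate estimate $n/T$ and generates Poisson counts by accumulating exponential inter-arrival times. It reports the sample mean and variance of the estimate so they can be compared with the true rate and with whatever bound is derived by hand. The function names are hypothetical.

```python
import random

def poisson_count(rate, T, rng):
    """Count events in [0, T) by accumulating exponential inter-arrival times."""
    t = rng.expovariate(rate)
    n = 0
    while t < T:
        n += 1
        t += rng.expovariate(rate)
    return n

def rate_estimate_stats(rate, T, trials, rng):
    """Sample mean and variance of the estimate n / T over repeated intervals."""
    ests = [poisson_count(rate, T, rng) / T for _ in range(trials)]
    mean = sum(ests) / trials
    var = sum((e - mean) ** 2 for e in ests) / trials
    return mean, var

if __name__ == "__main__":
    rng = random.Random(3)
    print(rate_estimate_stats(3.0, 10.0, 20000, rng))
```

For a Poisson count the variance equals the mean $\lambda T$, so the variance of $n/T$ should come out close to $\lambda / T$; that is the figure to set against the Cramér-Rao bound.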
Problem 12
In the "classic" radar problem, not only is the time of
arrival of the radar pulse unknown but also the amplitude.
In this problem, we seek methods of simultaneously
estimating these parameters. The received signal $r(l)$ is of the form
$$r(l) = \theta_1 s(l - \theta_2) + n(l)$$
where $\theta_1$ is Gaussian with zero mean and variance $\sigma_1^2$ and $\theta_2$ is uniformly distributed over the observation interval.
Find the receiver that computes the maximum a
posteriori estimates of $\theta_1$ and $\theta_2$ jointly. Draw a block diagram of this receiver and
interpret its structure.
Problem 13
We state without derivation the Cramér-Rao bound for
estimates of signal delay (see
this equation).
13.a)
The parameter $\theta$ is the delay of the signal $s(\cdot)$ observed in additive, white Gaussian noise: $r(l) = s(l - \theta) + n(l)$, $l \in \{0, \dots, L-1\}$.
Derive the Cramér-Rao bound for this problem.
13.b)
In Time-delay Estimation,
this bound is claimed to be given by $\frac{\sigma_n^2}{E\beta^2}$, where $\beta^2$ is the mean-squared bandwidth. Derive this result from
your general formula. Does the bound make sense for all
values of signal-to-noise ratio $\frac{E}{\sigma_n^2}$?
13.c)
Using optimal detection theory, derive the expression
(see Time-Delay Estimation)
for the probability of error incurred when trying to
distinguish between a delay of $\tau$ and a delay of $\tau + \Delta$. Consistent with the problem posed for the
Cramér-Rao bound, assume the delayed signals are
observed in additive, white Gaussian noise.
Problem 14
In formulating detection problems, the signal as well as the
noise are sometimes modeled as Gaussian processes. Let's
explore what differences arise in the Cramér-Rao
bound derived when the signal is deterministic. Assume that
the signal contains unknown parameters $\boldsymbol{\theta}$, that it is statistically
independent of the noise, and that the noise covariance
matrix is known.
14.a)
What forms do the conditional densities of the
observations take under the two assumptions? What are the
two covariance matrices?
14.b)
Assuming the stochastic signal model, show that each
element of the Fisher information matrix has the form
$$F_{ij} = \frac{1}{2}\,\mathrm{tr}\left[\mathbf{K}^{-1}\frac{\partial \mathbf{K}}{\partial \theta_i}\,\mathbf{K}^{-1}\frac{\partial \mathbf{K}}{\partial \theta_j}\right]$$
where $\mathbf{K}$ denotes the covariance matrix of the observations. Specialize
this expression by assuming the noise
component has no unknown parameters.
14.c)
Compare the stochastic and deterministic bounds (the
latter is given by this equation) when the unknown
signal parameters are amplitude and delay. Assume the
noise covariance matrix equals $\sigma_n^2 \mathbf{I}$. Do these bounds have similar dependence on
signal-to-noise ratio?
Problem 15
The histogram probability density estimator is a special
case of a more general class of estimators known as
kernel estimators.
$$\hat{p}_r(x) = \frac{1}{L}\sum_{l=0}^{L-1} k(x - r_l)$$
Here, the kernel $k(\cdot)$ is usually taken to be a density itself.
15.a)
What is the kernel for the histogram estimator?
15.b)
Interpret the kernel estimator in signal processing
terminology. Predict what the most time consuming
computation of this estimate might be. Why?
15.c)
Show that the sample average equals the expected value
of a random variable having the density $\hat{p}_r(x)$ regardless of the choice of kernel.
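A direct implementation of the kernel estimator helps with parts 15.b and 15.c. The sketch below is illustrative only: it assumes a zero-mean Gaussian kernel with a hypothetical `width` parameter (the histogram kernel asked for in part 15.a is left to the reader), and it uses a simple midpoint rule to check the area and mean of the estimated density numerically.

```python
import math

def gaussian_kernel(u, width):
    """Zero-mean Gaussian density with standard deviation `width`."""
    return math.exp(-0.5 * (u / width) ** 2) / (width * math.sqrt(2.0 * math.pi))

def kernel_density(x, obs, width=0.5):
    """Kernel estimate at x: the average of kernels centred on the data."""
    return sum(gaussian_kernel(x - r, width) for r in obs) / len(obs)

def integrate(func, lo, hi, steps=4000):
    """Midpoint-rule integral, used only to check the estimate numerically."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

if __name__ == "__main__":
    data = [0.2, 1.1, 1.3, 2.7, 3.0]
    area = integrate(lambda x: kernel_density(x, data), -10.0, 15.0)
    mean = integrate(lambda x: x * kernel_density(x, data), -10.0, 15.0)
    print(area, mean, sum(data) / len(data))
```

Each evaluation of `kernel_density` sums over every observation, so evaluating the estimate on a grid costs one pass over the data per grid point. That is worth keeping in mind when interpreting the estimator as a filtering operation.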
Problem 16
Random variables can be generated quite easily if the
probability distribution function is
"nice." Let $X$ be a random variable having distribution function $P_X(\cdot)$.
16.a)
Show that the random variable $U = P_X(X)$ is uniformly distributed over $(0, 1)$.
16.b)
Based on this result, how would you generate a random
variable having a specific density with a uniform random
variable generator, which is commonly supplied with most
computer and calculator systems?
16.c)
How would you generate random variables having the
hyperbolic secant density $p_X(x) = \frac{1}{2}\,\mathrm{sech}\left(\frac{\pi x}{2}\right)$?
16.d)
Why is the Gaussian not in the class of "nice" probability
distribution functions? Despite this fact, the Gaussian
and other similarly unfriendly random variables can be
generated using tabulated rather than analytic forms for
the distribution function.
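The generation method suggested by parts 16.a through 16.c can be sketched for the hyperbolic secant density. The code assumes the distribution function $P_X(x) = \frac{2}{\pi}\arctan\left(e^{\pi x/2}\right)$, obtained by integrating the density, and inverts it analytically. The function names are hypothetical, and the result should be checked against a hand derivation.

```python
import math
import random

def sech_cdf(x):
    """Distribution function of the density (1/2) sech(pi x / 2)."""
    return (2.0 / math.pi) * math.atan(math.exp(math.pi * x / 2.0))

def sech_inverse_cdf(u):
    """Inverse of sech_cdf for u in (0, 1]."""
    return (2.0 / math.pi) * math.log(math.tan(math.pi * u / 2.0))

def sech_sample(rng):
    """Transform a uniform variate through the inverse distribution function."""
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    return sech_inverse_cdf(u)

if __name__ == "__main__":
    rng = random.Random(7)
    xs = [sech_sample(rng) for _ in range(50000)]
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    print("sample mean:", m, "sample variance:", v)
```

This density has zero mean and unit variance, so the printed sample statistics give a quick plausibility check. The same three-step pattern (integrate the density, invert, apply to a uniform variate) is what fails to have a closed form in the Gaussian case of part 16.d.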
References
H.L. van Trees. (1968). Detection, Estimation, and Modulation Theory, Part I. New York: John Wiley and Sons.