Estimation Theory: Problems

Module by: Don Johnson

Problem 1
Estimates for identical parameters are heavily dependent on the assumed underlying probability densities. To understand this sensitivity better, consider the following variety of problems, each of which asks for estimates of quantities related to variance. Determine the bias and consistency in each case.

1.a)

Compute the maximum a posteriori and maximum likelihood estimates of $\theta$ based on $L$ statistically independent observations of a Maxwellian random variable $r$.
$$p_{r|\theta}(r|\theta) = \sqrt{\frac{2}{\pi}}\,\theta^{-3/2}\, r^2 e^{-\frac{1}{2} r^2/\theta}, \quad r > 0,\ \theta > 0$$
$$p_\theta(\theta) = \lambda e^{-\lambda\theta}, \quad \theta > 0$$

1.b)

Find the maximum a posteriori estimate of the variance $\sigma^2$ from $L$ statistically independent observations having the exponential density
$$p_r(r) = \frac{1}{\sigma^2} e^{-r/\sigma^2}, \quad r > 0$$
where the variance is uniformly distributed over the interval $[0, \sigma_{\max}^2]$.

1.c)

Find the maximum likelihood estimate of the variance of $L$ identically distributed, but dependent Gaussian random variables. Here, the covariance matrix is written $K_r = \sigma^2 \tilde{K}_r$, where the normalized covariance matrix $\tilde{K}_r$ has trace $\mathrm{tr}[\tilde{K}_r] = L$.
Problem 2
Imagine yourself idly standing on the corner in a large city when you note the serial number of a passing beer truck. Because you are idle, you wish to estimate (guess may be more accurate here) how many beer trucks the city has from this single observation.

2.a)

Making appropriate assumptions, suppose the beer truck's number is drawn from a uniform probability density ranging between zero and some unknown upper limit. Find the maximum likelihood estimate of the upper limit.

2.b)

Show that this estimate is biased.

2.c)

In one of your extraordinarily idle moments, you observe $L$ beer trucks throughout the city. Assuming them to be independent observations, now what is the maximum likelihood estimate of the total?

2.d)

Is this estimate of $\theta$ biased? asymptotically biased? consistent?
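
Answers to parts (b) through (d) can be checked numerically with a small Monte Carlo experiment. The sketch below is only an illustration: it assumes the estimate under study is the largest observed serial number (a candidate to verify against your own derivation, not a result stated in the problem), that serial numbers are continuous and uniform on $[0, \theta]$, and the function name and parameter values are hypothetical.

```python
import random

def mean_of_max_estimate(theta, L, trials, seed=0):
    """Average, over many trials, of the largest of L uniform(0, theta) draws.

    The 'largest observation' estimate is an assumed candidate, not a
    result taken from the problem statement.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(0.0, theta) for _ in range(L))
    return total / trials

if __name__ == "__main__":
    theta = 1000.0  # hypothetical true upper limit
    for L in (1, 5, 50):
        # Compare each average with theta to see how the bias behaves as L grows.
        print(L, mean_of_max_estimate(theta, L, trials=5000))
```

Comparing the printed averages with the true value of `theta` as $L$ grows gives an empirical picture of bias and asymptotic bias that should agree with the analytic answers.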
Problem 3
We make $L$ observations $r_1, \dots, r_L$ of a parameter $\theta$ corrupted by additive noise ($r_l = \theta + n_l$). The parameter $\theta$ is a Gaussian random variable [$\theta \sim \mathcal{N}(0, \sigma_\theta^2)$] and the $n_l$ are statistically independent Gaussian random variables [$n_l \sim \mathcal{N}(0, \sigma_n^2)$].

3.a)

Find the MMSE estimate of $\theta$.

3.b)

Find the maximum a posteriori estimate of $\theta$.

3.c)

Compute the resulting mean-squared error for each estimate.

3.d)

Consider an alternate procedure based on the same observations $r_l$. Using the MMSE criterion, we estimate $\theta$ immediately after each observation. This procedure yields the sequence of estimates $\hat{\theta}_1(r_1)$, $\hat{\theta}_2(r_1, r_2)$, …, $\hat{\theta}_L(r_1, \dots, r_L)$. Express $\hat{\theta}_l$ as a function of $\hat{\theta}_{l-1}$, $\sigma_{l-1}^2$, and $r_l$. Here, $\sigma_l^2$ denotes the variance of the estimation error of the $l$th estimate. Show that
$$\frac{1}{\sigma_l^2} = \frac{1}{\sigma_\theta^2} + \frac{l}{\sigma_n^2}$$
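
A derived recursion for part (d) can be tested numerically before it is trusted. The sketch below uses one candidate form of the update (a gain applied to the new observation's deviation from the previous estimate), which is an assumption to verify rather than the stated solution. It also assumes the recursion starts from the prior, $\hat{\theta}_0 = 0$ and $\sigma_0^2 = \sigma_\theta^2$; the function names are hypothetical.

```python
def sequential_update(prev_estimate, prev_var, r, noise_var):
    """One step of a candidate sequential MMSE update (assumed form)."""
    gain = prev_var / (prev_var + noise_var)
    estimate = prev_estimate + gain * (r - prev_estimate)
    var = prev_var * noise_var / (prev_var + noise_var)
    return estimate, var

def run_sequence(observations, prior_var, noise_var):
    """Apply the update to each observation, starting from the prior (mean 0)."""
    estimate, var = 0.0, prior_var
    history = []
    for r in observations:
        estimate, var = sequential_update(estimate, var, r, noise_var)
        history.append((estimate, var))
    return history

if __name__ == "__main__":
    prior_var, noise_var = 2.0, 0.5  # illustrative values
    for l, (est, var) in enumerate(run_sequence([1.0, 2.0, 0.5, -1.0],
                                                prior_var, noise_var), start=1):
        # The two printed precisions should agree if the recursion is right.
        print(l, est, 1.0 / var, 1.0 / prior_var + l / noise_var)
```

If the candidate recursion is correct, the inverse error variance printed at step $l$ matches the expression the problem asks you to show, and the final estimate matches the batch MMSE estimate from part (a).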
Problem 4
Although the maximum likelihood estimation procedure was not clearly defined until early in the 20th century, Gauss showed in 1805 that the Gaussian density (see footnote 1) was the sole density for which the maximum likelihood estimate of the mean equaled the sample average. Let $r_0, \dots, r_{L-1}$ be a sequence of statistically independent, identically distributed random variables.

4.a)

What equation defines the maximum likelihood estimate $\hat{m}_{\mathrm{ML}}$ of the mean $m$ when the common probability density function of the data has the form $p(r - m)$?

4.b)

The sample average is, of course, $\frac{1}{L}\sum_l r_l$. Show that it minimizes the mean-square error $\sum_l (r_l - m)^2$.

4.c)

Equating the sample average to $\hat{m}_{\mathrm{ML}}$, combine this equation with the maximum likelihood equation to show that the Gaussian density uniquely satisfies the equations.
Note: Because both equations equal 0, they can be equated. Use the fact that they must hold for all $L$ to derive the result. Gauss thus showed that mean-squared error and the Gaussian density were closely linked, presaging ideas from modern robust estimation theory.
Problem 5
In this example, we derived the maximum likelihood estimate of the mean and variance of a Gaussian random vector. You might wonder why we chose to estimate the variance $\sigma^2$ rather than the standard deviation $\sigma$. Using the same assumptions provided in the example, let's explore the consequences of estimating a function of a parameter (van Trees: Probs 2.4.9, 2.4.10).

5.a)

Assuming that the mean is known, find the maximum likelihood estimates of first the variance, then the standard deviation.

5.b)

Are these estimates biased?

5.c)

Describe how these two estimates are related. Assuming that $f(\cdot)$ is a monotonic function, how are $\hat{\theta}_{\mathrm{ML}}$ and $\widehat{f(\theta)}_{\mathrm{ML}}$ related in general?

These results suggest a general question. Consider the problem of estimating some function of a parameter $\theta$, say $f_1(\theta)$. The observed quantity is $r$ and the conditional density $p_{r|\theta}(r|\theta)$ is known. Assume that $\theta$ is a nonrandom parameter.

5.d)

What are the conditions for an efficient estimate $\widehat{f_1(\theta)}$ to exist?

5.e)

What is the lower bound on the variance of the error of any unbiased estimate of $f_1(\theta)$?

5.f)

Assume an efficient estimate of $f_1(\theta)$ exists; when can an efficient estimate of some other function $f_2(\theta)$ exist?
Problem 6
Let the observations $r_l$ consist of statistically independent, identically distributed Gaussian random variables having zero mean but unknown variance. We wish to estimate $\sigma^2$, their variance.

6.a)

Find the maximum likelihood estimate $\widehat{\sigma^2_{\mathrm{ML}}}$ and compute the resulting mean-squared error.

6.b)

Show that this estimate is efficient.

6.c)

Consider a new estimate $\widehat{\sigma^2_{\mathrm{NEW}}}$ given by $\widehat{\sigma^2_{\mathrm{NEW}}} = \alpha\,\widehat{\sigma^2_{\mathrm{ML}}}$, where $\alpha$ is a constant. Find the value of $\alpha$ that minimizes the mean-squared error for $\widehat{\sigma^2_{\mathrm{NEW}}}$. Show that the mean-squared error of $\widehat{\sigma^2_{\mathrm{NEW}}}$ is less than that of $\widehat{\sigma^2_{\mathrm{ML}}}$. Is this result compatible with the previous part?
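
The comparison in part (c) can be explored empirically before or after working it analytically. The following sketch assumes the maximum likelihood estimate is the average of the squared observations (a candidate form to confirm in part (a)) and estimates the mean-squared error of the scaled estimate for any chosen $\alpha$ by simulation. The function names and the parameter values are illustrative assumptions.

```python
import random

def ml_variance_estimates(sigma2, L, trials, seed=0):
    """Simulate 'trials' values of (1/L) * sum(r_l^2) for zero-mean Gaussian data.

    The average-of-squares form is an assumed candidate for the ML estimate.
    """
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    return [sum(rng.gauss(0.0, sigma) ** 2 for _ in range(L)) / L
            for _ in range(trials)]

def empirical_mse(estimates, alpha, sigma2):
    """Sample mean-squared error of alpha * estimate about the true variance."""
    return sum((alpha * e - sigma2) ** 2 for e in estimates) / len(estimates)

if __name__ == "__main__":
    sigma2, L = 1.0, 10  # illustrative true variance and sample size
    estimates = ml_variance_estimates(sigma2, L, trials=20000)
    for alpha in (0.7, 0.8, 0.9, 1.0, 1.1):
        print(alpha, empirical_mse(estimates, alpha, sigma2))
```

Scanning $\alpha$ over a finer grid locates the empirical minimum, which can then be compared with the value derived analytically.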
Problem 7
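Let the observations be of the form $\mathbf{r} = H\boldsymbol{\theta} + \mathbf{n}$, where $\boldsymbol{\theta}$ and $\mathbf{n}$ are statistically independent Gaussian random vectors: $\boldsymbol{\theta} \sim \mathcal{N}(\mathbf{0}, K_\theta)$ and $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, K_n)$. The vector $\boldsymbol{\theta}$ has dimension $M$; the vectors $\mathbf{r}$ and $\mathbf{n}$ have dimension $N$.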

7.a)

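Derive the minimum mean-squared error estimate of $\boldsymbol{\theta}$, $\hat{\boldsymbol{\theta}}_{\mathrm{MMSE}}$, from the relationship $\hat{\boldsymbol{\theta}}_{\mathrm{MMSE}} = E[\boldsymbol{\theta} \mid \mathbf{r}]$.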

7.b)

Show that this estimate and the optimum linear estimate $\hat{\boldsymbol{\theta}}_{\mathrm{LIN}}$ derived by the Orthogonality Principle are equal.

7.c)

Find an expression for the mean-squared error when these estimates are used.
Problem 8
To illustrate the power of importance sampling, let's consider a somewhat naïve example. Let $r$ have a zero-mean Laplacian distribution; we want to employ importance sampling techniques to estimate $\Pr[r > \gamma]$ (despite the fact that we can calculate it easily). Let the density for $\tilde{r}$ be Laplacian having mean $\gamma$.

8.a)

Find the weight $c_l$ that must be applied to each decision based on the variable $\tilde{r}$.

8.b)

Find the importance sampling gain. Show that this gain means that a fixed number of simulations are needed to achieve a given percentage estimation error (as defined by the coefficient of variation). Express this number as a function of the criterion value for the coefficient of variation.

8.c)

Now assume that the density for $\tilde{r}$ is Laplacian, but with mean $m$. Optimize $m$ by finding the value that maximizes the importance sampling gain.
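
A short simulation makes the setup of this problem concrete. The sketch below is a minimal illustration under stated assumptions: the Laplacian has unit scale, so its density is $\frac{1}{2}e^{-|r|}$ (the problem does not fix the scale); each weight is taken to be the ratio of the original density to the shifted density, which is the usual importance sampling weight and should be checked against your answer to part (a); and all function names are hypothetical.

```python
import math
import random

def laplace_sample(rng, mean):
    """Unit-scale Laplacian draw: the difference of two unit exponentials, shifted."""
    return mean + rng.expovariate(1.0) - rng.expovariate(1.0)

def tail_prob_importance(gamma, shift, trials, seed=0):
    """Importance sampling estimate of Pr[r > gamma] using a Laplacian with mean 'shift'."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = laplace_sample(rng, shift)
        if x > gamma:
            # Weight = original density / sampling density (assumed unit scale).
            total += math.exp(abs(x - shift) - abs(x))
    return total / trials

def tail_prob_exact(gamma):
    """Closed-form tail probability for the unit-scale Laplacian, valid for gamma >= 0."""
    return 0.5 * math.exp(-gamma)

if __name__ == "__main__":
    gamma = 8.0  # illustrative threshold
    print(tail_prob_importance(gamma, shift=gamma, trials=20000), tail_prob_exact(gamma))
```

Varying `shift` away from `gamma` and repeating the run with different seeds gives a rough empirical view of how the spread of the estimate depends on the mean of the sampling density, which is the question posed in part (c).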
Problem 9
Suppose we consider an estimate of the parameter $\theta$ having the form $\hat{\theta} = \mathcal{L}[\mathbf{r}] + C$, where $\mathbf{r}$ denotes the vector of the observables and $\mathcal{L}[\cdot]$ is a linear operator. The quantity $C$ is a constant. This estimate is not a linear function of the observables unless $C = 0$. We are interested in finding applications for which it is advantageous to allow $C \neq 0$. Estimates of this form we term "quasi-linear".

9.a)

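Show that the optimum (minimum mean-squared error) quasi-linear estimate satisfies
$$E\left[\langle \mathcal{L}_\diamond[\mathbf{r}] + C_\diamond - \theta,\ \mathcal{L}[\mathbf{r}] + C \rangle\right] = 0 \quad \text{for all } \mathcal{L}[\cdot] \text{ and } C,$$
where $\hat{\theta}_{\mathrm{QLIN}} = \mathcal{L}_\diamond[\mathbf{r}] + C_\diamond$.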

9.b)

Find a general expression for the mean-squared error incurred by the optimum quasi-linear estimate.

9.c)

Such estimates yield a smaller mean-squared error when the parameter $\theta$ has a nonzero mean. Let $\theta$ be a scalar parameter with mean $m$. The observables comprise a vector $\mathbf{r}$ having components given by $r_l = \theta + n_l$, $l = 1, \dots, N$, where the $n_l$ are statistically independent Gaussian random variables [$n_l \sim \mathcal{N}(0, \sigma_n^2)$] independent of $\theta$. Compute expressions for $\hat{\theta}_{\mathrm{QLIN}}$ and $\hat{\theta}_{\mathrm{LIN}}$. Verify that $\hat{\theta}_{\mathrm{QLIN}}$ yields a smaller mean-squared error when $m \neq 0$.
Problem 10
In this section, we questioned the existence of an efficient estimator for signal parameters. We found in the succeeding example that an unbiased efficient estimator exists for the signal amplitude. Can a nonlinearly represented parameter, such as time delay, have an efficient estimator?

10.a)

Simplify the condition for the existence of an efficient estimator by assuming it to be unbiased. Note carefully the dimensions of the matrices involved.

10.b)

Show that the only solution in this case occurs when the signal depends "linearly" on the parameter vector.
Problem 11
In Poisson problems, the number of events $n$ occurring in the interval $[0, T)$ is governed by the probability distribution (see The Poisson Process)
$$\Pr[n] = \frac{(\lambda T)^n}{n!} e^{-\lambda T}$$
where $\lambda$ is the average rate at which events occur.

11.a)

What is the maximum likelihood estimate of average rate?

11.b)

Does this estimate satisfy the Cramér-Rao bound?
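
The behavior of a rate estimate can be examined by simulation and then compared with the bound you derive. The sketch below assumes the candidate estimate is the event count divided by the interval length, $n/T$ (to be confirmed in part (a)), and generates Poisson counts from exponential inter-arrival times. The function names and parameter values are illustrative.

```python
import random

def count_events(rng, rate, T):
    """Number of Poisson events in [0, T), built from exponential inter-arrival times."""
    n = 0
    t = rng.expovariate(rate)
    while t < T:
        n += 1
        t += rng.expovariate(rate)
    return n

def rate_estimate_stats(rate, T, trials, seed=0):
    """Sample mean and sample variance of the candidate estimate n / T."""
    rng = random.Random(seed)
    estimates = [count_events(rng, rate, T) / T for _ in range(trials)]
    mean = sum(estimates) / trials
    var = sum((e - mean) ** 2 for e in estimates) / (trials - 1)
    return mean, var

if __name__ == "__main__":
    # Illustrative values: average rate 3 events per unit time, interval length 10.
    print(rate_estimate_stats(rate=3.0, T=10.0, trials=5000))
```

The printed sample variance is the quantity to set against the Cramér-Rao bound obtained analytically in part (b).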
Problem 12
In the "classic" radar problem, not only is the time of arrival of the radar pulse unknown but also the amplitude. In this problem, we seek methods of simultaneously estimating these parameters. The received signal $r(l)$ is of the form $r(l) = \theta_1 s(l - \theta_2) + n(l)$, where $\theta_1$ is Gaussian with zero mean and variance $\sigma_1^2$ and $\theta_2$ is uniformly distributed over the observation interval. Find the receiver that computes the maximum a posteriori estimates of $\theta_1$ and $\theta_2$ jointly. Draw a block diagram of this receiver and interpret its structure.
Problem 13
We state without derivation the Cramér-Rao bound for estimates of signal delay (see this equation).

13.a)

The parameter $\theta$ is the delay of the signal $s(\cdot)$ observed in additive, white Gaussian noise: $r(l) = s(l - \theta) + n(l)$, $l = 0, \dots, L-1$. Derive the Cramér-Rao bound for this problem.

13.b)

In Time-delay Estimation, this bound is claimed to be given by $\frac{\sigma_n^2}{E\beta^2}$, where $\beta^2$ is the mean-squared bandwidth. Derive this result from your general formula. Does the bound make sense for all values of signal-to-noise ratio $E/\sigma_n^2$?

13.c)

Using optimal detection theory, derive the expression (see Time-Delay Estimation) for the probability of error incurred when trying to distinguish between a delay of $\tau$ and a delay of $\tau + \Delta$. Consistent with the problem posed for the Cramér-Rao bound, assume the delayed signals are observed in additive, white Gaussian noise.
Problem 14
In formulating detection problems, the signal as well as the noise are sometimes modeled as Gaussian processes. Let's explore what differences arise in the Cramér-Rao bound derived when the signal is deterministic. Assume that the signal contains unknown parameters $\boldsymbol{\theta}$, that it is statistically independent of the noise, and that the noise covariance matrix is known.

14.a)

What forms do the conditional densities of the observations take under the two assumptions? What are the two covariance matrices?

14.b)

Assuming the stochastic signal model, show that each element of the Fisher information matrix has the form
$$F_{ij} = \frac{1}{2}\,\mathrm{tr}\!\left[K^{-1}\frac{\partial K}{\partial \theta_i} K^{-1}\frac{\partial K}{\partial \theta_j}\right]$$
where $K$ denotes the covariance matrix of the observations. Make this expression more complex by assuming the noise component has no unknown parameters.

14.c)

Compare the stochastic and deterministic bounds (the latter is given by this equation) when the unknown signal parameters are amplitude and delay. Assume the noise covariance matrix equals $\sigma_n^2 I$. Do these bounds have similar dependence on signal-to-noise ratio?
Problem 15
The histogram probability density estimator is a special case of a more general class of estimators known as kernel estimators.
$$\hat{p}_r(x) = \frac{1}{L}\sum_{l=0}^{L-1} k(x - r_l)$$
Here, the kernel $k(\cdot)$ is usually taken to be a density itself.

15.a)

What is the kernel for the histogram estimator?

15.b)

Interpret the kernel estimator in signal processing terminology. Predict what the most time-consuming computation of this estimate might be. Why?

15.c)

Show that the sample average equals the expected value of a random variable having the density $\hat{p}_r(x)$ regardless of the choice of kernel.
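
The kernel estimator defined at the start of this problem translates almost directly into code. The sketch below is an illustration only: the Gaussian kernel and its `width` parameter are assumptions chosen for the example (the problem leaves the kernel unspecified), and the function names are hypothetical.

```python
import math

def gaussian_kernel(u, width=1.0):
    """Zero-mean Gaussian density with standard deviation 'width' (an assumed kernel)."""
    return math.exp(-0.5 * (u / width) ** 2) / (width * math.sqrt(2.0 * math.pi))

def kernel_density(x, data, kernel=gaussian_kernel):
    """Kernel estimate at x: the average of kernel(x - r_l) over the data."""
    return sum(kernel(x - r) for r in data) / len(data)

if __name__ == "__main__":
    data = [-1.0, 0.5, 2.0]  # illustrative observations
    step = 0.01
    grid = [-10.0 + step * i for i in range(2001)]
    # Riemann sums approximating the area under the estimate and its mean.
    area = sum(kernel_density(x, data) for x in grid) * step
    mean = sum(x * kernel_density(x, data) for x in grid) * step
    print(area, mean, sum(data) / len(data))
```

The numerically integrated mean can be compared with the sample average of `data`, which is the claim of part (c) for this particular kernel. Note that every evaluation point requires a sum over all observations.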
Problem 16
Random variables can be generated quite easily if the probability distribution function is "nice." Let $X$ be a random variable having distribution function $P_X(\cdot)$.

16.a)

Show that the random variable $U = P_X(X)$ is uniformly distributed over $[0, 1]$.

16.b)

Based on this result, how would you generate a random variable having a specific density with a uniform random variable generator, which is commonly supplied with most computer and calculator systems?

16.c)

How would you generate random variables having the hyperbolic secant density $p_X(x) = \frac{1}{2}\,\mathrm{sech}\!\left(\frac{\pi x}{2}\right)$?

16.d)

Why is the Gaussian not in the class of "nice" probability distribution functions? Despite this fact, the Gaussian and other similarly unfriendly random variables can be generated using tabulated rather than analytic forms for the distribution function.
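
The idea behind parts (a) and (b) can be tried out on a distribution whose inverse is easy to write down. The sketch below uses the exponential distribution purely as an assumed example of a "nice" distribution function (it is not the density asked about in part (c)); the function names are hypothetical.

```python
import math
import random

def sample_from_inverse_cdf(inverse_cdf, rng):
    """Pass a uniform [0, 1) draw through the inverse distribution function."""
    return inverse_cdf(rng.random())

def exponential_inverse_cdf(mean):
    """Inverse of P(x) = 1 - exp(-x / mean), x >= 0 (an assumed example distribution)."""
    def inverse(u):
        return -mean * math.log(1.0 - u)
    return inverse

if __name__ == "__main__":
    rng = random.Random(1)
    inverse = exponential_inverse_cdf(mean=2.0)
    samples = [sample_from_inverse_cdf(inverse, rng) for _ in range(20000)]
    # The sample average should be close to the chosen mean of 2.0.
    print(sum(samples) / len(samples))
```

The same two-step pattern applies to any distribution function that can be inverted analytically; when no analytic inverse exists, a tabulated distribution function can play the role of `inverse_cdf`.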
1. It wasn't called the Gaussian density in 1805; this result is one of the reasons why it is.
References
  1. H.L. van Trees. (1968). Detection, Estimation, and Modulation Theory, Part I. New York: John Wiley and Sons.
