
Notice that we cannot form a density estimate by simply differentiating the empirical CDF, since this function contains discontinuities at the sample locations $X_i$. Rather, we need to estimate the probability that a random variable will fall within a particular interval of the real axis. In this section, we will describe a common method known as the histogram.

The histogram

Our goal is to estimate an arbitrary probability density function, $f_X(x)$, within a finite region of the $x$-axis. We will do this by partitioning the region into $L$ equally spaced subintervals, or “bins”, and forming an approximation for $f_X(x)$ within each bin. Let our region of support start at the value $x_0$ and end at $x_L$. Our $L$ subintervals of this region will be $[x_0, x_1]$, $(x_1, x_2]$, ..., $(x_{L-1}, x_L]$. To simplify our notation we will define $\mathrm{bin}(k)$ to represent the interval $(x_{k-1}, x_k]$, $k = 1, 2, \ldots, L$, and define the quantity $\Delta$ to be the length of each subinterval.

$$\mathrm{bin}(k) = (x_{k-1}, x_k], \qquad k = 1, 2, \ldots, L$$

$$\Delta = \frac{x_L - x_0}{L}$$

We will also define $\tilde{f}(k)$ to be the probability that $X$ falls into $\mathrm{bin}(k)$.

$$\tilde{f}(k) = P(X \in \mathrm{bin}(k)) = \int_{x_{k-1}}^{x_k} f_X(x)\,dx \approx f_X(x)\,\Delta \quad \text{for } x \in \mathrm{bin}(k)$$

The approximation $\tilde{f}(k) \approx f_X(x)\,\Delta$ only holds for an appropriately small bin width $\Delta$, so that $f_X(x)$ is nearly constant across each bin.
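As a quick numerical illustration of why the bin width must be small, the Python sketch below compares an exact bin probability with the product of the density and the bin width. The exponential density $f(x) = e^{-x}$, the bin starting point $a = 1$, and the choice to evaluate the density at the bin midpoint are all assumptions made for this example only; none of them come from the lab, which itself uses Matlab.

```python
import math

def density(x):
    """Assumed example density: f(x) = exp(-x) for x >= 0."""
    return math.exp(-x)

def bin_probability(a, b):
    """Exact P(a < X <= b) for the assumed exponential density."""
    return math.exp(-a) - math.exp(-b)

def relative_error(a, delta):
    """Relative error of the approximation f(x) * delta on the bin (a, a + delta]."""
    exact = bin_probability(a, a + delta)
    approx = density(a + delta / 2) * delta  # density at the bin midpoint times the width
    return abs(approx - exact) / exact

# Shrink the bin width and watch the approximation error fall.
errors = {delta: relative_error(1.0, delta) for delta in (1.0, 0.1, 0.01)}
for delta, err in errors.items():
    print(f"delta = {delta:<5} relative error = {err:.2e}")
```

For this density the error shrinks rapidly as the bin narrows, which is the sense in which the approximation requires a small bin width.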

Next we introduce the concept of a histogram of a collection of i.i.d. random variables $\{X_1, X_2, \ldots, X_N\}$. Let us start by defining a function that will indicate whether or not the random variable $X_n$ falls within $\mathrm{bin}(k)$.

$$I_n(k) = \begin{cases} 1, & \text{if } X_n \in \mathrm{bin}(k) \\ 0, & \text{if } X_n \notin \mathrm{bin}(k) \end{cases}$$

The histogram of $X_n$ at $\mathrm{bin}(k)$, denoted as $H(k)$, is simply the number of random variables that fall within $\mathrm{bin}(k)$. This can be written as

$$H(k) = \sum_{n=1}^{N} I_n(k).$$
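The two definitions above translate almost directly into code. The following Python sketch is one possible rendering, not the lab's Matlab solution; the function names `indicator` and `histogram`, the bin edges, and the sample values are all made up for illustration.

```python
def indicator(x_n, k, edges):
    """I_n(k): 1 if x_n lies in bin(k) = (x_{k-1}, x_k], otherwise 0."""
    return 1 if edges[k - 1] < x_n <= edges[k] else 0

def histogram(samples, edges):
    """H(k) for k = 1..L: the sum of the indicators over all samples."""
    L = len(edges) - 1
    return [sum(indicator(x, k, edges) for x in samples) for k in range(1, L + 1)]

# Hypothetical data: four bins of width 0.25 on [0, 1] and eight samples.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
samples = [0.1, 0.2, 0.25, 0.3, 0.6, 0.9, 0.95, 1.0]

H = histogram(samples, edges)
print(H)  # [3, 1, 1, 3]
```

Note that the sample at 0.25 is counted in the first bin, because each bin is closed on the right.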

We can show that the normalized histogram, $H(k)/N$, is an unbiased estimate of the probability of $X$ falling in $\mathrm{bin}(k)$. Let us compute the expected value of the normalized histogram.

$$E\left[\frac{H(k)}{N}\right] = \frac{1}{N}\sum_{n=1}^{N} E[I_n(k)] = \frac{1}{N}\sum_{n=1}^{N}\left\{1 \cdot P(X_n \in \mathrm{bin}(k)) + 0 \cdot P(X_n \notin \mathrm{bin}(k))\right\} = \tilde{f}(k)$$

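The last equality results from the definition of $\tilde{f}(k)$, and from the assumption that the $X_n$'s have the same distribution. A similar argument, which also uses the independence of the $X_n$'s, may be used to show that the variance of the normalized histogram $H(k)/N$ is given by

$$\mathrm{Var}\left[\frac{H(k)}{N}\right] = \frac{1}{N}\,\tilde{f}(k)\,\bigl(1 - \tilde{f}(k)\bigr).$$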
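The mean and variance results can be checked empirically. The Python sketch below repeatedly draws $N$ samples, forms $H(k)/N$ for a single bin, and compares the sample mean and variance of those estimates with the formulas above. The uniform distribution on $[0, 1]$, the bin $(0.2, 0.3]$, the values of `N` and `TRIALS`, and the seed are arbitrary choices for illustration, not part of the lab.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N = 50             # samples per histogram
TRIALS = 20000     # number of independent histograms
LO, HI = 0.2, 0.3  # the bin (LO, HI]; for Uniform(0, 1) its probability is HI - LO
p = HI - LO

# One normalized histogram value H(k)/N per trial.
estimates = []
for _ in range(TRIALS):
    count = sum(1 for _ in range(N) if LO < random.random() <= HI)
    estimates.append(count / N)

mean = sum(estimates) / TRIALS
var = sum((e - mean) ** 2 for e in estimates) / (TRIALS - 1)

print(f"sample mean     {mean:.4f}   theory {p:.4f}")
print(f"sample variance {var:.6f} theory {p * (1 - p) / N:.6f}")
```

The sample statistics should land close to the theoretical values, and the $1/N$ factor in the variance is what makes the estimate tighten as more samples are used.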

Therefore, as $N$ grows large, the bin probabilities $\tilde{f}(k)$ can be approximated by the normalized histogram $H(k)/N$.

$$\tilde{f}(k) \approx \frac{H(k)}{N}$$

Using the approximation $\tilde{f}(k) \approx f_X(x)\,\Delta$, we may then approximate the density function $f_X(x)$ within $\mathrm{bin}(k)$ by

$$f_X(x) \approx \frac{H(k)}{N\Delta} \quad \text{for } x \in \mathrm{bin}(k).$$

Notice this estimate is a staircase function of $x$ which is constant over each interval $\mathrm{bin}(k)$. It can also easily be verified that this density estimate integrates to 1.
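To see the staircase estimate and its unit area concretely, the Python sketch below builds the step heights $H(k)/(N\Delta)$ and sums height times width over all bins. The total is $\sum_k \frac{H(k)}{N\Delta}\,\Delta = \frac{1}{N}\sum_k H(k)$, which equals 1 provided every sample falls inside the region $(x_0, x_L]$. The function name `density_estimate`, the triangular test distribution, the sample count, and the seed are assumptions chosen for illustration.

```python
import random

def density_estimate(samples, x0, xL, L):
    """Return (delta, heights), where heights[k-1] = H(k) / (N * delta)."""
    N = len(samples)
    delta = (xL - x0) / L
    edges = [x0 + k * delta for k in range(L + 1)]
    edges[-1] = xL  # guard against rounding in the last edge
    counts = [0] * L
    for x in samples:
        for k in range(1, L + 1):
            if edges[k - 1] < x <= edges[k]:  # x is in bin(k)
                counts[k - 1] += 1
                break
    heights = [c / (N * delta) for c in counts]
    return delta, heights

random.seed(1)  # fixed seed so the run is repeatable
samples = [random.triangular(0.0, 1.0, 0.5) for _ in range(500)]

delta, heights = density_estimate(samples, 0.0, 1.0, 10)
area = sum(h * delta for h in heights)  # area under the staircase
print(f"area under the staircase estimate: {area:.6f}")
```

If some samples fell outside the chosen region, they would not be counted and the area would come out below 1.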

Exercise

Let $U$ be a uniformly distributed random variable on the interval $[0, 1]$ with the following cumulative probability distribution, $F_U(u)$:

$$F_U(u) = \begin{cases} 0, & \text{if } u < 0 \\ u, & \text{if } 0 \le u \le 1 \\ 1, & \text{if } u > 1 \end{cases}$$

We can calculate the cumulative probability distribution for the new random variable $X = U^{1/3}$.

$$F_X(x) = P(X \le x) = P(U^{1/3} \le x) = P(U \le x^3) = F_U(u)\big|_{u = x^3} = \begin{cases} 0, & \text{if } x < 0 \\ x^3, & \text{if } 0 \le x \le 1 \\ 1, & \text{if } x > 1 \end{cases}$$

Plot $F_X(x)$ for $x \in [0, 1]$. Also, analytically calculate the probability density $f_X(x)$, and plot it for $x \in [0, 1]$.

Using $L = 20$, $x_0 = 0$ and $x_L = 1$, use Matlab to compute $\tilde{f}(k)$, the probability of $X$ falling into $\mathrm{bin}(k)$.

Use the fact that $\tilde{f}(k) = F_X(x_k) - F_X(x_{k-1})$.
Plot $\tilde{f}(k)$ for $k = 1, \ldots, L$ using the stem function.

Inlab report

  1. Submit your plots of $F_X(x)$, $f_X(x)$ and $\tilde{f}(k)$. Use stem to plot $\tilde{f}(k)$, and put all three plots on a single figure using subplot.
  2. Show (mathematically) how $f_X(x)$ and $\tilde{f}(k)$ are related.

Generate 1000 samples of a random variable $U$ that is uniformly distributed between 0 and 1 (using the rand command). Then form the random vector $X$ by computing $X = U^{1/3}$.

Use the Matlab function hist to plot a normalized histogram for your samples of $X$, using 20 bins uniformly spaced on the interval $[0, 1]$.

Use the Matlab command H=hist(X,(0.5:19.5)/20) to obtain the histogram, and then normalize H .
Use the stem command to plot the normalized histogram $H(k)/N$ and $\tilde{f}(k)$ together on the same figure using subplot.
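The second argument in the hint, (0.5:19.5)/20, is a vector of 20 evenly spaced values, which hist interprets as bin centers. The short Python sketch below (an illustration in a different language, not part of the Matlab solution) reproduces that vector and confirms that each entry is the midpoint of one of the 20 bins of width $\Delta = 1/20$ on $[0, 1]$. The variable names are chosen for this example.

```python
import math

L = 20
edges = [k / L for k in range(L + 1)]        # x_0, x_1, ..., x_L
centers = [(k + 0.5) / L for k in range(L)]  # mirrors (0.5:19.5)/20 in Matlab

# Each center should sit halfway between two consecutive bin edges.
is_midpoint = [
    math.isclose(centers[k], (edges[k] + edges[k + 1]) / 2) for k in range(L)
]
print(all(is_midpoint))
```

Passing centers rather than a bin count pins the bins to the same partition used for $\tilde{f}(k)$, so the two stem plots can be compared bin by bin.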

Inlab report

  1. Submit your two stem plots of $H(k)/N$ and $\tilde{f}(k)$. How do these plots compare?
  2. Discuss the tradeoffs (advantages and disadvantages) between selecting a very large or a very small bin width.

Source:  OpenStax, Purdue digital signal processing labs (ece 438). OpenStax CNX. Sep 14, 2009 Download for free at http://cnx.org/content/col10593/1.4