The density function

If the probability mass in the induced distribution is spread smoothly along the real line, with no point mass concentrations, there is a probability density function $f_X$ which satisfies

$$P(X \in M) = P_X(M) = \int_M f_X(t)\,dt \quad \text{(area under the graph of } f_X \text{ over } M\text{)}$$

At each $t$, $f_X(t)$ is the mass per unit length in the probability distribution. The density function has three characteristic properties:

(f1) $f_X \ge 0$    (f2) $\int_{\mathbb{R}} f_X = 1$    (f3) $F_X(t) = \int_{-\infty}^{t} f_X$

A random variable (or distribution) which has a density is called absolutely continuous. This term comes from measure theory. We often abbreviate it to continuous distribution.

    Remarks

  1. There is a technical mathematical description of the condition “spread smoothly with no point mass concentrations.” Strictly speaking, the integrals are Lebesgue integrals rather than the ordinary Riemann kind. For practical cases the two agree, so we are free to use ordinary integration techniques.
  2. By the fundamental theorem of calculus,
    $f_X(t) = F_X'(t)$ at every point of continuity of $f_X$.
  3. Any integrable, nonnegative function $f$ with $\int f = 1$ determines a distribution function $F$, which in turn determines a probability distribution. If $\int f \ne 1$ (but is positive), multiplication by the appropriate positive constant gives a suitable $f$. An argument based on the Quantile Function shows the existence of a random variable with that distribution.
  4. In the literature on probability, it is customary to omit the indication of the region of integration when integrating over the whole line. Thus
    $\int g(t) f_X(t)\,dt = \int_{\mathbb{R}} g(t) f_X(t)\,dt$
    The first expression is not an indefinite integral. In many situations, $f_X$ will be zero outside an interval. Thus, the integrand effectively determines the region of integration.
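Properties (f1) to (f3) and Remarks 2 to 4 can be checked numerically. The following Python sketch is illustrative only: the function $g(t) = t(1 - t)$ on $[0, 1]$, the midpoint-rule helper `integrate`, and the evaluation point 0.3 are assumptions chosen for the example, not part of the text.

```python
def integrate(func, a, b, n=10_000):
    """Midpoint-rule approximation to the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def g(t):
    """Nonnegative and integrable, but not a density: its integral is 1/6."""
    return t * (1.0 - t) if 0.0 <= t <= 1.0 else 0.0


# Remark 3: rescale by a positive constant so the integral becomes 1.
c = 1.0 / integrate(g, 0.0, 1.0)


def f(t):
    """Density obtained by rescaling g."""
    return c * g(t)


def F(t):
    """Distribution function (f3). Since f is zero outside [0, 1], the
    integrand itself fixes the region of integration (Remark 4)."""
    if t <= 0.0:
        return 0.0
    return integrate(f, 0.0, min(t, 1.0))


total = integrate(f, 0.0, 1.0)                 # (f2): should be close to 1
h = 1e-3
slope = (F(0.3 + h) - F(0.3 - h)) / (2 * h)    # Remark 2: F'(t) = f(t)
print(c, total, slope, f(0.3))
```

The normalizing constant comes out close to 6, and the difference quotient of $F$ at 0.3 agrees with $f(0.3)$ up to the discretization error.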
[Figure: The Weibull density for $\alpha = 2$, $\lambda = 0.25, 1, 4$. Horizontal axis: $t$ from 0 to 3; vertical axis: density from 0 to 1.8. The curve for $\lambda = 4$ peaks near 1.7 at $t \approx 0.4$, the curve for $\lambda = 1$ peaks just below 0.9 at $t \approx 0.75$, and the curve for $\lambda = 0.25$ peaks just above 0.4 at $t \approx 1.5$.]
[Figure: The Weibull density for $\alpha = 10$, $\lambda = 0.001, 1, 1000$. Horizontal axis: $t$ from 0 to 3; vertical axis: density from 0 to 8. The curve for $\lambda = 1000$ peaks near 7.5 at $t \approx 0.5$, the curve for $\lambda = 1$ peaks a little above 3.5 at $t \approx 1$, and the curve for $\lambda = 0.001$ peaks just below 2 at $t \approx 2$.]

Some common absolutely continuous distributions

  1. Uniform $(a, b)$.
    Mass is spread uniformly on the interval $[a, b]$. It is immaterial whether or not the end points are included, since the probability associated with each individual point is zero. The probability of any subinterval is proportional to its length, so any two subintervals of the same length have the same probability. This distribution is used to model situations in which it is known that $X$ takes on values in $[a, b]$ but is equally likely to be in any subinterval of a given length. The density must be constant over the interval (zero outside), and the distribution function increases linearly with $t$ in the interval. Thus,
    $f_X(t) = \dfrac{1}{b - a}, \quad a < t < b$ (zero outside the interval)
    The graph of $F_X$ rises linearly, with slope $1/(b - a)$, from zero at $t = a$ to one at $t = b$.
  2. Symmetric triangular $(-a, a)$.
    $f_X(t) = \begin{cases} (a + t)/a^2, & -a \le t < 0 \\ (a - t)/a^2, & 0 \le t \le a \end{cases}$
    This distribution is used frequently in instructional numerical examples because probabilities can be obtained geometrically. It can be shifted, with a shift of the graph, to different sets of values. It appears naturally (in shifted form) as the distribution for the sum or difference of two independent random variables uniformly distributed on intervals of the same length. This fact is established with the use of the moment generating function (see Transform Methods). More generally, the density may have a triangular graph which is not symmetric.

    Use of a triangular distribution

    Suppose $X \sim$ symmetric triangular $(100, 300)$. Determine $P(120 < X \le 250)$.

    Remark. In the continuous case, it is immaterial whether the end points of the intervals are included or not.

    Solution

    To get the area under the triangle between 120 and 250, we take one minus the area of the right triangles between 100 and 120 and between 250 and 300. Each half of the density triangle (from 100 to 200 and from 200 to 300) has area 1/2. Using the fact that areas of similar triangles are proportional to the square of any side, we have

    $P = 1 - \frac{1}{2}\left((20/100)^2 + (50/100)^2\right) = 0.855$
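    The geometric argument can be cross-checked with the distribution function obtained by integrating the triangular density. The sketch below is a hedged illustration: the helper name `triangular_cdf` and its `(t, lo, hi)` signature are assumptions made for this example.

```python
def triangular_cdf(t, lo, hi):
    """Distribution function of the symmetric triangular distribution on (lo, hi)."""
    mid = 0.5 * (lo + hi)      # location of the peak
    a = 0.5 * (hi - lo)        # half-width of the base
    if t <= lo:
        return 0.0
    if t >= hi:
        return 1.0
    if t <= mid:
        # Area of the small right triangle to the left of t.
        return (t - lo) ** 2 / (2 * a * a)
    # One minus the area of the small right triangle to the right of t.
    return 1.0 - (hi - t) ** 2 / (2 * a * a)


p = triangular_cdf(250, 100, 300) - triangular_cdf(120, 100, 300)
print(round(p, 4))
```

    The value of `p` agrees with the 0.855 found geometrically.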
  3. Exponential $(\lambda)$. $f_X(t) = \lambda e^{-\lambda t}$, $t \ge 0$ (zero elsewhere).
    Integration shows $F_X(t) = 1 - e^{-\lambda t}$, $t \ge 0$ (zero elsewhere). We note that $P(X > t) = 1 - F_X(t) = e^{-\lambda t}$, $t \ge 0$. This leads to an extremely important property of the exponential distribution. Since $X > t + h$, $h > 0$, implies $X > t$, we have
    $P(X > t + h \mid X > t) = P(X > t + h)/P(X > t) = e^{-\lambda(t + h)}/e^{-\lambda t} = e^{-\lambda h} = P(X > h)$
    Because of this property, the exponential distribution is often used in reliability problems. Suppose $X$ represents the time to failure (i.e., the life duration) of a device put into service at $t = 0$. If the distribution is exponential, this property says that if the device survives to time $t$ (i.e., $X > t$), then the (conditional) probability it will survive $h$ more units of time is the same as the original probability of surviving for $h$ units of time. Many devices have the property that they do not wear out. Failure is due to some stress of external origin. Many solid state electronic devices behave essentially in this way, once initial “burn in” tests have removed defective units. Use of Cauchy's equation (Appendix B) shows that the exponential distribution is the only continuous distribution with this property.
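    The conditional-survival identity can be confirmed numerically for particular values. In this minimal sketch the values of `lam`, `t`, and `h` are arbitrary illustrative choices, and `survival` is a hypothetical helper name.

```python
import math


def survival(t, lam):
    """P(X > t) = exp(-lam * t) for X exponential(lam), t >= 0."""
    return math.exp(-lam * t)


lam, t, h = 0.5, 3.0, 2.0   # illustrative values, not taken from the text

conditional = survival(t + h, lam) / survival(t, lam)   # P(X > t + h | X > t)
unconditional = survival(h, lam)                        # P(X > h)
print(conditional, unconditional)
```

    The two printed values coincide up to floating-point rounding, whatever nonnegative `t` is chosen.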
  4. Gamma distribution $(\alpha, \lambda)$. $f_X(t) = \dfrac{\lambda^{\alpha} t^{\alpha - 1} e^{-\lambda t}}{\Gamma(\alpha)}$, $t \ge 0$ (zero elsewhere)
    We have an m-function gammadbn to determine values of the distribution function for $X \sim$ gamma $(\alpha, \lambda)$. Use of moment generating functions shows that for $\alpha = n$, a random variable $X \sim$ gamma $(n, \lambda)$ has the same distribution as the sum of $n$ independent random variables, each exponential $(\lambda)$. A relation to the Poisson distribution is described in Sec 7.5.

    An arrival problem

    On a Saturday night, the times (in hours) between arrivals in a hospital emergency unit may be represented by a random quantity which is exponential $(\lambda = 3)$. As we show in the chapter Mathematical Expectation, this means that the average interarrival time is 1/3 hour or 20 minutes. What is the probability of ten or more arrivals in four hours? In six hours?

    Solution

    The time for ten arrivals is the sum of ten interarrival times. If we suppose these are independent, as is usually the case, then the time for ten arrivals is gamma $(10, 3)$. There are ten or more arrivals in $t$ hours exactly when the tenth arrival occurs by time $t$, so we need the gamma $(10, 3)$ distribution function at $t = 4$ and $t = 6$.

    >> p = gammadbn(10,3,[4 6])
    p  =  0.7576    0.9846
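    For readers without the m-function gammadbn, the same probabilities can be sketched in Python. This is not the author's implementation: it assumes an integer shape parameter $n$ and uses the standard identity $P(S_n \le t) = P(N_t \ge n)$, where $N_t$ is Poisson with mean $\lambda t$. The name `erlang_cdf` is hypothetical.

```python
import math


def erlang_cdf(n, lam, t):
    """P(S_n <= t) for S_n ~ gamma(n, lam) with integer n.

    The n-th arrival occurs by time t exactly when at least n arrivals
    fall in [0, t], and the arrival count is Poisson(lam * t).
    """
    m = lam * t
    fewer_than_n = sum(math.exp(-m) * m ** k / math.factorial(k) for k in range(n))
    return 1.0 - fewer_than_n


p = [erlang_cdf(10, 3, t) for t in (4, 6)]
print([round(x, 4) for x in p])   # compare with the gammadbn output above
```

    For non-integer $\alpha$ this shortcut does not apply, and a numerical routine for the incomplete gamma function would be needed.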
  5. Normal, or Gaussian $(\mu, \sigma^2)$. $f_X(t) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{1}{2}\left(\dfrac{t - \mu}{\sigma}\right)^2\right)$ for all $t$
    We generally indicate that a random variable $X$ has the normal or gaussian distribution by writing $X \sim N(\mu, \sigma^2)$, putting in the actual values for the parameters. The gaussian distribution plays a central role in many aspects of applied probability theory, particularly in the area of statistics. Much of its importance comes from the central limit theorem (CLT), which is a term applied to a number of theorems in analysis. Essentially, the CLT shows that the sum of a sufficiently large number of independent random variables has approximately the gaussian distribution. Thus, the gaussian distribution appears naturally in such topics as theory of errors or theory of noise, where the quantity observed is an additive combination of a large number of essentially independent quantities.
    Examination of the expression shows that the graph for $f_X(t)$ is symmetric about its maximum at $t = \mu$. The greater the parameter $\sigma^2$, the smaller the maximum value and the more slowly the curve decreases with distance from $\mu$. Thus parameter $\mu$ locates the center of the mass distribution and $\sigma^2$ is a measure of the spread of mass about $\mu$. The parameter $\mu$ is called the mean value and $\sigma^2$ is the variance. The parameter $\sigma$, the positive square root of the variance, is called the standard deviation.
    While we have an explicit formula for the density function, it is known that the distribution function, as the integral of the density function, cannot be expressed in terms of elementary functions. The usual procedure is to use tables obtained by numerical integration.
    Since there are two parameters, this raises the question whether a separate table is needed for each pair of parameters. It is a remarkable fact that this is not the case. We need only have a table of the distribution function for $X \sim N(0, 1)$. This is referred to as the standardized normal distribution. We use $\varphi$ and $\Phi$ for the standardized normal density and distribution functions, respectively.
    Standardized normal: $\varphi(t) = \dfrac{1}{\sqrt{2\pi}} e^{-t^2/2}$, so that the distribution function is $\Phi(t) = \int_{-\infty}^{t} \varphi(u)\,du$.
    The graph of the density function is the well known bell shaped curve, symmetrical about the origin (see [link]). The symmetry about the origin contributes to its usefulness.
    $P(X \le t) = \Phi(t)$ = area under the curve to the left of $t$

    Note that the area to the left of $t = -1.5$ is the same as the area to the right of $t = 1.5$, so that $\Phi(-1.5) = 1 - \Phi(1.5)$. The same is true for any $t$, so that we have
    $\Phi(-t) = 1 - \Phi(t)$ for all $t$

    This indicates that we need only a table of values of $\Phi(t)$ for $t > 0$ to be able to determine $\Phi(t)$ for any $t$. We may use the symmetry for any case. Note that $\Phi(0) = 1/2$.
    [Figure: The standardized normal distribution. The density is a symmetric bell shaped curve that peaks at about 0.4 at $t = 0$.]

    Standardized normal calculations

    Suppose $X \sim N(0, 1)$. Determine $P(-1 \le X \le 2)$ and $P(|X| > 1)$.

    Solution

    1. $P(-1 \le X \le 2) = \Phi(2) - \Phi(-1) = \Phi(2) - [1 - \Phi(1)] = \Phi(2) + \Phi(1) - 1$

    2. $P(|X| > 1) = P(X > 1) + P(X < -1) = 1 - \Phi(1) + \Phi(-1) = 2[1 - \Phi(1)]$

    From a table of the standardized normal distribution function (see Appendix D), we find

    $\Phi(2) = 0.9772$ and $\Phi(1) = 0.8413$, which gives $P(-1 \le X \le 2) = 0.8185$ and $P(|X| > 1) = 0.3174$

    General gaussian distribution
    For $X \sim N(\mu, \sigma^2)$, the density maintains the bell shape, but is shifted with different spread and height. [link] shows the distribution function and density function for $X \sim N(2, 0.1)$. The density is centered about $t = 2$. It has height $1/\sqrt{2\pi(0.1)} = 1.2616$ as compared with $1/\sqrt{2\pi} = 0.3989$ for the standardized normal density. Inspection shows that the graph is narrower than that for the standardized normal. The distribution function reaches 0.5 at the mean value 2.
    [Figure: The normal density and distribution functions for $X \sim N(2, 0.1)$. Horizontal axis: $t$ from 1 to 3; vertical axis: $f(t)$ or $F(t)$ from 0 to 1.4.]

    A change of variables in the integral shows that the table for standardized normal distribution function can be used for any case.

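    $F_X(t) = \dfrac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{t} \exp\left(-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2\right) dx = \int_{-\infty}^{t} \varphi\left(\dfrac{x - \mu}{\sigma}\right) \dfrac{1}{\sigma}\,dx$

    Make the change of variable and corresponding formal changes

    $u = \dfrac{x - \mu}{\sigma}$, $du = \dfrac{1}{\sigma}\,dx$; when $x = t$, $u = \dfrac{t - \mu}{\sigma}$

    to get

    $F_X(t) = \int_{-\infty}^{(t - \mu)/\sigma} \varphi(u)\,du = \Phi\left(\dfrac{t - \mu}{\sigma}\right)$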
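    The change of variable can be sanity-checked by integrating the general density directly and comparing with $\Phi$ at the standardized point. The sketch below rests on assumptions not in the text: $\Phi$ is evaluated through the error function via the standard identity $\Phi(u) = \frac{1}{2}[1 + \operatorname{erf}(u/\sqrt{2})]$, the integral is truncated at $\mu - 10\sigma$, and a midpoint rule is used. The helper names and the values of `mu`, `sigma`, and `t` are illustrative.

```python
import math


def phi(u):
    """Standardized normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def Phi(u):
    """Standardized normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def normal_cdf_by_integration(t, mu, sigma, n=20_000):
    """Midpoint-rule integral of the N(mu, sigma^2) density from mu - 10*sigma to t.

    The mass below mu - 10*sigma is negligible, so the truncation is harmless.
    Assumes t > mu - 10*sigma.
    """
    lo = mu - 10.0 * sigma
    h = (t - lo) / n
    return h * sum(phi((lo + (i + 0.5) * h - mu) / sigma) / sigma for i in range(n))


mu, sigma, t = 3.0, 4.0, 7.0                      # illustrative values
direct = normal_cdf_by_integration(t, mu, sigma)  # left side: integrate in x
standardized = Phi((t - mu) / sigma)              # right side: Phi((t - mu)/sigma)
print(direct, standardized)
```

    Both numbers approximate $\Phi(1)$, which the table gives as 0.8413.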

    General gaussian calculation

    Suppose $X \sim N(3, 16)$ (i.e., $\mu = 3$ and $\sigma^2 = 16$). Determine $P(-1 \le X \le 11)$ and $P(|X - 3| > 4)$.

    Solution

    1. $P(-1 \le X \le 11) = F_X(11) - F_X(-1) = \Phi\left(\dfrac{11 - 3}{4}\right) - \Phi\left(\dfrac{-1 - 3}{4}\right) = \Phi(2) - \Phi(-1) = 0.8185$
    2. $P(|X - 3| > 4) = P(X - 3 < -4) + P(X - 3 > 4) = F_X(-1) + [1 - F_X(7)] = \Phi(-1) + 1 - \Phi(1) = 0.3174$

    In each case the problem reduces to that in [link].

    We have m-functions gaussian and gaussdensity to calculate values of the distribution and density function for any reasonable value of the parameters.
    The following are solutions of [link] and [link] , using the m-function gaussian.

    [link] And [link] (continued)

    >> P1 = gaussian(0,1,2) - gaussian(0,1,-1)
    P1 = 0.8186
    >> P2 = 2*(1 - gaussian(0,1,1))
    P2 = 0.3173
    >> P1 = gaussian(3,16,11) - gaussian(3,16,-1)
    P1 = 0.8186
    >> P2 = gaussian(3,16,-1) + 1 - gaussian(3,16,7)
    P2 = 0.3173

    The differences between these results and those above (which used tables) are due to rounding to four places in the tables.
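    A Python counterpart of these calculations is sketched below for readers without the m-functions. The function `gaussian_cdf` is a hypothetical stand-in, not the m-function gaussian itself. It copies the (mean, variance, t) argument order seen in the calls above and evaluates $\Phi((t - \mu)/\sigma)$ through the error function.

```python
import math


def gaussian_cdf(mu, var, t):
    """P(X <= t) for X ~ N(mu, var), computed as Phi((t - mu) / sigma)."""
    return 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0 * var)))


# Standardized case, X ~ N(0, 1)
p1_std = gaussian_cdf(0, 1, 2) - gaussian_cdf(0, 1, -1)        # P(-1 <= X <= 2)
p2_std = 2 * (1 - gaussian_cdf(0, 1, 1))                       # P(|X| > 1)

# General case, X ~ N(3, 16)
p1_gen = gaussian_cdf(3, 16, 11) - gaussian_cdf(3, 16, -1)     # P(-1 <= X <= 11)
p2_gen = gaussian_cdf(3, 16, -1) + 1 - gaussian_cdf(3, 16, 7)  # P(|X - 3| > 4)

print(round(p1_std, 4), round(p2_std, 4), round(p1_gen, 4), round(p2_gen, 4))
```

    The general case reproduces the standardized values because the end points $-1$, $7$, and $11$ standardize to $-1$, $1$, and $2$.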

  6. Beta $(r, s)$, $r > 0$, $s > 0$. $f_X(t) = \dfrac{\Gamma(r + s)}{\Gamma(r)\Gamma(s)}\, t^{r - 1} (1 - t)^{s - 1}$, $0 < t < 1$
    Analysis is based on the integrals
    $\int_0^1 u^{r - 1} (1 - u)^{s - 1}\,du = \dfrac{\Gamma(r)\Gamma(s)}{\Gamma(r + s)}$ with $\Gamma(t + 1) = t\,\Gamma(t)$
    [link] and [link] show graphs of the densities for various values of $r$, $s$. The usefulness comes in approximating densities on the unit interval. By using scaling and shifting, these can be extended to other intervals. The special case $r = s = 1$ gives the uniform distribution on the unit interval. The Beta distribution is quite useful in developing the Bayesian statistics for the problem of sampling to determine a population proportion. If $r$, $s$ are integers, the density function is a polynomial. For the general case we have two m-functions, beta and betadbn, to perform the calculations.
    [Figure: The Beta(r,s) density for $r = 2$, $s = 1, 2, 10$. Horizontal axis: $t$ from 0 to 1; vertical axis: density from 0 to 4.5. The curve for $s = 10$ peaks near 4.25 at $t \approx 0.1$, the curve for $s = 2$ is a symmetric arch peaking at 1.5 at $t = 0.5$, and the curve for $s = 1$ is a straight line from $(0, 0)$ to $(1, 2)$.]
    [Figure: The Beta(r,s) density for $r = 5$, $s = 2, 5, 10$. Horizontal axis: $t$ from 0 to 1; vertical axis: density from 0 to 3.5. The curve for $s = 10$ peaks near 3.25 at $t \approx 0.3$, the curve for $s = 5$ is symmetric with peak near 2.5 at $t = 0.5$, and the curve for $s = 2$ rises gradually to a peak near 2.5 at $t \approx 0.8$ and then falls rapidly.]
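    The Beta density formula can be evaluated directly with the gamma function. The following Python sketch is an illustration under assumptions: `beta_density` and `integrate` are hypothetical helper names (they are not the m-functions beta and betadbn), and the integral is approximated with a midpoint rule.

```python
import math


def beta_density(t, r, s):
    """Beta(r, s) density on (0, 1); zero elsewhere."""
    if not 0.0 < t < 1.0:
        return 0.0
    const = math.gamma(r + s) / (math.gamma(r) * math.gamma(s))
    return const * t ** (r - 1) * (1.0 - t) ** (s - 1)


def integrate(func, a, b, n=20_000):
    """Midpoint-rule approximation to the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


area = integrate(lambda t: beta_density(t, 5, 2), 0.0, 1.0)   # should be close to 1
uniform_value = beta_density(0.25, 1, 1)   # r = s = 1: uniform density, value 1
arch_peak = beta_density(0.5, 2, 2)        # r = s = 2: polynomial 6 t (1 - t)
print(area, uniform_value, arch_peak)
```

    For $r = s = 2$ the density is the polynomial $6t(1 - t)$, whose value 1.5 at $t = 0.5$ matches the peak of the $s = 2$ curve in the $r = 2$ figure.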
  7. Weibull $(\alpha, \lambda, \nu)$. $F_X(t) = 1 - e^{-\lambda (t - \nu)^{\alpha}}$, $\alpha > 0$, $\lambda > 0$, $\nu \ge 0$, $t \ge \nu$
    The parameter $\nu$ is a shift parameter. Usually we assume $\nu = 0$. Examination shows that for $\alpha = 1$ the distribution is exponential $(\lambda)$. The parameter $\alpha$ provides a distortion of the time scale for the exponential distribution. [link] and [link] show graphs of the Weibull density for some representative values of $\alpha$ and $\lambda$ ($\nu = 0$). The distribution is used in reliability theory. We do not make much use of it. However, we have m-functions weibull (density) and weibulld (distribution function) for shift parameter $\nu = 0$ only. The shift can be obtained by subtracting a constant from the $t$ values.
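Differentiating the Weibull distribution function gives the density $f_X(t) = \alpha \lambda (t - \nu)^{\alpha - 1} e^{-\lambda (t - \nu)^{\alpha}}$ for $t > \nu$. The Python sketch below evaluates both functions. It is illustrative only: `weibull_cdf` and `weibull_density` are hypothetical names (they are not the m-functions weibull and weibulld), and the test values are arbitrary.

```python
import math


def weibull_cdf(t, alpha, lam, nu=0.0):
    """F(t) = 1 - exp(-lam * (t - nu)**alpha) for t >= nu; zero below nu."""
    if t <= nu:
        return 0.0
    return 1.0 - math.exp(-lam * (t - nu) ** alpha)


def weibull_density(t, alpha, lam, nu=0.0):
    """Derivative of weibull_cdf with respect to t for t > nu; zero elsewhere."""
    if t <= nu:
        return 0.0
    x = t - nu
    return alpha * lam * x ** (alpha - 1) * math.exp(-lam * x ** alpha)


# alpha = 1 reduces to the exponential(lam) density lam * exp(-lam * t).
exp_check = weibull_density(1.3, 1, 2.0) - 2.0 * math.exp(-2.0 * 1.3)

# For alpha = 2 the density is maximal where 1 - 2 * lam * t**2 = 0.
t_peak = 1.0 / math.sqrt(2.0 * 4.0)
peak = weibull_density(t_peak, 2, 4.0)

# A shift nu moves the graph to the right without changing its shape.
shifted = weibull_density(t_peak + 1.0, 2, 4.0, nu=1.0)
print(exp_check, t_peak, peak, shifted)
```

For $\alpha = 2$, $\lambda = 4$ the computed peak is roughly 1.7 at $t \approx 0.35$, consistent with the first Weibull figure.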

Source:  OpenStax, Applied probability. OpenStax CNX. Aug 31, 2009 Download for free at http://cnx.org/content/col10708/1.6