
Size of a sample

The size of a sample (often called the number of observations, usually given the symbol n) is important. The examples you have seen in this book so far have been small. Samples of only a few hundred observations, or even fewer, are sufficient for many purposes. In polling, samples of 1,200 to 1,500 observations are considered large enough, provided the survey is random and well conducted. Later we will find that even much smaller sample sizes can give very good results. You will learn why when you study confidence intervals.
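The effect of sample size can be explored with a small simulation. The sketch below is an illustration only: it assumes a hypothetical population in which 52% of people hold some opinion (the value `TRUE_P` is made up), repeatedly draws random samples of several sizes, and reports how far the sample proportions typically stray from the true value.

```python
import random
import statistics

TRUE_P = 0.52    # hypothetical true population proportion (an assumption)
REPEATS = 500    # number of simulated polls per sample size


def sample_proportion(n, rng):
    """Proportion of 'yes' answers in one random sample of size n."""
    yes = sum(1 for _ in range(n) if rng.random() < TRUE_P)
    return yes / n


def spread_of_polls(n, rng, repeats=REPEATS):
    """Standard deviation of the sample proportion across repeated polls."""
    results = [sample_proportion(n, rng) for _ in range(repeats)]
    return statistics.pstdev(results)


rng = random.Random(1)
spreads = {n: spread_of_polls(n, rng) for n in (100, 400, 1200, 1500)}
for n, s in spreads.items():
    print(f"n = {n:5d}   typical error = {s:.3f}")
```

Under these assumptions the typical error shrinks roughly in proportion to 1/sqrt(n), so moving from 100 to 1,200 observations helps far more than moving from 1,200 to 1,500.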

Be aware that many large samples are biased. For example, call-in surveys are invariably biased, because people choose to respond or not.
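A short simulation can show why a large self-selected sample may be worse than a small random one. Everything in this sketch is hypothetical: the population size, the 30% level of support, and the rates at which supporters and non-supporters choose to call in are all assumptions chosen for illustration.

```python
import random

rng = random.Random(7)

# Hypothetical population of 100,000 people; about 30% support a proposal.
POPULATION = [rng.random() < 0.30 for _ in range(100_000)]

# Assumed response behaviour for a call-in survey: supporters are far
# more motivated to call than everyone else (both rates are made up).
CALL_RATE = {True: 0.20, False: 0.05}


def share_supporting(people):
    """Fraction of a group that supports the proposal."""
    return sum(people) / len(people)


# Self-selected sample: each person decides whether to call in.
call_in = [p for p in POPULATION if rng.random() < CALL_RATE[p]]

# Simple random sample: every person is equally likely to be chosen.
random_sample = rng.sample(POPULATION, 500)

print(f"population:        {share_supporting(POPULATION):.3f}")
print(f"call-in (n={len(call_in)}): {share_supporting(call_in):.3f}")
print(f"random (n=500):    {share_supporting(random_sample):.3f}")
```

With these assumed response rates, the call-in sample is many times larger than the random sample yet lands much further from the population value, because who responds depends on the opinion being measured.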

Critical evaluation

We need to critically evaluate the statistical studies we read about and analyze them before accepting their results. Common problems to be aware of include:

  • Problems with samples: A sample must be representative of the population. A sample that is not representative of the population is biased. Biased samples give results that are inaccurate and not valid.
  • Self-selected samples: Responses only by people who choose to respond, such as call-in surveys, are often unreliable.
  • Sample size issues: Samples that are too small may be unreliable. Larger samples are better, if possible. In some situations, small samples are unavoidable and can still be used to draw conclusions. Examples: crash testing cars or medical testing for rare conditions.
  • Undue influence: Collecting data or asking questions in a way that influences the response.
  • Non-response or refusal of subject to participate: The collected responses may no longer be representative of the population. Often, people with strong positive or negative opinions are the ones who answer surveys, which can affect the results.
  • Causality: A relationship between two variables does not mean that one causes the other to occur. They may be related (correlated) because of their relationship through a different variable.
  • Self-funded or self-interest studies: A study performed by a person or organization in order to support their claim. Is the study impartial? Read the study carefully to evaluate the work. Do not automatically assume that the study is good, but do not automatically assume the study is bad either. Evaluate it on its merits and the work done.
  • Misleading use of data: Improperly displayed graphs, incomplete data, or lack of context.
  • Confounding: Occurs when the effects of multiple factors on a response cannot be separated. Confounding makes it difficult or impossible to draw valid conclusions about the effect of each factor.
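The causality and confounding items in the list above can be illustrated with simulated data. In this sketch, which uses entirely made-up numbers, daily temperature drives both ice cream sales and swimming accidents, while neither of those two variables has any direct effect on the other. The helper `pearson_r` is written out by hand for the example.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)

# Hypothetical daily data: temperature influences BOTH variables below.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream_sales = [20 * t + rng.gauss(0, 80) for t in temperature]
swimming_accidents = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

r_all = pearson_r(ice_cream_sales, swimming_accidents)
print("all days:", round(r_all, 2))

# Hold the third variable nearly fixed: keep only days of similar temperature.
similar = [(s, a)
           for t, s, a in zip(temperature, ice_cream_sales, swimming_accidents)
           if 24 <= t < 26]
sales, accidents = zip(*similar)
r_within = pearson_r(sales, accidents)
print("days between 24 and 26 degrees:", round(r_within, 2))
```

Under these assumptions the two variables are strongly correlated across all days, but the correlation largely disappears once temperature is held roughly constant. The relationship comes from the shared third variable, not from one variable causing the other.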


Chapter review

Data are individual items of information that come from a population or sample. Data may be classified as qualitative, quantitative continuous, or quantitative discrete.

Because it is not practical to measure the entire population in a study, researchers use samples to represent the population. A random sample is a representative group from the population chosen by using a method that gives each individual in the population an equal chance of being included in the sample. Random sampling methods include simple random sampling, stratified sampling, cluster sampling, and systematic sampling. Convenience sampling is a nonrandom method of choosing a sample that often produces biased data.
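The four random sampling methods named above can be sketched in a few lines. The sampling frame here is hypothetical (1,000 student IDs grouped into 20 classes of 50), and the sample size of 100 is an arbitrary choice for illustration.

```python
import random

rng = random.Random(42)

# Hypothetical sampling frame: 1,000 students in 20 classes of 50.
students = [{"id": i, "class": i // 50} for i in range(1000)]
SAMPLE_SIZE = 100

# Simple random sampling: every student has the same chance of selection.
simple = rng.sample(students, SAMPLE_SIZE)

# Systematic sampling: random starting point, then every k-th student.
k = len(students) // SAMPLE_SIZE
start = rng.randrange(k)
systematic = students[start::k]

# Stratified sampling: a random sample taken from every class (stratum).
by_class = {}
for s in students:
    by_class.setdefault(s["class"], []).append(s)
per_stratum = SAMPLE_SIZE // len(by_class)
stratified = [s for group in by_class.values()
              for s in rng.sample(group, per_stratum)]

# Cluster sampling: pick whole classes at random and take everyone in them.
chosen_classes = rng.sample(sorted(by_class), 2)
cluster = [s for c in chosen_classes for s in by_class[c]]

for name, sample in [("simple", simple), ("systematic", systematic),
                     ("stratified", stratified), ("cluster", cluster)]:
    classes = {s["class"] for s in sample}
    print(f"{name:10s} n={len(sample):3d} classes represented={len(classes)}")
```

In this sketch each student has a 1-in-10 chance of selection under all four methods. The methods differ in which combinations of students can end up in the same sample: stratified sampling guarantees that every class is represented, while cluster sampling covers only the chosen classes.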

Samples that contain different individuals result in different data. This is true even when the samples are well-chosen and representative of the population. When properly selected, larger samples model the population more closely than smaller samples. There are many different potential problems that can affect the reliability of a sample. Statistical data needs to be critically analyzed, not simply accepted.

Source:  OpenStax, Business statistics -- bsta 200 -- humber college -- version 2016reva -- draft 2016-04-04. OpenStax CNX. Apr 05, 2016 Download for free at http://legacy.cnx.org/content/col11969/1.5