# 12.3 The regression equation  (Page 2/8)

 Page 2 / 8

In the diagram in [link] , y 0 ŷ 0 = ε 0 is the residual for the point shown. Here the point lies above the line and the residual is positive.

ε = the Greek letter epsilon

For each data point, you can calculate the residuals or errors, y i - ŷ i = ε i for i = 1, 2, 3, ..., 11.

Each | ε | is a vertical distance.

For the example about the third exam scores and the final exam scores for the 11 statistics students, there are 11 data points. Therefore, there are 11 ε values. If yousquare each ε and add, you get

This is called the Sum of Squared Errors (SSE) .

Using calculus, you can determine the values of a and b that make the SSE a minimum. When you make the SSE a minimum, you have determined the points that are on the line of best fit. It turns out thatthe line of best fit has the equation:

$\stackrel{^}{y}=a+bx$

where $a=\overline{y}-b\overline{x}$ and $b=\frac{\Sigma \left(x-\overline{x}\right)\left(y-\overline{y}\right)}{\Sigma {\left(x-\overline{x}\right)}^{2}}$ .

The sample means of the x values and the y values are $\overline{x}$ and $\overline{y}$ , respectively. The best fit line always passes through the point $\left(\overline{x},\overline{y}\right)$ .

The slope b can be written as $b=r\left(\frac{{s}_{y}}{{s}_{x}}\right)$ where s y = the standard deviation of the y values and s x = the standard deviation of the x values. r is the correlation coefficient, which is discussed in the next section.

## Least squares criteria for best fit

The process of fitting the best-fit line is called linear regression . The idea behind finding the best-fit line is based on the assumption that the data are scattered about a straight line. The criteria for the best fit line is that the sum of the squared errors (SSE) is minimized, that is, made as small as possible. Any other line you might choose would have a higher SSE than the best fit line. This best fit line is called the least-squares regression line .

## Note

Computer spreadsheets, statistical software, and many calculators can quickly calculate the best-fit line and create the graphs. The calculations tend to be tedious if done by hand. Instructions to use the TI-83, TI-83+, and TI-84+ calculators to find the best-fit line and create a scatterplot are shown at the end of this section.

## Third exam vs final exam example:

The graph of the line of best fit for the third-exam/final-exam example is as follows:

The least squares regression line (best-fit line) for the third-exam/final-exam example has the equation:

$\stackrel{^}{y}=-173.51+4.83x$

## Reminder

Remember, it is always important to plot a scatter diagram first. If the scatter plot indicates that there is a linear relationship between the variables, then it is reasonable to use a best fit line to make predictions for y given x within the domain of x -values in the sample data, but not necessarily for x -values outside that domain. You could use the line to predict the final exam score for a student who earned a grade of 73 on the third exam. You should NOT use the line to predict the final exam score for a student who earned a grade of 50 on the third exam, because 50 is not within the domain of the x -values in the sample data, which are between 65 and 75.

## Understanding slope

The slope of the line, b , describes how changes in the variables are related. It is important to interpret the slope of the line in the context of the situation represented by the data. You should be able to write a sentence interpreting the slope in plain English.

Write a short note on skewness
and on kurtosis too
Hiren
What is events
who is a strong man?
Can you sir plz provide all the multiple choice questions related to Index numbers.?
about probabilty i have some questions and i want the solution
What is hypothesis?
its a scientific guess
ted
A hypothesis in a scientific context, is a testable statement about the relationship between two or more variables or a proposed explanation for some observed phenomenon. In a scientific experiment or study, the hypothesis is a brief summation of the researcher's prediction of the study's findings.
Hamzah
Which may be supported or not by the outcome. Hypothesis testing is the core of the scientific method.
Hamzah
statistics means interpretation analysis and representation of numerical data
Ramzan
To check the statment or assumption about population parameter is xalled hypothesis
Ali
what is the d.f we know that how to find but basically my question is what is the d.f? any concept please
Degrees of freedom aren’t easy to explain. They come up in many different contexts in statistics—some advanced and complicated. In mathematics, they're technically defined as the dimension of the domain of a random vector.
Hamzah
d.f >> Degrees of freedom aren’t easy to explain. They come up in many different contexts in statistics—some advanced and complicated. In mathematics, they're technically defined as the dimension of the domain of a random vector.
Hamzah
But we won't get into that. Because degrees of freedom are generally not something you needto understand to perform a statistical analysis—unless you’re a research statistician, or someone studying statistical theory.
Hamzah
And yet, enquiring minds want to know. So for the adventurous and the curious, here are some examples that provide a basic gist of their meaning in statistics.
Hamzah
The Freedom to Vary First, forget about statistics. Imagine you’re a fun-loving person who loves to wear hats. You couldn't care less what a degree of freedom is. You believe that variety is the spice of life Unfortunately, you have constraints. You have only 7 hats. Yet you want to wear a different
Hamzah
hat every day of the week. On the first day, you can wear any of the 7 hats. On the second day, you can choose from the 6 remaining hats, on day 3 you can choose from 5 hats, and so on.
Hamzah
When day 6 rolls around, you still have a choice between 2 hats that you haven’t worn yet that week. But after you choose your hat for day 6, you have no choice for the hat that you wear on Day 7. You must wear the one remaining hat. You had 7-1 = 6 days of “hat” freedom—in which the hat you wore
Hamzah
That’s kind of the idea behind degrees of freedom in statistics. Degrees of freedom are often broadly defined as the number of "observations" (pieces of information) in the data that are free to vary when estimating statistical parameters.
Hamzah
binomial distribution and poisson both are used to estimate the number of successes probable against the. probable failures. the difference is only that BINOMIAL dist. is for discrete data while POISSON is used for continuous data.
Salman
What do you need to understand?
Angela
The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution
Hamzah
poisson distribution is also for discrete data set. The difference is when the probability of occurring an event is very little and the sample size is extra large then we use poisson distribution.
Neil
Neil yes you got it and very interested answer you gave
jamilu
How to know if the statement is 1 tail or 2 tail?
1 tail if greater than pr less than.2 tail if not equal.
Jojo
in such a case there is no sufficient information provided to develop an alternative hypothesis and we can decide between only two states i.e either the statement is EQUAL TO or NOT EQUAL TO under given conditions
Salman
for 1tail there must be certain criteria like the greater than or less than or some probability value that must be achieved to accept or reject the original hypothesis.
Salman
for example if we have null hypothesis Ho:u=25 Ha:u#25(not equal to 25) it would be two tail if we say Ho:u=25 Ha:u>or Ha:u<25 it would be consider as one tail I hope you will be understand #Coleen
Shabir
yes its true. now you have another problem. so share.
ibrar
what is z score
How to find z score through calculator
Esperanza
Different data sets will have different means and standard deviations, so values from one set cannot always be compared directly with those from another. The z-score standardizes normally distributed data sets, allowing for a proper comparison and a consistent definition of percentiles across data s
Hamzah
what are random number
how to compute the mean with a long method
there is a shortcut method for calculating mean long methid doesn't make any sense.
Neil
what are probability
Saif
probability mass function
Saif
probability density function
Saif
there are many definitions of probability. which one is, the ratio of favourable outcomes & total outcomes.
suhail
distribution used for modeling/(find probabilities) of discrete r.v. is called p.m.f
suhail
distribution used for modeling/(find probabilities) of continued r.v, called p.d.f
suhail
lets use short method using calculator.... store yo data n smply get your mean
Flavian
if 1 calorie =4.12 kj, what is the total kj value of this dish
summation of values of x1 x2 x3 ,,,,xn divided by total number n if it is with frequency its like this summation of values of x1f1+x2f2+x3f3+xnfk divided by summation of frequencies like f1+f2+f3+fk
Please keep in mind that it's not allowed to promote any social groups (whatsapp, facebook, etc...), exchange phone numbers, email addresses or ask for personal information on our platform.