A more complex interaction between a dummy variable and the dependent variable can also be estimated. The dummy variable may not only shift the dependent variable but also interact with one or more of the continuous independent variables. While not tested in the example above, it could be hypothesized that the impact of gender on salary was not a one-time shift, but also affected the value of additional years of experience. That is, female school teachers’ salaries were discounted at the start and, further, did not grow at the same rate as those of male school teachers. This would show up as a different slope for the relationship between salary and total years of experience for males than for females. If this is so, then female school teachers would not just start behind their male colleagues (as measured by the shift in the estimated regression line), but would fall further and further behind as time and experience increased.
The graph below shows how this hypothesis can be tested with the use of dummy variables and an interaction variable.
The estimating equation, written here as $\hat{y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$, shows how the slope of $X_1$, the continuous random variable experience, contains two parts, $b_1$ and $b_3$. This occurs because a new variable $X_1 X_2$, called the interaction variable, was created to allow changes in $X_2$, the binary dummy variable, to affect the slope of $X_1$. Note that when the dummy variable $X_2 = 0$ the interaction variable has a value of 0, but when $X_2 = 1$ the interaction variable has a value of $X_1$. The coefficient $b_3$ is an estimate of the difference in the coefficient of $X_1$ when $X_2 = 1$ compared to when $X_2 = 0$. In the example of teachers’ salaries, if there is a premium paid to male teachers that affects the rate of increase in salaries from experience, then the rate at which male teachers’ salaries rise would be $b_1 + b_3$ and the rate at which female teachers’ salaries rise would be simply $b_1$. This can be tested with the hypotheses:
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
This is a t-test using the test statistic for the parameter $\beta_3$. If we cannot accept the null hypothesis that $\beta_3 = 0$, we conclude that the rate of increase differs between the group for whom the binary variable is set to 1 (males in this example) and the group for whom it is set to 0. This estimating equation can be combined with our earlier one that tested only a parallel shift in the estimated line. The earnings/experience functions in [link] are drawn for this case, with both a shift in the earnings function and a difference in the slope of the function with respect to total years of experience.
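The interaction model can be illustrated with a minimal sketch in Python using NumPy. Everything in it is an assumption made for illustration: the data are simulated rather than taken from the teacher salary example, and the variable names (`experience`, `male`, `salary`) and the "true" parameter values are hypothetical. The sketch builds the interaction column $X_1 X_2$, estimates the four coefficients by ordinary least squares, and forms the t statistic for $b_3$.

```python
import numpy as np

# Simulated (hypothetical) data: NOT the teacher salary sample from the text.
rng = np.random.default_rng(0)
n = 200
experience = rng.uniform(0, 30, size=n)   # X1: years of experience
male = rng.integers(0, 2, size=n)         # X2: dummy variable, 1 = male, 0 = female

# Assumed "true" parameters b0, b1, b2, b3 used only to generate the data.
beta_true = np.array([30.0, 1.0, 2.0, 0.5])

# Design matrix: intercept, X1, X2, and the interaction variable X1*X2.
X = np.column_stack([np.ones(n), experience, male, experience * male])
salary = X @ beta_true + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares estimates of b0, b1, b2, b3.
b, *_ = np.linalg.lstsq(X, salary, rcond=None)

# Standard errors from the usual OLS formula: s^2 * (X'X)^(-1).
residuals = salary - X @ b
dof = n - X.shape[1]
s2 = residuals @ residuals / dof
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

# t statistic for H0: beta_3 = 0 (no difference in slope between groups).
t_b3 = b[3] / se[3]

slope_female = b[1]          # slope when X2 = 0
slope_male = b[1] + b[3]     # slope when X2 = 1

print("estimated coefficients:", b)
print("slope (X2 = 0):", slope_female, " slope (X2 = 1):", slope_male)
print("t statistic for b3:", t_b3)
```

A large absolute value of `t_b3` relative to the critical value of the t distribution with `dof` degrees of freedom would lead to the conclusion that the two groups have different slopes.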
A random sample of 11 statistics students produced the following data, where x is the third exam score out of 80, and y is the final exam score out of 200. Can you predict the final exam score of a randomly selected student if you know the third exam score?
x (third exam score) | y (final exam score) |
---|---|
65 | 175 |
67 | 133 |
71 | 185 |
71 | 163 |
66 | 126 |
75 | 198 |
67 | 153 |
70 | 163 |
71 | 159 |
69 | 151 |
69 | 159 |
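One way to approach the question is to fit a least-squares line to the eleven data pairs in the table. The following is a minimal sketch in Python using NumPy; the variable names and the third exam score of 73 used for the prediction are chosen here for illustration and are not part of the original exercise.

```python
import numpy as np

# Data from the table above.
x = np.array([65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69], dtype=float)  # third exam score
y = np.array([175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159], dtype=float)  # final exam score

# Fit y = intercept + slope * x by least squares.
# np.polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(x, y, 1)
print("slope:", round(slope, 2))          # approximately 4.83
print("intercept:", round(intercept, 2))  # approximately -173.51

# Predicted final exam score for a hypothetical third exam score of 73,
# which lies inside the observed range of x (65 to 75).
x_new = 73.0
y_pred = intercept + slope * x_new
print("predicted final exam score:", y_pred)
```

The prediction is only reasonable for third exam scores within the range of the sample; the fitted line says nothing reliable about scores far outside 65 to 75.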
It is hoped that this discussion of regression analysis has demonstrated its tremendous potential value as a tool for testing models and helping us better understand the world around us. The regression model has its limitations, especially the requirement that the underlying relationship be approximately linear. To the extent that the true relationship is nonlinear, it may be approximated with a linear relationship, or with nonlinear transformations of the data that can still be estimated with linear techniques. A double logarithmic transformation of the data provides an easy way to test this particular shape of the relationship. A reasonably good quadratic form (the shape of the total cost curve from Microeconomics Principles) can be generated by the equation:
$Y = a + b_1 X + b_2 X^2$
where the values of X are simply squared and put into the equation as a separate variable.
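Both transformations can be sketched in a few lines of Python with NumPy. The data below are generated from assumed, hypothetical parameter values with no noise, so the fit simply recovers them; the point is only to show that the squared term enters the regression as one more column, and that a double logarithmic transformation turns a power-law shape into a straight line.

```python
import numpy as np

x = np.linspace(1.0, 10.0, 25)

# Quadratic form: Y = a + b1*X + b2*X^2, with hypothetical parameter values.
a_true, b1_true, b2_true = 5.0, 2.0, 0.3
y_quad = a_true + b1_true * x + b2_true * x**2

# X squared is added as a separate column; the model stays linear in the coefficients.
X = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(X, y_quad, rcond=None)
print("a, b1, b2:", coef)

# Double log form: Y = k * X^e becomes log(Y) = log(k) + e*log(X).
k_true, e_true = 3.0, 0.7   # hypothetical values
y_pow = k_true * x**e_true
e_hat, log_k_hat = np.polyfit(np.log(x), np.log(y_pow), 1)
print("exponent:", e_hat, " constant:", np.exp(log_k_hat))
```

With real data the fits would not be exact, and the usual t-tests on the estimated coefficients would indicate whether the squared term or the log-log slope is statistically significant.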
There are many more econometric "tricks" that can bypass some of the more troublesome assumptions of the general regression model. This statistical technique is so valuable that further study would pay any student significant, indeed statistically significant, dividends.