Final Project

Multiple Linear Regression

Your task for this project is to apply the tools we’ve learned (and will learn) to answer questions about the relationship between multiple explanatory variables and one response variable.

The report should include:

• Literature Review (completed previously)

• Introduction: briefly remind the reader of the variables of interest.

• The regression model you’ll be fitting. Use at least 4 explanatory variables. A factor variable with g levels must be represented by g-1 indicator variables. Note that R does this automatically if the variable is non-numeric. If the levels are coded as numbers, you will need to convert the variable with as.factor() first.

• Before running the analysis, create a pairs plot of the explanatory and response variables. Comment on any interesting relationships you see. Are any of the explanatory variables highly correlated? Is there any reason to fit a quadratic term or to apply a log transformation?

• Comment on whether or not you are using interaction terms. (You can use interactions between non-factor variables; they are just slightly more difficult to interpret.) If you think interaction terms are necessary, explain why the slope for one variable would change with the level of another variable.

• Fit your model. Include quadratic, log, or interaction terms as you see fit. Interpret your beta coefficients to the best of your ability. Are your coefficients significant?

• Use the F test to compare two nested models. (That means that the larger model contains all the variables in the smaller model.) Make the smaller model have at least 2 fewer variables than the larger model. Comment on the soundness of the model. Which one would you report to your boss if you could only give her one model?

• Report the R² and adjusted R² values. Comment on the fit of the model as determined by how much variability is explained. Is this a guarantee that the model will accurately describe the population? Why or why not?

• A complete analysis of the residuals. Use plots to get an idea of which points may be strongly influencing the fit. Consider re-fitting the model with and without observations that have both high leverage and large residuals. Do not include every plot, but consider including plots that give the reader an idea of your analysis.

• Try to give an interpretation of the model that makes sense. Why do you think some variables stayed significant and others dropped out? Are any of your variables highly correlated (could one have taken the place of another)?

• Give a confidence interval for the mean response and a prediction interval for a future observation for at least one combination of X’s (from your final linear model).

• Summarize your report.

• As an aside / follow-up: count the total number of hypothesis tests that you ran (including all the ones you didn’t include in the report). What is that number? Call the number m. If you multiplied every single p-value in this report by m, would any of your conclusions / analyses have fundamentally changed? Which ones? How? [The answer to this bullet point can be very short and need not be integrated into the rest of the write-up.]
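A minimal R sketch of the workflow described in the list above may help as a starting point. It is only an illustration under stated assumptions: the data are simulated, and every name in it (dat, x1, x2, x3, region, fit_big, fit_small, and the value chosen for m) is hypothetical and should be replaced by your own variables and counts.

```r
# Simulated data for illustration only; all names and values are assumptions.
set.seed(1)
n <- 120
dat <- data.frame(
  x1 = rnorm(n),
  x2 = runif(n, 1, 10),
  x3 = rnorm(n),
  region = sample(1:3, n, replace = TRUE)  # factor levels stored as numbers
)
dat$region <- as.factor(dat$region)  # g = 3 levels -> lm() builds g - 1 = 2 indicators
dat$y <- 2 + 1.5 * dat$x1 + 3 * log(dat$x2) +
  ifelse(dat$region == "3", 1, 0) + rnorm(n)

# Pairs plot of the response and numeric explanatory variables, plus correlations.
pairs(dat[, c("y", "x1", "x2", "x3")],
      main = "Pairs plot of simulated variables")
cor_mat <- cor(dat[, c("x1", "x2", "x3")])

# Larger model and a nested smaller model that drops x3 and region.
fit_big   <- lm(y ~ x1 + log(x2) + x3 + region, data = dat)
fit_small <- lm(y ~ x1 + log(x2), data = dat)

# F test comparing the nested models.
f_test <- anova(fit_small, fit_big)

# R-squared and adjusted R-squared for the larger model.
s <- summary(fit_big)
r2 <- s$r.squared
adj_r2 <- s$adj.r.squared

# Intervals at one combination of X's: mean response vs. a future observation.
new_x <- data.frame(x1 = 0, x2 = 5, x3 = 0,
                    region = factor("2", levels = levels(dat$region)))
ci_mean <- predict(fit_big, newdata = new_x, interval = "confidence", level = 0.95)
pi_new  <- predict(fit_big, newdata = new_x, interval = "prediction", level = 0.95)

# Multiply each p-value by m (capped at 1), i.e. a Bonferroni-style adjustment.
p_raw <- coef(s)[, "Pr(>|t|)"]
m <- 12  # hypothetical count of all tests run, reported or not
p_adj <- pmin(1, m * p_raw)

print(f_test)
print(round(cbind(p_raw, p_adj), 4))
```

The prediction interval is always wider than the confidence interval for the mean at the same X's, because it also accounts for the variability of a single new observation.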

Format

• The assignment should be turned in as a Word or PDF file.

• Only print code that is interesting to the reader.

• Do not print lists of data.

• If you are commenting on the significance of a variable in your text, you should report the p-value.

• Residuals determine model appropriateness, not p-values or R².

• Summarize any output from R; do not include technical calculations. Use complete sentences.

• I’ve asked you to do a series of things above; make sure the sections flow nicely into one another. This is a report on the data, not a homework assignment. (Try to tell a good story.) You do not need to address the items above in any particular order, and certainly not with bullet points or enumeration.

• Do not be tempted to turn in everything you do. Only turn in the interesting parts of the analysis. One of the hardest parts of being a consultant is figuring out what to tell the researchers.

• Remember to label all graphs.
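
As an illustration of labeled residual plots and of checking points with both high leverage and large residuals, here is a rough R sketch. The data are simulated with one unusual point planted on purpose, all names (dat, fit, flagged, fit_drop) are hypothetical, and the cutoffs used (leverage above 2p/n, absolute standardized residual above 2) are common rules of thumb rather than requirements of this assignment.

```r
# Simulated data for illustration only; all names and cutoffs are assumptions.
set.seed(2)
n <- 60
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)
dat$x1[1] <- 6    # plant one point far from the other x1 values...
dat$y[1]  <- -10  # ...with a response far from the overall trend

fit <- lm(y ~ x1 + x2, data = dat)

res_std <- rstandard(fit)       # standardized residuals
lev     <- hatvalues(fit)       # leverage of each observation
cook    <- cooks.distance(fit)  # overall influence of each observation

# Labeled diagnostic plots: title and axis labels on every graph.
plot(fitted(fit), res_std,
     main = "Standardized residuals vs. fitted values",
     xlab = "Fitted value", ylab = "Standardized residual")
abline(h = 0, lty = 2)

qqnorm(res_std, main = "Normal Q-Q plot of standardized residuals")
qqline(res_std)

# Flag observations with both high leverage and a large residual.
p <- length(coef(fit))
flagged <- which(lev > 2 * p / n & abs(res_std) > 2)

# Re-fit without the flagged observations and compare coefficients.
fit_drop <- if (length(flagged) > 0) {
  lm(y ~ x1 + x2, data = dat[-flagged, ])
} else {
  fit
}
print(rbind(all_data = coef(fit), without_flagged = coef(fit_drop)))
```

If the coefficients change substantially between the two fits, that is worth reporting, along with a justification for whichever fit you keep.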