HW 03: Multiple linear regression

due Sun, Nov 8 at 11:59p

Clone assignment repo + start new project

A private repository has already been created for you.

Data

In 1975 the Biomedical Department of the Highway Safety Research Insitute (HSRI) under the sponsorship of the Consumer Product Safety Commission completed an anthropometric survey of infants and children.

The goal of the study was to provide child measurement data for design, hazard assessment, and standards for products including car seats, restraints, children’s furniture, playground equipment, toys, bicycles, and more. See here and here for more information (“Anthropometry of Infants, Children, and Youths to Age 18 for Product Safety Design”, 1977).

The file kids.csv contains a subset of these data for 1000 children between the ages of 8 and 16 and includes the following variables:

Exercises

  1. Mutate the sex variable to be a factor with levels “male” and “female” and set “male” to be the reference level.

  2. Fit a linear model with height as the response and sex and age as predictors. Display the model output in tidy format and write out the linear model.

  3. Now let’s interpret the terms in the model.

    • Interpret the age and sex in the context of the problem.
    • Does the intercept have a useful interpretation? If so, interpret the intercept in the context of the problem. Otherwise, briefly explain why not.
  4. Construct a 90% confidence interval for \(\beta_{\text{age}}\) and interpret the interval in the context of the problem.

  5. Report the value of \(R^2\) and interpret it in the context of the problem.

  6. What is the predicted height for a fourteen-and-a-half-year-old male?

  7. Create an effective, well-labeled scatterplot of height versus age with the points colored by sex. Include a visualization of your linear model from Exercise #2 on this scatterplot.

  8. Fit a linear model with height as the response and sex, age, and their interaction as predictors. Display the model output in tidy format and write out the linear model.

  9. Write out the model equations for males and females.

  10. Is there evidence that the relationship between age and height depends on sex? Answer comprehensively using a formal hypothesis test.

  11. Create an effective well-labeled scatterplot of height versus age with points colored by sex. Visualize your linear model from Exercise #10 on this scatterplot.

  12. Create appropriate and effective diagnostic plots to check the conditions for multiple linear regression (same conditions as for simple linear regression). For each plot, provide a one or two sentence comment describing which condition(s) you are checking, what you observe, and whether or not the condition is met.

  13. Briefly describe any potential limitations of this model beyond your comments from Exercise #12.

Submission

Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.

Please only upload your PDF document to Gradescope. Before you submit the uploaded document, mark where each answer is to the exercises. If any answer spans multiple pages, then mark all pages. Make sure to associate the “Overall” section with the first page.