Data analysis life cycle

Data science life cycle from [*R for Data Science*](https://r4ds.had.co.nz/) with modifications from *The Art of Statistics: How to Learn from Data*

Examining data visualizations

Discuss the following questions for each visualization:

Figure 1

Figure from the New York Times ["What's Going on with this Graph?"](https://www.nytimes.com/2020/04/02/learning/whats-going-on-in-this-graph-bus-ridership-in-metropolitan-areas.html) series.

Figure 2

Figure originally seen on [Twitter](https://twitter.com/reina_sabah/status/1291509085855260672).

Clone a repo + start a new project

Configure git

Before we start the exercise, we need to configure git so that RStudio can communicate with GitHub. This requires two pieces of information: your GitHub username and the email address associated with your GitHub account.

Type the following lines of code in the RStudio Console, filling in your GitHub username and the email address associated with your GitHub account.

library(usethis)
use_git_config(user.name = "github username", user.email = "your email")

RStudio and GitHub can now communicate with each other and you are ready to do the exercise!
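
To confirm that the settings were saved, one option is to read them back from git. The sketch below is only an illustration: it assumes git is installed and on the system path, and `read_git_setting()` is a hypothetical helper written for this example, not part of usethis.

```r
# Hypothetical helper: read one global git setting from within R.
# Returns NA if git is not available or the setting has not been made.
read_git_setting <- function(key) {
  if (!nzchar(Sys.which("git"))) {
    return(NA_character_)
  }
  value <- suppressWarnings(
    system2("git", c("config", "--global", key), stdout = TRUE, stderr = FALSE)
  )
  if (length(value) == 0) NA_character_ else value[1]
}

# Compare these values with what you typed into use_git_config().
read_git_setting("user.name")
read_git_setting("user.email")
```

If either value comes back as `NA`, rerunning `use_git_config()` with your details is a reasonable next step.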

Practice with ggplot

Step 1

Modify the following plot to change the color of all points to "pink". Knit the document to observe the changes.

ggplot(data = starwars, 
       mapping = aes(x = height, y = mass, color = gender, size = birth_year)) +
  geom_point(color = "#30509C") +
  labs(title = "_____" , size = "_____", x = "_____", y = "_____")
## Warning: Removed 51 rows containing missing values (geom_point).
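
The warning above appears because some characters have no recorded height, mass, or birth year, so those rows cannot be drawn as points. As a rough check, the sketch below counts the rows with a missing value in any of the three plotted variables. It assumes the `starwars` data set is loaded from the dplyr package; the count may differ from the one in the warning depending on the installed version of the data.

```r
library(dplyr)  # provides the starwars data set

# Flag rows where any variable mapped to x, y, or size is missing.
has_missing <- is.na(starwars$height) |
  is.na(starwars$mass) |
  is.na(starwars$birth_year)

# Number of rows that cannot be drawn as points.
sum(has_missing)
```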

Step 2

Add labels for the title, the x and y axes, and the size of the points. Knit again.

Step 3

Fill in the code below to make a histogram of a numerical variable of your choice. Once you have modified the code, remove the option eval = FALSE from the code chunk header. Knit again to see the updates.

See the ggplot2 reference page for help to create histograms.

ggplot(data = starwars, 
       mapping = aes(x = _____)) +
  ___________ +
  labs(title = "_____" , x = "_____", y = "_____")
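
For reference, here is a sketch of the same pattern on a different data set, so the blanks above are still yours to fill in. It uses `faithful`, a data set that ships with R and is not part of this exercise; the bin width of 5 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Histogram of one numerical variable: waiting time between eruptions.
p <- ggplot(data = faithful,
            mapping = aes(x = waiting)) +
  geom_histogram(binwidth = 5) +
  labs(title = "Waiting time between Old Faithful eruptions",
       x = "Waiting time (minutes)", y = "Count")

p
```

Setting `binwidth` explicitly avoids the default of 30 bins, which is rarely the best choice for a given variable.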

Step 4: Stretch goal!

  1. Modify the histogram by adding color = "blue" inside of the geom_XX function. (Feel free to use a different color besides blue!) Knit to see the updated histogram.

  2. Now modify the histogram by adding fill = "pink" inside of the geom_XX function. (Feel free to use a different color besides pink!) Knit to see the updated histogram.

  3. What is the difference between color and fill?

Knit, commit, and push

  1. If you made any changes since the last knit, knit again to get the final version of the AE.

  2. Check the box next to each document in the Git tab (this is called “staging” the changes). Commit the changes you made using a simple and informative message.

  3. Use the green arrow to push your changes to your repo on GitHub.

  4. Check your repo on GitHub and see the updated files.


This exercise was modified from “Starwars + Data visualization” in Data Science in a Box.