library(tidyverse)

Announcements

Questions?

Identify variable type (Zoom poll)

The way data is displayed matters

Source: [Twitter](https://twitter.com/CoralTKrueger/status/1296425438403796992)

Clone a repo + start a new project

Configure Git

Before we start the exercise, we need to configure Git so that RStudio can communicate with GitHub. This requires two pieces of information: your email address and your name.

Type the following lines of code in the RStudio Console, filling in your GitHub username and the email address associated with your GitHub account.

library(usethis)
use_git_config(user.name = "github username", user.email = "your email")

RStudio and GitHub can now communicate with each other, and you are ready to do the exercise!

Practice with ggplot

The data contain information about Airbnb listings in Edinburgh, Scotland. The data originally come from Kaggle and have been modified for this exercise.

Use the code below to load the data from the .csv file.

edibnb <- read_csv("data/edibnb.csv")
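If you want to see what read_csv() returns without the real file, the sketch below reads a few rows of made-up inline CSV text instead. The column names (id, price, area) and the values are hypothetical, and wrapping the string in I() to mark it as literal data assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical stand-in for data/edibnb.csv: three made-up listings.
# I() tells readr (2.0 or later) to treat the string as literal data,
# not as a file path.
toy_csv <- I("id,price,area\n1,80,Old Town\n2,120,New Town\n3,95,Old Town")

# show_col_types = FALSE silences the column-specification message
toy <- read_csv(toy_csv, show_col_types = FALSE)

toy
```

The result is a tibble: one column per CSV field, with types (numbers, text) guessed from the values.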

Part 0

The dataset you’ll be using is called edibnb. Run View(edibnb) in the console to view the data in the data viewer. What does each row in the dataset represent?
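View() is meant for interactive use in RStudio. A minimal alternative that also prints in a knitted document is glimpse(), which lists each column's name, type, and first few values. This sketch uses a small hypothetical tibble rather than edibnb.

```r
library(tibble)

# Hypothetical toy data; edibnb has different and more columns
toy <- tibble(
  id    = 1:4,
  price = c(80, 120, 95, 300),
  area  = c("Old Town", "New Town", "Old Town", "Leith")
)

# One line per column: name, type (<int>, <dbl>, <chr>), first values
glimpse(toy)
```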

Part 1

The edibnb dataset has 13245 observations (rows).

How many columns (variables) does the dataset have? Instead of hard-coding the number in your answer, use the function ncol() to write your answer as inline code. Hint: Use the statement above as a guide.

Knit to see the updates.
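nrow(), ncol(), and dim() report the size of a data frame. The sketch below uses a small hypothetical tibble named toy. In R Markdown, the same kind of call can also go in the text as inline code, for example `r nrow(toy)`, which is replaced by its value when you knit.

```r
library(tibble)

# Hypothetical toy data standing in for edibnb
toy <- tibble(
  price = c(80, 120, 95),
  area  = c("Old Town", "New Town", "Old Town")
)

nrow(toy)   # number of rows (observations): 3
ncol(toy)   # number of columns (variables): 2
dim(toy)    # both at once, rows then columns: 3 2
```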

Part 2

Fill in the code below to create a histogram that displays the distribution of price. Once you have modified the code, remove the option eval = FALSE from the code chunk header. Knit again to see the updates.

ggplot(data = ___, mapping = aes(x = ___)) +
  geom_histogram()
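For comparison, here is the same pattern filled in on a handful of made-up prices. The data frame toy and the binwidth value are assumptions for illustration. By default geom_histogram() uses 30 bins and prints a message suggesting a better binwidth, so setting binwidth yourself is usually worth trying.

```r
library(ggplot2)

# Hypothetical toy prices, standing in for edibnb
toy <- data.frame(price = c(40, 55, 60, 75, 80, 95, 120, 150, 300))

# binwidth = 25 makes each bar cover a price range 25 units wide
p <- ggplot(data = toy, mapping = aes(x = price)) +
  geom_histogram(binwidth = 25)

p
```

Try a few binwidth values: too wide hides the shape, too narrow makes it noisy.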

Part 3

Now let’s look at the distribution of price for each neighborhood. To do so, we will make a faceted histogram where each facet represents a neighborhood and displays the distribution of price for that neighborhood.

Fill in the code below to create the faceted histogram with informative labels. Once you have modified the code, remove the option eval = FALSE from the code chunk header. Knit again to see the updates.

Hint: Run names(edibnb) in the console to get a list of variable names. Note how the variable for neighborhood is spelled in the data set.

ggplot(data = ___, mapping = aes(x = ___)) +
  geom_histogram() +
  facet_wrap(~___) +
  labs(x = "______",
       title = "_______",
       subtitle = "Faceted by ______")
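A faceted histogram follows the same pattern. In this sketch on hypothetical data, the grouping column is called area purely for illustration; check names(edibnb) for the real variable name. facet_wrap(~area) draws one panel per value of area, and by default all panels share the same axes, which makes them easy to compare.

```r
library(ggplot2)

# Hypothetical toy data: `area` is a made-up grouping column,
# not necessarily how the neighborhood variable is named in edibnb
toy <- data.frame(
  price = c(40, 60, 75, 80, 95, 120, 150, 210, 300),
  area  = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
)

p <- ggplot(data = toy, mapping = aes(x = price)) +
  geom_histogram(binwidth = 25) +
  facet_wrap(~area) +              # one panel per value of area
  labs(x = "Price (toy units)",
       title = "Toy listing prices",
       subtitle = "Faceted by area")

p
```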

Part 4

How would you describe the distribution of price in general? How do neighborhoods compare to each other in terms of price?
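The plots are the main evidence for Part 4, but a numeric summary can back up what you see. This sketch, on hypothetical data with a made-up area column, groups by the grouping variable and computes the median and interquartile range (IQR) of price, which are less affected by a few very large prices than the mean and standard deviation.

```r
library(dplyr)

# Hypothetical toy data; `area` stands in for the neighborhood variable
toy <- data.frame(
  price = c(40, 60, 75, 80, 95, 120, 150, 210, 300),
  area  = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
)

# One row per area: count, median price, and IQR of price,
# sorted from highest to lowest median
price_summary <- toy %>%
  group_by(area) %>%
  summarise(
    n            = n(),
    median_price = median(price),
    iqr_price    = IQR(price)
  ) %>%
  arrange(desc(median_price))

price_summary
```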

Knit, commit, and push

  1. If you made any changes since the last knit, knit again to get the final version of the AE.

  2. Check the box next to each document in the Git tab (this is called “staging” the changes). Commit the changes you made using a simple and informative message.

  3. Use the green arrow to push your changes to your repo on GitHub.

  4. Check your repo on GitHub and see the updated files.

Resources




This exercise was modified from Hotels in Edinburgh in Data Science in a Box.