NYC Hotel Analysis


Here you'll find the rendered version of my Rmd file, which includes the code I used to produce the visuals in my article "A Visual Analysis of NYC's Hotel Industry" on LinkedIn.

NYC Hotel Analysis/dynamic_visuals.html


dynamic_visuals


Lukas Frei


10/29/2019


NYC Hotel Analysis


Loading Required Libraries
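The chunk that produced the package messages below is hidden in the rendered file. Judging by the packages those messages mention, a setup chunk along these lines should reproduce them. This is a reconstruction, not the original code, and the source does not say which zoo functions the analysis relies on.

```r
# Reconstructed setup chunk (an assumption based on the package messages below;
# the original chunk is not shown in the rendered file)
library(tidyverse)  # attaches the core tidyverse packages (ggplot2, dplyr, tibble, readr, ...)
library(gganimate)  # animation layers for ggplot2, e.g. transition_reveal() and animate()
library(zoo)        # masks base::as.Date, as the message below notes
```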

## ── Attaching packages ───────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.0.0     ✔ purrr   0.2.5
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0

## Warning: package 'tibble' was built under R version 3.5.2

## Warning: package 'dplyr' was built under R version 3.5.2

## ── Conflicts ──────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

## Warning: package 'gganimate' was built under R version 3.5.2

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

Reading in Data


First, I read in the data I found on NYC & Company's website:

library(readxl)  # read_excel() comes from readxl, which library(tidyverse) does not attach
occ_adr <- read_excel("hotel_reports.xlsx", sheet = "occ_adr")
demand <- read_excel("hotel_reports.xlsx", sheet = "demand")
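Both tables come from sheets of the same workbook. When a workbook has several sheets, readxl's excel_sheets() lists their names, which makes it easy to double-check the `sheet =` arguments or to read every sheet in one pass. A minimal sketch, using the example workbook bundled with readxl as a stand-in for hotel_reports.xlsx (which is not available here):

```r
library(readxl)

# readxl ships small example workbooks; "datasets.xlsx" stands in for
# hotel_reports.xlsx, which is only available in the original project
path <- readxl_example("datasets.xlsx")

# List the sheet names first, so a typo in `sheet =` is caught early
sheet_names <- excel_sheets(path)
print(sheet_names)

# Read every sheet into a named list of tibbles (list names = sheet names)
sheets <- lapply(setNames(sheet_names, sheet_names),
                 function(s) read_excel(path, sheet = s))
str(sheets, max.level = 1)
```

Pointed at hotel_reports.xlsx, the same pattern would return one list element per sheet, so the two tables would be available as `sheets$occ_adr` and `sheets$demand`.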

Data Cleaning


The data was pretty clean to begin with, but I still had to make a few adjustments:

# checking data types
str(occ_adr)

## Classes 'tbl_df', 'tbl' and 'data.frame':    56 obs. of  6 variables:
##  $ Month : chr  "January" "February" "March" "April" ...
##  $ Year  : num  2015 2015 2015 2015 2015 ...
##  $ Occ   : num  0.704 0.773 0.871 0.913 0.932 0.932 0.923 0.907 0.925 0.874 ...
##  $ ADR   : num  212 222 251 286 320 308 267 261 378 357 ...
##  $ Month2: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Month3: chr  "2015-01" "2015-02" "2015-03" "2015-04" ...

str(demand)

## Classes 'tbl_df', 'tbl' and 'data.frame':    56 obs. of  5 variables:
##  $ Month : chr  "January" "February" "March" "April" ...
##  $ Year  : num  2015 2015 2015 2015 2015 ...
##  $ Demand: num  2.35 2.25 2.78 2.69 3 ...
##  $ Month2: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Month3: chr  "2015-01" "2015-02" "2015-03" "2015-04" ...
# rounding the occupancy percentages (mutate_if() rounds every numeric column, ADR included, to 3 decimals)
occ_adr <- occ_adr %>% mutate(Occ = as.numeric(Occ)) %>% mutate_if(is.numeric, round, 3)
# transforming the month column into a factor so months sort in calendar order, not alphabetically
# (month.name is base R's built-in vector "January", ..., "December")
occ_adr$Month <- factor(occ_adr$Month, levels = month.name)
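The cleaning step does two things: it rounds the numeric columns and gives Month an explicit level order. A small sketch on made-up data (the values are hypothetical; only the column types mirror occ_adr) shows why the factor matters. It also uses across() with where(), the replacement for the superseded mutate_if() in dplyr 1.0 and later; the article's code ran on dplyr 0.8.3, where mutate_if() was the usual choice.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data with the same column types as occ_adr (values are made up)
toy <- tibble(
  Month = c("March", "January", "February"),
  Occ   = c(0.87149, 0.70412, 0.77288),
  ADR   = c(251.4567, 212.0891, 222.3333)
)

# dplyr >= 1.0 equivalent of mutate_if(is.numeric, round, 3)
toy <- toy %>% mutate(across(where(is.numeric), ~ round(.x, 3)))

# As plain character strings, months sort alphabetically: "February" "January" "March"
sort(toy$Month)

# With month.name as the levels, arrange() (and ggplot2) use calendar order
toy$Month <- factor(toy$Month, levels = month.name)
arrange(toy, Month)
```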

Dynamic Visualizations


I used gganimate for all dynamic visualizations in the article. It is built on top of ggplot2 and is therefore really easy to get used to if you have used ggplot2 before. I'll go through each of the plots from the article and share my code along with comments.
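Because gganimate extends ggplot2, an animation starts life as an ordinary ggplot object, and adding a transition with `+` is what turns it into an animation. A minimal sketch with made-up data (the column names mirror the demand tibble, but the values are hypothetical):

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data: a made-up monthly series for two years
toy <- data.frame(
  Month2 = rep(1:12, times = 2),
  Value  = c(1:12, (1:12) * 1.5),
  Year   = rep(c(2018, 2019), each = 12)
)

# A regular static ggplot2 line plot ...
p_static <- ggplot(toy, aes(Month2, Value, group = Year, color = factor(Year))) +
  geom_line()

# ... becomes an animation specification once a transition is added;
# nothing is rendered until the object is printed or passed to animate()
p_anim <- p_static + transition_reveal(Month2)

class(p_anim)
```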


Hotel Room Nights Sold Plot


As I wanted to start out with the number of hotel room nights sold, I decided to go with a line plot. I used the transition_reveal() function to animate it. This function lets the data gradually appear along a specified dimension; in this case, the months 1-12 on the x-axis. The code starts out in regular ggplot2 fashion: I used the data from the "demand" tibble, with the "Month2" variable on the x-axis and the "Demand" variable on the y-axis. I grouped the data by year and colored the lines according to the "Year" variable. I then added the line geom and scale_color_viridis_d(), which applies the viridis color palette to the discrete year values. After that, I added labels for the x- and y-axis along with a centered title before using transition_reveal() to animate this static ggplot2 plot.

p <- ggplot(
     demand,
     aes(Month2, Demand, group = Year, color = factor(Year))
     ) +
     geom_line() +
     scale_color_viridis_d() +
     labs(x = "Month", y = "Millions of Stays", color = "Year") +
     # complete themes such as theme_light() replace all earlier theme settings,
     # so theme_light() has to come before the custom theme() tweaks
     theme_light() +
     theme(legend.position = "top", plot.title = element_text(hjust = 0.5),
           axis.text = element_text(size = 11)) +
     scale_x_continuous(breaks = 1:12,
                        labels = c("Jan", "Feb", "Mar", "Apr",
                                   "May", "June", "July", "Aug",
                                   "Sep", "Oct", "Nov", "Dec")
     ) +
     ggtitle("Hotel Room Nights Sold in NYC 2015-2019") +
     transition_reveal(Month2)
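When a complete theme such as theme_light() is combined with custom theme() settings, the order of the two matters: a complete theme replaces the whole theme, so any tweaks added before it are discarded. A small check on theme objects alone (no data needed) makes this visible; the same rule applies when both are added to a plot with `+`.

```r
library(ggplot2)

# Custom tweak first, complete theme second: theme_light() replaces the tweak
t_last <- theme(legend.position = "top") + theme_light()

# Complete theme first, tweak second: the tweak is layered on top of it
t_first <- theme_light() + theme(legend.position = "top")

t_last$legend.position   # "right" (theme_light() keeps theme_grey()'s default)
t_first$legend.position  # "top"
```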

In order to render this plot, I used the following:

animate(p, height=500, width=500)
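Besides the image size, animate() controls how many frames are drawn and how fast they play, and anim_save() writes the rendered result to disk. A sketch assuming the gifski package is installed (gifski_renderer() depends on it); the toy data, parameter values, and the file name room_nights_sold.gif are illustrative assumptions, not the settings used for the article:

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data standing in for the demand tibble
toy <- data.frame(Month2 = 1:12, Value = seq(2, 3.1, by = 0.1))

p <- ggplot(toy, aes(Month2, Value)) +
  geom_line() +
  transition_reveal(Month2)

# nframes and fps set smoothness and playback speed; end_pause repeats the
# last frame so the finished line stays on screen before the GIF loops
anim <- animate(p, nframes = 60, fps = 10, end_pause = 10,
                height = 500, width = 500,
                renderer = gifski_renderer())

# Write the rendered GIF to disk (hypothetical file name)
anim_save("room_nights_sold.gif", animation = anim)
```

Raising nframes gives a smoother reveal at the cost of longer rendering; fps only changes how quickly the same frames play back.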