Tim Burton vs Steven Spielberg

IMDB ratings: Differences between directors

We are going to explore whether the mean IMDB rating for Steven Spielberg and Tim Burton are the same or not.

We write down the null and alternative hypotheses.

Null Hypothesis (H0) | The mean IMDB rating for Spielberg films = The mean IMDB rating for Burton films (Mean IMDB rating for Spielberg films - Mean IMDB rating for Burton films = 0) Alternative Hypothesis (H1) | The mean IMDB rating for Spielberg films != The mean IMDB rating for Burton films (Mean IMDB rating for Spielberg films - Mean IMDB rating for Burton films != 0 )

Load the data

movies <- read_csv(here::here("data", "movies.csv"))
glimpse(movies)
## Rows: 2,961
## Columns: 11
## $ title               <chr> "Avatar", "Titanic", "Jurassic World", "The Aveng…
## $ genre               <chr> "Action", "Drama", "Action", "Action", "Action", …
## $ director            <chr> "James Cameron", "James Cameron", "Colin Trevorro…
## $ year                <dbl> 2009, 1997, 2015, 2012, 2008, 1999, 1977, 2015, 2…
## $ duration            <dbl> 178, 194, 124, 173, 152, 136, 125, 141, 164, 93, …
## $ gross               <dbl> 7.61e+08, 6.59e+08, 6.52e+08, 6.23e+08, 5.33e+08,…
## $ budget              <dbl> 2.37e+08, 2.00e+08, 1.50e+08, 2.20e+08, 1.85e+08,…
## $ cast_facebook_likes <dbl> 4834, 45223, 8458, 87697, 57802, 37723, 13485, 92…
## $ votes               <dbl> 886204, 793059, 418214, 995415, 1676169, 534658, …
## $ reviews             <dbl> 3777, 2843, 1934, 2425, 5312, 3917, 1752, 1752, 3…
## $ rating              <dbl> 7.9, 7.7, 7.0, 8.1, 9.0, 6.5, 8.7, 7.5, 8.5, 7.2,…
skim(movies)
(#tab:load_movies_data)Data summary
Name movies
Number of rows 2961
Number of columns 11
_______________________
Column type frequency:
character 3
numeric 8
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
title 0 1 1 83 0 2907 0
genre 0 1 5 11 0 17 0
director 0 1 3 32 0 1366 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1 2.00e+03 9.95e+00 1920.0 2.00e+03 2.00e+03 2.01e+03 2.02e+03 ▁▁▁▂▇
duration 0 1 1.10e+02 2.22e+01 37.0 9.50e+01 1.06e+02 1.19e+02 3.30e+02 ▃▇▁▁▁
gross 0 1 5.81e+07 7.25e+07 703.0 1.23e+07 3.47e+07 7.56e+07 7.61e+08 ▇▁▁▁▁
budget 0 1 4.06e+07 4.37e+07 218.0 1.10e+07 2.60e+07 5.50e+07 3.00e+08 ▇▂▁▁▁
cast_facebook_likes 0 1 1.24e+04 2.05e+04 0.0 2.24e+03 4.60e+03 1.69e+04 6.57e+05 ▇▁▁▁▁
votes 0 1 1.09e+05 1.58e+05 5.0 1.99e+04 5.57e+04 1.33e+05 1.69e+06 ▇▁▁▁▁
reviews 0 1 5.03e+02 4.94e+02 2.0 1.99e+02 3.64e+02 6.31e+02 5.31e+03 ▇▁▁▁▁
rating 0 1 6.39e+00 1.05e+00 1.6 5.80e+00 6.50e+00 7.10e+00 9.30e+00 ▁▁▆▇▁
# We use the duplicated() function to identify duplicates, and find that there are none
summary(duplicated(movies))
##    Mode   FALSE 
## logical    2961
director_rating <- movies %>% 
  filter(director %in% c("Steven Spielberg","Tim Burton"))

# We summarise data using mean and CI functions
director_rating2 <- director_rating %>%
  group_by(director) %>%
  dplyr::summarise(mean_rating = mean(rating),
            uci_rating = CI(rating)[1],
            lci_rating = CI(rating)[3])

# We use geom_errorbar and geom_react in order to reproduce the graph
ggplot(director_rating2, aes(x = mean_rating, y = reorder(director, mean_rating), colour=director)) +
  geom_point(size=5, show.legend = FALSE)+
  # We use geom_rect to draw overlapping sections
  geom_rect(aes(xmin=uci_rating[2], xmax=lci_rating[1], ymin=-Inf, ymax=Inf), colour = "light grey", alpha = 0.2)+
  
  # We use geom_errorbar to draw CI intervals
  geom_errorbar(aes(xmin = lci_rating, xmax = uci_rating), size=1.3, width=.05, show.legend = FALSE)+
  geom_text(aes(label = round(lci_rating,2), x = lci_rating), size= 4.5, vjust = -1, colour ="black")+
  geom_text(aes(label = round(mean_rating,2), x = mean_rating), size= 6, vjust = -1, colour ="black")+
  geom_text(aes(label = round(uci_rating,2), x = uci_rating), size= 4, vjust = -1, colour ="black")+
  
  labs(x="Mean IMBD Rating", 
       y= "", title= "Do Spielberg and Burton have the same mean IMDB ratings?", 
       subtitle= "95% confidence intervals overlap")+
  theme_bw()+
  NULL

In addition, we run a hypothesis test using both the t.test command and the infer package to simulate from a null distribution, where we assume zero difference between the two.

Hypothesis testing

director_rating <- movies %>%
    filter(director %in% c("Steven Spielberg","Tim Burton")) # select directors
  t.test(rating~director, director_rating)
## 
##  Welch Two Sample t-test
## 
## data:  rating by director
## t = 3, df = 31, p-value = 0.01
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.16 1.13
## sample estimates:
## mean in group Steven Spielberg       mean in group Tim Burton 
##                           7.57                           6.93

First, we use the t.test to calculate the p-Value. We conclude that there is a statistically significant difference in means, since our p-value is 0.01 and hence |p|<a (where a=0.05) (or |t| > 2.00). Therefore, we reject the null hypothesis. Now we run the simulation with the infer package:

set.seed(1234) # We set a specific seed value

# We run a simulation with infer package

director_hypothesis_test<- director_rating %>%
  
  specify(rating ~ director) %>%
  
  # Here, `hypothesize` is used to set the null hypothesis as a test for independence, i.e., that there is no difference between the two population means.
  hypothesise(null="independence") %>% 
  # The `type` argument within generate is set to permute, which is the argument when generating a null distribution for a hypothesis test.
  generate(reps=1000, type="permute") %>% 
  #calculate the observed difference in means with bootstrap
  calculate(stat="diff in means",order = c("Steven Spielberg","Tim Burton"))


diff_rating <- director_rating %>% 
  specify(rating ~ director) %>% 
  calculate(stat = "diff in means", order = c("Steven Spielberg","Tim Burton"))
#get p_Value through infer package 
director_p_value <- director_hypothesis_test %>% 
    get_p_value(obs_stat = diff_rating[1,1], direction = "both")

kbl(director_p_value,col.names=c("p-value"), caption="Simulation-based null distribution using the 'infer' package") %>%
kable_styling()
(#tab:infer_hypothesis_test)Simulation-based null distribution using the ‘infer’ package
p-value
0.008

Again, on the basis of this simulation-based hypothesis test we reject the null hypothesis stating that there is no difference in means of IMDB rating for Spielberg films and for Burton films as the p-Value is smaller than the alpha value (0.05). Therefore there is statistically significant difference between the mean ratings of the films of the two film directors. We visualise the simulation-based null distribution:

#We calculate CI using infer package
director_percentile_ci <- director_hypothesis_test %>%
  get_confidence_interval(level = 0.95, type = "percentile")

#We show a plot with shaded confidence interval using infer package
director_hypothesis_test %>% visualise() +
  
  shade_confidence_interval(endpoints = director_percentile_ci, size=0.4, color= "yellow", fill="light yellow") +
  shade_p_value(obs_stat = diff_rating, direction = "both", size=1.1) +
  
  labs(title = "Are Spielberg and Burton Equally Good Directors?", 
       subtitle="Simulation-based null distribution: the p-value (red) lies well outside the null distribution (yellow), allowing us to reject the null hypothesis", x="Difference in Mean IMDB Rating Between Directors", y="Count")+theme_fivethirtyeight() + theme(axis.title = element_text()) + scale_x_continuous(breaks=seq(-1.0, 1.0, by = 0.1), limits=c(-1.0,1.0))

In the graph it is shown that the p-value is outside of the 95% confidence interval (alpha = 0.05). We therefore reject the null hypothesis (H0) that there is no difference in the mean IMDB ratings of the films of Tim Burton and Steven Spielberg.