Tim Burton vs Steven Spielberg

IMDB ratings: Differences between directors

We are going to explore whether the mean IMDB rating for Steven Spielberg and Tim Burton are the same or not.

We write down the null and alternative hypotheses.

Null Hypothesis (H0) | The mean IMDB rating for Spielberg films = The mean IMDB rating for Burton films (Mean IMDB rating for Spielberg films - Mean IMDB rating for Burton films = 0) Alternative Hypothesis (H1) | The mean IMDB rating for Spielberg films != The mean IMDB rating for Burton films (Mean IMDB rating for Spielberg films - Mean IMDB rating for Burton films != 0 )

Load the data

movies <- read_csv(here::here("data", "movies.csv"))
glimpse(movies)

## Rows: 2,961
## Columns: 11
## $ title               <chr> "Avatar", "Titanic", "Jurassic World", "The Aveng…
## $ genre               <chr> "Action", "Drama", "Action", "Action", "Action", …
## $ director            <chr> "James Cameron", "James Cameron", "Colin Trevorro…
## $ year                <dbl> 2009, 1997, 2015, 2012, 2008, 1999, 1977, 2015, 2…
## $ duration            <dbl> 178, 194, 124, 173, 152, 136, 125, 141, 164, 93, …
## $ gross               <dbl> 7.61e+08, 6.59e+08, 6.52e+08, 6.23e+08, 5.33e+08,…
## $ budget              <dbl> 2.37e+08, 2.00e+08, 1.50e+08, 2.20e+08, 1.85e+08,…
## $ cast_facebook_likes <dbl> 4834, 45223, 8458, 87697, 57802, 37723, 13485, 92…
## $ votes               <dbl> 886204, 793059, 418214, 995415, 1676169, 534658, …
## $ reviews             <dbl> 3777, 2843, 1934, 2425, 5312, 3917, 1752, 1752, 3…
## $ rating              <dbl> 7.9, 7.7, 7.0, 8.1, 9.0, 6.5, 8.7, 7.5, 8.5, 7.2,…

skim(movies)

(#tab:load_movies_data)Data summary

Name	movies
Number of rows	2961
Number of columns	11
_______________________
Column type frequency:
character	3
numeric	8
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
title	1	1	83	2907
genre	1	5	11	17
director	1	3	32	1366

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
year	1	2.00e+03	9.95e+00	1920.0	2.00e+03	2.00e+03	2.01e+03	2.02e+03	▁▁▁▂▇
duration	1	1.10e+02	2.22e+01	37.0	9.50e+01	1.06e+02	1.19e+02	3.30e+02	▃▇▁▁▁
gross	1	5.81e+07	7.25e+07	703.0	1.23e+07	3.47e+07	7.56e+07	7.61e+08	▇▁▁▁▁
budget	1	4.06e+07	4.37e+07	218.0	1.10e+07	2.60e+07	5.50e+07	3.00e+08	▇▂▁▁▁
cast_facebook_likes	1	1.24e+04	2.05e+04	0.0	2.24e+03	4.60e+03	1.69e+04	6.57e+05	▇▁▁▁▁
votes	1	1.09e+05	1.58e+05	5.0	1.99e+04	5.57e+04	1.33e+05	1.69e+06	▇▁▁▁▁
reviews	1	5.03e+02	4.94e+02	2.0	1.99e+02	3.64e+02	6.31e+02	5.31e+03	▇▁▁▁▁
rating	1	6.39e+00	1.05e+00	1.6	5.80e+00	6.50e+00	7.10e+00	9.30e+00	▁▁▆▇▁

# We use the duplicated() function to identify duplicates, and find that there are none
summary(duplicated(movies))

##    Mode   FALSE 
## logical    2961

director_rating <- movies %>% 
  filter(director %in% c("Steven Spielberg","Tim Burton"))

# We summarise data using mean and CI functions
director_rating2 <- director_rating %>%
  group_by(director) %>%
  dplyr::summarise(mean_rating = mean(rating),
            uci_rating = CI(rating)[1],
            lci_rating = CI(rating)[3])

# We use geom_errorbar and geom_react in order to reproduce the graph
ggplot(director_rating2, aes(x = mean_rating, y = reorder(director, mean_rating), colour=director)) +
  geom_point(size=5, show.legend = FALSE)+
  # We use geom_rect to draw overlapping sections
  geom_rect(aes(xmin=uci_rating[2], xmax=lci_rating[1], ymin=-Inf, ymax=Inf), colour = "light grey", alpha = 0.2)+
  
  # We use geom_errorbar to draw CI intervals
  geom_errorbar(aes(xmin = lci_rating, xmax = uci_rating), size=1.3, width=.05, show.legend = FALSE)+
  geom_text(aes(label = round(lci_rating,2), x = lci_rating), size= 4.5, vjust = -1, colour ="black")+
  geom_text(aes(label = round(mean_rating,2), x = mean_rating), size= 6, vjust = -1, colour ="black")+
  geom_text(aes(label = round(uci_rating,2), x = uci_rating), size= 4, vjust = -1, colour ="black")+
  
  labs(x="Mean IMBD Rating", 
       y= "", title= "Do Spielberg and Burton have the same mean IMDB ratings?", 
       subtitle= "95% confidence intervals overlap")+
  theme_bw()+
  NULL

In addition, we run a hypothesis test using both the t.test command and the infer package to simulate from a null distribution, where we assume zero difference between the two.

Hypothesis testing

director_rating <- movies %>%
    filter(director %in% c("Steven Spielberg","Tim Burton")) # select directors
  t.test(rating~director, director_rating)

## 
##  Welch Two Sample t-test
## 
## data:  rating by director
## t = 3, df = 31, p-value = 0.01
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.16 1.13
## sample estimates:
## mean in group Steven Spielberg       mean in group Tim Burton 
##                           7.57                           6.93

First, we use the t.test to calculate the p-Value. We conclude that there is a statistically significant difference in means, since our p-value is 0.01 and hence |p|<a (where a=0.05) (or |t| > 2.00). Therefore, we reject the null hypothesis. Now we run the simulation with the infer package:

set.seed(1234) # We set a specific seed value

# We run a simulation with infer package

director_hypothesis_test<- director_rating %>%
  
  specify(rating ~ director) %>%
  
  # Here, `hypothesize` is used to set the null hypothesis as a test for independence, i.e., that there is no difference between the two population means.
  hypothesise(null="independence") %>% 
  # The `type` argument within generate is set to permute, which is the argument when generating a null distribution for a hypothesis test.
  generate(reps=1000, type="permute") %>% 
  #calculate the observed difference in means with bootstrap
  calculate(stat="diff in means",order = c("Steven Spielberg","Tim Burton"))


diff_rating <- director_rating %>% 
  specify(rating ~ director) %>% 
  calculate(stat = "diff in means", order = c("Steven Spielberg","Tim Burton"))
#get p_Value through infer package 
director_p_value <- director_hypothesis_test %>% 
    get_p_value(obs_stat = diff_rating[1,1], direction = "both")

kbl(director_p_value,col.names=c("p-value"), caption="Simulation-based null distribution using the 'infer' package") %>%
kable_styling()

(#tab:infer_hypothesis_test)Simulation-based null distribution using the ‘infer’ package
p-value
0.008

Again, on the basis of this simulation-based hypothesis test we reject the null hypothesis stating that there is no difference in means of IMDB rating for Spielberg films and for Burton films as the p-Value is smaller than the alpha value (0.05). Therefore there is statistically significant difference between the mean ratings of the films of the two film directors. We visualise the simulation-based null distribution:

#We calculate CI using infer package
director_percentile_ci <- director_hypothesis_test %>%
  get_confidence_interval(level = 0.95, type = "percentile")

#We show a plot with shaded confidence interval using infer package
director_hypothesis_test %>% visualise() +
  
  shade_confidence_interval(endpoints = director_percentile_ci, size=0.4, color= "yellow", fill="light yellow") +
  shade_p_value(obs_stat = diff_rating, direction = "both", size=1.1) +
  
  labs(title = "Are Spielberg and Burton Equally Good Directors?", 
       subtitle="Simulation-based null distribution: the p-value (red) lies well outside the null distribution (yellow), allowing us to reject the null hypothesis", x="Difference in Mean IMDB Rating Between Directors", y="Count")+theme_fivethirtyeight() + theme(axis.title = element_text()) + scale_x_continuous(breaks=seq(-1.0, 1.0, by = 0.1), limits=c(-1.0,1.0))

In the graph it is shown that the p-value is outside of the 95% confidence interval (alpha = 0.05). We therefore reject the null hypothesis (H0) that there is no difference in the mean IMDB ratings of the films of Tim Burton and Steven Spielberg.