This blog entry aims to estimate the strengths of NFL teams for the last 20 regular seasons in a way that allows one to make predictions not only on the outcome of a game, i.e. win or loss, but also on the difference in points scored. The main idea behind this model follows Mark Glickman’s (2001) paper, which proposes the blueprint for the Glicko-2 rating system. Multiple commonly used rating systems already exist, like the well-known Elo or Glicko rating systems, which are also available in the PlayerRatings package on CRAN. The model behind the Glicko-2 rating system seems to be one of the most advanced dynamic paired comparison models. Unfortunately there exists no implementation in R (at least not on CRAN). Moreover, this model only uses the binary outcome of a game and does not exploit differences in scores. So I went ahead, implemented the models proposed in Glickman (2001) in R using the rstan package, and then extended the model. I will start with the data, then propose the model and its implementation. Finally I will present the results and make predictions for Super Bowl 50.

The Data

library(data.table)
library(knitr)
library(ggplot2)
library(rstan)

# change locale for strptime function since
# raw data use abbreviated month names in the English locale
Sys.setlocale(category = "LC_ALL", locale = "en_US.utf8")

# load function for data processing
source("../../nfl_data_processing.R")
nfl.data <- GetNflData(as.in.paper = FALSE, file = "../../nfl_games_96_15.csv")

The following table shows the first and the last five records on NFL games played from 1996 to 2015 from the full dataset nfl.data$nfl.games, which contains 5041 decisive regular season games. winner and loser denote the winning and losing teams respectively, ptsw and ptsl are the respective points scored. score_delta contains the difference in scores, is_home takes the value 1 if the winning team played on its home field and -1 otherwise. period numbers the seasons sequentially from 1 (1996) to 20 (2015). date and play_week contain additional information on when the games took place. Note: The Houston Oilers and Tennessee Oilers were renamed to the Tennessee Titans. Furthermore, not all teams were part of the NFL in every year, e.g. the Houston Texans joined the NFL in 2002 and are observed from then on.

winner loser date play_week ptsw ptsl score_delta is_home period
Indianapolis Colts Arizona Cardinals 1996-09-01 1 20 13 7 1 1
Carolina Panthers Atlanta Falcons 1996-09-01 1 29 6 23 1 1
St. Louis Rams Cincinnati Bengals 1996-09-01 1 26 16 10 1 1
Minnesota Vikings Detroit Lions 1996-09-01 1 17 13 4 1 1
Miami Dolphins New England Patriots 1996-09-01 1 24 10 14 1 1
Detroit Lions San Francisco 49ers 2015-12-27 16 32 17 15 1 20
St. Louis Rams Seattle Seahawks 2015-12-27 16 23 17 6 -1 20
Chicago Bears Tampa Bay Buccaneers 2015-12-27 16 26 21 5 -1 20
Houston Texans Tennessee Titans 2015-12-27 16 34 6 28 -1 20
Denver Broncos Cincinnati Bengals 2015-12-28 16 20 17 3 1 20
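
For reference, a table like this one can be produced directly from the processed data, e.g. with knitr’s kable (a minimal sketch; the actual table-generating code is not shown):

# first and last five games of the full dataset
kable(rbind(head(nfl.data$nfl.games, 5), tail(nfl.data$nfl.games, 5)))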

The second table shows 5 of the 32 unique teams that were part of the NFL in any period, together with a sequential team_id that will later serve as an index into the arrays storing strengths for all teams. The model will later assume that all teams existed in every period, even if this is not the case. The strength of a team in periods without data then depends on the prior set for the teams’ strengths. The results will later show a very high uncertainty for these teams’ strengths in periods without data, since the priors are chosen to be quite uninformative.

team_name team_id
Arizona Cardinals 1
Atlanta Falcons 2
Baltimore Ravens 3
Buffalo Bills 4
Carolina Panthers 5
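
The mapping itself is created inside GetNflData; the following is only a sketch of how such a sequential index can be built from the games table, assuming the column names shown above:

# collect all team names appearing as winner or loser and number them
teams <- data.table(team_name = sort(unique(c(nfl.data$nfl.games$winner,
                                              nfl.data$nfl.games$loser))))
teams[, team_id := .I]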

The following bar chart shows the frequency of the absolute value of the score difference - a count variable - for all observed games. It is clearly visible that this distribution does not follow a smooth curve. This is because of the NFL’s scoring scheme: score differences are typically sums of touchdowns (7 points including the extra point) and field goals (3 points).

plot of chunk score_differences
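
A minimal ggplot2 sketch that produces a chart of this kind from the games table (the original plotting code is not shown; score_delta is always positive since the winner is listed first):

# frequency of the absolute score difference over all games
ggplot(nfl.data$nfl.games, aes(x = score_delta)) +
  geom_bar() +
  xlab("absolute score difference") + ylab("number of games")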

I will therefore propose an ordered logistic model instead of a classical Poisson distribution to model the scores, since this allows for a flexible distribution without introducing many additional parameters. I will also censor the score differences at values of 30 and -30 respectively, so that all values greater than 30 (lower than -30) are grouped into one bin, whereas every other value forms a bin of its own. The result is an ordered variable with 31 bins, leading to an additional 30 parameters to be estimated compared to a Poisson model, or to a Bernoulli model that only takes the binary event of win or loss into account. Note: In this case I create a single bin for every score difference up to 30. This is possible since data exist in every bin. If this were not the case, one could choose different bins, as long as these bins span consecutive score differences so that the resulting variable keeps its ordinal scale.

The Model Basics

The model’s main assumption is that a game’s outcome depends solely on the opponents’ strengths $\gamma_i$ and $\gamma_j$ and a set of team-specific covariates $x$ that affect the score via $\beta$, e.g. the effect of playing on the home field. The teams’ strengths are unobserved (latent) variables. The model further assumes that the likelihood of team $i$ defeating team $j$ and the margin by which team $i$ wins - the score difference - rise monotonically in the difference of strengths (and covariates):

$$P(y = 1) = F(\gamma_i - \gamma_j + \beta x)$$

Note: This section is very condensed and sacrifices some mathematical precision for the sake of brevity. The code in the implementation section will reveal the precise definitions of all variables.

Where $y$ takes the value one if team $i$ won and zero if team $j$ won. $d$ is an integer value containing the censored difference in scores as described above, e.g. if team $i$ wins with a scoreline of 24:7 then $d$ takes the value 17. If team $i$ loses 10:50 this variable takes the value -31, since it is censored for differences exceeding 30. So if team $i$ wins, $d$ is a positive integer, otherwise it is a negative integer. $F$ is a (strictly) increasing function, e.g. the logistic function. Thus, by observing multiple game outcomes between pairs of teams at a time, it is possible to infer the teams’ strengths and the effects of covariates. The final state-space model described in the following also introduces time periods, i.e. seasons, autocorrelation between these periods and stochastic variances.

Ordered Logistic Stochastic Variance Model

Let $z = \gamma_i - \gamma_j + \beta x + \epsilon$, where $\epsilon$ is an arbitrary error term that follows a standard logistic distribution. Let $d$ be a symmetrical top and bottom censored version of the uncensored difference in scores $\tilde{d}$ with an arbitrary censoring limit $c$ (here $c = 30$). This reduces the length of the parameter vector to be estimated in the case of extreme score differences, and can be necessary for model identification.

This yields the following probability statements, with $F$ being the logistic function and $\mu = \gamma_i - \gamma_j + \beta x$. Because of the symmetry imposed in the definition of $d$ above, and because the logistic function is used for transforming to the ordinal scale, the estimated values of $\gamma$ and $\beta$ can directly be compared to Glickman’s models, which only use the binary outcome variable $y$:

$$P(y = 1) = P(d > 0) = F(\mu)$$
$$P(d = k) = F(\eta_k - \mu) - F(\eta_{k-1} - \mu), \qquad k = 1, \dots, c + 1$$

with cutoffs $0 = \eta_0 < \eta_1 < \dots < \eta_c$ and $\eta_{c+1} = \infty$; by symmetry, $P(d = -k)$ equals $P(d = k)$ with $\mu$ replaced by $-\mu$.

In the following equations $N(\mu, \sigma)$ denotes a normal distribution with mean $\mu$ and standard deviation $\sigma$, $\mathrm{LogNormal}(\mu, \sigma)$ a log-normal distribution with mean $\mu$ and standard deviation $\sigma$ on the log scale, and $\mathrm{InvGamma}(\alpha, \beta)$ an inverse gamma distribution with shape $\alpha$ and scale $\beta$. $\gamma_f$ is the vector of strengths for the first teams, $\gamma_s$ the vector for the second teams. $x$ is the vector of the covariate for the home advantage, which equals 1 if the first team plays on the home field and -1 otherwise. Note that $\gamma_f$, $\gamma_s$ and $x$ are vectors of the length of all matches observed, containing the appropriate parameter at each position; this leads to multiple occurrences of the same parameter within these vectors. $\rho$ is the autocorrelation parameter, $\beta$ the parameter for the home field advantage and $\omega$ the initial standard deviation in period zero. $\tau$ is the parameter for the change in variances and responsible for possible sudden shifts in teams’ strengths between periods. $\gamma_t$ is a vector containing the teams’ strengths in period $t$; likewise $\sigma^2_t$ is a vector containing the teams’ variances in period $t$. Note that these vectors have the length of the number of unique teams, as opposed to $\gamma_f$, $\gamma_s$ and $x$, which have the length of all observed games. The full model reads:

$$d \sim \mathrm{OrderedLogistic}(\gamma_f - \gamma_s + \beta x,\ \eta)$$
$$\gamma_0 \sim N(0, \omega), \qquad \gamma_t \sim N(\rho\,\gamma_{t-1}, \sigma_t) \quad \text{for } t = 1, \dots, T$$
$$\sigma^2_1 \sim \mathrm{LogNormal}(\log \omega^2, \tau), \qquad \sigma^2_t \sim \mathrm{LogNormal}(\log \sigma^2_{t-1}, \tau) \quad \text{for } t = 2, \dots, T$$
$$\omega^2 \sim \mathrm{InvGamma}(4, 2), \qquad \tau \sim \mathrm{InvGamma}(4, 1.5), \qquad \beta \sim N(0, 25)$$

Some important implications of this model:

  • A team’s strength tends to increase when it wins against opponents:
    • The higher the score margin the greater the increase
    • The better the opponent the higher the increase
    • All of these also hold vice versa
  • Teams’ strengths and variances remain constant within a period, i.e. a season, and vary only between seasons, so the order of matches within the regular season does not have any impact.
  • The autocorrelation parameter $\rho < 1$ introduces regression to the mean, i.e. it is unlikely that strong teams in one season get even stronger in subsequent seasons (with $\rho \approx 0.62$ as estimated below, a team of strength 1.0 is expected at 0.62 in the following season). In reality the NFL’s draft system, which benefits the weak teams of the previous season, (at least partially) works in the same direction.
  • Sudden shifts in teams’ strengths between periods are nevertheless possible (via $\tau$)

The Model Implementation and Estimation

The model is implemented using the rstan package for R, which provides an interface to the Stan language. Stan offers a rich and very flexible scripting language for Bayesian models. It is similar to R in many ways and features easy-to-use sampling statements. Stan’s main benefits are that it is very flexible, uses the efficient NUTS algorithm for MCMC sampling and compiles the model code to C++, which makes it fast. All of this comes at the cost of a separate language with stricter typing than the typical R user (or at least I) is used to. The model code for Stan is stored in the file nfl_model_sv_delta.stan and looks as follows:

data {
  int<lower=1> T;  // total number of time periods
  int<lower=0> Nt[T+1];  // cumulative number of games per period
  int<lower=1> N;  // number of total games
  int<lower=1> P;  // number of unique players
  int<lower=2> K;  // number of total bins for the score delta
  int<lower=1, upper=K> d[N];  // vector of the score deltas
  vector[N] x;  // vector of home field dummy variable
  int f[N];  // vector of ids for team i
  int s[N];  // vector of ids for team j
}
parameters {
  vector[P] g[T+1];  // array of teams strength vectors for all periods
  vector<lower=0>[P] sigma_sq[T];  // array of teams variance vectors for all periods
  real<lower=0> omega_sq;  // variance of teams strengths in t=0
  real<lower=0> tau;  // parameter responsible for shifts in strength between periods
  real beta;  // parameter effect of playing on the home field
  real<lower=0, upper=1> rho;  // autoregression parameter
  positive_ordered[K-1] eta;  // cutoff parameters for ordered logistic model
}
transformed parameters {
  // restrict first parameter of eta to zero for symmetry:
  // this forms a base category from -infty to zero that
  // is not covered by any data. This is necessary since
  // Stan does not allow to modify the very first cutoff
  // which is set to -infty for convention.
  positive_ordered[K-1] eta_zero;
  eta_zero[1] <- 0;
  for (e in 2:(K-1)) {
    eta_zero[e] <- eta[e];
  }
}
model {
  // create initial vectors
  vector[N] gf;
  vector[N] gs;
  vector[P] sigma[T];
  for (t in 1:T) {
    for (i in (1+Nt[t]):Nt[t+1]) {
      gf[i] <- g[t+1][f[i]];
      gs[i] <- g[t+1][s[i]];
    }
    for (p in 1:P) {
      sigma[t][p] <- sqrt(sigma_sq[t][p]);
    }
  }

  // uninformative priors
  omega_sq ~ inv_gamma(4, 2);
  tau ~ inv_gamma(4, 1.5);
  beta ~ normal(0, 25);

  // teams strengths
  g[1] ~ normal(0, sqrt(omega_sq));
  for (t in 1:T) {
    g[t+1] ~ normal(rho * g[t], sigma[t]);
  }

  // teams variances
  sigma_sq[1] ~ lognormal(log(omega_sq), tau);
  for (t in 2:T) {
    sigma_sq[t] ~ lognormal(log(sigma_sq[t-1]), tau);
  }

  // ordered logistic model
  for (n in 1:N) {
    d[n] ~ ordered_logistic(gf[n] - gs[n] + beta * x[n], eta_zero);
  }
}

Calling the estimation procedure from R is pretty straightforward once the data are transformed to match the variables defined in the data block of the above code. In this case that boils down to a simple list of vectors and scalars using the names defined in the data block.

# detect cores and tell rstan for parallel computing
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

# create bins for the ordered logit
# this will censor all differences at a maximum of 30
bins <- 1:30

# create the censored variable delta
nfl.data$stan.data$d <- rowSums(sapply(bins, 
                                       function(x) {
                                         nfl.data$stan.data$yd > x
                                       })) + 1

# check if there are observations for all bins
# if not this would lead to an identification problem
sum(unique(nfl.data$stan.data$d)) == sum(1:max(bins + 1))

# modify data for the built-in stan ordered logistic function
# this is necessary to form an unused (i.e. no data) and fixed
# bin 1 that ranges from -infty to zero (see the eta_zero modification
# in the stan code; reason: Stan does not allow to set the very first
# cutoff, which is by convention -infty)
nfl.data$stan.data$d <- nfl.data$stan.data$d + 1
nfl.data$stan.data$K <- max(nfl.data$stan.data$d)

Here is an overview of the variables that are passed to Stan. This should be pretty clear, except maybe for Nt, which contains the cumulative number of games played per period, or put differently, the cutoffs for the periods in the table of all games shown above. E.g. the first period is formed by observations 1 to 240, the second by 241 to 478, and so on. Note: The dataset must be ordered by the period variable in ascending order for this to make sense. Also note that the score difference is always positive; this is because the table is sorted so that the winning team always comes first, which is not a restriction because of the model’s symmetry. f and s contain the indices of the two teams facing each other in a game, as found in the column team_id of the table of teams above; these are necessary to build the vectors $\gamma_f$ and $\gamma_s$. Note that yd is the vector of raw score differences and d is the censored version increased by one; see the code and the comments for why the increase by one is necessary for purely technical reasons.

str(nfl.data$stan.data)
## List of 11
##  $ T : int 20
##  $ Nt: int [1:21] 0 240 478 718 966 1214 1462 1717 1973 2229 ...
##  $ N : int 5041
##  $ P : int 32
##  $ y : int [1:5041] 1 1 1 1 1 1 1 1 1 1 ...
##  $ yd: int [1:5041] 7 23 10 4 14 16 3 25 5 15 ...
##  $ x : num [1:5041] 1 1 1 1 1 1 -1 1 1 1 ...
##  $ f : int [1:5041] 14 5 29 18 17 27 4 10 3 15 ...
##  $ s : int [1:5041] 1 2 7 11 19 20 21 22 23 25 ...
##  $ d : num [1:5041] 8 24 11 5 15 17 4 26 6 16 ...
##  $ K : num 32
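
Nt can be computed from the period variable in one line, assuming the games are sorted by period as noted above (a sketch; GetNflData contains the actual code):

# cumulative number of games per period, with a leading zero
Nt <- c(0, cumsum(table(nfl.data$nfl.games$period)))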

The call to compile and run the model is also straightforward. I usually run four independent chains (chains) that can be calculated in parallel. I find that 20k iterations in total (5k per chain, iter) are more than enough; stationarity in the traceplots is reached after fewer than 100 iterations. All post-warmup iterations are stored (thin = 1) and used for inference. The calculation takes about 15 minutes on my Intel Xeon E3-1231v3 CPU, including model compilation.

# run stan model
nfl.est.ologit <- stan(file = "../../nfl_model_sv_delta.stan", 
                       data = nfl.data$stan.data, 
                       chains = 4, iter = 5000, 
                       warmup = 1000, seed = 7, thin = 1)
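
The stationarity claim above can be checked with rstan’s traceplot function, for example for the scalar parameters:

# traceplots of the scalar parameters across all four chains
traceplot(nfl.est.ologit, pars = c("beta", "rho", "tau", "omega_sq"))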

Estimation Results

Finally, this section presents the results of the parameter inference in the proposed model. The function GetNflSummary transforms Stan’s output (i.e. the simulated values of all parameters) into a set of reasonable summaries. It basically calculates posterior means and pointwise 95 percent credible intervals for the parameters of interest.

# logistic helper function
ll <- function(u) {
  1 / (1 + exp(-u))
}

# load function to generate model summaries
source("../../nfl_model_diagnose.R")
nfl.summary <- GetNflSummary(nfl.data, nfl.est.ologit)
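
For a single parameter, summaries of the kind GetNflSummary produces can also be computed directly from the fitted object, e.g. for beta:

# posterior mean and pointwise 95 percent interval for beta
beta.draws <- extract(nfl.est.ologit, pars = "beta")$beta
c(mean = mean(beta.draws), quantile(beta.draws, c(0.025, 0.5, 0.975)))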

First, the results for the scalar parameters $\omega^2$, $\tau$, $\beta$ and $\rho$ with pointwise 95 percent credible intervals. The home field effect $\beta$ is estimated at 0.37 with a pretty tight interval. This means that in a game between teams of equal strength, the home team is expected to win with a probability of 59.03 percent. The value of $\tau$ is quite small, indicating that sudden shifts in teams’ strengths are quite unlikely. $\rho$ is clearly below one, indicating a significant regression to the mean.

parameter mean 2.5% 50% 97.5%
omega_sq 0.3322 0.2572 0.3302 0.4193
tau 0.1553 0.0994 0.1525 0.2297
beta 0.3652 0.3164 0.3651 0.4142
rho 0.6185 0.5229 0.6196 0.7083
lp__ -17595.6251 -17844.2981 -17598.8971 -17333.1110
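
The 59.03 percent mentioned above follows directly from the logistic helper function applied to the posterior mean of beta:

# win probability of the home team when both teams are equally strong
ll(0.3652)
## [1] 0.5903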

The next chart displays the teams’ strengths over time, with the corresponding 95 percent credible intervals depicted as dotted lines.

plot of chunk stan_merit_plot
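
The plotted paths can be pulled from the posterior draws; a minimal sketch for a single team (team_id 13, the Houston Texans, from the table of teams above):

# array of posterior draws of g: iterations x (T+1) x P
g.draws <- extract(nfl.est.ologit, pars = "g")$g
# posterior mean strength of the Texans for periods 0 to 20
texans.path <- apply(g.draws[, , 13], 2, mean)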

One can see especially large credible intervals for the Houston Texans up to period 7. This is because there is no data on this team before period 7 (i.e. season 2002), so the model assumes a strength for that team based on the prior, with a large variance since there is no evidence on the team’s strength until this period. The chart shows that the Arizona Cardinals improved quite substantially over the last years. The San Francisco 49ers faced a downturn in strength over the last two years after a peak in periods 17 and 18 (the 2012 and 2013 seasons). The New England Patriots appear to have been strong in every one of the past 10 seasons. The St. Louis Rams show the most volatility in strength over the last 20 seasons, with a peak around period 5 (season 2000) and some low performances around period 13 (season 2008). The model estimates the Seattle Seahawks and the Kansas City Chiefs to have been particularly strong within the last three seasons. And finally an overview of the teams’ strengths gamma in the 2015 regular season, including posterior standard deviations gamma_std_dev of the estimated strength:

team_name team_id gamma gamma_std_dev
Arizona Cardinals 1 1.2042 0.4424
New England Patriots 19 1.0832 0.4066
Seattle Seahawks 28 1.0170 0.4069
Cincinnati Bengals 7 1.0075 0.3941
Carolina Panthers 5 0.9941 0.4083
Kansas City Chiefs 16 0.8763 0.4186
Denver Broncos 10 0.7399 0.3804
Pittsburgh Steelers 25 0.7215 0.3722
Green Bay Packers 12 0.6364 0.3887
Minnesota Vikings 18 0.5655 0.4040
New York Jets 22 0.1825 0.3640
Buffalo Bills 4 0.1313 0.3365
Houston Texans 13 0.1304 0.3660
Detroit Lions 11 -0.0473 0.3738
Philadelphia Eagles 24 -0.0892 0.3963
St. Louis Rams 29 -0.0986 0.3911
Baltimore Ravens 3 -0.1712 0.3535
New York Giants 21 -0.1796 0.3451
San Diego Chargers 26 -0.2211 0.3500
Oakland Raiders 23 -0.2436 0.3629
Atlanta Falcons 2 -0.2606 0.3602
Washington Redskins 32 -0.2683 0.3572
Indianapolis Colts 14 -0.3187 0.3975
Chicago Bears 6 -0.3514 0.3576
New Orleans Saints 20 -0.3618 0.3862
Dallas Cowboys 9 -0.4389 0.3767
Miami Dolphins 17 -0.4603 0.3743
San Francisco 49ers 27 -0.7056 0.4055
Tampa Bay Buccaneers 30 -0.7229 0.3789
Cleveland Browns 8 -0.7354 0.3658
Jacksonville Jaguars 15 -0.9065 0.3683
Tennessee Titans 31 -1.1318 0.3948

The model finds the Arizona Cardinals to be the best team of the 2015 season, closely followed by the New England Patriots; then come the Seattle Seahawks and the Carolina Panthers with almost equal strength. This season’s Super Bowl contenders are estimated to be the fourth (Carolina Panthers) and seventh (Denver Broncos) strongest teams of the 2015 regular season. Also note that the uncertainty on the teams’ strengths is quite large, between about 0.3 and 0.4 for all teams, so it is not possible to say that one of the named teams is statistically significantly stronger or weaker than another.

Predictions for Super Bowl 50

The model allows predictions not only of which team is likely to win or lose but also of the margin of victory. For the margin the estimated cutoff parameters $\eta$ are important, since they cut the continuous latent predictor into the observed bins of scorelines. The estimated pointwise posterior credible intervals are very narrow, indicating that the model can estimate these parameters precisely. Note: The parameter displayed as eta_0 in the table is not an estimate but is fixed for symmetry.

parameter mean 2.5% 50% 97.5%
eta_0 0.0000 0.0000 0.0000 0.0000
eta_1 0.1014 0.0884 0.1012 0.1153
eta_2 0.1943 0.1762 0.1942 0.2135
eta_3 0.5810 0.5510 0.5809 0.6122
eta_4 0.7127 0.6795 0.7125 0.7471
eta_5 0.7998 0.7643 0.7997 0.8365
eta_6 0.9511 0.9128 0.9509 0.9905
eta_7 1.2259 1.1821 1.2259 1.2701
eta_8 1.3312 1.2853 1.3313 1.3775
eta_9 1.3825 1.3356 1.3826 1.4301
eta_10 1.5887 1.5376 1.5886 1.6406
eta_11 1.6830 1.6309 1.6827 1.7366
eta_12 1.7373 1.6841 1.7370 1.7916
eta_13 1.8581 1.8020 1.8580 1.9150
eta_14 2.0551 1.9945 2.0550 2.1164
eta_15 2.1210 2.0589 2.1209 2.1844
eta_16 2.2181 2.1539 2.2181 2.2823
eta_17 2.3883 2.3204 2.3880 2.4572
eta_18 2.5172 2.4468 2.5169 2.5894
eta_19 2.5911 2.5188 2.5911 2.6652
eta_20 2.7166 2.6408 2.7166 2.7929
eta_21 2.9322 2.8496 2.9322 3.0158
eta_22 3.0164 2.9304 3.0163 3.1029
eta_23 3.1058 3.0174 3.1056 3.1944
eta_24 3.3167 3.2203 3.3164 3.4151
eta_25 3.4431 3.3429 3.4428 3.5453
eta_26 3.5277 3.4232 3.5274 3.6333
eta_27 3.7408 3.6298 3.7404 3.8548
eta_28 4.0311 3.9089 4.0302 4.1567
eta_29 4.1355 4.0080 4.1353 4.2672
eta_30 4.2508 4.1153 4.2502 4.3889
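
With these cutoffs, the probability of any specific winning margin follows from the probability statements above. A plug-in sketch using the posterior means (thereby ignoring posterior uncertainty), e.g. the Panthers beating the Broncos by exactly 3 points on a neutral field:

# eta_0 to eta_3 (posterior means from the table above)
eta <- c(0, 0.1014, 0.1943, 0.5810)
# latent predictor: gamma_CAR - gamma_DEN, neutral field (x = 0)
mu <- 0.9941 - 0.7399
# P(score difference == 3) = F(eta_3 - mu) - F(eta_2 - mu)
ll(eta[4] - mu) - ll(eta[3] - mu)
## [1] 0.096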

The probability statements from above make it easy to calculate probabilities of score differences for given teams. Here is a graph of the probabilities for Super Bowl 50 as seen from the Carolina Panthers’ perspective: a positive score difference, depicted as the green distribution, means that the Carolina Panthers win the Super Bowl.

plot of chunk prediction

The probability of the Panthers winning the Super Bowl, $P(d > 0)$, is predicted to be 56.32 percent; conversely, the Denver Broncos win with a probability of 43.68 percent. Through the imposed symmetry constraints on the cutoff parameters, the distribution of score differences is the same from both teams’ perspectives. May the Super Bowl be as close as predicted!
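
This headline probability can be reproduced from the binary probability statement with the posterior mean strengths from the 2015 table above (again a plug-in sketch based on posterior means):

# P(Panthers win) = F(gamma_CAR - gamma_DEN) on a neutral field
ll(0.9941 - 0.7399)
## [1] 0.5632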

Final Remark

Thank you for reading through this lengthy blog entry! The full code is already available on GitHub in my private repository, but needs a bit of cleaning and will then be made available to everyone who is interested. Please let me know what you think, either in the comments or by contacting me directly.

References & Resources

  • Glickman, Mark E. (2001): Dynamic paired comparison models with stochastic variances. Journal of Applied Statistics, 28(6), pp. 673–689.
  • Stan Development Team (2015): Modeling Language User’s Guide and Reference Manual, v2.9.0. http://mc-stan.org/documentation/.
  • Pro Football Reference (2015): NFL Data for Seasons 1996–2015. http://www.pro-football-reference.com/boxscores.

Software Used

sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 15.10
## 
## locale:
##  [1] LC_CTYPE=en_US.utf8        LC_NUMERIC=C              
##  [3] LC_TIME=en_US.utf8         LC_COLLATE=en_US.utf8     
##  [5] LC_MONETARY=en_US.utf8     LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets 
## [7] base     
## 
## other attached packages:
## [1] rstan_2.8.2      ggplot2_1.0.1    knitr_1.12.3    
## [4] data.table_1.9.4
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.2      servr_0.2        MASS_7.3-44     
##  [4] munsell_0.4.2    colorspace_1.2-6 stringr_0.6.2   
##  [7] highr_0.5.1      plyr_1.8.1       tools_3.2.3     
## [10] parallel_3.2.3   grid_3.2.3       gtable_0.1.2    
## [13] digest_0.6.8     gridExtra_2.0.0  reshape2_1.4.1  
## [16] formatR_1.2.1    codetools_0.2-14 inline_0.3.14   
## [19] evaluate_0.8     labeling_0.3     scales_0.2.4    
## [22] stats4_3.2.3     httpuv_1.3.3     chron_2.3-45    
## [25] proto_0.3-10