This blog entry aims to estimate the strengths of NFL teams for the last 20 regular seasons in a way that allows to make predictions not only on the outcome of a game, i.e. win or loss but also exploiting the difference in points scored. The main idea behind this model follows Mark Glickman’s (2001) paper which proposes the blueprint for the Glicko-2 rating system. Multiple commonly used rating systemes alreday exist like the well known ELO or Glicko rating systems which are also available in the PlayerRatings package on CRAN. The model behind the Glicko-2 rating system seems to be one of the most advanced dynamic paired comparison models. Unfortunately there does not exist an implementation in R (at least not on CRAN). Moreover this model only uses the binary outome of a game and does not exploit differences in scores. So i went ahead and implemented the models proposed in Glickman (2001) in R using the rstan package and then extended the model. I will start with the data first, then propose the model and its implementation. Finally i will present the results and make predictions for Super Bowl 50.
The following table shows the first and the last five records on nfl games played from 1996 to the 2015 from the full dataset
nfl.data$nfl.games containing 5041 decisive regular season games. winner and loser denote the winning and losing teams respectively, ptsw and ptsl are the respective points scored. score_delta contains the difference in scores, is_home takes the value 1 if the winning team played on the home field and -1 otherwise. period numbers the seasons sequentially form 1 (1996) to 20 (2015). date and play_week contain additional information on when the games took place. Note: The Houston Oilers and Tennessee Oilers were renamed to Tennessee Titans. Furthermore not all teams were part of the NFL in each of the years, e.g. the Houston Texans joined the NFL in 2002 and are observed from then on.
|Indianapolis Colts||Arizona Cardinals||1996-09-01||1||20||13||7||1||1|
|Carolina Panthers||Atlanta Falcons||1996-09-01||1||29||6||23||1||1|
|St. Louis Rams||Cincinnati Bengals||1996-09-01||1||26||16||10||1||1|
|Minnesota Vikings||Detroit Lions||1996-09-01||1||17||13||4||1||1|
|Miami Dolphins||New England Patriots||1996-09-01||1||24||10||14||1||1|
|Detroit Lions||San Francisco 49ers||2015-12-27||16||32||17||15||1||20|
|St. Louis Rams||Seattle Seahawks||2015-12-27||16||23||17||6||-1||20|
|Chicago Bears||Tampa Bay Buccaneers||2015-12-27||16||26||21||5||-1||20|
|Houston Texans||Tennessee Titans||2015-12-27||16||34||6||28||-1||20|
|Denver Broncos||Cincinnati Bengals||2015-12-28||16||20||17||3||1||20|
The second table shows 5 of the 32 unique teams that were part of the NFL at any period and a sequential team_id which will later serve as an index variable into arrays that store strenghts for all teams. The model will later assume that all teams have existed at every period even if this is not the case. The stength of teams in periods with no data will then depend on the prior set for the teams’ strengths. The results will later show a very high uncertainty on these teams’ strength in periods with no data, since the priors are chosen quite uninformative.
The following barchart shows the frequency of the absolute value of the score difference - a count variable - for all observed games. It is clearly visible that this distribution does not follow a smooth curve. This is beacuse of the NFL’s scoring scheme. The achieved results are most likely sums of multiples of touchdowns with the extra point yielding 7 points and field goals yielding 3 points.
I will therefore propose an ordered logistic model instead of a classical poisson distribution to model the scores since this allows for a felxible dirstibution without the introduction of many additional parameters. I will also censor the variable of score differences at a value of 30 and -30 respectively so that all values greater (lower) than 30 (-30) are grouped into one bin, whereas all lower values form a bin of their own. The result is an orderd variable with 31 bins leading to an additional 30 parameters to be estimated compared to a poisson model or bernoulli model when only taking the binary event of win or loss into account.Note: In this case i will create a single bin for every single score difference lower than 31. This is possible since data exist in every bin. If this was not the case one could choose different bins as long as these bins span consecutive score differences and thus the resulting variable keeps the ordinal scale.
The Model Basics
The models main assumption is that a games outcome depends solely on the opponents strengths and a set of team specific covariates and their effect on the score via , e.g. the effect of playing on the home field. The teams strengths are unobserved (latent) variables. The model further assumes that the likelihood of team defeating team and the margin by which team wins - the score difference - rises monotonically in the difference of strenghts (and covariates): Note: This section is very condensed and sacrifices some mathematical precision for the sake of brevity. The code in the implementation section will reveal the precise definitions of all variables.
Where takes the value one if team won and zero if team won. is an integer value containing the censored difference in scores as described above, e.g. if team wins with a scoreline of 24:7 then takes the value 18. If team loses 10:50 this variable takes the value -31, since it will be censored for differences exceeding 30. So if team wins this is a positive integer, else it is a negative integer. is a (strictly) increasing function, e.g. the logistic function. Thus by observing multiple game outcomes between two teams at a time it is possible to infer teams’ strengths and the effects of covariates. The final state-space model described in the following also introduces time periods, i.e. seasons, autocorrelation between these periods and stochastic variances.
Orderd Logistic Stochastic Variance Model
Let where is an arbitrary error term that follows a standard logistic distribution. Let be a symetrical top and bottom censored version of the uncensored difference in scores with an arbitrary censoring limit . This will reduce the length of the parameter vector to be estimated in the case of extreme score differences, and can be necessary for model identification.
This yiels the following probability statements, with being the logistic function. Because of the symmerty imposed in the definition of above and by using the logsitic function for transforming to the ordinal scale the estimated values of can directly be compared to Glickmans models which only use the binary outcome variable :
In the further equations denotes a normal distribution with mean and standard deviation . denotes a log normal distribution with mean and standard deviation on the log scale. denotes an inverse gamma distribution with shape and scale . is the vector of strengths for the first teams, is the vector for the second teams. is the vector of the covariate for the home advantage, which equals 1 if team plays on the home field and -1 otherwise.Note that here are vectors the length of all matches observed containing the appropriate parameter at each position. This leads to multiple occurences of the same parameter within the vector. This also holds for the covariate vector . is the autocorrelation parameter, the parameter for the home field advantage and is the inital standard deviation in period zero. is the parameter for the change in variances and responsible for possible sudden shifts in teams’ strengths between periods. is a vector containing the teams’ stengths in period . Likewise is a vector containing the teams’ variances in period .Note that these vectors have the length of unique teams as opposed to that have the length of all observed games.
Some important implications of this model:
- The teams’ strength increases in tendency if it wins against opponents:
- The higher the score margin the greater the increase
- The better the opponent the higher the increase
- All of these also hold vice versa
- Teams’ strenghts and variances remain constant within a period, i.e. season and vary only between seasons, so the order of matches within the regular season does not have any impact.
- introduces regression to the mean, i.e. it is unlikely that strong teams in one season get even stronger in subsequent seasons This is in reality (at least partially) prevented through the NFL’s draft system that benefits weak teams of season.
- Though sudden shifts in teams’ strengths between periods are allowed (via )
The Model implementation and Estimation
The model is implemented using the rstan package for R which provides an interface to the Stan language. The Stan language offers a rich and very felxible scripting language for bayesian models. It is similar to R in many ways and features easy to use sampling statements. Stan’s main benefits are that it is very flexible, uses the very efficient NUTS algorithm for MCMC sampling and compiles the model code in C++ which makes it fast. All of this comes at the cost of an own language with stronger typing as the typical R user (or at least me) is used to. The model code for Stan is stored in the file
nfl_model_sv_delta.stan and looks as follows:
Calling the estimation procedure from R is pretty straightforward once the data is transformed in R to create the variables defined in the
data part of the above code. This boils in this case down to a simple list of vectors and scalars using the names as defined in the
Here is an overview for the variables that are passed over to stan. This should be pretty clear except maybe for
Nt which contains the cumulative sum of games played per period or put otherwise the cutoffs for the periods in the table of all games as shown above. E.g. the first period is formed by observations 1 to 240, the second from 241 to 478, and so on. Note: The dataset must be orderd by the period variable in ascending order for this to make sense. Also note that the score is always positive. This is because the table is sorted in a way that the winning team is always set at the first position. This is not a restriction because of the model’s symmetry.
d contain the indices of teams and facing each other in a game, that can be found in the above table of teams in the column team_id. These are necessary to create the vectors and .Note that
yd is the vector of real differences and
d is the censored version increased by one. See the code and the comments as why the increase by one is necessary for pure technical reasons.
The call to compile and run the model is also straightforward. I usually use four independent chains (
chains) that can be calucated in parallel. I find that 20k (5k per chain (
iter)) iterations are way enough and stationarity in the traceplots is reached after less than 100 iterations. All of the post warmup iterations are stored (
thin = 1) and used for inference. Calculation takes about 15 minutes on my Intel Xeon E3-1231v3 CPU including model compilation.
Finally this section deals with the results of the parameter inference in the proposed model. The function
GetNflSummary transforms Stan’s output (i.e. the siumulated values for all parameters) into a set of reasonable summaries. It basically calculates posterior means and pointwise 95 percent confidence intervalls on the parameters of interest.
At first the results for the scalar parameters , , and with pointwise 95 percent confidence intervalls. The home field effect is estimated to be at 0.37 with a pretty tight confidence intervall. This means a game between teams with equal strength leads to a 59.03 percent chance of the home team winning in expectation. The parameter value of is quite small indicating that sudden shifts in teams’ strengths are quite unlikely. indicates a siginficant regession to the mean.
The next chart displays the teams’ strengths over time with the corresponding 95 percent confidence intervalls depicted as dotted lines.
One can see especially large confidence intervalls for the Houston Texans up to period 7. This is because there is no data on this team before period 7 (i.e. season 2002). So the model assumes a strength for that team that is based on the prior on with a large variance since there is no evidence on the teams’ strenght unil this period. It shows that the Arizona Cardinals improved quite substantially over the last years. The San Francisco 49ers faced a downturn in strength over the last two years after a peak in periods 17 and 18 (2012 and 2013 seasons). The New England Patriots were strong in every season of the past 10 years it seems. The St. Louis Rams seem to have the most volatility in their strength in the last 20 seasons, with a peak around period 5 (season 2000) and some low performances around period 13 (season 2008). The model estimates that the Seatlle Seahwaks and the Kansas City Chiefs were particulary strong within the last three seasons. And finally an overview of the teams’ strengts gamma in the 2015 regular season including posterior standard deviations gama_std_dev of the estimated strength:
|New England Patriots||19||1.0832||0.4066|
|Kansas City Chiefs||16||0.8763||0.4186|
|Green Bay Packers||12||0.6364||0.3887|
|New York Jets||22||0.1825||0.3640|
|St. Louis Rams||29||-0.0986||0.3911|
|New York Giants||21||-0.1796||0.3451|
|San Diego Chargers||26||-0.2211||0.3500|
|New Orleans Saints||20||-0.3618||0.3862|
|San Francisco 49ers||27||-0.7056||0.4055|
|Tampa Bay Buccaneers||30||-0.7229||0.3789|
The model finds the Arizona Cardinals as best team of the 2015 season closely followed by the New Englad Patriots. Then come the Seattle Seahwaks and the Carolina Pathers with almost equal strength. This seasons’ Super Bowl contenders are estimated to be the fourth (Carolina Panthers) and seventh (Denver Broncos) strongest teams in the 2015 regular season. Also note that the uncertainty of the teams’ strength is quite large and between about 0.3 and 0.4 for all teams, so it is not possible to say that one of the named teams is statistically significantly stronger or weaker than the another team.
Predictions for Super Bowl 50
The model allows to make predictions not only on which team is likely to win or loose but also by which margin. For the margin the estimated cutoff parameters are important, since these cut the continous latent predictor into the observed bins of scorelins. The estimated pointwise posterior confidence intervalls are very narrow indicating that the model can precisly estimate these variables.Note that the parameter displayed as
eta_0 in the table is not an estimation but is fixed for symmetry.
The probability statements from above make it easy to calculate probabilities for score differences for given teams. Here is a graph of the probabilities for Super Bowl 50 as seen for the Carolina Panthers, e.g. a positve score difference, depicted as the green distribution, indicates that the Carolina Panthers win the Superbowl.
The probability of the Panthers winning the Super Bowl () is predicted to be 56.32 percent, vice versa the Denver Broncos win with a probability of 43.68 percent. Through the imposed symmetry contraints for the cutoff paramters the distribution of score differences is the same in both scenarios. May the Super Bowl be as close as predicted!
Thank you for reading through this lengthy blog entry! The full code is already availabe on GitHub in my private repository, but needs a bit of cleaning and will then be made available for everyone that is interested. Please let me know what you think, either in the comments or contact me directly.
References & Resources
- Glickman, Mark E. (2001): Dynamic paired comparison models with stochastic variances. Journal of Applied Statistics, 28(6), pp. 673–689.
- Stan Development Team (2015): Modeling Language User’s Guide and Reference Manual, v2.9.0. http://mc-stan.org/documentation/.
- Pro Football Reference. (2015) NFL Data for Seasons 1996-2015. http://www.pro-football-reference.com/boxscores.