At the end of every NFL regular season, football analysts and fans everywhere try their best to predict who will end up in the Super Bowl. Analyst predictions often have merit, coming from experts who study the sport in great depth or have insider knowledge about nuances and subtleties that may tip a game in favor of a given team. However, they are primarily subjective or qualitative, and they may be colored by biases that lead to inaccurate predictions.
Predictions using mathematical models based on statistical measures are becoming more commonplace. These models can be based on a variety of factors, such as player performance, wins and losses, and strength of schedule. I developed a mathematical model based on regular-season team statistics to predict which teams would represent their respective conferences in Super Bowl 50. Built using data from 2004 to 2011, the model correctly predicted the last two Super Bowl matchups (the Denver Broncos and Seattle Seahawks in Super Bowl XLVIII and the Seahawks and New England Patriots in Super Bowl XLIX), suggesting that it has some value.
By applying the same model to the 2015 regular season, I developed the following predictions:
That's right -- based on this model, you can expect the Bengals and Cardinals to square off in Super Bowl 50.
The model incorporates numerous factors, including three (expected points contributed by offense; simple rating system, or SRS; and offense simple rating system, or OSRS) used by Pro Football Reference to measure teams' offensive efficiency and quality. The expected points statistic reflects the fact that not all yards are created equal, illustrating which teams are able to make the most of their offensive opportunities. That is, a 12-yard gain on third-and-20 adds more to a team's yardage total than a 3-yard gain on third-and-1, but the latter play is more valuable because it earns a first down. This measurement is meant to account for that difference. The simple rating system statistics are a measure of a team's caliber relative to the league average, based on margin of victory and strength of schedule. The model also rewards wins and penalizes teams that punt or turn the ball over frequently. In other words, teams that are efficient on offense, play well against tough opponents, and take care of the football will rate higher than those that don't.
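For readers who want to see the general shape of such a model, here is a minimal sketch of a weighted team score built from the factors described above. It is not the actual model: the `TeamSeason` fields, the `WEIGHTS` values and the linear form are all assumptions chosen for illustration, and the two teams in the usage lines are made-up placeholders rather than real statistics.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """Regular-season summary for one team (field names are hypothetical)."""
    name: str
    wins: int
    exp_points_offense: float  # expected points contributed by offense
    srs: float                 # simple rating system
    osrs: float                # offense simple rating system
    punts: int
    turnovers: int


# Illustrative weights only; the article does not publish the real ones.
# Positive weights reward a factor, negative weights penalize it.
WEIGHTS = {
    "wins": 1.0,
    "exp_points_offense": 0.05,
    "srs": 0.5,
    "osrs": 0.5,
    "punts": -0.05,
    "turnovers": -0.2,
}


def team_score(team: TeamSeason) -> float:
    """Weighted sum of the team's factors."""
    return sum(weight * getattr(team, field) for field, weight in WEIGHTS.items())


def rank_teams(teams: list[TeamSeason]) -> list[TeamSeason]:
    """Return teams ordered from highest score to lowest."""
    return sorted(teams, key=team_score, reverse=True)


# Made-up teams: identical except that Team B punts and turns the ball over more.
team_a = TeamSeason("Team A", 12, 100.0, 8.0, 5.0, 60, 15)
team_b = TeamSeason("Team B", 12, 100.0, 8.0, 5.0, 80, 30)

for team in rank_teams([team_b, team_a]):
    print(f"{team.name}: {team_score(team):.1f}")
```

With these placeholder inputs, Team A ranks ahead of Team B solely because it punts less and commits fewer turnovers, which is the behavior the paragraph above describes.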
Perhaps somewhat surprisingly, the Bengals (third seed) and Steelers (sixth seed) are favored to represent the AFC over the higher-seeded Broncos (No. 1) and Patriots (No. 2). The NFC predictions are more in line with seeding, as the top two seeds -- the Cardinals (No. 2) and Panthers (No. 1) -- are ranked highest.
The model suggests New England was a much stronger team relative to the competition heading into the 2014 playoffs than it is going into the 2015 postseason -- regardless of Tom Brady and Bill Belichick's reputation for playoff dominance. It also suggests the Bengals are better poised to make the Super Bowl than in previous years -- regardless of their reputation for playoff futility.
Of course, the modeling approach is not perfect. First, it does not account for changes in key personnel, such as the fact that the Bengals will start backup AJ McCarron at quarterback, or that the Steelers expect to be without running back DeAngelo Williams, or that the Patriots could welcome Julian Edelman back to the lineup. Consider that the model predicted the Steelers would thrive in the 2014 postseason, a prediction that was thwarted in part because Pittsburgh's replacement running backs (Ben Tate and Josh Harris at the time) could not fill the shoes of injured starter Le'Veon Bell in the wild-card loss to the Ravens. That said, McCarron's passer rating (97.1) bodes well for the Bengals this season, while the effect of Williams' absence is tougher to predict, as we don't know what to expect from projected replacements Jordan Todman and Fitzgerald Toussaint.
Second, since the model was created using data from previous years, the PAT rule change (longer extra-point kicks) might add an unexpected wrinkle that was not quantified.
The model also does not put more weight on recent performance, meaning the Seahawks' 6-2 finish, the Chiefs' 10-game win streak, the Steelers' unexpected loss to the Ravens in Week 16 or the Bengals' 4-4 second half are not necessarily reflected. However, between 2012 and 2014, just one of 10 teams that won their last four regular-season games made the Super Bowl. This is not a definitive conclusion, given the small sample size, but that data suggests late-season streaks and momentum might not be as important as some think.
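To put the small-sample caveat in numbers, the sketch below computes a 95 percent Wilson score interval for a proportion of one success in 10 tries. This is a standard textbook calculation and not part of the model itself; the function name `wilson_interval` and the choice of the Wilson method are my own assumptions for illustration.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# One of 10 teams on a four-game win streak reached the Super Bowl.
low, high = wilson_interval(1, 10)
print(f"Plausible range: {low:.1%} to {high:.1%}")
```

The resulting range is wide, running from roughly 2 percent to roughly 40 percent, which is why 10 teams are too few to settle the momentum question either way.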
Interestingly, punt return stats ended up having some predictive value. This aspect of the game often gets overshadowed, but it suggests a dynamic return man like Seahawks rookie Tyler Lockett could provide a significant boost.
It's important to remember something about football: a few key plays can drastically change the outcome of a game. Consider, for example, the 2014 NFC title game, in which the Seahawks completed a late comeback thanks to a number of seemingly improbable plays, including a botched onside kick recovery. This model can predict who's more likely to win, but it can't, obviously, account for this unpredictable element of football. A play such as a red zone pick-six could, potentially, cause a 14-point swing. This makes modeling football challenging -- but it also makes it fun.
So while this model is far from perfect, it does offer an objective prediction based on team performance over the entire season. This was sufficient to predict the last two Super Bowl matchups. If the same holds for this year, then we can plan on watching Cincinnati and Arizona battle on football's biggest stage a few weeks from now.
For more than a decade, Nasir Bhanpuri, PhD, has been applying analytics and modeling techniques to address challenges in a wide range of fields, including sports, healthcare, fitness, education, neuroscience, robotics, wearables and music. He is currently a member of the Clinical Analytics team at NorthShore University HealthSystem, a Chicago-area hospital network.