TNG Ratings Origin and Methodology
A deep dive on how our ranking system was derived and what it all means
As we mentioned on our About page, the model we’ve developed, which is the basis for nearly everything we’ll discuss on this site, is built on the foundation of the Elo rating system. In this post, we’ll walk you through the basics of the Elo system itself, how we arrived at the decision to use it as the basis for our analysis, and finally how we prepared the model for the 2022 6A season. We’re going deep in this one because what we share here underlies everything you’ll see on this site.
The highlights first…
In case you just want the quick rundown or in case you’re coming back to this for a refresher, here are the highlights:
Elo ratings are simple at their core and have been used extensively in football
Before we knew of Elo, we had created our own metrics to aid in comparing teams, called AOS, ADS, and DOM, and leaned on them heavily in our time running 5ATexasFootball.com (2001-2011)
Last fall, we returned to those old metrics, curious to see how some recent teams compared to those of the past. One thing led to another, and we ended up creating a 20,000-game dataset of all top classification games played from 2008-2021
We found that DOM retroactively picked 85% of winners and had a link to point spreads, but it was limited in predictive capabilities. Eventually we found Elo and were able to link the two.
We built working Elo models in Excel and Python to test and tune our approach, eventually creating a tool that picked winners just over 76% of the time with acceptable point spread error, using our entire 2008-2021 scores database
We further tuned our system to account for regional disparities based on results from over 2,000 games between 8 sub-regions
To give the 2022 model the best chance at success, we analyzed all 249 teams' recent histories and looked at returning players to set a Preseason Elo rating
With all of that in place, we had the basis to provide preseason rankings, district picks, projected D1/D2 playoff teams, regional rankings and much, much more
That’s it! If you want to read the complete story of our journey with more detail on each of the above bullets, read on…
So what is Elo and how does it work?
You can read the finer details of the inner workings of the Elo system on this great wiki page that we’ve linked several times, but here we’ll try to break it down in simple terms and relate it directly to football and to our specific model. At its most basic, the Elo rating system, or just “Elo” (or ELO), creates a rating for each contestant based on the idea that an “average” rating should be roughly 1500 (we’ll come back to that). In a matchup between two contestants, the one with the higher Elo rating would be favored by some percentage over the other, depending on the values of the parameters in use in a particular model. Not only does the Elo rating difference equate to a win probability for both teams, it turns out that, as referenced in fivethirtyeight.com’s NFL model, an Elo rating difference of 25 equates to a 1-point projected margin for the higher-rated team, in football terms. So, an Elo difference of 250 between two teams would make the higher-rated team a 10-point favorite going into the game. While all of that is pretty darn cool in itself, the real magic of Elo happens between games.
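To make that concrete, here’s a quick sketch in Python of the standard Elo expected-score formula, along with the 25-Elo-points-per-point-of-spread conversion we just described. The function names and structure here are ours, purely for illustration; they aren’t pulled from our actual model.

```python
def win_probability(rating_a, rating_b):
    """Standard Elo expected score for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def projected_spread(rating_a, rating_b, elo_per_point=25.0):
    """Convert an Elo gap to a projected point spread (positive favors A)."""
    return (rating_a - rating_b) / elo_per_point

# A 250-point Elo edge makes team A roughly a 10-point favorite...
print(projected_spread(1750, 1500))              # 10.0
# ...which corresponds to about an 81% win probability.
print(round(win_probability(1750, 1500), 3))     # 0.808
```

Note that two equally rated teams come out to a 0.5 win probability and a projected spread of zero, which is exactly the "average team vs. average team" pick-'em you'd expect.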
Once a game is played, Elo applies an adjustment upward for the winner and an adjustment downward for the loser. The magnitude of that adjustment is the same for each team, just in opposite directions. Just how large those adjustments are depends on the expectations for each team going into the game. A heavy favorite that wins will get a small positive rating adjustment, and the losing underdog will get a small negative adjustment. In that case, each team did what was expected, so the ratings don’t change much. If the heavily favored team were to lose, the post-game adjustments would be much, much larger, with the losing favorite getting a rather massive negative correction to their rating and the winning underdog receiving a substantial boost. Once those adjustments are applied, each team has a new rating to set the projection/expectation for their next matchup, and on and on it goes throughout the season. That is Elo in a nutshell. At its core, it’s very simple: a system that allows comparison of any two teams, normalizing an average team to a rating of 1500, and that generates new ratings, new win probabilities, and new predicted point spreads each time a game is played, theoretically getting better as it takes in more information.
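Here’s what that post-game adjustment looks like in code, again as a simple sketch. The K value (which caps how far a single game can move a rating) is just a placeholder here; our actual model uses parameters we tuned ourselves.

```python
def elo_update(winner_rating, loser_rating, k=20.0):
    """Adjust both ratings after a game. The change is symmetric:
    whatever the winner gains, the loser gives up."""
    # The winner's pre-game win probability sets the expectation.
    expected_win = 1.0 / (1.0 + 10 ** ((loser_rating - winner_rating) / 400.0))
    # Small when the favorite wins, large when the underdog pulls the upset.
    delta = k * (1.0 - expected_win)
    return winner_rating + delta, loser_rating - delta

# Favorite (1700) beats underdog (1400): ratings barely move.
fav_new, dog_new = elo_update(1700, 1400)
# Same pairing, but the underdog wins: a much bigger swing.
upset_dog_new, upset_fav_new = elo_update(1400, 1700)
```

Run those two scenarios and you’ll see the favorite gains only about 3 points for doing what was expected, while the winning underdog in the upset case jumps roughly 17 points, with the losing favorite dropping the same amount.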
How did we arrive at our model and why use Elo?
Way back when we created and ran 5ATexasFootball.com (2001-2011), knowing nothing of Elo, we took on the challenge of ranking the top teams across the great state of Texas week to week and were quickly and repeatedly humbled by the sheer difficulty of that task. How could we fairly and accurately compare teams in West Texas to teams in Houston; heck, how could we really compare teams in Katy to teams as near as those in Cypress? Sure there’s the good ole “eye test”, but how many teams could we really see? Game stats were and, sadly, still are difficult to come by for many teams (why is that?!). We could go to the “Team A beat Team B, and Team B beat Team C, so Team A must be better than Team C” bit, but that was as unreliable as it always had been and always will be.
Frustrated by all of that and wanting to do better, we developed a few tools to give us some deeper insight based on what was available for every team…scores. We ended up developing and tracking some relative scoring metrics which we dubbed AOS (adjusted offensive scoring), ADS (adjusted defensive scoring), and DOM (short for dominance; DOM = AOS - ADS). These metrics looked beyond simple points per game, points allowed per game, and scoring margin to consider how a team performed relative to what their opponents had otherwise allowed. We found them to correlate extremely well with a team’s overall season performance, and they served us well in our analysis throughout our time running the site.
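We won’t lay out the exact AOS/ADS formulas here, but the core idea can be sketched like this. Treat the following as a hypothetical simplification, not our actual calculation: for each game, compare the team’s points scored to what that opponent typically allowed (and points allowed to what that opponent typically scored), then average across the season.

```python
def adjusted_metrics(team_games, opponent_avgs):
    """Hypothetical sketch of AOS/ADS/DOM.

    team_games: list of (opponent, points_for, points_against)
    opponent_avgs: {opponent: (avg_points_allowed, avg_points_scored)}
    """
    aos_terms, ads_terms = [], []
    for opp, points_for, points_against in team_games:
        opp_allowed, opp_scored = opponent_avgs[opp]
        aos_terms.append(points_for - opp_allowed)      # scored vs. opponent's norm
        ads_terms.append(points_against - opp_scored)   # allowed vs. opponent's norm
    aos = sum(aos_terms) / len(aos_terms)
    ads = sum(ads_terms) / len(ads_terms)
    return aos, ads, aos - ads                          # DOM = AOS - ADS

# Example: two games against opponents "B" and "C" (made-up numbers).
games = [("B", 35, 14), ("C", 28, 21)]
opp_avgs = {"B": (21.0, 24.0), "C": (30.0, 17.0)}
aos, ads, dom = adjusted_metrics(games, opp_avgs)
```

The appeal of framing it this way is that a 28-point night against a stingy defense counts for more than a 42-point night against a team that gives up 40 to everybody.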
Fast forward to the fall of 2021. Somewhat by chance, we started messing around with our old AOS, ADS, and DOM data. We were curious to see, according to our old methods, how dominant was this ‘21 Westlake team? How was ‘21 Katy performing compared to years past? From there, we started looking back. How about North Shore ‘18 & ‘19? How did they compare to ‘16 Lake Travis, ’15 Katy, ‘13 Allen? We even ran some one-off manual calculations of our old game prediction model applied to some recent games and found that it still worked fairly well. The results were compelling and left us hungry to collect more data.
We went back all the way through the 2008 season and amassed a database of roughly 20,000 game results comprising every top classification game over those 14 seasons. We ran our calculations for AOS, ADS, and DOM on that dataset and found not only that our metrics held up well, but that DOM alone retroactively identified the winner 85% of the time. We even found a link between DOM and point spreads. We were beyond excited, but also realized the limitations of what we were looking at since we were using a completed season’s data to look back at results that happened within it. For DOM to be predictive, we’d need to calculate it as a running average up to each game’s start, but we knew from our prior experience and from our massive dataset that it took until Week 9 or 10 for DOM to smooth out within a given season. What could we do to maybe preset a rating and discount that rating as we gathered real scores in the early weeks of a season? We now had historical DOM data for every program for the last 14 years, so we had a great basis for seasonal starting values, but how could we link it all together to fill in that massive gap from Week 0 to Week 9 or 10? That’s when we found our way to Elo.
Any Elo model can be started simply by guessing at reasonable parameters and by starting every team at the average rating of 1500. As scores are entered, the teams will separate in rating according to their performance from week to week for as long as you track them. Curious to see how well that most basic Elo model would do with our data, we started small, using just the score data from 2019 to 2021, and built an Elo model in Excel. Since DOM worked so well retroactively and since Elo was built to be predictive, could we link the two in any way to take advantage of all of our prior work? Even in the earliest iterations of our 3-year Excel Elo model, the answer was a resounding yes! We tuned the model until we found a set of parameters that maximized correctly picked games while minimizing point spread error and found that our model could proactively pick winners at a 76+% clip. Because of the sheer size of our database and the number of calculations being run, we moved on to Python for the next testing phase. We then loaded all 14 years of data into Python to see if the model would hold up over the long haul. Even with an additional 11 years of data, our Python model arrived at similar rating endpoints as our 3-year Excel model tuned with starting values. Our Python model was still picking winners at over 76%, and it was predicting similar point spreads as the Excel Elo model and the retroactive DOM model. At that point, it was on. We had our model.
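The core of that testing loop can be sketched like this: replay the games in chronological order, keep a running Elo for every team, and count how often the pre-game favorite actually won. This is a simplified stand-in, not our tuned model; there’s no home-field advantage, margin-of-victory weighting, or season-to-season carryover here.

```python
def backtest(games, k=20.0, base=1500.0):
    """Replay a chronological list of (home, away, home_pts, away_pts)
    games, updating Elo after each one, and return the fraction of
    decided games in which the pre-game rating favorite won."""
    ratings = {}
    correct = decided = 0
    for home, away, hp, ap in games:
        rh = ratings.setdefault(home, base)
        ra = ratings.setdefault(away, base)
        # Only count games with a clear favorite and a clear winner.
        if rh != ra and hp != ap:
            decided += 1
            if (rh > ra) == (hp > ap):
                correct += 1
        # Standard Elo update against the actual result.
        exp_home = 1.0 / (1.0 + 10 ** ((ra - rh) / 400.0))
        actual = 1.0 if hp > ap else 0.0 if hp < ap else 0.5
        ratings[home] = rh + k * (actual - exp_home)
        ratings[away] = ra - k * (actual - exp_home)
    return (correct / decided if decided else 0.0), ratings
```

Tuning then becomes a matter of sweeping parameters like k (and, in our real model, several others) and keeping the combination that best balances pick accuracy against point spread error.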
Preparing the model for 2022
Having built a huge dataset and having developed a working predictive model, we turned our eyes toward what we could do with it. We now had the basis for comparing any two teams, current and past, and we could project winners between any of them, but there was still some clean-up work to be done.
One of our long-standing challenges going back to when we ran the old website was comparing results from across the state due to our observations of differences in quality between different areas of the state. In order for our ratings and projections to fully work, we needed to solve for the magnitude of those differences. We took every team in our database and tagged each with one of 8 sub-regions. We analyzed over 2,000 games occurring between those sub-regions and were able to quantify those differences, where they existed, based on the actual results of those games compared to the expected results according to our metrics.
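One simple way to quantify those gaps (a sketch of the general approach, not our exact method) is to average, for each pair of sub-regions, the difference between the actual margins and the rating-projected margins in games played between them. The sub-region names below are made up for the example.

```python
from collections import defaultdict

def region_offsets(cross_games, ratings, region_of, pts_per_elo=25.0):
    """For each ordered sub-region pair, average (actual margin -
    projected margin) across the games between them. A consistently
    positive value suggests the first region is stronger than the
    raw ratings imply.

    cross_games: list of (team_a, team_b, margin) with margin =
    team_a points - team_b points.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for team_a, team_b, margin in cross_games:
        pair = (region_of[team_a], region_of[team_b])
        projected = (ratings[team_a] - ratings[team_b]) / pts_per_elo
        sums[pair] += margin - projected
        counts[pair] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}
```

Feed the resulting per-pair averages back into the ratings as sub-regional adjustments and cross-region projections stop systematically favoring one side.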
The last, and most important, part of our process in preparing our model for the upcoming season was to generate our starting Elo ratings for each team. The better job we could do there, the better our model outputs would be. As mentioned above, Elo can work by simply starting each team at 1500, but clearly it would not be acceptably accurate in those early weeks of the season. We could also simply use the ending 2021 Elo ratings to feed our 2022 model, but that doesn’t account for teams who lost key players or those who were bringing back large percentages of their contributors from last season. To get the best possible early season ratings, we needed to analyze each of the 249 6A teams and what they had coming back from last year, also taking into account their historical performance and apparent ability to replace successful classes. So, we did just that and created preseason ratings for every team, complete with sub-regional adjustments, to provide a platform to deliver preseason rankings, district picks, projected D1/D2 playoff teams, regional rankings and much, much more right out of the gate.

Over the summer, we’ll try to highlight some of our historical findings as far as best rated teams, best games, and biggest upsets. Once the season starts, we’ll be able to generate game-by-game picks, updated ratings, trending teams, and we’ll be able to update our projected playoff picture as we go along. We’ll look back on how the model performed each week and we’ll continuously look back to these preseason picks to see how we did and to ultimately improve the model.
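As a rough illustration of the kind of blend involved, here’s a hypothetical preseason formula: revert last season’s ending Elo partway toward the program’s historical level, then nudge it by how much production returns. The weights, the reversion fraction, and the returning-production adjustment below are all made up for the example; they are not our actual values.

```python
def preseason_rating(last_elo, program_mean, returning_share, revert=0.33):
    """Hypothetical preseason blend (illustrative weights only).

    last_elo: the team's ending Elo from last season.
    program_mean: the program's long-run historical Elo level.
    returning_share: fraction of production returning (0.0-1.0),
    with 0.5 treated as a typical year.
    """
    # Pull last season's rating partway back toward the program's norm.
    reverted = last_elo + revert * (program_mean - last_elo)
    # Bonus or penalty for an unusually experienced or green roster.
    return reverted + 100.0 * (returning_share - 0.5)
```

A strong program coming off a great season with a typical returning class lands a bit below last year’s peak, while the same team returning nearly everyone starts closer to where it finished.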
We’re not exactly sure where this is going to go, but we’re excited for y’all to join us along the way!