Statistical model
Consider a game between team A and team B. In our model, the number of goals scored by team A is Poisson distributed with parameter (a number) \(L(A,B)\). This means that we expect team A to score about \(L(A,B)\) goals in a match against team B. Here
\[ L(A,B)=\text{Normal number of goals}\times \frac{\text{Strength of team A}}{\text{Strength of team B}} \]
“Normal number of goals” is a parameter (a number) interpreted as the average number of goals scored by one team in a match between two equally good teams. “Strength of team A” is a parameter (a number) that tells how good team A is, whereas “Strength of team B” tells how good team B is. The strength of Germany is fixed to 100, and the strengths of all other teams are given relative to this.
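As a small illustration of the formula for \(L(A,B)\), the following Python sketch computes the expected number of goals for both teams in a match. The value 1.12 is the current estimate of “Normal number of goals” reported further down, Germany's strength is fixed to 100, and the opponent's strength of 80 is a made-up value used only for illustration.

```python
def expected_goals(normal_goals: float, strength_a: float, strength_b: float) -> float:
    """Expected number of goals scored by team A against team B, i.e. L(A,B)."""
    return normal_goals * strength_a / strength_b


# Illustrative values: Germany (strength 100) against a hypothetical team of strength 80.
l_ab = expected_goals(1.12, 100.0, 80.0)  # Germany's expected goals, about 1.4
l_ba = expected_goals(1.12, 80.0, 100.0)  # the opponent's expected goals, about 0.9
```

Swapping the teams divides by the strength ratio instead of multiplying by it, so \(L(A,B)\cdot L(B,A)\) always equals the square of the normal number of goals, and two equally strong teams each expect exactly the normal number of goals.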
Further, the number of goals scored by team B is Poisson distributed with parameter \(L(B,A)\), independently of the number of goals scored by team A.
This means that if team A has a higher strength than team B, we expect team A to win, because \(L(A,B)\) will be greater than \(L(B,A)\). However, a draw or a win for team B is still possible.
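Because the two goal counts are independent Poisson variables, the probabilities of a win, a draw and a loss for team A follow from summing the product of their probabilities over all score lines. A minimal Python sketch, with illustrative expected goals 1.4 and 0.896 (made-up strengths 100 and 80) and the sums truncated at 15 goals per team, a cutoff of our own choosing:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_a: float, lam_b: float, max_goals: int = 15):
    """Return (P(A wins), P(draw), P(B wins)) for independent Poisson goal counts.

    The infinite sums are truncated at max_goals goals per team; for expected
    goals around 1-2 the neglected probability mass is negligible.
    """
    pa = [poisson_pmf(k, lam_a) for k in range(max_goals + 1)]
    pb = [poisson_pmf(k, lam_b) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    for i, p_i in enumerate(pa):
        for j, p_j in enumerate(pb):
            p = p_i * p_j  # independence: the score i-j has probability p_i * p_j
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


win, draw, loss = outcome_probabilities(1.4, 0.896)
```

Even when team A is clearly stronger, `draw` and `loss` remain well above zero, which is exactly the point made above.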
Of course, our simple model cannot capture every important aspect of a football match. However, only a few match results are available for parameter estimation in a championship, so a simple model is needed to avoid overfitting. In a national league, on the other hand, with many games over a season, one may consider several extensions to our model. These include
- Home-team advantage
- Separate offensive and defensive strengths
- Team strengths that vary over the season
- Correlation between the numbers of goals scored by the two teams
In recent years, several articles on this topic have been published in the statistical literature. A good introduction is Lee, A. (1997), “Modeling Scores in the Premier League: Is Manchester United Really the Best?”, Chance, Vol. 10, pp. 15-19.
Parameter estimation
The parameters of the model are “Normal number of goals” and the strength of each team. These parameters must be estimated before any probabilities can be calculated. In previous championships, we used evaluations from Norwegian football experts. This time, we use results from the six previous European championships (2000-2021) together with each team’s FIFA ranking at the start of each tournament, and estimate a relation between a team’s FIFA ranking and its strength. For this we use a separate statistical model which, in addition to the FIFA rankings, accounts for the advantage of playing in one’s own country (home advantage), and for an extra effect, beyond what the FIFA rankings can explain, for each team that has played in at least four of the last six championships.
The home advantage is estimated to be quite large. This year, all matches are played in Germany, so the estimated home advantage is included in the strength of Germany. Italy has the largest extra effect beyond what the FIFA rankings can explain, though it is much smaller than the home advantage. This means that Italy has often performed better in previous championships than its FIFA ranking would indicate, and its estimated strength in this year’s tournament is therefore slightly higher than its FIFA ranking alone would suggest. In total, the strength of each team mainly follows the FIFA rankings before the tournament starts, with small adjustments for some teams and a large upward adjustment for Germany.
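The source does not state the functional form of the relation between FIFA ranking and strength. Purely as an illustration, the Python sketch below assumes that the logarithm of a team's strength is linear in its FIFA ranking points, plus an optional team-specific extra effect and a home effect for the host, and then rescales everything so that Germany gets strength 100. The function name, the slope, the effects and the ranking points are all hypothetical.

```python
import math


def strengths_from_ranking(points, slope, extra_effects=None, host=None,
                           home_effect=0.0, reference="Germany",
                           reference_value=100.0):
    """Hypothetical mapping from FIFA ranking points to model strengths.

    Assumed form: log(strength) = intercept + slope * points
                                  + extra effect (teams that have one)
                                  + home effect (host only).
    The intercept is left out because rescaling so that the reference team
    gets reference_value cancels it anyway.
    """
    extra_effects = extra_effects or {}
    log_s = {}
    for team, p in points.items():
        value = slope * p + extra_effects.get(team, 0.0)
        if team == host:
            value += home_effect  # the host's home advantage is folded into its strength
        log_s[team] = value
    shift = math.log(reference_value) - log_s[reference]
    return {team: math.exp(v + shift) for team, v in log_s.items()}


# Made-up inputs, for illustration only.
points = {"Germany": 1650.0, "Team X": 1700.0, "Team Y": 1700.0}
s = strengths_from_ranking(points, slope=0.004,
                           extra_effects={"Team X": 0.1},
                           host="Germany", home_effect=0.3)
```

Because all strengths are expressed relative to Germany = 100, a home effect for the host lowers every other team's strength relative to Germany, while a positive extra effect (like the one estimated for Italy) only nudges that single team upward.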
When the tournament starts, the actual match results are taken into account as well. The information from previous championships is weighted against the match results in this year’s tournament so that the two sources are equally important once all teams have played three matches.
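The exact weighting scheme is not described here. One simple way to obtain the stated balance, assuming (our assumption) that every historical match gets the same weight and every match in this year’s tournament gets weight 1, is to choose the historical weight so that the weighted number of historical matches equals the number of matches played once each of the 24 teams has played three. The hypothetical helper below does that arithmetic.

```python
def history_weight(n_historical_matches: int, n_teams: int = 24,
                   matches_per_team: int = 3) -> float:
    """Hypothetical weight per historical match, chosen so that historical and
    current matches carry equal total weight once every team has played
    matches_per_team matches (current matches get weight 1)."""
    # Each match involves two teams: 24 teams x 3 matches / 2 = 36 matches.
    current_matches = n_teams * matches_per_team / 2
    return current_matches / n_historical_matches


# The six championships 2000-2021 had 4 x 31 + 2 x 51 = 226 matches in total.
w = history_weight(226)  # 36 / 226, about 0.16
```

Under this scheme the historical data keep a fixed total weight, so they dominate early in the tournament, match this year’s results after the group stage, and count for less as the knockout rounds add further matches.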
Estimating the parameters means finding the parameter values that fit the data (the match results) as well as possible. Here, the parameters are estimated by maximizing a modified Poisson likelihood, which differs from an ordinary Poisson likelihood in that it is robustified by downweighting large victories.
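The estimation step can be sketched as a weighted Poisson log-likelihood maximized by plain gradient ascent on the log scale, with Germany’s strength held fixed at 100 as in the model. This is only an illustration: the downweighting rule in the hypothetical `robust_weight` (full weight up to a three-goal margin, weight 3/margin beyond that), the optimizer, its step size and the toy match results are our own choices, not those behind the published estimates.

```python
import math


def robust_weight(goals_a: int, goals_b: int, threshold: int = 3) -> float:
    """Hypothetical downweighting: full weight for margins up to `threshold`
    goals, weight threshold / margin for larger victories."""
    margin = abs(goals_a - goals_b)
    return 1.0 if margin <= threshold else threshold / margin


def fit_strengths(matches, reference="Germany", reference_value=100.0,
                  lr=0.02, n_iter=5000):
    """Maximize a downweighted Poisson log-likelihood by gradient ascent.

    Goals of A against B ~ Poisson(c * s_A / s_B), parametrized as
    gamma = log(c) and theta[T] = log(s_T); theta[reference] is held fixed.
    Returns (normal number of goals c, dict of strengths).
    """
    teams = sorted({t for m in matches for t in m[:2]})
    theta = {t: math.log(reference_value) for t in teams}
    gamma = 0.0
    for _ in range(n_iter):
        g_gamma = 0.0
        g_theta = {t: 0.0 for t in teams}
        for a, b, goals_a, goals_b in matches:
            w = robust_weight(goals_a, goals_b)
            lam_ab = math.exp(gamma + theta[a] - theta[b])  # L(A,B)
            lam_ba = math.exp(gamma + theta[b] - theta[a])  # L(B,A)
            r_a = w * (goals_a - lam_ab)  # weighted residual for A's goals
            r_b = w * (goals_b - lam_ba)  # weighted residual for B's goals
            g_gamma += r_a + r_b
            g_theta[a] += r_a - r_b
            g_theta[b] += r_b - r_a
        gamma += lr * g_gamma
        for t in teams:
            if t != reference:  # the reference strength is fixed, not estimated
                theta[t] += lr * g_theta[t]
    return math.exp(gamma), {t: math.exp(v) for t, v in theta.items()}


# Toy results (home team, away team, home goals, away goals), made up for illustration.
matches = [
    ("Germany", "Team X", 2, 1),
    ("Team X", "Team Y", 1, 1),
    ("Team Y", "Germany", 0, 2),
    ("Germany", "Team Y", 5, 0),  # large victory, weight 3/5
    ("Team X", "Germany", 1, 1),
]
normal_goals, strengths = fit_strengths(matches)
```

Working with log strengths keeps all strengths positive and makes the log-likelihood concave, so with a small enough step size the ascent converges to the unique maximum for data like these. Match-specific weights, such as a lower weight for historical matches, could be multiplied into `w` in the same loop.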
Estimated strength of all teams
The current estimate of “Normal number of goals” is 1.12.
The estimated strength of each team is given in the table to the right (sorted), together with the FIFA ranking as of April 4, 2024. Before the tournament, the strengths may deviate slightly from the FIFA ranking, as explained above. As the tournament progresses, teams with good results will obtain a high strength, even if they have a low FIFA ranking.
Last update: Jul 14 2024 23:02