Statistical model

Consider a match between team A and team B. In our model, the number of goals scored by team A is Poisson distributed with parameter \(L(A,B)\), a positive number. This means that we expect team A to score about \(L(A,B)\) goals in a match against team B. Here

\[ L(A,B)=\text{Normal number of goals}\times \frac{\text{Strength of team A}}{\text{Strength of team B}} \]

“Normal number of goals” is a parameter (a number) that is interpreted as the average number of goals scored by one team in a match between two equally good teams. “Strength of team A” is a parameter (a number) that tells how good team A is, whereas “Strength of team B” tells how good team B is. The strength of Germany is fixed at 100, and the strengths of all other teams are given relative to this.

Further, the number of goals scored by team B is Poisson distributed with parameter \(L(B,A)\), independently of the number of goals scored by team A.

This means that if team A has a higher strength than team B, we expect team A to win, because \(L(A,B)\) will be greater than \(L(B,A)\). However, a draw or a win for team B is still possible.
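The outcome probabilities implied by the model can be sketched in a few lines of Python. This is only an illustration: the function names are our own, the strength value 80 is a made-up number rather than an estimate from the table, and 1.17 is the estimate of “Normal number of goals” reported further down. With these values, \(L(A,B) = 1.17\times 100/80 \approx 1.46\) and \(L(B,A) = 1.17\times 80/100 \approx 0.94\).

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return exp(-lam) * lam**k / factorial(k)


def expected_goals(normal_goals, strength_a, strength_b):
    """L(A,B): expected number of goals scored by team A against team B."""
    return normal_goals * strength_a / strength_b


def outcome_probabilities(normal_goals, strength_a, strength_b, max_goals=15):
    """Return (P(A wins), P(draw), P(B wins)) for two independent Poisson scores.

    The sum is truncated at max_goals per team; for realistic scoring rates
    the probability mass beyond 15 goals is negligible.
    """
    lam_a = expected_goals(normal_goals, strength_a, strength_b)
    lam_b = expected_goals(normal_goals, strength_b, strength_a)
    win_a = draw = win_b = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            # Independence: the joint probability is the product of the two.
            p = poisson_pmf(goals_a, lam_a) * poisson_pmf(goals_b, lam_b)
            if goals_a > goals_b:
                win_a += p
            elif goals_a == goals_b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


# Illustrative values only: team A has strength 100, team B has strength 80.
win_a, draw, win_b = outcome_probabilities(1.17, 100.0, 80.0)
print(f"A wins: {win_a:.3f}, draw: {draw:.3f}, B wins: {win_b:.3f}")
```

The stronger team gets the largest win probability, but the draw and the win for team B both keep a clearly positive probability.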

Of course, our simple model cannot capture all important aspects of a football match. In the championship, however, few match results are available for parameter estimation, so a simple model is needed to avoid overfitting. For a national league, with many games over a season, one may consider several extensions to our model. These include

  • Home-team advantage
  • Separate offensive and defensive strengths
  • Team strengths that vary over the season
  • Correlation between the numbers of goals scored by the two teams

In recent years, several articles on this topic have been published in the statistical literature. A good introduction is Lee, A. (1997), “Modeling Scores in the Premier League: Is Manchester United Really the Best?”, Chance, Vol 10, pp. 15-19.

Parameter estimation

The parameters in the model are “Normal number of goals” and the strength of each team. These parameters must be estimated before any probability calculations can be done. Before the start of the tournament, the estimation is based on evaluations from several Norwegian football experts. The experts have guessed the results of several hundred hypothetical games, and these results are translated into numerical values of the parameters.

When the tournament starts, the real games are taken into account as well. The hypothetical games (the expert guesses) are weighted against the real games such that the hypothetical games are as important as the real games when all teams have played two matches. When all teams have played three or more matches, the real games are the most important part of the parameter estimation.
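One possible reading of this weighting, sketched below, is that every hypothetical game gets the same weight, chosen so that all hypothetical games together count as much as the real games played once every team has played two matches. This reading is an assumption, as are the function name and the example numbers (24 teams, 300 expert games); the text above does not specify the exact scheme.

```python
def expert_game_weight(n_teams, n_expert_games, matches_per_team=2):
    """Per-game weight for the hypothetical (expert) games.

    Assumption: the expert games in total should weigh as much as the real
    games played when every team has played `matches_per_team` matches.
    """
    # Each real game involves two teams, hence the division by two.
    real_games = n_teams * matches_per_team / 2
    return real_games / n_expert_games


# Illustrative numbers only, not taken from the actual tournament setup.
weight = expert_game_weight(n_teams=24, n_expert_games=300)
print(weight)
```

Under this reading, each real game has weight 1, so once all teams have played three matches the real games carry more total weight than the expert guesses, in line with the description above.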

Estimating the parameters means finding the parameter values that fit the data (the match results) as well as possible. In our case, the parameters are estimated by maximising a modified Poisson likelihood. It differs from an ordinary Poisson likelihood in that it is robustified by downweighting large victories and by adding a penalty term that shrinks the individual strength parameters towards a common mean.
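To make the idea concrete, here is a minimal Python sketch of such an objective function. The exact form of the downweighting and of the penalty is not given above, so both choices below are assumptions made for illustration: matches decided by more than `cap` goals get a weight of `cap` divided by the goal difference, and the penalty is the squared deviation of the log-strengths from their mean. The match results and strength values are made up.

```python
from math import log, lgamma


def match_weight(goals_a, goals_b, cap=3):
    """Hypothetical robustification: downweight matches won by more than `cap` goals."""
    diff = abs(goals_a - goals_b)
    return 1.0 if diff <= cap else cap / diff


def penalised_log_likelihood(normal_goals, strengths, matches, penalty=1.0, cap=3):
    """Weighted Poisson log-likelihood minus a shrinkage penalty.

    strengths: dict mapping team name to a positive strength.
    matches:   list of (team_a, team_b, goals_a, goals_b) tuples.
    """
    total = 0.0
    for team_a, team_b, goals_a, goals_b in matches:
        lam_a = normal_goals * strengths[team_a] / strengths[team_b]
        lam_b = normal_goals * strengths[team_b] / strengths[team_a]
        # Poisson log-probabilities of the two independent scores;
        # lgamma(k + 1) equals log(k!).
        loglik = (goals_a * log(lam_a) - lam_a - lgamma(goals_a + 1)
                  + goals_b * log(lam_b) - lam_b - lgamma(goals_b + 1))
        total += match_weight(goals_a, goals_b, cap) * loglik
    # Assumed penalty: shrink log-strengths towards their common mean.
    log_s = [log(s) for s in strengths.values()]
    mean_log_s = sum(log_s) / len(log_s)
    total -= penalty * sum((x - mean_log_s) ** 2 for x in log_s)
    return total


# Made-up results in which team X beats team Y twice.
matches = [("X", "Y", 2, 0), ("X", "Y", 1, 0)]
fit_x_stronger = penalised_log_likelihood(1.17, {"X": 120.0, "Y": 100.0}, matches)
fit_y_stronger = penalised_log_likelihood(1.17, {"X": 100.0, "Y": 120.0}, matches)
print(fit_x_stronger > fit_y_stronger)
```

Parameter values that rate X above Y fit these results better than the reverse. An actual estimate would be obtained by maximising this function with a numerical optimiser while keeping the strength of Germany fixed at 100.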

Estimated strength of all teams

The current estimate of “Normal number of goals” is 1.17.

The estimated strength values of all teams are given in the table to the right (sorted), together with the FIFA ranking as of May 10, 2021. The discrepancy between the strength and the FIFA ranking arises because the expert opinions differ somewhat from the FIFA ranking. In addition, as the tournament progresses, teams with good results will obtain a high strength even though they might have a low FIFA ranking.

Last update: Jul 12 2021 00:01