How The FA used tech to get the ball rolling

For millions of football fans across the U.K. and around the world, the return of live matches in the English Premier League was a long-awaited milestone in the recovery from the COVID-19 pandemic.

Enter Project Restart: the nickname given to the Premier League’s effort to resume the season while ensuring the safety of players and fans. But with social distancing as one of the key preventive measures against COVID-19, how could the safety of players be ensured when they’re interacting on the pitch? We at The Football Association (FA) were proud to partner with the Premier League on this aspect of the project.

One critical challenge was ensuring that players could compete at peak levels while observing the social-distancing norms still recommended by health authorities. To address it, we created a new analysis of thousands of hours of match play, and used machine learning technology to estimate contact risk during a 90-minute football match.

We looked at all 380 games from the 2018/19 Premier League season, and the 288 pre-lockdown games from the 2019/20 Premier League season. Incredibly, this showed us over 40 billion interactions between players, captured in 100 million video frames, which collectively made up 10 terabytes of data. Even the longtime players, coaches, and fans among us were staggered by how much goes on in a single game.

Our system recorded the position of every player on the field every four one-hundredths of a second (25 times per second), so that we could analyse every interaction for possible exposure. We employed the Exponential Model, developed by Danish public health academics, which at the time was considered the most accurate model of virus transmission during a football match.

The model focuses on the 1.5-metre radius around each player, paying strict attention to the two-second rate of decay, or half-life, that COVID particles typically have in infecting a person in certain environmental conditions. Staying on the safe side, we employed a simplified model, which considered a player who is within two metres of an infected player during the half-life of the virus to be 100% exposed.
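To make those two ideas concrete, here is a minimal Python sketch. It is an illustration only, not the Exponential Model itself or the code we ran: the function names are hypothetical, the decay function is simply the textbook half-life formula, and treating a distance of exactly two metres as "within" range is an assumption.

```python
import math

EXPOSURE_RADIUS_M = 2.0  # two-metre threshold of the simplified model (from the text)
HALF_LIFE_S = 2.0        # two-second half-life (from the text)


def decay_weight(elapsed_s, half_life_s=HALF_LIFE_S):
    """Textbook half-life decay: 1.0 at t = 0, halving every half_life_s seconds.

    Assumption: shown only to illustrate what a half-life means; the published
    model may weight time and distance differently.
    """
    return 0.5 ** (elapsed_s / half_life_s)


def is_exposed(player_xy, infected_xy, radius_m=EXPOSURE_RADIUS_M):
    """Simplified binary rule: any player inside the radius counts as exposed.

    Positions are (x, y) pitch coordinates in metres (hypothetical layout).
    """
    return math.dist(player_xy, infected_xy) <= radius_m
```

The binary rule is the conservative choice: it never discounts a close contact, so it can only overstate exposure relative to a model that lets risk decay.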

As you may have guessed, all of this work involved gathering and analysing a tremendous amount of data from multiple sources, using some of the most advanced computing available. Working with Google Cloud, we used Google BigQuery to store the data and run a built-in machine learning model based on the simplified model. BigQuery looked at an average of 145,000 rows of data per game analysed, examining every frame of tracking data for the distance between all pairs of players on the pitch throughout an entire match. This fast and powerful toolset was critical to our success.
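The frame-by-frame pairwise check can be sketched in a few lines of plain Python. This is a hedged illustration of the idea, not the BigQuery implementation: the input layout (one dictionary of player positions per frame), the function name, and the choice to sum contact time per pair are all assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import combinations

FRAME_SECONDS = 0.04    # one tracking frame every four one-hundredths of a second
CONTACT_RADIUS_M = 2.0  # two-metre threshold of the simplified model


def contact_seconds(frames, radius_m=CONTACT_RADIUS_M, frame_seconds=FRAME_SECONDS):
    """Sum the time each pair of players spends within radius_m of each other.

    frames: iterable of dicts mapping a player id to an (x, y) position in
    metres (hypothetical layout). Returns {(id_a, id_b): seconds}.
    """
    totals = defaultdict(float)
    for positions in frames:
        # Compare every unordered pair of players present in this frame.
        for a, b in combinations(sorted(positions), 2):
            if math.dist(positions[a], positions[b]) <= radius_m:
                totals[(a, b)] += frame_seconds
    return dict(totals)


# Three frames: A and B are close in the first two, C is never near anyone.
example_frames = [
    {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (10.0, 10.0)},
    {"A": (0.0, 0.0), "B": (1.5, 0.0), "C": (10.0, 10.0)},
    {"A": (0.0, 0.0), "B": (5.0, 0.0), "C": (10.0, 10.0)},
]
example_totals = contact_seconds(example_frames)
```

With 22 players on the pitch there are 231 pairs to check in every frame, and a 90-minute match sampled every 0.04 seconds has 135,000 frames before stoppage time, which is why a data warehouse suits this workload far better than a loop like the one above.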