We take a detection-based approach to localizing players on the pitch. Specifically, we use YOLO, an open-source object detection framework built on top of Darknet, a deep learning library written by Joseph Redmon. In the rest of this blog, we will take you through some of the key features of this framework, as well as the domain-specific training scheme we came up with to customize YOLO for football.
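To make the detection step concrete, here is a minimal sketch of running a Darknet-format YOLO model for person detection using OpenCV's DNN module. This is not our production pipeline; the file names, input size, and thresholds are illustrative, and the model is assumed to be a standard COCO-trained YOLOv3.

```python
# Minimal player-detection sketch: run a Darknet-format YOLO model with
# OpenCV's DNN module. File names and thresholds are illustrative.
import cv2
import numpy as np

# Load the network from its Darknet config and weights.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
output_layers = net.getUnconnectedOutLayersNames()

frame = cv2.imread("pitch_frame.jpg")
h, w = frame.shape[:2]

# YOLO expects a square, normalized input blob; 416x416 is a common size.
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(output_layers)

boxes, confidences = [], []
for output in outputs:
    for detection in output:
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        # Class 0 in the COCO label set is "person".
        if class_id == 0 and confidence > 0.5:
            cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)

# Non-maximum suppression removes overlapping duplicate boxes.
keep = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold=0.5, nms_threshold=0.4)
for i in np.array(keep).flatten():
    x, y, bw, bh = boxes[i]
    cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
```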
Why we like YOLO
One of the most important aspects of the BallJames system is the ability to detect and track the players and the ball in real-time using deep learning approaches. This enables applications such as match tactics analysis, live monitoring of player fitness, and fan engagement in the media and betting industries. However, deep learning models are notorious for their computational complexity: processing a single video frame requires a large number of computations. This adds up quickly for a tracking system that needs to process 25-50 video frames per second (FPS) from multiple high-resolution cameras.
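A quick back-of-the-envelope calculation shows how tight the real-time budget is. The 14-camera figure comes from the setup described below; the rest is simple arithmetic.

```python
# Per-frame time budget and total frame rate; 14 cameras is the
# BallJames rig described below, the rest is simple arithmetic.
cameras = 14
for fps in (25, 50):
    budget_ms = 1000 / fps
    total = cameras * fps
    print(f"{fps} FPS -> {budget_ms:.0f} ms per frame, "
          f"{total} frames/s across {cameras} cameras")
```

At 25 FPS every frame must be fully processed within 40 ms, and the system as a whole has to keep up with 350-700 frames per second.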
YOLO was created with such real-time applications in mind. At a standard video resolution of 640×480 pixels, a 31-layer YOLO model runs at 25 FPS on a reasonable consumer GPU. In our case, 14 cameras each generate video at 3840×2160 pixels. To get YOLO running in real-time on this amount of data, we rely on massive parallelization across an elastic GPU cluster in the cloud. But even with such parallelization, we need YOLO models of fewer than 31 layers to scale up the processing of the BallJames video streams.
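A rough pixel-throughput comparison makes clear why a single GPU is nowhere near enough. All figures below come from the paragraph above; the ratio is the point.

```python
# Pixel throughput: the 640x480 reference setup vs. the 14-camera 4K rig.
reference = 640 * 480 * 25           # one 640x480 stream at 25 FPS
balljames = 14 * 3840 * 2160 * 25    # fourteen 3840x2160 streams at 25 FPS
print(f"reference : {reference:,} pixels/s")
print(f"balljames : {balljames:,} pixels/s")
print(f"ratio     : {balljames / reference:.0f}x more pixels per second")
```

The camera rig pushes roughly 378 times more pixels per second than the setting in which the 31-layer model reaches 25 FPS, which is why both parallelization and smaller models are needed.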