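Automated Tracking of Body Positioning Using Match Footage

This is a Japanese translation of the summary of a paper presented at the FC Barcelona Sports Analytics Summit (2019).

Some parts are not translated well because of my limited ability. My apologies, and thank you for your understanding.

The translation is clearly not polished yet, so I plan to update it from time to time. Thank you for your patience.

For unusual or technical expressions that are not in common use, please see Appendix 1.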

AUTOMATED TRACKING OF BODY POSITIONING USING MATCH FOOTAGE
試合映像を使ったボディポジショニングの自動追跡

A team of image processing experts from the Universitat Pompeu Fabra in Barcelona has recently developed a technique that identifies a player’s body orientation on the field over a time series simply by using video feeds of a football match.
バルセロナのポンペウファブラ大学の画像処理の専門家チームは、最近、サッカーの試合のビデオフィードを使用して、時系列内のフィールドでのプレーヤーの体の向きを識別する技術を開発しました。

Adrià Arbués-Sangüesa, Gloria Haro, Coloma Ballester and Adrián Martín (2019) leveraged computer vision and deep learning techniques to develop three probability vectors that, when combined, estimate the orientation of a player’s upper torso from their shoulder and hip positions, field view and ball position.
AdriàArbués-Sangüesa、Gloria Haro、Coloma Ballester、AdriánMartín（2019）は、肩と腰の位置、視野、ボールの位置を使用し、それらを組み合わせプレーヤーの上体の方向を推定する3つのベクトル確率を開発するために、コンピュータービジョンとディープラーニングのテクニックを活用しました。

This group of researchers argues that, as football has evolved, body orientation has become increasingly important for adapting to the increasing pace of the game.
この研究者グループは、サッカーの進化に伴い、試合のペースの高速化に適応するうえで体の向きがますます重要になっていると主張しています。

Previously, players often benefited from sufficient time on the ball to control, look up and pass. Now, a player needs to orientate their body prior to controlling the ball in order to reduce the time it takes them to perform the next pass.
以前は、プレーヤーはボールをコントロールし、顔を上げてパスするのに十分な時間がありました。今や、プレーヤーは次のパスを実行するのにかかる時間を短縮するために、ボールをコントロールする前に体の向きを調整する必要があります。

Adrià and his team defined orientation as the direction in which the upper body is facing, derived from the area bounded by the two shoulders and the two hips.
Adriàと彼のチームは、上半身が向いている方向としてオリエンテーションを定義しました。これは、肩の2点と腰の２点から縁取りされた領域によって導き出されます。

Due to their dynamic and independent movement, legs, arms and face were excluded from this definition.
動的で独立した動きのため、脚、腕、顔はこの定義から除外されました。

To produce this orientation estimate, they first calculated separate estimates of orientation based on three different factors: pose orientation (using OpenPose and super-resolution for image enhancement), field orientation (the field view of a player relative to their position on the field) and ball position (the effect of ball position on the orientation of a player).
オリエンテーションの推定値を作成するために、彼らは最初に3つの異なる要因に基づいてオリエンテーションの異なる推定値を計算しました。姿勢のオリエンテーション（OpenPoseと高解像化のための超解像を使用）、フィールドのオリエンテーション（フィールド上の位置に対するプレイヤーの視野）そしてボールの位置（ボールの位置がプレーヤーの向きに与える影響）。

These three estimates were combined by applying different weightings to produce the final overall body orientation of a player.
これらの3つの推定値は、異なる重みを適用することで結合され、プレーヤーの最終的な全身のオリエンテーションが生成されます。

1. Body Orientation Calculated From Pose
1. 姿勢から計算された身体のオリエンテーション

The researchers used OpenPose, an open-source library. This library allows you to input a frame and retrieve a human skeleton drawn over the image of a person within that frame. It can detect up to 25 body parts per person, such as elbows, shoulders and knees, and report the level of confidence in identifying each part. It can also provide additional data points such as heat maps and directions.
研究者はOpenPoseのオープンソースライブラリを使用しました。このライブラリを使用すると、（映像）フレームを入力し、そのフレーム内の人物の画像上に描かれた人間の骨格を取得できます。肘、肩、膝など、以下にあるように1人あたり最大25個の身体部位を検出し、そのような部位を特定する際の信頼度を指定できます。また、ヒートマップや向きなどの追加のデータポイントも提供できます。

However, unlike in a close-up video of a person, in sports events such as a football match players can occupy very small portions of the frame, even in full HD frames such as broadcast frames. Adrià and team addressed this issue by upscaling the image through super-resolution, an algorithmic method that increases image resolution by extracting details from similar images in a sequence to reconstruct other frames. In their case, the research team applied a Residual Dense Network model to improve the image quality of faraway players. This deep learning image enhancement technique helped the researchers preserve some image quality and, thanks to the clearer images, detect the players’ faces through OpenPose. They were then able to detect additional points of the player’s body and accurately define the upper-torso position using the shoulder and hip points.
ただし、人物のクローズアップビデオとは異なり、サッカーの試合などのスポーツイベントでは、放送などのフルHDフレームでも、選手はフレームの小さなところに表示されることがあります。Adriàとチームは、シーケンス内の類似した画像から詳細を抽出して他のフレームを再構築することにより、画像の解像度を求めるアルゴリズム的な手法である超解像により画像をアップスケーリングすることでこの問題を解決しました。彼らの場合、研究チームは、遠隔地のプレイヤーの画質を改善するために、Residual Dense Networkモデルを適用しました。この深層学習画像強化技術により、研究者は、より鮮明な画像のおかげで、ある程度の画像品質を維持し、OpenPoseを通じてプレイヤーの顔を検出することができました。その後、プレーヤーの体の追加ポイントを検出し、肩と腰のポイントを使用して胴体上部の位置を正確に定義することができました。

Once the researchers had solved the image quality issue and extracted the player’s pose data through OpenPose, the direction in which a player was facing was derived from the angle of the vector extracted from the centre point of the upper torso (the shoulders and hips area). OpenPose provided the coordinates of both shoulders and both hips, indicating the position of these specific points in a player’s body relative to each other. From these 2D vectors, researchers could determine whether a player was facing right or left using the x and y axes of the shoulder and hip coordinates.
For example, if the angle of the shoulders shown in OpenPose is 283 degrees with a confidence of 0.64, while the angle of the hips is 295 degrees with a confidence of 0.34, researchers will use the shoulders’ angle to estimate the orientation of the player because of its higher confidence level. In cases where a player is standing parallel to the camera and the angles of the hips or the shoulders are impossible to establish because the points all fall on the same coordinate in the frame, researchers used the facial features (nose, eyes and ears) as a reference for the player’s orientation, using the neck as the x axis.
画質の問題が研究者によって解決され、プレーヤーの姿勢データがOpenPoseによって抽出されると、上部トルソー（肩と腰の領域）の中心点から抽出されたベクトルの角度を使用して、プレーヤーが向いている方向が導出されます。
OpenPoseは両肩と両腰の座標を提供し、プレーヤーの身体におけるこれらの特定のポイントの相対的な位置を示します。研究者はこれらの2Dベクトルから、肩と腰の座標のx軸とy軸を使用して、プレイヤーが右向きか左向きかを判断できました。たとえば、OpenPoseに示されている肩の角度が0.64の信頼度で283度であるのに対し、腰の角度は0.34の信頼レベルで295度である場合、研究者は、信頼度が高いため肩の角度を使用して、プレーヤーのオリエンテーションを推定します
プレーヤーがカメラと平行に立っており、腰または肩の角度がすべてフレーム内の同じ座標内にあるため、確定できない場合、研究者は首をx軸として、顔の特徴（鼻、両目、両耳）をプレーヤーのオリエンテーションの基準として使用しました
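The confidence-based choice between shoulders and hips in the example above can be sketched roughly as follows. This is only an illustration, not the researchers' implementation: the function names `vector_angle_deg` and `pick_pose_angle` are hypothetical, and the use of `atan2` on two keypoints is an assumed way of turning a pair of coordinates into an angle.

```python
import math


def vector_angle_deg(p_from, p_to):
    """Angle of the 2D vector p_from -> p_to, in degrees within [0, 360).

    Assumption: points are (x, y) pairs. In image coordinates y usually
    grows downwards, so the sense of rotation depends on that convention.
    """
    dx = p_to[0] - p_from[0]
    dy = p_to[1] - p_from[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def pick_pose_angle(shoulders, hips):
    """Return whichever (angle, confidence) pair has the higher confidence."""
    return max(shoulders, hips, key=lambda pair: pair[1])


# Values taken from the example in the text.
shoulders = (283.0, 0.64)  # (angle in degrees, confidence)
hips = (295.0, 0.34)

chosen_angle, chosen_conf = pick_pose_angle(shoulders, hips)
print(chosen_angle)  # 283.0
```

The fallback to facial features described in the text is not covered here, since the source does not give enough detail to sketch it.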

This 2D player and ball information was then projected onto a top-down view of the football pitch so that each player’s direction could be seen. Using the four corners of the pitch, researchers could reconstruct a 2D pitch positioning that allowed them to match pixels from the match footage to the coordinates derived from OpenPose. As a result, they could clearly observe whether a player in the footage was going left or right, as derived from their model’s pose results.
この選手とボールの2D情報は、サッカーのピッチの映像に投影され、上から選手の方向を見ることができます。
ピッチの4つの隅を使用して、研究者は2Dのピッチの位置を再構築し、試合映像のピクセルをOpenPoseから派生した座標に一致させることができました。したがって、モデルの姿勢推定の結果に基づいて、映像内のプレーヤーが左に行くのか右に行くのかを明確に観察できるようになりました。
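Mapping frame pixels to pitch coordinates from four corner correspondences is commonly done with a projective transform (homography). The sketch below shows that idea in plain Python; it is an assumption about how such a mapping could be built, not the researchers' code. The corner pixel positions and the 105 x 68 metre pitch size are made-up illustrative values, and all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_corners(src, dst):
    """3x3 projective transform mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h, point):
    """Apply the homography h to an (x, y) point."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical pixel positions of the four pitch corners in one frame.
image_corners = [(200.0, 300.0), (1700.0, 300.0), (1900.0, 1000.0), (20.0, 1000.0)]
# Assumed pitch size in metres (top-down view).
pitch_corners = [(0.0, 0.0), (105.0, 0.0), (105.0, 68.0), (0.0, 68.0)]

H = homography_from_corners(image_corners, pitch_corners)
player_on_pitch = project(H, (960.0, 650.0))  # a pixel somewhere mid-frame
```

Any pixel inside the pitch area of the frame, such as a player's feet or the ball, can then be passed through `project` to get its approximate top-down position.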

In order to achieve the right level of accuracy in exchange for precision, researchers clustered similar angles to create a total of 24 different orientation groups (i.e. 0-15 degrees, 15-30 degrees and so on), as there was not much difference between a player facing an angle of 0 degrees and one facing 5 degrees.
精度と引き換えに適切なレベルの正確度を達成するには、研究者は、プレーヤーが0度または5度の角度を向いていることに大きな差はなかったため、同様の角度をクラスター化して、合計24の異なるオリエンテーショングループ（つまり、0-15度、15-30度など）を作成しました。
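Grouping angles into 24 bins of 15 degrees each is a small piece of arithmetic. A minimal sketch, assuming angles are measured in degrees and that each bin includes its lower edge (the names `orientation_bin` and `bin_range` are hypothetical):

```python
BIN_WIDTH_DEG = 15
NUM_BINS = 360 // BIN_WIDTH_DEG  # 24 orientation groups


def orientation_bin(angle_deg):
    """Index (0..23) of the 15-degree group that contains angle_deg."""
    return int((angle_deg % 360) // BIN_WIDTH_DEG)


def bin_range(index):
    """Lower and upper edge, in degrees, of the group with this index."""
    start = index * BIN_WIDTH_DEG
    return (start, start + BIN_WIDTH_DEG)


print(orientation_bin(0), orientation_bin(5))  # 0 0
print(orientation_bin(283))                    # 18
print(bin_range(18))                           # (270, 285)
```

Angles of 0 and 5 degrees land in the same group, which matches the reasoning in the text.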

2. Body Orientation Calculated From Field View Of A Player
2. プレーヤーの視野から計算された身体のオリエンテーション

Researchers then quantified the field orientation of a player by setting the player’s field of view during a match to around 225 degrees. This value was only used as a backup in case everything else failed, since it was a less effective way of deriving orientation than the one previously described.
The player’s field of view was transformed into probability vectors with values similar to those for pose orientation, based on y coordinates. For example, a right back on the side of the pitch will have their field of view reduced to about 90 degrees, as they are very unlikely to be looking outside of the pitch.
次に、研究者は試合中のプレーヤーの視野を約225度に設定することにより、プレーヤーのフィールドのオリエンテーションを定量化しました。この値は、他のすべてが失敗した場合のバックアップ値としてのみ使用されました。これは、前述の方法に比べてオリエンテーションを導出する効果が低い方法だったためです。プレーヤーの視野は、y座標に基づく姿勢のオリエンテーションの値と同様の値を持つ確率ベクトルに変換されました。
たとえば、ピッチのサイドにいる右サイドバックは、ピッチの外を見る可能性が非常に低いため、視野が約90度に縮小されます。

3. Orientation Calculated From Ball Positioning
3. ボールの位置から計算されたオリエンテーション

The third estimate of player orientation was related to the position of the ball on the pitch. This assumed that players are affected by their position relative to the ball: players closer to the ball are more strongly oriented towards it, while the orientation of players further away may be less affected by the ball position. This step therefore accounts for the relative effect of ball position. Each player is assigned not only a particular angle in relation to the ball but also a specific distance to it, and both are converted into probability vectors.
プレーヤーのオリエンテーションの3番目の推定は、ピッチ上のボールの位置に関連していました。これは、プレーヤーがボールに対する相対的な位置の影響を受けることを前提としており、ボールに近いプレーヤーの方がより強くボールに向かっている一方、ボールから遠いプレーヤーのオリエンテーションはボールの位置による影響が少ない可能性があります。ボールの位置に基づいたプレーヤーの方向付けのこのステップは、ボールの位置の相対的な効果を説明します。各プレーヤーには、ボールに対する特定の角度だけでなく、ボールまでの特定の距離も割り当てられ、確率ベクトルに変換されます。
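The angle and distance from a player to the ball follow directly from their pitch coordinates. The text does not say how these are turned into probability vectors, so the second function below is purely an assumed illustration: it puts more probability on the bin pointing at the ball the closer the player is, using a hypothetical linear falloff and a hypothetical `max_influence_m` cut-off.

```python
import math


def angle_and_distance_to_ball(player_xy, ball_xy):
    """Angle (degrees, [0, 360)) and distance from the player to the ball."""
    dx = ball_xy[0] - player_xy[0]
    dy = ball_xy[1] - player_xy[1]
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return angle, math.hypot(dx, dy)


def ball_probability_vector(angle_deg, distance_m, num_bins=24, max_influence_m=50.0):
    """Assumed model: nearer players get more mass on the ball-facing bin."""
    bin_width = 360.0 / num_bins
    ball_bin = int((angle_deg % 360.0) // bin_width) % num_bins
    # Hypothetical linear falloff: 1.0 at the ball, 0.0 beyond max_influence_m.
    closeness = max(0.0, 1.0 - distance_m / max_influence_m)
    uniform = 1.0 / num_bins
    probs = [(1.0 - closeness) * uniform] * num_bins
    probs[ball_bin] += closeness
    return probs


angle, distance = angle_and_distance_to_ball((0.0, 0.0), (3.0, 4.0))
probs = ball_probability_vector(angle, distance)
print(distance)  # 5.0
```

A player beyond the cut-off distance gets a uniform vector, which expresses "the ball tells us nothing about this player's orientation" under this assumed model.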

Combination Of All The Three Estimates Into A Single Vector
3つの推定値すべてを1つのベクトルに統合

Adrià and the research team contextualized these results by combining all three estimates into a single vector, applying a different weight to each metric. For instance, they found that field of view accounted for a much smaller proportion of the orientation probability than the other two metrics. The weighted sum of the vectors from the three estimates corresponds to the final player orientation, that is, the final angle of the player. By following the same process for each player and drawing their orientation onto the image of the field, player movements can be tracked for the duration of the match while they remain in frame.
Adriàと研究チームは、各メトリックに異なる重みを適用することにより、3つの推定値をすべて単一のベクトルとして組み合わせることにより、これらの結果を文脈化して解釈することを可能にしました。たとえば、彼らは視野が他の2つのメトリックよりもオリエンテーション確率の非常に小さな割合に対応することを発見しました。3つの推定値からのすべての加重乗算とベクトルの合計は、最終的なプレーヤーのオリエンテーション、つまりプレーヤーの最終的な角度に対応します。各プレーヤーで同じプロセスを実行し、フィールドの画像に方向を描画することにより、映像に残る試合を通じてプレイヤーの動きを追跡できます。
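The weighted combination can be sketched as an element-wise weighted sum of three probability vectors, followed by picking the most likely bin. The weights and toy vectors below are illustrative placeholders, not the values reported by the researchers, and returning the centre of the winning 15-degree bin is an assumption made here for simplicity.

```python
def combine_estimates(pose, field, ball, weights):
    """Weighted sum of three equally sized probability vectors, renormalised."""
    w_pose, w_field, w_ball = weights
    combined = [w_pose * p + w_field * f + w_ball * b
                for p, f, b in zip(pose, field, ball)]
    total = sum(combined)
    return [c / total for c in combined]


def most_likely_angle(probs, bin_width_deg=15):
    """Centre angle of the bin with the highest combined probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best * bin_width_deg + bin_width_deg / 2


NUM_BINS = 24

# Toy inputs: pose points at bin 18, ball at bin 17, field of view is uniform.
pose = [1.0 if i == 18 else 0.0 for i in range(NUM_BINS)]
ball = [1.0 if i == 17 else 0.0 for i in range(NUM_BINS)]
field = [1.0 / NUM_BINS] * NUM_BINS

illustrative_weights = (0.5, 0.1, 0.4)  # hypothetical, not from the paper
combined = combine_estimates(pose, field, ball, illustrative_weights)
print(most_likely_angle(combined))  # 277.5
```

Because the pose vector carries the largest weight in this toy setup, its bin wins even though the ball estimate points at the neighbouring bin.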

In terms of accuracy, the method managed to detect at least 89% of all required body parts for players through OpenPose, and its left/right orientation classification achieved 92% accuracy when compared with sensor data. The initial weighting of the overall orientation was 0.5 for pose, 0.15 for field of view and <0.5 for ball position, suggesting that pose data is the strongest predictor of body orientation. Field of view was the least accurate estimate, with an average error of 59 degrees, and could be excluded altogether. Ball orientation performs well in estimating orientation, but pose orientation is a stronger predictor in terms of degree of error. However, the combination of all three outperforms the individual estimates.
メソッドの正確度に関しては、このメソッドは、OpenPoseを介してプレイヤーに必要なすべての身体部分の少なくとも89％を検出し、センサーデータと比較した場合、左右方向のレートは92％の正確度を達成しました。オリエンテーション全体の初期の重みは、姿勢の場合0.5、視野の場合0.15、ボールの位置の場合<0.5となり、姿勢データが体のオリエンテーションの最高の予測因子であることを示唆しています。また、視野は、59度の平均誤差で最も正確度が低く、完全に除外することもできます。ボールのオリエンテーションは、オリエンテーションの推定に適していますが、姿勢のオリエンテーションは、エラーの程度に関連してより強力な予測子です。それでも、3つすべての組み合わせは、個々の推定よりも優れています。

One limitation the researchers found in their approach is the variation in camera angles and video quality available across clubs, or even across teams within the same club. For example, footage of youth team matches had poor quality and camera angles, making it impossible for OpenPose to detect players at certain times, even when they were on screen.
研究者が彼らのアプローチで見つけたいくつかの限界は、クラブまたは同じクラブのチームでも利用できるさまざまなカメラのアングルとビデオの品質です。たとえば、ユースチームの試合は質の悪い映像とカメラアングルだったため、OpenPoseが画面上であっても特定の時間に選手を検出することができませんでした。

Finally, Adrià et al. suggest that video analysts could greatly benefit from this automated orientation detection capability when analyzing match footage: directional arrows printed on the frame make it easier to identify cases where orientation can be critical to developing a player or a particular play. The highly visual nature of the solution makes it very easy for players to understand when they are presented with information about their body positioning during match play, both in the first team and in youth player development.
This metric could also be incorporated into the calculation of the conditional probability of scoring a goal in various game situations, for example by including it when modeling Expected Goals.
Ultimately, these innovative advances in automatic data collection can relieve many Performance Analysts of hours of manual coding of footage when tracking match events.
最後に、Adrià達は、ビデオアナリストはフレームに方向矢印をつけることで試合映像を分析するときに、この自動オリエンテーション検出機能から大きな恩恵を受けることができることを示唆します。
このソリューションの非常に視覚的な側面は、トップのチームとユースの育成者の両方にとって、試合中の体の位置に関する情報を提示することで、プレーヤーは非常に理解しやすくなることです。
このメトリックは、Expected Goalsのモデリング中に含めるなど、さまざまなゲームの状況でゴールを決める条件付き確率の計算に組み込むこともできます。
最終的に、自動データ収集におけるこれらの革新的な進歩により、試合イベントを追跡する際に映像を手動で数時間コーディングすることから多くのパフォーマンスアナリストを解放できます。

Appendix 1

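Video feed

Streamed video, such as that delivered by DAZN.

Frame

A single still image that makes up a video. An image captured by the camera.

Torso

A word often heard in art (especially sculpture)...

See Wikipedia.

Super-resolution

A technique that restores high-frequency information that cannot be represented correctly in a low-resolution observed image.

See the following:

Image upscaling techniques using deep learning

Difference between accuracy and precision

"Accuracy" is a measure of how close a value is to the "true value". "Precision", on the other hand, is a measure of how much the values of repeated measurements vary from one another, and is also called "reproducibility".

Expected Goals

Football data analytics: an explanation of "xG", a new metric for expected goals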
