My AI Soccer Player Injury Prediction
Sometimes ideas arrive from a moment of reflection rather than a moment of need. I once watched a sports match where a key player had to leave because of an injury that seemed predictable if someone had only looked closely at the data. That scene made me wonder if data could be used to anticipate such risks before they happened. I imagined that a simple dashboard could score risk levels and highlight patterns that would otherwise remain hidden. The thought of being able to create such a tool with my own code stayed with me until I decided to attempt it. The dataset I used is linked here.

This project is my way of turning that idea into something real. The application is not just about algorithms, but also about building a working end‑to‑end solution. It takes a CSV file of player statistics and processes it with machine learning pipelines. The result is displayed in a Streamlit dashboard where risk scores are shown in tables and charts. It can run in supervised mode if labels are present or in unsupervised mode if they are not. What follows is a breakdown of every file I uploaded to GitHub and a detailed look at every code block inside the app.
Requirements File
The first file I uploaded was requirements.txt. It lists every external library the app needs; without it, the deployment environment would not know what to install. The content is short but critical.
```text
streamlit
scikit-learn
pandas
numpy
plotly
```
Each entry here plays a different role. Streamlit provides the web dashboard. Scikit‑learn provides the machine learning models, preprocessing, and pipelines. Pandas and NumPy are used for data handling and mathematical operations. Plotly is used for interactive visualizations. By declaring them in one file, I ensure that anyone who runs the app installs the same dependencies that I used.
README File
The README.md file introduces the project. It explains in plain text what the application does, what modes it supports, and how to run it. This file is important because on GitHub it is the first thing visitors see when they open the repository.
````markdown
# Injury Risk Detection Dashboard

An end-to-end Streamlit app that scores player injury risk using your CSV data.

## Features
- Supervised mode (RandomForest) if you have a binary injury label.
- Unsupervised mode (IsolationForest) if you do not.
- Interactive table and top-30 risk chart.
- One-click CSV export with risk scores.

## Quickstart
```bash
pip install -r requirements.txt
streamlit run app.py
```
````
The README explains that the application supports both supervised and unsupervised risk scoring, and it mentions the interactive components like tables and charts. The quickstart section tells the user to install the requirements and run the app with the Streamlit command. This document is not part of the runtime, but it is essential for communication and clarity.
Dataset File
Another file I uploaded was players_data-2025_2026.csv, a sample dataset with player information. The application reads this file to generate predictions when no custom file is uploaded, so the app can be demonstrated without needing an external source.

The dataset contains columns for player characteristics and performance metrics, and possibly a label indicating whether a player faced injury. When labels are present, the supervised mode can train and evaluate a model. Without them, the unsupervised mode falls back on anomaly detection (an Isolation Forest) to still produce a risk score. The dataset therefore acts as both input and demonstration resource.
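Before wiring it into the app, it helps to peek at the file with plain pandas. A minimal sketch, assuming the bundled filename above; the exact columns depend on the CSV:

```python
import pandas as pd

# Load the bundled sample and inspect its shape and column types.
df = pd.read_csv("players_data-2025_2026.csv")

print(df.shape)            # (rows, columns)
print(df.dtypes.head(10))  # dtypes decide which columns count as features

# The app will treat numeric columns like these as feature candidates.
print(df.select_dtypes("number").columns.tolist()[:12])
```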
Application Code (app.py)
The app.py file is the heart of this repository. It contains all the logic for the Streamlit dashboard: data preprocessing, model training, and visualization. I will go through it section by section, showing each code block and then explaining it in detail.
```python
import io
import re
from pathlib import Path
from typing import List, Tuple, Optional

import numpy as np
import pandas as pd
import streamlit as st
import plotly.express as px
```
These imports cover the app's layers: io builds in-memory buffers for the CSV download, re powers the column-name guessing, pathlib locates the bundled dataset, and typing documents the helper signatures. NumPy and pandas handle the data, Streamlit renders the dashboard, and Plotly Express draws the interactive charts.
```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import (
    roc_auc_score,
    average_precision_score,
    classification_report,
    confusion_matrix,
)
from sklearn.ensemble import RandomForestClassifier, IsolationForest
```
Scikit-learn supplies the modeling toolkit: train_test_split for holdout evaluation, SimpleImputer and StandardScaler for preprocessing, Pipeline and ColumnTransformer for chaining those steps with a model, the metrics used for validation, and the two estimators behind the app's modes, RandomForestClassifier (supervised) and IsolationForest (unsupervised).
```python
st.set_page_config(page_title="Injury Risk Detection", layout="wide")
```
This must be the first Streamlit call in the script. It sets the browser tab title and switches to the wide layout so tables and charts can use the full screen width.
```python
# -----------------------------
# Helpers
# -----------------------------
def guess_id_columns(cols: List[str]) -> List[str]:
    patterns = [
        r"player[_\s]?id",
        r"player",
        r"athlete",
        r"name$",
        r"full[_\s]?name",
    ]
    out = []
    for c in cols:
        lc = c.lower()
        if any(re.search(p, lc) for p in patterns):
            out.append(c)
    return out
```
This helper lowercases each column name and tests it against a few regular expressions (player id, player, athlete, anything ending in name, full name). Every match is collected so the UI can pre-select a likely identifier column instead of making the user hunt for one.
```python
def guess_label_column(cols: List[str]) -> Optional[str]:
    patterns = [
        r"injur",  # injury, injured, injuries
        r"out[_\s]?status",
        r"availability[_\s]?risk",
        r"risk[_\s]?label",
        r"missed[_\s]?games[_\s]?flag",
    ]
    for c in cols:
        lc = c.lower()
        if any(re.search(p, lc) for p in patterns):
            return c
    return None
```
The label guesser works the same way but returns the first matching column, or None when the data carries no injury label. That single return value later drives the automatic choice between supervised and unsupervised mode.
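A quick sanity check of both guessers on a made-up header row (these column names are invented for illustration):

```python
cols = ["player_name", "age", "minutes", "injured_flag"]
print(guess_id_columns(cols))    # ['player_name']
print(guess_label_column(cols))  # 'injured_flag'
```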
```python
def numeric_feature_candidates(df: pd.DataFrame, exclude: List[str]) -> List[str]:
    nums = df.select_dtypes(include=[np.number]).columns.tolist()
    return [c for c in nums if c not in exclude]
```
This selects every numeric column and drops anything in the exclude list, in practice the label column, producing the candidate feature list for the multiselect widget later on.
```python
def safe_downcast(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for c in out.select_dtypes(include=["float64"]).columns:
        out[c] = pd.to_numeric(out[c], downcast="float")
    for c in out.select_dtypes(include=["int64"]).columns:
        out[c] = pd.to_numeric(out[c], downcast="integer")
    return out
```
safe_downcast shrinks 64-bit floats and integers to the smallest dtype that can hold their values, cutting memory use on wide CSVs without changing any numbers. It works on a copy, so the caller's frame stays untouched.
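A rough illustration of the saving, with the helper above in scope (the numbers are approximate):

```python
import numpy as np
import pandas as pd

big = pd.DataFrame({"x": np.arange(1_000_000, dtype="int64")})
print(big.memory_usage(deep=True).sum())                 # ~8 MB as int64
print(safe_downcast(big).memory_usage(deep=True).sum())  # ~4 MB once downcast to int32
```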
```python
def build_supervised_pipeline(num_cols: List[str]) -> Pipeline:
    num_proc = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
        ]
    )
    pre = ColumnTransformer(
        transformers=[("num", num_proc, num_cols)],
        remainder="drop",
        verbose_feature_names_out=False,
    )
    clf = RandomForestClassifier(
        n_estimators=300,
        max_depth=None,
        n_jobs=-1,
        random_state=42,
        class_weight="balanced_subsample",
    )
    pipe = Pipeline(steps=[("pre", pre), ("clf", clf)])
    return pipe
```
The supervised pipeline chains median imputation and standard scaling with a 300-tree random forest. The ColumnTransformer restricts processing to the selected numeric columns and drops everything else; class_weight="balanced_subsample" compensates for injured players usually being a small minority; random_state keeps runs reproducible. Because preprocessing and model live in one Pipeline object, scoring new rows applies exactly the transforms learned at training time.
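To see the pipeline end to end, here is a minimal sketch on synthetic data; the frame, its columns, and the fake label are invented for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "minutes": rng.integers(0, 3000, 200),
    "sprints": rng.integers(0, 400, 200),
})
toy_y = (toy["minutes"] + 5 * toy["sprints"] > 3500).astype(int)  # fake label

pipe = build_supervised_pipeline(["minutes", "sprints"])
pipe.fit(toy, toy_y)

# predict_proba returns [P(class 0), P(class 1)] per row; column 1 is the risk.
print(pipe.predict_proba(toy.head())[:, 1])
```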
```python
def build_unsupervised_model() -> IsolationForest:
    return IsolationForest(
        contamination="auto",
        n_estimators=400,
        random_state=42,
        n_jobs=-1,
    )
```
This factory returns an IsolationForest with 400 trees and automatic contamination. The main flow below actually constructs its forest inline so the contamination slider can feed it, but the factory documents the default configuration.
```python
def compute_unsupervised_risk_scores(model: IsolationForest, X: np.ndarray) -> np.ndarray:
    # IsolationForest's score_samples is higher for "normal" points.
    # Invert so that higher = more risky, then normalize to 0..1.
    raw = model.score_samples(X)
    risk = raw.min() - raw
    if np.nanmax(risk) > np.nanmin(risk):
        risk = (risk - np.nanmin(risk)) / (np.nanmax(risk) - np.nanmin(risk))
    else:
        risk = np.zeros_like(risk)
    return risk
```
score_samples assigns higher values to normal points, so subtracting every score from the minimum flips the ordering: the most anomalous row becomes the largest value. Min-max scaling then maps the result into [0, 1], and the guard clause returns zeros when all scores are identical instead of dividing by zero.
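A tiny worked example of the inversion and normalization, on toy scores rather than real model output:

```python
import numpy as np

raw = np.array([-0.6, -0.2, -0.1])  # score_samples: lowest = most anomalous
risk = raw.min() - raw              # [0.0, -0.4, -0.5]; anomaly is now largest
risk = (risk - risk.min()) / (risk.max() - risk.min())
print(risk)                         # [1.0, 0.2, 0.0]: first point is riskiest
```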
```python
def make_feature_preview_chart(df: pd.DataFrame, id_col: Optional[str], risk_col: str):
    tmp = df.copy()
    if id_col is None:
        tmp["Entity"] = np.arange(len(tmp))
        id_col = "Entity"
    fig = px.bar(
        tmp.sort_values(risk_col, ascending=False).head(30),
        x=id_col,
        y=risk_col,
        title="Top 30 Highest Risk (sorted)",
    )
    st.plotly_chart(fig, use_container_width=True)
```
The chart helper sorts by the risk column, keeps the 30 highest rows, and draws a Plotly bar chart keyed by the identifier column. If no identifier was chosen, it falls back to the row index as a synthetic Entity label, and use_container_width stretches the chart to the dashboard width.
```python
def to_downloadable_csv(df: pd.DataFrame, filename: str = "injury_risk_scored.csv") -> Tuple[bytes, str]:
    buff = io.StringIO()
    df.to_csv(buff, index=False)
    return buff.getvalue().encode("utf-8"), filename
```
The export helper serializes the scored frame into an in-memory StringIO buffer and returns UTF-8 bytes plus a filename, exactly what st.download_button expects, so no temporary file ever touches disk.
```python
# -----------------------------
# UI
# -----------------------------
st.title("Injury Risk Detection Dashboard")
```
With the helpers in place, the UI section begins by rendering the page headline.
```python
st.markdown(
    """
This app supports **two modes**:
- **Supervised**: If your data has an injury/label column (e.g., `injury`, `injured_flag`), it trains a classifier and outputs risk probabilities.
- **Unsupervised**: If there is no injury label, it uses Isolation Forest to flag abnormal workload patterns as higher risk.
"""
)
```
A short markdown blurb tells users up front which of the two modes will apply to their data, so the behavior of the rest of the page is predictable.
```python
uploaded = st.file_uploader("Upload season or player dataset (CSV)", type=["csv"])
```
The uploader accepts a single CSV. If the user provides nothing, the app falls back to the bundled dataset, handled just below.
```python
def read_csv_safe(src):
    # Path string or Path object
    if isinstance(src, (str, Path)):
        return pd.read_csv(src)
    # Streamlit UploadedFile / file-like
    if hasattr(src, "read"):
        try:
            src.seek(0)
            return pd.read_csv(src)
        except ValueError:
            # Some environments need a BytesIO wrapper
            src.seek(0)
            return pd.read_csv(io.BytesIO(src.read()))
    raise ValueError("Unsupported CSV source type")
```
read_csv_safe accepts either a filesystem path or a file-like object. For Streamlit's UploadedFile it rewinds to the start before parsing, and if pandas rejects the object directly it retries through a BytesIO wrapper, which some hosted environments need. Anything else raises a clear error.
```python
def load_default_csv() -> pd.DataFrame:
    candidates = [
        "players_data-2025_2026.csv",
        "players_data_light-2025_2026.csv",
        "data/players_data-2025_2026.csv",
        "data/players_data_light-2025_2026.csv",
    ]
    for p in candidates:
        if Path(p).exists():
            st.info(f"Using default dataset bundled with the app: {p}")
            return read_csv_safe(p)
    st.error(
        "No bundled dataset found. Upload a CSV or add one of the expected files "
        "to the repo root (or a data/ folder)."
    )
    st.stop()
```
The fallback loader tries a short list of expected filenames in the repo root and a data/ folder, loads the first one that exists (announcing it with st.info), and otherwise shows an error and halts the script with st.stop() so nothing downstream runs on missing data.
```python
# Decide source
if uploaded is not None:
    df_raw = read_csv_safe(uploaded)
else:
    df_raw = load_default_csv()
```
An uploaded file always wins; the bundled dataset is only a fallback.
```python
df_raw = safe_downcast(df_raw)
st.success(f"Loaded {df_raw.shape[0]} rows × {df_raw.shape[1]} columns.")
with st.expander("Preview data", expanded=False):
    st.dataframe(df_raw.head(20), use_container_width=True)
```
After downcasting for memory, a success banner reports the shape, and a collapsed expander shows the first 20 rows so the user can sanity-check the parsing before going further.
```python
cols = df_raw.columns.tolist()
id_guesses = guess_id_columns(cols)
label_guess = guess_label_column(cols)
```
With the data loaded, the guess helpers propose an identifier column and a possible injury label; these guesses seed the widgets that follow.
```python
left, right = st.columns([1, 1])
with left:
    mode = st.radio("Mode", ["Auto (detect label)", "Supervised", "Unsupervised"], index=0)
with right:
    if mode == "Auto (detect label)":
        st.write(f"Detected label column: **{label_guess or 'None'}**")
```
Two side-by-side columns hold the mode selector and, in Auto mode, a live readout of which label column was detected (or None).
```python
if mode == "Auto (detect label)":
    if label_guess is not None:
        active_mode = "Supervised"
    else:
        active_mode = "Unsupervised"
else:
    active_mode = mode
```
Auto mode resolves to Supervised when a label was detected and Unsupervised otherwise; an explicit choice passes through unchanged.
```python
# Choose ID column (optional)
id_col = st.selectbox(
    "Player/Entity identifier column (optional, used for display)",
    options=["<none>"] + id_guesses + [c for c in cols if c not in id_guesses],
    index=0 if not id_guesses else 1,
)
id_col = None if id_col == "<none>" else id_col
```
The identifier dropdown lists guessed columns first, then everything else, behind a <none> sentinel that is converted back to None for the rest of the code. When a guess exists, it is preselected.
```python
# Choose label (only for supervised)
label_col = None
if active_mode == "Supervised":
    defaults = [label_guess] if label_guess and label_guess in cols else []
    label_col = st.selectbox(
        "Injury label column (binary: 1=injured/at-risk, 0=healthy)",
        options=cols,
        index=cols.index(defaults[0]) if defaults else 0,
    )
```
Only supervised mode asks for a label column, and the detected guess, when valid, becomes the default selection.
```python
# Feature selection
exclude = [label_col] if label_col else []
num_candidates = numeric_feature_candidates(df_raw, exclude=exclude)
with st.expander("Feature selection", expanded=True):
    st.caption("Pick numeric workload and profile features (minutes, matches, distance, sprints, age, rest days, etc.).")
    features = st.multiselect(
        "Numeric features",
        options=num_candidates,
        default=num_candidates[: min(12, len(num_candidates))],
        help="You can change this anytime. Choose meaningful workload indicators.",
    )
if not features:
    st.warning("Select at least one numeric feature.")
    st.stop()
```
The multiselect offers every numeric column except the label and preselects up to the first 12 candidates so the app works immediately on a fresh CSV. If the user clears the selection entirely, the app warns and stops.
```python
# Additional controls
c1, c2, c3 = st.columns([1, 1, 1])
with c1:
    if active_mode == "Unsupervised":
        contamination = st.slider("Assumed at-risk share (unsupervised)", 0.01, 0.20, 0.06, 0.01)
    else:
        contamination = None
with c2:
    test_size = st.slider("Test size (supervised)", 0.1, 0.4, 0.2, 0.05)
with c3:
    random_state = st.number_input("Random state", min_value=0, value=42, step=1)
```
Three side-by-side controls: the assumed at-risk share (contamination) only applies in unsupervised mode, the holdout fraction only in supervised mode, and the random seed feeds both the split and the models for reproducibility.
```python
st.markdown("---")

# -----------------------------
# Run
# -----------------------------
run = st.button("Train / Score")
if not run:
    st.stop()
```
A horizontal rule closes the configuration area, and nothing below executes until the Train / Score button is pressed: st.stop() ends the current run, and Streamlit re-runs the whole script when the button is clicked.
```python
work_df = df_raw.copy()
```
All scoring happens on a copy, leaving the raw frame intact for display.
```python
if active_mode == "Supervised":
    y = work_df[label_col].astype(float)
    X = work_df[features]

    # Guardrail: ensure label is binary
    unique_labels = sorted(pd.Series(y).dropna().unique().tolist())
    if not set(unique_labels).issubset({0, 1}):
        st.error(f"Label {label_col} must be binary 0/1. Found values: {unique_labels[:10]}")
        st.stop()
```
The supervised branch slices out the label and the chosen feature matrix, then checks that the label really is binary; anything outside {0, 1} produces an error listing the first offending values and halts.
```python
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    pipe = build_supervised_pipeline(features)
    pipe.fit(X_train, y_train)
```
The split is stratified on the label so the rare injured class appears in both partitions at roughly its overall rate, which keeps the validation metrics meaningful. Fitting the pipeline then runs imputation, scaling, and forest training in one call, on the training partition only.
```python
    # Metrics
    y_prob = pipe.predict_proba(X_test)[:, 1]
    try:
        auc = roc_auc_score(y_test, y_prob)
    except ValueError:
        auc = float("nan")

    try:
        ap = average_precision_score(y_test, y_prob)
    except ValueError:
        ap = float("nan")

    st.subheader("Validation Metrics")
    m1, m2 = st.columns([1, 1])
    m1.metric("ROC AUC", f"{auc:.3f}" if np.isfinite(auc) else "NA")
    m2.metric("Average Precision", f"{ap:.3f}" if np.isfinite(ap) else "NA")
```
predict_proba yields the positive-class probability for each holdout row. ROC AUC and average precision are each wrapped in try/except because both are undefined when the test fold contains only one class; in that case the metric cards show NA instead of crashing. The two scores render side by side, formatted to three decimals when finite.
```python
    y_pred = (y_prob >= 0.5).astype(int)
    cm = confusion_matrix(y_test, y_pred)
    st.write("Confusion Matrix (threshold=0.5)")
    st.dataframe(pd.DataFrame(cm, index=["True 0", "True 1"], columns=["Pred 0", "Pred 1"]))
```
Probabilities are binarized at a fixed 0.5 cutoff, and the resulting confusion matrix is shown as a labeled DataFrame, making false negatives (injured players scored as healthy) easy to spot.
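The 0.5 cutoff is a convention, not a requirement. As a sketch (reusing y_prob and y_test from the block above, outside the app itself), one could sweep the threshold to see the trade-off:

```python
# Lower cutoffs catch more at-risk players at the cost of more false alarms.
for thr in (0.3, 0.5, 0.7):
    pred = (y_prob >= thr).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
    print(f"thr={thr:.1f}  caught={tp}  missed={fn}  false_alarms={fp}")
```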
```python
    # Score all rows
    all_prob = pipe.predict_proba(work_df[features])[:, 1]
    scored = work_df.copy()
    scored["risk_score"] = all_prob
    if id_col is not None and id_col in scored.columns:
        display_cols = [id_col, "risk_score"] + [c for c in features if c != id_col]
    else:
        display_cols = ["risk_score"] + features
```
Once validated, the pipeline scores every row of the dataset, not just the holdout, and the probability is attached as risk_score. The display order puts the identifier and the score first, followed by the features.
```python
    st.subheader("Risk Scores (Supervised)")
    st.dataframe(
        scored.sort_values("risk_score", ascending=False)[display_cols].head(200),
        use_container_width=True,
        height=480,
    )
    make_feature_preview_chart(scored, id_col=id_col, risk_col="risk_score")

    # Download
    csv_bytes, fname = to_downloadable_csv(scored)
    st.download_button("Download Scored CSV", data=csv_bytes, file_name=fname, mime="text/csv")
```
The table is sorted by descending risk and capped at 200 rows for responsiveness, the top-30 chart follows, and the download button exports the full scored frame, not just the displayed slice.
```python
else:
    # Unsupervised
    X = work_df[features].copy()
    pre = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
        ]
    )
    X_pre = pre.fit_transform(X)

    iso = IsolationForest(
        contamination=contamination if contamination is not None else "auto",
        n_estimators=400,
        random_state=random_state,
        n_jobs=-1,
    )
    iso.fit(X_pre)
```
The unsupervised branch reuses the same imputation-plus-scaling recipe, applied directly since there is no label to hold out. The IsolationForest takes the slider's contamination value (falling back to "auto") along with the chosen random seed, and is fit on the preprocessed matrix.
```python
    risk_scores = compute_unsupervised_risk_scores(iso, X_pre)
    scored = work_df.copy()
    scored["risk_score"] = risk_scores
    if id_col is not None and id_col in scored.columns:
        display_cols = [id_col, "risk_score"] + [c for c in features if c != id_col]
    else:
        display_cols = ["risk_score"] + features
```
Anomaly scores are converted to normalized risk through the helper defined earlier and attached to a copy of the data, with the same display-column logic as the supervised path.
```python
    st.subheader("Risk Scores (Unsupervised)")
    st.dataframe(
        scored.sort_values("risk_score", ascending=False)[display_cols].head(200),
        use_container_width=True,
        height=480,
    )
    make_feature_preview_chart(scored, id_col=id_col, risk_col="risk_score")

    csv_bytes, fname = to_downloadable_csv(scored)
    st.download_button("Download Scored CSV", data=csv_bytes, file_name=fname, mime="text/csv")
```
Presentation mirrors the supervised branch: a sorted table of up to 200 rows, the top-30 chart, and the same CSV export.
```python
st.markdown("---")
st.caption(
    "Tips: Include workload features like minutes played, matches in last 7/14 days, age, "
    "rest days, sprint counts, high-speed distance, past injuries, and travel. "
    "If you don’t have labels, start with Unsupervised mode to triage risk."
)
```
The closing caption suggests which workload features tend to matter and nudges label-less users toward unsupervised triage as a starting point.
Conclusion
This project grew from a simple thought into a working dashboard. The final repository holds four files: app.py, requirements.txt, README.md, and a sample dataset. Together they form a complete demonstration that others can clone and run. Every function, conditional, and helper in the code contributes to the flow from raw data to risk prediction.
By walking through each part, I have shown how the pieces connect. The machine learning pipeline is not hidden; it is transparent and adjustable. The user interface allows real-time interaction with predictions and plots. The app may begin as a sports injury predictor, but the same structure could be reused for other kinds of risk scoring. That is the strength of building something carefully and documenting it fully.