Predicting the Outcome of League of Legends Ranked Games in Champ Select via Machine Learning
Edit: you can now test it here: https://dodge-that.herokuapp.com/
League of Legends is a multiplayer online battle arena (MOBA) game where two teams of five players compete to destroy an enemy building called the Nexus. Before each game, players select their champions, and once every pick is locked in, they enter the game.
This machine learning project aims to answer the following question:
- Is this game likely to be a win, or should I dodge given my teammates’ picks and recent games?
I built a Python app to retrieve the data, then explored and modelled this question with the Scikit-Learn library over more than 2,000 ranked games.
Players can now test the ML model directly at: https://dodge-that.herokuapp.com/
My choice of features determines how I need to acquire the data.
Selected features:
- Winrate of the player on his champion pick (planned)
- Number of games played with this champion (planned)
- Off-role metric: is the player playing his main role?
- KDA over the last 5 games: is the player a feeder or a challenger?
- Number of wins over the last 2 games: checks whether the player is in the right state of mind
- Number of wins over the last 5 games: checks the overall trend of the player’s mindset
- Experience of this player with his pick
- MMR: the player’s rank on the competitive ladder (like Elo in chess)
- Number of games played during the last week: checks whether the player is a casual player
Now, where do we get all this information?
- The Riot API stores information about games, player experience and more
- OP.GG stores information about player profiles
Given the architecture of the Riot API (which is, to put it diplomatically, a challenge), I chained several API calls per game to get the right information.
Here is the logic:
1- Get the player name
2- Get the list of games he played during the last week
3- Loop over this list and get the result, champion played and list of teammates for each game
4- For each teammate, get their last 5 games, results and experience (30 API calls per game)
5- Store all the information in MongoDB
6- Select a random teammate and restart the whole process
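The crawling loop above can be sketched in Python as follows. This is a minimal sketch under assumptions: the three `fetch_*` callables are hypothetical stand-ins for the chained Riot API requests (the post does not show the real endpoints or response format), each game is assumed to be a dict with `result`, `champion` and `teammates` keys, and a plain dict replaces the MongoDB collection.

```python
import random

def crawl(seed_player, fetch_recent_games, fetch_game, fetch_history,
          max_games=100, seed=0):
    """Snowball crawl: follow one random teammate at a time (steps 1 to 6).

    fetch_recent_games(player) -> game ids from the last week      (hypothetical)
    fetch_game(game_id)        -> {"result", "champion", "teammates"} (hypothetical)
    fetch_history(player, n)   -> the player's last n games          (hypothetical)
    """
    rng = random.Random(seed)
    store = {}                 # game_id -> record; a plain dict standing in for MongoDB
    visited = set()
    player = seed_player                                   # step 1
    while player is not None and len(store) < max_games:
        visited.add(player)
        teammates_seen = []
        for game_id in fetch_recent_games(player):         # step 2
            if game_id in store:
                continue
            game = fetch_game(game_id)                     # step 3
            history = {t: fetch_history(t, 5) for t in game["teammates"]}  # step 4
            store[game_id] = {**game, "teammate_history": history}        # step 5
            teammates_seen.extend(game["teammates"])
            if len(store) >= max_games:
                break
        candidates = [t for t in teammates_seen if t not in visited]
        player = rng.choice(candidates) if candidates else None            # step 6
    return store
```

Passing the fetchers in as arguments keeps the crawl logic testable with fake data. In this sketch the walk simply stops when the current player has no unvisited teammates left.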
Once all sets of games have been retrieved, and since the Riot API does not provide a player’s winrate per champion, we scrape OP.GG to get the stats for each player.
This gives us the winrate of each player on each champion.
We end up with 2 collections in our MongoDB database.
The raw data from the Riot API and Champions GG needs some tweaks.
The input data that feeds our models is the set of features presented above:
- MMR is estimated from the player’s position on the ranked ladder. The ranking system in League of Legends is split into several tiers and divisions. From Iron to Challenger, each tier contains 4 divisions, and to be promoted to the next division a player needs 100 LP (League Points). To estimate the MMR, I gave 100 points for each division and added the remaining LP.
- Last2: from the game history retrieved for each player, we count the number of wins over the last 2 games
- Last5: same as above, over the last 5 games
- Winrate: scraped directly from OP.GG. If we could not retrieve any information for a specific champion, the winrate defaults to 50%
- Nbplayed: scraped directly from OP.GG. If we could not retrieve any information for a specific champion, the number of games played defaults to 5
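The feature tweaks above can be sketched as small helper functions. This is a minimal sketch, not the original code. The tier and division names follow the Riot API’s uppercase spelling as of 2020. Master, Grandmaster and Challenger have no divisions, so treating them as one LP ladder starting right after Diamond I is my own assumption. The `opgg_stats` dictionary layout is hypothetical.

```python
TIERS = ["IRON", "BRONZE", "SILVER", "GOLD", "PLATINUM", "DIAMOND"]
DIVISIONS = ["IV", "III", "II", "I"]            # lowest to highest within a tier
APEX_TIERS = {"MASTER", "GRANDMASTER", "CHALLENGER"}
DEFAULT_WINRATE = 0.5                           # used when OP.GG has no data for the champion
DEFAULT_NB_PLAYED = 5

def estimated_mmr(tier, division, lp):
    """100 points per division climbed above Iron IV, plus the remaining LP."""
    tier, division = tier.upper(), division.upper()
    if tier in APEX_TIERS:
        # Assumption: apex tiers share one LP ladder starting after Diamond I.
        return len(TIERS) * len(DIVISIONS) * 100 + lp
    divisions_below = TIERS.index(tier) * len(DIVISIONS) + DIVISIONS.index(division)
    return divisions_below * 100 + lp

def wins_in_last(results, n):
    """Count wins among the n most recent games (results is newest first, 1 = win, 0 = loss)."""
    return sum(results[:n])

def champion_stats(opgg_stats, champion):
    """Return (winrate, nbplayed) for a champion, falling back to the defaults."""
    stats = opgg_stats.get(champion)   # hypothetical layout: {champion: {"winrate", "nbplayed"}}
    if stats is None:
        return DEFAULT_WINRATE, DEFAULT_NB_PLAYED
    return stats["winrate"], stats["nbplayed"]

# Example: a Gold II player with 50 LP and no OP.GG data on his pick
history = [1, 0, 1, 1, 0, 1]  # newest first
row = [estimated_mmr("GOLD", "II", 50), wins_in_last(history, 2),
       wins_in_last(history, 5), *champion_stats({}, "Ahri")]
print(row)  # [1450, 1, 3, 0.5, 5]
```

Gold II sits 14 divisions above Iron IV, hence 1,400 points plus the 50 LP.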
Let’s have a quick look at our features:
Seaborn’s violin plots let us visualise the effect of each feature.
Quick assumptions from these charts:
- The team’s average experience seems to be positively correlated with the chance of winning the game
- The higher the MMR, the more likely we are to win the game
- We have a slightly better chance of winning if our teammates play a lot
- Paradoxically, if our team has a high winrate over the last 5 games, we are less likely to win. This is a phenomenon most players have felt: once you are on a winning streak, Riot’s matchmaking often places you against stronger opponents, resulting in harder matches.
Time to play with the Scikit-Learn library!
We will try 4 different models:
Logistic Regression (LR):
Used to model the probability of a binary event. The model predicts the probability that Y belongs to a particular class by computing a linear combination of the features and passing it through the sigmoid function. If the probability is higher than a predetermined threshold (usually P(Yes) > 0.5), the model predicts Yes (1).
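To make the two steps concrete, here is a from-scratch sketch of the prediction side of logistic regression with NumPy. The weights and bias below are arbitrary illustrative values, not fitted ones; in practice Scikit-Learn’s `LogisticRegression` learns them by maximising the likelihood.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, weights, bias, threshold=0.5):
    """Linear combination of the features -> sigmoid -> threshold."""
    proba = sigmoid(X @ weights + bias)        # P(Y = 1 | x) for each row
    labels = (proba > threshold).astype(int)   # 1 = predicted win
    return proba, labels

X = np.array([[2.0, 1.0], [-1.0, -3.0]])
proba, labels = predict(X, weights=np.array([0.5, 0.5]), bias=0.0)
print(labels)  # [1 0]
```

The first row gives a linear score of 1.5 (probability about 0.82), the second a score of -2 (about 0.12), so only the first crosses the 0.5 threshold.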
Random Forest Classifier (RFC):
We combine many classifiers into one predictive model (ensemble learning): the class predicted by the most trees is the one chosen, following the idea of the wisdom of crowds. RFC builds on bagging, where each tree is trained on a bootstrap sample of the data, and decorrelates the trees further: at every split, instead of considering the full set of p predictors, only a random sample of them is considered.
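A hand-rolled sketch can make the two sources of randomness explicit: each tree is trained on a bootstrap sample (bagging), and `max_features="sqrt"` makes every split look at a random subset of the predictors. The final prediction here is a hard majority vote. The data is synthetic, since the game dataset is not published with this post. Scikit-Learn’s `RandomForestClassifier` packages the same ideas but averages the trees’ predicted probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=50, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample: rows drawn with replacement
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)  # random predictor subset per split
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples), labels 0/1
    return (votes.mean(axis=0) >= 0.5).astype(int)          # majority vote (ties go to class 1)

# Synthetic stand-in for the game features
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
trees = fit_forest(X, y)
print((predict_forest(trees, X) == y).mean())  # training accuracy
```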
Gradient Boosting Classifier (GBC):
GBC is also based on ensemble learning, but the trees are built sequentially: each new tree improves the model by using information from the previously built ones, being fitted to the errors (pseudo-residuals) of the ensemble so far. This slow-learning model can be tuned with 3 parameters: the number of trees B, the interaction depth d and the learning rate (shrinkage) λ.
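In Scikit-Learn’s `GradientBoostingClassifier`, the three parameters roughly map to `n_estimators` (B), `max_depth` (d, the interaction order of each tree) and `learning_rate` (λ). The values below are illustrative, not the ones tuned for this project, and the data is synthetic. `staged_predict` shows how each additional tree refines the predictions of the previous ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

gbc = GradientBoostingClassifier(
    n_estimators=300,    # B: number of trees, built one after the other
    max_depth=2,         # d: interaction depth of each tree
    learning_rate=0.05,  # lambda: shrinkage applied to each tree's contribution
    random_state=0,
)
gbc.fit(X, y)

# Training accuracy of the ensemble after each boosting stage
train_acc = [np.mean(pred == y) for pred in gbc.staged_predict(X)]
print(f"after 1 tree: {train_acc[0]:.2f}, after {len(train_acc)} trees: {train_acc[-1]:.2f}")
```

A small λ usually needs a larger B to reach the same fit, which is why boosting is described as a slow learner.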
Multi-layer perceptron (MLPC):
A feedforward neural network with at least three layers of nodes (input layer, one or more hidden layers, output layer).
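With Scikit-Learn’s `MLPClassifier`, the input and output layers are implied by the data, so only the hidden layers are specified. This sketch uses one hidden layer of 16 units, an arbitrary choice for illustration, on synthetic data. The features are standardised first because MLPs are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X = StandardScaler().fit_transform(X)

mlpc = MLPClassifier(hidden_layer_sizes=(16,), activation="relu", max_iter=1000, random_state=0)
mlpc.fit(X, y)

print(mlpc.n_layers_)                  # 3: input, hidden, output
print([w.shape for w in mlpc.coefs_])  # [(8, 16), (16, 1)]: 8 inputs -> 16 hidden -> 1 output
```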
Training our models
I chose to use 70% of the total population as the training set.
As some continuous features, such as experience, can take high values, we also scale our data.
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def getdata(matrix, gamenumber=50, split_percent=0.7, scale=True):
    df = pd.DataFrame(data=matrix)  # full data; the last column is the game outcome
    df = df.sample(gamenumber)      # randomly select a subset of the games
    data = df.values
    np.random.shuffle(data)
    split_index = int(split_percent * len(data))  # 70% train / 30% test by default
    Xtrain = data[:split_index, :-1]
    Xtest = data[split_index:, :-1]
    Ytrain = data[:split_index, -1].astype(float)
    Ytest = data[split_index:, -1].astype(float)
    if scale:
        # Fit the scaler on the training set only, then apply it to both sets
        xScaler = StandardScaler()
        Xtrain = xScaler.fit_transform(Xtrain)
        Xtest = xScaler.transform(Xtest)
    return Xtrain, Xtest, Ytrain, Ytest
```
We use randomized search (Scikit-Learn’s RandomizedSearchCV) to tune our hyperparameters, as it lets us control the number of search iterations and lower our processing time.
Randomized search can be a good compromise when we do not have enough computing power to try every combination, as an exhaustive grid search would.
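```python
from sklearn.model_selection import RandomizedSearchCV

def tune(model, params, Xtrain, Ytrain):  # wrapper name is hypothetical
    randomizedsearch = RandomizedSearchCV(estimator=model, param_distributions=params, n_iter=10, n_jobs=-1)
    randomizedsearch.fit(Xtrain, Ytrain)  # fit() must run before cv_results_ and best_params_ exist
    return randomizedsearch.cv_results_["mean_test_score"], randomizedsearch.best_params_, randomizedsearch.cv_results_["std_test_score"]
```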
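For reference, here is what `params` might look like for the gradient boosting model, sampled from distributions instead of enumerated on a grid. The ranges are illustrative assumptions (the post does not list the search space actually used), the data is synthetic, and `n_iter=8` caps the number of sampled combinations regardless of how large the space is.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

params = {
    "n_estimators": randint(50, 300),     # B, integers in [50, 300)
    "max_depth": randint(1, 5),           # d, integers in [1, 5)
    "learning_rate": uniform(0.01, 0.2),  # lambda, floats in [0.01, 0.21]
}
search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),
    param_distributions=params,
    n_iter=8,  # only 8 combinations are sampled and cross-validated
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```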
Checking the results and learning curves:
The overall accuracy of the models is surprisingly high, although Logistic Regression is not a good model for this specific task. RFC, GBC and MLPC all reach about the same accuracy, 70%+, and hit this threshold at about 2,000 samples.
Looking more closely at the loss per model, the training loss decreases consistently to a point of stability. The test loss also decreases to a point of stability, with a small gap to the training loss for Logistic Regression. This seems to indicate a good fit.
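Loss-based learning curves like these can be produced with Scikit-Learn’s `learning_curve`, using `scoring="neg_log_loss"` and flipping the sign to get a loss. This is a sketch on synthetic data with Logistic Regression; the post does not say which loss or cross-validation setup was used for its plots.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 5 training-set sizes, from 10% to 100%
    cv=5,
    scoring="neg_log_loss",                # higher is better, so negate to get a loss
)
train_loss = -train_scores.mean(axis=1)    # average over the 5 folds
test_loss = -test_scores.mean(axis=1)
for n, tr, te in zip(sizes, train_loss, test_loss):
    print(f"{n:4d} games  train loss {tr:.3f}  test loss {te:.3f}  gap {te - tr:.3f}")
```

A test loss that plateaus with a small gap to the training loss matches the good-fit pattern described above; a large, persistent gap would instead point to overfitting.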
Finally, the overall metric scores are shown below:
Congratulations, we just beat Riot’s matchmaking algorithm! The model correctly predicts the outcome of the game around 73%+ of the time, while there is usually a 50/50 chance of winning a game.
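The 50/50 reference point can be made explicit by scoring a baseline that ignores the features, such as Scikit-Learn’s `DummyClassifier`, next to the real model. A sketch on balanced synthetic data (not the project’s games), reusing the 70/30 split:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Balanced synthetic stand-in: roughly half wins, half losses
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, train_size=0.7, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(Xtrain, Ytrain)  # always predicts one class
model = RandomForestClassifier(random_state=0).fit(Xtrain, Ytrain)

baseline_acc = baseline.score(Xtest, Ytest)  # about 0.5 on balanced classes
model_acc = model.score(Xtest, Ytest)
print(f"baseline: {baseline_acc:.2f}  model: {model_acc:.2f}")
```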
What’s next?
To avoid potential bias, I did not use the winrate feature (it is still set to the 50% default for everyone). But this could be a key feature for increasing the model’s accuracy. The only problem is that I have to retrieve the winrate on the champion played before the player plays the game, which means collecting data live.
I created a website (Python-Flask) to run the model while you are in champ select, so you can significantly increase your overall winrate and avoid wasting 30 minutes on a game you will most likely lose!
It is available at: https://dodge-that.herokuapp.com/
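The decision the web app supports can be sketched as a thin wrapper around a fitted classifier: estimate P(win) for the champ-select feature vector and recommend a dodge below a threshold. The function name `dodge_advice`, the 0.5 default threshold and the synthetic training data are assumptions for illustration; the app’s actual Flask code is not shown in this post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def dodge_advice(model, features, threshold=0.5):
    """Hypothetical helper: return (P(win), "dodge" or "play") for one feature vector."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    win_col = list(model.classes_).index(1)  # column of the "win" class (label 1)
    p_win = model.predict_proba(row)[0, win_col]
    return p_win, ("dodge" if p_win < threshold else "play")

# Stand-in model trained on synthetic data (1 = win, 0 = loss)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
print(dodge_advice(model, X[0]))
```

Raising the threshold makes the advice more conservative, recommending a dodge more often.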
08/11/2020: Added the web app to run the model: https://dodge-that.herokuapp.com/
03/11/2020: Added 2 features