
Building a prediction model for game scores - what variables? (also MatchPint game)

andysimcoe

Apologies, this is long-winded. I didn't want to take over the predictions thread, and this is really two topics: the MatchPint game and my personal predictions.

So, if the title isn't clear: what would you consider when predicting a score? This is purely for my own gain, but if you're interested, there's an app called MatchPint (I think its purpose is to get you into bars, I dunno) with a new feature this year called Pint Predictor for the 6N. You enter your score prediction, and if you're within 7 points (overall points difference) you get a free pint of Guinness the following week; if you get it exactly right you can redeem one immediately. I don't know the details and I'm not affiliated with them, but I have created a league if you fancy joining.

All you have to do is select a winning team (or 0 for a draw) and choose how many they'll win by. Install the app, go to Pint Predictor, then join a league using code 'TRF': https://www.matchpint.co.uk/pintpredictor?dm_i=2AEF,1D47M,933T4E,4GWDT,1


If you're just interested in MatchPint, end here and avoid my bullshit below.

----
Now on to my real cry for help. A couple of people and I are travelling to Paris for the France v Scotland game, and Scotland have two games before that. As BA have cut their complimentary drinks, we came up with a plan: this Friday we'll each predict the scores (again, overall points difference) for both games (we can't change the prediction for the second game afterwards, so no teamsheet for the Ireland game, etc.). The loser buys the drinks on the plane, and 2nd place will probably buy an extra round in the airport. As a Scot, I'm going to all lengths to avoid paying £15 for 3 **** cans of beer on a plane.

It was only after this that I found the MatchPint app on Twitter, so I figured I was halfway there anyway. I also wrote a quick Python script to make the predictions. What I currently have is:

  • Past 10 years of 6N results
  • Restricted to home games due to the advantage (though I have a model which uses all games)
  • Stats include - tries scored, conversions made, penalties, drop goals (all with conceded too)
  • Conversions are converted to a conversion percentage
  • The rest of the stats are averaged and applied, so if conversion % is 100 and avg tries scored is 5, it'll assume all five tries are converted
So essentially scoring is averaged, with home advantage built in by only using home games. Not very complex, even though I named this model 'complex' when I started.
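
The averaging approach described above could be sketched roughly as follows. This is not the original script, which isn't posted; the function name and arguments are made up for illustration. The point values are standard rugby union scoring (try 5, conversion 2, penalty 3, drop goal 3). Rounding the expected conversions to a whole number is an assumption, although it does happen to reproduce the predictions in the script output further down.

```python
# Standard rugby union point values.
TRY, CONVERSION, PENALTY, DROP_GOAL = 5, 2, 3, 3


def predict_points(avg_tries, conversion_pct, avg_penalties, avg_drop_goals):
    """Turn per-game averages into a predicted points total.

    Assumption: expected conversions are rounded to a whole number.
    """
    conversions = round(avg_tries * conversion_pct / 100)
    return (avg_tries * TRY
            + conversions * CONVERSION
            + avg_penalties * PENALTY
            + avg_drop_goals * DROP_GOAL)


# Averages taken from the Scotland v Italy output in the post:
# the "for" figures feed Scotland's score, the "against" figures Italy's.
scotland = predict_points(3, 85, 3, 0)
italy = predict_points(1, 60, 1, 0)
print(f"Scotland - {scotland} --- Italy - {italy}")
# prints "Scotland - 30 --- Italy - 10"
```

Because everything is an average of past home games, the model can only ever predict something close to a typical past scoreline, which is worth bearing in mind when judging it against one-off results.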

What I have considered:
  • Team's world ranking
I found this had no impact. Taking, for example, Scotland's and Italy's world rankings at the time of each game, there seemed to be no correlation with the score: if one year they were 5 places apart and the next they were 10, the bigger ranking gap didn't translate into a bigger gap in the scoreline.
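
One way to put a number on "no correlation" is a Pearson coefficient between the ranking gap and the winning margin. A minimal sketch; the sample values are placeholders made up to show the call, not real rankings or results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER values, not real data: ranking gap and home margin per fixture.
ranking_gap = [5, 10, 3, 7, 6]
home_margin = [12, 4, 9, 20, -3]
print(f"r = {pearson(ranking_gap, home_margin):.2f}")
```

With only a handful of fixtures per pairing, r will be noisy either way, so a weak value here doesn't prove rankings are useless across a bigger sample.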

What could work but I don't know how to quantify it or translate it to points:
  • Weather - what was the weather like when Italy won at Murrayfield, or in a close game?
  • Teamsheet
  • Club performance of top tacklers, try scorers etc - If they're not playing, will there be fewer tries and more penalties? The top try scorer for Scotland last year, how have they been performing at club level? Has their club performance been an indicator in previous years?
  • Bookies' odds - It's easy to convert fractional odds to an implied probability of an outcome, but I haven't had much luck finding historical odds for small scoring bands. I can at least see what odds I'd get for my predictions and possibly adjust manually
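
The fractional-odds conversion mentioned in the last bullet is short enough to sketch. Odds of a/b imply a probability of b / (a + b); the probabilities across a whole market add up to more than 1 because of the bookmaker's margin, so they are rescaled. The function names and the prices below are made up for illustration.

```python
def implied_probability(fractional_odds):
    """Convert fractional odds such as '5/2' to an implied probability."""
    numerator, denominator = (int(part) for part in fractional_odds.split("/"))
    return denominator / (numerator + denominator)


def normalise(probabilities):
    """Scale implied probabilities to sum to 1, removing the bookmaker's margin."""
    total = sum(probabilities)
    return [p / total for p in probabilities]


# MADE-UP prices for home win, draw, away win (not real odds).
market = ["1/4", "25/1", "7/2"]
raw = [implied_probability(odds) for odds in market]
print([round(p, 3) for p in normalise(raw)])
```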

Really, if you're still reading, what else could reliably affect the scoreline? Here are the predictions the script spits out (with some verbose messages) for the Scotland - Italy and Scotland - Ireland games.

Complex home advantage model for Italy
Avg tries for per game - 3
Avg tries against per game - 1
Conversion percentage for - 85
Conversion percentage against - 60
Avg penalties for per game - 3
Avg penalties against per game - 1
Avg drop goals for per game - 0
Avg drop goals against per game - 0
**Predictions**
Scotland - 30 --- Italy - 10

Complex home advantage model for Ireland
Avg tries for per game - 1
Avg tries against per game - 2
Conversion percentage for - 100
Conversion percentage against - 83
Avg penalties for per game - 3
Avg penalties against per game - 2
Avg drop goals for per game - 0
Avg drop goals against per game - 0
**Predictions**
Scotland - 16 --- Ireland - 20


If you're interested, I can remove the 2017 and 2018 data and predict the 2017 home scores, and it's massively off:
**Predictions** (2017)
Scotland - 23 --- Italy - 10
**Real score**
Scotland - 29 --- Italy - 0

**Predictions** (2017)
Scotland - 12 --- Ireland - 20
**Real score**
Scotland - 27 --- Ireland - 22
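
To put a number on "massively off", the predicted and real margins can be compared using the same overall points difference that the MatchPint game scores on. A quick sketch using only the four scorelines above; the helper name is made up, and treating "within 7 points" as inclusive is an assumption.

```python
def margin_error(predicted, actual):
    """Absolute gap between predicted and actual points difference (home minus away)."""
    return abs((predicted[0] - predicted[1]) - (actual[0] - actual[1]))


# (home, away) scores from the 2017 backtest in the post: (predicted, real).
backtest = {
    "Scotland v Italy": ((23, 10), (29, 0)),
    "Scotland v Ireland": ((12, 20), (27, 22)),
}

for fixture, (predicted, actual) in backtest.items():
    error = margin_error(predicted, actual)
    # Assumption: an error of exactly 7 still counts as "within 7 points".
    verdict = "within" if error <= 7 else "outside"
    print(f"{fixture}: off by {error} points, {verdict} the 7-point window")
```

On these figures the margins miss by 16 and 13 points, so neither prediction would have landed inside the 7-point window.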

Points of interest:
Both teams have kicked fewer drop goals over the past ten years. Since Dan Parks retired, Scotland rarely kick them. Italy used to kick them nearly every year, O'Gara used to kick a few, etc., so the older data is arguably not relevant.

By all accounts 2017 looks like an anomaly: Scotland might have been expected to win, but Italy would have expected to get some points, and Ireland were favourites at Murrayfield.
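
One cheap way to let older seasons count for less without dropping them is an exponentially weighted average. This is a swapped-in idea rather than anything from the script; the decay value and the sample numbers are made up.

```python
def recency_weighted_average(values_by_year, decay=0.8):
    """Average per-season values, weighting each year `decay` times the next newer one."""
    latest = max(values_by_year)
    weights = {year: decay ** (latest - year) for year in values_by_year}
    total_weight = sum(weights.values())
    return sum(values_by_year[y] * weights[y] for y in values_by_year) / total_weight


# PLACEHOLDER drop goals per season, not real figures.
drop_goals = {2015: 2, 2016: 1, 2017: 0, 2018: 0}
print(round(recency_weighted_average(drop_goals), 2))
```

Setting `decay=1` gives the plain average back, so the same function covers both cases.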

Any suggestions for quick wins to improve possible predictions?

Cheers,
 
You didn't mention what you did in the model that keeps away games, but you should include home/away as a binary variable rather than just excluding away games.
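
As a rough illustration of the binary-variable idea: with a single 0/1 home indicator, a least-squares fit of points = base + home_effect * is_home reduces to the two group means, so it can be sketched without any stats library. The function name and the results are placeholders, not real scores.

```python
def fit_home_effect(games):
    """Least-squares fit of points = base + home_effect * is_home.

    With a single 0/1 predictor this reduces to the two group means.
    `games` is a list of (is_home, points) pairs.
    """
    home = [points for is_home, points in games if is_home]
    away = [points for is_home, points in games if not is_home]
    base = sum(away) / len(away)
    home_effect = sum(home) / len(home) - base
    return base, home_effect


# PLACEHOLDER results, not real scores: (is_home, points scored).
games = [(1, 27), (1, 29), (0, 16), (0, 13), (1, 22), (0, 19)]
base, home_effect = fit_home_effect(games)
print(f"away baseline {base:.1f}, home advantage {home_effect:+.1f}")
```

Once other variables are added alongside the indicator, the shortcut no longer holds and a proper regression fit is needed.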
I'd also include an interaction effect with time on most of the variables. That would allow you to accommodate the changing players and styles of the team.
Form would be a useful indicator too. Perhaps have some sort of time-series variables, such as the record in the last 4 or 5 games, or else the record in the tournament so far.
Strength of domestic teams in Europe may be a reasonable form guide too.
Even the bookies' odds of win vs loss vs draw could be useful if it's hard to find historical odds for the scores. The chance of a win will be based on the bookies' underlying model, which will account for a lot of these variables, and a higher certainty of a win would be associated with a bigger score difference, etc.
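
A form variable like "record in the last 4 or 5 games" could be as simple as the share of recent games won. A minimal sketch; the function name and the result sequence are placeholders.

```python
def recent_form(results, n=5):
    """Share of the last `n` results that were wins ('W'); draws and losses count as 0."""
    recent = results[-n:]
    return sum(1 for r in recent if r == "W") / len(recent)


# PLACEHOLDER sequence, oldest first; not a real team's record.
results = ["L", "W", "W", "L", "W", "W", "D"]
print(recent_form(results))  # last five are W, L, W, W, D, so this prints 0.6
```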
 
Not enough data to make a statistical model, particularly given you really need to include wet/dry conditions to determine whether the game played is attritional or expansive.
 
