
How To Bust Buffett’s Brackets With Data Science

Face it: tourney time is just a fun way to be ridiculed by co-workers, in-laws, and old college buddies who have spent the last four months watching SportsCenter every night for men’s college basketball highlights.


This year, uber-billionaire Warren Buffett and his company, Berkshire Hathaway, are partnering with Quicken Loans to offer $1 billion to any person who can correctly pick the winners of all 63 games in this year’s NCAA men’s college basketball tournament.


That’s no small feat, considering that the odds of doing so work out to about 1 in 4.3 billion.
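For a sense of scale, here is a small Python sketch of the arithmetic behind a perfect bracket. It assumes every game is an independent pick with the same chance of being right, which is a simplification, and the function names are made up for illustration. Under that assumption a pure coin-flipper faces odds of 1 in 2^63, while a figure like 1 in 4.3 billion corresponds to getting roughly 70 percent of individual games right.

```python
GAMES = 63  # games in the bracket, as stated in the contest description

def perfect_bracket_odds(p_correct: float, games: int = GAMES) -> float:
    """Return N such that a perfect bracket is a 1-in-N shot.

    Assumes each pick is independent and correct with probability p_correct.
    """
    return 1.0 / (p_correct ** games)

def implied_accuracy(one_in_n: float, games: int = GAMES) -> float:
    """Per-game accuracy that would produce 1-in-N odds of a perfect bracket."""
    return (1.0 / one_in_n) ** (1.0 / games)

coin_flip = 2 ** GAMES
print(f"coin flip: 1 in {coin_flip:,}")
print(f"per-game accuracy implied by 1 in 4.3 billion: {implied_accuracy(4.3e9):.3f}")
```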


Picking your bracket mainly consists of deciding which three number one seeds are going to make it to the Final Four. The other slot is saved for your alma mater or the college you wish you had gone to (I always have Oregon going farther than they should).


In years when the bracket outcomes are "predictable" – meaning mainly top seeds will win – you’re part of a large cohort of bracket owners separated by just a few points. It’ll come down to the final game before the elite brackets are determined.


Then there are the bracket-buster years when the Cinderellas emerge to wreak havoc on everyone – years when your five-year-old gets more points than you. Why? Because you picked three out of four number one seeds in your Final Four. Hey, this happens. But what if you could take a chance on an upset that nobody else sees coming?


In 2006 I took a chance on a team that seemed like a ridiculous choice, at least to my fellow data and analytics co-workers. I picked 11th seed George Mason to go to the Final Four – and they made it, and it wasn’t just luck.


Data is tracked on everything – and I mean everything. It’s interesting that all of my co-workers at the time had read “Moneyball,” about how data science was used to evaluate players for the Oakland A's, and were well aware of sabermetrics. Yet none of them went looking to see whether the same kind of data is collected on college basketball teams. It is: http://kenpom.com/, a site that collects data on every NCAA college basketball game since 2003, was invaluable when it came to picking my bracket.


That year I spent a ton of time working on this data just to see what I could learn and how it applied to bracket outcomes. I created dozens of statistical models and predictions on winning outcomes and spreads. Truth is, others have created better predictive models using the same data. But what I did learn is that, in average Joe bracket terms, after the first three seeds, not every like seed is equal.


Of all the, say, 12 seeds, there can be a dark horse that is going to make a run in the tourney. If you can pick the right one, then you’ll blow everyone else away.


An average Joe would look at #11 George Mason versus #6 Michigan State and say, "George who?" Even if they did beat Michigan State, their next opponent would probably be #3 North Carolina. Who would pick that?


I did, based on the following metrics:


Here is what to look for in the makeup of the next George Mason. I took these from Ken Pomeroy's website but also created some of my own metrics, too, for fun.


  1. Conference schedule strength: This is pretty important but not the only metric to look at.
  2. Winning percentage: Is this really important? Not always.
  3. Winning probability: This is a better metric as it projects how any one team would fare against a Division I opponent. It also uses Pythagorean math so it must be accurate.
  4. Defensive and offensive efficiency: Important? Yes, the last three national champs have been in the top 25 in both categories.
  5. Tempo: Yes, tempo is the number of possessions per 40 minutes.
  6. Luck: Believe it or not, there are measures for luck. Ken Pomeroy's website has a formula for it but don't feel bad if your formula is actually a coin flip.
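
To make the metrics above a little more concrete, here is a rough Python sketch. The possession estimate (with its 0.475 free-throw weight) and the Pythagorean exponent of 10.25 are values commonly associated with Ken Pomeroy's ratings, but treat them as assumptions here rather than his exact method. The head-to-head formula is Bill James's log5, the luck function is only a simplified stand-in, and the function names and all of the numbers fed in are made up for illustration.

```python
def possessions(fga, orb, tov, fta, ft_weight=0.475):
    """Estimate possessions from box-score totals (ft_weight is an assumption)."""
    return fga - orb + tov + ft_weight * fta

def efficiency(points, poss):
    """Points per 100 possessions; pass points allowed for defensive efficiency."""
    return 100.0 * points / poss

def tempo(poss, minutes_played=40):
    """Possessions per 40 minutes (overtime games run longer than 40)."""
    return poss * 40.0 / minutes_played

def pythagorean(off_eff, def_eff, exponent=10.25):
    """Expected winning rate from efficiencies (exponent is an assumption)."""
    return off_eff ** exponent / (off_eff ** exponent + def_eff ** exponent)

def log5(p_a, p_b):
    """Chance that team A beats team B, given each team's winning rate."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)

def luck(actual_win_pct, expected_win_pct):
    """Simplified stand-in: winning beyond what the efficiencies predict."""
    return actual_win_pct - expected_win_pct

# Hypothetical single-game box score, for illustration only.
poss = possessions(fga=58, orb=10, tov=12, fta=20)
off = efficiency(points=72, poss=poss)
print(f"possessions={poss:.1f}, tempo={tempo(poss):.1f}, off_eff={off:.1f}")

# Hypothetical season efficiencies for an 11 seed and a 6 seed.
underdog = pythagorean(off_eff=112.0, def_eff=95.0)
favorite = pythagorean(off_eff=115.0, def_eff=94.0)
print(f"underdog win chance: {log5(underdog, favorite):.2f}")
```

The point of log5 is that two teams with similar ratings produce something close to a coin flip no matter what their seeds say, which is exactly where an upset pick is cheapest.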


There are more, but these give you a way to compare the teams playing in the tourney. Arm yourself with a few more metrics and you’ll be making more intelligent decisions.


So, why George Mason?


Metrics are a good way to evaluate match-ups, but a hot streak is intangible. A team on a hot streak is inherently dangerous and is far more likely to find that Cinderella’s slipper is just their size. So, strong in all the metric categories + blazing hot streak = Cinderella!
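
As a loose illustration of that "metrics + hot streak" recipe, the Python sketch below flags double-digit seeds that rate better than the average of their like-seeded peers and have been winning lately. Everything in it is an assumption for the sake of the example: the `Team` fields, the thresholds (seed 10 or worse, 8 wins in the last 10), and the team data, which is invented.

```python
from dataclasses import dataclass

@dataclass
class Team:
    name: str
    seed: int
    rating: float      # e.g. a Pythagorean-style winning rate between 0 and 1
    last10_wins: int   # wins in the most recent 10 games

def cinderella_candidates(teams, min_seed=10, min_last10=8):
    """Flag low seeds that out-rate their like-seeded peers and are on a streak."""
    flagged = []
    for team in teams:
        if team.seed < min_seed:
            continue  # top seeds are not Cinderellas
        peer_ratings = [p.rating for p in teams if p.seed == team.seed]
        peer_avg = sum(peer_ratings) / len(peer_ratings)
        if team.rating > peer_avg and team.last10_wins >= min_last10:
            flagged.append(team)
    return sorted(flagged, key=lambda t: t.rating, reverse=True)

# Invented teams, for illustration only.
field = [
    Team("Team A", 11, 0.86, 9),
    Team("Team B", 11, 0.78, 6),
    Team("Team C", 11, 0.80, 8),
    Team("Team D", 11, 0.79, 5),
    Team("Team E", 3, 0.93, 7),
]
candidates = [t.name for t in cinderella_candidates(field)]
print(candidates)
```

A filter like this only narrows the field; which thresholds are sensible is a judgment call.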


If you want to take Mr. Buffett up on his offer, you can play around with the numbers a little at http://bracketodds.cs.illinois.edu/index.html before submitting your picks to the contest.


And it’s a no-lose proposition – even if the perfect mix of metrics doesn’t yield bragging rights or Benjamins, there’ll be no tax consequences on a billion bucks to worry about.