INTRO AND EXPLANATION
Having just recently completed both a stats and a Python coding course, I figured it would be worthwhile to exercise these practices with something I enjoy... achievements. I was initially planning on doing this analysis for the Canadian provinces, since that's where I live, but unfortunately, the site is straight-up missing a province, soooooo I'm doing the US.
To give an idea of the "rules" here, I'm only including gamers in the >100kGS range. I used >100kGS for the ratio/completion percentage leaderboards to weed out the people who sit at the top of those leaderboards with just a few thousand gamerscore. And to make sure I'm sampling the same group of gamers, I only used the >100kGS people for the gamerscore leaderboard as well. This also significantly reduces the sample size, which made the data extraction into Excel actually bearable. To keep this analysis fair, I will be weighting gamerscore, ratio and completion percentage equally. I will go into more detail on that later.
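For anyone who prefers code to Excel, here's a minimal Python sketch of the sampling rule above: keep only gamers strictly over 100k gamerscore, use that same group for all three stats, then average per state. The real work was done by hand in Excel, so everything here (the `gamers` rows, the `state_averages` helper, the gamertags) is made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean

GS_CUTOFF = 100_000  # the >100kGS rule from the post

# Made-up rows for illustration only:
# (gamertag, state, gamerscore, TA ratio, completion %)
gamers = [
    ("PlayerA", "Vermont", 250_000, 1.80, 60.0),
    ("PlayerB", "Vermont", 150_000, 1.70, 50.0),
    ("PlayerC", "Vermont", 4_000, 3.10, 99.0),   # tiny gamerscore, excluded
    ("PlayerD", "Idaho", 120_000, 1.60, 48.0),
    ("PlayerE", "Idaho", 90_000, 2.00, 70.0),    # under the cutoff, excluded
]

def state_averages(rows, cutoff=GS_CUTOFF):
    """Average each stat per state, using only gamers strictly above the cutoff."""
    by_state = defaultdict(list)
    for _tag, state, gs, ratio, pct in rows:
        if gs > cutoff:
            by_state[state].append((gs, ratio, pct))
    # zip(*stats) turns a list of (gs, ratio, pct) rows into three columns
    return {
        state: tuple(mean(column) for column in zip(*stats))
        for state, stats in by_state.items()
    }

averages = state_averages(gamers)
for state, (gs, ratio, pct) in sorted(averages.items()):
    print(f"{state}: {gs:.1f}G, ratio {ratio:.4f}, completion {pct:.2f}%")
```

Note how PlayerC's 99% completion never touches Vermont's average: filtering on gamerscore first is what keeps the ratio and completion numbers from being dominated by tiny accounts.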
So what exactly are the stats I'm using?
By copying and pasting the regional leaderboards into Excel, I was able to take the average gamerscore, ratio and completion percentage of the sampled gamers in each state. I made sure to grab all the data over a time interval where the leaderboards were not updated, so that if someone happened to pass the 100k barrier after I took the gamerscore leaderboard data, that person wouldn't be sampled in the ratio/completion percentage data. This keeps everything as fair and consistent as possible. I chose to use these three stats specifically because they are independent of one another. Gamerscore, TA ratio and completion percentage have no correlation, and I believe these three stats cover all the bases.
GAMERSCORE
Starting with gamerscore, when taking the averages of the 50 states, I found the average to be surprisingly consistent across most states, with most falling into the 180k-200k range. Only one state fell below the 180k threshold, and that was Hawaii, with an average of 178989.2G.
16 states broke the 200k barrier, and 3 broke 210k. Those 3 were:
Tennessee's spot here is largely due to Stallion83, who single-handedly raises the state's average by over 8000, despite the sample size being 254 (let that sink in). As for New Hampshire, I was sure I had made some sort of error because it blows every other state out of the water, but nope. New Hampshire has a relatively small population compared to many other states, with just 66 people over the 100k mark. Of those 66 people, however, 14 fall into the 300k-700k range. That means over a fifth of its >100k gamers have >300k. No other state has a share that high.
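To put a number on the Stallion83 effect, here's a quick back-of-the-envelope sketch. It assumes "brings the average up over 8000" means Tennessee's average is more than 8000 higher with him in the sample than without him; the helper name `excess_over_others` is mine, not anything from the site. If the other gamers average m' and one gamer has x, the full average over n gamers is m' + (x - m')/n, so the gap x - m' has to be n times the shift.

```python
def excess_over_others(sample_size, mean_shift):
    """Gap between one gamer's score and the average of everyone else
    that is needed to raise the full-sample average by `mean_shift`.

    Full mean = m' + (x - m') / n, so x - m' = n * mean_shift.
    """
    return sample_size * mean_shift

# Figures quoted in the post
tennessee_sample = 254
tennessee_shift = 8000
print(excess_over_others(tennessee_sample, tennessee_shift))

# New Hampshire: share of sampled gamers in the 300k-700k range
nh_share = 14 / 66
print(f"{nh_share:.1%}")
```

Under that reading, one gamer has to sit at least 254 × 8000 = 2,032,000G above the rest of Tennessee's sample to move the average that much, and New Hampshire's 14 out of 66 works out to about 21.2%, which is the "over a fifth" above.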
The full list of states in descending order of their average gamerscore can be found in the spoiler tag below.
The average gamerscore across all 50 states is 194970.2.
TA RATIO
Much like gamerscore, TA ratio is mostly consistent. All but one state surpass the 1.65 marker. That one state is Idaho, which has a measly 1.620696, not even breaking the 1.625 mark. Only 6 states have ratios above 1.7, 3 of which are nearly identical:
The top 3 states put the other 47 to shame:
Hawaii's second-place finish here almost makes up for its last-place finish in the gamerscore category... Almost.
The full list of states in descending order of their average TA ratio can be found in the spoiler tag below.
The average TA ratio across all 50 states is 1.679507.
COMPLETION PERCENTAGE
This was the closest of the three comparisons. All 50 states fell into a 10% range of 48%-58%, and 44 states fell within a 5% range of 50.5%-55.5%. Only two states actually fell beneath the 50% mark, so Mississippi and Idaho (once again) are our two major outliers, with percentages of 48.96321% and … respectively. If not for Idaho's decent 13th place in gamerscore, it would almost certainly be the 50th ranked state, but two bottom-three finishes still aren't pretty. Of the 48 other states, only 4 were above 56%:
Once again, Hawaii takes second place. If not for its last-place finish in the first category, it would have almost certainly taken home the prize as the #1 state. It's still in the running, but its average gamerscore is a major roadblock. North Dakota comes out of nowhere and makes a fool of everyone else. Unfortunately for it, it did terribly on ratio, and did okay at best with gamerscore. West Virginia and New Jersey did okay in the previous stats, but considering how close every state was in this statistic, neither state really stands a chance, and neither does North Dakota.
The full list of states in descending order of their average completion percentage can be found in the spoiler tag below.
The average completion percentage across all 50 states is 53.44378%.
OH BOY, GRAPHS
Up until now, it's just been simple averages. Now for the tricky part. I have to somehow use the three statistics I've found to fairly give scores to all 50 states. I started out by plotting all 50 states on a 3D graph, with a gamerscore axis, a ratio axis and a completion percentage axis. It looked a little something like this:
Most states are clumped up in the middle. This is partly due to perspective, but it's also due to just how consistent the stats were across most states. You'll also see a little green blip in the bottom left, labeled "ORIGIN". You'll see why it's there in a second.
I then went ahead and normalized the graph, meaning I essentially relabeled the ranges of all three axes to go from 0 to 1, instead of from 178000-224000, 1.62-1.77 and 48-58. This was done by subtracting the minimum value from each statistic, and then dividing the result by the maximum minus the minimum. What this does is set the lowest value in each set to 0 and the highest value to 1, and it scales everything else in between. This is what it looked like, post-normalization:
If you cover up the numbers labeled on the axes, this is 100% identical to the previous plot. Now, this is where the green origin dot comes into play. I call it the origin dot because it lies at what is now (0,0,0). What are we using it for? Well, since it represents the absolute minimum of all three statistics, the origin point will act as a reference point. The "score" we give to each state is simply how far away it is from the origin point. The farther, the better.
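Here's a minimal Python sketch of the normalize-then-measure idea described above: min-max scale each stat to the 0-1 range, then score each state by its straight-line (Euclidean) distance from (0,0,0). The three "states" and their numbers are made up so the result is easy to check by eye, and the function names `min_max_normalize` and `distance_scores` are just mine. The real thing was done in Excel.

```python
import math

# Made-up state averages for illustration: (gamerscore, TA ratio, completion %)
averages = {
    "State A": (180_000.0, 1.62, 48.0),   # lowest in every stat
    "State B": (200_000.0, 1.70, 53.0),
    "State C": (220_000.0, 1.77, 58.0),   # highest in every stat
}

def min_max_normalize(values):
    """Rescale values so the smallest maps to 0 and the largest to 1.

    Assumes the values are not all identical (otherwise hi - lo is 0).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def distance_scores(table):
    """Normalize each stat across states, then score each state by its
    Euclidean distance from the origin (0, 0, 0)."""
    states = list(table)
    columns = zip(*table.values())                      # one column per stat
    normalized_columns = [min_max_normalize(list(c)) for c in columns]
    points = zip(*normalized_columns)                   # one 3D point per state
    return {s: math.sqrt(sum(x * x for x in p)) for s, p in zip(states, points)}

scores = distance_scores(averages)
for state, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {score:.3f}")
```

Because every axis is squeezed into the same 0-1 range before the distance is taken, no single stat can dominate just by having bigger raw numbers, which is what keeps the three stats equally weighted.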
A quick example:
Connecting lines to Arkansas and Nevada, we can clearly see (and no, this isn't a perception trick) that the Nevada line is noticeably longer than the Arkansas line. This means that Nevada is a higher ranked state regarding achievements. Now we draw lines to every state:
This is just horrendous, but now we have all the information we need! The minimum length of a blue line is 0, which would only happen if a state placed last in every statistic, and the maximum length is √3 ≈ 1.73, which is what would happen if a state placed first in every statistic.
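Written out as a formula, the score and its bounds look like this. The symbols are my own shorthand, not anything from the plots: g, r and c stand for a state's average gamerscore, TA ratio and completion percentage, and a hat means the value after the 0-to-1 normalization.

```latex
\[
\hat{x}_s = \frac{x_s - \min_t x_t}{\max_t x_t - \min_t x_t}
\quad \text{for each stat } x \in \{g, r, c\},
\qquad
d_s = \sqrt{\hat{g}_s^{\,2} + \hat{r}_s^{\,2} + \hat{c}_s^{\,2}}
\]
\[
0 = \sqrt{0^2 + 0^2 + 0^2} \;\le\; d_s \;\le\; \sqrt{1^2 + 1^2 + 1^2} = \sqrt{3} \approx 1.732
\]
```

The lower bound is hit only by a state sitting at (0,0,0) and the upper bound only by one sitting at (1,1,1), which matches the last-in-everything and first-in-everything cases.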
And so, the final scores are:
And if we order them accordingly, we'll find our rankings:
FINAL RESULTS
Our top three are:
3. NEW HAMPSHIRE
2. HAWAII
1. VERMONT
...and our bottom three are:
Honestly, the most shocking thing to me is seeing Hawaii in second place. If it had put up even just a mediocre gamerscore average, it would have easily snagged number one. But no, it had to have the lowest. God damnit Hawaii.
The second most shocking thing to me is seeing Idaho not last. Indiana and Kentucky both completely slipped under my radar. Indiana was in the bottom 10 every time, but never the bottom three, and Kentucky was in the bottom 10 twice, and it broke into the 30s once. Idaho's two bottom-two finishes weren't enough to put it at the bottom of the list.
With that all said, I have no idea why I did this. I just thought it would be neat. Congrats to Vermont on being confirmed as the best state in the country, and congrats to Idaho for not being the worst. I've got some ideas for similar future blogs, so if people like this kind of statistic-y stuff, let me know.
Later bois. Let's all move to Hawaii and try to get it to #1