Noodles Jr's Blog - Jul to Sep 19 (24 followers)

A Statistical Analysis of Achievements by State: The Good, The Bad and The Idaho
INTRO AND EXPLANATION

Having just recently completed both a stats and a Python coding course, I figured it would be worthwhile to exercise these practices with something I enjoy... achievements. I was initially planning on doing this analysis for the Canadian provinces, since that's where I live, but unfortunately, the site is straight-up missing a province, soooooo I'm doing the US.

To give an idea of the "rules" here, I'm only including gamers in the >100kGS range. I used >100kGS for the ratio/completion percentage leaderboards to weed out the people who sit at the top of those leaderboards with just a few thousand gamerscore. And to make sure I'm sampling the same group of gamers, I only used the >100kGS people for the gamerscore leaderboard as well. This also significantly reduces the sample size, which made the data extraction into Excel actually bearable. To keep this analysis fair, I will be weighting gamerscore, ratio and completion percentage equally. I will go into more detail on that later.

So what exactly are the stats I'm using?

By copying and pasting the regional leaderboards into Excel, I was able to take the average gamerscore, ratio and completion percentage of the sampled gamers in each state. I made sure to grab all the data over a time interval where the leaderboards were not updated, to ensure that if someone just so happens to surpass the 100k barrier after I take all the gamerscore leaderboard data, that person isn't sampled in the ratio/completion percentage. This keeps everything as fair and consistent as possible. I chose to use these three stats specifically because they are independent of one another. Gamerscore, TA ratio and completion percentage have no correlation, and I believe these three stats cover all the bases.
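For anyone who'd rather script this step than fight Excel, here's a rough Python sketch of the same idea: drop everyone at or below 100k gamerscore, then average the three stats per state. This is not the spreadsheet used for this post. The rows, gamertags and the `state_averages` helper are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows standing in for the pasted leaderboard data:
# (state, gamertag, gamerscore, ta_ratio, completion_pct)
rows = [
    ("Vermont", "gamer_a", 250_000, 1.80, 55.0),
    ("Vermont", "gamer_b", 150_000, 1.70, 50.0),
    ("Vermont", "gamer_c", 40_000, 3.10, 98.0),  # under 100k, excluded
    ("Idaho", "gamer_d", 120_000, 1.60, 48.0),
    ("Idaho", "gamer_e", 100_000, 2.00, 60.0),   # exactly 100k, excluded
]

MIN_GAMERSCORE = 100_000  # the >100kGS cutoff described above


def state_averages(rows, min_gs=MIN_GAMERSCORE):
    """Return {state: (avg gamerscore, avg ratio, avg completion %)}."""
    by_state = defaultdict(list)
    for state, _tag, gs, ratio, pct in rows:
        if gs > min_gs:  # strictly greater than the cutoff
            by_state[state].append((gs, ratio, pct))
    # zip(*stats) turns a list of rows into three columns, one per stat
    return {
        state: tuple(mean(column) for column in zip(*stats))
        for state, stats in by_state.items()
    }


averages = state_averages(rows)
for state, (gs, ratio, pct) in averages.items():
    print(f"{state}: {gs:.1f}G, ratio {ratio:.4f}, completion {pct:.2f}%")
```

Because the same filtered group feeds all three averages, a high-ratio, low-gamerscore account like the made-up `gamer_c` can't sneak into one stat but not the others.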

GAMERSCORE

Starting with gamerscore, when taking the averages of the 50 states, I found the average to be surprisingly consistent across most states, with most falling into the 180k-200k range. Only one state fell below the 180k threshold, and that was Hawaii, with an average of 178989.2G.

16 states broke the 200k barrier, and 3 broke 210k. Those 3 were:
-Arizona with 210468.9
-Tennessee with 210685.0
-New Hampshire with 224715.3

Tennessee was actually largely due to Stallion83, who single-handedly brings the average up by over 8000, despite the sample size being 254 (let that sink in). As for New Hampshire, I was sure I made some sort of error because it blows every other state out of the water, but nope. New Hampshire has a relatively small population compared to many other states, having just 66 people over the 100k mark. Of those 66 people however, 14 of them fall into the 300k-700k range. That means over a fifth of the >100k gamers have >300k. No other state has that statistic this high.
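To put the "let that sink in" into numbers, here's a back-of-the-envelope Python sketch of how big a single score has to be to lift a 254-person average by 8000. The `implied_outlier` helper is made up for illustration, and the result is only what the rounded figures above imply (a lower bound, since the shift was "over 8000"), not a looked-up gamerscore.

```python
def implied_outlier(mean_with, n, shift):
    """Value one entry must have to raise an n-entry mean by `shift`."""
    mean_without = mean_with - shift
    # total with the outlier minus total of the other n - 1 entries
    return n * mean_with - (n - 1) * mean_without


tennessee_mean = 210_685.0  # Tennessee's average from the list above
tennessee_n = 254           # Tennessee's sample size
shift = 8_000               # "over 8000", so treat this as a minimum

stallion_implied = implied_outlier(tennessee_mean, tennessee_n, shift)
print(round(stallion_implied))  # 2234685

# New Hampshire: 14 of the 66 sampled gamers sit in the 300k-700k range
nh_share = 14 / 66
print(f"{nh_share:.1%}")  # 21.2%
```

So a single account has to be sitting somewhere north of 2.2 million gamerscore to move a 254-person average that much, and 14 out of 66 does indeed clear the one-fifth mark.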

The full list of states in descending order of their average gamerscore can be found in the spoiler tag below.

*** Spoiler - click to reveal ***


The average gamerscore across all 50 states is 194970.2.

TA RATIO

Much like gamerscore, TA ratio is mostly consistent. All but one state surpass the 1.65 marker. That one state is Idaho, which has a measly 1.620696, not even breaking the 1.625 mark. Only 6 states have ratios above 1.7, 3 of which are nearly identical:
-Delaware with 1.701940
-New Jersey with 1.701918
-South Dakota with 1.701870

The top 3 states put the other 47 to shame:
-Vermont with 1.764706
-Hawaii with 1.733047
-Alaska with 1.714231

Hawaii's second-place finish here almost makes up for its last-place finish in the gamerscore category... Almost.

The full list of states in descending order of their average TA ratio can be found in the spoiler tag below.

*** Spoiler - click to reveal ***


The average TA ratio across all 50 states is 1.679507.

COMPLETION PERCENTAGE

This was the closest of the three comparisons. All 50 states fell into a 10-point range of 48%-58%, and 44 states fell within a 5-point range of 50.5%-55.5%. Only two states actually fell below the 50% mark: Mississippi and Idaho (once again) are our two major outliers, with percentages of 48.96321% and 49.16549% respectively. If not for Idaho's decent 13th place in gamerscore, it would almost certainly be the 50th ranked state, but two bottom-three finishes still isn't pretty. Of the 48 other states, only 4 were above 56%:
-North Dakota with 57.60707%
-Hawaii with 56.76605%
-West Virginia with 56.18176%
-New Jersey with 56.12592%

Once again, Hawaii takes second place. If not for its last-place finish in the first category, it would have almost certainly taken home the prize as the #1 state. It's still in the running, but its average gamerscore is a major roadblock. North Dakota comes out of nowhere and makes a fool of everyone else. Unfortunately for it, it did terribly on ratio, and did okay at best with gamerscore. West Virginia and New Jersey did okay in the previous stats, but considering how close every state was in this statistic, neither state really stands a chance, and neither does North Dakota.

The full list of states in descending order of their average completion percentage can be found in the spoiler tag below.

*** Spoiler - click to reveal ***


The average completion percentage across all 50 states is 53.44378%.

OH BOY, GRAPHS

Up until now, it's just been simple averages. Now for the tricky part. I have to somehow use the three statistics I've found to fairly give scores to all 50 states. I started out by graphing all 50 states onto a 3D graph, composed of a gamerscore axis, a ratio axis and a completion percentage axis. It looked a little something like this:

External image


Yuck.

Most states are clumped up in the middle. This is partly due to perspective, but it's also largely due to just how consistent the stats were across most states. You'll also see a little green blip in the bottom left, labeled "ORIGIN". You'll see why it's there in a second.

I then went ahead and normalized the graph, meaning I essentially relabeled the ranges of all three axes to go from 0 to 1, instead of from 178000-224000, 1.62-1.77 and 48-58. This was done by subtracting the minimum value from each statistic, and then dividing by the maximum minus the minimum. What this does is set the lowest value in each set to 0 and the highest value to 1, and it scales everything else in between. This is what it looked like, post-normalization:
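The normalization described above (usually called min-max scaling) can be sketched in a few lines of Python. The `min_max_normalize` helper is a made-up name, and only three of the 50 ratio averages are fed in here, so treat it as an illustration rather than the full calculation.

```python
def min_max_normalize(values):
    """Rescale values so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical, nothing to scale")
    return [(v - lo) / (hi - lo) for v in values]


# Average TA ratios quoted earlier (3 of the 50 states, for illustration)
ratios = {"Idaho": 1.620696, "Hawaii": 1.733047, "Vermont": 1.764706}

normalized = dict(zip(ratios, min_max_normalize(list(ratios.values()))))
for state, value in normalized.items():
    print(f"{state}: {value:.4f}")
```

Idaho (the lowest ratio) lands on 0, Vermont (the highest) lands on 1, and Hawaii ends up at roughly 0.78. Since Idaho and Vermont are the minimum and maximum across all 50 states, Hawaii's value here is the same one it would get in the full data set.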

External image


If you cover up the numbers labeled on the axes, this is 100% identical to the previous plot. Now, this is where the green origin dot comes into play. I call it the origin dot because it lies at what is now (0,0,0). What are we using it for? Well, since it represents the absolute minimum of all three statistics, the origin point will act as a reference point. The "score" we give to each state is simply how far away it is from the origin point. The farther, the better.

A quick example:

External image


Connecting lines to Arkansas and Nevada, we can clearly see (and no, this isn't a perception trick) that the Nevada line is noticeably longer than the Arkansas line. This means that Nevada is a higher ranked state regarding achievements. Now we draw lines to every state...

External image


This is just horrendous, but now we have all the information we need! The minimum length of a blue line is 0, which would only happen if a state placed last in every statistic, and the maximum length is √3 ≈ 1.73, which is what would happen if a state placed first in every statistic.
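The length of each blue line is just the straight-line (Euclidean) distance from (0,0,0) in the normalized space. Here's a minimal Python sketch of that score, using only numbers quoted in this post. The `RANGES` table and `score` function are names made up for this example, and the minimums and maximums are taken from the states called out in each section above.

```python
import math

# (min, max) of each state-average statistic, as quoted in this post
RANGES = {
    "gamerscore": (178989.2, 224715.3),  # Hawaii, New Hampshire
    "ratio": (1.620696, 1.764706),       # Idaho, Vermont
    "completion": (48.96321, 57.60707),  # Mississippi, North Dakota
}


def score(stats):
    """Distance from the origin after min-max scaling each statistic."""
    normalized = [
        (stats[name] - lo) / (hi - lo) for name, (lo, hi) in RANGES.items()
    ]
    return math.sqrt(sum(x * x for x in normalized))


hawaii = {"gamerscore": 178989.2, "ratio": 1.733047, "completion": 56.76605}
print(round(score(hawaii), 4))  # 1.1931

# A hypothetical state that is first in everything sits at (1, 1, 1)
best = {name: hi for name, (lo, hi) in RANGES.items()}
print(round(score(best), 4))  # 1.7321
```

Hawaii is the one state whose three averages all appear in the text, so it makes a handy sanity check: its gamerscore term is 0, and the other two terms alone get it to about 1.1931.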

And so, the final scores are:
External image


And if we order them accordingly, we'll find our rankings:
External image
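The ordering step is nothing fancier than a descending sort on the scores. A quick Python sketch, using only the six scores listed in the final results (so the printed order is the order among those six, not real 1-50 ranks):

```python
# Scores from the final results; the other 44 states are not included
scores = {
    "Idaho": 0.4908,
    "Vermont": 1.3227,
    "Indiana": 0.4240,
    "Hawaii": 1.1931,
    "Kentucky": 0.4791,
    "New Hampshire": 1.1682,
}

# Sort by score, highest first
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for state, value in ranking:
    print(f"{state}: {value:.4f}")
```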


FINAL RESULTS

Our top three are:
1. VERMONT ~ 1.3227
2. HAWAII ~ 1.1931
3. NEW HAMPSHIRE ~ 1.1682

...and our bottom three are:
48. IDAHO ~ 0.4908
49. KENTUCKY ~ 0.4791
50. INDIANA ~ 0.4240

Honestly, the most shocking thing to me is seeing Hawaii in second place. If it had put up even just a mediocre gamerscore average, it would have easily snagged number one. But no, it had to have the lowest. God damnit Hawaii.

The second most shocking thing to me is seeing Idaho not last. Indiana and Kentucky both completely slipped under my radar. Indiana was in the bottom 10 every time, but never the bottom three, and Kentucky was in the bottom 10 twice, and it broke into the 30s once. Idaho's two bottom-two finishes weren't enough to put it at the bottom of the list.

With that all said, I have no idea why I did this. I just thought it would be neat. Congrats to Vermont on being confirmed as the best state in the country, and congrats to Idaho for not being the worst. I've got some ideas for similar future blogs, so if people like this kind of statistic-y stuff, let me know.

Later bois. Let's all move to Hawaii and try to get it to #1.
Posted by Noodles Jr on 11 July 19 at 09:19 | Last edited on 11 July 19 at 22:00