Kistaro Windrider, Reptillian Situation Assessor

Unfortunately, I Really Am That Nerdy

The Prevalence of Piracy on the Internet
Electronic piracy is a popular topic of argument on the Internet, attracting the attention of everybody from random Slashdot users to highly-trained RIAA executives. As an employee of Microsoft, I can be affected very directly by electronic piracy, so I also pay attention to the topic. What I have discovered, however, is that nobody has really gone out to collect the good information we need to truly understand piracy online. For all the discussion of multi-million-dollar studies bought by the RIAA to let them claim that they are losing billions of dollars to this practice, none of them chose the simple and compelling research methodology that I have created: getting the information from Google.

Observant readers will note that the data axis of this graph is actually logarithmic. This is because the first trial (for "ar") created a large enough value that the interesting patterns (or lack thereof) in subsequent results were completely invisible on any reasonably-sized graph. The first search resulted in over 600,000,000 hits, while the smallest result (for "arrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr") had exactly 153 hits. All these values were calculated by entering "a", then a suitable number of "r"s (every integer from 1 to 50: zero was not tried as it is not a proper "arrr", I got bored after 50, and fractional "r"s produce a problem for the Unicode character set), and noting how many search results Google returned.
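For anyone wishing to replicate the study, the list of search terms is easy to generate. This is only a minimal sketch of the procedure described above; the function name `arr_queries` is my own invention for illustration, and the actual searching and hit-counting were done by hand.

```python
def arr_queries(max_r=50):
    """Return the search terms "ar", "arr", ... with 1..max_r trailing r's.

    Zero r's is skipped (not a proper "arrr"), and fractional r's are
    left as an exercise for the Unicode Consortium.
    """
    return ["a" + "r" * n for n in range(1, max_r + 1)]


queries = arr_queries()
print(queries[:3])       # the first few terms
print(len(queries[-1]))  # length of the longest term: one "a" plus 50 "r"s
```

Each term would then be entered into Google, with the reported number of results noted next to it.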

It is interesting to note that the prevalence of piracy does not decrease monotonically as the number of "r"s increases. It stops being monotonically decreasing from 16 to 17, when 16 "r"s have 847 hits but 17 have a surprising 17,300. 18 drops back to 694; the cause of this sharp numerical spike is unknown, and I invite hypotheses about the true meaning of this data.
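The monotonicity check itself is simple enough to automate. The sketch below uses only the three hit counts quoted above; the helper name `increases` is hypothetical, and running it over the full data set is assumed to work the same way.

```python
def increases(hits):
    """Return (n, n + 1) pairs where adding one more "r" raises the hit count.

    `hits` maps a number of r's to the hit count recorded for it.
    """
    ns = sorted(hits)
    return [
        (a, b)
        for a, b in zip(ns, ns[1:])
        if b == a + 1 and hits[b] > hits[a]
    ]


# Only the counts quoted in this entry; the rest of the data is not reproduced here.
hits = {16: 847, 17: 17_300, 18: 694}
print(increases(hits))  # [(16, 17)]
```

A strictly decreasing series would produce an empty list, so any pair that shows up marks a place where piracy unexpectedly rallied.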

I also kept track of the suggestions Google made under "Did you mean:" for every search in which such an option was presented. Near the beginning, most of the suggestions, when they were made, proposed that I use fewer "r"s, while toward the end I almost always received a suggestion, and it asked if I shouldn't be leaving off the "a". I suspect that by the end, Google was merely desperate for me to stop and was making these suggestions in a futile attempt to get me to shut up. Unfortunately, I have no way to test this hypothesis.

The entire collected data is available here. I invite further study and analysis of this important data about piracy on the Internet.

yarr* and wryy* yield non-monotonically-decreasing curves, too.

Also, ninja (81,400,000) currently beats pirate (55,200,000), so the basis for the study is suspect.

> 16 "r"s have 847 hits but 17 have a surprising 17,300

Further research into this phenomenon reveals that the top hit for r=16 belongs to Patrick and Teresa Nielsen-Hayden, sci-fi editors. This suggests one hypothesis for the surprising lack of competitors: the mighty Tor Books' galleons, hoisting the black flag, were the first to sail the uncharted waters of the 16 Seas, and have solidified their power by sending competitors to Davy Jones' Locker.

Perhaps those who would otherwise gather at r=16 sought easier prey by simply adding an r -- and subsequently, others were drawn in by simple human nature, expecting that the disproportionate grouping of fortune-hunters at r=17 indicated the presence of buried treasure.
