Kistaro Windrider, Reptillian Situation Assessor (kistaro) wrote,

The Prevalence of Piracy on the Internet

Electronic piracy is a popular topic of argument on the Internet, attracting the attention of everybody from random Slashdot users to highly trained RIAA executives. As an employee of Microsoft, I can be affected by electronic piracy very directly, so I pay attention to the topic too. What I have discovered, however, is that nobody has actually gone out and collected the good information we need to truly understand piracy online. For all the discussion of multi-million-dollar studies bought by the RIAA so that it can claim to be losing billions of dollars to this practice, none of them used the simple and compelling research methodology that I have created: getting the information from Google.

Observant readers will note that the data axis of this graph is actually logarithmic. This is because the first trial (for "ar") produced a value so large that the interesting patterns (or lack thereof) in subsequent results were completely invisible on any reasonably sized linear graph. The first search returned over 600,000,000 hits, while the smallest result (for "arrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr") had exactly 153 hits. I collected every value by entering "a" followed by some number of "r"s, trying every integer from 1 to 50, and noting how many results Google reported. Zero was not tried because it is not a proper "arrr"; I stopped at 50 because I got bored; and fractional "r"s pose a problem for the Unicode character set.
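
For anyone who wants to replicate this methodology, here is a minimal sketch in Python of generating the 50 search strings and of why a log axis was needed. This is my own illustration, not how the original searches were run (those were typed into Google by hand), and the function name `pirate_queries` is hypothetical. It does not query Google; it only builds the strings and measures the spread between the largest and smallest hit counts quoted above.

```python
import math


def pirate_queries(max_rs=50):
    """Return the search strings "ar", "arr", ..., with 1 to max_rs "r"s.

    Zero "r"s is skipped, since a bare "a" is not a proper "arrr".
    """
    return ["a" + "r" * n for n in range(1, max_rs + 1)]


queries = pirate_queries()
print(len(queries))  # 50
print(queries[0])    # ar

# "Over 600,000,000" hits for "ar" vs. exactly 153 for the 50-"r" query.
# Since 600,000,000 is a lower bound, the true spread is at least this large.
span = math.log10(600_000_000 / 153)
print(f"{span:.1f} orders of magnitude")  # 6.6 orders of magnitude
```

A spread of more than six orders of magnitude is why every result after "ar" would have been squashed flat against the bottom of a linear axis.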

Interestingly, the prevalence of piracy does not decrease monotonically as the number of "r"s increases. The first break comes between 16 and 17: 16 "r"s return 847 hits, but 17 return a surprising 17,300. Eighteen drops back to 694. The cause of this sharp spike is unknown, and I invite hypotheses about the true meaning of this data.
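
To make "not monotonically decreasing" concrete, here is a small Python sketch that scans a table of hit counts for any step where the count fails to drop. The helper name `non_decreasing_steps` is hypothetical, and the table below contains only the three data points quoted in this post, not the full 50-point data set.

```python
def non_decreasing_steps(hits):
    """Return (n, next_n) pairs, in order of n, where the hit count does not drop.

    `hits` maps a number of "r"s to the hit count Google reported.
    """
    ns = sorted(hits)
    return [(prev, cur) for prev, cur in zip(ns, ns[1:]) if hits[cur] >= hits[prev]]


# Only the three data points quoted above.
hits = {16: 847, 17: 17_300, 18: 694}

print(non_decreasing_steps(hits))             # [(16, 17)]
print(f"spike factor: {17_300 / 847:.1f}x")   # spike factor: 20.4x
```

Run over the complete data set, the same scan would list every place the curve turns upward, not just the 16-to-17 spike.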

I also kept track of the suggestions Google offered under "Did you mean:" whenever that option appeared. Near the beginning, most of the suggestions, when there were any, proposed that I use fewer "r"s; by the end, I almost always received a suggestion, and it asked whether I should leave off the "a". I suspect that by then Google was simply desperate for me to stop and was making these suggestions in a futile attempt to get me to shut up. Unfortunately, I have no way to test this hypothesis.

The complete data set is available here. I invite further study and analysis of this important data about piracy on the Internet.
