Explaining what's new in the most recent update
We’ve been asked to explain what’s “new” in the most recent update to the data in our ongoing study of the impact of peer-to-peer (P2P) file sharing on paid book sales. At BookExpo America, we presented an update that included sales data for 21 O’Reilly Media 2008 front-list titles that we found on one or more P2P sites. This is an increase of 13 titles over the 8 that had been found when we first presented at Tools of Change in February 2009. It is still fewer than a third of all O’Reilly titles first published in 2008.
In trying to assess the impact of digital piracy on paid sales, we have been measuring paid sales four weeks before and four weeks after a title is first seeded. In our initial data set (eight titles), sales in the four weeks after a file was first seeded increased 6.5%; in the most recent report (all 21 titles), sales decreased 4.8% in the four weeks after seeding first occurred. The average lag time between first paid sale and first instance of seeding on a P2P site remained relatively constant at about 19 weeks.
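The study doesn't publish code for this comparison, so the following is only a minimal Python sketch of a four-weeks-before versus four-weeks-after measurement. The `Title` record, its field names, and the helper functions are hypothetical. Two choices are also assumptions: the seeding week is counted as the first "after" week, and the percent change pools total sales across titles rather than averaging per-title changes, which could give a different figure. The sample numbers are invented and are not study data.

```python
from dataclasses import dataclass
from statistics import mean

WINDOW = 4  # weeks compared on each side of first seeding

@dataclass
class Title:
    # Hypothetical record layout; the study's actual data format is not published.
    name: str
    weekly_sales: list  # paid units per week; index 0 = week of first paid sale
    first_seeded_week: int  # index of the week the title was first seen on a P2P site

def window_totals(title: Title, window: int = WINDOW) -> tuple:
    """Return (sales in the `window` weeks before seeding, sales in the `window` weeks from seeding on)."""
    s = title.first_seeded_week
    before = sum(title.weekly_sales[max(0, s - window):s])
    after = sum(title.weekly_sales[s:s + window])  # assumption: seeding week counts as "after"
    return before, after

def pooled_percent_change(titles: list, window: int = WINDOW) -> float:
    """Percent change in total sales, pooled across titles (an assumed aggregation)."""
    totals = [window_totals(t, window) for t in titles]
    before = sum(b for b, _ in totals)
    after = sum(a for _, a in totals)
    return 100.0 * (after - before) / before

def average_lag_weeks(titles: list) -> float:
    """Mean number of weeks from first paid sale (index 0) to first seeding."""
    return mean(t.first_seeded_week for t in titles)

# Toy example with invented numbers.
a = Title("A", [5, 10, 10, 10, 10, 12, 12, 12, 12], first_seeded_week=5)
b = Title("B", [3, 3, 10, 10, 10, 10, 9, 9, 9, 9], first_seeded_week=6)
print(pooled_percent_change([a, b]))  # 5.0
print(average_lag_weeks([a, b]))      # 5.5
```

Pooling weights high-selling titles more heavily; with only 8 or 21 titles, that choice alone can move the headline percentage noticeably.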
With a larger data set, we tried plotting the average paid sales of pirated and unpirated content from a common starting point (that is, we plotted sales week by week after publication). The week-by-week averages and the four-week rolling averages are shown on slides 28 and 29 of the BEA presentation. Both pirated and unpirated titles showed similar growth in sales in the first few weeks after publication, followed by a decline after the peak. Average sales for unpirated content start higher and peak later, although this may reflect the specific nature of the titles in a small sample.
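Aligning titles on a common starting point and smoothing the result can be sketched in a few lines of Python. This is a hypothetical illustration, not the study's own procedure: each title's weekly sales are assumed to be a list whose index 0 is its publication week, each week's average uses only the titles with data for that week, and the four-week rolling average is assumed to be a simple unweighted window. The sample numbers are invented.

```python
def average_by_week(series_list: list) -> list:
    """Average paid sales for each week since publication.

    Each inner list is one title's weekly sales, aligned so index 0 is its
    publication week. A week's average uses only titles with data for that week.
    """
    longest = max(len(s) for s in series_list)
    averages = []
    for week in range(longest):
        values = [s[week] for s in series_list if week < len(s)]
        averages.append(sum(values) / len(values))
    return averages

def rolling_average(values: list, window: int = 4) -> list:
    """Unweighted rolling average: output entry k is the mean of input weeks k..k+window-1.

    The result is window-1 entries shorter than the input.
    """
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

# Toy example with invented numbers: two "pirated" titles of different ages.
pirated = [[1, 2, 3, 4, 5], [3, 4, 5]]
weekly = average_by_week(pirated)   # [2.0, 3.0, 4.0, 4.0, 5.0]
smoothed = rolling_average(weekly)  # [3.25, 4.0]
print(weekly, smoothed)
```

Running the same two functions on the unpirated titles gives the second curve; because both groups share the same alignment, their profiles can be compared week for week.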
The primary difference between sales of pirated and unpirated content appeared in weeks 19 through 25, when sales for pirated content peaked a second time, at a level higher than the first peak during the initial sell-in period. This second peak followed the point (19 weeks) at which the average pirated O’Reilly front-list title was first seeded on a P2P site.
We stress that this is correlation, not causality, but the difference in the sales profile is notable and persists even when using rolling averages. Data after about week 40 is less reliable because the number of titles that have been on sale that long drops significantly. We will continue to monitor the data to build a more complete profile. The full research paper, published as a Rough Cut that includes access to any future updates, is now available as a download for purchase ($99).
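The reliability caveat comes down to how many titles contribute to each week's average. A minimal Python sketch of that check follows, using the same hypothetical layout as above (one list of weekly sales per title, index 0 = publication week). The `min_titles` cutoff is a hypothetical parameter; the post only says the data thins out after about week 40 and does not name a threshold.

```python
from typing import Optional

def titles_per_week(series_list: list) -> list:
    """Number of titles with sales data for each week since publication."""
    longest = max(len(s) for s in series_list)
    return [sum(1 for s in series_list if week < len(s)) for week in range(longest)]

def first_thin_week(series_list: list, min_titles: int) -> Optional[int]:
    """First week whose average rests on fewer than `min_titles` titles, or None."""
    for week, count in enumerate(titles_per_week(series_list)):
        if count < min_titles:
            return week
    return None

# Toy example with invented data: titles on sale for 5, 3, and 2 weeks.
series = [[1] * 5, [1] * 3, [1] * 2]
print(titles_per_week(series))                # [3, 3, 2, 1, 1]
print(first_thin_week(series, min_titles=2))  # 3
```

Reporting the count alongside each weekly average makes it clear where a late peak or dip rests on only a handful of titles.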
So, where is the full research paper seeded?
I’m sorry that you’re anonymous, because you’ve asked a great question. I haven’t looked for any pirated copies of the paper (we’ve been focused on the O’Reilly titles), but that’s secondary to what I read as the intent of your question.
I’ll talk with O’Reilly (publisher of the paper) about deliberately seeding the paper. It is one of the test cases we conceived when we first designed the study, and it is worth gathering data on its impact.
I’ve actually heard of similar results for music sales (though in the opposite causal direction: something about people who pirate music buying 10x as much music as those who don’t). It would be incredibly interesting to see what emerges with further observation, especially your thoughts on how well the results generalize to other forms of media, or even the trends between subjects. Are Linux geeks really as frugal as conventional wisdom suggests? Given their estimated population size, do they buy fewer books per person than, say, Python or Perl users?
Thanks for doing that research!