Archive your data!

When you submit a manuscript to the Journal of Animal Ecology you are asked the following question: ‘If your paper is accepted for publication where do you expect to archive your data or, if already archived where are the data held?’ Recently, we received a manuscript where the author had responded to this question with: ‘the data will be archived with the lead author’. This is not an appropriate digital data archive!

On about the same date that this manuscript was submitted, one of us wrote to the authors of a paper published just over a decade ago. In the email, data underpinning the paper were requested, as it seemed plausible that the paper’s conclusions were a consequence of an error in analysis. The authors were unable to provide the data because they had changed computers several times in the last decade and the data had been lost somewhere along the line. This won’t be the only time this has happened. In fact, not so long ago, one of us had to play the embarrassing role of replying to a request for data with the news that the data were in a file format that was inaccessible on any current operating system. Fortunately, these sorts of things shouldn’t happen in future for Journal of Animal Ecology papers: one of the reasons we require authors to upload data associated with their papers to a respected data repository is to avoid just such scenarios.

There are advantages of having data associated with a paper freely available other than allowing others to verify results. For example, new methods can be applied to old data to see whether methodological advances substantially alter conclusions, and data can be collated across species to allow new comparative and meta-analyses. There’s an obvious reason why Bumpus’s (1898) famous data on sparrows has been analysed in so many ways to quantify natural selection: because it was accessible. In addition, there is a compelling ethical reason to upload your data: If its collection was funded by the public purse, it should be made available for the public to access.

Both of us frequently work with long-term individual-based data sets. We have helped fund the collection of long-term data, spent time in the field collecting it, and (increasingly it seems…) spent time in the office analyzing it. Both of us have had many debates about data access with those in the community of researchers working with long-term data. And, as with any field, there are both advocates and opponents of data archiving. One frequently aired argument against it is that some unscrupulous bastard will download it, use it to address a question you were planning to address at some point in the future, and steal your intellectual thunder. They might even use inappropriate analyses and arrive at erroneous conclusions. This is a risk, but both of us consider it small, and outweighed by the benefits, and both of us advocate data archiving for all types of ecological and evolutionary data.

Over the 20-odd years that each of us has been involved with long-term individual-based studies, we have both seen numerous approaches from researchers asking whether they can have access to data. In nearly all cases the request has been granted. Very occasionally a student or post-doc was already testing the same hypothesis as that proposed, using the same methods. In those cases, access was denied, but it was also explained that data would be made available for other analyses if desired. It would of course be awful to be gazumped by someone who has downloaded your archived data, but is this a sufficient argument not to archive data? We suspect that this would happen rarely, and that the risk is consequently low. It can also be easily addressed by some simple data etiquette: if you are about to launch into an analysis of someone else’s data, it would be strongly advisable to approach them and let them know your plans. If they write back stating that someone else is just about to submit a manuscript on the same topic using those data, then you have potentially spared yourself considerable wasted time. It is also worth pointing out that restricting access to data doesn’t prevent people testing the same idea in other data sets, so restricting access does not remove the risk of losing out on your pet project.

Another concern that we have heard raised, though not, we hasten to add, by any of our collaborators, is that someone has spent many years collecting data and they’re damned if they’re going to make it available to anyone else to parasitize. The most amusing version of this was from a researcher who told one of us that the reason they were not going to make their data available was they would rather die than let the RSPB have it! The reason had something to do with a dispute with the organization aeons ago. So long ago, in fact, that the data owner was unable to remember any real details about the conflict! If data were collected by funding from the public purse, and the data collector’s salary was paid from the same source at the time, this argument is really indefensible. It benefits science, and future generations of scientists, if you make your data available.

A possible final concern is that the data may be analyzed by people who don’t understand the system and that this might result in dubious insights. We always wondered what metadata were for. If methods sections are sufficiently detailed then it shouldn’t be too hard for major sources of bias in data to be flagged.

Finally, the number of people who really care about your data at this moment in time is probably quite small. Most people have their own data that they are interested in and pursue their own questions in those data. This is particularly true for long-term studies. How do we know this? In 2008 we made three long-term individual-based studies on birds available for free download at http://lits.bio.ic.ac.uk:8080/litsproject/. A paper was published to accompany their release (Jones et al. 2008). Andy Village’s kestrel data set, Ian Patterson’s rook data set and Ian Newton’s sparrowhawk data set are all freely available. To our knowledge, they have very rarely, if ever, been used.

Neither of us is convinced by the arguments against data archiving. We believe the benefits far outweigh the costs. The British Ecological Society shares this view, and it requires data archiving for all papers published in all its journals. At the Journal of Animal Ecology, we would waive this requirement only for compelling ethical reasons. So next time you submit to us, the correct thing to write in response to the ‘where will you archive your data?’ question is something like ‘the digital data archive, Dryad (http://datadryad.org/)’.

Tim Coulson (twitter: @tncoulson)
Ben Sheldon (twitter: @ben_sheldon_EGI)
(Senior Editors, Journal of Animal Ecology)

References

Bumpus, H.C. (1898) Eleventh lecture. The elimination of the unfit as illustrated by the introduced sparrow, Passer domesticus. (A fourth contribution to the study of variation.) Biological Lectures: Woods Hole Marine Biological Laboratory, 209–225.

Jones, O.R. et al. (2008) A web resource for the UK’s long-term individual-based time-series (LITS) data. Journal of Animal Ecology 77, 612–615.
