Yesterday a paper from my lab (by Morgan Langille, with me as co-author) was published in PLoS One: BioTorrents: A File Sharing Service for Scientific Data
In it we describe a new website dedicated to the sharing of biology-related files via BitTorrent, the popular distributed file sharing system. The abstract sums things up pretty well:
The transfer of scientific data has emerged as a significant challenge, as datasets continue to grow in size and demand for open access sharing increases. Current methods for file transfer do not scale well for large files and can cause long transfer times. In this study we present BioTorrents, a website that allows open access sharing of scientific data and uses the popular BitTorrent peer-to-peer file sharing technology. BioTorrents allows files to be transferred rapidly due to the sharing of bandwidth across multiple institutions and provides more reliable file transfers due to the built-in error checking of the file sharing technology. BioTorrents contains multiple features, including keyword searching, category browsing, RSS feeds, torrent comments, and a discussion forum. BioTorrents is available at http://www.biotorrents.net.
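For readers curious about the "built-in error checking" mentioned in the abstract: in the original BitTorrent protocol, a file is split into fixed-size pieces and the torrent metadata stores a SHA-1 hash for each piece, so a client can detect a damaged piece and fetch it again. Below is a minimal Python sketch of that idea. It is not code from BioTorrents or from any BitTorrent client; the function names and the tiny piece size are assumptions chosen for illustration.

```python
import hashlib

# Hypothetical, tiny piece size for illustration only;
# real torrents use much larger pieces (for example 256 KiB or more).
PIECE_LENGTH = 4


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[start:start + piece_length]).digest()
        for start in range(0, len(data), piece_length)
    ]


def corrupt_pieces(received: bytes, expected: list[bytes],
                   piece_length: int = PIECE_LENGTH) -> list[int]:
    """Return the indices of pieces whose SHA-1 digest does not match."""
    bad = []
    for index, digest in enumerate(expected):
        piece = received[index * piece_length:(index + 1) * piece_length]
        if hashlib.sha1(piece).digest() != digest:
            bad.append(index)  # a real client would request this piece again
    return bad


if __name__ == "__main__":
    original = b"ACGTACGTAC"        # stand-in for a data file
    hashes = piece_hashes(original)  # what the torrent metadata would carry
    damaged = b"ACGTACCTAC"         # one byte changed in the second piece
    print(corrupt_pieces(damaged, hashes))
```

Because each piece is checked separately, only the damaged piece needs to be downloaded again rather than the whole file, which matters a lot for large datasets.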
“Someone could download all the Nature papers and post them there, but we’re not encouraging that,” Eisen jokes. All PLoS papers are already on BioTorrents.
- The Scientist Blog from Bob Grant
- Amazing News post
- PLoS One press release
- GenomeWeb article by Matthew Dublin
- John Timmer has written an article for ArsTechnica
- FileNetworks
- Tim O’Reilly on twitter
- Egon Willighagen at chem-bla-ics
http://friendfeed.com/search?q=biotorrents&embed=1
Older discussion on FriendFeed by Morgan et al.
http://friendfeed.com/betascience/3d17a069/biotorrents-manuscript-accepted?embed=1
Completely agree that whether or not BioTorrents becomes the best data sharing site is not really the point (that would be nice though); the point is that scientists start thinking about sharing their data and results more openly. We can develop the best file sharing tools in the world, but without the willingness of researchers to share their data, they are not of much use.
I would really like to see more result-based types of data on BioTorrents, since there isn't an existing repository for this type of data.
I really hope initiatives like this take off. There's so much data out there that probably only gets used once, when it could be re-used and re-analysed multiple times in future analyses.
On a related note, I'd like to see more policing and enforcement by editors and journals of commitments to data publishing. For example, in a recent Science paper, Nesbitt et al. (2009) write that they will make their cladistic data publicly available on Morphobank (Supporting Online Materials, p18). Months after publication, and despite a 'reminder' email, their data is still not publicly available.
What can be done to stop 'empty promises' of Open data availability?
Ross – I agree this is a huge problem. One option is to require that data associated with a paper be deposited in a repository that is NOT run by the authors and be made available at the time of publication. This is what is done with DNA sequence data (most of the time), with authors being required to deposit data in Genbank, EMBL, DDBJ or something similar. In fact, most journals will not allow one to say “will be deposited” in these places, but require accession IDs. Perhaps for phylogenetics one could require accession IDs from Morphobank, etc. In general, we need to do much better at making data available.
Yep. Couldn't agree more really. I've spent far too much of my time lately finding data, extracting it from PDFs and re-formatting it, rather than doing actual science.
So, the only further point I'd like to add is that data needs not only to be made available when published [the bare-minimum] but also to be made available in a useable, machine-readable/searchable, appropriate format.
'Human-readable' tables of data locked inside PDFs [which seem to be the standard atm for cladistic data in some journals; an atavism from days before the Internet age] only fulfill the bare-minimum requirements of availability: the data is published, but it's not useable without further re-formatting, at a needless expense of time and effort and with the possible introduction of error. This is my experience; a lot of published data is only 'pseudo-available', technically there but barely useable.
Thus I think it's important to stress that free 'availability' is a great thing, but care and thought MUST also be taken with regard to the useability of the 'available-data'.
BioTorrents, Morphobank, Treebase, Genbank etc… might have imperfections but they make data available AND useable. Long may they continue 😀
Completely agreed, Ross. Usability is critical.
I found a 2008 paper that discusses how BitTorrent could be used to share biological data in developing countries with low bandwidth:
http://bioinformatics.oxfordjournals.org/cgi/content/full/24/2/299
Another little web story about BioTorrents: http://www.computeach.co.uk/IT-news/IT-Computer-Technology-News/Cheap-option-for-file-transfers-launched/19733193
I was skeptical at the beginning, but I have been convinced that it is a very good idea after answering a question on biostar (http://biostar.stackexchange.com/questions/391/how-do-i-import-data-from-a-torrent-into-a-bioperl-r-bioclipse-or-taverna-appl)
The only drawback I see is that many databases update frequently, so they will need to maintain a torrent for each release.