Tag Archives: openaccess

Really? Nature put the #HeLa genome paper behind a paywall? Time for Nature Publishing Group to return ALL money obtained from genome papers

This is just fucking ridiculous.  As I have written about many many times – Nature Publishing Group many years ago promised to make papers reporting genome sequence data freely available.  They do not generally live up to this promise well.  See for example

Today I discovered that not only are some important genome papers not freely available but one for the ages – the paper on the HeLa genome – reported with much fanfare recently as a triumph of an agreement with the family of Henrietta Lacks – is only available if you pay.

Once again I call on Nature Publishing Group to publicly disclose all financial gains that have come from people paying for these genome papers and for the money to be returned.

How Open Are You? Part 1: Metrics to Measure Openness and Free Availability of Publications

For many many years I have been raising a key question in relation to open access publishing – how can we measure how open someone’s publications are?  Ideally we would have a way of measuring this in some sort of index.  A few years ago I looked around and asked around and did not find anything out there of obvious direct relevance to what I wanted so I started mapping out ways to do this.

When Aaron Swartz died I started drafting some ideas on this topic.  Here is what I wrote (in January 2013) but never posted:

With the death of Aaron Swartz on Friday there has been much talk of people posting their articles online (a short term solution) and moving more towards openaccess publishing (a long term solution).  One key component of the move to more openaccess publishing will be assessing people on just how good a job they are doing of sharing their academic work.

I have looked around the interwebs to see if there is some existing metric for this and I could not find one.  So I have decided to develop one – which I call the Swartz Openness Index (SOI).

Let A = # of objects being assessed (could be publications, data sets, software, or all of these together). 

Let B = # of objects that are released to the commons with a broad, open license. 

A simple (and simplistic) metric could be simply 

OI = B / A


This is a decent start but misses out on the degree of openness of different objects. So a more useful metric might be the one below.

A and B as above. 

Let C = # of objects available free of charge but not openly 

OI = ( B + (C/D) ) / A  

where D is the “penalty” for making material in C not openly available
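
The two formulas above are easy to try out in code. What follows is only a minimal sketch in Python: the function name, the argument names, and the default penalty D = 2 are assumptions made for illustration, since the draft does not say what value D should take.

```python
def openness_index(total, open_count, free_count=0, penalty=2.0):
    """Return OI = (B + C/D) / A.

    total      -- A, number of objects being assessed
    open_count -- B, objects released with a broad, open license
    free_count -- C, objects free of charge but not openly licensed
    penalty    -- D, divisor applied to C (the default of 2 is an assumption)
    """
    if total <= 0:
        raise ValueError("total (A) must be positive")
    if penalty <= 0:
        raise ValueError("penalty (D) must be positive")
    if open_count < 0 or free_count < 0 or open_count + free_count > total:
        raise ValueError("B and C must be non-negative and B + C cannot exceed A")
    return (open_count + free_count / penalty) / total


# Made-up example: 10 papers, 6 openly licensed, 2 free to read but not open.
print(openness_index(10, 6))     # simple form, OI = B / A
print(openness_index(10, 6, 2))  # weighted form with D = 2
```

With these made-up counts the simple index is 0.6 and the weighted one is 0.7. Leaving C at zero reduces the second formula to the first.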


This still seems not detailed enough.  A more detailed approach might be to weight diverse aspects of the openness of the objects.  Consider for example the “Open Access Spectrum.”  This has divided objects (publications in this case) into six categories in terms of potential openness: reader rights, reuse rights, copyrights, author posting rights, automatic posting, and machine readability.  And each of these is given different categories that assess the level of openness.  Seems like a useful parsing in ways.  Alas, since bizarrely the OAS is released under a somewhat restrictive CC BY-NC-ND  license I cannot technically make derivatives of it.  So I will not.  Mostly because I am pissed at PLoS and SPARC for releasing something in this way.  Inane.

But I can make my own openness spectrum.

And then I stopped writing because I was so pissed off at PLOS and SPARC for making something like this and then restricting its use.  I had a heated discussion with people from PLOS and SPARC about this but not sure if they updated their policy.  Regardless, the concept of an Openness Index of some kind fell out of my head after this buzzkill.  And it only just now came back to me. (Though I note – I did not find the Draft post I made until AFTER I wrote the rest of this post below … ).

To get some measure of openness in publications maybe a simple metric would be useful.  Something like the following

  • P = # of publications
  • A = # of fully open access papers
  • OI = Openness index

A simple OI would be

  • OI = 100 * A/P

However, one might want to account for relative levels of openness in this metric.  For example

  • AR = # of papers with an open but somewhat restricted license
  • F = # of papers that are freely available but not with an open license
  • C = some measure of how cheap the non freely available papers are

And so on.

Given that I am not into library science myself and not really familiar with playing around with this type of data I thought a much simpler metric would be to just go to Pubmed (which of course works only for publications in the arenas covered by Pubmed).

From Pubmed one can pull out some simple data.

  • # of publications (for a person or Institution)
  • # of those publications in PubMed Central (a measure of free availability)

Thus one could easily measure the “Pubmed Central” index as

PMCI = 100 * (# publications in PMC / # of publications in Pubmed)
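
As a quick illustration, the PMCI formula above can be computed in a few lines of Python. This is a sketch only: the function name `pmci` is made up here, and the two sets of counts are copied from the author counts reported in this post.

```python
def pmci(in_pmc, in_pubmed):
    """Return the Pubmed Central index: 100 * (# in PMC / # in Pubmed)."""
    if in_pubmed <= 0:
        raise ValueError("need at least one publication in Pubmed")
    if not 0 <= in_pmc <= in_pubmed:
        raise ValueError("PMC count must be between 0 and the Pubmed count")
    return 100.0 * in_pmc / in_pubmed


# Counts as (in PMC, in Pubmed) for two of the authors listed in this post.
counts = {"Brown PO": (164, 234), "Venter JC": (53, 237)}
for name, (in_pmc, in_pubmed) in counts.items():
    print(f"{name}: {pmci(in_pmc, in_pubmed):.1f}")
```

This prints 70.1 for Brown PO and 22.4 for Venter JC. Some of the other values reported in this post look truncated rather than rounded to one decimal, so a recomputed value may differ in the last digit.
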
Some examples of the PMCI for various authors, including some bigger names in my field and some people I have worked with:

Name             PMC / Pubmed    PMCI
Eisen JA         224/269         83.2
Eisen MB         76/104          73.1
Collins FS       192/521         36.8
Lander ES        160/377         42.4
Lipman DJ        58/73           79.4
Nussinov R       170/462         36.7
Mardis E         127/187         67.9
Colwell RR       237/435         54.5
Varmus H         165/408         40.4
Brown PO         164/234         70.1
Darling AE       20/27           74.0
Coop G           23/39           59.0
Salzberg SL      107/162         61.7
Venter JC        53/237          22.4
Ward NL          24/58           41.4
Fraser CM        78/262          29.8
Quackenbush J    95/225          42.2
Ghedin E         47/82           57.3
Langille MG      10/14           71.4

And so on.  Obviously this is of limited value / accuracy in many ways.  Many papers are freely available but not in Pubmed Central.  Many papers are not covered by Pubmed or Pubmed Central.  Times change, so some measure of recent publications might be better than measuring all publications.  Author identification is challenging (until systems like ORCID get more use).  And so on.

Another thing one can do with Pubmed is to identify papers with free full text available somewhere (not just in PMC).  This can be useful for cases where material is not put into PMC for some reason.  And then with a similar search one can narrow this to just the last five years.  As openaccess has become more common maybe some people have shifted to it more and more over time (I have — so this search should give me a better index).
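
For anyone who wants to script these searches rather than run them by hand, here is a hedged sketch in Python. It assumes the searches can be expressed with PubMed's `[au]` author tag, the `free full text[sb]` filter and the `"last 5 years"[dp]` date filter, and that counts can be requested from NCBI's E-utilities ESearch endpoint with `rettype=count`. Whether these match the exact searches behind the numbers in this post is an assumption, and the function names are made up. The code only builds the query and the URL and makes no network call.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed here to be the right service).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def author_term(author, free_only=False, last_five_years=False):
    """Build a PubMed search term for one author, with optional filters."""
    parts = [f"{author}[au]"]
    if free_only:
        parts.append("free full text[sb]")   # free full text anywhere
    if last_five_years:
        parts.append('"last 5 years"[dp]')   # restrict by publication date
    return " AND ".join(parts)


def esearch_count_url(term):
    """Return an ESearch URL asking only for the number of matching records."""
    query = urlencode({"db": "pubmed", "term": term, "rettype": "count"})
    return f"{ESEARCH}?{query}"


def free_percentage(free, total):
    """Percentage of publications with free full text somewhere."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * free / total


term = author_term("Eisen JA", free_only=True, last_five_years=True)
print(term)
print(esearch_count_url(term))
```

Dividing the count from the filtered search by the count from the unfiltered one, via `free_percentage`, gives the percentage of papers with free full text. Author name ambiguity affects both counts, so the result is only as good as the author search.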

Let’s call the % of publications with free full text somewhere the “Free Index” or FI.  Here are the values for the same authors.

Name             PMC / Pubmed    PMCI    Free / Pubmed (5 years)    FI-5    Free (all)    FI-ALL
Eisen JA         224/269         83.2    178/180                    98.9    237           88.1
Eisen MB         76/104          73.1    32/34                      94.1    83            79.8
Collins FS       192/521         36.8    104/128                    81.3    263           50.5
Lander ES        160/377         42.4    78/104                     75.0    200           53.1
Lipman DJ        58/73           79.4    20/22                      90.9    59            80.8
Mardis E         127/187         67.9    90/115                     78.3    135           72.2
Colwell RR       237/435         54.5    31/63                      49.2    258           59.3
Varmus H         165/408         40.4    21/28                      75.0    206           50.5
Brown PO         164/234         70.1    20/21                      95.2    185           79.0
Darling AE       20/27           74.0    18/21                      85.7    21            77.8
Coop G           23/39           59.0    16/20                      80.0    28            71.8
Salzberg SL      107/162         61.7    54/58                      93.1    128           79.0
Venter JC        53/237          22.4    20/33                      60.6    85            35.9
Ward NL          24/58           41.4    18/27                      66.6    30            51.7
Fraser CM        78/262          29.8    9/13                       69.2    109           41.6
Quackenbush J    95/225          42.2    54/75                      72.0    131           58.2
Ghedin E         47/82           57.3    30/36                      83.3    56            68.3
Langille MG      10/14           71.4    11/13                      84.6    11            78.6

Very happy to see that I score very well for the last five years. 180 papers in Pubmed.  178 of them with free full text somewhere that Pubmed recognizes. The large number of publications comes mostly from genome reports in the open access journals Standards in Genomic Sciences and Genome Announcements.  But most of my non genome report papers are also freely available.

I think in general it would be very useful to have measures of the degree of openness.  And such metrics should take into account sharing of other material like data, methods, etc.  In a way this could be a form of the altmetric calculations going on.

But before going any further I decided to look again into what has been done in this area. When I first thought of doing this a few years ago I searched and asked around and did not see much of anything.  (Although I do remember someone out there – maybe Carl Bergstrom – saying there were some metrics that might be relevant – but can’t figure out who / what this information in the back of my head is).

So I decided to do some searching anew.  And lo and behold there was something directly relevant. There is a paper in the Journal of Librarianship and Scholarly Communication called: The Accessibility Quotient: A New Measure of Open Access.  By Mathew A. Willmott, Katharine H. Dunn, and Ellen Finnie Duranceau from MIT.

Full Citation: Willmott, MA, Dunn, KH, Duranceau, EF. (2012). The Accessibility Quotient: A New Measure of Open Access. Journal of Librarianship and Scholarly Communication 1(1):eP1025. http://dx.doi.org/10.7710/2162-3309.1025
Here is the abstract:

Abstract
INTRODUCTION The Accessibility Quotient (AQ), a new measure for assisting authors and librarians in assessing and characterizing the degree of accessibility for a group of papers, is proposed and described. The AQ offers a concise measure that assesses the accessibility of peer-reviewed research produced by an individual or group, by incorporating data on open availability to readers worldwide, the degree of financial barrier to access, and journal quality. The paper reports on the context for developing this measure, how the AQ is calculated, how it can be used in faculty outreach, and why it is a useful lens to use in assessing progress towards more open access to research.
METHODS Journal articles published in 2009 and 2010 by faculty members from one department in each of MIT’s five schools were examined. The AQ was calculated using economist Ted Bergstrom’s Relative Price Index to assess affordability and quality, and data from SHERPA/RoMEO to assess the right to share the peer-reviewed version of an article.
RESULTS The results show that 2009 and 2010 publications by the Media Lab and Physics have the potential to be more open than those of Sloan (Management), Mechanical Engineering, and Linguistics & Philosophy.
DISCUSSION Appropriate interpretation and applications of the AQ are discussed and some limitations of the measure are examined, with suggestions for future studies which may improve the accuracy and relevance of the AQ.
CONCLUSION The AQ offers a concise assessment of accessibility for authors, departments, disciplines, or universities who wish to characterize or understand the degree of access to their research output, capturing additional dimensions of accessibility that matter to faculty.

I completely love it.  After all, it is directly related to what I have been thinking about and, well, they actually did some systematic analysis of their metrics.  I hope more things like this come out and are readily available for anyone to calculate.  Just how open someone is could be yet another metric used to evaluate them …

And then I did a little more searching and found the following which also seem directly relevant

So – it is good to see various people working on such metrics.  And I hope there are more and more.
Anyway – I know this is a bit incomplete but I simply do not have time right now to turn this into a full study or paper and I wanted to get these ideas out there.  I hope someone finds them useful …

A good thing: More and more biology papers showing up in arXiv

Good to see some more papers in microbiology & genomics and related topics going to the preprint server arXiv.

If you are interested in population and evolutionary genetics a good place to keep up with papers on this topic in arXiv is Haldane’s Sieve.  The good folks there in essence make a separate post about each paper of interest and then people can comment there on the papers, since the commenting functions at arXiv are, well, challenged.

In areas related to this blog, here are some recent papers in arXiv:

Am hoping more and more biologists start depositing papers in arXiv.  My brother has started doing it for all papers in his lab so I guess that means I should too.  And so should everyone else …

Email from Biomed Central pointing to ways to get #altmetrics for recent sFAMS paper

Just received from Biomed Central and thought some people might be interested in some of the ways they try to help you gather metrics about your papers.

Dear Prof Eisen,

We thought you might be interested to know how many people have read your article:

Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource
Thomas J Sharpton, Guillaume Jospin, Dongying Wu, Morgan GI Langille, Katherine S Pollard and Jonathan A Eisen
BMC Bioinformatics, 13:264   (13 Oct 2012)
http://www.biomedcentral.com/1471-2105/13/264


Ten simple ways to share PDFs of your papers #PDFtribute

There is a spreading surge of PDF sharing going on in relation to a tribute to Aaron Swartz who died a few days ago.  For more on Aaron and tributes to him see the collection I am making here: The Tree of Life: RIP: Aaron Swartz.  For more on the PDF sharing see this CNET story for example: Researchers honor Swartz’s memory with PDF protest and http://pdftribute.net.

I should say, sharing your PDFs is not necessarily enough (the license on the PDF may affect what people can do with them if they feel constrained to follow the law).  It is also critical to think about the level of openness of a paper, but I will save most of the comments on that for another time. What I wanted to do here is point out various ways to share PDFs for people who don’t know how …

UPDATE 1/14: See follow up post 10 things you can do to REALLY support #OpenAccess #PDFTribute

Ten simple ways to share PDFs of your papers.

1. Publish your paper in a fully #openaccess journal (so called GOLD OpenAccess).

Such journals immediately post your paper online for all to see and frequently also post your paper in various formats to repositories like Pubmed Central.  For a list of such journals see the “Directory of Open Access Journals”.  In my opinion, this is the best, and, well, really only viable long term option.  This is what I do for papers from my lab.

2. Publish your paper in a non #openaccess journal that has the option of selecting / paying for #openaccess on a case by case basis. 

Many journals that are not fully #openaccess have the option of paying extra to have your paper be published in an #openaccess manner and then the journal handles not only posting the paper on their site but also frequently depositing in a repository of their or your choosing.  UPDATE: Note – in many cases the licenses used by journals for such one-off “open” publishing are not fully open, despite what some of the journals claim so proceed with caution (see PLOS Biology: Why Full Open Access Matters for example).

3. Publish in a non #openaccess journal that releases papers to a repository after a delay.

Many journals put papers behind a paywall initially but then “free” them up in some way after a set period of delay.  For example a large number in biomedicine will deposit papers to Pubmed Central and also make them freely available on their website after 6 months.  Frequently as with #2 above, the licenses associated with such release of papers are not fully open, but this is a way to have your papers be at least accessible to others after a period of time.

4. Deposit your paper in a preprint server before you submit it for publication.  

For more on preprint servers see

Examples of commonly used preprint servers include

5. Self-archive your PDF in a repository (so called GREEN OpenAccess).

Various repositories out there exist for posting one’s papers.  They work in essence like a preprint server though some people use them more for posting papers after they have been published so I am listing them separately here.  More detail on self-archiving can be found here.  A good source of information about repositories is the Registry of Open Access Repositories.  Also the Directory of Open Access Repositories.  Another good source is SPARC. Also see here.

One repository commonly used in biomedicine is Pubmed Central.  Alas one is only allowed to post papers there by oneself if the work in the paper was funded by an NIH grant.

Another approach is to use arXiv as a repository where you can post things even after they are published.

Another growing venue for self-archiving is an institutional repository.  As many universities expand their commitment to open access, university repositories are becoming a source of more and more publications.  Check to see if your institution has a repository and use it.

UPDATE: Note, just depositing your paper in a repository or preprint server does not necessarily mean your paper is open access.  Look in detail at the license and copyright policies of the archives you are considering before using them.

6. Self post your PDFs to a website you control.

If you do not have a personal website and/or do not know how to post a paper to your website, well, you should learn more about this.  A few simple ways to quickly post a PDF for others to get access to include

Create a new blog / website with a system that allows posting PDFs.  There are many many options for this.  One is Posterous.  Another is WordPress.Com.  There are certainly a million other ways.  Upload a PDF to Google Docs and then share the Google Doc link.  Post to Dropbox and share the link there.  Etc. etc. etc.  I ended up using WordPress.Com to create my lab page and to post all my PDFs.

7. Post your PDFs to an online reference collection.

Many systems now exist for collecting and collating and sharing reference collections online.  They include CiteULike, Zotero, and Mendeley.  I particularly like Mendeley right now in part because it makes it very easy to share PDFs privately or publicly.  I for example have posted all my own papers on Mendeley as well as papers of my father’s (for more on this see The Tree of Life: Freeing My Father’s Publications and Free Science, One Paper at a Time | Wired Science | Wired.com).

8. Create an academic profile page and post PDFs there.

Many systems now exist for creating a personal academic profile of sorts.  One example is Academia.Edu.  I have created a page there (Jonathan Eisen | University of California, Davis – Academia.edu), although I confess I have not been updating it much.

9. Post to Slideshare.

Though many people end up only posting slideshows to Slideshare, and I use it for that purpose, I have posted many of my papers there as well. See for example:

10. Post to “Data” archives.

There is a large growing collection of places to post “Data” to share it with others.  Some of these sites also allow posting of papers.  For example, I have posted multiple papers to Figshare, a great data sharing site that can be used to post and share just about anything. I have also used Figshare for this (for example – here is my PhD thesis there).

11. Ask a Librarian. (Yes it goes to 11)

Probably the best way to figure out how to better share your PDFs if the options above don’t work for you (or even if they do) is to talk to a librarian.  They are the most knowledgeable people in regard to methods and systems and other issues for sharing academic work.


Some related posts from The Tree of Life



Other ideas? Please post in comments …


RIP: Aaron Swartz (collection of news stories, articles, etc)

Aaron Swartz from the AWL

Compiling links to stories, posts, information about Aaron Swartz and his untimely death. RIP Aaron.

About Aaron

News and Posts about his death
More from 1/14
More from 1/17-22

Storifies about Aaron Swartz

PDF upload tribute

Draft of a Proposal for a UC #OpenAccess policy – comments wanted

Just got sent this email

Dear Colleagues, 

On behalf of the Academic Senate Library Committee (ASLC), I am asking for your comments on the attached proposed Open Access Publishing Policy for the University of California. All faculty, including Academic Federation members, are invited to post their comments on the Academic Senate web-forum site at http://academicsenate.ucdavis.edu/Forums/index.cfm?Forum_ID=67. Please go to this site to submit your feedback. 

Briefly, the issue is this: the faculty of the University of California, in conjunction with the University Committee on Libraries and Scholarly Communication (UCOLASC), is proposing a new OPEN ACCESS PUBLISHING POLICY that will apply to the dissemination of all scholarly work. UCOLASC is seeking feedback from all campuses on this issue in order to inform a final version of the policy which will be presented to the Universitywide Academic Senate sometime this calendar year. 

The ASLC would appreciate your comments by Wednesday, May 9, 2012. Your ideas will then be shared with UCOLASC in time for its May 25th meeting. The web-forum will remain open substantially past May 9, and we will endeavor to include as many comments up to May 25 as possible. 

Sincerely, 

Brian H. Kolner 

Academic Senate Library Committee

This relates to a draft of a proposal for a new Open Access Publishing Policy being circulated at the University of California. The draft of the proposal can be found here.

UC Davis (and I presume other UCs) are now soliciting comments on the proposal. I would love to hear / read comments from anyone. Personally, I think the policy is way too weak as it allows exceptions to be granted …

Calling on Nature Publishing Group to return all money received for genome papers and article corrections

Well, let’s see if Nature Publishing Group actually does the right thing here.  A few days ago I showed that they were charging for access to “genome sequencing” papers that were supposed to be freely available (see Hey Nature Publishing Group – When are you going to live up to your promises about “free” genome papers? #opengate #aaaaaarrgh).  And in researching this I then discovered that Nature Publishing Group has been charging for access to corrections of articles (see Nature’s access absurdity: Human Genome Paper free but access to corrections will costs $64 and Corrections Scamming at Nature: Tantalizing clues, to see errors just pay more money #Seriously?).  

Multiple people from NPG have posted on my blog and twitter that they are working on “fixing” these issues.  By which I think they mean “We will make these freely available again.”  But this is not a full fix.  NPG really needs to do a self audit and return ALL money that anyone has paid for access to these articles.  Charging for something that is supposed to be free is not a good thing … and if they want to really fix the issue they need to give any money they got for these papers back.  Note – I already called for them to do this last year when I wrote about the genome papers not being free.  But I never heard back.  Please help put the pressure on them to do the right thing this time.