How Open Are You? Part 1: Metrics to Measure Openness and Free Availability of Publications

For many, many years I have been raising a key question in relation to open access publishing: how can we measure how open someone’s publications are?  Ideally we would have a way of measuring this in some sort of index.  A few years ago I looked around and asked around and did not find anything of obvious direct relevance to what I wanted, so I started mapping out ways to do this.

When Aaron Swartz died I started drafting some ideas on this topic.  Here is what I wrote (in January 2013) but never posted:

With the death of Aaron Swartz on Friday there has been much talk of people posting their articles online (a short term solution) and moving more towards open access publishing (a long term solution).  One key component of the move to more open access publishing will be assessing people on just how good a job they are doing of sharing their academic work.

I have looked around the interwebs to see if there is some existing metric for this and I could not find one.  So I have decided to develop one – which I call the Swartz Openness Index (SOI).

Let A = # of objects being assessed (could be publications, data sets, software, or all of these together). 

Let B = # of objects that are released to the commons with a broad, open license. 

A simple (and simplistic) metric could be

OI = B / A


This is a decent start but misses out on the degree of openness of different objects. So a more useful metric might be the one below.

A and B as above. 

Let C = # of objects available free of charge but not openly 

OI = ( B + (C/D) ) / A  

where D is the “penalty” for making material in C not openly available
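The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function name openness_index, the argument names, and the default penalty D = 2 are assumptions made for this example and are not part of the draft. Setting C to zero recovers the simple OI = B / A.

```python
def openness_index(total, open_count, free_count=0, penalty=2.0):
    """Sketch of OI = (B + C/D) / A.

    total      -- A, number of objects being assessed
    open_count -- B, objects released with a broad, open license
    free_count -- C, objects free of charge but not openly licensed
    penalty    -- D, divisor applied to C (the default of 2 is an assumption)
    """
    if total <= 0:
        raise ValueError("A (total) must be positive")
    if penalty < 1:
        raise ValueError("D (penalty) should be at least 1")
    if open_count < 0 or free_count < 0 or open_count + free_count > total:
        raise ValueError("B and C must be non-negative and B + C cannot exceed A")
    return (open_count + free_count / penalty) / total


# Hypothetical author: 10 objects, 4 openly licensed, 4 free but not open.
simple_oi = openness_index(10, 4)                         # B / A
weighted_oi = openness_index(10, 4, free_count=4, penalty=2.0)
print(simple_oi, weighted_oi)
```

With D = 1 the free-but-not-open objects count the same as fully open ones, and a larger D discounts them more heavily.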


This still seems not detailed enough.  A more detailed approach might be to weight diverse aspects of the openness of the objects.  Consider for example the “Open Access Spectrum.”  This has divided objects (publications in this case) into six categories in terms of potential openness: reader rights, reuse rights, copyrights, author posting rights, automatic posting, and machine readability.  And each of these is given different categories that assess the level of openness.  Seems like a useful parsing in ways.  Alas, since bizarrely the OAS is released under a somewhat restrictive CC BY-NC-ND  license I cannot technically make derivatives of it.  So I will not.  Mostly because I am pissed at PLoS and SPARC for releasing something in this way.  Inane.

But I can make my own openness spectrum.

And then I stopped writing because I was so pissed off at PLOS and SPARC for making something like this and then restricting its use.  I had a heated discussion with people from PLOS and SPARC about this but I am not sure if they updated their policy.  Regardless, the concept of an Openness Index of some kind fell out of my head after this buzzkill.  And it only just now came back to me. (Though I note – I did not find the Draft post I made until AFTER I wrote the rest of this post below … ).

To get some measure of openness in publications maybe a simple metric would be useful.  Something like the following

  • P = # of publications
  • A = # of fully open access papers
  • OI = Openness index

A simple OI would be

  • OI = 100 * A/P

However, one might want to account for relative levels of openness in this metric.  For example

  • AR = # of papers with an open but somewhat restricted license
  • F = # of papers that are freely available but not with an open license
  • C = some measure of how cheap the non freely available papers are

And so on.

Given that I am not into library science myself and not really familiar with playing around with this type of data, I thought a much simpler approach would be to just go to Pubmed (which of course works only for publications in the arenas covered by Pubmed).

From Pubmed one can pull out some simple data. 
  • # of publications (for a person or Institution)
  • # of those publications in PubMed Central (a measure of free availability)
Thus one could easily measure the “Pubmed Central” index as
PMCI = 100 * (# publications in PMC / # of publications in Pubmed)
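
As a small worked example, the PMCI formula above can be written as a Python function. The function name pmci and the dictionary layout are assumptions for illustration; the three pairs of counts are taken from the author table in this post.

```python
def pmci(in_pmc, in_pubmed):
    """PMCI = 100 * (# publications in PMC / # of publications in Pubmed)."""
    if in_pubmed <= 0:
        raise ValueError("need at least one publication in Pubmed")
    if not 0 <= in_pmc <= in_pubmed:
        raise ValueError("PMC count must be between 0 and the Pubmed count")
    return 100.0 * in_pmc / in_pubmed


# (PMC count, Pubmed count) pairs copied from the table in this post.
counts = {
    "Eisen MB": (76, 104),
    "Lander ES": (160, 377),
    "Venter JC": (53, 237),
}

for name, (pmc, pubmed) in counts.items():
    print(f"{name}: {pmci(pmc, pubmed):.1f}")
```

The input checks are there only to catch typos when counts are copied in by hand.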
Some examples of the PMCI for various authors including some bigger names in my field, and some people I have worked with.
Name             PMC / Pubmed    PMCI
Eisen JA         224/269         83.2
Eisen MB         76/104          73.1
Collins FS       192/521         36.8
Lander ES        160/377         42.4
Lipman DJ        58/73           79.4
Nussinov R       170/462         36.7
Mardis E         127/187         67.9
Colwell RR       237/435         54.5
Varmus H         165/408         40.4
Brown PO         164/234         70.1
Darling AE       20/27           74.0
Coop G           23/39           59.0
Salzberg SL      107/162         61.7
Venter JC        53/237          22.4
Ward NL          24/58           41.4
Fraser CM        78/262          29.8
Quackenbush J    95/225          42.2
Ghedin E         47/82           57.3
Langille MG      10/14           71.4

And so on.  Obviously this is of limited value / accuracy in many ways.  Many papers are freely available but not in Pubmed Central.  Many papers are not covered by Pubmed or Pubmed Central.  Times change, so some measure of recent publications might be better than measuring all publications.  Author identification is challenging (until systems like ORCID get more use).  And so on.

Another thing one can do with Pubmed is to identify papers with free full text available somewhere (not just in PMC).  This can be useful for cases where material is not put into PMC for some reason.  And then with a similar search one can narrow this to just the last five years.  As open access has become more common, maybe some people have shifted to it more and more over time (I have, so this search should give me a better index).
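
For anyone who wants to script searches like these rather than run them by hand, here is a rough Python sketch. It is not how the numbers in this post were generated. It assumes the NCBI E-utilities ESearch endpoint, and it assumes that the PubMed search tags `[Author]`, `free full text[sb]` and `"last 5 years"[dp]` behave the way I understand them to; please check those against the current PubMed help before relying on them. The function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: NCBI E-utilities ESearch.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(author, free_only=False, last_years=None):
    """Build a PubMed query string (the search tags are assumptions)."""
    parts = [f"{author}[Author]"]
    if free_only:
        parts.append("free full text[sb]")
    if last_years is not None:
        parts.append(f'"last {last_years} years"[dp]')
    return " AND ".join(parts)


def count_url(term):
    """URL asking ESearch for only the number of matching records."""
    params = {"db": "pubmed", "term": term, "rettype": "count", "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def fetch_count(term):
    """Network call: return the hit count reported for a query."""
    with urlopen(count_url(term)) as resp:
        return int(json.load(resp)["esearchresult"]["count"])


def free_percentage(author, last_years=None):
    """Percent of an author's Pubmed records flagged as free full text."""
    total = fetch_count(build_term(author, last_years=last_years))
    free = fetch_count(build_term(author, free_only=True, last_years=last_years))
    return 100.0 * free / total if total else 0.0


# Building the query needs no network access:
print(build_term("Eisen JA", free_only=True, last_years=5))
```

Searching by author name has the same author identification problem mentioned above, so counts from a script like this would need the same caveats.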

Let's call the % of publications with free full text somewhere the “Free Index” or FI.  Here are the values for the same authors.

Name             PMC / Pubmed    PMCI    Free / Pubmed (5 years)    FI-5    Free (all)    FI-ALL
Eisen JA         224/269         83.2    178/180                    98.9    237           88.1
Eisen MB         76/104          73.1    32/34                      94.1    83            79.8
Collins FS       192/521         36.8    104/128                    81.3    263           50.5
Lander ES        160/377         42.4    78/104                     75.0    200           53.1
Lipman DJ        58/73           79.4    20/22                      90.9    59            80.8
Mardis E         127/187         67.9    90/115                     78.3    135           72.2
Colwell RR       237/435         54.5    31/63                      49.2    258           59.3
Varmus H         165/408         40.4    21/28                      75.0    206           50.5
Brown PO         164/234         70.1    20/21                      95.2    185           79.0
Darling AE       20/27           74.0    18/21                      85.7    21            77.8
Coop G           23/39           59.0    16/20                      80.0    28            71.8
Salzberg SL      107/162         61.7    54/58                      93.1    128           79.0
Venter JC        53/237          22.4    20/33                      60.6    85            35.9
Ward NL          24/58           41.4    18/27                      66.6    30            51.7
Fraser CM        78/262          29.8    9/13                       69.2    109           41.6
Quackenbush J    95/225          42.2    54/75                      72.0    131           58.2
Ghedin E         47/82           57.3    30/36                      83.3    56            68.3
Langille MG      10/14           71.4    11/13                      84.6    11            78.6

Very happy to see that I score very well for the last five years. 180 papers in Pubmed.  178 of them with free full text somewhere that Pubmed recognizes. The large number of publications comes mostly from genome reports in the open access journals Standards in Genomic Sciences and Genome Announcements.  But most of my non genome report papers are also freely available.

I think in general it would be very useful to have measures of the degree of openness.  And such metrics should take into account sharing of other material like data, methods, etc.  In a way this could be a form of the altmetric calculations going on.

But before going any further I decided to look again into what has been done in this area. When I first thought of doing this a few years ago I searched and asked around and did not see much of anything.  (Although I do remember someone out there – maybe Carl Bergstrom – saying there were some metrics that might be relevant – but can’t figure out who / what this information in the back of my head is).

So I decided to do some searching anew.  And lo and behold there was something directly relevant. There is a paper in the Journal of Librarianship and Scholarly Communication called: The Accessibility Quotient: A New Measure of Open Access.  By Mathew A. Willmott, Katharine H. Dunn, and Ellen Finnie Duranceau from MIT.

Full Citation: Willmott, MA, Dunn, KH, Duranceau, EF. (2012). The Accessibility Quotient: A New Measure of Open Access. Journal of Librarianship and Scholarly Communication 1(1):eP1025. http://dx.doi.org/10.7710/2162-3309.1025
Here is the abstract:

Abstract
INTRODUCTION The Accessibility Quotient (AQ), a new measure for assisting authors and librarians in assessing and characterizing the degree of accessibility for a group of papers, is proposed and described. The AQ offers a concise measure that assesses the accessibility of peer-reviewed research produced by an individual or group, by incorporating data on open availability to readers worldwide, the degree of financial barrier to access, and journal quality. The paper reports on the context for developing this measure, how the AQ is calculated, how it can be used in faculty outreach, and why it is a useful lens to use in assessing progress towards more open access to research.
METHODS Journal articles published in 2009 and 2010 by faculty members from one department in each of MIT’s five schools were examined. The AQ was calculated using economist Ted Bergstrom’s Relative Price Index to assess affordability and quality, and data from SHERPA/RoMEO to assess the right to share the peer-reviewed version of an article.
RESULTS The results show that 2009 and 2010 publications by the Media Lab and Physics have the potential to be more open than those of Sloan (Management), Mechanical Engineering, and Linguistics & Philosophy.
DISCUSSION Appropriate interpretation and applications of the AQ are discussed and some limitations of the measure are examined, with suggestions for future studies which may improve the accuracy and relevance of the AQ.
CONCLUSION The AQ offers a concise assessment of accessibility for authors, departments, disciplines, or universities who wish to characterize or understand the degree of access to their research output, capturing additional dimensions of accessibility that matter to faculty.

I completely love it.  After all, it is directly related to what I have been thinking about and, well, they actually did some systematic analysis of their metrics.  I hope more things like this come out and are readily available for anyone to calculate.  Just how open someone is could be yet another metric used to evaluate them …

And then I did a little more searching and found the following which also seem directly relevant

So – it is good to see various people working on such metrics.  And I hope there are more and more.
Anyway – I know this is a bit incomplete but I simply do not have time right now to turn this into a full study or paper and I wanted to get these ideas out there.  I hope someone finds them useful …

ADVANCE Journal Club: Developing Graduate Students of Color for the Professoriate in STEM

As I have posted about before – I am involved in the UC Davis ADVANCE project funded by NSF.  From the project website:

UC Davis ADVANCE is a newly funded Institutional Transformation grant that began in September of 2012. Our program is supported by the National Science Foundation’s ADVANCE Program which aims to increase the participation and advancement of women in academic science and engineering careers. 

My role in this project is as a member (and now Co-Chair) of the “Policies and Practices Review Initiative” Committee.  As part of my work on this committee I am trying to read various papers on related topics.  And I figured I would simultaneously post about these papers as much as I can because it would be great to get a broader discussion going on these topics.

So today I am reading the following: CSHE – Developing Graduate Students of Color for the Professoriate in Science, Technology, Engineering, and Mathematics (STEM), which I was pointed to in our Committee meeting yesterday.  It is quite interesting.  It is by Anne MacLachlan from the Center for Studies in Higher Education at UC Berkeley.

The abstract:

This paper presents part of the results of a completed study entitled A Longitudinal Study of Minority Ph.D.s from 1980-1990: Progress and Outcomes in Science and Engineering at the University of California during Graduate School and Professional Life. It focuses particularly on the graduate school experience and degree of preparation for the professoriate of African American doctoral students in the sciences and engineering, and presents the results of a survey of 33 African American STEM Ph.D.s from the University of California earned between 1980-1990. Relationships with thesis advisors and principal investigators are evaluated by the study participants in fifteen specific areas from highly-ranked intellectual development to low-ranked training in grant writing. Deficits in training and socialization are discussed along with the tension between being both an African American and a graduate student. Career choices and outcomes are presented. These findings, in conjunction with current analyses of graduate education in STEM, suggest ways in which graduate training for all could be improved.

Lots of interesting information in there.  Perhaps most important for my current goals is what she describes at the end in terms of a Proposed Development Program.  She starts this section by commenting on the general situation in terms of training scientists in the US today.  She then identifies what she refers to as “discontinuities” in federal and local policy which can hinder “developing faculty of color.”  These include “compartmentalized, externally mandated sets of programs” and the “nature of Ph.D. training”.  Of the 33 Ph.D.s surveyed in the study, nearly all recommended diversity training for faculty.  They also recommended better laying out of expectations and requirements for students and more involvement of current faculty in recruiting.  They also made many recommendations for improving the life of current students of color.

Anyway – a lot of this material and the concepts involved are a bit new to me so I am still digesting the article.  But I thought I would share it with others in the hope that this will help catalyze more open discussion of issues involving women and underrepresented minorities in STEM fields.

Final Presentation of Aquarium Project

Today was the final presentation by the students working on the aquarium research project. Alex, Lakshmi, Sabreen, Andrew, and Kevin all talked about their recent work using QIIME to analyze the 16S sequence data from the project.

Their presentations, along with my introduction can be found here.

Now we need to take some time and really dig through the data to see which samples we should take out of the freezer and analyze in order to create the most useful dataset to look at community succession in these coral ponds.

Important read for those interested in gender, family & academia: Do Babies Matter

Just got pointed to this by Julie Huber on Facebook: New book on gender, family and academe shows how kids affect careers in higher education | Inside Higher Ed.  The book is “Do Babies Matter? Gender and Family in the Ivory Tower.”  This looks like a very important book and is especially relevant to me in my role in the UC Davis ADVANCE project where we are working on related issues.  It is from Mary Ann Mason at Berkeley Law School, Nicholas Wolfinger from Utah, and Marc Goulden from the UC Berkeley Office for Faculty Equity and Welfare.  It is definitely worth checking out.

I am ordering it right now …


Tweets from Nancy Moran’s talk at #UCDavis on "Two sides of symbiosis" storified

I went to a talk yesterday by Nancy Moran at UC Davis.  Nancy is one of my science heroes.  I have worked on a few projects with her and am just a big fan of her body of work on symbioses.  I have written about her work here on this blog many times before including

Anyway – I live tweeted her talk and then tried to “Storify” those tweets but Storify was not working well.  Thankfully  Surya Saha made a storify which I then edited (with his permission).


#UCDavis Becomes Smoke and Tobacco Free January 1, 2014

Just got this email and thought it would be of interest to some out there:

I am highly skeptical of the CHORUS system proposed by scientific publishers as an end run around PubMed Central

Just read this news story … Scientific Publishers Offer Solution to White House’s Public Access Mandate – ScienceInsider

It reports on an effort by various scientific publishers to create something they call “CHORUS” which stands for “Clearinghouse for the Open Research of the United States.” They claim this will be used to meet the guidelines issued by the White House OSTP for making papers for which the work was supported by federal grants available for free within 12 months of being published.

This appears to be an attempt to kill databases like Pubmed Central which is where such freely available publications now are archived.  I am very skeptical of the claims made by publishers that papers that are supposed to be freely available will in fact be made freely available on their own websites.  Why you may ask am I skeptical of this?  I suggest you read my prior posts on how Nature Publishing Group continuously failed to fulfill their promises to make genome papers freely available on their website.

See for example:

We need to make sure such papers are freely available permanently and the only way to do this is via making them available outside of the publishers' own sites.  Pubmed Central seems to be a good solution for this.  I would be happy to hear other possible solutions – but leaving “free” papers under the control of the publishers is a bad idea.

UPDATE 6/27/2013

Saw this Tweet

Seemed potentially really interesting. Read the story and got pointed to a new Nature paper on the ancient horse genome. I guess not so surprisingly, despite the fact that they report a new genome sequence, it is not openly available. We really cannot trust Nature on this can we? They could say “Well, this is a draft genome, and we did not mean to apply our policy to draft genomes.” Well, that would be weird since, well, they have applied this to draft genomes before. And then I decided to search for other examples … and in about ten minutes I found a few. See


Learning How to Work Qiime

So we’ve done our data collection, we’ve done more PCR than we could ever imagine, and we finally got our sequences. Now what? We analyze our data using Qiime, a software package that will help us see connections in our microbe data. Qiime is useful since it can handle the large amounts of data we’re throwing at it, something most other programs would crash just thinking about. As someone with average knowledge of computers, I find it intimidating to learn something based on programming from scratch, but it has also been a great learning experience. Now off to play with the terminal and find some meaning in these strings of letters…

Re-reading this on "Why women leave academia and why universities should be worried"

Been reading some somewhat old material out there on women in academia.  I am getting more and more interested in this issue especially as I have become more involved in the UC Davis ADVANCE Program.  The ADVANCE program from the National Science Foundation “aims to increase the participation and advancement of women in academic science and engineering careers.”

I was pointed to this Guardian article from 2012 today based on “The chemistry PhD: the impact on women’s retention”: Why women leave academia and why universities should be worried | Higher Education Network | Guardian Professional.   This Guardian article has a lot of detail and links to other information.  Definitely worth checking out if you have not seen it or have forgotten it.

Twisted Tree of Life Award #16: Nature & Authors doing taxonomic alchemy converting an archaeon to a bacterium

Well, this is one of the bigger screw ups in terms of evolution I have seen at a major journal in a while.  See the following paper in Nature: The catalytic mechanism for aerobic formation of methane by bacteria : Nature. The paper discusses some functions of “the ocean-dwelling bacterium Nitrosopumilus maritimus.” Some of what is reported in the paper is perhaps interesting (alas I do not have access).  But painfully, there is one big big big big mistake – you see Nitrosopumilus maritimus is not a bacterium.  It is an archaeon (see for example this paper on its genome).


I got pointed to this by Uri Gophna (in an email and in a comment on my blog) (also see this on Twitter).  Sure – some people debate the structure of the tree of life.  But I am pretty certain the authors here (Siddhesh S. Kamat, Howard J. Williams, Lawrence J. Dangott, Mrinmoy Chakrabarti & Frank M. Raushel) are not trying to make a statement about monophyly of bacteria or just what archaea are.  They just made what seems to be a colossal screw up.  And Nature not only let them, but added to it with things like their “Editors Summary”:

Novel bacterial biosynthesis of methane
Aerobic marine organisms produce significant quantities of the potent greenhouse gas methane, much of it via the cleavage of the highly unreactive carbon–phosphorus bonds of alkylphosphonates. In this study the authors explore the mechanism of PhnJ, an unusual radical S-adenosyl-L-methionine (SAM) enzyme that appears to use a cysteine-based thiyl radical to help catalyse the conversion of the alkylphosphonate substrate to methane and ribose-1,2-cyclic phosphate-5-phosphate. This reaction, not previously encountered in biological chemistry, establishes a novel mechanism for cleaving carbon–phosphorus bonds to form methane and phosphate via a covalent thiophosphate intermediate.

And for this taxonomic alchemy (converting an archaeon to a bacterium) I am awarding them and Nature my coveted “Twisted Tree of Life Award #16”.

UPDATE 5/28 7AM

I love the ad that came up while I was writing this post and searching for some information.  I think Nature could use the services from this ad: